
Analyzing nginx access logs with a Python script


The log format is as follows:

223.74.135.248 - - [11/May/2017:11:19:47 +0800] "POST /login/getValidateCode HTTP/1.1" 404 14227 "http://www.yidianchina.com/login/getValidateCode" "Mozilla/4.0 (compatible; MSIE 9.0; Windows NT 6.1)"

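The fields are, in order: client IP, access time, request method, request URI, HTTP protocol, response status code, response body size, referer, and client browser (user agent).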
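As a rough illustration of how these fields can be pulled out of one line, here is a minimal sketch using a regular expression with named groups. The pattern and the names `LOG_PATTERN` and `parse_line` are assumptions made for this example; they are written to fit the sample line above, not every possible nginx log format.

```python
import re

# Assumed pattern for the sample line above; the group names are illustrative.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<uri>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Status code and body size are numeric, so convert them.
    fields["status"] = int(fields["status"])
    fields["size"] = int(fields["size"])
    return fields


sample = (
    '223.74.135.248 - - [11/May/2017:11:19:47 +0800] '
    '"POST /login/getValidateCode HTTP/1.1" 404 14227 '
    '"http://www.yidianchina.com/login/getValidateCode" '
    '"Mozilla/4.0 (compatible; MSIE 9.0; Windows NT 6.1)"'
)
fields = parse_line(sample)
print(fields["ip"], fields["method"], fields["uri"], fields["status"])
```

Named groups keep the code readable when the field order changes, since nothing depends on group positions.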

The HTTP protocol does not need to be captured. All the other fields are matched and stored in a database for later analysis.
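To make the storage step concrete, the following sketch stores parsed fields and runs one simple query. It uses Python's built-in `sqlite3` module with an in-memory database purely so the example is self-contained; this is a substitute for illustration, not the MySQL setup used in this article. The table name `access_log`, its columns, and the second sample row are invented for the example.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here so the sketch runs anywhere.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE access_log ("
    "ip TEXT, access_time TEXT, method TEXT, uri TEXT, "
    "status INTEGER, size INTEGER, referer TEXT, agent TEXT)"
)

rows = [
    # Fields taken from the sample log line.
    ("223.74.135.248", "11/May/2017:11:19:47 +0800", "POST",
     "/login/getValidateCode", 404, 14227,
     "http://www.yidianchina.com/login/getValidateCode",
     "Mozilla/4.0 (compatible; MSIE 9.0; Windows NT 6.1)"),
    # Made-up row, only here to give the query something to group.
    ("192.0.2.1", "11/May/2017:11:20:00 +0800", "GET",
     "/", 200, 512, "-", "curl/7.0"),
]

# Placeholders let the driver escape values, so quotes in a referer or
# user agent cannot break the statement.
con.executemany("INSERT INTO access_log VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows)
con.commit()

# One example of later analysis: number of requests per status code.
status_counts = con.execute(
    "SELECT status, COUNT(*) FROM access_log GROUP BY status ORDER BY status"
).fetchall()
print(status_counts)
```

The same idea carries over to MySQL, where the placeholder is `%s` instead of `?`.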

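#!/usr/bin/python
# -*- coding:utf-8 -*-
import re
import datetime
import json
import MySQLdb as mdb

try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib import urlopen  # Python 2

# Path of yesterday's log file
log = "/root/access_" + (datetime.datetime.now() - datetime.timedelta(days=1)).strftime('%Y-%m-%d') + ".log"

# Groups: 1 ip, 2 time, 3 method, 4 URI, 5 protocol, 6 status, 7 body size,
# 8 request time, 9 referer, 10 user agent.
# The sample line has no request-time field, but the insert below stores one,
# so it is treated as optional here and recorded as 0.0 when absent.
pattern = re.compile(
    r'(\S+) \S+ \S+ \[(.*?)\] "(\S+) (/\S*) (\S+)" (\d+) (\d+) (?:([\d.]+) )?"(.*?)" "(.*?)"',
    re.I)

line = open(log, 'r')
# User name and password are left empty here; fill in your own
con = mdb.connect('localhost', '', '', 'database', charset="utf8")
cur = con.cursor()
try:
    for i in line:
        matchObj = pattern.match(i)
        if matchObj is not None:
            ip = matchObj.group(1)
            # Look up the location of the IP through the Taobao IP service
            API = "http://ip.taobao.com/service/getIpInfo.php?ip=" + ip
            jsondata = json.loads(urlopen(API).read().decode('utf-8'))
            address = jsondata['data']['country'] + jsondata['data']['region'] + jsondata['data']['city'] + jsondata['data']['isp']
            access_time = matchObj.group(2)
            method = matchObj.group(3)
            request = matchObj.group(4)
            status = int(matchObj.group(6))
            bytesSent = int(matchObj.group(7))
            request_time = float(matchObj.group(8) or 0)
            refer = matchObj.group(9)
            agent = matchObj.group(10)
            # Parameterized query: the driver quotes and escapes the values
            cur.execute(
                "insert into nginx_access_log values(%s,%s,%s,%s,%s,%s,%s,%s,%s,%s)",
                (ip, address, access_time, method, request, status, bytesSent, request_time, refer, agent))
    # Without a commit the inserted rows may be discarded when the connection closes
    con.commit()
finally:
    line.close()
    cur.close()
    con.close()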
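The script above queries the IP location service once per log line, even when the same IP appears many times. One possible refinement, sketched below, is to remember the answer for each IP. The names `make_cached_lookup` and `fake_fetch` are hypothetical, and `fake_fetch` is a stand-in for the real HTTP call so that the sketch runs without network access.

```python
def make_cached_lookup(fetch):
    """Wrap fetch(ip) so that each distinct IP is looked up only once."""
    cache = {}

    def lookup(ip):
        if ip not in cache:
            cache[ip] = fetch(ip)
        return cache[ip]

    return lookup


calls = []


def fake_fetch(ip):
    # Stand-in for the HTTP request; records each call and returns a dummy value.
    calls.append(ip)
    return "address-of-" + ip


lookup = make_cached_lookup(fake_fetch)
for ip in ["223.74.135.248", "223.74.135.248", "192.0.2.1"]:
    lookup(ip)

print(len(calls))  # 2: the repeated IP was fetched only once
```

In the real script, `fetch` would be a small function that calls the location API and builds the address string, and the loop would call `lookup(ip)` instead of hitting the API directly.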
