Log Analysis Code Implementation (String Splitting)

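  • Approach

        Process the line without regular expressions:
            Split the string on whitespace
            Handle content enclosed in [] or "" specially, so that each stays a single field
            Convert each field to its appropriate type
            Streamline the code and check its efficiency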
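The conversion step in the list above can be sketched in isolation as follows. This is only a minimal illustration: the sample values are copied from the log line analyzed in this post, and parsing `Feb` with `%b` assumes the process runs under an English (or C) locale.

```python
import datetime

# Sample field values, copied from the log line analyzed in this post.
raw_time = '19/Feb/2013:10:23:29 +0800'
raw_status = '200'
raw_size = '16691'

# %z parses the +0800 offset, so the result is timezone-aware.
# Assumption: the locale spells month abbreviations in English.
when = datetime.datetime.strptime(raw_time, '%d/%b/%Y:%H:%M:%S %z')
status = int(raw_status)
size = int(raw_size)

print(when.isoformat())  # 2013-02-19T10:23:29+08:00
print(status, size)      # 200 16691
```

Keeping the timezone offset in the parsed value means timestamps from servers in different zones can still be compared directly.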

import datetime


# Target log line
logline = '''183.60.212.153 - - [19/Feb/2013:10:23:29 +0800] \
"GET /o2o/media.html?menu=3 HTTP/1.1" 200 16691 "-" \
"Mozilla/5.0 (compatible; EasouSpider; +http://www.easou.com/search/spider.html)"'''


clean_log = logline.split()
# The result is a list; a plain split() breaks the bracketed and quoted fields apart:
# ['183.60.212.153', '-', '-', '[19/Feb/2013:10:23:29', '+0800]',
#  '"GET', '/o2o/media.html?menu=3', 'HTTP/1.1"', '200', '16691',
#  '"-"', '"Mozilla/5.0', '(compatible;', 'EasouSpider;', '+http://www.easou.com/search/spider.html)"']


# Convert the timestamp string to a timezone-aware datetime
def convert_time(time: str):
    return datetime.datetime.strptime(time, '%d/%b/%Y:%H:%M:%S %z')

# Split the request string into three parts
def convert_request(request: str):
    return dict(zip(('method', 'url', 'protocol'), request.split()))

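# Field name for each position (an empty name marks a field of no interest)
names = [
    'remote', '', '', 'time',
    'request', 'status', 'size', '',
    'useragent'
]

# Conversion function for each position (None means keep the raw string)
operations = [
    None, None, None, convert_time,
    convert_request, int, int, None,
    None
]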

# Split the line into fields, keeping [...] and "..." content together
def log_clean(line: str, ret=None):
    if ret is None:
        ret = []
    tmp = ''
    flag = False  # True while inside an unclosed [...] or "..." field
    for word in line.split():
        if flag:
            # Continue the open field; close it on a trailing ] or "
            tmp += ' ' + word
            if word.endswith(('"', ']')):
                ret.append(tmp.strip('[]"'))
                flag = False
        elif word.startswith(('[', '"')):
            if len(word) > 1 and word.endswith(('"', ']')):
                # Opened and closed within a single word, e.g. "-"
                ret.append(word.strip('[]"'))
            else:
                tmp = word
                flag = True
        else:
            ret.append(word)
    return ret


# Walk the cleaned fields, convert each one by position, and store it in a new dict
ret_d = {}
for i, field in enumerate(log_clean(logline)):
    key = names[i]
    if not key:
        continue  # skip unnamed fields so they do not overwrite each other
    if operations[i]:
        ret_d[key] = operations[i](field)
    else:
        ret_d[key] = field
print(ret_d)
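The approach list mentions an efficiency check, which the code above does not show. One possible way to do it, as a rough sketch, is to time the splitting step with the standard `timeit` module. Here `split_fields` is a hypothetical stand-in that follows the same grouping idea, and the repetition count is an arbitrary choice. Absolute timings depend on the machine, so none are quoted.

```python
import timeit

# Same sample log line as in this post, written as one string.
LINE = ('183.60.212.153 - - [19/Feb/2013:10:23:29 +0800] '
        '"GET /o2o/media.html?menu=3 HTTP/1.1" 200 16691 "-" '
        '"Mozilla/5.0 (compatible; EasouSpider; '
        '+http://www.easou.com/search/spider.html)"')


def split_fields(line):
    """Hypothetical helper: whitespace split that regroups [...] and "..." fields."""
    fields = []
    pending = []  # words of a field that has been opened but not yet closed
    for word in line.split():
        if pending:
            pending.append(word)
            if word.endswith(('"', ']')):
                fields.append(' '.join(pending).strip('[]"'))
                pending = []
        elif word.startswith(('[', '"')):
            if len(word) > 1 and word.endswith(('"', ']')):
                fields.append(word.strip('[]"'))
            else:
                pending.append(word)
        else:
            fields.append(word)
    return fields


fields = split_fields(LINE)
print(len(fields))  # 9

# Time 10,000 calls; the count is arbitrary and results vary by machine.
elapsed = timeit.timeit(lambda: split_fields(LINE), number=10_000)
print(f'{elapsed / 10_000 * 1e6:.2f} microseconds per line')
```

Timing a candidate rewrite the same way, on the same input, gives a like-for-like comparison before the shorter version is adopted.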


This article comes from the "12064120" blog. Please be sure to keep this source attribution: http://12074120.blog.51cto.com/12064120/1980427
