Study Notes (4): Detecting WebShells with the K-Nearest Neighbors Algorithm
阿新 • Published: 2018-11-10
1. Data collection
Load the normal samples from ADFA-LD:
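import os

def load_adfa_training_files(rootdir):
    # Label every regular file directly under rootdir as normal (0).
    x = []
    y = []
    for name in os.listdir(rootdir):
        path = os.path.join(rootdir, name)
        if os.path.isfile(path):
            # load_one_file returns the file's trace as a string
            x.append(load_one_file(path))
            y.append(0)
    return x, y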
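The loader above calls load_one_file, which this note does not define. A minimal sketch of what it might look like, assuming each ADFA-LD trace file holds a single line of space-separated system call numbers (the file name and the sample contents below are made up for the demo):

```python
import os
import tempfile

def load_one_file(filename):
    """Return the first line of a trace file as one string (assumed format)."""
    with open(filename) as f:
        return f.readline().strip()

# Demo on a throwaway file; the name and the numbers are hypothetical.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample_trace.txt")
    with open(path, "w") as f:
        f.write("6 6 63 6 42 120\n")
    trace = load_one_file(path)
```

Returning the trace as a plain string keeps it compatible with a text vectorizer, which expects one string per document.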
Define a function that walks every file under a directory:
def dirlist(path, allfile):
    # Recursively append every file path under `path` to allfile.
    for filename in os.listdir(path):
        filepath = os.path.join(path, filename)
        if os.path.isdir(filepath):
            dirlist(filepath, allfile)
        else:
            allfile.append(filepath)
    return allfile
Select the WebShell-related samples from the attack data set:
import re

def load_adfa_webshell_files(rootdir):
    # Label every file whose path matches the WebShell pattern as an attack (1).
    x = []
    y = []
    allfile = dirlist(rootdir, [])
    for file in allfile:
        if re.match(r" ..", file):
            x.append(load_one_file(file))
            y.append(1)
    return x, y
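To illustrate the walk-and-filter step, here is a small sketch that builds a throwaway directory tree and keeps only the files that look WebShell-related. The directory names (Web_Shell_1, Adduser_1) and the pattern Web_Shell_\d+ are assumptions made for the demo, not the pattern used in the note; os.walk stands in for the recursive dirlist.

```python
import os
import re
import tempfile

def list_files(root):
    """Collect every file path under root, recursing into subdirectories."""
    paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            paths.append(os.path.join(dirpath, name))
    return sorted(paths)

# Hypothetical pattern: keep files inside directories named Web_Shell_<n>.
WEBSHELL_DIR = re.compile(r"Web_Shell_\d+")

with tempfile.TemporaryDirectory() as root:
    for sub in ("Web_Shell_1", "Adduser_1"):
        os.makedirs(os.path.join(root, sub))
        with open(os.path.join(root, sub, "trace.txt"), "w") as f:
            f.write("6 63 42\n")
    all_files = list_files(root)
    # re.search scans the whole path; re.match only tests from the start.
    matched = [p for p in all_files if WEBSHELL_DIR.search(p)]
    matched_dirs = [os.path.basename(os.path.dirname(p)) for p in matched]
```

Because re.match anchors at the beginning of the string, a pattern used with it has to start with the same prefix as the paths returned by the directory walk.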
2. Feature extraction
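from sklearn.feature_extraction.text import CountVectorizer

x1, y1 = load_adfa_training_files("...")
x2, y2 = load_adfa_webshell_files("...")
x = x1 + x2
y = y1 + y2
# Bag-of-words: one column per token, one row per trace
vectorizer = CountVectorizer(min_df=1)
x = vectorizer.fit_transform(x)
x = x.toarray()  # dense array for the classifier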
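To make the count-vector step concrete, the sketch below mimics it with the standard library only. It is a stand-in, not scikit-learn itself: it lowercases, tokenizes with the same regular expression that CountVectorizer uses by default, sorts the vocabulary, and counts tokens per document. The two sample traces are invented.

```python
import re
from collections import Counter

# CountVectorizer's default token pattern: tokens of two or more word characters.
TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")

def bag_of_words(docs):
    """Return (sorted vocabulary, one row of token counts per document)."""
    tokenized = [TOKEN_PATTERN.findall(doc.lower()) for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        rows.append([counts.get(tok, 0) for tok in vocab])
    return vocab, rows

# Invented system call traces, one string per file.
traces = ["6 6 63 6 42 120", "63 42 42 120 6"]
vocab, rows = bag_of_words(traces)
# vocab -> ['120', '42', '63']
# rows  -> [[1, 1, 1], [1, 2, 1]]
```

Note that the single-digit call number 6 never reaches the vocabulary, because the default pattern requires at least two characters per token. If one-digit system calls should count as features, passing token_pattern=r"(?u)\b\w+\b" to CountVectorizer would keep them.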
3. Training and evaluation are the same as in note (3)
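Since note (3) is not reproduced here, the following is only a rough sketch of what K-nearest-neighbor training and evaluation could look like on count vectors. It is written in plain Python so the voting logic is visible; in practice scikit-learn's KNeighborsClassifier would be the usual choice. The toy vectors, the choice of k=3, and the leave-one-out evaluation are all assumptions.

```python
import math
from collections import Counter

def knn_predict(train_x, train_y, query, k=3):
    """Label query by majority vote among its k nearest training vectors."""
    dists = sorted(
        (math.dist(row, query), label) for row, label in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

def leave_one_out_accuracy(xs, ys, k=3):
    """Predict each sample from all the others and return the hit rate."""
    hits = 0
    for i in range(len(xs)):
        rest_x = xs[:i] + xs[i + 1:]
        rest_y = ys[:i] + ys[i + 1:]
        hits += knn_predict(rest_x, rest_y, xs[i], k) == ys[i]
    return hits / len(xs)

# Toy count vectors: label 0 = normal, label 1 = WebShell (invented data).
train_x = [[3, 0, 1], [4, 1, 0], [3, 1, 1], [0, 3, 4], [1, 4, 3], [0, 4, 4]]
train_y = [0, 0, 0, 1, 1, 1]

normal_guess = knn_predict(train_x, train_y, [3, 1, 0])
attack_guess = knn_predict(train_x, train_y, [0, 3, 3])
accuracy = leave_one_out_accuracy(train_x, train_y)
```

With real ADFA-LD vectors the classes will not separate this cleanly, so the accuracy on the toy data says nothing about detection quality.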