Python Data Mining: Introduction and Practice -------- Ionosphere: scikit-learn estimators, the k-nearest neighbors classifier, cross-validation, and parameter tuning

阿新 • Published: 2018-12-19

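Download link for ionosphere.data: http://archive.ics.uci.edu/ml/machine-learning-databases/ionosphere/
The file has 351 rows. Each row holds 34 numeric features followed by a class label, 'g' (good) or 'b' (bad).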

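import numpy as np
import csv

# Path to the downloaded data file; change it to wherever you saved the file
data_filename = "D:\\python27\\study\\code\\Chapter2\\ionosphere.data"

# Initialize the arrays that will hold the data: 351 samples, 34 features each
X = np.zeros((351, 34), dtype='float')
y = np.zeros((351,), dtype='bool')

# Read the file
with open(data_filename, 'r') as data:
    reader = csv.reader(data)
    for i, row in enumerate(reader):  # enumerate gives the index of each row
        X[i] = [float(datum) for datum in row[:-1]]  # the first 34 values are the features
        y[i] = row[-1] == 'g'  # label: True for 'g' (good), False for 'b' (bad)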
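The loop above hard-codes 351 rows and 34 features. Below is a minimal sketch, assuming the same comma-separated layout (features first, label last), that sizes the arrays from the data itself instead. `load_ionosphere` is a hypothetical helper name, not part of any library, and the two-row sample is made up for illustration.

```python
import csv
import io

import numpy as np


def load_ionosphere(lines):
    """Parse Ionosphere-style CSV lines: numeric features, then a 'g'/'b' label.

    `lines` can be any iterable of text lines; an open file object works too.
    The array shapes come from the data instead of being hard-coded.
    """
    rows = [row for row in csv.reader(lines) if row]  # skip blank lines
    X = np.array([[float(v) for v in row[:-1]] for row in rows], dtype=float)
    y = np.array([row[-1] == 'g' for row in rows], dtype=bool)
    return X, y


# Tiny made-up sample with 3 features per row (the real file has 34)
sample = io.StringIO("1,0,0.5,g\n0,1,-0.5,b\n")
X_demo, y_demo = load_ionosphere(sample)
print(X_demo.shape)  # (2, 3)
print(y_demo)        # [ True False]
```

On the real file it would be called as `with open(data_filename, 'r') as f: X, y = load_ionosphere(f)`.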
from sklearn.model_selection import train_test_split  # sklearn.cross_validation was removed in scikit-learn 0.20

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=14)
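`train_test_split` shuffles the samples and, by default, holds out 25% of them as the test set; `random_state=14` fixes the shuffle so the split is reproducible. The sketch below illustrates that idea with plain NumPy. `simple_train_test_split` is a hypothetical name: it reproduces scikit-learn's set sizes (the test share is rounded up, so 351 samples give 88 test and 263 training samples) but not its exact shuffle for a given seed.

```python
import numpy as np


def simple_train_test_split(X, y, test_size=0.25, random_state=None):
    """Shuffle the sample indices, then hold out a fraction as the test set.

    Illustration only: the same seed will not give the same rows as
    scikit-learn's train_test_split.
    """
    n_samples = len(X)
    n_test = int(np.ceil(test_size * n_samples))  # round the test share up
    rng = np.random.RandomState(random_state)     # seeded, reproducible shuffle
    indices = rng.permutation(n_samples)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]


# Arrays with the same shape as the Ionosphere data: ceil(0.25 * 351) = 88 test rows
X_fake = np.zeros((351, 34))
y_fake = np.zeros(351, dtype=bool)
X_tr, X_te, y_tr, y_te = simple_train_test_split(X_fake, y_fake, random_state=14)
print(len(X_tr), len(X_te))  # 263 88
```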

from sklearn.neighbors import KNeighborsClassifier  # import the k-nearest neighbors classifier and create an instance
estimator = KNeighborsClassifier()

estimator.fit(X_train, y_train)
y_predicted = estimator.predict(X_test)

accuracy = np.mean(y_test == y_predicted) * 100  # share of correct predictions, as a percentage
print("Accuracy: {0:.1f}%".format(accuracy))
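With default settings, `KNeighborsClassifier()` looks at the 5 nearest training samples (Euclidean distance) and predicts the majority label among them. Here is a brute-force sketch of that idea, assuming small in-memory NumPy arrays. `knn_predict` is a hypothetical helper, not scikit-learn's implementation, which can also use tree-based neighbor search for speed. The last lines compute accuracy with the same formula as above: the fraction of predictions equal to the true labels, times 100.

```python
import numpy as np


def knn_predict(X_train, y_train, X_query, n_neighbors=5):
    """Majority vote among the n_neighbors closest training points (Euclidean)."""
    predictions = []
    for x in X_query:
        distances = np.sqrt(((X_train - x) ** 2).sum(axis=1))  # distance to every training point
        nearest = np.argsort(distances)[:n_neighbors]          # indices of the k closest points
        labels, counts = np.unique(y_train[nearest], return_counts=True)
        predictions.append(labels[np.argmax(counts)])          # most common label wins
    return np.array(predictions)


# Toy 2-D data: two well-separated clusters labelled False and True
X_toy = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]], dtype=float)
y_toy = np.array([False, False, False, True, True, True])
X_new = np.array([[0.5, 0.5], [5.5, 5.5]])
y_true = np.array([False, True])

y_pred = knn_predict(X_toy, y_toy, X_new, n_neighbors=3)
accuracy = np.mean(y_true == y_pred) * 100
print(y_pred, accuracy)  # [False  True] 100.0
```

With two classes, an odd `n_neighbors` avoids tied votes; in this sketch a tie would go to the label that sorts first (`False`).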

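from sklearn.model_selection import cross_val_score  # cross-validation (sklearn.cross_validation was removed in 0.20)

# cv is left at its default: 5 folds in scikit-learn >= 0.22, 3 folds in older versions
scores = cross_val_score(estimator, X, y, scoring='accuracy')
avg_accuracy = np.mean(scores) * 100
print("Average accuracy: {0:.1f}%".format(avg_accuracy))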
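`cross_val_score` splits the data into folds, trains on all but one fold, scores the held-out fold, and repeats until every fold has been tested once. The sketch below implements plain, unshuffled k-fold cross-validation with NumPy to show that loop. `k_fold_scores` and the `majority_class` baseline are hypothetical helpers used only for illustration, and the toy data are made up.

```python
import numpy as np


def k_fold_scores(X, y, fit_predict, n_folds=5):
    """Plain (unshuffled, unstratified) k-fold cross-validation.

    fit_predict(X_train, y_train, X_test) must return predictions for X_test.
    Each fold serves once as the test set; the other folds are used for training.
    """
    all_idx = np.arange(len(X))
    folds = np.array_split(all_idx, n_folds)
    scores = []
    for test_idx in folds:
        train_idx = np.setdiff1d(all_idx, test_idx)
        y_pred = fit_predict(X[train_idx], y[train_idx], X[test_idx])
        scores.append(np.mean(y[test_idx] == y_pred))  # accuracy on this fold
    return np.array(scores)


def majority_class(X_train, y_train, X_test):
    """Hypothetical baseline: always predict the most common training label."""
    labels, counts = np.unique(y_train, return_counts=True)
    return np.full(len(X_test), labels[np.argmax(counts)])


X_toy = np.arange(10, dtype=float).reshape(10, 1)
y_toy = np.array([True] * 8 + [False] * 2)
scores = k_fold_scores(X_toy, y_toy, majority_class, n_folds=5)
print(scores, scores.mean())
```

In this toy run the last fold contains only `False` labels, so the baseline scores 0 on it (scores `[1. 1. 1. 1. 0.]`, mean 0.8). For classifiers, scikit-learn's `cross_val_score` uses stratified folds by default when `cv` is an integer or `None`, which keeps the class proportions similar in every fold and avoids this kind of lopsided split.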

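# Tune the parameter: try different numbers of neighbors to see which generalizes best
avg_scores = []
all_scores = []

num_size = list(range(1, 21))  # 1 to 20 inclusive

for n_neighbors in num_size:
    estimator = KNeighborsClassifier(n_neighbors=n_neighbors)
    scores = cross_val_score(estimator, X, y, scoring='accuracy')
    avg_scores.append(np.mean(scores))  # average accuracy over the folds
    all_scores.append(scores)           # keep the per-fold scores as well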
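The loop keeps every fold's score in `all_scores` but only plots the averages. A minimal sketch of how the per-fold scores could also be summarized, assuming each entry of `all_scores` has the same number of folds: the standard deviation shows how much accuracy varies between folds, and `np.argmax` picks the `n_neighbors` with the highest average. `summarize_scores` is a hypothetical helper, and the fold scores in the example are placeholders, not results from the Ionosphere data.

```python
import numpy as np


def summarize_scores(num_size, all_scores):
    """Return per-k mean and standard deviation, plus the k with the best mean.

    all_scores holds one array of per-fold scores for each value in num_size.
    np.argmax returns the first maximum, so ties go to the smaller k.
    """
    scores = np.array(all_scores)  # shape: (number of k values, number of folds)
    means = scores.mean(axis=1)
    stds = scores.std(axis=1)
    best_k = num_size[int(np.argmax(means))]
    return means, stds, best_k


# Placeholder fold scores for three k values
means, stds, best_k = summarize_scores(
    [1, 2, 3],
    [[0.5, 0.5, 0.5], [0.5, 0.75, 1.0], [0.25, 0.25, 0.25]],
)
print(means, stds, best_k)
```

On the real data this would be `means, stds, best_k = summarize_scores(num_size, all_scores)`.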

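# Jupyter/IPython only: show plots inline. Delete this line in a plain .py script.
%matplotlib inline

import matplotlib.pyplot as plt

plt.plot(num_size, avg_scores, '-o', linewidth=5, markersize=12)
plt.xlabel("n_neighbors")
plt.ylabel("Average cross-validation accuracy")
plt.show()

# Overall, accuracy tends to fall as the number of neighbors increases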
