Machine Learning: Decision Tree Example

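1. Python
2. scikit-learn, a Python machine-learning library
   2.1 Features:
   Simple and efficient tools for data mining and machine-learning analysis
   Open to all users and highly reusable across different needs
   Built on NumPy, SciPy and matplotlib
   Open source and commercially usable under the BSD license
   2.2 Problem domains covered:
   classification, regression, clustering, dimensionality reduction, model selection, preprocessing
3. Using scikit-learn
   Install scikit-learn: pip, easy_install, or the Windows installer
   Install the required packages (numpy, SciPy and matplotlib); Anaconda is an option, since it bundles numpy, scipy and other common scientific-computing packages
   Installation notes: check the Python interpreter version (2.7 or 3.4?) and whether the system is 32-bit or 64-bit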
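The installation notes above (interpreter version, 32-bit vs 64-bit) can be checked from Python itself. A minimal sketch using only the standard library; the helper name `describe_interpreter` is hypothetical and not part of scikit-learn:

```python
import platform
import struct
import sys


def describe_interpreter():
    """Return the running interpreter's version and pointer width (32 or 64 bits)."""
    # struct.calcsize("P") is the size of a C pointer in bytes:
    # 4 on a 32-bit build, 8 on a 64-bit build.
    bits = struct.calcsize("P") * 8
    version = ".".join(str(part) for part in sys.version_info[:3])
    return {
        "version": version,
        "bits": bits,
        "implementation": platform.python_implementation(),
    }


if __name__ == "__main__":
    print(describe_interpreter())
```

Note that current scikit-learn releases run only on Python 3, so the 2.7 option applies to old releases only.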
4. Example. The full code is as follows:
from sklearn.feature_extraction import DictVectorizer
import csv
from sklearn import tree
from sklearn import preprocessing

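# Read the data from the CSV file
# (path from the original author's machine; point it at your copy of AllElectronics.csv)
with open(r'D:\BaiduNetdiskDownload\程式碼與素材\程式碼與素材(1)\01DTree\AllElectronics.csv', 'rt', newline='') as allElectronicsData:
    reader = csv.reader(allElectronicsData)
    headers = next(reader)
    # print(headers)

    # Store each row's features as a dict in featureList and its class label in labelList.
    # Column 0 (the row ID) and the last column (the label) are not used as features.
    featureList = []
    labelList = []
    for row in reader:
        labelList.append(row[-1])
        rowDict = {}
        for i in range(1, len(row) - 1):
            rowDict[headers[i]] = row[i]
        featureList.append(rowDict)
# print(featureList)
# print(labelList)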

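# Vectorize the features (one-hot encode the categorical values)
vec = DictVectorizer()
dummyX = vec.fit_transform(featureList).toarray()
print("dummyX: " + str(dummyX))
# get_feature_names() was removed in scikit-learn 1.2; use get_feature_names_out()
print(vec.get_feature_names_out())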
print("labelList: " + str(labelList))

# Vectorize the class labels (binary labels become a single 0/1 column)
lb = preprocessing.LabelBinarizer()
dummyY = lb.fit_transform(labelList)
print("dummyY: " + str(dummyY))

# Train a decision tree classifier, using entropy (information gain) to measure split quality
clf = tree.DecisionTreeClassifier(criterion='entropy')
clf = clf.fit(dummyX, dummyY)
print("clf: " + str(clf))


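# Visualize the model: export the tree in Graphviz .dot format
# (export_graphviz writes to f and returns None, so its result is not assigned)
with open("allElectronicInformationGainOri.dot", 'w') as f:
    tree.export_graphviz(clf, feature_names=vec.get_feature_names_out(), out_file=f)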

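# Build a new sample from the first row. copy() matters: dummyX[0, :] is a view,
# so without it the assignments below would also overwrite the training data.
oneRowX = dummyX[0, :]
print("oneRowX: " + str(oneRowX))
newRowX = oneRowX.copy()
# Column order follows vec.get_feature_names_out() (alphabetical); check it before editing entries
newRowX[0] = 1
newRowX[2] = 0
print("newRowX: " + str(newRowX))
# Predict: scikit-learn expects a 2-D array of shape (n_samples, n_features)
predictedY = clf.predict(newRowX.reshape(1, -1))
print("predictedY: " + str(predictedY))
# Map the 0/1 prediction back to the original class label
print("predicted label: " + str(lb.inverse_transform(predictedY.reshape(-1, 1))))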
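The vectorization step above relies on `DictVectorizer` turning each dictionary of categorical features into a 0/1 vector with one column per `feature=value` pair, with the column names sorted alphabetically by default. The following is a hand-written, standard-library-only sketch of that idea for intuition; it is not scikit-learn's implementation, and the sample rows are hypothetical:

```python
def one_hot_encode(rows, separator="="):
    """Encode a list of {feature: category} dicts as 0/1 rows.

    Mimics the default behaviour of sklearn's DictVectorizer for string values:
    each (feature, value) pair becomes one column named "feature=value",
    and the columns are sorted alphabetically.
    """
    names = sorted({f"{key}{separator}{value}" for row in rows for key, value in row.items()})
    index = {name: i for i, name in enumerate(names)}
    matrix = []
    for row in rows:
        vector = [0] * len(names)
        for key, value in row.items():
            vector[index[f"{key}{separator}{value}"]] = 1
        matrix.append(vector)
    return names, matrix


# Hypothetical rows with the same shape as featureList in the script above.
sample = [
    {"age": "youth", "student": "no"},
    {"age": "senior", "student": "yes"},
]
names, matrix = one_hot_encode(sample)
print(names)   # ['age=senior', 'age=youth', 'student=no', 'student=yes']
print(matrix)  # [[0, 1, 1, 0], [1, 0, 0, 1]]
```

This sorted column order is why the prediction step's positional edits such as `newRowX[0] = 1` only make sense after checking `vec.get_feature_names_out()`.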

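Setting `criterion='entropy'` makes the classifier measure split quality with Shannon information gain. As a worked sketch in plain Python: the entropy of a label set D is H(D) = -Σ p_k·log2(p_k), and the gain of splitting on attribute A is H(D) - Σ_v (|D_v|/|D|)·H(D_v). The counts below are those of the classic AllElectronics textbook table (9 `yes`, 5 `no`, split on `age`); that the CSV used above contains exactly these rows is an assumption. scikit-learn builds binary (CART-style) splits, so this multiway gain illustrates the criterion rather than the exact split scikit-learn evaluates.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(values, labels):
    """Entropy reduction from partitioning `labels` by the attribute `values`."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Class counts per age group as in the textbook AllElectronics table (assumed):
# youth: 2 yes / 3 no, middle_aged: 4 yes / 0 no, senior: 3 yes / 2 no.
age = ["youth"] * 5 + ["middle_aged"] * 4 + ["senior"] * 5
buys = (["yes"] * 2 + ["no"] * 3) + ["yes"] * 4 + (["yes"] * 3 + ["no"] * 2)

print(round(entropy(buys), 3))                # 0.94
print(round(information_gain(age, buys), 3))  # 0.247
```

A pure group such as `middle_aged` has entropy 0, which is what makes `age` an attractive split here.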
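Install Graphviz and configure the environment variable (add Graphviz's bin directory to PATH), then convert the .dot file to PDF to view the decision tree: dot -Tpdf allElectronicInformationGainOri.dot -o output.pdf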
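The dot-to-PDF conversion can also be scripted from Python. A minimal sketch, assuming Graphviz's `dot` executable is installed and on PATH (the point of the environment-variable step); the helper names `dot_to_pdf_command` and `render_dot_to_pdf` are hypothetical:

```python
import shutil
import subprocess


def dot_to_pdf_command(dot_file, pdf_file):
    """Build the Graphviz command line that renders a .dot file as PDF."""
    return ["dot", "-Tpdf", dot_file, "-o", pdf_file]


def render_dot_to_pdf(dot_file, pdf_file):
    """Run Graphviz's `dot`; return False if it cannot be found on PATH."""
    if shutil.which("dot") is None:
        return False  # Graphviz missing, or its bin directory is not on PATH
    # check=True raises CalledProcessError if dot reports an error
    subprocess.run(dot_to_pdf_command(dot_file, pdf_file), check=True)
    return True
```

Usage: `render_dot_to_pdf("allElectronicInformationGainOri.dot", "output.pdf")` after the script above has written the .dot file. If Graphviz is unavailable, `sklearn.tree.export_text(clf, feature_names=list(vec.get_feature_names_out()))` prints the fitted tree as plain text instead.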