
Notes on Machine Learning in Action, Chapter 2 (2)

2.2 Dating site matching

Code implementation

The labels in the dataset the original author provides (datingTestSet.txt) are not ints, so a problem comes up when running the code; two fixes are given below. Here is the book's original code:

# Parse the dating-data text file into a NumPy matrix and a label list
from numpy import *   # the book's kNN.py starts with this import

def file2matrix(filename):
    fr = open(filename)
    arrayOlines = fr.readlines()
    # get the number of lines in the file
    numberOfLines = len(arrayOlines)
    # create the NumPy matrix to return
    returnMat = zeros((numberOfLines,3))
    classLabelVector = []
    index = 0
    # parse each line of the file into the list
    for line in arrayOlines:
        line = line.strip()
        listFromLine = line.split('\t')
        returnMat[index,:] = listFromLine[0:3]
        classLabelVector.append(int(listFromLine[-1]))
        index += 1
    return returnMat,classLabelVector
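With datingTestSet.txt the last column holds string labels, so the int() conversion in the final append fails; roughly (the label spelling is taken from the check in Solution ① below):

>>>int('large_Doses')
ValueError: invalid literal for int() with base 10: 'large_Doses'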

Solution ①

Replace classLabelVector.append(int(listFromLine[-1])) with an explicit mapping from the string labels to ints:

# Parse the dating-data text file into a NumPy matrix and a label list
def file2matrix(filename):
    fr = open(filename)
    arrayOlines = fr.readlines()
    # get the number of lines in the file
    numberOfLines = len(arrayOlines)
    # create the NumPy matrix to return
    returnMat = zeros((numberOfLines,3))
    classLabelVector = []
    index = 0
    # parse each line of the file into the list
    for line in arrayOlines:
        line = line.strip()
        listFromLine = line.split('\t')
        returnMat[index,:] = listFromLine[0:3]
        if listFromLine[-1] == 'did_not_Like':
            classLabelVector.append(1)
        elif listFromLine[-1] == 'small_Doses':
            classLabelVector.append(2)
        elif listFromLine[-1] == 'large_Doses':
            classLabelVector.append(3)
        index += 1
    return returnMat,classLabelVector
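The same mapping can be written more compactly with a dict; a sketch (the label spellings are assumed to match the ones checked above), with the rest of file2matrix unchanged:

label2int = {'did_not_Like': 1, 'small_Doses': 2, 'large_Doses': 3}
# inside the for loop, in place of the if/elif chain:
classLabelVector.append(label2int[listFromLine[-1]])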

Note: Python 2 can call reload() directly,
but Python 3 must import importlib first!
In ipython:

>>>import importlib
>>>importlib.reload(kNN)
>>>datingDataMat, datingLabels = kNN.file2matrix('datingTestSet.txt')

Solution ②

The code is the same as in the book:

# Parse the dating-data text file into a NumPy matrix and a label list
def file2matrix(filename):
    fr = open(filename)
    arrayOlines = fr.readlines()
    # get the number of lines in the file
    numberOfLines = len(arrayOlines)
    # create the NumPy matrix to return
    returnMat = zeros((numberOfLines,3))
    classLabelVector = []
    index = 0
    # parse each line of the file into the list
    for line in arrayOlines:
        line = line.strip()
        listFromLine = line.split('\t')
        returnMat[index,:] = listFromLine[0:3]
        classLabelVector.append(int(listFromLine[-1]))
        index += 1
    return returnMat,classLabelVector

In ipython, load datingTestSet2.txt instead, where the labels have already been converted to ints:

>>>datingDataMat, datingLabels = kNN.file2matrix('datingTestSet2.txt')

Print datingDataMat and datingLabels:

In[1]: datingDataMat
Out[1]: 
array([[4.0920000e+04, 8.3269760e+00, 9.5395200e-01],
       [1.4488000e+04, 7.1534690e+00, 1.6739040e+00],
       [2.6052000e+04, 1.4418710e+00, 8.0512400e-01],
       ...,
       [2.6575000e+04, 1.0650102e+01, 8.6662700e-01],
       [4.8111000e+04, 9.1345280e+00, 7.2804500e-01],
       [4.3757000e+04, 7.8826010e+00, 1.3324460e+00]])
In[2]: datingLabels[0:20]
Out[2]: [3, 2, 1, 1, 1, 1, 3, 3, 1, 3, 1, 1, 2, 1, 1, 1, 1, 1, 2, 3]

Creating a scatter plot

matplotlib needs to be imported to create the scatter plot:

import matplotlib
import matplotlib.pyplot as plt

Build the plot:

from numpy import array   # array() is needed here when working interactively
fig = plt.figure()
ax = fig.add_subplot(111)
ax.scatter(datingDataMat[:,1], datingDataMat[:,2],
           15.0*array(datingLabels), 15.0*array(datingLabels))
plt.show()

The result is shown in the figure: the horizontal axis is the percentage of time spent playing video games, and the vertical axis is the liters of ice cream consumed per week.
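To make the figure easier to read, the same scatter plot can be drawn with those axis labels added; a minimal sketch (the label strings are the English translations used here, not code from the book):

fig = plt.figure()
ax = fig.add_subplot(111)
labels = array(datingLabels)
# the class index doubles as marker size and colour
ax.scatter(datingDataMat[:,1], datingDataMat[:,2], 15.0*labels, 15.0*labels)
ax.set_xlabel('Percentage of time spent playing video games')
ax.set_ylabel('Liters of ice cream consumed per week')
plt.show()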
Special note: if you change the book's classLabelVector.append(int(listFromLine[-1])) to classLabelVector.append(listFromLine[-1]), the string labels cause errors further on (for example, 15.0*array(datingLabels) in the scatter-plot call cannot multiply a string array by a float), so use one of the two solutions described above.

Normalizing the data

The code in the book is correct; it is worth writing it out by hand once. Each feature is rescaled to newValue = (oldValue - min) / (max - min):

def autoNorm(dataSet):
    minVals = dataSet.min(0)
    maxVals = dataSet.max(0)
    ranges = maxVals - minVals
    normDataSet = zeros(shape(dataSet))
    m = dataSet.shape[0]
    normDataSet = dataSet - tile(minVals, (m,1))
    # divide element-wise by the per-feature ranges
    normDataSet = normDataSet/tile(ranges, (m,1))
    return normDataSet, ranges, minVals
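A quick sanity check on a made-up 3x2 array (the numbers are purely illustrative) shows every normalized value falling in [0, 1]:

>>>from numpy import array
>>>toy = array([[0.0, 10.0], [5.0, 20.0], [10.0, 30.0]])
>>>normToy, toyRanges, toyMins = kNN.autoNorm(toy)
>>>normToy
array([[0. , 0. ],
       [0.5, 0.5],
       [1. , 1. ]])
>>>toyRanges
array([10., 20.])
>>>toyMins
array([ 0., 10.])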

Verifying the classifier as a complete program

Source code:

def datingClassTest():
    hoRatio = 0.10
    datingDataMat,datingLabels = file2matrix('datingTestSet2.txt')
    normMat, ranges, minVals = autoNorm(datingDataMat)
    m = normMat.shape[0]
    numTestVecs = int(m*hoRatio)
    errorCount = 0.0
    for i in range(numTestVecs):
        classifierResult = classify0(normMat[i,:],normMat[numTestVecs:m,:], \
                                     datingLabels[numTestVecs:m],3)
        print("the classfier came back with: %d,the real answer is : %d" \
                                     % (classifierResult,datingLabels[i]))
        if (classifierResult != datingLabels[i]): errorCount += 1.0
    print("the total error rate is: %f" % (errorCount/float(numTestVecs)))

Switch to ipython:

In[1]: import kNN

In[2]: datingDataMat,datingLabels = kNN.file2matrix('datingTestSet2.txt')

In[3]: normMat, ranges, minVals = kNN.autoNorm(datingDataMat)

In[4]: normMat
Out[4]: 
array([[0.44832535, 0.39805139, 0.56233353],
       [0.15873259, 0.34195467, 0.98724416],
       [0.28542943, 0.06892523, 0.47449629],
       ...,
       [0.29115949, 0.50910294, 0.51079493],
       [0.52711097, 0.43665451, 0.4290048 ],
       [0.47940793, 0.3768091 , 0.78571804]])

In[5]: ranges
Out[5]: array([9.1273000e+04, 2.0919349e+01, 1.6943610e+00])

In[6]: minVals
Out[6]: array([0.      , 0.      , 0.001156])

Building the complete system

Source code:

def classifyPerson():
    resultList = ['not at all','in small doses','in large doses']
    percentTats = float(input("percentage of time spent playing video games?"))
    ffMiles = float(input("frequent flier miles earned per year?"))
    iceCream = float(input("liters of ice cream consumed per year?"))
    datingDataMat,datingLabels = file2matrix('datingTestSet2.txt')
    normMat, ranges, minVals = autoNorm(datingDataMat)
    inArr = array([ffMiles, percentTats, iceCream])
    classifierResult = classify0((inArr-minVals)/ranges,normMat,datingLabels,3)
    print("You will probably like this person: ",resultList[classifierResult - 1])

Back in ipython, the hold-out test from the previous section runs like this:

In[1]: import kNN

In[2]: kNN.datingClassTest()
the classfier came back with: 3,the real answer is : 3
the classfier came back with: 2,the real answer is : 2
the classfier came back with: 1,the real answer is : 1
the classfier came back with: 1,the real answer is : 1
the classfier came back with: 1,the real answer is : 1
the classfier came back with: 1,the real answer is : 1
the classfier came back with: 3,the real answer is : 3
the classfier came back with: 3,the real answer is : 3
the classfier came back with: 1,the real answer is : 1
the classfier came back with: 3,the real answer is : 3
the classfier came back with: 1,the real answer is : 1
the classfier came back with: 1,the real answer is : 1
the classfier came back with: 2,the real answer is : 2
the classfier came back with: 1,the real answer is : 1
the classfier came back with: 1,the real answer is : 1
the classfier came back with: 1,the real answer is : 1
the classfier came back with: 1,the real answer is : 1
the classfier came back with: 1,the real answer is : 1
the classfier came back with: 2,the real answer is : 2
the classfier came back with: 3,the real answer is : 3
the classfier came back with: 2,the real answer is : 2
the classfier came back with: 1,the real answer is : 1
the classfier came back with: 3,the real answer is : 2
the classfier came back with: 3,the real answer is : 3
the classfier came back with: 2,the real answer is : 2
the classfier came back with: 3,the real answer is : 3
the classfier came back with: 2,the real answer is : 2
the classfier came back with: 3,the real answer is : 3
the classfier came back with: 2,the real answer is : 2
the classfier came back with: 1,the real answer is : 1
the classfier came back with: 3,the real answer is : 3
the classfier came back with: 1,the real answer is : 1
the classfier came back with: 3,the real answer is : 3
the classfier came back with: 1,the real answer is : 1
the classfier came back with: 2,the real answer is : 2
the classfier came back with: 1,the real answer is : 1
the classfier came back with: 1,the real answer is : 1
the classfier came back with: 2,the real answer is : 2
the classfier came back with: 3,the real answer is : 3
the classfier came back with: 3,the real answer is : 3
the classfier came back with: 1,the real answer is : 1
the classfier came back with: 2,the real answer is : 2
the classfier came back with: 3,the real answer is : 3
the classfier came back with: 3,the real answer is : 3
the classfier came back with: 3,the real answer is : 3
the classfier came back with: 1,the real answer is : 1
the classfier came back with: 1,the real answer is : 1
the classfier came back with: 1,the real answer is : 1
the classfier came back with: 1,the real answer is : 1
the classfier came back with: 2,the real answer is : 2
the classfier came back with: 2,the real answer is : 2
the classfier came back with: 1,the real answer is : 1
the classfier came back with: 3,the real answer is : 3
the classfier came back with: 2,the real answer is : 2
the classfier came back with: 2,the real answer is : 2
the classfier came back with: 2,the real answer is : 2
the classfier came back with: 2,the real answer is : 2
the classfier came back with: 3,the real answer is : 3
the classfier came back with: 1,the real answer is : 1
the classfier came back with: 2,the real answer is : 2
the classfier came back with: 1,the real answer is : 1
the classfier came back with: 2,the real answer is : 2
the classfier came back with: 2,the real answer is : 2
the classfier came back with: 2,the real answer is : 2
the classfier came back with: 2,the real answer is : 2
the classfier came back with: 2,the real answer is : 2
the classfier came back with: 3,the real answer is : 3
the classfier came back with: 2,the real answer is : 2
the classfier came back with: 3,the real answer is : 3
the classfier came back with: 1,the real answer is : 1
the classfier came back with: 2,the real answer is : 2
the classfier came back with: 3,the real answer is : 3
the classfier came back with: 2,the real answer is : 2
the classfier came back with: 2,the real answer is : 2
the classfier came back with: 3,the real answer is : 1
the classfier came back with: 3,the real answer is : 3
the classfier came back with: 1,the real answer is : 1
the classfier came back with: 1,the real answer is : 1
the classfier came back with: 3,the real answer is : 3
the classfier came back with: 3,the real answer is : 3
the classfier came back with: 1,the real answer is : 1
the classfier came back with: 2,the real answer is : 2
the classfier came back with: 3,the real answer is : 3
the classfier came back with: 3,the real answer is : 1
the classfier came back with: 3,the real answer is : 3
the classfier came back with: 1,the real answer is : 1
the classfier came back with: 2,the real answer is : 2
the classfier came back with: 2,the real answer is : 2
the classfier came back with: 1,the real answer is : 1
the classfier came back with: 1,the real answer is : 1
the classfier came back with: 3,the real answer is : 3
the classfier came back with: 2,the re