
Using the k-Nearest Neighbors Algorithm to Improve Matches on a Dating Site

Building on the code from the previous post, add the following:

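from numpy import array, shape, tile, zeros  # NumPy names used by the code in this post

'''
Parser that converts the text records into NumPy data.
Input: the file name as a string.
Output: the training sample matrix and the class label vector.
'''
def file2matrix(filename):
    with open(filename) as fr:  # the file is closed once it has been read
        arrayOLine = fr.readlines()
    numberOfLines = len(arrayOLine)  # number of lines in the file
    returnMat = zeros((numberOfLines, 3))  # NumPy matrix filled with zeros
    '''
    Parse the text data into a list. Each line has 4 tab-separated columns:
    frequent flyer miles earned per year
    percentage of time spent playing video games
    liters of ice cream consumed per week
    label, as an integer: a person you did not like, a person of average charm, a very charming person
    '''
    classLabelVector = []
    index = 0
    for line in arrayOLine:
        line = line.strip()  # strip() removes leading and trailing whitespace by default (including '\n', '\r', '\t', ' ')
        listFromLine = line.split('\t')
        returnMat[index, :] = listFromLine[0:3]  # store the first 3 elements in the feature matrix
        classLabelVector.append(int(listFromLine[-1]))  # -1 selects the last column; without int() the value would be treated as a string
        index += 1
    return returnMat, classLabelVector

# Normalize the feature values
def autoNorm(dataSet):
    minVals = dataSet.min(0)  # minimum of each column; the argument 0 takes the minimum down each column rather than across each row
    maxVals = dataSet.max(0)  # maximum of each column
    ranges = maxVals - minVals  # 1 x 3 vector
    normDataSet = zeros(shape(dataSet))  # same shape as dataSet
    m = dataSet.shape[0]  # number of rows
    normDataSet = dataSet - tile(minVals, (m, 1))  # tile(A, (row, col)) repeats A row x col times
    normDataSet = normDataSet/tile(ranges, (m, 1))
    return normDataSet, ranges, minVals

# Test code for the classifier on the dating site data
def datingClassTest():
    hoRatio = 0.1  # fraction of the data held out for testing
    datingDataMat, datingLabels = file2matrix('datingTestSet2.txt')
    normMat, ranges, minVals = autoNorm(datingDataMat)
    m = normMat.shape[0]
    numTestVecs = int(m*hoRatio)  # number of records used for testing
    errorCount = 0.0  # number of misclassified records
    for i in range(numTestVecs):
        classifierResult = classify0(normMat[i,:], normMat[numTestVecs:m,:],
                                     datingLabels[numTestVecs:m], 3)
        print("the classifier came back with: %d, the real answer is: %d"
              % (classifierResult, datingLabels[i]))
        if classifierResult != datingLabels[i]:
            errorCount += 1.0
    print("the total error rate is: %f" % (errorCount/float(numTestVecs)))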
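
The normalization in autoNorm rescales every feature column to the range 0 to 1 using newValue = (oldValue - min) / (max - min), so a column with large raw values (such as flyer miles) does not dominate the distance calculation. As a minimal sketch, the same result can be obtained with NumPy broadcasting instead of tile. The function name auto_norm_broadcast and the sample values are made up for illustration and are not part of the original code or data set.

```python
import numpy as np

def auto_norm_broadcast(data_set):
    """Min-max normalize each column to [0, 1] (hypothetical variant of autoNorm)."""
    min_vals = data_set.min(axis=0)   # per-column minimum
    max_vals = data_set.max(axis=0)   # per-column maximum
    ranges = max_vals - min_vals      # per-column value range
    # Broadcasting stretches the 1-D min_vals and ranges across every row,
    # which is what tile(minVals, (m, 1)) does explicitly.
    norm = (data_set - min_vals) / ranges
    return norm, ranges, min_vals

# Illustrative values only, not taken from datingTestSet2.txt
sample = np.array([[40000.0, 8.0, 1.0],
                   [10000.0, 2.0, 0.5],
                   [25000.0, 5.0, 1.5]])
norm, ranges, min_vals = auto_norm_broadcast(sample)
print(norm)
```

Like autoNorm, this sketch assumes no column is constant; a column whose range is 0 would cause a division by zero.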

Test:

... ...
the classifier came back with: 3, the real answer is: 3
the classifier came back with: 2, the real answer is: 2
the classifier came back with: 1, the real answer is: 1
the classifier came back with: 3, the real answer is: 1
the total error rate is: 0.050000

The error rate is 5%.
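
The error rate is the number of misclassified test records divided by the number of test records. The sketch below reproduces that hold-out procedure in a self-contained form. Because classify0 comes from the previous post and is not shown here, knn_classify is a hypothetical stand-in (a plain Euclidean-distance majority vote), holdout_error_rate is an illustrative name, and the data is synthetic.

```python
import numpy as np
from collections import Counter

def knn_classify(in_x, data_set, labels, k):
    """Hypothetical stand-in for classify0: majority vote among the k nearest rows."""
    distances = np.sqrt(((data_set - in_x) ** 2).sum(axis=1))
    nearest = distances.argsort()[:k]          # indices of the k closest rows
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

def holdout_error_rate(data, labels, ho_ratio=0.1, k=3):
    """Test on the first ho_ratio of the rows, train on the remaining rows."""
    m = data.shape[0]
    num_test = int(m * ho_ratio)
    errors = 0
    for i in range(num_test):
        predicted = knn_classify(data[i], data[num_test:], labels[num_test:], k)
        if predicted != labels[i]:
            errors += 1
    return errors / float(num_test)

# Synthetic data: two well-separated clusters with labels 1 and 2
rng = np.random.RandomState(0)
cluster_a = rng.rand(50, 3) * 0.2              # values in [0, 0.2)
cluster_b = rng.rand(50, 3) * 0.2 + 0.8        # values in [0.8, 1.0)
data = np.vstack([cluster_a, cluster_b])
labels = [1] * 50 + [2] * 50
order = rng.permutation(100)                   # shuffle so the test rows mix both classes
data = data[order]
labels = [labels[i] for i in order]
print("error rate: %f" % holdout_error_rate(data, labels))
```

As in datingClassTest, the test rows are simply the first 10% of the data, so this only gives a fair estimate when the rows are not sorted by label.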
Add the following function to make predictions:

# Prediction function for the dating site
def classifyPerson():
    resultList = ['not at all', 'in small doses', 'in large doses']
    # raw_input is Python 2; on Python 3 use input() instead
    percentTats = float(raw_input("percentage of time spent playing video games?"))
    ffMiles = float(raw_input("frequent flier miles earned per year?"))
    iceCream = float(raw_input("liters of ice cream consumed per year?"))
    datingDataMat, datingLabels = file2matrix('datingTestSet2.txt')
    normMat, ranges, minVals = autoNorm(datingDataMat)
    inArr = array([ffMiles, percentTats, iceCream])  # same column order as the training matrix
    # Scale the input with the training minimums and ranges before classifying
    classifierResult = classify0((inArr-minVals)/ranges, normMat, datingLabels, 3)
    print("You will probably like this person: %s" % resultList[classifierResult-1])

Test:

>>> import KNN
>>> KNN.classifyPerson()
percentage of time spent playing video games?20
frequent flier miles earned per year?10000
liters of ice cream consumed per year?0.6
You will probably like this person: in large doses
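
Note that classifyPerson scales the new input with the minVals and ranges computed from the training data, so the query ends up in the same coordinate system as normMat. A small sketch of that step follows; the training values are made up for illustration and do not come from datingTestSet2.txt.

```python
import numpy as np

# Made-up training features (miles, game-time %, ice cream liters); not real data
train = np.array([[40000.0, 8.0, 1.0],
                  [10000.0, 2.0, 0.5],
                  [25000.0, 5.0, 1.5]])
min_vals = train.min(axis=0)
ranges = train.max(axis=0) - min_vals

# New person, in the same column order as the training matrix
in_arr = np.array([10000.0, 20.0, 0.6])
scaled = (in_arr - min_vals) / ranges   # same formula as (inArr-minVals)/ranges
print(scaled)
```

A query value outside the training range, such as the 20% game time in this made-up example, scales to a value above 1. That is expected, because only the training data defines the minimum and the range.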