
K-Nearest Neighbor Classification Implemented in Python

K-Nearest Neighbors (KNN): a classification algorithm

* KNN is a non-parametric classifier (it makes no assumptions about the form of the distribution and estimates the probability density directly from the data) and is a form of memory-based learning.

* KNN does not work well on high-dimensional data (the curse of dimensionality).

* There are many Python libraries for machine learning, e.g. mlpy (among other packages); the implementation here is only meant to practice the method.

* The KNN algorithm has a high computational cost (it can be sped up with a KD-tree; in C you can use libkdtree or ANN; see the sketch after this list).

* The smaller k is, the more easily the model overfits, while a very large k lowers classification accuracy (consider the extreme cases k = 1 and k = N, the number of samples).
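
For the KD-tree speed-up mentioned above, here is a minimal sketch using SciPy's cKDTree. The scipy dependency and this snippet are an editor's illustration, not part of the original post:

from numpy import array
from scipy.spatial import cKDTree

train = array([[1.0, 1.1], [1.0, 1.0], [0, 0], [0, 0.1]])
tree = cKDTree(train)                # build the KD-tree once
dist, idx = tree.query([0, 0], k=3)  # distances and indices of the 3 nearest neighbours
print(idx)                           # [2 3 1]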

This post does not cover the theory; see the comments in the code.

KNN.py

from numpy import *
import operator

class KNN:
    def createDataset(self):
        # a toy data set: four 2-D points labelled 'A' or 'B'
        group = array([[1.0, 1.1], [1.0, 1.0], [0, 0], [0, 0.1]])
        labels = ['A', 'A', 'B', 'B']
        return group, labels

    def KnnClassify(self, testX, trainX, labels, K):
        N, M = trainX.shape

        # calculate the distance between testX and every training sample
        difference = tile(testX, (N, 1)) - trainX  # tile repeats testX N times, like repmat in Matlab
        difference = difference ** 2               # squared difference per dimension
        distance = difference.sum(1)               # sum over all dimensions
        distance = distance ** 0.5                 # Euclidean distance
        sortdiffidx = distance.argsort()           # indices sorted by increasing distance

        # vote among the K nearest neighbours
        vote = {}
        for i in range(K):
            ith_label = labels[sortdiffidx[i]]
            # get(ith_label, 0): return vote[ith_label] if the key exists, else 0
            vote[ith_label] = vote.get(ith_label, 0) + 1
        sortedvote = sorted(vote.items(), key=lambda x: x[1], reverse=True)
        # 'key=lambda x: x[1]' can be replaced by operator.itemgetter(1)
        return sortedvote[0][0]

k = KNN()  # create a KNN object
group, labels = k.createDataset()
cls = k.KnnClassify([0, 0], group, labels, 3)
print(cls)
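
As a side note, the tile/repmat step above can also be written with NumPy broadcasting. The following lines are an editor's sketch of an equivalent distance computation, not part of the original KNN.py:

from numpy import array, asarray

trainX = array([[1.0, 1.1], [1.0, 1.0], [0, 0], [0, 0.1]])
testX = [0, 0]

# broadcasting replaces the tile(...) step: the 1-D test point is
# subtracted from every row of trainX at once
diff = trainX - asarray(testX)
distance = ((diff ** 2).sum(axis=1)) ** 0.5
print(distance)  # roughly [1.4866  1.4142  0.  0.1]
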
-------------------
Running it:

1. You can run KNN.py from the Python shell:

>>>import os

>>>os.chdir("/Users/mba/Documents/Study/Machine_Learning/Python/KNN")

>>>execfile("KNN.py")

The output is B

(B is the predicted class label)

2. Or run it directly from the terminal:

$ python KNN.py

3. You can also skip printing inside KNN.py and get the result in the shell instead, i.e.,

>>>import KNN

>>> KNN.k.KnnClassify([0,0],KNN.group,KNN.labels,3)
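
If scikit-learn happens to be installed, the result can also be cross-checked against its built-in classifier. This is only an editor's sanity-check sketch (continuing the shell session from step 3, where KNN has already been imported), not part of the post:

>>> from sklearn.neighbors import KNeighborsClassifier
>>> clf = KNeighborsClassifier(n_neighbors=3)
>>> clf.fit(KNN.group, KNN.labels)
>>> clf.predict([[0, 0]])    # predicts 'B', matching KnnClassify above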

More Python learning material will keep coming; please follow this blog and the Sina Weibo account Rachel Zhang.