
A Review on Multi-Label Learning Algorithms

In multi-label classification, one family of methods uses a KNN-style algorithm to predict each label dimension separately. After reading Zhou Zhihua's review I wanted to try implementing it, so here is a fairly simple version.

First, we need to estimate the Bayesian probabilities conditioned on having k nearest neighbours.
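As a concrete illustration of the smoothed prior used below, here is a minimal sketch (the label matrix `Y` is made up for the example; the formula is the Laplace-smoothed frequency the code later computes):

```python
import numpy as np

# Toy label matrix: 4 samples, 3 labels (made-up data for illustration).
Y = np.array([[1, 0, 1],
              [1, 0, 0],
              [0, 0, 1],
              [1, 1, 1]])

# Laplace-smoothed prior P(y_i = 1) = (#positive + 1) / (#samples + 2).
priors = (Y.sum(axis=0) + 1) / float(Y.shape[0] + 2)
print(priors)  # [0.66666667 0.33333333 0.66666667]
```

The +1/+2 smoothing keeps every prior strictly between 0 and 1 even when a label never (or always) occurs in the training set.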

That is: for each training sample, find its k nearest neighbours, then for each label dimension collect naive-Bayes counts of how the neighbours are labelled. Given a new sample, first compute its nearest-neighbour set, then predict each label from the corresponding naive-Bayes statistics.
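In symbols, writing $H^i_b$ for the event that label $i$ takes value $b$ and $E^i_c$ for the event that exactly $c$ of the $k$ neighbours carry label $i$, the per-label decision described above is the MAP rule (notation is mine, not from the post):

```latex
% Predict label i from the count c of positive neighbours:
y_i = \arg\max_{b \in \{0, 1\}} P(H^i_b)\, P(E^i_c \mid H^i_b)
```

The priors $P(H^i_b)$ and the likelihoods $P(E^i_c \mid H^i_b)$ are both estimated from the training set with Laplace smoothing.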

Below is a toy implementation. It first computes the naive-Bayes statistics from the training samples, together with each sample's neighbour set.
Given a new instance, it finds the instance's neighbour set and predicts the labels from those statistics.

import numpy as np

def NB(X, Y, k, NN):
    # Laplace-smoothed prior P(y_i = 1) for each label.
    NBdis = []
    for i in range(Y.shape[1]):
        NBdis.append((np.sum(Y[:, i]) + 1) / float(Y.shape[0] + 2))
    # For each label i, count how often a sample with c positive
    # neighbours (c = 0..k) is itself positive (column 0) or negative
    # (column 1).
    NBtable = []
    smooth = 1
    for i in range(Y.shape[1]):
        dis = np.zeros((k + 1, 2))
        for j in range(X.shape[0]):
            neighbours = NN[j]
            tmpX = np.sum(Y[neighbours, i])  # positive neighbours of sample j
            if Y[j, i] == 0:
                dis[tmpX, 1] += 1
            else:
                dis[tmpX, 0] += 1
        # Smoothed likelihoods P(c | y_i = 1) and P(c | y_i = 0):
        # normalise each column over the k+1 possible counts.
        # (The original had a precedence bug, dis + 1/sum(...), and
        # normalised along the wrong axis.)
        dis = (dis + smooth) / np.sum(dis + smooth, axis=0, keepdims=True)
        NBtable.append(dis)
    return (NBdis, NBtable)

def findKNN(X, k):
    # k nearest neighbours (squared Euclidean distance) of each training
    # sample, excluding the sample itself.
    NN = []
    for x in X:
        tmpX = X - x
        distance = np.sum(tmpX * tmpX, axis=1)
        NN.append(np.argsort(distance)[1:k + 1])
    return NN

def predictFindNN(X, x, k):
    # k nearest training samples of a new instance x.
    tmpX = X - x
    distance = np.sum(tmpX * tmpX, axis=1)
    return np.argsort(distance)[0:k]

def predictLabel(nn, NBdis, NBtable, Y):
    # MAP decision per label: compare P(y=1) P(c | y=1) against
    # P(y=0) P(c | y=0), where c is the number of positive neighbours.
    tmpY = np.sum(Y[nn], axis=0)
    labels = np.zeros((1, Y.shape[1]))
    for i in range(labels.shape[1]):
        if NBdis[i] * NBtable[i][tmpY[i], 0] > (1 - NBdis[i]) * NBtable[i][tmpY[i], 1]:
            labels[0][i] = 1
        else:
            labels[0][i] = 0
    return labels

X = np.array([[1, 0, 1, 1, 0], [0, 1, 1, 1, 0], [1, 0, 1, 0, 1]])
Y = np.array([[1, 0, 1, 1], [1, 0, 1, 0], [1, 0, 0, 0]])
k = 2

NN = findKNN(X, k)
print(NN)

(NBdis, NBtable) = NB(X, Y, k, NN)
print('\nNaive Bayes priors:')
print(NBdis)
print('\nNaive Bayes tables:')
for table in NBtable:
    print(table)

x = np.array([0, 1, 1, 1, 0])
nn = predictFindNN(X, x, k)
print('\nNearest neighbours:')
print(nn)
label = predictLabel(nn, NBdis, NBtable, Y)
print('\nPredicted labels:')
print(label)
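As a side note, the per-sample neighbour loop can also be vectorised with NumPy broadcasting; a minimal sketch (the function name `pairwise_knn` is my own, not from the post):

```python
import numpy as np

def pairwise_knn(X, k):
    # Squared Euclidean distances between all pairs of rows of X,
    # computed in one shot via broadcasting.
    diff = X[:, None, :] - X[None, :, :]
    d2 = np.sum(diff * diff, axis=2)
    # For each row, the k closest other rows (index 0 of the argsort
    # is the row itself, at distance 0).
    return np.argsort(d2, axis=1)[:, 1:k + 1]

X = np.array([[1, 0, 1, 1, 0],
              [0, 1, 1, 1, 0],
              [1, 0, 1, 0, 1]])
print(pairwise_knn(X, 2))
```

This builds an n-by-n distance matrix, so it trades memory for speed and only makes sense for small n; ties in distance may be ordered differently than in the loop version.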