TensorFlow Basic Algorithms (2): The Nearest Neighbor Algorithm

阿新 • Published: 2019-01-27

Based on Wikipedia:

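In pattern recognition, the k-nearest neighbors algorithm (k-NN, also written KNN) is a non-parametric statistical method used for classification and regression. In both cases, the input consists of the k closest training samples in the feature space.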

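In k-NN classification, the output is a class membership. An object is classified by a "majority vote" of its neighbors: it is assigned to the class most common among its k nearest neighbors (k is a positive integer, typically small). If k = 1, the object is simply assigned the class of its single nearest neighbor.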

In k-NN regression, the output is the property value of the object. This value is the average of the values of its k nearest neighbors.
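The regression case can be sketched in a few lines of plain Python. This is only an illustration: the function name `knn_regress` and the toy data are made up for this example, and Euclidean distance is assumed as the metric.

```python
import math

def knn_regress(train_points, train_values, query, k=3):
    """Predict a value for `query` as the mean of its k nearest neighbors' values."""
    # Sort the training samples by Euclidean distance to the query point.
    ranked = sorted(
        zip(train_points, train_values),
        key=lambda pair: math.dist(pair[0], query),
    )
    # Average the values attached to the k closest samples.
    nearest_values = [value for _, value in ranked[:k]]
    return sum(nearest_values) / len(nearest_values)

# Hypothetical 1-D data; the last sample is far away from the query.
points = [(1.0,), (2.0,), (3.0,), (10.0,)]
values = [10.0, 20.0, 30.0, 100.0]

print(knn_regress(points, values, (2.1,), k=3))  # mean of 20, 30 and 10 -> 20.0
```

With k = 3 the distant sample at 10.0 is ignored, so the prediction is driven only by the three samples around the query.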

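k-NN classifies using a vector space model. The idea is that cases of the same class are highly similar to one another, so the likely class of an unlabeled case can be estimated by computing its similarity to cases whose class is known.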

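k-NN is a type of instance-based learning, or lazy learning, in which the function is only approximated locally and all computation is deferred until classification time. It is among the simplest of all machine learning algorithms.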

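For both classification and regression, it is useful to weight the neighbors so that nearer neighbors contribute more than distant ones. For example, a common weighting scheme gives each neighbor a weight of 1/d, where d is the distance to that neighbor.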
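A minimal sketch of the 1/d scheme applied to regression follows. The function name `weighted_knn_regress` and the small constant `eps` are assumptions of this example; `eps` is only there to avoid a division by zero when the query coincides with a training point.

```python
import math

def weighted_knn_regress(train_points, train_values, query, k=3, eps=1e-12):
    """Distance-weighted k-NN regression using weights of 1/d."""
    # Pair each distance with its value and keep the k closest samples.
    ranked = sorted(
        (math.dist(point, query), value)
        for point, value in zip(train_points, train_values)
    )
    nearest = ranked[:k]
    # Closer neighbors get larger weights; eps guards against d == 0.
    weights = [1.0 / (d + eps) for d, _ in nearest]
    weighted_sum = sum(w * value for w, (_, value) in zip(weights, nearest))
    return weighted_sum / sum(weights)

# Hypothetical data: the query sits three times closer to the first sample.
points = [(0.0,), (4.0,)]
values = [0.0, 8.0]

print(weighted_knn_regress(points, values, (1.0,), k=2))
```

An unweighted average of the two values would give 4.0. With 1/d weights the result is pulled toward the nearer sample and comes out close to 2.0.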

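The neighbors are taken from a set of objects whose class (or, for regression, property value) is known. This set can be regarded as the training set of the algorithm, although no explicit training step is required.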

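A drawback of k-NN is that it is very sensitive to the local structure of the data. The algorithm is unrelated to k-means, another popular machine learning technique, and should not be confused with it.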

Algorithm:

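The training samples are vectors in a multidimensional feature space, each with a class label. The training phase of the algorithm consists only of storing the feature vectors and class labels of the training samples.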

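In the classification phase, k is a user-defined constant. An unlabeled vector (a query or test point) is assigned the class that occurs most frequently among the k training samples nearest to it.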

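Euclidean distance is the usual distance metric, but it only applies to continuous variables. For discrete variables, as in text classification, another metric such as the overlap metric (or Hamming distance) can be used. For gene expression microarray data, for example, k-NN has also been combined with Pearson and Spearman correlation coefficients. Classification accuracy can often be improved significantly when the distance metric is learned with a specialized algorithm such as large margin nearest neighbor or neighbourhood components analysis.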
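To make the two basic metrics concrete, here is a small plain-Python sketch. The helper names `euclidean_distance` and `hamming_distance` are chosen for this example only.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two numeric vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def hamming_distance(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))

print(euclidean_distance((0.0, 0.0), (3.0, 4.0)))  # 5.0
print(hamming_distance("karolin", "kathrin"))      # 3
```

Either function can be passed to a k-NN routine as its distance measure; the choice depends on whether the features are continuous or discrete.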

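"Majority vote" classification has a drawback when the class distribution is skewed: samples of a more frequent class tend to dominate the prediction for a test point, because they are more likely to appear among its k nearest neighbors, and the test point's class is computed from the samples in that neighborhood. One way to overcome this is to take the distance from the test point to each of its k nearest neighbors into account during classification. The class (or, for regression, the value) of each of the k nearest points is multiplied by a weight inversely proportional to its distance from the test point. Another way to overcome skew is abstraction in the data representation. For example, in a self-organizing map (SOM), each node is a representative (center) of a cluster of similar points, regardless of their density in the original training data. k-NN can then be applied to the SOM.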
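The difference between a plain majority vote and a distance-weighted vote can be seen on a hand-built neighborhood. The sketch below assumes weights of 1/d; the function names, the `eps` guard and the neighbor list are all invented for illustration.

```python
from collections import Counter, defaultdict

def majority_vote(neighbors):
    """neighbors: list of (distance, label); every neighbor counts equally."""
    return Counter(label for _, label in neighbors).most_common(1)[0][0]

def weighted_vote(neighbors, eps=1e-12):
    """Each neighbor votes with weight 1/d, so closer neighbors count more."""
    scores = defaultdict(float)
    for distance, label in neighbors:
        scores[label] += 1.0 / (distance + eps)
    return max(scores, key=scores.get)

# Hypothetical k = 5 neighborhood: two very close samples of a rare class
# and three distant samples of a frequent class.
neighbors = [
    (0.1, "rare"), (0.2, "rare"),
    (2.0, "common"), (2.1, "common"), (2.2, "common"),
]

print(majority_vote(neighbors))  # common
print(weighted_vote(neighbors))  # rare
```

The frequent class wins the unweighted vote 3 to 2, while the weighted vote favors the two nearby samples.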

Parameter selection:

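The best choice of k depends on the data. In general, larger values of k reduce the effect of noise on classification but make the boundaries between classes less distinct. A good k can be found with various heuristic techniques (see hyperparameter optimization).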
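One simple way to compare candidate values of k is to score each one on the data itself. The sketch below uses leave-one-out evaluation, which is a stand-in picked for this example and not a technique named in the text; the helper names and the toy data are likewise hypothetical.

```python
import math
from collections import Counter

def knn_classify(train_points, train_labels, query, k):
    """Majority vote among the k training points nearest to `query`."""
    ranked = sorted(
        zip(train_points, train_labels),
        key=lambda pair: math.dist(pair[0], query),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

def leave_one_out_accuracy(points, labels, k):
    """Classify each sample using all the others and report the hit rate."""
    hits = 0
    for i, (point, label) in enumerate(zip(points, labels)):
        rest_points = points[:i] + points[i + 1:]
        rest_labels = labels[:i] + labels[i + 1:]
        if knn_classify(rest_points, rest_labels, point, k) == label:
            hits += 1
    return hits / len(points)

# Two well-separated clusters of three points each.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
labels = ["a", "a", "a", "b", "b", "b"]

for k in (1, 3, 5):
    print(k, leave_one_out_accuracy(points, labels, k))
```

On this toy set k = 1 and k = 3 classify every held-out point correctly, while k = 5 uses all five remaining points, so the opposite cluster always outvotes the point's own cluster. It is an extreme case of the boundary blurring described above.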

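The accuracy of k-NN can be severely degraded by noisy or irrelevant features, or by feature scales that are inconsistent with the features' importance. Much research has gone into selecting and scaling features to improve classification. One popular approach is to use evolutionary algorithms to optimize feature scaling; another is to select features using the mutual information of the training samples.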
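The effect of mismatched feature scales can be shown with a small example. The sketch below applies z-score standardization, which is one common scaling choice assumed here and not one prescribed by the text. The helper names and data are hypothetical, and the code assumes no feature is constant (a zero standard deviation would cause a division by zero).

```python
import math
import statistics

def fit_scaler(rows):
    """Return a (mean, std) pair per feature, computed from the training rows."""
    columns = zip(*rows)
    return [(statistics.mean(col), statistics.pstdev(col)) for col in columns]

def transform(row, scaler):
    """Rescale one row to zero mean and unit variance per feature."""
    return tuple((x - mean) / std for x, (mean, std) in zip(row, scaler))

def nearest_label(rows, labels, query):
    """Label of the single nearest row (k = 1, Euclidean distance)."""
    best = min(range(len(rows)), key=lambda i: math.dist(rows[i], query))
    return labels[best]

# The class depends on the first feature, but the second has a far larger scale.
rows = [(1.0, 1000.0), (1.2, 3000.0), (3.0, 1100.0), (3.2, 2900.0)]
labels = ["x", "x", "y", "y"]
query = (1.1, 2920.0)

raw_label = nearest_label(rows, labels, query)

scaler = fit_scaler(rows)
scaled_rows = [transform(row, scaler) for row in rows]
scaled_label = nearest_label(scaled_rows, labels, transform(query, scaler))

print(raw_label, scaled_label)  # y x
```

Without scaling, the large-valued second feature dominates the distance and the query is matched to class "y". After standardization both features contribute comparably and the nearest neighbor belongs to class "x".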

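In binary (two-class) classification problems, choosing an odd k helps avoid tied votes. One way of choosing the empirically optimal k in this setting is the bootstrap method.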

Summary:

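Once the parameter k has been chosen, the k-NN algorithm computes the distance from the sample under test to the training samples, finds its k nearest reference points, checks which class holds the majority among them, and assigns the sample to that class.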

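Steps:

1) Compute the distance between the test sample and every training sample;

2) Sort the distances in increasing order;

3) Select the K points with the smallest distances;

4) Count how often each class occurs among these K points;

5) Return the most frequent class among the K points as the predicted class of the test sample.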
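The five steps above map almost one-to-one onto NumPy calls. The following is a minimal sketch under assumptions: the function name `knn_predict` and the toy arrays are invented for this example, and Euclidean distance is assumed as the metric.

```python
import numpy as np

def knn_predict(train_x, train_y, query, k=3):
    """Follow the five steps: distances, sort, take k, count, return the mode."""
    # 1) Euclidean distance from the query to every training sample.
    distances = np.linalg.norm(train_x - query, axis=1)
    # 2) Indices that sort the distances in increasing order.
    order = np.argsort(distances)
    # 3) Keep the k closest samples.
    nearest = order[:k]
    # 4) Count how often each class appears among them.
    classes, counts = np.unique(train_y[nearest], return_counts=True)
    # 5) Return the most frequent class (ties go to the smallest class label).
    return classes[np.argmax(counts)]

# Hypothetical toy data: two clusters in a 2-D feature space.
train_x = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [5.0, 5.0], [5.2, 4.9]])
train_y = np.array([0, 0, 0, 1, 1])

print(knn_predict(train_x, train_y, np.array([1.1, 1.0]), k=3))  # 0
print(knn_predict(train_x, train_y, np.array([4.8, 5.1]), k=3))  # 1
```

For the second query the three nearest samples are the two class-1 points plus one class-0 point, so class 1 wins the vote 2 to 1.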

TensorFlow implementation:

Using the MNIST dataset as an example (the code below is the k = 1 case and uses the L1 distance):

import numpy as np
import tensorflow as tf

# Import MNIST data
# (this tutorials module and the placeholder/Session API used below are TensorFlow 1.x only)
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)

Xtr, Ytr = mnist.train.next_batch(5000) #5000 for training (nn candidates)
Xte, Yte = mnist.test.next_batch(200) #200 for testing

# tf Graph Input
xtr = tf.placeholder("float", [None, 784])
xte = tf.placeholder("float", [784])

# Nearest Neighbor calculation using L1 Distance
# Calculate L1 Distance
distance = tf.reduce_sum(tf.abs(tf.add(xtr, tf.negative(xte))), axis=1)
# Prediction: Get min distance index (Nearest neighbor)
pred = tf.argmin(distance, 0)

accuracy = 0.

# Initialize the variables (i.e. assign their default value)
init = tf.global_variables_initializer()

# Run the evaluation (nearest neighbor has no training step)
with tf.Session() as sess:
    sess.run(init)

    # loop over test data
    for i in range(len(Xte)):
        # Get nearest neighbor
        nn_index = sess.run(pred, feed_dict={xtr: Xtr, xte: Xte[i, :]})
        # Get nearest neighbor class label and compare it to its true label
        print("Test", i, "Prediction:", np.argmax(Ytr[nn_index]), \
            "True Class:", np.argmax(Yte[i]))
        # Calculate accuracy
        if np.argmax(Ytr[nn_index]) == np.argmax(Yte[i]):
            accuracy += 1./len(Xte)
    print("Done!")
    print("Accuracy:", accuracy)

Output:

Test 0 Prediction: 3 True Class: 3
Test 1 Prediction: 9 True Class: 4
Test 2 Prediction: 1 True Class: 1
Test 3 Prediction: 9 True Class: 9
Test 4 Prediction: 6 True Class: 6
Test 5 Prediction: 9 True Class: 9
Test 6 Prediction: 6 True Class: 6
Test 7 Prediction: 9 True Class: 9
Test 8 Prediction: 3 True Class: 3
Test 9 Prediction: 6 True Class: 6
Test 10 Prediction: 2 True Class: 2
Test 11 Prediction: 3 True Class: 3
Test 12 Prediction: 0 True Class: 0
Test 13 Prediction: 9 True Class: 9
Test 14 Prediction: 9 True Class: 9
Test 15 Prediction: 1 True Class: 1
Test 16 Prediction: 1 True Class: 1
Test 17 Prediction: 8 True Class: 8
Test 18 Prediction: 2 True Class: 2
Test 19 Prediction: 9 True Class: 9
Test 20 Prediction: 6 True Class: 6
Test 21 Prediction: 3 True Class: 3
Test 22 Prediction: 1 True Class: 1
Test 23 Prediction: 7 True Class: 7
Test 24 Prediction: 1 True Class: 1
Test 25 Prediction: 3 True Class: 3
Test 26 Prediction: 4 True Class: 4
Test 27 Prediction: 0 True Class: 0
Test 28 Prediction: 3 True Class: 3
Test 29 Prediction: 1 True Class: 1
Test 30 Prediction: 1 True Class: 1
Test 31 Prediction: 4 True Class: 4
Test 32 Prediction: 8 True Class: 8
Test 33 Prediction: 6 True Class: 6
Test 34 Prediction: 1 True Class: 1
Test 35 Prediction: 1 True Class: 1
Test 36 Prediction: 0 True Class: 0
Test 37 Prediction: 8 True Class: 8
Test 38 Prediction: 8 True Class: 4
Test 39 Prediction: 9 True Class: 9
Test 40 Prediction: 7 True Class: 7
Test 41 Prediction: 9 True Class: 7
Test 42 Prediction: 8 True Class: 8
Test 43 Prediction: 7 True Class: 7
Test 44 Prediction: 0 True Class: 0
Test 45 Prediction: 1 True Class: 1
Test 46 Prediction: 7 True Class: 7
Test 47 Prediction: 7 True Class: 2
Test 48 Prediction: 3 True Class: 3
Test 49 Prediction: 2 True Class: 2
Test 50 Prediction: 1 True Class: 1
Test 51 Prediction: 4 True Class: 4
Test 52 Prediction: 1 True Class: 1
Test 53 Prediction: 8 True Class: 8
Test 54 Prediction: 6 True Class: 6
Test 55 Prediction: 2 True Class: 2
Test 56 Prediction: 1 True Class: 1
Test 57 Prediction: 6 True Class: 6
Test 58 Prediction: 6 True Class: 6
Test 59 Prediction: 5 True Class: 5
Test 60 Prediction: 6 True Class: 6
Test 61 Prediction: 7 True Class: 2
Test 62 Prediction: 8 True Class: 8
Test 63 Prediction: 2 True Class: 2
Test 64 Prediction: 7 True Class: 7
Test 65 Prediction: 9 True Class: 9
Test 66 Prediction: 9 True Class: 9
Test 67 Prediction: 7 True Class: 7
Test 68 Prediction: 7 True Class: 0
Test 69 Prediction: 2 True Class: 2
Test 70 Prediction: 5 True Class: 5
Test 71 Prediction: 8 True Class: 8
Test 72 Prediction: 1 True Class: 1
Test 73 Prediction: 3 True Class: 8
Test 74 Prediction: 6 True Class: 6
Test 75 Prediction: 8 True Class: 8
Test 76 Prediction: 4 True Class: 4
Test 77 Prediction: 0 True Class: 0
Test 78 Prediction: 5 True Class: 5
Test 79 Prediction: 7 True Class: 7
Test 80 Prediction: 0 True Class: 0
Test 81 Prediction: 6 True Class: 6
Test 82 Prediction: 9 True Class: 9
Test 83 Prediction: 1 True Class: 1
Test 84 Prediction: 0 True Class: 0
Test 85 Prediction: 3 True Class: 3
Test 86 Prediction: 7 True Class: 7
Test 87 Prediction: 7 True Class: 7
Test 88 Prediction: 6 True Class: 6
Test 89 Prediction: 1 True Class: 1
Test 90 Prediction: 8 True Class: 8
Test 91 Prediction: 7 True Class: 7
Test 92 Prediction: 6 True Class: 6
Test 93 Prediction: 8 True Class: 8
Test 94 Prediction: 9 True Class: 9
Test 95 Prediction: 5 True Class: 5
Test 96 Prediction: 1 True Class: 1
Test 97 Prediction: 6 True Class: 6
Test 98 Prediction: 3 True Class: 3
Test 99 Prediction: 7 True Class: 7
Test 100 Prediction: 7 True Class: 7
Test 101 Prediction: 0 True Class: 0
Test 102 Prediction: 2 True Class: 2
Test 103 Prediction: 7 True Class: 7
Test 104 Prediction: 0 True Class: 0
Test 105 Prediction: 7 True Class: 7
Test 106 Prediction: 0 True Class: 0
Test 107 Prediction: 5 True Class: 3
Test 108 Prediction: 6 True Class: 6
Test 109 Prediction: 8 True Class: 8
Test 110 Prediction: 3 True Class: 3
Test 111 Prediction: 3 True Class: 3
Test 112 Prediction: 7 True Class: 7
Test 113 Prediction: 2 True Class: 2
Test 114 Prediction: 4 True Class: 4
Test 115 Prediction: 9 True Class: 9
Test 116 Prediction: 5 True Class: 5
Test 117 Prediction: 2 True Class: 2
Test 118 Prediction: 7 True Class: 7
Test 119 Prediction: 7 True Class: 7
Test 120 Prediction: 6 True Class: 6
Test 121 Prediction: 1 True Class: 1
Test 122 Prediction: 1 True Class: 1
Test 123 Prediction: 9 True Class: 9
Test 124 Prediction: 5 True Class: 5
Test 125 Prediction: 1 True Class: 1
Test 126 Prediction: 6 True Class: 6
Test 127 Prediction: 9 True Class: 9
Test 128 Prediction: 3 True Class: 3
Test 129 Prediction: 0 True Class: 0
Test 130 Prediction: 0 True Class: 0
Test 131 Prediction: 4 True Class: 4
Test 132 Prediction: 1 True Class: 1
Test 133 Prediction: 3 True Class: 3
Test 134 Prediction: 9 True Class: 9
Test 135 Prediction: 0 True Class: 0
Test 136 Prediction: 4 True Class: 4
Test 137 Prediction: 8 True Class: 8
Test 138 Prediction: 5 True Class: 5
Test 139 Prediction: 0 True Class: 0
Test 140 Prediction: 2 True Class: 2
Test 141 Prediction: 8 True Class: 8
Test 142 Prediction: 6 True Class: 6
Test 143 Prediction: 9 True Class: 9
Test 144 Prediction: 3 True Class: 3
Test 145 Prediction: 8 True Class: 8
Test 146 Prediction: 7 True Class: 7
Test 147 Prediction: 9 True Class: 9
Test 148 Prediction: 0 True Class: 0
Test 149 Prediction: 6 True Class: 6
Test 150 Prediction: 6 True Class: 6
Test 151 Prediction: 3 True Class: 3
Test 152 Prediction: 6 True Class: 6
Test 153 Prediction: 1 True Class: 1
Test 154 Prediction: 1 True Class: 1
Test 155 Prediction: 5 True Class: 5
Test 156 Prediction: 6 True Class: 6
Test 157 Prediction: 1 True Class: 1
Test 158 Prediction: 1 True Class: 1
Test 159 Prediction: 7 True Class: 7
Test 160 Prediction: 1 True Class: 1
Test 161 Prediction: 1 True Class: 1
Test 162 Prediction: 3 True Class: 2
Test 163 Prediction: 1 True Class: 1
Test 164 Prediction: 5 True Class: 5
Test 165 Prediction: 9 True Class: 9
Test 166 Prediction: 2 True Class: 2
Test 167 Prediction: 9 True Class: 9
Test 168 Prediction: 9 True Class: 4
Test 169 Prediction: 7 True Class: 7
Test 170 Prediction: 3 True Class: 3
Test 171 Prediction: 7 True Class: 7
Test 172 Prediction: 1 True Class: 1
Test 173 Prediction: 6 True Class: 6
Test 174 Prediction: 7 True Class: 7
Test 175 Prediction: 4 True Class: 4
Test 176 Prediction: 8 True Class: 8
Test 177 Prediction: 9 True Class: 9
Test 178 Prediction: 9 True Class: 4
Test 179 Prediction: 3 True Class: 3
Test 180 Prediction: 5 True Class: 8
Test 181 Prediction: 8 True Class: 8
Test 182 Prediction: 7 True Class: 7
Test 183 Prediction: 6 True Class: 6
Test 184 Prediction: 3 True Class: 3
Test 185 Prediction: 5 True Class: 5
Test 186 Prediction: 1 True Class: 3
Test 187 Prediction: 9 True Class: 9
Test 188 Prediction: 0 True Class: 0
Test 189 Prediction: 1 True Class: 1
Test 190 Prediction: 7 True Class: 7
Test 191 Prediction: 7 True Class: 7
Test 192 Prediction: 5 True Class: 5
Test 193 Prediction: 0 True Class: 0
Test 194 Prediction: 1 True Class: 1
Test 195 Prediction: 5 True Class: 5
Test 196 Prediction: 3 True Class: 3
Test 197 Prediction: 1 True Class: 1
Test 198 Prediction: 8 True Class: 8
Test 199 Prediction: 1 True Class: 6
Done!
Accuracy: 0.9300000000000007
