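Machine Learning in Action, Study Notes (1): The k-Nearest Neighbors Algorithm

阿新 • Published: 2018-12-30

1 Overview of the k-nearest neighbors algorithm

The k-nearest neighbors (kNN) algorithm classifies data by measuring the distance between feature values.

How it works:
We start with a sample data set, also called the training set, in which every record carries a label, so we know which class each record belongs to. When new, unlabeled data arrives, each of its features is compared with the corresponding features of the records in the sample set, and the algorithm extracts the class labels of the most similar records (the nearest neighbors). In general only the k most similar records are considered, and the class that occurs most often among them becomes the class of the new data.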

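2 Pseudocode for the k-nearest neighbors algorithm

For every point in the data set whose class is unknown, do the following:
(1) compute the distance between each point in the labeled data set and the current point;
(2) sort the points by increasing distance;
(3) take the k points closest to the current point;
(4) count how often each class occurs among those k points;
(5) return the most frequent class among those k points as the predicted class of the current point.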
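The five steps above can be sketched in plain Python. This is only an illustration and not the book's implementation: the function name `knn_predict` is an assumption made here, the four points are the toy data used later in these notes, and `math.dist` needs Python 3.8 or later.

```python
import math
from collections import Counter


def knn_predict(query, points, labels, k):
    """Classify `query` by majority vote among its k nearest labeled points."""
    # (1) distance from every labeled point to the query point
    distances = [math.dist(query, p) for p in points]
    # (2) indices of the labeled points, sorted by increasing distance
    order = sorted(range(len(points)), key=lambda i: distances[i])
    # (3) keep the k closest points
    nearest = order[:k]
    # (4) count how often each class occurs among them
    votes = Counter(labels[i] for i in nearest)
    # (5) the most frequent class is the prediction
    return votes.most_common(1)[0][0]


points = [(1.0, 1.1), (1.0, 1.0), (0.0, 0.0), (0.0, 0.1)]
labels = ['A', 'A', 'B', 'B']
print(knn_predict((0.0, 0.0), points, labels, 3))
```

For the query (0, 0) the three nearest points carry the labels B, B and A, so the vote goes to B.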

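3 Euclidean distance

Euclidean distance is the easiest distance measure to understand; it comes from the formula for the distance between two points in Euclidean space.

(1) Euclidean distance between two points a(x1,y1) and b(x2,y2) in the plane:

$$d_{12} = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$$

(2) Euclidean distance between two points a(x1,y1,z1) and b(x2,y2,z2) in three-dimensional space:

$$d_{12} = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2 + (z_1 - z_2)^2}$$

(3) Euclidean distance between two n-dimensional vectors a(x11,x12,…,x1n) and b(x21,x22,…,x2n):

$$d_{12} = \sqrt{\sum_{k=1}^{n} (x_{1k} - x_{2k})^2}$$

It can also be written as a vector operation, with a and b taken as row vectors:

$$d_{12} = \sqrt{(a - b)(a - b)^{T}}$$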
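The n-dimensional form covers the two- and three-dimensional cases, so one small function is enough to try it out. A minimal sketch; the name `euclidean` is chosen here for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length coordinate sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


print(euclidean((0, 0), (3, 4)))        # 2-D case: the 3-4-5 triangle
print(euclidean((1, 2, 2), (0, 0, 0)))  # 3-D case
```

The two calls print 5.0 and 3.0.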

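4 Advantages and disadvantages of the k-nearest neighbors algorithm

(1) Advantages

High accuracy, insensitive to outliers, no assumptions about the input data.

(2) Disadvantages

High computational complexity and high space complexity.

(3) Limitations

kNN is instance-based learning: to use it we need training samples that are close to the real data, and we must keep the entire data set. If the training set is large, that takes a lot of storage.

Because a distance has to be computed for every record in the data set, classification can be very slow in practice.

kNN gives no information about the underlying structure of the data, so we cannot tell what an average or a typical instance of each class looks like.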
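The per-query cost has two parts: one distance per training record, and the sort that follows. The sort can be avoided, because only the k smallest distances matter. The sketch below uses `heapq.nsmallest` for that; it is a technique swapped in here and not something the notes or the book use, the name `knn_predict_heap` is hypothetical, and the number of distance computations stays the same.

```python
import heapq
import math
from collections import Counter


def knn_predict_heap(query, points, labels, k):
    """kNN vote that selects the k nearest points without sorting all distances."""
    # Still one distance per training point: brute force cannot avoid this part
    scored = ((math.dist(query, p), label) for p, label in zip(points, labels))
    # nsmallest tracks only k candidates at a time instead of sorting everything
    nearest = heapq.nsmallest(k, scored, key=lambda pair: pair[0])
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


points = [(1.0, 1.1), (1.0, 1.0), (0.0, 0.0), (0.0, 0.1)]
labels = ['A', 'A', 'B', 'B']
print(knn_predict_heap((0.0, 0.0), points, labels, 3))
```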

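5 Python implementation

(1) Create the data set

import operator

from numpy import array, tile


def create_data_set():
    # Four 2-D training points and their class labels
    group = array([[1.0, 1.1], [1.0, 1.0], [0, 0], [0, 0.1]])
    labels = ['A', 'A', 'B', 'B']
    return group, labels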

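(2) Build the kNN classifier

def classify0(inX, dataSet, labels, k):
    """
    Classifier v1.0
    :param inX: input vector to classify
    :param dataSet: training sample set
    :param labels: label vector (one label per row of dataSet)
    :param k: number of nearest neighbors to use
    :return: the label that ranks first in the vote

    For every point in the data set whose class is unknown:
    1. Compute the distance between each point of the labeled data set and the current point
    2. Sort by increasing distance
    3. Take the k points closest to the current point
    4. Count how often each class occurs among those k points
    5. Return the most frequent class as the predicted class of the current point
    """
    # ndarray.shape is a tuple of array dimensions: shape[0] is the number of rows, shape[1] the number of columns
    dataSetSize = dataSet.shape[0]
    # print(dataSetSize)

    # Repeat inX (1*2 for the sample data) once per training row to get a 4*2 matrix,
    # then subtract the training set (4*2) from it
    diffMat = tile(inX, (dataSetSize, 1)) - dataSet
    # print(diffMat)

    # Square every element of the difference matrix
    sqDiffMat = diffMat**2
    # print(sqDiffMat)

    # Sum the elements along axis 1, i.e. across each row
    sqDistances = sqDiffMat.sum(axis=1)
    # print(sqDistances)

    # Take the square root
    distances = sqDistances**0.5
    # print(distances)

    # Indices that would sort distances in increasing order (an ndarray, not a list)
    sortedDistIndicies = distances.argsort()
    # print(sortedDistIndicies)

    # classCount counts the votes for each class
    classCount = {}

    # Walk through sortedDistIndicies and collect the labels of the k nearest neighbors
    for i in range(k):
        voteIlabel = labels[sortedDistIndicies[i]]
        # print(voteIlabel)

        # get() returns 0 when voteIlabel is not yet a key of classCount,
        # so the count starts at 1 on the first vote and grows by 1 afterwards
        classCount[voteIlabel] = classCount.get(voteIlabel, 0) + 1
        # print(classCount)

    # print(classCount)

    # Sort the items of classCount by their value, in descending order
    sortedClassCount = sorted(classCount.items(), key=operator.itemgetter(1), reverse=True)
    # print(sortedClassCount)

    # Return the label that ranks first
    return sortedClassCount[0][0]

# group, labels = create_data_set()
# print(classify0([0, 0], group, labels, 3))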
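The `tile` call in `classify0` builds an explicit copy of `inX` for every training row. NumPy broadcasting produces the same difference matrix without that copy. The sketch below compares the two on the toy data; it assumes NumPy is installed, and the variable names are illustrative.

```python
import numpy as np

data_set = np.array([[1.0, 1.1], [1.0, 1.0], [0.0, 0.0], [0.0, 0.1]])
in_x = np.array([0.0, 0.0])

# Explicit expansion, as in classify0
diff_tile = np.tile(in_x, (data_set.shape[0], 1)) - data_set
# Broadcasting: in_x (shape (2,)) is stretched across the 4 rows automatically
diff_broadcast = in_x - data_set

distances = np.sqrt((diff_broadcast ** 2).sum(axis=1))
print(np.array_equal(diff_tile, diff_broadcast))
print(distances.argsort())
```

Both difference matrices are equal, and the sorted indices start with rows 2 and 3, the two points labeled B.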

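6 Examples: a dating site matcher and a handwritten digit recognition system

(1) Dating site data

"""
Using kNN on a dating site
1. Collect: a text file is provided
2. Prepare: parse the text file in Python
3. Analyze: draw 2-D scatter plots with Matplotlib
4. Train: this step does not apply to the k-nearest neighbors algorithm
5. Test:
    Test samples differ from the other samples in that
        they are already classified; if the predicted class differs from the actual class, it is counted as an error
6. Use: build a simple command-line program that takes some feature values and predicts whether the person is someone you would like
"""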
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
from kNN import classify0


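def file2matrix(filename):
    """
    Convert the contents of a text file into a matrix
    :param filename: name of the file
    :return: the feature matrix and the list of class labels
    """
    # Open the file and read its contents into a list of lines
    with open(filename) as fr:
        arrayOLines = fr.readlines()

    # Length of the list, i.e. the number of lines in the file
    numberOfLines = len(arrayOLines)

    # numberOfLines*3 matrix filled with zeros
    returnMat = np.zeros((numberOfLines, 3))

    # Class label vector
    classLabelVector = []

    # Index of the row currently being filled
    index = 0

    # Loop over every line read from the file
    for line in arrayOLines:
        # Strip leading and trailing whitespace, including the newline
        line = line.strip()

        # Split the line on tab characters
        listFromLine = line.split('\t')

        # Fill row index with the first three fields of listFromLine
        returnMat[index,:] = listFromLine[0:3]

        # Append the last field of listFromLine to the class label list
        classLabelVector.append(int(listFromLine[-1]))

        # Move on to the next row
        index += 1
    return returnMat, classLabelVector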

datingDataMat, datingLabels = file2matrix('datingTestSet2.txt')
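The parsing step can be tried without the data file by feeding it an in-memory text stream. In this sketch the helper `parse_rows` is hypothetical and the two rows are made up; they only mimic the layout that `file2matrix` expects: three tab-separated feature values followed by an integer label.

```python
import io


def parse_rows(lines):
    """Split tab-separated lines into float feature rows and integer labels."""
    features, labels = [], []
    for line in lines:
        fields = line.strip().split('\t')
        if fields == ['']:
            continue  # skip blank lines
        features.append([float(v) for v in fields[0:3]])
        labels.append(int(fields[-1]))
    return features, labels


# Made-up rows in the same four-column layout (three features, one label)
sample = io.StringIO("1000\t2.5\t0.5\t3\n200\t0.0\t1.2\t1\n")
features, labels = parse_rows(sample)
print(features)
print(labels)
```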


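def get_figure(datingDataMat, datingLabels):
    """
    Reading the raw text file is not very friendly, so the data is usually shown graphically
    :param datingDataMat: feature matrix
    :param datingLabels: class labels
    :return: None
    """
    fig = plt.figure()
    ax = fig.add_subplot(111)

    # Use the second and third columns of datingDataMat, i.e. the features
    # "percentage of time spent playing video games" and "liters of ice cream consumed per week"
    # ax.scatter(datingDataMat[:, 1], datingDataMat[:, 2])

    # Use the class labels stored in datingLabels to draw points of different color and size
    # (this call plots the first and second columns)
    # scatter plot
    ax.scatter(datingDataMat[:,0], datingDataMat[:,1],
               15.0*np.array(datingLabels), 15.0*np.array(datingLabels))
    plt.show()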

# get_figure(datingDataMat, datingLabels)


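def autoNorm(dataSet):
    """
    In the distance formula the feature with the largest numeric range dominates the result,
    so features with different ranges are normalized first
    :param dataSet: input data set
    :return: the normalized data set, the column ranges and the column minimums
    """
    # The argument 0 in dataSet.min(0) takes the minimum of each column, not of each row
    # minVals holds the minimum of each column
    minVals = dataSet.min(0)

    # maxVals holds the maximum of each column
    maxVals = dataSet.max(0)

    # Range of each column
    ranges = maxVals - minVals

    # Not needed: normDataSet is assigned below
    # normDataSet = np.zeros(np.shape(dataSet))

    # Number of rows in dataSet
    m = dataSet.shape[0]

    # Normalize: (value - min) / (max - min)
    normDataSet = dataSet - np.tile(minVals, (m,1))
    normDataSet = normDataSet/np.tile(ranges, (m,1))
    return normDataSet, ranges, minVals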

# normDataSet, ranges, minVals = autoNorm(datingDataMat)
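The normalization is newValue = (oldValue - min) / (max - min), applied column by column. The sketch below writes it with NumPy broadcasting instead of `np.tile`; the name `auto_norm` and the sample values are made up for illustration, and a column whose values are all equal would give a zero range and a division by zero.

```python
import numpy as np


def auto_norm(data_set):
    """Min-max scale each column of data_set to the range [0, 1]."""
    min_vals = data_set.min(axis=0)
    ranges = data_set.max(axis=0) - min_vals
    # Broadcasting applies the per-column values to every row
    return (data_set - min_vals) / ranges, ranges, min_vals


# Made-up values with very different column scales
data = np.array([[1000.0, 2.0], [3000.0, 4.0], [2000.0, 3.0]])
norm, ranges, min_vals = auto_norm(data)
print(norm)
```

After scaling, both columns run from 0 to 1, so neither dominates the distance.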

def datingClassTest():

    # Use 10% of the data for testing and 90% for training
    hoRatio = 0.10

    # Convert the input file into matrix form
    datingDataMat, datingLabels = file2matrix('datingTestSet2.txt')

    # Normalize the feature values
    normDataSet, ranges, minVals = autoNorm(datingDataMat)

    # Number of test vectors
    m = normDataSet.shape[0]
    numTestVecs = int(m*hoRatio)

    # Error counter
    errorCount = 0.0

    # Loop over the test vectors
    for i in range(numTestVecs):

        # # Use the first 10% of the data set as test data; the error rate is 5%

        # Call classify0()
        # with row i of the normalized data set normDataSet as the test sample,
        # rows numTestVecs:m as the training data,
        # datingLabels[numTestVecs:m] as the label vector,
        # and the 3 nearest neighbors
        classifierResult = classify0(normDataSet[i,:], normDataSet[numTestVecs:m,:],
                                     datingLabels[numTestVecs:m], 3)

        # Print the predicted and the actual result
        print("the classifier came back with: %d,"
              "the real answer is: %d " % (classifierResult, datingLabels[i]))

        # Count an error when the prediction is wrong
        if classifierResult != datingLabels[i]:
            errorCount += 1.0

        # # -----------------------------------------------------------------------
        # # Use the last 10% of the data set as test data; the error rate is 6%
        # classifierResult = classify0(normDataSet[m-numTestVecs+i, :], normDataSet[:m-numTestVecs, :],
        #                              datingLabels[:m-numTestVecs], 3)
        #
        # print("the classifier came back with: %d,"
        #       "the real answer is: %d " % (classifierResult, datingLabels[m-numTestVecs+i]))
        #
        # if classifierResult != datingLabels[m-numTestVecs+i]:
        #     errorCount += 1.0
        # # -----------------------------------------------------------------------


    print("the total error rate is : %f" % (errorCount/float(numTestVecs)))

# datingClassTest()
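The hold-out scheme in `datingClassTest` can be separated from the dating data: test on the first `hoRatio` of the samples, train on the rest, and divide the number of wrong predictions by the number of test samples. In the sketch below, `holdout_error_rate`, the 1-nearest-neighbor rule `nearest_label` and the one-dimensional data are all made up for illustration.

```python
def holdout_error_rate(predict, samples, labels, ho_ratio=0.10):
    """Error rate of `predict` on the first ho_ratio of the data, trained on the rest."""
    m = len(samples)
    num_test = int(m * ho_ratio)
    train_x, train_y = samples[num_test:], labels[num_test:]
    errors = 0
    for i in range(num_test):
        if predict(samples[i], train_x, train_y) != labels[i]:
            errors += 1
    return errors / num_test


# Hypothetical 1-nearest-neighbor rule on made-up one-dimensional data
def nearest_label(x, train_x, train_y):
    best = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[best]


samples = [0.1, 0.9, 0.0, 0.2, 1.0, 1.1, 0.05, 0.95, 0.15, 1.05]
labels = [1, 1, 0, 0, 1, 1, 0, 1, 0, 1]
print(holdout_error_rate(nearest_label, samples, labels, ho_ratio=0.2))
```

The first sample is deliberately mislabeled, so one of the two test samples is misclassified and the error rate is 0.5. With very small data sets `num_test` can be 0, which this sketch does not guard against.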

def classifyPerson():
    # Possible predictions; the class labels in the data file are 1, 2 and 3
    resultList = ['not at all', 'in small doses', 'in large doses']

    # Read the user's input
    percentTats = float(input('percentage of time spent playing video games?'))
    iceCream = float(input('liters of ice cream consumed per year?'))
    ffMile = float(input('frequent flier miles earned per year?'))

    # Normalize the data set
    normDataSet, ranges, minVals = autoNorm(datingDataMat)

    # Put the user's input into an array, in the same column order as the data set
    inArr = np.array([ffMile, percentTats, iceCream])

    # Normalize the user's input the same way, then call classify0()
    classifierResult = classify0((inArr - minVals)/ranges, normDataSet, datingLabels, 3)

    # Print the prediction (labels start at 1, list indices at 0)
    print('You will probably like this person: ', resultList[classifierResult - 1])

classifyPerson()

(2) Handwritten digit recognition system

import numpy as np
from os import listdir
from kNN import classify0

def img2vector(filename):
    # Flatten a 32*32 text image of 0/1 characters into a 1*1024 vector
    returnVect = np.zeros((1,1024))
    with open(filename) as fr:
        for i in range(32):
            lineStr = fr.readline()
            for j in range(32):
                returnVect[0, 32*i + j] = int(lineStr[j])
    return returnVect

# print(img2vector('testDigits/0_13.txt')[0, 32:63])
# print(listdir('testDigits'))
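The index `32*i + j` lays the image out row by row: row i, column j of the picture lands at position 32*i + j of the vector. The sketch below shows the same layout on a made-up 4*4 image so that the result is easy to check by eye; `text_image_to_vector` is a hypothetical helper that works on a list of strings instead of a file.

```python
def text_image_to_vector(lines, size):
    """Flatten `size` lines of `size` binary characters into one list, row by row."""
    vector = []
    for i in range(size):
        row = lines[i]
        for j in range(size):
            vector.append(int(row[j]))
    return vector


# Made-up 4*4 "image"; the book's files are 32*32
image = ["0110", "1001", "1001", "0110"]
vec = text_image_to_vector(image, 4)
print(len(vec))
print(vec[4 * 1 + 3])  # row 1, column 3
```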

def handwritingClassTest():
    # Label vector
    hwLabels = []

    # List of the files in the trainingDigits directory
    traingingFileList = listdir('trainingDigits')

    # Number of files in the trainingDigits directory
    m = len(traingingFileList)

    # m*1024 matrix filled with zeros, one row per training file
    traingingMat = np.zeros((m,1024))

    # Loop over all files in trainingDigits
    for i in range(m):
        # Name of the current file
        fileNameStr = traingingFileList[i]

        # The digit this file represents is the part of the name before '_'
        fileStr = fileNameStr.split('.')[0]
        classNumStr = int(fileStr.split('_')[0])

        # Append that digit to the label list
        hwLabels.append(classNumStr)

        # Fill row i of the training matrix with the 1024 characters of the current file
        traingingMat[i,:] = img2vector('trainingDigits/{}'.format(fileNameStr))

    # List of the file names in the testDigits directory
    testFileList = listdir('testDigits')

    # Error counter
    errorCount = 0.0

    # Number of test files
    mTest = len(testFileList)

    # Loop over the test files
    for i in range(mTest):
        fileNameStr = testFileList[i]
        fileStr = fileNameStr.split('.')[0]
        classNumStr = int(fileStr.split('_')[0])
        vectorUnderTest = img2vector('testDigits/{}'.format(fileNameStr))
        classifierResult = classify0(vectorUnderTest, traingingMat, hwLabels, 3)
        print('the classifier came back with: %d,'
              'the real answer is: %d' % (classifierResult, classNumStr))
        if classifierResult != classNumStr:
            errorCount += 1.0
    print('the total number of errors is: %d' % errorCount)
    print('the total error rate is: %f' % (errorCount/float(mTest)))

# handwritingClassTest()


# the total number of errors is: 10
# the total error rate is: 0.010571
# Error rate: 1.06%
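Both loops above take the true digit from the file name, which is assumed to look like `0_13.txt`: the digit, an underscore, a sample number, then the extension. A tiny sketch of that parsing on its own; `label_from_filename` is a hypothetical helper and `9_45.txt` is a made-up name.

```python
def label_from_filename(file_name):
    """Return the digit encoded before the underscore, e.g. '0_13.txt' -> 0."""
    stem = file_name.split('.')[0]      # drop the extension
    return int(stem.split('_')[0])      # the part before '_' is the digit


print(label_from_filename('0_13.txt'))
print(label_from_filename('9_45.txt'))
```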

def my_handwritingClassTest():
    hwLabels = []
    traingingFileList = listdir('trainingDigits')
    m = len(traingingFileList)
    traingingMat = np.zeros((m,1024))
    for i in range(m):
        fileNameStr = traingingFileList[i]
        fileStr = fileNameStr.split('.')[0]
        classNumStr = int(fileStr.split('_')[0])
        hwLabels.append(classNumStr)
        traingingMat[i,:] = img2vector('trainingDigits/{}'.format(fileNameStr))
    testFileList = listdir('test_data')
    errorCount = 0.0
    mTest = len(testFileList)
    for i in range(mTest):
        fileNameStr = testFileList[i]
        fileStr = fileNameStr.split('.')[0]
        classNumStr = int(fileStr.split('_')[0])
        vectorUnderTest = img2vector('test_data/{}'.format(fileNameStr))
        classifierResult = classify0(vectorUnderTest, traingingMat, hwLabels, 3)
        print('the classifier came back with: %d,'
              'the real answer is: %d' % (classifierResult, classNumStr))
        if classifierResult != classNumStr:
            errorCount += 1.0
    print('the total number of errors is: %d' % errorCount)
    print('the total error rate is: %f' % (errorCount/float(mTest)))

my_handwritingClassTest()

"""
the classifier came back with: 3,the real answer is: 3
the classifier came back with: 6,the real answer is: 6
the classifier came back with: 7,the real answer is: 7
the classifier came back with: 8,the real answer is: 8
the classifier came back with: 1,the real answer is: 9
the total number of errors is: 1
the total error rate is: 0.200000

Perhaps the 9 was written so tall and thin that it looks like a 1?
"""

7 Reproducing the book's examples with pandas and scikit-learn

(1) Create the data set

import numpy as np
import pandas as pd
from pandas import Series, DataFrame

def createDataSet():
    group = DataFrame([[1.0,1.1],[1.0,1.0],[0,0],[0,0.1]], columns=['feature_1', 'feature_2'])
    labels = DataFrame(['A','A','B','B'], columns=['labels'])
    data_set = group.join(labels)
    return data_set

dataSet = createDataSet()

   feature_1  feature_2 labels
0        1.0        1.1      A
1        1.0        1.0      A
2        0.0        0.0      B
3        0.0        0.1      B

(2) Using KNeighborsClassifier from scikit-learn

In [2]:
from sklearn.neighbors import KNeighborsClassifier
import numpy as np
from pandas import Series, DataFrame

In [6]:
def createDataSet():
    group = DataFrame([[1.0,1.1],[1.0,1.0],[0,0],[0,0.1],[0.1,0]], columns=['feature_1', 'feature_2'])
    labels = DataFrame(['A','A','B','B','B'], columns=['labels'])
    data_set = group.join(labels)
    return data_set

dataSet = createDataSet()
x_train = dataSet.iloc[:, :2].values
y_train = dataSet.iloc[:, -1].values

In [7]:
# Create a kNN classifier object
knn = KNeighborsClassifier(algorithm='brute')

In [8]:
# Call the object's training method; it takes two arguments: the training data and its labels
knn.fit(x_train, y_train)

Out[8]:
KNeighborsClassifier(algorithm='brute', leaf_size=30, metric='minkowski',
           metric_params=None, n_jobs=1, n_neighbors=5, p=2,
           weights='uniform')

In [9]:
x_test = np.array([0, 0.3])

In [10]:
y_predict = knn.predict(x_test.reshape(1,-1))
y_predict

Out[10]:
array(['B'], dtype=object)

In [13]:
probility = knn.predict_proba(x_test.reshape(1,-1))
probility

Out[13]:
array([[ 0.4,  0.6]])

In [15]:
probility.argmax()

Out[15]:
1

In [18]:
# Indices of the neighbors, in order of increasing distance
knn.kneighbors(x_test.reshape(1,-1),5,False)

Out[18]:
array([[3, 2, 4, 1, 0]], dtype=int64)
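The outputs above can be checked by hand. With the default `n_neighbors=5` and `weights='uniform'` shown in Out[8], all five training points vote, so the probabilities are simply the class fractions: 2 of 5 for A and 3 of 5 for B. The sketch below redoes the neighbor ordering and the vote in plain Python; it only mirrors the numbers in this section and is not how scikit-learn computes them internally.

```python
import math
from collections import Counter

points = [(1.0, 1.1), (1.0, 1.0), (0.0, 0.0), (0.0, 0.1), (0.1, 0.0)]
labels = ['A', 'A', 'B', 'B', 'B']
x_test = (0.0, 0.3)

# Indices of the training points ordered by increasing distance to x_test
order = sorted(range(len(points)), key=lambda i: math.dist(x_test, points[i]))
print(order)

# With k = 5 every training point votes; the fractions are the class probabilities
votes = Counter(labels[i] for i in order[:5])
proba = [votes[c] / 5 for c in sorted(votes)]
print(proba)
```

The order matches Out[18] and the fractions match Out[13], with the classes listed alphabetically as A, then B.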

(3) Dating site matching data set test

In [3]:
import numpy as np
import pandas as pd
from pandas import Series, DataFrame

In [9]:
group = DataFrame([[1.0,1.1],[1.0,1.0],[0,0],[0,0.1]], columns=['feature_1', 'feature_2'])
group

Out[9]:
   feature_1  feature_2
0        1.0        1.1
1        1.0        1.0
2        0.0        0.0
3        0.0        0.1

In [11]:
labels = DataFrame(['A','A','B','B'], columns=['labels'])
labels

Out[11]:
  labels
0      A
1      A
2      B
3      B

In [17]:
# Same effect:
# data_set = pd.merge(group, labels, how='outer', left_index=True, right_index=True)
data_set = group.join(labels)
data_set

Out[17]:
   feature_1  feature_2 labels
0        1.0        1.1      A
1        1.0        1.0      A
2        0.0        0.0      B
3        0.0        0.1      B

In [18]:
data_set['feature_1']

Out[18]:
0    1.0
1    1.0
2    0.0
3    0.0
Name: feature_1, dtype: float64

In [19]:
# DataFrame.ix was removed in pandas 1.0; on current versions use data_set.loc[0]
data_set.ix[0]

Out[19]:
feature_1      1
feature_2    1.1
labels         A
Name: 0, dtype: object

In [20]:
data_set['feature_1'][0]

Out[20]:
1.0

In [22]:
# On current pandas versions: data_set.loc[0]['feature_1']
data_set.ix[0]['feature_1']

Out[22]:
1.0

In [24]:
data_set.iloc[0]

Out[24]:
feature_1      1
feature_2    1.1
labels         A
Name: 0, dtype: object

In [25]:
data_set.iloc[0, :]

Out[25]:
feature_1      1
feature_2    1.1
labels         A
Name: 0, dtype: object

In [26]:
data_set.iloc[:, 0]

Out[26]:
0    1.0
1    1.0
2    0.0
3    0.0
Name: feature_1, dtype: float64

In [27]:
data_set.iloc[:, 0].values

Out[27]:
array([ 1.,  1.,  0.,  0.])

In [29]:
data_set.shape

Out[29]:
(4, 3)

In [30]:
len(data_set.columns)

Out[30]:
3

In [31]:
data_set.values

Out[31]:
array([[1.0, 1.1, 'A'],
       [1.0, 1.0, 'A'],
       [0.0, 0.0, 'B'],
       [0.0, 0.1, 'B']], dtype=object)

In [33]:
data_set.iloc[:, :2].values

Out[33]:
array([[ 1. ,  1.1],
       [ 1. ,  1. ],
       [ 0. ,  0. ],
       [ 0. ,  0.1]])

In [28]:
def createDataSet():
    group = DataFrame([[1.0,1.1],[1.0,1.0],[0,0],[0,0.1]], columns=['feature_1', 'feature_2'])
    labels = DataFrame(['A','A','B','B'], columns=['labels'])
    data_set = group.join(labels)
    return data_set

createDataSet()

Out[28]:
   feature_1  feature_2 labels
0        1.0        1.1      A
1        1.0        1.0      A
2        0.0        0.0      B
3        0.0        0.1      B
