Indoor Positioning Series (4): Implementing Location Fingerprinting (Testing Various Machine Learning Classifiers)

阿新 · Published: 2019-01-15

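The most commonly used algorithm in location fingerprinting is k-nearest neighbors (kNN). The goal of this post is to learn Python's scikit-learn machine learning library: I tried a range of common machine learning models (both classifiers and regressors) and compared their localization performance in fingerprint-based positioning.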

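Importing the data

# Import the data
import numpy as np
import scipy.io as scio
offline_data = scio.loadmat('offline_data_random.mat')   # offline fingerprint database
online_data = scio.loadmat('online_data.mat')            # online test trace
offline_location, offline_rss = offline_data['offline_location'], offline_data['offline_rss']
trace, rss = online_data['trace'][0:1000, :], online_data['rss'][0:1000, :]   # first 1000 online points
del offline_data
del online_data

# Localization "accuracy": mean Euclidean distance between predicted and true
# positions (lower is better). Positions appear to be in cm, since the results
# below are divided by 100 to report meters.
def accuracy(predictions, labels):
    return np.mean(np.sqrt(np.sum((predictions - labels)**2, 1)))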
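Despite its name, `accuracy()` measures the mean localization error. As a hedged worked example, the plain-Python sketch below computes the same quantity for two points by hand; the name `mean_localization_error` is hypothetical, and positions are assumed to be in cm as in the article.

```python
import math

def mean_localization_error(predictions, labels):
    """Mean Euclidean distance between predicted and true (x, y) positions.

    Mirrors the article's accuracy() function; positions assumed to be in cm.
    """
    if len(predictions) != len(labels) or not predictions:
        raise ValueError("need two non-empty sequences of equal length")
    total = 0.0
    for (px, py), (tx, ty) in zip(predictions, labels):
        total += math.hypot(px - tx, py - ty)  # per-point Euclidean error
    return total / len(predictions)

# One exact hit, and one prediction 300 cm off in x and 400 cm off in y
preds = [(0.0, 0.0), (300.0, 400.0)]
truth = [(0.0, 0.0), (0.0, 0.0)]
err_cm = mean_localization_error(preds, truth)
print(err_cm / 100, "m")  # (0 + 500) / 2 = 250 cm -> 2.5 m
```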

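kNN regression

# kNN regression: average the locations of the k=40 nearest fingerprints
# (%time is an IPython/Jupyter magic; remove it when running as a plain script)
from sklearn import neighbors
knn_reg = neighbors.KNeighborsRegressor(40, weights='uniform', metric='euclidean')
%time knn_reg.fit(offline_rss, offline_location)
%time predictions = knn_reg.predict(rss)
acc = accuracy(predictions, trace)
print("accuracy: ", acc/100, "m")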

Wall time: 92 ms
Wall time: 182 ms
accuracy:  2.24421479398 m
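To make the kNN step concrete, here is a minimal plain-Python sketch of what `KNeighborsRegressor(k, weights='uniform', metric='euclidean')` computes for a single query: find the k fingerprints with the smallest Euclidean RSS distance and average their locations. The toy two-access-point database and the function name `knn_regress` are invented for illustration; scikit-learn uses optimized search structures and may break distance ties differently.

```python
import math

def knn_regress(train_rss, train_loc, query_rss, k):
    """Average the locations of the k fingerprints closest to query_rss.

    Sketch of uniform-weight, Euclidean kNN regression; ties broken by index.
    """
    dists = []
    for i, fp in enumerate(train_rss):
        d = math.sqrt(sum((a - b) ** 2 for a, b in zip(fp, query_rss)))
        dists.append((d, i))
    dists.sort()
    nearest = [i for _, i in dists[:k]]
    # Uniform weights: a plain mean of the neighbors' coordinates
    x = sum(train_loc[i][0] for i in nearest) / k
    y = sum(train_loc[i][1] for i in nearest) / k
    return (x, y)

# Hypothetical database: RSS from two access points (dBm) -> location (cm)
train_rss = [(-40, -70), (-42, -68), (-70, -40), (-68, -42)]
train_loc = [(0, 0), (100, 0), (900, 900), (1000, 900)]
print(knn_regress(train_rss, train_loc, (-41, -69), k=2))  # (50.0, 0.0)
```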

Logistic regression

# Logistic regression is a classifier, so each location is snapped to a 1 m grid
# node (gx, gy), which is encoded as the single class label gx*100 + gy
labels = np.round(offline_location[:, 0]/100.0) * 100 + np.round(offline_location[:, 1]/100.0)
from sklearn.linear_model import LogisticRegressionCV
clf_l2_LR_cv = LogisticRegressionCV(Cs=20, penalty='l2', tol=0.001)
predict_labels = clf_l2_LR_cv.fit(offline_rss, labels).predict(rss)
# Decode the predicted labels back to grid coordinates, then to cm
x = np.floor(predict_labels/100.0)
y = predict_labels - x * 100
predictions = np.column_stack((x, y)) * 100
acc = accuracy(predictions, trace)
print("accuracy: ", acc/100, 'm')

accuracy:  3.08581348591 m
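The classifiers need discrete labels, so the location is snapped to a 1 m grid and the grid node is packed into one number. Below is a plain-Python sketch of that encoding and its inverse (function names hypothetical). The scheme is only invertible for non-negative coordinates with `round(y/100) < 100`, and snapping alone can add up to half a cell diagonal (about 0.71 m) of error, which may partly explain why the classification variants trail the regression ones.

```python
def encode_cell(x_cm, y_cm):
    # Snap to the nearest 1 m grid node and pack (gx, gy) into one integer label
    return round(x_cm / 100.0) * 100 + round(y_cm / 100.0)

def decode_cell(label):
    # Invert the packing: gx = floor(label / 100), gy = remainder; back to cm
    gx = label // 100
    gy = label - gx * 100
    return (gx * 100.0, gy * 100.0)

label = encode_cell(1234, 567)
print(label)               # 1206
print(decode_cell(label))  # (1200.0, 600.0)
```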

Support Vector Machine for Regression (SVR)

from sklearn import svm
# SVR predicts a single target, so one model is trained per coordinate (x and y)
clf_x = svm.SVR(C=1000, gamma=0.01)
clf_y = svm.SVR(C=1000, gamma=0.01)
%time clf_x.fit(offline_rss, offline_location[:, 0])
%time clf_y.fit(offline_rss, offline_location[:, 1])
%time x = clf_x.predict(rss)
%time y = clf_y.predict(rss)
predictions = np.column_stack((x, y))
acc = accuracy(predictions, trace)
print("accuracy: ", acc/100, "m")

Wall time: 9min 27s
Wall time: 12min 42s
Wall time: 1.06 s
Wall time: 1.05 s
accuracy:  2.2468400825 m
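Training `clf_x` and `clf_y` separately is the same pattern that `sklearn.multioutput.MultiOutputRegressor` (used below for Bayesian ridge and gradient boosting) automates: one independent copy of the base model per output column. A minimal sketch of that wrapper, with a hypothetical `MeanRegressor` standing in for the base model:

```python
class PerOutputRegressor:
    """Fit one independent copy of a base regressor per output column.

    make_base is a hypothetical factory returning an object with fit(X, y)
    and predict(X), e.g. a class such as MeanRegressor below.
    """
    def __init__(self, make_base):
        self.make_base = make_base
        self.models = []

    def fit(self, X, Y):
        self.models = []
        for j in range(len(Y[0])):
            column = [row[j] for row in Y]  # j-th target (x, then y)
            self.models.append(self.make_base().fit(X, column))
        return self

    def predict(self, X):
        per_output = [m.predict(X) for m in self.models]
        # Re-assemble one tuple of coordinates per query row
        return [tuple(col[i] for col in per_output) for i in range(len(X))]


class MeanRegressor:
    """Hypothetical trivial base model: always predicts the training mean."""
    def fit(self, X, y):
        self.mean = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean] * len(X)


X = [(-40, -70), (-70, -40)]
Y = [(0, 0), (1000, 900)]
model = PerOutputRegressor(MeanRegressor).fit(X, Y)
print(model.predict([(-50, -50)]))  # [(500.0, 450.0)]
```

Because each coordinate gets its own model, any correlation between x and y is ignored; `KNeighborsRegressor` and `RandomForestRegressor` accept a two-column target directly. `MultiOutputRegressor` also takes an `n_jobs` parameter, which could fit the x and y models in parallel and might shorten the roughly 22 minutes of SVR training shown above.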

Support Vector Machine for Classification (SVC)

from sklearn import svm
# Same 1 m grid labels as for logistic regression
labels = np.round(offline_location[:, 0]/100.0) * 100 + np.round(offline_location[:, 1]/100.0)
clf_svc = svm.SVC(C=1000, tol=0.01, gamma=0.001)
%time clf_svc.fit(offline_rss, labels)
%time predict_labels = clf_svc.predict(rss)
x = np.floor(predict_labels/100.0)
y = predict_labels - x * 100
predictions = np.column_stack((x, y)) * 100
acc = accuracy(predictions, trace)
print("accuracy: ", acc/100, 'm')

Wall time: 1min 16s
Wall time: 15 s
accuracy:  2.50931890608 m

Random forest regressor

from sklearn.ensemble import RandomForestRegressor
estimator = RandomForestRegressor(n_estimators=150)
%time estimator.fit(offline_rss, offline_location)
%time predictions = estimator.predict(rss)
acc = accuracy(predictions, trace)
print("accuracy: ", acc/100, 'm')

Wall time: 58.6 s
Wall time: 196 ms
accuracy:  2.20778352008 m

Random forest classifier

from sklearn.ensemble import RandomForestClassifier
labels = np.round(offline_location[:, 0]/100.0) * 100 + np.round(offline_location[:, 1]/100.0)
estimator = RandomForestClassifier(n_estimators=20, max_features=None, max_depth=20)  # limited by memory, so the number of trees is rather small
%time estimator.fit(offline_rss, labels)
%time predict_labels = estimator.predict(rss)
x = np.floor(predict_labels/100.0)
y = predict_labels - x * 100
predictions = np.column_stack((x, y)) * 100
acc = accuracy(predictions, trace)
print("accuracy: ", acc/100, 'm')

Wall time: 39.6 s
Wall time: 113 ms
accuracy:  2.56860790666 m

Linear regression

from sklearn.linear_model import LinearRegression
predictions = LinearRegression().fit(offline_rss, offline_location).predict(rss)
acc = accuracy(predictions, trace)
print("accuracy: ", acc/100, 'm')

accuracy:  3.83239841667 m

Ridge regression

from sklearn.linear_model import RidgeCV
clf = RidgeCV(alphas=np.logspace(-4, 4, 10))
predictions = clf.fit(offline_rss, offline_location).predict(rss)
acc = accuracy(predictions, trace)
print("accuracy: ", acc/100, 'm')

accuracy:  3.83255676918 m

Lasso regression

from sklearn.linear_model import MultiTaskLassoCV
clf = MultiTaskLassoCV(alphas=np.logspace(-4, 4, 10))
predictions = clf.fit(offline_rss, offline_location).predict(rss)
acc = accuracy(predictions, trace)
print("accuracy: ", acc/100, 'm')

accuracy:  3.83244688001 m

Elastic net regression

from sklearn.linear_model import MultiTaskElasticNetCV
clf = MultiTaskElasticNetCV(alphas=np.logspace(-4, 4, 10))
predictions = clf.fit(offline_rss, offline_location).predict(rss)
acc = accuracy(predictions, trace)
print("accuracy: ", acc/100, 'm')

accuracy:  3.832486036 m

Bayesian ridge regression

from sklearn.linear_model import BayesianRidge
from sklearn.multioutput import MultiOutputRegressor
# BayesianRidge predicts a single target, so wrap it to get one model per coordinate
clf = MultiOutputRegressor(BayesianRidge())
predictions = clf.fit(offline_rss, offline_location).predict(rss)
acc = accuracy(predictions, trace)
print("accuracy: ", acc/100, "m")

accuracy:  3.83243319129 m

Gradient boosting for regression

from sklearn import ensemble
from sklearn.multioutput import MultiOutputRegressor
# GradientBoostingRegressor predicts a single target, so wrap it per coordinate
clf = MultiOutputRegressor(ensemble.GradientBoostingRegressor(n_estimators=100, max_depth=10))
%time clf.fit(offline_rss, offline_location)
%time predictions = clf.predict(rss)
acc = accuracy(predictions, trace)
print("accuracy: ", acc/100, "m")

Wall time: 43.4 s
Wall time: 17 ms
accuracy:  2.22100945095 m

Multi-layer perceptron regressor (neural network)

from sklearn.neural_network import MLPRegressor
clf = MLPRegressor(hidden_layer_sizes=(100, 100))  # two hidden layers of 100 units
%time clf.fit(offline_rss, offline_location)
%time predictions = clf.predict(rss)
acc = accuracy(predictions, trace)
print("accuracy: ", acc/100, "m")

Wall time: 1min 1s
Wall time: 6 ms
accuracy:  2.4517504109 m

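Summary

The linear models above (linear, ridge, lasso, elastic net, and Bayesian ridge regression) clearly perform too poorly, all at about 3.83 m. The table below summarizes the other models; for SVM and random forest, the better-performing regression variant is listed: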

Algorithm	Mean localization error
kNN	2.24 m
Logistic regression	3.09 m
Support vector machine (SVR)	2.25 m
Random forest (regressor)	2.21 m
Gradient boosting for regression	2.22 m
Multi-layer perceptron regressor	2.45 m

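Judging by the approximate localization error, kNN, SVM, RF, and GBDT are the four best models. (Many of the algorithms above were not carefully tuned, so these results are rough; I had no idea how to tune the neural network at all...) Note also that SVM trains slowly and is tedious to tune, and that kNN's prediction time should grow in proportion to the amount of training data, so for real-time localization it is likely to be slower than RF and GBDT.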
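The point about kNN prediction time can be illustrated with brute-force search, in which every query computes one distance to every stored fingerprint, so the work per query grows linearly with the database size. The sketch below counts those evaluations on synthetic data (all names hypothetical). scikit-learn may instead choose a KD-tree or ball tree (`algorithm='auto'`), which can do better in low dimensions, but with many access points the RSS vectors are high-dimensional and that advantage typically shrinks.

```python
import math

def brute_force_nearest(train_rss, query):
    """Return (index of the nearest fingerprint, number of distance evaluations)."""
    best_i, best_d, evaluations = -1, math.inf, 0
    for i, fp in enumerate(train_rss):
        # One Euclidean distance per stored fingerprint
        d = math.sqrt(sum((a - b) ** 2 for a, b in zip(fp, query)))
        evaluations += 1
        if d < best_d:
            best_i, best_d = i, d
    return best_i, evaluations

# Synthetic fingerprint databases of growing size (two access points, dBm)
for n in (100, 1000, 10000):
    db = [(-40 - i % 50, -90 + i % 50) for i in range(n)]
    _, evals = brute_force_nearest(db, (-60, -70))
    print(n, evals)  # evaluations equal n: cost grows linearly with database size
```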

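Author: rubbninja
Source: http://www.cnblogs.com/rubbninja/
About the author: currently researching machine learning and wireless positioning; discussion and corrections are welcome!
Copyright: the copyright of this article is shared by the author and cnblogs (博客園). Please credit the source when reposting.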
