Indoor Positioning Series (4): Implementing Location Fingerprinting (Testing Various Machine Learning Classifiers)
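The most commonly used algorithm in location fingerprinting is k-nearest neighbors (kNN). The goal of this post is to learn how to use scikit-learn, Python's machine learning library. I try a range of common machine learning models (classifiers as well as regressors) and compare how well each one localizes when used for location fingerprinting.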
Importing the data
For a description of the data, see: http://www.cnblogs.com/rubbninja/p/6118430.html
# Import the offline (fingerprint database) and online (test trace) data
import numpy as np
import scipy.io as scio
offline_data = scio.loadmat('offline_data_random.mat')
online_data = scio.loadmat('online_data.mat')
offline_location, offline_rss = offline_data['offline_location'], offline_data['offline_rss']
trace, rss = online_data['trace'][0:1000, :], online_data['rss'][0:1000, :]
del offline_data
del online_data
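# Positioning error: mean Euclidean distance between predicted and true positions
# (same units as the data, i.e. cm; lower is better despite the function name)
def accuracy(predictions, labels):
    return np.mean(np.sqrt(np.sum((predictions - labels)**2, 1)))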
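Despite its name, `accuracy` returns a mean Euclidean error. For each test point it takes the straight-line distance between the predicted and true coordinates, then averages those distances, so lower is better. Below is a minimal sketch of the same metric written with `np.linalg.norm`. The function name `mean_error_distance` and the toy coordinates are made up for illustration.

```python
import numpy as np

def mean_error_distance(predictions, labels):
    # Euclidean distance for each sample (row), then the mean over all samples
    return np.mean(np.linalg.norm(predictions - labels, axis=1))

# Toy positions in cm (made-up values, for illustration only)
pred = np.array([[0.0, 0.0], [300.0, 400.0]])
true = np.array([[0.0, 0.0], [0.0, 0.0]])
err_cm = mean_error_distance(pred, true)
# The two distances are 0 cm and 500 cm, so the mean is 250 cm
print(err_cm / 100, "m")
```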
kNN regression
# kNN regression
from sklearn import neighbors
knn_reg = neighbors.KNeighborsRegressor(40, weights='uniform', metric='euclidean')
%time knn_reg.fit(offline_rss, offline_location)
%time predictions = knn_reg.predict(rss)
acc = accuracy(predictions, trace)
print("accuracy: ", acc/100, "m")
Wall time: 92 ms
Wall time: 182 ms
accuracy: 2.24421479398 m
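With `weights='uniform'`, `KNeighborsRegressor` predicts the plain average of the locations of the k training fingerprints closest to the query RSS vector (here closest in Euclidean distance). The NumPy sketch below reproduces that behavior with a brute-force search. It runs on random synthetic data, not the real fingerprint database, and the function name `knn_regress` is hypothetical. It also shows why prediction cost grows with the size of the database: every query is compared against every training fingerprint.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

def knn_regress(train_rss, train_loc, query_rss, k):
    # Pairwise Euclidean distances, shape (n_query, n_train)
    d = np.linalg.norm(query_rss[:, None, :] - train_rss[None, :, :], axis=2)
    # Indices of the k nearest training fingerprints for each query
    nearest = np.argsort(d, axis=1)[:, :k]
    # Uniform weights: average the k neighbors' locations
    return train_loc[nearest].mean(axis=1)

rng = np.random.default_rng(0)
train_rss = rng.normal(-70, 10, size=(200, 6))   # synthetic RSS, 6 hypothetical APs
train_loc = rng.uniform(0, 2000, size=(200, 2))  # synthetic locations in cm
query_rss = rng.normal(-70, 10, size=(20, 6))

manual = knn_regress(train_rss, train_loc, query_rss, k=5)
ref = (KNeighborsRegressor(n_neighbors=5, weights='uniform', metric='euclidean')
       .fit(train_rss, train_loc).predict(query_rss))
print(np.allclose(manual, ref))
```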
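Logistic regression
# Logistic regression is a classifier, so each location (in cm) is snapped to a 1 m grid cell and encoded as the label x_m*100 + y_m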
labels = np.round(offline_location[:, 0]/100.0) * 100 + np.round(offline_location[:, 1]/100.0)
from sklearn.linear_model import LogisticRegressionCV
clf_l2_LR_cv = LogisticRegressionCV(Cs=20, penalty='l2', tol=0.001)
predict_labels = clf_l2_LR_cv.fit(offline_rss, labels).predict(rss)
x = np.floor(predict_labels/100.0)
y = predict_labels - x * 100
predictions = np.column_stack((x, y)) * 100
acc = accuracy(predictions, trace)
print("accuracy: ", acc/100, "m")
accuracy: 3.08581348591 m
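The classification-based sections all reuse the same trick. Each location (in cm) is snapped to a 1 m grid and packed into one label `x_m * 100 + y_m`, and each predicted label is unpacked back into coordinates. Below is a small sketch of that round trip as two helper functions. The names `encode_grid_label` and `decode_grid_label` are hypothetical. The decoding assumes non-negative coordinates with y below 99.5 m, so that y fits in the two lowest decimal digits; the original code implies the same assumption.

```python
import numpy as np

def encode_grid_label(loc_cm):
    # Hypothetical helper: snap to the nearest 1 m grid point, pack as x_m*100 + y_m
    # Assumes non-negative coordinates and y < 99.5 m
    return np.round(loc_cm[:, 0] / 100.0) * 100 + np.round(loc_cm[:, 1] / 100.0)

def decode_grid_label(labels):
    # Hypothetical helper: unpack x_m and y_m, then convert back to cm
    x = np.floor(labels / 100.0)
    y = labels - x * 100
    return np.column_stack((x, y)) * 100

loc = np.array([[1234.0, 567.0], [80.0, 20.0]])
labels = encode_grid_label(loc)
print(labels)
print(decode_grid_label(labels))
```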
Support Vector Machine for Regression (SVR)
from sklearn import svm
clf_x = svm.SVR(C=1000, gamma=0.01)
clf_y = svm.SVR(C=1000, gamma=0.01)
%time clf_x.fit(offline_rss, offline_location[:, 0])
%time clf_y.fit(offline_rss, offline_location[:, 1])
%time x = clf_x.predict(rss)
%time y = clf_y.predict(rss)
predictions = np.column_stack((x, y))
acc = accuracy(predictions, trace)
print("accuracy: ", acc/100, "m")
Wall time: 9min 27s
Wall time: 12min 42s
Wall time: 1.06 s
Wall time: 1.05 s
accuracy: 2.2468400825 m
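`SVR` predicts a single target, which is why two models (`clf_x`, `clf_y`) are trained above. scikit-learn's `MultiOutputRegressor` packages that pattern by fitting one clone of the estimator per output column. Its `n_jobs` parameter can fit those clones in parallel, but each one is still a full SVR fit. The sketch below uses synthetic data, not the fingerprint set. It checks that the wrapper gives the same predictions as two hand-made SVRs with the section's hyperparameters.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.multioutput import MultiOutputRegressor

rng = np.random.default_rng(1)
X = rng.normal(-70, 10, size=(150, 5))      # synthetic RSS vectors
Y = rng.uniform(0, 2000, size=(150, 2))     # synthetic (x, y) locations in cm
X_test = rng.normal(-70, 10, size=(10, 5))

# One SVR per coordinate, as in the section above
svr_x = SVR(C=1000, gamma=0.01).fit(X, Y[:, 0])
svr_y = SVR(C=1000, gamma=0.01).fit(X, Y[:, 1])
separate = np.column_stack((svr_x.predict(X_test), svr_y.predict(X_test)))

# The same idea wrapped in a single estimator with a 2-column target
wrapped = MultiOutputRegressor(SVR(C=1000, gamma=0.01)).fit(X, Y).predict(X_test)
print(np.allclose(separate, wrapped))
```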
Support Vector Machine for Classification (SVC)
from sklearn import svm
labels = np.round(offline_location[:, 0]/100.0) * 100 + np.round(offline_location[:, 1]/100.0)
clf_svc = svm.SVC(C=1000, tol=0.01, gamma=0.001)
%time clf_svc.fit(offline_rss, labels)
%time predict_labels = clf_svc.predict(rss)
x = np.floor(predict_labels/100.0)
y = predict_labels - x * 100
predictions = np.column_stack((x, y)) * 100
acc = accuracy(predictions, trace)
print("accuracy: ", acc/100, "m")
Wall time: 1min 16s
Wall time: 15 s
accuracy: 2.50931890608 m
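The classifiers can only output grid points, so part of their error comes from the 1 m discretization itself, before any classification mistake. One way to gauge this is to snap the true locations to the grid and compute the mean error of those "perfect" labels. This sketch was not part of the original experiment. With the real data you would pass `trace`; the coordinates below are made up, and the function name `grid_snap_error` is hypothetical.

```python
import numpy as np

def grid_snap_error(loc_cm, cell_cm=100.0):
    # Error of an ideal classifier that always picks the correct label:
    # distance from each point to its nearest grid point
    snapped = np.round(loc_cm / cell_cm) * cell_cm
    return np.mean(np.linalg.norm(snapped - loc_cm, axis=1))

pts = np.array([[130.0, 270.0], [200.0, 300.0]])  # made-up positions in cm
print(grid_snap_error(pts) / 100, "m")
```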
Random forest regressor
from sklearn.ensemble import RandomForestRegressor
estimator = RandomForestRegressor(n_estimators=150)
%time estimator.fit(offline_rss, offline_location)
%time predictions = estimator.predict(rss)
acc = accuracy(predictions, trace)
print("accuracy: ", acc/100, "m")
Wall time: 58.6 s
Wall time: 196 ms
accuracy: 2.20778352008 m
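A random forest regressor predicts the average of its individual decision trees' predictions, where each tree is trained on a bootstrap sample. `RandomForestRegressor` also handles the two-column (x, y) target natively, so no wrapper is needed. The sketch below checks the averaging through the fitted `estimators_` list. It uses synthetic data, and `n_estimators=30` is an arbitrary choice.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(2)
X = rng.normal(-70, 10, size=(200, 5))     # synthetic RSS vectors
Y = rng.uniform(0, 2000, size=(200, 2))    # synthetic locations in cm
X_test = rng.normal(-70, 10, size=(10, 5))

forest = RandomForestRegressor(n_estimators=30, random_state=0).fit(X, Y)
# Average the per-tree predictions (each of shape (10, 2)) by hand
tree_mean = np.mean([tree.predict(X_test) for tree in forest.estimators_], axis=0)
print(np.allclose(tree_mean, forest.predict(X_test)))
```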
Random forest classifier
from sklearn.ensemble import RandomForestClassifier
labels = np.round(offline_location[:, 0]/100.0) * 100 + np.round(offline_location[:, 1]/100.0)
estimator = RandomForestClassifier(n_estimators=20, max_features=None, max_depth=20)  # memory-limited, so the number of trees is rather small
%time estimator.fit(offline_rss, labels)
%time predict_labels = estimator.predict(rss)
x = np.floor(predict_labels/100.0)
y = predict_labels - x * 100
predictions = np.column_stack((x, y)) * 100
acc = accuracy(predictions, trace)
print("accuracy: ", acc/100, "m")
Wall time: 39.6 s
Wall time: 113 ms
accuracy: 2.56860790666 m
Linear Regression
from sklearn.linear_model import LinearRegression
predictions = LinearRegression().fit(offline_rss, offline_location).predict(rss)
acc = accuracy(predictions, trace)
print("accuracy: ", acc/100, "m")
accuracy: 3.83239841667 m
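`LinearRegression` fits ordinary least squares. It models each coordinate as an affine function of the RSS vector, `location ≈ rss @ W + b`. The sketch below reproduces it with `np.linalg.lstsq` by appending a column of ones for the intercept. It uses synthetic data only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(3)
X = rng.normal(-70, 10, size=(100, 4))     # synthetic RSS vectors
Y = rng.uniform(0, 2000, size=(100, 2))    # synthetic locations in cm

# Append a column of ones so the last row of coef holds the intercepts
X1 = np.column_stack((X, np.ones(len(X))))
coef, *_ = np.linalg.lstsq(X1, Y, rcond=None)
manual = X1 @ coef

ref = LinearRegression().fit(X, Y).predict(X)
print(np.allclose(manual, ref))
```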
Ridge Regression
from sklearn.linear_model import RidgeCV
clf = RidgeCV(alphas=np.logspace(-4, 4, 10))
predictions = clf.fit(offline_rss, offline_location).predict(rss)
acc = accuracy(predictions, trace)
print("accuracy: ", acc/100, "m")
accuracy: 3.83255676918 m
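Ridge regression adds an L2 penalty α‖W‖² to the least-squares loss. `RidgeCV` picks α from the supplied grid by cross-validation (by default, an efficient leave-one-out scheme) and exposes the chosen value as `alpha_`. For a fixed α the solution has a closed form. scikit-learn leaves the intercept unpenalized, so it can be computed on centered data. The sketch below compares that closed form against `Ridge` with an arbitrarily chosen α = 10 on synthetic data.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(4)
X = rng.normal(-70, 10, size=(100, 4))     # synthetic RSS vectors
Y = rng.uniform(0, 2000, size=(100, 2))    # synthetic locations in cm
alpha = 10.0                               # arbitrary illustration value

# Center X and Y so the (unpenalized) intercept drops out of the penalty
x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
Xc, Yc = X - x_mean, Y - y_mean
# Closed form: W = (Xc^T Xc + alpha * I)^-1 Xc^T Yc
W = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ Yc)
b = y_mean - x_mean @ W
manual = X @ W + b

ref = Ridge(alpha=alpha).fit(X, Y).predict(X)
print(np.allclose(manual, ref))
```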
Lasso regression
from sklearn.linear_model import MultiTaskLassoCV
clf = MultiTaskLassoCV(alphas=np.logspace(-4, 4, 10))
predictions = clf.fit(offline_rss, offline_location).predict(rss)
acc = accuracy(predictions, trace)
print("accuracy: ", acc/100, "m")
accuracy: 3.83244688001 m
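`MultiTaskLassoCV` fits x and y jointly with a mixed L1/L2 penalty on the coefficient matrix. As a result, each input feature (here, each access point's RSS) is either used for both coordinates or dropped for both. The sketch below uses `MultiTaskLasso` at a fixed, arbitrarily chosen α. The synthetic data has eight features, and only the first two matter.

```python
import numpy as np
from sklearn.linear_model import MultiTaskLasso

rng = np.random.default_rng(5)
X = rng.normal(size=(200, 8))                    # synthetic standardized features
W_true = np.array([[3.0, -2.0], [2.0, 4.0]])     # only features 0 and 1 matter
Y = X[:, :2] @ W_true + 0.1 * rng.normal(size=(200, 2))

model = MultiTaskLasso(alpha=0.5).fit(X, Y)
# coef_ has shape (n_targets, n_features); a feature is dropped when its whole column is zero
dropped = np.all(model.coef_ == 0, axis=0)
print(dropped)
```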
Elastic Net
from sklearn.linear_model import MultiTaskElasticNetCV
clf = MultiTaskElasticNetCV(alphas=np.logspace(-4, 4, 10))
predictions = clf.fit(offline_rss, offline_location).predict(rss)
acc = accuracy(predictions, trace)
print("accuracy: ", acc/100, "m")
accuracy: 3.832486036 m
Bayesian Ridge Regression
from sklearn.linear_model import BayesianRidge
from sklearn.multioutput import MultiOutputRegressor
clf = MultiOutputRegressor(BayesianRidge())
predictions = clf.fit(offline_rss, offline_location).predict(rss)
acc = accuracy(predictions, trace)
print("accuracy: ", acc/100, "m")
accuracy: 3.83243319129 m
Gradient Boosting for regression
from sklearn import ensemble
from sklearn.multioutput import MultiOutputRegressor
clf = MultiOutputRegressor(ensemble.GradientBoostingRegressor(n_estimators=100, max_depth=10))
%time clf.fit(offline_rss, offline_location)
%time predictions = clf.predict(rss)
acc = accuracy(predictions, trace)
print("accuracy: ", acc/100, "m")
Wall time: 43.4 s
Wall time: 17 ms
accuracy: 2.22100945095 m
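Gradient boosting builds trees sequentially, each one fitted to the residuals of the ensemble so far. With the default squared-error loss, the prediction is the training mean plus `learning_rate` times the sum of all tree outputs. `GradientBoostingRegressor` handles only one target, which is why it is wrapped in `MultiOutputRegressor` above. The sketch below rebuilds the prediction for a single synthetic coordinate from the fitted `estimators_` array. The hyperparameters are arbitrary choices.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(6)
X = rng.normal(-70, 10, size=(200, 5))   # synthetic RSS vectors
y = rng.uniform(0, 2000, size=200)       # one synthetic coordinate in cm
X_test = rng.normal(-70, 10, size=(10, 5))

gbr = GradientBoostingRegressor(n_estimators=50, max_depth=3, learning_rate=0.1,
                                random_state=0).fit(X, y)
# estimators_ has shape (n_estimators, 1); each entry is a regression tree
tree_sum = sum(tree.predict(X_test) for tree in gbr.estimators_[:, 0])
# Initial prediction (mean of y) plus the shrunken contributions of all trees
manual = y.mean() + gbr.learning_rate * tree_sum
print(np.allclose(manual, gbr.predict(X_test)))
```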
Multi-layer Perceptron regressor
from sklearn.neural_network import MLPRegressor
clf = MLPRegressor(hidden_layer_sizes=(100, 100))
%time clf.fit(offline_rss, offline_location)
%time predictions = clf.predict(rss)
acc = accuracy(predictions, trace)
print("accuracy: ", acc/100, "m")
Wall time: 1min 1s
Wall time: 6 ms
accuracy: 2.4517504109 m
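The MLP above is trained on raw RSS values. scikit-learn's documentation notes that multi-layer perceptrons are sensitive to feature scaling, so a common first step is to standardize the inputs inside a `Pipeline`. This was not tried in the original experiment and is not guaranteed to help here. The sketch below uses synthetic data, and `max_iter` and `random_state` are arbitrary choices.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(7)
X = rng.normal(-70, 10, size=(300, 6))    # synthetic RSS vectors
Y = rng.uniform(0, 2000, size=(300, 2))   # synthetic locations in cm
X_test = rng.normal(-70, 10, size=(5, 6))

# The scaler is fitted on the training RSS only and reused at prediction time
model = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(100, 100), max_iter=500, random_state=0),
)
model.fit(X, Y)
pred = model.predict(X_test)
print(pred.shape)
```

Putting the scaler inside the pipeline keeps the test RSS from influencing the scaling statistics.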
Summary
The linear regression models above clearly perform poorly (all about 3.83 m), so the table below summarizes only the other models:
Algorithm | Mean positioning error (lower is better) |
---|---|
kNN (regression) | 2.24 m |
Logistic regression | 3.09 m |
Support vector machine (SVR) | 2.25 m |
Random forest (regressor) | 2.21 m |
Gradient Boosting for regression | 2.22 m |
Multi-layer Perceptron regressor | 2.45 m |
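Judging roughly by positioning error, kNN, SVM, RF and GBDT are the four better models. Many of the algorithms above were not tuned carefully, so these results are only rough, and I had no real idea how to tune the neural network. Note also that SVM is slow to train and tedious to tune. kNN's prediction time should grow in proportion to the amount of training data, so for real-time positioning it is probably inferior to RF and GBDT.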
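One way to tune any of these models against the same metric used in this post is `GridSearchCV`, with a scorer built from the mean error distance via `make_scorer(..., greater_is_better=False)`. The sketch below tunes the kNN's `n_neighbors` on synthetic data. The grid values, `cv=5`, and the function name `mean_error_distance` are arbitrary assumptions.

```python
import numpy as np
from sklearn.metrics import make_scorer
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsRegressor

def mean_error_distance(y_true, y_pred):
    # Same metric as accuracy() in this post: mean Euclidean error
    return np.mean(np.linalg.norm(y_pred - y_true, axis=1))

# greater_is_better=False negates the score, so GridSearchCV minimizes the error
scorer = make_scorer(mean_error_distance, greater_is_better=False)

rng = np.random.default_rng(8)
X = rng.normal(-70, 10, size=(300, 6))    # synthetic RSS vectors
Y = rng.uniform(0, 2000, size=(300, 2))   # synthetic locations in cm

search = GridSearchCV(KNeighborsRegressor(weights='uniform'),
                      {'n_neighbors': [5, 10, 20, 40]}, scoring=scorer, cv=5)
search.fit(X, Y)
print(search.best_params_, -search.best_score_ / 100, "m")
```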
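Author: [rubbninja](http://www.cnblogs.com/rubbninja/) Source: [http://www.cnblogs.com/rubbninja/](http://www.cnblogs.com/rubbninja/) About the author: currently researching machine learning and wireless positioning; discussion and corrections are welcome! Copyright: the copyright of this article is shared by the author and cnblogs; please credit the source when reposting.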