Multiple Linear Regression with Least Squares
阿新 • Published: 2019-02-12
Method Overview
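The method is ordinary least squares for a linear model with an intercept. It is written out below as a sketch in standard notation. The symbols X, β and n are chosen here and do not appear in the original. There are n = 27 observations, and each row of X is [1, x1, x2, x3]:

```latex
\hat{\beta} = \arg\min_{\beta} \sum_{i=1}^{n} \bigl(y_i - \beta_0 - \beta_1 x_{i1} - \beta_2 x_{i2} - \beta_3 x_{i3}\bigr)^2
            = \arg\min_{\beta} \lVert X\beta - y \rVert_2^2,
\qquad
X^\top X \hat{\beta} = X^\top y \;\Longrightarrow\; \hat{\beta} = (X^\top X)^{-1} X^\top y .
```

The closed form on the right assumes X has full column rank, which holds for this data set. scipy's leastsq does not solve these normal equations directly. It minimizes the same sum of squares iteratively, using the Levenberg-Marquardt algorithm from MINPACK.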
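Code Implementation for a Concrete Problem
Data
Code
This post uses scipy's leastsq function. In the code below, each row of x is [1, x1, x2, x3], where the leading 1 is the intercept term. y holds the 27 measured target values: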
```python
from scipy.optimize import leastsq
import numpy as np


def main():
    # Data provided: each row of x is [1, x1, x2, x3]; the leading 1 is the intercept term
    x = np.array([[1, 50, 5, 200], [1, 50, 5, 400], [1, 50, 5, 600], [1, 50, 5, 800], [1, 50, 5, 1000],
                  [1, 50, 10, 200], [1, 50, 10, 400], [1, 50, 10, 600], [1, 50, 10, 800], [1, 50, 10, 1000],
                  [1, 60, 5, 200], [1, 60, 5, 400], [1, 60, 5, 600], [1, 60, 5, 800], [1, 60, 5, 1000],
                  [1, 60, 10, 200], [1, 60, 10, 400], [1, 60, 10, 600], [1, 60, 10, 800], [1, 60, 10, 1000],
                  [1, 70, 5, 200], [1, 70, 5, 400], [1, 70, 5, 600], [1, 70, 5, 800], [1, 70, 5, 1000],
                  [1, 70, 10, 200], [1, 70, 10, 400]])
    y = np.array([7.434, 3.011, 1.437, 0.6728, 0.00036,
                  5.518, 2.556, 1.341, 0.6824, 0.0001,
                  18.22, 7.344, 4.066, 1.799, 1.218,
                  16.11, 9.448, 4.752, 2.245, 1.539,
                  18.14, 12.88, 7.29, 3.449, 2.533,
                  15.76, 16.24])
    # Linear model: the prediction is x @ tpl, where tpl holds the fit parameters
    funcLine = lambda tpl, x: np.dot(x, tpl)
    # func is a placeholder for funcLine, funcQuad, or whatever function we want to fit
    func = funcLine
    # ErrorFunc is the residual between func and the experimental y data
    ErrorFunc = lambda tpl, x, y: func(tpl, x) - y
    # tplInitial is the initial guess for the parameters
    tplInitial = [1.0, 1.0, 1.0, 1.0]
    # leastsq finds the tpl that minimizes the sum of squared ErrorFunc values.
    # Without full_output it returns (solution, ier); ier in {1, 2, 3, 4} means a solution was found.
    tplFinal, success = leastsq(ErrorFunc, tplInitial, args=(x, y))
    print('linear fit', tplFinal)
    print(funcLine(tplFinal, x))


if __name__ == "__main__":
    main()
```
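The model is linear in its parameters, so the same minimizer can also be computed directly, with no initial guess and no iterations. The sketch below uses NumPy's np.linalg.lstsq as a cross-check. This is a substitute technique, not the post's method.

To keep the block short, it rebuilds the 27 rows of x from a grid of x1, x2 and x3 values. That grid is an assumption, but it reproduces the listing above row for row. The resulting coefficients should agree closely with tplFinal.

```python
import numpy as np

# Assumed compact rebuild of the post's x: rows [1, x1, x2, x3] over the grid
# x1 in (50, 60, 70), x2 in (5, 10), x3 in (200, ..., 1000). The last group
# (x1=70, x2=10) only has x3 = 200 and 400, so the 30-row grid is cut to 27 rows.
rows = [(1, a, b, c)
        for a in (50, 60, 70)
        for b in (5, 10)
        for c in (200, 400, 600, 800, 1000)][:27]
x = np.array(rows, dtype=float)
y = np.array([7.434, 3.011, 1.437, 0.6728, 0.00036,
              5.518, 2.556, 1.341, 0.6824, 0.0001,
              18.22, 7.344, 4.066, 1.799, 1.218,
              16.11, 9.448, 4.752, 2.245, 1.539,
              18.14, 12.88, 7.29, 3.449, 2.533,
              15.76, 16.24])

# Direct, non-iterative least-squares solve of x @ beta = y (SVD-based)
beta, sse, rank, sing_vals = np.linalg.lstsq(x, y, rcond=None)
print('lstsq fit', beta)
print('rank of x:', rank, ' sum of squared residuals:', sse)
```

Unlike leastsq, np.linalg.lstsq needs neither a residual function nor a starting point. It only applies because the model is linear in tpl.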
Experimental Results and Analysis
Experimental Results
```
# tplFinal values
[-8.43371266  0.3787503   0.11744081 -0.01485372]
# predicted y values
[ 8.12026253  5.1495184   2.17877428 -0.79196984 -3.76271396
  8.70746659  5.73672247  2.76597835 -0.20476577 -3.17550989
 11.90776557  8.93702145  5.96627733  2.99553321  0.02478909
 12.49496964  9.52422552  6.5534814   3.58273728  0.61199315
 15.69526862 12.7245245   9.75378038  6.78303626  3.81229214
 16.28247269 13.31172857]
```
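The sketch below puts numbers on the quality of this linear fit and tries the quadratic direction suggested in the analysis. It computes the sum of squared errors (SSE) and R² for the linear model. It does the same for an assumed extension that adds x1² and x3² columns; that choice of extra terms is illustrative only. fit_sse and r_squared are hypothetical helper names, and x is rebuilt from the same assumed grid that reproduces the post's listing.

```python
import numpy as np

# Same 27 observations as the post; x is an assumed compact rebuild of its listing
rows = [(1, a, b, c)
        for a in (50, 60, 70)
        for b in (5, 10)
        for c in (200, 400, 600, 800, 1000)][:27]
x = np.array(rows, dtype=float)
y = np.array([7.434, 3.011, 1.437, 0.6728, 0.00036,
              5.518, 2.556, 1.341, 0.6824, 0.0001,
              18.22, 7.344, 4.066, 1.799, 1.218,
              16.11, 9.448, 4.752, 2.245, 1.539,
              18.14, 12.88, 7.29, 3.449, 2.533,
              15.76, 16.24])


def fit_sse(design, target):
    """Hypothetical helper: least-squares fit returning (coefficients, SSE)."""
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    sse = float(np.sum((design @ coef - target) ** 2))
    return coef, sse


def r_squared(sse, target):
    """Hypothetical helper: coefficient of determination from an SSE."""
    return 1.0 - sse / float(np.sum((target - target.mean()) ** 2))


# Linear model from the post: columns [1, x1, x2, x3]
coef_lin, sse_lin = fit_sse(x, y)
n_negative = int(np.sum(x @ coef_lin < 0))

# Assumed quadratic extension: add x1**2 and x3**2. x2 takes only the values 5 and 10,
# so x2**2 = 15*x2 - 50 is an exact linear combination of existing columns and is left out.
x_quad = np.column_stack([x, x[:, 1] ** 2, x[:, 3] ** 2])
coef_quad, sse_quad = fit_sse(x_quad, y)

print('negative predictions from the linear fit:', n_negative)
print(f'linear    SSE = {sse_lin:.4f}, R^2 = {r_squared(sse_lin, y):.4f}')
print(f'quadratic SSE = {sse_quad:.4f}, R^2 = {r_squared(sse_quad, y):.4f}')
```

The quadratic design contains every column of the linear one, so its training SSE can only stay the same or decrease. A lower in-sample SSE on 27 points does not by itself show that the larger model generalizes better.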
Analysis and Summary
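a) The results show that the linear model does not fit the data particularly well. For example, every measured y is positive, yet the model predicts negative values for x1 = 50 at x3 = 800 and 1000. The lowest is -3.76271396, where the measured y is 0.00036. A more complex model, such as a quadratic, is worth trying next.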
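b) Before fitting a line, first look at the given x and y yourself to see whether there is an obvious relationship. The data above clearly come from a controlled-variable experiment, so examine how each independent variable (feature) relates to the dependent variable (target). You will quickly notice that x3 (200, 400, 600, ...) is negatively correlated with y. That at least gives you a rough sense of what to expect from the fit.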
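c) Using scipy's leastsq here felt somewhat cumbersome, since you have to write the residual function and supply an initial guess by hand. Next time I will try the sklearn package.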
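Points b) and c) can be combined in a short sketch. It first checks each feature's Pearson correlation with y, then fits the same linear model with sklearn.linear_model.LinearRegression. This assumes scikit-learn is installed. LinearRegression fits the intercept itself, so the column of ones is dropped. The x rows are again an assumed grid rebuild of the post's listing. The fitted intercept_ and coef_ should match tplFinal.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Same 27 observations as the post; x is an assumed compact rebuild of its listing
rows = [(1, a, b, c)
        for a in (50, 60, 70)
        for b in (5, 10)
        for c in (200, 400, 600, 800, 1000)][:27]
x = np.array(rows, dtype=float)
y = np.array([7.434, 3.011, 1.437, 0.6728, 0.00036,
              5.518, 2.556, 1.341, 0.6824, 0.0001,
              18.22, 7.344, 4.066, 1.799, 1.218,
              16.11, 9.448, 4.752, 2.245, 1.539,
              18.14, 12.88, 7.29, 3.449, 2.533,
              15.76, 16.24])
features = x[:, 1:]  # drop the column of ones; sklearn adds its own intercept

# Point b): Pearson correlation of each feature with the target
corr = {name: float(np.corrcoef(features[:, j], y)[0, 1])
        for j, name in enumerate(['x1', 'x2', 'x3'])}
for name, r in corr.items():
    print(f'corr({name}, y) = {r:+.3f}')

# Point c): the same linear least-squares fit via scikit-learn
model = LinearRegression()  # fit_intercept=True by default
model.fit(features, y)
print('intercept:', model.intercept_)
print('coefficients:', model.coef_)
print('R^2 on the training data:', model.score(features, y))
```

Compared with leastsq, no residual function or initial guess is needed. The trade-off is that LinearRegression only handles models that are linear in their parameters, so nonlinear curve shapes must be expressed as extra feature columns.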