
Multiple Linear Regression with the Least Squares Method

Method Overview
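As a brief reminder, the standard least squares formulation can be written as follows. The symbols $X$ (the design matrix with a leading column of ones), $y$ (the observed targets), $\beta$ (the coefficients) and $n$ (the number of samples) are notation assumed here, not names taken from the code.

```latex
\min_{\beta}\; S(\beta)
  = \lVert X\beta - y \rVert_2^2
  = \sum_{i=1}^{n} \left( x_i^{\top}\beta - y_i \right)^2,
\qquad
X^{\top}X\,\hat{\beta} = X^{\top}y
\;\Longrightarrow\;
\hat{\beta} = \left( X^{\top}X \right)^{-1} X^{\top}y
```

The closed form on the right holds only when $X$ has full column rank; an iterative solver minimizes the same objective $S(\beta)$ numerically.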

Code Implementation for a Concrete Problem

Data


Code

This post uses the leastsq function from scipy; the code is as follows.

from scipy.optimize import leastsq
import numpy as np


def main():
    # data provided: each row is [1, x1, x2, x3]; the leading 1 is the intercept term
    x = np.array([[1, 50, 5, 200], [1, 50, 5, 400], [1, 50, 5, 600], [1, 50, 5, 800], [1, 50, 5, 1000],
                  [1, 50, 10, 200], [1, 50, 10, 400], [1, 50, 10, 600], [1, 50, 10, 800], [1, 50, 10, 1000],
                  [1, 60, 5, 200], [1, 60, 5, 400], [1, 60, 5, 600], [1, 60, 5, 800], [1, 60, 5, 1000],
                  [1, 60, 10, 200], [1, 60, 10, 400], [1, 60, 10, 600], [1, 60, 10, 800], [1, 60, 10, 1000],
                  [1, 70, 5, 200], [1, 70, 5, 400], [1, 70, 5, 600], [1, 70, 5, 800], [1, 70, 5, 1000],
                  [1, 70, 10, 200], [1, 70, 10, 400]])
    y = np.array([7.434, 3.011, 1.437, 0.6728, 0.00036,
                  5.518, 2.556, 1.341, 0.6824, 0.0001,
                  18.22, 7.344, 4.066, 1.799, 1.218,
                  16.11, 9.448, 4.752, 2.245, 1.539,
                  18.14, 12.88, 7.29, 3.449, 2.533,
                  15.76, 16.24])

    # here, create lambda functions for the line fit
    # tpl is a tuple that contains the parameters of the fit
    funcLine = lambda tpl, x: np.dot(x, tpl)
    # func is going to be a placeholder for funcLine, funcQuad or whatever
    # function we would like to fit
    func = funcLine
    # ErrorFunc is the difference between func and the y "experimental" data
    ErrorFunc = lambda tpl, x, y: func(tpl, x) - y
    # tplInitial contains the "first guess" of the parameters
    tplInitial = [1.0, 1.0, 1.0, 1.0]
    # leastsq finds the set of parameters in the tuple tpl that minimizes
    # the sum of squares of ErrorFunc = yfit - yExperimental;
    # success is its integer status flag (1 to 4 means a solution was found)
    tplFinal, success = leastsq(ErrorFunc, tplInitial, args=(x, y))
    print('linear fit', tplFinal)
    print(funcLine(tplFinal, x))


if __name__ == "__main__":
    main()
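Because the model is linear in its parameters, the same coefficients can also be obtained without an initial guess. The following is a minimal sketch using `np.linalg.lstsq`, which is a swapped-in technique and not what the post itself uses; the names `groups`, `levels`, `X` and `beta` are chosen here for illustration, and the design matrix is rebuilt from the pattern of the 27 rows listed above.

```python
import numpy as np

# Rebuild the 27 rows [1, x1, x2, x3] from the listing above:
# six (x1, x2) settings times five x3 levels, truncated to the 27 rows given.
groups = [(50, 5), (50, 10), (60, 5), (60, 10), (70, 5), (70, 10)]
levels = (200, 400, 600, 800, 1000)
X = np.array([[1, a, b, c] for a, b in groups for c in levels][:27], dtype=float)

y = np.array([7.434, 3.011, 1.437, 0.6728, 0.00036,
              5.518, 2.556, 1.341, 0.6824, 0.0001,
              18.22, 7.344, 4.066, 1.799, 1.218,
              16.11, 9.448, 4.752, 2.245, 1.539,
              18.14, 12.88, 7.29, 3.449, 2.533,
              15.76, 16.24])

# Direct linear least squares solve: minimizes ||X @ beta - y||^2.
beta, residual_ss, rank, singular_values = np.linalg.lstsq(X, y, rcond=None)

print('coefficients', beta)
print('fitted values', X @ beta)
```

If the iterative fit has converged, these coefficients should agree with the tplFinal values that leastsq reports.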

Experimental Results and Analysis

Experimental Results

# value of tplFinal
[-8.43371266  0.3787503   0.11744081 -0.01485372]
# predicted y values
[  8.12026253   5.1495184    2.17877428  -0.79196984  -3.76271396
   8.70746659   5.73672247   2.76597835  -0.20476577  -3.17550989
  11.90776557   8.93702145   5.96627733   2.99553321   0.02478909
  12.49496964   9.52422552   6.5534814    3.58273728   0.61199315
  15.69526862  12.7245245    9.75378038   6.78303626   3.81229214
  16.28247269  13.31172857]
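To put a number on how well these predictions match the data, one can compute summary statistics from the printed coefficients. The sketch below is an assumption-laden illustration: RMSE and R² are metrics chosen here and are not computed in the original post, and the variable names are hypothetical.

```python
import numpy as np

# Design matrix rebuilt from the pattern of the 27 rows [1, x1, x2, x3] in the code.
groups = [(50, 5), (50, 10), (60, 5), (60, 10), (70, 5), (70, 10)]
levels = (200, 400, 600, 800, 1000)
X = np.array([[1, a, b, c] for a, b in groups for c in levels][:27], dtype=float)

y = np.array([7.434, 3.011, 1.437, 0.6728, 0.00036,
              5.518, 2.556, 1.341, 0.6824, 0.0001,
              18.22, 7.344, 4.066, 1.799, 1.218,
              16.11, 9.448, 4.752, 2.245, 1.539,
              18.14, 12.88, 7.29, 3.449, 2.533,
              15.76, 16.24])

# Coefficients copied from the tplFinal output above.
tpl_final = np.array([-8.43371266, 0.3787503, 0.11744081, -0.01485372])

y_pred = X @ tpl_final
residuals = y_pred - y

rmse = np.sqrt(np.mean(residuals ** 2))            # root mean squared error
ss_res = np.sum(residuals ** 2)                    # residual sum of squares
ss_tot = np.sum((y - y.mean()) ** 2)               # total sum of squares
r_squared = 1.0 - ss_res / ss_tot                  # coefficient of determination
n_negative = int(np.sum(y_pred < 0))               # predictions below zero

print('RMSE', rmse)
print('R^2', r_squared)
print('negative predictions', n_negative, 'of', len(y))
```

Every observed y is positive, so any negative prediction signals a region where the straight-line model is structurally wrong rather than just noisy.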

Analysis and Summary

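a) The results show that the linear model does not fit particularly well. For example, it predicts -0.79 and -3.76 for the fourth and fifth samples, whose observed values are 0.6728 and 0.00036. A more complex model, such as a quadratic curve, is worth trying next.
b) Before fitting a line, first look at the given data yourself to see whether x and y are related in some obvious way. The data above clearly come from a controlled-variable experiment, so examine how the independent variables (features) relate to the dependent variable (target): it is immediately apparent that x3 (200, 400, 600, ...) is negatively correlated with y. That at least gives a rough expectation of what the fit should look like.
c) The leastsq function from scipy did not feel especially convenient for this task; the sklearn package may be worth trying next time.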
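Points a) and b) can both be explored with a few lines of NumPy. The sketch below first prints the correlation of each non-constant feature with y, then adds a squared x3 term and compares the residual sum of squares with that of the linear model. The choice of a single squared term, the rescaling of x3 by 1000, and the helper name `sse` are assumptions made here for illustration; the post does not specify which quadratic model to try.

```python
import numpy as np

# Design matrix rebuilt from the pattern of the 27 rows [1, x1, x2, x3] in the code.
groups = [(50, 5), (50, 10), (60, 5), (60, 10), (70, 5), (70, 10)]
levels = (200, 400, 600, 800, 1000)
X = np.array([[1, a, b, c] for a, b in groups for c in levels][:27], dtype=float)

y = np.array([7.434, 3.011, 1.437, 0.6728, 0.00036,
              5.518, 2.556, 1.341, 0.6824, 0.0001,
              18.22, 7.344, 4.066, 1.799, 1.218,
              16.11, 9.448, 4.752, 2.245, 1.539,
              18.14, 12.88, 7.29, 3.449, 2.533,
              15.76, 16.24])

# Point b): Pearson correlation of each feature with y (column 0 is constant, so skip it).
for j in (1, 2, 3):
    corr = np.corrcoef(X[:, j], y)[0, 1]
    print(f'corr(x{j}, y) = {corr:+.3f}')


def sse(design):
    """Residual sum of squares of the least squares fit of y on `design`."""
    beta = np.linalg.lstsq(design, y, rcond=None)[0]
    return float(np.sum((design @ beta - y) ** 2))


# Point a): extend the linear model with a squared x3 term.
# x3 is rescaled to keep the extra column on a comparable numeric scale.
t = X[:, 3] / 1000.0
X_quad = np.column_stack([X, t ** 2])

sse_lin = sse(X)
sse_quad = sse(X_quad)
print('SSE linear   ', sse_lin)
print('SSE quadratic', sse_quad)
```

Because the linear model is nested inside the extended one, the second residual sum of squares can never be larger than the first; how much it drops indicates whether the extra term is worth keeping.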