【Machine Learning】Linear Regression
阿新 • Published: 2018-12-14
Linear Regression
- We can measure how well a line fits the data by measuring its loss.
- The goal of linear regression is to minimize this loss.
- To find the line of best fit, we look for the value of b (the intercept) and the value of m (the slope) that minimize the loss.
- Convergence is when the parameters stop changing (or barely change) from one iteration to the next.
- The learning rate is how much the parameters change on each iteration.
- We can use Scikit-learn's LinearRegression() model to perform linear regression on a set of points.
The Scikit-Learn Library
```python
line_fitter = LinearRegression()                  # create the model
line_fitter.fit(temperature, sales)               # fit it to the data
sales_predict = line_fitter.predict(temperature)  # predict with the fitted model
```
The full script:

```python
import codecademylib3_seaborn
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt
import numpy as np

temperature = np.array(range(60, 100, 2))
temperature = temperature.reshape(-1, 1)  # sklearn expects a 2D array of features
sales = [65, 58, 46, 45, 44, 42, 40, 40, 36, 38, 38, 28, 30, 22, 27, 25, 25, 20, 15, 5]

line_fitter = LinearRegression()
line_fitter.fit(temperature, sales)
sales_predict = line_fitter.predict(temperature)

plt.plot(temperature, sales, 'o')
plt.plot(temperature, sales_predict)
plt.show()
```
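After fitting, the learned parameters can be read off the model; coef_ and intercept_ are the standard scikit-learn attributes of a fitted LinearRegression:

```python
# after line_fitter.fit(temperature, sales):
print(line_fitter.coef_)       # learned slope(s), one per feature
print(line_fitter.intercept_)  # learned intercept
```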
How It Works
We start by guessing a line; every data point's vertical distance from that line contributes loss.
```python
import codecademylib3_seaborn
import matplotlib.pyplot as plt

months = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
revenue = [52, 74, 79, 95, 115, 110, 129, 126, 147, 146, 156, 184]

# slope:
m = 12
# intercept:
b = 35

plt.plot(months, revenue, "o")

y = [m * month + b for month in months]
plt.plot(months, y)
plt.show()
```
Loss
When calculating loss, we use the squared distance from each point to the line. For example, if point A is 3 units from the line, its loss is 9 (3²); if point B is 1 unit away, its loss is 1 (1²). The total loss is then 10, and if we find a line whose total loss is less than 10, that line is a better fit.
```python
total_loss = 0
for i in range(len(y)):
    # squared vertical distance between the predicted and actual values
    total_loss += (y_predicted[i] - y[i]) ** 2
```
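To make the comparison concrete, here is a self-contained sketch; the helper name total_loss and the two candidate lines are made up for illustration:

```python
months = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
revenue = [52, 74, 79, 95, 115, 110, 129, 126, 147, 146, 156, 184]

def total_loss(m, b, x, y):
    # total loss = sum of squared vertical distances to the line y = m*x + b
    return sum((y[i] - (m * x[i] + b)) ** 2 for i in range(len(x)))

# compare two candidate lines; the one with the smaller loss fits better
loss_a = total_loss(12, 35, months, revenue)
loss_b = total_loss(10, 40, months, revenue)
print(loss_a, loss_b)
```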
Reducing Loss
Gradient Descent
At any point, the slope (gradient) of the loss tells us which direction is downhill; moving against the gradient reduces the loss, so we repeatedly step downhill.
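Before deriving the formulas for m and b, here is a minimal one-dimensional sketch of the idea; the function f(w) = (w - 3)² and all constants are chosen purely for illustration:

```python
# minimize f(w) = (w - 3)**2, whose gradient is f'(w) = 2 * (w - 3)
w = 0.0
learning_rate = 0.1
for _ in range(100):
    gradient = 2 * (w - 3)            # slope of f at the current w
    w = w - learning_rate * gradient  # step downhill, against the gradient
print(w)  # approaches 3, the minimum of f
```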
The formula for the gradient with respect to the intercept is:

b_gradient = -(2/N) * Σ (y_i - (m * x_i + b)), summed over all points

where N is the number of points we have in our dataset, m is the current slope guess, and b is the current intercept guess.
A function that finds the gradient of the loss with respect to the intercept b:
```python
def get_gradient_at_b(x, y, m, b):
    diff = 0
    for i in range(len(x)):
        diff += (y[i] - (m * x[i] + b))
    b_gradient = diff * (-2) / len(x)
    return b_gradient
```
The formula for the gradient with respect to the slope is:

m_gradient = -(2/N) * Σ x_i * (y_i - (m * x_i + b)), summed over all points

where N, m, and b are as above.
A function that finds the gradient of the loss with respect to the slope m:
```python
def get_gradient_at_m(x, y, m, b):
    diff = 0
    N = len(x)
    for i in range(N):
        diff += x[i] * (y[i] - (m * x[i] + b))
    m_gradient = -2/N * diff
    return m_gradient
```
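A quick sanity check of the two functions; the tiny dataset and the starting guesses m = 0, b = 0 are made up for illustration:

```python
x = [1, 2, 3]
y = [2, 4, 6]  # lies exactly on y = 2x, so the best fit is m = 2, b = 0

print(get_gradient_at_b(x, y, 0, 0))  # -8.0: increasing b from 0 would reduce the loss
print(get_gradient_at_m(x, y, 0, 0))  # about -18.67: increasing m would reduce it even faster
```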
Putting the Gradients Together
```python
def get_gradient_at_b(x, y, b, m):
    N = len(x)
    diff = 0
    for i in range(N):
        x_val = x[i]
        y_val = y[i]
        diff += (y_val - ((m * x_val) + b))
    b_gradient = -(2/N) * diff
    return b_gradient

def get_gradient_at_m(x, y, b, m):
    N = len(x)
    diff = 0
    for i in range(N):
        x_val = x[i]
        y_val = y[i]
        diff += x_val * (y_val - ((m * x_val) + b))
    m_gradient = -(2/N) * diff
    return m_gradient

# take one gradient descent step: move b and m downhill
def step_gradient(x, y, b_current, m_current):
    b_gradient = get_gradient_at_b(x, y, b_current, m_current)
    m_gradient = get_gradient_at_m(x, y, b_current, m_current)
    b = b_current - (0.01 * b_gradient)  # 0.01 is the learning rate
    m = m_current - (0.01 * m_gradient)
    return [b, m]
```
```python
months = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
revenue = [52, 74, 79, 95, 115, 110, 129, 126, 147, 146, 156, 184]

# current intercept guess:
b = 0
# current slope guess:
m = 0

b, m = step_gradient(months, revenue, b, m)
print(b, m)
```
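A single step barely moves b and m. To actually converge we repeat the step many times; here is a minimal sketch of that loop (the name gradient_descent and the iteration count are assumptions for illustration, not from the original lesson):

```python
def gradient_descent(x, y, num_iterations):
    b, m = 0, 0
    for _ in range(num_iterations):
        b, m = step_gradient(x, y, b, m)  # reuse the single-step function above
    return b, m

b, m = gradient_descent(months, revenue, 1000)
print(b, m)  # gradually approaches the least-squares intercept and slope
```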