
[Watermelon Book Study Notes] Chapter 3: Linear Models

Preliminaries:

arg max denotes the argument of the maximum: the point(s) in a function's domain at which the function attains its maximum value. Whereas max gives the largest output of the function, arg max gives the input (argument) at which that largest output is attained.

Closed-form solution:

A closed-form (analytical) solution is an explicit expression: given any value of the independent variables, the dependent variable can be evaluated directly.

Least squares:

Least squares finds the best-fitting function for the data by minimizing the sum of squared errors. Writing the m linear equations in the n unknowns \beta_{j}:

\sum_{j=1}^{n}x_{ij}\beta_{j}=y_{i},\quad i=1,2,\dots,m

X\beta=y

S(\beta)=\left\|X\beta-y\right\|^{2}=(X\beta-y)^{T}(X\beta-y)

Differentiating S(\beta) with respect to \beta and setting the result to zero yields the normal equations: X^{T}X\hat{\beta}=X^{T}y.

If X^{T}X is nonsingular, \hat{\beta} has the unique solution

\hat{\beta}=(X^{T}X)^{-1}X^{T}y
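As a sanity check, here is a minimal numpy sketch of this closed-form solution; the matrix sizes, true coefficients, and noise level are made-up assumptions for illustration:

```python
import numpy as np

# Hypothetical toy data: m=100 samples, n=3 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

# Normal equations: (X^T X) beta_hat = X^T y.
# Solving the linear system is preferred over forming (X^T X)^{-1} explicitly.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)  # close to [1.5, -2.0, 0.5]
```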

Eq. (3.7)

\left(w^{*},b^{*}\right)=\underset{(w,b)}{\arg\min}\sum_{i=1}^{m}\left(f(x_{i})-y_{i}\right)^{2}=\underset{(w,b)}{\arg\min}\sum_{i=1}^{m}\left(y_{i}-wx_{i}-b\right)^{2}

Taking partial derivatives with respect to w and b gives:

\frac{\partial E(w,b)}{\partial w}=2\left(w\sum_{i=1}^{m}x_{i}^{2}-\sum_{i=1}^{m}(y_{i}-b)x_{i}\right),

\frac{\partial E(w,b)}{\partial b}=2\left(mb-\sum_{i=1}^{m}(y_{i}-wx_{i})\right)

Setting both partial derivatives to zero gives:

w\sum_{i=1}^{m}x_{i}^{2}=\sum_{i=1}^{m}(y_{i}-b)x_{i}\quad (1)

mb=\sum_{i=1}^{m}(y_{i}-wx_{i})\quad (2)

Substituting (2) into (1):

w\sum_{i=1}^{m}x_{i}^{2}=\sum_{i=1}^{m}\left(y_{i}-\frac{1}{m}\sum_{j=1}^{m}(y_{j}-wx_{j})\right)x_{i}

=\sum_{i=1}^{m}y_{i}x_{i}-\frac{1}{m}\sum_{i=1}^{m}\sum_{j=1}^{m}(y_{j}-wx_{j})x_{i}

=\sum_{i=1}^{m}y_{i}x_{i}-\frac{1}{m}\sum_{i=1}^{m}x_{i}\sum_{j=1}^{m}(y_{j}-wx_{j})

=\sum_{i=1}^{m}y_{i}x_{i}-\bar{x}\sum_{j=1}^{m}(y_{j}-wx_{j})

=\sum_{i=1}^{m}y_{i}x_{i}-\bar{x}\sum_{j=1}^{m}y_{j}+w\bar{x}\sum_{j=1}^{m}x_{j}

=\sum_{i=1}^{m}y_{i}(x_{i}-\bar{x})+w\left(\frac{1}{m}\sum_{i=1}^{m}x_{i}\right)\sum_{i=1}^{m}x_{i}

=\sum_{i=1}^{m}y_{i}(x_{i}-\bar{x})+\frac{w}{m}\left(\sum_{i=1}^{m}x_{i}\right)^{2}

Collecting the terms in w on the left-hand side yields Eq. (3.7):

w=\frac{\sum_{i=1}^{m}y_{i}(x_{i}-\bar{x})}{\sum_{i=1}^{m}x_{i}^{2}-\frac{1}{m}\left(\sum_{i=1}^{m}x_{i}\right)^{2}}
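A small numpy sketch of this closed-form fit; the arrays x and y below are hypothetical toy data:

```python
import numpy as np

# Made-up points around y = 2x + 1 with small perturbations.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 * x + 1.0 + np.array([0.1, -0.05, 0.0, 0.02, -0.1])

m = len(x)
x_bar = x.mean()
# Eq. (3.7): w = sum(y_i (x_i - x_bar)) / (sum(x_i^2) - (sum(x_i))^2 / m)
w = np.sum(y * (x - x_bar)) / (np.sum(x**2) - np.sum(x)**2 / m)
# From (2): b = (1/m) sum(y_i - w x_i)
b = np.mean(y - w * x)
print(w, b)  # close to 2.0 and 1.0
```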

Eq. (3.12)

f(x_{i})=x_{i}^{T}w

By least squares, if X^{T}X is nonsingular, w has the unique solution

w=(X^{T}X)^{-1}X^{T}y,\quad f(x_{i})=x_{i}^{T}(X^{T}X)^{-1}X^{T}y

In real-world tasks, however, X^{T}X is often not full rank (for example, when the number of features exceeds the number of samples), so multiple solutions exist.
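When X^{T}X is singular, one common practical choice is the minimum-norm least-squares solution via the pseudo-inverse; note this is an illustrative choice, not the book's remedy (the book turns to regularization at this point). A sketch on made-up data:

```python
import numpy as np

# Hypothetical underdetermined problem: m=5 samples, n=10 features,
# so X^T X (10x10, rank <= 5) is singular and the normal equations
# have infinitely many solutions.
rng = np.random.default_rng(1)
X = rng.normal(size=(5, 10))
y = rng.normal(size=5)

w = np.linalg.pinv(X) @ y     # minimum-norm solution among all fits
print(np.allclose(X @ w, y))  # True: the data are fit exactly
```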

Eq. (3.27)

p(y_{i}|x_{i};w,b)=\begin{cases}p_{1}\left(\hat{x}_{i};\beta\right), & y_{i}=1\\ p_{0}\left(\hat{x}_{i};\beta\right), & y_{i}=0\end{cases}

Combining the two cases gives Eq. (3.26); substituting it into the log-likelihood:

\ell(\beta)=\sum_{i=1}^{m}\ln p(y_{i}|x_{i};w,b)=\sum_{i=1}^{m}\ln\left(y_{i}p_{1}\left(\hat{x}_{i};\beta\right)+(1-y_{i})p_{0}\left(\hat{x}_{i};\beta\right)\right)

=\sum_{i=1}^{m}\ln\left(y_{i}\frac{e^{w^{T}x_{i}+b}}{1+e^{w^{T}x_{i}+b}}+(1-y_{i})\frac{1}{1+e^{w^{T}x_{i}+b}}\right)

=\sum_{i=1}^{m}\left(\ln\left(y_{i}e^{\beta^{T}\hat{x}_{i}}+(1-y_{i})\right)-\ln\left(1+e^{\beta^{T}\hat{x}_{i}}\right)\right)

When y_{i}=0:

\ell(\beta)=\sum_{i=1}^{m}-\ln\left(1+e^{\beta^{T}\hat{x}_{i}}\right)

When y_{i}=1, \ln\left(y_{i}e^{\beta^{T}\hat{x}_{i}}+(1-y_{i})\right)=\ln e^{\beta^{T}\hat{x}_{i}}=\beta^{T}\hat{x}_{i}, so:

\ell(\beta)=\sum_{i=1}^{m}\left(\beta^{T}\hat{x}_{i}-\ln\left(1+e^{\beta^{T}\hat{x}_{i}}\right)\right)

Combining the two cases:

\ell(\beta)=\sum_{i=1}^{m}\left(y_{i}\beta^{T}\hat{x}_{i}-\ln\left(1+e^{\beta^{T}\hat{x}_{i}}\right)\right)

Maximizing \ell(\beta) is equivalent to minimizing Eq. (3.27): \sum_{i=1}^{m}\left(-y_{i}\beta^{T}\hat{x}_{i}+\ln\left(1+e^{\beta^{T}\hat{x}_{i}}\right)\right).
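To make the maximization concrete, here is a minimal sketch of plain gradient ascent on \ell(\beta); the book itself suggests numerical methods such as Newton's method, and the data, step size, and iteration count below are made-up assumptions:

```python
import numpy as np

# Hypothetical toy data: 200 points in 2-D, labeled by a linear rule.
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 2))
y = (X[:, 0] - X[:, 1] > 0).astype(float)
X_hat = np.hstack([X, np.ones((200, 1))])   # augmented x_hat = (x; 1)

def log_likelihood(beta, X_hat, y):
    z = X_hat @ beta
    # l(beta) = sum(y_i * beta^T x_hat_i - ln(1 + e^{beta^T x_hat_i}))
    return np.sum(y * z - np.logaddexp(0.0, z))   # logaddexp avoids overflow

def gradient(beta, X_hat, y):
    z = X_hat @ beta
    p1 = np.exp(z - np.logaddexp(0.0, z))   # p(y=1|x) = e^z/(1+e^z), stably
    return X_hat.T @ (y - p1)               # gradient of the log-likelihood

beta = np.zeros(3)                          # beta = (w; b)
for _ in range(500):
    beta += 0.01 * gradient(beta, X_hat, y) # gradient *ascent*: maximize l
print(beta, log_likelihood(beta, X_hat, y))
```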

Eq. (3.32)

Projecting the data onto a line, the projections of the two class centers onto the line are w^{T}\mu_{0} and w^{T}\mu_{1}, where \mu_{0} is the mean of the class-0 samples and \mu_{1} is the mean of the class-1 samples. The scatter of the projected samples of class i is:

\sum_{x\in D_{i}}\left(w^{T}x-w^{T}\mu_{i}\right)^{2}=\sum_{x\in D_{i}}\left(w^{T}(x-\mu_{i})\right)^{2}=\sum_{x\in D_{i}}w^{T}(x-\mu_{i})(x-\mu_{i})^{T}w

=w^{T}\left[\sum_{x\in D_{i}}(x-\mu_{i})(x-\mu_{i})^{T}\right]w

where \Sigma_{i}=\sum_{x\in D_{i}}(x-\mu_{i})(x-\mu_{i})^{T} is the covariance matrix before projection, so the covariance of the projected samples is w^{T}\Sigma_{i}w.

Eq. (3.35)

J=\frac{w^{T}S_{b}w}{w^{T}S_{w}w}=\frac{w^{T}(\mu_{0}-\mu_{1})(\mu_{0}-\mu_{1})^{T}w}{w^{T}(\Sigma_{0}+\Sigma_{1})w}

=\frac{w^{T}(\mu_{0}-\mu_{1})(\mu_{0}-\mu_{1})^{T}w}{w^{T}\left(\sum_{x\in D_{0}}(x-\mu_{0})(x-\mu_{0})^{T}+\sum_{x\in D_{1}}(x-\mu_{1})(x-\mu_{1})^{T}\right)w}

Eq. (3.37)

Since only the direction of w matters (if w is a solution, then \alpha w is also a solution for any nonzero \alpha), we can fix the scale by requiring w^{T}S_{w}w=1, and Eq. (3.35) becomes

\underset{w}{\min}\ -w^{T}S_{b}w\quad \text{s.t.}\quad w^{T}S_{w}w=1

By the method of Lagrange multipliers, define L(w,\lambda)=-w^{T}S_{b}w+\lambda\left(w^{T}S_{w}w-1\right). Setting the gradient with respect to w to zero,

-2S_{b}w+2\lambda S_{w}w=0,\quad \text{i.e.}\quad S_{b}w=\lambda S_{w}w

Since S_{b}w=(\mu_{0}-\mu_{1})(\mu_{0}-\mu_{1})^{T}w and (\mu_{0}-\mu_{1})^{T}w is a scalar, S_{b}w always points in the direction of \mu_{0}-\mu_{1}; we may therefore set S_{b}w=\lambda(\mu_{0}-\mu_{1}). Substituting into (3.37) gives w=S_{w}^{-1}(\mu_{0}-\mu_{1}).
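A minimal numpy sketch of computing the LDA direction w=S_{w}^{-1}(\mu_{0}-\mu_{1}); the two-class data below are made up for illustration:

```python
import numpy as np

# Hypothetical two Gaussian clusters in 2-D.
rng = np.random.default_rng(3)
X0 = rng.normal(loc=[0.0, 0.0], size=(50, 2))   # class 0 samples
X1 = rng.normal(loc=[2.0, 1.0], size=(50, 2))   # class 1 samples

mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
# Within-class scatter S_w = Sigma_0 + Sigma_1 (unnormalized covariances)
Sw = (X0 - mu0).T @ (X0 - mu0) + (X1 - mu1).T @ (X1 - mu1)

# Solve S_w w = (mu_0 - mu_1) rather than inverting S_w explicitly.
w = np.linalg.solve(Sw, mu0 - mu1)
print(w / np.linalg.norm(w))   # only the direction of w matters
```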

Eq. (3.48)

The decision rule predicts positive when \frac{y}{1-y}>1. Under class imbalance, we should predict positive only when the classifier's predicted odds exceed the odds observed in the training set, i.e. \frac{y}{1-y}>\frac{m^{+}}{m^{-}}, where m^{+} and m^{-} are the numbers of positive and negative examples. Multiplying both sides by \frac{m^{-}}{m^{+}} gives \frac{y}{1-y}\cdot\frac{m^{-}}{m^{+}}>\frac{m^{+}}{m^{-}}\cdot\frac{m^{-}}{m^{+}}=1, so we define the rescaled odds \frac{y'}{1-y'}=\frac{y}{1-y}\cdot\frac{m^{-}}{m^{+}}.
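A small sketch of this rescaling (threshold moving) rule; the helper name, probabilities, and class counts are hypothetical:

```python
import numpy as np

def rescaled_odds(y_prob, m_pos, m_neg):
    """Eq. (3.48): y'/(1-y') = (y/(1-y)) * (m^- / m^+)."""
    odds = y_prob / (1.0 - y_prob)
    return odds * (m_neg / m_pos)

y_prob = np.array([0.30, 0.55, 0.80])  # hypothetical p(y=1|x) outputs
m_pos, m_neg = 100, 900                # imbalanced training set counts
# Predict positive when the rescaled odds exceed 1: with few positives,
# even a modest predicted probability counts as a positive prediction.
print(rescaled_odds(y_prob, m_pos, m_neg) > 1.0)
```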