
Hung-yi Lee (NTU)

1: Regression - Case Study

Why does the loss function regularize only w and not b?

Because b is just a constant offset: it shifts the whole curve up or down without changing its shape, so it has almost no effect on how smooth the learned function is, and penalizing it does not help.
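A tiny numpy sketch of this point (the data and the λ value are made up for illustration): the L2 penalty involves only w, so changing b moves the line but leaves the penalty term untouched.

```python
import numpy as np

# Toy data (assumed for illustration): points on the line y = 2x + 5
x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 5.0

def loss(w, b, lam):
    """Squared error plus an L2 penalty on w only.
    b is left out because shifting the curve vertically
    does not change its smoothness."""
    residual = y - (w * x + b)
    return np.sum(residual ** 2) + lam * w ** 2

# The penalty term lam * w**2 is identical for any value of b;
# only the squared-error part changes when b moves.
```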

 

1: Regression Demo

This trick is explained in detail in the Adagrad lecture: a small learning rate needs many iterations to reach the optimum, while a large one can oscillate wildly and also fail to converge. One tuning trick is to give w and b customized (per-parameter) learning rates.

lr = 1        # base learning rate
lr_b = 0
lr_w = 0

for i in range(iteration):   # 'iteration' is the number of update steps
    # ... compute b_grad and w_grad from the training data ...

    # Adagrad: accumulate the squared gradient of each parameter
    lr_b = lr_b + b_grad ** 2
    lr_w = lr_w + w_grad ** 2

    # update parameters, each with its own adaptive learning rate
    b = b - lr / np.sqrt(lr_b) * b_grad
    w = w - lr / np.sqrt(lr_w) * w_grad
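For reference, a self-contained, runnable version of the snippet above. The ten (x, y) pairs are the Pokémon CP values I believe the lecture demo uses; treat them as illustrative if I am misremembering. With Adagrad, a single base learning rate of 1 works for both parameters.

```python
import numpy as np

# Training data (assumed: the ten Pokemon CP pairs from the lecture demo)
x_data = np.array([338., 333., 328., 207., 226., 25., 179., 60., 208., 606.])
y_data = np.array([640., 633., 619., 393., 428., 27., 193., 66., 226., 1591.])

b, w = -120.0, -4.0     # initial guesses, as in the demo
lr = 1.0                # base learning rate
lr_b, lr_w = 0.0, 0.0   # accumulated squared gradients

for i in range(100000):
    # gradients of the sum of squared errors w.r.t. b and w
    err = y_data - b - w * x_data
    b_grad = -2.0 * np.sum(err)
    w_grad = -2.0 * np.sum(err * x_data)

    # Adagrad: accumulate the squared gradient of each parameter
    lr_b += b_grad ** 2
    lr_w += w_grad ** 2

    # each parameter gets its own adaptive learning rate
    b -= lr / np.sqrt(lr_b) * b_grad
    w -= lr / np.sqrt(lr_w) * w_grad
```

With plain gradient descent a single fixed learning rate either crawls along b or blows up along w; dividing by the root of the accumulated squared gradients lets both parameters make steady progress.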

 

2: Where does the error come from?
 

Error has two sources: error due to "bias" and error due to "variance".

A simple model (a small model set, which may not even contain the true target model) has large bias and small variance;

A complex model (a large model set, which likely does contain the true target model) has small bias and large variance.

 

If the error mainly comes from large variance, the model is overfitting;

If the error mainly comes from large bias, the model is underfitting.
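A quick way to see both failure modes in code (hypothetical data: noisy samples of a sine curve; `train_test_error` is a helper name of my own): a degree-1 fit cannot match the training points (large bias), while a degree-9 fit drives training error down but tends to generalize worse (large variance).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: noisy samples of a smooth target function
x_train = np.linspace(0.0, 1.0, 15)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0.0, 0.3, x_train.size)
x_test = np.linspace(0.02, 0.98, 50)
y_test = np.sin(2 * np.pi * x_test)          # noise-free targets

def train_test_error(degree):
    """Fit a polynomial of the given degree; return (train MSE, test MSE)."""
    coef = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coef, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coef, x_test) - y_test) ** 2)
    return train_mse, test_mse

tr_simple, te_simple = train_test_error(1)    # small model set: large bias
tr_complex, te_complex = train_test_error(9)  # large model set: large variance
# The complex model always fits the training points at least as well,
# but its predictions on fresh inputs are typically much less stable.
```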

 


What to do with large bias?

1. Diagnosis:

(1) If your model cannot even fit the training examples, you have large bias. ----> Underfitting.

(2) If you can fit the training data but see large error on the testing data, you probably have large variance. ----> Overfitting.

2. For large bias, redesign your model:

(1) Add more features as input;

(2) Use a more complex model.

 

What to do with large variance?

1. More data (very effective, but not always practical). You can also generate training data yourself, e.g. by flipping images or adding noise.
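A minimal numpy sketch of such augmentation (the 8x8 "images" are random placeholders, and the noise level 0.05 is an arbitrary choice): each transformed copy keeps the original label, so the training set grows without collecting new data.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical batch of 4 grayscale "images" (8x8), values in [0, 1)
images = rng.random((4, 8, 8))

# Horizontal flip: a mirrored image is still a valid example of its class
flipped = images[:, :, ::-1]

# Additive Gaussian noise: small perturbations leave the label unchanged
noisy = images + rng.normal(0.0, 0.05, images.shape)

# 4 originals -> 12 training examples, all sharing the original labels
augmented = np.concatenate([images, flipped, noisy], axis=0)
```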

2. Regularization (encourages small weights so the fitted curve becomes smoother), but it may shrink your model set until it no longer contains the target model, which can hurt bias.
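A sketch of this trade-off with closed-form ridge regression (data and λ values are made up; the bias column is deliberately left unpenalized, consistent with the note at the top that b is not regularized): a larger λ shrinks the weights, which flattens the fitted curve.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: noisy samples of a smooth target, degree-9 polynomial features
x = np.linspace(0.0, 1.0, 15)
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.3, x.size)
X = np.vander(x, 10)          # columns x^9, x^8, ..., x^0 (last column = bias)

def ridge(lam):
    """Closed-form ridge: w = (X^T X + lam * P)^(-1) X^T y,
    where P penalizes every column except the constant one."""
    penalty = np.eye(X.shape[1])
    penalty[-1, -1] = 0.0     # do not regularize the bias term
    return np.linalg.solve(X.T @ X + lam * penalty, X.T @ y)

w_weak = ridge(1e-6)    # barely regularized: large weights, wiggly curve
w_strong = ridge(1.0)   # strongly regularized: small weights, smoother curve
```

Pushing λ too high shrinks the weights toward zero and the fit degenerates toward a flat line, which is exactly the "hurts bias" failure described above.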