
Notes on using Keras models for prediction

Why is the training error so much higher than the test error?

A Keras model has two modes: training mode and test mode. Regularization mechanisms, such as Dropout and L1/L2 weight regularization, are turned off in test mode.
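As a minimal illustration of this first point (assuming tf.keras; the layer sizes and dropout rate here are arbitrary), calling a model with `training=True` keeps Dropout active, while `training=False` (the mode used by `predict`) disables it:

```python
import numpy as np
import tensorflow as tf

# Toy model with an aggressive Dropout layer so the effect is easy to see.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(8, activation="relu", input_shape=(4,)),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(1),
])

x = np.random.rand(3, 4).astype("float32")

# Training mode: Dropout randomly zeroes activations, so repeated calls differ.
print(model(x, training=True).numpy())
print(model(x, training=True).numpy())

# Test/inference mode: Dropout is a no-op, so the output is deterministic.
print(model(x, training=False).numpy())
print(model(x, training=False).numpy())
```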

In addition, the training error is the average of the errors over each batch of training data. Because the model keeps improving within an epoch, the error on the batches at the start of an epoch is higher than on the batches at the end. The test error reported for an epoch, on the other hand, is computed with the model as it is at the end of that epoch, when the network produces a smaller error.
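A quick way to see this second point (a hypothetical sketch; `model`, `x_train` and `y_train` stand in for your own compiled model and data) is to compare the loss Keras logs during an epoch, which is a running average over batches taken while the weights are still moving, with the loss `evaluate` reports on the same data once the epoch has finished:

```python
# Assumes `model` is already compiled and `x_train`, `y_train` are defined.
history = model.fit(x_train, y_train, epochs=1, batch_size=32, verbose=0)

# Loss logged by fit(): average over all batches seen during the epoch.
print("logged training loss:", history.history["loss"][0])

# Loss of the finished model on the same data: usually noticeably lower.
print("end-of-epoch loss:   ", model.evaluate(x_train, y_train, verbose=0))
```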

[Tips] You can define a callback that records the training error and test error for each epoch and plots them. If there is a large gap between the training error curve and the test error curve, your model is probably overfitting. Of course, this issue has nothing to do with Keras itself.
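A minimal sketch of such a callback (assuming tf.keras and matplotlib; `x_val`/`y_val` are placeholders for your own validation split):

```python
import matplotlib.pyplot as plt
import tensorflow as tf

class LossHistory(tf.keras.callbacks.Callback):
    """Record training and validation loss at the end of every epoch."""

    def on_train_begin(self, logs=None):
        self.train_losses = []
        self.val_losses = []

    def on_epoch_end(self, epoch, logs=None):
        logs = logs or {}
        self.train_losses.append(logs.get("loss"))
        self.val_losses.append(logs.get("val_loss"))

# Usage: pass the callback (and a validation set) to fit(), then plot.
# history_cb = LossHistory()
# model.fit(x_train, y_train, validation_data=(x_val, y_val),
#           epochs=20, callbacks=[history_cb])
# plt.plot(history_cb.train_losses, label="train loss")
# plt.plot(history_cb.val_losses, label="val loss")
# plt.legend(); plt.show()
```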

The Keras Chinese documentation points out this pitfall. In my view, the root cause lies in how the layer is implemented: a Dropout layer can be implemented either as vanilla (direct) dropout or as inverted dropout, and that choice determines whether the scaling by the probability p is applied at training time or at test time.
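A NumPy sketch of the two variants (p here denotes the keep probability; this is purely illustrative and not Keras's internal code):

```python
import numpy as np

rng = np.random.default_rng(0)

def vanilla_dropout(x, p, training):
    """Vanilla dropout: drop units while training, scale by p at test time."""
    if training:
        mask = rng.random(x.shape) < p   # keep each unit with probability p
        return x * mask
    return x * p                         # compensate at inference time

def inverted_dropout(x, p, training):
    """Inverted dropout (what modern frameworks use): scale by 1/p while
    training so that inference is a plain, unscaled forward pass."""
    if training:
        mask = rng.random(x.shape) < p
        return x * mask / p
    return x                             # nothing special at inference time

x = np.ones((2, 4))
print(vanilla_dropout(x, 0.8, training=True))
print(inverted_dropout(x, 0.8, training=True))
```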

Notes on fine-tuning with pretrained weights:

Do not take your newly added layers, randomly initialize them, and connect them directly to the pretrained layers for joint fine-tuning. The points below explain why; a workflow sketch follows the list.

  • in order to perform fine-tuning, all layers should start with properly trained weights: for instance you should not slap a randomly initialized fully-connected network on top of a pre-trained convolutional base. This is because the large gradient updates triggered by the randomly initialized weights would wreck the learned weights in the convolutional base. In our case this is why we first train the top-level classifier, and only then start fine-tuning convolutional weights alongside it.
  • we choose to only fine-tune the last convolutional block rather than the entire network in order to prevent overfitting, since the entire network would have a very large entropic capacity and thus a strong tendency to overfit. The features learned by low-level convolutional blocks are more general, less abstract than those found higher-up, so it is sensible to keep the first few blocks fixed (more general features) and only fine-tune the last one (more specialized features).
  • fine-tuning should be done with a very slow learning rate, and typically with the SGD optimizer rather than an adaptive learning rate optimizer such as RMSProp. This is to make sure that the magnitude of the updates stays very small, so as not to wreck the previously learned features.
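A condensed sketch of that workflow (assuming tf.keras and a VGG16 base; the input shape, layer sizes, and learning rate are illustrative, not prescriptive):

```python
import tensorflow as tf

# 1. Pretrained convolutional base, frozen while we train the new classifier.
base = tf.keras.applications.VGG16(weights="imagenet", include_top=False,
                                   input_shape=(150, 150, 3))
base.trainable = False

model = tf.keras.Sequential([
    base,
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="rmsprop", loss="binary_crossentropy",
              metrics=["accuracy"])
# model.fit(train_data, epochs=..., validation_data=val_data)  # train the top first

# 2. Unfreeze only the last convolutional block ("block5" in VGG16) and
#    fine-tune with plain SGD at a very low learning rate.
base.trainable = True
for layer in base.layers:
    layer.trainable = layer.name.startswith("block5")

model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=1e-4, momentum=0.9),
              loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(train_data, epochs=..., validation_data=val_data)  # fine-tune
```

Note that the model is re-compiled after changing the `trainable` flags; otherwise the freezing would not take effect in training.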