Deriving and Implementing a BP Network from Scratch
Goal
Learn \(y = 2x\).
Method
Model
A BP neural network with a single hidden layer containing a single node.
Strategy
Mean Square Error (MSE)
\[
MSE = \frac{1}{2}(\hat{y} - y)^2
\]
The model's objective is \(\min \frac{1}{2} (\hat{y} - y)^2\).
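For example, if the network predicts \(\hat{y} = 5\) on a sample whose label is \(y = 6\), the loss is \(\frac{1}{2}(5 - 6)^2 = 0.5\).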
Algorithm
Naive gradient descent. Within each epoch, the model minimizes its error over all of the training data, updating after every sample.
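Concretely, every learnable parameter \(\theta\) is nudged against its error gradient, with learning rate \(\eta\) (0.01 in both implementations below):
\[
\theta \leftarrow \theta - \eta \frac{\partial E}{\partial \theta}
\]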
Network Structure
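The whole network is the composition \(\hat{Y} = w \cdot sigmoid(v x)\): a single input \(x\), a hidden weight \(v\), a sigmoid activation producing \(b\), and an output weight \(w\), as derived next.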
Forward Propagation Derivation
\[
E = \frac{1}{2}(\hat{Y}-Y)^2 \\
\hat{Y} = \beta \\
\beta = w b \\
b = sigmoid(\alpha) \\
\alpha = v x
\]
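As a concrete check, with the initial weights \(v = w = 1\) used in the C++ code below, a forward pass on \(x = 3\) gives
\[
\alpha = 3, \quad b = sigmoid(3) \approx 0.9526, \quad \hat{Y} = w b \approx 0.9526,
\]
which is essentially the untrained prediction reported in the C++ results.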
Back Propagation Derivation
The model's learnable parameters are \(w\) and \(v\); the update rule follows that of the perceptron model:
Update rule for parameter w
\[
w \leftarrow w + \Delta w \\
\Delta w = - \eta \frac{\partial E}{\partial w} \\
\frac{\partial E}{\partial w} = \frac{\partial E}{\partial \hat{Y}} \frac{\partial \hat{Y}}{\partial \beta} \frac{\partial \beta}{\partial w} = (\hat{Y} - Y) \cdot 1 \cdot b
\]
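Plugging in the untrained forward pass from above (\(\hat{Y} \approx 0.9526\), \(b \approx 0.9526\), \(Y = 6\), \(\eta = 0.01\)):
\[
\Delta w = -0.01 \cdot (0.9526 - 6) \cdot 0.9526 \approx 0.048,
\]
so \(w\) grows toward the target.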
Update rule for parameter v
\[
v \leftarrow v + \Delta v \\
\Delta v = -\eta \frac{\partial E}{\partial v} \\
\frac{\partial E}{\partial v} = \frac{\partial E}{\partial \hat{Y}} \frac{\partial \hat{Y}}{\partial \beta} \frac{\partial \beta}{\partial b} \frac{\partial b}{\partial \alpha} \frac{\partial \alpha}{\partial v} = (\hat{Y} - Y) \cdot 1 \cdot w \cdot \frac{\partial b}{\partial \alpha} \cdot x \\
\frac{\partial b}{\partial \alpha} = sigmoid(\alpha) \, [1 - sigmoid(\alpha)] \\
sigmoid(\alpha) = \frac{1}{1+e^{-\alpha}}
\]
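With the same numbers (\(w = v = 1\), \(x = 3\), \(Y = 6\)), \(sigmoid(3)[1 - sigmoid(3)] \approx 0.0452\), so
\[
\Delta v = -0.01 \cdot (0.9526 - 6) \cdot 1 \cdot 0.0452 \cdot 3 \approx 0.0068 .
\]
Hand-derived gradients are easy to get subtly wrong, so it is worth comparing them against a central finite difference. The sketch below (plain Python; the helper names are mine and not part of the implementations that follow) checks both analytic gradients against numerical estimates:

import math

# E(w, v) = 0.5 * (w * sigmoid(v * x) - y)^2, per the forward equations above
def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def loss(w, v, x, y):
    return 0.5 * (w * sigmoid(v * x) - y) ** 2

w, v, x, y, eps = 1.0, 1.0, 3.0, 6.0, 1e-6

# Analytic gradients from the derivation
b = sigmoid(v * x)
dE_dw = (w * b - y) * b                    # ~ -4.81
dE_dv = (w * b - y) * w * b * (1 - b) * x  # ~ -0.68

# Central-difference approximations; each pair should agree closely
num_dw = (loss(w + eps, v, x, y) - loss(w - eps, v, x, y)) / (2 * eps)
num_dv = (loss(w, v + eps, x, y) - loss(w, v - eps, x, y)) / (2 * eps)

print(dE_dw, num_dw)
print(dE_dv, num_dv)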
Code Implementation
C++ Implementation
#include <iostream>
#include <cmath>
using namespace std;

class Network {
public:
    explicit Network(float eta) : eta(eta) {}

    // Forward propagation: alpha = v*x, b = sigmoid(alpha), Y_hat = beta = w*b
    float predict(int x) {
        this->alpha = this->v * x;
        this->b = this->sigmoid(this->alpha);
        this->beta = this->w * this->b;
        return this->beta;
    }

    // One gradient-descent step on a single sample, following the derivation above
    void step(int x, float prediction, float label) {
        float w_old = this->w; // the gradient of v must use w from the forward pass
        // Delta w = -eta * (Y_hat - Y) * b
        this->w = this->w
                  - this->eta
                  * (prediction - label)
                  * this->b;
        // Delta v = -eta * (Y_hat - Y) * w * sigmoid(alpha)[1 - sigmoid(alpha)] * x
        this->alpha = this->v * x;
        this->v = this->v
                  - this->eta
                  * (prediction - label)
                  * w_old
                  * this->sigmoid(this->alpha) * (1 - this->sigmoid(this->alpha))
                  * x;
    }

private:
    float sigmoid(float x) { return 1.0f / (1.0f + exp(-x)); }
    float v = 1, w = 1, alpha = 1, beta = 1, b = 1, eta;
};

int main() { // Going to learn the linear relationship y = 2*x
    float loss, pred;
    Network model(0.01);
    cout << "x is " << 3 << " prediction is " << model.predict(3) << " label is " << 2 * 3 << endl;
    for (int epoch = 0; epoch < 500; epoch++) {
        loss = 0;
        for (int i = 0; i < 10; i++) {        // one pass over the 10 training samples
            pred = model.predict(i);
            loss += pow(pred - 2 * i, 2) / 2; // E = (Y_hat - Y)^2 / 2
            model.step(i, pred, 2 * i);
        }
        loss /= 10;                           // average loss for this epoch
        cout << "Epoch: " << epoch << " Loss: " << loss << endl;
    }
    cout << "x is " << 3 << " prediction is " << model.predict(3) << " label is " << 2 * 3 << endl;
    return 0;
}
C++ Results
With the initial network weights, the prediction for the sample x = 3, y = 6 is \(\hat{y} = 0.952534\).
After training for 500 epochs, the average loss has dropped to 7.82519, and the prediction for the sample x = 3, y = 6 is \(\hat{y} = 11.242\).
PyTorch Implementation
# encoding:utf8
# A minimal neural network: single hidden layer, single node, single input, single output
import torch as t
import torch.nn as nn
import torch.optim as optim


class Model(nn.Module):
    def __init__(self, in_dim, out_dim):
        super(Model, self).__init__()
        self.hidden_layer = nn.Linear(in_dim, out_dim)

    def forward(self, x):
        out = self.hidden_layer(x)
        out = t.sigmoid(out)  # note: the sigmoid is applied to the final output
        return out


if __name__ == '__main__':
    # Training data for y = 2x; both tensors are shaped (10, 1) to match the model output
    X, Y = [[i] for i in range(10)], [[2 * i] for i in range(10)]
    X, Y = t.Tensor(X), t.Tensor(Y)
    model = Model(1, 1)
    optimizer = optim.SGD(model.parameters(), lr=0.01)
    criterion = nn.MSELoss(reduction='mean')
    y_pred = model(t.Tensor([[3]]))  # prediction before training
    print(y_pred.data)
    for i in range(500):
        optimizer.zero_grad()
        y_pred = model(X)
        loss = criterion(y_pred, Y)
        loss.backward()
        optimizer.step()
        print(loss.data)  # average loss for this epoch
    y_pred = model(t.Tensor([[3]]))  # prediction after training
    print(y_pred.data)
PyTorch Results
With the initial network weights, the prediction for the sample x = 3, y = 6 is \(\hat{y} = 0.5164\).
After training for 500 epochs, the average loss has dropped to 98.8590, and the prediction for the sample x = 3, y = 6 is \(\hat{y} = 0.8651\).
Conclusion
Surprisingly, the hand-coded implementation learns far better than the PyTorch one! Some of the gap may come from the optimizer (the PyTorch version uses SGD), but the decisive difference is architectural: the PyTorch model applies the sigmoid to its final output, so its predictions are squashed into (0, 1) and can never reach the targets of y = 2x, whereas the C++ network multiplies the hidden sigmoid activation by an output weight w, which is unbounded. A sketch of a PyTorch module that mirrors the C++ network follows.
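The following is my reconstruction, not the code that produced the results above: a PyTorch module matching \(\hat{Y} = w \cdot sigmoid(v x)\), with a bias-free hidden weight and a bias-free output weight, so its output is no longer confined to (0, 1).

import torch as t
import torch.nn as nn

class MatchingModel(nn.Module):
    """Mirrors the hand-coded C++ network: Y_hat = w * sigmoid(v * x)."""
    def __init__(self):
        super().__init__()
        self.hidden = nn.Linear(1, 1, bias=False)  # weight v
        self.output = nn.Linear(1, 1, bias=False)  # weight w

    def forward(self, x):
        return self.output(t.sigmoid(self.hidden(x)))

Dropped into the same SGD training loop, this model can at least produce predictions larger than 1, which the original PyTorch model never could.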