
Deriving and Implementing a BP Network from Scratch



Goal

Learn the linear relationship \(y = 2x\).

Method

Model

A BP neural network with a single hidden layer containing a single node.

Strategy

Mean squared error (MSE):
\[ MSE = \frac{1}{2}(\hat{y} - y)^2 \]

The model's objective is \(\min \frac{1}{2} (\hat{y} - y)^2\).
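The factor of \(\frac{1}{2}\) is there purely for convenience: it cancels the exponent when differentiating, leaving the error signal as the bare prediction gap:

\[ \frac{\partial E}{\partial \hat{y}} = \frac{\partial}{\partial \hat{y}}\, \frac{1}{2}(\hat{y} - y)^2 = \hat{y} - y \]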

Algorithm

Plain gradient descent: within each epoch, minimize the model's error over all of the training data.
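In symbols, each learnable parameter \(\theta\) is moved against its error gradient, scaled by a learning rate \(\eta\):

\[ \theta \leftarrow \theta - \eta \frac{\partial E}{\partial \theta} \]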

Network Structure

[Figure: structure of the single-hidden-layer, single-node network]

Forward Propagation Derivation

\[
E = \frac{1}{2}(\hat{Y}-Y)^2 \\
\hat{Y} = \beta \\
\beta = W b \\
b = \mathrm{sigmoid}(\alpha) \\
\alpha = V x
\]
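As a quick numeric sanity check (a sketch assuming the initial weights \(V = W = 1\) used by the C++ code below), the forward pass for \(x = 3\) reproduces the initial prediction reported later, \(\hat{y} \approx 0.9526\):

import math

# Forward pass with assumed initial weights V = W = 1 (matches the C++ code below)
V, W, x = 1.0, 1.0, 3.0
alpha = V * x                        # hidden pre-activation: 3.0
b = 1.0 / (1.0 + math.exp(-alpha))   # sigmoid activation: ~0.9526
y_hat = W * b                        # linear output: ~0.9526
print(alpha, b, y_hat)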

Back Propagation Derivation

The model's learnable parameters are \(w\) and \(v\); their updates follow the same gradient rule as the perceptron model:

Update rule for parameter \(w\):
\[
w \leftarrow w + \Delta w \\
\Delta w = -\eta \frac{\partial E}{\partial w} \\
\frac{\partial E}{\partial w}
= \frac{\partial E}{\partial \hat{Y}}
  \frac{\partial \hat{Y}}{\partial \beta}
  \frac{\partial \beta}{\partial w}
= (\hat{Y} - Y) \cdot 1 \cdot b
\]
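For a concrete first step (illustrative numbers only, using the initial weights \(w = v = 1\), learning rate \(\eta = 0.01\), and the sample \(x = 3, y = 6\), so \(\hat{Y} \approx 0.9526\) and \(b \approx 0.9526\)):

\[ \frac{\partial E}{\partial w} = (\hat{Y} - Y)\, b \approx (0.9526 - 6) \times 0.9526 \approx -4.81, \quad \Delta w = -\eta \frac{\partial E}{\partial w} \approx 0.048 \]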

Update rule for parameter \(v\):
\[
v \leftarrow v + \Delta v \\
\Delta v = -\eta \frac{\partial E}{\partial v} \\
\frac{\partial E}{\partial v}
= \frac{\partial E}{\partial \hat{Y}}
  \frac{\partial \hat{Y}}{\partial \beta}
  \frac{\partial \beta}{\partial b}
  \frac{\partial b}{\partial \alpha}
  \frac{\partial \alpha}{\partial v}
= (\hat{Y} - Y) \cdot 1 \cdot w \cdot \frac{\partial b}{\partial \alpha} \cdot x \\
\frac{\partial b}{\partial \alpha} = \mathrm{sigmoid}(\alpha)\,[1 - \mathrm{sigmoid}(\alpha)] \\
\mathrm{sigmoid}(\alpha) = \frac{1}{1+e^{-\alpha}}
\]
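Both derivations can be verified numerically with a central finite-difference check; this is a standalone sketch of the same single-node model (the helper names are mine, not from the original):

import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def loss(v, w, x, y):
    # E = 1/2 (w * sigmoid(v*x) - y)^2, the model's per-sample error
    return 0.5 * (w * sigmoid(v * x) - y) ** 2

v, w, x, y, eps = 1.0, 1.0, 3.0, 6.0, 1e-5

# Analytic gradients from the derivations above
b = sigmoid(v * x)
y_hat = w * b
grad_w = (y_hat - y) * b
grad_v = (y_hat - y) * w * b * (1 - b) * x

# Central finite differences for comparison
num_w = (loss(v, w + eps, x, y) - loss(v, w - eps, x, y)) / (2 * eps)
num_v = (loss(v + eps, w, x, y) - loss(v - eps, w, x, y)) / (2 * eps)
print(grad_w, num_w)  # each analytic/numeric pair should closely agree
print(grad_v, num_v)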

Code Implementation

C++ Implementation

#include <iostream>
#include <cmath>

using namespace std;

class Network {
public:
    explicit Network(float eta) : eta(eta) {}

    float predict(int x) {  // forward propagation
        this->alpha = this->v * x;             // hidden pre-activation
        this->b = this->sigmoid(this->alpha);  // hidden activation
        this->beta = this->w * this->b;        // linear output
        return this->beta;
    }

    void step(int x, float prediction, float label) {
        // Compute both gradients before updating either parameter:
        // per the derivation, the gradient w.r.t. v uses the *old* w.
        float s = this->sigmoid(this->v * x);
        float grad_w = (prediction - label) * this->b;
        float grad_v = (prediction - label) * this->w * s * (1 - s) * x;
        this->w -= this->eta * grad_w;
        this->v -= this->eta * grad_v;
    }

private:
    float sigmoid(float x) { return 1.0f / (1 + exp(-x)); }
    float v = 1, w = 1, alpha = 0, beta = 0, b = 0, eta;
};

int main() {  // Going to learn the linear relationship y = 2*x
    float loss, pred;
    Network model(0.01);
    cout << "x is " << 3 << " prediction is " << model.predict(3) << " label is " << 2*3 << endl;
    for (int epoch = 0; epoch < 500; epoch++) {
        loss = 0;
        for (int i = 0; i < 10; i++) {
            pred = model.predict(i);
            loss += pow((pred - 2*i), 2) / 2;
            model.step(i, pred, 2*i);
        }
        loss /= 10;
        cout << "Epoch: " << epoch << "  Loss:" << loss << endl;
    }
    cout << "x is " << 3 << " prediction is " << model.predict(3) << " label is " << 2*3 << endl;
    return 0;
}
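To reproduce, compile with a C++11-capable compiler (the in-class member initializers require it), for example: g++ -std=c++11 bp.cpp -o bp && ./bp (the file name is arbitrary).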

C++ Results

With the initial network weights, the prediction for the data point x=3, y=6 is \(\hat{y} = 0.952534\).

After training for 500 epochs, the mean loss falls to 7.82519, and the prediction for the data point x=3, y=6 is \(\hat{y} = 11.242\).

PyTorch Implementation

# encoding:utf8
# A minimal neural network: single hidden layer, single node, single input, single output

import torch as t
import torch.nn as nn
import torch.optim as optim


class Model(nn.Module):
    def __init__(self, in_dim, out_dim):
        super(Model, self).__init__()
        self.hidden_layer = nn.Linear(in_dim, out_dim)

    def forward(self, x):
        out = self.hidden_layer(x)
        out = t.sigmoid(out)  # note: the sigmoid is applied to the *output*
        return out


if __name__ == '__main__':
    # Training set: (x, 2x) for x = 0..9; labels shaped (10, 1) to match y_pred
    X = t.Tensor([[i] for i in range(10)])
    Y = t.Tensor([[2 * i] for i in range(10)])
    model = Model(1, 1)
    optimizer = optim.SGD(model.parameters(), lr=0.01)
    criterion = nn.MSELoss(reduction='mean')
    y_pred = model(t.Tensor([[3]]))
    print(y_pred.data)
    for i in range(500):
        optimizer.zero_grad()
        y_pred = model(X)
        loss = criterion(y_pred, Y)
        loss.backward()
        optimizer.step()
        print(loss.item())
    y_pred = model(t.Tensor([[3]]))
    print(y_pred.data)

PyTorch Results

With the initial network weights, the prediction for the data point x=3, y=6 is \(\hat{y} = 0.5164\).

After training for 500 epochs, the mean loss only falls to 98.8590, and the prediction for the data point x=3, y=6 is \(\hat{y} = 0.8651\).

Conclusion

Surprisingly, the hand-coded implementation learns better than the PyTorch one! I initially suspected the gap came from the learning algorithm (PyTorch uses SGD), but the real cause is structural: the PyTorch model squashes its output through a sigmoid, so its predictions are confined to (0, 1) and can never approach labels such as y = 6, whereas the C++ network's output \(\beta = w b\) is unbounded.
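A quick way to confirm this (a sketch, not the original code; the nn.Sequential variant below is my assumed minimal fix): keep the sigmoid hidden node but add a linear read-out layer, so the output is unbounded, and the PyTorch version should fit \(y = 2x\) much more closely.

import torch as t
import torch.nn as nn
import torch.optim as optim

# Same spirit as the Model above, but with a linear read-out after the
# sigmoid hidden node, so predictions are no longer confined to (0, 1)
model = nn.Sequential(nn.Linear(1, 1), nn.Sigmoid(), nn.Linear(1, 1))
optimizer = optim.SGD(model.parameters(), lr=0.01)
criterion = nn.MSELoss()

X = t.Tensor([[i] for i in range(10)])
Y = t.Tensor([[2 * i] for i in range(10)])

for _ in range(500):
    optimizer.zero_grad()
    loss = criterion(model(X), Y)
    loss.backward()
    optimizer.step()

print(model(t.Tensor([[3.0]])).data)  # no longer pinned below 1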
