Watermelon Book Exercise 5.5: Implementing a BP Neural Network (Standard BP and Accumulated BP)

阿新 • Published: 2019-01-08

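Following the formulas in the book, I implemented the standard BP algorithm and the accumulated BP algorithm. BP stands for error BackPropagation. A "BP network" usually means a multi-layer feedforward neural network trained with the BP algorithm. I wrote the code myself from the book's formulas, without consulting other versions online.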

Data and code: https://github.com/qdbszsj/BP

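For the theoretical proofs and formula derivations, see pp. 101-104 of the Watermelon Book. Here I focus on my data preprocessing, some implementation details, and a few key points.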

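Watermelon dataset 3.0 contains both discrete and continuous attributes. Besides numeric attributes such as density and sugar content, there are discrete attributes described in words, so we first convert the discrete attributes into numbers. For example, the attribute "color" (色澤) takes three values: pale white (淺白), green (青綠), and dark (烏黑). I treat these values as ordered, like low/medium/high or thin/average/fat, so a single number can represent them: 0, 0.5, and 1 respectively. All the other discrete attributes are turned into decimals between 0 and 1 the same way.

Every attribute here is ordinal; there are no unordered ones. An unordered attribute with K values is usually represented by a K-dimensional vector instead. For example, if an attribute "melon type" took the values "watermelon", "cucumber", and "pumpkin", which have no natural order, we would encode them as (1,0,0), (0,1,0), and (0,0,1). This effectively widens the dataset by a few columns: the single column "melon type" becomes "is watermelon?", "is cucumber?", and "is pumpkin?", each holding 0 or 1. It is the same idea as the one-hot word vectors used in much NLP work.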
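A small sketch of both encodings, assuming plain NumPy. The ordinal map spreads the ordered values of 色澤 evenly over [0, 1], as the post does. The `one_hot` helper and the "melon type" attribute are hypothetical illustrations of the unordered case.

```python
import numpy as np

# Ordinal encoding: ordered values map to evenly spaced numbers in [0, 1],
# as done in the post for 色澤 (淺白 < 青綠 < 烏黑).
color_order = ["淺白", "青綠", "烏黑"]
ordinal = {v: i / (len(color_order) - 1) for i, v in enumerate(color_order)}

# One-hot encoding for a hypothetical unordered attribute "melon type".
melon_types = ["watermelon", "cucumber", "pumpkin"]

def one_hot(value, categories):
    """Return a K-dimensional 0/1 vector with a single 1 at the value's index."""
    vec = np.zeros(len(categories))
    vec[categories.index(value)] = 1.0
    return vec

samples = ["pumpkin", "watermelon", "cucumber"]
encoded = np.array([one_hot(s, melon_types) for s in samples])
print(ordinal)   # {'淺白': 0.0, '青綠': 0.5, '烏黑': 1.0}
print(encoded)   # one row per sample, one column per category
```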

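Next, following the pseudocode on p. 104, I initialize two sets of connection weights (input→hidden and hidden→output) and two sets of thresholds (hidden layer and output layer), all as random numbers between 0 and 1. The variable names follow the book; they are all defined on p. 101. One formula the book does not write out is b = f(alpha - gamma), i.e. the output of hidden neuron h is b_h = f(alpha_h - gamma_h). It follows the same reasoning as Eq. 5.3: a neuron's output = sigmoid(the input it receives - its own threshold), whether that neuron is in the hidden layer or the output layer.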
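A minimal vectorized sketch of that forward pass for one sample, using NumPy arrays shaped as in the code further down (v is d×q, w is q×l). The sizes assume the usual eight attribute columns of watermelon 3.0 (after dropping 編號 and the label) and q = d+1; the random values and fixed seed are purely illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
d, q, l = 8, 9, 1            # input dim, hidden nodes (d+1), output dim
x = rng.random(d)            # one input sample, shape (d,)
v = rng.random((d, q))       # input -> hidden connection weights
gamma = rng.random(q)        # hidden-layer thresholds
w = rng.random((q, l))       # hidden -> output connection weights
theta = rng.random(l)        # output-layer thresholds

alpha = x @ v                # input received by each hidden neuron, shape (q,)
b = sigmoid(alpha - gamma)   # b = f(alpha - gamma), shape (q,)
beta = b @ w                 # input received by each output neuron, shape (l,)
y_hat = sigmoid(beta - theta)  # Eq. 5.3, shape (l,)
print(b.shape, y_hat.shape)  # (9,) (1,)
```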

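I implemented both standard BP and accumulated BP, and the difference between them is small. Standard BP updates the network after every individual sample in X. Accumulated BP runs through the whole training set X, adds up all the changes to be made, and then applies them in a single update per pass. Standard BP therefore behaves like stochastic gradient descent, while accumulated BP corresponds to ordinary (full-batch) gradient descent on the cumulative error.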
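To make the difference concrete, here is a minimal sketch assuming a single-hidden-layer network and the book's update rules (Eqs. 5.10 to 5.15). `grads`, `standard_epoch`, and `accumulated_epoch` are hypothetical helpers, not the post's code: standard BP applies each sample's increments immediately, while accumulated BP sums them over the whole set with the parameters held fixed and applies them once. The XOR toy data and seed are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grads(x, y, v, gamma, w, theta, eta):
    """Per-sample increments (dv, dgamma, dw, dtheta) from Eqs. 5.10-5.15."""
    b = sigmoid(x @ v - gamma)               # hidden outputs, shape (q,)
    y_hat = sigmoid(b @ w - theta)           # network outputs, shape (l,)
    g = y_hat * (1 - y_hat) * (y - y_hat)    # Eq. 5.10
    e = b * (1 - b) * (w @ g)                # Eq. 5.15
    return (eta * np.outer(x, e), -eta * e,  # Eq. 5.13, Eq. 5.14
            eta * np.outer(b, g), -eta * g)  # Eq. 5.11, Eq. 5.12

def standard_epoch(X, Y, params, eta):
    for x, y in zip(X, Y):                   # update after every sample
        deltas = grads(x, y, *params, eta)
        params = [p + dp for p, dp in zip(params, deltas)]
    return params

def accumulated_epoch(X, Y, params, eta):
    total = [np.zeros_like(p) for p in params]
    for x, y in zip(X, Y):                   # accumulate over the whole set
        deltas = grads(x, y, *params, eta)
        total = [t + dp for t, dp in zip(total, deltas)]
    return [p + t for p, t in zip(params, total)]  # one update per epoch

rng = np.random.default_rng(1)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Y = np.array([[0], [1], [1], [0]], dtype=float)
d, q, l = 2, 3, 1
params = [rng.random((d, q)), rng.random(q), rng.random((q, l)), rng.random(l)]
p_std = standard_epoch(X, Y, params, eta=0.2)
p_acc = accumulated_epoch(X, Y, params, eta=0.2)
```

The post's commented-out accumulated loop does the same summation in matrix form: np.dot(b.T, g) and np.dot(X.T, e) add up the per-sample outer products over all m samples.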

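Number of hidden neurons: I use d+1, one more than the number of input nodes. There is no settled rule for this number; it is usually chosen by trial and error, keeping whichever count performs best.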
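A minimal sketch of that trial-and-error search, assuming standard BP and a held-out validation split. The candidate sizes, epoch count, toy data, and the helper names `train_bp` and `mse` are hypothetical choices for illustration, not the post's code.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_bp(X, Y, q, eta=0.2, epochs=100, seed=0):
    """Train a d-q-l network with standard BP; returns (v, gamma, w, theta)."""
    rng = np.random.default_rng(seed)
    d, l = X.shape[1], Y.shape[1]
    v, gamma = rng.random((d, q)), rng.random(q)
    w, theta = rng.random((q, l)), rng.random(l)
    for _ in range(epochs):
        for x, y in zip(X, Y):
            b = sigmoid(x @ v - gamma)
            y_hat = sigmoid(b @ w - theta)
            g = y_hat * (1 - y_hat) * (y - y_hat)  # Eq. 5.10
            e = b * (1 - b) * (w @ g)              # Eq. 5.15, uses w before its update
            w += eta * np.outer(b, g)              # Eq. 5.11
            theta -= eta * g                       # Eq. 5.12
            v += eta * np.outer(x, e)              # Eq. 5.13
            gamma -= eta * e                       # Eq. 5.14
    return v, gamma, w, theta

def mse(X, Y, params):
    """Mean squared error of the network's outputs on (X, Y)."""
    v, gamma, w, theta = params
    Y_hat = sigmoid(sigmoid(X @ v - gamma) @ w - theta)
    return float(np.mean((Y_hat - Y) ** 2))

# Hypothetical toy data: label is 1 when the two features sum to more than 1.
rng = np.random.default_rng(42)
X = rng.random((60, 2))
Y = (X.sum(axis=1) > 1).astype(float).reshape(-1, 1)
X_train, Y_train, X_val, Y_val = X[:40], Y[:40], X[40:], Y[40:]

scores = {}
for q in (2, 3, 4):                                # candidate hidden sizes
    scores[q] = mse(X_val, Y_val, train_bp(X_train, Y_train, q))
best_q = min(scores, key=scores.get)               # keep the best-scoring size
print(scores, best_q)
```

On a dataset as small as watermelon 3.0 (17 samples), a single validation split is noisy, so cross-validation is usually a sounder basis for this comparison.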

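Dealing with overfitting: there are two common strategies. The first is early stopping: hold out a validation set and evaluate on it during training; once the training error keeps falling while the validation error starts rising, stop training. The second is regularization. Following Eq. 5.17 in the book, the error objective gains a term describing network complexity. The idea is that smaller weights make the network simpler and smoother, so it is less prone to overfitting. The empirical error and the complexity term therefore each take a share of the total error, and that share (λ in Eq. 5.17) is a tunable parameter.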
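Both strategies can be sketched briefly. Eq. 5.17 defines the regularized error as E = λ·(1/m)·Σ_k E_k + (1 - λ)·Σ_i w_i², with λ in (0, 1). The functions below are hypothetical illustrations, not the post's code: `regularized_error` evaluates that formula over whatever parameter arrays you pass in, and `early_stop_epoch` picks the epoch with the lowest validation error from a recorded curve, giving up after `patience` epochs without improvement.

```python
import numpy as np

def regularized_error(sample_errors, params, lam):
    """Eq. 5.17: lam * mean(E_k) + (1 - lam) * sum of squared parameters."""
    empirical = lam * np.mean(sample_errors)
    complexity = (1 - lam) * sum(float(np.sum(p ** 2)) for p in params)
    return empirical + complexity

def early_stop_epoch(val_errors, patience=3):
    """Return the epoch with the best validation error, stopping once the
    error has not improved for `patience` consecutive epochs."""
    best_epoch, best = 0, float("inf")
    for epoch, err in enumerate(val_errors):
        if err < best:
            best_epoch, best = epoch, err
        elif epoch - best_epoch >= patience:
            break
    return best_epoch

# Illustrative curve: validation error falls, then rises as overfitting sets in.
val_curve = [0.30, 0.22, 0.18, 0.17, 0.19, 0.21, 0.24, 0.26]
print(early_stop_epoch(val_curve))   # 3
print(regularized_error([0.1, 0.3], [np.array([1.0, -1.0])], lam=0.9))
```

In a real training loop you would also save the weights and thresholds at the best epoch and restore them when stopping.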

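NumPy turns out to be really handy here: matrix products, addition, subtraction, and elementwise multiplication each take a single line of code.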
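Two NumPy one-liners that relate to the code below, as a small sketch with made-up values: the reshape-then-dot form used for Eq. 5.11 is simply the outer product np.outer(b, g), and because np.exp is elementwise, a sigmoid needs no explicit Python loop for arrays of any shape.

```python
import numpy as np

q, l = 4, 1
b = np.array([0.2, 0.4, 0.6, 0.8])   # hidden outputs (illustrative values)
g = np.array([0.5])                  # output-layer gradient term (illustrative)

# The reshape-then-dot form of Eq. 5.11 used in the code ...
dw_dot = np.dot(b.reshape((q, 1)), g.reshape((1, l)))
# ... equals the outer product of b and g.
dw_outer = np.outer(b, g)
print(np.allclose(dw_dot, dw_outer))  # True

# np.exp is elementwise, so this sigmoid works on a matrix directly.
Z = np.array([[0.0, 1.0], [-1.0, 2.0]])
S = 1.0 / (1.0 + np.exp(-Z))
print(S[0, 0])                        # 0.5
```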

import pandas as pd
import numpy as np
dataset = pd.read_csv('/home/parker/watermelonData/watermelon_3.csv', delimiter=",")

#according to P54--3.2
#process the dataset
attributeMap={}
attributeMap['淺白']=0
attributeMap['青綠']=0.5
attributeMap['烏黑']=1
attributeMap['蜷縮']=0
attributeMap['稍蜷']=0.5
attributeMap['硬挺']=1
attributeMap['沉悶']=0
attributeMap['濁響']=0.5
attributeMap['清脆']=1
attributeMap['模糊']=0
attributeMap['稍糊']=0.5
attributeMap['清晰']=1
attributeMap['凹陷']=0
attributeMap['稍凹']=0.5
attributeMap['平坦']=1
attributeMap['硬滑']=0
attributeMap['軟粘']=1
attributeMap['否']=0
attributeMap['是']=1
del dataset['編號']
dataset=np.array(dataset)
m,n=np.shape(dataset)
for i in range(m):
    for j in range(n):
        if dataset[i,j] in attributeMap:
            dataset[i,j]=attributeMap[dataset[i,j]]
        dataset[i,j]=round(dataset[i,j],3)

trueY=dataset[:,n-1].astype(float)   #np.array(DataFrame) has object dtype; convert to float
X=dataset[:,:n-1].astype(float)
m,n=np.shape(X)

#according to P101, init the parameters
import random
d=n   #the dimension of the input vector
l=1   #the dimension of the output vector
q=d+1   #the number of the hidden nodes
# use NumPy arrays, not lists: "list += array" would extend the list instead of adding elementwise
theta=np.array([random.random() for i in range(l)])   #the threshold of the output nodes
gamma=np.array([random.random() for i in range(q)])   #the threshold of the hidden nodes
# v size= d*q .the connection weight between input and hidden nodes
v=np.array([[random.random() for i in range(q)] for j in range(d)])
# w size= q*l .the connection weight between hidden and output nodes
w=np.array([[random.random() for i in range(l)] for j in range(q)])
eta=0.2    #the learning rate
maxIter=5000 #the max training times

def sigmoid(iX,dimension):#iX is an array of any dimension
    # np.exp works elementwise, so no per-dimension loop is needed;
    # the dimension argument is kept only so the existing calls still work
    return 1/(1+np.exp(-np.asarray(iX,dtype=float)))


# training loop: standard BP (update after every sample)
while(maxIter>0):
    maxIter-=1
    sumE=0
    for i in range(m):
        alpha=np.dot(X[i],v)#p101 line 2 from bottom, shape=1*q
        b=sigmoid(alpha-gamma,1)#b=f(alpha-gamma), shape=1*q
        beta=np.dot(b,w)#shape=(1*q)*(q*l)=1*l
        predictY=sigmoid(beta-theta,1)   #shape=1*l ,p102--5.3
        E = sum((predictY-trueY[i])*(predictY-trueY[i]))/2    #5.4
        sumE+=E#5.16
        #p104
        g=predictY*(1-predictY)*(trueY[i]-predictY)#shape=1*l p103--5.10
        e=b*(1-b)*((np.dot(w,g.T)).T) #shape=1*q , p104--5.15
        w+=eta*np.dot(b.reshape((q,1)),g.reshape((1,l)))#5.11
        theta-=eta*g#5.12
        v+=eta*np.dot(X[i].reshape((d,1)),e.reshape((1,q)))#5.13
        gamma-=eta*e#5.14
    # print(sumE)

# #accumulated BP (comment out the standard BP loop above before enabling this: that loop runs maxIter down to 0)
# trueY=trueY.reshape((m,l))
# while(maxIter>0):
#     maxIter-=1
#     sumE=0
#     alpha = np.dot(X, v)#p101 line 2 from bottom, shape=m*q
#     b = sigmoid(alpha - gamma,2)  # b=f(alpha-gamma), shape=m*q
#     beta = np.dot(b, w)  # shape=(m*q)*(q*l)=m*l
#     predictY = sigmoid(beta - theta,2)  # shape=m*l ,p102--5.3
#
#     E = sum(sum((predictY - trueY) * (predictY - trueY))) / 2  # 5.4
#     # print(round(E,5))
#     g = predictY * (1 - predictY) * (trueY - predictY)  # shape=m*l p103--5.10
#     e = b * (1 - b) * ((np.dot(w, g.T)).T)  # shape=m*q , p104--5.15
#     w += eta * np.dot(b.T, g)  # 5.11 shape (q*l)=(q*m) * (m*l)
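#     theta -= eta * g.sum(axis=0)  # 5.12, summed over the m samples so theta keeps shape (l,)
#     v += eta * np.dot(X.T, e)  # 5.13 (d,q)=(d,m)*(m,q)
#     gamma -= eta * e.sum(axis=0)  # 5.14, summed over the m samples so gamma keeps shape (q,)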


def predict(iX):
    alpha = np.dot(iX, v)  # p101 line 2 from bottom, shape=m*q
    b=sigmoid(alpha-gamma,2)#b=f(alpha-gamma), shape=m*q
    beta = np.dot(b, w)  # shape=(m*q)*(q*l)=m*l
    predictY=sigmoid(beta - theta,2)  # shape=m*l ,p102--5.3
    return predictY

print(predict(X))
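predict(X) returns sigmoid outputs in (0, 1), one per sample. A minimal sketch of turning them into class labels and an accuracy, assuming a 0.5 threshold (a common choice the post does not specify); `accuracy` is a hypothetical helper and the arrays are made-up values standing in for predict(X) and trueY.

```python
import numpy as np

def accuracy(prob, labels, threshold=0.5):
    """Threshold sigmoid outputs into 0/1 labels and compare with the truth."""
    pred = (np.asarray(prob, dtype=float).ravel() >= threshold).astype(int)
    truth = np.asarray(labels, dtype=float).ravel().astype(int)
    return float(np.mean(pred == truth))

prob = np.array([[0.91], [0.12], [0.66], [0.43]])  # made-up network outputs
labels = np.array([1, 0, 0, 0])                    # made-up true labels
print(accuracy(prob, labels))                      # 0.75
```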
