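Neural network example: implementing the AND, OR, and XOR logic operations with a (3-1) neural network
The code below comes from Chapter 10 of Deep Learning for Computer Vision with Python.
This example requires four new files in the same directory: (1) perceptron.py; (2) perceptron_or.py; (3) perceptron_and.py; (4) perceptron_xor.py.
1. perceptron.py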
# import the necessary packages
import numpy as np

class Perceptron:
    def __init__(self, N, alpha=0.1):
        # initialize the weight matrix and store the learning rate
        self.W = np.random.randn(N + 1) / np.sqrt(N)
        self.alpha = alpha

    def step(self, x):
        # apply the step function
        return 1 if x > 0 else 0

    def fit(self, X, y, epochs=10):
        # insert a column of 1's as the last entry in the feature
        # matrix -- this little trick allows us to treat the bias
        # as a trainable parameter within the weight matrix
        X = np.c_[X, np.ones((X.shape[0]))]

        # loop over the desired number of epochs
        for epoch in np.arange(0, epochs):
            # loop over each individual data point
            for (x, target) in zip(X, y):
                # take the dot product between the input features
                # and the weight matrix, then pass this value
                # through the step function to obtain the prediction
                p = self.step(np.dot(x, self.W))
                #print("[training] self.W={}, x={}, target={}".format(self.W, x, target))

                # only perform a weight update if our prediction
                # does not match the target
                if p != target:
                    # determine the error
                    error = p - target

                    # update the weight matrix
                    self.W += -self.alpha * error * x

    def predict(self, X, addBias=True):
        # ensure our input is a matrix
        X = np.atleast_2d(X)

        # check to see if the bias column should be added
        if addBias:
            # insert a column of 1's as the last entry in the feature
            # matrix (bias)
            X = np.c_[X, np.ones((X.shape[0]))]

        # take the dot product between the input features and the
        # weight matrix, then pass the value through the step
        # function
        return self.step(np.dot(X, self.W))
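Analysis: the Perceptron class is a neural network with a (3-1) structure. (3-1) means the input layer has 3 neurons (two of them take the inputs x1 and x2, and the third has a fixed input of 1) and the output layer has 1 neuron. See the figure below for a diagram.
Besides the weights (w1 and w2), the weight matrix also holds the bias (b). There are only two real inputs, with corresponding weights w1 and w2; the input layer has 3 neurons so that the bias b can be folded into the weight matrix. When the weight matrix is trained, the bias b is updated along with it. The output is y=step(x1*w1+x2*w2+b).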
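The equivalence between the explicit bias form and the augmented-input form can be checked with a short sketch. The numeric values of w1, w2, b, x1, and x2 below are hypothetical and chosen only for illustration; plain Python lists stand in for the NumPy arrays used in perceptron.py.

```python
# Hypothetical weights, bias, and inputs, chosen only for illustration.
w1, w2, b = 0.5, -0.3, 0.1
x1, x2 = 1.0, 1.0

def step(z):
    # Same rule as Perceptron.step: 1 for a positive input, 0 otherwise.
    return 1 if z > 0 else 0

# Explicit form: weighted sum plus a separate bias term.
explicit = x1 * w1 + x2 * w2 + b

# Augmented form: append a constant 1 to the input and b to the weights,
# which is what np.c_[X, np.ones(...)] achieves for every row in fit().
x_aug = [x1, x2, 1.0]
w_aug = [w1, w2, b]
augmented = sum(xi * wi for xi, wi in zip(x_aug, w_aug))

y_explicit = step(explicit)
y_augmented = step(augmented)
print(explicit, augmented, y_explicit, y_augmented)
```

Both forms give the same pre-activation value, so treating the bias as a third weight changes nothing about the output while letting one update rule train all three parameters.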
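The neuron's activation function is the step function, which takes one input and returns one output: when the input is less than or equal to 0 the output is 0, and when the input is greater than 0 the output is 1.
The fit function performs training. It updates the weights one sample at a time, in the manner of stochastic gradient descent, and only when the prediction is wrong. The predict function tests a sample: it passes the sample through the network and returns the predicted result.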
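To make the update inside fit concrete, here is a minimal sketch of one training step written with plain lists. The helper name update and the all-zero starting weights are assumptions made for this illustration; the class itself starts from random weights.

```python
def step(z):
    # Same rule as Perceptron.step.
    return 1 if z > 0 else 0

def update(w, x, target, alpha=0.1):
    """Return the weights after one perceptron update.

    x must already include the trailing 1 used for the bias.
    update is a hypothetical helper, not part of perceptron.py.
    """
    p = step(sum(xi * wi for xi, wi in zip(x, w)))
    if p == target:
        # Correct prediction: leave the weights unchanged.
        return list(w)
    error = p - target
    # Mirrors: self.W += -self.alpha * error * x
    return [wi - alpha * error * xi for wi, xi in zip(w, x)]

w = [0.0, 0.0, 0.0]      # assumed starting weights (w1, w2, b)
x = [0.0, 1.0, 1.0]      # input (0, 1) plus the bias input 1
target = 1               # OR(0, 1) = 1

w_new = update(w, x, target)
w_again = update(w_new, x, target)
print(w_new, w_again)
```

The first call misclassifies the sample and moves w2 and b toward the target; the second call predicts correctly, so the weights stay the same.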
2. perceptron_or.py
# import the necessary packages
from perceptron import Perceptron
import numpy as np

# construct the OR dataset
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([[0], [1], [1], [1]])

# define our perceptron and train it
print("[INFO] training perceptron...")
p = Perceptron(X.shape[1], alpha=0.1)
p.fit(X, y, epochs=20)

# now that our perceptron is trained we can evaluate it
print("[INFO] testing perceptron...")

# now that our network is trained, loop over the data points
for (x, target) in zip(X, y):
    # make a prediction on the data point and display the result
    # to our console
    pred = p.predict(x)
    print("[INFO] data={}, ground-truth={}, pred={}".format(
        x, target[0], pred))
For the OR operation, the relationship between the two inputs and the output is shown in the table below.
Input 1: x1 | Input 2: x2 | Output: y |
0 | 0 | 0 |
0 | 1 | 1 |
1 | 0 | 1 |
1 | 1 | 1 |
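The Perceptron constructor creates a 2-layer neural network. Its first argument, X.shape[1], is the number of features in each sample of X. alpha is the learning rate, which sets the size of each weight update. The closer it is to 1, the faster the weights change, but a larger value also makes it easier to overshoot the optimum.
The first argument of fit is the matrix of sample inputs, the second is the matrix of expected outputs for those inputs (the ground truth), and the third is the number of epochs. In the printed output, data is the sample input, ground-truth is the true value, and pred is the prediction.
Running perceptron_or.py with Python gives the following output: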
============= RESTART: E:\FENG\workspace_python\perceptron_or.py =============
[INFO] training perceptron...
[INFO] testing perceptron...
[INFO] data=[0 0], ground-truth=0, pred=0
[INFO] data=[0 1], ground-truth=1, pred=1
[INFO] data=[1 0], ground-truth=1, pred=1
[INFO] data=[1 1], ground-truth=1, pred=1
3. perceptron_and.py
# import the necessary packages
from perceptron import Perceptron
import numpy as np

# construct the AND dataset
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([[0], [0], [0], [1]])

# define our perceptron and train it
print("[INFO] training perceptron...")
p = Perceptron(X.shape[1], alpha=0.1)
p.fit(X, y, epochs=20)

# now that our perceptron is trained we can evaluate it
print("[INFO] testing perceptron...")

# now that our network is trained, loop over the data points
for (x, target) in zip(X, y):
    # make a prediction on the data point and display the result
    # to our console
    pred = p.predict(x)
    print("[INFO] data={}, ground-truth={}, pred={}".format(
        x, target[0], pred))
This file differs from the previous one only in the labels y, so it is not analyzed separately.
4. perceptron_xor.py
# import the necessary packages
from perceptron import Perceptron
import numpy as np

# construct the XOR dataset
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([[0], [1], [1], [0]])

# define our perceptron and train it
print("[INFO] training perceptron...")
p = Perceptron(X.shape[1], alpha=0.1)
p.fit(X, y, epochs=20)

# now that our perceptron is trained we can evaluate it
print("[INFO] testing perceptron...")

# now that our network is trained, loop over the data points
for (x, target) in zip(X, y):
    # make a prediction on the data point and display the result
    # to our console
    pred = p.predict(x)
    print("[INFO] data={}, ground-truth={}, pred={}".format(
        x, target[0], pred))
The output is as follows:
============ RESTART: E:\FENG\workspace_python\perceptron_xor.py ============
[INFO] training perceptron...
[INFO] testing perceptron...
[INFO] data=[0 0], ground-truth=0, pred=1
[INFO] data=[0 1], ground-truth=1, pred=1
[INFO] data=[1 0], ground-truth=1, pred=0
[INFO] data=[1 1], ground-truth=0, pred=0
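As the output shows, the XOR predictions are not accurate. The main reason is that a neural network with only 2 layers of neurons and no hidden layer can only draw a linear decision boundary, so it cannot classify samples that need a nonlinear one.
The figure above shows how the AND and OR samples are distributed in the input space. In both cases a single straight line separates the samples with output 0 from those with output 1, so the problem is simple, and in practice the 2-layer classifier achieves the expected result. The XOR samples, however, cannot be separated by any single straight line.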
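The separability claim can be explored with a small brute-force sketch. It tries every (w1, w2, b) on a coarse grid and reports the first combination that reproduces a truth table. The grid range and the helper name find_separator are assumptions for this illustration. A failed grid search is not a proof on its own; for XOR the impossibility also follows algebraically, since b <= 0, w1 + b > 0, and w2 + b > 0 together force w1 + w2 + b > 0, which contradicts the requirement for input (1, 1).

```python
from itertools import product

INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]
LABELS = {
    "AND": [0, 0, 0, 1],
    "OR": [0, 1, 1, 1],
    "XOR": [0, 1, 1, 0],
}

def step(z):
    # Same rule as Perceptron.step.
    return 1 if z > 0 else 0

def find_separator(labels, grid):
    """Return the first (w1, w2, b) on the grid that matches labels, else None.

    find_separator is a hypothetical helper written for this sketch.
    """
    for w1, w2, b in product(grid, repeat=3):
        preds = [step(x1 * w1 + x2 * w2 + b) for x1, x2 in INPUTS]
        if preds == labels:
            return (w1, w2, b)
    return None

# Assumed search grid: -3.0 to 3.0 in steps of 0.5.
GRID = [i / 2 for i in range(-6, 7)]

results = {name: find_separator(y, GRID) for name, y in LABELS.items()}
for name, found in results.items():
    print(name, found)
```

AND and OR each yield a weight triple, while XOR yields None, which matches what the trained perceptrons showed.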
To classify the XOR samples correctly, the network structure has to change: add a hidden layer, then retrain and retest.
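As a rough illustration of why a hidden layer helps, the sketch below wires up a 2-2-1 network by hand. The weights are hand-picked assumptions, not the result of training, and the function names neuron and xor_net are hypothetical. One hidden neuron computes OR, the other computes AND, and the output neuron fires when OR is true and AND is false.

```python
def step(z):
    # Same rule as Perceptron.step.
    return 1 if z > 0 else 0

def neuron(inputs, weights, bias):
    # Weighted sum plus bias, passed through the step activation.
    return step(sum(x * w for x, w in zip(inputs, weights)) + bias)

def xor_net(x1, x2):
    # Hidden layer with hand-picked (not trained) weights.
    h_or = neuron((x1, x2), (1.0, 1.0), -0.5)    # fires if x1 OR x2
    h_and = neuron((x1, x2), (1.0, 1.0), -1.5)   # fires if x1 AND x2
    # Output layer: OR and not AND, which is XOR.
    return neuron((h_or, h_and), (1.0, -1.0), -0.5)

samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
outputs = [xor_net(a, b) for a, b in samples]
for (a, b), out in zip(samples, outputs):
    print("data=[{} {}], pred={}".format(a, b, out))
```

Training such a network in practice needs a differentiable activation and backpropagation, because the step function provides no usable gradient; this sketch only shows that a solution exists once a hidden layer is available.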