
Testing Neural Network Parameter Initialization in Code

Background:

Neural network parameters are generally initialized at random. If they are initialized to all zeros, the neurons within each layer degenerate into one: every neuron in a layer computes the same output and receives the same gradient, so they can never become different from one another, and all but one neuron per layer is effectively wasted. The connections between layers still function, but do you really find a multi-layer network with effectively one neuron per layer interesting? Thoughts are welcome in the comments.
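To see the degeneracy directly, here is a minimal sketch (my addition, not from the original post): with all-zero weights, every hidden neuron computes the same pre-activation and the same activation, so nothing distinguishes one neuron from another.

import numpy as np

# Minimal sketch: all-zero weights make every hidden neuron identical.
X = np.array([[0.5, 0.9, 1.0]])   # one sample, 3 features
W1 = np.zeros((3, 4))             # 4 hidden neurons, all weights zero
Z1 = X @ W1                       # [[0. 0. 0. 0.]] -- identical pre-activations
A1 = 1 / (1 + np.exp(-Z1))        # [[0.5 0.5 0.5 0.5]] -- identical outputs
print(Z1, A1)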

Code tests:

All-zero initialization of a 2-layer neural network

# -*- coding: utf-8 -*-
__author__ = 'jasonliu'
# Investigate the effect of neural network weight initialization
# Case 1: initialize to all zeros
# Case 2: initialize to the same nonzero value

import numpy as np

def nonlin(x, deriv=False):
    # Sigmoid; with deriv=True, x is assumed to already be sigmoid(z)
    if deriv:
        return x * (1 - x)
    return 1 / (1 + np.exp(-x))

X = np.array([[0.5, 0.9, 1],
              [2,   1,   1],
              [0.3, 0.6, 1],
              [1.5, 0.9, 0.6]])  # samples stacked along the row direction
Y = np.array([[1],
              [3],
              [2],
              [0]])              # targets stacked along the row direction

np.random.seed(1)

# randomly initialize our weights with mean 0
# syn0 = 2*np.random.random((3,4)) - 1
# syn1 = 2*np.random.random((4,1)) - 1
W1 = 2*np.zeros((3,4))  # + 1
W2 = 2*np.zeros((4,1))  # + 1

for j in range(60000):
    # Feed forward through layers 0, 1, and 2
    A0 = X
    Z1 = np.dot(A0, W1)
    A1 = nonlin(Z1)
    Z2 = np.dot(A1, W2)
    A2 = nonlin(Z2)

    # how much did we miss the target value?
    dZ_2 = Y - A2  # loss
    if (j % 10000) == 0:
        print("Error:" + str(np.mean(np.abs(dZ_2))))

    # in what direction is the target value?
    # were we really sure? if so, don't change too much.
    l2_delta = dZ_2 * nonlin(A2, deriv=True)

    # how much did each A1 value contribute to the output error
    # (according to the weights)?
    l1_error = l2_delta.dot(W2.T)

    # in what direction is the target A1?
    # were we really sure? if so, don't change too much.
    l1_delta = l1_error * nonlin(A1, deriv=True)

    W2 += A1.T.dot(l2_delta)
    W1 += A0.T.dot(l1_delta)

print("Output After Training:")
print("W1=", W1)
print("W2=", W2)
# The result shows that W1 repeats along the column direction.
# Mind the row/column dimensions, and whether samples are stacked
# along rows or columns.

Output:

Error:1.25
Error:1.0000091298568936
Error:1.0000044798865095
Error:1.000002957418707
Error:1.0000022037278755
Error:1.0000017545861548
Output After Training:
W1= [[0.58078498 0.58078498 0.58078498 0.58078498]
 [0.72845083 0.72845083 0.72845083 0.72845083]
 [1.33742659 1.33742659 1.33742659 1.33742659]]
W2= [[3.52357914]
 [3.52357914]
 [3.52357914]
 [3.52357914]]

As you can see, repetition appears: W1 is identical along the column direction, i.e. every neuron in that layer ends up with the same weights.
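The symmetry can be confirmed numerically; a hypothetical one-line check (not in the original script) appended after the training loop:

# Hypothetical check: are all columns of W1 equal?
print(np.allclose(W1, W1[:, [0]]))  # True under all-zero initialization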

All-2 initialization of a 2-layer neural network
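The post does not show the modified initialization lines for this run; presumably the zeros were replaced with a constant value of 2, for example:

# Presumed change for the all-2 run (exact lines omitted in the post):
W1 = 2*np.ones((3,4))
W2 = 2*np.ones((4,1))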

The output is as follows:

Error:1.0001879134151608
Error:1.0000064142342748
Error:1.0000032676762678
Error:1.0000021930282932
Error:1.0000016505669969
Error:1.0000013233782656
Output After Training:
W1= [[2.0085157  2.0085157  2.0085157  2.0085157 ]
 [2.02205683 2.02205683 2.02205683 2.02205683]
 [2.03953857 2.03953857 2.03953857 2.03953857]]
W2= [[3.30069379]
 [3.30069379]
 [3.30069379]
 [3.30069379]]

The result is similar: the neurons are all identical along the column direction. The symmetry persists even though the constant is nonzero.
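Why the symmetry persists for any constant initialization: if all columns of W1 are equal and all rows of W2 are equal, the backpropagated deltas are also column-identical, so each gradient step preserves the symmetry. A small self-contained check (my sketch, using an arbitrary 3-2-1 network):

import numpy as np

# Sketch: one gradient step from constant weights keeps W1's columns equal.
rng = np.random.default_rng(0)
X = rng.random((4, 3))
Y = rng.random((4, 1))
W1 = np.full((3, 2), 2.0)            # constant init (any value works)
W2 = np.full((2, 1), 2.0)

sig = lambda z: 1 / (1 + np.exp(-z))
A1 = sig(X @ W1)                     # columns of A1 are identical
A2 = sig(A1 @ W2)
d2 = (Y - A2) * A2 * (1 - A2)        # output delta
d1 = (d2 @ W2.T) * A1 * (1 - A1)     # hidden delta: columns identical
W2 += A1.T @ d2
W1 += X.T @ d1
print(np.allclose(W1, W1[:, [0]]))   # True: symmetry preserved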

Random initialization

W1 = 2*np.random.random((3,4)) - 1
W2 = 2*np.random.random((4,1)) - 1

Output:

W1= [[ 0.08581783  1.08039398 -1.16536044  0.27396062]
 [-0.48584844  0.29602972 -0.86136823  0.54469744]
 [ 0.24509319  2.23500284 -0.5412316   2.23673393]]
W2= [[1.23731123]
 [6.40888963]
 [0.09966753]
 [5.78541642]]
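With random initialization the symmetry is broken from the start and the columns of W1 evolve independently, as the trained weights above show. The same hypothetical check now fails:

# Hypothetical check: columns of W1 now differ.
print(np.allclose(W1, W1[:, [0]]))  # False under random initialization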