An explanation of the Xavier and MSRA weight initialization methods in Caffe

If you work through the Caffe MNIST tutorial, you’ll come across this curious line

weight_filler { type: "xavier" }

and the accompanying explanation

For the weight filler, we will use the xavier algorithm that automatically determines the scale of initialization based on the number of input and output neurons.

Unfortunately, as of the time this post was written, Google hasn’t heard much about “the xavier algorithm”. To work out what it is, you need to poke around the Caffe source until you find the right docstring and then read the referenced paper, Xavier Glorot & Yoshua Bengio’s “Understanding the difficulty of training deep feedforward neural networks”.


Why’s Xavier initialization important?

In short, it helps signals reach deep into the network.

  • If the weights in a network start too small, then the signal shrinks as it passes through each layer until it’s too tiny to be useful.
  • If the weights in a network start too large, then the signal grows as it passes through each layer until it’s too massive to be useful.

Xavier initialization makes sure the weights are ‘just right’, keeping the signal in a reasonable range of values through many layers.
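To make the shrinking and growing concrete, here is a minimal sketch in NumPy. It is not Caffe code: the helper name `final_signal_std`, the layer width of 256, the depth of 20, and the use of purely linear layers with no activation are all assumptions chosen for illustration.

```python
import numpy as np

def final_signal_std(weight_std, n=256, depth=20, seed=0):
    """Push a random input through `depth` linear layers and
    return the standard deviation of the final output.
    (Hypothetical helper, for illustration only.)"""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)              # input signal with unit variance
    for _ in range(depth):
        # Fresh zero-mean Gaussian weights for every layer
        W = rng.normal(0.0, weight_std, size=(n, n))
        x = W @ x                           # linear layer, no activation
    return float(x.std())

n = 256
too_small = final_signal_std(0.5 / np.sqrt(n), n)   # variance 0.25 / n
too_large = final_signal_std(2.0 / np.sqrt(n), n)   # variance 4 / n
just_right = final_signal_std(1.0 / np.sqrt(n), n)  # variance 1 / n

print(f"too small:  {too_small:.3e}")
print(f"too large:  {too_large:.3e}")
print(f"just right: {just_right:.3e}")
```

With weights that are too small the output collapses towards zero, with weights that are too large it blows up, and with a weight variance of 1/n it stays within an order of magnitude of where it started.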

To go any further than this, you’re going to need a small amount of statistics: specifically, you need to know about random distributions and their variance.

Okay, hit me with it. What’s Xavier initialization?

In Caffe, it’s initializing the weights in your network by drawing them from a distribution with zero mean and a specific variance,

$$\text{Var}(W) = \frac{1}{n_\text{in}}$$

where $W$ is the initialization distribution for the neuron in question, and $n_\text{in}$ is the number of neurons feeding into it. The distribution used is typically Gaussian or uniform.
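As a rough sketch of what drawing weights with variance 1/n_in looks like, the snippet below uses NumPy. The function name `xavier_fan_in` and its arguments are hypothetical and this is not Caffe's implementation. For the uniform case it relies on the fact that a uniform distribution on [-a, a] has variance a²/3, so a = sqrt(3 / n_in).

```python
import numpy as np

def xavier_fan_in(n_in, n_out, distribution="uniform", rng=None):
    """Draw an (n_out, n_in) weight matrix with zero mean and
    variance 1 / n_in. (Hypothetical helper, not Caffe's code.)"""
    rng = np.random.default_rng() if rng is None else rng
    if distribution == "uniform":
        # U(-a, a) has variance a^2 / 3, so a = sqrt(3 / n_in)
        limit = np.sqrt(3.0 / n_in)
        return rng.uniform(-limit, limit, size=(n_out, n_in))
    if distribution == "gaussian":
        # Standard deviation is the square root of the target variance
        return rng.normal(0.0, np.sqrt(1.0 / n_in), size=(n_out, n_in))
    raise ValueError(f"unknown distribution: {distribution!r}")

W_uniform = xavier_fan_in(400, 300, "uniform", np.random.default_rng(0))
W_gauss = xavier_fan_in(400, 300, "gaussian", np.random.default_rng(1))

# Both should print a value close to 1
print(W_uniform.var() * 400, W_gauss.var() * 400)
```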

It’s worth mentioning that Glorot & Bengio’s paper originally recommended using

$$\text{Var}(W) = \frac{2}{n_\text{in} + n_\text{out}}$$

where $n_\text{out}$ is the number of neurons the result is fed to. We’ll come to why Caffe’s scheme might be different in a bit.
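For comparison, a sketch of the paper's variant, which targets a variance of 2 / (n_in + n_out), might look like the following. The name `glorot_uniform` is a hypothetical helper written for this post; the uniform limit again follows from the variance of U(-a, a) being a²/3.

```python
import numpy as np

def glorot_uniform(n_in, n_out, rng=None):
    """Draw an (n_out, n_in) weight matrix with zero mean and
    variance 2 / (n_in + n_out). (Hypothetical helper.)"""
    rng = np.random.default_rng() if rng is None else rng
    # a^2 / 3 = 2 / (n_in + n_out)  =>  a = sqrt(6 / (n_in + n_out))
    limit = np.sqrt(6.0 / (n_in + n_out))
    return rng.uniform(-limit, limit, size=(n_out, n_in))

n_in, n_out = 400, 200
W = glorot_uniform(n_in, n_out, np.random.default_rng(0))
target = 2.0 / (n_in + n_out)

# The ratio should be close to 1
print(W.var() / target)
```

When n_in equals n_out, the two schemes give the same variance, since 2 / (2 n_in) = 1 / n_in.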

And where did those formulas come from?

Suppose we have an input $X$ with $n$ components and a linear neuron with random weights $W$ that spits out a number $Y$. What’s the variance of $Y$? Well, we can write

$$Y = W_1 X_1 + W_2 X_2 + \dots + W_n X_n$$


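And from Wikipedia we can work out that, provided $W_i$ and $X_i$ are independent, $W_i X_i$ is going to have variance

$$\text{Var}(W_i X_i) = E[X_i]^2 \, \text{Var}(W_i) + E[W_i]^2 \, \text{Var}(X_i) + \text{Var}(W_i) \, \text{Var}(X_i)$$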
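The standard identity for the variance of a product of two independent random variables, Var(WX) = E[X]² Var(W) + E[W]² Var(X) + Var(W) Var(X), can be checked numerically. The sketch below does so with NumPy; the means, standard deviations, and sample size are arbitrary values picked for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 200_000

# Independent samples with deliberately non-zero means
mean_w, std_w = 0.5, 2.0     # Var(W) = 4
mean_x, std_x = -1.0, 3.0    # Var(X) = 9
w = rng.normal(mean_w, std_w, N)
x = rng.normal(mean_x, std_x, N)

# Empirical variance of the product
empirical = float(np.var(w * x))

# Right-hand side of the identity, using the true parameters
predicted = (mean_x**2 * std_w**2
             + mean_w**2 * std_x**2
             + std_w**2 * std_x**2)

print(f"empirical: {empirical:.2f}  predicted: {predicted:.2f}")
```

If both means are zero, the first two terms vanish and the variance of the product is simply Var(W) Var(X).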



