An explanation of the Xavier and MSRA weight initialization methods in Caffe

If you work through the Caffe MNIST tutorial, you’ll come across this curious line

weight_filler { type: "xavier" }

and the accompanying explanation

For the weight filler, we will use the xavier algorithm that automatically determines the scale of initialization based on the number of input and output neurons.

Unfortunately, as of the time this post was written, Google hasn’t heard much about “the xavier algorithm”. To work out what it is, you need to poke around the Caffe source until you find the right docstring and then read the referenced paper, Xavier Glorot & Yoshua Bengio’s “Understanding the difficulty of training deep feedforward neural networks”.

Why’s Xavier initialization important?

In short, it helps signals reach deep into the network.

If the weights in a network start too small, then the signal shrinks as it passes through each layer until it’s too tiny to be useful.
If the weights in a network start too large, then the signal grows as it passes through each layer until it’s too massive to be useful.

Xavier initialization makes sure the weights are ‘just right’, keeping the signal in a reasonable range of values through many layers.
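The shrinking and growing described above can be seen in a small NumPy experiment. This is only a sketch under stated assumptions: the layers are purely linear, square (256 units wide), and 20 deep, and the function name `final_signal_std` and the three weight scales are illustrative choices, not anything taken from Caffe.

```python
import numpy as np

def final_signal_std(weight_std, n=256, depth=20, seed=0):
    """Push a random input through `depth` linear layers; return the output's std."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)  # input signal with unit variance
    for _ in range(depth):
        # Fresh zero-mean Gaussian weights with the requested standard deviation.
        W = rng.standard_normal((n, n)) * weight_std
        x = W @ x
    return x.std()

n = 256
scales = [
    ("too small", 0.01),
    ("too large", 0.2),
    ("1/sqrt(n)", 1 / np.sqrt(n)),  # variance 1/n, the 'just right' case
]
for label, std in scales:
    print(f"{label:>10}: output std = {final_signal_std(std, n):.3e}")
```

With these settings the first case collapses toward zero, the second blows up by many orders of magnitude, and the third stays within roughly the same order of magnitude as the input.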

To go any further than this, you’re going to need a small amount of statistics - specifically you need to know about random distributions and their variance.

Okay, hit me with it. What’s Xavier initialization?

In Caffe, it’s initializing the weights in your network by drawing them from a distribution with zero mean and a specific variance,

$$\mathrm{Var}(W) = \frac{1}{n_\text{in}}$$

where $W$ is the initialization distribution for the neuron in question, and $n_\text{in}$ is the number of neurons feeding into it. The distribution used is typically Gaussian or uniform.
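As a rough illustration, here is how weights with zero mean and variance 1/n_in could be drawn in NumPy, once from a Gaussian and once from a uniform distribution. The function names `xavier_gaussian` and `xavier_uniform` and the layer sizes are made up for this sketch; it is not Caffe's implementation. The uniform bound follows from the fact that U(-a, a) has variance a²/3.

```python
import numpy as np

def xavier_gaussian(n_in, n_out, rng):
    """Zero-mean Gaussian weights with variance 1 / n_in."""
    return rng.normal(0.0, np.sqrt(1.0 / n_in), size=(n_out, n_in))

def xavier_uniform(n_in, n_out, rng):
    """Zero-mean uniform weights with variance 1 / n_in.

    U(-a, a) has variance a**2 / 3, so a = sqrt(3 / n_in) gives variance 1 / n_in.
    """
    a = np.sqrt(3.0 / n_in)
    return rng.uniform(-a, a, size=(n_out, n_in))

rng = np.random.default_rng(0)
n_in, n_out = 500, 300  # hypothetical layer sizes
for init in (xavier_gaussian, xavier_uniform):
    W = init(n_in, n_out, rng)
    # The sample variance should land close to the target 1 / n_in.
    print(f"{init.__name__}: mean={W.mean():+.5f} var={W.var():.5f} target={1.0 / n_in:.5f}")
```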

It’s worth mentioning that Glorot & Bengio’s paper originally recommended using

$$\mathrm{Var}(W) = \frac{2}{n_\text{in} + n_\text{out}}$$

where $n_\text{out}$ is the number of neurons the result is fed to. We’ll come to why Caffe’s scheme might be different in a bit.
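To make the difference between the two formulas concrete, the small helper below computes both target variances. The function name `xavier_variance` and the mode strings `"fan_in"` and `"average"` are hypothetical labels chosen for this sketch. The two variants agree whenever a layer has as many inputs as outputs, and differ otherwise.

```python
import math

def xavier_variance(n_in, n_out, mode="fan_in"):
    """Target weight variance for the two variants discussed in the text."""
    if mode == "fan_in":
        return 1.0 / n_in            # the 1 / n_in form
    if mode == "average":
        return 2.0 / (n_in + n_out)  # Glorot & Bengio's original recommendation
    raise ValueError(f"unknown mode: {mode!r}")

# Square layer: both variants give the same variance.
print(xavier_variance(100, 100, "fan_in"), xavier_variance(100, 100, "average"))

# Layer that widens from 100 to 300 units: the averaged form is smaller.
print(xavier_variance(100, 300, "fan_in"), xavier_variance(100, 300, "average"))
```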

And where did those formulas come from?

Suppose we have an input X with n components and a linear neuron with random weights W that spits out a number Y. What’s the variance of Y? Well, we can write

$$Y = W_1 X_1 + W_2 X_2 + \dots + W_n X_n$$
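Before doing any algebra, the variance of Y can be estimated by simulation. The sketch below makes assumptions the text has not stated at this point: the weights and inputs are independent, zero-mean Gaussians, the inputs have unit variance, and the weights have variance 1/n. The sizes and variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100          # number of input components
var_w = 1.0 / n  # assumed weight variance
trials = 20_000  # number of independent (W, X) draws

X = rng.standard_normal((trials, n))                   # inputs with Var(X_i) = 1
W = rng.standard_normal((trials, n)) * np.sqrt(var_w)  # weights with Var(W_i) = 1/n

# One sample of Y = W_1 X_1 + ... + W_n X_n per trial.
Y = (W * X).sum(axis=1)
empirical_var_y = Y.var()
print(f"empirical Var(Y) = {empirical_var_y:.3f}")
```

Under these assumptions the estimate should come out close to 1, the same variance as each input component.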

And from Wikipedia we can work out that $W_i X_i$ is going to have variance

Var(
