Caffe Source Code Study Notes: Filler Initialization

阿新 • Published: 2019-02-03

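I. Introduction
Why does initialization matter so much in a CNN?

In short, because the network is deeper. In shallow models such as logistic regression or an SVM, training reduces to a convex optimization problem with a single optimum, so the parameters can be initialized to anything, even all zeros: following the gradient will always end up near the minimum.

A CNN is different:
1. Stacking many layers with nonlinear activation functions makes the final objective non-convex, so it has many local minima.

2. With sigmoid-like activations, the effect accumulates over depth and leads to problems such as vanishing gradients. With ReLU-like activations, the data is not squashed enough, so the variance of the data becomes too large or too small as the number of layers grows.

filler.hpp provides seven weight initialization methods: constant, uniform, Gaussian (gaussian), positive_unitball, xavier, msra, and bilinear.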

II. Constant initialization

Constant initialization is mainly used for biases.

1. Parameters

  optional string type = 1 [default = 'constant'];
  optional float value = 2 [default = 0]; // the value in constant filler

2. Source code

/// Fills the weights or biases with a constant, 0 by default.
template <typename Dtype>
class ConstantFiller : public Filler<Dtype> {
 public:
  explicit ConstantFiller(const FillerParameter& param)
      : Filler<Dtype>(param) {}
  virtual void Fill(Blob<Dtype>* blob) {
    Dtype* data = blob->mutable_cpu_data();
    const int count = blob->count();  // total number of elements
    const Dtype value = this->filler_param_.value();
    CHECK(count);
    for (int i = 0; i < count; ++i) {
      data[i] = value;
    }
    CHECK_EQ(this->filler_param_.sparse(), -1)
         << "Sparsity not supported by this Filler.";
  }
};

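III. Uniform initialization (uniform)

A random variable with the uniform distribution $U(a, b)$ has expectation $E(X) = (a+b)/2$ and variance $D(X) = (b-a)^2/12$.

Suppose each weight $w_i$ is drawn from $U\left(-\frac{1}{\sqrt{d}}, \frac{1}{\sqrt{d}}\right)$, where $d$ is the number of inputs, and the inputs $x_i$ are independent of the weights with zero mean and unit variance. Then

$\mathrm{Var}(w_i) = \frac{(2/\sqrt{d})^2}{12} = \frac{1}{3d}$

$\mathrm{Var}\left(\sum_{i=1}^{d} w_i x_i\right) = d \cdot \mathrm{Var}(w_i) = \frac{1}{3}$

So the weighted sum $\sum_{i} w_i x_i$ (not $x$ itself) has mean 0 and variance $1/3$, and for large $d$ it is approximately normally distributed.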
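The variance result above can be checked numerically. The following is a minimal sketch, not Caffe code: the function name `weighted_sum_variance`, the standard-normal inputs, and the trial count are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <random>

// Empirical variance of s = sum_{i=1}^{d} w_i * x_i over many trials,
// with w_i ~ U(-1/sqrt(d), 1/sqrt(d)) and x_i ~ N(0, 1).
// The input distribution and the trial count are illustrative assumptions.
double weighted_sum_variance(int d, int trials, unsigned seed) {
  assert(d > 0 && trials > 1);
  std::mt19937 rng(seed);
  const double bound = 1.0 / std::sqrt(static_cast<double>(d));
  std::uniform_real_distribution<double> weight(-bound, bound);
  std::normal_distribution<double> input(0.0, 1.0);

  double sum = 0.0, sum_sq = 0.0;
  for (int t = 0; t < trials; ++t) {
    double s = 0.0;
    for (int i = 0; i < d; ++i) {
      s += weight(rng) * input(rng);
    }
    sum += s;
    sum_sq += s * s;
  }
  const double mean = sum / trials;
  // Biased variance estimate; the bias is negligible for large trial counts.
  return sum_sq / trials - mean * mean;
}
```

Calling `weighted_sum_variance(64, 20000, 42u)` should return a value close to 1/3, and the result should stay near 1/3 when `d` changes, which is the point of scaling the interval by $1/\sqrt{d}$.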

1. Parameters

  optional float min = 3 [default = 0]; // the min value in uniform filler
  optional float max = 4 [default = 1]; // the max value in uniform filler

2. Source code

template <typename Dtype>
class UniformFiller : public Filler<Dtype> {
 public:
  explicit UniformFiller(const FillerParameter& param)
      : Filler<Dtype>(param) {}
  virtual void Fill(Blob<Dtype>* blob) {
    CHECK(blob->count());
    caffe_rng_uniform<Dtype>(blob->count(), Dtype(this->filler_param_.min()),
        Dtype(this->filler_param_.max()), blob->mutable_cpu_data());
    CHECK_EQ(this->filler_param_.sparse(), -1)
         << "Sparsity not supported by this Filler.";
  }
};

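The key piece is the caffe_rng_uniform function. The listing that follows is its GPU counterpart, caffe_gpu_rng_uniform, which draws unit uniform samples and then scales and shifts them into the range from a to b: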

template <>
void caffe_gpu_rng_uniform<float>(const int n, const float a, const float b,float* r) {
  CURAND_CHECK(curandGenerateUniform(Caffe::curand_generator(), r, n));
  const float range = b - a;
  if (range != static_cast<float>(1)) {
    caffe_gpu_scal(n, range, r);
  }
  if (a != static_cast<float>(0)) {
    caffe_gpu_add_scalar(n, a, r);//r[index] += a;
  }
}
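The same scale-and-shift idea can be sketched on the CPU with the standard library. This is an illustration only: `rng_uniform_sketch` is a hypothetical name, and `std::mt19937` stands in for the cuRAND generator used in the listing above.

```cpp
#include <cassert>
#include <random>
#include <vector>

// Fill a vector with samples from U(a, b) by drawing unit uniforms and then
// applying the scale (b - a) and the shift a, in the same order as the
// GPU routine. std::mt19937 replaces the cuRAND generator (an assumption).
std::vector<float> rng_uniform_sketch(int n, float a, float b, unsigned seed) {
  assert(n >= 0 && a <= b);
  std::mt19937 rng(seed);
  std::uniform_real_distribution<float> unit(0.0f, 1.0f);
  std::vector<float> r(n);
  for (float& v : r) v = unit(rng);  // step 1: unit uniform samples
  const float range = b - a;
  if (range != 1.0f) {
    for (float& v : r) v *= range;   // step 2: scale, like caffe_gpu_scal
  }
  if (a != 0.0f) {
    for (float& v : r) v += a;       // step 3: shift, like caffe_gpu_add_scalar
  }
  return r;
}
```

For example, `rng_uniform_sketch(1000, -0.5f, 0.5f, 7u)` returns 1000 values that all lie between -0.5 and 0.5.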

IV. Gaussian initialization

template <typename Dtype>
class GaussianFiller : public Filler<Dtype> {
 public:
  explicit GaussianFiller(const FillerParameter& param)
      : Filler<Dtype>(param) {}
  virtual void Fill(Blob<Dtype>* blob) {
    Dtype* data = blob->mutable_cpu_data();
    CHECK(blob->count());
    caffe_rng_gaussian<Dtype>(blob->count(), Dtype(this->filler_param_.mean()),  // mean
        Dtype(this->filler_param_.std()), blob->mutable_cpu_data());  // standard deviation (not variance)
    int sparse = this->filler_param_.sparse();
    CHECK_GE(sparse, -1);
    if (sparse >= 0) {  // the Gaussian filler supports sparsification
      // Sparsification applies to weight blobs.
      CHECK_GE(blob->num_axes(), 1);
      const int num_outputs = blob->shape(0);  // size of the first axis
      Dtype non_zero_probability = Dtype(sparse) / Dtype(num_outputs);  // probability that an entry stays non-zero
      rand_vec_.reset(new SyncedMemory(blob->count() * sizeof(int)));
      int* mask = reinterpret_cast<int*>(rand_vec_->mutable_cpu_data());
      caffe_rng_bernoulli(blob->count(), non_zero_probability, mask);  // 0/1 sparsity mask
      for (int i = 0; i < blob->count(); ++i) {
        data[i] *= mask[i];
      }
    }
  }

 protected:
  shared_ptr<SyncedMemory> rand_vec_;
};
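To make the masking step concrete, here is a minimal standalone sketch of a Gaussian fill followed by a Bernoulli mask. It does not use Caffe: `sparse_gaussian_sketch` is a hypothetical name, the weights are assumed to be a row-major matrix with `num_outputs` rows, and the standard library generators stand in for caffe_rng_gaussian and caffe_rng_bernoulli.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Gaussian fill followed by a 0/1 Bernoulli mask. Each entry stays non-zero
// with probability sparse / num_outputs, mirroring non_zero_probability in
// the listing above. The matrix layout is an illustrative assumption.
std::vector<float> sparse_gaussian_sketch(int num_outputs, int num_inputs,
                                          float mean, float stddev,
                                          int sparse, unsigned seed) {
  assert(num_outputs > 0 && num_inputs > 0 && stddev > 0.0f);
  assert(sparse >= 0 && sparse <= num_outputs);
  std::mt19937 rng(seed);
  std::normal_distribution<float> gauss(mean, stddev);
  std::bernoulli_distribution keep(static_cast<double>(sparse) / num_outputs);
  std::vector<float> data(static_cast<std::size_t>(num_outputs) * num_inputs);
  for (float& v : data) {
    v = gauss(rng);             // Gaussian sample
    if (!keep(rng)) v = 0.0f;   // multiply by a mask value of 0
  }
  return data;
}
```

With `num_outputs = 10` and `sparse = 3`, roughly 30% of the entries should remain non-zero.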

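V. Positive unit-ball initialization (positive_unitball)

This filler makes the incoming weights of each unit sum to 1. If a neuron has n inputs, the n weights are first drawn from a uniform distribution on (0, 1), and each weight is then divided by their sum.

The purpose is to keep the weights from growing so large that the sigmoid function saturates too early.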

template <typename Dtype>
class PositiveUnitballFiller : public Filler<Dtype> {
 public:
  explicit PositiveUnitballFiller(const FillerParameter& param)
      : Filler<Dtype>(param) {}
  virtual void Fill(Blob<Dtype>* blob) {
    Dtype* data = blob->mutable_cpu_data();
    DCHECK(blob->count());
    caffe_rng_uniform<Dtype>(blob->count(), 0, 1, blob->mutable_cpu_data());  // start from uniform samples in (0, 1)
    int dim = blob->count() / blob->num();
    CHECK(dim);
    for (int i = 0; i < blob->num(); ++i) {
      Dtype sum = 0;
      for (int j = 0; j < dim; ++j) {
        sum += data[i * dim + j];  // accumulate the weights of unit i
      }
      for (int j = 0; j < dim; ++j) {
        data[i * dim + j] /= sum;  // divide by the sum, i.e. normalize
      }
    }
    CHECK_EQ(this->filler_param_.sparse(), -1)
         << "Sparsity not supported by this Filler.";
  }
};
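The normalization can be tried outside Caffe with a small sketch. The function name `positive_unitball_sketch` and the flat row-major layout (`num` units with `dim` incoming weights each) are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Draw num * dim weights from U(0, 1), then divide each row of dim weights
// by its own sum so that the incoming weights of every unit add up to 1.
std::vector<float> positive_unitball_sketch(int num, int dim, unsigned seed) {
  assert(num > 0 && dim > 0);
  std::mt19937 rng(seed);
  std::uniform_real_distribution<float> unit(0.0f, 1.0f);
  std::vector<float> data(static_cast<std::size_t>(num) * dim);
  for (float& v : data) v = unit(rng);
  for (int i = 0; i < num; ++i) {
    float sum = 0.0f;
    for (int j = 0; j < dim; ++j) sum += data[i * dim + j];
    for (int j = 0; j < dim; ++j) data[i * dim + j] /= sum;
  }
  return data;
}
```

After the call, every weight is non-negative and each group of `dim` consecutive values sums to 1 up to floating-point rounding.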

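VI. Xavier initialization

If the input dimension is n and the output dimension is m, the weights are initialized from the uniform distribution $U\left[-\sqrt{\frac{6}{n+m}},\ \sqrt{\frac{6}{n+m}}\right]$.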

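Assume the inputs and the weights are independent and both have zero mean, with variances $\sigma_x^2$ and $\sigma_w^2$ respectively.

Since $z_i = \sum_{j=1}^{n} w_{ij} x_j$,

$z_i$ has zero mean and variance $n \, \sigma_x^2 \, \sigma_w^2$.

Put simply:
$\sigma_z^2 = n \, \sigma_x^2 \, \sigma_w^2$

To simplify, consider only the linear regime of the nonlinearity. The variance at the last layer is then the product of the per-layer factors $n \, \sigma_w^2$ of all preceding layers. If every factor is greater than 1, the final variance overflows; if every factor is less than 1, the differences between the data shrink away and gradient descent slows down.

To keep the output variance equal to the input variance,

set $n \, \sigma_w^2 = 1$.

The forward pass counts the number of inputs $n_i$, while the backward pass counts the number of outputs $n_{i+1}$. The two are usually different, so both conditions cannot hold at once, and the compromise is:

$\sigma_w^2 = \frac{2}{n_i + n_{i+1}}$

For a uniform distribution on $[a, b]$ the variance is $\frac{(b-a)^2}{12}$, which for a symmetric interval $[-r, r]$ is $r^2/3$.

Solving $r^2/3 = \frac{2}{n_i + n_{i+1}}$ gives the interval: $\left[-\sqrt{\frac{6}{n_i + n_{i+1}}},\ \sqrt{\frac{6}{n_i + n_{i+1}}}\right]$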

template <typename Dtype>
class XavierFiller : public Filler<Dtype> {
 public:
  explicit XavierFiller(const FillerParameter& param)
      : Filler<Dtype>(param) {}
  virtual void Fill(Blob<Dtype>* blob) {
    CHECK(blob->count());
    int fan_in = blob->count() / blob->num();
    int fan_out = blob->count() / blob->channels();
    Dtype n = fan_in;  // default: use the number of inputs only
    if (this->filler_param_.variance_norm() ==
        FillerParameter_VarianceNorm_AVERAGE) {
      n = (fan_in + fan_out) / Dtype(2);  // variance uses both input and output counts
    } else if (this->filler_param_.variance_norm() ==
        FillerParameter_VarianceNorm_FAN_OUT) {
      n = fan_out;  // variance uses the number of outputs only
    }
    Dtype scale = sqrt(Dtype(3) / n);
    caffe_rng_uniform<Dtype>(blob->count(), -scale, scale,
        blob->mutable_cpu_data());
    CHECK_EQ(this->filler_param_.sparse(), -1)
         << "Sparsity not supported by this Filler.";
  }
};
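The fan computation and the resulting bound can be worked through on a concrete shape. The sketch below is standalone and only mirrors the arithmetic of the listing above: the `VarianceNorm` enum, the `Fans` struct, and both function names are hypothetical stand-ins, and a 4-D weight blob of shape (num, channels, height, width) is assumed.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for FillerParameter_VarianceNorm.
enum class VarianceNorm { FAN_IN, FAN_OUT, AVERAGE };

struct Fans {
  int fan_in;
  int fan_out;
};

// For a weight blob of shape (num, channels, height, width):
// fan_in = count / num and fan_out = count / channels, as in the listing.
Fans compute_fans(int num, int channels, int height, int width) {
  assert(num > 0 && channels > 0 && height > 0 && width > 0);
  const int count = num * channels * height * width;
  return Fans{count / num, count / channels};
}

// Half-width of the uniform interval [-scale, scale]; its variance is 1 / n.
double xavier_scale(const Fans& f, VarianceNorm norm) {
  double n = f.fan_in;  // default: fan_in
  if (norm == VarianceNorm::AVERAGE) {
    n = (f.fan_in + f.fan_out) / 2.0;
  } else if (norm == VarianceNorm::FAN_OUT) {
    n = f.fan_out;
  }
  return std::sqrt(3.0 / n);
}
```

For a 3x3 convolution with 3 input channels and 64 outputs, `compute_fans(64, 3, 3, 3)` gives fan_in = 27 and fan_out = 576. With the default FAN_IN mode the weight variance is 1/fan_in, while the AVERAGE mode yields the bound $\sqrt{6/(\text{fan\_in} + \text{fan\_out})}$ from the derivation.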

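From the analysis above, this method mostly accounts for the linear part of the activation function. It works, just about, for sigmoid, but it is not well suited to ReLU. Just a small observation.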

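VII. MSRA initialization

When only the number of inputs is considered, the weights are drawn from a Gaussian with mean 0 and variance $2/n$.
The other cases are handled as in Xavier.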

template <typename Dtype>
class MSRAFiller : public Filler<Dtype> {
 public:
  explicit MSRAFiller(const FillerParameter& param)
      : Filler<Dtype>(param) {}
  virtual void Fill(Blob<Dtype>* blob) {
    CHECK(blob->count());
    int fan_in = blob->count() / blob->num();
    int fan_out = blob->count() / blob->channels();
    Dtype n = fan_in;  // default to fan_in
    if (this->filler_param_.variance_norm() ==
        FillerParameter_VarianceNorm_AVERAGE) {
      n = (fan_in + fan_out) / Dtype(2);
    } else if (this->filler_param_.variance_norm() ==
        FillerParameter_VarianceNorm_FAN_OUT) {
      n = fan_out;
    }
    Dtype std = sqrt(Dtype(2) / n);
    caffe_rng_gaussian<Dtype>(blob->count(), Dtype(0), std,
        blob->mutable_cpu_data());
    CHECK_EQ(this->filler_param_.sparse(), -1)
         << "Sparsity not supported by this Filler.";
  }
};
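The effect of the factor 2 under ReLU can be seen in a small simulation. This is a sketch under stated assumptions, not Caffe code: the function name `last_layer_mean_square` is hypothetical, the layers are fully connected with equal width, and Gaussian weights are used for both gains so that only the variance differs.

```cpp
#include <cassert>
#include <cmath>
#include <random>
#include <vector>

// Push one random input through `depth` fully connected ReLU layers of equal
// width and return the mean square of the last layer's pre-activations.
// Weights are drawn from N(0, gain / width): gain = 2 matches the MSRA rule,
// gain = 1 matches the fan_in variance used by the Xavier filler.
double last_layer_mean_square(double gain, int width, int depth, unsigned seed) {
  assert(gain > 0.0 && width > 0 && depth > 0);
  std::mt19937 rng(seed);
  std::normal_distribution<double> unit(0.0, 1.0);
  const double stddev = std::sqrt(gain / width);

  std::vector<double> x(width), y(width);
  for (double& v : x) v = unit(rng);  // random input with unit variance

  double mean_square = 0.0;
  for (int layer = 0; layer < depth; ++layer) {
    mean_square = 0.0;
    for (int i = 0; i < width; ++i) {
      double acc = 0.0;
      for (int j = 0; j < width; ++j) {
        acc += stddev * unit(rng) * x[j];  // fresh weight w_ij for each term
      }
      y[i] = acc;
      mean_square += acc * acc / width;
    }
    for (int i = 0; i < width; ++i) {
      x[i] = y[i] > 0.0 ? y[i] : 0.0;  // ReLU
    }
  }
  return mean_square;
}
```

With a width of 512 and a depth of 10, the gain of 2 should keep the mean square at roughly the same order of magnitude across layers, while the gain of 1 should roughly halve it at every layer, because ReLU discards the negative half of each pre-activation.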
