
Deep Learning Basics (2): From the Multi-Layer Perceptron (MLP) to the Convolutional Neural Network (CNN)

The classic multi-layer perceptron (MLP) is, in form, a fully-connected network between adjacent layers.

That is, every neuron in the network is connected to every neuron in adjacent layers.



Local receptive fields

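In a fully-connected multi-layer perceptron, the input is treated as (or must first be converted into) a column vector. In a convolutional neural network, taking handwritten character recognition as an example, the input is no longer reshaped into a (28*28, 1) column vector; it is kept as a 28×28 matrix of grayscale pixel values.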
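The two input representations can be sketched in plain Python. This is only an illustration: the synthetic `image` stands in for a real handwritten digit, and the names `image`, `flatten`, and `column_vector` are made up for this example.

```python
ROWS, COLS = 28, 28

# Synthetic stand-in for a grayscale digit: pixel (r, c) holds a value in [0, 1].
image = [[(r * COLS + c) / (ROWS * COLS - 1) for c in range(COLS)]
         for r in range(ROWS)]

def flatten(img):
    """Row-major flattening, the form a fully-connected layer expects."""
    return [pixel for row in img for pixel in row]

column_vector = flatten(image)

# MLP view: 784 inputs, the spatial layout is discarded.
print(len(column_vector))          # 784
# CNN view: the 28x28 grid is kept, so neighbouring pixels stay neighbours.
print(len(image), len(image[0]))   # 28 28

# Pixel (r, c) of the grid sits at index r * COLS + c of the flattened vector.
r, c = 3, 5
print(column_vector[r * COLS + c] == image[r][c])  # True
```

After flattening, vertically adjacent pixels end up 28 positions apart, which is the spatial information a convolutional layer keeps.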



That region in the input image is called the local receptive field for the hidden neuron. It’s a little window on the input pixels. Each connection learns its own weight, so a 5×5 region carries 25 weights; what is shared is this set of 25 weights, which every hidden neuron in the same feature map reuses. And the hidden neuron learns an overall bias as well.
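A small sketch of how one local receptive field is cut out of the input, assuming a 5×5 window on a 28×28 image. The helper `receptive_field` and the synthetic `image` (each pixel stores its own coordinates so the window is easy to inspect) are hypothetical names used only here.

```python
FIELD = 5  # side length of the local receptive field

def receptive_field(image, j, k, size=FIELD):
    """Return the size x size window whose top-left corner is pixel (j, k)."""
    return [row[k:k + size] for row in image[j:j + size]]

# Synthetic 28x28 input: each pixel stores its own (row, col) coordinates.
image = [[(r, c) for c in range(28)] for r in range(28)]

window = receptive_field(image, 0, 0)
print(window[0][0], window[4][4])   # (0, 0) (4, 4)

# One hidden neuron has 5 * 5 = 25 weighted connections plus a single bias.
connections_per_neuron = FIELD * FIELD
print(connections_per_neuron)       # 25

# Moving the window one pixel at a time gives 28 - 5 + 1 = 24 positions per axis.
positions = len(image) - FIELD + 1
print(positions)                    # 24
```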

Shared weights and biases



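Note the correspondence: the hidden neuron at position (j, k) looks at the window whose top-left corner is input pixel (j, k). The top-left corner can therefore only occupy rows and columns 0 through 23. The last four rows and columns are still covered by windows, but they never serve as a window's top-left corner.
A 28×28 input scanned by a 5×5 local receptive field, moved one pixel at a time, yields a 24×24 hidden layer (28 - 5 + 1 = 24).
The input to the hidden neuron in row j, column k is: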
$$b + \sum_{l=0}^{4} \sum_{m=0}^{4} w_{l,m}\, a_{j+l,\,k+m}$$

The output of this hidden neuron is:

$$\sigma\left(b + \sum_{l=0}^{4} \sum_{m=0}^{4} w_{l,m}\, a_{j+l,\,k+m}\right)$$
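The formula above can be turned into a direct, unoptimized sketch in plain Python. The function name `feature_map`, the random input `a`, and the random untrained weights `w` are assumptions made for illustration; a real network would learn `w` and `b` by training.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def feature_map(a, w, b):
    """Slide one shared weight grid w and bias b over input a, one pixel at a time."""
    size = len(w)
    out_rows = len(a) - size + 1
    out_cols = len(a[0]) - size + 1
    out = []
    for j in range(out_rows):
        row = []
        for k in range(out_cols):
            # Weighted input of hidden neuron (j, k): b + sum_l sum_m w[l][m] * a[j+l][k+m]
            z = b + sum(w[l][m] * a[j + l][k + m]
                        for l in range(size) for m in range(size))
            row.append(sigmoid(z))
        out.append(row)
    return out

random.seed(0)  # arbitrary seed, for reproducibility only
a = [[random.random() for _ in range(28)] for _ in range(28)]       # stand-in input activations
w = [[random.gauss(0.0, 1.0) for _ in range(5)] for _ in range(5)]  # untrained 5x5 shared weights
b = 0.0                                                             # single shared bias

hidden = feature_map(a, w, b)
print(len(hidden), len(hidden[0]))  # 24 24
```

Every one of the 24×24 hidden neurons reuses the same `w` and `b`, so all of them respond to the same pattern, just at different positions in the input.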

The network structure I’ve described so far can detect just a single kind of localized feature. To do image recognition we’ll need more than one feature map. And so a complete convolutional layer consists of several different feature maps:



The shared weights and bias that define a feature map are also called a filter or a kernel.

In the example shown, there are 3 feature maps. Each feature map is defined by a set of 5×5 shared weights, and a single shared bias. The result is that the network can detect 3 different kinds of features, with each feature being detectable across the entire image.
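To make the effect of weight sharing concrete, here is a rough parameter count for the 3-feature-map example, assuming a single-channel 28×28 input. The helper names `conv_params` and `conv_output_shape` are hypothetical, and the fully-connected comparison is an illustration added here, not a figure from the text.

```python
def conv_params(num_maps, field=5):
    """Each feature map has field*field shared weights plus one shared bias."""
    return num_maps * (field * field + 1)

def conv_output_shape(num_maps, input_side=28, field=5):
    """Output shape when the window moves one pixel at a time."""
    side = input_side - field + 1
    return (num_maps, side, side)

shape = conv_output_shape(3)
print(shape)            # (3, 24, 24)
print(conv_params(3))   # 78

# Illustrative comparison: a fully-connected layer from 784 inputs to the
# same number of hidden neurons needs one weight per input per neuron,
# plus one bias per neuron.
hidden_neurons = shape[0] * shape[1] * shape[2]
fc_params = 784 * hidden_neurons + hidden_neurons
print(hidden_neurons, fc_params)
```

The convolutional layer's parameter count depends only on the number of feature maps and the window size, not on the size of the image.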

I’ve shown just 3 feature maps, to keep the diagram above simple. However, in practice convolutional networks may use more (and perhaps many more) feature maps. One of the early convolutional networks, LeNet-5, used 6 feature maps, each associated to a 5×5 local receptive field, to recognize MNIST digits. So the example illustrated above is actually pretty close to LeNet-5. In the examples we develop later in the chapter we’ll use convolutional layers with 20 and 40 feature maps. Let’s take a quick peek at some of the features which are learned.