Section 4: Summary of the Book Neural Networks and Deep Learning (Part 1)

阿新 • Published: 2018-03-24

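I recently spent a little over half a month reading Neural Networks and Deep Learning by Michael Nielsen, and I got a great deal out of it.

The original English edition is available at: http://neuralnetworksanddeeplearning.com/

Here is a recap of what the book mainly covers.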

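1. Using neural networks to recognize handwritten digits

The author starts from the perceptron model, moves on to sigmoid neurons, and then to the architecture of a neural network, using a three-layer network to recognize handwritten digits.

The author explains in detail the gradient descent method that the network uses to learn. Because learning becomes very slow when the number of training inputs is large, stochastic gradient descent is introduced to speed it up.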

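The quadratic cost function is chosen:

$$C(w,b) = \frac{1}{2n}\sum_x \|y(x) - a\|^2$$

Here $n$ is the number of training inputs, $y(x)$ is the desired output for input $x$, and $a$ is the network's actual output for $x$.

The update rule for the network's weights and biases is:

$$w \rightarrow w' = w - \frac{\eta}{m}\sum_{j=1}^{m}\frac{\partial C_{X_j}}{\partial w}, \qquad b \rightarrow b' = b - \frac{\eta}{m}\sum_{j=1}^{m}\frac{\partial C_{X_j}}{\partial b}$$

Here $m$ is the number of randomly chosen training samples. We label these samples $X_1, X_2, X_3, \ldots, X_m$ and call them a mini-batch.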
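To make the update rule concrete, here is a minimal sketch that applies it to a much simpler model than a neural network: a single linear unit a = w*x + b with the per-sample quadratic cost C_x = 0.5*(a - y)^2. The toy data, the function name sgd_step, and the values of eta and the mini-batch size are assumptions made for this illustration and do not come from the book.

```python
import random

def sgd_step(w, b, mini_batch, eta):
    """One mini-batch update for a = w*x + b with cost C_x = 0.5*(a - y)**2."""
    m = len(mini_batch)
    # Sum of per-sample gradients: dC_x/dw = (a - y)*x and dC_x/db = (a - y).
    grad_w = sum((w * x + b - y) * x for x, y in mini_batch)
    grad_b = sum((w * x + b - y) for x, y in mini_batch)
    # w -> w - (eta/m) * sum of dC_x/dw, and the same for b.
    return w - (eta / m) * grad_w, b - (eta / m) * grad_b

random.seed(0)
# Hypothetical noise-free toy data drawn from y = 2x + 1.
data = [(k / 10.0, 2.0 * (k / 10.0) + 1.0) for k in range(-10, 11)]

w, b = 0.0, 0.0
for epoch in range(200):
    random.shuffle(data)
    for k in range(0, len(data), 5):
        w, b = sgd_step(w, b, data[k:k + 5], eta=0.5)

print(round(w, 3), round(b, 3))  # should be close to 2 and 1
```

Each step only looks at five samples, yet the parameters still move toward the values that generated the data, which is the point of estimating the full gradient from a mini-batch.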

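2. How the backpropagation algorithm works

This chapter mainly introduces the four equations of backpropagation and gives the computation flow of the algorithm:

Take the MNIST dataset as an example. It contains 50,000 handwritten images for training, 10,000 for validation, and 10,000 for testing.

1. Input the set of training samples.

2. Set the number of epochs and start the loop: for i in range(epochs):

2.1 Shuffle the training samples and split them into groups of size mini_batch_size (the mini-batch size).

2.2 Apply stochastic gradient descent to each mini-batch and update the weights and biases (the update_mini_batch(self,mini_batch,eta) function in the program).

2.3 At the end of each epoch, check the accuracy on the test dataset.

3. Training of the network ends.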
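The outer loop in steps 1 to 3 can be sketched on its own, without any network. This is a minimal illustration: make_mini_batches, train, and the update and evaluate callbacks are hypothetical names introduced here, not functions from the book's code.

```python
import random

def make_mini_batches(training_data, mini_batch_size):
    """Shuffle a copy of the data and split it into consecutive mini-batches."""
    shuffled = list(training_data)
    random.shuffle(shuffled)
    return [shuffled[k:k + mini_batch_size]
            for k in range(0, len(shuffled), mini_batch_size)]

def train(training_data, epochs, mini_batch_size, update, evaluate=None):
    """Skeleton of steps 1-3; `update` and `evaluate` are hypothetical callbacks."""
    for epoch in range(epochs):
        # Step 2.1: shuffle and partition. Step 2.2: one update per mini-batch.
        for mini_batch in make_mini_batches(training_data, mini_batch_size):
            update(mini_batch)
        # Step 2.3: optional check at the end of every epoch.
        if evaluate is not None:
            print("Epoch {0}: {1}".format(epoch, evaluate()))

batches = make_mini_batches(range(10), 4)
print([len(batch) for batch in batches])  # [4, 4, 2]

calls = []
train(list(range(10)), epochs=2, mini_batch_size=4, update=calls.append)
print(len(calls))  # 6: three mini-batches in each of two epochs
```

Note that the last mini-batch is smaller when the data size is not a multiple of mini_batch_size, which is why the update rule divides by the actual length of each mini-batch.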

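Step 2.2 is especially important: how stochastic gradient descent is applied to a mini-batch (mini_batch) to update the network parameters (update_mini_batch).

1. Input the mini-batch mini_batch.

2. Iterate over every example (x,y): for x,y in mini_batch:

2.1 Compute the gradient for each example (the backprop(self,x,y) function).

2.1.1 Feedforward: for each layer $l = 2,3,\ldots,L$, compute the weighted input $z^l = w^l a^{l-1} + b^l$ and the activation $a^l = \sigma(z^l)$.

2.1.2 Compute the output-layer error $\delta^L = \frac{\partial C_x}{\partial a^L} \odot \sigma'(z^L)$, then $\frac{\partial C_x}{\partial w^L} = \delta^L (a^{L-1})^T$ and $\frac{\partial C_x}{\partial b^L} = \delta^L$. (Note that $\delta^L$ depends on the cost function. With the quadratic cost, $\delta^L = (a^L - y) \odot \sigma'(z^L)$; with the cross-entropy cost and sigmoid output neurons, $\delta^L = a^L - y$.)

2.1.3 Backpropagate the error: for each $l = L-1, L-2, \ldots, 2$, compute $\delta^l = ((w^{l+1})^T \delta^{l+1}) \odot \sigma'(z^l)$, then $\frac{\partial C_x}{\partial w^l} = \delta^l (a^{l-1})^T$ and $\frac{\partial C_x}{\partial b^l} = \delta^l$.

2.1.4 Collect the per-layer results: $\frac{\partial C_x}{\partial w} = \left[\frac{\partial C_x}{\partial w^2}, \frac{\partial C_x}{\partial w^3}, \ldots, \frac{\partial C_x}{\partial w^L}\right]$ and $\frac{\partial C_x}{\partial b} = \left[\frac{\partial C_x}{\partial b^2}, \frac{\partial C_x}{\partial b^3}, \ldots, \frac{\partial C_x}{\partial b^L}\right]$.

2.2 Accumulate the gradients over the mini-batch: $\sum_x \frac{\partial C_x}{\partial w}$ and $\sum_x \frac{\partial C_x}{\partial b}$.

3. Apply the stochastic gradient descent update rule to the weights and biases: $w \rightarrow w - \frac{\eta}{m}\sum_x \frac{\partial C_x}{\partial w}$, $b \rightarrow b - \frac{\eta}{m}\sum_x \frac{\partial C_x}{\partial b}$.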
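One way to convince yourself that the formulas in steps 2.1.1 to 2.1.4 are right is to compare them with a numerical derivative. The sketch below does this for the quadratic cost on a tiny network. The layer sizes [3, 4, 2], the random seed, and the function names are assumptions chosen for this illustration, and the functions are standalone rather than methods of the Network class in the listing.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    return sigmoid(z) * (1.0 - sigmoid(z))

def cost(weights, biases, x, y):
    """Quadratic cost C_x = 0.5 * ||a^L - y||^2 for a single sample."""
    a = x
    for w, b in zip(weights, biases):
        a = sigmoid(np.dot(w, a) + b)
    return 0.5 * float(np.sum((a - y) ** 2))

def backprop(weights, biases, x, y):
    """Return (nabla_b, nabla_w) for one sample, following steps 2.1.1-2.1.4."""
    activations, zs = [x], []
    for w, b in zip(weights, biases):
        zs.append(np.dot(w, activations[-1]) + b)
        activations.append(sigmoid(zs[-1]))
    nabla_b = [None] * len(biases)
    nabla_w = [None] * len(weights)
    # Output layer: delta^L = (a^L - y) * sigma'(z^L) for the quadratic cost.
    delta = (activations[-1] - y) * sigmoid_prime(zs[-1])
    nabla_b[-1] = delta
    nabla_w[-1] = np.dot(delta, activations[-2].T)
    # Earlier layers: delta^l = ((w^{l+1})^T delta^{l+1}) * sigma'(z^l).
    for l in range(2, len(weights) + 1):
        delta = np.dot(weights[-l + 1].T, delta) * sigmoid_prime(zs[-l])
        nabla_b[-l] = delta
        nabla_w[-l] = np.dot(delta, activations[-l - 1].T)
    return nabla_b, nabla_w

rng = np.random.RandomState(0)
sizes = [3, 4, 2]  # hypothetical tiny network
weights = [rng.randn(n_out, n_in) for n_in, n_out in zip(sizes[:-1], sizes[1:])]
biases = [rng.randn(n_out, 1) for n_out in sizes[1:]]
x, y = rng.randn(3, 1), np.array([[1.0], [0.0]])

nabla_b, nabla_w = backprop(weights, biases, x, y)

# Central finite difference for one weight in the first layer.
eps = 1e-6
weights[0][1, 2] += eps
c_plus = cost(weights, biases, x, y)
weights[0][1, 2] -= 2 * eps
c_minus = cost(weights, biases, x, y)
weights[0][1, 2] += eps  # restore the original value
numeric = (c_plus - c_minus) / (2 * eps)

print(abs(numeric - nabla_w[0][1, 2]))  # difference should be tiny
```

If the index in step 2.1.3 were wrong (for example using the output-layer error for every layer), this difference would no longer be close to zero.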

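The program below implements a three-layer network with sigmoid neurons as the activation function and the quadratic cost as the cost function.

Network1.py: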

# -*- coding: utf-8 -*-
"""
Created on Mon Mar  5 20:24:32 2018

@author: Administrator
"""

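'''
Book: Neural Networks and Deep Learning
Chapter 1: training a neural network with gradient descent; the quadratic cost function is used here
'''

import numpy as np
import random
# mnist_loader.py comes from the book's code repository and must be on the import path
import mnist_loader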

'''
The sigmoid function.
When the input z is a vector or a numpy array, numpy applies sigmod element-wise, i.e. in vectorized form.
'''
def sigmod(z):
    return 1.0/(1.0+np.exp(-z))


'''
Derivative of the sigmoid function.
'''
def sigmod_prime(z):
    return sigmod(z)*(1-sigmod(z))
    
    

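'''
A Network class that represents a neural network.
'''
class Network(object):
    '''
    sizes: number of neurons in each layer
    weights: randomly initialized from a Gaussian with mean 0 and standard deviation 1; weights[i] is the numpy matrix of weights connecting layer i to layer i+1, i=0,1...
    biases: randomly initialized from a Gaussian with mean 0 and standard deviation 1; biases[i] is the bias vector of layer i+1, i=0,1...
    '''
    def __init__(self,sizes):
        # number of layers in the network
        self.num_layers = len(sizes)
        # number of neurons in each layer
        self.sizes = sizes
        # randomly initialize the weights: one matrix between each pair of adjacent layers
        self.weights = [np.random.randn(y,x) for x,y in zip(sizes[:-1],sizes[1:])]
        # randomly initialize the biases: one column vector per layer, except the input layer
        self.biases = [np.random.randn(y,1) for y in sizes[1:]]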
        
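    '''
    Feedforward: given an input vector a, return the corresponding output of the network
    '''
    def feedforward(self,a):
        for b,w in zip(self.biases,self.weights):
            # np.dot is matrix multiplication; * is element-wise multiplication
            a = sigmod(np.dot(w,a) + b)
        return a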
    
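    '''
    Stochastic gradient descent: use mini-batches of training samples to compute the gradient (the gradient of a randomly chosen mini-batch estimates the overall gradient)
    training_data: list of (x,y) tuples, where (x,y) is a training input and its target class; the target here is a one-hot 10*1 vector
    epochs: number of epochs, i.e. passes over the training data
    mini_batch_size: size of each mini-batch
    eta: learning rate
    test_data: test data, a list of (x,y) tuples; here y is the actual digit, not one-hot encoded
    '''
    def SGD(self,training_data,epochs,mini_batch_size,eta,test_data=None):
        if test_data:
            # number of test samples
            n_test = len(test_data)
        # number of training samples
        n = len(training_data)
        # run the epochs
        for j in range(epochs):
            # shuffle the training data, then split it into mini-batches of the given size
            random.shuffle(training_data)
            mini_batches = [training_data[k:k+mini_batch_size] for k in range(0,n,mini_batch_size)]
            # train the network
            for mini_batch in mini_batches:
                self.update_mini_batch(mini_batch,eta)

            # after every epoch, evaluate the prediction accuracy on the test data
            if test_data:
                print('Epoch {0}:  {1}/{2}'.format(j,self.evaluate(test_data),n_test))
            else:
                print('Epoch {0} complete'.format(j))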
                
    '''
    mini_batch: a mini-batch, a list of (x,y) tuples
    eta: learning rate
    Apply gradient descent to one mini_batch and update the weights and biases
    '''
    def update_mini_batch(self,mini_batch,eta):
        # initialize the accumulators to 0
        nabla_b = [np.zeros(b.shape) for b in self.biases]
        nabla_w = [np.zeros(w.shape) for w in self.weights]
        # compute the gradient for each sample in turn and sum them
        for x,y in mini_batch:
            # gradient of the cost for one sample, returned as (∂C_x/∂b, ∂C_x/∂w)
            delta_nabla_b,delta_nabla_w = self.backprop(x,y)
            # accumulate Σ∂C_x/∂b
            nabla_b = [nb + dnb for nb,dnb in zip(nabla_b,delta_nabla_b)]
            # accumulate Σ∂C_x/∂w
            nabla_w = [nw + dnw for nw,dnw in zip(nabla_w,delta_nabla_w)]
        # update the weights: w = w - η/m*Σ∂C_x/∂w
        self.weights = [w - (eta/len(mini_batch))*nw for w,nw in zip(self.weights,nabla_w)]
        # update the biases: b = b - η/m*Σ∂C_x/∂b
        self.biases = [b - (eta/len(mini_batch))*nb for b,nb in zip(self.biases,nabla_b)]
        
        
    '''
    Compute the gradient of the quadratic cost for a single training sample x: C_x = 0.5*||y - a^L||^2 = 0.5*Σ_j (y_j - a_j^L)^2
    Returns a tuple (nabla_b,nabla_w) = (∂C_x/∂b, ∂C_x/∂w): lists of numpy arrays with the same shapes as biases and weights
    '''
    def backprop(self,x,y):
        # arrays with the same shapes as self.biases and self.weights, to hold this sample's partial derivatives layer by layer
        nabla_b = [np.zeros(b.shape) for b in self.biases]
        nabla_w = [np.zeros(w.shape) for w in self.weights]
        # feedforward
        activation = x
        # activations of every layer, starting with the input x
        activations = [x]
        # weighted inputs z of every layer except the input layer
        zs = []
        # compute z and σ(z) for every layer except the input layer
        for b,w in zip(self.biases,self.weights):
            z = np.dot(w,activation) + b
            zs.append(z)
            activation = sigmod(z)
            activations.append(activation)

        # output-layer error
        delta = self.cost_derivative(activations[-1],y)*sigmod_prime(zs[-1])
        nabla_b[-1] = delta
        nabla_w[-1] = np.dot(delta,activations[-2].transpose())

        # backpropagate the error; l counts layers from the end (l=2 is the second-to-last layer)
        for l in range(2,self.num_layers):
            z = zs[-l]
            sp = sigmod_prime(z)
            delta = np.dot(self.weights[-l+1].transpose(),delta)*sp
            nabla_b[-l] = delta
            nabla_w[-l] = np.dot(delta,activations[-l-1].transpose())
        return (nabla_b,nabla_w)
    
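    '''
    Evaluate the prediction accuracy of the network
    '''
    def evaluate(self,test_data):
        # np.argmax returns the index of the largest value; build a list of (predicted digit, actual digit) tuples
        test_results = [(np.argmax(self.feedforward(x)),y) for x,y in test_data]
        # count how many predictions equal the actual value
        return sum(int(x==y) for x,y in test_results)

    '''
    Partial derivative of the cost with respect to the output activations, ∂C_x/∂a, where a is the actual output
    '''
    def cost_derivative(self,output_activations,y):
        return (output_activations - y)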


def network_baseline():
    # If you hit an encoding error, see: http://blog.csdn.net/qq_41185868/article/details/79039604S
    # training_data: [(784*1,10*1),...], 50000 elements
    # validation_data: [(784*1,1*1),...], 10000 elements
    # test_data: [(784*1,1*1),...], 10000 elements
    # If your copy of mnist_loader returns iterators (for example zip objects), wrap each result in list() first
    training_data,validation_data,test_data = mnist_loader.load_data_wrapper()
    print('Training set size',len(training_data))
    print(training_data[0][0].shape)      # feature shape of each training sample: (784, 1)
    print(training_data[0][1].shape)      # target shape of each training sample: (10, 1)

    print('Test set size',len(test_data))
    print(test_data[0][0].shape)         # feature shape of each test sample: (784, 1)
    #print(test_data[0][1].shape)         # target shape of each test sample: (); unlike the training set, the target is a plain number, not a one-hot 10*1 vector
    print(test_data[0][1])               # 7

    # test
    net = Network([784,30,10])
    '''
    print(net.num_layers)      #3
    print(net.sizes)
    print(net.weights)
    print(net.biases)
    '''

    net.SGD(training_data,30,10,3.0,test_data=test_data)


# run the program

network_baseline()
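The comments in the listing point out that training targets are one-hot 10*1 vectors while test targets are plain digits. The short sketch below shows how the two representations relate and how counting correct predictions with np.argmax works. The helper names one_hot and count_correct and the made-up output vectors are assumptions for this illustration, not part of the program above.

```python
import numpy as np

def one_hot(digit, num_classes=10):
    """Return a (num_classes, 1) column vector with 1.0 at index `digit`."""
    e = np.zeros((num_classes, 1))
    e[digit] = 1.0
    return e

def count_correct(outputs_and_labels):
    """Count pairs where the index of the largest output equals the integer label."""
    return sum(int(np.argmax(a) == y) for a, y in outputs_and_labels)

# Hypothetical network outputs: the first peaks at index 7, the second at index 2.
out_a = one_hot(7) * 0.9 + 0.01
out_b = one_hot(2) * 0.8 + 0.05

print(one_hot(7).shape)                         # (10, 1)
print(count_correct([(out_a, 7), (out_b, 3)]))  # 1
```

The training side needs the vector form because the quadratic cost compares the output activations with the target element by element, while the evaluation side only needs the index of the strongest output.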
