MNIST Handwritten Digit Recognition (Linear Regression and CNN, Implemented by Hand and with Frameworks) (To Be Continued)

0-Background

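As the Hello World of deep learning, this is a project everyone should work through at least once.

Code: Github. The blog and the code will be updated as the exercise progresses.

This is my first blog post, so the writing may be unclear in places. Feedback is very welcome. Thanks!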

0.1 Background knowledge:

  • Linear regression
  • CNN

    LeNet-5
    AlexNet
    ResNet
    VGG

  • Various regularization methods

0.2 Catalog

  • 1-Prepare
  • 2-MNIST
  • 3-LinearRegression

1-Prepare

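  • Numpy: an open-source numerical computing library
  • matplotlib: a 2D plotting library for Python
  • TensorFlow: an open-source machine learning system
  • Keras: a high-level neural network API that runs on TensorFlow, Theano, or CNTK backends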

2-MNIST

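MNIST is a subset of the larger NIST handwriting collection, made up of handwritten digits from roughly 250 different writers. It contains 60,000 training samples and 10,000 test samples.

Loading MNIST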

import os
import struct

import matplotlib.pyplot as plt
import numpy as np


class load:
    def __init__(self, path='mnist'):
        self.path = path

    def load_mnist(self):
        """Read train and test dataset and labels from path"""
        train_image_path = 'train-images.idx3-ubyte'
        train_label_path = 'train-labels.idx1-ubyte'
        test_image_path = 't10k-images.idx3-ubyte'
        test_label_path = 't10k-labels.idx1-ubyte'

        # Label files: 8-byte big-endian header (magic, count), then one uint8 per label
        with open(os.path.join(self.path, train_label_path), 'rb') as labelpath:
            magic, n = struct.unpack('>II', labelpath.read(8))
            labels = np.fromfile(labelpath, dtype=np.uint8)
            train_labels = labels.reshape(len(labels), 1)

        # Image files: 16-byte big-endian header (magic, count, rows, cols), then uint8 pixels
        with open(os.path.join(self.path, train_image_path), 'rb') as imgpath:
            magic, num, rows, cols = struct.unpack('>IIII', imgpath.read(16))
            images = np.fromfile(imgpath, dtype=np.uint8).reshape(len(train_labels), 784)
            train_images = images

        with open(os.path.join(self.path, test_label_path), 'rb') as labelpath:
            magic, n = struct.unpack('>II', labelpath.read(8))
            labels = np.fromfile(labelpath, dtype=np.uint8)
            test_labels = labels.reshape(len(labels), 1)

        with open(os.path.join(self.path, test_image_path), 'rb') as imgpath:
            magic, num, rows, cols = struct.unpack('>IIII', imgpath.read(16))
            images = np.fromfile(imgpath, dtype=np.uint8).reshape(len(test_labels), 784)
            test_images = images

        return train_images, train_labels, test_images, test_labels


if __name__ == '__main__':
    train_images, train_labels, test_images, test_labels = load().load_mnist()
    print('train_images shape:%s' % str(train_images.shape))
    print('train_labels shape:%s' % str(train_labels.shape))
    print('test_images shape:%s' % str(test_images.shape))
    print('test_labels shape:%s' % str(test_labels.shape))

    # Pick 4 random training samples and 2 random test samples
    np.random.seed(1024)
    trainImage = np.random.randint(60000, size=4)
    testImage = np.random.randint(10000, size=2)
    samples = [(train_images[i], train_labels[i]) for i in trainImage]
    samples += [(test_images[i], test_labels[i]) for i in testImage]

    # Show the six digits in a 2x3 grid, each titled with its label
    plt.figure(num='mnist', figsize=(2, 3))
    for k, (img, label) in enumerate(samples, start=1):
        plt.subplot(2, 3, k)
        plt.title(str(label[0]))
        plt.imshow(img.reshape(28, 28))
    plt.show()

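Running the script prints the shapes of the four arrays and displays a 2x3 grid of sample digits, each titled with its label.

(Figure: six sample MNIST digits with their labels.)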
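The loader depends on the IDX layout of the MNIST files: a big-endian header followed by raw unsigned bytes. As a minimal sketch, the snippet below builds a tiny synthetic image file in memory and parses it the same way. The helper name `parse_idx_images` and the two-image payload are made up for illustration; the magic number 2051 for image files (2049 for label files) comes from the file format itself.

```python
import io
import struct

import numpy as np


def parse_idx_images(buf):
    """Parse an IDX3 image buffer into an array of shape (num, rows * cols)."""
    # Header: four big-endian unsigned 32-bit integers.
    magic, num, rows, cols = struct.unpack('>IIII', buf.read(16))
    if magic != 2051:
        raise ValueError('not an IDX3 image file: magic=%d' % magic)
    # The remaining bytes are the pixels, one uint8 each, stored row by row.
    pixels = np.frombuffer(buf.read(), dtype=np.uint8)
    return pixels.reshape(num, rows * cols)


# Synthetic file with two 28x28 "images" (hypothetical data, not real MNIST).
header = struct.pack('>IIII', 2051, 2, 28, 28)
payload = (np.arange(2 * 28 * 28) % 256).astype(np.uint8).tobytes()
images = parse_idx_images(io.BytesIO(header + payload))
print(images.shape)  # (2, 784)
```

Checking the magic number before reshaping is a cheap guard against pointing the loader at the wrong file, for example a label file in place of an image file.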

3-LinearRegression

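Here MNIST is trained and recognized with a linear-regression-style approach.
The model is a 2-layer network whose hidden layer has four neurons; Tanh and ReLu are each used as the hidden-layer activation function.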

Because MNIST is a multi-class classification problem, the output layer uses Softmax as its activation function, with cross entropy as the loss function.
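For reference, softmax turns the output-layer values \(z_k\) into probabilities \(a_k = e^{z_k} / \sum_j e^{z_j}\), and the cross-entropy loss over \(m\) samples with one-hot labels \(y\) is \(L = -\frac{1}{m}\sum_{i}\sum_{k} y_k^{(i)} \log a_k^{(i)}\). The following is a minimal sketch of both, not the post's own code: the function names, the `eps` guard, and the toy 3-class inputs are assumptions chosen for illustration.

```python
import numpy as np


def softmax(Z):
    """Column-wise softmax for values of shape (n_classes, m)."""
    # Subtracting the per-column max does not change the result but avoids overflow.
    Z_shift = Z - np.max(Z, axis=0, keepdims=True)
    expZ = np.exp(Z_shift)
    return expZ / np.sum(expZ, axis=0, keepdims=True)


def cross_entropy(A, Y_onehot, eps=1e-12):
    """Mean cross entropy between predictions A and one-hot labels, both (n_classes, m)."""
    m = Y_onehot.shape[1]
    # eps keeps log() finite if a predicted probability is exactly 0.
    return -np.sum(Y_onehot * np.log(A + eps)) / m


# Toy example: 3 classes, 2 samples (hypothetical values).
Z = np.array([[2.0, 0.0],
              [1.0, 0.0],
              [0.0, 0.0]])
Y = np.array([[1, 0],
              [0, 0],
              [0, 1]])
A = softmax(Z)
loss = cross_entropy(A, Y)
print(A.sum(axis=0), loss)
```

Each column of `A` sums to 1, and the second sample, whose inputs are all equal, gets a uniform prediction of 1/3 per class.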

3.1 使用Numpy實現

3.1.1 Getting the layer sizes from the training data and labels

Code

def layer_size(X, Y):
    """
    Get number of input and output size, and set hidden layer size
    :param X: input dataset, shape (m, 784)
    :param Y: input labels, shape (m, 1)
    :return:
    n_x -- the size of the input layer
    n_h -- the size of the hidden layer
    n_y -- the size of the output layer
    """

    n_x = X.T.shape[0]
    n_h = 4
    n_y = Y.T.shape[0]

    return n_x, n_h, n_y
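Note that with labels stored as a column of shape (m, 1), `Y.T.shape[0]` is 1, so `layer_size` reports a single output unit. A softmax output over the ten digits would need ten units. One way to get there, which is an assumption here rather than something the post does, is to one-hot encode the labels first. The helper `one_hot` and the small stand-in arrays below are hypothetical.

```python
import numpy as np


def one_hot(Y, num_classes=10):
    """Convert integer labels of shape (m, 1) to one-hot rows of shape (m, num_classes)."""
    Y = np.asarray(Y).reshape(-1)
    encoded = np.zeros((Y.shape[0], num_classes), dtype=np.float64)
    # Set the column matching each label to 1.
    encoded[np.arange(Y.shape[0]), Y] = 1.0
    return encoded


# Stand-ins for the MNIST arrays (hypothetical small batch, m = 5).
X = np.zeros((5, 784), dtype=np.uint8)
Y = np.array([[3], [0], [9], [1], [3]], dtype=np.uint8)

n_x = X.T.shape[0]                  # 784 input features
n_y_raw = Y.T.shape[0]              # 1: raw labels give a single output unit
n_y_onehot = one_hot(Y).T.shape[0]  # 10: one output unit per digit class
print(n_x, n_y_raw, n_y_onehot)     # 784 1 10
```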

3.1.2 Initializing the parameters

Initialize W1, b1, W2, and b2.

W is initialized to non-zero values.

b is initialized to 0 throughout.

Code

def initialize_parameters(n_x, n_h, n_y):
    """
    Initialize parameters
    :param n_x: the size of the input layer
    :param n_h: the size of the hidden layer
    :param n_y: the size of the output layer
    :return: dictionary of parameters
    """

    W1 = np.random.randn(n_h, n_x) * 0.01
    b1 = np.zeros((n_h, 1))
    W2 = np.random.randn(n_y, n_h) * 0.01
    b2 = np.zeros((n_y, 1))

    parameters = {"W1": W1,
                  "b1": b1,
                  "W2": W2,
                  "b2": b2
                  }

    return parameters
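The reason W must be non-zero is symmetry: if every weight starts at 0, all hidden units compute the same value and receive the same gradient update, so they never learn different features. The sketch below illustrates this with a hypothetical random batch standing in for MNIST; the variable names are illustrative only.

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.rand(784, 5)  # hypothetical batch: 5 samples, 784 features each
n_h = 4

# Zero weights: every hidden unit produces exactly the same activations.
W_zero = np.zeros((n_h, 784))
A_zero = np.tanh(W_zero @ X)

# Small random weights, scaled by 0.01 as in initialize_parameters: units differ.
W_rand = rng.randn(n_h, 784) * 0.01
A_rand = np.tanh(W_rand @ X)

rows_identical_zero = np.allclose(A_zero, A_zero[0])
rows_identical_rand = np.allclose(A_rand, A_rand[0])
print(rows_identical_zero, rows_identical_rand)
```

The 0.01 factor also keeps the pre-activations small, so tanh starts out in its near-linear region where gradients are not yet saturated.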

3.1.3 Forward Propagation

ReLu is implemented as \((|Z|+Z)/2\):

def ReLu(Z):
    return (abs(Z) + Z) / 2


def forward_propagation(X, parameters, activation="tanh"):
    """
    Compute the forward propagation
    :param X: input data (m, n_x)
    :param parameters: parameters from initialize_parameters
    :param activation: activation function name, either "tanh" or "relu"
    :return:
        A2: sigmoid output
        cache: caches of forward result
    """

    X = X.T

    W1 = parameters["W1"]
    b1 = parameters["b1"]
    W2 = parameters["W2"]
    b2 = parameters["b2"]

    Z1 = np.dot(W1, X) + b1
    if activation == "tanh":
        A1 = np.tanh(Z1)
    elif activation == "relu":
        A1 = ReLu(Z1)
    else:
        raise Exception('Activation function is not found!')
    Z2 = np.dot(W2, A1) + b2
    A2 = 1 / (1 + np.exp(-Z2))

    cache = {"Z1": Z1,
             "A1": A1,
             "Z2": Z2,
             "A2": A2}

    return A2, cache
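The ReLu expression used above is the usual \(\max(Z, 0)\) in disguise: for \(Z \ge 0\), \(|Z| = Z\) and \((|Z|+Z)/2 = Z\); for \(Z < 0\), \(|Z| = -Z\) and the sum is 0. A quick numerical check against `np.maximum`, with the function renamed `relu_abs` here only to keep the snippet self-contained:

```python
import numpy as np


def relu_abs(Z):
    """ReLU written as (|Z| + Z) / 2, the form used in the post."""
    return (np.abs(Z) + Z) / 2


# Hypothetical pre-activation values covering negative, zero, and positive cases.
Z = np.array([[-2.0, -0.5, 0.0],
              [0.5, 3.0, -1.5]])

same = np.array_equal(relu_abs(Z), np.maximum(Z, 0))
print(same)  # True
```

Both forms are vectorized; `np.maximum(Z, 0)` is the more common spelling and avoids the extra addition and division.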

3.1.4 Compute Cost
