Deep Learning 7_UFLDL Deep Learning Tutorial: Self-Taught Learning_Exercise (Stanford University Deep Learning Tutorial)

阿新 • Published: 2019-01-12

Preface

Exercise environment: Windows 7, MATLAB R2015b, 16 GB RAM, 2 TB hard disk

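First, the 29404 unlabeled examples unlabeledData (the digits 5-9 from the MNIST handwritten digit dataset) are used to train a sparse autoencoder, which yields its weight parameters opttheta. The purpose of this step is to extract features from the data. We do not know in advance which features it extracts (the visualization gives an idea; call them Features), but we do know that the extracted features are simply the hidden-layer activations of the trained sparse autoencoder (that is, the layer-2 activations). Note: every sparse autoencoder in this section is trained with the L-BFGS algorithm.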

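Second, the 15298 labeled examples trainData (the first half of the digits 0-4 from the MNIST handwritten digit dataset) are passed, as the training set, through the trained sparse autoencoder (the one with weight parameters opttheta). This extracts the same features as in the previous step. Call the resulting feature representation trainFeatures; it too is just the hidden-layer activations. If this is still unclear, here is an analogy: suppose the previous step extracted the first-order cumulant of a communication signal A (corresponding to unlabeledData); this step then extracts the first-order cumulant of a communication signal B (corresponding to trainData). The feature is the same, only the object it is computed from differs. Likewise, unlabeledData and trainData yield the same features Features, computed from different data.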

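Note: if unlabeledData was preprocessed in the previous step, all preprocessing parameters (for example, the principal-component matrix U in PCA) must be saved, because the training set trainData in this step and the test set testData in the next step must go through exactly the same preprocessing. In this exercise the MNIST handwritten digit dataset is already preprocessed, so no further preprocessing is needed.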

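Third, the 15298 labeled examples testData (the second half of the digits 0-4 from the MNIST handwritten digit dataset) are passed, as the test set, through the same trained sparse autoencoder (the one with weight parameters opttheta). This again extracts the same features as in the previous step. Call the resulting feature representation testFeatures; it too is just the hidden-layer activations.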

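Fourth, the features trainFeatures extracted in the second step, together with the labels trainLabels of the labeled data trainData, are used as input to train a softmax classifier, which yields the regression model softmaxModel.

Fifth, the features testFeatures extracted in the third step are fed into the trained softmax regression model softmaxModel to predict the classes pred of the labeled data testData. Comparing pred with the true labels testLabels of testData gives the accuracy.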
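The accuracy computation in the fifth step can be illustrated on a tiny example. This is only a sketch: the values of pred and testLabels below are made up for illustration, and only the comparison itself mirrors the last lines of stlExercise.m.

```matlab
% Hypothetical predictions and ground-truth labels (classes 1..5).
pred       = [1 2 3 4 5 1 2 3];
testLabels = [1 2 3 4 5 2 2 1];

% Comparing as column vectors avoids row/column shape mismatches.
correct  = (pred(:) == testLabels(:));   % logical vector, 1 where prediction is right
accuracy = 100 * mean(correct);          % percentage of correct predictions
fprintf('Test Accuracy: %f%%\n', accuracy);
```

Here 6 of the 8 toy predictions match, so the printed accuracy is 75%.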

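In summary, self-taught learning uses unlabeled data and unsupervised learning to learn a feature extractor, and then uses supervised learning on the extracted features to train a classifier.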
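The earlier note about saving and reusing preprocessing parameters can be sketched as follows. This is a minimal illustration on synthetic toy data, assuming mean subtraction followed by PCA; the names mu, U, k and preprocess are hypothetical and do not appear in the exercise code, which needs no preprocessing for MNIST.

```matlab
% Synthetic toy data; each column is one example, as in the exercise.
unlabeledData = sin(reshape(1:200, 4, 50));
trainData     = cos(reshape(1:40, 4, 10));
testData      = cos(reshape(41:80, 4, 10));

% Fit the preprocessing parameters on the unlabeled data ONLY.
mu = mean(unlabeledData, 2);                                  % per-dimension mean
Xc = unlabeledData - repmat(mu, 1, size(unlabeledData, 2));   % centered data
sigma = Xc * Xc' / size(Xc, 2);                               % covariance matrix
[U, S, V] = svd(sigma);                                       % columns of U: principal directions
k = 2;                                                        % hypothetical number of components kept

% Reuse the SAME mu and U for every dataset.
preprocess   = @(X) U(:, 1:k)' * (X - repmat(mu, 1, size(X, 2)));
unlabeledPca = preprocess(unlabeledData);
trainPca     = preprocess(trainData);
testPca      = preprocess(testData);
```

Recomputing mu or U on trainData or testData would put those sets in a different coordinate system from the one the autoencoder was trained in, which is why the parameters are stored once and applied everywhere.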

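When this method applies:

In settings with a large amount of unlabeled data and only a small amount of labeled data, this method is likely to be the most effective. Even when only labeled data is available (in which case the class labels are usually ignored during feature learning), the same idea can give very good results.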

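Some MATLAB functions

numel: returns the total number of elements.

n = numel(A) returns the total number of elements in the array A.

s = size(A): with a single output argument, returns a row vector whose first element is the number of rows of the array and whose second element is the number of columns.

[r,c] = size(A): with two output arguments, size returns the number of rows in the first output variable and the number of columns in the second.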
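A short sketch of these calls on a small array (the array A is made up for illustration):

```matlab
A = zeros(3, 4);      % 3 rows, 4 columns
n = numel(A);         % total number of elements: 12
s = size(A);          % row vector [3 4]
[r, c] = size(A);     % r = 3, c = 4
nCols = size(A, 2);   % size along dimension 2 only, as in size(trainData, 2)
```

The single-dimension form size(A, 2) is the one stlExercise.m uses to count examples, since each column of the data matrices holds one example.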

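round(n) rounds to the nearest integer, just like the rounding taught in school arithmetic; a fractional part of exactly 0.5 is rounded away from zero.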
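A few values make the tie-breaking behaviour concrete (the inputs are chosen only for illustration):

```matlab
a = round(2.4);            % 2
b = round(2.5);            % 3  (tie rounds away from zero)
c = round(-2.5);           % -3 (tie rounds away from zero)
numTrain = round(7 / 2);   % 4, the same pattern as round(numel(labeledSet)/2)
```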

find

Finds the indices and values of nonzero elements.

Syntax:

1. ind = find(X)

2. ind = find(X, k)

3. ind = find(X, k, 'first')

4. ind = find(X, k, 'last')

5. [row,col] = find(X, ...)

6. [row,col,v] = find(X, ...)

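Description:

1. ind = find(X)

Finds all nonzero elements of the matrix X and returns their linear indices (column-major order) in the vector ind.

If X is a row vector, ind is a row vector; otherwise ind is a column vector.

If X contains no nonzero elements or is an empty matrix, ind is an empty matrix.

2. ind = find(X, k) or 3. ind = find(X, k, 'first')

Returns the indices of the first k nonzero elements.

k must be a positive integer, but it can be of any numeric data type.

4. ind = find(X, k, 'last')

Returns the indices of the last k nonzero elements.

5. [row,col] = find(X, ...)

Returns the row and column indices of the nonzero elements of the matrix X.

This syntax is especially useful when working with sparse matrices.

If X is an N-dimensional array with N > 2, col contains linear indices over the columns, counted across all pages.

For example, for a 5*7*3 array X with a single nonzero element X(4,2,3), find returns row=4 and col=16. That is, (7 columns on page 1) + (7 columns on page 2) + (2 columns on page 3) = 16.

6. [row,col,v] = find(X, ...)

Returns a column or row vector v containing the nonzero elements of X, together with their row and column indices.

If X is a logical expression, v is a logical array.

The output vector v contains the nonzero elements of the logical array obtained by evaluating the expression X.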

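For example,

A= magic(4)
A =
16 2 3 13
5 11 10 8
9 7 6 12
4 14 15 1

[r,c,v]= find(A>10);

r', c', v'
ans =
1 2 4 4 1 3 (row indices, in column-major order)
ans =
1 2 2 3 4 4 (column indices, in column-major order)
ans =
1 1 1 1 1 1

Here the returned vector v is a logical array containing N nonzero elements, where N is the number of nonzero elements of (A>10), that is, N = nnz(A>10) = 6.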
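The N-dimensional case described under syntax 5 can be checked directly. The sketch below rebuilds the 5*7*3 example and, as an addition not in the original text, uses ind2sub to recover the full (row, column, page) subscripts from the linear index.

```matlab
X = zeros(5, 7, 3);
X(4, 2, 3) = 1;                 % single nonzero element on page 3

[row, col] = find(X);           % row = 4, col = 16
% col counts columns across pages: 7 (page 1) + 7 (page 2) + 2 = 16

linearIdx = find(X);            % 79 = 4 + (2-1)*5 + (3-1)*35
[r, c, p] = ind2sub(size(X), linearIdx);   % r = 4, c = 2, p = 3
```

When the true per-dimension subscripts are needed for arrays with more than two dimensions, find(X) combined with ind2sub is usually easier to read than decoding col by hand.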

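Examples:

Example 1

X = [1 0 4 -3 0 0 0 8 6];
indices = find(X)

returns the linear indices of the nonzero elements of X.

indices =
1 3 4 8 9

Example 2

You can define X with a logical expression. For example,

find(X > 2)

returns the linear indices of the elements of X that are greater than 2.

ans =
3 8 9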
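The k-limited forms (syntaxes 2 to 4) and the three-output form (syntax 6) can be tried on the same vector X as in Example 1. This is an illustrative sketch; the variable names are arbitrary.

```matlab
X = [1 0 4 -3 0 0 0 8 6];

firstTwo = find(X, 2);            % indices of the first 2 nonzero elements: [1 3]
lastTwo  = find(X, 2, 'last');    % indices of the last 2 nonzero elements:  [8 9]

[r, c, v] = find(X);              % c = [1 3 4 8 9], v = [1 4 -3 8 6]
```

Because X is a row vector, every entry of r is 1 and c coincides with the linear indices from Example 1.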

unique:

unique returns the distinct (non-repeated) elements of a vector, sorted in ascending order.
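A small sketch of how unique is used to count classes, with a made-up label vector standing in for trainLabels:

```matlab
trainLabels = [3 1 5 2 3 1 4 5 2];   % hypothetical labels in the range 1..5
classes     = unique(trainLabels);   % [1 2 3 4 5], duplicates removed and sorted
numClasses  = numel(classes);        % 5
```

This is the same pattern as numel(unique(trainLabels)) in STEP 4 of stlExercise.m.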

Results

The visualization of W1 from the weight parameters opttheta, that is, of the extracted features, is as follows:

Test Accuracy: 98.333115%

Elapsed time is 594.435594 seconds.

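Summary of results:

1. Why does Andrew Ng's team report about 25 minutes for training, while my entire run took just under 10 minutes (594 seconds)? Probably because computers from a few years ago were much less powerful than today's.

2. For comparison, Andrew Ng's team ran an experiment: if the softmax classifier is trained on the raw pixel values (the original data) rather than on the features extracted by this section's sparse autoencoder, accuracy reaches at most 96%. In fact, the only difference between this exercise and the previous one, Deep Learning 6: Softmax Regression_Exercise (Stanford UFLDL Deep Learning Tutorial), is that here the softmax classifier is trained on features extracted by the sparse autoencoder, whereas there it was trained on the raw data. The accuracy we obtained in the previous exercise was only 92.640%, although Andrew Ng's team may well have reached up to 96%.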

Code

stlExercise.m

%% CS294A/CS294W Self-taught Learning Exercise

%  Instructions
%  ------------
% 
%  This file contains code that helps you get started on the
%  self-taught learning. You will need to complete code in feedForwardAutoencoder.m
%  You will also need to have implemented sparseAutoencoderCost.m and 
%  softmaxCost.m from previous exercises.
%
%% ======================================================================
%  STEP 0: Here we provide the relevant parameters values that will
%  allow your sparse autoencoder to get good filters; you do not need to 
%  change the parameters below.
tic
inputSize  = 28 * 28;
numLabels  = 5;
hiddenSize = 200;
sparsityParam = 0.1; % desired average activation of the hidden units.
                     % (This was denoted by the Greek alphabet rho, which looks like a lower-case "p",
                     %  in the lecture notes). 
lambda = 3e-3;       % weight decay parameter       
beta = 3;            % weight of sparsity penalty term   
maxIter = 400;

%% ======================================================================
%  STEP 1: Load data from the MNIST database
%
%  This loads our training and test data from the MNIST database files.
%  We have sorted the data for you in this so that you will not have to
%  change it.

% Load MNIST database files
mnistData   = loadMNISTImages('train-images.idx3-ubyte');
mnistLabels = loadMNISTLabels('train-labels.idx1-ubyte');

% Set Unlabeled Set (All Images)

% Simulate a Labeled and Unlabeled set
labeledSet   = find(mnistLabels >= 0 & mnistLabels <= 4); % indices of the examples whose label is between 0 and 4
unlabeledSet = find(mnistLabels >= 5);

numTrain = round(numel(labeledSet)/2);
trainSet = labeledSet(1:numTrain);
testSet  = labeledSet(numTrain+1:end);

unlabeledData = mnistData(:, unlabeledSet); % unlabeled dataset (digits 5-9)

trainData   = mnistData(:, trainSet); % first half of the digits 0-4, used as labeled training data
trainLabels = mnistLabels(trainSet)' + 1; % Shift Labels to the Range 1-5

testData   = mnistData(:, testSet); % second half of the digits 0-4, used as labeled test data
testLabels = mnistLabels(testSet)' + 1;   % Shift Labels to the Range 1-5

% Output Some Statistics
fprintf('# examples in unlabeled set: %d\n', size(unlabeledData, 2));
fprintf('# examples in supervised training set: %d\n\n', size(trainData, 2));
fprintf('# examples in supervised testing set: %d\n\n', size(testData, 2));

%% ======================================================================
%  STEP 2: Train the sparse autoencoder
%  This trains the sparse autoencoder on the unlabeled training
%  images. 

%  Randomly initialize the parameters theta (drawn from a uniform distribution)
theta = initializeParameters(hiddenSize, inputSize);

%% ----------------- YOUR CODE HERE ----------------------
%  Find opttheta by running the sparse autoencoder on
%  unlabeledTrainingImages
%  Train the sparse autoencoder on the unlabeled dataset using L-BFGS

opttheta = theta; 

addpath minFunc/
options.Method = 'lbfgs';
options.maxIter = 400;
options.display = 'on';
[opttheta, cost] = minFunc( @(p) sparseAutoencoderCost(p, ...
      inputSize, hiddenSize, ...
      lambda, sparsityParam, ...
      beta, unlabeledData), ...
      theta, options);


%% -----------------------------------------------------
                          
% Visualize weights
W1 = reshape(opttheta(1:hiddenSize * inputSize), hiddenSize, inputSize);
display_network(W1');

%%======================================================================
%% STEP 3: Extract Features from the Supervised Dataset
%  
%  You need to complete the code in feedForwardAutoencoder.m so that the 
%  following command will extract features from the data.

trainFeatures = feedForwardAutoencoder(opttheta, hiddenSize, inputSize, ...
                                       trainData);

testFeatures = feedForwardAutoencoder(opttheta, hiddenSize, inputSize, ...
                                       testData);

%%======================================================================
%% STEP 4: Train the softmax classifier

softmaxModel = struct;  
%% ----------------- YOUR CODE HERE ----------------------
%  Use softmaxTrain.m from the previous exercise to train a multi-class
%  classifier. 
%  Train the softmax regression model with L-BFGS on the features extracted
%  from the labeled training set and their labels.

%  Use lambda = 1e-4 for the weight regularization for softmax
lambda = 1e-4;
inputSize = hiddenSize;
numClasses = numel(unique(trainLabels)); % unique returns the distinct elements of a vector, sorted
% You need to compute softmaxModel using softmaxTrain on trainFeatures and
% trainLabels

options.maxIter = 100; % maximum number of iterations
softmaxModel = softmaxTrain(inputSize, numClasses, lambda, ...
                            trainFeatures, trainLabels, options);





%% -----------------------------------------------------


%%======================================================================
%% STEP 5: Testing 

%% ----------------- YOUR CODE HERE ----------------------
% Compute Predictions on the test set (testFeatures) using softmaxPredict
% and softmaxModel

[pred] = softmaxPredict(softmaxModel, testFeatures);



%% -----------------------------------------------------

% Classification Score
fprintf('Test Accuracy: %f%%\n', 100*mean(pred(:) == testLabels(:)));
toc
% (note that we shift the labels by 1, so that digit 0 now corresponds to
%  label 1)
%
% Accuracy is the proportion of correctly classified images
% The results for our implementation was:
%
% Accuracy: 98.3%
%
%

feedForwardAutoencoder.m

function [activation] = feedForwardAutoencoder(theta, hiddenSize, visibleSize, data)

% theta: trained weights from the autoencoder
% visibleSize: the number of input units (probably 64) 
% hiddenSize: the number of hidden units (probably 25) 
% data: Our matrix containing the training data as columns.  So, data(:,i) is the i-th training example. 
  
% We first convert theta to the (W1, W2, b1, b2) matrix/vector format, so that this 
% follows the notation convention of the lecture notes. 

W1 = reshape(theta(1:hiddenSize*visibleSize), hiddenSize, visibleSize);
b1 = theta(2*hiddenSize*visibleSize+1:2*hiddenSize*visibleSize+hiddenSize);

%% ---------- YOUR CODE HERE --------------------------------------
%  Instructions: Compute the activation of the hidden layer for the Sparse Autoencoder.

activation  = sigmoid(W1*data+repmat(b1,[1,size(data,2)]));
%-------------------------------------------------------------------

end

%-------------------------------------------------------------------
% Here's an implementation of the sigmoid function, which you may find useful
% in your computation of the costs and the gradients.  This inputs a (row or
% column) vector (say (z1, z2, z3)) and returns (f(z1), f(z2), f(z3)). 

function sigm = sigmoid(x)
    sigm = 1 ./ (1 + exp(-x));
end
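The index arithmetic in feedForwardAutoencoder.m relies on theta being packed as [W1(:); W2(:); b1; b2]. This is the layout assumed by the slicing above, but it is not spelled out in this post. The following sketch checks that slicing on a tiny hand-built theta; all sizes and values are hypothetical and chosen only so the result is easy to verify.

```matlab
hiddenSize  = 2;
visibleSize = 3;

% Hand-built toy parameters, packed in the assumed order W1, W2, b1, b2.
W1 = reshape(1:6,  hiddenSize,  visibleSize);   % 2x3
W2 = reshape(7:12, visibleSize, hiddenSize);    % 3x2
b1 = [0.5; -0.5];
b2 = zeros(visibleSize, 1);
theta = [W1(:); W2(:); b1(:); b2(:)];

% Same slicing as in feedForwardAutoencoder.m.
W1r = reshape(theta(1:hiddenSize*visibleSize), hiddenSize, visibleSize);
b1r = theta(2*hiddenSize*visibleSize+1 : 2*hiddenSize*visibleSize+hiddenSize);

% Hidden-layer activations for two toy examples stored as columns.
sigmoid    = @(z) 1 ./ (1 + exp(-z));
data       = [0 1; 0 0; 0 0];
activation = sigmoid(W1r * data + repmat(b1r, 1, size(data, 2)));
```

The first example is all zeros, so its activations reduce to sigmoid(b1). The second example selects the first column of W1, so both hidden units get the pre-activation 1.5.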

