A First Look at LSTM: Transfer Learning Notes (Part 4)

阿新 • Published: 2018-12-11

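1. Recurrent Neural Networks (RNN)

A traditional neural network is connected only in depth: adjacent layers are connected to each other, but nodes inside the same layer are not, so the network struggles with data in which earlier and later items depend on one another. An RNN addresses this by also adding connections in breadth, that is, between the hidden layer at one time step and the hidden layer at the next.

Specifically, an RNN remembers earlier information and uses it when computing the current output: the input to the hidden layer includes not only the output of the input layer but also the hidden layer's own output from the previous time step.

[Figure: how an RNN works]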
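The recurrence described above can be sketched in a few lines. This is a minimal illustration with scalar values, not the code used later in this post; the function names and the weights `w_x`, `w_h`, `b` are made up for the example.

```python
import math

def rnn_step(x_t, h_prev, w_x, w_h, b):
    """One step of a scalar RNN: h_t = tanh(w_x * x_t + w_h * h_prev + b)."""
    return math.tanh(w_x * x_t + w_h * h_prev + b)

def run_rnn(inputs, w_x=0.5, w_h=0.8, b=0.0):
    """Feed a sequence through the cell, carrying the hidden state forward."""
    h = 0.0  # initial hidden state: nothing has been seen yet
    states = []
    for x_t in inputs:
        h = rnn_step(x_t, h, w_x, w_h, b)
        states.append(h)
    return states

# Both sequences end with the same input (0.0), yet the final hidden states
# differ, because each state still carries information from the first input.
states_a = run_rnn([1.0, 0.0])
states_b = run_rnn([-1.0, 0.0])
print(states_a[-1], states_b[-1])
```

The only thing that distinguishes this from a feed-forward layer is the `w_h * h_prev` term, which feeds the previous hidden output back in.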

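2. LSTM Networks

When the relevant information lies far away from the point where the prediction is made, a plain RNN has difficulty learning the connection; this is known as the long-term dependency problem. To handle it well, we turn to the LSTM network, which is a special kind of RNN.

[Figure: structure of the LSTM module]

[Figure: meaning of the symbols used in the module diagram]

An LSTM uses "gate" structures (an input gate, a forget gate, and an output gate) to let information selectively influence the network's state at each time step. A gate here is a sigmoid neural network layer followed by an element-wise multiplication. It is called a gate because a fully connected layer with a sigmoid activation outputs values between 0 and 1, which describe how much of the incoming information may pass through: a value near 0 blocks it, and a value near 1 lets it through.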
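The gate mechanism itself is small enough to show directly. In the sketch below the gate's pre-activation values (`gate_logits`) are supplied by hand for illustration; in a real LSTM they would come from a fully connected layer. The helper name `apply_gate` is hypothetical.

```python
import math

def sigmoid(z):
    """Logistic function, mapping any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def apply_gate(gate_logits, values):
    """Scale each value by sigmoid(logit), element by element."""
    gate = [sigmoid(z) for z in gate_logits]
    return [g * v for g, v in zip(gate, values)]

values = [2.0, 2.0, 2.0]
gate_logits = [-10.0, 0.0, 10.0]  # nearly closed, half open, nearly fully open
gated = apply_gate(gate_logits, values)
print(gated)
```

The first value is almost entirely suppressed, the second is halved, and the third passes through nearly unchanged.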

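The forget gate lets the network discard earlier information that is no longer useful. The input gate decides, based on the current input x_t, the previous state c_{t-1}, and the previous output h_{t-1}, which new information enters the current state c_t.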
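To make the roles of the gates concrete, here is a sketch of one LSTM step with scalar values. It assumes the common formulation in which each gate is computed from x_t and h_{t-1}, while the previous state c_{t-1} enters through the forget-gate product; other variants exist. The parameter dictionary and its values are hypothetical, chosen so that the cell holds on to its memory.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x_t, h_prev, c_prev, p):
    """One step of a scalar LSTM cell.

    p maps each of 'f', 'i', 'o', 'g' to a (w_x, w_h, b) tuple.
    """
    def pre(name):
        w_x, w_h, b = p[name]
        return w_x * x_t + w_h * h_prev + b

    f = sigmoid(pre('f'))     # forget gate: how much of c_prev to keep
    i = sigmoid(pre('i'))     # input gate: how much new content to admit
    o = sigmoid(pre('o'))     # output gate: how much of the state to expose
    g = math.tanh(pre('g'))   # candidate content computed from the input
    c_t = f * c_prev + i * g  # new cell state
    h_t = o * math.tanh(c_t)  # new output
    return h_t, c_t

# Hypothetical parameters: a large forget bias keeps the old state,
# and a very negative input bias blocks almost all new content.
params = {
    'f': (0.0, 0.0, 10.0),
    'i': (0.0, 0.0, -10.0),
    'o': (0.0, 0.0, 0.0),
    'g': (1.0, 0.0, 0.0),
}
h, c = 0.0, 1.0
for x_t in [5.0, -3.0, 2.0]:
    h, c = lstm_step(x_t, h, c, params)
print(c)
```

With these settings the cell state stays close to its starting value of 1.0 regardless of the inputs, which is the behaviour that lets an LSTM carry information across many steps.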

3. Code Example

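# Use the MNIST dataset to show how to build a simple LSTM.
# Note: this code targets the TensorFlow 1.x API (tf.contrib is not available in TensorFlow 2.x).
import numpy as np
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

# Load the data and take a first look at it
mnist = input_data.read_data_sets('MNIST_data/', one_hot=True)
print(mnist.train.images.shape)

# Classify MNIST with an LSTM: treat each row of a 28*28 image as one step of a
# pixel sequence, so consecutive rows carry the temporal information (28 steps).
# Hyperparameters
lr = 0.001
training_inter = 100000  # total number of training samples to process
batch_size = 128
# display_step = 10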

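n_input = 28    # image width: pixels fed in at each step
n_step = 28     # image height: number of steps (rows)
n_hidden = 128  # number of LSTM hidden units
n_classes = 10  # digits 0-9

# Placeholders: x is (batch, n_step, n_input), y is one-hot (batch, n_classes)
x = tf.placeholder(tf.float32, [None, n_step, n_input])
y = tf.placeholder(tf.float32, [None, n_classes])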

weights = {
    'in': tf.Variable(tf.random_normal([n_input, n_hidden])), # (28, 128)
    'out': tf.Variable(tf.random_normal([n_hidden, n_classes])) # (128, 10)
}
biases = {
    'in': tf.Variable(tf.constant(0.1, shape=[n_hidden])),
    'out': tf.Variable(tf.constant(0.1, shape=[n_classes]))
}

def RNN(x, weights, biases):
    # x is 3-D and must be flattened to 2-D before multiplying by the weight matrix
    # x: (128, 28, 28) -> (128*28, 28)
    X = tf.reshape(x, [-1, n_input])
    X_in = tf.matmul(X, weights['in']) + biases['in'] # (128*28, 128)
    X_in = tf.reshape(X_in, [-1, n_step, n_hidden]) # (128, 28, 128)
    # Define the LSTM cell
    lstm_cell = tf.contrib.rnn.BasicLSTMCell(n_hidden)
    # zero_state fixes the batch size, so every batch fed in must hold batch_size samples
    init_state = lstm_cell.zero_state(batch_size, dtype=tf.float32)
    outputs, final_state = tf.nn.dynamic_rnn(lstm_cell, X_in, initial_state=init_state, time_major=False)
    # final_state is a (c, h) tuple; final_state[1] is the output h at the last step
    results = tf.matmul(final_state[1], weights['out']) + biases['out']
    return results

pre = RNN(x, weights, biases)
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=pre))
train_op = tf.train.AdamOptimizer(lr).minimize(cost)

correct_pred = tf.equal(tf.argmax(pre, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))

init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)
    step = 0
    while step*batch_size < training_inter:
        batch_xs, batch_ys = mnist.train.next_batch(batch_size)
        batch_xs = batch_xs.reshape([batch_size, n_step, n_input])
        sess.run([train_op], feed_dict={x: batch_xs, y: batch_ys})
        if step % 20 == 0:
            print(sess.run(accuracy, feed_dict={x: batch_xs, y: batch_ys}))
        step += 1

The results are as follows (accuracy on the current training batch, printed every 20 steps):

0.21875
0.6875
0.765625
0.828125
0.875
0.8515625
0.875
0.8671875
0.921875
0.890625
0.8671875
0.921875
0.9296875
0.921875
0.96875
0.9140625
0.96875
0.984375
0.9609375
0.9453125
0.9453125
0.953125
0.9609375
0.9453125
0.96875
0.9453125
0.9609375
0.9609375
0.96875
0.9609375
0.984375
0.9375
0.9609375
0.9609375
0.953125
0.9765625
0.9765625
0.9609375
0.9765625
0.9453125
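The reshaping at the start of RNN() in the code above is easy to get wrong, so it can help to check the shapes in isolation. The sketch below uses NumPy with zero-filled arrays and a small batch of 4 (both choices are only for illustration) to trace the same three steps: flatten, project, and restore the step dimension.

```python
import numpy as np

# Small batch for illustration; the post itself uses batch_size = 128.
batch_size, n_step, n_input, n_hidden = 4, 28, 28, 128

x = np.zeros((batch_size, n_step, n_input), dtype=np.float32)  # (4, 28, 28)
w_in = np.zeros((n_input, n_hidden), dtype=np.float32)         # (28, 128)
b_in = np.zeros(n_hidden, dtype=np.float32)                    # (128,)

flat = x.reshape(-1, n_input)                   # (4*28, 28): one image row per line
projected = flat @ w_in + b_in                  # (4*28, 128): each row mapped to n_hidden
x_in = projected.reshape(-1, n_step, n_hidden)  # (4, 28, 128): (batch, step, feature)

print(flat.shape, projected.shape, x_in.shape)
```

The input projection is applied to every image row independently, which is why the batch and step dimensions can be merged before the matrix multiplication and split again afterwards.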
