
Implementing Neural Networks in TensorFlow

1. Introduction

1.1 Neural Network Terminology

1. Bias: a constant offset added to a neuron's weighted input before the activation function is applied.

2. Activation functions: the sigmoid function, the tanh function, and the ReLU function (a short sketch follows this list).

3. Loss functions: mean squared error (MSE), cross-entropy, log-likelihood, exponential loss (exp-loss), hinge loss, 0-1 loss, absolute-value loss.

4. Backpropagation optimization algorithms: stochastic gradient descent (SGD), Momentum, Nesterov, Adagrad, Adadelta, Adam, Adamax, Nadam.
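
All three activations listed in item 2 are available as TensorFlow ops; a minimal sketch, assuming TensorFlow 1.x:

import tensorflow as tf

z = tf.constant([-1.0, 0.0, 2.0])
with tf.Session() as sess:
    print(sess.run(tf.sigmoid(z)))   # sigmoid squashes values into (0, 1)
    print(sess.run(tf.tanh(z)))      # tanh squashes values into (-1, 1)
    print(sess.run(tf.nn.relu(z)))   # ReLU keeps positives, zeroes out negatives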

1.1.3 Loss Functions in TensorFlow

#MSE
mse = tf.reduce_sum(tf.square(y_ - y))
#cross entropy
cross_entropy = -tf.reduce_mean(y_ * tf.log(tf.clip_by_value(y, 1e-10, 1.0)))
#cross entropy after softmax regression (labels first, logits second, averaged over the batch)
cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y))

1.2 The Neural Network Workflow

-Extract feature vectors for the entities;
-Define the network structure;
-Learn the parameters from the training data;
-Use the trained network to predict on unseen data.

2. Implementing Forward Propagation

(Figure: forward propagation through a small example network.)

Key points:
1. x = tf.constant([[0.7, 0.9]]) has shape [1, 2]; it is equivalent to
x = tf.constant([0.7, 0.9], shape=[1, 2])
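
The example these notes refer to (presumably example 2.1) is not shown above; a minimal sketch of such a forward pass, assuming TensorFlow 1.x:

import tensorflow as tf

# Two weight matrices for a 2-3-1 network, same initialization as example 2.2 below.
w1 = tf.Variable(tf.random_normal([2, 3], stddev=1, seed=1))
w2 = tf.Variable(tf.random_normal([3, 1], stddev=1, seed=1))

# A constant input of shape [1, 2].
x = tf.constant([[0.7, 0.9]])

# Forward propagation: two matrix multiplications.
a = tf.matmul(x, w1)
y = tf.matmul(a, w2)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(y))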

2.2 Example 2

# -*- coding: utf-8 -*-
"""
Created on Sun Dec 10 21:53:08 2017

@author: RoFun
"""

import tensorflow as tf

w1=tf.Variable(tf.random_normal([2,3],stddev=1,seed=1))
#b1=tf.Variable(tf.constant(0,shape=[3]))
w2=tf.Variable(tf.random_normal([3,1],stddev=1,seed=1))
#b2=tf.Variable(tf.constant(0,shape=[1]))
#x=tf.constant([[0.7,0.9]])
x=tf.placeholder(tf.float32,name='input')
a=tf.matmul(x,w1)
y=tf.matmul(a,w2)
sess=tf.Session()
init_op=tf.global_variables_initializer()
sess.run(init_op)
print(sess.run(y,feed_dict={x:[[0.7,0.9]]}))
sess.close()

Output

runfile('G:/tensorflow/qianxiang.py', wdir='G:/tensorflow')
[[ 3.95757794]]
Key points:
1. tf.placeholder() is used to hold the input data, and its shape attribute can be left undefined (a batched-input sketch follows);
2. feed_dict is the dictionary (map) of values to feed into the placeholders.
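
A minimal sketch, assuming TensorFlow 1.x, of how a placeholder with shape (None, 2) accepts a whole batch through feed_dict:

import tensorflow as tf

w1 = tf.Variable(tf.random_normal([2, 3], stddev=1, seed=1))
w2 = tf.Variable(tf.random_normal([3, 1], stddev=1, seed=1))

# None in the first dimension lets the same graph accept any batch size.
x = tf.placeholder(tf.float32, shape=(None, 2), name='input')
a = tf.matmul(x, w1)
y = tf.matmul(a, w2)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # Feed three samples at once through feed_dict.
    print(sess.run(y, feed_dict={x: [[0.7, 0.9], [0.1, 0.4], [0.5, 0.8]]}))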

3. Training the Neural Network Model

The network in the previous section used randomly initialized parameters. To solve real classification and regression problems with a neural network, we usually need to learn the parameters through supervised training.

3.1 Training Steps

Training steps:
1. Define the network structure and the forward-propagation output;
2. Define the loss function and the backpropagation optimization algorithm;
3. Create a session and repeatedly run the backpropagation optimizer on the training data.
#define the loss function
cross_entropy=-tf.reduce_mean(y_*tf.log(tf.clip_by_value(y,1e-10,1.0)))
#define the learning rate
learning_rate=0.001
#define the backpropagation algorithm
train_step=tf.train.AdamOptimizer(learning_rate).minimize(cross_entropy)

3.2 demo

This demo is a neural network that applies only linear transformations to its input.

#Complete neural-network example program
import tensorflow as tf
from numpy.random import RandomState

#1. Define the network parameters and the input/output nodes.
batch_size = 8
w1= tf.Variable(tf.random_normal([2, 3], stddev=1, seed=1))
w2= tf.Variable(tf.random_normal([3, 1], stddev=1, seed=1))
x = tf.placeholder(tf.float32, shape=(None, 2), name="x-input")
y_= tf.placeholder(tf.float32, shape=(None, 1), name='y-input')

#2. Define forward propagation, the loss function, and the backpropagation algorithm.
a = tf.matmul(x, w1)
y = tf.matmul(a, w2)
cross_entropy = -tf.reduce_mean(y_ * tf.log(tf.clip_by_value(y, 1e-10, 1.0))) 
train_step = tf.train.AdamOptimizer(0.001).minimize(cross_entropy)

#3. Generate a simulated dataset.
rdm = RandomState(1)
X = rdm.rand(128,2)
Y = [[int(x1+x2 < 1)] for (x1, x2) in X]

#4. Create a session to run the TensorFlow program.
with tf.Session() as sess:
    init_op = tf.global_variables_initializer()
    sess.run(init_op)

    # Print the current (untrained) parameter values.
    print("w1:", sess.run(w1))
    print("w2:", sess.run(w2))
    print("\n")

    # Train the model.
    STEPS = 5000
    for i in range(STEPS):
        start = (i*batch_size) % 128
        end = (i*batch_size) % 128 + batch_size
        sess.run(train_step, feed_dict={x: X[start:end], y_: Y[start:end]})
        if i % 1000 == 0:
            total_cross_entropy = sess.run(cross_entropy, feed_dict={x: X, y_: Y})
            print("After %d training step(s), cross entropy on all data is %g" % (i, total_cross_entropy))

    # Print the parameter values after training.
    print("\n")
    print("w1:", sess.run(w1))
    print("w2:", sess.run(w2))

Output

w1: [[-0.81131822  1.48459876  0.06532937]
 [-2.4427042   0.0992484   0.59122431]]
w2: [[-0.81131822]
 [ 1.48459876]
 [ 0.06532937]]


After 0 training step(s), cross entropy on all data is 0.0674925
After 1000 training step(s), cross entropy on all data is 0.0163385
After 2000 training step(s), cross entropy on all data is 0.00907547
After 3000 training step(s), cross entropy on all data is 0.00714436
After 4000 training step(s), cross entropy on all data is 0.00578471


w1: [[-1.9618274   2.58235407  1.68203783]
 [-3.46817183  1.06982327  2.11788988]]
w2: [[-1.8247149 ]
 [ 2.68546653]
 [ 1.41819513]]

Key points:

1. tf.reduce_mean(input_tensor, reduction_indices=None, keep_dims=False, name=None)

Purpose: computes the mean of a tensor.

Argument 1 -- input_tensor: the tensor to reduce.

Argument 2 -- reduction_indices: the dimension(s) to average over (a short sketch follows this list).

2. Commonly used optimizers: tf.train.GradientDescentOptimizer(), tf.train.AdamOptimizer(), tf.train.MomentumOptimizer().
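
A minimal sketch, assuming TensorFlow 1.x, of how reduction_indices changes what tf.reduce_mean averages over:

import tensorflow as tf

m = tf.constant([[1., 2.], [3., 4.]])
with tf.Session() as sess:
    print(sess.run(tf.reduce_mean(m)))                         # 2.5, mean over all elements
    print(sess.run(tf.reduce_mean(m, reduction_indices=[0])))  # [2. 3.], mean over dimension 0 (per column)
    print(sess.run(tf.reduce_mean(m, reduction_indices=[1])))  # [1.5 3.5], mean over dimension 1 (per row)

The optimizers listed above all expose the same minimize() interface, so tf.train.AdamOptimizer(0.001) in train_step can be swapped for, say, tf.train.MomentumOptimizer(0.001, momentum=0.9) without changing the rest of the graph.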

4. Complete Forward Neural-Network Examples

4.1 Example 1

The following program still uses a linear network; the main addition is reading the data from a CSV file.

#-Prediction based on a neural-network model

#Complete neural-network example program
import tensorflow as tf
from numpy.random import RandomState

#1. Define the network parameters and the input/output nodes.
batch_size = 8

w1= tf.Variable(tf.random_normal([6, 7], stddev=1, seed=1))
w2= tf.Variable(tf.random_normal([7, 1], stddev=1, seed=1))
x = tf.placeholder(tf.float32,  name="x-input")
y_= tf.placeholder(tf.float32,  name='y-input')

#2. Define forward propagation, the loss function, and the backpropagation algorithm.
a = tf.matmul(x, w1)
y = tf.matmul(a, w2)
#cross_entropy = -tf.reduce_mean(y_ * tf.log(tf.clip_by_value(y, 1e-10, 1.0))) 

mse = tf.reduce_sum(tf.square(y_ -  y))
#train_step = tf.train.AdamOptimizer(0.001).minimize(cross_entropy)
train_step = tf.train.AdamOptimizer(0.001).minimize(mse)

# #3. Generate a simulated dataset.
# rdm = RandomState(1)
# X = rdm.rand(128,2)
# Y = [[int(x1+x2 < 1)] for (x1, x2) in X]



#3. Read the CSV into dictionaries x, y
import csv

# Read the CSV into a dictionary
csvFile = open(r'G:\訓練小樣本.csv', "r")
reader = csv.reader(csvFile)
#print(reader)

# Create an empty dictionary
result = {}

i=0
for item in reader:
    if reader.line_num==1:
        continue
    result[i]=item
    i=i+1

 # Create empty dictionaries
j=0
xx={}
yy={}
for i in list(range(29)):
    xx[j]=result[i][1:-1]
    yy[j]=result[i][-1]
    # print(x[j])
    # print(y[j])
    j=j+1

csvFile.close()

##3.1 Convert the dictionaries to lists
X=[]
Y=[]
for i in xx.values():
    X.append(i)

#for j in xx.values():
#    X.append(j)    
for j in yy.values():
    Y.append(j)  

#4. Create a session to run the TensorFlow program.
with tf.Session() as sess:
    init_op = tf.global_variables_initializer()
    sess.run(init_op)

    # Print the current (untrained) parameter values.
    print("w1:", sess.run(w1))
    print("w2:", sess.run(w2))
    print("\n")

    # Train the model.
    STEPS = 4
    for i in range(STEPS):
        start = (i*batch_size) % 29
        end = min((i*batch_size) % 29 + batch_size, 29)
        sess.run(train_step, feed_dict={x: X[start:end], y_: Y[start:end]})
            # total_cross_entropy = sess.run(cross_entropy, feed_dict={x: X, y_: Y})
            # print("After %d training step(s), cross entropy on all data is %g" % (i, total_cross_entropy))
        total_mse=sess.run(mse,feed_dict={x: X, y_: Y})
        print("After %d training step(s), mse on all data is %g" % (i, total_mse))

    # Print the parameter values after training.
    print("\n")
    print("w1:", sess.run(w1))
    print("w2:", sess.run(w2))

4.2 Example 2

The following demo mainly adds biases and activation functions, so the network can handle non-linear problems.

import tensorflow as tf

#weights=tf.Variable(tf.random_normal([2.0,3.0],stddev=2))

w1=tf.Variable(tf.random_normal([2,3],stddev=1,seed=1))
b1=tf.Variable(tf.constant(0.0,shape=[3]))

w2=tf.Variable(tf.random_normal([3,1],stddev=1,seed=1))
b2=tf.Variable(tf.constant(0.0,shape=[1]))

x=tf.constant([[0.7,0.9]])

a=tf.nn.relu(tf.matmul(x,w1)+b1)
y=tf.nn.relu(tf.matmul(a,w2)+b2)
sess=tf.Session()

init_op=tf.global_variables_initializer()
sess.run(init_op)

print(sess.run(y))
sess.close()

5. Neural-Network Optimization Algorithms

Forward propagation alone is not enough; we also need backpropagation and other optimization algorithms to train the network's parameters.
Below is a brief introduction to backpropagation.
Backpropagation uses an optimization algorithm, such as gradient descent, to adjust all of the network's parameters.

5.1 Gradient Descent

Update rule: θ ← θ − η · ∂J(θ)/∂θ, where η is the learning rate and J(θ) is the loss function.
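
For example, with J(θ) = θ², a starting point θ = 5 and learning rate η = 0.3, one update gives θ ← 5 − 0.3 × (2 × 5) = 2. The learning-rate demo in section 5.3 uses exactly this J(θ) and shows what happens when η is set too large.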

5.2 Stochastic Gradient Descent (SGD)

Gradient descent over the full dataset is often computationally expensive, so we use stochastic gradient descent (SGD) instead.

SGD updates the parameters using only one sample, or a small mini-batch, at a time. Compared with updating on a single random sample, mini-batch training fluctuates less and is more stable and efficient; it is essentially the most commonly used optimization approach.

5.3 Setting the Learning Rate

#Setting the learning rate

import tensorflow as tf
TRAINING_STEPS = 10
LEARNING_RATE = 1
x = tf.Variable(tf.constant(5, dtype=tf.float32), name="x")
y = tf.square(x)

train_op = tf.train.GradientDescentOptimizer(LEARNING_RATE).minimize(y)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for i in range(TRAINING_STEPS):
        sess.run(train_op)
        x_value = sess.run(x)
        print("After %s iteration(s): x%s is %f."% (i+1, i+1, x_value))

5.4 Regularization for Overfitting

import tensorflow as tf
import matplotlib.pyplot as plt
import numpy as np

data = []
label = []
np.random.seed(0)

# Split the points into two classes by the unit circle centered at the origin, and add random noise.
for i in range(150):
    x1 = np.random.uniform(-1,1)
    x2 = np.random.uniform(0,2)
    if x1**2 + x2**2 <= 1:
        data.append([np.random.normal(x1, 0.1),np.random.normal(x2,0.1)])
        label.append(0)
    else:
        data.append([np.random.normal(x1, 0.1), np.random.normal(x2, 0.1)])
        label.append(1)

data = np.hstack(data).reshape(-1,2)
label = np.hstack(label).reshape(-1, 1)
plt.scatter(data[:,0], data[:,1], c=label,
           cmap="RdBu", vmin=-.2, vmax=1.2, edgecolor="white")
plt.show()



def get_weight(shape, lambda1):
    var = tf.Variable(tf.random_normal(shape), dtype=tf.float32)
    tf.add_to_collection('losses', tf.contrib.layers.l2_regularizer(lambda1)(var))
    return var

x = tf.placeholder(tf.float32, shape=(None, 2))
y_ = tf.placeholder(tf.float32, shape=(None, 1))
sample_size = len(data)

# Number of nodes in each layer
layer_dimension = [2,10,5,3,1]

n_layers = len(layer_dimension)

cur_layer = x
in_dimension = layer_dimension[0]

# Build the network structure in a loop
for i in range(1, n_layers):
    out_dimension = layer_dimension[i]
    weight = get_weight([in_dimension, out_dimension], 0.003)
    bias = tf.Variable(tf.constant(0.1, shape=[out_dimension]))
    cur_layer = tf.nn.elu(tf.matmul(cur_layer, weight) + bias)
    in_dimension = layer_dimension[i]

y= cur_layer

# Definition of the loss function.
mse_loss = tf.reduce_sum(tf.pow(y_ - y, 2)) / sample_size
tf.add_to_collection('losses', mse_loss)
loss = tf.add_n(tf.get_collection('losses'))

# Define the training objective (mse_loss), the number of training steps, and train the model
train_op = tf.train.AdamOptimizer(0.001).minimize(mse_loss)
TRAINING_STEPS = 40000

with tf.Session() as sess:
    tf.global_variables_initializer().run()
    for i in range(TRAINING_STEPS):
        sess.run(train_op, feed_dict={x: data, y_: label})
        if i % 2000 == 0:
            print("After %d steps, mse_loss: %f" % (i,sess.run(mse_loss, feed_dict={x: data, y_: label})))

    # Plot the decision boundary after training
    xx, yy = np.mgrid[-1.2:1.2:.01, -0.2:2.2:.01]
    grid = np.c_[xx.ravel(), yy.ravel()]
    probs = sess.run(y, feed_dict={x:grid})
    probs = probs.reshape(xx.shape)

plt.scatter(data[:,0], data[:,1], c=label,
           cmap="RdBu", vmin=-.2, vmax=1.2, edgecolor="white")
plt.contour(xx, yy, probs, levels=[.5], cmap="Greys", vmin=0, vmax=.1)
plt.show()
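
Note that train_op above minimizes mse_loss only, so the L2 penalties collected in the 'losses' collection never influence training. To actually apply the regularization, the optimizer would minimize the combined loss defined above instead, e.g.:

train_op = tf.train.AdamOptimizer(0.001).minimize(loss)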

5.5 The Exponential Moving-Average Model

#Exponential moving-average model

import tensorflow as tf

v1 = tf.Variable(0, dtype=tf.float32)
step = tf.Variable(0, trainable=False)
ema = tf.train.ExponentialMovingAverage(0.99, step)
maintain_averages_op = ema.apply([v1]) 

with tf.Session() as sess:

    # Initialization
    init_op = tf.global_variables_initializer()
    sess.run(init_op)
    print(sess.run([v1, ema.average(v1)]))

    # Update the value of variable v1
    sess.run(tf.assign(v1, 5))
    sess.run(maintain_averages_op)
    print(sess.run([v1, ema.average(v1)]))

    # Update step and the value of v1
    sess.run(tf.assign(step, 10000))  
    sess.run(tf.assign(v1, 10))
    sess.run(maintain_averages_op)
    print(sess.run([v1, ema.average(v1)]))       

    # Update the moving average of v1 once more
    sess.run(maintain_averages_op)
    print(sess.run([v1, ema.average(v1)]))       

Output:

[0.0, 0.0]
[5.0, 4.5]
[10.0, 4.5549998]
[10.0, 4.6094499]

Key points:

1. tf.train.ExponentialMovingAverage(decay, step): creates shadow (moving-average) variables; decay is the decay rate, and step (num_updates) is a parameter that influences the decay actually applied, which is min(decay, (1 + step)/(10 + step)).
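
This explains the output above: with step = 0 the effective decay is min(0.99, 1/10) = 0.1, so the shadow value becomes 0.1 × 0 + 0.9 × 5 = 4.5; with step = 10000 the effective decay is 0.99, giving 0.99 × 4.5 + 0.01 × 10 = 4.555, and one more update gives 0.99 × 4.555 + 0.01 × 10 ≈ 4.60945.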

6. A Deeper Neural Network Trained with Backpropagation

#tch - implementation based on a deep neural network

#Tianchi - prediction based on a neural-network model

import tensorflow as tf
from numpy.random import RandomState

#1. Define the network parameters and the input/output nodes.
batch_size = 8

# w1= tf.Variable(tf.random_normal([5, 7], stddev=1, seed=1))
# w2= tf.Variable(tf.random_normal([7, 1], stddev=1, seed=1))
x = tf.placeholder(tf.float32,  name="x-input")
y_= tf.placeholder(tf.float32,  name='y-input')
# b1=tf.Variable(tf.constant([0.0,0.0,0.0,0.0,0.0,0.0,0.0],shape=[7]))
# b2=tf.Variable(tf.constant(0.0,shape=[1]))

#2.1 Compute the loss for a 3-layer network with L2 regularization

def get_weight(shape, lambda1):
    var = tf.Variable(tf.random_normal(shape), dtype=tf.float32)
    tf.add_to_collection('losses', tf.contrib.layers.l2_regularizer(lambda1)(var))
    return var

sample_size = 8
# Number of nodes in each layer
layer_dimension = [5,7,1]

n_layers = len(layer_dimension)#3

cur_layer = x
in_dimension = layer_dimension[0] #5

# Build the network structure in a loop
for i in range(1, n_layers):
    out_dimension = layer_dimension[i]
    weight = get_weight([in_dimension, out_dimension], 0.003)
    bias = tf.Variable(tf.constant(0.1, shape=[out_dimension]))
    cur_layer = tf.nn.elu(tf.matmul(cur_layer, weight) + bias)
    in_dimension = layer_dimension[i]

y= cur_layer

# a = tf.nn.relu(tf.matmul(x, w1)+b1)
# y = tf.nn.relu(tf.matmul(a, w2)+b2)

# Definition of the regularized loss function.
mse_loss = tf.reduce_sum(tf.pow(y_ - y, 2)) / sample_size
tf.add_to_collection('losses', mse_loss)
loss = tf.add_n(tf.get_collection('losses'))
#train_step = tf.train.AdamOptimizer(0.001).minimize(loss)

TRAINING_STEPS = 10
#LEARNING_RATE = 1
global_step=tf.Variable(0)
LEARNING_RATE=tf.train.exponential_decay(0.8,global_step,100,0.96,staircase=True)
train_step = tf.train.GradientDescentOptimizer(LEARNING_RATE).minimize(loss,global_step=global_step)

#2.3 Read the CSV into dictionaries x, y
import csv

# Read the CSV into a dictionary
csvFile = open(r'G:\0研究生\tianchiCompetition\訓練小樣本.csv', "r")
reader = csv.reader(csvFile)
#print(reader)

# Create an empty dictionary
result = {}

i=0
for item in reader:
    if reader.line_num==1:
        continue
    result[i]=item
    i=i+1

 # Create empty dictionaries
j=0
xx={}
yy={}
for i in list(range(29)):
    xx[j]=result[i][1:-1]
    yy[j]=result[i][-1]
    # print(x[j])
    # print(y[j])
    j=j+1

csvFile.close()

##Convert the dictionaries to lists
X=[]
Y=[]
for i in xx.values():
    X.append(i)

for j in yy.values():
    Y.append(j)


#4. Create a session to run the TensorFlow program.
with tf.Session() as sess:
    init_op = tf.global_variables_initializer()
    sess.run(init_op)

    # Print the current (untrained) parameter values.
    print("w1:", sess.run(weight[0]))
    print("w2:", sess.run(weight[1]))
    print("\n")


    STEPS = 4
    for i in range(STEPS):
        start = (i*batch_size) % 29
        end = min((i*batch_size) % 29 + batch_size,29)
        sess.run(train_step, feed_dict={x: X[start:end], y_: Y[start:end]})
        total_loss = sess.run(loss, feed_dict={x: X, y_: Y})
        print("After %d training step(s), loss on all data is %g" % (i, total_loss))  

    # Print the parameter values after training.
    print("\n")
    print("w1:", sess.run(weight[0]))
    print("w2:", sess.run(weight[1]))

Output:

w1: [-0.90025067]
w2: [-0.61561918]


After 0 training step(s), loss on all data is 0.0724032
After 1 training step(s), loss on all data is 0.0720561
After 2 training step(s), loss on all data is 0.0717106
After 3 training step(s), loss on all data is 0.0713668


w1: [-0.89163929]
w2: [-0.60973048]
With x of dimension 29×6, a two-layer network was used here and the results are not very good.

Later, 127 samples were used and the number of training steps was raised to 1000; the improvement is clear:

The modified part of the code:

#4. Create a session to run the TensorFlow program.
with tf.Session() as sess:
    init_op = tf.global_variables_initializer()
    sess.run(init_op)

    # Print the current (untrained) parameter values.
    print("w1:", sess.run(weight[0]))
    print("w2:", sess.run(weight[1]))
    print("\n")


    STEPS = 1000
    for i in range(STEPS):
        start = (i*batch_size) % data_size
        end = min((i*batch_size) % data_size + batch_size,data_size)
        sess.run(train_step, feed_dict={x: X[start:end], y_: Y[start:end]})
        if i%200==0:
            total_loss = sess.run(loss, feed_dict={x: X, y_: Y})
            print("After %d training step(s), loss on all data is %g" % (i, total_loss))  

    # Print the parameter values after training.
    print("\n")
    print("w1:", sess.run(weight[0]))
    print("w2:", sess.run(weight[1]))

The output, showing only the two weights:

w1: [-0.38975346]
w2: [ 0.13079379]


After 0 training step(s), loss on all data is 0.0602596
After 200 training step(s), loss on all data is 0.023503
After 400 training step(s), loss on all data is 0.00986995
After 600 training step(s), loss on all data is 0.00443687
After 800 training step(s), loss on all data is 0.00212367


w1: [-0.05206319]
w2: [ 0.01747141]

With 5000 steps, training has essentially converged:

w1: [ 1.00491798]
w2: [ 0.45586446]


After 0 training step(s), loss on all data is 0.0393548
After 1000 training step(s), loss on all data is 0.000703361
After 2000 training step(s), loss on all data is 4.84807e-05
After 3000 training step(s), loss on all data is 8.19349e-06
After 4000 training step(s), loss on all data is 2.51321e-06


w1: [ 0.00541062]
w2: [ 0.00245444]

7. When the Neural Network Fails to Converge