Python and Natural Language Processing (3): TensorFlow Basics

阿新 • Published: 2019-01-11

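I had been reading about TensorFlow for a while without really getting anywhere, until I came across a video series that explains it in a clear, accessible way. I learned a lot from it, so I am sharing it here and recording my own learning process along the way.

Tutorial video link: click here

Classification is one of the most common problems in machine learning: email classification, movie classification, and so on.

Here I use the iris data to predict flower species. Iris is a classic dataset that is also used in Weka.

Iris dataset: click here

Sample of the dataset: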

5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
4.6,3.1,1.5,0.2,Iris-setosa
5.0,3.6,1.4,0.2,Iris-setosa
5.4,3.9,1.7,0.4,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.4,3.2,4.5,1.5,Iris-versicolor
6.9,3.1,4.9,1.5,Iris-versicolor
5.5,2.3,4.0,1.3,Iris-versicolor
6.5,2.8,4.6,1.5,Iris-versicolor
7.1,3.0,5.9,2.1,Iris-virginica
6.3,2.9,5.6,1.8,Iris-virginica
6.5,3.0,5.8,2.2,Iris-virginica
7.6,3.0,6.6,2.1,Iris-virginica

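Before doing the classification, I preprocessed the dataset. There are three species of flower, encoded as (1,0,0) for Iris-setosa, (0,1,0) for Iris-versicolor, and (0,0,1) for Iris-virginica.

Sample of the processed dataset: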

4.4,3.2,1.3,0.2,1,0,0
5.0,3.5,1.6,0.6,1,0,0
5.1,3.8,1.9,0.4,1,0,0
5.7,3.0,4.2,1.2,0,1,0
5.7,2.9,4.2,1.3,0,1,0
6.2,2.9,4.3,1.3,0,1,0
5.8,2.7,5.1,1.9,0,0,1
6.8,3.2,5.9,2.3,0,0,1
6.7,3.3,5.7,2.5,0,0,1
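
The label conversion described above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not the script actually used for this post: the helper name `encode_line` and the `CLASSES` list are made up for the example, and the three input rows are copied from the raw sample shown earlier.

```python
# Class order follows the encoding described in the text:
# setosa -> (1,0,0), versicolor -> (0,1,0), virginica -> (0,0,1)
CLASSES = ["Iris-setosa", "Iris-versicolor", "Iris-virginica"]

def encode_line(line):
    """Turn '5.1,3.5,1.4,0.2,Iris-setosa' into '5.1,3.5,1.4,0.2,1,0,0'."""
    fields = line.strip().split(",")
    features, label = fields[:4], fields[4]
    if label not in CLASSES:
        raise ValueError("unknown species: " + label)
    # Put a 1 in the position of the matching class and 0 elsewhere
    one_hot = ["1" if label == name else "0" for name in CLASSES]
    return ",".join(features + one_hot)

# Three rows taken from the raw sample above
raw = [
    "5.1,3.5,1.4,0.2,Iris-setosa",
    "7.0,3.2,4.7,1.4,Iris-versicolor",
    "7.1,3.0,5.9,2.1,Iris-virginica",
]
encoded = [encode_line(line) for line in raw]
for row in encoded:
    print(row)
```

Writing the encoded rows to a text file, one per line, gives a file in the processed format shown above.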

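Approach:

First, split the dataset into two parts: a training set of 120 records and a testing set of 30 records.

Next, read the data files (txt format). In each record, columns 1-4 are the input variables and columns 5-7 encode the species, so the input has 4 dimensions and the output has 3.

Then, initialize the weights and the biases, and compute Y=W*X+b.

Finally, measure the gap between the predicted and the true values, and choose a suitable learning rate for reducing that gap.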
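
Before the TensorFlow version, the four steps above can be traced in plain NumPy to see the shapes involved. This is a rough sketch under stated assumptions: the 150 rows are random stand-ins rather than the real iris measurements, the split shuffles the rows first (the post does not say how the 120/30 split was made), and softmax with cross-entropy is used as the measure of the gap, matching the code further down.

```python
import numpy as np

rng = np.random.RandomState(0)

# Stand-in for the 150 processed rows: 4 feature columns + 3 one-hot columns.
# The values are synthetic and only illustrate the shapes.
features = rng.rand(150, 4).astype(np.float32)
labels = np.eye(3, dtype=np.float32)[rng.randint(0, 3, size=150)]
data = np.hstack([features, labels])

# Step 1: shuffle the rows, then split into 120 training and 30 test rows.
rng.shuffle(data)
train, test = data[:120], data[120:]

# Step 2: columns 1-4 are the inputs, columns 5-7 are the one-hot targets.
X, Y = train[:, 0:4], train[:, 4:7]

# Step 3: initialize weights and biases, then compute X.W + b.
W = rng.randn(4, 3).astype(np.float32)
b = np.full((1, 3), 0.1, dtype=np.float32)
logits = X.dot(W) + b

# Softmax turns each row of scores into probabilities that sum to 1.
shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)

# Step 4: cross-entropy measures the gap between prediction and truth.
loss = np.mean(-np.sum(Y * np.log(probs), axis=1))
print(train.shape, test.shape, probs.shape, loss)
```

Training then amounts to nudging `W` and `b` repeatedly in the direction that lowers `loss`, with the learning rate controlling the size of each nudge.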

The implementation is as follows:

# -*- coding: utf-8 -*-
import tensorflow as tf
import numpy as np

# Each row is one sample: 4 feature columns followed by 3 one-hot label columns.
# (Loading without unpack=True gives the same array as unpack=True plus a transpose.)
training_data = np.loadtxt('./MNIST_data/iris_training.txt',delimiter=',',dtype='float32')
test_data = np.loadtxt('./MNIST_data/iris_test.txt',delimiter=',',dtype='float32')

#print(training_data.shape)

iris_X = training_data[:,0:4] #[rows, columns]
iris_Y = training_data[:,4:7]

iris_test_X = test_data[:,0:4]
iris_test_Y = test_data[:,4:7]

def add_layer(inputs,in_size,out_size,activation_function=None):
    Weights = tf.Variable(tf.random_normal([in_size,out_size]))
    biases = tf.Variable(tf.zeros([1,out_size]) + 0.1) #a non-zero initial bias is recommended
    Wx_plus_b = tf.matmul(inputs,Weights) + biases
    if activation_function is None: #None keeps the layer linear; otherwise it is non-linear
        outputs = Wx_plus_b
    else:
        outputs = activation_function(Wx_plus_b)
    return outputs

def compute_accuracy(v_xs,v_ys):
    #fraction of samples whose predicted class matches the true class
    y_pre = sess.run(prediction,feed_dict={xs:v_xs})
    correct_prediction = tf.equal(tf.argmax(y_pre,1),tf.argmax(v_ys,1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction,tf.float32))
    result = sess.run(accuracy,feed_dict={xs:v_xs,ys:v_ys})
    return result

xs = tf.placeholder(tf.float32,[None,4])
ys = tf.placeholder(tf.float32,[None,3])

prediction = add_layer(xs,4,3,activation_function=tf.nn.softmax) #softmax is the usual choice for classification

cross_entropy = tf.reduce_mean(-tf.reduce_sum(ys*tf.log(prediction),reduction_indices=[1])) #loss
#choose the learning rate to suit the data being used
train_step = tf.train.GradientDescentOptimizer(0.001).minimize(cross_entropy)
init = tf.global_variables_initializer() #replaces the deprecated tf.initialize_all_variables()
sess = tf.Session()
sess.run(init)
for i in range(700): #700 iterations
    sess.run(train_step,feed_dict={xs:iris_X,ys:iris_Y})
    if i % 50 == 0:
        #test accuracy and training loss
        print(compute_accuracy(iris_test_X,iris_test_Y),sess.run(cross_entropy,feed_dict={xs:iris_X,ys:iris_Y}))

Output:

First run:
(0.53333336, 1.4005684)
(0.60000002, 1.0901265)
(0.56666666, 0.88059545)
(0.69999999, 0.75857288)
(0.76666665, 0.69699961)
(0.80000001, 0.66695786)
(0.80000001, 0.65065509)
(0.80000001, 0.63982773)
(0.80000001, 0.63119632)
(0.80000001, 0.62356067)
(0.80000001, 0.61648983)
(0.80000001, 0.60982168)
(0.80000001, 0.60348624)
(0.80000001, 0.59744602)

Second run:
(0.53333336, 6.5918231)
(0.60000002, 5.4625101)
(0.60000002, 4.4928107)
(0.60000002, 3.6116555)
(0.60000002, 2.8158391)
(0.56666666, 2.094959)
(0.53333336, 1.4308809)
(0.53333336, 0.89007533)
(0.5, 0.61958134)
(0.46666667, 0.53800708)
(0.5, 0.51004487)
(0.46666667, 0.49609584)
(0.5, 0.48745066)
(0.5, 0.48149654)

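This example relies on NumPy, so some familiarity with NumPy is needed. There is no hidden layer here, only an input layer and an output layer, so the classification accuracy is not very good. The results also differ from run to run, which is probably due to the random initial values of the weights. To improve accuracy, the next thing to try along these lines is adding a hidden layer.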
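
One way to check the guess about initial weights is to repeat the same training with different random seeds. The sketch below does this in plain NumPy rather than TensorFlow, on synthetic data rather than the iris files, so it only illustrates the effect. The function names `softmax` and `train` are made up for the example; the 700 steps and the 0.001 learning rate mirror the code above.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train(seed, X, Y, steps=700, lr=0.001):
    """Single-layer softmax regression trained by plain gradient descent."""
    rng = np.random.RandomState(seed)
    W = rng.randn(X.shape[1], Y.shape[1])  # random initial weights
    b = np.full((1, Y.shape[1]), 0.1)
    losses = []
    for _ in range(steps):
        P = softmax(X.dot(W) + b)
        losses.append(np.mean(-np.sum(Y * np.log(P + 1e-12), axis=1)))
        grad = (P - Y) / X.shape[0]  # gradient of the mean loss w.r.t. the scores
        W -= lr * X.T.dot(grad)
        b -= lr * grad.sum(axis=0, keepdims=True)
    return losses

# Synthetic stand-in for the 120 training rows (not the real iris data)
data_rng = np.random.RandomState(42)
X = data_rng.rand(120, 4) * 5.0
Y = np.eye(3)[data_rng.randint(0, 3, size=120)]

for seed in (0, 1):
    losses = train(seed, X, Y)
    print(seed, losses[0], losses[-1])
```

The same seed reproduces the same loss curve, while a different seed starts from a different loss, much like the two runs shown above. In TensorFlow 1.x, calling `tf.set_random_seed` before building the graph, or passing `seed` to `tf.random_normal`, should give comparable repeatability.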

To save a trained network you need tensorflow.train.Saver(). Here is a simple example:

import tensorflow as tf
import numpy as np

#save to file
W = tf.Variable([[1.,2.,3.],[4.,5.,6.]],name='weights')
b = tf.Variable([[1.,2.,3.]],name='biases')

init = tf.global_variables_initializer() #replaces the deprecated tf.initialize_all_variables()

saver = tf.train.Saver()

with tf.Session() as sess:
    sess.run(init)
    save_path = saver.save(sess,"./save_net.ckpt") #save the network to this path
    print('Save to path:',save_path)


#restore variables
#redefine the same shape and same type for your variables
#W and b were saved as float32, so the dtype must be given here or restoring raises an error
#run this part in a separate script (or after tf.reset_default_graph()) so the variable names match

w_type = np.array(np.arange(6).reshape(2,3),dtype=np.float32)
b_type = np.array(np.arange(3).reshape(1,3),dtype=np.float32)
W = tf.Variable(w_type,name='weights')
b = tf.Variable(b_type,name='biases')

#not need init step
saver = tf.train.Saver()
with tf.Session() as sess:
    saver.restore(sess,'./save_net.ckpt') #restore the network from this path
    print('weights:',sess.run(W))
    print('biases:',sess.run(b))

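Output: the printed W and b should hold the values saved earlier ([[1,2,3],[4,5,6]] and [[1,2,3]]), not the np.arange placeholders used to define them.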
