TensorFlow Basics and Logistic Regression on the MNIST Dataset: A Hands-On Big Data ML Sample Case

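Copyright notice: this technical column is the author's (秦凱新) summary and distillation of day-to-day work. It summarizes and shares cases drawn from real commercial environments, and offers tuning advice for commercial applications, cluster capacity planning, and related topics. Please keep following this blog. QQ email: [email protected]. Feel free to get in touch for any academic exchange.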

1 Basic TensorFlow Operations

  • A basic TensorFlow model

      import tensorflow as tf
      a = 3
      # Create a variable.
      w = tf.Variable([[0.5,1.0]])
      x = tf.Variable([[2.0],[1.0]]) 
      
      y = tf.matmul(w, x)  
      
      #variables have to be explicitly initialized before you can run Ops
      init_op = tf.global_variables_initializer()
      with tf.Session() as sess:
          sess.run(init_op)
          print (y.eval())
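
As a sanity check on the graph above, the same 1×2 by 2×1 product can be worked out in plain Python. This is only a sketch: the helper name `matmul` is hypothetical and stands in for what `tf.matmul` computes for these two small matrices; it says nothing about how TensorFlow implements it.

```python
# Plain-Python stand-in for the matrix product in the graph above.
# `matmul` is a hypothetical helper written for illustration only.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    rows, inner, cols = len(a), len(b), len(b[0])
    # Every row of `a` must have as many entries as `b` has rows.
    assert all(len(row) == inner for row in a)
    return [[sum(a[i][k] * b[k][j] for k in range(inner))
             for j in range(cols)]
            for i in range(rows)]


w = [[0.5, 1.0]]      # shape (1, 2), same values as the tf.Variable w
x = [[2.0], [1.0]]    # shape (2, 1), same values as the tf.Variable x

y = matmul(w, x)      # 0.5 * 2.0 + 1.0 * 1.0
print(y)              # [[2.0]]
```

The TensorFlow version should print the same single value as a 1×1 array once the variables are initialized and `y` is evaluated in a session.
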
  • Basic TensorFlow data types

      # dtype is passed explicitly as tf.int32 (the default would be float32)
      tf.zeros([3, 4], tf.int32) ==> [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
      
      # 'tensor' is [[1, 2, 3], [4, 5, 6]]
      tf.zeros_like(tensor) ==> [[0, 0, 0], [0, 0, 0]]
      tf.ones([2, 3], tf.int32) ==> [[1, 1, 1], [1, 1, 1]]
      
      # 'tensor' is [[1, 2, 3], [4, 5, 6]]
      tf.ones_like(tensor) ==> [[1, 1, 1], [1, 1, 1]]
      
      # Constant 1-D Tensor populated with value list.
      tensor = tf.constant([1, 2, 3, 4, 5, 6, 7]) => [1 2 3 4 5 6 7]
      
      # Constant 2-D tensor populated with scalar value -1.
      tensor = tf.constant(-1.0, shape=[2, 3]) => [[-1. -1. -1.]
                                                    [-1. -1. -1.]]
      
      tf.linspace(10.0, 12.0, 3, name="linspace") => [ 10.0  11.0  12.0]
      
      # 'start' is 3
      # 'limit' is 18
      # 'delta' is 3
      tf.range(start, limit, delta) ==> [3, 6, 9, 12, 15]
  • The random_shuffle and random_normal ops

      norm = tf.random_normal([2, 3], mean=-1, stddev=4)
      
      # Shuffle the first dimension of a tensor
      c = tf.constant([[1, 2], [3, 4], [5, 6]])
      shuff = tf.random_shuffle(c)
      
      # Each time we run these ops, different results are generated
      sess = tf.Session()
      print (sess.run(norm))
      print (sess.run(shuff))
      
      [[-0.30886292  3.11809683  3.29861784]
       [-7.09597015 -1.89811802  1.75282788]]
      
      [[3 4]
       [5 6]
       [1 2]]
  • The complexity of a simple operation (incrementing a counter)

      state = tf.Variable(0)
      new_value = tf.add(state, tf.constant(1))
      update = tf.assign(state, new_value)
      
      with tf.Session() as sess:
          sess.run(tf.global_variables_initializer())
          print(sess.run(state))    
          for _ in range(3):
              sess.run(update)
              print(sess.run(state))
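
The counter above needs three graph nodes and a session because building the graph and running it are separate phases. The toy model below is a sketch of that idea in plain Python. Every class name in it is hypothetical and chosen only to mirror the listing; none of it is TensorFlow's actual API or implementation.

```python
# Toy model of "build the graph first, run it later".
# All class names here are hypothetical; this is not TensorFlow's API.

class Variable:
    """Holds mutable state, like the counter `state` above."""
    def __init__(self, initial):
        self.value = initial

    def run(self):
        return self.value


class Constant:
    """Holds a fixed value."""
    def __init__(self, value):
        self.value = value

    def run(self):
        return self.value


class Add:
    """Describes an addition; nothing is computed until run() is called."""
    def __init__(self, left, right):
        self.left, self.right = left, right

    def run(self):
        return self.left.run() + self.right.run()


class Assign:
    """Describes writing the result of `source` into `target`."""
    def __init__(self, target, source):
        self.target, self.source = target, source

    def run(self):
        self.target.value = self.source.run()
        return self.target.value


# Graph construction: no arithmetic happens on these three lines.
state = Variable(0)
new_value = Add(state, Constant(1))
update = Assign(state, new_value)

# Execution: each run of `update` increments the counter once.
history = [state.run()]
for _ in range(3):
    update.run()
    history.append(state.run())

print(history)  # [0, 1, 2, 3]
```

The point of the sketch is that `new_value` and `update` are descriptions of work, not results, which is why the TensorFlow listing has to call `sess.run(update)` inside the loop.
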
  • Saving and loading a model

      #tf.train.Saver
      w = tf.Variable([[0.5,1.0]])
      x = tf.Variable([[2.0],[1.0]])
      y = tf.matmul(w, x)
      init_op = tf.global_variables_initializer()
      saver = tf.train.Saver()
      with tf.Session() as sess:
          sess.run(init_op)
          # Do some work with the model.
          # Save the variables to disk.
          save_path = saver.save(sess, "C://tensorflow//model//test")
          print ("Model saved in file: ", save_path)
  • Converting between NumPy and TensorFlow

      import numpy as np
      a = np.zeros((3,3))
      ta = tf.convert_to_tensor(a)
      with tf.Session() as sess:
           print(sess.run(ta))
  • TensorFlow placeholders

      input1 = tf.placeholder(tf.float32)
      input2 = tf.placeholder(tf.float32)
      output = tf.multiply(input1, input2)
      with tf.Session() as sess:
          print(sess.run([output], feed_dict={input1:[7.], input2:[2.]}))

2 Linear Regression in TensorFlow

  • Generating a linear dataset with NumPy

      import numpy as np
      import tensorflow as tf
      import matplotlib.pyplot as plt
      
      # Randomly generate 1000 points scattered around the line y = 0.1x + 0.3
      num_points = 1000
      vectors_set = []
      for i in range(num_points):
          x1 = np.random.normal(0.0, 0.55)
          y1 = x1 * 0.1 + 0.3 + np.random.normal(0.0, 0.03)
          vectors_set.append([x1, y1])
      
      # Split the points into x and y samples
      x_data = [v[0] for v in vectors_set]
      y_data = [v[1] for v in vectors_set]
      
      plt.scatter(x_data,y_data,c='r')
      plt.show()

  • Implementing the linear model in TensorFlow

      # Create a 1-D W tensor with a random initial value drawn uniformly from [-1, 1)
      W = tf.Variable(tf.random_uniform([1], -1.0, 1.0), name='W')
      # Create a 1-D b tensor with initial value 0
      b = tf.Variable(tf.zeros([1]), name='b')
      # Compute the predicted value y
      y = W * x_data + b
      
      # Loss: mean squared error between the prediction y and the actual value y_data
      loss = tf.reduce_mean(tf.square(y - y_data), name='loss')
      # Optimizer: gradient descent from the train module; the argument is the learning rate
      optimizer = tf.train.GradientDescentOptimizer(0.5)
      
      # Training means minimizing this loss
      train = optimizer.minimize(loss, name='train')
      
      sess = tf.Session()
      
      init = tf.global_variables_initializer()
      sess.run(init)
      
      # Print the initial W, b and loss
      print ("W =", sess.run(W), "b =", sess.run(b), "loss =", sess.run(loss))
      # Run 20 training steps
      for step in range(20):
          sess.run(train)
          # Print W, b and loss after each step
          print ("W =", sess.run(W), "b =", sess.run(b), "loss =", sess.run(loss))
      # Write the graph to ./tmp for TensorBoard
      writer = tf.summary.FileWriter("./tmp", sess.graph)
  • TensorFlow iteration results

      W = [ 0.96539688] b = [ 0.] loss = 0.297884
      W = [ 0.71998411] b = [ 0.28193575] loss = 0.112606
      W = [ 0.54009342] b = [ 0.28695393] loss = 0.0572231
      W = [ 0.41235447] b = [ 0.29063231] loss = 0.0292957
      W = [ 0.32164571] b = [ 0.2932443] loss = 0.0152131
      W = [ 0.25723246] b = [ 0.29509908] loss = 0.00811188
      W = [ 0.21149193] b = [ 0.29641619] loss = 0.00453103
      W = [ 0.17901111] b = [ 0.29735151] loss = 0.00272536
      W = [ 0.15594614] b = [ 0.29801565] loss = 0.00181483
      W = [ 0.13956745] b = [ 0.29848731] loss = 0.0013557
      W = [ 0.12793678] b = [ 0.29882219] loss = 0.00112418
      W = [ 0.11967772] b = [ 0.29906002] loss = 0.00100743
      W = [ 0.11381286] b = [ 0.29922891] loss = 0.000948558
      W = [ 0.10964818] b = [ 0.29934883] loss = 0.000918872
      W = [ 0.10669079] b = [ 0.29943398] loss = 0.000903903
      W = [ 0.10459071] b = [ 0.29949448] loss = 0.000896354
      W = [ 0.10309943] b = [ 0.29953739] loss = 0.000892548
      W = [ 0.10204045] b = [ 0.29956791] loss = 0.000890629
      W = [ 0.10128847] b = [ 0.29958954] loss = 0.000889661
      W = [ 0.10075447] b = [ 0.29960492] loss = 0.000889173
      W = [ 0.10037527] b = [ 0.29961586] loss = 0.000888927
    
      plt.scatter(x_data,y_data,c='r')
      plt.plot(x_data,sess.run(W)*x_data+sess.run(b))
      plt.show()
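
What `GradientDescentOptimizer(0.5).minimize(loss)` does on each step can be written out by hand for this two-parameter model. The sketch below uses only the standard library and is an illustration under assumptions: the seed, the starting point `W = 0.9`, the 100 steps, and the helper names `mse` and `gradient_step` are all made up here, and the data is regenerated with `random.gauss` rather than NumPy, so the numbers will not match the log above digit for digit.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable (an assumption)

# Same synthetic data recipe as above: points around y = 0.1x + 0.3.
xs = [random.gauss(0.0, 0.55) for _ in range(1000)]
ys = [0.1 * x + 0.3 + random.gauss(0.0, 0.03) for x in xs]


def mse(W, b):
    """Mean squared error of the line y = W*x + b on the data."""
    return sum((W * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def gradient_step(W, b, learning_rate):
    """One gradient descent step on the mean squared error."""
    n = len(xs)
    # d(loss)/dW = mean(2 * error * x), d(loss)/db = mean(2 * error)
    grad_W = sum(2.0 * (W * x + b - y) * x for x, y in zip(xs, ys)) / n
    grad_b = sum(2.0 * (W * x + b - y) for x, y in zip(xs, ys)) / n
    return W - learning_rate * grad_W, b - learning_rate * grad_b


W, b = 0.9, 0.0            # arbitrary starting point (an assumption)
initial_loss = mse(W, b)
for step in range(100):
    W, b = gradient_step(W, b, learning_rate=0.5)
final_loss = mse(W, b)

print("W = %.4f  b = %.4f  loss = %.6f" % (W, b, final_loss))
```

As in the TensorFlow log, `b` settles almost immediately while `W` closes in on 0.1 more slowly, because the data is much more spread out along the bias direction than along `x`.
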

3 Loading the MNIST Dataset

  • Loading

      import numpy as np
      import tensorflow as tf
      import matplotlib.pyplot as plt
      #from tensorflow.examples.tutorials.mnist import input_data
      import input_data
      
      print ("packs loaded")
      
      print ("Download and Extract MNIST dataset")
      ## Use one-hot (0/1) encoding for the labels
      mnist = input_data.read_data_sets('data/', one_hot=True)
      print (" type of 'mnist' is %s" % (type(mnist)))
      print (" number of train data is %d" % (mnist.train.num_examples))
      print (" number of test data is %d" % (mnist.test.num_examples))
      
      Download and Extract MNIST dataset
      Extracting data/train-images-idx3-ubyte.gz
      Extracting data/train-labels-idx1-ubyte.gz
      Extracting data/t10k-images-idx3-ubyte.gz
      Extracting data/t10k-labels-idx1-ubyte.gz
       type of 'mnist' is <class 'tensorflow.contrib.learn.python.learn.datasets.base.Datasets'>
       number of train data is 55000
       number of test data is 10000
  • What does the data of MNIST look like?

      print ("What does the data of MNIST look like?")
      trainimg   = mnist.train.images
      trainlabel = mnist.train.labels
      testimg    = mnist.test.images
      testlabel  = mnist.test.labels
      print (" type of 'trainimg' is %s"    % (type(trainimg)))
      print (" type of 'trainlabel' is %s"  % (type(trainlabel)))
      print (" type of 'testimg' is %s"     % (type(testimg)))
      print (" type of 'testlabel' is %s"   % (type(testlabel)))
      print (" shape of 'trainimg' is %s"   % (trainimg.shape,))
      print (" shape of 'trainlabel' is %s" % (trainlabel.shape,))
      print (" shape of 'testimg' is %s"    % (testimg.shape,))
      print (" shape of 'testlabel' is %s"  % (testlabel.shape,))
    
    
      What does the data of MNIST look like?
       type of 'trainimg' is <class 'numpy.ndarray'>
       type of 'trainlabel' is <class 'numpy.ndarray'>
       type of 'testimg' is <class 'numpy.ndarray'>
       type of 'testlabel' is <class 'numpy.ndarray'>
       shape of 'trainimg' is (55000, 784)
       shape of 'trainlabel' is (55000, 10)
       shape of 'testimg' is (10000, 784)
       shape of 'testlabel' is (10000, 10)
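
The shapes above follow from two conventions: each 28 by 28 image is stored as one row of 784 values, and each label is a one-hot vector of length 10. The sketch below shows both in plain Python. The helper names `flatten`, `one_hot` and `decode` are hypothetical, and the blank image is made-up data rather than a real MNIST digit.

```python
# Hypothetical helpers illustrating the (N, 784) and (N, 10) layouts.

def flatten(image):
    """Turn a 28x28 grid (list of rows) into one list of 784 pixels."""
    return [pixel for row in image for pixel in row]


def one_hot(label, num_classes=10):
    """Encode a digit 0..9 as a vector with a single 1.0."""
    vec = [0.0] * num_classes
    vec[label] = 1.0
    return vec


def decode(vec):
    """Recover the digit from a one-hot vector (index of the largest entry)."""
    return max(range(len(vec)), key=lambda i: vec[i])


image = [[0.0] * 28 for _ in range(28)]   # made-up blank image
flat = flatten(image)
label_vec = one_hot(7)

print(len(flat))          # 784
print(label_vec)
print(decode(label_vec))  # 7
```

`decode` plays the same role as the `np.argmax(trainlabel[i, :])` call used when plotting individual training samples.
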
  • What does the training data look like?

      # What does the training data look like?
      print ("What does the training data look like?")
      nsample = 5
      randidx = np.random.randint(trainimg.shape[0], size=nsample)
      
      for i in randidx:
          curr_img   = np.reshape(trainimg[i, :], (28, 28)) # 28 by 28 matrix 
          curr_label = np.argmax(trainlabel[i, :] ) # Label
          plt.matshow(curr_img, cmap=plt.get_cmap('gray'))
          plt.title("" + str(i) + "th Training Data " 
                    + "Label is " + str(curr_label))
          print ("" + str(i) + "th Training Data " 
                 + "Label is " + str(curr_label))
          plt.show()

  • Batch Learning?

     print ("Batch Learning? ")
     batch_size = 100
     batch_xs, batch_ys = mnist.train.next_batch(batch_size)
     print ("type of 'batch_xs' is %s" % (type(batch_xs)))
     print ("type of 'batch_ys' is %s" % (type(batch_ys)))
     print ("shape of 'batch_xs' is %s" % (batch_xs.shape,))
     print ("shape of 'batch_ys' is %s" % (batch_ys.shape,))
    
     Batch Learning? 
     type of 'batch_xs' is <class 'numpy.ndarray'>
     type of 'batch_ys' is <class 'numpy.ndarray'>
     shape of 'batch_xs' is (100, 784)
     shape of 'batch_ys' is (100, 10)
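
Conceptually, mini-batch learning means walking through the training set in shuffled chunks while keeping each image paired with its label. The generator below is a minimal sketch of that idea. It is a hypothetical helper, not the implementation behind `mnist.train.next_batch`, and the ten-sample dataset is made up so the example stays small.

```python
import random


def iterate_minibatches(images, labels, batch_size, rng):
    """Yield (images, labels) chunks of exactly batch_size in shuffled order.

    Samples left over at the end of the pass are dropped in this sketch.
    """
    order = list(range(len(images)))
    rng.shuffle(order)
    for start in range(0, len(order) - batch_size + 1, batch_size):
        idx = order[start:start + batch_size]
        yield [images[i] for i in idx], [labels[i] for i in idx]


# Made-up stand-in dataset: 10 "images" of 4 pixels, label = parity of index.
images = [[float(i)] * 4 for i in range(10)]
labels = [i % 2 for i in range(10)]

rng = random.Random(0)
batches = list(iterate_minibatches(images, labels, batch_size=3, rng=rng))
print(len(batches))  # 3
```

With the 55000 training images and `batch_size = 100` used in this article, one full pass in this style would produce 550 batches.
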

4 Logistic Regression on the MNIST Dataset

  • TensorFlow's tf.reduce_mean function

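      # x is not defined in the original listing; the value below is an assumption
      # chosen to be consistent with the stated result.
      x = tf.constant([[1., 1.], [2., 2.]])
      m1 = tf.reduce_mean(x, axis=0)
      # Result: [1.5, 1.5]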
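
The `axis` argument decides which dimension is averaged away. The sketch below reproduces that behaviour for a 2-D list in plain Python. The function is a hypothetical stand-in for `tf.reduce_mean`, and the input `[[1., 1.], [2., 2.]]` is an assumption picked because it yields the `[1.5, 1.5]` quoted above.

```python
def reduce_mean(matrix, axis=None):
    """Mean of a 2-D list: over everything, down columns (0) or along rows (1)."""
    rows, cols = len(matrix), len(matrix[0])
    if axis is None:
        return sum(sum(row) for row in matrix) / (rows * cols)
    if axis == 0:
        # Collapse the row dimension: one mean per column.
        return [sum(matrix[i][j] for i in range(rows)) / rows
                for j in range(cols)]
    if axis == 1:
        # Collapse the column dimension: one mean per row.
        return [sum(row) / cols for row in matrix]
    raise ValueError("axis must be None, 0 or 1")


x = [[1.0, 1.0],
     [2.0, 2.0]]   # assumed input, not given in the original

print(reduce_mean(x, axis=0))  # [1.5, 1.5]
print(reduce_mean(x, axis=1))  # [1.0, 2.0]
print(reduce_mean(x))          # 1.5
```
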
  • TensorFlow's tf.argmax function

      sess = tf.InteractiveSession()

      arr = np.array([[31, 23,  4, 24, 27, 34],
                      [18,  3, 25,  0,  6, 35],
                      [28, 14, 33, 22, 20,  8],
                      [13, 30, 21, 19,  7,  9],
                      [16,  1, 26, 32,  2, 29],
                      [17, 12,  5, 11, 10, 15]])
      
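      # Append .eval() to print a tensor's value
      ## Rank (number of dimensions) of the matrix: 2
      #tf.rank(arr).eval()
      
      ## Shape of the matrix (rows, columns): [6, 6]
      #tf.shape(arr).eval()
      
      # Axis 0 scans down each column and returns the row index of that column's maximum: [0, 3, 2, 4, 0, 1]
      #tf.argmax(arr, 0).eval()
      # 0 -> 31 (arr[0, 0])
      # 3 -> 30 (arr[3, 1])
      # 2 -> 33 (arr[2, 2])
      # Axis 1 scans along each row and returns the column index of that row's maximum
      tf.argmax(arr, 1).eval()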
      # 5 -> 34 (arr[0, 5])
      # 5 -> 35 (arr[1, 5])
      # 2 -> 33 (arr[2, 2])
    
      array([5, 5, 2, 1, 3, 0], dtype=int64)
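
The two axis choices can be checked by hand on the same matrix, and argmax combined with a mean is also the usual way to turn one-hot labels and class scores into an accuracy figure. The sketch below does both in plain Python. The helpers `argmax` and `accuracy` are hypothetical, and the four-sample `scores` and `targets` lists are made-up data for illustration.

```python
def argmax(matrix, axis):
    """Index of the maximum down each column (axis=0) or along each row (axis=1)."""
    rows, cols = len(matrix), len(matrix[0])
    if axis == 0:
        return [max(range(rows), key=lambda i: matrix[i][j])
                for j in range(cols)]
    if axis == 1:
        return [max(range(cols), key=lambda j: row[j]) for row in matrix]
    raise ValueError("axis must be 0 or 1")


def accuracy(scores, one_hot_labels):
    """Fraction of rows whose highest score sits at the labelled class."""
    predicted = argmax(scores, 1)
    expected = argmax(one_hot_labels, 1)
    hits = [1.0 if p == e else 0.0 for p, e in zip(predicted, expected)]
    return sum(hits) / len(hits)


arr = [[31, 23,  4, 24, 27, 34],
       [18,  3, 25,  0,  6, 35],
       [28, 14, 33, 22, 20,  8],
       [13, 30, 21, 19,  7,  9],
       [16,  1, 26, 32,  2, 29],
       [17, 12,  5, 11, 10, 15]]

print(argmax(arr, 0))  # [0, 3, 2, 4, 0, 1]
print(argmax(arr, 1))  # [5, 5, 2, 1, 3, 0]

# Made-up scores for 4 samples and 3 classes, with one-hot targets.
scores = [[0.1, 0.7, 0.2],
          [0.8, 0.1, 0.1],
          [0.3, 0.3, 0.4],
          [0.2, 0.5, 0.3]]
targets = [[0, 1, 0],
           [1, 0, 0],
           [0, 1, 0],
           [0, 1, 0]]
print(accuracy(scores, targets))  # 0.75
```

In the made-up batch only the third sample is misclassified, which gives 3 correct out of 4.
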
  • Loading the dataset

      import numpy as np
      import tensorflow as tf
      import matplotlib.pyplot as plt
      import input_data
      
      mnist      = input_data.read_data_sets('data/', one_hot=True)
      trainimg   = mnist.train.images
      trainlabel = mnist.train.labels
      testimg    = mnist.test.images
      testlabel  = mnist.test.labels
      print ("MNIST loaded")
      
      Extracting data/train-images-idx3-ubyte.gz
      Extracting data/train-labels-idx1-ubyte.gz
      Extracting data/t10k-images-idx3-ubyte.gz
      Extracting data/t10k-labels-idx1-ubyte.gz
      MNIST loaded
      
      print (trainimg.shape)
      print (trainlabel.shape)
      print (testimg.shape)
      print (testlabel.shape)
      #print (trainimg)
      print (trainlabel[0])
      
      (55000, 784)
      (55000, 10)
      (10000, 784)
      (10000, 10)
      [ 0.  0.  0.  0.  0.  0.  0.  1.  0.  0.]
  • Building the logistic regression model in TF

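      # Placeholders first (each row is one sample)
      x = tf.placeholder("float", [None, 784])
      # 10 entries per label, e.g. [ 0.  0.  0.  0.  0.  0.  0.  1.  0.  0.]
      y = tf.placeholder("float", [None, 10])  # None leaves the batch size unspecified
      
      # 10-class task: 784 inputs, 10 outputs
      W = tf.Variable(tf.zeros([784, 10]))
      
      # One bias per output class
      b = tf.Variable(tf.zeros([10]))
      
      # LOGISTIC REGRESSION MODEL (10 outputs)
      actv = tf.nn.softmax(tf.matmul(x, W) + b) 
      
      # COST FUNCTION (cross-entropy loss)
      cost = tf.reduce_mean(-tf.reduce_sum(y*tf.log(actv), reduction_indices=1)) 
      
      # OPTIMIZER
      learning_rate = 0.01
      optm = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)
      
      # The training code uses `accr` and `init`, which the original listing never
      # defines. The definitions below are an assumption (the usual argmax-based accuracy).
      pred = tf.equal(tf.argmax(actv, 1), tf.argmax(y, 1))
      accr = tf.reduce_mean(tf.cast(pred, "float"))
      init = tf.global_variables_initializer()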
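
The `actv` and `cost` lines above are a softmax followed by a cross-entropy against the one-hot label. The sketch below evaluates both for a single sample in plain Python, with `W` and `b` at their zero initial values. It is an illustration under assumptions: the helper names and the constant-valued image are made up, and the softmax subtracts the maximum logit for numerical stability, which the listing above does not do.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    top = max(logits)                      # shift for numerical stability
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(one_hot_label, probs):
    """-sum(y * log(p)); only the true class contributes."""
    return -sum(t * math.log(p) for t, p in zip(one_hot_label, probs))


num_pixels, num_classes = 784, 10
W = [[0.0] * num_classes for _ in range(num_pixels)]   # zeros, as in the model
b = [0.0] * num_classes


def forward(x):
    """softmax(x . W + b) for one flattened image x."""
    logits = [sum(x[i] * W[i][k] for i in range(num_pixels)) + b[k]
              for k in range(num_classes)]
    return softmax(logits)


x = [0.5] * num_pixels                 # made-up image, not real MNIST data
y = [0.0] * 7 + [1.0] + [0.0] * 2      # one-hot label for the digit 7

probs = forward(x)
loss = cross_entropy(y, probs)
print(loss)   # close to ln(10), about 2.3026
```

With all-zero weights every class gets probability 0.1, so the loss before any training is ln(10) regardless of the input. That gives a reference point for reading the cost values printed during training.
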
  • Training the model in TF

      # Number of training epochs
      training_epochs = 50
      # Number of samples per mini-batch
      batch_size      = 100
      display_step    = 5
      # SESSION
      sess = tf.Session()
      sess.run(init)
      # MINI-BATCH LEARNING
      for epoch in range(training_epochs):
          avg_cost = 0.
          num_batch = int(mnist.train.num_examples/batch_size)
          for i in range(num_batch): 
              batch_xs, batch_ys = mnist.train.next_batch(batch_size)
              sess.run(optm, feed_dict={x: batch_xs, y: batch_ys})
              feeds = {x: batch_xs, y: batch_ys}
              avg_cost += sess.run(cost, feed_dict=feeds)/num_batch
          # DISPLAY
          if epoch % display_step == 0:
              feeds_train = {x: batch_xs, y: batch_ys}
              feeds_test = {x: mnist.test.images, y: mnist.test.labels}
              train_acc = sess.run(accr, feed_dict=feeds_train)
              test_acc = sess.run(accr, feed_dict=feeds_test)
              print ("Epoch: %03d/%03d cost: %.9f train_acc: %.3f test_acc: %.3f" 
                     % (epoch, training_epochs, avg_cost, train_acc, test_acc))
      print ("DONE")  
      
      Epoch: 000/050 cost: 1.177906594 train_acc: 0.840 test_acc: 0.855
      Epoch: 005/050 cost: 0.440515266 train_acc: 0.860 test_acc: 0.895
      Epoch: 010/050 cost: 0.382895913 train_acc: 0.910 test_acc: 0.905
      Epoch: 015/050 cost: 0.356607343 train_acc: 0.870 test_acc: 0.909
      Epoch: 020/050 cost: 0.341326642 train_acc: 0.860 test_acc: 0.912
      Epoch: 025/050 cost: 0.330556413 train_acc: 0.910 test_acc: 0.913
      Epoch: 030/050 cost: 0.321508561 train_acc: 0.840 test_acc: 0.916
      Epoch: 035/050 cost: 0.314936944 train_acc: 0.940 test_acc: 0.917
      Epoch: 040/050 cost: 0.309805418 train_acc: 0.940 test_acc: 0.918
      Epoch: 045/050 cost: 0.305343132 train_acc: 0.960 test_acc: 0.918
      DONE

5 Summary

The purpose of this article is to use simple cases to build a real understanding of the design ideas behind TensorFlow.


秦凱新, Shenzhen, 2018-12-09 21:28