Reading and Processing Data in TensorFlow

阿新 • Published: 2018-12-13

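File matching

The file list can be written out explicitly, e.g. ["file0", "file1"], or generated, e.g. [("file%d" % i) for i in range(2)]. It can also be built by matching a glob pattern: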

files = tf.train.match_filenames_once("C:/path/to/data.tfrecords-*")
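
To give a rough idea of what such a pattern selects, the sketch below applies the same pattern to a made-up list of names using Python's standard fnmatch module. It only illustrates the wildcard and is not how tf.train.match_filenames_once is implemented; the file names are assumptions.

```python
import fnmatch

# Hypothetical directory listing, made up for illustration.
names = [
    "data.tfrecords-00000-of-00002",
    "data.tfrecords-00001-of-00002",
    "data.tfrecords.bak",
    "labels.txt",
]

# "*" matches any run of characters, so only names that start with
# "data.tfrecords-" are kept.
matched = sorted(fnmatch.filter(names, "data.tfrecords-*"))
print(matched)
```

Here only the two shard-style names are selected; "data.tfrecords.bak" is skipped because the pattern requires the hyphen.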

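Reading the filename queue

In a raw binary file each pixel value takes up one byte, so for an image stored in binary form, the total number of pixel values gives the size of the image in bytes.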
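
The arithmetic can be sketched as follows. The image dimensions are hypothetical, and the sketch assumes one byte per channel value with no header or compression.

```python
# Hypothetical image dimensions, chosen only for illustration.
height, width, channels = 32, 32, 3

# One byte per value, so the raw size in bytes equals the number of values.
image_bytes = height * width * channels

# A fake raw image: a zero-filled byte string of that length.
raw_image = bytes(image_bytes)

print(image_bytes)
print(len(raw_image))
```

Knowing this size up front is what makes it possible to cut a binary file into one record per image.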

tf.train.string_input_producer()

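Pass in a list of files and the system automatically builds a queue of filenames.

num_epochs: how many times the file list is cycled through the queue

shuffle: whether to shuffle the file list that was passed in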
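
The effect of these two parameters can be imitated in plain Python. This is a simplified simulation, not TensorFlow's implementation; filename_stream and its seed argument are hypothetical names introduced for illustration.

```python
import random


def filename_stream(filenames, num_epochs, shuffle, seed=None):
    """Yield filenames epoch by epoch, roughly imitating a filename queue."""
    rng = random.Random(seed)
    for _ in range(num_epochs):
        epoch = list(filenames)
        if shuffle:
            rng.shuffle(epoch)  # the order is re-drawn within each epoch
        for name in epoch:
            yield name


ordered = list(filename_stream(["A.jpg", "B.jpg", "C.jpg"], num_epochs=2, shuffle=False))
print(ordered)

shuffled = list(filename_stream(["A.jpg", "B.jpg", "C.jpg"], num_epochs=2, shuffle=True, seed=0))
print(shuffled)
```

With shuffle off, the list is simply replayed in order num_epochs times. With shuffle on, every epoch still contains each file exactly once, only in a different order.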

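*Note: tf.train.string_input_producer() on its own does not emit any data. The files in the queue do not move and stay stalled. Only after tf.train.start_queue_runners is called does the stalled data start to flow, so that the program is no longer stuck waiting.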
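
The same behaviour can be imitated with the standard library: a queue stays empty until the thread that fills it is started. This is only an analogy for queue runners, using Python's queue and threading modules; fill_queue is a hypothetical helper.

```python
import queue
import threading

filename_queue = queue.Queue()


def fill_queue(names):
    """Worker that pushes names into the queue, standing in for a queue runner."""
    for name in names:
        filename_queue.put(name)


# Creating the worker does not move any data: the queue is still empty.
worker = threading.Thread(target=fill_queue, args=(["A.jpg", "B.jpg", "C.jpg"],))
empty_before_start = filename_queue.empty()

# Only after the worker is started do items arrive.
worker.start()
worker.join()
items = [filename_queue.get() for _ in range(3)]

print(empty_before_start)
print(items)
```

A blocking get() issued before the worker starts would wait forever, which is the stalled state described in the note.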

Batch output queue

TensorFlow in depth

Reading

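tf.WholeFileReader()

Reads the files named in the filename queue, one whole file at a time. If a single file holds several records, WholeFileReader cannot be used; to read multiple records out of one file, use tf.FixedLengthRecordReader(), which reads the file in fixed-size chunks.

tf.FixedLengthRecordReader()

Reads one fixed-size segment of a file on each call.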
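
The difference between the two readers can be sketched without TensorFlow. The helper below, read_fixed_length_records, is a hypothetical stand-in that cuts a binary stream into equal-sized records, whereas a whole-file read would return everything as a single value. The sample bytes are made up.

```python
import io


def read_fixed_length_records(stream, record_bytes):
    """Yield consecutive chunks of exactly record_bytes bytes from a binary stream."""
    while True:
        record = stream.read(record_bytes)
        if len(record) < record_bytes:
            break  # end of data; a trailing partial record is dropped in this sketch
        yield record


# Hypothetical file content: three 4-byte records packed back to back.
content = b"AAAABBBBCCCC"

whole_file = io.BytesIO(content).read()  # whole-file style: one value per file
records = list(read_fixed_length_records(io.BytesIO(content), record_bytes=4))

print(whole_file)
print(records)
```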

TFRecord

Writing

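import numpy as np
import tensorflow as tf  # written against the TensorFlow 1.x API

# Sample data, assumed here for illustration; the shapes follow the comments below
scalars = np.array([1, 2, 3], dtype=np.int64)
vectors = np.random.rand(3, 3).astype(np.float32)
matrices = np.random.rand(3, 2, 3).astype(np.float32)
tensors = np.random.randint(0, 256, size=(3, 806, 806, 3), dtype=np.uint8)

# open the TFRecord file
writer = tf.python_io.TFRecordWriter('%s.tfrecord' % 'test')

# Write 3 samples, each with 4 features: a scalar, a vector, a matrix and a tensor
for i in range(3):
    # create the feature dictionary
    features = {}
    # scalar, type int64; a scalar must be wrapped in a list, hence "value=[scalars[i]]"
    features['scalar'] = tf.train.Feature(int64_list=tf.train.Int64List(value=[scalars[i]]))

    # vector, type float; it is already list-like, so "value=vectors[i]" needs no brackets
    features['vector'] = tf.train.Feature(float_list=tf.train.FloatList(value=vectors[i]))

    # matrix, type float; one option is to flatten the matrix into a list
    features['matrix'] = tf.train.Feature(float_list=tf.train.FloatList(value=matrices[i].reshape(-1)))
    # flattening loses the shape (2,3), so store the shape as well and restore it later
    features['matrix_shape'] = tf.train.Feature(int64_list=tf.train.Int64List(value=matrices[i].shape))

    # 3-D tensor; the other option is to store the raw bytes and convert back
    # to the original type later (tobytes() replaces the deprecated tostring())
    features['tensor'] = tf.train.Feature(bytes_list=tf.train.BytesList(value=[tensors[i].tobytes()]))
    # store the shape (806,806,3), which the raw bytes do not carry
    features['tensor_shape'] = tf.train.Feature(int64_list=tf.train.Int64List(value=tensors[i].shape))

    # wrap the dictionary of features in tf.train.Features
    tf_features = tf.train.Features(feature=features)
    # turn it into one sample (Example)
    tf_example = tf.train.Example(features=tf_features)
    # serialize the sample
    tf_serialized = tf_example.SerializeToString()
    # write
    writer.write(tf_serialized)
# close
writer.close()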
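
Why the shape has to be stored separately can be seen in a small round trip. The sketch below uses plain Python lists and the standard struct module instead of NumPy, and the 2x3 matrix is made up for illustration.

```python
import struct

# Hypothetical 2x3 matrix, made up for illustration.
matrix = [[1.0, 2.0, 3.0],
          [4.0, 5.0, 6.0]]
shape = (len(matrix), len(matrix[0]))

# Flatten row by row, then pack as 32-bit floats.
# The resulting bytes carry neither the shape nor the dtype.
flat = [value for row in matrix for value in row]
payload = struct.pack("<%df" % len(flat), *flat)

# Restoring needs both the dtype (32-bit float) and the saved shape.
values = struct.unpack("<%df" % (len(payload) // 4), payload)
rows, cols = shape
restored = [list(values[r * cols:(r + 1) * cols]) for r in range(rows)]

print(len(payload))
print(restored == matrix)
```

The payload is just 24 bytes; without the saved (2, 3) it could equally be read back as 3x2 or 1x6.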

Loading

import tensorflow as tf  # written against the TensorFlow 1.x API

def parse_function(example_proto):
    # example_proto is one serialized example (tf_serialized in the writing step)
    dics = {'scalar': tf.FixedLenFeature(shape=(), dtype=tf.int64, default_value=None),
        # when parsing, the shape given here also acts as a reshape, e.g. from (3,) to (1,3)
        'vector': tf.FixedLenFeature(shape=(1,3), dtype=tf.float32),

        # VarLenFeature also works, but it returns a SparseTensor
        'matrix': tf.VarLenFeature(dtype=tf.float32),
        'matrix_shape': tf.FixedLenFeature(shape=(2,), dtype=tf.int64),

        # the tensor was written as raw bytes, so its shape here is ()
        # first parse it as tf.string, then convert back to its original type: tf.uint8
        'tensor': tf.FixedLenFeature(shape=(), dtype=tf.string),
        'tensor_shape': tf.FixedLenFeature(shape=(3,), dtype=tf.int64)}
    # parse all features of a single example according to dics
    parsed_example = tf.parse_single_example(example_proto, dics)
    # decode the byte string
    parsed_example['tensor'] = tf.decode_raw(parsed_example['tensor'], tf.uint8)
    # convert the SparseTensor to a dense tensor
    parsed_example['matrix'] = tf.sparse_tensor_to_dense(parsed_example['matrix'])

    # reshape the matrix
    parsed_example['matrix'] = tf.reshape(parsed_example['matrix'], parsed_example['matrix_shape'])

    # reshape the tensor
    parsed_example['tensor'] = tf.reshape(parsed_example['tensor'], parsed_example['tensor_shape'])
    return parsed_example
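
What the sparse-to-dense step followed by the reshape does to the 'matrix' feature can be sketched in plain Python. This is a simplified picture of a 1-D sparse value (indices, values, dense_shape), not TensorFlow's implementation; sparse_to_dense and reshape_2d are hypothetical helpers and the numbers are made up.

```python
def sparse_to_dense(indices, values, dense_shape, default=0.0):
    """Build a dense 1-D list from (index, value) pairs; unset slots get default."""
    dense = [default] * dense_shape[0]
    for (position,), value in zip(indices, values):
        dense[position] = value
    return dense


def reshape_2d(flat, shape):
    """Cut a flat list into rows according to the stored (rows, cols) shape."""
    rows, cols = shape
    return [flat[r * cols:(r + 1) * cols] for r in range(rows)]


# A variable-length feature with 6 values, as it might look after parsing.
indices = [(0,), (1,), (2,), (3,), (4,), (5,)]
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
dense_shape = (6,)
matrix_shape = (2, 3)  # stands in for the stored 'matrix_shape' feature

dense = sparse_to_dense(indices, values, dense_shape)
matrix = reshape_2d(dense, matrix_shape)
print(dense)
print(matrix)
```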

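Processing

Code in practice

21 Projects, p. 28

import tensorflow as tf  # written against the TensorFlow 1.x API

with tf.Session() as sess:
    # we want to read three images: A.jpg, B.jpg, C.jpg
    filename = ['A.jpg', 'B.jpg', 'C.jpg']
    # string_input_producer creates a queue of filenames
    filename_queue = tf.train.string_input_producer(filename, shuffle=False, num_epochs=5)
    # the reader reads data from the filename queue via reader.read
    reader = tf.WholeFileReader()
    key, value = reader.read(filename_queue)
    # tf.train.string_input_producer defines a local epoch variable, which must be initialized
    tf.local_variables_initializer().run()
    # the queue only starts to be filled after start_queue_runners is called
    threads = tf.train.start_queue_runners(sess=sess)
    i = 0
    while True:
        i += 1
        # fetch the image data and save it (the read/ directory must already exist)
        image_data = sess.run(value)
        with open('read/test_%d.jpg' % i, 'wb') as f:
            f.write(image_data)
# The program ends by raising an OutOfRangeError, which signals that all epochs are done and the queue is closed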
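
As a sanity check on what this loop should produce, the bookkeeping can be replayed in plain Python. The sketch assumes shuffle=False and num_epochs=5 as above and only enumerates the expected output names; it reads no files.

```python
filenames = ["A.jpg", "B.jpg", "C.jpg"]
num_epochs = 5

# With shuffle=False, each epoch replays the list in order.
outputs = []
i = 0
for _ in range(num_epochs):
    for name in filenames:
        i += 1
        outputs.append(("read/test_%d.jpg" % i, name))

print(len(outputs))
print(outputs[0], outputs[3], outputs[-1])
```

So the run is expected to write 15 files, test_1.jpg through test_15.jpg, before the queue is exhausted.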
