The MNIST dataset format and how to read it
MNIST website
http://yann.lecun.com/exdb/mnist/
The four files
train-images-idx3-ubyte.gz: training set images (9912422 bytes)
train-labels-idx1-ubyte.gz: training set labels (28881 bytes)
t10k-images-idx3-ubyte.gz: test set images (1648877 bytes)
t10k-labels-idx1-ubyte.gz: test set labels (4542 bytes)
After downloading them, decompress the archives:
$ gunzip *.gz
t10k-images-idx3-ubyte
train-images-idx3-ubyte
t10k-labels-idx1-ubyte
train-labels-idx1-ubyte
Decompressing produces the four files listed above.
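Where `gunzip` is not available (for example on Windows), the same step can be done from Python. The following is a minimal sketch using only the standard library; the helper name `gunzip_file` is hypothetical and not part of the original post.

```python
import gzip
import os
import shutil


def gunzip_file(gz_path):
    """Decompress gz_path next to itself and return the output path.

    Hypothetical helper: like `gunzip`, it strips the ".gz" suffix,
    but unlike `gunzip` it keeps the compressed file in place.
    """
    if not gz_path.endswith(".gz"):
        raise ValueError("expected a .gz file: %s" % gz_path)
    out_path = gz_path[:-len(".gz")]
    # Stream the data so the whole archive never has to fit in memory.
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return out_path
```

Calling `gunzip_file("train-images-idx3-ubyte.gz")` in the download directory would write `train-images-idx3-ubyte` beside the archive; the other three files work the same way.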
File format
There are 4 files:
train-images-idx3-ubyte: training set images
train-labels-idx1-ubyte: training set labels
t10k-images-idx3-ubyte: test set images
t10k-labels-idx1-ubyte: test set labels
The training set contains 60000 examples, and the test set 10000 examples.
The first 5000 examples of the test set are taken from the original NIST training set. The last 5000 are taken from the original NIST test set. The first 5000 are cleaner and easier than the last 5000.
TRAINING SET LABEL FILE (train-labels-idx1-ubyte) :
[offset] [type]          [value]          [description]
0000     32 bit integer  0x00000801(2049) magic number (MSB first)
0004     32 bit integer  60000            number of items
0008     unsigned byte   ??               label
0009     unsigned byte   ??               label
........
xxxx     unsigned byte   ??               label
The labels values are 0 to 9.
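The label layout above maps directly onto code: an 8-byte big-endian header followed by one byte per label. The sketch below assumes NumPy is installed; `parse_labels` is a hypothetical name, and the "file" is a three-label byte string built in memory for illustration, not real MNIST data.

```python
import struct

import numpy as np


def parse_labels(data):
    """Parse the bytes of an IDX1 label file into a 1-D uint8 array."""
    # ">2I": two big-endian (MSB first) 32-bit unsigned integers.
    magic, count = struct.unpack(">2I", data[:8])
    if magic != 2049:
        raise ValueError("bad magic number for a label file: %d" % magic)
    # The labels start right after the 8-byte header, one byte each.
    return np.frombuffer(data, dtype=np.uint8, count=count, offset=8)


# Synthetic stand-in for a label file holding the labels 5, 0 and 4.
fake_file = struct.pack(">2I", 2049, 3) + struct.pack("3B", 5, 0, 4)
print(parse_labels(fake_file))  # [5 0 4]
```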
TRAINING SET IMAGE FILE (train-images-idx3-ubyte):
[offset] [type]          [value]          [description]
0000     32 bit integer  0x00000803(2051) magic number
0004     32 bit integer  60000            number of images
0008     32 bit integer  28               number of rows
0012     32 bit integer  28               number of columns
0016     unsigned byte   ??               pixel
0017     unsigned byte   ??               pixel
........
xxxx     unsigned byte   ??               pixel
Pixels are organized row-wise. Pixel values are 0 to 255. 0 means background (white), 255 means foreground (black).
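Because the pixels are stored row by row, one image after another, the byte offset of any pixel follows from simple arithmetic: 16 header bytes, plus `rows * cols` bytes for each earlier image, plus `cols` bytes for each earlier row. The sketch below illustrates this with a tiny synthetic file; `pixel_offset` and `parse_images` are hypothetical names, and NumPy is assumed to be available.

```python
import struct

import numpy as np

HEADER_SIZE = 16  # magic number, image count, rows, columns: 4 bytes each


def pixel_offset(index, row, col, rows=28, cols=28):
    """Byte offset in the file of pixel (row, col) of image number `index`."""
    return HEADER_SIZE + index * rows * cols + row * cols + col


def parse_images(data):
    """Parse the bytes of an IDX3 image file into a (count, rows, cols) array."""
    magic, count, rows, cols = struct.unpack(">4I", data[:HEADER_SIZE])
    if magic != 2051:
        raise ValueError("bad magic number for an image file: %d" % magic)
    pixels = np.frombuffer(data, dtype=np.uint8,
                           count=count * rows * cols, offset=HEADER_SIZE)
    # Row-wise storage matches NumPy's default (C) order, so reshape suffices.
    return pixels.reshape(count, rows, cols)


# Synthetic file: two 2x3 images whose pixel values are 0..11.
fake_file = struct.pack(">4I", 2051, 2, 2, 3) + struct.pack("12B", *range(12))
images = parse_images(fake_file)
offset = pixel_offset(1, 1, 2, rows=2, cols=3)
# The array lookup and the raw byte at the computed offset should agree.
print(images.shape, images[1, 1, 2], bytearray(fake_file)[offset])
```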
TEST SET LABEL FILE (t10k-labels-idx1-ubyte):
[offset] [type]          [value]          [description]
0000     32 bit integer  0x00000801(2049) magic number (MSB first)
0004     32 bit integer  10000            number of items
0008     unsigned byte   ??               label
0009     unsigned byte   ??               label
........
xxxx     unsigned byte   ??               label
The labels values are 0 to 9.
TEST SET IMAGE FILE (t10k-images-idx3-ubyte):
[offset] [type]          [value]          [description]
0000     32 bit integer  0x00000803(2051) magic number
0004     32 bit integer  10000            number of images
0008     32 bit integer  28               number of rows
0012     32 bit integer  28               number of columns
0016     unsigned byte   ??               pixel
0017     unsigned byte   ??               pixel
........
xxxx     unsigned byte   ??               pixel
Pixels are organized row-wise. Pixel values are 0 to 255. 0 means background (white), 255 means foreground (black).
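The convention above (0 = white background, 255 = black ink) is the opposite of what most viewers assume: matplotlib's `gray` colormap, for instance, draws 0 as black, so the digit appears white on black unless the values are inverted or the reversed colormap `gray_r` is used. The sketch below shows the inversion and a text rendering that needs no plotting library; both function names are hypothetical, and the 3x3 "image" is synthetic.

```python
import numpy as np


def to_display_intensity(image):
    """Invert MNIST pixel values so that ink becomes 0 and background 255."""
    return 255 - np.asarray(image, dtype=np.uint8)


def to_ascii(image, threshold=128):
    """Render an image as text: '#' for ink (foreground), '.' for background."""
    image = np.asarray(image)
    return "\n".join(
        "".join("#" if value >= threshold else "." for value in row)
        for row in image
    )


# A synthetic 3x3 "image" with a vertical stroke in the middle column.
stroke = np.array([[0, 255, 0], [0, 255, 0], [0, 255, 0]], dtype=np.uint8)
print(to_ascii(stroke))
print(to_display_intensity(stroke)[0])
```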
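The first 16 bytes of an image file are the header: a 4-byte magic number, 4 bytes for the number of images, 4 bytes for the number of rows in each image, and 4 bytes for the number of columns in each image.
The first 8 bytes of a label file are the header: a 4-byte magic number and 4 bytes for the number of labels.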
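The magic number itself carries structure. In the IDX format used by these files, its first two bytes are zero, the third byte is a data-type code (0x08 means unsigned byte), and the fourth byte is the number of dimensions; that is why 0x00000803 announces three size fields and 0x00000801 only one. The sketch below decodes a header on that basis. `read_idx_header` and `IDX_TYPES` are hypothetical names, and the headers are built in memory rather than read from disk.

```python
import struct

# Type codes defined by the IDX format (third byte of the magic number).
IDX_TYPES = {
    0x08: "unsigned byte",
    0x09: "signed byte",
    0x0B: "short (2 bytes)",
    0x0C: "int (4 bytes)",
    0x0D: "float (4 bytes)",
    0x0E: "double (8 bytes)",
}


def read_idx_header(data):
    """Return (type_name, dims) parsed from the start of an IDX file."""
    zero, type_code, ndim = struct.unpack(">HBB", data[:4])
    if zero != 0:
        raise ValueError("not an IDX file: first two bytes must be zero")
    # One big-endian 32-bit size per dimension follows the magic number.
    dims = struct.unpack(">%dI" % ndim, data[4:4 + 4 * ndim])
    return IDX_TYPES[type_code], dims


image_header = struct.pack(">4I", 2051, 60000, 28, 28)
label_header = struct.pack(">2I", 2049, 60000)
print(read_idx_header(image_header))  # ('unsigned byte', (60000, 28, 28))
print(read_idx_header(label_header))  # ('unsigned byte', (60000,))
```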
The following code reads the file headers:
from __future__ import division
from __future__ import print_function

# Decompress the downloads first: gunzip *.gz
# Source: http://yann.lecun.com/exdb/mnist/
import os
import struct

file_list = [
    "train-images-idx3-ubyte",
    "train-labels-idx1-ubyte",
    "t10k-images-idx3-ubyte",
    "t10k-labels-idx1-ubyte",
]


def create_path(path):
    # Create the data directory if it does not exist yet.
    if not os.path.isdir(path):
        os.makedirs(path)


def get_file_full_name(path, name):
    create_path(path)
    return os.path.join(path, name)


def read_mnist(file_name):
    # Replace this with the directory that holds the decompressed files.
    file_path = "/home/your/data/path"
    full_path = get_file_full_name(file_path, file_name)
    # Binary mode is required on Python 3; on Python 2, 'r' also works.
    file_object = open(full_path, 'rb')
    return file_object


def get_file_header_data(file_name, header_len, unpack_str):
    f = read_mnist(file_name)
    try:
        raw_header = f.read(header_len)
    finally:
        f.close()
    # ">" means big-endian (MSB first); "I" is a 32-bit unsigned integer.
    header_data = struct.unpack(unpack_str, raw_header)
    return header_data


def show_images_file_header(file_name):
    # Image files have a 16-byte header: magic, count, rows, columns.
    show_file_header(file_name, 16, ">4I")


def show_labels_file_header(file_name):
    # Label files have an 8-byte header: magic, count.
    show_file_header(file_name, 8, ">2I")


def show_file_header(file_name, header_len, unpack_str):
    header_data = get_file_header_data(file_name, header_len, unpack_str)
    print("%s header data:%s" % (file_name, header_data))


def show_mnist_file_header():
    train_images_file_name = file_list[0]
    show_images_file_header(train_images_file_name)
    test_images_file_name = file_list[2]
    show_images_file_header(test_images_file_name)
    train_labels_file_name = file_list[1]
    show_labels_file_header(train_labels_file_name)
    test_labels_file_name = file_list[3]
    show_labels_file_header(test_labels_file_name)


def run():
    show_mnist_file_header()


run()
Output:
train-images-idx3-ubyte header data:(2051, 60000, 28, 28)
t10k-images-idx3-ubyte header data:(2051, 10000, 28, 28)
train-labels-idx1-ubyte header data:(2049, 60000)
t10k-labels-idx1-ubyte header data:(2049, 10000)
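The header values printed above also fix the exact size each decompressed file should have, which gives a cheap check for truncated downloads: 16 + 60000 * 28 * 28 = 47040016 bytes for the training images and 8 + 60000 = 60008 bytes for the training labels. The sketch below performs that check for any unsigned-byte IDX file; the function names are hypothetical, and it relies on the lowest byte of the magic number being the number of dimensions.

```python
import os
import struct


def expected_file_size(path):
    """Compute the size a ubyte IDX file should have according to its header."""
    with open(path, "rb") as f:
        magic, = struct.unpack(">I", f.read(4))
        ndim = magic & 0xFF  # lowest byte of the magic number: dimension count
        dims = struct.unpack(">%dI" % ndim, f.read(4 * ndim))
    payload = 1
    for dim in dims:
        payload *= dim
    # Magic number + one size field per dimension + one byte per item.
    return 4 + 4 * ndim + payload


def check_file(path):
    """Return True if the file's size on disk matches its header."""
    return os.path.getsize(path) == expected_file_size(path)
```

A file for which `check_file` returns False was most likely cut short during download or decompression.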
Next we read a single image and display it together with its label:
from __future__ import division
from __future__ import print_function

# Decompress the downloads first: gunzip *.gz
# Source: http://yann.lecun.com/exdb/mnist/
import os
import struct

import numpy as np
import matplotlib.pyplot as plt

file_list = [
    "train-images-idx3-ubyte",
    "train-labels-idx1-ubyte",
    "t10k-images-idx3-ubyte",
    "t10k-labels-idx1-ubyte",
]


def create_path(path):
    if not os.path.isdir(path):
        os.makedirs(path)


def get_file_full_name(path, name):
    create_path(path)
    return os.path.join(path, name)


def read_mnist(file_name):
    # Replace this with the directory that holds the decompressed files.
    file_path = "/home/your/data/path"
    full_path = get_file_full_name(file_path, file_name)
    # Binary mode is required on Python 3; on Python 2, 'r' also works.
    file_object = open(full_path, 'rb')
    return file_object


def get_file_header_data(file_obj, header_len, unpack_str):
    # Reading the header also moves the file position to the first item.
    raw_header = file_obj.read(header_len)
    header_data = struct.unpack(unpack_str, raw_header)
    return header_data


def show_images_file_header(file_name):
    show_file_header(file_name, 16, ">4I")


def show_labels_file_header(file_name):
    show_file_header(file_name, 8, ">2I")


def show_file_header(file_name, header_len, unpack_str):
    file_obj = read_mnist(file_name)
    header_data = get_file_header_data(file_obj, header_len, unpack_str)
    show_file_header_data(file_name, header_data)
    file_obj.close()


def show_mnist_file_header():
    train_images_file_name = file_list[0]
    show_images_file_header(train_images_file_name)
    test_images_file_name = file_list[2]
    show_images_file_header(test_images_file_name)
    train_labels_file_name = file_list[1]
    show_labels_file_header(train_labels_file_name)
    test_labels_file_name = file_list[3]
    show_labels_file_header(test_labels_file_name)


def read_a_image(file_object):
    # Each image is 28 * 28 = 784 unsigned bytes, stored row by row.
    img = file_object.read(28 * 28)
    tp = struct.unpack(">784B", img)
    image = np.asarray(tp)
    image = image.reshape((28, 28))
    # image = image.astype(np.float64)
    plt.imshow(image, cmap=plt.cm.gray)
    plt.show()


def read_a_label(file_object):
    # Each label is a single unsigned byte.
    raw_label = file_object.read(1)
    tp = struct.unpack(">B", raw_label)
    print("the label is :%s" % tp[0])


def show_file_header_data(file_name, header_data):
    print("%s header data:%s" % (file_name, header_data))


def show_a_image():
    images_file_name = file_list[0]
    labels_file_name = file_list[1]
    images_file = read_mnist(images_file_name)
    header_data = get_file_header_data(images_file, 16, ">4I")
    show_file_header_data(images_file_name, header_data)
    labels_file = read_mnist(labels_file_name)
    header_data = get_file_header_data(labels_file, 8, ">2I")
    show_file_header_data(labels_file_name, header_data)
    read_a_image(images_file)
    read_a_label(labels_file)
    images_file.close()
    labels_file.close()


def run():
    # show_mnist_file_header()
    show_a_image()


run()
Output:
train-images-idx3-ubyte header data:(2051, 60000, 28, 28)
train-labels-idx1-ubyte header data:(2049, 60000)
the label is :5
The image is then displayed.
Sure enough, the image shows a 5, matching the label.
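The code above reads the first image because it simply continues from the end of the header. Since every image occupies a fixed 784 bytes and every label one byte, an arbitrary item can also be reached directly with `seek`. The following is a sketch of that idea; `read_image_at` and `read_label_at` are hypothetical names, and in-memory `io.BytesIO` objects with tiny 2x2 images stand in for the real files.

```python
import io
import struct

import numpy as np


def read_image_at(images_file, index, rows=28, cols=28):
    """Read image number `index` from an open IDX3 file as a (rows, cols) array."""
    images_file.seek(16 + index * rows * cols)  # skip header and earlier images
    raw = images_file.read(rows * cols)
    if len(raw) != rows * cols:
        raise IndexError("image index out of range: %d" % index)
    return np.frombuffer(raw, dtype=np.uint8).reshape(rows, cols)


def read_label_at(labels_file, index):
    """Read label number `index` from an open IDX1 file as an int."""
    labels_file.seek(8 + index)  # skip header and earlier labels
    raw = labels_file.read(1)
    if len(raw) != 1:
        raise IndexError("label index out of range: %d" % index)
    return struct.unpack(">B", raw)[0]


# In-memory stand-ins for the real files: three 2x2 images and three labels.
fake_images = io.BytesIO(
    struct.pack(">4I", 2051, 3, 2, 2) + struct.pack("12B", *range(12)))
fake_labels = io.BytesIO(
    struct.pack(">2I", 2049, 3) + struct.pack("3B", 7, 2, 1))
print(read_image_at(fake_images, 2, rows=2, cols=2))
print(read_label_at(fake_labels, 2))
```

With the real files, the objects returned by `open(path, 'rb')` can be passed in unchanged and the default 28x28 size applies.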
Next we modify the code so that it generates batches of data automatically:
from __future__ import division
from __future__ import print_function

# Decompress the downloads first: gunzip *.gz
# Source: http://yann.lecun.com/exdb/mnist/
import os
import struct

import numpy as np
import matplotlib.pyplot as plt

file_list = [
    "train-images-idx3-ubyte",
    "train-labels-idx1-ubyte",
    "t10k-images-idx3-ubyte",
    "t10k-labels-idx1-ubyte",
]


def show_images_file_header(file_name):
    show_file_header(file_name, 16, ">4I")


def show_labels_file_header(file_name):
    show_file_header(file_name, 8, ">2I")


def show_file_header(file_name, header_len, unpack_str):
    file_obj = read_mnist(file_name)
    header_data = get_file_header_data(file_obj, header_len, unpack_str)
    show_file_header_data(file_name, header_data)
    file_obj.close()


def show_mnist_file_header():
    train_images_file_name = file_list[0]
    show_images_file_header(train_images_file_name)
    test_images_file_name = file_list[2]
    show_images_file_header(test_images_file_name)
    train_labels_file_name = file_list[1]
    show_labels_file_header(train_labels_file_name)
    test_labels_file_name = file_list[3]
    show_labels_file_header(test_labels_file_name)


def show_image(file_object):
    # Read the next image from the file and display it.
    image = np.asarray(read_a_image(file_object))
    image = image.reshape((28, 28))
    plt.imshow(image, cmap=plt.cm.gray)
    plt.show()


def show_a_label(file_object):
    label = read_a_label(file_object)
    print("the label is :%s" % label)


def show_file_header_data(file_name, header_data):
    print("%s header data:%s" % (file_name, header_data))


def show_a_image():
    images_file_name = file_list[0]
    labels_file_name = file_list[1]
    images_file = read_mnist(images_file_name)
    header_data = get_file_header_data(images_file, 16, ">4I")
    show_file_header_data(images_file_name, header_data)
    labels_file = read_mnist(labels_file_name)
    header_data = get_file_header_data(labels_file, 8, ">2I")
    show_file_header_data(labels_file_name, header_data)
    show_image(images_file)
    show_a_label(labels_file)
    images_file.close()
    labels_file.close()


def create_path(path):
    if not os.path.isdir(path):
        os.makedirs(path)


def get_file_full_name(path, name):
    create_path(path)
    return os.path.join(path, name)


def read_mnist(file_name):
    # Replace this with the directory that holds the decompressed files.
    file_path = "/home/your/data/path"
    full_path = get_file_full_name(file_path, file_name)
    # Binary mode is required on Python 3; on Python 2, 'r' also works.
    file_object = open(full_path, 'rb')
    return file_object


def get_file_header_data(file_obj, header_len, unpack_str):
    raw_header = file_obj.read(header_len)
    header_data = struct.unpack(unpack_str, raw_header)
    return header_data


def read_a_image(file_object):
    # Returns the next image as a tuple of 784 pixel values.
    raw_img = file_object.read(28 * 28)
    img = struct.unpack(">784B", raw_img)
    return img


def read_a_label(file_object):
    # struct.unpack always returns a tuple; return the single value inside it.
    raw_label = file_object.read(1)
    label = struct.unpack(">B", raw_label)
    return label[0]


def generate_a_batch(images_file_name, labels_file_name, batch_size=8):
    images_file = read_mnist(images_file_name)
    get_file_header_data(images_file, 16, ">4I")  # skip the 16-byte header
    labels_file = read_mnist(labels_file_name)
    get_file_header_data(labels_file, 8, ">2I")  # skip the 8-byte header
    while True:
        images = []
        labels = []
        for i in range(batch_size):
            try:
                image = read_a_image(images_file)
                label = read_a_label(labels_file)
            except struct.error:
                # A short read means the end of the file has been reached.
                break
            images.append(image)
            labels.append(label)
        # Once the files are exhausted, every further batch is empty.
        yield images, labels


def get_train_data_generator():
    images_file_name = file_list[0]
    labels_file_name = file_list[1]
    generator = generate_a_batch(images_file_name, labels_file_name)
    return generator


def get_test_data_generator():
    images_file_name = file_list[2]
    labels_file_name = file_list[3]
    generator = generate_a_batch(images_file_name, labels_file_name)
    return generator


def get_a_batch(data_generator):
    # The built-in next() works on both Python 2.6+ and Python 3.
    batch_img, batch_labels = next(data_generator)
    return batch_img, batch_labels


def generate_test_batch():
    data_generator = get_test_data_generator()
    count = 1
    while count:
        batch_img, batch_labels = get_a_batch(data_generator)
        if not batch_img and not batch_labels:
            break  # an empty batch signals the end of the data
        batch_img = np.array(batch_img)
        batch_labels = np.array(batch_labels)
        print("img shape:%s label shape:%s count:%s" % (batch_img.shape, batch_labels.shape, count))
        count += 1


def generate_train_batch():
    epoch
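Once the images and labels are held in NumPy arrays, batching over several epochs can also be written without open file handles. The sketch below is one possible approach under stated assumptions, not the listing's own `generate_train_batch`: the name `iterate_batches`, its parameters, the optional shuffling, and the synthetic data are all hypothetical.

```python
import numpy as np


def iterate_batches(images, labels, batch_size=8, epochs=1, shuffle=True, seed=0):
    """Yield (image_batch, label_batch) pairs for the given number of epochs.

    `images` and `labels` must have the same first dimension. The last
    batch of each epoch may be smaller than `batch_size`.
    """
    images = np.asarray(images)
    labels = np.asarray(labels)
    if len(images) != len(labels):
        raise ValueError("images and labels must have the same length")
    rng = np.random.RandomState(seed)
    for _ in range(epochs):
        # A fresh permutation per epoch keeps images and labels aligned.
        order = rng.permutation(len(images)) if shuffle else np.arange(len(images))
        for start in range(0, len(order), batch_size):
            picked = order[start:start + batch_size]
            yield images[picked], labels[picked]


# Synthetic data standing in for MNIST: 10 flattened "images" of 784 pixels.
fake_images = np.zeros((10, 784), dtype=np.uint8)
fake_labels = np.arange(10, dtype=np.uint8)
for batch_img, batch_labels in iterate_batches(fake_images, fake_labels, batch_size=4):
    print("img shape:%s label shape:%s" % (batch_img.shape, batch_labels.shape))
```

Unlike the file-based generator, this version stops by itself after the requested number of epochs, so the caller does not need to test for an empty batch.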