Creating a Dataset in VOC2007 Format

阿新 • Published: 2018-12-09

<div class="article-copyright"> Copyright notice: this is an original article by the blogger; please credit the source when reposting. https://blog.csdn.net/gulingfengze/article/details/79639111 </div> <div class="markdown_views"> <h5 id="1前序">1. Preface</h5>

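A few days ago a junior labmate asked me how to build his own VOC2007 dataset. I told him there was plenty of material online and that he should look it up, but for whatever reason he told me he still could not get it working. Thinking about it, that makes sense: something unfamiliar can look trivial to other people and still feel impossibly deep when you face it yourself. So I wrote this post for everyone to learn from, and as a reference for my labmate. If anything here is wrong, corrections are welcome.

For object detection we need to prepare our own dataset and convert it to VOC2007 format. The original VOC2007 dataset can be downloaded here: <a href="https://pjreddie.com/projects/pascal-voc-dataset-mirror/" rel="nofollow" target="_blank">VOC2007 dataset</a>. Let's look at what this dataset actually contains.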

After extracting the VOC2007 dataset, you will find the following five folders under the VOC2007 folder:

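<ul>
<li>Annotations folder: holds the label files in xml format. Each xml file corresponds to one image in the JPEGImages folder.</li>
<li>JPEGImages folder: holds the dataset images, both training and test images.</li>
<li>ImageSets folder: holds three subfolders, Layout, Main, and Segmentation. Here we only use Main, which holds the text files listing the image IDs of each split; the other two can be ignored for now.</li>
<li>SegmentationClass and SegmentationObject folders: both are related to image segmentation.</li>
</ul>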
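To make the Annotations folder more concrete, here is a minimal sketch that reads one annotation with Python's standard `xml.etree.ElementTree`. The sample XML is hand-written in the VOC2007 layout with made-up values, and the helper name `read_voc_annotation` is an assumption for illustration, not part of any VOC tooling.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the VOC2007 annotation layout (values are made up).
SAMPLE_XML = """
<annotation>
  <folder>VOC2007</folder>
  <filename>000001.jpg</filename>
  <size><width>353</width><height>500</height><depth>3</depth></size>
  <object>
    <name>dog</name>
    <difficult>0</difficult>
    <bndbox><xmin>48</xmin><ymin>240</ymin><xmax>195</xmax><ymax>371</ymax></bndbox>
  </object>
</annotation>
"""


def read_voc_annotation(xml_text):
    """Return the image name, image size, and labelled boxes of one annotation."""
    root = ET.fromstring(xml_text.strip())
    size = root.find("size")
    boxes = []
    for obj in root.findall("object"):
        bndbox = obj.find("bndbox")
        # Each box is (class name, xmin, ymin, xmax, ymax) in pixels.
        boxes.append((
            obj.findtext("name"),
            int(bndbox.findtext("xmin")),
            int(bndbox.findtext("ymin")),
            int(bndbox.findtext("xmax")),
            int(bndbox.findtext("ymax")),
        ))
    return {
        "filename": root.findtext("filename"),
        "width": int(size.findtext("width")),
        "height": int(size.findtext("height")),
        "boxes": boxes,
    }


info = read_voc_annotation(SAMPLE_XML)
print(info["filename"], info["width"], info["height"], info["boxes"])
```

For a real file, the same fields can be read by passing the file's contents to the helper.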

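Building your own VOC2007-format dataset does not need everything listed above. Only three parts are required: the Annotations folder, the JPEGImages folder, and the Main folder under ImageSets.

Step 1: Following the layout of the original VOC2007 dataset, create the folders described above: a VOCdevkit folder, three folders inside it named Annotations, JPEGImages, and ImageSets, and finally a Main folder inside ImageSets. Once the folders exist, put all of your images into JPEGImages. By convention, rename the images to the form 000001.jpg, following the naming of the original dataset. There are many batch-renaming methods online, so they are not covered here.

Two points deserve emphasis. First, the image format: images must be JPEG or JPG, and other formats need to be converted. Second, the aspect ratio: it must not be too large or too small; use the images in the original VOC2007 dataset as a reference.

Step 2: Create the xml files that go into the Annotations folder. For this we rely on the <a href="https://github.com/tzutalin/labelImg" rel="nofollow" target="_blank">LabelImg tool</a>, which can be installed and used by following the instructions on its page. If those instructions feel daunting, there is a simpler route. The <code>lxml</code> library still has to be installed, but if you use an Anaconda environment there is nothing extra to do. Just download the packaged tool here: <a href="https://pan.baidu.com/s/1aQy3JJ7xgFS10gCXR36FvA" rel="nofollow" target="_blank">LabelImg annotation tool</a>, pick the Windows or Linux version as appropriate, extract it, and use it.

As for usage, take the Windows version as an example. After extracting you get an exe file and a data folder. The data folder contains a txt file with the predefined class label names, which you can edit to suit your needs. Run the exe to open the annotation interface and start working. For the operating steps, see this article: <a href="http://blog.csdn.net/jesse_mx/article/details/53606897" rel="nofollow" target="_blank">usage guide</a>. Here is a reference screenshot of the annotation tool: <img src="//img-blog.csdn.net/20180321163104919?watermark/2/text/Ly9ibG9nLmNzZG4ubmV0L2d1bGluZ2Zlbmd6ZQ==/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70" alt="Screenshot of the LabelImg annotation tool" title="">

Now the long annotation work begins. Note: save after annotating each image, and make sure the saved xml file has the same name as its image. The naming of the images in JPEGImages and of the xml files in Annotations in the original VOC2007 dataset is a good reference.

Remark: there is also another tool, the <a href="https://pan.baidu.com/s/1EBbX9Phy8BTRrWrmEfQnsw" rel="nofollow" target="_blank">VOC2007 data format creation tool</a>, which works well and is worth a try. I found it online and have forgotten the author; my thanks to them.

Step 3: Create the four files in the Main folder under ImageSets (test.txt, train.txt, trainval.txt, val.txt). Their names already suggest what they are for, but here is a brief description:

<ul>
<li>test.txt: the test set</li>
<li>train.txt: the training set</li>
<li>val.txt: the validation set</li>
<li>trainval.txt: the training and validation sets combined</li>
</ul>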
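Because these four files only list image IDs, every ID needs both a .jpg in JPEGImages and an .xml in Annotations. Before generating the files, a small check along the following lines may help catch images that were never annotated. This is only a sketch: the function name `find_unpaired` is made up for illustration, and the demo runs on a temporary folder rather than a real dataset.

```python
import os
import tempfile


def find_unpaired(image_dir, xml_dir):
    """Return (IDs of images without xml, IDs of xml files without image)."""
    image_ids = {os.path.splitext(name)[0] for name in os.listdir(image_dir)
                 if name.lower().endswith((".jpg", ".jpeg"))}
    xml_ids = {os.path.splitext(name)[0] for name in os.listdir(xml_dir)
               if name.lower().endswith(".xml")}
    return sorted(image_ids - xml_ids), sorted(xml_ids - image_ids)


# Demo on a throwaway folder so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as root:
    image_dir = os.path.join(root, "JPEGImages")
    xml_dir = os.path.join(root, "Annotations")
    os.makedirs(image_dir)
    os.makedirs(xml_dir)
    # Two empty placeholder images, only one of which has an annotation.
    for name in ("000001.jpg", "000002.jpg"):
        open(os.path.join(image_dir, name), "w").close()
    open(os.path.join(xml_dir, "000001.xml"), "w").close()
    missing_xml, missing_image = find_unpaired(image_dir, xml_dir)

print("images without annotation:", missing_xml)
print("annotations without image:", missing_image)
```

On a real dataset, point `image_dir` and `xml_dir` at your own JPEGImages and Annotations folders.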

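In the original VOC2007 dataset, trainval makes up about 50% of the whole dataset and test the other 50%; within trainval, train is about 50% and val about 50%. The following code can therefore be used as a reference for generating the four txt files: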

<pre class="prettyprint" name="code"><code class="language-python hljs has-numbering">import os
import random

trainval_percent = 0.5  # share of all samples that go into trainval
train_percent = 0.5     # share of trainval that goes into train
xmlfilepath = 'Annotations'      # replace with the full path to Annotations
txtsavepath = 'ImageSets/Main'   # replace with the full path to Main

# Keep only the .xml files and sort them so the listing order is stable.
total_xml = sorted(f for f in os.listdir(xmlfilepath) if f.endswith('.xml'))

num = len(total_xml)
indices = range(num)
tv = int(num * trainval_percent)
tr = int(tv * train_percent)
# Sets make the membership tests in the loop below fast.
trainval = set(random.sample(indices, tv))
train = set(random.sample(sorted(trainval), tr))

with open(os.path.join(txtsavepath, 'trainval.txt'), 'w') as ftrainval, \
     open(os.path.join(txtsavepath, 'test.txt'), 'w') as ftest, \
     open(os.path.join(txtsavepath, 'train.txt'), 'w') as ftrain, \
     open(os.path.join(txtsavepath, 'val.txt'), 'w') as fval:
    for i in indices:
        # Drop the .xml extension to get the image ID, one ID per line.
        name = os.path.splitext(total_xml[i])[0] + '\n'
        if i in trainval:
            ftrainval.write(name)
            if i in train:
                ftrain.write(name)
            else:
                fval.write(name)
        else:
            ftest.write(name)</code></pre>

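Note: the paths in the code above should be written out in full, and the share of each split should be adjusted to the actual size of your dataset.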
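After adjusting the ratios, it may be worth confirming that the four files are consistent with each other. The sketch below is one way to do that, under the assumption that each file holds one image ID per line; the helper names `read_ids` and `check_splits` are invented for this example, and the demo writes tiny sample files to a temporary folder.

```python
import os
import tempfile


def read_ids(path):
    """Read one image ID per line, ignoring blank lines."""
    with open(path) as f:
        return {line.strip() for line in f if line.strip()}


def check_splits(main_dir):
    """Return (ID count per split, list of consistency problems found)."""
    splits = {name: read_ids(os.path.join(main_dir, name + ".txt"))
              for name in ("train", "val", "trainval", "test")}
    problems = []
    if not splits["train"].isdisjoint(splits["val"]):
        problems.append("train and val overlap")
    if splits["train"].union(splits["val"]) != splits["trainval"]:
        problems.append("trainval is not exactly train plus val")
    if not splits["trainval"].isdisjoint(splits["test"]):
        problems.append("trainval and test overlap")
    counts = {name: len(ids) for name, ids in splits.items()}
    return counts, problems


# Demo with tiny hand-written split files in a throwaway folder.
sample = {
    "train": ["000001"],
    "val": ["000002"],
    "trainval": ["000001", "000002"],
    "test": ["000003", "000004"],
}
with tempfile.TemporaryDirectory() as main_dir:
    for name, ids in sample.items():
        with open(os.path.join(main_dir, name + ".txt"), "w") as f:
            f.write("\n".join(ids) + "\n")
    counts, problems = check_splits(main_dir)

print(counts)
print(problems)
```

On a real dataset, pass the full path of your own ImageSets/Main folder to `check_splits`; an empty problem list means the splits agree with each other.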

With that, our own VOC2007-format dataset is complete. </div> <link rel="stylesheet" href="https://csdnimg.cn/release/phoenix/template/css/markdown_views-ea0013b516.css">
