YOLO Training: Visualizing and Interpreting Intermediate Parameters During Training

阿新 • Published: 2019-02-16

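Waiting until training finishes before checking the result is wasteful, and sometimes the model starts to diverge well before the end. It is therefore worth monitoring the metrics (such as loss) during training to see whether they reach the expected values and, if they do not, to work out why. Visualizing the intermediate parameters of training helps with that analysis. The visualization relies on the log file saved during training: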

./darknet detector train cfg/tiny-yolo.cfg tiny-yolo_8000.conv.9 2>1 | tee person_train_log.txt

The part of the command that saves the log:

tee person_train_log.txt

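Saving the log this way produces two files. Because 2>1 redirects stderr to a file literally named 1 (merging stderr into stdout would be written 2>&1), the file 1 holds the network loading information and the checkpoint save messages, while person_train_log.txt holds the training information.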
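If a single log file containing both streams is preferred, the same effect as 2>&1 | tee can be sketched in Python. This is only an illustration: run_and_tee is a hypothetical helper name, not part of darknet, and the usage line simply reuses the command from this article.

```python
import subprocess
import sys


def run_and_tee(cmd, log_path):
    """Run cmd, merge its stderr into stdout, echo every line and save it to log_path.

    Hypothetical helper for illustration; returns the exit code of the command.
    """
    with open(log_path, "w") as log:
        with subprocess.Popen(cmd, stdout=subprocess.PIPE,
                              stderr=subprocess.STDOUT, text=True) as proc:
            for line in proc.stdout:
                sys.stdout.write(line)  # show progress on the console
                log.write(line)         # keep a copy for later plotting
            return proc.wait()


# Usage (assumes the darknet binary and files named in the article exist):
# run_and_tee(["./darknet", "detector", "train", "cfg/tiny-yolo.cfg",
#              "tiny-yolo_8000.conv.9"], "person_train_log.txt")
```

With everything in one file, the filtering step later in this article still applies, since it selects lines by keyword.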

Meaning of the parameters in the training log

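Region Avg IOU: the average IOU, i.e. the ratio of the intersection to the union of the predicted bounding box and the ground truth. This value should approach 1.

Class: the predicted probability of the labeled object's class. This value should approach 1.

Obj: this value should approach 1.

No Obj: this value should keep getting smaller without reaching zero.

Avg Recall: this value should approach 1.

avg: the average loss. This value should approach 0.

rate: the current learning rate.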
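To make the IOU definition concrete, here is a minimal sketch of intersection over union for two axis-aligned boxes. The (x_min, y_min, x_max, y_max) box layout and the function name iou are assumptions chosen for illustration; darknet itself stores boxes differently.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    # Corners of the overlapping rectangle (if any)
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])

    # Width and height are clamped at zero when the boxes do not overlap
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


print(iou((0, 0, 2, 2), (0, 0, 2, 2)))  # identical boxes
print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # partial overlap: 1 / 7
print(iou((0, 0, 1, 1), (2, 2, 3, 3)))  # disjoint boxes
```

A prediction that matches the ground truth exactly scores 1, which is why Region Avg IOU is expected to climb toward 1 as training progresses.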

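Before plotting the curves, the log has to be formatted with the extract_log.py script, which produces new log files for the visualization scripts to read. The extract_log.py script is as follows: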

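# coding=utf-8
# Extracts the training log: drops lines that cannot be parsed and writes a
# cleaned log file for the visualization scripts to plot.

def extract_log(log_file, new_log_file, key_word):
    with open(log_file) as f, open(new_log_file, 'w') as train_log:
        for line in f:
            # Skip multi-GPU synchronization messages
            if 'Syncing' in line:
                continue
            # Skip lines produced by divide-by-zero errors
            if 'nan' in line:
                continue
            # Keep only the lines that contain the keyword
            if key_word in line:
                train_log.write(line)

# person_train_log_loss.txt is used to plot the loss curve
extract_log('person_train_log.txt', 'person_train_log_loss.txt', 'images')
# person_train_log_iou.txt is used to plot the IOU-related curves
extract_log('person_train_log.txt', 'person_train_log_iou.txt', 'IOU')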
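Each line kept by the 'images' keyword is expected to carry five comma-separated fields, matching the column names used by the plotting script that follows (loss, avg, rate, seconds, images). The sketch below parses one such line without pandas. The sample line is made up in the shape those scripts expect, and parse_loss_line is a hypothetical helper name.

```python
def parse_loss_line(line):
    """Parse one line of person_train_log_loss.txt into a dict, or return None."""
    fields = [f.strip() for f in line.split(",")]
    if len(fields) != 5:
        return None
    try:
        # First field looks like "<batch>: <loss>"
        batch_str, loss_str = fields[0].split(":")
        return {
            "batch": int(batch_str),
            "loss": float(loss_str),
            # Remaining fields look like "<number> <name>"
            "avg": float(fields[1].split()[0]),
            "rate": float(fields[2].split()[0]),
            "seconds": float(fields[3].split()[0]),
            "images": int(fields[4].split()[0]),
        }
    except (ValueError, IndexError):
        return None


# Made-up sample line, for illustration only
sample = "9798: 0.370096, 0.451929 avg, 0.001000 rate, 3.300000 seconds, 627072 images"
print(parse_loss_line(sample))
```

Returning None for anything that does not fit makes it easy to spot lines that slipped through the keyword filter.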


 
The loss curve can be plotted with the train_loss_visualization.py script, which is as follows:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
#%matplotlib inline

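lines = 9873  # number of lines in the log file being read
# Skip the first 1000 rows, then keep only the rows whose index ends in 9 (one in ten).
# on_bad_lines='skip' requires pandas >= 1.3; older versions use error_bad_lines=False instead.
result = pd.read_csv('person_train_log_loss.txt', skiprows=[x for x in range(lines) if ((x%10!=9) |(x<1000))], on_bad_lines='skip', names=['loss', 'avg', 'rate', 'seconds', 'images'])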
result.head()

result['loss']=result['loss'].str.split(' ').str.get(1)
result['avg']=result['avg'].str.split(' ').str.get(1)
result['rate']=result['rate'].str.split(' ').str.get(1)
result['seconds']=result['seconds'].str.split(' ').str.get(1)
result['images']=result['images'].str.split(' ').str.get(1)
result.head()
result.tail()

#print(result.head())
# print(result.tail())
# print(result.dtypes)

print(result['loss'])
print(result['avg'])
print(result['rate'])
print(result['seconds'])
print(result['images'])

result['loss']=pd.to_numeric(result['loss'])
result['avg']=pd.to_numeric(result['avg'])
result['rate']=pd.to_numeric(result['rate'])
result['seconds']=pd.to_numeric(result['seconds'])
result['images']=pd.to_numeric(result['images'])
result.dtypes


fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
ax.plot(result['avg'].values,label='avg_loss')
#ax.plot(result['loss'].values,label='loss')
ax.legend(loc='best')
ax.set_title('The loss curves')
ax.set_xlabel('batches')
fig.savefig('avg_loss')
#fig.savefig('loss')


 
Set lines in train_loss_visualization.py to the number of lines in the log, and adjust which rows are skipped as needed:

skiprows=[x for x in range(lines) if ((x%10!=9) |(x<1000))]
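Rather than counting the log lines by hand, the value of lines and the skip list can be computed. The following is a small sketch under the same rule as the expression above; count_lines, rows_to_skip and rows_kept are hypothetical helper names, and the warmup and every parameters are simply the 1000 and 10 from that expression.

```python
def count_lines(path):
    """Number of lines in a text file."""
    with open(path) as f:
        return sum(1 for _ in f)


def rows_to_skip(n_lines, warmup=1000, every=10):
    """Row indices to skip: everything before warmup, and all but one row in each group of `every`."""
    return [x for x in range(n_lines) if (x % every != every - 1) or (x < warmup)]


def rows_kept(n_lines, warmup=1000, every=10):
    """Row indices that survive the skip rule, useful for checking the setting."""
    skipped = set(rows_to_skip(n_lines, warmup, every))
    return [x for x in range(n_lines) if x not in skipped]


print(rows_kept(1030))  # [1009, 1019, 1029]

# Usage with the file from this article (assumes it exists):
# lines = count_lines('person_train_log_loss.txt')
# skip = rows_to_skip(lines)
```

Printing the kept rows for a small n_lines is a quick way to confirm that a changed warmup or sampling interval does what is intended.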

Running train_loss_visualization.py generates avg_loss.png in the current working directory (the script's own directory when it is run from there).

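Analyzing the loss curve can guide changes to the learning rate policy in the cfg file. In this run, for example, the loss decreased very slowly after 100,000 iterations and then barely at all. Checking the log against the cfg file showed that the custom learning rate policy divided the learning rate by ten at 100,000 iterations, leaving it so small afterwards that the loss nearly stopped decreasing. The fix was to change the policy in the cfg so that the learning rate stays unchanged at 100,000 iterations and is lowered only at 300,000.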
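The effect of moving the step can be checked with a few lines of code. This sketch assumes a step-style schedule of the kind darknet's cfg expresses with policy=steps, steps and scales, where the rate is multiplied by a scale once the iteration count reaches the corresponding step. The base rate of 0.001 is an illustrative value, not taken from the author's cfg, and any warm-up (burn-in) phase is ignored.

```python
def learning_rate_at(iteration, base_lr, steps, scales):
    """Learning rate under a step schedule: multiply by scales[i] once iteration >= steps[i]."""
    rate = base_lr
    for step, scale in zip(steps, scales):
        if iteration < step:
            break
        rate *= scale
    return rate


base_lr = 0.001  # illustrative value only

# Original policy: divide by ten at 100,000 iterations
print(learning_rate_at(150_000, base_lr, steps=[100_000], scales=[0.1]))

# Modified policy: keep the rate until 300,000 iterations
print(learning_rate_at(150_000, base_lr, steps=[300_000], scales=[0.1]))
```

Comparing the computed value with the rate column in the training log is a quick way to confirm that the cfg change took effect.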

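Besides the loss, parameters such as Avg IOU and Avg Recall can also be visualized.
The parameters 'Region Avg IOU', 'Class', 'Obj', 'No Obj', 'Avg Recall' and 'count' can be plotted with the train_iou_visualization.py script, which is used in the same way as train_loss_visualization.py. The train_iou_visualization.py script is as follows: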

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
#%matplotlib inline

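lines = 9873  # number of lines in the log file being read
# Reads the IOU log written by extract_log.py; rows whose index ends in 0 or 9 are skipped.
# on_bad_lines='skip' requires pandas >= 1.3; older versions use error_bad_lines=False instead.
result = pd.read_csv('person_train_log_iou.txt', skiprows=[x for x in range(lines) if (x%10==0 or x%10==9) ], on_bad_lines='skip', names=['Region Avg IOU', 'Class', 'Obj', 'No Obj', 'Avg Recall','count'])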
result.head()

result['Region Avg IOU']=result['Region Avg IOU'].str.split(': ').str.get(1)
result['Class']=result['Class'].str.split(': ').str.get(1)
result['Obj']=result['Obj'].str.split(': ').str.get(1)
result['No Obj']=result['No Obj'].str.split(': ').str.get(1)
result['Avg Recall']=result['Avg Recall'].str.split(': ').str.get(1)
result['count']=result['count'].str.split(': ').str.get(1)
result.head()
result.tail()

#print(result.head())
# print(result.tail())
# print(result.dtypes)
print(result['Region Avg IOU'])

result['Region Avg IOU']=pd.to_numeric(result['Region Avg IOU'])
result['Class']=pd.to_numeric(result['Class'])
result['Obj']=pd.to_numeric(result['Obj'])
result['No Obj']=pd.to_numeric(result['No Obj'])
result['Avg Recall']=pd.to_numeric(result['Avg Recall'])
result['count']=pd.to_numeric(result['count'])
result.dtypes

fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
ax.plot(result['Region Avg IOU'].values,label='Region Avg IOU')
#ax.plot(result['Class'].values,label='Class')
#ax.plot(result['Obj'].values,label='Obj')
#ax.plot(result['No Obj'].values,label='No Obj')
#ax.plot(result['Avg Recall'].values,label='Avg Recall')
#ax.plot(result['count'].values,label='count')
ax.legend(loc='best')
#ax.set_title('The Region Avg IOU curves')
ax.set_title('The Region Avg IOU curves')
ax.set_xlabel('batches')
#fig.savefig('Avg IOU')
fig.savefig('Region Avg IOU')

Running train_iou_visualization.py generates the corresponding curve plot in the current working directory.
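The script above recovers each value by splitting on ': '. As an alternative sketch, a regular expression can pull every name and value pair out of a line in one pass. The sample line is made up in the shape the column names imply, and parse_iou_line is a hypothetical helper name.

```python
import re

# A name made of letters and spaces, a colon, then a plain decimal number
_PAIR = re.compile(r"([A-Za-z][A-Za-z ]*?):\s*(-?\d+(?:\.\d+)?)")


def parse_iou_line(line):
    """Return {metric name: value} for every 'name: number' pair found in the line."""
    return {name.strip(): float(value) for name, value in _PAIR.findall(line)}


# Made-up sample line, for illustration only
sample = ("Region Avg IOU: 0.326577, Class: 0.742537, Obj: 0.033966, "
          "No Obj: 0.000793, Avg Recall: 0.125000,  count: 8")
metrics = parse_iou_line(sample)
print(metrics["Region Avg IOU"], metrics["Avg Recall"], metrics["count"])
```

Because the pairs are matched by name, this approach does not depend on the order or number of fields in the line.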
