Reproducing the GE2E Speaker Verification Paper

Software Development · Posted 2019-04-08 08:20:14


I recently used the GE2E loss to train a network that extracts speaker embeddings. The basic idea of GE2E is:

GE2E loss pushes the embedding towards the centroid of the true speaker, and away from the centroid of the most similar different speaker.

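Some key points:

- Gradient scale: 0.01 for w/b (the learnable scale and bias of the similarity matrix), 0.5 for the projection kernel/bias.
- The paper uses SGD, and it works well. In my experiments Adam did not work at all, which is surprising because Adam is usually no worse. Possibly the gradient scaling conflicts with Adam's own scaling mechanism: Adam divides each update by a running estimate of the gradient magnitude, so a constant gradient scale largely cancels out.
- Build a three-layer LSTM network. LSTMBlockFusedCell or CudnnLSTM is much faster than LSTMCell + dynamic_rnn.
- On a large dataset, extract and store the spectral features in advance. During training, data-pipeline efficiency is critical to overall training speed: data preparation should overlap with GPU training so that its cost is hidden.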

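import tensorflow as tf  # TensorFlow 1.x API (Dataset.make_one_shot_iterator)

def input_fn():
    def generator():
        ...
        # load features from disk/SSD/network
        yield {'features': features}

    # from_generator requires output_types; float32 features are an assumption here
    dataset = tf.data.Dataset.from_generator(
        generator, output_types={'features': tf.float32})
    dataset = dataset.prefetch(1024)  # prefetching is important: it overlaps loading with training
    iterator = dataset.make_one_shot_iterator()
    return iterator.get_next()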
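The gradient-scaling and optimizer points in the list above can be illustrated with a framework-free NumPy sketch. The parameter names (`w`, `b`, `proj_kernel`, `proj_bias`) are hypothetical. The SGD defaults are assumptions: a learning rate of 0.01 and global L2-norm clipping at 3, values I recall from the paper's training setup. The order, scaling before clipping, is also an assumption. The second function suggests why Adam may discard the scaling. Its first update is lr * m_hat / (sqrt(v_hat) + eps), roughly lr * sign(g), so multiplying a gradient by 0.01 barely changes the step. The same cancellation holds in later steps for a constant scale, up to eps.

```python
import numpy as np

# Hypothetical parameter names; gradient scales follow the list above:
# 0.01 for the similarity scale/bias (w, b), 0.5 for the LSTM projection kernel/bias.
GRAD_SCALE = {"w": 0.01, "b": 0.01, "proj_kernel": 0.5, "proj_bias": 0.5}


def sgd_step(params, grads, lr=0.01, clip_norm=3.0):
    """One plain SGD step: per-parameter gradient scaling, then global L2-norm clipping."""
    scaled = {k: g * GRAD_SCALE.get(k, 1.0) for k, g in grads.items()}
    total = np.sqrt(sum(np.sum(g * g) for g in scaled.values()))
    if total > clip_norm:
        scaled = {k: g * (clip_norm / total) for k, g in scaled.items()}
    return {k: p - lr * scaled[k] for k, p in params.items()}


def adam_first_step(grad, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Size of Adam's first update from zero-initialized moment estimates."""
    m = (1 - beta1) * grad
    v = (1 - beta2) * grad * grad
    m_hat = m / (1 - beta1)  # bias correction at step t = 1
    v_hat = v / (1 - beta2)
    return lr * m_hat / (np.sqrt(v_hat) + eps)


if __name__ == "__main__":
    g = np.array([0.3, -2.0, 5.0])
    # SGD: scaling the gradient by 0.01 shrinks the step by 0.01.
    # Adam: the step is roughly lr * sign(g) either way, so the scale is lost.
    print(adam_first_step(g), adam_first_step(0.01 * g))
```

In TF 1.x the same idea would be expressed by multiplying the selected gradients returned by `optimizer.compute_gradients(loss)`, optionally applying `tf.clip_by_global_norm`, and then calling `apply_gradients` on a `tf.train.GradientDescentOptimizer`.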

During training, the loss dropped quickly without severe oscillation.