
TensorFlow Learning Notes -- 6. Baidu warp-ctc parameters and test example 2 explained

1 Baidu CTC

2 CTC explained

In short, the idea is to design a loss that does not require frame-aligned labels: by minimizing this loss we obtain accurate recognition (that is, at the end we can still decode the output without ever aligning the labels). Its effectiveness and advantages are most evident in speech recognition.
To be continued.

3 Interpreting the Baidu warp-ctc parameters and examples

1 The ctc function

ctc(activations, flat_labels, label_lengths, input_lengths, blank_label=0)
    Computes the CTC loss between a sequence of activations and a
    ground truth labeling.

    Args:

        activations: A 3-D Tensor of floats. The dimensions should be
            (t, n, a), where t is the time index, n is the minibatch index,
            and a indexes over activations for each symbol in the alphabet.
            # These are essentially the logits (the RNN's raw outputs). As in
            # TensorFlow: the first dimension is the time axis t, the second
            # the batch n, and the third the alphabet dimension a.
        flat_labels: A 1-D Tensor of ints, a concatenation of all the labels
            for the minibatch.
            # labels is a 1-D tensor. For one example with label sequence
            # 1, 2 the flat label is [1, 2]; if two examples in the minibatch
            # both have label sequence 1, 2, they are flattened together as
            # [1, 2, 1, 2].
        label_lengths: A 1-D Tensor of ints, the length of each label for
            each example in the minibatch.
            # The label length of each example in the minibatch; because all
            # labels are concatenated into one vector, the lengths are needed
            # to tell where one label ends and the next begins.
        input_lengths: A 1-D Tensor of ints, the number of time steps for
            each sequence in the minibatch.
            # The input length, i.e. the number of time steps of each
            # sequence in the minibatch.
        blank_label: int, the label value/index that the CTC calculation
            should use as the blank label.

    Returns:
        1-D float Tensor, the cost of each example in the minibatch
        (as negative log probabilities).
        # One cost per example in the minibatch.

    * This class performs the softmax operation internally.
    * The label reserved for the blank symbol should be label 0.
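The comments above map these arguments onto TensorFlow's conventions. For comparison, here is a minimal sketch (my own illustration, not part of warp-ctc) of converting the flat_labels/label_lengths pair into the SparseTensor that TF 1.x's built-in tf.nn.ctc_loss expects. One practical difference worth noting: warp-ctc uses blank_label=0 by default, while tf.nn.ctc_loss reserves the last class index (num_classes - 1) for the blank.

    import numpy as np
    import tensorflow as tf  # TF 1.x API

    def flat_labels_to_sparse(flat_labels, label_lengths):
        # Hypothetical helper: rebuild per-example label rows from the
        # concatenated warp-ctc labels, then emit them as a SparseTensor.
        indices, values = [], []
        offset = 0
        for batch, length in enumerate(label_lengths):
            for i in range(length):
                indices.append([batch, i])
                values.append(int(flat_labels[offset + i]))
            offset += length
        dense_shape = [len(label_lengths), int(max(label_lengths))]
        return tf.SparseTensor(indices, values, dense_shape)

    # flat labels [1, 2, 1, 2] with lengths [2, 2] become the 2 x 2 matrix
    # [[1, 2], [1, 2]], one row per example:
    sparse = flat_labels_to_sparse(np.asarray([1, 2, 1, 2], dtype=np.int32),
                                   np.asarray([2, 2], dtype=np.int32))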

2 Basic test: the _test_basic inputs explained

        # activations starts with shape (2, 5)
        activations = np.array([
            [0.1, 0.6, 0.1, 0.1, 0.1],
            [0.1, 0.1, 0.6, 0.1, 0.1]
            ], dtype=np.float32)

        alphabet_size = 5
        # dimensions should be t, n, p: (t timesteps, n minibatches,
        # p prob of each alphabet). This is one instance, so expand
        # dimensions in the middle
        # activations now has shape (2, 1, 5), i.e. (t, batch_size, alphabet_size)
        activations = np.expand_dims(activations, 1)
        # flat labels
        labels = np.asarray([1, 2], dtype=np.int32)
        # length of each example's label sequence in the minibatch
        label_lengths = np.asarray([2], dtype=np.int32)
        # number of time steps of each input sequence
        input_lengths = np.asarray([2], dtype=np.int32)
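Because the sequence here has only two time steps and the label also has length two, CTC admits exactly one valid alignment (symbol 1 at t=0, symbol 2 at t=1; no blank fits), so the expected cost can be checked by hand. A minimal numpy sketch of that check (my own verification, assuming only that ctc applies softmax internally, as documented above):

    import numpy as np

    activations = np.array([
        [0.1, 0.6, 0.1, 0.1, 0.1],
        [0.1, 0.1, 0.6, 0.1, 0.1]
    ], dtype=np.float32)

    # ctc performs softmax internally, so replicate that here
    probs = np.exp(activations) / np.exp(activations).sum(axis=1, keepdims=True)

    # The only valid CTC path is the label itself: 1 at t=0, 2 at t=1
    cost = -np.log(probs[0, 1]) - np.log(probs[1, 2])
    print(cost)  # ~2.4629, the value ctc should return for this example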

3 Multi-batch test: the inputs explained

        # activations starts with shape (2, 5)
        activations = np.array([
            [0.1, 0.6, 0.1, 0.1, 0.1],
            [0.1, 0.1, 0.6, 0.1, 0.1]
        ], dtype=np.float32)

        alphabet_size = 5
        # dimensions should be t, n, p: (t timesteps, n minibatches,
        # p prob of each alphabet). This is one instance, so expand
        # dimensions in the middle
        # activations now has shape (2, 1, 5), i.e. (t, batch_size, alphabet_size)
        _activations = np.expand_dims(activations, 1)
        # duplicate the example along the batch axis: activations now has
        # shape (2, 2, 5), i.e. (t, batch_size, alphabet_size)
        activations = np.concatenate([_activations, _activations], axis=1)
        # flat labels: both examples have label sequence [1, 2], concatenated
        labels = np.asarray([1, 2, 1, 2], dtype=np.int32)
        # label length of each example in the minibatch, concatenated as well
        label_lengths = np.asarray([2, 2], dtype=np.int32)
        # number of time steps of each input sequence, also concatenated
        input_lengths = np.asarray([2, 2], dtype=np.int32)
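Putting it together, here is a minimal end-to-end sketch of feeding these batched arrays to ctc (assuming the warpctc_tensorflow package built from warp-ctc's tensorflow_binding and a TF 1.x session; the argument names follow the signature documented above):

    import numpy as np
    import tensorflow as tf                # TF 1.x API
    from warpctc_tensorflow import ctc     # warp-ctc's TensorFlow binding

    acts = np.array([[0.1, 0.6, 0.1, 0.1, 0.1],
                     [0.1, 0.1, 0.6, 0.1, 0.1]], dtype=np.float32)
    acts = np.expand_dims(acts, 1)                 # (2, 1, 5)
    acts = np.concatenate([acts, acts], axis=1)    # (t=2, n=2, a=5)

    costs = ctc(activations=tf.constant(acts),
                flat_labels=tf.constant(np.asarray([1, 2, 1, 2], np.int32)),
                label_lengths=tf.constant(np.asarray([2, 2], np.int32)),
                input_lengths=tf.constant(np.asarray([2, 2], np.int32)))

    with tf.Session() as sess:
        print(sess.run(costs))  # one cost per example

Since both examples in the batch are identical, both costs should equal the single-example value checked by hand in the basic test (~2.4629).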