Top-Down Analysis of a Simple Speech Recognition System (Part 6)

In the previous part we analyzed the configuration stage of the run_model function, where load_placeholder_into_network builds the basic structure of the RNN network used by this speech recognition system. In this part we take a closer look at that network.

1. A Brief Introduction to RNNs

We do not process each second of incoming information in isolation: when reading a paper, for instance, we first understand the preceding text and then gradually build a complete picture as new information arrives. For sequences that carry this kind of temporal information, a conventional DNN is largely helpless. RNNs were designed precisely for such problems, and a typical RNN structure is shown below:
[Figure: a recurrent neural network unrolled over time]
As the figure shows, the output h at each time step depends both on the current input x and on the previous state c.
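A minimal NumPy sketch of a single vanilla RNN step (the weight names and sizes here are purely illustrative, not part of the project code) makes this dependence explicit:

import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    # The new hidden state mixes the current input with the previous state.
    return np.tanh(x_t @ W_xh + h_prev @ W_hh + b_h)

# Toy sizes: input size 3, hidden size 4
rng = np.random.RandomState(0)
x_t, h_prev = rng.randn(3), np.zeros(4)
W_xh, W_hh, b_h = rng.randn(3, 4), rng.randn(4, 4), np.zeros(4)
h_t = rnn_step(x_t, h_prev, W_xh, W_hh, b_h)   # shape (4,)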

2. A Brief Introduction to LSTM

As described above, an RNN can handle time-dependent problems, but because of the vanishing (and exploding) gradient problem it can only retain fairly short-range temporal information. The LSTM structure, shown below, was introduced to address this:
[Figure: internal structure of an LSTM cell]


As the figure shows, the internal structure of an LSTM is more complex than that of a plain RNN. It adds three components, the forget gate, the input gate and the output gate, which regulate the stored cell state and allow the network to remember much longer sequences.
First, the input gate provides a weight between 0 and 1 that decides how much of the current input (the input signal together with the hidden-state contribution) is accepted. Its value depends on the current input signal and on the hidden-layer output of the previous time step, as shown below:
[Figure: the input gate]
Next, the forget gate provides a weight between 0 and 1 that decides how much of the current cell state is forgotten. Its value likewise depends on the current input signal and the previous hidden-layer output:
[Figure: the forget gate]
The new memory produced by the LSTM cell is then composed of these two parts, the gated input and the non-forgotten previous state:
[Figure: updating the cell state]
Finally, the output gate provides a weight between 0 and 1 for the current output (how strongly the state is exposed). Its value again depends on the current input signal and the previous hidden-layer output:
[Figure: the output gate]
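To make the gate equations concrete, here is a minimal NumPy sketch of a single LSTM step (illustrative names and toy sizes, not project code); it mirrors the fused-matmul update used by TensorFlow's BasicLSTMCell, which is quoted later in this article:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b, forget_bias=1.0):
    # W maps [x_t, h_prev] to all four gate pre-activations at once.
    z = np.concatenate([x_t, h_prev]) @ W + b   # shape (4 * hidden,)
    i, j, f, o = np.split(z, 4)                 # input gate, candidate, forget gate, output gate
    c_t = c_prev * sigmoid(f + forget_bias) + sigmoid(i) * np.tanh(j)
    h_t = np.tanh(c_t) * sigmoid(o)             # output gate scales the exposed state
    return h_t, c_t

# Toy sizes: input size 3, hidden size 4
rng = np.random.RandomState(0)
x_t, h_prev, c_prev = rng.randn(3), np.zeros(4), np.zeros(4)
W, b = rng.randn(7, 16), np.zeros(16)
h_t, c_t = lstm_step(x_t, h_prev, c_prev, W, b)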

3. A Brief Introduction to BiRNN

As noted above, an RNN processes information along the time axis, but each output can only be inferred from earlier time steps, ignoring the influence that future context has on the present. Access to future context is very useful for many sequence-labelling tasks, which is why our speech recognition system adopts a bidirectional structure.
How can such a structure be built? One obvious trick is to add a delay between the inputs and the targets, which gives the network a few extra time steps of future context. The bidirectional recurrent neural network (BiRNN) goes further: each training sequence is processed by two RNNs, one running forward and one running backward, and both are connected to the same output layer. The output layer therefore receives the complete past and future context for every point in the input sequence. The figure below shows a BiRNN unrolled over time. Six distinct weight matrices are reused at every time step: input to forward and backward hidden layers (w1, w3), hidden layer to itself (w2, w5), and forward and backward hidden layers to the output layer (w4, w6). Note that there is no information flow between the forward and backward hidden layers, which keeps the unrolled graph acyclic.
[Figure: a bidirectional RNN unrolled over time]
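To make the idea concrete, here is a minimal NumPy sketch (made-up names and toy sizes, not project code) that runs one forward pass and one reversed backward pass over a sequence and concatenates the two hidden states at every step:

import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b):
    return np.tanh(x_t @ W_xh + h_prev @ W_hh + b)

def birnn(xs, params_fw, params_bw, hidden=4):
    h_fw, h_bw = np.zeros(hidden), np.zeros(hidden)
    fw, bw = [], []
    for x in xs:                      # forward pass, t = 0 .. T-1
        h_fw = rnn_step(x, h_fw, *params_fw)
        fw.append(h_fw)
    for x in reversed(xs):            # backward pass, t = T-1 .. 0
        h_bw = rnn_step(x, h_bw, *params_bw)
        bw.append(h_bw)
    bw.reverse()                      # realign backward outputs with time
    # Each time step now sees the full past (fw) and full future (bw).
    return [np.concatenate([f, b]) for f, b in zip(fw, bw)]

rng = np.random.RandomState(0)
xs = [rng.randn(3) for _ in range(5)]                       # T = 5, input size 3
make_params = lambda: (rng.randn(3, 4), rng.randn(4, 4), np.zeros(4))
outputs = birnn(xs, make_params(), make_params())           # each output has size 8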

4. Code Analysis

Having briefly introduced RNNs and their main variants, we can now walk through the implementation used by our speech recognition system. The code, located in rnn.py, is as follows:

def BiRNN(conf_path, batch_x, seq_length, n_input, n_context):
    parser = ConfigParser(os.environ)
    parser.read(conf_path)

    dropout = [float(x) for x in parser.get('birnn', 'dropout_rates').split(',')]
    relu_clip = parser.getint('birnn', 'relu_clip')

    b1_stddev = parser.getfloat('birnn', 'b1_stddev')
    h1_stddev = parser.getfloat('birnn', 'h1_stddev')
    b2_stddev = parser.getfloat('birnn', 'b2_stddev')
    h2_stddev = parser.getfloat('birnn', 'h2_stddev')
    b3_stddev = parser.getfloat('birnn', 'b3_stddev')
    h3_stddev = parser.getfloat('birnn', 'h3_stddev')
    b5_stddev = parser.getfloat('birnn', 'b5_stddev')
    h5_stddev = parser.getfloat('birnn', 'h5_stddev')
    b6_stddev = parser.getfloat('birnn', 'b6_stddev')
    h6_stddev = parser.getfloat('birnn', 'h6_stddev')

    n_hidden_1 = parser.getint('birnn', 'n_hidden_1')
    n_hidden_2 = parser.getint('birnn', 'n_hidden_2')
    n_hidden_5 = parser.getint('birnn', 'n_hidden_5')
    n_cell_dim = parser.getint('birnn', 'n_cell_dim')

    n_hidden_3 = int(eval(parser.get('birnn', 'n_hidden_3')))
    n_hidden_6 = parser.getint('birnn', 'n_hidden_6')

    # Input shape: [batch_size, n_steps, n_input + 2*n_input*n_context]
    batch_x_shape = tf.shape(batch_x)

    # Reshaping `batch_x` to a tensor with shape `[n_steps*batch_size, n_input + 2*n_input*n_context]`.
    # This is done to prepare the batch for input into the first layer which expects a tensor of rank `2`.

    # Permute n_steps and batch_size
    batch_x = tf.transpose(batch_x, [1, 0, 2])
    # Reshape to prepare input for first layer
    batch_x = tf.reshape(batch_x,
                         [-1, n_input + 2 * n_input * n_context])  # (n_steps*batch_size, n_input + 2*n_input*n_context)

    # The next three blocks will pass `batch_x` through three hidden layers with
    # clipped RELU activation and dropout.

    # 1st layer
    with tf.name_scope('fc1'):
        b1 = variable_on_cpu('b1', [n_hidden_1], tf.random_normal_initializer(stddev=b1_stddev))
        h1 = variable_on_cpu('h1', [n_input + 2 * n_input * n_context, n_hidden_1],
                             tf.random_normal_initializer(stddev=h1_stddev))
        layer_1 = tf.minimum(tf.nn.relu(tf.add(tf.matmul(batch_x, h1), b1)), relu_clip)
        layer_1 = tf.nn.dropout(layer_1, (1.0 - dropout[0]))

        tf.summary.histogram("weights", h1)
        tf.summary.histogram("biases", b1)
        tf.summary.histogram("activations", layer_1)

    # 2nd layer
    with tf.name_scope('fc2'):
        b2 = variable_on_cpu('b2', [n_hidden_2], tf.random_normal_initializer(stddev=b2_stddev))
        h2 = variable_on_cpu('h2', [n_hidden_1, n_hidden_2], tf.random_normal_initializer(stddev=h2_stddev))
        layer_2 = tf.minimum(tf.nn.relu(tf.add(tf.matmul(layer_1, h2), b2)), relu_clip)
        layer_2 = tf.nn.dropout(layer_2, (1.0 - dropout[1]))

        tf.summary.histogram("weights", h2)
        tf.summary.histogram("biases", b2)
        tf.summary.histogram("activations", layer_2)

    # 3rd layer
    with tf.name_scope('fc3'):
        b3 = variable_on_cpu('b3', [n_hidden_3], tf.random_normal_initializer(stddev=b3_stddev))
        h3 = variable_on_cpu('h3', [n_hidden_2, n_hidden_3], tf.random_normal_initializer(stddev=h3_stddev))
        layer_3 = tf.minimum(tf.nn.relu(tf.add(tf.matmul(layer_2, h3), b3)), relu_clip)
        layer_3 = tf.nn.dropout(layer_3, (1.0 - dropout[2]))

        tf.summary.histogram("weights", h3)
        tf.summary.histogram("biases", b3)
        tf.summary.histogram("activations", layer_3)

    # Create the forward and backward LSTM units. Inputs have length `n_cell_dim`.
    # LSTM forget gate bias initialized at `1.0` (default), meaning less forgetting
    # at the beginning of training (remembers more previous info)
    with tf.name_scope('lstm'):
        # Forward direction cell:
        lstm_fw_cell = tf.contrib.rnn.BasicLSTMCell(n_cell_dim, forget_bias=1.0, state_is_tuple=True)
        lstm_fw_cell = tf.contrib.rnn.DropoutWrapper(lstm_fw_cell,
                                                     input_keep_prob=1.0 - dropout[3],
                                                     output_keep_prob=1.0 - dropout[3],
                                                     # seed=random_seed,
                                                     )
        # Backward direction cell:
        lstm_bw_cell = tf.contrib.rnn.BasicLSTMCell(n_cell_dim, forget_bias=1.0, state_is_tuple=True)
        lstm_bw_cell = tf.contrib.rnn.DropoutWrapper(lstm_bw_cell,
                                                     input_keep_prob=1.0 - dropout[4],
                                                     output_keep_prob=1.0 - dropout[4],
                                                     # seed=random_seed,
                                                     )

        # `layer_3` is now reshaped into `[n_steps, batch_size, 2*n_cell_dim]`,
        # as the LSTM BRNN expects its input to be of shape `[max_time, batch_size, input_size]`.
        layer_3 = tf.reshape(layer_3, [-1, batch_x_shape[0], n_hidden_3])

        # Now we feed `layer_3` into the LSTM BRNN cell and obtain the LSTM BRNN output.
        outputs, output_states = tf.nn.bidirectional_dynamic_rnn(cell_fw=lstm_fw_cell,
                                                                 cell_bw=lstm_bw_cell,
                                                                 inputs=layer_3,
                                                                 dtype=tf.float32,
                                                                 time_major=True,
                                                                 sequence_length=seq_length)

        tf.summary.histogram("activations", outputs)

        # Reshape outputs from two tensors each of shape [n_steps, batch_size, n_cell_dim]
        # to a single tensor of shape [n_steps*batch_size, 2*n_cell_dim]
        outputs = tf.concat(outputs, 2)
        outputs = tf.reshape(outputs, [-1, 2 * n_cell_dim])

    with tf.name_scope('fc5'):
        # Now we feed `outputs` to the fifth hidden layer with clipped RELU activation and dropout
        b5 = variable_on_cpu('b5', [n_hidden_5], tf.random_normal_initializer(stddev=b5_stddev))
        h5 = variable_on_cpu('h5', [(2 * n_cell_dim), n_hidden_5], tf.random_normal_initializer(stddev=h5_stddev))
        layer_5 = tf.minimum(tf.nn.relu(tf.add(tf.matmul(outputs, h5), b5)), relu_clip)
        layer_5 = tf.nn.dropout(layer_5, (1.0 - dropout[5]))

        tf.summary.histogram("weights", h5)
        tf.summary.histogram("biases", b5)
        tf.summary.histogram("activations", layer_5)

    with tf.name_scope('fc6'):
        # Now we apply the weight matrix `h6` and bias `b6` to the output of `layer_5`
        # creating `n_classes` dimensional vectors, the logits.
        b6 = variable_on_cpu('b6', [n_hidden_6], tf.random_normal_initializer(stddev=b6_stddev))
        h6 = variable_on_cpu('h6', [n_hidden_5, n_hidden_6], tf.random_normal_initializer(stddev=h6_stddev))
        layer_6 = tf.add(tf.matmul(layer_5, h6), b6)

        tf.summary.histogram("weights", h6)
        tf.summary.histogram("biases", b6)
        tf.summary.histogram("activations", layer_6)

    # Finally we reshape layer_6 from a tensor of shape [n_steps*batch_size, n_hidden_6]
    # to the slightly more useful shape [n_steps, batch_size, n_hidden_6].
    # Note, that this differs from the input in that it is time-major.
    layer_6 = tf.reshape(layer_6, [-1, batch_x_shape[0], n_hidden_6])

    summary_op = tf.summary.merge_all()

    # Output shape: [n_steps, batch_size, n_hidden_6]
    return layer_6, summary_op

Let us break the function down. The first block (lines 2-25 of the listing) reads the [birnn] network configuration parameters stored in neural_network.ini, as illustrated below:
[Figure: the [birnn] section of neural_network.ini]
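Since the figure itself is not available here, the snippet below is a hypothetical reconstruction of what the [birnn] section could look like. The layer sizes match the table further down, while the stddev and dropout values are placeholders and may differ from the project's actual file:

[birnn]
# Illustrative values only; the real file may differ.
relu_clip = 20
dropout_rates = 0.05,0.05,0.05,0.0,0.0,0.05
b1_stddev = 0.1
h1_stddev = 0.1
b2_stddev = 0.1
h2_stddev = 0.1
b3_stddev = 0.1
h3_stddev = 0.1
b5_stddev = 0.1
h5_stddev = 0.1
b6_stddev = 0.1
h6_stddev = 0.1
n_hidden_1 = 1024
n_hidden_2 = 1024
n_hidden_5 = 1024
n_cell_dim = 1024
# this entry is read with eval(), so a Python expression is allowed
n_hidden_3 = 2 * 1024
n_hidden_6 = 29
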
The next block (lines 27-37) prepares the input data. After the transpose and reshape, batch_x is a 2-D tensor: its first dimension is -1 (inferred), and its second dimension is n_input + 2*n_input*n_context = 26 + 2*26*9 = 494 (how n_input and n_context are obtained was covered in an earlier part). This second dimension is the width of the network's input vector.
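The same shape arithmetic can be checked with a quick NumPy stand-in for tf.transpose/tf.reshape (the batch size and number of time steps are made up):

import numpy as np

n_input, n_context = 26, 9
width = n_input + 2 * n_input * n_context            # 26 + 2*26*9 = 494

batch_size, n_steps = 8, 100                         # illustrative values
batch_x = np.zeros((batch_size, n_steps, width))     # [batch, time, features]

batch_x = batch_x.transpose(1, 0, 2)                 # -> [time, batch, features]
batch_x = batch_x.reshape(-1, width)                 # -> [time*batch, 494]
print(batch_x.shape)                                 # (800, 494)
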
The rest of the function (lines 42-144) builds the whole network out of TensorFlow library calls; the resulting graph, rendered with TensorBoard, looks like this:
[Figure: TensorBoard graph of the BiRNN network]
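For reference, a graph like this can be dumped and inspected with a couple of lines once BiRNN(...) has been called to build the default graph (TF 1.x API; the log directory is an arbitrary example):

import tensorflow as tf

# Write the default graph so TensorBoard can render it.
writer = tf.summary.FileWriter('/tmp/birnn_logs', tf.get_default_graph())
writer.close()

# Then inspect it with:  tensorboard --logdir=/tmp/birnn_logs
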
Combining the graph with the code, we can see that the network consists of six main layers, with the parameters summarized in the table below:

Layer   Input   Output   Notes
fc1     494     1024     outputs clipped at 20, dropout = 0.05
fc2     1024    1024     outputs clipped at 20, dropout = 0.05
fc3     1024    2048     outputs clipped at 20, dropout = 0.05
lstm    2048    2048     the key part, analyzed below
fc5     2048    1024     outputs clipped at 20, dropout = 0.05
fc6     1024    29       output layer

The layers fc1/fc2/fc3/fc5/fc6 are ordinary fully connected layers; the interesting part is the lstm block, which we analyze in detail below.
As lines 79-105 of the listing show, this part of the network is built with tf.contrib.rnn. Let us first look at the BasicLSTMCell implementation:

class BasicLSTMCell(RNNCell):
  def __init__(self, num_units, forget_bias=1.0, input_size=None,
               state_is_tuple=True, activation=tanh, reuse=None):
    """Initialize the basic LSTM cell.

    Args:
      num_units: int, The number of units in the LSTM cell.
      forget_bias: float, The bias added to forget gates (see above).
      input_size: Deprecated and unused.
      state_is_tuple: If True, accepted and returned states are 2-tuples of
        the `c_state` and `m_state`.  If False, they are concatenated
        along the column axis.  The latter behavior will soon be deprecated.
      activation: Activation function of the inner states.
      reuse: (optional) Python boolean describing whether to reuse variables
        in an existing scope.  If not `True`, and the existing scope already has
        the given variables, an error is raised.
    """
    if not state_is_tuple:
      logging.warn("%s: Using a concatenated state is slower and will soon be "
                   "deprecated.  Use state_is_tuple=True.", self)
    if input_size is not None:
      logging.warn("%s: The input_size parameter is deprecated.", self)
    self._num_units = num_units
    self._forget_bias = forget_bias
    self._state_is_tuple = state_is_tuple
    self._activation = activation
    self._reuse = reuse

  @property
  def state_size(self):
    return (LSTMStateTuple(self._num_units, self._num_units)
            if self._state_is_tuple else 2 * self._num_units)

  @property
  def output_size(self):
    return self._num_units

  def __call__(self, inputs, state, scope=None):
    """Long short-term memory cell (LSTM)."""
    with _checked_scope(self, scope or "basic_lstm_cell", reuse=self._reuse):
      # Parameters of gates are concatenated into one multiply for efficiency.
      if self._state_is_tuple:
        c, h = state
      else:
        c, h = array_ops.split(value=state, num_or_size_splits=2, axis=1)
      concat = _linear([inputs, h], 4 * self._num_units, True)

      # i = input_gate, j = new_input, f = forget_gate, o = output_gate
      i, j, f, o = array_ops.split(value=concat, num_or_size_splits=4, axis=1)

      new_c = (c * sigmoid(f + self._forget_bias) + sigmoid(i) *
               self._activation(j))
      new_h = self._activation(new_c) * sigmoid(o)

      if self._state_is_tuple:
        new_state = LSTMStateTuple(new_c, new_h)
      else:
        new_state = array_ops.concat([new_c, new_h], 1)
      return new_h, new_state

The __call__ method implements exactly the LSTM structure described earlier: a single fused matmul is split into the input gate i, the candidate input j, the forget gate f and the output gate o, and new_c / new_h follow the standard LSTM update equations.
Meanwhile, tf.nn.bidirectional_dynamic_rnn (called at line 100 of the listing) combines the two LSTM cells into a single BiRNN. Its code is as follows:

def bidirectional_dynamic_rnn(cell_fw, cell_bw, inputs, sequence_length=None,
                              initial_state_fw=None, initial_state_bw=None,
                              dtype=None, parallel_iterations=None,
                              swap_memory=False, time_major=False, scope=None):
  """Creates a dynamic version of bidirectional recurrent neural network.

  Similar to the unidirectional case above (rnn) but takes input and builds
  independent forward and backward RNNs. The input_size of forward and
  backward cell must match. The initial state for both directions is zero by
  default (but can be set optionally) and no intermediate states are ever
  returned -- the network is fully unrolled for the given (passed in)
  length(s) of the sequence(s) or completely unrolled if length(s) is not
  given.

  Args:
    cell_fw: An instance of RNNCell, to be used for forward direction.
    cell_bw: An instance of RNNCell, to be used for backward direction.
    inputs: The RNN inputs.
      If time_major == False (default), this must be a tensor of shape:
        `[batch_size, max_time, input_size]`.
      If time_major == True, this must be a tensor of shape:
        `[max_time, batch_size, input_size]`.
      [batch_size, input_size].
    sequence_length: (optional) An int32/int64 vector, size `[batch_size]`,
      containing the actual lengths for each of the sequences in the batch.
      If not provided, all batch entries are assumed to be full sequences; and
      time reversal is applied from time `0` to `max_time` for each sequence.
    initial_state_fw: (optional) An initial state for the forward RNN.
      This must be a tensor of appropriate type and shape
      `[batch_size, cell_fw.state_size]`.
      If `cell_fw.state_size` is a tuple, this should be a tuple of
      tensors having shapes `[batch_size, s] for s in cell_fw.state_size`.
    initial_state_bw: (optional) Same as for `initial_state_fw`, but using
      the corresponding properties of `cell_bw`.
    dtype: (optional) The data type for the initial states and expected output.
      Required if initial_states are not provided or RNN states have a
      heterogeneous dtype.
    parallel_iterations: (Default: 32).  The number of iterations to run in
      parallel.  Those operations which do not have any temporal dependency
      and can be run in parallel, will be.  This parameter trades off
      time for space.  Values >> 1 use more memory but take less time,
      while smaller values use less memory but computations take longer.
    swap_memory: Transparently swap the tensors produced in forward inference
      but needed for back prop from GPU to CPU.  This allows training RNNs
      which would typically not fit on a single GPU, with very minimal (or no)
      performance penalty.
    time_major: The shape format of the `inputs` and `outputs` Tensors.
      If true, these `Tensors` must be shaped `[max_time, batch_size, depth]`.
      If false, these `Tensors` must be shaped `[batch_size, max_time, depth]`.
      Using `time_major = True` is a bit more efficient because it avoids
      transposes at the beginning and end of the RNN calculation.  However,
      most TensorFlow data is batch-major, so by default this function
      accepts input and emits output in batch-major form.
    dtype: (optional) The data type for the initial state.  Required if
      either of the initial states are not provided.
    scope: VariableScope for the created subgraph; defaults to
      "bidirectional_rnn"

  Returns:
    A tuple (outputs, output_states) where:
      outputs: A tuple (output_fw, output_bw) containing the forward and
        the backward rnn output `Tensor`.
        If time_major == False (default),
          output_fw will be a `Tensor` shaped:
          `[batch_size, max_time, cell_fw.output_size]`
          and output_bw will be a `Tensor` shaped:
          `[batch_size, max_time, cell_bw.output_size]`.
        If time_major == True,
          output_fw will be a `Tensor` shaped:
          `[max_time, batch_size, cell_fw.output_size]`
          and output_bw will be a `Tensor` shaped:
          `[max_time, batch_size, cell_bw.output_size]`.
        It returns a tuple instead of a single concatenated `Tensor`, unlike
        in the `bidirectional_rnn`. If the concatenated one is preferred,
        the forward and backward outputs can be concatenated as
        `tf.concat(outputs, 2)`.
      output_states: A tuple (output_state_fw, output_state_bw) containing
        the forward and the backward final states of bidirectional rnn.

  Raises:
    TypeError: If `cell_fw` or `cell_bw` is not an instance of `RNNCell`.
  """

  # pylint: disable=protected-access
  if not isinstance(cell_fw, rnn_cell_impl._RNNCell):
    raise TypeError("cell_fw must be an instance of RNNCell")
  if not isinstance(cell_bw, rnn_cell_impl._RNNCell):
    raise TypeError("cell_bw must be an instance of RNNCell")
  # pylint: enable=protected-access

  with vs.variable_scope(scope or "bidirectional_rnn"):
    # Forward direction
    with vs.variable_scope("fw") as fw_scope:
      output_fw, output_state_fw = dynamic_rnn(
          cell=cell_fw, inputs=inputs, sequence_length=sequence_length,
          initial_state=initial_state_fw, dtype=dtype,
          parallel_iterations=parallel_iterations, swap_memory=swap_memory,
          time_major=time_major, scope=fw_scope)

    # Backward direction
    if not time_major:
      time_dim = 1
      batch_dim = 0
    else:
      time_dim = 0
      batch_dim = 1

    def _reverse(input_, seq_lengths, seq_dim, batch_dim):
      if seq_lengths is not None:
        return array_ops.reverse_sequence(
            input=input_, seq_lengths=seq_lengths,
            seq_dim=seq_dim, batch_dim=batch_dim)
      else:
        return array_ops.reverse(input_, axis=[seq_dim])

    with vs.variable_scope("bw") as bw_scope:
      inputs_reverse = _reverse(
          inputs, seq_lengths=sequence_length,
          seq_dim=time_dim, batch_dim=batch_dim)
      tmp, output_state_bw = dynamic_rnn(
          cell=cell_bw, inputs=inputs_reverse, sequence_length=sequence_length,
          initial_state=initial_state_bw, dtype=dtype,
          parallel_iterations=parallel_iterations, swap_memory=swap_memory,
          time_major=time_major, scope=bw_scope)

  output_bw = _reverse(
      tmp, seq_lengths=sequence_length,
      seq_dim=time_dim, batch_dim=batch_dim)

  outputs = (output_fw, output_bw)
  output_states = (output_state_fw, output_state_bw)

  return (outputs, output_states)
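Before drilling further down, here is a tiny standalone usage sketch of this API (toy shapes, unrelated to the speech model) showing how the two cells, the time-major input and the concatenated output fit together:

import tensorflow as tf

n_cell_dim, n_steps, batch_size, depth = 4, 10, 2, 8   # toy sizes

inputs = tf.placeholder(tf.float32, [n_steps, batch_size, depth])
seq_len = tf.placeholder(tf.int32, [batch_size])

fw = tf.contrib.rnn.BasicLSTMCell(n_cell_dim, state_is_tuple=True)
bw = tf.contrib.rnn.BasicLSTMCell(n_cell_dim, state_is_tuple=True)

outputs, states = tf.nn.bidirectional_dynamic_rnn(
    fw, bw, inputs, sequence_length=seq_len,
    dtype=tf.float32, time_major=True)

merged = tf.concat(outputs, 2)   # [n_steps, batch_size, 2*n_cell_dim]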

Inside bidirectional_dynamic_rnn, each direction is driven by dynamic_rnn, whose code is as follows:

def dynamic_rnn(cell, inputs, sequence_length=None, initial_state=None,
                dtype=None, parallel_iterations=None, swap_memory=False,
                time_major=False, scope=None):
  """Creates a recurrent neural network specified by RNNCell `cell`.

  This function is functionally identical to the function `rnn` above, but
  performs fully dynamic unrolling of `inputs`.

  Unlike `rnn`, the input `inputs` is not a Python list of `Tensors`, one for
  each frame.  Instead, `inputs` may be a single `Tensor` where
  the maximum time is either the first or second dimension (see the parameter
  `time_major`).  Alternatively, it may be a (possibly nested) tuple of
  Tensors, each of them having matching batch and time dimensions.
  The corresponding output is either a single `Tensor` having the same number
  of time steps and batch size, or a (possibly nested) tuple of such tensors,
  matching the nested structure of `cell.output_size`.

  The parameter `sequence_length` is optional and is used to copy-through state
  and zero-out outputs when past a batch element's sequence length. So it's more
  for correctness than performance, unlike in rnn().

  Args:
    cell: An instance of RNNCell.
    inputs: The RNN inputs.

      If `time_major == False` (default), this must be a `Tensor` of shape:
        `[batch_size, max_time, ...]`, or a nested tuple of such
        elements.

      If `time_major == True`, this must be a `Tensor` of shape:
        `[max_time, batch_size, ...]`, or a nested tuple of such
        elements.

      This may also be a (possibly nested) tuple of Tensors satisfying
      this property.  The first two dimensions must match across all the inputs,
      but otherwise the ranks and other shape components may differ.
      In this case, input to `cell` at each time-step will replicate the
      structure of these tuples, except for the time dimension (from which the
      time is taken).

      The input to `cell` at each time step will be a `Tensor` or (possibly
      nested) tuple of Tensors each with dimensions `[batch_size, ...]`.
    sequence_length: (optional) An int32/int64 vector sized `[batch_size]`.
    initial_state: (optional) An initial state for the RNN.
      If `cell.state_size` is an integer, this must be
      a `Tensor` of appropriate type and shape `[batch_size, cell.state_size]`.
      If `cell.state_size` is a tuple, this should be a tuple of
      tensors having shapes `[batch_size, s] for s in cell.state_size`.
    dtype: (optional) The data type for the initial state and expected output.
      Required if initial_state is not provided or RNN state has a heterogeneous
      dtype.
    parallel_iterations: (Default: 32).  The number of iterations to run in
      parallel.  Those operations which do not have any temporal dependency
      and can be run in parallel, will be.  This parameter trades off
      time for space.  Values >> 1 use more memory but take less time,
      while smaller values use less memory but computations take longer.
    swap_memory: Transparently swap the tensors produced in forward inference
      but needed for back prop from GPU to CPU.  This allows training RNNs
      which would typically not fit on a single GPU, with very minimal (or no)
      performance penalty.
    time_major: The shape format of the `inputs` and `outputs` Tensors.
      If true, these `Tensors` must be shaped `[max_time, batch_size, depth]`.
      If false, these `Tensors` must be shaped `[batch_size, max_time, depth]`.
      Using `time_major = True` is a bit more efficient because it avoids
      transposes at the beginning and end of the RNN calculation.  However,
      most TensorFlow data is batch-major, so by default this function
      accepts input and emits output in batch-major form.
    scope: VariableScope for the created subgraph; defaults to "rnn".

  Returns:
    A pair (outputs, state) where:

      outputs: The RNN output `Tensor`.

        If time_major == False (default), this will be a `Tensor` shaped:
          `[batch_size, max_time, cell.output_size]`.

        If time_major == True, this will be a `Tensor` shaped:
          `[max_time, batch_size, cell.output_size]`.

        Note, if `cell.output_size` is a (possibly nested) tuple of integers
        or `TensorShape` objects, then `outputs` will be a tuple having the
        same structure as `cell.output_size`, containing Tensors having shapes
        corresponding to the shape data in `cell.output_size`.

      state: The final state.  If `cell.state_size` is an int, this
        will be shaped `[batch_size, cell.state_size]`.  If it is a
        `TensorShape`, this will be shaped `[batch_size] + cell.state_size`.
        If it is a (possibly nested) tuple of ints or `TensorShape`, this will
        be a tuple having the corresponding shapes.

  Raises:
    TypeError: If `cell` is not an instance of RNNCell.
    ValueError: If inputs is None or an empty list.
  """

  # pylint: disable=protected-access
  if not isinstance(cell, rnn_cell_impl._RNNCell):
    raise TypeError("cell must be an instance of RNNCell")
  # pylint: enable=protected-access

  # By default, time_major==False and inputs are batch-major: shaped
  #   [batch, time, depth]
  # For internal calculations, we transpose to [time, batch, depth]
  flat_input = nest.flatten(inputs)

  if not time_major:
    # (B,T,D) => (T,B,D)
    flat_input = tuple(array_ops.transpose(input_, [1, 0, 2])
                       for input_ in flat_input)

  parallel_iterations = parallel_iterations or 32
  if sequence_length is not None:
    sequence_length = math_ops.to_int32(sequence_length)
    if sequence_length.get_shape().ndims not in (None, 1):
      raise ValueError(
          "sequence_length must be a vector of length batch_size, "
          "but saw shape: %s" % sequence_length.get_shape())
    sequence_length = array_ops.identity(  # Just to find it in the graph.
        sequence_length, name="sequence_length")

  # Create a new scope in which the caching device is either
  # determined by the parent scope, or is set to place the cached
  # Variable using the same placement as for the rest of the RNN.
  with vs.variable_scope(scope or "rnn") as varscope:
    if varscope.caching_device is None:
      varscope.set_caching_device(lambda op: op.device)
    input_shape = tuple(array_ops.shape(input_) for input_ in flat_input)
    batch_size = input_shape[0][1]

    for input_ in input_shape:
      if input_[1].get_shape() != batch_size.get_shape():
        raise ValueError("All inputs should have the same batch size")

    if initial_state is not None:
      state = initial_state
    else:
      if not dtype:
        raise ValueError("If there is no initial_state, you must give a dtype.")
      state = cell.zero_state(batch_size, dtype)

    def _assert_has_shape(x, shape):
      x_shape = array_ops.shape(x)
      packed_shape = array_ops.stack(shape)
      return control_flow_ops.Assert(
          math_ops.reduce_all(math_ops.equal(x_shape, packed_shape)),
          ["Expected shape for Tensor %s is " % x.name,
           packed_shape, " but saw shape: ", x_shape])

    if sequence_length is not None:
      # Perform some shape validation
      with ops.control_dependencies(
          [_assert_has_shape(sequence_length, [batch_size])]):
        sequence_length = array_ops.identity(
            sequence_length, name="CheckSeqLen")

    inputs = nest.pack_sequence_as(structure=inputs, flat_sequence=flat_input)

    (outputs, final_state) = _dynamic_rnn_loop(
        cell,
        inputs,
        state,
        parallel_iterations=parallel_iterations,
        swap_memory=swap_memory,
        sequence_length=sequence_length,
        dtype=dtype)

    # Outputs of _dynamic_rnn_loop are always shaped [time, batch, depth].
    # If we are performing batch-major calculations, transpose output back
    # to shape [batch, time, depth]
    if not time_major:
      # (T,B,D) => (B,T,D)
      flat_output = nest.flatten(outputs)
      flat_output = [array_ops.transpose(output, [1, 0, 2])
                     for output in flat_output]
      outputs = nest.pack_sequence_as(
          structure=outputs, flat_sequence=flat_output)

    return (outputs, final_state)

dynamic_rnn in turn delegates to _dynamic_rnn_loop:

def _dynamic_rnn_loop(cell,
                      inputs,
                      initial_state,
                      parallel_iterations,
                      swap_memory,
                      sequence_length=None,
                      dtype=None):
  """Internal implementation of Dynamic RNN.

  Args:
    cell: An instance of RNNCell.
    inputs: A `Tensor` of shape [time, batch_size, input_size], or a nested
      tuple of such elements.
    initial_state: A `Tensor` of shape `[batch_size, state_size]`, or if
      `cell.state_size` is a tuple, then this should be a tuple of
      tensors having shapes `[batch_size, s] for s in cell.state_size`.
    parallel_iterations: Positive Python int.
    swap_memory: A Python boolean
    sequence_length: (optional) An `int32` `Tensor` of shape [batch_size].
    dtype: (optional) Expected dtype of output. If not specified, inferred from
      initial_state.

  Returns:
    Tuple `(final_outputs, final_state)`.
    final_outputs:
      A `Tensor` of shape `[time, batch_size, cell.output_size]`.  If
      `cell.output_size` is a (possibly nested) tuple of ints or `TensorShape`
      objects, then this returns a (possibly nested) tuple of Tensors matching
      the corresponding shapes.
    final_state:
      A `Tensor`, or possibly nested tuple of Tensors, matching in length
      and shapes to `initial_state`.

  Raises:
    ValueError: If the input depth cannot be inferred via shape inference
      from the inputs.
  """
  state = initial_state
  assert isinstance(parallel_iterations, int), "parallel_iterations must be int"

  state_size = cell.state_size

  flat_input = nest.flatten(inputs)
  flat_output_size = nest.flatten(cell.output_size)

  # Construct an initial output
  input_shape = array_ops.shape(flat_input[0])
  time_steps = input_shape[0]
  batch_size = input_shape[1]

  inputs_got_shape = tuple(input_.get_shape().with_rank_at_least(3)
                           for input_ in flat_input)

  const_time_steps, const_batch_size = inputs_got_shape[0].as_list()[:2]

  for shape in inputs_got_shape:
    if not shape[2:].is_fully_defined():
      raise ValueError(
          "Input size (depth of inputs) must be accessible via shape inference,"
          " but saw value None.")
    got_time_steps = shape[0].value
    got_batch_size = shape[1].value
    if const_time_steps != got_time_steps:
      raise ValueError(
          "Time steps is not the same for all the elements in the input in a "
          "batch.")
    if const_batch_size != got_batch_size:
      raise ValueError(
          "Batch_size is not the same for all the elements in the input.")

  # Prepare dynamic conditional copying of state & output
  def _create_zero_arrays(size):
    size = _state_size_with_prefix(size, prefix=[batch_size])
    return array_ops.zeros(
        array_ops.stack(size), _infer_state_dtype(dtype, state))

  flat_zero_output = tuple(_create_zero_arrays(output)
                           for output in flat_output_size)
  zero_output = nest.pack_sequence_as(structure=cell.output_size,
                                      flat_sequence=flat_zero_output)

  if sequence_length is not None:
    min_sequence_length = math_ops.reduce_min(sequence_length)
    max_sequence_length = math_ops.reduce_max(sequence_length)

  time = array_ops.constant(0, dtype=dtypes.int32, name="time")

  with ops.name_scope("dynamic_rnn") as scope:
    base_name = scope

  def _create_ta(name, dtype):
    return tensor_array_ops.TensorArray(dtype=dtype,
                                        size=time_steps,
                                        tensor_array_name=base_name + name)

  output_ta = tuple(_create_ta("output_%d" % i,
                               _infer_state_dtype(dtype, state))
                    for i in range(len(flat_output_size)))
  input_ta = tuple(_create_ta("input_%d" % i, flat_input[0].dtype)
                   for i in range(len(flat_input)))

  input_ta = tuple(ta.unstack(input_)
                   for ta, input_ in zip(input_ta, flat_input))

  def _time_step(time, output_ta_t, state):
    """Take a time step of the dynamic RNN.

    Args:
      time: int32 scalar Tensor.
      output_ta_t: List of `TensorArray`s that represent the output.
      state: nested tuple of vector tensors that represent the state.

    Returns:
      The tuple (time + 1, output_ta_t with updated flow, new_state).
    """

    input_t = tuple(ta.read(time) for ta in input_ta)
    # Restore some shape information
    for input_, shape in zip(input_t, inputs_got_shape):
      input_.set_shape(shape[1:])

    input_t = nest.pack_sequence_as(structure=inputs, flat_sequence=input_t)
    call_cell = lambda: cell(input_t, state)

    if sequence_length is not None:
      (output, new_state) = _rnn_step(
          time=time,
          sequence_length=sequence_length,
          min_sequence_length=min_sequence_length,
          max_sequence_length=max_sequence_length,
          zero_output=zero_output,
          state=state,
          call_cell=call_cell,
          state_size=state_size,
          skip_conditionals=True)
    else:
      (output, new_state) = call_cell()

    # Pack state if using state tuples
    output = nest.flatten(output)

    output_ta_t = tuple(
        ta.write(time, out) for ta, out in zip(output_ta_t, output))

    return (time + 1, output_ta_t, new_state)

  _, output_final_ta, final_state = control_flow_ops.while_loop(
      cond=lambda time, *_: time < time_steps,
      body=_time_step,
      loop_vars=(time, output_ta, state),
      parallel_iterations=parallel_iterations,
      swap_memory=swap_memory)

  # Unpack final output if not using output tuples.
  final_outputs = tuple(ta.stack() for ta in output_final_ta)

  # Restore some shape information
  for output, output_size in zip(final_outputs, flat_output_size):
    shape = _state_size_with_prefix(
        output_size, prefix=[const_time_steps, const_batch_size])
    output.set_shape(shape)

  final_outputs = nest.pack_sequence_as(
      structure=cell.output_size, flat_sequence=final_outputs)

  return (final_outputs, final_state)

I have not yet worked through this last function in full detail, so its analysis is left for a later article.
At this point we have a reasonably complete picture of the network structure used by our speech recognition system; in the next part we will look at how this network is actually run.
