
An LSTM-CNN Text Classification Model Based on TensorFlow

Preface

    While reading up on QA a while ago, I came across the paper "LSTM-based deep learning model for non-factoid answer selection", which uses an LSTM-CNN model to extract the semantics of questions and answers. Inspired by it, I use the same model to extract the semantic information of a text and add a softmax function on top, forming a text classification model.

1.LSTM(Long Short-Term Memory)

    LSTM is applied extremely widely in NLP, with mature applications in machine translation, text classification, QA, and other areas. Concretely, it improves on the RNN architecture by adding a memory cell and three gating units, which give it effective control over historical information: instead of completely overwriting the previous hidden state at every step the way an RNN does, the LSTM retains information selectively. This strengthens its ability to process long text sequences and alleviates the vanishing gradient problem.

    The structure of an LSTM unit is shown below:

[Figure: LSTM unit with memory cell and input, forget, and output gates]
    The input gate decides how much the current input vector changes the information in the memory cell, the forget gate decides how much the previous step's history affects the information in the memory cell at the current step, and the output gate controls how much of the memory cell's information is output.

    Writing the input gate, output gate, and forget gate as $i_t$, $o_t$, $f_t$, the LSTM update is:

$$i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i)$$
$$f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f)$$
$$o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o)$$
$$\tilde{C}_t = \tanh(W_c x_t + U_c h_{t-1} + b_c)$$
$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$$
$$h_t = o_t \odot \tanh(C_t)$$

where $C_t$ is the cell state at the current step, $h_t$ is the final output of the LSTM unit, the sigmoid function $\sigma$ is the activation function of the gates, $\odot$ denotes element-wise multiplication, and the $W_*$, $U_*$, $b_*$ are the LSTM's weight matrices and bias terms.
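Read as code, one update step is only a few lines. Here is a minimal NumPy sketch of the equations above (the stacked 4h-row weight layout and the function names are my own choices for illustration, not from the paper):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    # W: (4h, d), U: (4h, h), b: (4h,) -- input, forget, output and
    # candidate parameters stacked row-wise
    h = h_prev.shape[0]
    z = W.dot(x_t) + U.dot(h_prev) + b
    i_t = sigmoid(z[0 * h:1 * h])        # input gate
    f_t = sigmoid(z[1 * h:2 * h])        # forget gate
    o_t = sigmoid(z[2 * h:3 * h])        # output gate
    c_hat = np.tanh(z[3 * h:4 * h])      # candidate cell state
    c_t = f_t * c_prev + i_t * c_hat     # cell state update
    h_t = o_t * np.tanh(c_t)             # final output of the unit
    return h_t, c_t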

2.CNN(Convolutional Neural Network)

    The CNN part follows the architecture Yoon Kim proposed in "Convolutional neural networks for sentence classification":

[Figure: Kim-style CNN for sentence classification]
    The size of the convolution window has a large effect on the final classification result. Borrowing the idea of the N-gram language model, a window that covers n adjacent words extracts local features and captures the semantics of word collocations in context, from which a representation of the whole text is built. Following this idea, the convolution window is set to n*m, where n is the number of words inside the window and m is the word-vector dimension.

    Multiple convolution kernels are applied to generate the feature maps, a max-pooling operation follows, and a softmax layer performs the final classification, as sketched below.
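A minimal TF 1.x sketch of this windowed, n-gram-style convolution (all sizes here are made up for illustration):

import tensorflow as tf

batch, num_words, m, n, num_filters = 32, 40, 128, 5, 200  # hypothetical sizes

# an embedded sentence, treated as a [num_words, m] "image" with 1 channel
sent = tf.placeholder(tf.float32, [batch, num_words, m, 1])

# an n x m filter reads n adjacent words over the full embedding
# dimension -- effectively an n-gram feature detector
W = tf.get_variable("ngram_w", [n, m, 1, num_filters])
b = tf.get_variable("ngram_b", [num_filters])

conv = tf.nn.relu(tf.nn.conv2d(sent, W, strides=[1, 1, 1, 1],
                               padding="VALID") + b)  # [batch, num_words-n+1, 1, num_filters]
# max pooling over all window positions keeps the strongest response of
# each filter, giving one num_filters-dim feature vector per text
pooled = tf.nn.max_pool(conv, ksize=[1, num_words - n + 1, 1, 1],
                        strides=[1, 1, 1, 1], padding="VALID")
features = tf.reshape(pooled, [batch, num_filters])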

3.LSTM-CNN Model

    First, an embedding layer turns the words into word vectors, which are fed into the LSTM for semantic feature extraction. Because the raw corpus is padded during preprocessing, the LSTM outputs are multiplied by a MASK matrix to suppress the effect of the padding (a sketch of how such a mask can be built follows below). The LSTM outputs are then used as the CNN's input for further feature extraction, and the classification result is produced at the end.
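The MASK matrix itself is not part of the graph; it can be precomputed from the sequence lengths on the NumPy side. A minimal sketch, assuming the [num_step, batch] layout of the mask_x placeholder in the code below (the helper name build_mask is hypothetical):

import numpy as np

def build_mask(lengths, num_step):
    # mask[t, i] is 1.0 while example i still has a real token at step t
    # and 0.0 over the padded tail
    mask = np.zeros((num_step, len(lengths)), dtype=np.float32)
    for i, length in enumerate(lengths):
        mask[:length, i] = 1.0
    return mask

# two sentences of length 3 and 5, padded to num_step = 6:
# build_mask([3, 5], 6)[:, 0] -> [1, 1, 1, 0, 0, 0]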

    The overall structure of the model is as follows:

[Figure: LSTM-CNN model architecture]
4.Code

import tensorflow as tf

class LSTM_CNN_Model(object):



    def __init__(self,config,is_training=True):

        self.keep_prob=config.keep_prob
        # the batch size is fixed because the reshape ops below hard-wire it
        self.batch_size = 64

        num_step=config.num_step
        self.input_data=tf.placeholder(tf.int32,[None,num_step])   # word ids
        self.target = tf.placeholder(tf.int64,[None])              # class labels
        self.mask_x = tf.placeholder(tf.float32,[num_step,None])   # 1.0 for real tokens, 0.0 for padding

        class_num=config.class_num
        hidden_neural_size=config.hidden_neural_size
        vocabulary_size=config.vocabulary_size
        embed_dim=config.embed_dim
        hidden_layer_num=config.hidden_layer_num
 

        #build LSTM network
        def lstm_cell():
            cell = tf.contrib.rnn.BasicLSTMCell(hidden_neural_size,forget_bias=0.0,state_is_tuple=True)
            if self.keep_prob<1:
                cell = tf.contrib.rnn.DropoutWrapper(
                    cell,output_keep_prob=self.keep_prob
                )
            return cell

        #one fresh cell per layer: reusing a single cell object via
        #[lstm_cell]*hidden_layer_num breaks in TF >= 1.1
        cell = tf.contrib.rnn.MultiRNNCell([lstm_cell() for _ in range(hidden_layer_num)],state_is_tuple=True)

        self._initial_state = cell.zero_state(self.batch_size,tf.float32)

        #embedding layer
        with tf.device("/cpu:0"),tf.name_scope("embedding_layer"):
            embedding = tf.get_variable("embedding",[vocabulary_size,embed_dim],dtype=tf.float32)
            inputs=tf.nn.embedding_lookup(embedding,self.input_data)

        if self.keep_prob<1:
            inputs = tf.nn.dropout(inputs,self.keep_prob)

        out_put=[]
        state=self._initial_state
        with tf.variable_scope("LSTM_layer"):
            for time_step in range(num_step):
                if time_step>0: tf.get_variable_scope().reuse_variables()
                (cell_output,state)=cell(inputs[:,time_step,:],state)
                out_put.append(cell_output)

        #stack the per-step outputs into [num_step,batch,hidden] and zero
        #out the positions that correspond to padding
        out_put=tf.stack(out_put)*self.mask_x[:,:,None]

        with tf.name_scope("Conv_layer"):
            out_put = tf.transpose(out_put,[1,2,0])
            out_put = tf.reshape(out_put , [self.batch_size,hidden_neural_size,num_step,-1])

            W_conv = tf.get_variable(name="conv_w" , initializer=tf.truncated_normal(shape=[600,5,1,200],stddev=0.1))
            B_conv = tf.get_variable(name="conv_b", initializer=tf.constant(0.1,shape=[200]))

            conv_output = tf.nn.relu(tf.nn.conv2d(out_put , W_conv , strides=[1,1,1,1],padding='VALID') + B_conv)
            conv_output = tf.reshape(conv_output,[self.batch_size,36,200,1])
            max_pool_out = tf.nn.max_pool(conv_output,ksize=[1,36,1,1],strides=[1,1,1,1],padding='VALID')
            max_pool_out = tf.reshape(max_pool_out,[self.batch_size,200])


        with tf.name_scope("Softmax_layer_and_output"):
            softmax_w = tf.get_variable("softmax_w",[200,class_num],dtype=tf.float32)
            softmax_b = tf.get_variable("softmax_b",[class_num],dtype=tf.float32)
            self.logits = tf.matmul(max_pool_out,softmax_w)+softmax_b

        with tf.name_scope("loss"):
            self.loss = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=self.logits+1e-10,labels=self.target)
            self.cost = tf.reduce_mean(self.loss)

        with tf.name_scope("accuracy"):
            self.prediction = tf.argmax(self.logits,1)
            correct_prediction = tf.equal(self.prediction,self.target)
            self.correct_num=tf.reduce_sum(tf.cast(correct_prediction,tf.float32))
            self.accuracy = tf.reduce_mean(tf.cast(correct_prediction,tf.float32),name="accuracy")

        #add summaries for TensorBoard
        loss_summary = tf.summary.scalar("loss",self.cost)
        accuracy_summary = tf.summary.scalar("accuracy_summary",self.accuracy)

        if not is_training:
            return

        self.global_step = tf.Variable(0,dtype=tf.int32,name="global_step",trainable=False)
        self.lr = tf.Variable(0.8,dtype=tf.float32,trainable=False)

        tvars = tf.trainable_variables()
        grads, _ = tf.clip_by_global_norm(tf.gradients(self.cost, tvars),
                                      config.max_grad_norm)


        # Keep track of gradient values and sparsity (optional)
        grad_summaries = []
        for g, v in zip(grads, tvars):
            if g is not None:
                grad_hist_summary = tf.summary.histogram("{}/grad/hist".format(v.name), g)
                sparsity_summary = tf.summary.scalar("{}/grad/sparsity".format(v.name), tf.nn.zero_fraction(g))
                grad_summaries.append(grad_hist_summary)
                grad_summaries.append(sparsity_summary)
        self.grad_summaries_merged = tf.summary.merge(grad_summaries)

        self.summary =tf.summary.merge([loss_summary,accuracy_summary,self.grad_summaries_merged])

        optimizer = tf.train.GradientDescentOptimizer(self.lr)
        #apply the gradients once; passing global_step makes it increment per update
        self.train_op=optimizer.apply_gradients(zip(grads, tvars),global_step=self.global_step)

        self.new_lr = tf.placeholder(tf.float32,shape=[],name="new_learning_rate")
        self._lr_update = tf.assign(self.lr,self.new_lr)

    def assign_new_lr(self,session,lr_value):
        session.run(self._lr_update,feed_dict={self.new_lr:lr_value})
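
For reference, here is one hypothetical way to wire the class up (the Config fields simply mirror the attributes read in __init__; every value below is a placeholder, not a recommendation):

class Config(object):
    keep_prob = 0.5
    num_step = 40
    class_num = 2
    hidden_neural_size = 600
    vocabulary_size = 10000
    embed_dim = 128
    hidden_layer_num = 1
    max_grad_norm = 5

model = LSTM_CNN_Model(Config(),is_training=True)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    model.assign_new_lr(sess,0.8)
    # per batch: x is [64,num_step] int32 word ids, y is [64] int64 labels,
    # mask is [num_step,64] float32 (see build_mask above)
    # _, cost = sess.run([model.train_op,model.cost],
    #                    feed_dict={model.input_data:x,model.target:y,model.mask_x:mask})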

5.Experimental Results

The experimental environment was as follows:

    GPU: NVIDIA GeForce GTX 1080

    OS: Ubuntu 16.04

    Development environment: Anaconda 2.3.1, TensorFlow 1.5.0rc1

The results are as follows:

    Training set accuracy:

[Figure: training accuracy curve]

    Loss function:

[Figure: training loss curve]
The final accuracy is 87.31% on the test set and 91.17% on the validation set,

an improvement of 4%~5% over a plain LSTM model.

Code: https://github.com/zjrn/LSTM-CNN_CLASSIFICATION.git