Text Classification with a CNN (TensorFlow Implementation)
阿新 • Posted: 2018-12-02
Preface
I have recently been studying deep learning models for text classification, and read the following three papers on CNN-based text classification:
(1) "Convolutional Neural Networks for Sentence Classification"
(2) "Character-level Convolutional Networks for Text Classification"
(3) "Effective Use of Word Order for Text Categorization with Convolutional Neural Networks"
This blog also walks through the ideas in several text classification papers: https://blog.csdn.net/guoyuhaoaaa/article/details/53188918
Model Implementation
Over the past few days I implemented the CNN model from the first paper on the 20 Newsgroups dataset, covering the three variants below:
- CNN-rand
Every word vector in the sentence is randomly initialized and treated as a parameter to be optimized during CNN training.
- CNN-static
Word vectors are taken from a word2vec table pre-trained on the Google News dataset (about 100 billion words); they are fed in as fixed inputs and are not optimized during training.
- CNN-non-static
Word vectors are likewise initialized from the same pre-trained word2vec table, but are then fine-tuned as parameters during CNN training.
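The difference between the three variants comes down to how the embedding matrix is initialized and whether it is trainable. A minimal numpy sketch for illustration only (`build_embedding`, `pretrained`, and the sizes are my own names, not the repo's code):

```python
import numpy as np

# Hypothetical sketch of the three embedding setups described above.
vocab_size, embedding_size = 5000, 300

def build_embedding(variant, pretrained=None, seed=0):
    rng = np.random.default_rng(seed)
    if variant == "CNN-rand":
        # random init, updated during training
        W = rng.uniform(-0.25, 0.25, size=(vocab_size, embedding_size))
        trainable = True
    elif variant == "CNN-static":
        W = pretrained.copy()  # word2vec vectors, frozen
        trainable = False
    elif variant == "CNN-non-static":
        W = pretrained.copy()  # word2vec init, fine-tuned
        trainable = True
    else:
        raise ValueError(variant)
    return W, trainable
```

In TensorFlow this distinction maps onto the `trainable=` argument passed when the embedding matrix variable is created.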
The overall architecture is shown in the figure below (taken from paper (1)):
It consists of the following parts:
* Input layer
The sentence represented as a matrix of word vectors
* Convolutional layer
Extracts the feature maps, i.e. the text features we need, with a ReLU nonlinearity applied after each convolution
* Pooling layer
Max-pooling takes the single largest value from each feature map and concatenates these maxima into a one-dimensional vector
* Output layer
Takes the pooled one-dimensional vector (after dropout) as input and produces the class scores through a fully connected layer
A more detailed walkthrough is available in this article: https://www.jianshu.com/p/fe428f0b32c1
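Max-over-time pooling is easy to see on a toy example: each filter's feature map contributes exactly one value, its maximum:

```python
import numpy as np

# Each column is one filter's feature map over the sentence positions.
feature_maps = np.array([[0.1, 0.9],
                         [0.7, 0.2],
                         [0.3, 0.5]])  # shape: (n_positions, num_filters)

pooled = feature_maps.max(axis=0)  # one value per filter -> array([0.7, 0.9])
```

With several filter sizes and `num_filters` filters each, concatenating these pooled vectors yields the `num_filters * len(filter_sizes)` features that feed the output layer in the code below.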
Code Excerpt
The full code, along with the pitfalls I ran into, is on my GitHub:
https://github.com/DilicelSten/CNN_learning/blob/master/simple%20cnn/
import tensorflow as tf

# (Excerpt from the model's graph-building code; placeholders such as
# self.input_x, self.dropout_keep_prob, and the hyperparameters are
# defined elsewhere in the repo.)

# Embedding layer
with tf.device('/cpu:0'), tf.name_scope("embedding"):
    self.W = tf.Variable(
        tf.random_uniform([vocab_size, embedding_size], -0.25, 0.25),
        name="W")
    self.embedded_chars = tf.nn.embedding_lookup(self.W, self.input_x)
    self.embedded_chars_expanded = tf.expand_dims(self.embedded_chars, -1)

# Create a convolution + max-pool layer for each filter size
pooled_outputs = []
for i, filter_size in enumerate(filter_sizes):
    with tf.name_scope("conv-maxpool-%s" % filter_size):
        # Convolution layer
        filter_shape = [filter_size, embedding_size, 1, num_filters]
        W = tf.Variable(tf.truncated_normal(filter_shape, stddev=0.1), name="W")
        b = tf.Variable(tf.constant(0.1, shape=[num_filters]), name="b")
        conv = tf.nn.conv2d(
            self.embedded_chars_expanded,
            W,
            strides=[1, 1, 1, 1],
            padding="VALID",
            name="conv")
        # Apply nonlinearity
        h = tf.nn.relu(tf.nn.bias_add(conv, b), name="relu")
        # Max-pooling over the outputs
        pooled = tf.nn.max_pool(
            h,
            ksize=[1, sequence_length - filter_size + 1, 1, 1],
            strides=[1, 1, 1, 1],
            padding="VALID",
            name="pool")
        pooled_outputs.append(pooled)

# Combine all the pooled features
num_filters_total = num_filters * len(filter_sizes)
self.h_pool = tf.concat(pooled_outputs, 3)
self.h_pool_flat = tf.reshape(self.h_pool, [-1, num_filters_total])

# Add dropout
with tf.name_scope("dropout"):
    self.h_drop = tf.nn.dropout(self.h_pool_flat, self.dropout_keep_prob)

# Final (unnormalized) scores and predictions
l2_loss = tf.constant(0.0)  # accumulator for the L2 regularization terms
with tf.name_scope("output"):
    W = tf.get_variable(
        "W",
        shape=[num_filters_total, num_classes],
        initializer=tf.contrib.layers.xavier_initializer())
    b = tf.Variable(tf.constant(0.1, shape=[num_classes]), name="b")
    l2_loss += tf.nn.l2_loss(W)
    l2_loss += tf.nn.l2_loss(b)
    self.scores = tf.nn.xw_plus_b(self.h_drop, W, b, name="scores")
    self.predictions = tf.argmax(self.scores, 1, name="predictions")
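The excerpt accumulates l2_loss but stops at the unnormalized scores. The loss those scores typically feed into is the mean softmax cross-entropy plus the L2 penalty; a numpy sketch for clarity (function names are my own, not the repo's, which presumably uses tf.nn.softmax_cross_entropy_with_logits):

```python
import numpy as np

def softmax(z):
    # numerically stable row-wise softmax
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def total_loss(scores, labels_onehot, W, b, l2_reg_lambda=0.0):
    """Mean cross-entropy over the batch plus an L2 penalty on the
    output-layer weights, matching tf.nn.l2_loss (sum(x**2) / 2)."""
    probs = softmax(scores)
    ce = -np.mean(np.sum(labels_onehot * np.log(probs + 1e-12), axis=1))
    l2 = 0.5 * (np.sum(W ** 2) + np.sum(b ** 2))
    return ce + l2_reg_lambda * l2
```

With all-zero scores the softmax is uniform and the cross-entropy equals log(num_classes), a handy sanity check when debugging training.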