
Variable-Length Convolution and Deconvolution in Tensorflow


When applying a CNN to certain image or sequential inputs, you need to handle inputs of non-fixed length. In scene text recognition, for example, the inputs are rectangular text-containing images produced by a detector, and these detection boxes vary in length. There are generally two approaches. The first works on the data: pad or resize the inputs. Padding means fixing a target length, zero-padding samples shorter than it, and truncating or discarding samples longer than it; resizing means up- or down-sampling the samples, and periodic signals can also simply be repeated or clipped. The second works on the model: use a model that accepts input of arbitrary length, such as an RNN. In fact, what limits the input size in a CNN architecture is the FC layer; with the FC layer removed, a CNN can also accept input of arbitrary length.
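As a quick illustration of the data-side option, a pad-or-truncate step might look like the following minimal sketch (pad_or_truncate is a hypothetical helper, assuming 1-D samples and a fixed target length T):

import numpy as np

def pad_or_truncate(sample, T):
    # zero-pad samples shorter than T; truncate samples longer than T
    if len(sample) >= T:
        return sample[:T]
    padding = np.zeros(T - len(sample), dtype=sample.dtype)
    return np.concatenate([sample, padding])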
This article shows how to implement convolution and deconvolution layers in Tensorflow that accept variable-length input.


Convolution layer: accepting input of arbitrary size

The convolution op tf.nn.conv2d itself already accepts input of arbitrary size; the real constraint comes from the FC layer.

def conv2d(x, channel, k_h=5, k_w=5, d_h=2, d_w=2, stddev=0.02, padding='VALID', name='conv2d'):
    with tf.variable_scope(name):
        # kernel shape: [k_h, k_w, in_channels, out_channels]
        w = tf.get_variable('weights', [k_h, k_w, x.get_shape()[-1], channel],
                            initializer=tf.truncated_normal_initializer(stddev=stddev))
        biases = tf.get_variable('biases', shape=[channel], initializer=tf.zeros_initializer())
        conv = tf.nn.conv2d(x, w, strides=[1, d_h, d_w, 1], padding=padding)
        conv = tf.nn.bias_add(conv, biases)
        return conv

Testing: for inputs of different sizes (H, W), the convolution layer produces output of the correct shape.

import numpy as np
import tensorflow as tf

x = tf.placeholder(shape=(64, None, None, 1), dtype=tf.float32)
y = conv2d(x, 16, 2, 2, 1, 1, name='conv')

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # test different pairs of (H, W)
    for H in range(25, 30):
        for W in range(25, 30):
            out = sess.run(y, feed_dict={x: np.random.normal(size=(64, H, W, 1))})
            print("H={}, W={}, ".format(H, W), out.shape)
H=25, W=25,  (64, 24, 24, 16)
H=25, W=26,  (64, 24, 25, 16)
H=25, W=27,  (64, 24, 26, 16)
H=25, W=28,  (64, 24, 27, 16)
H=25, W=29,  (64, 24, 28, 16)
...

Convolution layer: handling padded, unequal-length samples in a batch

Although the input to a convolution layer is typically a Tensor of shape [N, H, W, C] (i.e. [batch_size, height, width, channel]), the samples inside it may be padded, unequal-length samples. If a sample's actual length is L < H (taking height as the length dimension), the convolution results over its trailing segment of length H − L are meaningless and should not contribute to the output or to training. To solve this, a mask can be used to zero out that part of the convolution output, which requires a function that computes the convolution output length from the kernel size and stride.
For convenience, define a single convolution layer with kernel size (2, 2) and stride (2, 2), define the function get_conv_lens accordingly to compute the convolution output lengths, then build a mask with tf.sequence_mask and multiply it with the convolution output. As the final result shows, everything beyond each sample's valid length becomes 0, so this part of the data is ignored when computing gradients.

def get_conv_lens(lengths):
    # VALID conv output length: floor((L - k) / s) + 1,
    # which for k = s = 2 simplifies to floor_div(L - 1, 2)
    return tf.floor_div(lengths - 1, 2)

x = tf.placeholder(shape=(3,None,None,1), dtype=tf.float32)
lens = tf.placeholder(shape=(3,), dtype=tf.int32)

'''Masking'''
conv_lens = get_conv_lens(lens)
# sequence_mask defaults to maxlen = max(conv_lens); that matches the conv output
# height here because the longest sample (len 9) is unpadded. In general, pass
# maxlen explicitly to be safe.
mask = tf.sequence_mask(conv_lens)
mask = tf.expand_dims(tf.expand_dims(mask, -1), -1)  # [N, T] -> [N, T, 1, 1] to broadcast over W and C
mask = tf.to_float(mask)

y = conv2d(x, channel=1, k_w=2, k_h=2, d_w=2, d_h=2, name='conv')
y = tf.multiply(y, mask)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())

    feed_dict = {
        x:np.random.normal(size=(3, 9, 9, 1)),
        lens:[5, 7, 9]
    }
    out, l = sess.run([y, conv_lens], feed_dict=feed_dict)

print('conv_output_lens:', l)
for i in range(3):
    print("Sample {}, len:{}".format(i, l[i]))
    print(out[i, :, :, 0])
conv_output_lens: [2 3 4]
Sample 0, len:2
[[-0.01515051 -0.05866513 -0.03959468 -0.04031102]
 [ 0.01430748 -0.01260181  0.06486305  0.01679482]
 [ 0.         -0.          0.         -0.        ]
 [ 0.         -0.          0.         -0.        ]]
Sample 1, len:3
[[-0.00693851 -0.06252004  0.04867405  0.00893762]
 [-0.01766482 -0.01576694  0.00474036  0.05953841]
 [ 0.02577864 -0.05794765  0.07342847  0.02103793]
 [-0.         -0.         -0.          0.        ]]
Sample 2, len:4
[[ 0.0050444  -0.01620883 -0.01921686 -0.01786101]
 [ 0.02900701  0.02657226 -0.00322832 -0.07596755]
 [-0.03624581  0.05622911 -0.00773423  0.04726247]
 [ 0.0512427   0.01688698 -0.03030321  0.01135093]]

Deconvolution layer: accepting input of arbitrary size

The deconvolution op tf.nn.conv2d_transpose requires the output size output_shape to be given explicitly. To accept input of arbitrary size, the middle two dimensions of output_shape must not be fixed values; instead, the deconvolved size is computed dynamically from the kernel size and stride. These two values may be Tensors, but not None. Consequently, you cannot obtain the input size with get_shape().as_list() and compute from it, because it only returns None for the unknown dimensions; use tf.shape() instead, which returns a Tensor.
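The difference is easy to check (a small sketch; both calls are standard TF 1.x):

x = tf.placeholder(shape=(3, None, None, 1), dtype=tf.float32)
print(x.get_shape().as_list())  # [3, None, None, 1]: static shape, unknown dims are None
print(tf.shape(x))              # a Tensor (shape=(4,), dtype=int32): dynamic shape, usable inside the graph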

def deconv2d(x, channel, k_h=5, k_w=5, d_h=2, d_w=2, stddev=0.02, name='deconv2d'):
    def get_deconv_lens(H, k, d):
        # one valid output length for a VALID transposed conv: H * d + k - 1
        return tf.multiply(H, d) + k - 1

    shape = tf.shape(x)                   # dynamic shape: a Tensor, known at run time
    H, W = shape[1], shape[2]
    N, _, _, C = x.get_shape().as_list()  # static shape: the fixed batch size N is needed below

    with tf.variable_scope(name):
        # kernel shape for conv2d_transpose: [k_h, k_w, out_channels, in_channels]
        w = tf.get_variable('weights', [k_h, k_w, channel, x.get_shape()[-1]],
                            initializer=tf.random_normal_initializer(stddev=stddev))
        biases = tf.get_variable('biases', shape=[channel], initializer=tf.zeros_initializer())

    H1 = get_deconv_lens(H, k_h, d_h)
    W1 = get_deconv_lens(W, k_w, d_w)
    # output_shape mixes Python ints and scalar Tensors; TF packs them into one Tensor
    deconv = tf.nn.conv2d_transpose(x, w, output_shape=[N, H1, W1, channel],
                                    strides=[1, d_h, d_w, 1], padding='VALID')
    deconv = tf.nn.bias_add(deconv, biases)

    return deconv

Testing: for inputs of different sizes (H, W), the deconvolution layer produces output of the correct shape.

x = tf.placeholder(shape=(3,None,None,1), dtype=tf.float32)
y = deconv2d(x, k_w=2, k_h=2, d_w=2, d_h=2, channel=18)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())

    for H in range(4, 9):
        for W in range(4, 9):
            feed_dict = {x:np.random.normal(size=(3, H, W, 1))}
            out = sess.run(y, feed_dict=feed_dict)
            print "H={}, W={}, ".format(H,W), out.shape
H=4, W=4,  (3, 9, 9, 18)
H=4, W=5,  (3, 9, 11, 18)
H=4, W=6,  (3, 9, 13, 18)
H=4, W=7,  (3, 9, 15, 18)
H=4, W=8,  (3, 9, 17, 18)
...

Deconvolution layer: handling padded, unequal-length samples in a batch

The approach mirrors the convolution case: compute the deconvolved output lengths with get_deconv_lens, build a mask with tf.sequence_mask, and multiply it with the deconvolution output.

def get_deconv_lens(lens):
    # deconv output length for kernel 2, stride 2: L * d + k - 1 = 2L + 1
    return tf.multiply(lens, 2) + 1

x = tf.placeholder(shape=(3,None,None,1), dtype=tf.float32)
lens = tf.placeholder(shape=(3,), dtype=tf.int32)

deconv_lens = get_deconv_lens(lens)
mask = tf.sequence_mask(deconv_lens)
mask = tf.expand_dims(tf.expand_dims(mask, -1), -1)
mask = tf.to_float(mask)

y = deconv2d(x, k_h=2, k_w=2, d_h=2, d_w=1, channel=1)
y = tf.multiply(y, mask)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())

    H, W = 3, 3
    feed_dict = {
        x:np.random.normal(size=(3, H, W, 1)),
        lens:[1,2,3]}
    out, ls = sess.run([y, deconv_lens], feed_dict=feed_dict)
    print "H={}, W={}, ".format(H, W), out.shape
    print "deconv_output_lengths:", ls
    for i in range(3):
        print "Sample {}: len={}".format(i, ls[i])
        print out[i, :, :, 0]
H=3, W=3,  (3, 7, 4, 1)
deconv_output_lengths: [3 5 7]
Sample 0: len=3
[[-0.0066564  -0.0110612   0.01348369  0.02262646]
 [-0.00557604  0.0001205   0.01187689 -0.00099025]
 [ 0.00473566  0.01897985  0.01987015  0.0026021 ]
 [ 0.          0.          0.         -0.        ]
 [-0.          0.          0.         -0.        ]
 [-0.          0.         -0.          0.        ]
 [ 0.          0.          0.          0.        ]]
Sample 1: len=5
[[-0.01708268 -0.02356728  0.01687649  0.01737293]
 [-0.01431008  0.00434665  0.00883375 -0.00076033]
 [ 0.01680406  0.04338101  0.01744103 -0.01432209]
 [ 0.01407669  0.01264412 -0.00865467  0.00062681]
 [-0.01829475 -0.02353049  0.01568151  0.0104046 ]
 [-0.          0.          0.         -0.        ]
 [ 0.          0.          0.          0.        ]]
Sample 2: len=7
[[ 4.8716185e-03 -5.5075656e-03 -1.9949302e-02  2.1265401e-03]
 [ 4.0809335e-03 -1.1483297e-02  2.0447900e-03 -9.3068171e-05]
 [-1.1571520e-02 -5.5464655e-03  2.3397136e-02  4.2484370e-03]
 [-9.6934121e-03  1.1671140e-02  1.3168794e-03 -1.8593312e-04]
 [-1.0423118e-02 -3.6943398e-02 -2.9132590e-02  5.2677775e-03]
 [-8.7314006e-03 -1.6249334e-02  4.1774949e-03 -2.3054463e-04]
 [ 0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00]]

Some notes and takeaways

1. Building a U-Net-like structure

Variable-length input is allowed, but the inputs need to be padded to certain fixed lengths, and the admissible values are not contiguous; this is especially pronounced in multi-layer Conv-Deconv stacks. The root cause is that when tf.nn.conv2d runs with a stride greater than 1 that does not evenly divide the input size, the output size gets rounded, so a Conv followed by a Deconv does not always reproduce the original length; see the sketch below.
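A minimal sketch of the mismatch, using this article's kernel-2, stride-2, VALID-padding length formulas (plain-Python stand-ins for get_conv_lens and get_deconv_lens):

def conv_len(L):
    # VALID conv, kernel 2, stride 2: floor((L - 2) / 2) + 1 = (L - 1) // 2
    return (L - 1) // 2

def deconv_len(L):
    # matching deconv: L * 2 + 2 - 1
    return L * 2 + 1

for L in range(4, 10):
    print(L, '->', conv_len(L), '->', deconv_len(conv_len(L)))
# 4 -> 1 -> 3   mismatch
# 5 -> 2 -> 5   round-trips
# 6 -> 2 -> 5   mismatch
# 7 -> 3 -> 7   round-trips
# 8 -> 3 -> 7   mismatch
# 9 -> 4 -> 9   round-trips

With these settings, only odd lengths survive a Conv-Deconv round trip, so U-Net-style skip connections only line up if the input is padded to one of those admissible lengths.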

2. Masking with stacked Conv or Deconv layers

There is no need to mask layer by layer: compute the final lengths lengths once, build the mask in one step, and multiply it with the final output (see the sketch below).
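For example, with two of the kernel-2, stride-2 convolution layers from above stacked, the lengths can simply be passed through get_conv_lens once per layer. A minimal sketch, reusing conv2d, get_conv_lens, x and lens from the earlier examples; as before, it assumes the batch contains an unpadded sample so that the default maxlen of tf.sequence_mask matches the output height:

h = conv2d(x, 16, 2, 2, 2, 2, name='conv1')
h = conv2d(h, 16, 2, 2, 2, 2, name='conv2')

final_lens = get_conv_lens(get_conv_lens(lens))  # compose the length function once per layer
mask = tf.to_float(tf.expand_dims(tf.expand_dims(tf.sequence_mask(final_lens), -1), -1))
h = tf.multiply(h, mask)  # a single mask applied to the final output only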

3. Getting fixed-length output from Conv

Classification tasks usually need an FC layer followed by a Softmax afterwards. Suppose the Conv output is [N, T, W, C], where T is variable and can be thought of as a timestep dimension. Apply masked mean-pooling: sum the masked output along the T dimension to get a Tensor of shape [N, 1, W, C], divide by the lengths lengths, and finally flatten it (see the sketch below).
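A minimal sketch of the masked mean-pooling step (assumptions: y is the already-masked [N, T, W, C] conv output, conv_lens holds each sample's valid length, the batch size is fixed, and TF is 1.5+ for the keepdims argument):

pooled = tf.reduce_sum(y, axis=1, keepdims=True)  # [N, 1, W, C]; padded positions are already 0
pooled = pooled / tf.reshape(tf.to_float(conv_lens), [-1, 1, 1, 1])  # mean over valid timesteps only
feat = tf.reshape(pooled, [int(y.get_shape()[0]), -1])  # flatten to [N, W*C] for the FC layer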