Deep Fun | 04 Image Style Transfer
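Image style transfer combines the content of one content image with the style of one or more style images to produce new and often striking images.
Below are the results of transferring the styles of several artworks onto a single content image.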

We implement image style transfer twice, once with TensorFlow and once with Keras. Both rely on convolutional neural networks (CNNs).
Preparation
Install the packages
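```bash
pip install numpy scipy tensorflow keras
```
The TensorFlow code below uses the TensorFlow 1.x API (`tf.Session`, `tf.train.AdamOptimizer`). It also uses the `scipy.misc` image functions `imread`, `imresize` and `imsave`, which newer SciPy releases have removed. It therefore needs correspondingly older versions of both libraries.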
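Then prepare a few style images and one content image. The TensorFlow code also expects the pretrained VGG19 weights file imagenet-vgg-verydeep-19.mat, which is distributed with MatConvNet.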
How it works
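To merge the style of the style image with the content of the content image, the generated image should stay as close as possible to the content image in content and as close as possible to the style image in style.
We therefore define a content loss and a style loss and use their weighted sum as the total loss.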
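The steps are:
- Generate a random image (the code below actually starts from random noise blended with the content image)
- In each iteration, adjust the image's pixel values to reduce the total loss (the network weights stay fixed)
- After many iterations, the optimized image is the result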
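The toy NumPy sketch below makes this loop concrete. It is a deliberate simplification and not the method used later. The two losses are assumed to be plain squared pixel distances to hypothetical arrays `p` ("content") and `s` ("style") instead of CNN-feature losses, so the optimum is simply a weighted pixel average. The function name and parameters are hypothetical.

```python
import numpy as np

def toy_style_transfer(p, s, alpha=1.0, beta=1.0, lr=0.1, steps=500, seed=0):
    """Toy pixel optimization (assumption: squared pixel distances replace the
    CNN-based losses), minimizing L = alpha*||x - p||^2 + beta*||x - s||^2."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0, 255, size=p.shape)                # step 1: start from a random image
    for _ in range(steps):                               # step 2: iterate
        grad = 2 * alpha * (x - p) + 2 * beta * (x - s)  # dL/dx
        x -= lr * grad                                   # move pixels against the gradient
    return x                                             # step 3: the optimized image

p = np.full((2, 2, 3), 100.0)   # hypothetical "content" pixels
s = np.full((2, 2, 3), 200.0)   # hypothetical "style" pixels
x = toy_style_transfer(p, s, alpha=1.0, beta=3.0)
print(x[0, 0])  # ≈ (1*100 + 3*200) / (1 + 3) = 175 in every channel
```

Only the pixels of `x` change. In the real method the gradient is computed through the fixed VGG19 network instead of directly on pixels.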
Content loss
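Whether two images have similar content cannot be judged by simply comparing pixels.
A CNN can abstract and interpret images, so the outputs of its convolutional layers can serve as the image's content representation.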
Take VGG19 as an example. It consists of several convolutional and pooling layers, followed by fully connected layers at the end.

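Here we use the output of conv4_2 as the content representation and define the content loss as

$$L_{content}(\vec{p},\vec{x},l)=\frac{1}{2}\sum_{i,j}\left(F_{ij}^{l}-P_{ij}^{l}\right)^2$$

where $F_{ij}^l$ and $P_{ij}^l$ are the activations of the $i$-th feature map at position $j$ in layer $l$, for the generated image $\vec{x}$ and the content image $\vec{p}$ respectively.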
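Below is a minimal NumPy sketch of this formula. It assumes the layer-$l$ features have already been extracted and flattened into arrays of shape $(N_l, M_l)$: $N_l$ feature maps with $M_l$ positions each. The function `content_loss` and the sample arrays are hypothetical.

```python
import numpy as np

def content_loss(F, P):
    """L_content = 1/2 * sum_ij (F_ij - P_ij)^2.
    F: generated-image features at layer l, shape (N_l, M_l)
    P: content-image features at the same layer, same shape"""
    F = np.asarray(F, dtype=np.float64)
    P = np.asarray(P, dtype=np.float64)
    return 0.5 * np.sum((F - P) ** 2)

F = np.array([[1.0, 2.0], [3.0, 4.0]])
P = np.array([[1.0, 0.0], [3.0, 1.0]])
print(content_loss(F, P))  # 0.5 * (0 + 4 + 0 + 9) = 6.5
```

The gradient of this loss with respect to $F$ is simply $F - P$. Backpropagating that difference through the network gives the pixel updates.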
Style loss
Style is hard to pin down. It may be brushstrokes, texture, structure, layout, color palette, and so on.
Here we use the correlations between the feature maps of a convolutional layer as the style representation, taking conv1_1 as an example.
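The Gram matrix is computed as follows. If a layer has 64 feature maps, its Gram matrix is 64×64. The entry in row $i$, column $j$ is the correlation between the $i$-th and $j$-th feature maps, computed as their inner product:

$$G_{ij}^l=\sum_k F_{ik}^l F_{jk}^l$$

where $F_{ik}^l$ is the value of the $i$-th feature map of layer $l$ at position $k$.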
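Here is a NumPy sketch of the Gram matrix. It assumes one layer's activations are given as an array of shape (H, W, N) with no batch axis, which matches the TensorFlow code later once its batch dimension is dropped. `gram_matrix` is a hypothetical helper name, and the activations are random stand-ins.

```python
import numpy as np

def gram_matrix(features):
    """G_ij = sum_k F_ik F_jk for activations of shape (H, W, N)."""
    H, W, N = features.shape
    F = features.reshape(H * W, N).T   # (N, M): row i is feature map i, flattened
    return F @ F.T                     # (N, N) matrix of inner products

rng = np.random.default_rng(0)
acts = rng.random((4, 5, 64))   # a fake layer with 64 feature maps
G = gram_matrix(acts)
print(G.shape)  # (64, 64)
```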
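The style loss is defined below. $E_l$ measures the style difference at layer $l$, and the style loss is a weighted sum of $E_l$ over several convolutional layers:

$$E_l=\frac{1}{4N_l^2 M_l^2}\sum_{i,j}\left(G_{ij}^l-A_{ij}^l\right)^2$$

$$L_{style}(\vec{a},\vec{x})=\sum_{l=0}^{L}\omega_l E_l$$

Here $G^l$ and $A^l$ are the Gram matrices of the generated image $\vec{x}$ and the style image $\vec{a}$ at layer $l$. $N_l$ is the number of feature maps, $M_l$ is the size (height × width) of each feature map, and $\omega_l$ is the weight of layer $l$.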
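The two formulas can be sketched in NumPy as follows, assuming each layer's features are already flattened to shape $(N_l, M_l)$. The helper names and the toy feature values are hypothetical. The list-of-pairs weight format mirrors the `STYLE_LAYERS` list in the TensorFlow code.

```python
import numpy as np

def gram(F):
    # F has shape (N_l, M_l): one flattened feature map per row
    return F @ F.T

def layer_style_loss(F, S):
    # E_l = 1 / (4 N_l^2 M_l^2) * sum_ij (G_ij - A_ij)^2
    # F: generated-image features, S: style-image features, both (N_l, M_l)
    N, M = F.shape
    G = gram(F)
    A = gram(S)
    return np.sum((G - A) ** 2) / (4.0 * N ** 2 * M ** 2)

def style_loss(gen_feats, style_feats, layer_weights):
    # L_style = sum_l w_l * E_l; layer_weights is a list of (layer_name, w_l) pairs
    return sum(w * layer_style_loss(gen_feats[name], style_feats[name])
               for name, w in layer_weights)

# hypothetical toy features: N_l = 2 feature maps, M_l = 2 positions
F = np.array([[1.0, 0.0], [0.0, 1.0]])
S = np.array([[1.0, 1.0], [0.0, 0.0]])
print(layer_style_loss(F, S))  # 2 / 64 = 0.03125
```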
We compute the style loss on five convolutional layers: conv1_1, conv2_1, conv3_1, conv4_1 and conv5_1. Different layer weights $\omega_l$ lead to different transfer results.
Total loss
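The total loss is the weighted sum of the content loss and the style loss:

$$L_{total}(\vec{p},\vec{a},\vec{x})=\alpha L_{content}(\vec{p},\vec{x})+\beta L_{style}(\vec{a},\vec{x})$$

Different weights lead to different results. A larger $\alpha/\beta$ preserves more of the content image, while a smaller one produces a more strongly stylized image.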
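The sketch below illustrates that only the ratio $\alpha/\beta$ determines where the optimum lands. It is a toy under stated assumptions. Both losses are replaced by squared distances on a single scalar $x$, with the "content" at 0 and the "style" at 10, and the minimizer is found by brute-force search. All names are hypothetical.

```python
import numpy as np

def toy_total_loss(x, p, s, alpha, beta):
    # assumption: squared distances stand in for the CNN-based losses
    l_content = 0.5 * np.sum((x - p) ** 2)
    l_style = 0.5 * np.sum((x - s) ** 2)
    return alpha * l_content + beta * l_style

def best_x(alpha, beta, p=0.0, s=10.0):
    # brute-force search for the x that minimizes the toy total loss
    candidates = np.linspace(0.0, 10.0, 101)
    losses = [toy_total_loss(c, p, s, alpha, beta) for c in candidates]
    return candidates[int(np.argmin(losses))]

for alpha, beta in [(1, 1), (100, 100), (1, 4), (4, 1)]:
    # analytically, the minimizer is 10 * beta / (alpha + beta)
    print(alpha, beta, best_x(alpha, beta))
```

Scaling both weights together, as in (1, 1) versus (100, 100), leaves the minimizer at 5. Changing the ratio moves it toward whichever term has the larger weight.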
TensorFlow implementation
Import the libraries
```python
# -*- coding: utf-8 -*-
import tensorflow as tf
import numpy as np
import scipy.io
import scipy.misc
import os
import time

def the_current_time():
    # print the current local time, used to track training progress
    print(time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(int(time.time()))))
```
Define some variables
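```python
CONTENT_IMG = 'content.jpg'
STYLE_IMG = 'style5.jpg'
OUTPUT_DIR = 'neural_style_transfer_tensorflow/'
if not os.path.exists(OUTPUT_DIR):
    os.mkdir(OUTPUT_DIR)

IMAGE_W = 800
IMAGE_H = 600
COLOR_C = 3
NOISE_RATIO = 0.7   # share of random noise in the initial image
BETA = 5            # weight of the content loss in the training code (α in the formula)
ALPHA = 100         # weight of the style loss in the training code (β in the formula)
VGG_MODEL = 'imagenet-vgg-verydeep-19.mat'   # pretrained VGG19 weights (MatConvNet format)
# per-channel (RGB) ImageNet mean pixel values, subtracted before feeding VGG19
MEAN_VALUES = np.array([123.68, 116.779, 103.939]).reshape((1, 1, 1, 3))
```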
Load the VGG19 model
```python
def load_vgg_model(path):
    '''
    Details of the VGG19 model:
    - 0 is conv1_1 (3, 3, 3, 64)
    - 1 is relu
    - 2 is conv1_2 (3, 3, 64, 64)
    - 3 is relu
    - 4 is maxpool
    - 5 is conv2_1 (3, 3, 64, 128)
    - 6 is relu
    - 7 is conv2_2 (3, 3, 128, 128)
    - 8 is relu
    - 9 is maxpool
    - 10 is conv3_1 (3, 3, 128, 256)
    - 11 is relu
    - 12 is conv3_2 (3, 3, 256, 256)
    - 13 is relu
    - 14 is conv3_3 (3, 3, 256, 256)
    - 15 is relu
    - 16 is conv3_4 (3, 3, 256, 256)
    - 17 is relu
    - 18 is maxpool
    - 19 is conv4_1 (3, 3, 256, 512)
    - 20 is relu
    - 21 is conv4_2 (3, 3, 512, 512)
    - 22 is relu
    - 23 is conv4_3 (3, 3, 512, 512)
    - 24 is relu
    - 25 is conv4_4 (3, 3, 512, 512)
    - 26 is relu
    - 27 is maxpool
    - 28 is conv5_1 (3, 3, 512, 512)
    - 29 is relu
    - 30 is conv5_2 (3, 3, 512, 512)
    - 31 is relu
    - 32 is conv5_3 (3, 3, 512, 512)
    - 33 is relu
    - 34 is conv5_4 (3, 3, 512, 512)
    - 35 is relu
    - 36 is maxpool
    - 37 is fullyconnected (7, 7, 512, 4096)
    - 38 is relu
    - 39 is fullyconnected (1, 1, 4096, 4096)
    - 40 is relu
    - 41 is fullyconnected (1, 1, 4096, 1000)
    - 42 is softmax
    '''
    vgg = scipy.io.loadmat(path)
    vgg_layers = vgg['layers']

    def _weights(layer, expected_layer_name):
        # read the pretrained kernel W and bias b of one layer from the .mat file
        W = vgg_layers[0][layer][0][0][2][0][0]
        b = vgg_layers[0][layer][0][0][2][0][1]
        layer_name = vgg_layers[0][layer][0][0][0][0]
        assert layer_name == expected_layer_name
        return W, b

    def _conv2d_relu(prev_layer, layer, layer_name):
        # convolution + ReLU with fixed (constant) pretrained weights
        W, b = _weights(layer, layer_name)
        W = tf.constant(W)
        b = tf.constant(np.reshape(b, (b.size)))
        return tf.nn.relu(tf.nn.conv2d(prev_layer, filter=W, strides=[1, 1, 1, 1], padding='SAME') + b)

    def _avgpool(prev_layer):
        # average pooling replaces VGG19's max pooling, as Gatys et al. suggest for image synthesis
        return tf.nn.avg_pool(prev_layer, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

    # only the convolutional part is built; the fully connected layers (37-42) are not needed
    graph = {}
    # the input image is the only tf.Variable, so the optimizer updates only its pixels
    graph['input'] = tf.Variable(np.zeros((1, IMAGE_H, IMAGE_W, COLOR_C)), dtype='float32')
    graph['conv1_1'] = _conv2d_relu(graph['input'], 0, 'conv1_1')
    graph['conv1_2'] = _conv2d_relu(graph['conv1_1'], 2, 'conv1_2')
    graph['avgpool1'] = _avgpool(graph['conv1_2'])
    graph['conv2_1'] = _conv2d_relu(graph['avgpool1'], 5, 'conv2_1')
    graph['conv2_2'] = _conv2d_relu(graph['conv2_1'], 7, 'conv2_2')
    graph['avgpool2'] = _avgpool(graph['conv2_2'])
    graph['conv3_1'] = _conv2d_relu(graph['avgpool2'], 10, 'conv3_1')
    graph['conv3_2'] = _conv2d_relu(graph['conv3_1'], 12, 'conv3_2')
    graph['conv3_3'] = _conv2d_relu(graph['conv3_2'], 14, 'conv3_3')
    graph['conv3_4'] = _conv2d_relu(graph['conv3_3'], 16, 'conv3_4')
    graph['avgpool3'] = _avgpool(graph['conv3_4'])
    graph['conv4_1'] = _conv2d_relu(graph['avgpool3'], 19, 'conv4_1')
    graph['conv4_2'] = _conv2d_relu(graph['conv4_1'], 21, 'conv4_2')
    graph['conv4_3'] = _conv2d_relu(graph['conv4_2'], 23, 'conv4_3')
    graph['conv4_4'] = _conv2d_relu(graph['conv4_3'], 25, 'conv4_4')
    graph['avgpool4'] = _avgpool(graph['conv4_4'])
    graph['conv5_1'] = _conv2d_relu(graph['avgpool4'], 28, 'conv5_1')
    graph['conv5_2'] = _conv2d_relu(graph['conv5_1'], 30, 'conv5_2')
    graph['conv5_3'] = _conv2d_relu(graph['conv5_2'], 32, 'conv5_3')
    graph['conv5_4'] = _conv2d_relu(graph['conv5_3'], 34, 'conv5_4')
    graph['avgpool5'] = _avgpool(graph['conv5_4'])
    return graph
```
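With `padding='SAME'`, each 2×2, stride-2 pooling step in `load_vgg_model` produces ceil(size / 2) outputs in each spatial dimension. The pure-Python sketch below computes the feature-map sizes for the 600×800 input configured above. These sizes give the $N_l$ and $M_l$ that the loss functions use. The function name is hypothetical.

```python
import math

def feature_map_sizes(height, width, num_pools=5):
    """Spatial size after each 2x2, stride-2 pooling with padding='SAME',
    where the output size is ceil(input / stride)."""
    sizes = [(height, width)]
    for _ in range(num_pools):
        height, width = math.ceil(height / 2), math.ceil(width / 2)
        sizes.append((height, width))
    return sizes

print(feature_map_sizes(600, 800))
# [(600, 800), (300, 400), (150, 200), (75, 100), (38, 50), (19, 25)]
```

For example, conv4_2 comes after three pooling steps, so its output is 75×100 with 512 channels. That gives N = 512 and M = 7500 in `content_loss_func`.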
Content loss
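```python
def content_loss_func(sess, model):
    def _content_loss(p, x):
        # p: conv4_2 features of the content image (NumPy array)
        # x: conv4_2 features of the image being optimized (tensor)
        N = p.shape[3]                # number of feature maps
        M = p.shape[1] * p.shape[2]   # size of each feature map (height * width)
        # normalized by 1/(4NM) instead of the formula's 1/2; the constant is absorbed by the loss weight
        return (1 / (4 * N * M)) * tf.reduce_sum(tf.pow(x - p, 2))
    # call this right after the content image has been assigned to model['input']
    return _content_loss(sess.run(model['conv4_2']), model['conv4_2'])
```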
Style loss
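```python
# layers used for the style representation and their weights ω_l
STYLE_LAYERS = [('conv1_1', 0.5), ('conv2_1', 1.0), ('conv3_1', 1.5), ('conv4_1', 3.0), ('conv5_1', 4.0)]

def style_loss_func(sess, model):
    def _gram_matrix(F, N, M):
        # flatten to (M, N), one column per feature map; Ft^T Ft is the (N, N) Gram matrix
        Ft = tf.reshape(F, (M, N))
        return tf.matmul(tf.transpose(Ft), Ft)

    def _style_loss(a, x):
        # a: style-image features (NumPy array); x: features of the image being optimized (tensor)
        N = a.shape[3]
        M = a.shape[1] * a.shape[2]
        A = _gram_matrix(a, N, M)
        G = _gram_matrix(x, N, M)
        return (1 / (4 * N ** 2 * M ** 2)) * tf.reduce_sum(tf.pow(G - A, 2))

    # weighted sum over the style layers; call this right after the style image has been assigned to model['input']
    return sum([_style_loss(sess.run(model[layer_name]), model[layer_name]) * w for layer_name, w in STYLE_LAYERS])
```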
Generate a random initial image
```python
def generate_noise_image(content_image, noise_ratio=NOISE_RATIO):
    # uniform noise in [-20, 20), blended with the (mean-subtracted) content image
    noise_image = np.random.uniform(-20, 20, (1, IMAGE_H, IMAGE_W, COLOR_C)).astype('float32')
    input_image = noise_image * noise_ratio + content_image * (1 - noise_ratio)
    return input_image
```
Load an image
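```python
def load_image(path):
    image = scipy.misc.imread(path)                          # assumes a 3-channel RGB image
    image = scipy.misc.imresize(image, (IMAGE_H, IMAGE_W))   # resize to the network input size
    image = np.reshape(image, ((1, ) + image.shape))         # add a batch dimension
    image = image - MEAN_VALUES                              # subtract the ImageNet channel means
    return image
```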
Save an image
```python
def save_image(path, image):
    image = image + MEAN_VALUES                      # undo the mean subtraction
    image = image[0]                                 # drop the batch dimension
    image = np.clip(image, 0, 255).astype('uint8')   # clip to valid pixel values
    scipy.misc.imsave(path, image)
```
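`load_image` and `save_image` are inverses of each other, apart from resizing and file I/O. The NumPy sketch below isolates that round trip with hypothetical helpers `preprocess` and `deprocess`. One deliberate difference from `save_image` is that it rounds with `np.rint` before casting. Without rounding, `astype('uint8')` truncates, so a value such as 254.9999 would become 254.

```python
import numpy as np

MEAN_VALUES = np.array([123.68, 116.779, 103.939]).reshape((1, 1, 1, 3))

def preprocess(image_uint8):
    # (H, W, 3) uint8 -> (1, H, W, 3) float: add a batch axis, subtract channel means
    return image_uint8.reshape((1,) + image_uint8.shape) - MEAN_VALUES

def deprocess(batch):
    # (1, H, W, 3) float -> (H, W, 3) uint8: add the means back, round, clip
    image = batch[0] + MEAN_VALUES[0]
    return np.clip(np.rint(image), 0, 255).astype('uint8')

img = np.array([[[0, 128, 255]]], dtype=np.uint8)   # a 1x1 RGB image
restored = deprocess(preprocess(img))              # round trip returns the original pixels
```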
Call the functions above and train the model
```python
the_current_time()

with tf.Session() as sess:
    content_image = load_image(CONTENT_IMG)
    style_image = load_image(STYLE_IMG)
    model = load_vgg_model(VGG_MODEL)

    input_image = generate_noise_image(content_image)
    sess.run(tf.global_variables_initializer())

    # fix the content target: feed the content image and record its conv4_2 features
    sess.run(model['input'].assign(content_image))
    content_loss = content_loss_func(sess, model)

    # fix the style targets: feed the style image and record its Gram matrices
    sess.run(model['input'].assign(style_image))
    style_loss = style_loss_func(sess, model)

    # note: BETA weights the content loss and ALPHA the style loss here (swapped relative to α, β in the formula)
    total_loss = BETA * content_loss + ALPHA * style_loss
    optimizer = tf.train.AdamOptimizer(2.0)
    train = optimizer.minimize(total_loss)

    # initialize again so Adam's slot variables exist, then start from the noisy image
    sess.run(tf.global_variables_initializer())
    sess.run(model['input'].assign(input_image))

    ITERATIONS = 2000
    for i in range(ITERATIONS):
        sess.run(train)
        if i % 100 == 0:
            output_image = sess.run(model['input'])
            the_current_time()
            print('Iteration %d' % i)
            print('Cost: ', sess.run(total_loss))
            save_image(os.path.join(OUTPUT_DIR, 'output_%d.jpg' % i), output_image)
```
On a GPU this took about 5 minutes. After 2000 iterations the result looks like this:

For comparison, the original image:

Keras implementation
The official Keras repository includes an image style transfer example:
github.com/fchollet/ke…
The code adds a total variation loss. This regularizer penalizes differences between neighboring pixels and so makes the generated image smoother.
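Here is a minimal NumPy sketch of a total variation penalty for an image array of shape (H, W, C). It uses the simple squared-difference form. The Keras example uses a variant of this formula, so treat the exact form here as an assumption. `total_variation_loss` is a hypothetical name.

```python
import numpy as np

def total_variation_loss(image):
    """Sum of squared differences between vertically and horizontally
    adjacent pixels (a simple variant; the Keras example's form differs)."""
    dy = image[1:, :, :] - image[:-1, :, :]   # vertical neighbors
    dx = image[:, 1:, :] - image[:, :-1, :]   # horizontal neighbors
    return np.sum(dy ** 2) + np.sum(dx ** 2)

flat = np.full((3, 3, 1), 7.0)
checker = np.array([[0.0, 1.0], [1.0, 0.0]]).reshape(2, 2, 1)
print(total_variation_loss(flat), total_variation_loss(checker))  # 0.0 4.0
```

A flat image has zero penalty, while a checkerboard (the least smooth pattern) gets a large one. Adding this term with a small weight therefore discourages pixel-level noise.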
It also computes the content loss on conv5_2 instead of conv4_2.
The script is used as follows:
```bash
python neural_style_transfer.py path_to_your_base_image.jpg path_to_your_reference.jpg prefix_for_results
```
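Optional arguments include --iter, --content_weight, --style_weight and --tv_weight. They set the number of iterations and the weights of the content, style, and total variation losses.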
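Create a folder named neural_style_transfer_keras, then run the following command (here the example script has been saved as main_keras.py):
```bash
python main_keras.py content.jpg style5.jpg neural_style_transfer_keras/output
```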
The generated image, after 10 iterations that took about 1 minute, looks like this:
