
Machine Learning Notes (22): TensorFlow in Practice 14 (Image Style Transfer)

1 - Introduction

You have probably used a filter that converts a photo into one of a different artistic style. Here we implement that effect with TensorFlow. The algorithm comes from Gatys et al.'s paper A Neural Algorithm of Artistic Style; it is great fun, so let's walk through it in detail.

2 - Extracting Features with VGG

In short, we use a convolutional neural network, VGG-19, that has already been trained on ImageNet.

Given a style image S and an ordinary content image C: passing the style image through VGG-19 yields a set of feature maps at each convolutional layer, and likewise the content image yields its own set of feature maps. We then generate a random-noise image G. Passing G through VGG-19 yields feature maps as well, corresponding layer by layer to those of C and S. The final optimization adjusts G so that the generated image keeps the content of C while taking on the style of S.
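As a minimal sketch of this feature-extraction step (it assumes the build_vgg19(), load_image(), VGG_Model, and CONTENT_IMAGE definitions from the full script in section 4), reading out one layer's feature maps looks like this:

import tensorflow as tf

# Sketch: extract the feature maps of the content image at one VGG-19 layer.
net = build_vgg19(VGG_Model)                 # dict: layer name -> tensor
sess = tf.Session()
sess.run(tf.global_variables_initializer())

sess.run(net['input'].assign(load_image(CONTENT_IMAGE)))
features = sess.run(net['conv4_2'])          # (1, 75, 100, 512) for a 600x800 input
print(features.shape)

The same call with STYLE_IMAGE (or the noise image) assigned to net['input'] yields the corresponding feature-map sets for the style and generated images.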

3 - Neural Style Transfer

To make the generated image keep C's content while taking on S's style, we first construct these loss functions:

  • Build the content cost function J_{content}(C,G)
  • Build the style cost function J_{style}(S,G)
  • Combine them into the total cost function J(G) = \alpha J_{content}(C,G) + \beta J_{style}(S,G)

3.1 - Computing the Content Cost

The content cost compares the activations of a chosen layer for the content image C and the generated image G:

J_{content}(C,G) = \frac{1}{4 n_h n_w n_c} \sum_{\text{all entries}} \left(a^{(C)} - a^{(G)}\right)^2

where a^{(C)} and a^{(G)} are that layer's activations for C and G, and n_h, n_w, n_c are the layer's height, width, and number of channels.
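A direct transcription of this formula as a sketch (the full script in section 4 computes essentially the same quantity in content_layer_loss, with a normalization constant that differs only by a factor of 2, which merely rescales the weight alpha):

import tensorflow as tf

def content_cost(a_C, a_G):
    # a_C: activations of the chosen layer for the content image (a constant),
    # a_G: the same layer's activations for the generated image (a tensor),
    # both of shape (1, n_h, n_w, n_c).
    _, n_h, n_w, n_c = a_G.get_shape().as_list()
    return tf.reduce_sum(tf.square(a_C - a_G)) / (4. * n_h * n_w * n_c)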

3.2 - Computing the Style Cost

The style cost compares Gram matrices. Flattening a layer's feature maps into an (n_h n_w) × n_c matrix F, the Gram matrix G = F^T F measures the correlation between every pair of channels, which captures the "style" of a layer independent of where things are in the image:

J_{style}(S,G) = \frac{1}{4 n_c^2 (n_h n_w)^2} \sum_{i=1}^{n_c} \sum_{j=1}^{n_c} \left(G_{ij}^{(S)} - G_{ij}^{(G)}\right)^2

where G^{(S)} and G^{(G)} are the Gram matrices of the style image's and the generated image's activations at that layer.
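A per-layer sketch of this cost (the full script in section 4 implements the same computation in gram_matrix and style_layer_loss):

import tensorflow as tf

def gram(a, n_h, n_w, n_c):
    # Flatten the feature maps and take channel-by-channel correlations.
    F = tf.reshape(a, (n_h * n_w, n_c))
    return tf.matmul(tf.transpose(F), F)        # shape (n_c, n_c)

def style_cost_layer(a_S, a_G):
    # a_S: style-image activations (constant); a_G: generated-image activations.
    _, n_h, n_w, n_c = a_G.get_shape().as_list()
    G_S = gram(a_S, n_h, n_w, n_c)
    G_G = gram(a_G, n_h, n_w, n_c)
    return tf.reduce_sum(tf.square(G_S - G_G)) / (4. * n_c**2 * (n_h * n_w)**2)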

3.3 - The Total Cost to Optimize

Finally, we build the total cost function, which we minimize with respect to the generated image G:

J(G) = \alpha J_{content}(C,G) + \beta J_{style}(S,G)
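Combining the two is just a weighted sum; the sketch below makes the trade-off explicit (the script in section 4 uses alpha = 1 and beta = 500, so the style term dominates):

def total_cost(J_content, J_style, alpha=1., beta=500.):
    # Larger beta pushes the result toward the style image;
    # larger alpha preserves more of the content image.
    return alpha * J_content + beta * J_style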

4 - Implementing the Algorithm with TensorFlow

Before running the code, make sure you have downloaded the pre-trained VGG-19 model (the MatConvNet imagenet-vgg-verydeep-19.mat file).
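Before the full run, it can help to sanity-check that the .mat file loads (a small sketch; adjust the path to wherever you saved the model, matching VGG_Model below):

import scipy.io

# The file should expose a 'layers' field, whose entries the script below
# indexes by position (0, 2, 5, ...).
vgg = scipy.io.loadmat('Downloads/imagenet-vgg-verydeep-19.mat')
print(vgg['layers'][0].shape)   # number of layer entries in the network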

import os
import sys
import numpy as np
import scipy.io
import scipy.misc  # note: imread/imsave were removed in SciPy 1.2, so this needs an older SciPy
import tensorflow as tf
 
# Output folder for the images.
OUTPUT_DIR = 'output/'
# Style image to use.
STYLE_IMAGE = 'images/ocean.jpg'
# Content image to use.
CONTENT_IMAGE = 'images/Taipei101.jpg'
# Image dimensions constants.
IMAGE_WIDTH = 800
IMAGE_HEIGHT = 600
COLOR_CHANNELS = 3
 
###############################################################################
# Algorithm constants
###############################################################################
# Ratio at which the random-noise image is blended with the content image
NOISE_RATIO = 0.6
# Number of optimization iterations
ITERATIONS = 1000
# Weights of the content cost and the style cost
alpha = 1
beta = 500
# Path to the pre-trained VGG-19 model, and the ImageNet mean pixel values
VGG_Model = 'Downloads/imagenet-vgg-verydeep-19.mat'
MEAN_VALUES = np.array([123.68, 116.779, 103.939]).reshape((1, 1, 1, 3))
# Convolutional layers used for the content and style representations (name, weight)
CONTENT_LAYERS = [('conv4_2', 1.)]
STYLE_LAYERS = [('conv1_1', 0.2), ('conv2_1', 0.2), ('conv3_1', 0.2), ('conv4_1', 0.2), ('conv5_1', 0.2)]
 
# Generate a random-noise image and blend it with the content image at a given ratio
def generate_noise_image(content_image, noise_ratio = NOISE_RATIO):
    """
    Returns a noise image intermixed with the content image at a certain ratio.
    """
    noise_image = np.random.uniform(
            -20, 20,
            (1, IMAGE_HEIGHT, IMAGE_WIDTH, COLOR_CHANNELS)).astype('float32')
    # White noise image from the content representation. Take a weighted average
    # of the values
    img = noise_image * noise_ratio + content_image * (1 - noise_ratio)
    return img
 
def load_image(path):
    image = scipy.misc.imread(path)
    # Add a batch dimension for the convnet input; the image itself must
    # already be IMAGE_WIDTH x IMAGE_HEIGHT.
    image = np.reshape(image, ((1,) + image.shape))
    # The VGG net expects its input to have the mean subtracted.
    image = image - MEAN_VALUES
    return image
 
def save_image(path, image):
    # Output should add back the mean.
    image = image + MEAN_VALUES
    # Drop the batch dimension; what remains is the image.
    image = image[0]
    image = np.clip(image, 0, 255).astype('uint8')
    scipy.misc.imsave(path, image)
 
 
def build_net(ntype, nin, nwb=None):
    if ntype == 'conv':
        return tf.nn.relu(tf.nn.conv2d(nin, nwb[0], strides=[1, 1, 1, 1], padding='SAME') + nwb[1])
    elif ntype == 'pool':
        # Average pooling instead of max pooling, as suggested in the Gatys paper.
        return tf.nn.avg_pool(nin, ksize=[1, 2, 2, 1],
                              strides=[1, 2, 2, 1], padding='SAME')
 
def get_weight_bias(vgg_layers, i):
    # Pull the pre-trained kernel and bias out of the MatConvNet .mat structure
    # and freeze them as constants: the VGG weights are never trained here.
    weights = vgg_layers[i][0][0][2][0][0]
    weights = tf.constant(weights)
    bias = vgg_layers[i][0][0][2][0][1]
    bias = tf.constant(np.reshape(bias, (bias.size)))
    return weights, bias
 
 
def build_vgg19(path):
    net = {}
    vgg_rawnet = scipy.io.loadmat(path)
    vgg_layers = vgg_rawnet['layers'][0]
    net['input'] = tf.Variable(np.zeros((1, IMAGE_HEIGHT, IMAGE_WIDTH, 3)).astype('float32'))
    net['conv1_1'] = build_net('conv', net['input'], get_weight_bias(vgg_layers, 0))
    net['conv1_2'] = build_net('conv', net['conv1_1'], get_weight_bias(vgg_layers, 2))
    net['pool1'] = build_net('pool', net['conv1_2'])
    net['conv2_1'] = build_net('conv', net['pool1'], get_weight_bias(vgg_layers, 5))
    net['conv2_2'] = build_net('conv', net['conv2_1'], get_weight_bias(vgg_layers, 7))
    net['pool2'] = build_net('pool', net['conv2_2'])
    net['conv3_1'] = build_net('conv', net['pool2'], get_weight_bias(vgg_layers, 10))
    net['conv3_2'] = build_net('conv', net['conv3_1'], get_weight_bias(vgg_layers, 12))
    net['conv3_3'] = build_net('conv', net['conv3_2'], get_weight_bias(vgg_layers, 14))
    net['conv3_4'] = build_net('conv', net['conv3_3'], get_weight_bias(vgg_layers, 16))
    net['pool3'] = build_net('pool', net['conv3_4'])
    net['conv4_1'] = build_net('conv', net['pool3'], get_weight_bias(vgg_layers, 19))
    net['conv4_2'] = build_net('conv', net['conv4_1'], get_weight_bias(vgg_layers, 21))
    net['conv4_3'] = build_net('conv', net['conv4_2'], get_weight_bias(vgg_layers, 23))
    net['conv4_4'] = build_net('conv', net['conv4_3'], get_weight_bias(vgg_layers, 25))
    net['pool4'] = build_net('pool', net['conv4_4'])
    net['conv5_1'] = build_net('conv', net['pool4'], get_weight_bias(vgg_layers, 28))
    net['conv5_2'] = build_net('conv', net['conv5_1'], get_weight_bias(vgg_layers, 30))
    net['conv5_3'] = build_net('conv', net['conv5_2'], get_weight_bias(vgg_layers, 32))
    net['conv5_4'] = build_net('conv', net['conv5_3'], get_weight_bias(vgg_layers, 34))
    net['pool5'] = build_net('pool', net['conv5_4'])
    return net
 
 
def content_layer_loss(p, x):
    # p: content-image activations (numpy constant); x: generated-image
    # activations (tensor). M is the spatial size, N the number of channels.
    M = p.shape[1] * p.shape[2]
    N = p.shape[3]
    loss = (1. / (2 * N * M)) * tf.reduce_sum(tf.pow((x - p), 2))
    return loss
 
 
def content_loss_func(sess, net):
    # The current value of net['input'] is treated as the content image: its
    # activations are captured with sess.run() and used as constant targets.
    layers = CONTENT_LAYERS
    total_content_loss = 0.0
    for layer_name, weight in layers:
        p = sess.run(net[layer_name])
        x = net[layer_name]
        total_content_loss += content_layer_loss(p, x)*weight
 
    total_content_loss /= float(len(layers))
    return total_content_loss
 
 
def gram_matrix(x, area, depth):
    # Flatten the feature maps to (area, depth), then take channel-by-channel
    # correlations: G = F^T F, of shape (depth, depth).
    x1 = tf.reshape(x, (area, depth))
    g = tf.matmul(tf.transpose(x1), x1)
    return g
 
def style_layer_loss(a, x):
    # a: style-image activations (numpy constant); x: generated-image activations.
    M = a.shape[1] * a.shape[2]
    N = a.shape[3]
    A = gram_matrix(a, M, N)
    G = gram_matrix(x, M, N)
    loss = (1. / (4 * N ** 2 * M ** 2)) * tf.reduce_sum(tf.pow((G - A), 2))
    return loss
 
 
def style_loss_func(sess, net):
    # As above, the current value of net['input'] is treated as the style image.
    layers = STYLE_LAYERS
    total_style_loss = 0.0
    for layer_name, weight in layers:
        a = sess.run(net[layer_name])
        x = net[layer_name]
        total_style_loss += style_layer_loss(a, x) * weight
    total_style_loss /= float(len(layers))
    return total_style_loss
 
 
def main():
    net = build_vgg19(VGG_Model)
    sess = tf.Session()
    sess.run(tf.global_variables_initializer())
 
    content_img = load_image(CONTENT_IMAGE)
    style_img = load_image(STYLE_IMAGE)
 
    # Capture the content image's activations and build the content cost.
    sess.run([net['input'].assign(content_img)])
    cost_content = content_loss_func(sess, net)
 
    # Capture the style image's activations and build the style cost.
    sess.run([net['input'].assign(style_img)])
    cost_style = style_loss_func(sess, net)
 
    total_loss = alpha * cost_content + beta * cost_style
    # The only trainable variable is net['input'] (the VGG weights are
    # constants), so the optimizer updates the image itself.
    optimizer = tf.train.AdamOptimizer(2.0)
 
    init_img = generate_noise_image(content_img)
 
    train_op = optimizer.minimize(total_loss)
    sess.run(tf.global_variables_initializer())  # also creates Adam's slot variables
    sess.run(net['input'].assign(init_img))
 
    for it in range(ITERATIONS):
        sess.run(train_op)
        if it % 100 == 0:
            # Report progress and save an intermediate image every 100 iterations.
            mixed_image = sess.run(net['input'])
            print('Iteration %d' % (it))
            print('sum : ', np.sum(mixed_image))
            print('cost: ', sess.run(total_loss))
 
            if not os.path.exists(OUTPUT_DIR):
                os.mkdir(OUTPUT_DIR)
 
            filename = os.path.join(OUTPUT_DIR, '%d.png' % it)
            save_image(filename, mixed_image)
 
if __name__ == '__main__':
    main()

5 - Experimental Results

We use the Louvre under a clear sky as the content image, the original photo to be transformed (gorgeous, isn't it?).

The image blended with it, serving as the style image, is a traditional landscape painting (also beautiful).

So what does the style-transferred blend of the two look like? The result keeps the Louvre's content while taking on the painting's style. Pretty cool, isn't it? You can swap in other images and try the effect for yourself.