Fun with Deep Learning | 18 Anime Avatar Generation
The images were scraped from getchu.com/, a Japanese anime game site containing a large number of character portraits; 31,970 images were collected in total.

Avatar Extraction
dlib, introduced earlier, can extract human faces but does not work for anime faces.
OpenCV is used instead to crop the face region from each image, based on the following project: github.com/nagadomi/lb…
Each detected bounding box is enlarged appropriately so that more of the character is included.
# -*- coding: utf-8 -*-
import cv2

cascade = cv2.CascadeClassifier('lbpcascade_animeface.xml')
image = cv2.imread('imgs/二次元頭像示例.jpg')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
gray = cv2.equalizeHist(gray)
faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5, minSize=(64, 64))

for i, (x, y, w, h) in enumerate(faces):
    # enlarge the detected box by a factor of 1.5 around its center
    cx = x + w // 2
    cy = y + h // 2
    x0 = cx - int(0.75 * w)
    x1 = cx + int(0.75 * w)
    y0 = cy - int(0.75 * h)
    y1 = cy + int(0.75 * h)
    # clip to the image border
    if x0 < 0:
        x0 = 0
    if y0 < 0:
        y0 = 0
    if x1 >= image.shape[1]:
        x1 = image.shape[1] - 1
    if y1 >= image.shape[0]:
        y1 = image.shape[0] - 1
    # make the crop square, then resize to 128x128
    w = x1 - x0
    h = y1 - y0
    if w > h:
        x0 = x0 + w // 2 - h // 2
        x1 = x1 - w // 2 + h // 2
        w = h
    else:
        y0 = y0 + h // 2 - w // 2
        y1 = y1 - h // 2 + w // 2
        h = w
    face = image[y0: y0 + h, x0: x0 + w, :]
    face = cv2.resize(face, (128, 128))
    cv2.imwrite('face_%d.jpg' % i, face)

Tag Extraction
Illustration2Vec is used to extract a rich set of tags from anime images: github.com/rezoo/illus…
Illustration2Vec relies on the chainer deep learning framework, plus a few other libraries; install them if they are missing:
pip install chainer Pillow scikit-image
Illustration2Vec provides three functions:
- represent each image as a 4096-dimensional vector
- given a threshold, return all tags whose probability exceeds it
- given a list of tags, return their corresponding probabilities
For example, extract all plausible tags with a threshold of 0.5:
# -*- coding: utf-8 -*-
import i2v
from imageio import imread

illust2vec = i2v.make_i2v_with_chainer('illust2vec_tag_ver200.caffemodel', 'tag_list.json')

img = imread('imgs/二次元頭像示例.jpg')
tags = illust2vec.estimate_plausible_tags([img], threshold=0.5)
print(tags)
tags = illust2vec.estimate_specific_tags([img], ['blue eyes', 'red hair'])
print(tags)
Tags can also be specified explicitly to get their probabilities; the last call above returns:
[{'blue eyes': 0.9488178491592407, 'red hair': 0.0025324225425720215}]
Preprocessing
Process all of the images on a server, i.e., crop the avatars and extract the tags.
Load the libraries:
# -*- coding: utf-8 -*-
import i2v
import cv2
import glob
import os
from imageio import imread
from tqdm import tqdm
import matplotlib.pyplot as plt
%matplotlib inline
import numpy as np
import pickle
Read the image paths:
images = glob.glob('characters/*.jpg')
print(len(images))
Load the two models:
illust2vec = i2v.make_i2v_with_chainer('illust2vec_tag_ver200.caffemodel', 'tag_list.json')
cascade = cv2.CascadeClassifier('lbpcascade_animeface.xml')

OUTPUT_DIR = 'faces/'
if not os.path.exists(OUTPUT_DIR):
    os.mkdir(OUTPUT_DIR)
Extract all of the avatars; 27,772 faces are detected in total:
num = 0
for x in tqdm(range(len(images))):
    img_path = images[x]
    image = cv2.imread(img_path)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    gray = cv2.equalizeHist(gray)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5, minSize=(64, 64))
    for (x, y, w, h) in faces:
        cx = x + w // 2
        cy = y + h // 2
        x0 = cx - int(0.75 * w)
        x1 = cx + int(0.75 * w)
        y0 = cy - int(0.75 * h)
        y1 = cy + int(0.75 * h)
        if x0 < 0:
            x0 = 0
        if y0 < 0:
            y0 = 0
        if x1 >= image.shape[1]:
            x1 = image.shape[1] - 1
        if y1 >= image.shape[0]:
            y1 = image.shape[0] - 1
        w = x1 - x0
        h = y1 - y0
        if w > h:
            x0 = x0 + w // 2 - h // 2
            x1 = x1 - w // 2 + h // 2
            w = h
        else:
            y0 = y0 + h // 2 - w // 2
            y1 = y1 - h // 2 + w // 2
            h = w
        face = image[y0: y0 + h, x0: x0 + w, :]
        face = cv2.resize(face, (128, 128))
        cv2.imwrite(os.path.join(OUTPUT_DIR, '%d.jpg' % num), face)
        num += 1
print(num)
The 34 tags of interest are:
- 13 hair colors: blonde hair, brown hair, black hair, blue hair, pink hair, purple hair, green hair, red hair, silver hair, white hair, orange hair, aqua hair, grey hair
- 5 hair styles: long hair, short hair, twintails, drill hair, ponytail
- 10 eye colors: blue eyes, red eyes, brown eyes, green eyes, purple eyes, yellow eyes, pink eyes, aqua eyes, black eyes, orange eyes
- 6 other attributes: blush, smile, open mouth, hat, ribbon, glasses
For hair color, hair style and eye color, only the most probable tag is kept; each of the other attributes is treated as present when its probability exceeds 0.25:
fw = open('face_tags.txt', 'w')
tags = ['blonde hair', 'brown hair', 'black hair', 'blue hair', 'pink hair', 'purple hair', 'green hair',
        'red hair', 'silver hair', 'white hair', 'orange hair', 'aqua hair', 'grey hair',
        'long hair', 'short hair', 'twintails', 'drill hair', 'ponytail',
        'blue eyes', 'red eyes', 'brown eyes', 'green eyes', 'purple eyes', 'yellow eyes', 'pink eyes',
        'aqua eyes', 'black eyes', 'orange eyes',
        'blush', 'smile', 'open mouth', 'hat', 'ribbon', 'glasses']
fw.write('id,' + ','.join(tags) + '\n')

images = glob.glob(os.path.join(OUTPUT_DIR, '*.jpg'))
for x in tqdm(range(len(images))):
    img_path = images[x]
    image = imread(img_path)
    result = illust2vec.estimate_specific_tags([image], tags)[0]

    # keep only the most probable hair color
    hair_colors = [[h, result[h]] for h in tags[0:13]]
    hair_colors.sort(key=lambda x: x[1], reverse=True)
    for h in tags[0:13]:
        if h == hair_colors[0][0]:
            result[h] = 1
        else:
            result[h] = 0

    # keep only the most probable hair style
    hair_styles = [[h, result[h]] for h in tags[13:18]]
    hair_styles.sort(key=lambda x: x[1], reverse=True)
    for h in tags[13:18]:
        if h == hair_styles[0][0]:
            result[h] = 1
        else:
            result[h] = 0

    # keep only the most probable eye color
    eye_colors = [[h, result[h]] for h in tags[18:28]]
    eye_colors.sort(key=lambda x: x[1], reverse=True)
    for h in tags[18:28]:
        if h == eye_colors[0][0]:
            result[h] = 1
        else:
            result[h] = 0

    # other attributes: present if probability > 0.25
    for h in tags[28:]:
        if result[h] > 0.25:
            result[h] = 1
        else:
            result[h] = 0

    fw.write(img_path + ',' + ','.join([str(result[t]) for t in tags]) + '\n')
fw.close()
This yields 27,772 anime avatars, each annotated with 34 tag values.
Get the 4096-dimensional vector representation of each avatar:
illust2vec = i2v.make_i2v_with_chainer("illust2vec_ver200.caffemodel")

img_all = []
vec_all = []
for x in tqdm(range(len(images))):
    img_path = images[x]
    image = imread(img_path)
    vector = illust2vec.extract_feature([image])[0]
    img_all.append(image / 255.)
    vec_all.append(vector)
img_all = np.array(img_all)
vec_all = np.array(vec_all)
Randomly select 2,000 avatars and visualize them with t-SNE dimensionality reduction:
from sklearn.manifold import TSNE
from imageio import imsave

data_index = np.arange(img_all.shape[0])
np.random.shuffle(data_index)
data_index = data_index[:2000]

tsne = TSNE(perplexity=30, n_components=2, init='pca', n_iter=5000)
two_d_vectors = tsne.fit_transform(vec_all[data_index, :])

puzzles = np.ones((6400, 6400, 3))
xmin = np.min(two_d_vectors[:, 0])
xmax = np.max(two_d_vectors[:, 0])
ymin = np.min(two_d_vectors[:, 1])
ymax = np.max(two_d_vectors[:, 1])

for i, vector in enumerate(two_d_vectors):
    x, y = two_d_vectors[i, :]
    x = int((x - xmin) / (xmax - xmin) * (6400 - 128) + 64)
    y = int((y - ymin) / (ymax - ymin) * (6400 - 128) + 64)
    puzzles[y - 64: y + 64, x - 64: x + 64, :] = img_all[data_index[i]]
imsave('二次元頭像降維視覺化.png', puzzles)
The visualization is shown below; similar avatars are indeed grouped together.

Model
The ACGAN framework is used, but unlike the DCGAN used for CelebA, G and D are implemented with deeper, more complex networks, following SRGAN: arxiv.org/abs/1609.04…
The generator structure:
- 16 residual blocks, i.e., the shortcut idea from ResNet
- Sub-pixel CNN instead of deconvolution for upsampling: arxiv.org/abs/1609.05…

The sub-pixel CNN works as follows: several feature maps are interleaved into one, which increases height and width while reducing depth.
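As a rough illustration (an assumed example, not part of the original code), the rearrangement that tf.depth_to_space performs in the generator below can be reproduced in NumPy; depth_to_space_np is a hypothetical helper written only for this sketch:

# NumPy equivalent of tf.depth_to_space for NHWC tensors (illustrative sketch):
# channels are traded for spatial resolution, so (h, w, r*r*c) becomes (h*r, w*r, c).
import numpy as np

def depth_to_space_np(x, r):
    n, h, w, c = x.shape
    c_out = c // (r * r)
    x = x.reshape(n, h, w, r, r, c_out)   # split channels into an r x r block
    x = x.transpose(0, 1, 3, 2, 4, 5)     # interleave block rows/cols with h and w
    return x.reshape(n, h * r, w * r, c_out)

x = np.arange(16, dtype=np.float32).reshape(1, 2, 2, 4)
print(depth_to_space_np(x, 2).shape)  # (1, 4, 4, 1): height and width doubled, depth reduced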

The discriminator uses 10 residual blocks, and its output has two branches: one for real/fake discrimination and one for tag classification.

Implementation
Load the libraries:
# -*- coding: utf-8 -*-
import tensorflow as tf
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import os
from imageio import imread, imsave, mimsave
import glob
from tqdm import tqdm
Load the images:
images = glob.glob('faces/*.jpg')
print(len(images))
Load the tags:
tags = pd.read_csv('face_tags.txt')
tags.index = tags['id']
tags.head()
Define some constants, network tensors and helper functions. A power of two works well for the batch size, so it is set to 64 here; learning rate decay is also applied:
batch_size = 64
z_dim = 128
WIDTH = 128
HEIGHT = 128
LABEL = 34
LAMBDA = 0.05
BETA = 3

OUTPUT_DIR = 'samples'
if not os.path.exists(OUTPUT_DIR):
    os.mkdir(OUTPUT_DIR)

X = tf.placeholder(dtype=tf.float32, shape=[batch_size, HEIGHT, WIDTH, 3], name='X')
X_perturb = tf.placeholder(dtype=tf.float32, shape=[batch_size, HEIGHT, WIDTH, 3], name='X_perturb')
Y = tf.placeholder(dtype=tf.float32, shape=[batch_size, LABEL], name='Y')
noise = tf.placeholder(dtype=tf.float32, shape=[batch_size, z_dim], name='noise')
noise_y = tf.placeholder(dtype=tf.float32, shape=[batch_size, LABEL], name='noise_y')
is_training = tf.placeholder(dtype=tf.bool, name='is_training')

global_step = tf.Variable(0, trainable=False)
add_global = global_step.assign_add(1)
initial_learning_rate = 0.0002
learning_rate = tf.train.exponential_decay(initial_learning_rate, global_step=global_step, decay_steps=20000, decay_rate=0.5)

def lrelu(x, leak=0.2):
    return tf.maximum(x, leak * x)

def sigmoid_cross_entropy_with_logits(x, y):
    return tf.nn.sigmoid_cross_entropy_with_logits(logits=x, labels=y)

def conv2d(inputs, kernel_size, filters, strides, padding='same', use_bias=True):
    return tf.layers.conv2d(inputs=inputs, kernel_size=kernel_size, filters=filters, strides=strides, padding=padding, use_bias=use_bias)

def batch_norm(inputs, is_training=is_training, decay=0.9):
    return tf.contrib.layers.batch_norm(inputs, is_training=is_training, decay=decay)
The discriminator:
def d_block(inputs, filters):
    h0 = lrelu(conv2d(inputs, 3, filters, 1))
    h0 = conv2d(h0, 3, filters, 1)
    h0 = lrelu(tf.add(h0, inputs))
    return h0

def discriminator(image, reuse=None):
    with tf.variable_scope('discriminator', reuse=reuse):
        h0 = image
        f = 32
        for i in range(5):
            if i < 3:
                h0 = lrelu(conv2d(h0, 4, f, 2))
            else:
                h0 = lrelu(conv2d(h0, 3, f, 2))
            h0 = d_block(h0, f)
            h0 = d_block(h0, f)
            f = f * 2
        h0 = lrelu(conv2d(h0, 3, f, 2))
        h0 = tf.contrib.layers.flatten(h0)
        # two heads: 34-way tag classification and real/fake score
        Y_ = tf.layers.dense(h0, units=LABEL)
        h0 = tf.layers.dense(h0, units=1)
        return h0, Y_
The generator:
def g_block(inputs):
    h0 = tf.nn.relu(batch_norm(conv2d(inputs, 3, 64, 1, use_bias=False)))
    h0 = batch_norm(conv2d(h0, 3, 64, 1, use_bias=False))
    h0 = tf.add(h0, inputs)
    return h0

def generator(z, label):
    with tf.variable_scope('generator', reuse=None):
        d = 16
        z = tf.concat([z, label], axis=1)
        h0 = tf.layers.dense(z, units=d * d * 64)
        h0 = tf.reshape(h0, shape=[-1, d, d, 64])
        h0 = tf.nn.relu(batch_norm(h0))
        shortcut = h0

        # 16 residual blocks
        for i in range(16):
            h0 = g_block(h0)
        h0 = tf.nn.relu(batch_norm(h0))
        h0 = tf.add(h0, shortcut)

        # three sub-pixel upsampling stages: 16 -> 32 -> 64 -> 128
        for i in range(3):
            h0 = conv2d(h0, 3, 256, 1, use_bias=False)
            h0 = tf.depth_to_space(h0, 2)
            h0 = tf.nn.relu(batch_norm(h0))

        h0 = tf.layers.conv2d(h0, kernel_size=9, filters=3, strides=1, padding='same', activation=tf.nn.tanh, name='g', use_bias=True)
        return h0
The loss functions. The gradient penalty (gp) term comes from DRAGAN, arxiv.org/abs/1705.07… : WGAN interpolates between real and generated samples, whereas DRAGAN interpolates between real and perturbed samples.
g = generator(noise, noise_y)
d_real, y_real = discriminator(X)
d_fake, y_fake = discriminator(g, reuse=True)

# adversarial losses
loss_d_real = tf.reduce_mean(sigmoid_cross_entropy_with_logits(d_real, tf.ones_like(d_real)))
loss_d_fake = tf.reduce_mean(sigmoid_cross_entropy_with_logits(d_fake, tf.zeros_like(d_fake)))
loss_g_fake = tf.reduce_mean(sigmoid_cross_entropy_with_logits(d_fake, tf.ones_like(d_fake)))

# classification losses on the 34 tags
loss_c_real = tf.reduce_mean(sigmoid_cross_entropy_with_logits(y_real, Y))
loss_c_fake = tf.reduce_mean(sigmoid_cross_entropy_with_logits(y_fake, noise_y))

loss_d = loss_d_real + loss_d_fake + BETA * loss_c_real
loss_g = loss_g_fake + BETA * loss_c_fake

# DRAGAN gradient penalty: interpolate between real and perturbed samples
alpha = tf.random_uniform(shape=[batch_size, 1, 1, 1], minval=0., maxval=1.)
interpolates = alpha * X + (1 - alpha) * X_perturb
grad = tf.gradients(discriminator(interpolates, reuse=True)[0], [interpolates])[0]
slop = tf.sqrt(tf.reduce_sum(tf.square(grad), axis=[1]))
gp = tf.reduce_mean((slop - 1.) ** 2)
loss_d += LAMBDA * gp

vars_g = [var for var in tf.trainable_variables() if var.name.startswith('generator')]
vars_d = [var for var in tf.trainable_variables() if var.name.startswith('discriminator')]
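For comparison with the DRAGAN penalty above, here is a minimal sketch (illustrative only, with hypothetical variable names such as alpha_wgan; it is not part of this model) of how a WGAN-GP style penalty would instead interpolate between the real batch X and the generated batch g:

# Illustrative WGAN-GP style interpolation (NOT used by this model, which keeps the
# DRAGAN penalty above): the endpoints are real samples X and generated samples g.
alpha_wgan = tf.random_uniform(shape=[batch_size, 1, 1, 1], minval=0., maxval=1.)
interpolates_wgan = alpha_wgan * X + (1 - alpha_wgan) * g
grad_wgan = tf.gradients(discriminator(interpolates_wgan, reuse=True)[0], [interpolates_wgan])[0]
slop_wgan = tf.sqrt(tf.reduce_sum(tf.square(grad_wgan), axis=[1, 2, 3]))
gp_wgan = tf.reduce_mean((slop_wgan - 1.) ** 2)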
Define the optimizers:
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    optimizer_d = tf.train.AdamOptimizer(learning_rate=learning_rate, beta1=0.5).minimize(loss_d, var_list=vars_d)
    optimizer_g = tf.train.AdamOptimizer(learning_rate=learning_rate, beta1=0.5).minimize(loss_g, var_list=vars_g)
A helper function that tiles images into a montage:
def montage(images):
    if isinstance(images, list):
        images = np.array(images)
    img_h = images.shape[1]
    img_w = images.shape[2]
    n_plots = int(np.ceil(np.sqrt(images.shape[0])))
    if len(images.shape) == 4 and images.shape[3] == 3:
        m = np.ones(
            (images.shape[1] * n_plots + n_plots + 1,
             images.shape[2] * n_plots + n_plots + 1, 3)) * 0.5
    elif len(images.shape) == 4 and images.shape[3] == 1:
        m = np.ones(
            (images.shape[1] * n_plots + n_plots + 1,
             images.shape[2] * n_plots + n_plots + 1, 1)) * 0.5
    elif len(images.shape) == 3:
        m = np.ones(
            (images.shape[1] * n_plots + n_plots + 1,
             images.shape[2] * n_plots + n_plots + 1)) * 0.5
    else:
        raise ValueError('Could not parse image shape of {}'.format(images.shape))
    for i in range(n_plots):
        for j in range(n_plots):
            this_filter = i * n_plots + j
            if this_filter < images.shape[0]:
                this_img = images[this_filter]
                m[1 + i + i * img_h:1 + i + (i + 1) * img_h,
                  1 + j + j * img_w:1 + j + (j + 1) * img_w] = this_img
    return m
Prepare the data:
X_all = []
Y_all = []
for i in tqdm(range(len(images))):
    image = imread(images[i])
    image = (image / 255. - 0.5) * 2
    X_all.append(image)

    y = list(tags.loc[images[i]])
    Y_all.append(y[1:])
X_all = np.array(X_all)
Y_all = np.array(Y_all)
print(X_all.shape, Y_all.shape)
Define a function that generates random tags. The tag distribution in the original data is imbalanced, but we want G to learn every tag, so tags within each category are sampled uniformly:
def get_random_tags():
    y = np.random.uniform(0.0, 1.0, [batch_size, LABEL]).astype(np.float32)
    # the six "other" attributes: each switched on with probability 0.25
    y[y > 0.75] = 1
    y[y <= 0.75] = 0
    for i in range(batch_size):
        hc = np.random.randint(0, 13)
        hs = np.random.randint(13, 18)
        ec = np.random.randint(18, 28)
        y[i, :28] = 0
        y[i, hc] = 1  # hair color
        y[i, hs] = 1  # hair style
        y[i, ec] = 1  # eye color
    return y
Train the model. In CelebA the male/female ratio is balanced, so drawing a random batch at each iteration is enough; here, because the tag distribution is imbalanced, the whole dataset is iterated over in shuffled epochs:
sess = tf.Session()
sess.run(tf.global_variables_initializer())

# fixed noise and labels for monitoring: each row shares hair color, hair style and eye color
z_samples = np.random.uniform(-1.0, 1.0, [batch_size, z_dim]).astype(np.float32)
y_samples = get_random_tags()
for i in range(batch_size):
    y_samples[i, :28] = 0
    y_samples[i, i // 8 % 13] = 1       # hair color
    y_samples[i, i // 8 % 5 + 13] = 1   # hair style
    y_samples[i, i // 8 % 10 + 18] = 1  # eye color
samples = []
loss = {'d': [], 'g': []}

offset = 0
for i in tqdm(range(60000)):
    if offset + batch_size > X_all.shape[0]:
        offset = 0
    if offset == 0:
        data_index = np.arange(X_all.shape[0])
        np.random.shuffle(data_index)
        X_all = X_all[data_index, :, :, :]
        Y_all = Y_all[data_index, :]

    X_batch = X_all[offset: offset + batch_size, :, :, :]
    Y_batch = Y_all[offset: offset + batch_size, :]
    # perturbed copy of the real batch, used by the DRAGAN gradient penalty
    X_batch_perturb = X_batch + 0.5 * X_batch.std() * np.random.random(X_batch.shape)
    offset += batch_size

    n = np.random.uniform(-1.0, 1.0, [batch_size, z_dim]).astype(np.float32)
    ny = get_random_tags()
    _, d_ls = sess.run([optimizer_d, loss_d], feed_dict={X: X_batch, X_perturb: X_batch_perturb, Y: Y_batch, noise: n, noise_y: ny, is_training: True})

    n = np.random.uniform(-1.0, 1.0, [batch_size, z_dim]).astype(np.float32)
    ny = get_random_tags()
    _, g_ls = sess.run([optimizer_g, loss_g], feed_dict={noise: n, noise_y: ny, is_training: True})

    loss['d'].append(d_ls)
    loss['g'].append(g_ls)
    _, lr = sess.run([add_global, learning_rate])

    if i % 500 == 0:
        print(i, d_ls, g_ls, lr)
        gen_imgs = sess.run(g, feed_dict={noise: z_samples, noise_y: y_samples, is_training: False})
        gen_imgs = (gen_imgs + 1) / 2
        imgs = [img[:, :, :] for img in gen_imgs]
        gen_imgs = montage(imgs)
        plt.axis('off')
        plt.imshow(gen_imgs)
        imsave(os.path.join(OUTPUT_DIR, 'sample_%d.jpg' % i), gen_imgs)
        plt.show()
        samples.append(gen_imgs)

plt.plot(loss['d'], label='Discriminator')
plt.plot(loss['g'], label='Generator')
plt.legend(loc='upper right')
plt.savefig('Loss.png')
plt.show()
mimsave(os.path.join(OUTPUT_DIR, 'samples.gif'), samples, fps=10)
The generated avatars are shown below. Within each row the hair color, hair style and eye color are identical, while the other attributes are random. A small fraction of the results are poor, possibly due to particular noise vectors or label combinations.

Save the model:
saver = tf.train.Saver()
saver.save(sess, './anime_acgan', global_step=60000)
Load the model on a local machine and try the following three things:
- generate random samples following the original tag distribution
- generate samples with specified tags
- fix the noise and generate samples following the original tag distribution
# -*- coding: utf-8 -*-
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
from imageio import imsave

batch_size = 64
z_dim = 128
LABEL = 34

def montage(images):
    if isinstance(images, list):
        images = np.array(images)
    img_h = images.shape[1]
    img_w = images.shape[2]
    n_plots = int(np.ceil(np.sqrt(images.shape[0])))
    if len(images.shape) == 4 and images.shape[3] == 3:
        m = np.ones(
            (images.shape[1] * n_plots + n_plots + 1,
             images.shape[2] * n_plots + n_plots + 1, 3)) * 0.5
    elif len(images.shape) == 4 and images.shape[3] == 1:
        m = np.ones(
            (images.shape[1] * n_plots + n_plots + 1,
             images.shape[2] * n_plots + n_plots + 1, 1)) * 0.5
    elif len(images.shape) == 3:
        m = np.ones(
            (images.shape[1] * n_plots + n_plots + 1,
             images.shape[2] * n_plots + n_plots + 1)) * 0.5
    else:
        raise ValueError('Could not parse image shape of {}'.format(images.shape))
    for i in range(n_plots):
        for j in range(n_plots):
            this_filter = i * n_plots + j
            if this_filter < images.shape[0]:
                this_img = images[this_filter]
                m[1 + i + i * img_h:1 + i + (i + 1) * img_h,
                  1 + j + j * img_w:1 + j + (j + 1) * img_w] = this_img
    return m

def get_random_tags():
    # sample tags following the empirical distribution of the training data
    y = np.random.uniform(0.0, 1.0, [batch_size, LABEL]).astype(np.float32)
    p_other = [0.6, 0.6, 0.25, 0.04488882, 0.3, 0.05384738]
    for i in range(batch_size):
        for j in range(len(p_other)):
            if y[i, j + 28] < p_other[j]:
                y[i, j + 28] = 1
            else:
                y[i, j + 28] = 0

    phc = [0.15968645, 0.21305391, 0.15491921, 0.10523116, 0.07953927, 0.09508879, 0.03567429, 0.07733163, 0.03157895, 0.01833307, 0.02236442, 0.00537514, 0.00182371]
    phs = [0.52989922, 0.37101264, 0.12567589, 0.00291153, 0.00847864]
    pec = [0.28350664, 0.15760678, 0.17862742, 0.13412254, 0.14212126, 0.0543913, 0.01020637, 0.00617501, 0.03167493, 0.00156775]
    for i in range(batch_size):
        y[i, :28] = 0

        hc = np.random.random()
        for j in range(len(phc)):
            if np.sum(phc[:j]) < hc < np.sum(phc[:j + 1]):
                y[i, j] = 1
                break

        hs = np.random.random()
        for j in range(len(phs)):
            if np.sum(phs[:j]) < hs < np.sum(phs[:j + 1]):
                y[i, j + 13] = 1
                break

        ec = np.random.random()
        for j in range(len(pec)):
            if np.sum(pec[:j]) < ec < np.sum(pec[:j + 1]):
                y[i, j + 18] = 1
                break
    return y

sess = tf.Session()
sess.run(tf.global_variables_initializer())
saver = tf.train.import_meta_graph('./anime_acgan-60000.meta')
saver.restore(sess, tf.train.latest_checkpoint('./'))

graph = tf.get_default_graph()
g = graph.get_tensor_by_name('generator/g/Tanh:0')
noise = graph.get_tensor_by_name('noise:0')
noise_y = graph.get_tensor_by_name('noise_y:0')
is_training = graph.get_tensor_by_name('is_training:0')

# random samples following the original tag distribution
z_samples = np.random.uniform(-1.0, 1.0, [batch_size, z_dim]).astype(np.float32)
y_samples = get_random_tags()
gen_imgs = sess.run(g, feed_dict={noise: z_samples, noise_y: y_samples, is_training: False})
gen_imgs = (gen_imgs + 1) / 2
imgs = [img[:, :, :] for img in gen_imgs]
gen_imgs = montage(imgs)
gen_imgs = np.clip(gen_imgs, 0, 1)
imsave('1_二次元頭像隨機生成.jpg', gen_imgs)

# samples with specified tags
all_tags = ['blonde hair', 'brown hair', 'black hair', 'blue hair', 'pink hair', 'purple hair', 'green hair', 'red hair', 'silver hair', 'white hair', 'orange hair', 'aqua hair', 'grey hair', 'long hair', 'short hair', 'twintails', 'drill hair', 'ponytail', 'blue eyes', 'red eyes', 'brown eyes', 'green eyes', 'purple eyes', 'yellow eyes', 'pink eyes', 'aqua eyes', 'black eyes', 'orange eyes', 'blush', 'smile', 'open mouth', 'hat', 'ribbon', 'glasses']
for i, tags in enumerate([['blonde hair', 'twintails', 'blush', 'smile', 'ribbon', 'red eyes'],
                          ['silver hair', 'long hair', 'blush', 'smile', 'open mouth', 'blue eyes']]):
    z_samples = np.random.uniform(-1.0, 1.0, [batch_size, z_dim]).astype(np.float32)
    y_samples = np.zeros([1, LABEL])
    for tag in tags:
        y_samples[0, all_tags.index(tag)] = 1
    y_samples = np.repeat(y_samples, batch_size, 0)
    gen_imgs = sess.run(g, feed_dict={noise: z_samples, noise_y: y_samples, is_training: False})
    gen_imgs = (gen_imgs + 1) / 2
    imgs = [img[:, :, :] for img in gen_imgs]
    gen_imgs = montage(imgs)
    gen_imgs = np.clip(gen_imgs, 0, 1)
    imsave('%d_二次元頭像指定標籤.jpg' % (i + 2), gen_imgs)

# fixed noise, random tags
z_samples = np.random.uniform(-1.0, 1.0, [1, z_dim]).astype(np.float32)
z_samples = np.repeat(z_samples, batch_size, 0)
y_samples = get_random_tags()
gen_imgs = sess.run(g, feed_dict={noise: z_samples, noise_y: y_samples, is_training: False})
gen_imgs = (gen_imgs + 1) / 2
imgs = [img[:, :, :] for img in gen_imgs]
gen_imgs = montage(imgs)
gen_imgs = np.clip(gen_imgs, 0, 1)
imsave('4_二次元頭像固定噪音.jpg', gen_imgs)
Random samples following the original tag distribution:

Avatars with blonde hair, twintails, blush, smile, ribbon and red eyes:

Avatars with silver hair, long hair, blush, smile, open mouth and blue eyes:

With fixed noise and random tags, the avatars keep roughly the same identity while the details vary:

With all of the above in hand, you can also train an ACGAN on CelebA conditioned on its 40 binary attributes, which is even simpler than the anime avatar case.
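As a rough sketch of that adaptation (hypothetical names; get_random_celeba_tags and p_attr are not from the original code, and the per-attribute frequencies would have to be estimated from the CelebA label file), the random-label sampler reduces to independent Bernoulli draws, since the 40 attributes are binary and not mutually exclusive:

import numpy as np

LABEL_CELEBA = 40  # CelebA provides 40 binary attributes per image

def get_random_celeba_tags(batch_size, p_attr):
    # p_attr: length-40 array of attribute frequencies estimated from the training labels
    # (hypothetical helper, analogous to get_random_tags above)
    y = (np.random.uniform(0.0, 1.0, [batch_size, LABEL_CELEBA]) < p_attr).astype(np.float32)
    return y

# usage sketch: ny = get_random_celeba_tags(64, p_attr)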