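Deep Learning (2): From Building Your Own Dataset from Scratch to Real-Time Detection of Exaggerated Facial Expressions with deepNN (TensorFlow Implementation)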

阿新 • Published: 2018-12-15

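I. Background

This article builds on my previous one, "Deep Learning (1): Real-Time Facial Expression Recognition from a Webcam with the deepNN Model (Mixed C++ and Python 3.6 Programming)". The model in that article was trained on the fer2013 dataset. As noted there, a model trained on fer2013 reaches roughly 70% accuracy on facial expressions, and fer2013 covers only 7 common expressions, so it cannot recognize a wider variety. To address this, I collect my own facial expression data here (the collection method is described in my article "Building a Facial Expression Dataset from Scratch"), train a model on it, and use that model for real-time detection of exaggerated facial expressions. The model is still deepNN. The complete file layout is listed at the end of this article.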


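II. Dataset preparation

1. The haarcascade_frontalface_default.xml file

The approach is the same as in the previous article, and it again relies on the haarcascade_frontalface_default.xml file. My previous article explains in detail how to obtain it, so I will not repeat that here. Note that the test script and the file listing later in this article refer to haarcascade_frontalface_alt.xml; whichever cascade file you use, CASC_PATH in the test script must point to it.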

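2. The exaggerated facial expression dataset

First, define 10 classes of exaggerated expressions. I chose surprised (chijing), crying (daku), happy (gaoxing), pouting (juezui), frowning (zhoumei), looking up (taitou), looking down (ditou), looking left (xiangzuokan), looking right (xiangyoukan), and melancholy (youyu). The key step is then to collect data for these 10 classes. For how to collect and build the expression dataset, see "Building a Facial Expression Dataset from Scratch". One important caveat: images for looking left and looking right must not be mirrored, because a horizontal flip turns one class into the other.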
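The mirroring caveat can be made concrete with a small sketch. It assumes the images are already loaded as a NumPy array of shape (N, H, W) with integer class labels, and that the label numbering used later in this article applies (7 for xiangzuokan, 8 for xiangyoukan). The function name mirror_augment and the NO_MIRROR_LABELS set are made up for illustration and are not part of my augmentation script.

```python
import numpy as np

# Assumed label numbering: 7 = xiangzuokan (looking left),
# 8 = xiangyoukan (looking right). These classes must not be mirrored.
NO_MIRROR_LABELS = {7, 8}


def mirror_augment(images, labels, no_mirror=NO_MIRROR_LABELS):
    """Append horizontally flipped copies, skipping direction-sensitive classes.

    images: array of shape (N, H, W); labels: integer array of shape (N,).
    """
    keep = np.array([int(l) not in no_mirror for l in labels], dtype=bool)
    flipped = images[keep][:, :, ::-1]          # flip along the width axis
    aug_images = np.concatenate([images, flipped], axis=0)
    aug_labels = np.concatenate([labels, labels[keep]], axis=0)
    return aug_images, aug_labels
```

An alternative would be to flip those two classes and swap their labels, but skipping them is the simpler rule and matches the caveat above.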

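After the expression images have been generated automatically, some simple manual cleanup is still needed, mainly removing images that are obviously not faces. Then try to keep the number of images per class roughly equal; I used 100 images per class.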
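To check the class balance after the manual cleanup, a sketch like the following may help. It assumes one subfolder per class under a data directory, as in the file layout at the end of this article; the function name count_images and the list of image extensions are my own assumptions.

```python
import os

# Assumed image extensions; adjust to whatever your dataset uses.
IMAGE_EXTS = ('.jpg', '.jpeg', '.png')


def count_images(data_dir):
    """Map each class folder under data_dir to its number of image files."""
    counts = {}
    for name in sorted(os.listdir(data_dir)):
        class_dir = os.path.join(data_dir, name)
        if not os.path.isdir(class_dir):
            continue                      # skip files such as list.txt
        counts[name] = sum(
            1 for f in os.listdir(class_dir)
            if f.lower().endswith(IMAGE_EXTS))
    return counts
```

Calling count_images('data') returns a dictionary such as {'chijing': 100, 'daku': 100, ...} when every class holds 100 images, which makes an unbalanced class easy to spot.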

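III. Model implementation

1. Creating the data labels

Because the model treats this as a classification problem, every class of expression data needs a label. My approach is to read the images in each folder and store each image path together with its label in a txt file. Here is the code: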

# Generate the image and label list files: https://blog.csdn.net/u010682375/article/details/77746489
import os


def generate(dir, label):
    """Write zzz_list.txt in `dir` with one '<image path> <label>' line per image."""
    files = sorted(os.listdir(dir))
    print('start...')

    # os.path.join works whether or not `dir` ends with a separator
    with open(os.path.join(dir, 'zzz_list.txt'), 'w') as listText:
        for file in files:
            # splitext (not split) returns the extension; skip txt files
            # such as a zzz_list.txt left over from an earlier run
            if os.path.splitext(file)[1] == '.txt':
                continue
            listText.write(os.path.join(dir, file) + ' ' + str(int(label)) + '\n')
    print('done!')


if __name__ == '__main__':
    generate('data/chijing/', 0)
    generate('data/daku/', 1)
    generate('data/gaoxing/', 2)
    generate('data/juezui/', 3)
    generate('data/zhoumei/', 4)
    generate('data/taitou/', 5)
    generate('data/ditou/', 6)
    generate('data/xiangzuokan/', 7)
    generate('data/xiangyoukan/', 8)
    generate('data/youyu/', 9)

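There are 10 expressions and therefore 10 labels, numbered 0 to 9. Running the program creates a txt file in each expression folder. Taking the surprised expression as an example, open zzz_list.txt under 'data/chijing/' and you will see the path and label of every surprised image, one per line.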

Next, manually merge the txt files of the 10 classes into a single txt file named list.txt under 'data/'. That completes the image list and labels.
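I did the merge by hand, but it could also be scripted. The following is a minimal sketch, assuming each class folder contains a zzz_list.txt as produced by the labelling script; the function name merge_lists and its parameters are hypothetical.

```python
import os

# Class folders in label order (0 to 9), as used in the labelling script.
CLASS_DIRS = ['chijing', 'daku', 'gaoxing', 'juezui', 'zhoumei',
              'taitou', 'ditou', 'xiangzuokan', 'xiangyoukan', 'youyu']


def merge_lists(data_dir, class_dirs, list_name='zzz_list.txt',
                out_name='list.txt'):
    """Concatenate every class list into data_dir/out_name; return the line count."""
    merged = []
    for class_dir in class_dirs:
        path = os.path.join(data_dir, class_dir, list_name)
        with open(path, 'r') as f:
            # Drop blank lines so the merged file has one sample per line
            merged.extend(line.strip() for line in f if line.strip())
    with open(os.path.join(data_dir, out_name), 'w') as out:
        out.write('\n'.join(merged) + '\n')
    return len(merged)
```

Calling merge_lists('data', CLASS_DIRS) would then write data/list.txt and return the total number of samples.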

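2. Loading the data in bulk

With the dataset and labels ready, the next step is a data-loading function. Given list.txt, it loads every image listed there together with its label. Here is the code: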

import numpy as np
from PIL import Image


def load_data(txt_dir):
    # Read the list file once, skipping blank lines (e.g. a trailing newline)
    with open(txt_dir, 'r') as fopen:
        lines = [line for line in fopen.read().splitlines() if line.strip()]
    count = len(lines)                  # number of samples in the txt file

    data_set = np.empty((count, 128, 128, 1), dtype="float32")
    label = np.zeros((count, 10), dtype="uint8")

    for i, line in enumerate(lines):
        # Each line is '<image path> <label>'; split on the last space
        path, cls = line.rsplit(" ", 1)

        img = Image.open(path)
        print(i, img.size)
        label[i, int(cls)] = 1          # one-hot label

        img = img.convert('L')          # convert to grayscale
        # Images are assumed to be 128x128 already. Scale to [0, 1] so the
        # training input matches image_to_tensor in the test script.
        array = np.asarray(img, dtype="float32") / 255.0
        data_set[i, :, :, 0] = array

    return data_set, label


if __name__ == '__main__':
    txt_dir = 'data/list.txt'
    data_set, label = load_data(txt_dir)
    print(data_set.shape)
    print(label.shape)

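The code above can be run directly. If the code and the txt file are correct, it prints the shapes of the data and label arrays, for example (1000, 128, 128, 1) and (1000, 10) when list.txt holds 1000 images.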
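If loading fails, the list file is a likely culprit. The sketch below is one way to check it before loading, assuming the '<image path> <label>' line format described above; the function name check_list_file and the problem messages are made up for illustration.

```python
import os


def check_list_file(txt_path, num_classes=10):
    """Return (line_number, problem) tuples for lines that would break loading."""
    problems = []
    with open(txt_path, 'r') as f:
        for number, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue                                  # ignore blank lines
            parts = line.rsplit(' ', 1)                   # split on the last space
            if len(parts) != 2 or not parts[1].isdigit():
                problems.append((number, 'expected "<path> <label>"'))
                continue
            path, label = parts
            if int(label) >= num_classes:
                problems.append((number, 'label out of range'))
            elif not os.path.isfile(path):
                problems.append((number, 'image file not found'))
    return problems
```

An empty list from check_list_file('data/list.txt') means every line has a valid label and points to an existing file.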

3. Training the model

With the data ready, the next step is to train the model. Here is the training code:

import os
import tensorflow as tf
import numpy as np
from read_data import load_data


EMOTIONS = ['chijing', 'daku', 'gaoxing', 'juezui', 'zhoumei',
            'taitou', 'ditou', 'xiangzuokan', 'xiangyoukan', 'youyu']

def deepnn(x):
    x_image = tf.reshape(x, [-1, 128, 128, 1])

    # conv1
    w_conv1 = weight_variables([5, 5, 1, 64])
    b_conv1 = bias_variable([64])
    h_conv1 = tf.nn.relu(conv2d(x_image, w_conv1) + b_conv1)
    # pool1
    h_pool1 = maxpool(h_conv1)
    # norm1
    norm1 = tf.nn.lrn(h_pool1, 4, bias=1.0, alpha=0.001 / 9.0, beta=0.75)

    # conv2
    w_conv2 = weight_variables([3, 3, 64, 64])
    b_conv2 = bias_variable([64])
    h_conv2 = tf.nn.relu(conv2d(norm1, w_conv2) + b_conv2)
    norm2 = tf.nn.lrn(h_conv2, 4, bias=1.0, alpha=0.001 / 9.0, beta=0.75)
    h_pool2 = maxpool(norm2)

    # Fully connected layer
    w_fc1 = weight_variables([32 * 32 * 64, 384])
    b_fc1 = bias_variable([384])
    h_conv3_flat = tf.reshape(h_pool2, [-1, 32 * 32 * 64])
    h_fc1 = tf.nn.relu(tf.matmul(h_conv3_flat, w_fc1) + b_fc1)

    # Fully connected layer
    # (no activation is applied here, so this layer and the next are both linear)
    w_fc2 = weight_variables([384, 192])
    b_fc2 = bias_variable([192])
    h_fc2 = tf.matmul(h_fc1, w_fc2) + b_fc2

    # linear
    w_fc3 = weight_variables([192, 10])         # 10 classes in total
    b_fc3 = bias_variable([10])                 # 10 classes in total
    y_conv = tf.add(tf.matmul(h_fc2, w_fc3), b_fc3)

    return y_conv


def weight_variables(shape):
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)


def bias_variable(shape):
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)


def conv2d(x, w):
    return tf.nn.conv2d(x, w, strides=[1, 1, 1, 1], padding='SAME')


def maxpool(x):
    return tf.nn.max_pool(x, ksize=[1, 3, 3, 1],
                            strides=[1, 2, 2, 1], padding='SAME')


def train_model():

    # Build the model (TensorFlow 1.x graph API) ------------------------
    x = tf.placeholder(tf.float32, [None, 128 * 128])
    y_ = tf.placeholder(tf.float32, [None, 10])

    y_conv = deepnn(x)

    cross_entropy = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y_conv))
    train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
    correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

    # Model built --------------------------------------------------------

    # Load the data
    data_set, label = load_data('./data/list.txt')
    max_train_epochs = 30001
    batch_size = 100

    # Checkpoint files are written as ./models/emotion_model-<step>.*
    if not os.path.exists('./models/emotion_model'):
        os.makedirs('./models/emotion_model')

    with tf.Session() as sess:
        saver = tf.train.Saver()
        sess.run(tf.global_variables_initializer())

        batch_num = data_set.shape[0] // batch_size

        for i in range(max_train_epochs):
            # list.txt is grouped by class, so shuffle every epoch;
            # otherwise each batch of 100 would contain a single class
            order = np.random.permutation(data_set.shape[0])
            for j in range(batch_num):
                idx = order[j * batch_size:(j + 1) * batch_size]
                train_image = data_set[idx].reshape(-1, 128 * 128)
                train_label = label[idx].reshape(-1, 10)

                train_step.run(feed_dict={x: train_image, y_: train_label})

            # Accuracy on the last training batch of this epoch
            train_accuracy = accuracy.eval(feed_dict={
                x: train_image, y_: train_label})
            print('epoch %d, training accuracy %f' % (i, train_accuracy))

            if i % 50 == 0:
                saver.save(sess, './models/emotion_model', global_step=i + 1)


if __name__ == '__main__':
    train_model()

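The training code follows a simple plan: first define the deepNN model structure, then set up the network and its hyperparameters in train_model, then load the training data, feed it to the model, train, and save the result. Once written, it can be run directly. A checkpoint is saved every 50 epochs under the path './models/emotion_model'.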
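The first fully connected layer in deepnn expects 32 * 32 * 64 inputs. A quick way to see where that number comes from is the sketch below. It relies on TensorFlow's 'SAME' padding rule, where the output size is ceil(input size / stride); the helper name same_padding_out is made up for illustration.

```python
import math


def same_padding_out(size, stride):
    """Spatial output size of a conv or pool layer with 'SAME' padding."""
    return math.ceil(size / stride)


size = 128                          # input images are 128x128
size = same_padding_out(size, 1)    # conv1, stride 1 -> 128
size = same_padding_out(size, 2)    # pool1, stride 2 -> 64
size = same_padding_out(size, 1)    # conv2, stride 1 -> 64
size = same_padding_out(size, 2)    # pool2, stride 2 -> 32

channels = 64                       # conv2 has 64 output channels
flat_features = size * size * channels
print(flat_features)                # 65536, i.e. 32 * 32 * 64
```

If you change the input size or add a pooling layer, the shape of w_fc1 and the reshape before it have to change accordingly.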

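4. Testing the model

Once training is done, the last step is to test the model. This step loads the trained model, opens the camera, and classifies the facial expression in real time. Here is the code: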

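import cv2
import numpy as np
import tensorflow as tf
from train_model import deepnn, EMOTIONS

EMOJI_DIR = './files/emotion/'
# Must point to the cascade file you actually have
CASC_PATH = './haarcascade_frontalface_alt.xml'
cascade_classifier = cv2.CascadeClassifier(CASC_PATH)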

def format_image(image):
    '''
    Convert a frame to grayscale, find the largest face, and return it
    resized to the 128x128 network input together with its (x, y, w, h) box.
    '''
    if len(image.shape) > 2 and image.shape[2] == 3:
        image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    faces = cascade_classifier.detectMultiScale(
        image, scaleFactor=1.3, minNeighbors=5)

    # No face found in the image
    if not len(faces) > 0:
        return None, None

    # Keep the face with the largest area
    max_area_face = faces[0]
    for face in faces:
        if face[2] * face[3] > max_area_face[2] * max_area_face[3]:
            max_area_face = face

    # Crop the face: rows are y..y+h, columns are x..x+w
    face_coor = max_area_face
    x, y, w, h = face_coor
    image = image[y:y + h, x:x + w]

    # Resize image to network size
    try:
        image = cv2.resize(image, (128, 128), interpolation=cv2.INTER_CUBIC)
    except Exception:
        print("[+] Problem during resize")
        return None, None
    return image, face_coor


def face_dect(image):
    """
    Detect faces and return the largest one, both as the raw crop and as a
    48x48 version scaled to [0, 1]. Returns None if no face is found.
    (Not used by demo below.)
    """
    if len(image.shape) > 2 and image.shape[2] == 3:
        image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    faces = cascade_classifier.detectMultiScale(
        image, scaleFactor=1.3, minNeighbors=5)

    if not len(faces) > 0:
        return None
    max_face = faces[0]
    for face in faces:
        if face[2] * face[3] > max_face[2] * max_face[3]:
            max_face = face
    # Rows are y..y+h, columns are x..x+w
    x, y, w, h = max_face
    face_image = image[y:y + h, x:x + w]
    try:
        image = cv2.resize(face_image, (48, 48), interpolation=cv2.INTER_CUBIC) / 255.
    except Exception:
        print("[+] Problem during resize")
        return None
    return face_image, image


def resize_image(image, size):
    # Resize to `size` and scale to [0, 1]. (Not used by demo below.)
    try:
        image = cv2.resize(image, size, interpolation=cv2.INTER_CUBIC) / 255.
    except Exception:
        print("[+] Problem during resize")
        return None
    return image


def image_to_tensor(image):
    tensor = np.asarray(image).reshape(-1, 128*128) * 1 / 255.0
    return tensor


def demo(modelPath, showBox=False):
    # Build the model -------------------------------------------
    face_x = tf.placeholder(tf.float32, [None, 128*128])
    y_conv = deepnn(face_x)
    probs = tf.nn.softmax(y_conv)
    # Model built -----------------------------------------------

    # Saver
    saver = tf.train.Saver()
    ckpt = tf.train.get_checkpoint_state(modelPath)
    sess = tf.Session()

    # Load the trained model; stop if no checkpoint is available
    if ckpt and ckpt.model_checkpoint_path:
        saver.restore(sess, ckpt.model_checkpoint_path)
        print('Restore model success!!')
    else:
        print('No checkpoint found in', modelPath)
        return

    # Load the emoji images
    feelings_faces = []
    for emotion in EMOTIONS:
        feelings_faces.append(cv2.imread(EMOJI_DIR + emotion + '.png', -1))

    video_captor = cv2.VideoCapture(0)

    result = None

    while True:
        # Grab a frame from the camera
        ret, frame = video_captor.read()
        if not ret:
            break
        detected_face, face_coor = format_image(frame)
        if showBox and face_coor is not None:
            [x, y, w, h] = face_coor
            cv2.rectangle(frame, (x, y), (x + w, y + h), (255, 0, 0), 2)

        if detected_face is not None:
            # Classify the face and get the probability of each expression
            tensor = image_to_tensor(detected_face)
            result = sess.run(probs, feed_dict={face_x: tensor})
        if result is not None:
            for index, emotion in enumerate(EMOTIONS):
                # Draw the expression name in green
                cv2.putText(frame, emotion, (10, index * 20 + 20), cv2.FONT_HERSHEY_PLAIN, 1, (0, 255, 0), 1)
                # Draw a bar whose length is proportional to the probability
                cv2.rectangle(frame, (130, index * 20 + 10), (130 + int(result[0][index] * 100), (index + 1) * 20 + 4),
                              (255, 0, 0), -1)

            # Overlay the emoji of the most likely expression
            emoji_face = feelings_faces[np.argmax(result[0])]
            emoji_face = cv2.resize(emoji_face, (120, 120))
            # Blend into the same region that is read (rows 300 to 420)
            weight = emoji_face[:, :, 2] / 255.0
            for c in range(0, 3):
                frame[300:420, 10:130, c] = emoji_face[:, :, c] * weight + \
                    frame[300:420, 10:130, c] * (1.0 - weight)
        cv2.imshow('face', frame)
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break

    video_captor.release()
    cv2.destroyAllWindows()


def main(CHECKPOINT_DIR):
    demo(CHECKPOINT_DIR)


if __name__ == '__main__':
    # Directory holding the checkpoint files produced by train_model.py
    CHECKPOINT_DIR = './files/ckpt'
    main(CHECKPOINT_DIR)

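Running the code above opens the camera and monitors facial expressions in real time. I only trained for 5000 iterations, and the model does not perform well yet: some expressions still cannot be recognized accurately. I plan to investigate further.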
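The step from the network output to a displayed expression can be illustrated without TensorFlow. This is a plain-Python sketch of the softmax and argmax that demo applies through tf.nn.softmax and np.argmax; the function names softmax and top_emotion are my own, and the logits in the usage note are invented numbers, not real model output.

```python
import math

EMOTIONS = ['chijing', 'daku', 'gaoxing', 'juezui', 'zhoumei',
            'taitou', 'ditou', 'xiangzuokan', 'xiangyoukan', 'youyu']


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)                               # subtract max for stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_emotion(logits, names=EMOTIONS):
    """Return the most likely class name and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return names[best], probs[best]
```

For example, top_emotion([0, 0, 5.0, 0, 0, 0, 0, 0, 0, 0]) picks 'gaoxing', the class at index 2, because it has the largest score.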

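IV. Analysis and summary

1. When collecting your own data, make sure that images for looking left and looking right are not mirrored.

2. The model does not perform well yet; I think adding more training data is a sensible next step.

3. The complete file layout is: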

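-- get_image.py            # crawls the dataset images
-- img_preprocessing.py    # crops and preprocesses the face data
-- img_augument.py         # data augmentation
-- make_label.py           # creates the face labels and generates the txt files
-- read_data.py            # loads image data and labels from list.txt
-- train_model.py          # trains the model with deepNN
-- test.py                 # test program: real-time expression detection from the camera
-- haarcascade_frontalface_alt.xml
-- data                    # the processed dataset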
    |------ list.txt
    |------ chijing
                |------ img01.jpg
                |------ ......
    |------ daku
                |------ img01.jpg
                |------ ......
    |------ gaoxing
                |------ img01.jpg
                |------ ......
    |------ juezui
                |------ img01.jpg
                |------ ......
    |------ zhoumei
                |------ img01.jpg
                |------ ......
    |------ taitou
                |------ img01.jpg
                |------ ......
    |------ ditou
                |------ img01.jpg
                |------ ......
    |------ xiangzuokan
                |------ img01.jpg
                |------ ......
    |------ xiangyoukan
                |------ img01.jpg
                |------ ......
    |------ youyu
                |------ img01.jpg
                |------ ......
