Face Alignment: Facial Landmark Detection with a DCNN

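I: Goal
Facial landmark detection builds on face detection: once a face has been found, it locates feature points on that face such as the eyes, nose, and mouth. This example is implemented with the Caffe framework. A sample result is shown below:
[Figure: sample landmark detection result]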

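II: Preparing the data source
Caffe's LMDB input handles only a single label per sample, so HDF5 is used here because it can store multi-value labels.
A convolutional neural network can be used for both classification and regression. For classification, the output dimension of the last fully connected layer equals the number of classes, and it is followed by a Softmax layer that computes the Softmax loss. For regression, the output dimension of the last fully connected layer is the number of coordinate values to regress, and the loss is the Euclidean loss.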
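As a rough illustration of the regression objective, the sketch below computes a Euclidean loss in NumPy using the form given in Caffe's layer documentation: the sum of squared differences over a batch of N samples, divided by 2N. The function name `euclidean_loss` and the toy arrays are assumptions made for this example and are not part of the original project.

```python
import numpy as np

def euclidean_loss(pred, target):
    """Sum of squared differences over the batch, divided by 2N.

    pred, target: arrays of shape (N, D); D = 10 for five (x, y) landmarks.
    """
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    n = pred.shape[0]
    return float(np.sum((pred - target) ** 2) / (2.0 * n))

# Toy batch (hypothetical values): two faces, landmarks relative to the face box.
target = np.full((2, 10), 0.5)
pred = target.copy()
pred[0, 0] += 0.1   # a single x-coordinate is off by 0.1
loss = euclidean_loss(pred, target)
```

With one coordinate off by 0.1 in a batch of two, the loss is 0.1^2 / 4, which shows how small relative errors translate into very small loss values.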
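Here a CNN is trained to regress the landmark coordinates. A single regression network tends to produce coordinates that are not accurate enough, whereas a cascaded-regression CNN localizes the landmarks in stages and can do so more quickly and accurately. A larger network would predict the landmarks more accurately but would also take longer to run. To balance speed and accuracy, smaller networks are used in a cascade: a coarse detection first, followed by fine adjustment of the landmarks. The idea is shown in the figure below:
[Figure: cascaded coarse-to-fine landmark localization, level 1 to level 3]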

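1. First, a network is trained on the whole face image (the red box) to regress the landmark coordinates coarsely. In practice its input is a 39*39 grayscale image of the face region, and at prediction time it gives the approximate landmark positions. Level 1 is split into three networks: one for all five points, one for the two eyes and the nose, and one for the nose and the mouth.
2. A second regression network is then trained on local patches around each landmark (the landmarks used are the ones produced by level 1); these patches are the yellow regions in level 2 and level 3. In practice its input is a 15*15 grayscale patch around the landmark, and it predicts a more accurate landmark position. The input region defined for level 3 is slightly smaller than the one for level 2.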
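The three level-1 inputs can be sketched as slices of a single resized face, following the slicing that level1.py uses: the full 39*39 face for the five-point network (F), the top 31 rows for the eyes-and-nose network (EN), and the bottom 31 rows for the nose-and-mouth network (NM). The helper name `level1_inputs`, the sample coordinates, and the landmark order (left eye, right eye, nose, left mouth corner, right mouth corner) are assumptions for illustration; the order is inferred from the slicing and flip code rather than stated in the text.

```python
import numpy as np

def level1_inputs(face, landmarks):
    """Split one 39x39 face and its 5x2 landmarks into the F, EN and NM inputs."""
    assert face.shape == (39, 39) and landmarks.shape == (5, 2)
    return {
        "F": (face, landmarks.reshape(10)),               # all five points
        "EN": (face[:31, :], landmarks[:3].reshape(6)),   # eyes and nose
        "NM": (face[8:, :], landmarks[2:].reshape(6)),    # nose and mouth corners
    }

face = np.zeros((39, 39), dtype=np.float32)               # placeholder image
landmarks = np.array([[0.3, 0.3], [0.7, 0.3], [0.5, 0.55],
                      [0.35, 0.8], [0.65, 0.8]])          # hypothetical values
inputs = level1_inputs(face, landmarks)
shapes = {name: (img.shape, lab.shape) for name, (img, lab) in inputs.items()}
```

Note that in this sketch, as in level1.py, the EN and NM labels stay expressed relative to the full face box; only the image is cropped.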

[Figure 3: structure of the convolutional network]

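Figure 3 above shows the structure of the convolutional network. The level-1 network takes a 39*39 single-channel grayscale image as input, passes it through four convolutional layers with pooling and then through fully connected layers, and outputs a 10-dimensional vector holding the coordinates of the 5 landmarks. The last layer is a Euclidean loss layer, which accumulates the squared error between the coordinates predicted by the network and the ground truth (both are relative values). The network definition is in 1_F_train.prototxt.
The solver hyperparameters are in 1_F_solver.prototxt. CPU mode is selected (if a GPU is available, select GPU mode instead).
Once level 1 has been trained, it yields predicted landmark positions, but these may not be precise enough, so training moves on to levels 2 and 3 to obtain more precise results. Level 1 uses a somewhat deeper network to estimate the landmark positions; levels 2 and 3 share a somewhat shallower network that refines them to high accuracy.
Level-2 training starts from the 5 landmarks produced by level 1 and builds two datasets for each landmark. For the first dataset, a local box of size (2*0.18*W, 2*0.18*H) is centered on the landmark, where W and H are the width and height of the face box. The box is then shifted by a small random amount so that the landmark's position inside it is random, and a 15*15 patch image is cropped out. The second dataset is built in the same way, except that the box ratio is 0.16. (The two level-3 datasets use ratios of 0.11 and 0.12; everything else is the same as in level 2.) For each landmark, the same network is trained on both datasets, which gives two models. At prediction time, the mean of the two models' predictions is used as the result, which improves accuracy. The level-2 code is in level2.py.
The level-2 network definition is in 2_LE1_train.prototxt.
The level-2 solver hyperparameters are in 2_LE1_solver.prototxt.
The level-3 code is in level3.py.
The level-3 network definition is in 3_LE1_train.prototxt.
The level-3 solver hyperparameters are in 3_LE1_solver.prototxt.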
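The patch geometry and the two-model averaging described above can be sketched in plain Python. This is a minimal sketch, not the contents of level2.py: the helper names `patch_box` and `average_predictions`, the face box, and the sample predictions are all hypothetical, and the random shift and the resize to 15*15 are left out.

```python
def patch_box(face_box, point, padding):
    """Return (left, right, top, bottom) of a patch centred on a relative point.

    face_box: (left, right, top, bottom) in pixels.
    point: (x, y) relative to the face box, each in [0, 1].
    padding: half-size of the patch as a fraction of the face box (e.g. 0.18).
    """
    left, right, top, bottom = face_box
    w, h = right - left, bottom - top
    cx, cy = left + point[0] * w, top + point[1] * h
    return (cx - padding * w, cx + padding * w,
            cy - padding * h, cy + padding * h)

def average_predictions(pred_a, pred_b):
    """Average two (x, y) predictions made for the same landmark."""
    return ((pred_a[0] + pred_b[0]) / 2.0, (pred_a[1] + pred_b[1]) / 2.0)

face_box = (100, 200, 50, 150)                    # hypothetical 100x100 face box
box_18 = patch_box(face_box, (0.5, 0.5), 0.18)    # first level-2 dataset
box_16 = patch_box(face_box, (0.5, 0.5), 0.16)    # second level-2 dataset
merged = average_predictions((0.30, 0.40), (0.32, 0.44))
```

For a 100-pixel-wide face box, the two ratios give patches about 36 and 32 pixels wide, so the two models see slightly different amounts of context around the same landmark.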

Run level1.py to generate the HDF5 data source needed for the first stage.

level1.py

#!/usr/bin/env python2.7
# coding: utf-8

import os
import time
import math
from os.path import join, exists
import cv2
import numpy as np
import h5py
from utils_common import shuffle_in_unison_scary, logger, createDir, processImage
from utils_common import getDataFromTxt
from utils import show_landmark, flip, rotate


TRAIN = '/home/tom/PycharmProjects/pythonPro/deep_landmark/cnn-face-data'
OUTPUT = '/home/tom/PycharmProjects/pythonPro/deep_landmark/dataset1/train'
if not exists(OUTPUT):
    os.mkdir(OUTPUT)
assert(exists(TRAIN) and exists(OUTPUT))


def generate_hdf5(ftxt, output, fname, argument=False):
    # `argument` switches data augmentation on (the name is kept from the original)
    data = getDataFromTxt(ftxt)
    F_imgs = []
    F_landmarks = []
    EN_imgs = []
    EN_landmarks = []
    NM_imgs = []
    NM_landmarks = []

    for (imgPath, bbox, landmarkGt) in data:
        # cv2.CV_LOAD_IMAGE_GRAYSCALE was removed in OpenCV 3; IMREAD_GRAYSCALE replaces it
        img = cv2.imread(imgPath, cv2.IMREAD_GRAYSCALE)
        assert(img is not None)
        logger("process %s" % imgPath)

        # F: full face, box enlarged by 5% on each side
        f_bbox = bbox.subBBox(-0.05, 1.05, -0.05, 1.05)
        f_face = img[int(round(f_bbox.top)):int(round(f_bbox.bottom+1)),
                     int(round(f_bbox.left)):int(round(f_bbox.right+1))]

        ## data augmentation (`> -1` is always true, so the flip is always added)
        if argument and np.random.rand() > -1:
            ### flip
            face_flipped, landmark_flipped = flip(f_face, landmarkGt)
            face_flipped = cv2.resize(face_flipped, (39, 39))
            F_imgs.append(face_flipped.reshape((1, 39, 39)))
            F_landmarks.append(landmark_flipped.reshape(10))
            ### rotation (disabled in the original by wrapping it in a string literal)
            """
            if np.random.rand() > 0.5:
                face_rotated_by_alpha, landmark_rotated = rotate(img, f_bbox, \
                    bbox.reprojectLandmark(landmarkGt), 5)
                landmark_rotated = bbox.projectLandmark(landmark_rotated)
                face_rotated_by_alpha = cv2.resize(face_rotated_by_alpha, (39, 39))
                F_imgs.append(face_rotated_by_alpha.reshape((1, 39, 39)))
                F_landmarks.append(landmark_rotated.reshape(10))
                ### flip with rotation
                face_flipped, landmark_flipped = flip(face_rotated_by_alpha, landmark_rotated)
                face_flipped = cv2.resize(face_flipped, (39, 39))
                F_imgs.append(face_flipped.reshape((1, 39, 39)))
                F_landmarks.append(landmark_flipped.reshape(10))
            ### rotation
            if np.random.rand() > 0.5:
                face_rotated_by_alpha, landmark_rotated = rotate(img, f_bbox, \
                    bbox.reprojectLandmark(landmarkGt), -5)
                landmark_rotated = bbox.projectLandmark(landmark_rotated)
                face_rotated_by_alpha = cv2.resize(face_rotated_by_alpha, (39, 39))
                F_imgs.append(face_rotated_by_alpha.reshape((1, 39, 39)))
                F_landmarks.append(landmark_rotated.reshape(10))
                ### flip with rotation
                face_flipped, landmark_flipped = flip(face_rotated_by_alpha, landmark_rotated)
                face_flipped = cv2.resize(face_flipped, (39, 39))
                F_imgs.append(face_flipped.reshape((1, 39, 39)))
                F_landmarks.append(landmark_flipped.reshape(10))
            """

        f_face = cv2.resize(f_face, (39, 39))
        en_face = f_face[:31, :]   # top 31 rows: eyes and nose
        nm_face = f_face[8:, :]    # bottom 31 rows: nose and mouth

        f_face = f_face.reshape((1, 39, 39))
        f_landmark = landmarkGt.reshape((10))
        F_imgs.append(f_face)
        F_landmarks.append(f_landmark)

        # EN
        # en_bbox = bbox.subBBox(-0.05, 1.05, -0.04, 0.84)
        # en_face = img[en_bbox.top:en_bbox.bottom+1,en_bbox.left:en_bbox.right+1]

        # NOTE: cv2.resize takes dsize as (width, height), so (31, 39) yields a
        # 39x31 array that the reshape below reinterprets as 31x39. It is kept as
        # in the original; if you change it, change the prediction-time
        # preprocessing in the same way.
        ## data augmentation
        if argument and np.random.rand() > 0.5:
            ### flip
            face_flipped, landmark_flipped = flip(en_face, landmarkGt)
            face_flipped = cv2.resize(face_flipped, (31, 39)).reshape((1, 31, 39))
            landmark_flipped = landmark_flipped[:3, :].reshape((6))
            EN_imgs.append(face_flipped)
            EN_landmarks.append(landmark_flipped)

        en_face = cv2.resize(en_face, (31, 39)).reshape((1, 31, 39))
        en_landmark = landmarkGt[:3, :].reshape((6))
        EN_imgs.append(en_face)
        EN_landmarks.append(en_landmark)

        # NM
        # nm_bbox = bbox.subBBox(-0.05, 1.05, 0.18, 1.05)
        # nm_face = img[nm_bbox.top:nm_bbox.bottom+1,nm_bbox.left:nm_bbox.right+1]

        ## data augmentation
        if argument and np.random.rand() > 0.5:
            ### flip
            face_flipped, landmark_flipped = flip(nm_face, landmarkGt)
            face_flipped = cv2.resize(face_flipped, (31, 39)).reshape((1, 31, 39))
            landmark_flipped = landmark_flipped[2:, :].reshape((6))
            NM_imgs.append(face_flipped)
            NM_landmarks.append(landmark_flipped)

        nm_face = cv2.resize(nm_face, (31, 39)).reshape((1, 31, 39))
        nm_landmark = landmarkGt[2:, :].reshape((6))
        NM_imgs.append(nm_face)
        NM_landmarks.append(nm_landmark)

    #imgs, landmarks = process_images(ftxt, output)

    F_imgs, F_landmarks = np.asarray(F_imgs), np.asarray(F_landmarks)
    EN_imgs, EN_landmarks = np.asarray(EN_imgs), np.asarray(EN_landmarks)
    NM_imgs, NM_landmarks = np.asarray(NM_imgs), np.asarray(NM_landmarks)

    # normalize each image, then shuffle images and labels with the same permutation
    F_imgs = processImage(F_imgs)
    shuffle_in_unison_scary(F_imgs, F_landmarks)
    EN_imgs = processImage(EN_imgs)
    shuffle_in_unison_scary(EN_imgs, EN_landmarks)
    NM_imgs = processImage(NM_imgs)
    shuffle_in_unison_scary(NM_imgs, NM_landmarks)

    # full face
    base = join(OUTPUT, '1_F')
    createDir(base)
    output = join(base, fname)
    logger("generate %s" % output)
    with h5py.File(output, 'w') as h5:
        h5['data'] = F_imgs.astype(np.float32)
        h5['landmark'] = F_landmarks.astype(np.float32)

    # eye and nose
    base = join(OUTPUT, '1_EN')
    createDir(base)
    output = join(base, fname)
    logger("generate %s" % output)
    with h5py.File(output, 'w') as h5:
        h5['data'] = EN_imgs.astype(np.float32)
        h5['landmark'] = EN_landmarks.astype(np.float32)

    # nose and mouth
    base = join(OUTPUT, '1_NM')
    createDir(base)
    output = join(base, fname)
    logger("generate %s" % output)
    with h5py.File(output, 'w') as h5:
        h5['data'] = NM_imgs.astype(np.float32)
        h5['landmark'] = NM_landmarks.astype(np.float32)


if __name__ == '__main__':
    # train data
    h5_path = '/home/tom/PycharmProjects/pythonPro/deep_landmark/dataset1/'
    train_txt = join(TRAIN, 'trainImageList.txt')
    generate_hdf5(train_txt, OUTPUT, 'train.h5', argument=True)

    test_txt = join(TRAIN, 'testImageList.txt')
    generate_hdf5(test_txt, OUTPUT, 'test.h5')

    # each txt file lists the HDF5 file that the matching network reads
    with open(join(OUTPUT, '1_F/train.txt'), 'w') as fd:
        fd.write(h5_path+'train/1_F/train.h5')
    with open(join(OUTPUT, '1_EN/train.txt'), 'w') as fd:
        fd.write(h5_path+'train/1_EN/train.h5')
    with open(join(OUTPUT, '1_NM/train.txt'), 'w') as fd:
        fd.write(h5_path+'train/1_NM/train.h5')
    with open(join(OUTPUT, '1_F/test.txt'), 'w') as fd:
        fd.write(h5_path+'train/1_F/test.h5')
    with open(join(OUTPUT, '1_EN/test.txt'), 'w') as fd:
        fd.write(h5_path+'train/1_EN/test.h5')
    with open(join(OUTPUT, '1_NM/test.txt'), 'w') as fd:
        fd.write(h5_path+'train/1_NM/test.h5')
    # Done
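Before the arrays are written to HDF5, the script standardizes every image and shuffles images and labels together. The sketch below shows one way to do both steps in NumPy. The names `standardize` and `shuffle_together` are hypothetical. Unlike `shuffle_in_unison_scary`, which shuffles in place by replaying the RNG state, this version indexes both arrays with one shared permutation and returns copies. The zero-variance guard is also an addition that the original `processImage` does not have.

```python
import numpy as np

def standardize(imgs):
    """Give each image zero mean and unit standard deviation."""
    imgs = imgs.astype(np.float32)
    for i, img in enumerate(imgs):
        std = img.std()
        # Guard against constant images (an addition, not in the original).
        imgs[i] = (img - img.mean()) / (std if std > 0 else 1.0)
    return imgs

def shuffle_together(a, b, seed=0):
    """Apply the same random permutation to both arrays and return copies."""
    perm = np.random.RandomState(seed).permutation(len(a))
    return a[perm], b[perm]

# Toy data (hypothetical): three 1x4x4 "images" with one label each.
imgs = np.arange(3 * 1 * 4 * 4, dtype=np.float32).reshape(3, 1, 4, 4)
labels = np.array([[0.0], [1.0], [2.0]])
imgs_s, labels_s = shuffle_together(standardize(imgs), labels)
```

Using an explicit permutation makes the image/label pairing easy to reason about, at the cost of an extra copy of each array.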

utils.py

# coding: utf-8
"""
    functions
"""

import os
import cv2
import numpy as np


def show_landmark(face, landmark):
    """
        view face with landmark for visualization
    """
    face_copied = face.copy().astype(np.uint8)
    for (x, y) in landmark:
        # x scales with the image width (shape[1]), y with the height (shape[0])
        xx = int(face.shape[1]*x)
        yy = int(face.shape[0]*y)
        cv2.circle(face_copied, (xx, yy), 2, (0,0,0), -1)
    cv2.imshow("face_rot", face_copied)
    cv2.waitKey(0)


def rotate(img, bbox, landmark, alpha):
    """
        given a face with bbox and landmark, rotate with alpha
        and return rotated face with bbox, landmark (absolute position)
    """
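    center = ((bbox.left+bbox.right)/2.0, (bbox.top+bbox.bottom)/2.0)
    rot_mat = cv2.getRotationMatrix2D(center, alpha, 1)
    # warpAffine expects dsize as (width, height), i.e. (shape[1], shape[0])
    img_rotated_by_alpha = cv2.warpAffine(img, rot_mat, (img.shape[1], img.shape[0]))
    # apply the same 2x3 affine matrix to every landmark
    landmark_ = np.asarray([(rot_mat[0][0]*x+rot_mat[0][1]*y+rot_mat[0][2],
                 rot_mat[1][0]*x+rot_mat[1][1]*y+rot_mat[1][2]) for (x, y) in landmark])
    # bbox coordinates may be floats (e.g. from subBBox), so round before slicing
    face = img_rotated_by_alpha[int(round(bbox.top)):int(round(bbox.bottom+1)),
                                int(round(bbox.left)):int(round(bbox.right+1))]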
    return (face, landmark_)


def flip(face, landmark):
    """
        flip the face and 
        exchange the eyes and mouths point
        face_flipped_by_x = output array of the same size and type as src
        landmark_ = changed landmark point
    """
    face_flipped_by_x = cv2.flip(face, 1)
    landmark_ = np.asarray([(1-x, y) for (x, y) in landmark])
    landmark_[[0, 1]] = landmark_[[1, 0]]
    landmark_[[3, 4]] = landmark_[[4, 3]]
    return (face_flipped_by_x, landmark_)

def randomShift(landmarkGt, shift):
    """
        Random Shift one time
    """
    diff = np.random.rand(5, 2)
    diff = (2*diff - 1) * shift
    landmarkP = landmarkGt + diff
    return landmarkP

def randomShiftWithArgument(landmarkGt, shift):
    """
        Random Shift more
    """
    N = 2
    landmarkPs = np.zeros((N, 5, 2))
    for i in range(N):
        landmarkPs[i] = randomShift(landmarkGt, shift)
    return landmarkPs
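
The relabeling inside `flip` is easy to get wrong, so here is a small NumPy-only sketch of the same idea. It uses `np.fliplr` as a stand-in for `cv2.flip(face, 1)` so that it runs without OpenCV. The function name `flip_example` and the sample coordinates are assumptions, as is the landmark order (left eye, right eye, nose, left mouth corner, right mouth corner).

```python
import numpy as np

def flip_example(face, landmark):
    """Mirror a face left-right and relabel the left/right landmarks."""
    flipped = np.fliplr(face)                 # stand-in for cv2.flip(face, 1)
    out = np.asarray([(1 - x, y) for (x, y) in landmark])
    out[[0, 1]] = out[[1, 0]]                 # swap the two eyes
    out[[3, 4]] = out[[4, 3]]                 # swap the two mouth corners
    return flipped, out

face = np.arange(9).reshape(3, 3)             # tiny placeholder image
landmark = np.array([[0.3, 0.3], [0.7, 0.3], [0.5, 0.55],
                     [0.35, 0.8], [0.65, 0.8]])   # hypothetical relative coords
face_f, landmark_f = flip_example(face, landmark)
face_ff, landmark_ff = flip_example(face_f, landmark_f)
```

Mirroring alone would leave the point labeled "left eye" on the right side of the image, which is why the index swap is needed. Flipping twice should give back the original image and landmarks, which makes a convenient sanity check.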

utils_common.py

# coding: utf-8

import os
from os.path import join, exists
import time
import cv2
import numpy as np
from cnns import getCNNs


def logger(msg):
    """
        log message
    """
    now = time.ctime()
    print("[%s] %s" % (now, msg))

def createDir(p):
    if not os.path.exists(p):
        os.mkdir(p)

def shuffle_in_unison_scary(a, b):
    rng_state = np.random.get_state()
    np.random.shuffle(a)
    np.random.set_state(rng_state)
    np.random.shuffle(b)

def drawLandmark(img, bbox, landmark):
    cv2.rectangle(img, (bbox.left, bbox.top), (bbox.right, bbox.bottom), (0,0,255), 2)
    for x, y in landmark:
        cv2.circle(img, (int(x), int(y)), 2, (0,255,0), -1)
    return img

def getDataFromTxt(txt, with_landmark=True):
    """
        Generate data from txt file
        return [(img_path, bbox, landmark)]
            bbox: [left, right, top, bottom]
            landmark: [(x1, y1), (x2, y2), ...]
    """
    dirname = os.path.dirname(txt)
    with open(txt, 'r') as fd:
        lines = fd.readlines()

    result = []
    for line in lines:
        line = line.strip()
        components = line.split(' ')
        img_path = os.path.join(dirname, components[0].replace('\\', '/')) # file path
        # bounding box, (left, right, top, bottom)
        bbox = (components[1], components[2], components[3], components[4])
        bbox = [int(_) for _ in bbox]
        # landmark
        if not with_landmark:
            result.append((img_path, BBox(bbox)))
            continue
        landmark = np.zeros((5, 2))
        for index in range(0, 5):
            rv = (float(components[5+2*index]), float(components[5+2*index+1]))
            landmark[index] = rv
        for index, one in enumerate(landmark):
            rv = ((one[0]-bbox[0])/(bbox[1]-bbox[0]), (one[1]-bbox[2])/(bbox[3]-bbox[2]))
            landmark[index] = rv
        result.append((img_path, BBox(bbox), landmark))
    return result

def getPatch(img, bbox, point, padding):
    """
        Get a patch image around the given point in bbox with padding
        point: relative_point in [0, 1] in bbox
    """
    point_x = bbox.x + point[0] * bbox.w
    point_y = bbox.y + point[1] * bbox.h
    patch_left = point_x - bbox.w * padding
    patch_right = point_x + bbox.w * padding
    patch_top = point_y - bbox.h * padding
    patch_bottom = point_y + bbox.h * padding
    patch = img[int(round(patch_top)): int(round(patch_bottom+1)), int(round(patch_left)): int(round(patch_right+1))]
    patch_bbox = BBox([patch_left, patch_right, patch_top, patch_bottom])
    return patch, patch_bbox


def processImage(imgs):
    """
        process images before feeding to CNNs
        imgs: N x 1 x W x H
    """
    imgs = imgs.astype(np.float32)
    for i, img in enumerate(imgs):
        m = img.mean()
        s = img.std()
        imgs[i] = (img - m) / s
    return imgs

def dataArgument(data):
    """
        dataArguments
        data:
            imgs: N x 1 x W x H
            bbox: N x BBox
            landmarks: N x 10
    """
    pass

class BBox(object):
    """
        Bounding Box of face
    """
    def __init__(self, bbox):
        self.left = bbox[0]
        self.right = bbox[1]
        self.top = bbox[2]
        self.bottom = bbox[3]
        self.x = bbox[0]
        self.y = bbox[2]
        self.w = bbox[1] - bbox[0]
        self.h = bbox[3] - bbox[2]

    def expand(self, scale=0.05):
        bbox = [self.left, self.right, self.top, self.bottom]
        bbox[0] -= int(self.w * scale)
        bbox[1] += int(self.w * scale)
        bbox[2] -= int(self.h * scale)
        bbox[3] += int(self.h * scale)
        return BBox(bbox)

    def project(self, point):
        x = (point[0]-self.x) / self.w
        y = (point[1]-self.y) / self.h
        return np.asarray([x, y])

    def reproject(self, point):
        x = self.x + self.w*point[0]
        y = self.y + self.h*point[1]
        return np.asarray([x, y])

    def reprojectLandmark(self, landmark):
        p = np.zeros((len(landmark), 2))
        for i in range(len(landmark)):
            p[i] = self.reproject(landmark[i])
        return p

    def projectLandmark(self, landmark):
        p = np.zeros((len(landmark), 2))
        for i in range(len(landmark)):
            p[i] = self.project(landmark[i])
        return p

    def subBBox(self, leftR, rightR, topR, bottomR):
        leftDelta = self.w * leftR
        rightDelta = self.w * rightR
        topDelta = self.h * topR
        bottomDelta = self.h * bottomR
        left = self.left + leftDelta
        right = self.left + rightDelta
        top = self.top + topDelta
        bottom = self.top + bottomDelta
        return BBox([left, right, top, bottom])
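
The networks work with coordinates relative to the face box, so converting between pixel positions and relative positions is central to the pipeline. The following standalone sketch mirrors what `BBox.project` and `BBox.reproject` compute, using plain tuples in the same (left, right, top, bottom) order. The free functions and the sample box are hypothetical and only meant to show the arithmetic.

```python
def project(box, point):
    """Absolute pixel (x, y) -> coordinates relative to the box."""
    left, right, top, bottom = box
    return ((point[0] - left) / float(right - left),
            (point[1] - top) / float(bottom - top))

def reproject(box, point):
    """Relative (x, y) -> absolute pixel coordinates."""
    left, right, top, bottom = box
    return (left + (right - left) * point[0],
            top + (bottom - top) * point[1])

box = (100, 200, 50, 250)           # hypothetical box: left, right, top, bottom
rel = project(box, (150, 100))      # -> (0.5, 0.25)
back = reproject(box, rel)          # -> (150.0, 100.0)
```

The explicit `float(...)` avoids integer division under Python 2 when both the point and the box are integers, a case the class above does not guard against.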