Real-Time Face Recognition with MTCNN and facenet

Overall Approach

MTCNN extracts face bounding boxes from each frame, the cropped faces are fed into facenet to produce embeddings, and an SVM classifies the embeddings. The whole pipeline takes frames from a video stream as input, so faces can be detected and recognized in real time from a camera.
The MTCNN implementation comes from https://github.com/AITTSMD and the facenet implementation from https://github.com/davidsandberg; the program is built on top of the facenet environment.
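To make the final stage concrete, here is a toy sketch of the classification step on random stand-in embeddings (the real embeddings come from facenet in Step 2; the shapes and class count here are only illustrative):

import numpy as np
from sklearn.svm import SVC

rng = np.random.RandomState(0)
emb = rng.randn(40, 512)            # stand-ins for 512-d facenet embeddings
labels = np.repeat(np.arange(5), 8) # 5 people, 8 photos each
clf = SVC(kernel='linear', probability=True).fit(emb, labels)
print(clf.predict_proba(emb[:1]))   # probabilities over the 5 identities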

Step 1 – Framework Setup

Download the MTCNN code and models (https://github.com/AITTSMD/MTCNN-Tensorflow).
Download the facenet code and models (https://github.com/davidsandberg/facenet).
Of the facenet code, only src/facenet.py is used, so you can download just that one Python file. As for the model, the download link the author gives on GitHub did not work for me; you can search online for an alternative download.

Copy facenet.py into MTCNN-Tensorflow-master/test (a folder inside the MTCNN repository).

Create a new file RealtimeIdentification.py in the MTCNN-Tensorflow-master/test folder; all the code in this article lives in this file. Import the required packages:

import sys
sys.path.append("../")
import os
import math
import pickle
import argparse
import numpy as np
import tensorflow as tf
import cv2
from scipy import misc
from sklearn.svm import SVC
import facenet
from Detection.MtcnnDetector import MtcnnDetector
from Detection.detector import Detector
from Detection.fcn_detector import FcnDetector
from train_models.mtcnn_model import P_Net, R_Net, O_Net

Step 2 – Building the Face Database

Photograph the faces you want to recognize, taking eight photos of each person from different angles (more photos are fine), and store each person's photos in a folder named after that person. I collected face data for five people, one folder per person.
Because facenet takes 160×160×3 images as input, each photo must be resized to 160×160×3 (a short resizing script is sketched below).
Once the face photos are collected, facenet can be used to build the face database, which stores the embedding of every face (each face is represented by a 1×512 feature vector).
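The collected photos can be resized in one pass with a short script like this sketch (the folder layout is assumed to match the face_database directory described above):

import os
import cv2

picture_path = "face_database"  # one sub-folder per person; adjust to your path
for person in os.listdir(picture_path):
    person_dir = os.path.join(picture_path, person)
    for name in os.listdir(person_dir):
        img_path = os.path.join(person_dir, name)
        img = cv2.imread(img_path)
        if img is None:
            continue  # skip anything that is not an image
        cv2.imwrite(img_path, cv2.resize(img, (160, 160)))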

Create the function that builds the database inside RealtimeIdentification.py, as shown below.

def face2database(picture_path,model_path,database_path,batch_size=90,image_size=160):
    #Extract features into the database
    #picture_path: path to the folders of face photos
    #model_path: path to the facenet model
    #database_path: where the face database will be saved
    with tf.Graph().as_default():
        with tf.Session() as sess:
            dataset = facenet.get_dataset(picture_path)
            paths, labels = facenet.get_image_paths_and_labels(dataset)
            print('Number of classes: %d' % len(dataset))
            print('Number of images: %d' % len(paths))
            # Load the model
            print('Loading feature extraction model')
            facenet.load_model(model_path)
            # Get input and output tensors
            images_placeholder = tf.get_default_graph().get_tensor_by_name("input:0")
            embeddings = tf.get_default_graph().get_tensor_by_name("embeddings:0")
            phase_train_placeholder = tf.get_default_graph().get_tensor_by_name("phase_train:0")
            embedding_size = embeddings.get_shape()[1]
            # Run forward pass to calculate embeddings
            print('Calculating features for images')
            nrof_images = len(paths)
            nrof_batches_per_epoch = int(math.ceil(1.0*nrof_images / batch_size))
            emb_array = np.zeros((nrof_images, embedding_size))
            for i in range(nrof_batches_per_epoch):
                start_index = i*batch_size
                end_index = min((i+1)*batch_size, nrof_images)
                paths_batch = paths[start_index:end_index]
                images = facenet.load_data(paths_batch, False, False,image_size)
                feed_dict = { images_placeholder:images, phase_train_placeholder:False }
                emb_array[start_index:end_index,:] = sess.run(embeddings, feed_dict=feed_dict)
            np.savez(database_path,emb=emb_array,lab=labels)
            print("資料庫特徵提取完畢!")
            #emb_array裡存放的是圖片特徵,labels為對應的標籤
if __name__ == "__main__":
    picture_path="/home/zhangx/CVstudy/RealTime/MTCNN-Tensorflow-master/test/face_database"
    model_path="/home/zhangx/CVstudy/RealTime/MTCNN-Tensorflow-master/test/face_models/facenet/20180408-102900"
    database_path="/home/zhangx/CVstudy/RealTime/MTCNN-Tensorflow-master/test/Database.npz"
    SVCpath="/home/zhangx/CVstudy/RealTime/MTCNN-Tensorflow-master/test/face_models/facenet/SVCmodel.pkl"
    face2database(picture_path,model_path,database_path)
    ClassifyTrainSVC(database_path,SVCpath)
    #RTrecognization(model_path,SVCpath,database_path)

After this runs, a Database.npz file is generated in MTCNN-Tensorflow-master/test/.
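As a quick sanity check, the generated database can be reloaded to verify its contents (the path is assumed to match the database_path used above):

import numpy as np

db = np.load("Database.npz")  # adjust to your database_path
print(db['emb'].shape)        # (number of photos, 512): one embedding per photo
print(db['lab'])              # the label of each photo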

Step 3 – Training the SVM Classifier

Once MTCNN has extracted the face boxes and facenet has extracted their features, the features must be classified to identify each face. An SVM classifier is used for this, as shown below.

def ClassifyTrainSVC(database_path,SVCpath):
    #database_path: the face database
    #SVCpath: where the trained classifier is saved
    Database=np.load(database_path)
    name_lables=Database['lab']
    embeddings=Database['emb']
    name_unique=np.unique(name_lables)
    labels=[]
    for i in range(len(name_lables)):
        for j in range(len(name_unique)):
            if name_lables[i]==name_unique[j]:
                labels.append(j)
    print('Training classifier')
    model = SVC(kernel='linear', probability=True)
    model.fit(embeddings, labels)
    with open(SVCpath, 'wb') as outfile:
        pickle.dump((model,name_unique), outfile)
        print('Saved classifier model to file "%s"' % SVCpath)
if __name__ == "__main__":
    picture_path="/home/zhangx/CVstudy/RealTime/MTCNN-Tensorflow-master/test/face_database"
    model_path="/home/zhangx/CVstudy/RealTime/MTCNN-Tensorflow-master/test/face_models/facenet/20180408-102900"
    database_path="/home/zhangx/CVstudy/RealTime/MTCNN-Tensorflow-master/test/Database.npz"
    SVCpath="/home/zhangx/CVstudy/RealTime/MTCNN-Tensorflow-master/test/face_models/facenet/SVCmodel.pkl"
    #face2database(picture_path,model_path,database_path)
    ClassifyTrainSVC(database_path,SVCpath)
    #RTrecognization(model_path,SVCpath,database_path)

After this runs, an SVCmodel.pkl model file is generated in test/face_models/facenet/.
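For reference, here is a minimal sketch of how the saved classifier is used later in Step 4 (the 0.3 threshold matches the recognition code; the zero vector stands in for a real facenet embedding):

import pickle
import numpy as np

SVCpath = "SVCmodel.pkl"  # adjust to your SVCpath
with open(SVCpath, 'rb') as infile:
    classifymodel, class_names = pickle.load(infile)
emb = np.zeros((1, 512))  # stand-in for a facenet embedding
predictions = classifymodel.predict_proba(emb)
best = np.argmax(predictions, axis=1)[0]
label = class_names[best] if predictions[0, best] >= 0.3 else "others"
print(label, predictions[0, best])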

Step 4 – Real-Time Face Detection and Recognition

def RTrecognization(facenet_model_path,SVCpath,database_path):
    #facenet_model_path: path to the facenet model
    #SVCpath: path to the saved SVM classification model
    #database_path: the face database
    with tf.Graph().as_default():
        with tf.Session() as sess:
            # Load the model
            print('Loading feature extraction model')
            facenet.load_model(facenet_model_path)
            with open(SVCpath, 'rb') as infile:
                    (classifymodel, class_names) = pickle.load(infile)
            print('Loaded classifier model from file "%s"' % SVCpath)

            # Get input and output tensors
            images_placeholder = tf.get_default_graph().get_tensor_by_name("input:0")
            embeddings = tf.get_default_graph().get_tensor_by_name("embeddings:0")
            phase_train_placeholder = tf.get_default_graph().get_tensor_by_name("phase_train:0")
            embedding_size = embeddings.get_shape()[1]
            Database=np.load(database_path)

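            # MTCNN cascade setup: per-stage score thresholds for P-Net/R-Net/O-Net,
            # the smallest face size (in pixels) to detect, and the checkpoint epoch
            # loaded for each stage.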
            test_mode = "onet"
            thresh = [0.9, 0.6, 0.7]
            min_face_size = 24
            stride = 2
            slide_window = False
            shuffle = False
            #vis = True
            detectors = [None, None, None]
            prefix = ['../data/MTCNN_model/PNet_landmark/PNet', '../data/MTCNN_model/RNet_landmark/RNet', '../data/MTCNN_model/ONet_landmark/ONet']
            epoch = [18, 14, 16]
            model_path = ['%s-%s' % (x, y) for x, y in zip(prefix, epoch)]
            PNet = FcnDetector(P_Net, model_path[0])
            detectors[0] = PNet
            RNet = Detector(R_Net, 24, 1, model_path[1])
            detectors[1] = RNet
            ONet = Detector(O_Net, 48, 1, model_path[2])
            detectors[2] = ONet
            mtcnn_detector = MtcnnDetector(detectors=detectors, min_face_size=min_face_size,
                               stride=stride, threshold=thresh, slide_window=slide_window)
            video_capture = cv2.VideoCapture(0)
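            # cv2 property ids 3 and 4 are CAP_PROP_FRAME_WIDTH and CAP_PROP_FRAME_HEIGHT.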
            # video_capture.set(3, 340)
            # video_capture.set(4, 480)
            video_capture.set(3, 800)
            video_capture.set(4, 800)
            corpbbox = None
            while True:
                 t1 = cv2.getTickCount()
                 ret, frame = video_capture.read()
                 if ret:
                    image = np.array(frame)
                    img_size=np.array(image.shape)[0:2]
                    boxes_c,landmarks = mtcnn_detector.detect(image)
                    # print(boxes_c.shape)
                    # print(boxes_c)
                    # print(img_size)
                    t2 = cv2.getTickCount()
                    t = (t2 - t1) / cv2.getTickFrequency()
                    fps = 1.0 / t
                    for i in range(boxes_c.shape[0]):
                        bbox = boxes_c[i, :4]#detected face region: top-left x, top-left y, bottom-right x, bottom-right y
                        score = boxes_c[i, 4]#confidence score of the detected face region
                        corpbbox = [int(bbox[0]), int(bbox[1]), int(bbox[2]), int(bbox[3])]
                        
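                        # Expand the detected box by a 16-pixel margin, clamped to the
                        # image bounds, before cropping and resizing to facenet's 160x160 input.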
                        x1=np.maximum(int(bbox[0])-16,0)
                        y1=np.maximum(int(bbox[1])-16,0)
                        x2=np.minimum( int(bbox[2])+16,img_size[1])
                        y2=np.minimum( int(bbox[3])+16,img_size[0])
                        crop_img=image[y1:y2,x1:x2]
                        scaled=misc.imresize(crop_img,(160,160),interp='bilinear')
                        img=facenet.prewhiten(scaled)#prewhiten the crop as facenet expects (the load_image helper called here originally is not defined in this file)
                        img=np.reshape(img,(-1,160,160,3))
                        feed_dict = { images_placeholder:img, phase_train_placeholder:False }
                        embvecor=sess.run(embeddings, feed_dict=feed_dict)
                        embvecor=np.array(embvecor)
                        #Alternative: compare the face feature against every face in the database one by one
                        # tmp=np.sqrt(np.sum(np.square(embvecor-Database['emb'][0])))
                        # tmp_lable=Database['lab'][0]
                        # for j in range(len(Database['emb'])):
                        #     t=np.sqrt(np.sum(np.square(embvecor-Database['emb'][j])))
                        #     if t<tmp:
                        #         tmp=t
                        #         tmp_lable=Database['lab'][j]
                        # print(tmp)

                        #Classify the face feature with the SVM
                        predictions = classifymodel.predict_proba(embvecor)
                        best_class_indices = np.argmax(predictions, axis=1)
                        tmp_lable=class_names[best_class_indices[0]]
                        best_class_probabilities = predictions[np.arange(len(best_class_indices)), best_class_indices]
                        print(best_class_probabilities)
                        if best_class_probabilities<0.3:
                            tmp_lable="others"
                        cv2.rectangle(frame, (corpbbox[0], corpbbox[1]),
                          (corpbbox[2], corpbbox[3]), (255, 0, 0), 1)
                        cv2.putText(frame, '{0}'.format(tmp_lable), (corpbbox[0], corpbbox[1] - 2), cv2.FONT_HERSHEY_SIMPLEX, 0.5,
                        (0, 0, 255), 2)
                    cv2.putText(frame, '{:.4f}'.format(t) + " " + '{:.3f}'.format(fps), (10, 20), cv2.FONT_HERSHEY_SIMPLEX, 0.5,
                    (255, 0, 255), 2)
                    for i in range(landmarks.shape[0]):
                        for j in range(len(landmarks[i])//2):
                            cv2.circle(frame, (int(landmarks[i][2*j]), int(landmarks[i][2*j+1])), 2, (0, 0, 255))
                    cv2.imshow("", frame)
                    if cv2.waitKey(1) & 0xFF == ord('q'):
                        break
                 else:
                    print('device not found')
                    break
            video_capture.release()
            cv2.destroyAllWindows()

if __name__ == "__main__":
    picture_path="/home/zhangx/CVstudy/RealTime/MTCNN-Tensorflow-master/test/face_database"
    model_path="/home/zhangx/CVstudy/RealTime/MTCNN-Tensorflow-master/test/face_models/facenet/20180408-102900"
    database_path="/home/zhangx/CVstudy/RealTime/MTCNN-Tensorflow-master/test/Database.npz"
    SVCpath="/home/zhangx/CVstudy/RealTime/MTCNN-Tensorflow-master/test/face_models/facenet/SVCmodel.pkl"
    #face2database(picture_path,model_path,database_path)
    #ClassifyTrainSVC(database_path,SVCpath)
    RTrecognization(model_path,SVCpath,database_path)

Initially, after MTCNN detected a face and facenet extracted its embedding, I compared the embedding against every feature in the face database and took the database label with the smallest Euclidean distance as the label for the face box. While debugging I found this approach slow, and it only gets slower as the face database grows, so it cannot run in real time; I therefore switched to a classification model at the end of the pipeline.
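For completeness, the brute-force lookup can at least be vectorized with numpy instead of a Python loop, as in the sketch below; it still scales linearly with the database size, which is why the classifier approach was kept:

import numpy as np

def nearest_face(embvecor, db_emb, db_lab):
    #Euclidean distance from the query embedding to every database embedding at once
    dists = np.linalg.norm(db_emb - embvecor, axis=1)
    j = np.argmin(dists)
    return db_lab[j], dists[j]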
When classifying with the SVM, the highest-scoring class is taken as the predicted class. On top of that, a threshold of 0.3 is applied: if the best class probability is below 0.3, the face is labeled "others", i.e. a person not in the database.
In the final result the system recognizes about 5~6 frames per second, which is essentially real time.