faced: CPU Real Time face detection using Deep Learning

What is the problem?

There are many scenarios where single-class object detection is needed. This means that we want to detect the location of all objects that belong to a specific class in an image. For example, we could be detecting faces for a face identification system, or people for pedestrian tracking.

What is more, most of the time we would like to run these models in real time. In order to achieve this, we have a feed of images providing samples at a rate of x frames per second, and we need the model to process each sample in less than 1/x seconds. Then, we can process each image as soon as it is available.
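For instance, a 30 fps feed leaves a budget of 1/30 s (roughly 33 ms) per frame. Here is a minimal sketch of that check; `detect` and `frame` are hypothetical placeholders, not part of any particular library:

```python
import time

FPS = 30            # assumed feed rate
BUDGET = 1.0 / FPS  # per-frame time budget in seconds (~33 ms at 30 fps)

def runs_in_real_time(detect, frame, warmup=3, runs=20):
    """Average the latency of `detect` over several runs and compare
    it to the per-frame budget of the feed."""
    for _ in range(warmup):
        detect(frame)  # warm-up runs (lazy initialization, caches)
    start = time.perf_counter()
    for _ in range(runs):
        detect(frame)
    latency = (time.perf_counter() - start) / runs
    return latency < BUDGET
```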

The most accessible and most widely used solution nowadays for this task (and many others in computer vision) is to perform transfer learning on previously trained models (in general, standard models trained on huge datasets like those found in Tensorflow Hub or in the TF Object Detection API).

There are plenty of trained object detection architectures (e.g. FasterRCNN, SSD or YOLO) that achieve impressive accuracy with real-time performance running on GPUs.

[Figure extracted from the SSD paper]

GPUs are expensive but necessary in the training phase. In inference, however, having a dedicated GPU to achieve real-time performance is not viable. All of the general object detection models (such as those mentioned above) fail to run in real time without a GPU.

Then, how can we revisit the object detection problem for single-class objects to achieve real-time performance on a CPU?

Main idea: simpler tasks require fewer learnable features

All of the above-mentioned architectures were designed to detect multiple object classes (trained on the COCO or PASCAL VOC datasets). In order to classify each bounding box into its appropriate class, these architectures require a massive amount of feature extraction. This translates into a huge number of learnable parameters, filters, and layers. In other words, these networks are big.

If we define simpler tasks (rather than multiple-class bounding box classification), then we can expect the network to need fewer features to perform them. Detecting a face in an image is obviously simpler than detecting cars, people, traffic signs and dogs (all within the same model). The number of features a Deep Learning model requires in order to recognize faces (or any single-class object) is smaller than the number required to detect tens of classes at the same time. The first task requires less information than the latter.

Single-class object detection models need fewer learnable features. Fewer parameters mean a smaller network, and smaller networks run faster because they require fewer computations.
Then, the question is: how small can we go while achieving real-time performance on CPU and keeping accuracy?

This is faced's main concept: building the smallest possible network that (hopefully) runs in real time on a CPU while keeping accuracy.

The architecture

faced is an ensemble of two neural networks, both implemented using Tensorflow.

Main network

faced's main architecture is heavily based on YOLO's. Basically, it's a Fully Convolutional Network (FCN) that runs a 288x288 input image through a series of convolutional and pooling layers (no other layer types are involved).

Convolutional layers are in charge of extracting spatially-aware features. Pooling layers increase the receptive field of subsequent convolutional layers.
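To make this concrete, here is a minimal Keras sketch of such a conv/pool stack. The number of layers and the filter counts are illustrative assumptions, not faced's actual values; note how five 2x2 poolings shrink the 288x288 input down to 9x9:

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_backbone(input_size=288):
    """Convolutional/pooling stack: each 2x2 pooling halves the spatial
    resolution (288 -> 144 -> 72 -> 36 -> 18 -> 9), widening the receptive
    field of the convolutions that follow it."""
    inputs = layers.Input(shape=(input_size, input_size, 3))
    x = inputs
    for filters in (8, 16, 32, 64, 128):  # illustrative filter counts
        x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
        x = layers.MaxPooling2D(2)(x)
    return inputs, x  # x has shape (None, 9, 9, 128)
```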

The architecture's output is a 9x9 grid (versus the 13x13 grid in YOLO). Each grid cell is in charge of predicting whether a face is inside it (versus YOLO, where each cell can detect up to 5 different objects).

Each grid cell has 5 associated values. The first one is the probability p of that cell containing the center of a face. The other 4 values are the (x_center, y_center, width, height) of the detected face (relative to the cell).
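Continuing the sketch above, the prediction head can be a 1x1 convolution mapping each of the 9x9 cells to its 5 values, and decoding the grid into absolute pixel boxes just offsets the cell-relative center by the cell coordinates. The sigmoid activation and the assumption that width and height are normalized by the image size are mine, not necessarily faced's exact scheme:

```python
from tensorflow.keras import layers, models

def build_face_detector(input_size=288):
    inputs, features = build_backbone(input_size)
    # A 1x1 convolution maps each 9x9 cell to 5 predictions:
    # (p, x_center, y_center, width, height), each squashed into [0, 1]
    outputs = layers.Conv2D(5, 1, activation="sigmoid")(features)
    return models.Model(inputs, outputs)

def decode_grid(preds, img_size=288, grid=9, threshold=0.85):
    """Turn one (grid, grid, 5) prediction array into absolute pixel boxes.
    (x, y) are taken as relative to the cell; (w, h) as relative to the image."""
    cell = img_size / grid
    boxes = []
    for row in range(grid):
        for col in range(grid):
            p, x, y, w, h = preds[row, col]
            if p < threshold:
                continue
            cx, cy = (col + x) * cell, (row + y) * cell  # face center in pixels
            bw, bh = w * img_size, h * img_size
            boxes.append((cx - bw / 2, cy - bh / 2,
                          cx + bw / 2, cy + bh / 2, float(p)))
    return boxes
```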