Face detection is a prerequisite for almost any deeper facial-processing task. Accurately localizing the face lets us discard background noise and non-face regions, which significantly increases the accuracy of downstream tasks such as analysis, classification, and prediction.
Related article: Face recognition in face attendance machines.
There are many methods for performing facial recognition, but most follow the same three steps:
Face detection: locate the faces that appear in the image.
Feature extraction: from each face image, extract the most distinctive facial features.
Matching: compare the extracted features against a database to identify the user.
Face detection is a sub-problem of object detection: the process of determining which regions of an image contain faces. The input of almost every face detection algorithm is an image; the output is a set of rectangular regions, each described by four coordinates (e.g. two corner points, or one corner point plus a width and height), together with the probability that the region contains a face.
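The output format described above can be sketched as a small data structure. This is an illustrative example, not the format of any specific library; the class and field names are assumptions.

```python
# A minimal sketch of a face detector's output: one rectangle
# (x, y, width, height) plus a confidence score per detected face.
from dataclasses import dataclass

@dataclass
class FaceDetection:
    x: int             # top-left corner, in pixels
    y: int
    width: int
    height: int
    confidence: float  # probability that the box contains a face

    def corners(self):
        """Return the equivalent 2-point form (top-left, bottom-right)."""
        return (self.x, self.y), (self.x + self.width, self.y + self.height)

det = FaceDetection(x=40, y=25, width=120, height=150, confidence=0.97)
print(det.corners())  # ((40, 25), (160, 175))
```

The two representations (corner points vs. corner plus size) carry the same information, which is why both conventions appear in practice.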
Output description of the face detection algorithm
2.2.1 Two Stage
Typical two-stage detection algorithms include R-CNN, Fast R-CNN, and Faster R-CNN. They are called two-stage because of how the model first extracts candidate regions that may contain objects before classifying them.
For example, in Faster R-CNN, stage one passes the image through a sub-network called the RPN (Region Proposal Network), which proposes regions that may contain objects based on anchors. In stage two, after obtaining these proposal regions, Faster R-CNN classifies the objects and refines their locations via two branches at the end of the model (object classification and bounding-box regression).
2.2.2 One Stage
Another approach is one-stage object detection, with typical models such as SSD, YOLO, and RetinaNet. These are called one-stage because the model has no separate region-proposal step (such as Faster R-CNN's RPN).
One-stage detectors treat object localization as a regression problem (predicting 4 offsets, e.g. x, y, w, h) and likewise rely on pre-defined boxes called anchors. These models are usually faster at inference, but their accuracy is often inferior to two-stage detectors. That said, some one-stage models close the gap, such as RetinaNet, thanks to its FPN (Feature Pyramid Network) design and Focal Loss.
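To make the anchor-based regression idea concrete, here is a minimal sketch of the standard box parameterization used by SSD and Faster R-CNN: the network predicts offsets (tx, ty, tw, th) that shift and scale a fixed anchor box. The function name is illustrative.

```python
import math

def decode_box(anchor, offsets):
    """Decode regressed offsets (tx, ty, tw, th) against an anchor
    (cx, cy, w, h), using the standard SSD / Faster R-CNN scheme."""
    cx_a, cy_a, w_a, h_a = anchor
    tx, ty, tw, th = offsets
    cx = cx_a + tx * w_a        # shift the anchor center
    cy = cy_a + ty * h_a
    w = w_a * math.exp(tw)      # scale the anchor width/height
    h = h_a * math.exp(th)
    return cx, cy, w, h

# Zero offsets leave the anchor unchanged:
print(decode_box((50, 50, 32, 32), (0, 0, 0, 0)))  # (50, 50, 32.0, 32.0)
```

Because the network only has to predict small corrections to nearby anchors, the regression targets stay well-scaled, which is part of why this parameterization is so widely used.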
You may be interested in: Face Mask Detection | Application in Covid | Face Masks Detection.
2.3.1 Intersection Over Union (IOU)
IoU is the ratio between the intersection and the union of the predicted region and the ground-truth object region: IoU = Area of Overlap / Area of Union,
where:
Area of Overlap is the intersection area of the predicted bounding box and the ground truth.
Area of Union is the combined area of the predicted bounding box and the ground truth.
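The definition above translates directly into a few lines of code. This sketch assumes boxes given as (x1, y1, x2, y2) corner coordinates:

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes in (x1, y1, x2, y2) form."""
    # Corners of the intersection rectangle (empty if boxes are disjoint)
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter  # don't double-count the overlap
    return inter / union if union > 0 else 0.0

# Two 10x10 boxes overlapping on half their width: 50 / 150 = 1/3
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))
```

Note that the union subtracts the overlap once, so a perfect match gives IoU = 1 and disjoint boxes give IoU = 0.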
2.3.2 Precision and Recall
The measured IoU takes a value in the range [0, 1], with each detection having its own score. To decide whether a prediction is correct, we rely on a chosen threshold: if the IoU is greater than or equal to the threshold, we count the predicted bounding box as containing the object to be found; otherwise, we count it as incorrect.
True/False Positive/Negative in Object Detection:
True Positive: the model predicts an object (positive), and the bounding box really does contain one (true).
False Positive: the model predicts an object (positive), but the bounding box does not actually contain any object to be identified.
False Negative: a ground-truth object exists in the image, but the model fails to detect it.
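Given an IoU threshold, counting TP, FP, and FN can be sketched as follows. This is a simplified illustration (the function name is mine); a full evaluator would also prevent two detections from matching the same ground-truth face.

```python
def count_tp_fp_fn(best_ious, n_ground_truth, threshold=0.5):
    """Count TP/FP/FN given, for each detection, its best IoU
    against any ground-truth face."""
    tp = sum(1 for i in best_ious if i >= threshold)
    fp = len(best_ious) - tp      # detections that matched nothing
    fn = n_ground_truth - tp      # real faces the detector missed
    return tp, fp, fn

# 3 detections, 3 real faces; one detection is off target (IoU 0.3):
print(count_tp_fp_fn([0.8, 0.3, 0.55], n_ground_truth=3))  # (2, 1, 1)
```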
Precision is the ratio of correct predictions to the total number of predictions made by the model, while recall is the ratio of correct predictions to the total number of ground-truth objects. In the face detection problem, precision therefore measures how many of the detected boxes are real faces, while recall measures how many of the real faces were detected.
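In terms of the TP/FP/FN counts above, the two metrics can be sketched as:

```python
def precision_recall(tp, fp, fn):
    """Precision = correct detections / all detections;
    recall = correct detections / all ground-truth faces."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# 8 correct detections, 2 false alarms, 2 missed faces:
print(precision_recall(tp=8, fp=2, fn=2))  # (0.8, 0.8)
```

The two metrics pull in opposite directions: lowering the detector's confidence threshold tends to raise recall (fewer missed faces) at the cost of precision (more false alarms).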
2.3.3 mAP (mean Average Precision)
mAP is simply the average AP score over the n classes: mAP = (AP_1 + AP_2 + ... + AP_n) / n, where AP_i is the Average Precision for class i and n is the number of classes.
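As a quick sketch of that averaging (function name mine):

```python
def mean_average_precision(ap_per_class):
    """mAP is the arithmetic mean of the per-class AP scores.
    Face detection has a single class, so there mAP == AP."""
    return sum(ap_per_class) / len(ap_per_class)

# Three hypothetical per-class AP scores:
print(mean_average_precision([0.5, 0.75, 1.0]))  # 0.75
```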
Currently, there are many large public datasets for training, as well as a variety of benchmark test sets. These datasets are widely used as standards for evaluating and comparing the effectiveness of face detection models:
* WIDER FACE dataset: split into easy, medium, and hard subsets
* Face Detection Data Set and Benchmark (FDDB)
Face detection is a basic but extremely important problem for face analysis tasks. Face detection algorithms have therefore been continuously developed and optimized, and several notable models have emerged:
+ Multi-task Cascaded Convolutional Network (MTCNN) - https://github.com/ipazc/mtcnn
+ RetinaFace - https://github.com/deepinsight/insightface
+ Dual Shot Face Detector - https://github.com/Tencent/FaceDetection-DSFD
+ FaceBoxes - https://github.com/sfzhang15/FaceBoxes
It can be said that most face detection models used in attendance (timekeeping) devices today achieve very good accuracy in most situations. In particular, the model Rabiloo chose performs well in terms of speed and can run efficiently on low-end hardware, especially very cheap embedded boards. This frees up resources for other tasks such as face recognition and gender, age, or emotion analysis. Furthermore, the team also handles faces at difficult angles or wearing face masks, and applies filtering techniques to remove noise, significantly increasing the model's accuracy.