Robust and reliable facial landmark detection under challenging conditions

Publication Type: Thesis
Issue Date: 2024
Facial landmark detection is a crucial technology for many downstream tasks, such as talking-head generation, face editing, facial emotion recognition and face recognition. Despite recent progress driven by deep learning, many challenges remain that make the corresponding algorithms fragile in real-world scenarios. In this research, we focus on improving the robustness of facial landmark detection algorithms from three different aspects. We first improve the robustness and reliability of a lightweight facial landmark detection model by exploiting the facial boundaries contained in low-level features. This approach ensures that facial landmark detection maintains competitive performance on platforms with limited computational resources. Additionally, by sharing features and employing a unique training strategy, the proposed method also demonstrates superior accuracy, even with limited parameters, in other face-related tasks, such as head pose estimation and face tracking. Then, we strengthen the robustness of facial landmark detection under heavy occlusion through inherent relation learning and uncertainty estimation. By learning a case-dependent inherent relation between landmarks, the proposed method can localize occluded landmarks accurately based on the regular face shape and the visible landmarks. Furthermore, we have evolved the method into a coarse-to-fine framework, which progresses from a statistical mean shape to the target shape over multiple stages. It also estimates the uncertainty of each landmark at each stage and adjusts the receptive field for the subsequent stage accordingly. Together, the coarse-to-fine framework, adaptive inherent relation and dynamic receptive field yield highly competitive performance under extreme occlusion. Finally, we achieve zero-shot facial landmark detection for the first time through a novel paradigm, significantly improving robustness when locating landmarks that were unseen during training.
Unlike previous works that treat each landmark as an independent regression target, our approach uses labeled landmarks as anchors to learn a mapping from a plane to human faces. With the learned mapping, our method can localize any landmark, even those absent from the training dataset. Additionally, because the paradigm unifies the learning targets of different facial landmark datasets, we can use multiple datasets with varying annotation formats to develop a unified large-scale model, which significantly enhances robustness under various challenging conditions. Extensive experiments show that the proposed methods significantly enhance the robustness and reliability of facial landmark detection under these challenging conditions.
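
As a rough illustration of the anchor-based idea, the sketch below fits a mapping from a canonical 2D plane to image coordinates using a few labeled anchors, then queries a plane position that carries no label. A least-squares affine fit is used purely as a stand-in: the abstract does not specify the learned mapping, and the function names (`fit_affine`, `locate`) and all coordinates are hypothetical.

```python
def fit_affine(plane_pts, image_pts):
    """Least-squares affine map from plane (u, v) to image (x, y).

    Stand-in for a learned mapping; assumes the anchors are not collinear.
    """
    # Accumulate the normal equations (A^T A) w = A^T b, rows of A = [u, v, 1].
    ata = [[0.0] * 3 for _ in range(3)]
    atb = [[0.0] * 2 for _ in range(3)]
    for (u, v), (x, y) in zip(plane_pts, image_pts):
        row = (u, v, 1.0)
        for i in range(3):
            for j in range(3):
                ata[i][j] += row[i] * row[j]
            atb[i][0] += row[i] * x
            atb[i][1] += row[i] * y

    # Solve by Gauss-Jordan elimination with partial pivoting.
    aug = [ata[i] + atb[i] for i in range(3)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [val / p for val in aug[col]]
        for r in range(3):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[3:] for row in aug]  # 3x2 weight matrix


def locate(weights, u, v):
    """Map any plane position, labeled or not, to image coordinates."""
    x = u * weights[0][0] + v * weights[1][0] + weights[2][0]
    y = u * weights[0][1] + v * weights[1][1] + weights[2][1]
    return x, y


# Hypothetical anchors: plane positions and their annotated image locations.
anchors_plane = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
anchors_image = [(100.0, 80.0), (150.0, 80.0), (100.0, 140.0), (150.0, 140.0)]

weights = fit_affine(anchors_plane, anchors_image)
# Query a plane position that was never annotated.
unseen_x, unseen_y = locate(weights, 0.5, 0.5)
```

Because every dataset's landmarks can be expressed as positions on the same plane, a mapping of this kind gives differently annotated datasets a shared learning target, which is the property the abstract relies on for multi-dataset training.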