Robust and reliable facial landmark detection under challenging conditions

Xia, Jiahao

Publication Type: Thesis
Issue Date: 2024

This item is open access.

Full metadata record

Field	Value	Language
dc.contributor.author	Xia, Jiahao
dc.date.accessioned	2025-05-22T02:49:44Z
dc.date.available	2025-05-22T02:49:44Z
dc.date.issued	2024
dc.identifier.uri	http://hdl.handle.net/10453/187456
dc.description	University of Technology Sydney. Faculty of Engineering and Information Technology.	en_US.UTF-8
dc.format	Thesis (PhD)
dc.language.iso	en_US	en_US.UTF-8
dc.relation	https://opus.lib.uts.edu.au/bitstream/10453/187456/1/thesis.pdf
dc.rights	info:eu-repo/semantics/openAccess
dc.rights	The author owns the copyright in this thesis including all reproduction and reuse rights for the work. The work may not be altered without the permission of the copyright owner. Attribution is essential when quoting or paraphrasing from this thesis.
dc.rights	© 2024 Jiahao Xia
dc.rights	au.edu.uts.lib/cph
dc.title	Robust and reliable facial landmark detection under challenging conditions	en_US.UTF-8
dc.type	Thesis
utslib.copyright.status	open_access	*

Abstract:

Facial landmark detection is a crucial technology for many downstream tasks, such as talking-head generation, face editing, facial emotion recognition and face recognition. Despite recent progress brought by deep learning, many challenges remain that make existing algorithms fragile in real-world scenarios. This research improves the robustness of facial landmark detection algorithms in three respects.

First, we improve the robustness and reliability of a lightweight facial landmark detection model by exploiting the facial boundaries contained in low-level features. This approach allows facial landmark detection to maintain competitive performance on platforms with limited computational capacity. Additionally, by sharing features and employing a unique training strategy, the proposed method achieves superior accuracy, even with few parameters, in other face-related tasks such as head pose estimation and face tracking.

Second, we improve robustness under heavy occlusion through inherent relation learning and uncertainty estimation. By learning a case-dependent inherent relation between landmarks, the proposed method can accurately localize occluded landmarks based on the regular face shape and the visible landmarks. We further extend the method into a coarse-to-fine framework that progresses from a statistical mean shape to the target shape over multiple stages. At each stage, it estimates the uncertainty of each landmark and adjusts the receptive field for the subsequent stage accordingly. Together, the coarse-to-fine framework, adaptive inherent relation and dynamic receptive field yield highly competitive performance under extreme occlusion.

Finally, we achieve zero-shot facial landmark detection for the first time through a novel paradigm, significantly improving robustness when locating landmarks that were unseen during training. Unlike previous works that treat each landmark as an independent regression target, our approach uses labeled landmarks as anchors to learn a mapping from a plane to human faces. With the learned mapping, our method can localize any landmark, even those absent from the training dataset. Additionally, because the paradigm unifies the learning targets of different facial landmark datasets, multiple datasets with varying annotation formats can be combined to train a unified large-scale model, which significantly enhances robustness under various challenging conditions. Extensive experiments show that the proposed methods significantly enhance the robustness and reliability of facial landmark detection under these challenging conditions.
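
The coarse-to-fine idea in the abstract (start from a mean shape, refine over several stages, and let each landmark's estimated uncertainty set the receptive field for the next stage) can be illustrated with a small sketch. This is not the thesis implementation: the function names, the `predict_stage` interface, the clamped linear rule that turns uncertainty into a field size, and every numeric default are assumptions made purely for illustration, and the toy predictor stands in for a trained network.

```python
from math import hypot
from typing import Callable, List, Tuple

Point = Tuple[float, float]
# Hypothetical stage interface: (current shape, per-landmark field sizes)
# -> (per-landmark offsets, per-landmark uncertainties).
StagePredictor = Callable[
    [List[Point], List[float]], Tuple[List[Point], List[float]]
]


def refine_coarse_to_fine(
    mean_shape: List[Point],
    predict_stage: StagePredictor,
    num_stages: int = 3,
    init_field: float = 0.5,   # assumed initial receptive field (normalized units)
    min_field: float = 0.05,   # assumed lower bound on the field size
    max_field: float = 0.5,    # assumed upper bound on the field size
    scale: float = 3.0,        # assumed factor mapping uncertainty to field size
) -> Tuple[List[Point], List[List[float]]]:
    """Refine landmarks from a mean shape over several stages."""
    shape = list(mean_shape)
    fields = [init_field] * len(shape)
    field_history = [list(fields)]
    for _ in range(num_stages):
        offsets, sigmas = predict_stage(shape, fields)
        # Move every landmark by its predicted offset.
        shape = [(x + dx, y + dy) for (x, y), (dx, dy) in zip(shape, offsets)]
        # Assumed rule: confident landmarks get a smaller field next stage.
        fields = [min(max_field, max(min_field, scale * s)) for s in sigmas]
        field_history.append(list(fields))
    return shape, field_history


def make_toy_predictor(target: List[Point]) -> StagePredictor:
    """Stand-in for a trained model: steps halfway toward a known target."""
    def predict_stage(shape, fields):
        offsets = [
            (0.5 * (tx - x), 0.5 * (ty - y))
            for (x, y), (tx, ty) in zip(shape, target)
        ]
        # After a half step, the remaining error equals the step length,
        # so the step length is reported as the uncertainty.
        sigmas = [hypot(dx, dy) for dx, dy in offsets]
        return offsets, sigmas
    return predict_stage


mean_shape = [(0.30, 0.40), (0.70, 0.40), (0.50, 0.70)]
target = [(0.35, 0.45), (0.65, 0.38), (0.50, 0.75)]
final_shape, field_history = refine_coarse_to_fine(
    mean_shape, make_toy_predictor(target)
)
```

In this toy run each landmark's field size shrinks from stage to stage until it reaches the lower bound, which mirrors the intent of narrowing attention as the estimate becomes more certain. A real system would replace `make_toy_predictor` with a learned model that has no access to the target.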

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/187456