Detecting Facial Action Units From Global-Local Fine-Grained Expressions

Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Publication Type: Journal Article
Citation: IEEE Transactions on Circuits and Systems for Video Technology, 2024, 34(2), pp. 983-994
Issue Date: 2024-02-01
Filename: 1675192.pdf (Published version, Adobe PDF, 2.54 MB)
Since Facial Action Unit (AU) annotations require domain expertise, common AU datasets only contain a limited number of subjects. As a result, a crucial challenge for AU detection is addressing identity overfitting. We find that AUs and facial expressions are highly associated, and existing facial expression datasets often contain a large number of identities. In this paper, we aim to utilize expression datasets without AU labels to facilitate AU detection. Specifically, we develop a novel AU detection framework aided by the Global-Local facial Expressions Embedding, dubbed GLEE-Net. Our GLEE-Net consists of three branches that extract identity-independent expression features for AU detection. We introduce a global branch that models the overall facial expression while eliminating the impact of identity. We also design a local branch focusing on specific local face regions. The combined output of the global and local branches is first pre-trained on an expression dataset as an identity-independent expression embedding, and then fine-tuned on AU datasets. Therefore, we significantly alleviate the issue of limited identities. Furthermore, we introduce a 3D global branch that extracts expression coefficients through 3D face reconstruction to consolidate 2D AU descriptions. Finally, a Transformer-based multi-label classifier is employed to fuse all the representations for AU detection. Extensive experiments demonstrate that our method significantly outperforms the state-of-the-art on the widely used DISFA, BP4D and BP4D+ datasets.
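The abstract describes fusing three branch outputs (global, local, and 3D expression coefficients) with a Transformer-based multi-label classifier. The sketch below illustrates one plausible form of that fusion step in PyTorch, assuming each branch yields a fixed-size embedding; the dimensions, branch tokens, pooling, and classifier head are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of a GLEE-Net-style fusion stage, based only on the abstract.
# All architectural details (embed_dim, num_aus, pooling) are assumptions.
import torch
import torch.nn as nn


class AUFusionClassifier(nn.Module):
    """Fuses global, local, and 3D expression embeddings with a Transformer
    encoder and predicts multi-label AU activations."""

    def __init__(self, embed_dim=256, num_aus=12, num_layers=2, num_heads=4):
        super().__init__()
        # Learnable token added to each branch embedding (global / local / 3D).
        self.branch_tokens = nn.Parameter(torch.zeros(3, embed_dim))
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=num_heads, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=num_layers)
        self.head = nn.Linear(embed_dim, num_aus)  # one logit per AU

    def forward(self, global_emb, local_emb, coeff_3d):
        # Each input: (batch, embed_dim) identity-independent expression embedding.
        tokens = torch.stack([global_emb, local_emb, coeff_3d], dim=1)
        tokens = tokens + self.branch_tokens   # broadcast over the batch dimension
        fused = self.encoder(tokens)           # (batch, 3, embed_dim)
        pooled = fused.mean(dim=1)             # average over the three branch tokens
        return self.head(pooled)               # raw logits; sigmoid gives AU probabilities


if __name__ == "__main__":
    model = AUFusionClassifier()
    b = 4
    logits = model(torch.randn(b, 256), torch.randn(b, 256), torch.randn(b, 256))
    probs = torch.sigmoid(logits)              # multi-label AU probabilities
    print(probs.shape)                         # torch.Size([4, 12])
```

Because AU detection is multi-label, such a head would typically be trained with a per-AU binary cross-entropy loss on the sigmoid outputs rather than a softmax over classes.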