SAANet: Siamese action-units attention network for improving dynamic facial expression recognition

Elsevier BV
Publication Type: Journal Article
Neurocomputing, 2020, 413, pp. 145-157
© 2020 Elsevier B.V. Facial expression recognition (FER) has a wide variety of applications, ranging from human–computer interaction and robotics to health care. Although FER has made significant progress with the success of convolutional neural networks (CNNs), it remains challenging, especially for video-based FER, because of the dynamic changes in facial actions. Because specific divergences exist among different expressions, we introduce a metric learning framework with a siamese cascaded structure that learns fine-grained distinctions between expressions in the video-based task. We also develop a pairwise sampling strategy for this metric learning framework. Furthermore, we propose a novel action-units attention mechanism tailored to the FER task to extract spatial contexts from the emotion regions. This mechanism works in a sparse self-attention fashion, enabling a single feature at any position to perceive features of the action-unit (AU) parts (eyebrows, eyes, nose, and mouth). In addition, an attentive pooling module is designed to select informative items from the video sequence by capturing their temporal importance. We conduct experiments on four widely used datasets (CK+, Oulu-CASIA, MMI, and AffectNet) and on the in-the-wild dataset AFEW to further investigate the robustness of the proposed method. Results demonstrate that our approach outperforms existing state-of-the-art methods. We also provide an ablation study of each component.
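To illustrate the attentive pooling idea described in the abstract, the following is a minimal sketch and not the paper's implementation. The abstract does not state how temporal importance is scored, so the sketch assumes each frame feature is scored by a dot product with a single vector and the scores are normalized with a softmax over time. The names `attentive_pool` and `w` are hypothetical, and in a real model `w` would be learned rather than set by hand.

```python
import math


def attentive_pool(frames, w):
    """Pool T per-frame feature vectors into one video-level vector.

    frames: list of T feature vectors, each a list of D floats.
    w: hypothetical scoring vector of length D (assumed, not from the paper).
    Returns (pooled_vector, attention_weights).
    """
    # One scalar relevance score per frame: dot product with w.
    scores = [sum(wi * xi for wi, xi in zip(w, f)) for f in frames]

    # Numerically stable softmax over the time axis.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Weighted sum of the frame features, one output value per dimension.
    dim = len(frames[0])
    pooled = [sum(a * f[d] for a, f in zip(weights, frames)) for d in range(dim)]
    return pooled, weights


# Toy sequence of three 2-dimensional frame features.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# With an all-zero scoring vector every frame gets the same weight,
# so the result reduces to plain average pooling.
uniform_pooled, uniform_weights = attentive_pool(frames, [0.0, 0.0])

# A scoring vector that favors the first dimension shifts weight
# toward the first and third frames.
focused_pooled, focused_weights = attentive_pool(frames, [10.0, 0.0])
```

With a zero scoring vector the module behaves like average pooling over frames; a non-trivial scoring vector lets some frames dominate the video-level feature, which is the sense in which such a module can select informative items from a sequence.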