Recognition of Emotions in User-generated Videos with Transferred Emotion Intensity Learning

Institute of Electrical and Electronics Engineers (IEEE)
Publication Type:
Journal Article
IEEE Transactions on Multimedia, 2021, PP, (99), pp. 1-1
Issue Date:
Filename: frm_intensity_v0.4_final.pdf
Description: Accepted version
Size: 3.78 MB (Adobe PDF)
Recognition of emotions in user-generated videos has attracted considerable research attention. Most existing approaches focus on learning frame-level features and fail to consider frame-level emotion intensities, which are critical for video representation. In this research, we aim to extract frame-level features and emotion intensities by transferring emotional information from an image emotion dataset. To achieve this goal, we propose an end-to-end network for joint emotion recognition and intensity learning with unsupervised adversarial adaptation. The proposed network consists of a classification stream, an intensity learning stream, and an adversarial adaptation module. The classification stream generates pseudo intensity maps with the class activation mapping method, and these maps are used to train the intensity learning subnetwork. The intensity learning stream is built upon an improved feature pyramid network in which features from different scales are cross-connected. The adversarial adaptation module reduces the domain difference between the source image dataset and the target video frames. By aligning cross-domain features, we enable our network to learn on the source data while generalizing to video frames. Finally, we apply a weighted sum pooling method to frame-level features and emotion intensities to generate video-level features. We evaluate the proposed method on two benchmark datasets, VideoEmotion-8 and Ekman-6. The experimental results show that the proposed method achieves improved performance over previous state-of-the-art methods.
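The final pooling step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `weighted_sum_pooling`, the use of one scalar intensity per frame, the normalization of the weights to sum to 1, and the fallback to plain averaging when all intensities are zero are all assumptions made here for illustration, since the abstract does not specify these details.

```python
from typing import List, Sequence


def weighted_sum_pooling(
    frame_features: Sequence[Sequence[float]],
    frame_intensities: Sequence[float],
) -> List[float]:
    """Pool per-frame feature vectors into one video-level vector.

    Each frame's feature vector is weighted by its non-negative emotion
    intensity. Normalizing the weights to sum to 1 is an assumption of
    this sketch, not a detail taken from the abstract.
    """
    if len(frame_features) != len(frame_intensities):
        raise ValueError("need exactly one intensity per frame")
    if not frame_features:
        raise ValueError("need at least one frame")
    if any(w < 0 for w in frame_intensities):
        raise ValueError("intensities must be non-negative")

    n = len(frame_features)
    total = sum(frame_intensities)
    if total > 0:
        weights = [w / total for w in frame_intensities]
    else:
        # All intensities are zero: fall back to plain average pooling
        # (an assumption of this sketch).
        weights = [1.0 / n] * n

    dim = len(frame_features[0])
    pooled = [0.0] * dim
    for feat, w in zip(frame_features, weights):
        if len(feat) != dim:
            raise ValueError("all frame features must have the same length")
        for i, x in enumerate(feat):
            pooled[i] += w * x
    return pooled


if __name__ == "__main__":
    # Three frames with 2-dimensional features; the third frame is
    # assigned twice the intensity of the other two.
    features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    intensities = [1.0, 1.0, 2.0]
    print(weighted_sum_pooling(features, intensities))  # [0.75, 0.75]
```

In this toy input the third frame contributes half of the video-level feature, which shows how frames with stronger emotional content dominate the pooled representation compared with uniform averaging.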