Learning Robust Features for Recognition of Emotions in Images and Videos

Publication Type: Thesis
Issue Date: 2019
Recognition of emotions in images and videos has attracted increasing research attention. For video emotion recognition, most existing approaches rely on spatial features extracted from video frames. Their performance is limited mainly by the broad affective gap between spatial image features and high-level emotions. To bridge this gap, in the first work we propose to recognize emotions with kernelized features. A polynomial kernel function is constructed by rewriting the equation of the discrete Fourier transform as a linear kernel. We also apply the sparse representation method to the kernelized features to reduce the impact of noise in video frames, which further improves performance.

In the second work, we develop a weighted sum pooling method for video emotion representation. We present an end-to-end deep network that simultaneously performs image emotion classification and emotion intensity map prediction. The network is built on the feature pyramid network. The class activation mapping technique is used to generate pseudo intensity maps for training the network. The network is first trained on a large-scale image emotion dataset and then used to extract features and intensity maps for video frames. We show empirically that this approach improves recognition performance.

Recent work has shown that local region information helps improve image emotion recognition performance. In the third work, we develop an end-to-end deep neural network for image emotion recognition that utilizes emotion intensity. The network is composed of an intensity prediction stream and a classification stream. The class activation mapping technique is used to generate pseudo intensity maps that guide the intensity prediction stream in learning emotion intensity. The predicted intensity maps are integrated into the classification stream for final recognition. The two streams are trained cooperatively to improve overall performance.

In the fourth work, we present a dual pattern learning network architecture with adversarial adaptation (DPLAANet). Unlike conventional networks, the proposed architecture has two input branches. The dual-input structure gives the network a considerably larger number of image pairs for training, which helps address overfitting caused by limited training data. Moreover, we use adversarial training to reduce the domain difference between training data and test data. Experimental results show that DPLAANets are effective on several benchmark datasets.
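
For readers unfamiliar with the class activation mapping and weighted sum pooling steps mentioned in the abstract, the following is a minimal sketch of the general idea, not the thesis implementation. The function names, array shapes, min-max normalization, and the choice of per-frame weights are all assumptions made for illustration; the abstract does not specify them.

```python
import numpy as np

def class_activation_map(features, fc_weights, class_idx):
    """Generic class activation map (illustrative, hypothetical helper).

    features:   (C, H, W) feature maps from the last convolutional layer.
    fc_weights: (num_classes, C) weights of a linear classifier applied
                after global average pooling.
    """
    # Weighted sum over channels using the chosen class's classifier weights.
    cam = np.tensordot(fc_weights[class_idx], features, axes=([0], [0]))  # (H, W)
    # Assumption: min-max normalize to [0, 1] so the map can act as a
    # pseudo intensity map.
    cam = cam - cam.min()
    peak = cam.max()
    return cam / peak if peak > 0 else cam

def weighted_sum_pool(frame_features, frame_weights):
    """Pool per-frame features (T, D) into one video vector (D,).

    Assumption: frame_weights are nonnegative scores, e.g. the mean of each
    frame's intensity map; they are normalized to sum to 1 before pooling.
    """
    w = np.asarray(frame_weights, dtype=float)
    w = w / w.sum()
    return w @ np.asarray(frame_features, dtype=float)
```

In this sketch, frames with larger weights contribute more to the video-level representation than they would under plain average pooling.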