Learning deep facial expression features from image and optical flow sequences using 3D CNN

Publication Type:
Journal Article
Citation:
Visual Computer, 2018, 34 (10), pp. 1461 - 1475
Issue Date:
2018-10-01
Metrics:
Full metadata record
Files in This Item:
Filename Description Size
Zhao2018_Article_LearningDeepFacialExpressionFe.pdfPublished Version1.92 MB
Adobe PDF
© 2018, Springer-Verlag GmbH Germany, part of Springer Nature. Facial expression is highly correlated with the facial motion. According to whether the temporal information of facial motion is used or not, the facial expression features can be classified as static and dynamic features. The former, which mainly includes the geometric features and appearance features, can be extracted by convolution or other learning filters; the latter, which are aimed to model the dynamic properties of facial motion, can be calculated through optical flow or other methods, respectively. When 3D convolutional neural networks (CNNs) are introduced, the extraction of two different types of features mentioned above becomes easy. In this paper, one 3D CNN architecture is presented to learn the static and dynamic features from facial image sequences and extract high-level dynamic features from optical flow sequences. Two types of dense optical flow, which contain the tracking information of facial muscle movement, are calculated according to different image pair construction methods. One is the common optical flow, and the other is an enhanced optical flow which is called accumulative optical flow. Four components of each type of optical flow are used in experiments. Three databases, two acted databases and one nearly realistic database, are selected to conduct the experiments. The experiments on the two acted databases achieve state-of-the-art accuracy, and indicate that the vertical component of optical flow has an advantage over other components in recognizing facial expression. The experimental results on the three selected databases show that more discriminative features can be learned from image sequences than from optical flow or accumulative optical flow sequences, and the accumulative optical flow contains more motion information than optical flow if the frame distance of the image pairs used to calculate them is not too large.
Please use this identifier to cite or link to this item: