Deciphering True Emotions: Micro-Expression Detection and Recognition using Deep Nets

Takalkar, Madhumita Abhijeet

Publication Type: Thesis
Issue Date: 2020

This item is open access.

Full metadata record

Field	Value	Language
dc.contributor.author	Takalkar, Madhumita Abhijeet
dc.date.accessioned	2020-11-10T23:20:58Z
dc.date.available	2020-11-10T23:20:58Z
dc.date.issued	2020
dc.identifier.uri	http://hdl.handle.net/10453/143877
dc.description	University of Technology Sydney. Faculty of Engineering and Information Technology.	en_AU
dc.format	Thesis (PhD)
dc.language.iso	en_US	en_US
dc.relation	https://opus.lib.uts.edu.au/bitstream/10453/143877/2/02whole.pdf
dc.rights	au.edu.uts.lib/ppc
dc.rights	The author owns the copyright in this thesis including all reproduction and reuse rights for the work. The work may not be altered without the permission of the copyright owner. Attribution is essential when quoting or paraphrasing from this thesis.
dc.rights	info:eu-repo/semantics/openAccess
dc.title	Deciphering True Emotions: Micro-Expression Detection and Recognition using Deep Nets	en_AU
dc.type	Thesis
utslib.copyright.status	open_access	*

Abstract:

Micro-expressions arise from the deliberate manipulation or involuntary repression of emotions, when an individual feels an emotion but tries to conceal the corresponding facial movements. Interpreting micro-expressions can help reveal a person’s deceit and actual mental state. Micro-expression detection and recognition therefore offer significant opportunities for emotion analysis in psychotherapy, forensics, border protection, and negotiations, among others. Because such expressions are brief and hard to spot with the naked eye, automated micro-expression recognition is an obvious step forward in the domain. Micro-expression research has drawn interest within the computer vision field, notably in localisation, magnification and recognition. Earlier studies primarily used single handcrafted descriptors and classifiers to recognise micro-expressions. Modern techniques emphasise Convolutional Neural Networks (CNNs) or hybrid strategies that integrate handcrafted descriptors and CNNs. Because only a few datasets exist, micro-expression recognition remains a challenge. Moreover, efficiency is often influenced by the feature selection and training approach. This thesis introduces several approaches that we have developed to detect and recognise facial micro-expressions using deep networks. In the initial stage of this work, we design a dual-stream model with attention networks for micro-expression detection from images. We implement Local- and Global-level Attention Networks (LGAttNet) that concentrate on local facial regions as well as the full face, to boost the chances of extracting relevant micro-expression features. Unlike previous detection methods, which calculate frame differences to detect micro-expressions, our framework uses attention networks to focus on various parts of a face and identify the presence of a micro-expression.
We developed LGAttNet as a supervised detection framework in which a traditional Artificial Neural Network (ANN) is trained as a binary classifier. LGAttNet is a novel approach that uses attention networks for micro-expression detection from images and video frame sequences. The next stage of this thesis focuses on recognising micro-expressions from an image using a CNN. We propose fine-tuning a pre-trained CNN: the last convolutional layer is retrained so that the network learns appropriate micro-expression features and predicts the micro-expression classes accurately. This fine-tuned CNN achieved acceptable accuracy in recognising micro-expressions from image frames. In the third stage, we extend this approach to video data by combining handcrafted descriptors with CNN-derived features. Local Binary Pattern-Three Orthogonal Planes (LBP-TOP) descriptors and the VGGFace CNN are combined using a late fusion technique to extract a comprehensive feature representation of the video. Softmax and SVM classifiers are trained for classification. This hybrid approach is one of the first attempts to combine handcrafted descriptors and deep features for micro-expression recognition. Finally, we consider how gender affects the tendency to express micro-expressions. We have built GEME, a multi-task learning architecture with two streams that extract different features for the same task of gender-based micro-expression recognition. We use the dynamic image concept to convert a video into a single frame; gender features and micro-expression features are added at each level and passed to the micro-expression stream.
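As a rough illustration of how a dynamic image can collapse a video into a single frame, the sketch below uses the simplified approximate rank pooling weights, alpha_t = 2t - T - 1, applied directly to raw pixel values. The abstract does not state which formulation the thesis uses, so this choice, the grayscale nested-list frame format, and the function names `rank_pooling_weights` and `dynamic_image` are all assumptions made for illustration.

```python
from typing import List

# Assumed frame format: one grayscale frame as rows of pixel values.
Frame = List[List[float]]


def rank_pooling_weights(num_frames: int) -> List[float]:
    """Simplified approximate rank pooling weights: alpha_t = 2t - T - 1, t = 1..T.

    Later frames get larger weights, so the result encodes temporal order.
    """
    return [2.0 * t - num_frames - 1.0 for t in range(1, num_frames + 1)]


def dynamic_image(frames: List[Frame]) -> Frame:
    """Collapse a frame sequence into one image as a weighted sum over time."""
    if not frames:
        raise ValueError("at least one frame is required")
    height, width = len(frames[0]), len(frames[0][0])
    weights = rank_pooling_weights(len(frames))
    image = [[0.0] * width for _ in range(height)]
    for weight, frame in zip(weights, frames):
        for r in range(height):
            for c in range(width):
                image[r][c] += weight * frame[r][c]
    return image


# Hypothetical 1x2 video: the left pixel is static, the right pixel brightens.
video = [[[1.0, 1.0]], [[1.0, 2.0]], [[1.0, 4.0]]]
summary = dynamic_image(video)
```

Because the weights sum to zero, static pixels cancel out and only pixels that change over time contribute, which is why such a summary frame can preserve subtle facial motion.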
Including the gender features with the micro-expression features enriches the feature details specific to the individual participant, and the network learns unique gender features while extracting micro-expression features. In summary, we have introduced four novel concepts for micro-expression detection and recognition. The work described in this thesis establishes a connection between computer vision and psychotherapy, and helps expedite micro-expression analysis for quick assessment wherever necessary.
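The late fusion of handcrafted and deep features mentioned in the abstract could be sketched as follows. This is only one possible reading: it assumes decision-level fusion, where a classifier on LBP-TOP features and a classifier on VGGFace features each produce per-class scores that are converted to probabilities and averaged. The abstract does not specify the fusion rule, so the weighted average, the `deep_weight` parameter, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Convert raw class scores into probabilities (numerically stable)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def late_fusion(
    handcrafted_scores: Sequence[float],
    deep_scores: Sequence[float],
    deep_weight: float = 0.5,  # hypothetical mixing weight, not from the thesis
) -> List[float]:
    """Fuse two classifiers' outputs by a weighted average of probabilities."""
    if len(handcrafted_scores) != len(deep_scores):
        raise ValueError("both classifiers must score the same classes")
    if not 0.0 <= deep_weight <= 1.0:
        raise ValueError("deep_weight must lie in [0, 1]")
    p_hand = softmax(handcrafted_scores)
    p_deep = softmax(deep_scores)
    return [
        (1.0 - deep_weight) * h + deep_weight * d
        for h, d in zip(p_hand, p_deep)
    ]


def predict(fused: Sequence[float]) -> int:
    """Return the index of the most probable class."""
    return max(range(len(fused)), key=lambda i: fused[i])


# Hypothetical scores for three classes from the two feature streams.
fused = late_fusion([2.0, 0.0, 0.0], [0.0, 3.0, 0.0])
label = predict(fused)
```

Fusing at the decision level keeps the two feature pipelines independent, so either stream can be retrained or replaced without touching the other. Feature-level concatenation before a single classifier would be an alternative design.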

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/143877