Trunk-Branch Ensemble Convolutional Neural Networks for Video-Based Face Recognition

Ding, C; Tao, D

Publication Type: Journal Article
Citation: IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, 40 (4), pp. 1002 - 1014
Issue Date: 2018-04-01

Closed Access

	Filename	Description	Size	Format
	07917252.pdf	Published Version	1.28 MB	Adobe PDF

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Ding, C	en_US
dc.contributor.author	Tao, D https://orcid.org/0000-0001-7225-5449	en_US
dc.date.issued	2018-04-01	en_US
dc.identifier.citation	IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, 40 (4), pp. 1002 - 1014	en_US
dc.identifier.issn	0162-8828	en_US
dc.identifier.uri	http://hdl.handle.net/10453/131654
dc.description.abstract	Human faces in surveillance videos often suffer from severe image blur, dramatic pose variations, and occlusion. In this paper, we propose a comprehensive framework based on Convolutional Neural Networks (CNN) to overcome these challenges in video-based face recognition (VFR). First, to learn blur-robust face representations, we artificially blur training data composed of clear still images to compensate for the shortage of real-world video training data. Trained on both the still images and their artificially blurred versions, the CNN is encouraged to learn blur-insensitive features automatically. Second, to make CNN features more robust to pose variations and occlusion, we propose a Trunk-Branch Ensemble CNN model (TBE-CNN), which extracts complementary information from holistic face images and from patches cropped around facial components. TBE-CNN is an end-to-end model that extracts features efficiently by sharing the low- and middle-level convolutional layers between the trunk and branch networks. Third, to further improve the discriminative power of the representations learnt by TBE-CNN, we propose an improved triplet loss function. Systematic experiments validate the effectiveness of the proposed techniques. In particular, TBE-CNN achieves state-of-the-art performance on three popular video face databases: PaSC, COX Face, and YouTube Faces. With the proposed techniques, we also obtained first place in the BTAS 2016 Video Person Recognition Evaluation.	en_US
dc.relation.ispartof	IEEE Transactions on Pattern Analysis and Machine Intelligence	en_US
dc.relation.isbasedon	10.1109/TPAMI.2017.2700390	en_US
dc.subject.classification	Artificial Intelligence & Image Processing	en_US
dc.subject.mesh	Face	en_US
dc.subject.mesh	Humans	en_US
dc.subject.mesh	Algorithms	en_US
dc.subject.mesh	Neural Networks (Computer)	en_US
dc.subject.mesh	Image Processing, Computer-Assisted	en_US
dc.subject.mesh	Video Recording	en_US
dc.subject.mesh	Databases, Factual	en_US
dc.subject.mesh	Pattern Recognition, Automated	en_US
dc.subject.mesh	Biometric Identification	en_US
dc.subject.mesh	Neural Networks, Computer	en_US
dc.title	Trunk-Branch Ensemble Convolutional Neural Networks for Video-Based Face Recognition	en_US
dc.type	Journal Article
utslib.citation.issue	4	en_US
utslib.citation.volume	40	en_US
utslib.for	0801 Artificial Intelligence and Image Processing	en_US
utslib.for	0806 Information Systems	en_US
utslib.for	0906 Electrical and Electronic Engineering	en_US
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
utslib.copyright.status	closed_access
pubs.issue	4	en_US
pubs.publication-status	Published	en_US
pubs.volume	40	en_US
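
The abstract describes augmenting clear still images with artificially blurred copies so the network sees video-like blur during training. The sketch below illustrates that idea with two common blur models, a Gaussian kernel (out-of-focus blur) and a horizontal line kernel (linear motion blur). The choice of these two models, the function names, and the parameter ranges (sigma in 1 to 3, line lengths 5, 7, 9) are assumptions for illustration only, not the settings used in the paper.

```python
import math
import random


def gaussian_kernel(sigma, radius=None):
    """Normalized 2-D Gaussian kernel, a common model of out-of-focus blur."""
    if radius is None:
        radius = max(1, math.ceil(3 * sigma))
    size = 2 * radius + 1
    k = [[math.exp(-((x - radius) ** 2 + (y - radius) ** 2) / (2 * sigma ** 2))
          for x in range(size)] for y in range(size)]
    total = sum(sum(row) for row in k)
    return [[v / total for v in row] for row in k]


def motion_kernel(length):
    """Normalized horizontal line kernel, a simple model of linear motion blur."""
    return [[1.0 / length] * length]


def apply_kernel(img, kernel):
    """Filter a grayscale image (list of rows) with edge-replicate padding.

    This is correlation; it equals convolution for symmetric kernels such as
    the Gaussian and the odd-length line kernels used here."""
    h, w = len(img), len(img[0])
    kh, kw = len(kernel), len(kernel[0])
    cy, cx = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for j in range(kh):
                for i in range(kw):
                    yy = min(max(y + j - cy, 0), h - 1)  # replicate border rows
                    xx = min(max(x + i - cx, 0), w - 1)  # replicate border columns
                    acc += kernel[j][i] * img[yy][xx]
            out[y][x] = acc
    return out


def augment_with_blur(stills, rng):
    """Keep each clear still and add one blurred copy (hypothetical ranges)."""
    out = []
    for img in stills:
        out.append(img)
        if rng.random() < 0.5:
            kernel = gaussian_kernel(sigma=rng.uniform(1.0, 3.0))
        else:
            kernel = motion_kernel(length=rng.choice([5, 7, 9]))
        out.append(apply_kernel(img, kernel))
    return out
```

For example, `augment_with_blur(stills, random.Random(0))` returns twice as many images as it receives, mixing clear and blurred versions of each identity so the training set covers both conditions. Because both kernels are normalized, blurring changes image sharpness without changing overall brightness.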

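According to the abstract, TBE-CNN saves computation by sharing the low- and middle-level convolutional layers between the trunk (holistic face) and the branches (facial-component patches). One way to realize such sharing, assumed here, is to run the shared layers once and then read each component region from the shared feature map instead of re-processing image patches. The toy below replaces the real convolutional layers with two gradient channels plus 2x2 average pooling, and replaces the trunk and branch networks with region averages; `STRIDE`, `shared_layers`, `trunk_branch_features`, and the component boxes are all hypothetical names and values.

```python
STRIDE = 2  # spatial downsampling of the toy shared stage (assumption)


def avg_pool2(channel):
    """2x2 average pooling with stride 2 (assumes even height and width)."""
    return [[(channel[y][x] + channel[y][x + 1]
              + channel[y + 1][x] + channel[y + 1][x + 1]) / 4.0
             for x in range(0, len(channel[0]) - 1, 2)]
            for y in range(0, len(channel) - 1, 2)]


def shared_layers(img):
    """Toy stand-in for the shared low/mid-level layers: two gradient
    channels followed by 2x2 average pooling."""
    h, w = len(img), len(img[0])
    gx = [[abs(img[y][min(x + 1, w - 1)] - img[y][x]) for x in range(w)] for y in range(h)]
    gy = [[abs(img[min(y + 1, h - 1)][x] - img[y][x]) for x in range(w)] for y in range(h)]
    return [avg_pool2(gx), avg_pool2(gy)]


def region_mean(channel, box):
    """Mean activation inside box = (y0, x0, y1, x1), end-exclusive."""
    y0, x0, y1, x1 = box
    vals = [channel[y][x] for y in range(y0, y1) for x in range(x0, x1)]
    return sum(vals) / len(vals)


def trunk_branch_features(img, component_boxes):
    """Concatenate a holistic (trunk) descriptor with one descriptor per
    facial-component box, all read from a single shared feature map.
    Boxes are in image pixels and assumed to lie inside the image."""
    fmap = shared_layers(img)  # computed once, reused by trunk and every branch
    fh, fw = len(fmap[0]), len(fmap[0][0])
    trunk = [region_mean(c, (0, 0, fh, fw)) for c in fmap]
    branches = []
    for y0, x0, y1, x1 in component_boxes:
        # Map image coordinates to feature-map coordinates; keep boxes non-empty.
        fy0, fx0 = y0 // STRIDE, x0 // STRIDE
        fbox = (fy0, fx0, max(y1 // STRIDE, fy0 + 1), max(x1 // STRIDE, fx0 + 1))
        branches.extend(region_mean(c, fbox) for c in fmap)
    return trunk + branches


face = [[float((x + y) % 3) for x in range(8)] for y in range(8)]
boxes = [(0, 0, 4, 4), (0, 4, 4, 8), (4, 2, 8, 6)]  # hypothetical eye, eye, mouth regions
print(len(trunk_branch_features(face, boxes)))  # 2 trunk values + 2 per box = 8
```

The design point this mirrors is that adding a branch costs only a crop and a small head, not another pass over the image; in the actual model the trunk and branch heads are learned convolutional layers rather than averages.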

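The abstract mentions an improved triplet loss but does not give its form, so it is not reproduced here. For reference, the sketch below shows the standard triplet loss it builds on (the formulation popularized by FaceNet): on L2-normalized embeddings, each (anchor, positive, negative) triplet is penalized unless the anchor is closer to the positive than to the negative by at least a margin. The margin of 0.2 and all function names are illustrative assumptions.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(triplets, margin=0.2):
    """Mean hinge loss max(0, d(a, p) - d(a, n) + margin) over
    (anchor, positive, negative) embedding triplets."""
    total = 0.0
    for a, p, n in triplets:
        a, p, n = l2_normalize(a), l2_normalize(p), l2_normalize(n)
        total += max(0.0, sq_dist(a, p) - sq_dist(a, n) + margin)
    return total / len(triplets)


easy = ([1.0, 0.0], [0.9, 0.1], [0.0, 1.0])  # negative already far away
hard = ([1.0, 0.0], [0.0, 1.0], [1.0, 0.0])  # negative closer than positive
print(triplet_loss([easy]))        # 0.0
print(triplet_loss([easy, hard]))  # about 1.1: only the hard triplet contributes
```

Triplets that already satisfy the margin contribute zero loss, which is why triplet-based training usually depends on selecting hard triplets; the paper's improvement to the loss should be taken from the full text.
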
Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/131654