Dependency exploitation: A unified CNN-RNN approach for visual emotion recognition

Zhu, X; Li, L; Zhang, W; Rao, T; Xu, M; Huang, Q; Xu, D

Publication Type: Conference Proceeding
Citation: IJCAI International Joint Conference on Artificial Intelligence, 2017, pp. 3595-3601
Issue Date: 2017-01-01

Closed Access

Filename	Description	Size	Format
0503.pdf	Published version	8.69 MB	Adobe PDF

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Zhu, X	en_US
dc.contributor.author	Li, L	en_US
dc.contributor.author	Zhang, W	en_US
dc.contributor.author	Rao, T	en_US
dc.contributor.author	Xu, M https://orcid.org/0000-0001-9581-8849	en_US
dc.contributor.author	Huang, Q	en_US
dc.contributor.author	Xu, D https://orcid.org/0000-0003-2775-9730	en_US
dc.date.issued	2017-01-01	en_US
dc.identifier.citation	IJCAI International Joint Conference on Artificial Intelligence, 2017, pp. 3595 - 3601	en_US
dc.identifier.isbn	9780999241103	en_US
dc.identifier.issn	1045-0823	en_US
dc.identifier.uri	http://hdl.handle.net/10453/126347
dc.description.abstract	Visual emotion recognition aims to associate images with appropriate emotions. Different visual stimuli can affect human emotion, ranging from low-level to high-level cues such as color, texture, parts, and objects. However, most existing methods treat different levels of features as independent entities and lack an effective method for feature fusion. In this paper, we propose a unified CNN-RNN model that predicts emotion from fused features of different levels by exploiting the dependencies among them. Our proposed architecture leverages a convolutional neural network (CNN) with multiple layers to extract different levels of features within a multi-task learning framework, in which two related loss functions are introduced to learn the feature representation. To capture the dependencies between low-level and high-level features, a bidirectional recurrent neural network (RNN) is proposed to integrate the features learned at different layers of the CNN. Extensive experiments on both Internet image and art photo datasets demonstrate that our method outperforms state-of-the-art methods with a performance improvement of at least 7%.	en_US
dc.relation.ispartof	IJCAI International Joint Conference on Artificial Intelligence	en_US
dc.rights	info:eu-repo/semantics/closedAccess
dc.title	Dependency exploitation: A unified CNN-RNN approach for visual emotion recognition	en_US
dc.type	Conference Proceeding
utslib.for	0801 Artificial Intelligence and Image Processing	en_US
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology/School of Electrical and Data Engineering
pubs.organisational-group	/University of Technology Sydney/Strength - GBDTC - Global Big Data Technologies
pubs.organisational-group	/University of Technology Sydney/Strength - INEXT - Innovation in IT Services and Applications
pubs.organisational-group	/University of Technology Sydney/Students
utslib.copyright.status	closed_access	*
pubs.publication-status	Published	en_US
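The abstract describes fusing features taken from several CNN layers with a bidirectional RNN. As a rough, dependency-free illustration only, the sketch below assumes one pooled feature vector per CNN layer, already projected to a common dimension, and treats these vectors as a sequence ordered from low level to high level. It runs a forward and a backward tanh RNN over that sequence, concatenates the two final hidden states, and applies a softmax classifier. The vanilla tanh cells, the random weights, the feature values, the class count, and the names `BiRNNFusion`, `rnn_pass`, and `layer_features` are hypothetical; the CNN backbone, the two multi-task losses, and training are omitted. This is not the authors' implementation.

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix W (a list of rows) by a vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def vec_add(a, b):
    """Element-wise sum of two equal-length vectors."""
    return [ai + bi for ai, bi in zip(a, b)]


def rand_matrix(rows, cols, rng, scale=0.1):
    """Small uniform random weights; a stand-in for trained parameters."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def rnn_pass(seq, W_x, W_h, b):
    """Run a vanilla tanh RNN over seq and return the final hidden state."""
    h = [0.0] * len(b)
    for x in seq:
        pre = vec_add(vec_add(matvec(W_x, x), matvec(W_h, h)), b)
        h = [math.tanh(v) for v in pre]
    return h


def softmax(z):
    """Numerically stable softmax."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


class BiRNNFusion:
    """Hypothetical fusion head: per-layer CNN features -> BiRNN -> class probabilities."""

    def __init__(self, feat_dim, hidden_dim, num_classes, seed=0):
        rng = random.Random(seed)
        # Separate weights for the low->high (forward) and high->low (backward) passes.
        self.fwd = (rand_matrix(hidden_dim, feat_dim, rng),
                    rand_matrix(hidden_dim, hidden_dim, rng),
                    [0.0] * hidden_dim)
        self.bwd = (rand_matrix(hidden_dim, feat_dim, rng),
                    rand_matrix(hidden_dim, hidden_dim, rng),
                    [0.0] * hidden_dim)
        # Linear classifier over the concatenated forward and backward states.
        self.W_out = rand_matrix(num_classes, 2 * hidden_dim, rng)
        self.b_out = [0.0] * num_classes

    def predict(self, layer_features):
        """layer_features: one vector per CNN layer, ordered low-level to high-level."""
        h_fwd = rnn_pass(layer_features, *self.fwd)
        h_bwd = rnn_pass(list(reversed(layer_features)), *self.bwd)
        fused = h_fwd + h_bwd  # list concatenation, length 2 * hidden_dim
        return softmax(vec_add(matvec(self.W_out, fused), self.b_out))


# Hypothetical pooled features from four CNN stages (low-level -> high-level),
# each already projected to a common 3-dim space for illustration.
layer_features = [
    [0.2, -0.1, 0.5],
    [0.0, 0.3, -0.2],
    [0.4, 0.1, 0.0],
    [-0.3, 0.2, 0.6],
]
model = BiRNNFusion(feat_dim=3, hidden_dim=4, num_classes=8)
probs = model.predict(layer_features)
print(len(probs), round(sum(probs), 6))
```

Reading the layer sequence in both directions lets the fused vector summarize low-level cues in the context of high-level ones and vice versa. Gated cells (for example GRU or LSTM) in a deep-learning framework would be the more usual choice in practice; plain tanh cells are used here only to keep the sketch self-contained.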
