A three-level framework for affective content analysis and its case studies

Xu, M; Wang, J; He, X; Jin, JS; Luo, S; Lu, H

Publication Type: Journal Article
Citation: Multimedia Tools and Applications, 2014, 70 (2), pp. 757 - 779
Issue Date: 2014-01-01

Closed Access

Filename	Description	Size	Format
2012004142OK.pdf		766.98 kB	Adobe PDF
art%3A10.1007%2Fs11042-012-1046-8.pdf	Published Version	1.13 MB	Adobe PDF

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Xu, M https://orcid.org/0000-0001-9581-8849	en_US
dc.contributor.author	Wang, J	en_US
dc.contributor.author	He, X https://orcid.org/0000-0001-8962-540X	en_US
dc.contributor.author	Jin, JS	en_US
dc.contributor.author	Luo, S	en_US
dc.contributor.author	Lu, H	en_US
dc.date.issued	2014-01-01	en_US
dc.identifier.citation	Multimedia Tools and Applications, 2014, 70 (2), pp. 757 - 779	en_US
dc.identifier.issn	1380-7501	en_US
dc.identifier.uri	http://hdl.handle.net/10453/121706
dc.relation.ispartof	Multimedia Tools and Applications	en_US
dc.relation.isbasedon	10.1007/s11042-012-1046-8	en_US
dc.subject.classification	Artificial Intelligence & Image Processing	en_US
dc.subject.classification	Software Engineering	en_US
dc.title	A three-level framework for affective content analysis and its case studies	en_US
dc.type	Journal Article
utslib.citation.volume	2	en_US
utslib.citation.volume	70	en_US
utslib.for	0806 Information Systems	en_US
utslib.for	0801 Artificial Intelligence and Image Processing	en_US
utslib.for	0803 Computer Software	en_US
utslib.for	0805 Distributed Computing	en_US
dc.location.activity	Wollongong, AUSTRALIA
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology/School of Electrical and Data Engineering
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology/School of Systems, Management and Leadership
pubs.organisational-group	/University of Technology Sydney/Strength - CRIN - Realtime Information Networks
pubs.organisational-group	/University of Technology Sydney/Strength - GBDTC - Global Big Data Technologies
pubs.organisational-group	/University of Technology Sydney/Strength - INEXT - Innovation in IT Services and Applications
utslib.copyright.status	closed_access
pubs.issue	2	en_US
pubs.publication-status	Published	en_US
pubs.volume	70	en_US

Abstract:

Emotional factors directly reflect audiences' attention, evaluation and memory. Video affective content analysis has recently attracted increasing research effort. Most existing methods map low-level affective features directly to emotions by applying machine learning. Compared with the human perception process, however, there is a gap between low-level features and high-level human perception of emotion. To bridge this gap, we propose a three-level affective content analysis framework that introduces a mid-level representation to indicate dialog, audio emotional events (e.g., horror sounds and laughter) and textual concepts (e.g., informative keywords). The mid-level representation is obtained by machine learning on low-level features and is used to infer high-level affective content. We further apply the proposed framework to a number of case studies. Audio emotional events, dialog and subtitles are studied to assist affective content detection in different video domains/genres. Multiple modalities are considered for affective analysis, since each modality has its own merit in evoking emotions. Experimental results show that the proposed framework is effective and efficient for affective content analysis, and that audio emotional events, dialog and subtitles are promising mid-level representations. © 2012 Springer Science+Business Media, LLC.

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/22898