A three-level framework for affective content analysis and its case studies

Publication Type:
Journal Article
Multimedia Tools and Applications, 2014, 70 (2), pp. 757 - 779
Issue Date:
Filename Description Size
Thumbnailart%3A10.1007%2Fs11042-012-1046-8.pdfPublished Version1.13 MB
Adobe PDF
Full metadata record
Emotional factors directly reflect audiences' attention, evaluation and memory. Recently, video affective content analysis attracts more and more research efforts. Most of the existing methods map low-level affective features directly to emotions by applying machine learning. Compared to human perception process, there is actually a gap between low-level features and high-level human perception of emotion. In order to bridge the gap, we propose a three-level affective content analysis framework by introducing mid-level representation to indicate dialog, audio emotional events (e.g., horror sounds and laughters) and textual concepts (e.g., informative keywords). Mid-level representation is obtained from machine learning on low-level features and used to infer high-level affective content. We further apply the proposed framework and focus on a number of case studies. Audio emotional event, dialog and subtitle are studied to assist affective content detection in different video domains/genres. Multiple modalities are considered for affective analysis, since different modality has its own merit to evoke emotions. Experimental results shows the proposed framework is effective and efficient for affective content analysis. Audio emotional event, dialog and subtitle are promising mid-level representations. © 2012 Springer Science+Business Media, LLC.
Please use this identifier to cite or link to this item: