Multiple features but few labels? A symbiotic solution exemplified for video analysis

Ma, Z; Sebe, N; Yang, Y; Hauptmann, AG

Publication Type: Conference Proceeding
Citation: MM 2014 - Proceedings of the 2014 ACM Conference on Multimedia, 2014, pp. 77-86
Issue Date: 2014-01-01

Closed Access

Full metadata record

Field	Value	Language
dc.contributor.author	Ma, Z	en_US
dc.contributor.author	Sebe, N	en_US
dc.contributor.author	Yang, Y https://orcid.org/0000-0001-5528-0546	en_US
dc.contributor.author	Hauptmann, AG	en_US
dc.date.issued	2014-01-01	en_US
dc.identifier.citation	MM 2014 - Proceedings of the 2014 ACM Conference on Multimedia, 2014, pp. 77 - 86	en_US
dc.identifier.isbn	9781450330633	en_US
dc.identifier.uri	http://hdl.handle.net/10453/120164
dc.relation.ispartof	MM 2014 - Proceedings of the 2014 ACM Conference on Multimedia	en_US
dc.relation.isbasedon	10.1145/2647868.2654907	en_US
dc.title	Multiple features but few labels? A symbiotic solution exemplified for video analysis	en_US
dc.type	Conference Proceeding
utslib.for	0801 Artificial Intelligence and Image Processing	en_US
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Strength - CAI - Centre for Artificial Intelligence
utslib.copyright.status	closed_access
pubs.publication-status	Published	en_US

Abstract:

Video analysis has been attracting increasing research attention due to the proliferation of internet videos. In this paper, we investigate how to improve the performance of internet-quality video analysis. In particular, we work on the scenario in which only a few labeled training videos are provided, which has received less attention in multimedia research. To begin with, we consider how to harness the evidence from low-level features more effectively. Researchers have developed several promising features that represent videos and capture their semantic information. However, as videos usually carry rich semantic content, the analysis performance obtained with a single feature is potentially limited. Simply combining multiple features through early fusion or late fusion to incorporate more informative cues is feasible but not optimal, owing to the heterogeneity and differing predictive capability of these features. To better exploit multiple features, we propose to mine the importance of different features and cast it into the learning of the classification model. Our method is based on multiple graphs built from different features and uses the Riemannian metric to evaluate feature importance. In addition, to achieve respectable accuracy with limited labeled training videos, we formulate our method in a semi-supervised way. The main contribution of this paper is a novel scheme for evaluating feature importance, which is further cast into a unified framework that harnesses multiple weighted features with limited labeled training videos. We perform extensive experiments on video action recognition and multimedia event recognition, and the comparison with other state-of-the-art multi-feature learning algorithms validates the efficacy of our framework.
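
The general idea of semi-supervised learning over several feature graphs can be illustrated with a small sketch. This is not the authors' algorithm: it uses a generic graph-regularised label propagation, and the per-feature weights are fixed by hand rather than derived from a Riemannian metric as the abstract describes. All function names, the toy data, and the parameters `gamma` and `mu` are assumptions made for illustration only.

```python
import numpy as np

def gaussian_affinity(X, gamma=1.0):
    """Dense Gaussian affinity matrix for one feature view (rows are samples)."""
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-gamma * sq_dists)
    np.fill_diagonal(W, 0.0)  # no self-loops
    return W

def normalized_laplacian(W):
    """Symmetric normalised graph Laplacian I - D^{-1/2} W D^{-1/2}."""
    inv_sqrt_deg = 1.0 / np.sqrt(np.maximum(W.sum(axis=1), 1e-12))
    return np.eye(len(W)) - inv_sqrt_deg[:, None] * W * inv_sqrt_deg[None, :]

def propagate_labels(feature_views, view_weights, Y, mu=1.0):
    """Soft labels F solving (I + mu * sum_v w_v L_v) F = Y.

    view_weights are hand-set here (an assumption); the paper instead
    learns feature importance, which this sketch does not reproduce.
    """
    n = Y.shape[0]
    L = np.zeros((n, n))
    for X, w in zip(feature_views, view_weights):
        L += w * normalized_laplacian(gaussian_affinity(X))
    return np.linalg.solve(np.eye(n) + mu * L, Y)

# Toy data: 10 samples, 2 classes, one labelled sample per class.
# View A separates the classes; view B is uninformative by construction.
view_a = np.concatenate([0.1 * np.arange(5), 5.0 + 0.1 * np.arange(5)])[:, None]
view_b = (np.arange(10) % 2).astype(float)[:, None]

Y = np.zeros((10, 2))
Y[0, 0] = 1.0  # sample 0 is labelled as class 0
Y[5, 1] = 1.0  # sample 5 is labelled as class 1

F = propagate_labels([view_a, view_b], [0.9, 0.1], Y)
pred = F.argmax(axis=1)
print(pred)
```

Because the informative view carries most of the weight, the two labelled samples should be enough to spread the correct class over each group in this toy setting. Swapping the weights to favour the uninformative view is a quick way to see why feature importance matters when labels are scarce.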

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/120164