Effective transfer tagging from image to video

Publication Type:
Journal Article
Citation:
ACM Transactions on Multimedia Computing, Communications, and Applications, 9(2), 2013
Issue Date:
2013-05-01
Abstract:
Recent years have witnessed an explosion of user-generated videos on the Web. To enable effective and efficient video search, it is critical for modern video search engines to associate videos with semantic keywords automatically. Most existing video tagging methods can hardly achieve reliable performance due to a deficiency of training data. However, abundant well-tagged data are available in other relevant types of media (e.g., images). In this article, we propose a novel video tagging framework, termed Cross-Media Tag Transfer (CMTT), which exploits the abundance of well-tagged images to facilitate video tagging. Specifically, we build a cross-media tunnel to transfer knowledge from images to videos. To this end, an optimal kernel space, in which the distribution distance between images and videos is minimized, is found to tackle the domain-shift problem. A novel cross-media video tagging model is proposed that infers tags by exploring the intrinsic local structures of both labeled and unlabeled data and learns reliable video classifiers. An efficient algorithm is designed to optimize the proposed model in an iterative, alternating fashion. Extensive experiments demonstrate the superiority of our proposal over state-of-the-art algorithms. © 2013 ACM.
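
The "optimal kernel space" step is in the spirit of kernel-based domain adaptation, where an empirical distribution distance such as the Maximum Mean Discrepancy (MMD) between the two domains is minimized. As a hedged illustration only, the sketch below uses a Transfer Component Analysis-style projection (Pan et al.), a standard technique of this kind, to embed image and video features into a shared kernel subspace with reduced MMD; it is not CMTT's actual formulation, and the feature matrices, bandwidth `gamma`, regularizer `mu`, and dimensionality `dim` are hypothetical placeholders.

```python
import numpy as np

def tca_embed(Xs, Xt, gamma=0.05, mu=1.0, dim=8):
    """TCA-style embedding: find a kernel subspace that reduces the MMD
    between source (image) and target (video) feature distributions.
    Illustrative sketch only; CMTT's formulation may differ."""
    X = np.vstack([Xs, Xt])
    n_s, n_t = len(Xs), len(Xt)
    n = n_s + n_t

    # RBF kernel over all samples
    d2 = (np.sum(X**2, 1)[:, None] + np.sum(X**2, 1)[None, :]
          - 2.0 * X @ X.T)
    K = np.exp(-gamma * np.clip(d2, 0.0, None))

    # MMD coefficient matrix L: e^T K e is the squared empirical MMD
    e = np.concatenate([np.full(n_s, 1.0 / n_s), np.full(n_t, -1.0 / n_t)])
    L = np.outer(e, e)

    # centering matrix H preserves data variance in the subspace
    H = np.eye(n) - np.ones((n, n)) / n

    # leading eigenvectors of (K L K + mu I)^{-1} K H K trade off
    # small cross-domain MMD against retained variance
    A = np.linalg.solve(K @ L @ K + mu * np.eye(n), K @ H @ K)
    vals, vecs = np.linalg.eig(A)
    W = vecs[:, np.argsort(-vals.real)[:dim]].real
    return K @ W  # rows: embedded source samples, then target samples

# usage with placeholder features
rng = np.random.default_rng(0)
img = rng.normal(0.0, 1.0, (120, 32))   # stand-in image features
vid = rng.normal(0.5, 1.3, (80, 32))    # stand-in video features
Z = tca_embed(img, vid)
Z_img, Z_vid = Z[:120], Z[120:]
```

Images and videos are embedded jointly, so downstream classifiers trained on the image rows can be applied to the video rows in the shared space.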
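
The tag-inference component, which explores the intrinsic local structures of both labeled and unlabeled data, is characteristic of graph-based semi-supervised learning. The sketch below shows plain label propagation over an RBF affinity graph (Zhou et al.'s local-and-global-consistency scheme) as one common instance of this idea; it is an assumed stand-in for illustration, not the CMTT model, and `feats`, `Y_init`, `gamma`, and `alpha` are hypothetical.

```python
import numpy as np

def propagate_tags(feats, Y_init, gamma=1.0, alpha=0.9, iters=50):
    """Label propagation over a similarity graph. Y_init holds one row
    per sample and one column per tag, with all-zero rows for unlabeled
    samples. Illustrative sketch of graph-based tag inference only."""
    # RBF affinity graph over all (labeled + unlabeled) samples
    d2 = (np.sum(feats**2, 1)[:, None] + np.sum(feats**2, 1)[None, :]
          - 2.0 * feats @ feats.T)
    W = np.exp(-gamma * np.clip(d2, 0.0, None))
    np.fill_diagonal(W, 0.0)

    # symmetrically normalized affinity S = D^{-1/2} W D^{-1/2}
    dinv = 1.0 / np.sqrt(W.sum(1) + 1e-12)
    S = dinv[:, None] * W * dinv[None, :]

    # iterate F <- alpha * S F + (1 - alpha) * Y: tag scores diffuse
    # along the graph while staying anchored to the initial labels
    F = Y_init.astype(float).copy()
    for _ in range(iters):
        F = alpha * S @ F + (1 - alpha) * Y_init
    return F  # soft tag scores for every sample
```

Thresholding or ranking the returned scores yields tags for the unlabeled videos, so local manifold structure of the unlabeled data directly shapes the predictions.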