Online multi-modal robust non-negative dictionary learning for visual tracking

Zhang, X; Guan, N; Tao, D; Qiu, X; Luo, Z

Publication Type: Journal Article
Citation: PLoS ONE, 2015, 10 (5)
Issue Date: 2015-05-11

Full metadata record

Field	Value	Language
dc.contributor.author	Zhang, X	en_US
dc.contributor.author	Guan, N	en_US
dc.contributor.author	Tao, D https://orcid.org/0000-0001-7225-5449	en_US
dc.contributor.author	Qiu, X	en_US
dc.contributor.author	Luo, Z	en_US
dc.date.available	2015-03-17	en_US
dc.date.issued	2015-05-11	en_US
dc.identifier.citation	PLoS ONE, 2015, 10 (5)	en_US
dc.identifier.uri	http://hdl.handle.net/10453/121689
dc.relation	http://purl.org/au-research/grants/arc/FT130101457
dc.relation	http://purl.org/au-research/grants/arc/DP140102164
dc.relation.ispartof	PLoS ONE	en_US
dc.relation.isbasedon	10.1371/journal.pone.0124685	en_US
dc.subject.classification	General Science & Technology	en_US
dc.subject.mesh	Learning	en_US
dc.subject.mesh	Algorithms	en_US
dc.subject.mesh	Models, Theoretical	en_US
dc.subject.mesh	Internet	en_US
dc.subject.mesh	Dictionaries as Topic	en_US
dc.title	Online multi-modal robust non-negative dictionary learning for visual tracking	en_US
dc.type	Journal Article
utslib.citation.volume	5	en_US
utslib.citation.volume	10	en_US
utslib.for	0801 Artificial Intelligence and Image Processing	en_US
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
utslib.copyright.status	open_access
pubs.issue	5	en_US
pubs.publication-status	Published	en_US
pubs.volume	10	en_US

Abstract:

© 2015 Zhang et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Dictionary learning acquires a collection of atoms for subsequent signal representation. Because of its strong representation ability, it has been widely applied in multimedia and computer vision. However, conventional dictionary learning algorithms cannot handle multi-modal datasets. In this paper, we propose an online multi-modal robust non-negative dictionary learning (OMRNDL) algorithm to overcome this deficiency. Notably, OMRNDL casts visual tracking as a dictionary learning problem under the particle filter framework and captures intrinsic knowledge about the target from multiple visual modalities, e.g., pixel intensity and texture information. To this end, OMRNDL adaptively learns an individual dictionary, i.e., a template, for each modality from the available frames, and then represents new particles over all the learned dictionaries by minimizing a data-fitting loss based on M-estimation. The resulting representation coefficient can be viewed as the common semantic representation of the particles across modalities and can be used to track the target. OMRNDL incrementally learns the dictionaries and the coefficient of each particle using separate multiplicative update rules that preserve the non-negativity of each. Experimental results on a popular, challenging video benchmark validate the effectiveness of OMRNDL for visual tracking both quantitatively and qualitatively.
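The abstract does not state the update rules themselves, so the following is only a minimal sketch of the general idea it mentions: learning one non-negative dictionary per modality together with a single shared coefficient matrix through multiplicative updates. It is not the OMRNDL algorithm. It assumes a plain squared-error loss in place of the paper's M-estimation loss, runs in batch rather than online, and omits the particle filter. All function and variable names (`fit_shared_code`, `intensity`, `texture`, and so on) are hypothetical.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def mult_update(target, numer, denom, eps=1e-12):
    """Element-wise target * numer / denom; non-negative inputs stay non-negative."""
    return [[t * n / (d + eps) for t, n, d in zip(rt, rn, rd)]
            for rt, rn, rd in zip(target, numer, denom)]

def loss(xs, ds, h):
    """Sum over modalities of the squared error ||X_m - D_m H||^2 (assumed loss)."""
    total = 0.0
    for x, d in zip(xs, ds):
        approx = matmul(d, h)
        total += sum((a - b) ** 2 for ra, rb in zip(x, approx) for a, b in zip(ra, rb))
    return total

def random_matrix(rows, cols, rng):
    """Strictly positive random matrix, so no entry starts stuck at zero."""
    return [[rng.random() + 0.1 for _ in range(cols)] for _ in range(rows)]

def fit_shared_code(xs, rank, iters=50, seed=0):
    """Fit one dictionary D_m per modality X_m and one shared code H, all non-negative."""
    rng = random.Random(seed)
    num_samples = len(xs[0][0])
    ds = [random_matrix(len(x), rank, rng) for x in xs]
    h = random_matrix(rank, num_samples, rng)
    history = [loss(xs, ds, h)]
    for _ in range(iters):
        # Update each modality's dictionary with the shared code held fixed.
        ht = transpose(h)
        hht = matmul(h, ht)
        ds = [mult_update(d, matmul(x, ht), matmul(d, hht)) for x, d in zip(xs, ds)]
        # Update the shared code; numerator and denominator are summed over modalities.
        numer = denom = None
        for x, d in zip(xs, ds):
            dt = transpose(d)
            n_m = matmul(dt, x)
            d_m = matmul(matmul(dt, d), h)
            numer = n_m if numer is None else add(numer, n_m)
            denom = d_m if denom is None else add(denom, d_m)
        h = mult_update(h, numer, denom)
        history.append(loss(xs, ds, h))
    return ds, h, history

# Toy data standing in for two modalities of the same 6 samples (made-up values).
rng = random.Random(1)
intensity = random_matrix(4, 6, rng)  # 4 features x 6 samples
texture = random_matrix(3, 6, rng)    # 3 features x 6 samples
dictionaries, code, history = fit_shared_code([intensity, texture], rank=2)
print(history[0], history[-1])
```

Because every update multiplies the current value by a ratio of non-negative quantities, the dictionaries and the shared code never become negative, which is the property the abstract attributes to multiplicative updates. A robust variant would typically reweight the residuals according to the chosen M-estimator; that step is left out here because the abstract does not specify it.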

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/121689