Divide-and-conquer anchoring for near-separable nonnegative matrix factorization and completion in high dimensions

Zhou, T; Bian, W; Tao, D

Publication Type: Conference Proceeding
Citation: Proceedings - IEEE International Conference on Data Mining, ICDM, 2013, pp. 917 - 926
Issue Date: 2013-12-01

Closed Access

	Filename	Description	Size	Format
	2013003261OK.pdf		1.96 MB	Adobe PDF

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Zhou, T	en_US
dc.contributor.author	Bian, W	en_US
dc.contributor.author	Tao, D https://orcid.org/0000-0001-7225-5449	en_US
dc.date.issued	2013-12-01	en_US
dc.identifier.citation	Proceedings - IEEE International Conference on Data Mining, ICDM, 2013, pp. 917 - 926	en_US
dc.identifier.issn	1550-4786	en_US
dc.identifier.uri	http://hdl.handle.net/10453/28030
dc.relation.ispartof	Proceedings - IEEE International Conference on Data Mining, ICDM	en_US
dc.relation.isbasedon	10.1109/ICDM.2013.29	en_US
dc.title	Divide-and-conquer anchoring for near-separable nonnegative matrix factorization and completion in high dimensions	en_US
dc.type	Conference Proceeding
utslib.for	080109 Pattern Recognition and Data Mining	en_US
utslib.for	0801 Artificial Intelligence and Image Processing	en_US
dc.location.activity	Dallas, TX, USA	en_US
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology/School of Computer Science
pubs.organisational-group	/University of Technology Sydney/Strength - CAI - Centre for Artificial Intelligence
utslib.copyright.status	closed_access
pubs.publication-status	Published	en_US

Abstract:

Nonnegative matrix factorization (NMF) becomes tractable in polynomial time, with a unique solution, under the separability assumption, which postulates that all data points are contained in the conical hull of a few anchor data points. Recently developed linear programming and greedy pursuit methods can pick out the anchors from noisy data, resulting in a near-separable NMF, but their efficiency can be seriously weakened in high dimensions. In this paper, we show that the anchors can be precisely located from the low-dimensional geometry of the data points, even when their high-dimensional features suffer from serious incompleteness. Our framework, entitled divide-and-conquer anchoring (DCA), divides the high-dimensional anchoring problem into a few cheaper sub-problems that seek anchors of data projections in low-dimensional random spaces, which can be solved in parallel by any near-separable NMF method, and combines all the detected low-dimensional anchors via fast hypothesis testing to identify the original anchors. We further develop two non-iterative anchoring algorithms in 1D and 2D spaces for data in a convex hull and a conical hull, respectively. These two rapid algorithms in ultra-low dimensions suffice to generate a robust and efficient near-separable NMF for high-dimensional or incomplete data via DCA. Compared to existing methods, two vital advantages of DCA are its scalability to big data and its capability of handling incomplete and high-dimensional noisy data. A rigorous analysis proves that DCA is able to find the correct anchors of a rank-$k$ matrix by solving $\mathcal{O}(k \log k)$ sub-problems. Finally, we show that DCA outperforms state-of-the-art methods on various datasets and tasks. © 2013 IEEE.
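
The divide-and-conquer idea in the abstract can be illustrated with a small sketch for the 1D convex-hull case. This is not the authors' implementation: the function name `dca_1d_anchors`, its parameters, the synthetic noiseless data, and the simple top-k vote (used here as a stand-in for the paper's hypothesis test) are all assumptions made for illustration. Each sub-problem projects the data onto one random direction and records the two extreme points, which in the noiseless separable case are always anchors.

```python
import numpy as np


def dca_1d_anchors(X, k, num_projections=200, seed=0):
    """Illustrative sketch: vote for anchors using random 1D projections.

    X is an (n, p) array whose rows are assumed to lie in the convex hull
    of k anchor rows. The top-k vote below is a simplification standing in
    for the hypothesis test mentioned in the abstract.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    votes = np.zeros(n, dtype=int)
    for _ in range(num_projections):
        beta = rng.standard_normal(p)   # random 1D subspace (a direction)
        y = X @ beta                    # 1D projection of every data point
        votes[np.argmax(y)] += 1        # extremes of a 1D projection are
        votes[np.argmin(y)] += 1        # vertices of the convex hull
    top = np.argsort(votes)[::-1][:k]   # k most frequently selected rows
    return np.sort(top), votes


# Synthetic, noiseless separable data: rows 0..k-1 are the anchors and
# every other row is a convex combination of them.
rng = np.random.default_rng(1)
k, p, n = 5, 50, 300
A = rng.random((k, p))                     # hypothetical anchor points
W = rng.dirichlet(np.ones(k), size=n)      # convex weights, rows sum to 1
X = np.vstack([A, W @ A])

anchors, votes = dca_1d_anchors(X, k)
print(anchors)
```

Because each projection is independent of the others, the loop body could be distributed across workers, which is the parallelism the abstract refers to. Noisy or incomplete data would need the more careful selection rule described in the paper rather than a plain vote.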

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/28030