Divide-and-Conquer Anchoring for Near-Separable Nonnegative Matrix Factorization and Completion in High Dimensions

Publication Type:
Conference Proceeding
IEEE 13th International Conference on Data Mining, 2013, pp. 917 - 926
Issue Date:
Full metadata record
Files in This Item:
Filename Description Size
Thumbnail2013003261OK.pdf1.96 MB
Adobe PDF
Abstract Nonnegative matrix factorization (NMF) becomes tractable in polynomial time with unique solution under separability assumption , which postulates all the data points are contained in the conical hull of a few anchor data points. Recently developed linear programming and greedy pursuit methods can pick out the anchors from noisy data and results in a near-separable NMF. But their efficiency could be seriously weakened in high dimensions. In this paper, we show that the anchors can be precisely located from low- dimensional geometry of the data points even when their high dimensional features suffer from serious incompleteness. Our framework, entitled divide-and-conquer anchoring (DCA), divides the high-dimensional anchoring problem into a few cheaper sub-problems seeking anchors of data projections in low-dimensional random spaces, which can be solved in parallel by any near-separable NMF, and combines all the detected low-dimensional anchors via a fast hypothesis testing to identify the original anchors. We further develop two non- iterative anchoring algorithms in 1D and 2D spaces for data in convex hull and conical hull, respectively. These two rapid algorithms in the ultra low dimensions suffice to generate a robust and efficient near-separable NMF for high-dimensional or incomplete data via DCA. Compared to existing methods, two vital advantages of DCA are its scalability for big data, and capability of handling incomplete and high-dimensional noisy data. A rigorous analysis proves that DCA is able to find the correct anchors of a rank- k matrix by solving O ( k log k ) sub- problems. Finally, we show DCA outperforms state-of-the-art methods on various datasets and tasks.
Please use this identifier to cite or link to this item: