Divide-and-conquer anchoring for near-separable nonnegative matrix factorization and completion in high dimensions
- Publication Type:
- Conference Proceeding
- Proceedings - IEEE International Conference on Data Mining, ICDM, 2013, pp. 917 - 926
- Issue Date:
Nonnegative matrix factorization (NMF) becomes tractable in polynomial time with unique solution under separability assumption, which postulates all the data points are contained in the conical hull of a few anchor data points. Recently developed linear programming and greedy pursuit methods can pick out the anchors from noisy data and results in a near-separable NMF. But their efficiency could be seriously weakened in high dimensions. In this paper, we show that the anchors can be precisely located from low-dimensional geometry of the data points even when their high dimensional features suffer from serious incompleteness. Our framework, entitled divide-and-conquer anchoring (DCA), divides the high-dimensional anchoring problem into a few cheaper sub-problems seeking anchors of data projections in low-dimensional random spaces, which can be solved in parallel by any near-separable NMF, and combines all the detected low-dimensional anchors via a fast hypothesis testing to identify the original anchors. We further develop two non-iterative anchoring algorithms in 1D and 2D spaces for data in convex hull and conical hull, respectively. These two rapid algorithms in the ultra low dimensions suffice to generate a robust and efficient near-separable NMF for high-dimensional or incomplete data via DCA. Compared to existing methods, two vital advantages of DCA are its scalability for big data, and capability of handling incomplete and high-dimensional noisy data. A rigorous analysis proves that DCA is able to find the correct anchors of a rank-k matrix by solving math cal O(klog k) sub-problems. Finally, we show DCA outperforms state-of-the-art methods on various datasets and tasks. © 2013 IEEE.
Please use this identifier to cite or link to this item: