SimFusion+: Extending SimFusion towards efficient estimation on large and dynamic networks
- Publication Type:
- Conference Proceeding
- SIGIR'12 - Proceedings of the International ACM SIGIR Conference on Research and Development in Information Retrieval, 2012, pp. 365 - 374
- Issue Date:
SimFusion has become a captivating measure of similarity between objects in a web graph. It is iteratively distilled from the notion that "the similarity between two objects is reinforced by the similarity of their related objects". The existing SimFusion model usually exploits the Unified Relationship Matrix (URM) to represent latent relationships among heterogeneous data, and adopts an iterative paradigm for SimFusion computation. However, due to the row normalization of URM, the traditional SimFusion model may produce the trivial solution; worse still, the iterative computation of SimFusion may not ensure the global convergence of the solution. This paper studies the revision of this model, providing a full treatment from complexity to algorithms. (1) We propose SimFusion+ based on a notion of the Unified Adjacency Matrix (UAM), a modification of the URM, to prevent the trivial solution and the divergence issue of SimFusion. (2) We show that for any vertex-pair, SimFusion+ can be performed in O(1) time and O(n) space with an O(km)-time precomputation done only once, as opposed to the O(kn3) time and O(n2) space of its traditional counterpart, where n, m, and k denote the number of vertices, edges, and iterations respectively. (3) We also devise an incremental algorithm for further improving the computation of SimFusion+ when networks are dynamically updated, with performance guarantees for similarity estimation. We experimentally verify that these algorithms scale well, and the revised notion of SimFusion is able to converge to a non-trivial solution, and allows us to identify more sensible structure information in large real-world networks. © 2012 ACM.
Please use this identifier to cite or link to this item: