SimFusion+: extending simfusion towards efficient estimation on large and dynamic networks

Publisher:
ACM
Publication Type:
Conference Proceeding
Citation:
Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieva, 2012, pp. 365 - 374
Issue Date:
2012-01
Full metadata record
Files in This Item:
Filename Description Size
Thumbnail2013005458OK.pdf579.92 kB
Adobe PDF
SimFusion has become a captivating measure of similarity between objects in a web graph. It is iteratively distilled from the notion that "the similarity between two objects is reinforced by the similarity of their related objects". The existing SimFusion model usually exploits the Unified Relationship Matrix (URM) to represent latent relationships among heterogeneous data, and adopts an iterative paradigm for SimFusion computation. However, due to the row normalization of URM, the traditional SimFusion model may produce the trivial solution; worse still, the iterative computation of SimFusion may not ensure the global convergence of the solution. This paper studies the revision of this model, providing a full treatment from complexity to algorithms. (1) We propose SimFusion+ based on a notion of the Unified Adjacency Matrix (UAM), a modification of the URM, to prevent the trivial solution and the divergence issue of SimFusion. (2) We show that for any vertex-pair, SimFusion+ can be performed in O(1) time and O(n) space with an O(km)-time precomputation done only once, as opposed to the O(kn3) time and O(n2) space of its traditional counterpart, where n, m, and k denote the number of vertices, edges, and iterations respectively. (3) We also devise an incremental algorithm for further improving the computation of SimFusion+ when networks are dynamically updated, with performance guarantees for similarity estimation. We experimentally verify that these algorithms scale well, and the revised notion of SimFusion is able to converge to a non-trivial solution, and allows us to identify more sensible structure information in large real-world networks.
Please use this identifier to cite or link to this item: