SimFusion+: Extending SimFusion towards efficient estimation on large and dynamic networks

Yu, W; Lin, X; Zhang, W; Zhang, Y; Le, J

Publication Type: Conference Proceeding
Citation: SIGIR'12 - Proceedings of the International ACM SIGIR Conference on Research and Development in Information Retrieval, 2012, pp. 365-374
Issue Date: 2012-09-28

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Yu, W	en_US
dc.contributor.author	Lin, X	en_US
dc.contributor.author	Zhang, W	en_US
dc.contributor.author	Zhang, Y https://orcid.org/0000-0002-2674-1638	en_US
dc.contributor.author	Le, J	en_US
dc.date.issued	2012-09-28	en_US
dc.identifier.citation	SIGIR'12 - Proceedings of the International ACM SIGIR Conference on Research and Development in Information Retrieval, 2012, pp. 365 - 374	en_US
dc.identifier.isbn	9781450316583	en_US
dc.identifier.uri	http://hdl.handle.net/10453/28952
dc.description.abstract	SimFusion has become a captivating measure of similarity between objects in a web graph. It is iteratively distilled from the notion that "the similarity between two objects is reinforced by the similarity of their related objects". The existing SimFusion model usually exploits the Unified Relationship Matrix (URM) to represent latent relationships among heterogeneous data, and adopts an iterative paradigm for SimFusion computation. However, due to the row normalization of URM, the traditional SimFusion model may produce the trivial solution; worse still, the iterative computation of SimFusion may not ensure the global convergence of the solution. This paper studies the revision of this model, providing a full treatment from complexity to algorithms. (1) We propose SimFusion+ based on a notion of the Unified Adjacency Matrix (UAM), a modification of the URM, to prevent the trivial solution and the divergence issue of SimFusion. (2) We show that for any vertex-pair, SimFusion+ can be performed in O(1) time and O(n) space with an O(km)-time precomputation done only once, as opposed to the O(kn^3) time and O(n^2) space of its traditional counterpart, where n, m, and k denote the number of vertices, edges, and iterations respectively. (3) We also devise an incremental algorithm for further improving the computation of SimFusion+ when networks are dynamically updated, with performance guarantees for similarity estimation. We experimentally verify that these algorithms scale well, and the revised notion of SimFusion is able to converge to a non-trivial solution, and allows us to identify more sensible structure information in large real-world networks. © 2012 ACM.	en_US
dc.relation.ispartof	SIGIR'12 - Proceedings of the International ACM SIGIR Conference on Research and Development in Information Retrieval	en_US
dc.relation.isbasedon	10.1145/2348283.2348334	en_US
dc.title	SimFusion+: Extending SimFusion towards efficient estimation on large and dynamic networks	en_US
dc.type	Conference Proceeding
utslib.for	0806 Information Systems	en_US
dc.location.activity	Portland, USA	en_US
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology/School of Computer Science
pubs.organisational-group	/University of Technology Sydney/Strength - CAI - Centre for Artificial Intelligence
utslib.copyright.status	closed_access
pubs.publication-status	Published	en_US

Abstract:

SimFusion has become a captivating measure of similarity between objects in a web graph. It is iteratively distilled from the notion that "the similarity between two objects is reinforced by the similarity of their related objects". The existing SimFusion model usually exploits the Unified Relationship Matrix (URM) to represent latent relationships among heterogeneous data, and adopts an iterative paradigm for SimFusion computation. However, due to the row normalization of URM, the traditional SimFusion model may produce the trivial solution; worse still, the iterative computation of SimFusion may not ensure the global convergence of the solution. This paper studies the revision of this model, providing a full treatment from complexity to algorithms. (1) We propose SimFusion+ based on a notion of the Unified Adjacency Matrix (UAM), a modification of the URM, to prevent the trivial solution and the divergence issue of SimFusion. (2) We show that for any vertex-pair, SimFusion+ can be performed in O(1) time and O(n) space with an O(km)-time precomputation done only once, as opposed to the O(kn^3) time and O(n^2) space of its traditional counterpart, where n, m, and k denote the number of vertices, edges, and iterations respectively. (3) We also devise an incremental algorithm for further improving the computation of SimFusion+ when networks are dynamically updated, with performance guarantees for similarity estimation. We experimentally verify that these algorithms scale well, and the revised notion of SimFusion is able to converge to a non-trivial solution, and allows us to identify more sensible structure information in large real-world networks. © 2012 ACM.
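
The remark about row normalization and the trivial solution can be illustrated with a small sketch. It is not taken from the paper: it assumes the iteration has the form S_next = L * S * L^T, where L is a row-normalized relationship matrix, which this record does not state. The toy graph and all function names are hypothetical. Under that assumption, the all-ones matrix is a fixed point, and on this toy graph the iteration started from the identity flattens to a matrix whose entries are all nearly equal, so it no longer distinguishes any pair of vertices.

```python
def row_normalize(adj):
    """Scale each row of a non-negative matrix so that it sums to 1."""
    out = []
    for row in adj:
        total = sum(row)
        if total == 0:
            raise ValueError("every vertex needs at least one relation")
        out.append([x / total for x in row])
    return out


def matmul(a, b):
    """Plain dense matrix product of two lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def simfusion_step(L, S):
    """One update of the ASSUMED form S_next = L * S * L^T."""
    return matmul(matmul(L, S), transpose(L))


# Hypothetical 3-vertex graph with self-loops (made up for illustration).
adj = [
    [1, 1, 0],
    [1, 1, 1],
    [0, 1, 1],
]
n = len(adj)
L = row_normalize(adj)

# Because every row of L sums to 1, the all-ones matrix maps to itself.
ones = [[1.0] * n for _ in range(n)]
after = simfusion_step(L, ones)
is_fixed_point = all(
    abs(after[i][j] - 1.0) < 1e-12 for i in range(n) for j in range(n)
)

# Starting from the identity, repeated updates flatten the matrix.
S = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
for _ in range(60):
    S = simfusion_step(L, S)
entries = [x for row in S for x in row]
spread = max(entries) - min(entries)

print("all-ones is a fixed point:", is_fixed_point)
print("entry spread after 60 steps:", spread)
```

The sketch uses dense products for clarity, so each step costs O(n^3), which matches the per-iteration cost implied by the O(kn^3) bound quoted for the traditional model. It does not attempt to reproduce SimFusion+ itself, whose definition is not given in this record.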

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/28952