UniWalk: Unidirectional Random Walk Based Scalable SimRank Computation over Large Graph

Publication Type:
Journal Article
Citation:
IEEE Transactions on Knowledge and Data Engineering, 2018, 30 (5), pp. 992 - 1006
Issue Date:
2018-05-01
Filename Description Size
08126261.pdfPublished Version1.18 MB
© 1989-2012 IEEE. SimRank is an important measure of vertex-pair similarity according to the structure of graphs. Although progress has been achieved, existing methods still face challenges to handle large graphs. Besides huge index construction and maintenance cost, existing methods may require considerable search space and time overheads in the online SimRank query. In this paper, we design a Monte Carlo based method, UniWalk, to enable the fast top- $k$ SimRank computation over large undirected graphs. UniWalk directly locates the top- $k$ similar vertices for any single source vertex $u$ via $R$ sampling paths originating from $u$ , which avoids selecting candidate vertex set $\mathcal{C}$ and the following $O(|\mathcal{C}|R)$ bidirectional sampling paths. We also devise a path enumeration strategy to improve the SimRank precision by using path probabilities instead of path frequencies when sampling, a space-efficient method to reduce intermediate results, and a path-sharing strategy to lower the redundant path sampling cost for multiple source vertices. Furthermore, we extend UniWalk to existing distributed graph processing frameworks to improve its scalability. We conduct extensive experiments to illustrate that UniWalk has high scalability, and outperforms the state-of-the-art methods by orders of magnitude.