Diversified top-k clique search

Yuan, L; Qin, L; Lin, X; Chang, L; Zhang, W

Publication Type: Journal Article
Citation: VLDB Journal, 2016, 25 (2), pp. 171 - 196
Issue Date: 2016-04-01

This item is open access.

Full metadata record

Field	Value	Language
dc.contributor.author	Yuan, L	en_US
dc.contributor.author	Qin, L https://orcid.org/0000-0001-6068-5062	en_US
dc.contributor.author	Lin, X https://orcid.org/0000-0003-2396-7225	en_US
dc.contributor.author	Chang, L	en_US
dc.contributor.author	Zhang, W https://orcid.org/0000-0001-6572-2600	en_US
dc.date.available	2020-05-25T19:28:25Z
dc.date.issued	2016-04-01	en_US
dc.identifier.citation	VLDB Journal, 2016, 25 (2), pp. 171 - 196	en_US
dc.identifier.issn	1066-8888	en_US
dc.identifier.uri	http://hdl.handle.net/10453/43795
dc.relation.ispartof	VLDB Journal	en_US
dc.relation.isbasedon	10.1007/s00778-015-0408-z	en_US
dc.subject.classification	Information Systems	en_US
dc.title	Diversified top-k clique search	en_US
dc.type	Journal Article
utslib.citation.volume	25	en_US
utslib.for	0804 Data Format	en_US
utslib.for	0805 Distributed Computing	en_US
utslib.for	0806 Information Systems	en_US
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Faculty of Science
pubs.organisational-group	/University of Technology Sydney/Faculty of Science/School of Life Sciences
pubs.organisational-group	/University of Technology Sydney/Strength - CAI - Centre for Artificial Intelligence
utslib.copyright.status	open_access
pubs.issue	2	en_US
pubs.publication-status	Published	en_US
pubs.volume	25	en_US

Abstract:

© 2015, Springer-Verlag Berlin Heidelberg. Maximal clique enumeration is a fundamental problem in graph theory and has been extensively studied. However, it is time-consuming on large graphs and typically returns an enormous number of cliques with large overlaps. Motivated by this, we study the diversified top-k clique search problem, which is to find k cliques that together cover the largest number of nodes in the graph. Diversified top-k clique search has many applications, including community search, motif discovery, and anomaly detection in large graphs. A naive solution is to keep all maximal cliques in memory and then select k of them that cover the most nodes using the greedy approximation algorithm for max k-cover. However, such a solution is impractical when the graph is large. Instead of keeping all maximal cliques in memory, we devise an algorithm that maintains k candidates during maximal clique enumeration. Our algorithm has a limited memory footprint and achieves a guaranteed approximation ratio. We also introduce a novel lightweight PNP-Index, based on which we design an optimal maximal clique maintenance algorithm. We further explore three optimization strategies that avoid enumerating all maximal cliques and thus greatly reduce the computational cost. In addition, we develop an I/O-efficient algorithm for massive input graphs that cannot fit in main memory. We conduct extensive performance studies on real graphs and synthetic graphs. One of the real graphs contains 1.02 billion edges. The results demonstrate the high efficiency and effectiveness of our approach.
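
The naive baseline mentioned in the abstract (enumerate all maximal cliques, then run greedy max k-cover over them) can be sketched as follows. This is only an illustration of that baseline, not the paper's memory-bounded algorithm. The function names, the adjacency-set graph representation, and the example edge list are assumptions made for this sketch, and Bron-Kerbosch with pivoting is one common enumeration method chosen here, not necessarily the one the authors use.

```python
from typing import Dict, FrozenSet, Iterable, Iterator, List, Set, Tuple

Graph = Dict[int, Set[int]]  # undirected graph as adjacency sets (assumed layout)


def build_graph(edges: Iterable[Tuple[int, int]]) -> Graph:
    """Build adjacency sets from an undirected edge list."""
    graph: Graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def maximal_cliques(graph: Graph) -> Iterator[FrozenSet[int]]:
    """Yield every maximal clique (Bron-Kerbosch with pivoting)."""

    def expand(r: Set[int], p: Set[int], x: Set[int]) -> Iterator[FrozenSet[int]]:
        if not p and not x:
            yield frozenset(r)  # r cannot be extended, so it is maximal
            return
        # Pivot on the vertex with the most neighbours in p to prune branches.
        pivot = max(p | x, key=lambda v: len(graph[v] & p))
        for v in list(p - graph[pivot]):
            yield from expand(r | {v}, p & graph[v], x & graph[v])
            p.remove(v)
            x.add(v)

    yield from expand(set(), set(graph), set())


def greedy_top_k(cliques: Iterable[FrozenSet[int]], k: int) -> List[FrozenSet[int]]:
    """Greedy max k-cover: repeatedly take the clique adding most uncovered nodes."""
    remaining = list(cliques)  # the baseline keeps all cliques in memory
    chosen: List[FrozenSet[int]] = []
    covered: Set[int] = set()
    for _ in range(k):
        best = max(remaining, key=lambda c: len(c - covered), default=None)
        if best is None or not best - covered:
            break  # nothing left that covers a new node
        chosen.append(best)
        covered |= best
        remaining.remove(best)
    return chosen


# Hypothetical example graph: a 4-clique, two overlapping triangles, and a
# separate triangle attached through node 5.
EDGES = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3),
         (2, 4), (3, 4), (3, 5), (4, 5), (5, 6), (5, 7), (6, 7)]

graph = build_graph(EDGES)
top2 = greedy_top_k(maximal_cliques(graph), 2)
print(sorted(sorted(c) for c in top2))  # [[0, 1, 2, 3], [5, 6, 7]]
```

In this example the two overlapping triangles {2, 3, 4} and {3, 4, 5} are skipped in favour of {5, 6, 7}, because they add fewer uncovered nodes once the 4-clique is chosen. The sketch also shows why the baseline is impractical at scale: `greedy_top_k` must hold every maximal clique in memory before it can pick the first one.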

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/43795