Maximum biclique search at billion scale
- ASSOC COMPUTING MACHINERY
- Publication Type:
- Journal Article
- Proceedings of the VLDB Endowment, 2020, 13, (9), pp. 1359-1372
- Issue Date:
© 2020, VLDB Endowment. Maximum biclique search, which finds the biclique with the maximum number of edges in a bipartite graph, is a fun-damental problem with a wide spectrum of applications in different domains, such as E-Commerce, social analysis, web services, and bioinformatics. Unfortunately, due to the dif-ficulty of the problem in graph theory, no practical solution has been proposed to solve the issue in large-scale real-world datasets. Existing techniques for maximum clique search on a general graph cannot be applied because the search objec-tive of maximum biclique search is two-dimensional, i.e., we have to consider the size of both parts of the biclique simul-taneously. In this paper, we divide the problem into several subproblems each of which is specified using two parameters. These subproblems are derived in a progressive manner, and in each subproblem we can restrict the search in a very small part of the original bipartite graph. We prove that a loga-rithmic number of subproblems is enough to guarantee the algorithm correctness. To minimize the computational cost, we show how to reduce significantly the bipartite graph size for each subproblem while preserving the maximum biclique satisfying certain constraints by exploring the properties of one-hop and two-hop neighbors for each vertex. We use several real datasets from various application domains, one of which contains over 300 million vertices and 1:3 billion edges, to demonstrate the high efficiency and scalability of our proposed solution. It is reported that 50% improve-ment on recall can be achieved after applying our method in Alibaba Group to identify the fraudulent transactions in their e-commerce networks. This further demonstrates the usefulness of our techniques in practice.
Please use this identifier to cite or link to this item: