FAST: FPGA-based subgraph matching on massive graphs

Jin, X; Yang, Z; Lin, X; Yang, S; Qin, L; Peng, Y

Publisher: IEEE
Publication Type: Journal Article
Citation: Proceedings - International Conference on Data Engineering, 2021, 2021-April, pp. 1452-1463
Issue Date: 2021-04-01

This item is open access.

The embargo period expires on 22 Jun 2023

Full metadata record

Field	Value	Language
dc.contributor.author	Jin, X
dc.contributor.author	Yang, Z
dc.contributor.author	Lin, X
dc.contributor.author	Yang, S
dc.contributor.author	Qin, L https://orcid.org/0000-0001-6068-5062
dc.contributor.author	Peng, Y
dc.date.accessioned	2022-01-27T23:55:02Z
dc.date.available	2022-01-27T23:55:02Z
dc.date.issued	2021-04-01
dc.identifier.citation	Proceedings - International Conference on Data Engineering, 2021, 2021-April, pp. 1452-1463
dc.identifier.isbn	9781728191843
dc.identifier.issn	1084-4627
dc.identifier.uri	http://hdl.handle.net/10453/153688
dc.language	en
dc.publisher	IEEE
dc.relation	http://purl.org/au-research/grants/arc/DP180103096
dc.relation	http://purl.org/au-research/grants/arc/FT200100787
dc.relation.ispartof	Proceedings - International Conference on Data Engineering
dc.relation.isbasedon	10.1109/ICDE51399.2021.00129
dc.rights	info:eu-repo/semantics/embargoedAccess
dc.title	FAST: FPGA-based subgraph matching on massive graphs
dc.type	Journal Article
utslib.citation.volume	2021-April
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Strength - AAII - Australian Artificial Intelligence Institute
utslib.copyright.status	open_access	*
utslib.copyright.embargo	2023-06-22T00:00:00+1000Z
dc.date.updated	2022-01-27T23:54:59Z
pubs.publication-status	Published
pubs.volume	2021-April

Abstract:

Subgraph matching is a basic operation widely used in many applications. However, due to its NP-hardness and the explosive growth of graph data, it is challenging to compute subgraph matching, especially in large graphs. In this paper, we aim at scaling up subgraph matching on a single machine using FPGAs. Specifically, we propose a CPU-FPGA co-designed framework. On the CPU side, we first develop a novel auxiliary data structure called candidate search tree (CST), which serves as a complete search space for subgraph matching. CST can be partitioned and fully loaded into FPGAs' on-chip memory. Then, a workload estimation technique is proposed to balance the load between the CPU and FPGA. On the FPGA side, we design and implement the first FPGA-based subgraph matching algorithm, called FAST. To take full advantage of the pipeline mechanism on FPGAs, a task parallelism optimization and a task generator separation strategy are proposed for FAST, achieving massive parallelism. Moreover, we carefully develop a BRAM-only matching process to fully utilize the FPGA's on-chip memory, which avoids the expensive intermediate data transfer between the FPGA's BRAM and DRAM. Comprehensive experiments show that FAST achieves speedups of up to 462.0x and 150.0x compared with the state-of-the-art algorithms DAF and CECI, respectively. In addition, FAST is the only algorithm that can handle a billion-scale graph using one machine in our experiments.
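
For readers unfamiliar with the problem, the following is a minimal sketch of what subgraph matching computes: every injective, label-preserving mapping of query vertices to data vertices that keeps all query edges. It is a plain CPU backtracking search written for illustration only. It is not FAST, it does not build a CST, and the function name `subgraph_matches`, the adjacency-set representation, and the naive sorted matching order are all assumptions made for this example.

```python
from typing import Dict, Iterator, List, Set

Graph = Dict[int, Set[int]]  # undirected graph as adjacency sets (assumed layout)


def subgraph_matches(query: Graph, data: Graph,
                     q_labels: Dict[int, str],
                     d_labels: Dict[int, str]) -> Iterator[Dict[int, int]]:
    """Yield each injective mapping query vertex -> data vertex that
    preserves vertex labels and every query edge."""
    order: List[int] = sorted(query)  # naive matching order, for illustration

    def extend(mapping: Dict[int, int]) -> Iterator[Dict[int, int]]:
        if len(mapping) == len(order):
            yield dict(mapping)  # copy, since mapping is mutated while backtracking
            return
        u = order[len(mapping)]          # next query vertex to match
        used = set(mapping.values())     # data vertices already taken
        for v in data:
            if v in used or d_labels[v] != q_labels[u]:
                continue
            # Each already-mapped neighbour of u must map to a neighbour of v.
            if all(mapping[w] in data[v] for w in query[u] if w in mapping):
                mapping[u] = v
                yield from extend(mapping)
                del mapping[u]

    yield from extend({})


# Query: a triangle. Data: one triangle {0, 1, 2} plus a pendant vertex 3.
query = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
data = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
q_labels = {0: "A", 1: "A", 2: "A"}
d_labels = {0: "A", 1: "A", 2: "A", 3: "A"}

matches = list(subgraph_matches(query, data, q_labels, d_labels))
print(len(matches))  # 6: the triangle can be mapped onto {0, 1, 2} in 3! ways
```

Even on this toy input the search tries every data vertex at every level, which hints at why the number of partial results grows quickly on large graphs and why the abstract emphasises bounding the search space and keeping intermediate data on-chip.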

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/153688