All-in-one: Graph processing in RDBMSs revisited

Zhao, K; Yu, JX

Publication Type: Conference Proceeding
Citation: Proceedings of the ACM SIGMOD International Conference on Management of Data, 2017, Part F127746 pp. 1165 - 1180
Issue Date: 2017-05-09

Closed Access

	Filename	Description	Size	Format
	p1165-zhao.pdf	Published version	936.5 kB	Adobe PDF

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Zhao, K	en_US
dc.contributor.author	Yu, JX https://orcid.org/0000-0002-9738-827X	en_US
dc.date.issued	2017-05-09	en_US
dc.identifier.citation	Proceedings of the ACM SIGMOD International Conference on Management of Data, 2017, Part F127746 pp. 1165 - 1180	en_US
dc.identifier.isbn	9781450341974	en_US
dc.identifier.issn	0730-8078	en_US
dc.identifier.uri	http://hdl.handle.net/10453/126832
dc.description.abstract	© 2017 ACM. To support analytics on massive graphs such as online social networks, RDF, and the Semantic Web, many new graph algorithms have been designed to query graphs for specific problems, and many distributed graph processing systems have been developed to support graph querying by programming. In this paper, we focus on RDBMSs, which have been well studied over decades for managing large datasets, and we revisit the question of how an RDBMS can support graph processing at the SQL level. Our work is motivated by two facts: in real applications, many relations stored in an RDBMS are closely related to a graph and need to be used together with it when querying the graph, and an RDBMS is a system that can query and manage data while the data are updated over time. To support graph processing, we propose 4 new relational algebra operations: MM-join, MV-join, anti-join, and union-by-update. MM-join and MV-join are join operations between two matrices and between a matrix and a vector, respectively, followed by aggregation over groups, given that a matrix or vector can be represented as a relation. Both operate over a semiring, which allows many graph algorithms to be supported. The anti-join removes nodes/edges from a graph when they are unnecessary for subsequent computation. The union-by-update handles value updates, for example when computing PageRank. The 4 new relational algebra operations can be defined using the 6 basic relational algebra operations together with group-by and aggregation. We revisit SQL recursive queries, show, following the techniques studied in Datalog, that the 4 operations combined with the others are guaranteed to have a fixpoint, and enhance the recursive WITH clause in SQL'99. We conduct extensive performance studies testing 10 graph algorithms on 9 large real graphs in 3 major RDBMSs, and show that RDBMSs are capable of handling graph processing in reasonable time. This work focuses on the SQL level; there is high potential to improve efficiency further through main-memory RDBMSs, efficient parallel join processing, and new storage management.	en_US
dc.relation.ispartof	Proceedings of the ACM SIGMOD International Conference on Management of Data	en_US
dc.relation.isbasedon	10.1145/3035918.3035943	en_US
dc.title	All-in-one: Graph processing in RDBMSs revisited	en_US
dc.type	Conference Proceeding
utslib.citation.volume	Part F127746	en_US
utslib.for	0804 Data Format	en_US
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
utslib.copyright.status	closed_access
pubs.publication-status	Published	en_US
pubs.volume	Part F127746	en_US
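The abstract describes MV-join as a join between a matrix relation and a vector relation followed by group-by aggregation over a semiring, and union-by-update as a union that replaces existing values. Because the full text is closed access, what follows is only a minimal Python sketch of one reading of those one-line descriptions, not the authors' SQL formulation. Relations are held as Python lists and dicts rather than tables, and the names `mv_join`, `union_by_update`, and `sssp` are hypothetical. Single-source shortest paths in the (min, +) semiring is swapped in as the example algorithm (the abstract cites PageRank for union-by-update), assuming the graph has no negative cycles.

```python
import math
from typing import Callable, Dict, List, Tuple

# Hypothetical encodings: a matrix is the relation M(i, j, w) as a list of tuples,
# and a vector is the relation V(id, val) stored as a dict keyed by id.
Matrix = List[Tuple[int, int, float]]
Vector = Dict[int, float]
Op = Callable[[float, float], float]


def mv_join(m: Matrix, v: Vector, plus: Op, times: Op) -> Vector:
    """Join M.j = V.id, combine with `times`, group by M.i, fold each group with `plus`."""
    out: Vector = {}
    for i, j, w in m:
        if j in v:  # equi-join condition M.j = V.id
            prod = times(w, v[j])
            out[i] = plus(out[i], prod) if i in out else prod
    return out


def union_by_update(r: Vector, s: Vector) -> Vector:
    """Keep all tuples of r, but let s replace values on matching keys and add new keys."""
    merged = dict(r)
    merged.update(s)
    return merged


def sssp(edges: List[Tuple[int, int, float]], source: int) -> Vector:
    """Shortest paths by iterating MV-join in the (min, +) semiring to a fixpoint.

    Assumes no negative cycles; otherwise the loop would never stop improving.
    """
    # Store edge src -> dst as M(dst, src, w) so that M x dist gives candidates per dst.
    m: Matrix = [(dst, src, w) for src, dst, w in edges]
    dist: Vector = {source: 0.0}
    while True:
        cand = mv_join(m, dist, plus=min, times=lambda w, d: w + d)
        # Keep only strictly shorter distances: the delta to merge in.
        improved = {i: c for i, c in cand.items() if c < dist.get(i, math.inf)}
        if not improved:  # fixpoint reached: no distance changed
            return dist
        dist = union_by_update(dist, improved)


if __name__ == "__main__":
    print(sssp([(0, 1, 4.0), (0, 2, 1.0), (2, 1, 2.0), (1, 3, 1.0)], 0))
    # {0: 0.0, 1: 3.0, 2: 1.0, 3: 4.0}
```

Swapping `plus` and `times` (for example to `+` and `*`) turns the same `mv_join` into an ordinary sparse matrix-vector product, which is the sense in which one semiring-parameterized operation can serve several graph algorithms.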
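The abstract also names MM-join (a join of two matrix relations on the shared inner index, aggregated per row/column group) and anti-join (keeping only tuples with no match in another relation), and says recursive queries built from these operations reach a fixpoint. As a hedged illustration only, the following Python sketch uses both inside a semi-naive loop that computes transitive closure over the Boolean (OR, AND) semiring. The dict-of-pairs encoding and the names `mm_join`, `anti_join`, and `transitive_closure` are hypothetical and are not taken from the paper.

```python
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical encoding: a sparse matrix is the relation M(i, j, val), stored as {(i, j): val}.
Entry = Tuple[int, int]
Matrix = Dict[Entry, object]
Op = Callable[[object, object], object]


def mm_join(a: Matrix, b: Matrix, plus: Op, times: Op) -> Matrix:
    """Join A.j = B.i, combine with `times`, group by (A.i, B.j), fold with `plus`."""
    # Index B by row so the join is a hash join instead of a nested loop.
    b_by_row: Dict[int, List[Tuple[int, object]]] = {}
    for (j, k), val in b.items():
        b_by_row.setdefault(j, []).append((k, val))
    out: Matrix = {}
    for (i, j), a_val in a.items():
        for k, b_val in b_by_row.get(j, []):
            prod = times(a_val, b_val)
            out[(i, k)] = plus(out[(i, k)], prod) if (i, k) in out else prod
    return out


def anti_join(r: Matrix, s: Matrix) -> Matrix:
    """Keep the tuples of r whose (i, j) key has no match in s."""
    return {key: val for key, val in r.items() if key not in s}


def transitive_closure(edges: List[Entry]) -> Set[Entry]:
    """Semi-naive fixpoint over the Boolean (OR, AND) semiring."""
    e: Matrix = {edge: True for edge in edges}
    closure: Matrix = dict(e)
    delta: Matrix = dict(e)  # pairs discovered in the previous round
    while delta:  # fixpoint when a round derives nothing new
        step = mm_join(delta, e, plus=lambda x, y: x or y, times=lambda x, y: x and y)
        delta = anti_join(step, closure)  # drop pairs that are already known
        closure.update(delta)
    return set(closure)


if __name__ == "__main__":
    print(sorted(transitive_closure([(1, 2), (2, 3), (3, 1), (4, 5)])))
```

Joining only the previous round's new pairs (`delta`) with the edge relation, and using the anti-join to discard pairs already in the result, keeps each round from recomputing old work; termination follows because the set of possible pairs is finite.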

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/126832