Efficient and Effective Path Compression in Large Graphs

Publisher:
Institute of Electrical and Electronics Engineers (IEEE)
Publication Type:
Conference Proceeding
Citation:
Proceedings - International Conference on Data Engineering, 2023, 2023-April, pp. 3093-3105
Issue Date:
2023-01-01
Filename Description Size
Efficient and Effective Path Compression in Large Graphs.pdfAccepted version412.5 kB
Adobe PDF
Full metadata record
A path in a graph is a walk from one vertex to the other via edges. Many tasks for graph analytics may produce numerous paths, which record critical intermediate information or results. On the platform of Alibaba Cloud, a transaction (e.g., user purchase and money transfer) usually involves network communication via multiple servers. The server communication history is recorded as a path, where each vertex is an IP address. It is of significance to record such paths in Alibaba Cloud for daily maintenance tasks, such as anomaly server detection and network routing optimization. Motivated by the considerable data scale of IP paths, this paper proposes a compression method Overlap-Free Frequent Subpath (OFFS) to reduce the overall size. Meanwhile, the compressed paths should allow retrievals of any individual path, which is required by applications in our scenarios. We build a lookup table to match a series of frequent common subpaths to supernodes. Each path is shortened by replacing subpaths with corresponding supernodes in the table. We adopt a bottom-up framework to construct the lookup table in given iterations. Several optimizations are proposed to improve the compression ratio and speed. We conduct extensive experiments to show our effectiveness and efficiency based on several real datasets from Alibaba Cloud.
Please use this identifier to cite or link to this item: