Nested subtree hash kernels for large-scale graph classification over streams

Li, B; Zhu, X; Chi, L; Zhang, C

Nested subtree hash kernels for large-scale graph classification over streams

Li, B Zhu, X Chi, L Zhang, C

Permalink

Publication Type:: Conference Proceeding
Citation:: Proceedings - IEEE International Conference on Data Mining, ICDM, 2012, pp. 399 - 408
Issue Date:: 2012-12-01

Closed Access

	Filename	Description	Size
	2012001714OK.pdf		662.27 kB	Adobe PDF	View/Open

Copyright Clearance Process

Recently Added
In Progress
Closed Access

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Li, B	en_US
dc.contributor.author	Zhu, X	en_US
dc.contributor.author	Chi, L	en_US
dc.contributor.author	Zhang, C https://orcid.org/0000-0001-5715-7154	en_US
dc.date.issued	2012-12-01	en_US
dc.identifier.citation	Proceedings - IEEE International Conference on Data Mining, ICDM, 2012, pp. 399 - 408	en_US
dc.identifier.isbn	9780769549057	en_US
dc.identifier.issn	1550-4786	en_US
dc.identifier.uri	http://hdl.handle.net/10453/23007
dc.description.abstract	Most studies on graph classification focus on designing fast and effective kernels. Several fast subtree kernels have achieved a linear time-complexity w.r.t. the number of edges under the condition that a common feature space (e.g., a subtree pattern list) is needed to represent all graphs. This will be infeasible when graphs are presented in a stream with rapidly emerging subtree patterns. In this case, computing a kernel matrix for graphs over the entire stream is difficult since the graphs in the expired chunks cannot be projected onto the unlimitedly expanding feature space again. This leads to a big trouble for graph classification over streams - Different portions of graphs have different feature spaces. In this paper, we aim to enable large-scale graph classification over streams using the classical ensemble learning framework, which requires the data in different chunks to be in the same feature space. To this end, we propose a Nested Subtree Hashing (NSH) algorithm to recursively project the multi-resolution subtree patterns of different chunks onto a set of common low-dimensional feature spaces. We theoretically analyze the derived NSH kernel and obtain a number of favorable properties: 1) The NSH kernel is an unbiased and highly concentrated estimator of the fast subtree kernel. 2) The bound of convergence rate tends to be tighter as the NSH algorithm steps into a higher resolution. 3) The NSH kernel is robust in tolerating concept drift between chunks over a stream. We also empirically test the NSH kernel on both a large-scale synthetic graph data set and a real-world chemical compounds data set for anticancer activity prediction. The experimental results validate that the NSH kernel is indeed efficient and robust for graph classification over streams. © 2012 IEEE.	en_US
dc.relation.ispartof	Proceedings - IEEE International Conference on Data Mining, ICDM	en_US
dc.relation.isbasedon	10.1109/ICDM.2012.101	en_US
dc.title	Nested subtree hash kernels for large-scale graph classification over streams	en_US
dc.type	Conference Proceeding
utslib.for	080403 Data Structures	en_US
utslib.for	080105 Expert Systems	en_US
dc.location.activity	Brussels, Belgium	en_US
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/DVC (International)
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Strength - AAI - Advanced Analytics Institute Research Centre
pubs.organisational-group	/University of Technology Sydney/Strength - ACRI - Australia China Relations Institute
pubs.organisational-group	/University of Technology Sydney/Strength - CAI - Centre for Artificial Intelligence
utslib.copyright.status	closed_access
pubs.publication-status	Published	en_US

Abstract:

Most studies on graph classification focus on designing fast and effective kernels. Several fast subtree kernels have achieved a linear time-complexity w.r.t. the number of edges under the condition that a common feature space (e.g., a subtree pattern list) is needed to represent all graphs. This will be infeasible when graphs are presented in a stream with rapidly emerging subtree patterns. In this case, computing a kernel matrix for graphs over the entire stream is difficult since the graphs in the expired chunks cannot be projected onto the unlimitedly expanding feature space again. This leads to a big trouble for graph classification over streams - Different portions of graphs have different feature spaces. In this paper, we aim to enable large-scale graph classification over streams using the classical ensemble learning framework, which requires the data in different chunks to be in the same feature space. To this end, we propose a Nested Subtree Hashing (NSH) algorithm to recursively project the multi-resolution subtree patterns of different chunks onto a set of common low-dimensional feature spaces. We theoretically analyze the derived NSH kernel and obtain a number of favorable properties: 1) The NSH kernel is an unbiased and highly concentrated estimator of the fast subtree kernel. 2) The bound of convergence rate tends to be tighter as the NSH algorithm steps into a higher resolution. 3) The NSH kernel is robust in tolerating concept drift between chunks over a stream. We also empirically test the NSH kernel on both a large-scale synthetic graph data set and a real-world chemical compounds data set for anticancer activity prediction. The experimental results validate that the NSH kernel is indeed efficient and robust for graph classification over streams. © 2012 IEEE.

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/23007