Finding the best not the most: Regularized loss minimization subgraph selection for graph classification

Pan, S; Wu, J; Zhu, X; Long, G; Zhang, C

Finding the best not the most: Regularized loss minimization subgraph selection for graph classification

Pan, S

Wu, J

Zhu, X Long, G

Zhang, C

Permalink

Publication Type:: Journal Article
Citation:: Pattern Recognition, 2015, 48 (11), pp. 3783 - 3796
Issue Date:: 2015-11-01

Open Access

Copyright Clearance Process

Recently Added
In Progress
Open Access

This item is open access.

Adobe PDF

Download Accepted Manuscript VersionAdobe PDF (793.76 kB)

View on publisher's site

View statistics

Full metadata record

Field	Value	Language
dc.contributor.author	Pan, S https://orcid.org/0000-0003-0794-527X	en_US
dc.contributor.author	Wu, J https://orcid.org/0000-0002-1371-5801	en_US
dc.contributor.author	Zhu, X	en_US
dc.contributor.author	Long, G https://orcid.org/0000-0003-3740-9515	en_US
dc.contributor.author	Zhang, C https://orcid.org/0000-0001-5715-7154	en_US
dc.date.available	2020-05-25T19:25:46Z
dc.date.issued	2015-11-01	en_US
dc.identifier.citation	Pattern Recognition, 2015, 48 (11), pp. 3783 - 3796	en_US
dc.identifier.issn	0031-3203	en_US
dc.identifier.uri	http://hdl.handle.net/10453/37319
dc.description.abstract	© 2015 Elsevier Ltd. All rights reserved. Classification on structure data, such as graphs, has drawn wide interest in recent years. Due to the lack of explicit features to represent graphs for training classification models, extensive studies have been focused on extracting the most discriminative subgraphs features from the training graph dataset to transfer graphs into vector data. However, such filter-based methods suffer from two major disadvantages: (1) the subgraph feature selection is separated from the model learning process, so the selected most discriminative subgraphs may not best fit the subsequent learning model, resulting in deteriorated classification results; (2) all these methods rely on users to specify the number of subgraph features K, and suboptimally specified K values often result in significantly reduced classification accuracy. In this paper, we propose a new graph classification paradigm which overcomes the above disadvantages by formulating subgraph feature selection as learning a K-dimensional feature space from an implicit and large subgraph space, with the optimal K value being automatically determined. To achieve the goal, we propose a regularized loss minimization-driven (RLMD) feature selection method for graph classification. RLMD integrates subgraph selection and model learning into a unified framework to find discriminative subgraphs with guaranteed minimum loss w.r.t. the objective function. To automatically determine the optimal number of subgraphs K from the exponentially large subgraph space, an effective elastic net and a subgradient method are proposed to derive the stopping criterion, so that K can be automatically obtained once RLMD converges. The proposed RLMD method enjoys gratifying property including proved convergence and applicability to various loss functions. Experimental results on real-life graph datasets demonstrate significant performance gain.	en_US
dc.relation	http://purl.org/au-research/grants/arc/DP140102206
dc.relation.ispartof	Pattern Recognition	en_US
dc.relation.isbasedon	10.1016/j.patcog.2015.05.019	en_US
dc.subject.classification	Artificial Intelligence & Image Processing	en_US
dc.title	Finding the best not the most: Regularized loss minimization subgraph selection for graph classification	en_US
dc.type	Journal Article
utslib.citation.volume	11	en_US
utslib.citation.volume	48	en_US
utslib.for	0801 Artificial Intelligence and Image Processing	en_US
utslib.for	0806 Information Systems	en_US
utslib.for	0906 Electrical and Electronic Engineering	en_US
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/DVC (International)
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Strength - AAI - Advanced Analytics Institute Research Centre
pubs.organisational-group	/University of Technology Sydney/Strength - ACRI - Australia China Relations Institute
pubs.organisational-group	/University of Technology Sydney/Strength - CAI - Centre for Artificial Intelligence
utslib.copyright.status	open_access
pubs.issue	11	en_US
pubs.publication-status	Published	en_US
pubs.volume	48	en_US

Abstract:

© 2015 Elsevier Ltd. All rights reserved. Classification on structure data, such as graphs, has drawn wide interest in recent years. Due to the lack of explicit features to represent graphs for training classification models, extensive studies have been focused on extracting the most discriminative subgraphs features from the training graph dataset to transfer graphs into vector data. However, such filter-based methods suffer from two major disadvantages: (1) the subgraph feature selection is separated from the model learning process, so the selected most discriminative subgraphs may not best fit the subsequent learning model, resulting in deteriorated classification results; (2) all these methods rely on users to specify the number of subgraph features K, and suboptimally specified K values often result in significantly reduced classification accuracy. In this paper, we propose a new graph classification paradigm which overcomes the above disadvantages by formulating subgraph feature selection as learning a K-dimensional feature space from an implicit and large subgraph space, with the optimal K value being automatically determined. To achieve the goal, we propose a regularized loss minimization-driven (RLMD) feature selection method for graph classification. RLMD integrates subgraph selection and model learning into a unified framework to find discriminative subgraphs with guaranteed minimum loss w.r.t. the objective function. To automatically determine the optimal number of subgraphs K from the exponentially large subgraph space, an effective elastic net and a subgradient method are proposed to derive the stopping criterion, so that K can be automatically obtained once RLMD converges. The proposed RLMD method enjoys gratifying property including proved convergence and applicability to various loss functions. Experimental results on real-life graph datasets demonstrate significant performance gain.

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/37319