Margin-based ensemble classifier for protein fold recognition

Yang, T; Kecman, V; Cao, L; Zhang, C; Zhexue Huang, J

Publication Type: Journal Article
Citation: Expert Systems with Applications, 2011, 38 (10), pp. 12348-12355
Issue Date: 2011-09-15

Closed Access

	Filename	Description	Size	Format
	2010004623OK.pdf		304.41 kB	Adobe PDF

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Yang, T	en_US
dc.contributor.author	Kecman, V	en_US
dc.contributor.author	Cao, L https://orcid.org/0000-0003-1562-9429	en_US
dc.contributor.author	Zhang, C https://orcid.org/0000-0001-5715-7154	en_US
dc.contributor.author	Zhexue Huang, J	en_US
dc.date.issued	2011-09-15	en_US
dc.identifier.citation	Expert Systems with Applications, 2011, 38 (10), pp. 12348 - 12355	en_US
dc.identifier.issn	0957-4174	en_US
dc.identifier.uri	http://hdl.handle.net/10453/14498
dc.description.abstract	Recognition of protein folding patterns is an important step in protein structure and function prediction. The traditional sequence similarity-based approach fails to yield convincing predictions when proteins have low sequence identities, while the taxonometric approach is a reliable alternative. From a pattern recognition perspective, protein fold recognition involves a large number of classes with only a small number of training samples, and multiple heterogeneous feature groups derived from different propensities of amino acids. This raises the need for a classification method that can handle the data complexity with high prediction accuracy for practical applications. To this end, this paper proposes a novel ensemble classifier, called MarFold, which combines three margin-based classifiers for protein fold recognition. The effectiveness of the method is demonstrated on the benchmark D-B dataset with 27 classes. The overall prediction accuracy obtained by MarFold is 71.7%, which surpasses the existing fold recognition methods by 3.1-15.7%. Moreover, one component classifier of MarFold, called ALH, obtains a prediction accuracy of 65.5%, which is 4.7-9.5% higher than the prediction accuracies of the published methods using single classifiers. Additionally, the feature set of pairwise frequency information about the amino acids, which is adopted by MarFold, is found to be important for discriminating folding patterns. These results imply that the MarFold method and its operation engine ALH might become useful vehicles for protein fold recognition, as well as other bioinformatics tasks. The MarFold method and the datasets can be obtained from http://www-staff.it.uts.edu.au/~lbcao/publication/MarFold.7z. © 2010 Elsevier Ltd. All rights reserved.	en_US
dc.relation.ispartof	Expert Systems with Applications	en_US
dc.relation.isbasedon	10.1016/j.eswa.2011.04.014	en_US
dc.subject.classification	Artificial Intelligence & Image Processing	en_US
dc.title	Margin-based ensemble classifier for protein fold recognition	en_US
dc.type	Journal Article
utslib.citation.volume	10	en_US
utslib.citation.volume	38	en_US
utslib.for	0102 Applied Mathematics	en_US
utslib.for	01 Mathematical Sciences	en_US
utslib.for	08 Information and Computing Sciences	en_US
utslib.for	09 Engineering	en_US
dc.location.activity	WOS:000292169500038	en_US
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/DVC (International)
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Strength - AAI - Advanced Analytics Institute Research Centre
pubs.organisational-group	/University of Technology Sydney/Strength - ACRI - Australia China Relations Institute
pubs.organisational-group	/University of Technology Sydney/Strength - CAI - Centre for Artificial Intelligence
utslib.copyright.status	closed_access
pubs.issue	10	en_US
pubs.publication-status	Published	en_US
pubs.volume	38	en_US
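
The abstract credits part of MarFold's accuracy to a feature set of pairwise frequency information about the amino acids, but this record does not define that feature set. As a rough illustration only, the sketch below assumes the common dipeptide-composition reading: the relative frequency of each ordered pair of adjacent residues over the 20 standard amino acids, giving a 400-dimensional vector. The names `AMINO_ACIDS`, `PAIR_INDEX`, and `pairwise_frequencies` are hypothetical and are not taken from the paper or its software.

```python
from itertools import product

# Assumption: the 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# All 400 ordered residue pairs, each mapped to a fixed vector position.
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
PAIR_INDEX = {pair: i for i, pair in enumerate(PAIRS)}


def pairwise_frequencies(sequence):
    """Return a 400-dimensional list of adjacent-pair relative frequencies.

    Illustrative sketch only; the feature definition used by MarFold
    may differ from this one.
    """
    seq = sequence.upper()
    counts = [0] * len(PAIRS)
    total = 0
    for first, second in zip(seq, seq[1:]):
        idx = PAIR_INDEX.get(first + second)
        if idx is not None:  # skip pairs containing non-standard symbols
            counts[idx] += 1
            total += 1
    if total == 0:
        # Sequence too short, or no pair made of standard residues.
        return [0.0] * len(PAIRS)
    return [c / total for c in counts]
```

For a toy sequence such as `"ACAC"`, the adjacent pairs are AC, CA, AC, so the AC entry is 2/3, the CA entry is 1/3, and every other entry is zero. A fixed-length vector like this is what lets sequences of different lengths be passed to the same classifier.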

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/14498