Accurate prediction of hot spot residues through physicochemical characteristics of amino acid sequences

Chen, P; Li, J; Wong, L; Kuwahara, H; Huang, JZ; Gao, X

Publication Type: Journal Article
Citation: Proteins: Structure, Function and Bioinformatics, 2013, 81 (8), pp. 1351-1362
Issue Date: 2013-08-01

Closed Access

Filename	Description	Size	Format
2013001060OK.pdf		2.63 MB	Adobe PDF

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Chen, P	en_US
dc.contributor.author	Li, J https://orcid.org/0000-0003-1833-7413	en_US
dc.contributor.author	Wong, L https://orcid.org/0000-0003-1241-5441	en_US
dc.contributor.author	Kuwahara, H	en_US
dc.contributor.author	Huang, JZ	en_US
dc.contributor.author	Gao, X	en_US
dc.date.available	2013-02-23	en_US
dc.date.issued	2013-08-01	en_US
dc.identifier.citation	Proteins: Structure, Function and Bioinformatics, 2013, 81 (8), pp. 1351 - 1362	en_US
dc.identifier.issn	0887-3585	en_US
dc.identifier.uri	http://hdl.handle.net/10453/27174
dc.relation.ispartof	Proteins: Structure, Function and Bioinformatics	en_US
dc.relation.isbasedon	10.1002/prot.24278	en_US
dc.subject.classification	Bioinformatics	en_US
dc.subject.mesh	Animals	en_US
dc.subject.mesh	Humans	en_US
dc.subject.mesh	Drosophila	en_US
dc.subject.mesh	Juvenile Hormones	en_US
dc.subject.mesh	Amino Acids	en_US
dc.subject.mesh	Proteins	en_US
dc.subject.mesh	Drosophila Proteins	en_US
dc.subject.mesh	Receptors, Erythropoietin	en_US
dc.subject.mesh	Amino Acid Sequence	en_US
dc.subject.mesh	Algorithms	en_US
dc.subject.mesh	Models, Molecular	en_US
dc.subject.mesh	Artificial Intelligence	en_US
dc.subject.mesh	Databases, Protein	en_US
dc.subject.mesh	Protein Interaction Maps	en_US
dc.title	Accurate prediction of hot spot residues through physicochemical characteristics of amino acid sequences	en_US
dc.type	Journal Article
utslib.citation.volume	8	en_US
utslib.citation.volume	81	en_US
utslib.for	0801 Artificial Intelligence and Image Processing	en_US
utslib.for	01 Mathematical Sciences	en_US
utslib.for	06 Biological Sciences	en_US
utslib.for	08 Information and Computing Sciences	en_US
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Strength - AAI - Advanced Analytics Institute Research Centre
pubs.organisational-group	/University of Technology Sydney/Strength - CHT - Health Technologies
utslib.copyright.status	closed_access
pubs.issue	8	en_US
pubs.publication-status	Published	en_US
pubs.volume	81	en_US

Abstract:

Hot spot residues of proteins are fundamental interface residues that help proteins perform their functions. Detecting hot spots by experimental methods is costly and time-consuming. Sequential and structural information has been widely used in the computational prediction of hot spots. However, structural information is not always available. In this article, we investigated the problem of identifying hot spots using only physicochemical characteristics extracted from amino acid sequences. We first extracted 132 relatively independent physicochemical features from the set of 544 properties in AAindex1, an amino acid index database. Each feature was used to train a classification model with a novel encoding schema for hot spot prediction by the IBk algorithm, an extension of the K-nearest neighbor algorithm. Combinations of the individual classifiers were explored, and the classifiers that appeared frequently in the top-performing combinations were selected. The hot spot predictor was built as an ensemble of these classifiers that works in a voting manner. Experimental results demonstrated that our method effectively exploited the feature space and allowed flexible weights of features for different queries. On the commonly used hot spot benchmark sets, our method significantly outperformed other machine learning algorithms and state-of-the-art hot spot predictors. The program is available at http://sfb.kaust.edu.sa/pages/software.aspx. © 2013 Wiley Periodicals, Inc.
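
The core idea in the abstract (one nearest-neighbor classifier per physicochemical feature, combined by voting) can be illustrated with a minimal Python sketch. This is not the authors' program: the abstract does not specify the encoding schema, so a plain residue-window encoding is assumed here, a hand-written k-nearest-neighbor vote stands in for IBk, and the step that selects classifiers from top-performing combinations is omitted. The property scales, training windows, labels, and all function names are hypothetical toy values chosen for illustration; none are taken from AAindex1 or from a hot spot benchmark.

```python
from collections import Counter

# Toy per-residue property scales (illustrative values, NOT from AAindex1).
TOY_SCALES = {
    "scale_a": {"A": 1.8, "G": -0.4, "S": -0.8, "W": -0.9, "Y": -1.3, "R": -4.5},
    "scale_b": {"A": 0.3, "G": 0.0, "S": 0.2, "W": 1.0, "Y": 0.9, "R": 0.8},
    "scale_c": {"A": 0.2, "G": 0.0, "S": 0.1, "W": 1.0, "Y": 0.8, "R": 0.7},
}

# Made-up training windows; the centre residue is the one being labelled
# (1 = hot spot, 0 = not a hot spot). Not real benchmark data.
SAMPLES = [
    ("AWA", 1), ("GYG", 1), ("SRS", 1), ("AWG", 1),
    ("AAA", 0), ("GSG", 0), ("SGS", 0), ("GAG", 0),
]


def encode(window, scale):
    """Assumed encoding: map each residue in the window to its scale value."""
    return [scale[res] for res in window]


def knn_predict(train, query, k=3):
    """Plain k-nearest-neighbor majority vote using squared Euclidean distance."""
    def dist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))

    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def ensemble_predict(samples, query_window, scale_names, k=3):
    """Train one kNN classifier per property scale, then take a majority vote."""
    votes = []
    for name in scale_names:
        scale = TOY_SCALES[name]
        train = [(encode(w, scale), y) for w, y in samples]
        votes.append(knn_predict(train, encode(query_window, scale), k))
    return Counter(votes).most_common(1)[0][0]


if __name__ == "__main__":
    names = ["scale_a", "scale_b", "scale_c"]
    print(ensemble_predict(SAMPLES, "GWG", names))
```

In this toy setup the classifier built on `scale_a` alone labels the window `GWG` as a non-hot spot, while the other two classifiers outvote it, which shows how an ensemble lets different features carry different weight for different queries. An odd number of scales and an odd `k` are used so that binary votes cannot tie.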

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/27174