Protein binding hot spots prediction from sequence only by a new ensemble learning method

Hu, SS; Chen, P; Wang, B; Li, J

Publication Type: Journal Article
Citation: Amino Acids, 2017, 49 (10), pp. 1773-1785
Issue Date: 2017-10-01

Closed Access

	Filename	Description	Size	Format
	Hu2017_Article_ProteinBindingHotSpotsPredicti.pdf	Published Version	1.49 MB	Adobe PDF

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Hu, SS	en_US
dc.contributor.author	Chen, P	en_US
dc.contributor.author	Wang, B	en_US
dc.contributor.author	Li, J https://orcid.org/0000-0003-1833-7413	en_US
dc.date.available	2017-07-24	en_US
dc.date.issued	2017-10-01	en_US
dc.identifier.citation	Amino Acids, 2017, 49 (10), pp. 1773 - 1785	en_US
dc.identifier.issn	0939-4451	en_US
dc.identifier.uri	http://hdl.handle.net/10453/125381
dc.description.abstract	© 2017, Springer-Verlag GmbH Austria. Hot spots are interfacial core areas of binding proteins and have been used as targets in drug design. Locating hot spot areas experimentally is costly in both time and money. Recently, in silico computational methods have been widely used for hot spot prediction through sequence or structure characterization. Because protein structures are not always solved, hot spot identification from amino acid sequences alone is more useful for real-life applications. This work proposes a new sequence-based model that combines physicochemical features with the relative accessible surface area of amino acid sequences for hot spot prediction. The model consists of 83 classifiers based on the IBk (instance-based k-nearest neighbor) algorithm, where instances are encoded by important properties extracted from a total of 544 properties in the AAindex1 (Amino Acid Index) database. The top-performing classifiers are then selected to form an ensemble by majority voting. The ensemble classifier outperforms the state-of-the-art computational methods, yielding an F1 score of 0.80 on the benchmark Binding Interface Database (BID) test set. Availability: http://www2.ahu.edu.cn/pchen/web/HotspotEC.htm.	en_US
dc.relation.ispartof	Amino Acids	en_US
dc.relation.isbasedon	10.1007/s00726-017-2474-6	en_US
dc.subject.classification	Biochemistry & Molecular Biology	en_US
dc.subject.mesh	Sequence Analysis, Protein	en_US
dc.subject.mesh	Algorithms	en_US
dc.subject.mesh	Models, Molecular	en_US
dc.subject.mesh	Databases, Protein	en_US
dc.subject.mesh	Machine Learning	en_US
dc.title	Protein binding hot spots prediction from sequence only by a new ensemble learning method	en_US
dc.type	Journal Article
utslib.citation.volume	10	en_US
utslib.citation.volume	49	en_US
utslib.for	1101 Medical Biochemistry and Metabolomics	en_US
utslib.for	0304 Medicinal and Biomolecular Chemistry	en_US
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Strength - AAI - Advanced Analytics Institute Research Centre
pubs.organisational-group	/University of Technology Sydney/Strength - CHT - Health Technologies
utslib.copyright.status	closed_access
pubs.issue	10	en_US
pubs.publication-status	Published	en_US
pubs.volume	49	en_US

Abstract:

© 2017, Springer-Verlag GmbH Austria. Hot spots are interfacial core areas of binding proteins and have been used as targets in drug design. Locating hot spot areas experimentally is costly in both time and money. Recently, in silico computational methods have been widely used for hot spot prediction through sequence or structure characterization. Because protein structures are not always solved, hot spot identification from amino acid sequences alone is more useful for real-life applications. This work proposes a new sequence-based model that combines physicochemical features with the relative accessible surface area of amino acid sequences for hot spot prediction. The model consists of 83 classifiers based on the IBk (instance-based k-nearest neighbor) algorithm, where instances are encoded by important properties extracted from a total of 544 properties in the AAindex1 (Amino Acid Index) database. The top-performing classifiers are then selected to form an ensemble by majority voting. The ensemble classifier outperforms the state-of-the-art computational methods, yielding an F1 score of 0.80 on the benchmark Binding Interface Database (BID) test set. Availability: http://www2.ahu.edu.cn/pchen/web/HotspotEC.htm.
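
The abstract describes the general shape of the method: many instance-based classifiers, each working on its own encoding of the residues, with the best performers combined by majority vote and judged by F1 score. IBk is Weka's instance-based k-nearest-neighbour learner. The Python sketch below illustrates that general pattern only and is not the authors' implementation. The function names, the use of feature-column subsets as a stand-in for AAindex1 property encodings, the selection of members by F1 on a held-out validation split, and the toy data are all assumptions made for illustration.

```python
from collections import Counter
from math import dist


def knn_predict(train_X, train_y, x, k=3):
    """Label x by majority vote among its k nearest training points."""
    order = sorted(range(len(train_X)), key=lambda i: dist(train_X[i], x))
    return Counter(train_y[i] for i in order[:k]).most_common(1)[0][0]


def f1_score(y_true, y_pred, positive=1):
    """F1 = 2*TP / (2*TP + FP + FN) for the positive (hot spot) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def project(X, cols):
    """Keep only the feature columns listed in cols."""
    return [[row[c] for c in cols] for row in X]


def build_ensemble(train_X, train_y, val_X, val_y, feature_subsets, top_n=3, k=3):
    """Score one k-NN classifier per feature subset; keep the top_n by F1."""
    scored = []
    for cols in feature_subsets:
        sub_train = project(train_X, cols)
        preds = [knn_predict(sub_train, train_y, x, k) for x in project(val_X, cols)]
        scored.append((f1_score(val_y, preds), cols))
    scored.sort(key=lambda s: s[0], reverse=True)  # stable: ties keep input order
    return [cols for _, cols in scored[:top_n]]


def ensemble_predict(train_X, train_y, x, members, k=3):
    """Majority vote over the selected member classifiers."""
    votes = [
        knn_predict(project(train_X, cols), train_y, [x[c] for c in cols], k)
        for cols in members
    ]
    return Counter(votes).most_common(1)[0][0]


# Synthetic toy data (not from the paper): columns 0 and 1 separate the
# classes, column 2 is noise. Label 1 plays the role of "hot spot".
train_X = [[0, 0, 5], [0, 1, 1], [1, 0, 9], [5, 5, 2], [5, 6, 8], [6, 5, 4]]
train_y = [0, 0, 0, 1, 1, 1]
val_X = [[0.5, 0.5, 3], [5.5, 5.5, 3]]
val_y = [0, 1]
subsets = [(0,), (1,), (0, 1), (2,)]

members = build_ensemble(train_X, train_y, val_X, val_y, subsets)
print(members, ensemble_predict(train_X, train_y, [6, 6, 1], members))
```

Using an odd number of members and an odd k keeps binary votes free of ties in this sketch; a real pipeline would also need a separate test set, since members chosen on the validation split are biased toward it.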

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/125381