Sequence-based prediction of protein-protein interaction sites by simplified long short-term memory network

Li, J; Quan, L; Chen, Y; Lü, Q

Publisher: Elsevier BV
Publication Type: Journal Article
Citation: Neurocomputing, 2019, 357, pp. 86-100
Issue Date: 2019-09-10

This item is open access.

The embargo period expires on 10 Sep 2021

Full metadata record

Field	Value	Language
dc.contributor.author	Li, J https://orcid.org/0000-0003-1833-7413
dc.contributor.author	Quan, L
dc.contributor.author	Chen, Y
dc.contributor.author	Lü, Q
dc.date.accessioned	2020-06-05T23:20:46Z
dc.date.available	2020-06-05T23:20:46Z
dc.date.issued	2019-09-10
dc.identifier.citation	Neurocomputing, 2019, 357, pp. 86-100
dc.identifier.issn	0925-2312
dc.identifier.issn	1872-8286
dc.identifier.uri	http://hdl.handle.net/10453/141159
dc.language	en
dc.publisher	Elsevier BV
dc.relation.ispartof	Neurocomputing
dc.relation.isbasedon	10.1016/j.neucom.2019.05.013
dc.rights	info:eu-repo/semantics/embargoedAccess
dc.subject	08 Information and Computing Sciences, 09 Engineering, 17 Psychology and Cognitive Sciences
dc.subject.classification	Artificial Intelligence & Image Processing
dc.title	Sequence-based prediction of protein-protein interaction sites by simplified long short-term memory network
dc.type	Journal Article
utslib.citation.volume	357
utslib.for	08 Information and Computing Sciences
utslib.for	09 Engineering
utslib.for	17 Psychology and Cognitive Sciences
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Strength - CHT - Health Technologies
pubs.organisational-group	/University of Technology Sydney/Strength - AAI - Advanced Analytics Institute Research Centre
pubs.organisational-group	/University of Technology Sydney
utslib.copyright.status	open_access	*
utslib.copyright.embargo	2021-09-10T00:00:00+1000Z
dc.date.updated	2020-06-05T23:20:43Z
pubs.publication-status	Published
pubs.volume	357
utslib.start-page	86

Abstract:

© 2019 Elsevier B.V. Proteins often interact with each other and form protein complexes to carry out various biochemical activities. Knowledge of the interaction sites helps in understanding disease mechanisms and in drug design. Accurate prediction of interaction sites from protein sequences remains a challenging task, and severe class imbalance in the data further degrades the performance of computational methods. In this study, we propose a deep learning method to improve the prediction of protein interaction sites from imbalanced data. We develop a new simplified long short-term memory (SLSTM) network to implement a deep learning architecture (named DLPred). To deal with imbalanced classification in the deep learning model, we explore three new ideas. First, the training data are constructed as a set of whole protein sequences, rather than a set of single residues, to retain the sequential completeness of each protein. Second, a new penalization factor is appended to the loss function so that the penalization of the non-interaction site loss is effectively enhanced. Third, multi-task learning of interaction sites and residue solvent accessibility is used to correct the model's preference for non-interaction sites. Our model is evaluated on three public datasets: Dset186, Dtestset72 and PDBtestset164. Compared with current state-of-the-art methods, DLPred significantly improves the predictive accuracy and AUC values while also improving the F-measure. The training dataset, test datasets, a standalone version of DLPred and an online service are available at http://qianglab.scst.suda.edu.cn/dlp/.
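
The second and third ideas in the abstract (a penalization factor in the loss, and multi-task learning with solvent accessibility) can be illustrated with a small sketch. This is not the DLPred loss: the abstract does not give its exact form, so the function names, the per-class weights `w_site` and `w_nonsite`, the auxiliary weight `rsa_weight`, and the toy numbers below are all assumptions chosen for illustration. The sketch only shows the general pattern of a class-weighted cross-entropy over every residue of one protein sequence, combined with an auxiliary regression loss.

```python
import math


def weighted_bce(probs, labels, w_site=1.0, w_nonsite=1.0, eps=1e-7):
    """Mean binary cross-entropy over the residues of one protein.

    w_site and w_nonsite are hypothetical per-class factors; which class
    the paper scales, and by how much, is not stated in the abstract.
    """
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        if y == 1:
            total += -w_site * math.log(p)
        else:
            total += -w_nonsite * math.log(1.0 - p)
    return total / len(labels)


def mse(pred, target):
    """Mean squared error, used here for the auxiliary solvent accessibility task."""
    return sum((a - b) ** 2 for a, b in zip(pred, target)) / len(target)


def multitask_loss(site_probs, site_labels, rsa_pred, rsa_true,
                   w_site=1.0, w_nonsite=1.0, rsa_weight=0.5):
    """Interaction-site loss plus a weighted auxiliary loss (assumed additive form)."""
    site_term = weighted_bce(site_probs, site_labels, w_site, w_nonsite)
    return site_term + rsa_weight * mse(rsa_pred, rsa_true)


# Toy protein with six residues, one of which is an interaction site.
site_labels = [0, 0, 1, 0, 0, 0]
site_probs = [0.1, 0.2, 0.3, 0.1, 0.4, 0.2]
rsa_true = [0.05, 0.10, 0.60, 0.20, 0.30, 0.15]
rsa_pred = [0.10, 0.10, 0.40, 0.25, 0.35, 0.10]

plain = multitask_loss(site_probs, site_labels, rsa_pred, rsa_true)
reweighted = multitask_loss(site_probs, site_labels, rsa_pred, rsa_true,
                            w_site=5.0)
print(f"unweighted: {plain:.4f}  reweighted: {reweighted:.4f}")
```

In this toy case the single interaction site is predicted with low probability, so raising its class weight increases the loss, which is how per-class factors shift a model's preference between the two classes.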

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/141159