Inverse similarity and reliable negative samples for drug side-effect prediction

Zheng, Y; Peng, H; Ghosh, S; Lan, C; Li, J

Inverse similarity and reliable negative samples for drug side-effect prediction

Zheng, Y Peng, H

Ghosh, S Lan, C Li, J

Permalink

Publication Type:: Journal Article
Citation:: BMC Bioinformatics, 2019, 19
Issue Date:: 2019-02-04

Open Access

Copyright Clearance Process

Recently Added
In Progress
Open Access

This item is open access.

Adobe PDF

Download Published VersionAdobe PDF (3.86 MB)

View on publisher's site

View statistics

Full metadata record

Field	Value	Language
dc.contributor.author	Zheng, Y	en_US
dc.contributor.author	Peng, H https://orcid.org/0000-0002-4379-8097	en_US
dc.contributor.author	Ghosh, S	en_US
dc.contributor.author	Lan, C	en_US
dc.contributor.author	Li, J https://orcid.org/0000-0003-1833-7413	en_US
dc.date.available	2018-12-07	en_US
dc.date.issued	2019-02-04	en_US
dc.identifier.citation	BMC Bioinformatics, 2019, 19	en_US
dc.identifier.uri	http://hdl.handle.net/10453/135556
dc.description.abstract	© 2019 The Author(s). Background: In silico prediction of potential drug side-effects is of crucial importance for drug development, since wet experimental identification of drug side-effects is expensive and time-consuming. Existing computational methods mainly focus on leveraging validated drug side-effect relations for the prediction. The performance is severely impeded by the lack of reliable negative training data. Thus, a method to select reliable negative samples becomes vital in the performance improvement. Methods: Most of the existing computational prediction methods are essentially based on the assumption that similar drugs are inclined to share the same side-effects, which has given rise to remarkable performance. It is also rational to assume an inverse proposition that dissimilar drugs are less likely to share the same side-effects. Based on this inverse similarity hypothesis, we proposed a novel method to select highly-reliable negative samples for side-effect prediction. The first step of our method is to build a drug similarity integration framework to measure the similarity between drugs from different perspectives. This step integrates drug chemical structures, drug target proteins, drug substituents, and drug therapeutic information as features into a unified framework. Then, a similarity score between each candidate negative drug and validated positive drugs is calculated using the similarity integration framework. Those candidate negative drugs with lower similarity scores are preferentially selected as negative samples. Finally, both the validated positive drugs and the selected highly-reliable negative samples are used for predictions. Results: The performance of the proposed method was evaluated on simulative side-effect prediction of 917 DrugBank drugs, comparing with four machine-learning algorithms. Extensive experiments show that the drug similarity integration framework has superior capability in capturing drug features, achieving much better performance than those based on a single type of drug property. Besides, the four machine-learning algorithms achieved significant improvement in macro-averaging F1-score (e.g., SVM from 0.655 to 0.898), macro-averaging precision (e.g., RBF from 0.592 to 0.828) and macro-averaging recall (e.g., KNN from 0.651 to 0.772) complimentarily attributed to the highly-reliable negative samples selected by the proposed method. Conclusions: The results suggest that the inverse similarity hypothesis and the integration of different drug properties are valuable for side-effect prediction. The selection of highly-reliable negative samples can also make significant contributions to the performance improvement.	en_US
dc.relation.ispartof	BMC Bioinformatics	en_US
dc.relation.isbasedon	10.1186/s12859-018-2563-x	en_US
dc.subject.classification	Bioinformatics	en_US
dc.subject.mesh	Humans	en_US
dc.subject.mesh	Proteins	en_US
dc.subject.mesh	Computational Biology	en_US
dc.subject.mesh	Algorithms	en_US
dc.subject.mesh	Databases as Topic	en_US
dc.subject.mesh	Drug-Related Side Effects and Adverse Reactions	en_US
dc.subject.mesh	Machine Learning	en_US
dc.title	Inverse similarity and reliable negative samples for drug side-effect prediction	en_US
dc.type	Journal Article
utslib.citation.volume	19	en_US
utslib.for	01 Mathematical Sciences	en_US
utslib.for	06 Biological Sciences	en_US
utslib.for	08 Information and Computing Sciences	en_US
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Strength - AAI - Advanced Analytics Institute Research Centre
pubs.organisational-group	/University of Technology Sydney/Strength - CHT - Health Technologies
pubs.organisational-group	/University of Technology Sydney/Students
utslib.copyright.status	open_access
pubs.publication-status	Published	en_US
pubs.volume	19	en_US

Abstract:

© 2019 The Author(s). Background: In silico prediction of potential drug side-effects is of crucial importance for drug development, since wet experimental identification of drug side-effects is expensive and time-consuming. Existing computational methods mainly focus on leveraging validated drug side-effect relations for the prediction. The performance is severely impeded by the lack of reliable negative training data. Thus, a method to select reliable negative samples becomes vital in the performance improvement. Methods: Most of the existing computational prediction methods are essentially based on the assumption that similar drugs are inclined to share the same side-effects, which has given rise to remarkable performance. It is also rational to assume an inverse proposition that dissimilar drugs are less likely to share the same side-effects. Based on this inverse similarity hypothesis, we proposed a novel method to select highly-reliable negative samples for side-effect prediction. The first step of our method is to build a drug similarity integration framework to measure the similarity between drugs from different perspectives. This step integrates drug chemical structures, drug target proteins, drug substituents, and drug therapeutic information as features into a unified framework. Then, a similarity score between each candidate negative drug and validated positive drugs is calculated using the similarity integration framework. Those candidate negative drugs with lower similarity scores are preferentially selected as negative samples. Finally, both the validated positive drugs and the selected highly-reliable negative samples are used for predictions. Results: The performance of the proposed method was evaluated on simulative side-effect prediction of 917 DrugBank drugs, comparing with four machine-learning algorithms. Extensive experiments show that the drug similarity integration framework has superior capability in capturing drug features, achieving much better performance than those based on a single type of drug property. Besides, the four machine-learning algorithms achieved significant improvement in macro-averaging F1-score (e.g., SVM from 0.655 to 0.898), macro-averaging precision (e.g., RBF from 0.592 to 0.828) and macro-averaging recall (e.g., KNN from 0.651 to 0.772) complimentarily attributed to the highly-reliable negative samples selected by the proposed method. Conclusions: The results suggest that the inverse similarity hypothesis and the integration of different drug properties are valuable for side-effect prediction. The selection of highly-reliable negative samples can also make significant contributions to the performance improvement.

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/135556