A novel strategy for classifying the output from an in silico vaccine discovery pipeline for eukaryotic pathogens using machine learning algorithms

Goodswen, SJ; Kennedy, PJ; Ellis, JT

Publication Type: Journal Article
Citation: BMC Bioinformatics, 2013, 14
Issue Date: 2013-11-02

This item is open access.

Full metadata record

Field	Value	Language
dc.contributor.author	Goodswen, SJ	en_US
dc.contributor.author	Kennedy, PJ https://orcid.org/0000-0001-7837-3171	en_US
dc.contributor.author	Ellis, JT https://orcid.org/0000-0001-7328-4831	en_US
dc.date.available	2013-10-28	en_US
dc.date.issued	2013-11-02	en_US
dc.identifier.citation	BMC Bioinformatics, 2013, 14	en_US
dc.identifier.uri	http://hdl.handle.net/10453/26517
dc.identifier.uri	http://hdl.handle.net/10453/30202
dc.description.abstract	Background: An in silico vaccine discovery pipeline for eukaryotic pathogens typically consists of several computational tools to predict protein characteristics. The aim of the in silico approach to discovering subunit vaccines is to use predicted characteristics to identify proteins which are worthy of laboratory investigation. A major challenge is that these predictions are inherent with hidden inaccuracies and contradictions. This study focuses on how to reduce the number of false candidates using machine learning algorithms rather than relying on expensive laboratory validation. Proteins from Toxoplasma gondii, Plasmodium sp., and Caenorhabditis elegans were used as training and test datasets. Results: The results show that machine learning algorithms can effectively distinguish expected true from expected false vaccine candidates (with an average sensitivity and specificity of 0.97 and 0.98 respectively), for proteins observed to induce immune responses experimentally. Conclusions: Vaccine candidates from an in silico approach can only be truly validated in a laboratory. Given any in silico output and appropriate training data, the number of false candidates allocated for validation can be dramatically reduced using a pool of machine learning algorithms. This will ultimately save time and money in the laboratory. © 2013 Goodswen et al.; licensee BioMed Central Ltd.	en_US
dc.relation.ispartof	BMC Bioinformatics	en_US
dc.relation.isbasedon	10.1186/1471-2105-14-315	en_US
dc.subject.classification	Bioinformatics	en_US
dc.subject.mesh	Animals	en_US
dc.subject.mesh	Caenorhabditis elegans Proteins	en_US
dc.subject.mesh	Protozoan Proteins	en_US
dc.subject.mesh	Vaccines	en_US
dc.subject.mesh	Antigens	en_US
dc.subject.mesh	Sensitivity and Specificity	en_US
dc.subject.mesh	Computational Biology	en_US
dc.subject.mesh	Algorithms	en_US
dc.subject.mesh	Artificial Intelligence	en_US
dc.subject.mesh	Computer Simulation	en_US
dc.subject.mesh	Drug Discovery	en_US
dc.title	A novel strategy for classifying the output from an in silico vaccine discovery pipeline for eukaryotic pathogens using machine learning algorithms	en_US
dc.type	Journal Article
utslib.citation.volume	14	en_US
utslib.for	070708 Veterinary Parasitology	en_US
utslib.for	060102 Bioinformatics	en_US
utslib.for	01 Mathematical Sciences	en_US
utslib.for	06 Biological Sciences	en_US
utslib.for	08 Information and Computing Sciences	en_US
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology/School of Computer Science
pubs.organisational-group	/University of Technology Sydney/Faculty of Science
pubs.organisational-group	/University of Technology Sydney/Faculty of Science/School of Life Sciences
pubs.organisational-group	/University of Technology Sydney/Strength - CAI - Centre for Artificial Intelligence
pubs.organisational-group	/University of Technology Sydney/Strength - CHT - Health Technologies
utslib.copyright.status	open_access
pubs.publication-status	Published	en_US
pubs.volume	14	en_US

Abstract:

Background: An in silico vaccine discovery pipeline for eukaryotic pathogens typically consists of several computational tools to predict protein characteristics. The aim of the in silico approach to discovering subunit vaccines is to use predicted characteristics to identify proteins which are worthy of laboratory investigation. A major challenge is that these predictions are inherent with hidden inaccuracies and contradictions. This study focuses on how to reduce the number of false candidates using machine learning algorithms rather than relying on expensive laboratory validation. Proteins from Toxoplasma gondii, Plasmodium sp., and Caenorhabditis elegans were used as training and test datasets.

Results: The results show that machine learning algorithms can effectively distinguish expected true from expected false vaccine candidates (with an average sensitivity and specificity of 0.97 and 0.98 respectively), for proteins observed to induce immune responses experimentally.

Conclusions: Vaccine candidates from an in silico approach can only be truly validated in a laboratory. Given any in silico output and appropriate training data, the number of false candidates allocated for validation can be dramatically reduced using a pool of machine learning algorithms. This will ultimately save time and money in the laboratory. © 2013 Goodswen et al.; licensee BioMed Central Ltd.
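
For readers unfamiliar with the terms in the abstract, the following is a minimal illustrative sketch of how a pool of classifiers might be combined and how sensitivity and specificity are computed. It is not the authors' method: the abstract does not name the algorithms or how the pool is combined, so the majority vote, the three threshold rules, the two-number feature vectors, and all data values below are assumptions invented purely for illustration.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical representation: each protein is a short vector of predicted
# characteristics (the abstract does not say which features are used).
Features = Sequence[float]
Classifier = Callable[[Features], bool]


def majority_vote(pool: Sequence[Classifier], protein: Features) -> bool:
    """Return True when more than half of the pool votes 'vaccine candidate'.

    Majority voting is an assumed combination rule, chosen for simplicity.
    """
    votes = sum(1 for clf in pool if clf(protein))
    return votes * 2 > len(pool)


def sensitivity_specificity(
    predicted: Sequence[bool], actual: Sequence[bool]
) -> Tuple[float, float]:
    """Compute (sensitivity, specificity) from paired predictions and labels.

    sensitivity = TP / (TP + FN), specificity = TN / (TN + FP).
    """
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    tn = sum(1 for p, a in zip(predicted, actual) if not p and not a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative label")
    return tp / (tp + fn), tn / (tn + fp)


# Toy pool of three invented threshold rules standing in for trained models.
toy_pool: List[Classifier] = [
    lambda f: f[0] > 0.5,
    lambda f: f[1] > 0.5,
    lambda f: f[0] + f[1] > 1.0,
]

# Invented feature vectors and labels (True = expected true candidate).
toy_proteins = [(0.9, 0.8), (0.7, 0.2), (0.1, 0.3), (0.6, 0.6), (0.2, 0.9)]
toy_labels = [True, False, False, True, False]

toy_predictions = [majority_vote(toy_pool, p) for p in toy_proteins]
sens, spec = sensitivity_specificity(toy_predictions, toy_labels)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

On this toy data the pool recovers both expected true candidates but also flags one expected false candidate, which lowers specificity while leaving sensitivity unaffected. The figures of 0.97 and 0.98 reported in the abstract come from the authors' own datasets and algorithms and cannot be reproduced from this sketch.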

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/26517