Critical Assessment of Metagenome Interpretation − a benchmark of computational metagenomics software

Darling, AE

Publisher: Nature Publishing Group
Publication Type: Journal Article
Citation: Nature Methods, 2017
Issue Date: 2017-01-09

This item is open access.

Full metadata record

Field	Value	Language
dc.contributor.author	Darling, AE	en_US
dc.date.issued	2017-01-09	en_US
dc.identifier.citation	Nature Methods, 2017	en_US
dc.identifier.issn	1548-7091	en_US
dc.identifier.uri	http://hdl.handle.net/10453/115591
dc.description.abstract	In metagenome analysis, computational methods for assembly, taxonomic profiling and binning are key components facilitating downstream biological data interpretation. However, a lack of consensus about benchmarking datasets and evaluation metrics complicates proper performance assessment. The Critical Assessment of Metagenome Interpretation (CAMI) challenge has engaged the global developer community to benchmark their programs on datasets of unprecedented complexity and realism. Benchmark metagenomes were generated from ~700 newly sequenced microorganisms and ~600 novel viruses and plasmids, including genomes with varying degrees of relatedness to each other and to publicly available ones, and representing common experimental setups. Across all datasets, assembly and genome binning programs performed well for species represented by individual genomes, while performance was substantially affected by the presence of related strains. Taxonomic profiling and binning programs were proficient at high taxonomic ranks, with a notable performance decrease below the family level. Parameter settings substantially impacted performance, underscoring the importance of program reproducibility. While highlighting current challenges in computational metagenomics, the CAMI results provide a roadmap for software selection to answer specific research questions.	en_US
dc.publisher	Nature Publishing Group	en_US
dc.relation	http://purl.org/au-research/grants/arc/LP150100912
dc.relation.ispartof	Nature Methods	en_US
dc.relation.isbasedon	10.1101/099127	en_US
dc.relation.isreplacedby	10453/124093
dc.relation.isreplacedby	http://hdl.handle.net/10453/124093
dc.subject.classification	Developmental Biology	en_US
dc.title	Critical Assessment of Metagenome Interpretation − a benchmark of computational metagenomics software	en_US
dc.type	Journal Article
utslib.for	06 Biological Sciences	en_US
utslib.for	10 Technology	en_US
utslib.for	11 Medical And Health Sciences	en_US
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Science
pubs.organisational-group	/University of Technology Sydney/Strength - ithree - Institute of Infection, Immunity and Innovation
utslib.copyright.status	open_access
pubs.declined	2017-08-21T16:15:04.190+1000
pubs.consider-herdc	false	en_US
pubs.merge-to	10453/124093
pubs.merge-to	http://hdl.handle.net/10453/124093
pubs.deleted	2017-08-21T16:15:04.190+1000
pubs.publication-status	In preparation	en_US

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/115591