Alignathon: A competitive assessment of whole-genome alignment methods
Earl, D
Nguyen, N
Hickey, G
Harris, RS
Fitzgerald, S
Beal, K
Seledtsov, I
Molodtsov, V
Raney, BJ
Clawson, H
Kim, J
Kemena, C
Chang, JM
Erb, I
Poliakov, A
Hou, M
Herrero, J
Kent, WJ
Solovyev, V
Darling, AE
Ma, J
Notredame, C
Brudno, M
Dubchak, I
Haussler, D
Paten, B
- Publication Type: Journal Article
- Citation: Genome Research, 2014, 24 (12), pp. 2077 - 2089
- Issue Date: 2014-01-01
This item is open access.
Full metadata record
Field | Value | Language |
---|---|---|
dc.contributor.author | Earl, D | en_US |
dc.contributor.author | Nguyen, N | en_US |
dc.contributor.author | Hickey, G | en_US |
dc.contributor.author | Harris, RS | en_US |
dc.contributor.author | Fitzgerald, S | en_US |
dc.contributor.author | Beal, K | en_US |
dc.contributor.author | Seledtsov, I | en_US |
dc.contributor.author | Molodtsov, V | en_US |
dc.contributor.author | Raney, BJ | en_US |
dc.contributor.author | Clawson, H | en_US |
dc.contributor.author | Kim, J | en_US |
dc.contributor.author | Kemena, C | en_US |
dc.contributor.author | Chang, JM | en_US |
dc.contributor.author | Erb, I | en_US |
dc.contributor.author | Poliakov, A | en_US |
dc.contributor.author | Hou, M | en_US |
dc.contributor.author | Herrero, J | en_US |
dc.contributor.author | Kent, WJ | en_US |
dc.contributor.author | Solovyev, V | en_US |
dc.contributor.author | Darling, AE | en_US |
dc.contributor.author | Ma, J | en_US |
dc.contributor.author | Notredame, C | en_US |
dc.contributor.author | Brudno, M | en_US |
dc.contributor.author | Dubchak, I | en_US |
dc.contributor.author | Haussler, D | en_US |
dc.contributor.author | Paten, B | en_US |
dc.date.issued | 2014-01-01 | en_US |
dc.identifier.citation | Genome Research, 2014, 24 (12), pp. 2077 - 2089 | en_US |
dc.identifier.issn | 1088-9051 | en_US |
dc.identifier.uri | http://hdl.handle.net/10453/43476 | |
dc.description.abstract | © 2014 Earl et al. Multiple sequence alignments (MSAs) are a prerequisite for a wide variety of evolutionary analyses. Published assessments and benchmark data sets for protein and, to a lesser extent, global nucleotide MSAs are available, but less effort has been made to establish benchmarks in the more general problem of whole-genome alignment (WGA). Using the same model as the successful Assemblathon competitions, we organized a competitive evaluation in which teams submitted their alignments and then assessments were performed collectively after all the submissions were received. Three data sets were used: Two were simulated and based on primate and mammalian phylogenies, and one was comprised of 20 real fly genomes. In total, 35 submissions were assessed, submitted by 10 teams using 12 different alignment pipelines. We found agreement between independent simulation-based and statistical assessments, indicating that there are substantial accuracy differences between contemporary alignment tools. We saw considerable differences in the alignment quality of differently annotated regions and found that few tools aligned the duplications analyzed. We found that many tools worked well at shorter evolutionary distances, but fewer performed competitively at longer distances. We provide all data sets, submissions, and assessment programs for further study and provide, as a resource for future benchmarking, a convenient repository of code and data for reproducing the simulation assessments. | en_US |
dc.relation.ispartof | Genome Research | en_US |
dc.relation.isbasedon | 10.1101/gr.174920.114 | en_US |
dc.subject.classification | Bioinformatics | en_US |
dc.subject.mesh | Animals | en_US |
dc.subject.mesh | Mammals | en_US |
dc.subject.mesh | Humans | en_US |
dc.subject.mesh | Reproducibility of Results | en_US |
dc.subject.mesh | Sequence Alignment | en_US |
dc.subject.mesh | Computational Biology | en_US |
dc.subject.mesh | Genomics | en_US |
dc.subject.mesh | Phylogeny | en_US |
dc.subject.mesh | Genome | en_US |
dc.subject.mesh | Computer Simulation | en_US |
dc.subject.mesh | Software | en_US |
dc.subject.mesh | Genome-Wide Association Study | en_US |
dc.subject.mesh | Datasets as Topic | en_US |
dc.title | Alignathon: A competitive assessment of whole-genome alignment methods | en_US |
dc.type | Journal Article | |
utslib.citation.volume | 12 | en_US |
utslib.citation.volume | 24 | en_US |
utslib.for | 0605 Microbiology | en_US |
utslib.for | 0604 Genetics | en_US |
utslib.for | 06 Biological Sciences | en_US |
utslib.for | 11 Medical and Health Sciences | en_US |
pubs.embargo.period | Not known | en_US |
pubs.organisational-group | /University of Technology Sydney | |
pubs.organisational-group | /University of Technology Sydney/Faculty of Science | |
pubs.organisational-group | /University of Technology Sydney/Strength - ithree - Institute of Infection, Immunity and Innovation | |
utslib.copyright.status | open_access | |
pubs.issue | 12 | en_US |
pubs.publication-status | Published | en_US |
pubs.volume | 24 | en_US |
Please use this identifier to cite or link to this item: http://hdl.handle.net/10453/43476