Aberration-corrected ultrafine analysis of miRNA reads at single-base resolution: a k-mer lattice approach.
- Publisher: Oxford University Press
- Publication Type: Journal Article
- Citation: Nucleic Acids Research, 2021, 49, (18), pp. 1-16
- Issue Date: 2021-07-21
This item is open access.
Full metadata record
Field | Value | Language |
---|---|---|
dc.contributor.author | Zhang, X https://orcid.org/0000-0002-3089-9809 | |
dc.contributor.author | Ping, P | |
dc.contributor.author | Hutvagner, G https://orcid.org/0000-0002-7231-9446 | |
dc.contributor.author | Blumenstein, M https://orcid.org/0000-0002-9908-3744 | |
dc.contributor.author | Li, J https://orcid.org/0000-0003-1833-7413 | |
dc.date.accessioned | 2022-02-28T03:34:45Z | |
dc.date.available | 2021-07-06 | |
dc.date.available | 2022-02-28T03:34:45Z | |
dc.date.issued | 2021-07-21 | |
dc.identifier.citation | Nucleic Acids Research, 2021, 49, (18), pp. 1-16 | |
dc.identifier.issn | 0305-1048 | |
dc.identifier.issn | 1362-4962 | |
dc.identifier.uri | http://hdl.handle.net/10453/154916 | |
dc.description.abstract | Raw sequencing reads of miRNAs contain machine-made substitution errors, and even insertions and deletions (indels). Although the error rate can be as low as 0.1%, precise rectification of these errors is critically important because isoform variation analysis at single-base resolution, such as novel isomiR discovery, characterization of editing events, differential expression analysis, or tissue-specific isoform identification, is very sensitive to the base positions and copy counts of the reads. Existing error correction methods do not work for miRNA sequencing data because the length and per-read coverage of miRNAs differ from those of DNA or mRNA sequencing reads. We present a novel lattice structure combining k-mers, (k − 1)-mers and (k + 1)-mers to address this problem. The method is particularly effective for the correction of indel errors. Extensive tests on datasets with known ground-truth errors demonstrate that the method removes almost all of the errors without introducing any new ones, improving data quality from one error in every 50 reads to one error in every 1300 reads. Studies on experimental miRNA sequencing datasets show that the errors are often rectified at the 5′ ends and in the seed regions of the reads, and that the correction produces remarkable changes in miRNA isoform abundance, volume of singleton reads, overall entropy, isomiR families, tissue-specific miRNAs, and rare-miRNA quantities. | |
dc.format | ||
dc.language | eng | |
dc.publisher | Oxford University Press | |
dc.relation | http://purl.org/au-research/grants/arc/DP180100120 | |
dc.relation.ispartof | Nucleic Acids Research | |
dc.relation.isbasedon | 10.1093/nar/gkab610 | |
dc.rights | info:eu-repo/semantics/openAccess | |
dc.subject | 05 Environmental Sciences, 06 Biological Sciences, 08 Information and Computing Sciences | |
dc.subject.classification | Developmental Biology | |
dc.subject.mesh | Algorithms | |
dc.subject.mesh | Animals | |
dc.subject.mesh | Computational Biology | |
dc.subject.mesh | Databases, Genetic | |
dc.subject.mesh | High-Throughput Nucleotide Sequencing | |
dc.subject.mesh | Humans | |
dc.subject.mesh | MicroRNAs | |
dc.subject.mesh | Salmon | |
dc.subject.mesh | Sequence Analysis, DNA | |
dc.title | Aberration-corrected ultrafine analysis of miRNA reads at single-base resolution: a k-mer lattice approach. | |
dc.type | Journal Article | |
utslib.citation.volume | 49 | |
utslib.location.activity | England | |
utslib.for | 05 Environmental Sciences | |
utslib.for | 06 Biological Sciences | |
utslib.for | 08 Information and Computing Sciences | |
pubs.organisational-group | /University of Technology Sydney | |
pubs.organisational-group | /University of Technology Sydney/Faculty of Engineering and Information Technology | |
pubs.organisational-group | /University of Technology Sydney/Strength - CHT - Health Technologies | |
pubs.organisational-group | /University of Technology Sydney/Strength - AAI - Advanced Analytics Institute Research Centre | |
pubs.organisational-group | /University of Technology Sydney/Strength - AAII - Australian Artificial Intelligence Institute | |
pubs.organisational-group | /University of Technology Sydney/Strength - QSI - Centre for Quantum Software and Information | |
pubs.organisational-group | /University of Technology Sydney/Faculty of Engineering and Information Technology/School of Biomedical Engineering | |
pubs.organisational-group | /University of Technology Sydney/Faculty of Engineering and Information Technology/School of Computer Science | |
pubs.organisational-group | /University of Technology Sydney/Centre for Health Technologies (CHT) | |
utslib.copyright.status | open_access | * |
pubs.consider-herdc | false | |
dc.date.updated | 2022-02-28T03:34:42Z | |
pubs.issue | 18 | |
pubs.publication-status | Published | |
pubs.volume | 49 | |
utslib.citation.issue | 18 |
Abstract:
Raw sequencing reads of miRNAs contain machine-made substitution errors, and even insertions and deletions (indels). Although the error rate can be as low as 0.1%, precise rectification of these errors is critically important because isoform variation analysis at single-base resolution, such as novel isomiR discovery, characterization of editing events, differential expression analysis, or tissue-specific isoform identification, is very sensitive to the base positions and copy counts of the reads. Existing error correction methods do not work for miRNA sequencing data because the length and per-read coverage of miRNAs differ from those of DNA or mRNA sequencing reads. We present a novel lattice structure combining k-mers, (k − 1)-mers and (k + 1)-mers to address this problem. The method is particularly effective for the correction of indel errors. Extensive tests on datasets with known ground-truth errors demonstrate that the method removes almost all of the errors without introducing any new ones, improving data quality from one error in every 50 reads to one error in every 1300 reads. Studies on experimental miRNA sequencing datasets show that the errors are often rectified at the 5′ ends and in the seed regions of the reads, and that the correction produces remarkable changes in miRNA isoform abundance, volume of singleton reads, overall entropy, isomiR families, tissue-specific miRNAs, and rare-miRNA quantities.
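The abstract does not spell out how the lattice is built, so the following is only a minimal sketch of the general idea of tabulating substrings at three adjacent sizes, not the authors' algorithm. The function names (`count_kmers`, `build_levels`, `neighbours`) and the toy reads are hypothetical, introduced here purely for illustration. The sketch counts (k − 1)-mers, k-mers and (k + 1)-mers across a set of reads and, for a given k-mer, lists the observed shorter substrings it contains and the observed longer substrings that contain it.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every substring of length k across all reads."""
    counts = Counter()
    for read in reads:
        # range() is empty when the read is shorter than k
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def build_levels(reads, k):
    """Return frequency tables for substring sizes k-1, k and k+1."""
    return {size: count_kmers(reads, size) for size in (k - 1, k, k + 1)}


def neighbours(kmer, levels):
    """Link a k-mer to the adjacent levels (illustrative assumption).

    Returns the observed (k-1)-mers obtained by dropping its last or
    first base, and the observed (k+1)-mers that contain it.
    """
    k = len(kmer)
    below = [s for s in (kmer[:-1], kmer[1:]) if s in levels[k - 1]]
    above = [s for s in levels[k + 1] if kmer in s]
    return below, above


# Toy reads, made up for illustration: five identical copies plus one
# copy with a single base deleted (a simulated deletion error).
reads = ["ACGTTGCAAGCT"] * 5 + ["ACGTTGAAGCT"]
levels = build_levels(reads, k=4)
below, above = neighbours("TTGA", levels)
```

In this toy input the 4-mer `TTGA` occurs only in the read with the deletion, so its count is low compared with the substrings shared by all six reads. Comparing such low-frequency entries against their neighbours at sizes k − 1 and k + 1 is one plausible way an indel could be noticed; how the published method actually scores and corrects reads is not stated in this record.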
Please use this identifier to cite or link to this item: http://hdl.handle.net/10453/154916