Aberration-corrected ultrafine analysis of miRNA reads at single-base resolution: a k-mer lattice approach.

Zhang, X; Ping, P; Hutvagner, G; Blumenstein, M; Li, J

Publisher: Oxford University Press
Publication Type: Journal Article
Citation: Nucleic Acids Research, 2021, 49, (18), pp. 1-16
Issue Date: 2021-07-21

This item is open access.

Full metadata record

Field	Value	Language
dc.contributor.author	Zhang, X https://orcid.org/0000-0002-3089-9809
dc.contributor.author	Ping, P
dc.contributor.author	Hutvagner, G https://orcid.org/0000-0002-7231-9446
dc.contributor.author	Blumenstein, M https://orcid.org/0000-0002-9908-3744
dc.contributor.author	Li, J https://orcid.org/0000-0003-1833-7413
dc.date.accessioned	2022-02-28T03:34:45Z
dc.date.available	2021-07-06
dc.date.available	2022-02-28T03:34:45Z
dc.date.issued	2021-07-21
dc.identifier.citation	Nucleic Acids Research, 2021, 49, (18), pp. 1-16
dc.identifier.issn	0305-1048
dc.identifier.issn	1362-4962
dc.identifier.uri	http://hdl.handle.net/10453/154916
dc.format	Print
dc.language	eng
dc.publisher	Oxford University Press
dc.relation	http://purl.org/au-research/grants/arc/DP180100120
dc.relation.ispartof	Nucleic Acids Research
dc.relation.isbasedon	10.1093/nar/gkab610
dc.rights	info:eu-repo/semantics/openAccess
dc.subject	05 Environmental Sciences, 06 Biological Sciences, 08 Information and Computing Sciences
dc.subject.classification	Developmental Biology
dc.subject.mesh	Algorithms
dc.subject.mesh	Animals
dc.subject.mesh	Computational Biology
dc.subject.mesh	Databases, Genetic
dc.subject.mesh	High-Throughput Nucleotide Sequencing
dc.subject.mesh	Humans
dc.subject.mesh	MicroRNAs
dc.subject.mesh	Salmon
dc.subject.mesh	Sequence Analysis, DNA
dc.title	Aberration-corrected ultrafine analysis of miRNA reads at single-base resolution: a k-mer lattice approach.
dc.type	Journal Article
utslib.citation.volume	49
utslib.location.activity	England
utslib.for	05 Environmental Sciences
utslib.for	06 Biological Sciences
utslib.for	08 Information and Computing Sciences
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Strength - CHT - Health Technologies
pubs.organisational-group	/University of Technology Sydney/Strength - AAI - Advanced Analytics Institute Research Centre
pubs.organisational-group	/University of Technology Sydney/Strength - AAII - Australian Artificial Intelligence Institute
pubs.organisational-group	/University of Technology Sydney/Strength - QSI - Centre for Quantum Software and Information
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology/School of Biomedical Engineering
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology/School of Computer Science
pubs.organisational-group	/University of Technology Sydney/Centre for Health Technologies (CHT)
utslib.copyright.status	open_access	*
pubs.consider-herdc	false
dc.date.updated	2022-02-28T03:34:42Z
pubs.issue	18
pubs.publication-status	Published
pubs.volume	49
utslib.citation.issue	18

Abstract:

Raw sequencing reads of miRNAs contain machine-made substitution errors, or even insertions and deletions (indels). Although the error rate can be as low as 0.1%, precise rectification of these errors is critically important because isoform variation analysis at single-base resolution, such as novel isomiR discovery, understanding of editing events, differential expression analysis, or tissue-specific isoform identification, is very sensitive to the base positions and copy counts of the reads. Existing error correction methods do not work for miRNA sequencing data because the length and per-read coverage properties of miRNA reads differ from those of DNA or mRNA sequencing reads. We present a novel lattice structure combining k-mers, (k – 1)-mers and (k + 1)-mers to address this problem. The method is particularly effective for the correction of indel errors. Extensive tests on datasets with known ground-truth errors demonstrate that the method is able to remove almost all of the errors, without introducing any new errors, improving the data quality from one error in every 50 reads to one error in every 1300 reads. Studies on experimental miRNA sequencing datasets show that the errors are often rectified at the 5′ ends and the seed regions of the reads, and that the correction brings remarkable changes in miRNA isoform abundance, volume of singleton reads, overall entropy, isomiR families, tissue-specific miRNAs, and rare-miRNA quantities.
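
The abstract does not spell out how the lattice is built or traversed, so the following Python sketch only illustrates the general intuition of counting substrings at three adjacent lengths. It is not the authors' algorithm. The function names, the `min_support` threshold, the choice of k = 8 and the toy 22-nt sequence are all assumptions made for illustration. The idea shown: a read carrying a one-base deletion yields a rare k-mer, and inserting one base into that k-mer recovers a (k + 1)-mer that is well supported by the other reads.

```python
from collections import Counter

BASES = "ACGU"

def count_mers(reads, size):
    """Count every substring of the given length across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - size + 1):
            counts[read[i:i + size]] += 1
    return counts

def build_lattice(reads, k):
    """Substring counts at the three adjacent levels k-1, k and k+1."""
    return {size: count_mers(reads, size) for size in (k - 1, k, k + 1)}

def one_base_insertions(mer):
    """All strings obtained by inserting a single base anywhere in mer."""
    return {mer[:i] + b + mer[i:] for i in range(len(mer) + 1) for b in BASES}

def one_base_deletions(mer):
    """All strings obtained by deleting a single base from mer."""
    return {mer[:i] + mer[i + 1:] for i in range(len(mer))}

def indel_candidates(kmer, lattice, k, min_support):
    """Well-supported neighbours of kmer one level up and one level down.

    min_support is a hypothetical cut-off, not a value from the paper.
    """
    longer = {m for m in one_base_insertions(kmer)
              if lattice[k + 1][m] >= min_support}
    shorter = {m for m in one_base_deletions(kmer)
               if lattice[k - 1][m] >= min_support}
    return longer, shorter

# Toy data: 20 error-free copies plus one read missing the base at index 9.
truth = "UGAGGUAGUAGGUUGUAUAGUU"
bad = truth[:9] + truth[10:]
reads = [truth] * 20 + [bad]

k = 8
lattice = build_lattice(reads, k)
suspect = bad[5:5 + k]  # a k-mer that spans the deletion site
longer, shorter = indel_candidates(suspect, lattice, k, min_support=10)

print(suspect, lattice[k][suspect])  # the rare k-mer and its count
print(longer, shorter)               # supported neighbours at k+1 and k-1
```

In this toy case the suspect k-mer occurs once, while its only well-supported neighbour sits at the (k + 1) level, which hints that the read lost a base. A k-mer produced by an inserted base would instead find its supported neighbour at the (k – 1) level.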

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/154916