How to do quantile normalization correctly for gene expression data analyses.

Zhao, Y; Wong, L; Goh, WWB

Publisher: Springer Science and Business Media LLC
Publication Type: Journal Article
Citation: Scientific reports, 2020, 10, (1), pp. 15534
Issue Date: 2020-09-23

This item is open access.

Full metadata record

Field	Value	Language
dc.contributor.author	Zhao, Y
dc.contributor.author	Wong, L
dc.contributor.author	Goh, WWB
dc.date.accessioned	2021-02-07T06:20:45Z
dc.date.available	2020-08-03
dc.date.available	2021-02-07T06:20:45Z
dc.date.issued	2020-09-23
dc.identifier.citation	Scientific reports, 2020, 10, (1), pp. 15534
dc.identifier.issn	2045-2322
dc.identifier.uri	http://hdl.handle.net/10453/145897
dc.description.abstract	Quantile normalization is an important normalization technique commonly used in high-dimensional data analysis. However, it is susceptible to class-effect proportion effects (the proportion of class-correlated variables in a dataset) and batch effects (the presence of potentially confounding technical variation) when applied blindly on whole datasets, resulting in higher false-positive and false-negative rates. We evaluate five strategies for performing quantile normalization, and demonstrate that good performance in terms of batch-effect correction and statistical feature selection can be readily achieved by first splitting data by sample class-labels before performing quantile normalization independently on each split ("Class-specific"). Via simulations with both real and simulated batch effects, we demonstrate that the "Class-specific" strategy (and others relying on similar principles) readily outperforms whole-data quantile normalization, and is robust, preserving useful signals even during the combined analysis of separately-normalized datasets. Quantile normalization is a commonly used procedure, but when carelessly applied on whole datasets without first considering class-effect proportion and batch effects, it can result in poor performance. If quantile normalization must be used, then we recommend using the "Class-specific" strategy.
dc.format	Electronic
dc.language	eng
dc.publisher	Springer Science and Business Media LLC
dc.relation.ispartof	Scientific reports
dc.relation.isbasedon	10.1038/s41598-020-72664-6
dc.rights	This is a post-peer-review, pre-copyedit version of an article published in Scientific reports. The final authenticated version is available online at: https://dx.doi.org/10.1038/s41598-020-72664-6.	en_US
dc.rights	info:eu-repo/semantics/openAccess
dc.subject.mesh	Humans
dc.subject.mesh	Data Interpretation, Statistical
dc.subject.mesh	Models, Statistical
dc.subject.mesh	Gene Expression Profiling
dc.subject.mesh	Transcriptome
dc.subject.mesh	Datasets as Topic
dc.subject.mesh	Data Analysis
dc.title	How to do quantile normalization correctly for gene expression data analyses.
dc.type	Journal Article
utslib.citation.volume	10
utslib.location.activity	England
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney
utslib.copyright.status	open_access	*
dc.date.updated	2021-02-07T06:20:16Z
pubs.issue	1
pubs.publication-status	Published
pubs.volume	10
utslib.citation.issue	1

Abstract:

Quantile normalization is an important normalization technique commonly used in high-dimensional data analysis. However, it is susceptible to class-effect proportion effects (the proportion of class-correlated variables in a dataset) and batch effects (the presence of potentially confounding technical variation) when applied blindly on whole datasets, resulting in higher false-positive and false-negative rates. We evaluate five strategies for performing quantile normalization, and demonstrate that good performance in terms of batch-effect correction and statistical feature selection can be readily achieved by first splitting data by sample class-labels before performing quantile normalization independently on each split ("Class-specific"). Via simulations with both real and simulated batch effects, we demonstrate that the "Class-specific" strategy (and others relying on similar principles) readily outperforms whole-data quantile normalization, and is robust, preserving useful signals even during the combined analysis of separately-normalized datasets. Quantile normalization is a commonly used procedure, but when carelessly applied on whole datasets without first considering class-effect proportion and batch effects, it can result in poor performance. If quantile normalization must be used, then we recommend using the "Class-specific" strategy.
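
The "Class-specific" idea described in the abstract (split samples by class label, then quantile-normalize each split independently) can be illustrated with a minimal sketch. This is not the authors' code: the function names `quantile_normalize` and `class_specific_quantile_normalize` are hypothetical, the matrix is assumed to be laid out as genes (rows) by samples (columns), and ties are broken by sort order rather than averaged, which a production implementation may handle differently.

```python
import numpy as np


def quantile_normalize(x):
    """Whole-matrix quantile normalization of a genes x samples array.

    Every sample (column) is forced onto the same reference distribution:
    the mean of the sorted columns. Ties are broken by sort order
    (a simplification assumed for this sketch).
    """
    x = np.asarray(x, dtype=float)
    # order[i, j] = row index of the i-th smallest value in sample j
    order = np.argsort(x, axis=0, kind="stable")
    # reference[i] = mean of the i-th smallest values across all samples
    reference = np.sort(x, axis=0).mean(axis=1)
    out = np.empty_like(x)
    # Write reference[i] back to the position that held rank i in each sample
    np.put_along_axis(out, order, reference[:, None], axis=0)
    return out


def class_specific_quantile_normalize(x, labels):
    """Quantile-normalize each sample class separately (hypothetical helper).

    `labels` holds one class label per sample (column) of `x`.
    """
    x = np.asarray(x, dtype=float)
    labels = np.asarray(labels)
    if labels.shape[0] != x.shape[1]:
        raise ValueError("need exactly one class label per sample (column)")
    out = np.empty_like(x)
    for cls in np.unique(labels):
        cols = labels == cls  # boolean mask selecting this class's samples
        out[:, cols] = quantile_normalize(x[:, cols])
    return out


if __name__ == "__main__":
    # Toy data: 50 genes, 3 samples per class, class B shifted upwards
    rng = np.random.default_rng(0)
    data = rng.normal(size=(50, 6))
    data[:, 3:] += 5.0
    classes = ["A", "A", "A", "B", "B", "B"]
    normalized = class_specific_quantile_normalize(data, classes)
    print(normalized.shape)
```

In this sketch, samples within a class end up sharing one distribution, while the distributions of different classes are left free to differ. Whole-data normalization (`quantile_normalize` applied to all columns at once) would instead force every sample, regardless of class, onto a single distribution.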

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/145897