A hierarchical model to detect differential gene expression distributions, and their investigation as a reflection of dysregulation in cancer

Publication Type: Thesis
Issue Date: 2021
Data from genome-wide gene expression studies provide a wealth of information on diseases such as cancer, which can lead to insights into disease mechanisms and advances in diagnosis and treatment. Analysis of expression data is most commonly aimed at identifying genes whose mean expression levels are increased or decreased in disease compared to normal tissue, or between disease subtypes; this is known as differential expression analysis. However, there is strong evidence that changes in the variability of gene expression, without a difference in mean, can also be relevant. Genes related to cancer have been shown to differ in expression variability between normal and tumour tissue, and these differentially variable genes have also been found to be informative for diagnostic and prognostic cancer classification. This thesis addresses several aspects of research on differential gene expression variability, and the broader concept of differential distribution, defined as any difference in the distribution of expression values between groups.
This work makes three contributions to knowledge, relating to cancer classification, identification of differentially variable or distributed genes, and the biology of differential variability and distribution in cancer. Contribution 1 extends previous work by demonstrating that genes identified by differential variability or distribution can be used to classify closely related cancer subtypes, rather than only for diagnostic or prognostic classification. Contribution 2 is a Bayesian hierarchical model for RNA-seq data that provides tests for differential expression, variability and distribution. The performance of each test is compared with existing methods on simulated data and on real RNA-seq datasets modified to artificially introduce changes in expression between groups. The differential expression test is competitive with state-of-the-art methods, and the differential variability test improves on existing methods, particularly for small sample sizes. The differential distribution test is the first such test available for RNA-seq data. Contribution 3 builds on previous work by providing the first clear demonstration that differential variability and differential distribution analyses can identify cancer-related genes, and that differential expression and differential variability identify distinct sets of cancer-related genes, each with different biological functions.
Overall, this research confirms and extends previous findings showing that changes in expression variability and distribution in cancer are both of biological significance and informative for classification. As well as further demonstrating the need to look beyond differential expression to a comprehensive assessment of changes in gene expression distributions, this work provides a method that enables the identification of these differentially distributed genes.