Identifying genetic marker sets associated with phenotypes via an efficient adaptive score test

Cai, T; Lin, X; Carroll, RJ

Publication Type: Journal Article
Citation: Biostatistics, 2012, 13 (4), pp. 776-790
Issue Date: 2012-09-01

Closed Access

Filename	Description	Size
kxs015.pdf	Published Version	450.16 kB

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Cai, T	en_US
dc.contributor.author	Lin, X	en_US
dc.contributor.author	Carroll, RJ	en_US
dc.date.issued	2012-09-01	en_US
dc.identifier.citation	Biostatistics, 2012, 13 (4), pp. 776 - 790	en_US
dc.identifier.issn	1465-4644	en_US
dc.identifier.uri	http://hdl.handle.net/10453/114926
dc.relation.ispartof	Biostatistics	en_US
dc.relation.isbasedon	10.1093/biostatistics/kxs015	en_US
dc.subject.classification	Statistics & Probability	en_US
dc.subject.mesh	Humans	en_US
dc.subject.mesh	Breast Neoplasms	en_US
dc.subject.mesh	Genetic Markers	en_US
dc.subject.mesh	Data Interpretation, Statistical	en_US
dc.subject.mesh	Phenotype	en_US
dc.subject.mesh	Polymorphism, Single Nucleotide	en_US
dc.subject.mesh	Computer Simulation	en_US
dc.subject.mesh	Female	en_US
dc.subject.mesh	Receptor, Fibroblast Growth Factor, Type 2	en_US
dc.title	Identifying genetic marker sets associated with phenotypes via an efficient adaptive score test	en_US
dc.type	Journal Article
utslib.citation.volume	4	en_US
utslib.citation.volume	13	en_US
utslib.for	0104 Statistics	en_US
utslib.for	0604 Genetics	en_US
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Science
pubs.organisational-group	/University of Technology Sydney/Faculty of Science/School of Mathematical and Physical Sciences
utslib.copyright.status	closed_access
pubs.issue	4	en_US
pubs.publication-status	Published	en_US
pubs.volume	13	en_US

Abstract:

In recent years, genome-wide association studies (GWAS) and gene-expression profiling have generated a large number of valuable datasets for assessing how genetic variations are related to disease outcomes. With such datasets, it is often of interest to assess the overall effect of a set of genetic markers, assembled based on biological knowledge. Genetic marker-set analyses have been advocated as more reliable and powerful approaches compared with the traditional marginal approaches (Curtis and others, 2005. Pathways to the analysis of microarray data. TRENDS in Biotechnology 23, 429-435; Efroni and others, 2007. Identification of key processes underlying cancer phenotypes using biologic pathway analysis. PLoS One 2, 425). Procedures for testing the overall effect of a marker-set have been actively studied in recent years. For example, score tests derived under an Empirical Bayes (EB) framework (Liu and others, 2007. Semiparametric regression of multidimensional genetic pathway data: least-squares kernel machines and linear mixed models. Biometrics 63, 1079-1088; Liu and others, 2008. Estimation and testing for the effect of a genetic pathway on a disease outcome using logistic kernel machine regression via logistic mixed models. BMC bioinformatics 9, 292-2; Wu and others, 2010. Powerful SNP-set analysis for case-control genome-wide association studies. American Journal of Human Genetics 86, 929) have been proposed as powerful alternatives to the standard Rao score test (Rao, 1948. Large sample tests of statistical hypotheses concerning several parameters with applications to problems of estimation. Mathematical Proceedings of the Cambridge Philosophical Society, 44, 50-57). The advantages of these EB-based tests are most apparent when the markers are correlated, due to the reduction in the degrees of freedom. 
In this paper, we propose an adaptive score test which up- or down-weights the contributions from each member of the marker-set based on the Z-scores of their effects. Such an adaptive procedure gains power over the existing procedures when the signal is sparse and the correlation among the markers is weak. By combining evidence from both the EB-based score test and the adaptive test, we further construct an omnibus test that attains good power in most settings. The null distributions of the proposed test statistics can be approximated well either via simple perturbation procedures or via distributional approximations. Through extensive simulation studies, we demonstrate that the proposed procedures perform well in finite samples. We apply the tests to a breast cancer genetic study to assess the overall effect of the FGFR2 gene on breast cancer risk. © 2012 The Author.
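
Illustrative sketch (not taken from the paper): the abstract describes weighting each marker's contribution by its Z-score and approximating the null distribution by perturbation. The Python code below shows one way such a test could look. Everything specific in it is an assumption made for illustration: the function name `adaptive_score_test`, the intercept-only null model, the weight function |Z_j|^gamma, and the Gaussian multiplier perturbation. The published statistic and its weights may differ.

```python
import numpy as np


def adaptive_score_test(X, y, gamma=1.0, n_perturb=2000, seed=0):
    """Hypothetical adaptive score test for a marker set.

    X: (n, p) marker matrix with no constant columns; y: (n,) outcome.
    Assumes an intercept-only null model (no covariates).
    Returns the observed statistic and a perturbation p-value.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)

    # Null residuals and centered markers.
    resid = y - y.mean()
    Xc = X - X.mean(axis=0)

    # Per-subject score contributions; column sums are the marginal scores.
    contrib = Xc * resid[:, None]                 # shape (n, p)
    sd = np.sqrt((contrib ** 2).sum(axis=0))      # empirical score SDs

    def statistic(z):
        # Up-weight markers with large |Z|, down-weight those near zero.
        return np.sum(np.abs(z) ** gamma * z ** 2, axis=-1)

    z_obs = contrib.sum(axis=0) / sd
    t_obs = statistic(z_obs)

    # Perturbation: multiply each subject's contribution by an independent
    # N(0, 1) draw, which preserves the correlation among the markers.
    G = rng.standard_normal((n_perturb, X.shape[0]))
    z_star = (G @ contrib) / sd                   # shape (n_perturb, p)
    t_star = statistic(z_star)

    p_value = (1 + np.sum(t_star >= t_obs)) / (1 + n_perturb)
    return t_obs, p_value
```

With `gamma=0` the statistic reduces to an unweighted sum of squared Z-scores, so `gamma` controls how strongly the sketch favours sparse signals. An omnibus test of the kind mentioned in the abstract would combine such a statistic with an EB-based score statistic; that step is not sketched here.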

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/114926