A Two-Sample Test for Equality of Means in High Dimension

Gregory, KB; Carroll, RJ; Baladandayuthapani, V; Lahiri, SN

Publication Type: Journal Article
Citation: Journal of the American Statistical Association, 2015, 110 (510), pp. 837-849
Issue Date: 2015-04-03

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Gregory, KB	en_US
dc.contributor.author	Carroll, RJ	en_US
dc.contributor.author	Baladandayuthapani, V	en_US
dc.contributor.author	Lahiri, SN	en_US
dc.date.issued	2015-04-03	en_US
dc.identifier.citation	Journal of the American Statistical Association, 2015, 110 (510), pp. 837 - 849	en_US
dc.identifier.issn	0162-1459	en_US
dc.identifier.uri	http://hdl.handle.net/10453/118356
dc.description.abstract	© 2015 American Statistical Association. We develop a test statistic for testing the equality of two population mean vectors in the “large-p-small-n” setting. Such a test must surmount the rank-deficiency of the sample covariance matrix, which breaks down the classic Hotelling T² test. The proposed procedure, called the generalized component test, avoids full estimation of the covariance matrix by assuming that the p components admit a logical ordering such that the dependence between components is related to their displacement. The test is shown to be competitive with other recently developed methods under ARMA and long-range dependence structures and to achieve superior power for heavy-tailed data. The test does not assume equality of covariance matrices between the two populations, is robust to heteroscedasticity in the component variances, and requires very little computation time, which allows its use in settings with very large p. An analysis of mitochondrial calcium concentration in mouse cardiac muscles over time and of copy number variations in a glioblastoma multiforme dataset from The Cancer Genome Atlas are carried out to illustrate the test. Supplementary materials for this article are available online.	en_US
dc.relation.ispartof	Journal of the American Statistical Association	en_US
dc.relation.isbasedon	10.1080/01621459.2014.934826	en_US
dc.subject.classification	Statistics & Probability	en_US
dc.title	A Two-Sample Test for Equality of Means in High Dimension	en_US
dc.type	Journal Article
utslib.citation.volume	510	en_US
utslib.citation.volume	110	en_US
utslib.for	0104 Statistics	en_US
utslib.for	1403 Econometrics	en_US
utslib.for	1603 Demography	en_US
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Science
pubs.organisational-group	/University of Technology Sydney/Faculty of Science/School of Mathematical and Physical Sciences
utslib.copyright.status	closed_access
pubs.issue	510	en_US
pubs.publication-status	Published	en_US
pubs.volume	110	en_US

Abstract:

© 2015 American Statistical Association. We develop a test statistic for testing the equality of two population mean vectors in the “large-p-small-n” setting. Such a test must surmount the rank-deficiency of the sample covariance matrix, which breaks down the classic Hotelling T² test. The proposed procedure, called the generalized component test, avoids full estimation of the covariance matrix by assuming that the p components admit a logical ordering such that the dependence between components is related to their displacement. The test is shown to be competitive with other recently developed methods under ARMA and long-range dependence structures and to achieve superior power for heavy-tailed data. The test does not assume equality of covariance matrices between the two populations, is robust to heteroscedasticity in the component variances, and requires very little computation time, which allows its use in settings with very large p. An analysis of mitochondrial calcium concentration in mouse cardiac muscles over time and of copy number variations in a glioblastoma multiforme dataset from The Cancer Genome Atlas are carried out to illustrate the test. Supplementary materials for this article are available online.
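
The rank-deficiency mentioned in the abstract can be illustrated with a small simulation. The sketch below is not the generalized component test and is not taken from the article. It only shows, under assumed sample sizes (n1 = 4, n2 = 5), an assumed dimension (p = 20), and simulated standard normal data, that a pooled sample covariance matrix has rank at most n1 + n2 - 2, so it cannot be inverted once p exceeds that bound. The helper names scatter_matrix, pooled_covariance, and matrix_rank are hypothetical and exist only for this illustration.

```python
import random


def scatter_matrix(rows):
    """Return the p x p sum of outer products of the mean-centered rows."""
    n, p = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in rows]
    return [[sum(c[i] * c[j] for c in centered) for j in range(p)]
            for i in range(p)]


def pooled_covariance(x, y):
    """Pooled sample covariance of two samples sharing the same p columns."""
    sx, sy = scatter_matrix(x), scatter_matrix(y)
    dof = len(x) + len(y) - 2
    p = len(sx)
    return [[(sx[i][j] + sy[i][j]) / dof for j in range(p)] for i in range(p)]


def matrix_rank(matrix, tol=1e-9):
    """Numerical rank via Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n_rows, n_cols = len(a), len(a[0])
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        pivot = max(range(rank, n_rows), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < tol:
            continue  # no usable pivot in this column
        a[rank], a[pivot] = a[pivot], a[rank]
        for r in range(rank + 1, n_rows):
            factor = a[r][col] / a[rank][col]
            for c in range(col, n_cols):
                a[r][c] -= factor * a[rank][c]
        rank += 1
    return rank


# Assumed toy setting: far more components than observations.
random.seed(0)
n1, n2, p = 4, 5, 20
x = [[random.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n1)]
y = [[random.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n2)]

pooled = pooled_covariance(x, y)
rank = matrix_rank(pooled)
print(f"p = {p}, rank of pooled covariance = {rank}")
```

Because the classic Hotelling T² statistic requires the inverse of the pooled covariance matrix, a rank below p means the statistic is undefined in this setting, which is the obstacle the abstract describes.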

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/118356