Empirical performance of cross-validation with oracle methods in a genomics context

Martinez, JG; Carroll, RJ; Müller, S; Sampson, JN; Chatterjee, N

Publication Type: Journal Article
Citation: American Statistician, 2011, 65 (4), pp. 223 - 228
Issue Date: 2011-11-01

Closed Access

	Filename	Description	Size	Format
	Empirical Performance of Cross Validation With Oracle Methods in a Genomics Context.pdf	Published Version	1.89 MB	Adobe PDF

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Martinez, JG	en_US
dc.contributor.author	Carroll, RJ	en_US
dc.contributor.author	Müller, S	en_US
dc.contributor.author	Sampson, JN	en_US
dc.contributor.author	Chatterjee, N	en_US
dc.date.issued	2011-11-01	en_US
dc.identifier.citation	American Statistician, 2011, 65 (4), pp. 223 - 228	en_US
dc.identifier.issn	0003-1305	en_US
dc.identifier.uri	http://hdl.handle.net/10453/117682
dc.description.abstract	When employing model selection methods with oracle properties such as the smoothly clipped absolute deviation (SCAD) and the Adaptive Lasso, it is typical to estimate the smoothing parameter by m-fold cross-validation, for example, m = 10. In problems where the true regression function is sparse and the signals large, such cross-validation typically works well. However, in regression modeling of genomic studies involving Single Nucleotide Polymorphisms (SNPs), the true regression functions, while thought to be sparse, do not have large signals. We demonstrate empirically that in such problems, the number of selected variables using SCAD and the Adaptive Lasso, with 10-fold cross-validation, is a random variable that has considerable and surprising variation. Similar remarks apply to nonoracle methods such as the Lasso. Our study strongly questions the suitability of performing only a single run of m-fold cross-validation with any oracle method, and not just the SCAD and Adaptive Lasso. © 2011 American Statistical Association.	en_US
dc.relation.ispartof	American Statistician	en_US
dc.relation.isbasedon	10.1198/tas.2011.11052	en_US
dc.subject.classification	Statistics & Probability	en_US
dc.title	Empirical performance of cross-validation with oracle methods in a genomics context	en_US
dc.type	Journal Article
utslib.citation.volume	4	en_US
utslib.citation.volume	65	en_US
utslib.for	0104 Statistics	en_US
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Science
pubs.organisational-group	/University of Technology Sydney/Faculty of Science/School of Mathematical and Physical Sciences
utslib.copyright.status	closed_access
pubs.issue	4	en_US
pubs.publication-status	Published	en_US
pubs.volume	65	en_US

Abstract:

When employing model selection methods with oracle properties such as the smoothly clipped absolute deviation (SCAD) and the Adaptive Lasso, it is typical to estimate the smoothing parameter by m-fold cross-validation, for example, m = 10. In problems where the true regression function is sparse and the signals large, such cross-validation typically works well. However, in regression modeling of genomic studies involving Single Nucleotide Polymorphisms (SNPs), the true regression functions, while thought to be sparse, do not have large signals. We demonstrate empirically that in such problems, the number of selected variables using SCAD and the Adaptive Lasso, with 10-fold cross-validation, is a random variable that has considerable and surprising variation. Similar remarks apply to nonoracle methods such as the Lasso. Our study strongly questions the suitability of performing only a single run of m-fold cross-validation with any oracle method, and not just the SCAD and Adaptive Lasso. © 2011 American Statistical Association.
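
The effect described in the abstract can be illustrated with a minimal sketch. It is not the authors' code or experimental setup: it uses the plain Lasso (the nonoracle method the abstract mentions) rather than SCAD or the Adaptive Lasso, a small hand-written coordinate-descent solver, and simulated data with a few weak signals. The sample size, coefficient values, penalty grid, and all function names are assumptions chosen for illustration only. The same data set is passed through 10-fold cross-validation several times with different random fold assignments, and the number of selected variables is recorded for each run.

```python
import random

def soft_threshold(z, t):
    """Soft-thresholding operator used by the Lasso coordinate update."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def lasso_cd(X, y, lam, n_iter=30):
    """Coordinate descent for (1/2n)*||y - X b||^2 + lam*||b||_1, no intercept."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X beta, kept up to date
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the partial residual
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta

def cv_select(X, y, lams, m, rng):
    """One run of m-fold CV: pick lam by held-out squared error, then refit
    on all data and return (chosen lam, number of nonzero coefficients)."""
    n = len(X)
    idx = list(range(n))
    rng.shuffle(idx)  # the random fold assignment is the only source of variation
    folds = [idx[k::m] for k in range(m)]
    best_lam, best_err = None, float("inf")
    for lam in lams:
        err = 0.0
        for fold in folds:
            held = set(fold)
            X_tr = [X[i] for i in range(n) if i not in held]
            y_tr = [y[i] for i in range(n) if i not in held]
            beta = lasso_cd(X_tr, y_tr, lam)
            for i in fold:
                pred = sum(b * x for b, x in zip(beta, X[i]))
                err += (y[i] - pred) ** 2
        if err < best_err:
            best_lam, best_err = lam, err
    beta = lasso_cd(X, y, best_lam)
    return best_lam, sum(1 for b in beta if b != 0.0)

def simulate(n, p, rng):
    """Hypothetical sparse model with three weak signals and unit-variance noise."""
    true_beta = [0.15, 0.15, -0.15] + [0.0] * (p - 3)
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    y = [sum(b * x for b, x in zip(true_beta, row)) + rng.gauss(0, 1) for row in X]
    return X, y

X, y = simulate(60, 8, random.Random(0))
lams = [0.02, 0.05, 0.1, 0.15, 0.2, 0.3]  # assumed penalty grid
# Same data, five independent 10-fold CV runs with different fold splits
counts = [cv_select(X, y, lams, 10, random.Random(seed))[1] for seed in range(5)]
print("selected variables per CV run:", counts)
```

If the printed counts differ across runs, that difference comes entirely from the fold assignment, since the data are held fixed. One way to act on the abstract's caution against a single run would be to repeat the cross-validation and inspect the spread of the counts before reporting a model; this sketch makes no claim about how large that spread is in real SNP data.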

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/117682