Data integration with high dimensionality

Gao, X; Carroll, RJ

Publication Type: Journal Article
Citation: Biometrika, 2017, 104 (2), pp. 251 - 272
Issue Date: 2017-06-01

Closed Access

	Filename	Description	Size	Format
	asx023.pdf	Published Version	342.84 kB	Adobe PDF

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Gao, X	en_US
dc.contributor.author	Carroll, RJ https://orcid.org/0000-0002-5465-9682	en_US
dc.date.issued	2017-06-01	en_US
dc.identifier.citation	Biometrika, 2017, 104 (2), pp. 251 - 272	en_US
dc.identifier.issn	0006-3444	en_US
dc.identifier.uri	http://hdl.handle.net/10453/124106
dc.description.abstract	© 2017 Biometrika Trust. We consider situations where the data consist of a number of responses for each individual, which may include a mix of discrete and continuous variables. The data also include a class of predictors, where the same predictor may have different physical measurements across different experiments depending on how the predictor is measured. The goal is to select which predictors affect any of the responses, where the number of such informative predictors tends to infinity as the sample size increases. There are marginal likelihoods for each experiment; we specify a pseudolikelihood combining the marginal likelihoods, and propose a pseudolikelihood information criterion. Under regularity conditions, we establish selection consistency for this criterion with unbounded true model size. The proposed method includes a Bayesian information criterion with appropriate penalty term as a special case. Simulations indicate that data integration can dramatically improve upon using only one data source.	en_US
dc.relation.ispartof	Biometrika	en_US
dc.relation.isbasedon	10.1093/biomet/asx023	en_US
dc.subject.classification	Statistics & Probability	en_US
dc.title	Data integration with high dimensionality	en_US
dc.type	Journal Article
utslib.citation.volume	104	en_US
utslib.for	0104 Statistics	en_US
utslib.for	1403 Econometrics	en_US
utslib.for	0103 Numerical and Computational Mathematics	en_US
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Science
pubs.organisational-group	/University of Technology Sydney/Faculty of Science/School of Mathematical and Physical Sciences
utslib.copyright.status	closed_access
pubs.issue	2	en_US
pubs.publication-status	Published	en_US
pubs.volume	104	en_US

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/124106