Clustering in general measurement error models

Su, Y; Reedy, J; Carroll, RJ

Publication Type: Journal Article
Citation: Statistica Sinica, 2018, 28 (4), pp. 2337 - 2351
Issue Date: 2018-10-01

This item is open access.

Full metadata record

Field	Value	Language
dc.contributor.author	Su, Y	en_US
dc.contributor.author	Reedy, J	en_US
dc.contributor.author	Carroll, RJ	en_US
dc.date.issued	2018-10-01	en_US
dc.identifier.citation	Statistica Sinica, 2018, 28 (4), pp. 2337 - 2351	en_US
dc.identifier.issn	1017-0405	en_US
dc.identifier.uri	http://hdl.handle.net/10453/134046
dc.description.abstract	© Institute of Statistical Science. All rights reserved. This paper is dedicated to the memory of Peter G. Hall. It concerns a deceptively simple question: if one observes variables corrupted with measurement error of possibly very complex form, can one recreate asymptotically the clusters that would have been found had there been no measurement error? We show that the answer is yes, and that the solution is surprisingly simple and general. The method itself is to simulate, by computer, realizations with the same distribution as that of the true variables, and then to apply clustering to these realizations. Technically, we show that if one uses K-means clustering or any other risk minimizing clustering, and a multivariate deconvolution device with certain smoothness and convergence properties, then, in the limit, the cluster means based on our method converge to the same cluster means as if there were no measurement error. Along with the method and its technical justification, we analyze two important nutrition data sets, finding patterns that make sense nutritionally.	en_US
dc.relation.ispartof	Statistica Sinica	en_US
dc.relation.isbasedon	10.5705/ss.202017.0093	en_US
dc.subject.classification	Statistics & Probability	en_US
dc.title	Clustering in general measurement error models	en_US
dc.type	Journal Article
utslib.citation.volume	4	en_US
utslib.citation.volume	28	en_US
utslib.for	0104 Statistics	en_US
utslib.for	0199 Other Mathematical Sciences	en_US
utslib.for	0801 Artificial Intelligence and Image Processing	en_US
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Science
pubs.organisational-group	/University of Technology Sydney/Faculty of Science/School of Mathematical and Physical Sciences
utslib.copyright.status	open_access	*
pubs.issue	4	en_US
pubs.publication-status	Published	en_US
pubs.volume	28	en_US

Abstract:

© Institute of Statistical Science. All rights reserved. This paper is dedicated to the memory of Peter G. Hall. It concerns a deceptively simple question: if one observes variables corrupted with measurement error of possibly very complex form, can one recreate asymptotically the clusters that would have been found had there been no measurement error? We show that the answer is yes, and that the solution is surprisingly simple and general. The method itself is to simulate, by computer, realizations with the same distribution as that of the true variables, and then to apply clustering to these realizations. Technically, we show that if one uses K-means clustering or any other risk minimizing clustering, and a multivariate deconvolution device with certain smoothness and convergence properties, then, in the limit, the cluster means based on our method converge to the same cluster means as if there were no measurement error. Along with the method and its technical justification, we analyze two important nutrition data sets, finding patterns that make sense nutritionally.
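
The simulate-then-cluster idea described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation. It assumes a far simpler setting than the paper covers: one dimension, classical additive error W = X + U with U normal and its standard deviation sigma_u known, and a parametric Gaussian-mixture fit standing in for the multivariate deconvolution device. All function names, parameter values, and the simulated data are hypothetical and chosen only for illustration.

```python
import math
import random


def kmeans_1d(xs, k, iters=100):
    """Lloyd's K-means in one dimension, started from evenly spaced quantiles."""
    s = sorted(xs)
    centers = [s[int((i + 0.5) * len(s) / k)] for i in range(k)]
    for _ in range(iters):
        sums, counts = [0.0] * k, [0] * k
        for x in xs:
            j = min(range(k), key=lambda c: (x - centers[c]) ** 2)
            sums[j] += x
            counts[j] += 1
        new = [sums[j] / counts[j] if counts[j] else centers[j] for j in range(k)]
        if new == centers:
            break
        centers = new
    return sorted(centers)


def normal_pdf(x, mu, var):
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2 * math.pi * var)


def fit_mixture_em(ws, k, iters=100):
    """EM fit of a k-component Gaussian mixture to the error-prone data W."""
    n = len(ws)
    s = sorted(ws)
    mean = sum(ws) / n
    weights = [1.0 / k] * k
    mus = [s[int((i + 0.5) * n / k)] for i in range(k)]
    variances = [sum((w - mean) ** 2 for w in ws) / n] * k
    for _ in range(iters):
        nj, sw, sww = [0.0] * k, [0.0] * k, [0.0] * k
        for w in ws:
            dens = [weights[j] * normal_pdf(w, mus[j], variances[j]) for j in range(k)]
            total = sum(dens) or 1e-300
            for j in range(k):
                r = dens[j] / total  # responsibility of component j for w
                nj[j] += r
                sw[j] += r * w
                sww[j] += r * w * w
        for j in range(k):
            m = max(nj[j], 1e-12)  # guard against an empty component
            weights[j] = nj[j] / n
            mus[j] = sw[j] / m
            variances[j] = max(sww[j] / m - mus[j] ** 2, 1e-6)
    return weights, mus, variances


def simulate_deconvolved(ws, sigma_u, k, n_draws, rng):
    """Draw pseudo-values whose distribution approximates that of the true X."""
    weights, mus, variances = fit_mixture_em(ws, k)
    # Assumption: W = X + U with U ~ N(0, sigma_u^2) independent of X, so each
    # component variance of X is the fitted variance of W minus sigma_u^2.
    x_vars = [max(v - sigma_u ** 2, 1e-3) for v in variances]
    comps = rng.choices(range(k), weights=weights, k=n_draws)
    return [rng.gauss(mus[j], math.sqrt(x_vars[j])) for j in comps]


def demo(seed=0, n=2000, sigma_u=1.5, k=2):
    rng = random.Random(seed)
    # Hypothetical truth: two groups centered at -2 and 2.
    xs = [rng.choice([-2.0, 2.0]) + rng.gauss(0.0, 0.5) for _ in range(n)]
    ws = [x + rng.gauss(0.0, sigma_u) for x in xs]  # what is actually observed
    naive = kmeans_1d(ws, k)  # clustering the error-prone data directly
    corrected = kmeans_1d(simulate_deconvolved(ws, sigma_u, k, n, rng), k)
    return naive, corrected


if __name__ == "__main__":
    naive, corrected = demo()
    print("K-means centers from W:", naive)
    print("K-means centers from simulated X:", corrected)
```

In this sketch the clustering step never sees the true X. It runs on draws from an estimate of the distribution of X, which mirrors the two-stage structure the abstract describes. The general result in the paper concerns deconvolution estimators with suitable smoothness and convergence properties, not the parametric shortcut used here.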

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/134046