Bayesian model-based clustering procedures

Lau, JW; Green, PJ

Publication Type:: Journal Article
Citation:: Journal of Computational and Graphical Statistics, 2007, 16 (3), pp. 526 - 558
Issue Date:: 2007-09-01

Closed Access

	Filename	Description	Size	Format
	2010002256OK.pdf		2.77 MB	Adobe PDF

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Lau, JW	en_US
dc.contributor.author	Green, PJ https://orcid.org/0000-0002-4367-4756	en_US
dc.date.issued	2007-09-01	en_US
dc.identifier.citation	Journal of Computational and Graphical Statistics, 2007, 16 (3), pp. 526 - 558	en_US
dc.identifier.issn	1061-8600	en_US
dc.identifier.uri	http://hdl.handle.net/10453/14467
dc.description.abstract	This article establishes a general formulation for Bayesian model-based clustering, in which subset labels are exchangeable, and items are also exchangeable, possibly up to covariate effects. The notational framework is rich enough to encompass a variety of existing procedures, including some recently discussed methods involving stochastic search or hierarchical clustering, but more importantly allows the formulation of clustering procedures that are optimal with respect to a specified loss function. Our focus is on loss functions based on pairwise coincidences, that is, whether pairs of items are clustered into the same subset or not. Optimization of the posterior expected loss function can be formulated as a binary integer programming problem, which can be readily solved by standard software when clustering a modest number of items, but quickly becomes impractical as problem scale increases. To combat this, a new heuristic item-swapping algorithm is introduced. This performs well in our numerical experiments, on both simulated and real data examples. The article includes a comparison of the statistical performance of the (approximate) optimal clustering with earlier methods that are model-based but ad hoc in their detailed definition. © 2007 American Statistical Association, Institute of Mathematical Statistics, and Interface Foundation of North America.	en_US
dc.relation.ispartof	Journal of Computational and Graphical Statistics	en_US
dc.relation.isbasedon	10.1198/106186007X238855	en_US
dc.subject.classification	Statistics & Probability	en_US
dc.title	Bayesian model-based clustering procedures	en_US
dc.type	Journal Article
utslib.citation.volume	3	en_US
utslib.citation.volume	16	en_US
utslib.for	0104 Statistics	en_US
utslib.for	1403 Econometrics	en_US
dc.location.activity	ISI:000249591000002	en_US
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Science
pubs.organisational-group	/University of Technology Sydney/Faculty of Science/School of Mathematical and Physical Sciences
utslib.copyright.status	closed_access
pubs.issue	3	en_US
pubs.publication-status	Published	en_US
pubs.volume	16	en_US

Abstract:

This article establishes a general formulation for Bayesian model-based clustering, in which subset labels are exchangeable, and items are also exchangeable, possibly up to covariate effects. The notational framework is rich enough to encompass a variety of existing procedures, including some recently discussed methods involving stochastic search or hierarchical clustering, but more importantly allows the formulation of clustering procedures that are optimal with respect to a specified loss function. Our focus is on loss functions based on pairwise coincidences, that is, whether pairs of items are clustered into the same subset or not. Optimization of the posterior expected loss function can be formulated as a binary integer programming problem, which can be readily solved by standard software when clustering a modest number of items, but quickly becomes impractical as problem scale increases. To combat this, a new heuristic item-swapping algorithm is introduced. This performs well in our numerical experiments, on both simulated and real data examples. The article includes a comparison of the statistical performance of the (approximate) optimal clustering with earlier methods that are model-based but ad hoc in their detailed definition. © 2007 American Statistical Association, Institute of Mathematical Statistics, and Interface Foundation of North America.
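
Illustrative sketch (not taken from the article): the abstract describes choosing a clustering that minimizes the posterior expected value of a loss built from pairwise coincidences. The Python below is a minimal sketch of that idea under stated assumptions. It assumes posterior samples of cluster labels are already available (for example from an MCMC run), and it assumes a Binder-style pairwise loss with hypothetical cost parameters `a` (splitting a pair that belongs together) and `b` (merging a pair that belongs apart). The search is a plain greedy reassignment of one item at a time; it is neither the binary integer programming formulation nor the item-swapping heuristic introduced in the article. All function names are hypothetical.

```python
from itertools import combinations


def coincidence_probabilities(samples):
    """Estimate P(items i and j share a cluster) from sampled label vectors."""
    n = len(samples[0])
    counts = [[0] * n for _ in range(n)]
    for labels in samples:
        for i, j in combinations(range(n), 2):
            if labels[i] == labels[j]:
                counts[i][j] += 1
                counts[j][i] += 1
    m = len(samples)
    return [[c / m for c in row] for row in counts]


def expected_loss(labels, rho, a=1.0, b=1.0):
    """Posterior expected pairwise loss of a proposed clustering (assumed form)."""
    total = 0.0
    for i, j in combinations(range(len(labels)), 2):
        if labels[i] == labels[j]:
            total += b * (1.0 - rho[i][j])  # merged, but may truly be apart
        else:
            total += a * rho[i][j]          # split, but may truly be together
    return total


def greedy_clustering(rho, a=1.0, b=1.0, max_sweeps=100):
    """Greedy local search: move one item at a time while the loss decreases."""
    n = len(rho)
    labels = list(range(n))  # start from all singletons
    best = expected_loss(labels, rho, a, b)
    for _ in range(max_sweeps):
        improved = False
        for i in range(n):
            current = labels[i]
            # Candidates: every label in use plus one fresh label.
            for cand in set(labels) | {max(labels) + 1}:
                if cand == current:
                    continue
                labels[i] = cand
                loss = expected_loss(labels, rho, a, b)
                if loss < best - 1e-12:
                    best, current, improved = loss, cand, True
            labels[i] = current  # keep the best label found for item i
        if not improved:
            break
    return labels, best


# Toy usage with four items and four hypothetical posterior samples.
samples = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 0, 1], [0, 0, 1, 1]]
rho = coincidence_probabilities(samples)
labels, best = greedy_clustering(rho)
print(labels, best)
```

Each candidate move recomputes the full loss, so a sweep costs on the order of n^4 pair evaluations in the worst case. That is acceptable for a toy example only, and a greedy search of this kind can stop at a local optimum.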

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/14467