Anchored Bayesian Gaussian Mixture Models

Kunkel, Deborah Elizabeth

Abstract Details

2018, Doctor of Philosophy, Ohio State University, Statistics.
Finite Bayesian mixture models are often used to describe data arising from a heterogeneous population. If information is available about the differences among the groups represented by the mixture components, the model should explicitly incorporate that knowledge through informative distributional assumptions. If no such information is available, however, it is common to specify a fully exchangeable model, where, a priori, all components play the same role. A consequence of the exchangeable specification is the label-switching phenomenon, by which the mixture components can be relabeled arbitrarily without changing the posterior distribution of the model parameters. Label-switching makes direct marginal inference on features of the mixture components impossible and limits the interpretability of the model in applications where one is interested in discovering if and how the mixture components correspond to meaningful subgroups in the population.

It is common practice either to prevent label-switching by imposing prior constraints on model parameters or to "undo" label-switching with post-processing algorithms applied to the Markov chain Monte Carlo (MCMC) output from the exchangeable model. The former approach can be too restrictive, and appropriate prior constraints are often difficult to specify. The latter approach does not correspond to any clearly defined probability model, so any marginal features described using post-processed samples are consequences of the chosen algorithm, not the model itself.

This work presents a model-based approach to the resolution of the label-switching phenomenon that treats a small number of observations (called the "anchor points") as pre-labeled. This results in a well-defined probability model (the "anchor model") that imposes a unique labeling on the mixture components but requires no prior knowledge of the components' relative locations or scales. Several basic properties of the anchor model are derived. These properties depend heavily on the choice of anchor points, the best of which is not obvious because of the large number of possible combinations. Two computationally feasible approaches for selecting anchor points are presented that promote separation among the marginal posterior distributions of the component-specific parameters, a property that is closely associated with the model's goodness of fit. The first approach seeks anchor points that maximize the prior information about the component labeling induced by the anchor points. The second approach focuses on producing unimodal posterior distributions for the component-specific parameters by finding an anchor model that maximizes a lower bound on the posterior density at a local mode. The performance of the model is demonstrated on examples of real and simulated data and compared to popular relabeling methods.
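To make the anchoring idea concrete, the following is a minimal sketch, not taken from the dissertation, of a Gibbs sampler for a two-component univariate Gaussian mixture in which a handful of observations keep fixed component labels on every sweep. The synthetic data, prior settings, variable names, and choice of anchor points are all illustrative assumptions.

```python
# Minimal sketch of an "anchored" Gibbs sampler for a two-component
# univariate Gaussian mixture with known, equal variances. A few anchored
# observations keep fixed labels throughout, which breaks the labeling
# symmetry of the exchangeable model. All settings here are illustrative.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: two well-separated components.
n_per = 60
y = np.concatenate([rng.normal(-2.0, 1.0, n_per), rng.normal(3.0, 1.0, n_per)])
n, K = y.size, 2

# Anchor points: observation index -> fixed component label (assumed known).
anchors = {0: 0, 1: 0, n_per: 1, n_per + 1: 1}

# Priors (assumed): component means ~ N(0, tau2), weights ~ Dirichlet(1, 1).
tau2, sigma2 = 10.0, 1.0
alpha = np.ones(K)

z = rng.integers(0, K, size=n)            # initial labels
for i, k in anchors.items():
    z[i] = k                              # anchored labels never move

mu = rng.normal(0.0, 1.0, size=K)
w = np.full(K, 1.0 / K)

n_iter = 2000
mu_draws = np.empty((n_iter, K))
for t in range(n_iter):
    # 1. Update labels of the non-anchored observations.
    logp = np.log(w)[None, :] - 0.5 * (y[:, None] - mu[None, :]) ** 2 / sigma2
    p = np.exp(logp - logp.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    for i in range(n):
        if i not in anchors:
            z[i] = rng.choice(K, p=p[i])

    # 2. Update mixture weights given all labels (anchored and free).
    counts = np.bincount(z, minlength=K)
    w = rng.dirichlet(alpha + counts)

    # 3. Update component means from the conjugate normal posterior.
    for k in range(K):
        yk = y[z == k]
        prec = 1.0 / tau2 + yk.size / sigma2
        mu[k] = rng.normal((yk.sum() / sigma2) / prec, np.sqrt(1.0 / prec))
    mu_draws[t] = mu

# Posterior means of the component locations (second half as burn-in).
print(mu_draws[n_iter // 2:].mean(axis=0))
```

Fixing even a few labels in this way imposes a unique labeling on the components, so the draws of mu[0] and mu[1] can be summarized marginally without any post-hoc relabeling step.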
Mario Peruggia (Advisor)
Steven MacEachern (Committee Member)
Xinyi Xu (Committee Member)
Yunzhang Zhu (Committee Member)
129 p.

Recommended Citations

  • Kunkel, D. E. (2018). Anchored Bayesian Gaussian Mixture Models [Doctoral dissertation, Ohio State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=osu1524134234501475

    APA Style (7th edition)

  • Kunkel, Deborah. Anchored Bayesian Gaussian Mixture Models. 2018. Ohio State University, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=osu1524134234501475.

    MLA Style (8th edition)

  • Kunkel, Deborah. "Anchored Bayesian Gaussian Mixture Models." Doctoral dissertation, Ohio State University, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=osu1524134234501475

    Chicago Manual of Style (17th edition)