Online Clustering with Bayesian Nonparametrics

Scherreik, Matthew D.

2020, Doctor of Philosophy (PhD), Wright State University, Electrical Engineering.
Clustering algorithms, such as Gaussian mixture models and K-means, often require the number of clusters to be specified a priori. Bayesian nonparametric (BNP) methods avoid this problem by specifying a prior distribution over the cluster assignments that allows the number of clusters to be inferred from the data. This can be especially useful for online clustering tasks, where data arrives in a continuous stream and the number of clusters may dynamically change over time. Classical BNP priors often overestimate the number of clusters, however, leading researchers to develop new priors with more control over this tendency. To date, BNP algorithms resistant to over-clustering have only been implemented for offline processing, utilizing Markov chain Monte Carlo inference. In this dissertation, we derive a novel algorithm for online BNP clustering using variational inference, with explicit control over the over-clustering phenomenon. Additionally, we propose two methods for tuning a critical hyperparameter mid-stream, based on empirical analysis of the BNP cluster assignment prior and a cost function from Gaussian mixture reduction. We demonstrate the effectiveness of our algorithms on dynamic datasets designed specifically to challenge online BNP clustering algorithms. We also show that our algorithms can be employed for practical applications of radar pulse clustering and neural spike sorting, achieving competitive—and often superior—results when compared to classical BNP methods. Furthermore, we exploit the model-based framework to extend our algorithm and tuning methods from purely Gaussian mixtures to handle data with mixed multivariate Gaussian and categorical type, and demonstrate this new extension on real-world data. Our empirical studies indicate that the developments in this dissertation are a significant contribution to the state of the art in BNP clustering.
Brian Rigling, Ph.D. (Advisor)
Fred Garber, Ph.D. (Committee Member)
Arnab Shaw, Ph.D. (Committee Member)
Joshua Ash, Ph.D. (Committee Member)
John Gallagher, Ph.D. (Committee Member)
132 p.
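
The abstract notes that a BNP prior over cluster assignments lets the number of clusters be inferred from the data, and that classical priors tend to produce too many clusters. As a rough illustration only, the sketch below simulates the Chinese restaurant process, a standard classical BNP prior. The abstract does not say which prior the dissertation uses, so this choice is an assumption, and the function names `sample_crp_partition` and `expected_num_clusters` and the parameter `alpha` are hypothetical. This is not the dissertation's online variational algorithm.

```python
import random


def sample_crp_partition(n_points, alpha, seed=0):
    """Draw cluster sizes from a Chinese restaurant process (illustrative only).

    alpha is the concentration parameter: larger values open new clusters
    more readily.
    """
    rng = random.Random(seed)
    sizes = []  # sizes[k] = number of points currently assigned to cluster k
    for i in range(n_points):
        # Point i joins existing cluster k with probability sizes[k] / (i + alpha)
        # and opens a new cluster with probability alpha / (i + alpha).
        u = rng.random() * (i + alpha)
        cumulative = 0.0
        for k, size in enumerate(sizes):
            cumulative += size
            if u < cumulative:
                sizes[k] += 1
                break
        else:
            # No existing cluster was chosen (always the case for the first point).
            sizes.append(1)
    return sizes


def expected_num_clusters(n_points, alpha):
    """Prior expected cluster count: sum over i of alpha / (alpha + i)."""
    return sum(alpha / (alpha + i) for i in range(n_points))


# The cluster count is never fixed in advance; under this prior it keeps
# growing (roughly logarithmically) as more points arrive in the stream.
for n in (10, 100, 1000):
    sizes = sample_crp_partition(n, alpha=1.0, seed=1)
    print(n, len(sizes), round(expected_num_clusters(n, alpha=1.0), 2))
```

Because the prior's expected cluster count never stops growing with the number of points, a long data stream keeps getting chances to open new clusters, which gives some intuition for the over-clustering tendency the abstract describes.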

Recommended Citations

  • Scherreik, M. D. (2020). Online Clustering with Bayesian Nonparametrics [Doctoral dissertation, Wright State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=wright1610711743492959

    APA Style (7th edition)

  • Scherreik, Matthew. Online Clustering with Bayesian Nonparametrics. 2020. Wright State University, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=wright1610711743492959.

    MLA Style (8th edition)

  • Scherreik, Matthew. "Online Clustering with Bayesian Nonparametrics." Doctoral dissertation, Wright State University, 2020. http://rave.ohiolink.edu/etdc/view?acc_num=wright1610711743492959

    Chicago Manual of Style (17th edition)