Non-parametric Clustering and Topic Modeling via Small Variance Asymptotics with Local Search

Singh, Siddharth


2013, Master of Science, Ohio State University, Computer Science and Engineering.
Clustering of data has been a well-studied topic in the Machine Learning community, with many different methods attacking the same problem of grouping similar objects together. Traditional approaches have been algorithmically simpler and easier to implement, with reasonable results. More recently, algorithms derived from asymptotics on Bayesian Non-parametric Infinite Mixture Models have emerged as an alternative. These algorithms point to a very clear relation between probabilistic methods like Expectation Maximization and hard-assignment-based algorithms like K-Means. They provide both the flexibility of a Bayesian Non-parametric model and the scalability of hard clustering algorithms like K-Means. Asymptotics on more complex mixture models have been used to derive algorithms that resemble hierarchical clustering and hard Topic Modeling. Although these new algorithms are highly scalable and open a new dimension in modeling data based on different similarity measures, they still suffer from problems traditionally seen in any optimization-based method, such as local optima. Also, being non-parametric in nature, these algorithms do not fix parameters such as the number of clusters upfront. This leads to a new problem: choosing the right initial values for the parameters present in the model. This work primarily addresses the issue of local optima, that is, how to reject sub-optimal solutions and obtain better ones, as well as the initialization of the parameters present in the model. We achieve this by adding new steps to the existing algorithm, and we quantitatively verify the resulting improvements. Our focus is mostly on the K-Means-like algorithm derived from asymptotics on the Dirichlet Process Mixture Model of infinite Gaussians, and its extension to hierarchies via the Hierarchical Dirichlet Process. We also consider a similar mixture model, generalized by replacing the Gaussians with the Exponential family of distributions. The algorithm derived from its hierarchical version resembles hard Topic Modeling in a special case, which is one of our main areas of focus.
Brian Kulis (Advisor)
Eric Fosler-Lussier (Committee Member)
73 p.
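
For readers unfamiliar with the base algorithm referenced in the abstract, the K-Means-like procedure obtained from small-variance asymptotics on the Dirichlet Process Gaussian mixture (commonly known as DP-means) can be sketched as follows. This is a minimal illustrative sketch in Python, not the thesis's implementation: the function name dp_means, the penalty parameter lam, and the convergence test are assumptions made here for illustration, and the local-search and initialization improvements contributed by the thesis are not reproduced.

    import numpy as np

    def dp_means(X, lam, max_iter=100):
        """Minimal DP-means sketch: hard clustering derived from
        small-variance asymptotics on a DP Gaussian mixture.

        X   : (n, d) data matrix
        lam : penalty for opening a new cluster (controls cluster count)
        """
        n, d = X.shape
        centers = [X.mean(axis=0)]            # start with one global cluster
        assignments = np.zeros(n, dtype=int)

        for _ in range(max_iter):
            changed = False
            # Assignment step: nearest center, or open a new cluster if the
            # squared distance to every existing center exceeds lam.
            for i in range(n):
                dists = [np.sum((X[i] - c) ** 2) for c in centers]
                k = int(np.argmin(dists))
                if dists[k] > lam:
                    centers.append(X[i].copy())
                    k = len(centers) - 1
                if assignments[i] != k:
                    assignments[i] = k
                    changed = True
            # Update step: recompute means, dropping any empty clusters
            # and re-indexing the assignments accordingly.
            new_centers, remap = [], {}
            for k in range(len(centers)):
                members = X[assignments == k]
                if len(members) > 0:
                    remap[int(k)] = len(new_centers)
                    new_centers.append(members.mean(axis=0))
            centers = new_centers
            assignments = np.array([remap[int(k)] for k in assignments])
            if not changed:
                break
        return np.array(centers), assignments

In this formulation lam plays the role that the fixed cluster count K plays in ordinary K-Means: the procedure minimizes the K-Means cost plus lam times the number of clusters, so larger penalties yield fewer clusters. This is also why the local optima and parameter initialization issues discussed in the abstract matter in practice.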

Recommended Citations


  • Singh, S. (2013). Non-parametric Clustering and Topic Modeling via Small Variance Asymptotics with Local Search [Master's thesis, Ohio State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=osu1376954935

    APA Style (7th edition)

  • Singh, Siddharth. Non-parametric Clustering and Topic Modeling via Small Variance Asymptotics with Local Search. 2013. Ohio State University, Master's thesis. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=osu1376954935.

    MLA Style (8th edition)

  • Singh, Siddharth. "Non-parametric Clustering and Topic Modeling via Small Variance Asymptotics with Local Search." Master's thesis, Ohio State University, 2013. http://rave.ohiolink.edu/etdc/view?acc_num=osu1376954935

    Chicago Manual of Style (17th edition)