Comparison and Application of Probabilistic Clustering Methods for System Improvement Prioritization

2012, Doctor of Philosophy, Ohio State University, Industrial and Systems Engineering.
We compare probabilistic clustering methods for analyzing unstructured text or images relevant to prioritizing system improvement actions. Such system improvement activities require an awareness of the entire corpus, that is, the full set of documents such as transcripts of phone conversations or images. For example, a manager trying to improve the performance of a call center might want to understand quantitatively what fraction of calls falls into each of a set of types (cluster or topic proportions) and what those types are, including their associated phrases (cluster or topic definitions). If a sizable fraction of conversations, e.g., 15%, were using unapproved language, there could be a high priority on implementing standardization or training to reduce cost and improve customer satisfaction related to the identified cluster or topic. We argue that such prioritization can be properly understood only if the proportions and definitions of all of the clusters or topics are accounted for accurately. The goal of accurately accounting for the entire corpus differs from the goals of information retrieval, which relates to identifying specific documents of interest in response to specific queries. As a result, our comparison is based on “ground truth” models of four entire corpora and four measures of distribution fitting accuracy. However, the literature lacks numerical and case study comparisons of probabilistic clustering methods for cases with ground truth standards. Benefits of comparisons based on ground truth models and given corpora also include the provision of complete examples, so that readers can see clearly how different approaches can be applied. Further, using the accuracy of cluster identification permits the comparison of popular methods such as fuzzy clustering together with generative methods such as Bayesian mixture models. This holds as long as we interpret the fuzzy clustering model as a topic model, which we do.
The resulting “fuzzy topic models” offer demonstrated advantages over latent Dirichlet allocation in repeatability and computational efficiency. The generative methods include so-called “topic” models; they are generative because they provide a distribution from which entire corpora could be sampled. We provide a numerical study which clarifies the relative accuracy of the probabilistic clustering methods, including fuzzy clustering, Principal Component Analysis (PCA) followed by fuzzy clustering, latent Dirichlet allocation (LDA), and the recently proposed Subject Matter Expert Refined Topic (SMERT) models. We illustrate the application of the methods to the analysis of a call center in the insurance industry. We also illustrate how prioritization-related information can be derived from a corpus of documents, and we provide documentation of how the relevant probabilistic clustering methods can be applied.
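
As a rough illustration of reading a fuzzy clustering as a topic model, the sketch below runs a plain fuzzy c-means on a tiny made-up word-count table. Everything in it is an assumption made for illustration and is not taken from the dissertation: the four-word vocabulary, the six toy "calls", the choice of fuzzy c-means with fuzzifier m = 2, the farthest-first initialization, and the reading of cluster centers as topic definitions and of average memberships as topic proportions.

```python
# Toy illustration only: the vocabulary, counts, and the use of fuzzy c-means
# are assumptions for this sketch, not data or code from the dissertation.

VOCAB = ["claim", "policy", "refund", "cancel"]
COUNTS = [  # one row of word counts per hypothetical call transcript
    [3, 2, 0, 0],
    [4, 1, 0, 0],
    [2, 3, 0, 1],
    [0, 0, 3, 2],
    [0, 1, 2, 3],
    [0, 0, 4, 1],
]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first(rows, k):
    """Deterministic start: first row, then the row farthest from chosen centers."""
    centers = [list(rows[0])]
    while len(centers) < k:
        nxt = max(rows, key=lambda r: min(sq_dist(r, c) for c in centers))
        centers.append(list(nxt))
    return centers


def fuzzy_c_means(rows, k, m=2.0, iters=50):
    """Alternate membership and center updates for a fixed number of iterations."""
    n, d = len(rows), len(rows[0])
    centers = farthest_first(rows, k)
    u = []
    for _ in range(iters):
        # Membership update: u_ij proportional to (squared distance)^(-1/(m-1)).
        u = []
        for r in rows:
            inv = [max(sq_dist(r, c), 1e-12) ** (-1.0 / (m - 1.0)) for c in centers]
            total = sum(inv)
            u.append([x / total for x in inv])
        # Center update: mean of the rows weighted by membership^m.
        centers = []
        for j in range(k):
            w = [u[i][j] ** m for i in range(n)]
            total = sum(w)
            centers.append(
                [sum(w[i] * rows[i][t] for i in range(n)) / total for t in range(d)]
            )
    return u, centers


# Turn counts into word frequencies so each document sums to 1.
rows = [[c / sum(doc) for c in doc] for doc in COUNTS]
memberships, centers = fuzzy_c_means(rows, k=2)

# "Topic proportions": average membership of the corpus in each cluster.
proportions = [sum(u[j] for u in memberships) / len(memberships) for j in range(2)]

# "Topic definitions": the highest-weight words in each cluster center.
top_words = [
    [VOCAB[t] for t in sorted(range(len(VOCAB)), key=lambda t: -c[t])[:2]]
    for c in centers
]

for j in range(2):
    print(f"topic {j}: proportion={proportions[j]:.2f}, top words={top_words[j]}")
```

Because each document row sums to 1, each center is a weighted average of such rows and also sums to 1, so it can be read as a word distribution. The proportions are the kind of quantity a manager would compare against a threshold such as the 15% mentioned above.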
Theodore Allen (Advisor)
Cathy Xia (Committee Member)
Clark Mount-Campbell (Committee Member)
Bruce Patton (Committee Member)
76 p.

Recommended Citations

  • Lee, S. H. (2012). Comparison and Application of Probabilistic Clustering Methods for System Improvement Prioritization [Doctoral dissertation, Ohio State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=osu1339766563

    APA Style (7th edition)

  • Lee, Soo Ho. Comparison and Application of Probabilistic Clustering Methods for System Improvement Prioritization. 2012. Ohio State University, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=osu1339766563.

    MLA Style (8th edition)

  • Lee, Soo Ho. "Comparison and Application of Probabilistic Clustering Methods for System Improvement Prioritization." Doctoral dissertation, Ohio State University, 2012. http://rave.ohiolink.edu/etdc/view?acc_num=osu1339766563

    Chicago Manual of Style (17th edition)