Skip to Main Content
 

Global Search Box

 
 
 
 

Files

ETD Abstract Container

Abstract Header

Clustering Consistently

Eldridge, Justin, Eldridge

Abstract Details

2017, Doctor of Philosophy, Ohio State University, Computer Science and Engineering.
Clustering is the task of organizing data into natural groups, or clusters. A central goal in developing a theory of clustering is the derivation of correctness guarantees which ensure that clustering methods produce the right results. In this dissertation, we analyze the setting in which the data are sampled from some underlying probability distribution. In this case, an algorithm is "correct" (or consistent) if, given larger and larger data sets, its output converges in some sense to the ideal cluster structure of the distribution. In the first part, we study the setting in which data are drawn from a probability density supported on a subset of a Euclidean space. The natural cluster structure of the density is captured by the so-called high density cluster tree, which is due to Hartigan (1981). Hartigan introduced a notion of convergence to the density cluster tree, and recent work by Chaudhuri and Dasgupta (2010) and Kpotufe and von Luxburg (2011) has contructed algorithms which are consistent in this sense. We will show that Hartigan's notion of consistency is in fact not strong enough to ensure that an algorithm recovers the density cluster tree as we would intuitively expect. We identify the precise deficiency which allows this, and introduce a new, stronger notion of convergence which we call consistency in merge distortion. Consistency in merge distortion implies Hartigan's consistency, and we prove that the algorithm of Chaudhuri and Dasgupta (2010) satisfies our new notion. In the sequel, we consider the clustering of graphs sampled from a very general, non-parametric random graph model called a graphon. Unlike in the density setting, clustering in the graphon model is not well-studied. We therefore rigorously analyze the cluster structure of a graphon and formally define the graphon cluster tree. We adapt our notion of consistency in merge distortion to the graphon setting and identify efficient, consistent algorithms.
Mikhail Belkin, PhD (Advisor)
Yusu Wang, PhD (Advisor)
Facundo Mémoli, PhD (Committee Member)
Vincent Vu, PhD (Committee Member)
131 p.

Recommended Citations

Citations

  • Eldridge, Eldridge, J. (2017). Clustering Consistently [Doctoral dissertation, Ohio State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=osu1512070374903249

    APA Style (7th edition)

  • Eldridge, Eldridge, Justin. Clustering Consistently. 2017. Ohio State University, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=osu1512070374903249.

    MLA Style (8th edition)

  • Eldridge, Eldridge, Justin. "Clustering Consistently." Doctoral dissertation, Ohio State University, 2017. http://rave.ohiolink.edu/etdc/view?acc_num=osu1512070374903249

    Chicago Manual of Style (17th edition)