Skip to Main Content
 

Global Search Box

 
 
 
 

ETD Abstract Container

Abstract Header

Geometric Methods for Simplification and Comparison of Data Sets

Singhal, Kritika

Abstract Details

2020, Doctor of Philosophy, Ohio State University, Mathematics.
Finite metric spaces are increasingly being used to model data sets that come equipped with a dissimilarity measure. Some examples of such data sets are DNA sequences, news articles, social networks and feature vectors in machine learning applications. Due to the ever increasing volume of data, the simplification of finite metric spaces is a fundamental problem in data analysis. In this thesis, new methods of simplification of compact metric spaces are introduced. There are three primary concepts used for simplification - fractal dimension, clustering and representative subspaces. The first concept is that of fractal dimension of a finite metric space. The effect of fractal dimension on the computational complexity of geometric problems such as the Traveling Salesman Problem, and the Independent Set of Unit Balls problem is studied. The running time lower bounds for these problems on metric spaces with non-integer dimension are computed and it is shown that these bounds nearly match previously known upper bounds. The second notion that is used for simplification of a compact metric space is clustering. Precisely, a new method for approximating a compact metric space with a k-point metric space is introduced. It is shown that approximating with k points is equivalent to certain notions of clustering of the input metric space into k blocks. Similar results are obtained for metric measure spaces. Using the equivalence results for metric spaces and metric measure spaces, polynomial time algorithms for computing the k-point approximations are developed. Another method of simplification of compact metric spaces is developed through identifying finite representative subspaces of the input metric space. The subspaces are representative because they store the entire metric information of the original metric space, up to a loss of epsilon. As a result, this method identifies a finite compression of the input metric space. A polynomial time exact algorithm is developed for computing a finite compression of smallest size, with zero loss, and a polynomial time approximation algorithm is developed for computing a finite compression of smallest size, with loss of epsilon greater than zero. Furthermore, a method for reconstructing the original metric space from the compressed metric space is described. Two transforms of the distance function of a metric measure space are also studied. These transforms are referred to as the Mass Transform and the Hyperbolicity Transform, as their definition uses sets of large mass and small hyperbolicity respectively. The theoretical and computational properties of these transforms are studied. Finally, new lower bounds on the Gromov-Hausdorff distance between compact metric spaces are computed using some of the simplification methods. New lower bounds on this distance are also computed using persistent homology and curvature sets of the input metric spaces.
Facundo Memoli (Advisor)
Anastasios Sidiropoulos (Advisor)
Tamal Dey (Committee Member)
David Sivakoff (Committee Member)
265 p.

Recommended Citations

Citations

  • Singhal, K. (2020). Geometric Methods for Simplification and Comparison of Data Sets [Doctoral dissertation, Ohio State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=osu1587253879303425

    APA Style (7th edition)

  • Singhal, Kritika. Geometric Methods for Simplification and Comparison of Data Sets. 2020. Ohio State University, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=osu1587253879303425.

    MLA Style (8th edition)

  • Singhal, Kritika. "Geometric Methods for Simplification and Comparison of Data Sets." Doctoral dissertation, Ohio State University, 2020. http://rave.ohiolink.edu/etdc/view?acc_num=osu1587253879303425

    Chicago Manual of Style (17th edition)