Skip to Main Content
 

Global Search Box

 
 
 
 

ETD Abstract Container

Abstract Header

Distances within and between Metric Spaces: Metric Geometry, Optimal Transport and Applications to Data Analysis

Abstract Details

2021, Doctor of Philosophy, Ohio State University, Mathematics.
Distance functions are important in data analysis for modelling dissimilarity between data points. Utilizing tools from metric geometry and optimal transport, we study foundations of data analysis by examining various distances between and within metric (measure) spaces. The dissertation can be roughly divided into three parts. In the first part, we study problems related to the computational aspect of the Gromov-Hausdorff distance $d_\mathcal{GH}$. $d_\mathcal{GH}$ is a natural distance for comparing metric spaces which satisfies many pleasing mathematical properties. However, the distance is known to be NP-hard to compute in general. Despite this hardness, we devise a fixed-parameter tractable algorithm for computing $d_\mathcal{GH}$ between ultrametric spaces. In the course of discovering this algorithm, we identify a one parameter family of Gromov-Hausdorff type distances $\{d_{\mathcal{GH},p}\}_{p=1}^\infty$ such that $d_{\mathcal{GH},1}=d_\mathcal{GH}$ and $d_{\mathcal{GH},\infty}$, also denoted by $u_\mathcal{GH}$, is an ultrametric itself on the collection $\mathcal{U}$ of compact ultrametric spaces. Whereas for any $p\in[1,\infty)$, $d_{\mathcal{GH},p}$ is NP-hard to compute, we establish that $u_\mathcal{GH}$ can be computed in polynomial time. We further study geometric properties of the space $(\mathcal{U},u_\mathcal{GH})$ and establish universality of the space. There exist two natural analogues of $d_\mathcal{GH}$ in the case of metric measure spaces, namely, the Gromov-Wasserstein distance and Sturm's Gromov-Wasserstein distance. We consider their ultrametric versions using definitions in analogy to $u_\mathcal{GH}$ and study both of their theoretical and experimental behaviors. In the second part, we establish characterizations for Gromov-Hausdorff type geodesics. More precisely, we prove that every Gromov-Hausdorff geodesic is, in fact, a Hausdorff geodesic. This in turn motivated us to study Hausdorff geodesics. We then provide a complete characterization of Hausdorff geodesics and prove that every Hausdorff geodesic is equivalent to a Hausdorff displacement interpolation, which is a notion we adapted from the famous displacement interpolation in optimal transport. Finally, we extend our study to geodesics between metric measure spaces and provide a characterization for a large family of Sturm's Gromov-Wasserstein geodesics. In the final part, we study an unsupervised learning framework called Wasserstein transform (WT), which permits metric spaces to be iteratively updated via local information in order to denoise data sets or enhance features. We propose several instances under the framework and show that the famous mode seeking mean shift algorithm fits into our framework. Theoretically, we establish stability results for the proposed instances of WT and reveal a connection between WT and the classic Ricci flow. Experimentally, our methods turn out to be useful for resolving the notorious chaining effect from hierarchical clustering and for other important tasks such as image segmentation and natural language processing. This dissertation is written as a comprehensive version of the author’s works [130, 139, 97, 132, 126, 193, 131] during his PhD study.
Facundo Mémoli (Advisor)
David Sivakoff (Committee Member)
Matthew Kahle (Committee Member)
444 p.

Recommended Citations

Citations

  • Wan, Z. (2021). Distances within and between Metric Spaces: Metric Geometry, Optimal Transport and Applications to Data Analysis [Doctoral dissertation, Ohio State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=osu1635979369662852

    APA Style (7th edition)

  • Wan, Zhengchao. Distances within and between Metric Spaces: Metric Geometry, Optimal Transport and Applications to Data Analysis. 2021. Ohio State University, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=osu1635979369662852.

    MLA Style (8th edition)

  • Wan, Zhengchao. "Distances within and between Metric Spaces: Metric Geometry, Optimal Transport and Applications to Data Analysis." Doctoral dissertation, Ohio State University, 2021. http://rave.ohiolink.edu/etdc/view?acc_num=osu1635979369662852

    Chicago Manual of Style (17th edition)