Skip to Main Content
 

Global Search Box

 
 
 
 

Files

ETD Abstract Container

Abstract Header

Topology Preserving Data Reductions for Computing Persistent Homology

Abstract Details

2021, MS, University of Cincinnati, Engineering and Applied Science: Computer Science.
Topological Data Analysis (TDA) is a relatively recent development in the field of data mining that extends algebraic topology into the realm of computational data mining. TDA has unique advantages compared to other data mining techniques, namely that it is noise resistant and that it generates unique identifiers in the form of persistence intervals for any d-dimensional numerical data set where d>1. These data sets are commonly referred to as point clouds. Persistent homology is a one of the most widely used methods of TDA. Persistent homology is a computational process that creates a multi-dimensional topological representation of an input point cloud. Computing persistent homology is exponential in time and memory relative to the size of the input point cloud. TDA with persistent homology's primary drawback from being widely adopted as a data mining technique is its constraints on point cloud size and dimension. Due to the exponential nature of computing persistent homology, state of the art tools are currently limited to a few thousand points in 3 dimensions and smaller point clouds in higher dimensions (unless the computation is somehow limited to lower dimensional homologies and bounded connectivity distances). This thesis explores techniques to sample larger point clouds to enable the computation of persistent homology. Several techniques to sample large point clouds have been explored. Studies to sample using random or heuristic methods are well established. More recently, the k-means++ clustering algorithm has been used to create reduced mappings (samplings) of the data. These approaches use the k-means++ algorithm to locate a collection of nano-clusters with their centroids being used/extracted to represent the data by a smaller set of points. The number of nano-clusters requested is determined to be of a size such that persistent homology can be computed. While it is known that k-means++ clustering can be used in this way and that k-means++ sampling performs reasonably well to preserve the larger, significant topological features found in a large point cloud, it is unclear if other clustering techniques would provide better mappings to preserve the topological features in the original data. This thesis will experimentally evaluate a number of different methods for extracting a representative sample from large point clouds. A performance comparison of persistent homology output from sampling methods, namely: (i) random, (ii) heuristic, and (iii) several data clustering methods is performed. The specific data clustering algorithms included in this study are: k-means++, agglomerative clustering (Single linkage and Ward linkage) and HDBSCAN. In the experiments, it is shown that at medium levels of reduction agglomerative ward clustering and k-means++ clustering outperform other data reduction methods, especially the heuristic methods of random sampling and Gaussian mixture modeling.
Philip Wilsey, Ph.D. (Committee Chair)
Gowtham Atluri, Ph.D. (Committee Member)
Wen-Ben Jone, Ph.D. (Committee Member)
70 p.

Recommended Citations

Citations

  • Sens, A. M. (2021). Topology Preserving Data Reductions for Computing Persistent Homology [Master's thesis, University of Cincinnati]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1627658247850018

    APA Style (7th edition)

  • Sens, Aaron. Topology Preserving Data Reductions for Computing Persistent Homology. 2021. University of Cincinnati, Master's thesis. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=ucin1627658247850018.

    MLA Style (8th edition)

  • Sens, Aaron. "Topology Preserving Data Reductions for Computing Persistent Homology." Master's thesis, University of Cincinnati, 2021. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1627658247850018

    Chicago Manual of Style (17th edition)