Skip to Main Content
Frequently Asked Questions
Submit an ETD
Global Search Box
Need Help?
Keyword Search
Participating Institutions
Advanced Search
School Logo
Files
File List
40478.pdf (944.97 KB)
ETD Abstract Container
Abstract Header
Topology Preserving Data Reductions for Computing Persistent Homology
Author Info
Sens, Aaron M
ORCID® Identifier
http://orcid.org/0000-0001-6282-5743
Permalink:
http://rave.ohiolink.edu/etdc/view?acc_num=ucin1627658247850018
Abstract Details
Year and Degree
2021, MS, University of Cincinnati, Engineering and Applied Science: Computer Science.
Abstract
Topological Data Analysis (TDA) is a relatively recent development in the field of data mining that extends algebraic topology into the realm of computational data mining. TDA has unique advantages compared to other data mining techniques, namely that it is noise resistant and that it generates unique identifiers in the form of persistence intervals for any d-dimensional numerical data set where d>1. These data sets are commonly referred to as point clouds. Persistent homology is a one of the most widely used methods of TDA. Persistent homology is a computational process that creates a multi-dimensional topological representation of an input point cloud. Computing persistent homology is exponential in time and memory relative to the size of the input point cloud. TDA with persistent homology's primary drawback from being widely adopted as a data mining technique is its constraints on point cloud size and dimension. Due to the exponential nature of computing persistent homology, state of the art tools are currently limited to a few thousand points in 3 dimensions and smaller point clouds in higher dimensions (unless the computation is somehow limited to lower dimensional homologies and bounded connectivity distances). This thesis explores techniques to sample larger point clouds to enable the computation of persistent homology. Several techniques to sample large point clouds have been explored. Studies to sample using random or heuristic methods are well established. More recently, the k-means++ clustering algorithm has been used to create reduced mappings (samplings) of the data. These approaches use the k-means++ algorithm to locate a collection of nano-clusters with their centroids being used/extracted to represent the data by a smaller set of points. The number of nano-clusters requested is determined to be of a size such that persistent homology can be computed. While it is known that k-means++ clustering can be used in this way and that k-means++ sampling performs reasonably well to preserve the larger, significant topological features found in a large point cloud, it is unclear if other clustering techniques would provide better mappings to preserve the topological features in the original data. This thesis will experimentally evaluate a number of different methods for extracting a representative sample from large point clouds. A performance comparison of persistent homology output from sampling methods, namely: (i) random, (ii) heuristic, and (iii) several data clustering methods is performed. The specific data clustering algorithms included in this study are: k-means++, agglomerative clustering (Single linkage and Ward linkage) and HDBSCAN. In the experiments, it is shown that at medium levels of reduction agglomerative ward clustering and k-means++ clustering outperform other data reduction methods, especially the heuristic methods of random sampling and Gaussian mixture modeling.
Committee
Philip Wilsey, Ph.D. (Committee Chair)
Gowtham Atluri, Ph.D. (Committee Member)
Wen-Ben Jone, Ph.D. (Committee Member)
Pages
70 p.
Subject Headings
Computer Science
Keywords
data mining
;
topological data analysis
;
persistent homology
;
clustering algorithms
;
TDA
;
functors
Recommended Citations
Refworks
EndNote
RIS
Mendeley
Citations
Sens, A. M. (2021).
Topology Preserving Data Reductions for Computing Persistent Homology
[Master's thesis, University of Cincinnati]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1627658247850018
APA Style (7th edition)
Sens, Aaron.
Topology Preserving Data Reductions for Computing Persistent Homology.
2021. University of Cincinnati, Master's thesis.
OhioLINK Electronic Theses and Dissertations Center
, http://rave.ohiolink.edu/etdc/view?acc_num=ucin1627658247850018.
MLA Style (8th edition)
Sens, Aaron. "Topology Preserving Data Reductions for Computing Persistent Homology." Master's thesis, University of Cincinnati, 2021. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1627658247850018
Chicago Manual of Style (17th edition)
Abstract Footer
Document number:
ucin1627658247850018
Download Count:
191
Copyright Info
© 2021, all rights reserved.
This open access ETD is published by University of Cincinnati and OhioLINK.