Topology Preserving Data Reductions for Computing Persistent Homology

Sens, Aaron M

Keyword Search

School Logo

40478.pdf (944.97 KB)

Topology Preserving Data Reductions for Computing Persistent Homology

Author Info

Sens, Aaron M

ORCID® Identifier

http://orcid.org/0000-0001-6282-5743

Permalink:

http://rave.ohiolink.edu/etdc/view?acc_num=ucin1627658247850018

Year and Degree

2021, MS, University of Cincinnati, Engineering and Applied Science: Computer Science.

Abstract

Topological Data Analysis (TDA) is a relatively recent development in the field of data mining that extends algebraic topology into the realm of computational data mining. TDA has unique advantages compared to other data mining techniques, namely that it is noise resistant and that it generates unique identifiers in the form of persistence intervals for any d-dimensional numerical data set where d>1. These data sets are commonly referred to as point clouds. Persistent homology is a one of the most widely used methods of TDA. Persistent homology is a computational process that creates a multi-dimensional topological representation of an input point cloud. Computing persistent homology is exponential in time and memory relative to the size of the input point cloud. TDA with persistent homology's primary drawback from being widely adopted as a data mining technique is its constraints on point cloud size and dimension. Due to the exponential nature of computing persistent homology, state of the art tools are currently limited to a few thousand points in 3 dimensions and smaller point clouds in higher dimensions (unless the computation is somehow limited to lower dimensional homologies and bounded connectivity distances). This thesis explores techniques to sample larger point clouds to enable the computation of persistent homology. Several techniques to sample large point clouds have been explored. Studies to sample using random or heuristic methods are well established. More recently, the k-means++ clustering algorithm has been used to create reduced mappings (samplings) of the data. These approaches use the k-means++ algorithm to locate a collection of nano-clusters with their centroids being used/extracted to represent the data by a smaller set of points. The number of nano-clusters requested is determined to be of a size such that persistent homology can be computed. While it is known that k-means++ clustering can be used in this way and that k-means++ sampling performs reasonably well to preserve the larger, significant topological features found in a large point cloud, it is unclear if other clustering techniques would provide better mappings to preserve the topological features in the original data. This thesis will experimentally evaluate a number of different methods for extracting a representative sample from large point clouds. A performance comparison of persistent homology output from sampling methods, namely: (i) random, (ii) heuristic, and (iii) several data clustering methods is performed. The specific data clustering algorithms included in this study are: k-means++, agglomerative clustering (Single linkage and Ward linkage) and HDBSCAN. In the experiments, it is shown that at medium levels of reduction agglomerative ward clustering and k-means++ clustering outperform other data reduction methods, especially the heuristic methods of random sampling and Gaussian mixture modeling.

Committee

Philip Wilsey, Ph.D. (Committee Chair)
Gowtham Atluri, Ph.D. (Committee Member)
Wen-Ben Jone, Ph.D. (Committee Member)

Pages

70 p.

Subject Headings

Computer Science

Keywords

data mining; topological data analysis; persistent homology; clustering algorithms; TDA; functors

Sens, A. M. (2021). Topology Preserving Data Reductions for Computing Persistent Homology [Master's thesis, University of Cincinnati]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1627658247850018
APA Style (7th edition)
Sens, Aaron. Topology Preserving Data Reductions for Computing Persistent Homology. 2021. University of Cincinnati, Master's thesis. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=ucin1627658247850018.
MLA Style (8th edition)
Sens, Aaron. "Topology Preserving Data Reductions for Computing Persistent Homology." Master's thesis, University of Cincinnati, 2021. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1627658247850018
Chicago Manual of Style (17th edition)

Document number:

ucin1627658247850018

Download Count:

191

Copyright Info

Global Search Box

Files

File List

ETD Abstract Container

Abstract Header

Topology Preserving Data Reductions for Computing Persistent Homology

Abstract Details

Recommended Citations

Citations

Abstract Footer

Global Footer

Ohio Department of Higher Education

State Government Links

Education Links

Global Search Box

Files

File List

ETD Abstract Container

Abstract Header

Topology Preserving Data Reductions for Computing Persistent Homology

Abstract Details

Recommended CitationsRefworksEndNoteRISMendeley

Citations

Abstract Footer

Global Footer

Ohio Department of Higher Education

State Government Links

Education Links

Recommended Citations