Skip to Main Content
Frequently Asked Questions
Submit an ETD
Global Search Box
Need Help?
Keyword Search
Participating Institutions
Advanced Search
School Logo
Files
File List
thesis.pdf (3.4 MB)
ETD Abstract Container
Abstract Header
Computing Topological Features for Data Analysis
Author Info
Shi, Dayu
Permalink:
http://rave.ohiolink.edu/etdc/view?acc_num=osu1512079255367232
Abstract Details
Year and Degree
2017, Doctor of Philosophy, Ohio State University, Computer Science and Engineering.
Abstract
Topological data analysis (TDA) provides a new methodology to data analysis problems. It captures intrinsic topological structures in data, which can then offer useful guidelines for other data analysis approaches. One main task in TDA is to extract topological features from data. And this is often done through one of the main tools of TDA, persistent homology, or simply, persistence. While the idea of persistent homology has been demonstrated valuable in a broad range of areas, its application to large scale real-world data is limited by the efficiency of existing persistence computation approaches. I will present a focused study during my PhD research on broadening applicability of the idea of persistence in data analysis in two fronts, to explore novel ways of applying persistent homology for qualitative data analysis and to study the computational issues related to persistent homology, so as to significantly broaden their applications to large-scale low or high dimensional data analysis. In the first direction, we applied persistent homology to a special kind of data, called \emph{metric graphs}. A metric graph offers one of the simplest yet still meaningful ways to represent the non-linear structure hidden behind the data. Thus, comparing two graphs has many important applications in data analysis. In this work, we proposed a new distance between two finite metric graphs, called the \spdist{} distance, which formulates the graph matching problem through persistent homology. Our \spdist{} distance has two properties not shared by previous methods: First, it is stable against the perturbations of the input graph metrics. Second, it is a \emph{continuous} distance measure, in the sense that it is defined on an alignment of the underlying spaces of input graphs, instead of merely their nodes. This makes our \spdist{} distance robust against, for example, different discretizations of the same underlying graph. Despite considering the input graphs as continuous spaces, we showed that we can still compute the \spdist{} distance in polynomial time. The time complexity for the discrete case where only graph nodes are considered is much faster. We also provided some preliminary experimental results to demonstrate the use of the new distance measure. The software is publicly available at \cite{comptogo}. Besides this graph comparison work, we also did another preliminary work of studying the concept of \emph{Multiscale Mapper}. We implemented a basic software and applied it to some volumetric data to get better understanding of the \emph{Multiscale Mapper}. In the second part, we consider the more general case, high-dimensional point cloud data. To extract topological features of a point cloud data sampled from a metric space, a sequence of Rips complexes built on $P$ indexed by a scale parameter is mostly used. Unfortunately, even for input with moderate size, the size of the Rips complex may become prohibitively large as the scale parameter increases. Starting with the \emph{Sparse Rips filtration} introduced by Sheehy \cite{S2012}, some existing methods aim to reduce the size of the complex so as to improve the time efficiency as well. However, existing approaches still fall short of scaling well, especially for high dimensional data. In this work, we investigated the advantages and limitations of existing approaches. Based on insights gained from the experiments, we proposed an efficient new algorithm, called \emph{SimBa}, for approximating the persistent homology of Rips filtrations with quality guarantees. Our new algorithm leverages a batch collapse strategy as well as a new sparse Rips-like filtration. We experimented with a variety of low and high dimensional data sets. We showed that our strategy presents a significant size reduction, and our algorithm for approximating Rips filtration persistence is an order of magnitude faster than existing methods in practice. In addition, we also implemented SimPers which is the only available software to compute persistence for simplicial maps.
Committee
Tamal Dey, Dr. (Advisor)
Wang Yusu, Dr. (Advisor)
Wenger Rephael, Dr. (Committee Member)
Bozanic Zahn, Dr. (Committee Member)
Pages
123 p.
Subject Headings
Computer Science
Keywords
Computational Topology, Topological Data Analysis, Persistent Homology
Recommended Citations
Refworks
EndNote
RIS
Mendeley
Citations
Shi, D. (2017).
Computing Topological Features for Data Analysis
[Doctoral dissertation, Ohio State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=osu1512079255367232
APA Style (7th edition)
Shi, Dayu.
Computing Topological Features for Data Analysis.
2017. Ohio State University, Doctoral dissertation.
OhioLINK Electronic Theses and Dissertations Center
, http://rave.ohiolink.edu/etdc/view?acc_num=osu1512079255367232.
MLA Style (8th edition)
Shi, Dayu. "Computing Topological Features for Data Analysis." Doctoral dissertation, Ohio State University, 2017. http://rave.ohiolink.edu/etdc/view?acc_num=osu1512079255367232
Chicago Manual of Style (17th edition)
Abstract Footer
Document number:
osu1512079255367232
Download Count:
684
Copyright Info
© 2017, some rights reserved.
Computing Topological Features for Data Analysis by Dayu Shi is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License. Based on a work at etd.ohiolink.edu.
This open access ETD is published by The Ohio State University and OhioLINK.