Computation and Application of Persistent Homology on Streaming Data

2020, PhD, University of Cincinnati, Engineering and Applied Science: Computer Science and Engineering.
Persistent homology is a robust and powerful tool for data analysis that provides information about the topological properties of data. Given a set of data points, persistent homology computes a set of intervals, or lifespans, of topological structures that appear and subsequently disappear through a sequence of nested spaces constructed on the data points. These structures and their lifespans make it possible to separate meaningful patterns from noise and often lead to insights not discernible by conventional methods of data mining. These capabilities come at the cost of high space complexity, which grows exponentially with the number of input data points. This dissertation examines the application of persistent homology to streaming data, bringing together two important areas of data science that had not previously been combined. The intensive computational requirements of persistent homology, coupled with the unique challenges of handling a potentially infinite sequence of data objects in a stream, are the primary reasons why persistent homology had not yet been applied to data stream mining. The dissertation proposes two general-purpose frameworks, called the microcluster model and the sliding-window model, for computing persistent homology on streaming data. Consistent with the standard computational paradigm for processing data streams, each model is organized into an online component and an offline component. The online component maintains a summary of the data that preserves the topological structure of the stream. The offline component computes the persistence intervals from the data captured by the summary. The two models differ internally in the data structure that the online component uses to maintain the summary of the stream. 
While the microcluster model employs statistics related to the weighted sum of the data vectors, the sliding-window model uses as its summary a topological structure comprising sets of generalized geometric objects. In terms of applications, the two models are best suited to different types of streams. The microcluster model is ideal for data streams that show steady and gradual concept drift. In contrast, the sliding-window model can identify abrupt changes that last only a very short time, and is thus suitable for developing change detection mechanisms and surveillance systems. The computational models are validated on two important real-world applications: network anomaly detection and the identification of reticulate genomic exchanges during the evolution of species. It is shown that the models accurately and efficiently detect changes in evolving data streams while discovering knowledge not available to classical methods of data mining.
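
The online/offline split described in the abstract can be illustrated with a small sketch. This is not the dissertation's implementation: the class `SlidingWindow` and the function `h0_intervals` are hypothetical names chosen for illustration, the online summary here is simply the most recent raw points (the abstract says the actual sliding-window model maintains a topological structure instead), and the offline step computes only 0-dimensional persistence intervals (connected components) of a Vietoris-Rips filtration using a Kruskal-style union-find.

```python
from collections import deque
from itertools import combinations
import math


class SlidingWindow:
    """Online component (simplified assumption): keep the newest points only."""

    def __init__(self, size):
        self.points = deque(maxlen=size)  # oldest point is evicted automatically

    def insert(self, point):
        self.points.append(tuple(point))

    def snapshot(self):
        return list(self.points)


def h0_intervals(points):
    """Offline component: 0-dimensional persistence intervals (birth, death).

    Every point is born as its own component at scale 0. Processing edges in
    order of increasing length, each edge that joins two distinct components
    ends the lifespan of one of them at that length.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    intervals = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            intervals.append((0.0, length))
    if n:
        intervals.append((0.0, math.inf))  # the final component never dies
    return intervals


# Hypothetical stream: two well-separated groups of points.
window = SlidingWindow(size=4)
for p in [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0), (0.0, 0.5)]:
    window.insert(p)

intervals = h0_intervals(window.snapshot())
# intervals == [(0.0, 0.5), (0.0, 1.0), (0.0, 10.0), (0.0, inf)]
```

In this toy run, the two short intervals correspond to points merging within each group, while the interval ending at 10.0 reflects the gap between the two groups. Long-lived intervals of this kind are the "meaningful patterns" that persistence separates from short-lived noise.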
Philip Wilsey, Ph.D. (Committee Chair)
Gowtham Atluri, Ph.D. (Committee Member)
Raj Bhatnagar, Ph.D. (Committee Member)
Brian Gettelfinger, Ph.D. (Committee Member)
Badri Vellambi Ravisankar, Ph.D. (Committee Member)
91 p.

Recommended Citations

  • Moitra, A. (2020). Computation and Application of Persistent Homology on Streaming Data [Doctoral dissertation, University of Cincinnati]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1613686214764863

    APA Style (7th edition)

  • Moitra, Anindya. Computation and Application of Persistent Homology on Streaming Data. 2020. University of Cincinnati, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=ucin1613686214764863.

    MLA Style (8th edition)

  • Moitra, Anindya. "Computation and Application of Persistent Homology on Streaming Data." Doctoral dissertation, University of Cincinnati, 2020. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1613686214764863

    Chicago Manual of Style (17th edition)