PLASMA-HD: Probing the LAttice Structure and MAkeup of High-dimensional Data

2015, Doctor of Philosophy, Ohio State University, Computer Science and Engineering.
Making sense of, analyzing, and extracting useful information from large and complex data is a grand challenge. A user tasked with meeting this challenge is often befuddled by questions of where and how to begin to understand the relevant characteristics of such data. Recent advances in relational analytics, in particular network analytics, offer key tools for insight into connectivity structure and relationships at both local ("guilt by association") and global (clustering and pattern matching) levels. These tools form the basis of recommender systems, ranking, and learning algorithms of great importance to research and industry alike. However, complex data rarely originate in a format suitable for network analytics, and the transformation of large and typically high-dimensional non-network data to a network is rife with parameterization challenges, as an under- or over-connected network will lead to poor subsequent analysis. Additionally, both network formation and subsequent network analytics become very computationally expensive as network size increases, especially if multiple networks with different connectivity levels are formed in the previous step; scalable approximate solutions are thus a necessity. I present an interactive system called PLASMA-HD to address these challenges. PLASMA-HD builds on recent progress in the fields of locality sensitive hashing, knowledge caching, and graph visualization to provide users with the capability to probe and interrogate the intrinsic structure of data. For an arbitrary dataset (vector, structural, or mixed), and given a similarity or distance measure of interest, PLASMA-HD enables an end user to interactively explore the intrinsic connectivity or clusterability of a dataset under different threshold criteria. PLASMA-HD employs and enhances the recently proposed Bayesian Locality Sensitive Hashing (BayesLSH) to efficiently estimate connectivity structure among entities.
Unlike previous efforts, which operate at a single similarity or distance threshold, PLASMA-HD efficiently enables exploration of network analytics measures across the entire spectrum of similarity thresholds, restoring connectivity context. To inform the user of the nature of the network at each threshold, we introduce efficient network analytics measure estimators ranging from simple local measures like edge and triangle counts using LSH, to complex global measures like betweenness and compressibility using sampling and regression, and "LAM", the first scalable pattern mining algorithm for massive data. To enable rapid and interactive discovery, PLASMA-HD provides three key capabilities to maximize user and system responsiveness: 1) interpretable feedback, by providing visual cues by which the user can make good choices as to the next exploration step, 2) incremental response, where the system responds quickly with partial results enabled by a flexibly compact yet representative internal data structure, and 3) knowledge caching, where the system leverages information from previous queries' results to speed up processing. By converting a high-dimensional dataset into a graphical (dimensionless) representation, PLASMA-HD then takes advantage of recent advances in graph and sub-graph visualization to provide end users with relevant visual cues to understand the intrinsic structure of the data they are examining.
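As a rough illustration of the idea of estimating edge counts across a range of similarity thresholds from one set of hash sketches, the following is a minimal sketch. It uses plain MinHash with Jaccard similarity, which is an assumption made for illustration; it is not BayesLSH and not the dissertation's implementation. The function names (`minhash_signature`, `estimate_similarity`, `edge_counts_by_threshold`) and the parameter `num_hashes` are hypothetical.

```python
import hashlib
from itertools import combinations


def minhash_signature(items, num_hashes=128):
    """MinHash signature of a non-empty set: for each seeded hash
    function, keep the minimum hash value over the set's elements."""
    signature = []
    for seed in range(num_hashes):
        salt = seed.to_bytes(8, "little")  # one salted hash function per seed
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(str(x).encode(), digest_size=8, salt=salt).digest(),
                "little",
            )
            for x in items
        ))
    return signature


def estimate_similarity(sig_a, sig_b):
    """Fraction of matching signature positions; this estimates the
    Jaccard similarity of the two underlying sets."""
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matches / len(sig_a)


def edge_counts_by_threshold(sets, thresholds, num_hashes=128):
    """Sketch each set once, estimate all pairwise similarities, then
    count how many pairs would be edges at each similarity threshold."""
    signatures = {name: minhash_signature(s, num_hashes) for name, s in sets.items()}
    similarities = [
        estimate_similarity(signatures[a], signatures[b])
        for a, b in combinations(signatures, 2)
    ]
    return {t: sum(1 for s in similarities if s >= t) for t in thresholds}


if __name__ == "__main__":
    # Toy data (hypothetical): "a" and "b" are identical, "c" is disjoint.
    data = {"a": {1, 2, 3, 4}, "b": {1, 2, 3, 4}, "c": {10, 11, 12}}
    print(edge_counts_by_threshold(data, [0.0, 0.5, 1.0]))
```

The sketches are computed once and reused for every threshold, which is the property that makes sweeping the threshold cheap. This toy version still compares all pairs, so it does not show the candidate pruning that a real LSH pipeline would rely on at scale.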
Srinivasan Parthasarathy (Advisor)
Arnab Nandi (Committee Member)
P Sadayappan (Committee Member)
Michael Barton (Committee Member)
181 p.

Recommended Citations

  • Fuhry, D. P. (2015). PLASMA-HD: Probing the LAttice Structure and MAkeup of High-dimensional Data [Doctoral dissertation, Ohio State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=osu1440431146

    APA Style (7th edition)

  • Fuhry, David. PLASMA-HD: Probing the LAttice Structure and MAkeup of High-dimensional Data. 2015. Ohio State University, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=osu1440431146.

    MLA Style (8th edition)

  • Fuhry, David. "PLASMA-HD: Probing the LAttice Structure and MAkeup of High-dimensional Data." Doctoral dissertation, Ohio State University, 2015. http://rave.ohiolink.edu/etdc/view?acc_num=osu1440431146

    Chicago Manual of Style (17th edition)