Files
proposal.pdf (10.15 MB)
PLASMA-HD: Probing the LAttice Structure and MAkeup of High-dimensional Data
Author Info
Fuhry, David P
ORCID® Identifier
http://orcid.org/0000-0002-7564-1983
Permalink:
http://rave.ohiolink.edu/etdc/view?acc_num=osu1440431146
Abstract Details
Year and Degree
2015, Doctor of Philosophy, Ohio State University, Computer Science and Engineering.
Abstract
Making sense of, analyzing, and extracting useful information from large and complex data is a grand challenge. A user tasked with meeting this challenge is often befuddled by questions of where and how to begin to understand the relevant characteristics of such data. Recent advances in relational analytics, in particular network analytics, offer key tools for insight into connectivity structure and relationships at both local ("guilt by association") and global (clustering and pattern matching) levels. These tools form the basis of recommender systems, ranking, and learning algorithms of great importance to research and industry alike. However, complex data rarely originate in a format suitable for network analytics, and transforming large, typically high-dimensional non-network data into a network is rife with parameterization challenges, since an under- or over-connected network leads to poor subsequent analysis. Additionally, both network formation and subsequent network analytics become very computationally expensive as network size increases, especially if the formation step produces multiple networks with different connectivity levels; scalable approximate solutions are thus a necessity. I present an interactive system called PLASMA-HD to address these challenges. PLASMA-HD builds on recent progress in locality sensitive hashing, knowledge caching, and graph visualization to give users the capability to probe and interrogate the intrinsic structure of data. For an arbitrary dataset (vector, structural, or mixed) and a given similarity or distance measure of interest, PLASMA-HD enables an end user to interactively explore the intrinsic connectivity or clusterability of the dataset under different threshold criteria. PLASMA-HD employs and enhances the recently proposed Bayesian Locality Sensitive Hashing (BayesLSH) to efficiently estimate connectivity structure among entities.
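To make the role of locality sensitive hashing concrete, here is a minimal sketch of the basic idea that LSH-based similarity estimation rests on: random-hyperplane hashing (SimHash-style) for cosine similarity. This is an illustrative stand-in, not BayesLSH and not the PLASMA-HD implementation; BayesLSH, as we understand it, additionally reasons about the posterior distribution of a pair's similarity given how many hash bits agree, which lets it prune or accept candidate pairs early. The function names (`random_hyperplanes`, `signature`, `estimate_cosine`, `exact_cosine`) are hypothetical.

```python
import math
import random


def random_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random Gaussian hyperplane normals in `dim` dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def signature(vec, planes):
    """One bit per hyperplane: which side of that hyperplane the vector lies on."""
    return [1 if sum(p_i * v_i for p_i, v_i in zip(p, vec)) >= 0 else 0 for p in planes]


def estimate_cosine(sig_a, sig_b):
    """Estimate cosine similarity from two bit signatures.

    For random hyperplanes, P[a bit agrees] = 1 - theta / pi, where theta is
    the angle between the vectors, so the agreement rate is inverted for theta.
    """
    agree = sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)
    theta = math.pi * (1.0 - agree)
    return math.cos(theta)


def exact_cosine(a, b):
    """Exact cosine similarity, used here only to sanity-check the estimate."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


if __name__ == "__main__":
    planes = random_hyperplanes(dim=3, n_bits=2048, seed=42)
    a, b = [1.0, 2.0, 3.0], [1.0, 2.5, 2.0]
    print(exact_cosine(a, b), estimate_cosine(signature(a, planes), signature(b, planes)))
```

More hash bits tighten the estimate at the cost of more hashing work; trading that cost against accuracy per candidate pair is the kind of decision a Bayesian approach like BayesLSH is designed to make adaptively.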
Unlike previous efforts, which operate at a single similarity or distance threshold, PLASMA-HD efficiently enables exploration of network analytics measures across the entire spectrum of similarity thresholds, restoring connectivity context. To inform the user of the nature of the network at each threshold, we introduce efficient estimators for network analytics measures, ranging from simple local measures such as edge and triangle counts (using LSH), to complex global measures such as betweenness and compressibility (using sampling and regression), to "LAM", the first scalable pattern mining algorithm for massive data. To enable rapid, interactive discovery, PLASMA-HD provides three key capabilities that maximize user and system responsiveness: 1) interpretable feedback, providing visual cues that help the user make good choices about the next exploration step; 2) incremental response, where the system responds quickly with partial results enabled by a flexibly compact yet representative internal data structure; and 3) knowledge caching, where the system leverages results from previous queries to speed up processing. By converting a high-dimensional dataset into a graphical (dimensionless) representation, PLASMA-HD then takes advantage of recent advances in graph and sub-graph visualization to provide end users with relevant visual cues for understanding the intrinsic structure of the data they are examining.
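Exploring "the entire spectrum of similarity thresholds" rather than one cutoff can be illustrated with a small sketch. It assumes pairwise similarities are already available (exact or estimated) in a dict keyed by unordered pairs of distinct nodes, each pair listed once, and it counts edges and triangles exactly; it does not reproduce the dissertation's LSH-based estimators. Edges are visited from most to least similar, so lowering the threshold only adds edges and the counts from the previous threshold are reused, loosely in the spirit of the incremental-response and caching ideas above. The function name `threshold_sweep` is hypothetical.

```python
def threshold_sweep(similarities, thresholds):
    """Edge and triangle counts of the thresholded graph at each cutoff.

    similarities: dict mapping (u, v) pairs (u != v, each unordered pair once)
                  to a similarity score.
    thresholds:   iterable of similarity cutoffs.
    Returns {threshold: (edge_count, triangle_count)} for the graph that keeps
    every pair whose similarity is >= threshold.
    """
    # Most similar pairs first: lowering the cutoff only ever appends edges.
    edges = sorted(similarities.items(), key=lambda kv: kv[1], reverse=True)
    neighbors = {}
    edge_count = triangle_count = 0
    i = 0
    results = {}
    for t in sorted(set(thresholds), reverse=True):
        # Add only the edges that become active between the previous cutoff and t.
        while i < len(edges) and edges[i][1] >= t:
            (u, v), _ = edges[i]
            nu = neighbors.setdefault(u, set())
            nv = neighbors.setdefault(v, set())
            # Each common neighbour w closes exactly one new triangle (u, v, w).
            triangle_count += len(nu & nv)
            nu.add(v)
            nv.add(u)
            edge_count += 1
            i += 1
        results[t] = (edge_count, triangle_count)
    return results


if __name__ == "__main__":
    sims = {("a", "b"): 0.9, ("b", "c"): 0.8, ("a", "c"): 0.7,
            ("c", "d"): 0.4, ("a", "d"): 0.3}
    print(threshold_sweep(sims, [0.85, 0.75, 0.65, 0.35, 0.25]))
    # → {0.85: (1, 0), 0.75: (2, 0), 0.65: (3, 1), 0.35: (4, 1), 0.25: (5, 2)}
```

Sorting once and sweeping downward costs one pass over the edges for all thresholds combined, instead of rebuilding the graph from scratch at every cutoff.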
Committee
Srinivasan Parthasarathy (Advisor)
Arnab Nandi (Committee Member)
P Sadayappan (Committee Member)
Michael Barton (Committee Member)
Pages
181 p.
Subject Headings
Computer Science
Keywords
Network Analytics; Graph Analytics; High-Dimensional Data; Visualization; PLASMA-HD; Graph Growth; Approximate Itemset Mining; Parallel Coordinates
Recommended Citations
APA Style (7th edition)
Fuhry, D. P. (2015). PLASMA-HD: Probing the LAttice Structure and MAkeup of High-dimensional Data [Doctoral dissertation, Ohio State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=osu1440431146
MLA Style (8th edition)
Fuhry, David. PLASMA-HD: Probing the LAttice Structure and MAkeup of High-dimensional Data. 2015. Ohio State University, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=osu1440431146.
Chicago Manual of Style (17th edition)
Fuhry, David. "PLASMA-HD: Probing the LAttice Structure and MAkeup of High-dimensional Data." Doctoral dissertation, Ohio State University, 2015. http://rave.ohiolink.edu/etdc/view?acc_num=osu1440431146
Document number:
osu1440431146
Copyright Info
© 2015, all rights reserved.
This open access ETD is published by The Ohio State University and OhioLINK.