Skip to Main Content
Frequently Asked Questions
Submit an ETD
Global Search Box
Need Help?
Keyword Search
Participating Institutions
Advanced Search
School Logo
Files
File List
DTD.pdf (3.01 MB)
ETD Abstract Container
Abstract Header
Querying Graph Structured Data
Author Info
Yang, Lei
Permalink:
http://rave.ohiolink.edu/etdc/view?acc_num=case1410434109
Abstract Details
Year and Degree
2015, Doctor of Philosophy, Case Western Reserve University, EECS - Computer and Information Sciences.
Abstract
With the emergence of bioinformatics and social science applications, a large amount of data can be represented as graphs. Thus, there is an increasing interest in developing diverse graph algorithms that operate on large graphs. In this thesis, we discuss diverse graph problems including the pedigree computation problem in the bioinformatics field and a general purpose graph query problem – graph pattern matching problem, which can be applied to social network. First, a pedigree is a diagram of family relationships, and it is often used to determine the mode of inheritance (dominant, recessive, etc.) of genetic diseases. The pedigree computation problem studies the evaluation of the inbreeding coefficient of a given individual using Compact Path Encoding (CPE), which is a new compact path encoding scheme for large pedigrees. We design a set of efficient algorithms for identifying paths based on CPE. In addition, we present time and space complexity analysis, and also manifest the efficiency of our method for evaluating inbreeding coefficients as compared to previous methods by experimental results using pedigree graphs with real and synthetic data. Both theoretical and experimental results demonstrate that our method is more scalable and efficient than previous methods in terms of time and space requirements. Second, querying large data graphs efficiently with a flexible query pattern is an important research problem due to a wide range of applications with large graph structured data. As the sizes of data graphs become increasingly larger, more efficient techniques are needed to achieve better query performance. In this part, we focus on designing techniques to find all the matches for a user-provided query pattern in a large data graph (graph pattern matching problem) based on subgraph isomorphism. The query pattern used in this work extends the traditional one by supporting paths with distance constraints. To evaluate such query patterns, we propose a hybrid inverted index, and an efficient query evaluation algorithm with join order optimization – JAGUAR (standing for join-based graph pattern querying algorithm). The scalability and efficiency of our proposed methods are demonstrated by detailed experiments on a set of synthetic and real datasets. Moving forward, we improve the scalability of our approach by adapting it to a distributed environment. The extension includes a distributed version of VEJoint & Hubs Index as well as the construction algorithms, a distributed computation model which is customized for the graph pattern matching problem and the new query evaluation algorithm based on the model. As an important feature of our distributed computation model, we significantly reduced the network communication during the query evaluation procedure, which largely improves the query performance. At last, we test the query performance of our distributed algorithm using data graphs as large as 100 million of edges with a cluster of up-to 16 machines. The experimental results demonstrate that with only 16 machines, our algorithm can evaluate the query pattern in tens of milliseconds.
Committee
Meral Ozsoyoglu (Advisor)
Jing Li (Committee Member)
Mehmet Koyuturk (Committee Member)
Erkki Somersalo (Committee Member)
Pages
196 p.
Subject Headings
Computer Science
Recommended Citations
Refworks
EndNote
RIS
Mendeley
Citations
Yang, L. (2015).
Querying Graph Structured Data
[Doctoral dissertation, Case Western Reserve University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=case1410434109
APA Style (7th edition)
Yang, Lei.
Querying Graph Structured Data.
2015. Case Western Reserve University, Doctoral dissertation.
OhioLINK Electronic Theses and Dissertations Center
, http://rave.ohiolink.edu/etdc/view?acc_num=case1410434109.
MLA Style (8th edition)
Yang, Lei. "Querying Graph Structured Data." Doctoral dissertation, Case Western Reserve University, 2015. http://rave.ohiolink.edu/etdc/view?acc_num=case1410434109
Chicago Manual of Style (17th edition)
Abstract Footer
Document number:
case1410434109
Download Count:
551
Copyright Info
© 2014, all rights reserved.
This open access ETD is published by Case Western Reserve University School of Graduate Studies and OhioLINK.