Skip to Main Content
 

Global Search Box

 
 
 
 

Files

File List

ETD Abstract Container

Abstract Header

Querying Graph Structured Data

Abstract Details

2015, Doctor of Philosophy, Case Western Reserve University, EECS - Computer and Information Sciences.
With the emergence of bioinformatics and social science applications, a large amount of data can be represented as graphs. Thus, there is an increasing interest in developing diverse graph algorithms that operate on large graphs. In this thesis, we discuss diverse graph problems including the pedigree computation problem in the bioinformatics field and a general purpose graph query problem – graph pattern matching problem, which can be applied to social network. First, a pedigree is a diagram of family relationships, and it is often used to determine the mode of inheritance (dominant, recessive, etc.) of genetic diseases. The pedigree computation problem studies the evaluation of the inbreeding coefficient of a given individual using Compact Path Encoding (CPE), which is a new compact path encoding scheme for large pedigrees. We design a set of efficient algorithms for identifying paths based on CPE. In addition, we present time and space complexity analysis, and also manifest the efficiency of our method for evaluating inbreeding coefficients as compared to previous methods by experimental results using pedigree graphs with real and synthetic data. Both theoretical and experimental results demonstrate that our method is more scalable and efficient than previous methods in terms of time and space requirements. Second, querying large data graphs efficiently with a flexible query pattern is an important research problem due to a wide range of applications with large graph structured data. As the sizes of data graphs become increasingly larger, more efficient techniques are needed to achieve better query performance. In this part, we focus on designing techniques to find all the matches for a user-provided query pattern in a large data graph (graph pattern matching problem) based on subgraph isomorphism. The query pattern used in this work extends the traditional one by supporting paths with distance constraints. To evaluate such query patterns, we propose a hybrid inverted index, and an efficient query evaluation algorithm with join order optimization – JAGUAR (standing for join-based graph pattern querying algorithm). The scalability and efficiency of our proposed methods are demonstrated by detailed experiments on a set of synthetic and real datasets. Moving forward, we improve the scalability of our approach by adapting it to a distributed environment. The extension includes a distributed version of VEJoint & Hubs Index as well as the construction algorithms, a distributed computation model which is customized for the graph pattern matching problem and the new query evaluation algorithm based on the model. As an important feature of our distributed computation model, we significantly reduced the network communication during the query evaluation procedure, which largely improves the query performance. At last, we test the query performance of our distributed algorithm using data graphs as large as 100 million of edges with a cluster of up-to 16 machines. The experimental results demonstrate that with only 16 machines, our algorithm can evaluate the query pattern in tens of milliseconds.
Meral Ozsoyoglu (Advisor)
Jing Li (Committee Member)
Mehmet Koyuturk (Committee Member)
Erkki Somersalo (Committee Member)
196 p.

Recommended Citations

Citations

  • Yang, L. (2015). Querying Graph Structured Data [Doctoral dissertation, Case Western Reserve University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=case1410434109

    APA Style (7th edition)

  • Yang, Lei. Querying Graph Structured Data. 2015. Case Western Reserve University, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=case1410434109.

    MLA Style (8th edition)

  • Yang, Lei. "Querying Graph Structured Data." Doctoral dissertation, Case Western Reserve University, 2015. http://rave.ohiolink.edu/etdc/view?acc_num=case1410434109

    Chicago Manual of Style (17th edition)