Skip to Main Content
 

Global Search Box

 
 
 
 

ETD Abstract Container

Abstract Header

Directed Graph Analysis: Algorithms and Applications

Abstract Details

2019, Doctor of Philosophy, Ohio State University, Computer Science and Engineering.
Taxonomy graphs that capture hyponymy or meronymy relationships through directed edges are expected to be acyclic. However, in practice, they may have thousands of cycles, as they are often created in a crowd-sourced way. Since these cycles represent logical fallacies, they need to be removed for many web applications. In this thesis, we first address the problem of breaking cycles while preserving the logical structure (hierarchy) of a directed graph as much as possible. Existing approaches for this problem either need manual intervention or use heuristics that can critically alter the taxonomy structure. In contrast, our approach infers graph hierarchy using a range of features, including a Bayesian skill rating system and a social agony metric. We also devise several strategies to leverage the inferred hierarchy for removing a small subset of edges to make the graph acyclic. We then apply our breaking cycles technique to address the problem of Question Difficulty and Expertise Estimation (QDEE) in Community Question Answer (CQA) sites such as Yahoo! Answers and Stack Overflow. Our framework QDEE tackles a fundamental challenge in crowdsourcing: how to appropriately route and assign questions to users with suitable expertise. This problem domain has been the subject of much research and includes both language-agnostic as well as language conscious solutions. We bring to bear a key language-agnostic insight: that users gain expertise and therefore tend to ask as well as answer more difficult questions over time. We use this insight within the popular competition (directed) graph model to estimate question difficulty and user expertise by identifying key hierarchical structure within the said model. Difficulty levels of newly posted questions (the cold-start problem) are estimated by using our QDEE framework and additional textual features. We also propose a model to route newly posted questions to appropriate users based on the difficulty level of the question and the expertise of the user. The QDEE framework also allows us to characterize user expertise in novel ways by identifying interesting patterns and roles played by different users in such collaborative CQA systems. Each node in the competition graph will be assigned by QDEE a scalar value to represent its skill. The learned skills of question nodes are question difficulty scores, and the learned skills of user nodes are their expertise scores. However, the question difficulty and user expertise can vary on different topics. In Stack Exchange sites, each question can be assigned multi-tags to represent its most relevant topics. Hence using a solely scalar value to represent question difficulty level or user expertise is not thorough. We then propose to learn latent representations for question difficulty (question node) and user expertise (user node) by directed graph embedding with asymmetric transitivity preserving (ATP). Asymmetric transitivity is an essential property of directed graphs since it can play an important role in downstream graph inference and analysis. Maintaining such properties, while reducing the graph to a lower-dimensional vector space, has been the focus of much recent research. In this thesis, we propose a scalable methodology for embedding directed graphs with the goal of preserving such properties. The technique incorporates graph hierarchy and reachability information naturally by relying on a non-linear transformation that operates on the core reachability and implicit hierarchy within such graphs. Subsequently, the methodology levers a factorization-based approach to generate two embedding vectors for each node within the graph, to capture the asymmetric transitivity. However, ATP can only learn embedding for nodes which are already in the graph, which undermines its application in cold- start -- a phenomena observed when a new question is posted is not well addressed by existing approaches. Additionally, cold questions posted by new askers present significant challenges to state-of-the-art approaches. In this thesis, we propose ColdRoute to handle the task of routing cold questions posted by new or existing askers to suitable experts. Specifically, we use Factorization Machines on the one-hot encoding of critical features such as question tags and compare our approach with well-studied techniques such as CQARank and semantic matching (LDA, BoW, and Doc2Vec). Using data from eight stack exchange sites, we are able to improve upon the routing metrics (precision$@1$, accuracy, MRR) over the state-of-the-art models such as semantic matching by $159.5\%$,$31.84\%$, and $40.36\%$ for cold questions posted by existing askers, and $123.1\%$, $27.03\%$, and $34.81\%$ for cold questions posted by new askers respectively. Finally, we propose to solve the cold question routing problem by marrying the merits of the above two types of methods ColdRoute and ATP. On one hand, we use graph embedding techniques to encode useful higher-order graph structure information, and on the other hand, we enable the incorporation of heterogeneous information (e.g. textual) in the node embedding as well. Comparing with the directed graph constructed in {\em ATP}, our proposed method allows more flexible node interactions and can enrich the question/user node embeddings with the textual information from question tags. And it's able to route newly posted questions no matter posted by existing askers or new askers to suitable users in CQAs.
Srinivasan Parthasarathy (Advisor)
Huan Sun (Committee Member)
Eric Fosler-Lussier (Committee Member)
Darren Drewry (Committee Member)
208 p.

Recommended Citations

Citations

  • Sun, J. (2019). Directed Graph Analysis: Algorithms and Applications [Doctoral dissertation, Ohio State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=osu1565797455907422

    APA Style (7th edition)

  • Sun, Jiankai. Directed Graph Analysis: Algorithms and Applications. 2019. Ohio State University, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=osu1565797455907422.

    MLA Style (8th edition)

  • Sun, Jiankai. "Directed Graph Analysis: Algorithms and Applications." Doctoral dissertation, Ohio State University, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=osu1565797455907422

    Chicago Manual of Style (17th edition)