Efficient network based approaches for pattern recognition and knowledge discovery from large and heterogeneous datasets

2013, PhD, University of Cincinnati, Engineering and Applied Science: Computer Science and Engineering.
Rapid technological advances have created enormous potential for transformational science and engineering across all scientific domains. However, discovering useful and meaningful patterns and extracting knowledge from large, diverse, distributed, and heterogeneous datasets continues to pose a formidable challenge. There is thus an urgent need for more efficient and robust computational approaches to manage, use, and exploit these heterogeneous data sources. Such approaches can, in turn, accelerate scientific discovery and innovation, yield new insights in a timely manner, and open fields of inquiry that were hitherto impossible. In this dissertation, we tackle this challenge by developing and applying novel and efficient network-based computational approaches. To demonstrate the utility of our algorithms, we use several large and heterogeneous datasets from the biomedical domain, focusing specifically on rare or orphan diseases (OD) as an application. Our research has three facets. First, we conduct a global network analysis of orphan diseases and demonstrate the utility of topological analyses in deducing the underlying biology of rare diseases and their causal genes. Specifically, starting with a bipartite network of known ODs and OD-causing mutant genes, and using the human protein interactome, functional enrichment, and literature co-citation, we constructed and topologically analyzed several networks. Our analysis revealed that a majority of orphan disease-causing mutant genes are essential, in contrast to common disease-causing mutant genes, which are predominantly nonessential. Second, we designed a novel algorithm based on vertex similarity to identify and rank novel orphan disease candidate genes. We tested and validated this algorithm using a leave-one-out cross-validation approach on known orphan disease gene sets. We also compared it with previously reported similar approaches and found that its performance was comparable to that of current state-of-the-art approaches. Finally, we designed and developed a novel framework for discovering drug repositioning candidates that combines information-theoretic and network-analysis-based approaches. Integrating fourteen heterogeneous gene-gene networks, this framework quantifies similarities between disease causal genes and drug target genes based on topological similarity (vertex similarity score) and mutual information score. By extracting the related drug and disease information from the top-ranked gene pairs or gene clusters, we discovered several drug repositioning candidates for both common and orphan diseases.
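The idea of ranking candidate genes by vertex similarity to known disease genes can be illustrated with a minimal sketch. The abstract does not say which similarity measure the dissertation uses, so the Jaccard index over neighbor sets is an assumption here, as are the function names and the toy graph with placeholder node labels. This is not the author's algorithm, only one simple instance of the general technique.

```python
from typing import Dict, Iterable, List, Set, Tuple

Graph = Dict[str, Set[str]]  # node -> set of neighboring nodes


def build_graph(edges: Iterable[Tuple[str, str]]) -> Graph:
    """Build an undirected adjacency map from an edge list."""
    graph: Graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def jaccard(graph: Graph, u: str, v: str) -> float:
    """Jaccard similarity of the neighbor sets of u and v (assumed measure)."""
    nu, nv = graph.get(u, set()), graph.get(v, set())
    union = nu | nv
    return len(nu & nv) / len(union) if union else 0.0


def rank_candidates(
    graph: Graph, seeds: Set[str], candidates: Iterable[str]
) -> List[Tuple[str, float]]:
    """Score each candidate by its mean similarity to the seed nodes,
    then sort by descending score (ties broken by name)."""
    if not seeds:
        return []
    scored = []
    for c in candidates:
        if c in seeds:
            continue  # known genes are not candidates
        score = sum(jaccard(graph, c, s) for s in seeds) / len(seeds)
        scored.append((c, score))
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical toy network: g1..g4 stand in for genes, p1..p3 for
# their interaction partners. None of this is data from the dissertation.
toy_graph = build_graph([
    ("g1", "p1"), ("g1", "p2"),
    ("g2", "p1"), ("g2", "p2"),
    ("g3", "p1"),
    ("g4", "p3"),
])
ranking = rank_candidates(toy_graph, seeds={"g1"}, candidates=["g2", "g3", "g4"])
print(ranking)  # [('g2', 1.0), ('g3', 0.5), ('g4', 0.0)]
```

In this toy network, g2 shares both neighbors with the seed g1 and ranks first, g3 shares one of two and ranks second, and g4 shares none. A leave-one-out evaluation of the kind the abstract mentions would hold out each known gene in turn, treat it as a candidate, and check how highly it is ranked against the remaining seeds.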
Kenneth Berman, Ph.D. (Committee Chair)
Anil Jegga, D.V.M., M.Res. (Committee Member)
Fred Annexstein, Ph.D. (Committee Member)
Anca Ralescu, Ph.D. (Committee Member)
Marepalli Rao, Ph.D. (Committee Member)
86 p.

Recommended Citations

  • Zhu, C. (2013). Efficient network based approaches for pattern recognition and knowledge discovery from large and heterogeneous datasets [Doctoral dissertation, University of Cincinnati]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1378215769

    APA Style (7th edition)

  • Zhu, Cheng. Efficient network based approaches for pattern recognition and knowledge discovery from large and heterogeneous datasets. 2013. University of Cincinnati, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=ucin1378215769.

    MLA Style (8th edition)

  • Zhu, Cheng. "Efficient network based approaches for pattern recognition and knowledge discovery from large and heterogeneous datasets." Doctoral dissertation, University of Cincinnati, 2013. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1378215769

    Chicago Manual of Style (17th edition)