Files
3376.pdf (2.11 MB)
Efficient network based approaches for pattern recognition and knowledge discovery from large and heterogeneous datasets
Author Info
Zhu, Cheng
Permalink:
http://rave.ohiolink.edu/etdc/view?acc_num=ucin1378215769
Year and Degree
2013, PhD, University of Cincinnati, Engineering and Applied Science: Computer Science and Engineering.
Abstract
With rapid technological advances, the potential for transformational science and engineering across all scientific domains is enormous. However, discovering useful and meaningful patterns and extracting knowledge from large, diverse, distributed, and heterogeneous datasets continues to pose a formidable challenge. Thus, there is an urgent need for more efficient and robust computational approaches to manage, use, and exploit these heterogeneous data sources. Such approaches can in turn accelerate scientific discovery and innovation, yield new insights in a timely manner, and open new fields of inquiry that were hitherto impossible. In this dissertation, we tackle this challenge by developing and applying novel and efficient network-based computational approaches. To demonstrate the utility of our algorithms, we use several large and heterogeneous datasets from the biomedical domain, focusing specifically on rare or orphan diseases (OD) as an application.
Our research has three facets. First, we conduct a global network analysis of orphan diseases and demonstrate the utility of topological analyses in deducing the underlying biology of rare diseases and their causal genes. Specifically, starting with a bipartite network of known ODs and OD-causing mutant genes, and using the human protein interactome, functional enrichment, and literature co-citation, we constructed and topologically analyzed several networks. Our analyses revealed that a majority of orphan disease-causing mutant genes are essential, in contrast to common disease-causing mutant genes, which are predominantly nonessential. Second, we designed a novel algorithm based on vertex similarity to identify and rank novel orphan disease candidate genes. We tested and validated this algorithm using a leave-one-out cross-validation approach on known orphan disease gene sets. We also compared it with previously reported similar approaches and found that its performance was comparable to the current state-of-the-art approaches. Finally, we designed and developed a novel drug repositioning candidate discovery framework that combines information theory-based and network analysis-based approaches. Integrating fourteen heterogeneous gene-gene networks, this framework quantifies similarities between disease causal genes and drug target genes based on topological similarity (vertex similarity score) and mutual information score. By extracting the related drug and disease information from the top-ranked gene pairs or gene clusters, we discovered several drug repositioning candidates for both common and orphan diseases.
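As an illustration of the second facet, the sketch below ranks candidate genes by vertex similarity to a set of known disease genes and checks the ranking with leave-one-out cross-validation. It is a minimal sketch under stated assumptions, not the dissertation's algorithm: the abstract does not say which vertex similarity measure is used, so Jaccard similarity over neighbor sets is swapped in here, and the gene names, edges, and seed set are hypothetical toy data.

```python
# Toy gene-gene network as an undirected edge list (hypothetical gene names).
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"),
         ("C", "D"), ("D", "E"), ("E", "F")]

# Build an adjacency map: gene -> set of neighboring genes.
graph = {}
for u, v in edges:
    graph.setdefault(u, set()).add(v)
    graph.setdefault(v, set()).add(u)


def jaccard(graph, u, v):
    """Jaccard vertex similarity: shared neighbors over all neighbors.

    Assumption: the abstract names "vertex similarity" only; Jaccard is
    one common choice, used here purely for illustration.
    """
    union = graph[u] | graph[v]
    return len(graph[u] & graph[v]) / len(union) if union else 0.0


def rank_candidates(graph, seeds, candidates):
    """Order candidates by mean similarity to the seed genes, best first."""
    scores = {
        c: sum(jaccard(graph, c, s) for s in seeds) / len(seeds)
        for c in candidates
    }
    # Break score ties by gene name so the ordering is deterministic.
    return sorted(scores, key=lambda g: (-scores[g], g))


def leave_one_out_ranks(graph, seeds):
    """Hold out each known gene in turn and record where it is ranked."""
    ranks = {}
    for held_out in sorted(seeds):
        training = seeds - {held_out}
        # The held-out gene competes against every gene not known as a seed.
        candidates = (set(graph) - seeds) | {held_out}
        ordering = rank_candidates(graph, training, candidates)
        ranks[held_out] = ordering.index(held_out) + 1
    return ranks


seeds = {"A", "B", "C"}  # hypothetical known disease genes
ranks = leave_one_out_ranks(graph, seeds)
for gene in sorted(ranks):
    print(f"held-out gene {gene} ranked {ranks[gene]}")
```

A held-out gene that ranks near the top suggests the similarity score recovers known disease genes from the rest of the seed set. A real evaluation would use a full interactome and summarize the ranks across many disease gene sets.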
Committee
Kenneth Berman, Ph.D. (Committee Chair)
Anil Jegga, D.V.M., M.Res. (Committee Member)
Fred Annexstein, Ph.D. (Committee Member)
Anca Ralescu, Ph.D. (Committee Member)
Marepalli Rao, Ph.D. (Committee Member)
Pages
86 p.
Subject Headings
Computer Science
Keywords
Network approaches; pattern recognition; heterogeneous datasets; rare orphan disease; drug repositioning; gene prediction
Citations
APA Style (7th edition)
Zhu, C. (2013). Efficient network based approaches for pattern recognition and knowledge discovery from large and heterogeneous datasets [Doctoral dissertation, University of Cincinnati]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1378215769
MLA Style (8th edition)
Zhu, Cheng. Efficient network based approaches for pattern recognition and knowledge discovery from large and heterogeneous datasets. 2013. University of Cincinnati, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=ucin1378215769.
Chicago Manual of Style (17th edition)
Zhu, Cheng. "Efficient network based approaches for pattern recognition and knowledge discovery from large and heterogeneous datasets." Doctoral dissertation, University of Cincinnati, 2013. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1378215769
Document number:
ucin1378215769
Copyright Info
© 2013, all rights reserved.
This open access ETD is published by University of Cincinnati and OhioLINK.