Intelligent Data Mining on Large-scale Heterogeneous Datasets and its Application in Computational Biology

2014, PhD, University of Cincinnati, Engineering and Applied Science: Computer Science and Engineering.
Machine learning is a branch of artificial intelligence that covers a wide range of learning problems. A variety of supervised and unsupervised machine learning and data mining models have been applied extensively in biomedical informatics for knowledge discovery, and the advantages of meta-analysis and data fusion have been discussed in many research domains. In particular, the growing body of data, information, and knowledge covering various dimensions of human development and disease calls for efficient integrative mining efforts that analyze such heterogeneous information simultaneously. In this dissertation, we present our work on extracting hidden knowledge from data about large-scale, complex biological systems, which usually involve heterogeneous entities and the associations between them. First, we propose a biclustering algorithm to identify entities that may manifest cohesiveness within a subspace of conditions. We apply this algorithm to predict combinatorial regulation by transcription factors, and we extend it to generate 3-clusters in order to capture associations between different classes of entities. Second, we propose network-based approaches to predict drug repositioning candidates. These computational models use heterogeneous genomic and pharmacological information to generate potential repositioning candidates; we validate the approach on known indications before applying it to predict new indications for existing drugs. Third, we study several statistical and computational strategies for assessing the overall significance of relationships between different biological entities, and we apply them specifically to the problem of microRNA target ranking. We propose a framework that applies a series of data mining methods to prioritize entities in a heterogeneous network context, and based on this framework we develop ToppMiR, a workbench that infers significant microRNAs and mRNA targets given a biological context.
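The abstract does not say which statistical strategies the dissertation compares for combining evidence into an overall significance. As a generic illustration of one standard option, the sketch below implements Fisher's method for combining independent p-values; it is not necessarily the dissertation's approach. It assumes the input p-values come from independent tests, and the function name `fisher_combined_pvalue` and the example values are hypothetical.

```python
import math
from typing import Sequence


def fisher_combined_pvalue(pvalues: Sequence[float]) -> float:
    """Combine independent p-values with Fisher's method.

    Assumption: the p-values come from independent tests. Under the null
    hypothesis, X = -2 * sum(ln p_i) follows a chi-square distribution with
    2k degrees of freedom (k = len(pvalues)). For even degrees of freedom the
    chi-square survival function has the closed form
        exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!,
    so only the standard library is needed.
    """
    if not pvalues:
        raise ValueError("need at least one p-value")
    for p in pvalues:
        if not 0.0 < p <= 1.0:
            raise ValueError(f"p-values must lie in (0, 1], got {p}")

    half_stat = -sum(math.log(p) for p in pvalues)  # this is X / 2
    k = len(pvalues)

    # Accumulate sum_{i=0}^{k-1} (X/2)^i / i! one term at a time.
    series, term = 0.0, 1.0
    for i in range(k):
        series += term
        term *= half_stat / (i + 1)

    # Clamp to guard against tiny floating-point overshoot above 1.
    return min(1.0, math.exp(-half_stat) * series)


if __name__ == "__main__":
    # Hypothetical example: one microRNA-target pair scored by three
    # independent evidence sources (values made up for illustration).
    print(fisher_combined_pvalue([0.04, 0.10, 0.03]))
```

Fisher's method rewards consistent moderate evidence across sources, but it is sensitive to a single very small p-value and is only valid when the sources are independent; correlated evidence sources would call for a different combination rule.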
Raj Bhatnagar, Ph.D. (Committee Chair)
Anil Jegga, D.V.M., M.Res. (Committee Member)
Bruce Aronow, Ph.D. (Committee Member)
Yizong Cheng, Ph.D. (Committee Member)
Ali Minai, Ph.D. (Committee Member)
122 p.

Recommended Citations

APA Style (7th edition)

  • Wu, C. (2014). Intelligent Data Mining on Large-scale Heterogeneous Datasets and its Application in Computational Biology [Doctoral dissertation, University of Cincinnati]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1406880774

MLA Style (8th edition)

  • Wu, Chao. Intelligent Data Mining on Large-scale Heterogeneous Datasets and its Application in Computational Biology. 2014. University of Cincinnati, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=ucin1406880774.

Chicago Manual of Style (17th edition)

  • Wu, Chao. "Intelligent Data Mining on Large-scale Heterogeneous Datasets and its Application in Computational Biology." Doctoral dissertation, University of Cincinnati, 2014. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1406880774