Novel representation learning methodologies for consensus module detection, candidate gene prioritization, and biomarker discovery.

Ghandikota, Sudhir

2023, PhD, University of Cincinnati, Engineering and Applied Science: Computer Science and Engineering.
Graphs have become a convenient way to represent complex real-world systems composed of a collection of objects and the relationships among them. They are used extensively to model data in domains including computer science, statistical physics, linguistics, and the biological and social sciences. In biology, for instance, networks represent the interactions between proteins, and traditional network clustering and community detection algorithms are applied to them for candidate gene prioritization and in silico biomarker discovery. However, given the ever-increasing size and complexity of these networks, machine learning has become the primary approach to analyzing such graphs. The success of these models depends heavily on the quality of user-designed input features. Representation learning models, by contrast, learn representations of the input data that are suited to the task at hand. The learned representations can then be reused as inputs to downstream tasks, and they can help identify the explanatory factors shared by two or more independent learning tasks. Recently, there has been a surge of representation learning frameworks that learn node embeddings from graph-structured data. However, computational frameworks capable of analyzing multiple networks simultaneously remain limited. Such frameworks are particularly useful for research problems such as in silico biomarker discovery, where multiple transcriptomic studies associated with a given disease are available but seldom used. In this dissertation, we developed novel feature learning frameworks capable of embedding network nodes from multiple datasets. In the first part of our work, we developed a skip-gram-based multi-task feature learning model that combines multiple supervised and/or unsupervised task objectives to learn continuous features of discrete entities.
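As a rough illustration of how one skip-gram model can combine several contexts, the sketch below trains skip-gram with negative sampling (SGNS) on two hypothetical context "tasks", a gene-function context and a gene co-expression context. Both tasks share one node-embedding table (the learned gene features) but keep separate context tables, and their weighted losses are simply summed. The function name `train_multitask_sgns`, the per-task weights, the uniform negative sampler, the toy data, and all hyperparameters are assumptions for illustration; this is not the dissertation's actual model.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def train_multitask_sgns(pairs_by_task, n_nodes, n_contexts, dim=8, epochs=50,
                         lr=0.05, n_neg=3, weights=None, seed=0):
    """Skip-gram with negative sampling over several context 'tasks' (hypothetical sketch).

    All tasks share one node-embedding table; each task has its own context table.
    pairs_by_task: {task: [(node_id, context_id), ...]}
    n_contexts:    {task: number of distinct context ids in that task}
    weights:       optional {task: loss weight}; defaults to 1.0 per task
    """
    rng = random.Random(seed)
    weights = weights or {task: 1.0 for task in pairs_by_task}

    def init_vec():
        return [rng.uniform(-0.5, 0.5) / dim for _ in range(dim)]

    node_emb = [init_vec() for _ in range(n_nodes)]
    ctx_emb = {task: [init_vec() for _ in range(n_contexts[task])]
               for task in pairs_by_task}

    for _ in range(epochs):
        for task, pairs in pairs_by_task.items():
            for node, ctx in pairs:
                # One observed (positive) context plus n_neg uniformly sampled negatives.
                samples = [(ctx, 1.0)] + [(rng.randrange(n_contexts[task]), 0.0)
                                          for _ in range(n_neg)]
                node_grad = [0.0] * dim
                for c, label in samples:
                    c_vec = ctx_emb[task][c]
                    # Weighted gradient-ascent step on the log-likelihood of the label.
                    g = weights[task] * lr * (label - sigmoid(dot(node_emb[node], c_vec)))
                    for k in range(dim):
                        node_grad[k] += g * c_vec[k]        # uses the pre-update context value
                        c_vec[k] += g * node_emb[node][k]   # update the task's context vector
                for k in range(dim):
                    node_emb[node][k] += node_grad[k]       # shared node vector updated last
    return node_emb


if __name__ == "__main__":
    # Hypothetical toy data: 4 genes; genes 0/1 and 2/3 share a function term
    # and are co-expressed with each other.
    pairs = {"function": [(0, 0), (1, 0), (2, 1), (3, 1)],
             "coexpression": [(0, 1), (1, 0), (2, 3), (3, 2)]}
    emb = train_multitask_sgns(pairs, n_nodes=4,
                               n_contexts={"function": 2, "coexpression": 4})
    for gene, vec in enumerate(emb):
        print(gene, [round(x, 3) for x in vec])
```

Summing weighted per-task losses over a shared node table is only one simple way to combine objectives; a supervised signal could be added as another entry in `pairs_by_task` (for example, gene-to-label pairs). How the dissertation actually weights or schedules its tasks is not stated in this abstract.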
We used the skip-gram-based model to extract contextualized gene features by combining gene function and gene co-expression contexts, and we compared the learned features with those from other unsupervised representation learning models, using the biological relevance of the resulting gene neighborhoods as the criterion. Next, we addressed the problem of learning node embeddings from multiple networks while explicitly encoding both short- and long-range interaction neighborhoods. For this, we implemented a multiview graph neural network (GNN) to analyze multiple networks that share the same vertex space but originate from different similarity sources (views). We applied this model to learn node features from multiple transcriptomic gene networks and to identify consensus candidate gene modules associated with a given disease. In our evaluation experiments, we compared the quality of its learned features and gene clusters with those produced by several state-of-the-art GNNs. In our final study, we developed a multimodal feature learning framework capable of integrating and analyzing data elements from different input sources. This framework employs a cross-modal attention mechanism to aggregate features from the individual input data types. Trained with a supervised objective, it generates multimodal feature vectors for a given set of input genes, which are then used to identify candidate biomarkers. We performed ablation experiments to determine the advantages of a multimodal approach, and we compared the predictive performance of our framework with several other state-of-the-art multimodal learning methodologies.
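A minimal sketch of cross-modal attention fusion for a single gene, assuming each input data type has already been encoded into a feature vector of the same length. Each modality's vector acts as a query over the *other* modalities (scaled dot-product attention), and the attended vectors are mean-pooled into one fused vector. The learned query/key/value projections and the supervised training loop are omitted (identity projections are used instead), and the function name `cross_modal_fuse`, the mean-pooling step, and the example feature values are hypothetical, not the framework's actual design.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_modal_fuse(modality_feats):
    """Fuse per-modality feature vectors for one gene (hypothetical sketch).

    modality_feats: list of >= 2 equal-length vectors, one per input data type.
    Each modality (query) attends only to the other modalities (keys/values)
    via scaled dot-product attention; the attended vectors are then averaged.
    Identity Q/K/V projections stand in for learned ones.
    Returns (fused_vector, attention_weights_per_query).
    """
    n = len(modality_feats)
    if n < 2:
        raise ValueError("cross-modal attention needs at least two modalities")
    d = len(modality_feats[0])
    scale = 1.0 / math.sqrt(d)
    attended, attn = [], []
    for i, query in enumerate(modality_feats):
        others = [v for j, v in enumerate(modality_feats) if j != i]
        weights = softmax([scale * sum(q * k for q, k in zip(query, key)) for key in others])
        attn.append(weights)
        # Attention-weighted sum of the other modalities' vectors.
        attended.append([sum(w * v[t] for w, v in zip(weights, others)) for t in range(d)])
    # Mean-pool the per-query results into a single fused feature vector.
    fused = [sum(vec[t] for vec in attended) / n for t in range(d)]
    return fused, attn


if __name__ == "__main__":
    # Hypothetical 4-dimensional features for one gene from three data types.
    expression = [0.9, 0.1, 0.0, 0.4]
    pathway = [0.8, 0.0, 0.2, 0.5]
    literature = [0.0, 1.0, 0.7, 0.1]
    fused, attn = cross_modal_fuse([expression, pathway, literature])
    print("fused:", [round(x, 3) for x in fused])
    print("attention weights:", [[round(w, 3) for w in row] for row in attn])
```

Excluding each modality's own vector from its keys keeps the attention strictly cross-modal; including it would turn this into ordinary self-attention over the set of modalities.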
Anil Jegga, DVM MRes (Committee Chair)
Raj Bhatnagar, Ph.D. (Committee Member)
Ali Minai, Ph.D. (Committee Member)
Yizong Cheng, Ph.D. (Committee Member)
Jing Chen, Ph.D. (Committee Member)
144 p.

Recommended Citations

  • APA Style (7th edition): Ghandikota, S. (2023). Novel representation learning methodologies for consensus module detection, candidate gene prioritization, and biomarker discovery [Doctoral dissertation, University of Cincinnati]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1684777495198182

  • MLA Style (8th edition): Ghandikota, Sudhir. Novel representation learning methodologies for consensus module detection, candidate gene prioritization, and biomarker discovery. 2023. University of Cincinnati, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=ucin1684777495198182.

  • Chicago Manual of Style (17th edition): Ghandikota, Sudhir. "Novel representation learning methodologies for consensus module detection, candidate gene prioritization, and biomarker discovery." Doctoral dissertation, University of Cincinnati, 2023. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1684777495198182