Integrated Analysis of Multi-Omics Data Using Sparse Canonical Correlation Analysis

Castleberry, Alissa

2019, Master of Science, Ohio State University, Public Health.
Background: Sparse canonical correlation analysis (SCCA) is a multivariate statistical method that uses sparsity constraints to find linear relationships between high-dimensional data sets. In this study, I apply SCCA to two problems. The first is the identification of linear combinations of microRNA (miRNA) that have maximum positive correlation with their messenger RNA (mRNA) targets. Messenger RNA provides the code to make proteins, while miRNA are small non-coding RNA molecules that help regulate mRNA function post-transcriptionally. The relationship between the two is widely studied in both healthy and diseased cells, so determining the targets of miRNAs is crucially important. However, existing methods for determining these targets return lists of thousands of mRNA. Existing correlation analyses can narrow these lists, but they are limited by the high dimensionality of these large datasets. SCCA is therefore well suited to constraining the number of mRNA targets returned in these analyses. SCCA was then used to calculate a novel gene set enrichment analysis (GSEA) statistic, which was visualized using heat maps, allowing for more succinct and informative visualization of the pathway statistics. The second problem is the use of SCCA to find linear combinations of microbes that have maximum correlation with associated metabolites. SCCA was also used to inform a metabolite set enrichment analysis.

Results: Analysis of head and neck squamous cell carcinoma samples from The Cancer Genome Atlas revealed that many of the miRNA-mRNA interactions of interest were on cancer or cancer-related pathways, such as the p53 signaling or melanoma pathways. Analysis of microbe and metabolite data from colorectal cancer patients revealed that many of the metabolites produced by microbes of interest, such as tyrosine and lysylproline, were found on cancer or cancer-related pathways, such as central carbon metabolism in cancer.

Conclusions: Using SCCA to inform a gene or metabolite set enrichment analysis is more informative than using a pairwise correlation analysis alone, as it better accounts for the complexity present in real biology. The proposed method shows promise for identifying biological pathways enriched for genes regulated by miRNA expression or metabolites produced by a certain set of microbes, and could be used to model other multi-omics datasets, provided a relationship between the two data types can be established.
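
To make the idea of SCCA concrete, the following is a minimal Python sketch of one common way to compute a single pair of sparse canonical vectors: alternating soft-thresholded updates on the cross-covariance matrix, in the spirit of penalized matrix decomposition. It is not the implementation used in the thesis, which the abstract does not describe. The function names (`soft_threshold`, `unit_norm`, `sparse_cca`), the fixed thresholds `lam_u` and `lam_v`, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def soft_threshold(a, lam):
    """Shrink each entry of `a` toward zero by `lam`; entries below `lam` in magnitude become 0."""
    return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)


def unit_norm(a):
    """Scale `a` to unit Euclidean length; an all-zero vector is returned unchanged."""
    norm = np.linalg.norm(a)
    return a / norm if norm > 0 else a


def sparse_cca(X, Y, lam_u=0.3, lam_v=0.3, n_iter=100, tol=1e-6):
    """Illustrative sparse CCA for one pair of weight vectors (u, v).

    Assumptions: within-set covariances are treated as identity, the
    thresholds are fixed rather than tuned, and no column of X or Y is constant.
    """
    # Standardize columns so the cross-product below is a correlation matrix.
    X = (X - X.mean(axis=0)) / X.std(axis=0)
    Y = (Y - Y.mean(axis=0)) / Y.std(axis=0)
    C = X.T @ Y / X.shape[0]  # p x q cross-correlation matrix

    # Start v at the leading right singular vector of C.
    _, _, vt = np.linalg.svd(C, full_matrices=False)
    v = vt[0]
    u = np.zeros(C.shape[0])

    for _ in range(n_iter):
        # Update each side in turn: project, shrink small weights to zero, renormalize.
        u_new = unit_norm(soft_threshold(C @ v, lam_u))
        v_new = unit_norm(soft_threshold(C.T @ u_new, lam_v))
        change = np.linalg.norm(u_new - u) + np.linalg.norm(v_new - v)
        u, v = u_new, v_new
        if change < tol:
            break

    # Correlation of the two canonical variates (NaN if a weight vector is all zero).
    corr = np.corrcoef(X @ u, Y @ v)[0, 1]
    return u, v, corr


# Synthetic example: a shared latent signal drives the first 5 columns of X
# (standing in for miRNA) and the first 4 columns of Y (standing in for mRNA).
rng = np.random.default_rng(0)
n, p, q = 200, 50, 40
z = rng.standard_normal(n)
X = rng.standard_normal((n, p))
Y = rng.standard_normal((n, q))
X[:, :5] += z[:, None]
Y[:, :4] += z[:, None]

u, v, corr = sparse_cca(X, Y)
print("nonzero weights in u:", np.flatnonzero(u))
print("nonzero weights in v:", np.flatnonzero(v))
print("canonical correlation:", corr)
```

In this sketch the nonzero entries of `u` and `v` play the role of the selected features (for example, a short list of miRNA and their candidate mRNA targets). Larger thresholds give sparser weight vectors. In practice the thresholds would be tuned, for instance by cross-validation or permutation, rather than fixed.
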
Guy Brock (Advisor)
Kevin Coombes (Committee Member)
Chi Song (Committee Member)
67 p.

Recommended Citations

  • Castleberry, A. (2019). Integrated Analysis of Multi-Omics Data Using Sparse Canonical Correlation Analysis [Master's thesis, Ohio State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=osu15544898045976

    APA Style (7th edition)

  • Castleberry, Alissa. Integrated Analysis of Multi-Omics Data Using Sparse Canonical Correlation Analysis. 2019. Ohio State University, Master's thesis. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=osu15544898045976.

    MLA Style (8th edition)

  • Castleberry, Alissa. "Integrated Analysis of Multi-Omics Data Using Sparse Canonical Correlation Analysis." Master's thesis, Ohio State University, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=osu15544898045976

    Chicago Manual of Style (17th edition)