Files
Thesis_Rong_07072016.pdf (2.28 MB)
Statistical Methods for Functional Genomics Studies Using Observational Data
Author Info
Lu, Rong
ORCID® Identifier
http://orcid.org/0000-0003-4321-9144
Permalink: http://rave.ohiolink.edu/etdc/view?acc_num=osu1467830759
Year and Degree
2016, Doctor of Philosophy, Ohio State University, Biostatistics.
Abstract
In functional genomics studies, human tissue samples are often difficult to obtain, and lab experiments are expensive and time-consuming. Mining existing databases is therefore an essential step in building scientific hypotheses for well-targeted lab experiments, and it is important to study statistical methods that make better use of observational data in these studies. Measuring allele-specific RNA expression provides valuable insights into cis-acting genetic and epigenetic regulation of gene expression. Widespread adoption of high-throughput sequencing technologies for studying RNA expression permits measurement of allelic RNA expression imbalance (AEI) at heterozygous single nucleotide polymorphisms (SNPs) across the entire transcriptome, and this approach has become especially popular with the emergence of large databases such as GTEx. However, existing methods for modeling allelic expression from RNA-seq data often assume a strong negative correlation between reference and variant allele reads, which may not be biologically reasonable. In Chapter 2, a folded Skellam mixture model is proposed for AEI analysis using RNA-seq data. Under the null hypothesis of no AEI, a group of SNPs (possibly across multiple genes) is considered comparable if the total allelic read counts of the SNPs are of similar magnitude. Within each group of comparable SNPs, SNPs with an AEI signal are identified by fitting a mixture of folded Skellam distributions to the absolute values of the read differences. Applying this methodology to RNA-seq data from human autopsy brain tissues identified numerous instances of moderate to strong allelic RNA expression imbalance at heterozygous SNPs. Findings for SLC1A3 mRNA, which exhibits known expression differences, are discussed as examples.
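To make the folded Skellam idea concrete, the following is a minimal sketch and not the thesis's implementation. It assumes the reference and variant read counts are independent Poisson variables, so that their difference is Skellam distributed and its absolute value is folded Skellam. The function names, the two-component mixture, and the parameters `weight_alt` and `ratio_alt` are hypothetical choices made for illustration; the abstract does not specify how the mixture is parametrized or fitted.

```python
import math


def poisson_pmf(k, mu):
    """Poisson probability P(X = k) for mean mu > 0, computed in log space."""
    if k < 0:
        return 0.0
    return math.exp(k * math.log(mu) - mu - math.lgamma(k + 1))


def skellam_pmf(d, mu1, mu2):
    """P(X1 - X2 = d) for independent X1 ~ Poisson(mu1), X2 ~ Poisson(mu2).

    Evaluated as a truncated convolution sum over X2 = j, X1 = j + d.
    The truncation point is a heuristic, not a proven error bound.
    """
    m = max(mu1, mu2)
    upper = int(m + 12 * math.sqrt(m) + 50) + abs(d)
    return sum(
        poisson_pmf(j + d, mu1) * poisson_pmf(j, mu2)
        for j in range(max(0, -d), upper)
    )


def folded_skellam_pmf(k, mu1, mu2):
    """P(|X1 - X2| = k): the Skellam mass at +k and -k folded together."""
    if k < 0:
        return 0.0
    if k == 0:
        return skellam_pmf(0, mu1, mu2)
    return skellam_pmf(k, mu1, mu2) + skellam_pmf(-k, mu1, mu2)


def posterior_imbalance(k, total, weight_alt, ratio_alt):
    """Posterior probability that |read difference| = k came from the
    imbalanced component of a hypothetical two-component mixture.

    Null component: both alleles have mean total / 2 (no AEI).
    Alternative:    means ratio_alt * total and (1 - ratio_alt) * total.
    """
    null = folded_skellam_pmf(k, total / 2, total / 2)
    alt = folded_skellam_pmf(k, ratio_alt * total, (1 - ratio_alt) * total)
    num = weight_alt * alt
    return num / (num + (1 - weight_alt) * null)


if __name__ == "__main__":
    # SNPs with about 100 total reads; compare a balanced and an imbalanced one.
    for diff in (0, 20, 40):
        print(diff, posterior_imbalance(diff, total=100, weight_alt=0.2, ratio_alt=0.7))
```

In a real analysis the component means and mixture weights would be estimated from each group of comparable SNPs (for example by an EM-type algorithm) rather than fixed in advance as they are here.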
In the theory of complex systems, Sobol sensitivity indices are typically introduced under the high-dimensional model representation (HDMR, also known as functional ANOVA), assuming that all inputs are independent uniform random variables. Variance-based definitions of Sobol indices are also available for analyzing systems with correlated or non-uniform inputs. Existing algorithms for estimating Sobol indices with correlated inputs mostly start by approximating the underlying full model with meta-models that impose a certain type of orthogonality among the decomposition components, which is computationally expensive, especially when the number of inputs is large. In Chapter 3, a simple strategy for estimating Sobol indices is proposed under generalized linear models (GLMs) with independent or multivariate normal inputs. If the ultimate goal is only to estimate Sobol indices for variable selection rather than to build a predictive model, it may be more convenient to approximate the conditional expectations of the response given different input subsets separately, without reconstructing the complete input-output map. It can be shown that under a large class of GLMs, Sobol sensitivity indices can either be computed directly from closed-form analytic formulas or approximated numerically to any desired level of accuracy using empirical variance estimates, without requiring knowledge of the underlying true model or its HDMR. The method is illustrated by selecting genes that are co-expressed with a target gene of interest, CYP3A4.
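As an illustration of what a closed-form Sobol index can look like, here is a minimal sketch for the simplest case only: a linear model with identity link, Y = beta . X + eps, with multivariate normal inputs. It is not taken from the thesis. It relies on the standard Gaussian fact that E[Y | X_i] is linear in X_i, so the first-order index Var(E[Y | X_i]) / Var(Y) equals Cov(Y, X_i)^2 / (Var(X_i) Var(Y)). The function name and argument names are hypothetical.

```python
def first_order_sobol_linear_gaussian(beta, sigma, noise_var=0.0):
    """First-order Sobol indices for Y = beta . X + eps with X ~ N(mu, sigma).

    beta      : list of regression coefficients (length p)
    sigma     : p x p input covariance matrix as a list of lists
    noise_var : variance of the independent noise term eps (assumed >= 0)

    The indices are relative to the total variance of Y, noise included.
    """
    p = len(beta)
    # Cov(Y, X_i) = sum_j beta_j * sigma[j][i]
    cov_y_x = [sum(beta[j] * sigma[j][i] for j in range(p)) for i in range(p)]
    # Var(Y) = beta' sigma beta + noise_var
    var_y = sum(beta[i] * cov_y_x[i] for i in range(p)) + noise_var
    # Var(E[Y | X_i]) = Cov(Y, X_i)^2 / Var(X_i) under joint normality
    return [cov_y_x[i] ** 2 / (sigma[i][i] * var_y) for i in range(p)]


if __name__ == "__main__":
    # Independent inputs: the indices split the variance by squared coefficient.
    print(first_order_sobol_linear_gaussian([1.0, 2.0], [[1.0, 0.0], [0.0, 1.0]]))
    # Correlated inputs: X_2 has a zero coefficient but a nonzero index,
    # because it carries information about X_1.
    print(first_order_sobol_linear_gaussian([1.0, 0.0], [[1.0, 0.5], [0.5, 1.0]]))
```

The second call shows why correlated inputs need care when indices are used for variable ranking: an input with no direct effect can still score well through its correlation with an influential input.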
Committee
Grzegorz Rempala (Advisor)
Wolfgang Sadee (Committee Member)
Shili Lin (Committee Member)
Pages
178 p.
Subject Headings
Biostatistics
Keywords
folded Skellam mixture; AEI; Sobol sensitivity indices; GLM; variable selection; variable ranking; global sensitivity analysis; co-expression network
Recommended Citations
Lu, R. (2016). Statistical Methods for Functional Genomics Studies Using Observational Data [Doctoral dissertation, Ohio State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=osu1467830759
APA Style (7th edition)
Lu, Rong. Statistical Methods for Functional Genomics Studies Using Observational Data. 2016. Ohio State University, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=osu1467830759.
MLA Style (8th edition)
Lu, Rong. "Statistical Methods for Functional Genomics Studies Using Observational Data." Doctoral dissertation, Ohio State University, 2016. http://rave.ohiolink.edu/etdc/view?acc_num=osu1467830759
Chicago Manual of Style (17th edition)
Document number: osu1467830759
Download Count: 865
Copyright Info
© 2016, all rights reserved.
This open access ETD is published by The Ohio State University and OhioLINK.