NEW METHODS FOR VARIABLE SELECTION WITH APPLICATIONS TO SURVIVAL ANALYSIS AND STATISTICAL REDUNDANCY ANALYSIS USING GENE EXPRESSION DATA

2007, Doctor of Philosophy, Case Western Reserve University, Epidemiology and Biostatistics.
An important application of microarray research is the development of cancer diagnostic and prognostic tools based on tumor genetic profiles. For ease of interpretation, such studies aim to identify a small fraction of genes, out of thousands or more, with which to build molecular predictors of clinical outcomes; they therefore require methods that can model high-dimensional covariates and perform variable selection simultaneously. One area of interest is modeling cancer patients' survival time, or time to cancer recurrence, with gene expression data. In the first part of this dissertation, we propose a new penalized weighted least squares method for model estimation and variable selection in accelerated failure time (AFT) models. In this method, right-censored observations are used as constraints when optimizing the weighted least squares objective function. We also include a ridge penalty to handle the singularity caused by collinearity and high dimensionality, and use the least absolute shrinkage and selection operator (LASSO) to achieve model parsimony. Simulation studies demonstrate that adding censoring constraints improves model estimation and variable selection, especially for data with high-dimensional covariates. Real data examples show that our method is able to identify genes relevant to patient survival times.

Another area of interest is cancer subtype classification using gene expression profiles. One important issue here is reducing the redundancy caused by correlation among genes. Because genes with correlated expression levels may be co-expressed or belong to the same disease-related biological pathway, including several such genes in a classifier adds very little information. In the second part of the dissertation, we define an eigenvalue-ratio statistic to measure a gene's contribution to the joint discriminability of a set of genes. Based on this statistic, we formulate a novel hypothesis test for gene statistical redundancy and propose two gene selection methods. Simulation studies show agreement between the statistical redundancy test and the gene selection methods. Real data examples show the effectiveness of our eigenvalue-ratio-based gene selection methods. We also demonstrate that the selected compact gene subsets not only can be used to build high-quality cancer classifiers but also have biological relevance.
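The constrained AFT estimator from the first part can be illustrated with a minimal Python sketch. It rests on assumptions the abstract does not spell out: the weights are Stute's Kaplan-Meier weights on log survival times (so censored observations get zero weight and enter only through the constraints), each right-censored observation requires its fitted log time to be at least its log censoring time, the ridge and LASSO penalties are combined additively (elastic-net style), and the problem is solved by coordinate descent with each update clipped to the interval the constraints allow. The function names `km_weights` and `constrained_enet_aft`, the tuning parameters `lam1` and `lam2`, and the toy data are hypothetical; the dissertation's exact formulation and solver may differ.

```python
import math


def km_weights(times, events):
    """Stute's Kaplan-Meier jump weights; assumes no tied times.
    Censored observations (event 0) receive weight 0."""
    n = len(times)
    order = sorted(range(n), key=lambda i: times[i])
    w, prod = [0.0] * n, 1.0
    for rank, i in enumerate(order, start=1):
        w[i] = events[i] / (n - rank + 1) * prod
        prod *= ((n - rank) / (n - rank + 1)) ** events[i]
    return w


def soft(z, t):
    """Soft-thresholding operator used for the LASSO part of the update."""
    return math.copysign(max(abs(z) - t, 0.0), z)


def constrained_enet_aft(X, times, events, lam1, lam2, n_sweeps=200):
    """Hypothetical sketch. Minimizes
        1/2 * sum_i w_i * (log t_i - b0 - x_i.b)**2 + lam1*||b||_1 + lam2/2*||b||_2**2
    subject to b0 + x_i.b >= log c_i for every right-censored observation i,
    by cyclic coordinate descent; the intercept b0 is not penalized."""
    n, p = len(X), len(X[0])
    Z = [[1.0] + list(row) for row in X]  # column 0 is the intercept
    y = [math.log(t) for t in times]
    w = km_weights(times, events)
    censored = [i for i in range(n) if events[i] == 0]
    # Feasible start: an intercept at least as large as every censored log time.
    beta = [max((y[i] for i in censored), default=0.0)] + [0.0] * p
    eta = [beta[0]] * n  # current linear predictor b0 + x_i.b
    for _ in range(n_sweeps):
        for j in range(p + 1):
            l1, l2 = (0.0, 0.0) if j == 0 else (lam1, lam2)
            r = [y[i] - eta[i] + Z[i][j] * beta[j] for i in range(n)]  # partial residuals
            a = sum(w[i] * Z[i][j] ** 2 for i in range(n)) + l2
            if a == 0.0:
                continue
            b = soft(sum(w[i] * Z[i][j] * r[i] for i in range(n)), l1) / a
            # Each censoring constraint reads Z[i][j] * b >= r[i]; intersect them.
            lo, hi = -math.inf, math.inf
            for i in censored:
                if Z[i][j] > 0:
                    lo = max(lo, r[i] / Z[i][j])
                elif Z[i][j] < 0:
                    hi = min(hi, r[i] / Z[i][j])
            b = min(max(b, lo), hi)  # convex 1-D problem: clip the unconstrained minimizer
            for i in range(n):
                eta[i] += Z[i][j] * (b - beta[j])
            beta[j] = b
    return beta


# Toy usage with made-up values: 6 samples, 2 genes; event 1 = observed, 0 = censored.
X = [[0.5, 1.0], [1.0, -0.5], [1.5, 0.3], [2.0, 0.8], [2.5, -1.0], [3.0, 0.2]]
times = [1.2, 2.0, 2.5, 4.0, 5.5, 7.0]
events = [1, 1, 0, 1, 0, 1]
beta = constrained_enet_aft(X, times, events, lam1=0.05, lam2=0.1)
print(beta)  # [intercept, gene 1, gene 2]
```

Every iterate stays feasible, but because the constraints couple the coefficients, coordinate descent of this kind is not guaranteed to reach the exact constrained optimum; a quadratic programming solver would be the more reliable choice for real high-dimensional data.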
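The eigenvalue-ratio idea from the second part can be sketched in the same spirit, under assumptions that are not taken from the dissertation: two classes, the joint discriminability of a gene set S measured by the largest eigenvalue lambda(S) of W^{-1} B (Fisher's criterion, with W the pooled within-class scatter matrix and B the between-class scatter matrix of the genes in S), and a gene g's contribution taken as lambda(S) / lambda(S without g). The abstract does not give the exact definition, so this form and the names `solve`, `fisher_eigenvalue`, and `eigenvalue_ratio` are hypothetical. With two classes B has rank one, so its only nonzero eigenvalue has the closed form (n1*n2/n) * d^T W^{-1} d, where d is the difference of the class mean vectors; this avoids a general eigen-solver.

```python
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[piv][col]) < 1e-12:
            raise ValueError("within-class scatter matrix is (near) singular")
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x


def fisher_eigenvalue(X, labels, genes):
    """Largest eigenvalue of W^{-1} B restricted to `genes` (classes labeled 0 and 1).
    B = (n1*n2/n) d d^T has rank one, so the eigenvalue is (n1*n2/n) d^T W^{-1} d."""
    groups = [[row for row, c in zip(X, labels) if c == k] for k in (0, 1)]
    means = [[sum(r[g] for r in grp) / len(grp) for g in genes] for grp in groups]
    d = [m0 - m1 for m0, m1 in zip(means[0], means[1])]
    W = [[0.0] * len(genes) for _ in genes]  # pooled within-class scatter
    for grp, m in zip(groups, means):
        for r in grp:
            dev = [r[g] - mg for g, mg in zip(genes, m)]
            for a in range(len(genes)):
                for b in range(len(genes)):
                    W[a][b] += dev[a] * dev[b]
    n1, n2 = len(groups[0]), len(groups[1])
    v = solve(W, d)
    return n1 * n2 / (n1 + n2) * sum(di * vi for di, vi in zip(d, v))


def eigenvalue_ratio(X, labels, genes, g):
    """Hypothetical contribution of gene g: lambda(S) / lambda(S without g)."""
    rest = [h for h in genes if h != g]
    lam_rest = fisher_eigenvalue(X, labels, rest)
    return math.inf if lam_rest == 0.0 else fisher_eigenvalue(X, labels, genes) / lam_rest


# Toy usage with made-up values: 8 samples (4 per class), 2 genes.
# Gene 0 separates the classes; gene 1 has equal class means and is
# uncorrelated with gene 0 within each class.
X = [[1, 1], [3, 1], [1, 3], [3, 3], [4, 1], [6, 1], [4, 3], [6, 3]]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
print(eigenvalue_ratio(X, labels, [0, 1], 1))  # 1.0: gene 1 adds nothing to gene 0
print(eigenvalue_ratio(X, labels, [0, 1], 0))  # inf: gene 1 alone has no discriminability
```

When W is positive definite, adding a gene cannot decrease d^T W^{-1} d, so in exact arithmetic this ratio is at least 1, and a value close to 1 flags a gene that is largely redundant given the rest of S. With more genes than samples W is singular, so a sketch like this applies only to small candidate subsets.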
J. Rao (Advisor)

Recommended Citations

  • APA Style (7th edition): Hu, S. (2007). NEW METHODS FOR VARIABLE SELECTION WITH APPLICATIONS TO SURVIVAL ANALYSIS AND STATISTICAL REDUNDANCY ANALYSIS USING GENE EXPRESSION DATA [Doctoral dissertation, Case Western Reserve University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=case1164873326

  • MLA Style (8th edition): Hu, Simin. NEW METHODS FOR VARIABLE SELECTION WITH APPLICATIONS TO SURVIVAL ANALYSIS AND STATISTICAL REDUNDANCY ANALYSIS USING GENE EXPRESSION DATA. 2007. Case Western Reserve University, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=case1164873326.

  • Chicago Manual of Style (17th edition): Hu, Simin. "NEW METHODS FOR VARIABLE SELECTION WITH APPLICATIONS TO SURVIVAL ANALYSIS AND STATISTICAL REDUNDANCY ANALYSIS USING GENE EXPRESSION DATA." Doctoral dissertation, Case Western Reserve University, 2007. http://rave.ohiolink.edu/etdc/view?acc_num=case1164873326