Computational Modeling for Censored Time to Event Data Using Data Integration in Biomedical Research

Choi, Ickwon

2011, Doctor of Philosophy, Case Western Reserve University, EECS - Computer and Information Sciences.

Medical prognostic models are designed by clinicians to predict the future course or outcome of disease progression after diagnosis or treatment. The data used to develop these clinical models must contain a high number of events per variable (EPV) for the resulting model to be reliable. If the objective is to optimize predictive performance by some criterion, we can often obtain a reduced model that carries a small bias but has low variance, and whose overall performance is improved. To accomplish this goal, we propose a new two-stage variable selection approach that combines Stepwise Tuning in the Maximum Concordance Index (STMC) and Forward Nested Subset Selection (FNSS). In the first stage, the proposed variable selection identifies the best subset of risk factors, optimized with respect to the concordance index, using inner cross validation for optimism correction within the outer loop of cross validation; this yields potentially different final models for each fold. In the second stage, we feed these intermediate results into another selection method to resolve the overfitting problem and to select a final model from the variation of predictors across the selected models. Two case studies on survival data sets of different sizes, together with a simulation study, demonstrate that, given a sufficient sample size and number of events, the proposed approach selects an improved and reduced average model compared with other selection methods such as stepwise selection using the likelihood ratio test, the Akaike Information Criterion (AIC), and the least absolute shrinkage and selection operator (lasso). Finally, we achieve final models in each data set that improve on the full models according to most criteria. The results of the model selection methods and the final models were analyzed under a systematic validation scheme for independent performance evaluation.
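The concordance index that the selection procedure optimizes can be illustrated with a minimal sketch. The function below computes Harrell's C for right-censored data in plain Python; the function name, argument names, and the simplified handling of tied times are assumptions made for illustration, not the dissertation's implementation.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C for right-censored data (illustrative sketch).

    times       -- observed follow-up times
    events      -- 1 if the event was observed, 0 if censored
    risk_scores -- model output; a higher score means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # assumption: pairs tied on time are skipped for simplicity
        # Let a be the subject with the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # shorter time is censored, so the pair is not comparable
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # higher risk failed earlier: concordant
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    t = [2, 4, 6, 8]
    e = [1, 1, 0, 1]
    print(concordance_index(t, e, [0.9, 0.7, 0.5, 0.1]))
```

A value of 1.0 means every comparable pair is ranked correctly, 0.5 corresponds to random ranking, and a pair whose shorter time is censored contributes nothing because the true ordering of the two events is unknown.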

For the second part of this dissertation, we build prognostic models that use clinicopathologic features to predict prognosis after a given treatment. Most recent research efforts have focused on high dimensional genomic data with small samples. Since clinically similar but molecularly heterogeneous tumors may produce different clinical outcomes, combining clinical and genomic information is crucial to improving the quality of prognostic prediction. However, schemes for integrating the two into a clinico-genomic model, and into a parsimonious one in particular, are lacking because of the large number of variables and the small sample size. We propose a methodology to build a reduced yet accurate integrative model using a hybrid approach based on the Cox regression model, which uses several dimension reduction techniques, L2 penalized maximum likelihood estimation (PMLE), and resampling methods to tackle these problems. The predictive accuracy of the modeling approach is assessed by several metrics under an independent and thorough scheme for comparing competing methods. In breast cancer data studies for metastasis and mortality outcomes, in a diffuse large B-cell lymphoma (DLBCL) data study, and in simulation studies, we demonstrate that the proposed methodology can improve prediction accuracy and build a final model with a parsimonious hybrid signature when integrating both types of variables. The selected clinical factors and genomic biomarkers are found to be highly relevant to the biological processes and can be considered potential biomarkers for cancer prognosis and therapy. Furthermore, genes that were selected but remain unidentified are open to thorough investigation.
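For orientation, L2 penalized maximum likelihood estimation in a Cox model is commonly written as a ridge penalty on the log partial likelihood. The form below is the standard textbook version, assuming no tied event times; the symbols (x_i for the covariate vector, \delta_i for the event indicator, R(t_i) for the risk set at time t_i, \lambda for the penalty weight) are notation introduced here, and the abstract does not state the exact formulation used in the dissertation.

```latex
\ell_{\lambda}(\beta)
  = \sum_{i:\,\delta_i = 1}
      \left[ x_i^{\top}\beta
             - \log \sum_{j \in R(t_i)} \exp\!\left(x_j^{\top}\beta\right)
      \right]
    - \frac{\lambda}{2}\,\lVert \beta \rVert_2^{2},
\qquad
\hat{\beta}_{\lambda} = \arg\max_{\beta}\; \ell_{\lambda}(\beta)
```

Larger values of \lambda shrink the coefficients toward zero, which stabilizes the estimates when the number of variables is large relative to the number of events.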

Michael Kattan (Advisor)
Mehmet Koyuturk (Committee Chair)
Andy Podgurski (Committee Member)
Soumya Ray (Committee Member)
124 p.

Recommended Citations

  • Choi, I. (2011). Computational Modeling for Censored Time to Event Data Using Data Integration in Biomedical Research [Doctoral dissertation, Case Western Reserve University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=case1307969890

    APA Style (7th edition)

  • Choi, Ickwon. Computational Modeling for Censored Time to Event Data Using Data Integration in Biomedical Research. 2011. Case Western Reserve University, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=case1307969890.

    MLA Style (8th edition)

  • Choi, Ickwon. "Computational Modeling for Censored Time to Event Data Using Data Integration in Biomedical Research." Doctoral dissertation, Case Western Reserve University, 2011. http://rave.ohiolink.edu/etdc/view?acc_num=case1307969890

    Chicago Manual of Style (17th edition)