Comparative Microarray Data Mining


2007, Doctor of Philosophy (PhD), Wright State University, Computer Science and Engineering PhD.
As a revolutionary technology, microarrays have great potential to provide genome-wide patterns of gene expression, to support accurate medical diagnosis, and to explore the genetic causes underlying diseases. It is commonly believed that suitable analysis of microarray datasets can help achieve these goals. While much has been done in microarray data mining, few previous studies, if any, have focused on multiple datasets at the comparative level. This dissertation aims to fill this gap by developing tools and methods for set-based comparative microarray data mining. Specifically, we mine highly differentiative gene groups (HDGGs) from given datasets/classes, evaluate the concordance of datasets generated from different platforms/laboratories, investigate the impact of variability in microarray datasets on data mining results, provide tools and algorithms for the above tasks, and identify reliable invariant HDGG patterns for a better understanding of diseases. Discovering high-quality discriminating (emerging) patterns from high-dimensional microarray datasets is a major challenge. We develop a novel feature-group selection method to help discover HDGGs, especially signature HDGGs that completely characterize some disease classes. In addition to giving insights into the diseases, HDGG-based classifiers yield better classification results than other existing classifiers. As microarray datasets are often generated from different platforms/laboratories, it is necessary to evaluate their concordance/consistency before they can be studied together. We provide measures and techniques to quantitatively test such concordance at the comparative level. In addition to applying measures to evaluate the degree of variability in microarray datasets, we also develop a novel algorithm called C-loocv to effectively minimize the variability.
As an indicator of the utility of C-loocv, classifiers trained from C-loocv-refined datasets become more robust and predict test samples with significantly higher accuracy than classifiers trained from the original datasets. Based on the variability minimization algorithm, we provide a novel strategy to mine invariant patterns from multiple datasets concerning a common disease. As a demonstration, invariant patterns are identified from two datasets concerning lung cancer; these patterns may shed light on the mechanism underlying the pathogenesis of lung cancer. Our methods are generic and can be applied to microarrays concerning any human disease.
Guozhu Dong (Advisor)
157 p.

Recommended Citations


  • Mao, S. (2007). Comparative Microarray Data Mining [Doctoral dissertation, Wright State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=wright1198695415

    APA Style (7th edition)

  • Mao, Shihong. Comparative Microarray Data Mining. 2007. Wright State University, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=wright1198695415.

    MLA Style (8th edition)

  • Mao, Shihong. "Comparative Microarray Data Mining." Doctoral dissertation, Wright State University, 2007. http://rave.ohiolink.edu/etdc/view?acc_num=wright1198695415

    Chicago Manual of Style (17th edition)