Sample Mislabeling Detection and Correction in Bioinformatics Experimental Data

2021, Doctor of Philosophy (PhD), Wright State University, Computer Science and Engineering PhD.
Sample mislabeling or incorrect annotation has been a long-standing problem in biomedical research and contributes to irreproducible results and invalid conclusions. These problems are especially prevalent in multi-omics studies, in which a large set of biological samples is characterized by multiple types of omics platforms at different times or in different labs. While multi-omics studies have demonstrated tremendous value in understanding disease biology and improving patient outcomes, the complexity of these studies may increase opportunities for human error. Fortunately, the interrelated nature of the data collected in multi-omics studies can be exploited to facilitate the identification and, in some cases, correction of mislabeling errors. The dissertation proposed a pipeline comprising statistical and machine learning techniques to identify mislabeled samples and correct the sample labels. Expected correlations between copy number variation, gene transcript abundance, protein abundance and microRNA expression were used to identify mislabeled samples. In datasets with only two omics data types, the label corrections were performed by exploiting gender-specific indicators of the mislabeled samples, whereas in datasets with more than two omics data types, a network topology realignment method was proposed to perform label correction. We demonstrated the effectiveness of the pipeline on several cancer datasets through simulation experiments. The pipeline was then applied to several public multi-omics datasets and, overall, 2.71% of the samples were found to be mislabeled.
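The correlation-based detection idea in the abstract can be illustrated with a minimal sketch. This is not the dissertation's pipeline: the function names (`zscore_rows`, `flag_mismatched_samples`), the use of Pearson correlation, the "best match is not itself" rule, and the synthetic data are all assumptions made for illustration. The sketch takes two omics matrices measured on the same nominal samples and flags any sample whose profile in the first matrix correlates best with a different sample in the second.

```python
import numpy as np


def zscore_rows(x):
    """Standardize each feature (row) across samples."""
    return (x - x.mean(axis=1, keepdims=True)) / x.std(axis=1, keepdims=True)


def flag_mismatched_samples(omics_a, omics_b):
    """Flag samples whose best-correlated partner is a different sample.

    Both inputs are (n_features, n_samples) arrays with matching feature
    order and the same nominal sample order (hypothetical layout).
    Returns a list of (index_in_a, best_matching_index_in_b) pairs.
    """
    a = zscore_rows(np.asarray(omics_a, dtype=float))
    b = zscore_rows(np.asarray(omics_b, dtype=float))
    # Standardize each sample profile (column) so that the scaled dot
    # product below equals the Pearson correlation between profiles.
    a = (a - a.mean(axis=0)) / a.std(axis=0)
    b = (b - b.mean(axis=0)) / b.std(axis=0)
    corr = a.T @ b / a.shape[0]  # corr[i, j]: sample i in A vs sample j in B
    best = corr.argmax(axis=1)
    return [(i, int(j)) for i, j in enumerate(best) if i != j]


# Synthetic demo: two noisy views of one shared signal, with the labels of
# samples 2 and 5 swapped in the second data type.
rng = np.random.default_rng(0)
signal = rng.normal(size=(200, 10))
view_a = signal + 0.1 * rng.normal(size=signal.shape)
view_b = signal + 0.1 * rng.normal(size=signal.shape)
view_b[:, [2, 5]] = view_b[:, [5, 2]]

flagged = flag_mismatched_samples(view_a, view_b)
print(flagged)
```

With the simulated swap, the sketch is expected to report samples 2 and 5 as each other's best match. Real omics layers correlate far less cleanly than this synthetic signal, so a practical method would need feature selection and a statistical threshold rather than a plain argmax.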
Michael Raymer, Ph.D. (Advisor)
Michael Markey, Ph.D. (Committee Member)
Travis Doom, Ph.D. (Committee Member)
Tanvi Banerjee, Ph.D. (Committee Member)
109 p.

Recommended Citations

  • Kho, S. J. (2021). Sample Mislabeling Detection and Correction in Bioinformatics Experimental Data [Doctoral dissertation, Wright State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=wright1629736147173188

    APA Style (7th edition)

  • Kho, Soon Jye. Sample Mislabeling Detection and Correction in Bioinformatics Experimental Data. 2021. Wright State University, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=wright1629736147173188.

    MLA Style (8th edition)

  • Kho, Soon Jye. "Sample Mislabeling Detection and Correction in Bioinformatics Experimental Data." Doctoral dissertation, Wright State University, 2021. http://rave.ohiolink.edu/etdc/view?acc_num=wright1629736147173188

    Chicago Manual of Style (17th edition)