Validation of clustering solutions for clinical data through biologically meaningful simulations and mixed-distance dissimilarity methods

Coombes, Caitlin E


2020, Master of Science, Ohio State University, Public Health.
Unsupervised clustering poses unique challenges in clinical data because such data are heterogeneous in size and mixed in type. We hypothesize that these challenges can be overcome by calculating dissimilarity from a combination of multiple distance methods. A review of the literature suggests that solutions for mixed clinical data are sparse and lack rigor. In an initial experiment on real clinical data, we find limitations in a common approach: converting a mixed data set to a single data type. To rigorously test dissimilarity metrics and clustering methods, we develop 32,400 simulations of realistic, mixed-type clinical data and test 3 clustering algorithms (hierarchical clustering, Partitioning Around Medoids, and self-organizing maps) on 5 single distance metrics (Jaccard index, Sokal & Michener distance, Gower coefficient, Manhattan distance, Euclidean distance) and 3 multiple-distance methods of calculating dissimilarity (DAISY, Supersom, and Mercator, a method of our own devising). We apply the superior solution for a data mixture dominated by binary features, DAISY with Ward’s hierarchical clustering, to the data set from our initial experiment and recover important prognostic features. These experiments raise future questions for clustering problems in clinical data, including identifying the minimum data set size for successful clustering (relevant when clustering clinical trials) and addressing concerns about validating outcomes that are sometimes variable.
Guy Brock, PhD (Advisor)
Courtney Hebert, MD MS (Committee Member)
Chi Song, PhD (Committee Member)
146 p.
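
As an illustration of the kind of mixed-type dissimilarity the abstract refers to, the following is a minimal sketch of a Gower-style dissimilarity followed by agglomerative clustering. It is not the thesis's code and not the DAISY implementation: the function names `gower_matrix` and `average_linkage`, the toy patient records, and the equal weighting of features are all assumptions made for this example. Average linkage is used in place of Ward's method to keep the sketch short.

```python
from itertools import combinations

def gower_matrix(rows, numeric):
    """Pairwise Gower-style dissimilarity for mixed-type records.

    rows    : list of equal-length tuples, one per patient
    numeric : set of column indices treated as numeric; every other
              column is compared by simple match/mismatch
    """
    n, p = len(rows), len(rows[0])
    # Range of each numeric column, used to scale differences into [0, 1].
    ranges = {}
    for j in numeric:
        col = [r[j] for r in rows]
        ranges[j] = (max(col) - min(col)) or 1.0  # guard against zero range
    d = [[0.0] * n for _ in range(n)]
    for a, b in combinations(range(n), 2):
        total = 0.0
        for j in range(p):
            if j in numeric:
                total += abs(rows[a][j] - rows[b][j]) / ranges[j]
            else:
                total += 0.0 if rows[a][j] == rows[b][j] else 1.0
        # Equal feature weights (an assumption of this sketch).
        d[a][b] = d[b][a] = total / p
    return d

def average_linkage(d, k):
    """Merge the two closest clusters (average linkage) until k remain."""
    clusters = [[i] for i in range(len(d))]
    while len(clusters) > k:
        def dist(pair):
            x, y = pair
            pairs = [(i, j) for i in clusters[x] for j in clusters[y]]
            return sum(d[i][j] for i, j in pairs) / len(pairs)
        x, y = min(combinations(range(len(clusters)), 2), key=dist)
        clusters[x] += clusters[y]  # x < y, so deleting y keeps x valid
        del clusters[y]
    return [sorted(c) for c in clusters]

# Invented toy records: (age in years, smoker flag, disease stage label).
patients = [
    (34, 0, "I"),
    (36, 0, "I"),
    (71, 1, "III"),
    (68, 1, "III"),
]
D = gower_matrix(patients, numeric={0})
groups = average_linkage(D, k=2)
print(groups)
```

Each feature contributes a value between 0 and 1 to the dissimilarity, so a continuous variable such as age cannot swamp the binary and categorical features, which is the motivation for combining distance types in mixed data.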

Recommended Citations


  • Coombes, C. E. (2020). Validation of clustering solutions for clinical data through biologically meaningful simulations and mixed-distance dissimilarity methods [Master's thesis, Ohio State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=osu1584957936561871

    APA Style (7th edition)

  • Coombes, Caitlin. Validation of clustering solutions for clinical data through biologically meaningful simulations and mixed-distance dissimilarity methods. 2020. Ohio State University, Master's thesis. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=osu1584957936561871.

    MLA Style (8th edition)

  • Coombes, Caitlin. "Validation of clustering solutions for clinical data through biologically meaningful simulations and mixed-distance dissimilarity methods." Master's thesis, Ohio State University, 2020. http://rave.ohiolink.edu/etdc/view?acc_num=osu1584957936561871

    Chicago Manual of Style (17th edition)