Development of novel unsupervised and supervised informatics methods for drug discovery applications

Mohiddin, Syed B

Abstract Details

2006, Doctor of Philosophy, Ohio State University, Chemical Engineering.
As of 2002, discovering a new drug cost nearly $802 million and took nearly 13.6 years. Despite these large investments of time and money, drugs that were successfully introduced to the market later had to be withdrawn for reasons of efficacy (38%) and safety (20%). Improving the success rate of drug discovery depends on two key steps in the process. First, improving efficacy requires a better understanding of the genetic biomarkers (targets for drug action) that characterize a given disease. Second, drug safety can be improved by predicting the activity/toxicity of potential drug candidates at an early stage, before expensive clinical trials begin.
In this work, we develop a novel unsupervised informatics methodology that characterizes both biological and chemical samples and identifies the key non-redundant features underlying that characterization. Biological samples are characterized into different groups (e.g., cancer types) based on gene expression profiling, and the genetic biomarkers most responsible for the characterization are identified. Similarly, chemical compounds are characterized into groups of varying activity/toxicity based on their structural, physical, and chemical property data.
The methodology relies largely on the multivariate aspects of principal component analysis and on applying the k-means clustering algorithm in a hierarchically recursive manner to achieve unsupervised multi-class classification. In the supervised scenario, the principal components are replaced by the corresponding partial least squares (PLS) components. Selecting the influential components (principal components in the unsupervised case, PLS components in the supervised case) for classification is demonstrated and is one of the key steps for the success of this methodology.
Hierarchical k-means is applied recursively to achieve binary classification at each stage, eventually resulting in multi-class classification. The features responsible for classification are identified by examining the appropriate loadings of the principal or PLS components along with their coefficients of correlation with the influential components.
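The unsupervised procedure described in the abstract can be illustrated with a minimal sketch; it is not the dissertation's implementation. Several choices here are assumptions made only for illustration: PCA is recomputed on each subgroup, the first two components are treated as the influential ones, recursion stops at a fixed depth or minimum group size, features are ranked by absolute loading alone (the correlation check is omitted), and the data are synthetic. The function names `pca`, `two_means`, `recursive_split`, and `top_features` are hypothetical.

```python
import numpy as np

def pca(X, n_components):
    """Scores and loadings of column-centered X, computed via SVD."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    loadings = Vt[:n_components].T  # shape (n_features, n_components)
    return scores, loadings

def two_means(Z, n_iter=25):
    """Lloyd's algorithm with k=2, seeded at the extremes of the first score."""
    centers = Z[[Z[:, 0].argmin(), Z[:, 0].argmax()]].astype(float)
    for _ in range(n_iter):
        dist = np.linalg.norm(Z[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        for k in (0, 1):
            if np.any(labels == k):
                centers[k] = Z[labels == k].mean(axis=0)
    return labels

def recursive_split(X, idx, depth, max_depth, min_size, n_components=2):
    """Binary split at each stage; returns a list of index arrays (leaf groups).

    The stopping rule (max_depth, min_size) is an assumption of this sketch.
    """
    if depth >= max_depth or len(idx) < 2 * min_size:
        return [idx]
    scores, _ = pca(X[idx], n_components)  # PCA on the current subgroup only
    labels = two_means(scores)
    left, right = idx[labels == 0], idx[labels == 1]
    if len(left) < min_size or len(right) < min_size:
        return [idx]
    return (recursive_split(X, left, depth + 1, max_depth, min_size, n_components)
            + recursive_split(X, right, depth + 1, max_depth, min_size, n_components))

def top_features(X, idx, n_top=3):
    """Rank features by absolute loading on the first principal component."""
    _, loadings = pca(X[idx], 1)
    return np.argsort(-np.abs(loadings[:, 0]))[:n_top]

# Synthetic data (illustrative only): four groups of 15 samples, 6 features,
# separated along feature 0 and feature 1.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 6))
y = np.repeat(np.arange(4), 15)
for g, (dx, dy) in enumerate([(0, 0), (20, 0), (0, 8), (20, 8)]):
    X[y == g, 0] += dx
    X[y == g, 1] += dy

groups = recursive_split(X, np.arange(len(X)), depth=0, max_depth=2, min_size=2)
for members in groups:
    print(len(members), "samples; top features:", top_features(X, members))
```

Two binary stages yield up to four groups, which is how repeated two-way splits arrive at a multi-class result. For the supervised variant, the PCA scores would be replaced by PLS scores computed against known class labels; that step is not shown here.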
James Rathman (Advisor)
199 p.

Recommended Citations

  • Mohiddin, S. B. (2006). Development of novel unsupervised and supervised informatics methods for drug discovery applications [Doctoral dissertation, Ohio State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=osu1138385657

    APA Style (7th edition)

  • Mohiddin, Syed. Development of novel unsupervised and supervised informatics methods for drug discovery applications. 2006. Ohio State University, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=osu1138385657.

    MLA Style (8th edition)

  • Mohiddin, Syed. "Development of novel unsupervised and supervised informatics methods for drug discovery applications." Doctoral dissertation, Ohio State University, 2006. http://rave.ohiolink.edu/etdc/view?acc_num=osu1138385657

    Chicago Manual of Style (17th edition)