USING MACHINE LEARNING TECHNIQUES FOR IDENTIFICATION AND IMPROVEMENT OF CHEMICAL DESCRIPTORS IN QSAR MODELING.

Ribeiro Leite Silva, Joao Vinicius, Ribeiro

Doctor of Philosophy, Ohio State University, Chemical Engineering.
Evaluating the behavior of a chemical inside a living organism is an essential step in the drug design process. Being able to predict properties related to chemical activity and toxicity can significantly improve the efficiency of developing effective and safe chemical products. The field of cheminformatics strives to use in silico approaches to elucidate such properties. Quantitative Structure-Activity Relationship (QSAR) modeling is a subfield of cheminformatics that leverages experimental data to derive empirical models for a particular chemical property or activity in terms of molecular structure. Creating a QSAR model involves identifying structural features of the compounds in a chemical dataset that can differentiate the compounds with respect to the endpoint of interest. Frequently the number of structural features present in the data is enormous, which leads to models that are overly complex and hard to interpret. We present a framework that automatically identifies relevant chemical structures in a chemical dataset and operates in the following steps: 1) Extract chemical substructures from the dataset. 2) Evaluate the discriminative power of each feature using the chi-square statistic, accuracy, and frequency, and retain only the most relevant features. 3) Apply hierarchical clustering to identify and remove redundant features. Another aspect of this work is the introduction of a descriptor/feature generation and consolidation technique based on applying the logical union to binary features. This idea can be used to cluster structural features into a more general concept without losing the chemical information present in each variable. We use a genetic algorithm to generate unions, which allows for the creation of a new set of variables constructed from chemical structural features.
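As an illustration of two ideas named above, the following is a minimal sketch, not the dissertation's implementation. It scores a binary structural feature against a binary endpoint with the Pearson chi-square statistic for a 2x2 contingency table, and it forms the logical union (element-wise OR) of binary features. The function names, the toy feature columns, and the labels are all hypothetical. The accuracy and frequency criteria, the hierarchical clustering step, and the genetic algorithm search are not shown.

```python
from typing import List, Sequence


def chi_square_2x2(feature: Sequence[int], labels: Sequence[int]) -> float:
    """Pearson chi-square statistic for a binary feature vs. a binary label."""
    if len(feature) != len(labels):
        raise ValueError("feature and labels must have the same length")
    pairs = list(zip(feature, labels))
    # Cells of the 2x2 contingency table (feature present/absent x active/inactive).
    a = sum(1 for f, y in pairs if f == 1 and y == 1)
    b = sum(1 for f, y in pairs if f == 1 and y == 0)
    c = sum(1 for f, y in pairs if f == 0 and y == 1)
    d = sum(1 for f, y in pairs if f == 0 and y == 0)
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        # A constant feature or a constant label carries no discriminative signal.
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def union(*features: Sequence[int]) -> List[int]:
    """Logical union (element-wise OR) of binary feature columns."""
    return [int(any(bits)) for bits in zip(*features)]


# Hypothetical toy data: 8 compounds, 1 = active, 0 = inactive.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
# Two hypothetical substructure features, each present in a different pair of actives.
feature_a = [1, 1, 0, 0, 0, 0, 0, 0]
feature_b = [0, 0, 1, 1, 0, 0, 0, 0]
feature_ab = union(feature_a, feature_b)

print(chi_square_2x2(feature_a, labels))
print(chi_square_2x2(feature_ab, labels))
```

In this toy case each feature alone separates the classes only partially, while their union covers every active compound, so the union scores higher than either component. That is the sense in which a union can replace several related features with one more general variable.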
This strategy has the benefits of reducing the dimensionality of the data while achieving high model performance without a loss in model interpretability. We present three endpoints as case studies to test our proposed techniques: blood-brain barrier permeability, Ames mutagenicity, and AIDS antiviral response. We identified descriptors that are known to be mechanistically related to these properties. We also created QSAR models for blood-brain barrier permeability and Ames mutagenicity, using the proposed algorithm to generate the variables used in the models. These models achieved concordances of 75% and 76%, respectively. This performance is similar to that of other learning methods applied to these data sets, and it comes without loss of interpretability. The information from our work can help guide new experiments and the design of new chemical products.
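For readers unfamiliar with the metric, the sketch below shows one common way concordance is computed for a binary classifier. It assumes that concordance means the fraction of compounds whose predicted class matches the observed class; the abstract does not define the term, so this reading is an assumption. The function name and the toy data are hypothetical and are not results from the dissertation.

```python
from typing import Sequence


def concordance(observed: Sequence[int], predicted: Sequence[int]) -> float:
    """Fraction of compounds whose predicted class equals the observed class."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    if not observed:
        raise ValueError("at least one compound is required")
    matches = sum(1 for o, p in zip(observed, predicted) if o == p)
    return matches / len(observed)


# Hypothetical toy example: 4 compounds, 3 of which are classified correctly.
observed = [1, 1, 0, 0]
predicted = [1, 0, 0, 0]
print(f"{concordance(observed, predicted):.0%}")
```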
James Rathman (Advisor)
Isamu Kusaka (Committee Member)
Lisa Hall (Committee Member)
171 p.

Recommended Citations

  • Ribeiro Leite Silva, Ribeiro, J. V. (2019). USING MACHINE LEARNING TECHNIQUES FOR IDENTIFICATION AND IMPROVEMENT OF CHEMICAL DESCRIPTORS IN QSAR MODELING. [Doctoral dissertation, Ohio State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=osu1565994502476339

    APA Style (7th edition)

  • Ribeiro Leite Silva, Ribeiro, Joao Vinicius. USING MACHINE LEARNING TECHNIQUES FOR IDENTIFICATION AND IMPROVEMENT OF CHEMICAL DESCRIPTORS IN QSAR MODELING. 2019. Ohio State University, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=osu1565994502476339.

    MLA Style (8th edition)

  • Ribeiro Leite Silva, Ribeiro, Joao Vinicius. "USING MACHINE LEARNING TECHNIQUES FOR IDENTIFICATION AND IMPROVEMENT OF CHEMICAL DESCRIPTORS IN QSAR MODELING." Doctoral dissertation, Ohio State University, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=osu1565994502476339

    Chicago Manual of Style (17th edition)