Applications of Cheminformatics for the Analysis of Proteolysis Targeting Chimeras and the Development of Natural Product Computational Target Fishing Models

Cockroft, Nicholas T

2019, Doctor of Philosophy, Ohio State University, Pharmaceutical Sciences.
The use of data-driven methods and machine learning has become increasingly pervasive in many industries, including drug discovery and design, as computing power and large amounts of data become more readily available. To leverage this data efficiently, cheminformatics has emerged as a data-driven, interdisciplinary field that focuses on storing, accessing, and applying chemical information. Cheminformatics methods and tools facilitate the management and analysis of large annotated chemical datasets, tasks that would be difficult or impossible to perform manually. A famous application of large amounts of chemical data was performed by Christopher A. Lipinski in 1997. Lipinski analyzed a large set of bioavailable synthetic drug molecules and identified trends in their molecular properties, which have since been referred to as “Lipinski’s Rule of 5”. While these rules are far from absolute, Lipinski’s analysis demonstrates the utility of leveraging large amounts of chemical data to gain important insights. This thesis describes the application of cheminformatics methods to tackle two very different research problems: 1) the analysis of the physicochemical properties and binding of a class of protein degraders called proteolysis targeting chimeras (PROTACs) and 2) the development of a target fishing application for the prediction of the mechanism of action of natural products. PROTACs are a novel class of small molecule therapeutics that are garnering significant interest. Unlike traditional small molecule therapeutics, PROTACs simultaneously bind to both their protein target and an E3 ligase to induce degradation of the target. The requirement to simultaneously bind two proteins necessitates a high molecular weight, as PROTACs must contain two distinct binding moieties connected by a linker. As a result, PROTAC molecules are expected to lie outside of the traditional drug-like chemical space described by Lipinski.
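As an illustration of the Rule of 5 mentioned above, the following minimal sketch counts violations of the four commonly cited thresholds (molecular weight above 500 Da, logP above 5, more than 5 hydrogen bond donors, more than 10 hydrogen bond acceptors). The `Descriptors` class, the function name, and both sets of descriptor values are hypothetical and are not taken from the thesis; in practice the descriptors would be computed with a cheminformatics toolkit.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed molecular descriptors (hypothetical container)."""
    mol_weight: float       # molecular weight in Da
    logp: float             # octanol-water partition coefficient
    h_bond_donors: int
    h_bond_acceptors: int


def rule_of_five_violations(d):
    """Return the names of the Rule of 5 thresholds that a molecule exceeds."""
    checks = {
        "mol_weight > 500": d.mol_weight > 500,
        "logP > 5": d.logp > 5,
        "h_bond_donors > 5": d.h_bond_donors > 5,
        "h_bond_acceptors > 10": d.h_bond_acceptors > 10,
    }
    return [name for name, violated in checks.items() if violated]


# Made-up values for illustration only, not data from the thesis.
small_molecule = Descriptors(mol_weight=350.4, logp=2.1,
                             h_bond_donors=2, h_bond_acceptors=5)
protac_like = Descriptors(mol_weight=950.0, logp=6.2,
                          h_bond_donors=4, h_bond_acceptors=14)

print(rule_of_five_violations(small_molecule))
print(rule_of_five_violations(protac_like))
```

A molecule built from two binding moieties and a linker would be expected to trip the molecular weight check first, which is consistent with the expectation stated above that PROTACs fall outside Lipinski's chemical space.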
To gain a better understanding of the physicochemical properties of PROTACs currently in development, the patent literature was searched and PROTAC compounds targeting either the von Hippel-Lindau (VHL) or cereblon (CRBN) ligases were retrieved. This analysis showed that the physicochemical properties of PROTACs were indeed different from those of drug-like small molecules. However, the importance of each property for activity and permeability cannot yet be addressed without additional annotated biological endpoints. While the physicochemical properties of a PROTAC compound are expected to be important for its pharmacokinetics, the formation of a ternary complex is crucial for its pharmacodynamics. Using the currently available crystallographic data of ternary complexes with resolved PROTACs, a method for predicting the ternary complex structure was developed and benchmarked. The results of this method were promising, with ternary structures predicted correctly for up to 60% of the final predicted complexes. However, distinguishing the correct complexes from the incorrect ones a priori was shown to be a difficult task. Another class of small molecule therapeutics that does not adhere to traditional drug-like properties is natural products. Natural products have been a tremendous source of new drugs over the past three decades, with unaltered natural products and natural product derivatives making up over one-third of FDA-approved small molecule drugs. These natural products have made up a substantial portion of first-in-class drugs identified through phenotypic screening methods. A limitation of phenotypic screening methods is a lack of understanding of the target and molecular mechanism of action, which is desirable for the progression of a chemical entity to the clinic. Cheminformatics methods can be applied to aid in the identification of the molecular mechanism of action of small molecules in a process termed computational target fishing.
The current methods for computational target fishing have been trained and tested on datasets containing exclusively synthetic compounds. Because natural products differ structurally from synthetic compounds, the ability of a model trained on synthetic data to accurately predict targets for natural products remains unknown. To address this, a natural product benchmark set containing 5,589 compound-target pairs for 1,943 unique compounds and 1,023 unique targets was collected by cross-referencing 20 publicly available natural product databases with the bioactivity database ChEMBL. A dataset of synthetic compounds from ChEMBL containing 107,190 compound-target pairs for 88,728 unique compounds and 1,907 unique targets was used to train k-nearest neighbors (KNN), random forest (RF), and multi-layer perceptron (MLP) models. Additionally, a model stacking approach was investigated, which uses logistic regression as a meta-classifier to combine the individual model predictions. A model stacking approach using KNN and RF as the base classifiers showed the best performance on the natural product benchmark set, with an area under the receiver operating characteristic curve (AUROC) score of 0.94 and a Boltzmann-enhanced discrimination of receiver operating characteristic (BEDROC) score of 0.73. A similarly performing and more computationally efficient model stacking approach using KNN as the base classifier was deployed as a web application, called STarFish, and has been made available to aid in the target identification of natural products.
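To make the idea of nearest-neighbor target fishing concrete, the sketch below ranks candidate targets for a query compound by the summed Tanimoto similarity of its k most similar training compounds. It is a toy illustration under stated assumptions, not the STarFish implementation: the fingerprints are hypothetical sets of on-bit indices, the target names are placeholders, and the scoring rule is one simple choice among many.

```python
from collections import defaultdict


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def rank_targets(query, training, k=3):
    """Rank targets by the summed similarity of the k nearest training compounds.

    `training` is a list of (fingerprint, target) pairs (hypothetical format).
    """
    neighbors = sorted(
        ((tanimoto(query, fp), target) for fp, target in training),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    scores = defaultdict(float)
    for similarity, target in neighbors:
        scores[target] += similarity
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up fingerprints and target labels for illustration only.
training = [
    (frozenset({1, 2, 3, 4}), "target_A"),
    (frozenset({1, 2, 3, 5}), "target_A"),
    (frozenset({7, 8, 9}), "target_B"),
    (frozenset({3, 4, 7, 8}), "target_B"),
]
query = frozenset({1, 2, 3, 4, 7})

ranking = rank_targets(query, training, k=3)
for target, score in ranking:
    print(target, round(score, 3))
```

In a stacking setup like the one described above, per-target scores from base models such as this one would be passed as input features to a logistic regression meta-classifier rather than used directly as the final ranking.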
James Fuchs (Advisor)
Xiaolin Cheng (Advisor)
Karl Werbovetz (Committee Member)
Lara Sucheston-Campbell (Committee Member)
187 p.

Recommended Citations

  • Cockroft, N. T. (2019). Applications of Cheminformatics for the Analysis of Proteolysis Targeting Chimeras and the Development of Natural Product Computational Target Fishing Models [Doctoral dissertation, Ohio State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=osu156596730476322

    APA Style (7th edition)

  • Cockroft, Nicholas. Applications of Cheminformatics for the Analysis of Proteolysis Targeting Chimeras and the Development of Natural Product Computational Target Fishing Models. 2019. Ohio State University, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=osu156596730476322.

    MLA Style (8th edition)

  • Cockroft, Nicholas. "Applications of Cheminformatics for the Analysis of Proteolysis Targeting Chimeras and the Development of Natural Product Computational Target Fishing Models." Doctoral dissertation, Ohio State University, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=osu156596730476322

    Chicago Manual of Style (17th edition)