Extension of Similarity Functions and their Application to Chemical Informatics Problems

Wood, Nicholas Linder

2018, Doctor of Philosophy, Ohio State University, Chemical Engineering.
Similarity is the most pervasive concept in chemoinformatics, and it provides direction for many of the problems that arise in that field. Similarity functions are mathematical tools for quantifying the similarity of one molecule with respect to another molecule. In this work, we developed a method for quantifying the similarity of one molecule with respect to a set of molecules. This method requires a similarity function that is symmetric and positive definite. If the similarity function meets two additional mild requirements, namely that it is bounded between zero and unity and equals unity when evaluated on two identical molecules, then we say that the similarity function is extendable. In this case, the similarity of a molecule with respect to a set containing one molecule reduces to the original similarity function evaluated on those two molecules. We additionally stated and proved several properties of the extension of similarity functions. We then applied the extension of similarity functions to two problems in chemoinformatics. First, we used the extension of similarity functions as the basis for machine learning models for the prediction of various molecular endpoints. These machine learning models were compared to the kNN machine learning model. For each endpoint predicted, the model based on the extension of similarity functions was shown to perform comparably to or better than the kNN model. Second, we used the extension of similarity functions as the basis for defining the domain of applicability of a machine learning model. We applied this definition to a kNN model and showed that the extension of similarity functions can be used to order predictions for the rational selection of molecules for further testing. We showed how doing so can increase the overall usefulness of a machine learning model.
Finally, we stated several mathematical questions related to the extension of similarity functions which, if answered, could aid in the training of machine learning models based on the extension of similarity functions.
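
The four conditions the abstract places on an extendable similarity function (symmetric, positive definite, bounded between zero and unity, and unity on identical molecules) can be checked numerically on a finite sample of molecules. The following is a minimal sketch under stated assumptions, not the dissertation's own code: it assumes molecules are represented as binary fingerprints and uses the Tanimoto coefficient as the similarity function, neither of which the abstract specifies. The names `tanimoto`, `similarity_matrix`, and `check_extendable` are hypothetical. Passing the check on one sample is only a necessary condition and does not prove that the function is positive definite in general.

```python
import numpy as np


def tanimoto(a, b):
    """Tanimoto similarity of two binary fingerprint vectors (assumed representation)."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        # Convention assumed here: two all-zero fingerprints count as identical.
        return 1.0
    return float(np.logical_and(a, b).sum() / union)


def similarity_matrix(fingerprints, sim=tanimoto):
    """Pairwise similarity matrix K with K[i, j] = sim(fingerprints[i], fingerprints[j])."""
    n = len(fingerprints)
    K = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            K[i, j] = sim(fingerprints[i], fingerprints[j])
    return K


def check_extendable(K, tol=1e-10):
    """Check the four conditions from the abstract on a sample similarity matrix."""
    K = np.asarray(K, dtype=float)
    # eigvalsh expects a symmetric matrix, so symmetrize before taking eigenvalues.
    eigenvalues = np.linalg.eigvalsh((K + K.T) / 2)
    return {
        "symmetric": bool(np.allclose(K, K.T)),
        "positive_definite": bool(eigenvalues.min() > tol),
        "bounded": bool((K >= -tol).all() and (K <= 1 + tol).all()),
        "unit_diagonal": bool(np.allclose(np.diag(K), 1.0)),
    }


# Three toy fingerprints, invented purely for illustration.
fingerprints = [[1, 1, 0, 0], [0, 1, 1, 0], [0, 0, 1, 1]]
K = similarity_matrix(fingerprints)
print(check_extendable(K))
```

Note that duplicate molecules in the sample produce identical rows in the matrix, which makes it singular, so the positive-definiteness check is only meaningful on a sample of distinct molecules.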
James Rathman, Dr (Advisor)
Isamu Kusaka, Dr (Committee Member)
Aravind Asthagiri, Dr (Committee Member)
169 p.

Recommended Citations

  • Wood, N. L. (2018). Extension of Similarity Functions and their Application to Chemical Informatics Problems [Doctoral dissertation, Ohio State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=osu1542299336598615

    APA Style (7th edition)

  • Wood, Nicholas. Extension of Similarity Functions and their Application to Chemical Informatics Problems. 2018. Ohio State University, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=osu1542299336598615.

    MLA Style (8th edition)

  • Wood, Nicholas. "Extension of Similarity Functions and their Application to Chemical Informatics Problems." Doctoral dissertation, Ohio State University, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=osu1542299336598615

    Chicago Manual of Style (17th edition)