Feature extraction and similarity-based analysis for proteome and genome databases

Ozturk, Ozgur

2007, Doctor of Philosophy, Ohio State University, Computer and Information Science.
Bioinformatics will boost our understanding of how life works and enhance medical technology and biotechnology. Researchers trying to decipher the complexity of life are producing very large amounts of experimental data. In this dissertation, I present our methods for the search and analysis of microbiological sequence data and 3D protein structure data. We developed methods to map genomic and proteomic sequences into metric feature vector spaces in order to facilitate the building of index structures that have practical, accurate, and sensitive filtering capabilities. Similarity distance functions between these N-gram frequency vectors and N-gram wavelet vectors are defined so that the distances satisfy the properties needed to represent the original distance between the subsequences corresponding to the vectors. These vectors are indexed using a compressed, multiresolution, grid-style data structure for efficient pruning of the candidates in the search space. Our method to index protein structures defines and utilizes spatial profiles, i.e., summaries constructed from the geometrical and biochemical properties that characterize the neighborhood around the geometrically significant sites of proteins. These features are then scored with a statistical measure for their ability to distinguish a family of proteins from a background set of unrelated proteins, and the successful features are combined into a representative set for the protein family. Unlike most currently available methods, our methods are able to capture structurally local motifs. The results verify that our method is successful both in identifying the distinctive sites of a given family of proteins and in classifying proteins using the extracted features.
These tools utilize accurate and compact representations of data together with better similarity measures, new data structures and algorithms, and apply data mining techniques in novel ways to help researchers extract information from very large data repositories and make better use of them.
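The idea of mapping sequences to N-gram frequency vectors whose distance reflects the original sequence distance can be illustrated with a minimal sketch. It is not taken from the dissertation: the function names are hypothetical, and the L1-based bound is a standard textbook-style filter chosen here as an assumption (the dissertation's own distance functions and wavelet vectors may differ). The bound relies on the fact that a single insertion, deletion, or substitution changes at most N of the old N-grams and at most N of the new ones, so the L1 distance between frequency vectors grows by at most 2N per edit.

```python
from collections import Counter

def ngram_vector(seq, n):
    """Sparse N-gram frequency vector of a sequence (hypothetical helper)."""
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))

def l1_distance(u, v):
    """L1 distance between two sparse frequency vectors."""
    return sum(abs(u[g] - v[g]) for g in set(u) | set(v))

def edit_lower_bound(a, b, n):
    """Lower bound on edit distance: ceil(L1 / (2 * n)).

    Assumption of this sketch: each edit changes the L1 distance
    between N-gram frequency vectors by at most 2 * n.
    """
    d = l1_distance(ngram_vector(a, n), ngram_vector(b, n))
    return -(-d // (2 * n))  # integer ceiling

def edit_distance(a, b):
    """Exact Levenshtein distance by dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]

def filter_candidates(query, candidates, n, max_edits):
    """Keep only candidates the cheap bound cannot rule out."""
    return [c for c in candidates
            if edit_lower_bound(query, c, n) <= max_edits]
```

In a filter-and-refine search, `filter_candidates` would discard sequences whose lower bound already exceeds the allowed number of edits, and the more expensive `edit_distance` would be computed only for the survivors. Because the bound never overestimates the true distance, the filter introduces no false dismissals.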
Hakan Ferhatosmanoglu (Advisor)
119 p.

Recommended Citations

  • Ozturk, O. (2007). Feature extraction and similarity-based analysis for proteome and genome databases [Doctoral dissertation, Ohio State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=osu1190138805

    APA Style (7th edition)

  • Ozturk, Ozgur. Feature extraction and similarity-based analysis for proteome and genome databases. 2007. Ohio State University, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=osu1190138805.

    MLA Style (8th edition)

  • Ozturk, Ozgur. "Feature extraction and similarity-based analysis for proteome and genome databases." Doctoral dissertation, Ohio State University, 2007. http://rave.ohiolink.edu/etdc/view?acc_num=osu1190138805

    Chicago Manual of Style (17th edition)