Files
osu1190138805.pdf (1.18 MB)
Feature extraction and similarity-based analysis for proteome and genome databases
Author Info
Ozturk, Ozgur
Permalink: http://rave.ohiolink.edu/etdc/view?acc_num=osu1190138805
Abstract Details
Year and Degree
2007, Doctor of Philosophy, Ohio State University, Computer and Information Science.
Abstract
Bioinformatics will boost our understanding of how life works and enhance medicine and biotechnology. Very large amounts of data are being produced by the experiments of researchers trying to decipher the complexity of life. In this dissertation, I present our methods for the search and analysis of microbiological sequence and 3D protein structure data. We developed methods to map genomic and proteomic sequences into metric feature vector spaces in order to facilitate the building of index structures that have practical, accurate, and sensitive filtering capabilities. Similarity distance functions between these N-gram frequency vectors and N-gram wavelet vectors are defined such that these distances satisfy desired properties to represent the original distance between the subsequences corresponding to the vectors. These vectors are indexed using a compressed, multiresolution, grid-style data structure for efficient pruning of the candidates in the search space. Our method to index protein structures defines and utilizes spatial profiles, i.e., summaries constructed from the geometrical and biochemical properties that characterize the neighborhood around the geometrically significant sites of proteins. These features are then scored, using a statistical measure, for their ability to distinguish a family of proteins from a background set of unrelated proteins, and successful features are combined into a representative set for the protein family. Unlike most currently available methods, our methods are able to capture structurally local motifs. The results verify that our method is successful both in identifying the distinctive sites of a given family of proteins and in classifying proteins using the extracted features. These tools utilize accurate and compact representations of data together with better similarity measures, new data structures, and new algorithms, and they apply data mining techniques in novel ways to help researchers extract information from very large data repositories and make better use of them.
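The filtering idea behind N-gram frequency vectors can be illustrated with a minimal sketch. It is not the dissertation's distance function or index structure. It assumes a plain L1 distance between N-gram count vectors and relies on the well-known q-gram lemma (the L1 distance between two N-gram count vectors is at most 2·N times the edit distance), so a cheap vector comparison can safely discard candidates before the expensive alignment. The function names and the linear scan over the database are hypothetical choices made for illustration.

```python
from collections import Counter


def ngram_profile(seq: str, n: int) -> Counter:
    """Count every overlapping n-gram in seq (the frequency vector)."""
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))


def profile_distance(p: Counter, q: Counter) -> int:
    """L1 distance between two n-gram count vectors."""
    return sum(abs(p[g] - q[g]) for g in p.keys() | q.keys())


def edit_distance(a: str, b: str) -> int:
    """Classic dynamic-programming Levenshtein distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]


def range_search(query: str, database: list, radius: int, n: int = 3) -> list:
    """Return (sequence, distance) pairs within `radius` edits of `query`."""
    q_profile = ngram_profile(query, n)
    hits = []
    for seq in database:
        # Filter step: by the q-gram lemma, a profile distance above
        # 2 * n * radius proves the edit distance exceeds radius.
        if profile_distance(q_profile, ngram_profile(seq, n)) > 2 * n * radius:
            continue
        # Refinement step: verify the surviving candidate exactly.
        d = edit_distance(query, seq)
        if d <= radius:
            hits.append((seq, d))
    return hits
```

Because the vector distance never overestimates the scaled edit distance, the filter produces no false dismissals; it only reduces how many candidates reach the exact comparison. A real index would store the vectors in a spatial structure rather than scanning them one by one.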
Committee
Hakan Ferhatosmanoglu (Advisor)
Pages
119 p.
Keywords
Bioinformatics; Structural Motifs; Sequence Indexing; Sequence Similarity; Subsequence Similarity; Substructure Similarity; Very Large Databases; Similarity Search; k-NN Search; Range Search; Approximate Querying; Quantized Index; Multiresolution Search
Recommended Citations
Citations
APA Style (7th edition): Ozturk, O. (2007). Feature extraction and similarity-based analysis for proteome and genome databases [Doctoral dissertation, Ohio State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=osu1190138805
MLA Style (8th edition): Ozturk, Ozgur. Feature extraction and similarity-based analysis for proteome and genome databases. 2007. Ohio State University, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=osu1190138805.
Chicago Manual of Style (17th edition): Ozturk, Ozgur. "Feature extraction and similarity-based analysis for proteome and genome databases." Doctoral dissertation, Ohio State University, 2007. http://rave.ohiolink.edu/etdc/view?acc_num=osu1190138805
Document number: osu1190138805
Download Count: 1,292
Copyright Info
© 2007, all rights reserved.
This open access ETD is published by The Ohio State University and OhioLINK.