Files
osu1164741421.pdf (1.18 MB)
Searching for remotely homologous sequences in protein databases with hybrid PSI-blast
Author Info
Li, Yuheng
Permalink:
http://rave.ohiolink.edu/etdc/view?acc_num=osu1164741421
Abstract Details
Year and Degree
2006, Doctor of Philosophy, Ohio State University, Biophysics.
Abstract
Sequence alignment is one of the fundamental techniques used in molecular biology. It has been widely used in many biological applications, such as protein classification, gene finding, homology modeling, structure and function prediction, phylogenetic analysis and database annotation. In high-sensitivity sequence homology database searches, progressive refinement of the sequence model by means of iterative searches is an effective method and is currently employed in many popular tools such as PSI-BLAST and SAM. Recently, a novel alignment algorithm has been proposed that offers features expected to improve the sensitivity of such iterative approaches, specifically a well-characterized theory of its statistics even in the presence of position-specific gap costs. We have demonstrated that the new hybrid alignment algorithm is ready to be used as the alignment core of PSI-BLAST. We also evaluated the accuracy of two proposed approaches to edge-effect correction in the alignment statistics of short sequences, an issue that turns out to be crucial in developing a hybrid-alignment-based version of PSI-BLAST. In addition, we have exploited other benefits of the hybrid alignment. We show that incorporating information about suboptimal alignments, which PSI-BLAST otherwise ignores, already improves the sensitivity of PSI-BLAST. In one experiment, we found a set of sequences on which our tool disagrees with the classification given by SCOP. Careful examination points to a possible misclassification in SCOP. Cross-referencing with two other methods of protein structure classification, CATH and DALI, supports this view, indicating that the enriched information from suboptimal alignments is valuable for detecting more weakly related sequences. Finally, we have integrated position-specific gap penalties into PSI-BLAST; the standard tool intentionally leaves them out because of a theoretical limitation of its underlying Smith-Waterman score statistics.
We also investigated several strategies for adjusting the position-specific gap costs derived from the forward-backward algorithm. The results show that adjusting them by the degree of conservedness, calculated as a localized relative entropy from the position-specific substitution matrix, is the most effective of these strategies. Such enhancements further improve the sensitivity of PSI-BLAST for remote homology detection in database searches.
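The last part of the abstract combines two ideas: a per-position conservation score computed as a relative entropy from a profile, and gap penalties that vary by position. The sketch below illustrates only those mechanics under loudly stated assumptions. It uses a toy 4-letter alphabet with uniform background frequencies, reads "localized" as a sliding-window average of column relative entropies, and maps conservation to gap cost with a hypothetical linear rule (`base + scale * c`). The alignment step is the plain Smith-Waterman recursion with linear, position-dependent gap costs, not the hybrid alignment algorithm from the dissertation; as the abstract notes, the score statistics of this plain recursion are not well characterized once gap costs vary by position. All function names here are hypothetical.

```python
import math

# Toy alphabet with uniform background frequencies (assumption; a real
# protein profile would use 20 amino acids and measured background frequencies).
ALPHABET = "ACDE"
BACKGROUND = {a: 0.25 for a in ALPHABET}


def relative_entropy(column, background):
    """Relative entropy (bits) of one profile column against the background."""
    return sum(q * math.log2(q / background[a])
               for a, q in column.items() if q > 0)


def localized_conservation(profile, background, half_window=1):
    """Average column relative entropy over a window centred on each position.

    The window average is an assumed reading of 'localized'.
    """
    entropies = [relative_entropy(col, background) for col in profile]
    n = len(entropies)
    result = []
    for i in range(n):
        lo, hi = max(0, i - half_window), min(n, i + half_window + 1)
        result.append(sum(entropies[lo:hi]) / (hi - lo))
    return result


def position_gap_costs(conservation, base=2.0, scale=1.0):
    """Hypothetical linear mapping: more conserved positions cost more to gap."""
    return [base + scale * c for c in conservation]


def pssm_scores(profile, background):
    """Log-odds score (bits) for each residue at each profile position."""
    return [{a: math.log2(col[a] / background[a]) for a in ALPHABET}
            for col in profile]


def smith_waterman_profile(scores, gap, subject):
    """Best local alignment score of a profile against a sequence.

    Standard Smith-Waterman with linear gap costs taken from gap[i],
    i.e. the cost depends on the profile position involved in the gap.
    """
    m, n = len(scores), len(subject)
    H = [[0.0] * (n + 1) for _ in range(m + 1)]
    best = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            diag = H[i - 1][j - 1] + scores[i - 1][subject[j - 1]]
            up = H[i - 1][j] - gap[i - 1]    # profile position i aligned to a gap
            left = H[i][j - 1] - gap[i - 1]  # subject residue j inserted at position i
            H[i][j] = max(0.0, diag, up, left)
            best = max(best, H[i][j])
    return best


# Usage: a three-column toy profile with one uninformative middle column.
profile = [
    {"A": 0.85, "C": 0.05, "D": 0.05, "E": 0.05},  # strongly conserved
    {"A": 0.25, "C": 0.25, "D": 0.25, "E": 0.25},  # matches the background
    {"A": 0.05, "C": 0.85, "D": 0.05, "E": 0.05},  # strongly conserved
]
conservation = localized_conservation(profile, BACKGROUND)
gaps = position_gap_costs(conservation)
score = smith_waterman_profile(pssm_scores(profile, BACKGROUND), gaps, "AEC")
print(conservation, gaps, score)
```

A column identical to the background contributes zero relative entropy, so in this toy example the window average gives the middle position the largest conservation and therefore the largest gap cost, because its window spans both conserved neighbours. Whether conservation should raise or lower gap costs, and by how much, is exactly the kind of adjustment strategy the dissertation compares; the linear rule here is only a placeholder.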
Committee
Mario Lauria (Advisor)
Pages
171 p.
Keywords
sequence alignment; sequence database searches; hybrid alignment; PSI-BLAST
Recommended Citations
APA Style (7th edition)
Li, Y. (2006). Searching for remotely homologous sequences in protein databases with hybrid PSI-blast [Doctoral dissertation, Ohio State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=osu1164741421
MLA Style (8th edition)
Li, Yuheng. Searching for remotely homologous sequences in protein databases with hybrid PSI-blast. 2006. Ohio State University, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=osu1164741421.
Chicago Manual of Style (17th edition)
Li, Yuheng. "Searching for remotely homologous sequences in protein databases with hybrid PSI-blast." Doctoral dissertation, Ohio State University, 2006. http://rave.ohiolink.edu/etdc/view?acc_num=osu1164741421
Document number: osu1164741421
Download Count: 995
Copyright Info
© 2006, all rights reserved.
This open access ETD is published by The Ohio State University and OhioLINK.