Searching for remotely homologous sequences in protein databases with hybrid PSI-blast

2006, Doctor of Philosophy, Ohio State University, Biophysics.
Sequence alignment is one of the fundamental techniques used in molecular biology. It has been widely used in many biological applications, such as protein classification, gene finding, homology modeling, structure and function prediction, phylogenetic analysis and database annotation. In high-sensitivity sequence homology database searches, progressive sequence model refinement by means of iterative searches is an effective method and is currently employed in many popular tools such as PSI-BLAST and SAM. Recently, a novel alignment algorithm has been proposed that offers features expected to improve the sensitivity of such iterative approaches, specifically a well-characterized theory of its statistics even in the presence of position-specific gap costs. We have demonstrated that the new hybrid alignment algorithm is ready to be used as the alignment core of PSI-BLAST. We also evaluated the accuracy of two proposed approaches to edge effect correction in short sequence alignment statistics, which turns out to be one of the crucial issues in developing a hybrid-alignment-based version of PSI-BLAST. In addition, we have exploited other benefits of the hybrid alignment. We show that incorporating information about the suboptimal alignments, otherwise ignored in PSI-BLAST, already improves the sensitivity of PSI-BLAST. In one experiment, we have found a set of sequences on which our tool disagrees with the classification given by SCOP. Careful examination points to a possible misclassification in SCOP. Cross-referencing with two other methods of protein structure classification, CATH and DALI, supports this view, indicating that the enriched information from suboptimal alignments is valuable for detecting more weakly related sequences. Finally, we have integrated position-specific gap penalties in PSI-BLAST, which were intentionally left out of PSI-BLAST because of a theoretical limitation of its underlying Smith-Waterman score statistics.
We also investigated several strategies to adjust the position-based gap costs derived from the forward-backward algorithm. The results show that the degree of conservedness calculated as a localized relative entropy from the position-specific substitution matrix is the most effective. Such enhancements further improve the sensitivity of PSI-BLAST for remote homology detection in database searches.
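As a rough illustration of the conservation measure mentioned above, the sketch below computes a per-column relative entropy against a background distribution and then averages it over a small window of neighboring positions. This is not the dissertation's implementation: the function names, the window-averaging interpretation of "localized", the `half_window` parameter, and the assumption that per-position residue probabilities are available from the profile are all hypothetical choices made for the example.

```python
import math


def relative_entropy(column_probs, background):
    """Relative entropy (in bits) of one profile column versus the background.

    column_probs and background map residue letters to probabilities.
    Zero-probability residues contribute nothing to the sum.
    """
    return sum(
        p * math.log2(p / background[residue])
        for residue, p in column_probs.items()
        if p > 0
    )


def localized_conservation(profile, background, half_window=1):
    """Average the per-column relative entropy over a sliding window.

    ASSUMPTION: "localized" is modeled here as a plain mean over positions
    i - half_window .. i + half_window, truncated at the sequence ends.
    """
    per_column = [relative_entropy(col, background) for col in profile]
    smoothed = []
    for i in range(len(per_column)):
        lo = max(0, i - half_window)
        hi = min(len(per_column), i + half_window + 1)
        window = per_column[lo:hi]
        smoothed.append(sum(window) / len(window))
    return smoothed


if __name__ == "__main__":
    # Toy two-letter alphabet, purely for demonstration.
    background = {"A": 0.5, "B": 0.5}
    profile = [
        {"A": 1.0, "B": 0.0},  # fully conserved column
        {"A": 0.5, "B": 0.5},  # matches background, no conservation
        {"A": 0.5, "B": 0.5},
    ]
    print(localized_conservation(profile, background))
```

A score like this could then be used to scale gap costs so that gaps are more expensive in conserved regions, but how the dissertation maps conservation to gap costs is not stated in the abstract.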
Mario Lauria (Advisor)
171 p.

Recommended Citations

  • Li, Y. (2006). Searching for remotely homologous sequences in protein databases with hybrid PSI-blast [Doctoral dissertation, Ohio State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=osu1164741421

    APA Style (7th edition)

  • Li, Yuheng. Searching for remotely homologous sequences in protein databases with hybrid PSI-blast. 2006. Ohio State University, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=osu1164741421.

    MLA Style (8th edition)

  • Li, Yuheng. "Searching for remotely homologous sequences in protein databases with hybrid PSI-blast." Doctoral dissertation, Ohio State University, 2006. http://rave.ohiolink.edu/etdc/view?acc_num=osu1164741421

    Chicago Manual of Style (17th edition)