Engineering Proteins from Sequence Statistics: Identifying and Understanding the Roles of Conservation and Correlation in Triosephosphate Isomerase

Sullivan, Brandon Joseph

2011, Doctor of Philosophy, Ohio State University, Biochemistry Program, Ohio State.

The structure, function and dynamics of proteins are determined by the physical and chemical properties of their amino acids. Unfortunately, the information encoded within a position or between positions is poorly understood. Multiple sequence alignments of protein families allow us to address these questions statistically. Here, we describe the characterization of bioinformatically designed variants of triosephosphate isomerase (TIM). First, we review the state of the art in engineering proteins with increased stability. We examine two methodologies that benefit from the availability of large numbers: high-throughput screening and sequence statistics of protein families. Second, we have deconvoluted which properties are encoded within a position (conservation) and between positions (correlations) by designing TIMs in which each position is occupied by the most common amino acid at that position in the multiple sequence alignment. We found that a consensus TIM designed from a raw sequence database catalyzes the complex isomerization reaction with weak activity as a dynamic molten globule. Furthermore, we have confirmed that the monomeric species is the catalytically active conformation despite being designed from 600+ dimeric proteins. A second consensus TIM, designed from a curated dataset, is well folded, has wild-type activity and is dimeric, yet it differs from the raw consensus TIM at only 35 nonconserved positions. The datasets behind these two TIMs differ in the fraction of sequences from eukaryotes and prokaryotes. These distribution differences break and alter networks of statistical correlations at nonconserved positions, which we demonstrate with mutual information and subset perturbation calculations. Additionally, we show that the curated consensus TIM is an extremely thermostable enzyme. The protein remains half folded at 95 °C and may be the only TIM to completely refold after thermal denaturation.
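The two statistics named in this paragraph, per-position consensus and pairwise mutual information, can be illustrated with a minimal sketch. It assumes a gap-free toy alignment given as equal-length strings; the function names and the toy data are hypothetical and are not taken from the dissertation, which may use different estimators or corrections.

```python
from collections import Counter
from math import log2


def consensus(msa):
    """Return the most common residue in each column of the alignment."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*msa))


def mutual_information(msa, i, j):
    """Mutual information (in bits) between alignment columns i and j."""
    n = len(msa)
    count_i = Counter(seq[i] for seq in msa)
    count_j = Counter(seq[j] for seq in msa)
    count_ij = Counter((seq[i], seq[j]) for seq in msa)
    total = 0.0
    for (a, b), c in count_ij.items():
        p_ab = c / n
        total += p_ab * log2(p_ab / ((count_i[a] / n) * (count_j[b] / n)))
    return total


# Hypothetical toy alignment: column 0 is invariant, columns 1 and 2 covary.
toy_msa = ["AKLV", "AKLI", "ARMV", "ARMI", "AKLV"]

print(consensus(toy_msa))
print(round(mutual_information(toy_msa, 0, 1), 3))  # invariant column carries no MI
print(round(mutual_information(toy_msa, 1, 2), 3))  # perfectly covarying pair
```

In this toy case the invariant column shares no information with any other column, while the covarying pair has mutual information equal to the entropy of either column. Real alignments also need gap handling and sequence weighting, which this sketch omits.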

Third, we wished to understand the determinants of protein stability, one of biochemistry's most difficult questions. It has been shown that consensus mutations improve the stability of native proteins approximately half the time, but there is no a priori technique to predict which consensus mutations will be stabilizing. We have developed a double-sieve filter that selects stabilizing mutations based on their extent of conservation and their statistical independence from other positions within the multiple sequence alignment. These two mathematical tests reliably predict stabilizing mutations with greater than 90% accuracy. The statistical algorithm was used to select 15 consensus mutations that together significantly improved the melting temperature of wild-type TIM.
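The idea of a two-stage filter can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give the conservation measure, the independence measure, or any cutoffs, so the consensus-frequency test, the mutual-information test, and the thresholds `min_freq` and `max_mi` below are hypothetical stand-ins rather than the dissertation's actual criteria.

```python
from collections import Counter
from math import log2


def column_mi(msa, i, j):
    """Mutual information (in bits) between alignment columns i and j."""
    n = len(msa)
    count_i = Counter(seq[i] for seq in msa)
    count_j = Counter(seq[j] for seq in msa)
    count_ij = Counter((seq[i], seq[j]) for seq in msa)
    return sum(
        (c / n) * log2((c / n) / ((count_i[a] / n) * (count_j[b] / n)))
        for (a, b), c in count_ij.items()
    )


def double_sieve(msa, wild_type, min_freq=0.6, max_mi=0.3):
    """Propose consensus mutations that pass two tests (hypothetical cutoffs).

    Sieve 1: the consensus residue occurs in at least min_freq of sequences.
    Sieve 2: the column's MI with every other column is at most max_mi.
    Returns a list of (position, wild_type_residue, consensus_residue).
    """
    n = len(msa)
    length = len(wild_type)
    picks = []
    for i in range(length):
        residue, count = Counter(seq[i] for seq in msa).most_common(1)[0]
        if residue == wild_type[i]:
            continue  # wild type already carries the consensus residue
        if count / n < min_freq:
            continue  # fails sieve 1: not conserved enough
        if any(column_mi(msa, i, j) > max_mi for j in range(length) if j != i):
            continue  # fails sieve 2: statistically coupled to another column
        picks.append((i, wild_type[i], residue))
    return picks


# Hypothetical toy alignment: column 0 is invariant, columns 1 and 2 covary,
# and column 3 is poorly conserved.
toy_msa = ["AKLV", "AKLV", "AKLV", "AKLI", "AKLI", "ARMI", "ARMT", "ARMT"]
print(double_sieve(toy_msa, "GRMI"))
```

With these toy inputs only the mutation at position 0 survives both sieves: positions 1 and 2 are conserved enough but strongly coupled to each other, and position 3 is not conserved. Loosening `max_mi` lets the coupled positions through, which shows how the second sieve is what excludes them.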

Finally, we designed and characterized a model system for testing the effects of statistically correlated residues. The TIM knockout strain from the Keio Collection was engineered for T7 expression and tested for complementation of TIM activity. The single-gene knockout exhibits differential growth that correlates well with in vitro specific activities. The design and characterization of two libraries are proposed to test the relationship between correlations and protein fitness.

Thomas Magliery, PhD (Advisor)
Mark Foster, PhD (Committee Member)
William Ray, PhD (Committee Member)
Mark Peeples, PhD (Committee Member)
245 p.

Recommended Citations

  • Sullivan, B. J. (2011). Engineering Proteins from Sequence Statistics: Identifying and Understanding the Roles of Conservation and Correlation in Triosephosphate Isomerase [Doctoral dissertation, Ohio State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=osu1325106135

    APA Style (7th edition)

  • Sullivan, Brandon. Engineering Proteins from Sequence Statistics: Identifying and Understanding the Roles of Conservation and Correlation in Triosephosphate Isomerase. 2011. Ohio State University, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=osu1325106135.

    MLA Style (8th edition)

  • Sullivan, Brandon. "Engineering Proteins from Sequence Statistics: Identifying and Understanding the Roles of Conservation and Correlation in Triosephosphate Isomerase." Doctoral dissertation, Ohio State University, 2011. http://rave.ohiolink.edu/etdc/view?acc_num=osu1325106135

    Chicago Manual of Style (17th edition)