The protein folding problem remains an ongoing challenge. Despite significant advances in our understanding of proteins, accurately predicting the effect of amino acid mutations on the structure and stability of a protein is still difficult. As a result, engineering proteins to suit our purposes is labor-intensive, because it involves significant trial and error. In this thesis, we have explored ways to better understand, and where possible improve, bioinformatics and computational methods for studying and engineering proteins.
A significant portion of this thesis is based on the bioinformatics approaches of consensus and correlation analysis. We have illustrated how consensus and correlation metrics can be calculated and analyzed to explore various aspects of protein structure. The proteins triosephosphate isomerase and Cu,Zn superoxide dismutase were studied using these approaches. We found that a significant amount of information about a protein fold is encoded at the consensus level. The effect of amino acid correlations, while subtle, is nonetheless significant, and some of the failures of the consensus approach can be attributed to broken amino acid correlations. We also successfully engineered thermostabilized, active variants of triosephosphate isomerase using correlation, phylogeny, and consensus information.
In another project, the DNA-binding domain of the E. coli Cra protein was used to evaluate the accuracy of some computational predictions. Based on the experimental data, we concluded that computationally predicting the exact effect of a mutation on the structure, and especially on the stability, of a protein is challenging, and that even the best computational tools available today have substantial room for improvement. We are carrying out extensive mutagenesis of this domain to compile a database of biophysical data to aid the further development of computational techniques.
Because the studies described above involved making several protein mutants, we also discuss the development of a new plasmid vector system that was used to clone many of them.