Discovering protein functions is a major task in computational biology, since proteins play key roles in the underlying mechanisms of cellular processes, phenotypes, and diseases. The most common in silico protein annotation method is function transfer through sequence homology, which does not always produce correct results. Consequently, we propose an alternative research direction: assigning functional annotations to proteins (and genes) based on biological network information.
In general, our approach is to transfer functional annotations between related proteins. We present our work in three parts:
1. In the first part, we compute the probabilistic significance of Gene Ontology (GO) annotation sequences obtained from the annotations of a sequence of proteins in a protein-protein interaction network. After identifying significant annotation sequences, we predict the annotation of a target protein by picking the most significant candidate GO annotation sequence observed in the close neighborhood of the target protein. In cross-validation prediction experiments with pre-annotated proteins, this approach recovered correct annotations with 81% precision at 45% recall.
2. In the second part, we develop and evaluate a new pattern-based function annotation framework. For a given target protein P and each GO term t, we compare (through graph alignment) the neighborhood of P with the neighborhoods of proteins annotated with t. We then assign to P the GO term whose neighborhoods are the most similar to the neighborhood of P. This framework improves the accuracy of the techniques introduced in the first part by 30.44%, 41.94%, and 2.62% in the organism-specific networks of fly, worm, and yeast, respectively.
3. In the third part, we present a technique that improves our pattern-based methodology with an iterative prediction algorithm. Using a multi-iteration algorithm, we predict the functions of a protein P at one step and then use the predicted functions of P to fine-tune the predictions for other target proteins at a later step. Plugging in the iterative prediction algorithm improves the accuracy of the pattern-based function annotation framework presented in the second part by 11.24%, 14.32%, 5.6%, and 15.14% in the organism-specific networks of fly, human, worm, and yeast, respectively.
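The iterative idea in the third part can be illustrated with a minimal sketch. It is not the method summarized above: the significance scoring and graph alignment steps are replaced here by a simple majority vote over annotated neighbors, and the names `predict_once`, `iterative_predict`, and `max_iters` are hypothetical. The sketch only shows how predictions made in one round can feed the predictions of other target proteins in later rounds.

```python
from collections import Counter


def predict_once(graph, annotations, target):
    """Return the most frequent GO term among the annotated neighbors of target.

    Majority voting is an assumed stand-in for the pattern-based scoring;
    returns None when no neighbor carries an annotation.
    """
    votes = Counter()
    for neighbor in graph.get(target, ()):
        for term in annotations.get(neighbor, ()):
            votes[term] += 1
    if not votes:
        return None
    best = max(votes.values())
    # Break ties alphabetically so the result is deterministic.
    return min(term for term, count in votes.items() if count == best)


def iterative_predict(graph, annotations, targets, max_iters=5):
    """Predict one GO term per target, reusing earlier predictions in later rounds."""
    known = {protein: set(terms) for protein, terms in annotations.items()}
    predicted = {}
    for _ in range(max_iters):
        new_predictions = {}
        for protein in targets:
            if protein in predicted:
                continue
            term = predict_once(graph, known, protein)
            if term is not None:
                new_predictions[protein] = term
        if not new_predictions:
            break  # nothing changed, so further rounds cannot help
        # Commit the round's predictions only after every target was scored,
        # so each round sees exactly the annotations from previous rounds.
        for protein, term in new_predictions.items():
            predicted[protein] = term
            known.setdefault(protein, set()).add(term)
    return predicted


# Toy chain network A - B - C - D in which only A is annotated.
toy_graph = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
toy_annotations = {"A": {"GO:0006412"}}
result = iterative_predict(toy_graph, toy_annotations, ["B", "C", "D"])
```

In this toy chain, a single round can annotate only B, the direct neighbor of A. C and D receive a prediction in later rounds because the term predicted for B is treated as known from then on.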