Recent years have seen exponential growth in publicly available genetic data for many organisms. To be scientifically or medically useful, the genetic data must be mapped to the physical traits that the genes in the genotype code for. In this dissertation, we describe methods that use phylogenetic trees to find correlations between genotypes and phenotypes and that can be applied on a genome-wide scale. We first describe Felsenstein's argument showing the necessity of using phylogenetic trees when a genotype-phenotype correlation is calculated. Then, we propose a method that uses a modified version of Maddison's Concentrated Changes Test (CCT) to find correlations between a binary phenotype and a binary genotype. We demonstrate the applicability of this method by using it to find genes correlated with susceptibility to anthrax in inbred mouse strains.
Because our programs can correlate any two binary variables that can be optimized on a phylogenetic tree, we also used them to find correlations between avian influenza strains and various traits of the species or organisms affected. In particular, we find correlations between the spread of influenza and particular mutations in the influenza virus. We further demonstrate the method's applicability to a continuous phenotype that has been suitably binarized by finding genes correlated with cholesterol and lipid levels in inbred mice, and we report the results.
The limitation of CCT to binary phenotypes is significant, because most phenotypes are not binary in nature. We develop a method that can be used to find correlations between a continuous phenotype and a binary genotype using a phylogenetic tree. Randomization testing is used to assess the significance of the correlation between the genotype and the phenotype. We test our methods by correlating lipid levels in inbred mice with their genotype. Comparison of our results with literature surveys of previous in silico methods as well as experimental results shows that our method performs favorably.
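
As a rough illustration of the randomization testing mentioned above, the following minimal sketch computes a permutation p-value for the association between a binary genotype and a continuous phenotype. It is not the method developed in this dissertation: it deliberately ignores the phylogenetic tree, so it treats strains as independent, which is exactly the assumption that Felsenstein's argument warns against. The function name `permutation_p_value`, the difference-in-means statistic, and the example values are all assumptions made for illustration.

```python
import random
from statistics import mean


def permutation_p_value(genotype, phenotype, n_permutations=10000, seed=0):
    """Two-sided permutation p-value for a binary genotype (0/1) versus a
    continuous phenotype, using |mean(group 1) - mean(group 0)| as the statistic.

    Illustrative only: strains are shuffled as if independent, so the
    phylogeny is NOT taken into account.
    """
    if len(genotype) != len(phenotype):
        raise ValueError("genotype and phenotype must have the same length")
    if not 0 < sum(genotype) < len(genotype):
        raise ValueError("both genotype classes must be present")

    def statistic(labels):
        carriers = [p for g, p in zip(labels, phenotype) if g == 1]
        others = [p for g, p in zip(labels, phenotype) if g == 0]
        return abs(mean(carriers) - mean(others))

    observed = statistic(genotype)
    rng = random.Random(seed)      # fixed seed keeps the sketch reproducible
    labels = list(genotype)        # copy, so the caller's list is not shuffled
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(labels)        # break any genotype-phenotype association
        if statistic(labels) >= observed - 1e-12:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical data: 12 strains, allele 1 associated with higher lipid levels.
genotype = [0] * 6 + [1] * 6
phenotype = [1.0, 1.1, 0.9, 1.2, 1.0, 0.8, 3.0, 3.1, 2.9, 3.2, 3.0, 2.8]
p_value = permutation_p_value(genotype, phenotype, n_permutations=2000)
```

A tree-aware version would replace the free shuffling of labels with a randomization that respects the phylogeny; that step is not sketched here.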