Association studies aim to link a given trait, most commonly a
disease, to the location of a causative gene. A subset of these
studies uses case-control data, which include both affected and
unaffected individuals. The basic idea is to compare genotype
frequencies between those who have a disease, or other trait, and
those who do not. By analyzing
frequency patterns, researchers can pick up signals along each
chromosome that indicate an association with disease status.
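
As an illustration of comparing frequencies between cases and controls, the following minimal sketch computes a Pearson chi-squared statistic at a single SNP from a 2x2 table of allele counts. The function name and the counts are hypothetical and chosen for illustration only; the sketch assumes every expected cell count is nonzero.

```python
import math


def allelic_chi_squared(case_counts, control_counts):
    """Pearson chi-squared test on a 2x2 table of allele counts.

    case_counts and control_counts are (minor, major) allele counts.
    Returns (statistic, p_value) for 1 degree of freedom.
    Assumes every expected cell count is nonzero.
    """
    table = [list(case_counts), list(control_counts)]
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    total = sum(row_totals)

    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            stat += (table[i][j] - expected) ** 2 / expected

    # With 1 degree of freedom, the chi-squared tail probability
    # equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical counts: 60 minor / 140 major alleles in cases,
# 40 minor / 160 major alleles in controls.
stat, p = allelic_chi_squared((60, 140), (40, 160))
```

Repeating such a test at each typed SNP gives a per-marker signal along the chromosome.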
Depending on the type of disease under study, different mapping
approaches are necessary. One way to classify diseases is by the
underlying structure of mutations that cause them. While association
studies may be used to effectively map Mendelian diseases in which a
single mutation with high penetrance gives rise to a disease, the more
challenging problem is that of mapping complex diseases.
For our purposes, the genotypes are made up of single nucleotide polymorphisms (SNPs) typed
along the chromosome for a sample of individuals. The phenotype data
reflect the affected status of an individual with respect to a certain
disease. Along with the observed genotype and
phenotype data, we use an estimate of the sample's unobserved shared
genealogy. The genealogies are estimated under the coalescent model with recombination and are represented by
ancestral recombination graphs (ARGs). Such a genealogy, when
accurately estimated, can provide information about possible
disease-causing mutations that have occurred in the common history.
We propose a new method of disease mapping via the coalescent, which we
refer to as ARGlik. Our method couples a fast ARG estimation
program with likelihood-based association testing. We use an existing algorithm,
implemented in the software program MARGARITA, to
estimate the genealogy. After
estimating a genealogy for a given sample, we compute the
likelihood of the phenotype data given the genealogy. If there is a disease association at a particular SNP in the data, we expect to see a non-random clustering
of cases and controls within the genealogy.
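
To make the idea of non-random clustering concrete, the sketch below scores each clade of a single toy tree by how much a separate case probability inside and outside the clade improves a Bernoulli likelihood of the phenotypes. This is a simplified stand-in and not the likelihood used by ARGlik: the nested-tuple tree encoding, the function names, and the clade-wise comparison are all assumptions made for illustration.

```python
import math


def leaves(node):
    """Return the leaf labels beneath a nested-tuple tree node."""
    if isinstance(node, int):
        return [node]
    return [leaf for child in node for leaf in leaves(child)]


def clades(node):
    """Yield the leaf set beneath every internal node of the tree."""
    if isinstance(node, int):
        return
    yield frozenset(leaves(node))
    for child in node:
        yield from clades(child)


def bernoulli_loglik(k, n):
    """Maximised Bernoulli log-likelihood for k cases among n samples."""
    if n == 0 or k == 0 or k == n:
        return 0.0
    p = k / n
    return k * math.log(p) + (n - k) * math.log(1 - p)


def best_clade(tree, status):
    """Return (score, clade) for the clade that best separates cases.

    status[i] is 1 for a case and 0 for a control. The score is the
    log-likelihood gain over a single pooled case probability.
    """
    n = len(status)
    k = sum(status)
    null = bernoulli_loglik(k, n)
    best = (0.0, None)
    for clade in clades(tree):
        if len(clade) == n:
            continue  # the root clade offers no inside/outside contrast
        k_in = sum(status[i] for i in clade)
        alt = (bernoulli_loglik(k_in, len(clade))
               + bernoulli_loglik(k - k_in, n - len(clade)))
        if alt - null > best[0]:
            best = (alt - null, clade)
    return best


# Toy tree over eight sampled chromosomes; the first four are cases.
tree = (((0, 1), (2, 3)), ((4, 5), (6, 7)))
status = [1, 1, 1, 1, 0, 0, 0, 0]
score, clade = best_clade(tree, status)
```

In this toy example the cases fall entirely within one clade, so that clade receives the largest score; a random scattering of cases across the tree would give scores near zero.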
To check the performance of ARGlik, we compare our method against other coalescent-based methods as well as the standard chi-squared
approach. Our simulation study includes data ranging from simple
one-locus disease models to disease models with an external covariate. Results show that ARGlik
performs as well as the other coalescent-based methods for the one-locus
disease models while maintaining a lower false positive rate under the
no-disease model. Moreover, ARGlik performs well in
detecting association in the presence of a covariate. As a
final check on the program, we test three chromosomes for association
with type 1 diabetes.