ESTIMATION OF HAPLOTYPE FREQUENCIES FROM DATA ON UNRELATED PEOPLE

Sinha, Moumita

2007, Doctor of Philosophy, Case Western Reserve University, Epidemiology and Biostatistics.
The estimation of haplotype frequencies has become important because it has been shown that using haplotype frequencies instead of individual single nucleotide polymorphisms (SNPs) often provides higher power for genetic association studies (Olson and Wijsman, 1994). Several algorithms have been proposed in the literature for estimating haplotype frequencies (Excoffier and Slatkin, 1995; Hawley and Kidd, 1995; Lin et al., 2002; Niu et al., 2002; Stephens et al., 2001a; Qin et al., 2002). Among the most popular are the expectation-maximization (EM) algorithm, which is used to obtain maximum-likelihood (ML) estimates, and the Bayesian approach using a coalescent prior, the latter as implemented in the software PHASE. However, a major drawback of these methods is the large number of parameters that must be estimated, which limits the number of loci the algorithms can handle, especially when the number of individuals in the sample is large. Here we propose, for case-control data, a novel method to estimate haplotype frequencies, called the limited linkage disequilibrium (LLD) algorithm, which requires the estimation of far fewer parameters and hence can accommodate a larger number of loci. Haplotypes that are found to be significantly associated with disease in this way can then be studied further with a view to finding disease-causing genetic variants. We first estimate the allele frequencies, then the linkage disequilibrium (LD) coefficients for all possible combinations of two loci by an exact estimation procedure, on the assumption that the allele frequencies are known. We then successively estimate the three-locus and four-locus linkage disequilibrium coefficients for all combinations of loci, at each stage fixing the estimates obtained so far. Finally, the haplotype frequencies are expressed in terms of these estimated allele frequencies and linkage disequilibrium coefficients.
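The first two stages described above (allele frequencies, then a pairwise LD coefficient with the allele frequencies held fixed) can be illustrated with a minimal Python sketch. It is not the exact estimation procedure of the thesis, which the abstract does not specify: a plain grid search over the admissible range of D is swapped in for it. The sketch assumes two biallelic loci, Hardy-Weinberg equilibrium, and unphased genotype counts keyed by the number of copies of a reference allele at each locus; all function names are hypothetical.

```python
import math

def allele_freq(counts_by_copies):
    """Gene-counting estimate; counts_by_copies[g] = people carrying g copies (g = 0, 1, 2)."""
    n = sum(counts_by_copies)
    return (2 * counts_by_copies[2] + counts_by_copies[1]) / (2 * n)

def genotype_probs(pA, pB, D):
    """Two-locus genotype probabilities under random pairing of haplotypes (HWE)."""
    h11 = pA * pB + D
    h10 = pA * (1 - pB) - D
    h01 = (1 - pA) * pB - D
    h00 = (1 - pA) * (1 - pB) + D
    return {
        (2, 2): h11 * h11, (2, 1): 2 * h11 * h10, (2, 0): h10 * h10,
        (1, 2): 2 * h11 * h01, (1, 1): 2 * (h11 * h00 + h10 * h01),
        (1, 0): 2 * h10 * h00,
        (0, 2): h01 * h01, (0, 1): 2 * h01 * h00, (0, 0): h00 * h00,
    }

def log_likelihood(counts, pA, pB, D):
    """Multinomial log-likelihood of the genotype counts (constant term dropped)."""
    total = 0.0
    for genotype, prob in genotype_probs(pA, pB, D).items():
        n = counts.get(genotype, 0)
        if n == 0:
            continue
        if prob <= 0.0:
            return -math.inf  # an observed genotype is impossible at this D
        total += n * math.log(prob)
    return total

def estimate_pairwise_D(counts, steps=2000):
    """Stage 1: allele frequencies by counting. Stage 2: D by grid search
    (a stand-in for an exact procedure), with the allele frequencies fixed."""
    a_counts = [sum(counts.get((a, b), 0) for b in range(3)) for a in range(3)]
    b_counts = [sum(counts.get((a, b), 0) for a in range(3)) for b in range(3)]
    pA, pB = allele_freq(a_counts), allele_freq(b_counts)
    # Bounds that keep all four haplotype frequencies non-negative.
    lo = max(-pA * pB, -(1 - pA) * (1 - pB))
    hi = min(pA * (1 - pB), (1 - pA) * pB)
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    D_hat = max(grid, key=lambda d: log_likelihood(counts, pA, pB, d))
    return pA, pB, D_hat
```

A grid search is used here because the likelihood in D can have more than one local maximum, so a simple hill-climbing step could stop at the wrong one; the resolution of the estimate is limited by `steps`.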
Because we limit the number of stages, assuming that the higher-order disequilibrium coefficients are zero, our method estimates the haplotype frequencies as functions of fewer parameters and hence can handle a larger number of loci, even when the sample size is large. The LLD algorithm estimates the haplotype frequencies efficiently, with minimal absolute error in the estimates. Moreover, the estimates are virtually unaffected by deviations from Hardy-Weinberg equilibrium, even though the method assumes that Hardy-Weinberg equilibrium holds at the loci.
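To make the idea of expressing haplotype frequencies through allele frequencies and a limited set of disequilibrium coefficients concrete, here is a minimal Python sketch. It assumes biallelic loci and defines each coefficient as the central moment of the allele indicators over a set of loci, which may differ from the parameterization used in the thesis. The function name `haplotype_frequencies` and its arguments are hypothetical; coefficients of order above `max_order` are treated as zero.

```python
from itertools import combinations, product
from math import prod

def haplotype_frequencies(p, D, max_order=2):
    """Haplotype frequencies from allele frequencies and LD coefficients.

    p: list of reference-allele frequencies, one per locus.
    D: dict mapping a sorted tuple of locus indices, e.g. (0, 2), to the
       disequilibrium coefficient for that set of loci (assumed here to be
       a central moment of the allele indicators).
    max_order: coefficients involving more loci than this are set to zero.
    Returns a dict keyed by haplotype, a tuple of 1 (reference) or 0 per locus.
    """
    n = len(p)
    freqs = {}
    for hap in product((1, 0), repeat=n):
        # Frequency of the allele this haplotype carries at each locus.
        q = [p[i] if a == 1 else 1.0 - p[i] for i, a in enumerate(hap)]
        # Sign attached to each locus in the expansion.
        s = [1 if a == 1 else -1 for a in hap]
        total = prod(q)  # term under complete linkage equilibrium
        for order in range(2, min(max_order, n) + 1):
            for loci in combinations(range(n), order):
                d = D.get(loci, 0.0)
                if d == 0.0:
                    continue
                sign = prod(s[i] for i in loci)
                rest = prod(q[k] for k in range(n) if k not in loci)
                total += sign * d * rest
        freqs[hap] = total
    return freqs
```

For two loci this reduces to the familiar form in which the frequency of the haplotype carrying both reference alleles is the product of the two allele frequencies plus D. With n biallelic loci there are 2^n - 1 free haplotype frequencies, whereas truncating at order two leaves only n allele frequencies and n(n-1)/2 pairwise coefficients. Nothing in the sketch checks that the supplied coefficients yield non-negative frequencies.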
Robert Elston (Advisor)

Recommended Citations

  • Sinha, M. (2007). ESTIMATION OF HAPLOTYPE FREQUENCIES FROM DATA ON UNRELATED PEOPLE [Doctoral dissertation, Case Western Reserve University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=case1164801319

    APA Style (7th edition)

  • Sinha, Moumita. ESTIMATION OF HAPLOTYPE FREQUENCIES FROM DATA ON UNRELATED PEOPLE. 2007. Case Western Reserve University, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=case1164801319.

    MLA Style (8th edition)

  • Sinha, Moumita. "ESTIMATION OF HAPLOTYPE FREQUENCIES FROM DATA ON UNRELATED PEOPLE." Doctoral dissertation, Case Western Reserve University, 2007. http://rave.ohiolink.edu/etdc/view?acc_num=case1164801319

    Chicago Manual of Style (17th edition)