Developing Computational Tools for Evolutionary Inferences in Polyploids

2018, Doctor of Philosophy, Ohio State University, Evolution, Ecology and Organismal Biology.
Methods for generating genome-scale data sets are facilitating the inference of phylogenetic relationships in non-model taxa across the Tree of Life. However, rapid speciation and heterogeneous patterns of diversification make this task difficult when gene trees have conflicting histories (e.g., from incomplete lineage sorting). For plant species in particular, additional complications arise from hybridization between divergent lineages and subsequent whole genome duplication (WGD; i.e., allopolyploidy). The evolutionary history of recently formed polyploids and their diploid progenitors is difficult to investigate because genotypes in polyploids are often ambiguous and because analyses must accommodate species with different ploidy levels. The focus of my dissertation has been to develop models and bioinformatic tools for analyzing high-throughput sequencing (HTS) data collected in non-model taxa of different ploidy levels to estimate phylogenetic relationships. I apply these tools in the plant genus Penstemon (Plantaginaceae) to infer the relationships in two groups of closely related species containing diploids, tetraploids, and hexaploids. The first chapter of my dissertation uses HTS data and a hierarchical Bayesian framework to estimate biallelic single nucleotide polymorphism (SNP) genotypes and allele frequencies in populations of any ploidy level (diploid or higher), assuming Hardy-Weinberg equilibrium. The model uses Markov chain Monte Carlo (MCMC) to integrate over the uncertainty in the estimated genotypes. I then assess the model’s accuracy using simulations and test it on a SNP data set from autotetraploid potato (Solanum tuberosum). Both tests demonstrate the usefulness of the model for parameter inference at different ploidy levels. The MCMC algorithm used for inference is implemented in the open source R package polyfreqs.
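The Chapter 1 model can be illustrated with a minimal Gibbs sampler for a single biallelic SNP. This is a sketch under stated assumptions, not the polyfreqs implementation: each individual's genotype is its number of reference-allele copies g (0 to m, where m is the ploidy), genotypes follow a Binomial(m, p) prior under Hardy-Weinberg equilibrium, each read shows the reference allele with probability (g/m)(1 - e) + (1 - g/m)e for a fixed, known sequencing error rate e, and the allele frequency p has a Beta(a, b) prior. The function names, defaults, and example read counts are hypothetical.

```python
import math
import random


def ref_read_prob(g, m, err):
    """Chance that one read shows the reference allele when g of m copies are reference."""
    f = g / m
    return f * (1.0 - err) + (1.0 - f) * err


def gibbs_snp(ref, tot, m, err=0.01, n_iter=2000, burn=500, a=1.0, b=1.0, seed=1):
    """Hypothetical Gibbs sampler for one biallelic SNP, ploidy m, under HWE.

    ref[i], tot[i]: reference-allele and total read counts for individual i.
    err is assumed known and strictly between 0 and 0.5.
    Returns the posterior mean of the reference allele frequency p and, for each
    individual, the most frequently sampled dosage (number of reference copies).
    """
    rng = random.Random(seed)
    n = len(ref)
    p = 0.5  # starting value for the allele frequency
    p_sum, kept = 0.0, 0
    counts = [[0] * (m + 1) for _ in range(n)]
    for it in range(n_iter):
        genos = []
        for i in range(n):
            # Full conditional of g_i: Binomial(m, p) HWE prior x binomial read likelihood
            logw = []
            for g in range(m + 1):
                q = ref_read_prob(g, m, err)
                logw.append(
                    math.log(math.comb(m, g))
                    + g * math.log(p) + (m - g) * math.log(1.0 - p)
                    + ref[i] * math.log(q) + (tot[i] - ref[i]) * math.log(1.0 - q)
                )
            top = max(logw)
            weights = [math.exp(x - top) for x in logw]  # unnormalized, overflow-safe
            genos.append(rng.choices(range(m + 1), weights=weights)[0])
        # Conjugate update: p | genotypes ~ Beta(a + sum g, b + sum(m - g))
        s = sum(genos)
        p = rng.betavariate(a + s, b + n * m - s)
        if it >= burn:  # keep post-burn-in draws only
            p_sum += p
            kept += 1
            for i, g in enumerate(genos):
                counts[i][g] += 1
    modes = [c.index(max(c)) for c in counts]
    return p_sum / kept, modes


if __name__ == "__main__":
    # Five hypothetical tetraploids with 20 reads each
    p_hat, genos = gibbs_snp([0, 20, 10, 5, 15], [20] * 5, m=4)
    print(round(p_hat, 3), genos)
```

For these read counts, the most frequently sampled dosages should track the observed read ratios, and the posterior mean of p should sit near 0.5 because the data are symmetric between the two alleles. Alternating between sampling genotypes and sampling the allele frequency is what lets the sampler integrate over genotype uncertainty instead of fixing genotypes in advance.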
The set of models in my second chapter builds on Chapter 1 in two important ways. First, I extend the Hardy-Weinberg equilibrium model to include inbreeding. Second, I directly address the hybrid nature of allopolyploid organisms by modeling the genomes of the two parental species separately. Using both simulations and empirical data sets from the literature (autopolyploid: Andropogon gerardii; allopolyploid: Betula pubescens, with its diploid parent B. pendula), I benchmark these methods against existing software (the Genome Analysis Toolkit) to demonstrate their effectiveness for estimating genotypes. These new models also use a different algorithm for inferring population parameters, the expectation-maximization (EM) algorithm, which I have implemented in the open source software package ebg. Chapter 3 uses ideas similar to those presented in the first two chapters but focuses on inferring full haplotype sequences, rather than single SNP genotypes, for samples of arbitrary ploidy. The method processes paired-end HTS data collected using double-barcoded amplicon sequencing and uses the program PURC to cluster sequencing reads into haplotypes. It then uses a multinomial likelihood to infer haplotypes while also accounting for sequencing error. The pipeline is implemented in the software Fluidigm2PURC, and I demonstrate its use on a polyploid series from the genus Thalictrum (Ranunculaceae). My final chapter uses nuclear amplicon sequencing to infer evolutionary relationships between two closely related groups in Penstemon: subsections Humiles and Proceri (Plantaginaceae). These two groups are known to hybridize and include documented WGD events that formed putative allotetraploids and allohexaploids. To estimate phylogeny in these two groups, I first use the methods described in Chapter 3 to determine haplotypes from paired-end HTS data for all diploid, tetraploid, and hexaploid individuals.
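The EM approach mentioned for Chapter 2 can be sketched for the simpler Hardy-Weinberg case. This is an illustration under assumptions, not the ebg code: it omits inbreeding and the separate modeling of parental subgenomes, and it assumes each read carries the reference allele with probability (g/m)(1 - e) + (1 - g/m)e for dosage g, ploidy m, and a fixed error rate e. The E-step computes each individual's posterior distribution over dosages given the current p; the M-step sets p to the expected proportion of reference alleles. Function names and defaults are hypothetical.

```python
import math


def dosage_logliks(r, t, m, err):
    """log P(r reference reads out of t | dosage g) for g = 0..m under a binomial read model."""
    out = []
    for g in range(m + 1):
        q = (g / m) * (1.0 - err) + (1.0 - g / m) * err
        out.append(math.log(math.comb(t, r)) + r * math.log(q) + (t - r) * math.log(1.0 - q))
    return out


def em_allele_freq(ref, tot, m, err=0.01, p=0.3, tol=1e-10, max_iter=500):
    """Hypothetical EM estimate of the reference allele frequency for one SNP under HWE."""
    lls = [dosage_logliks(r, t, m, err) for r, t in zip(ref, tot)]
    n = len(ref)
    posts = []
    for _ in range(max_iter):
        # E-step: posterior over dosages for each individual, Binomial(m, p) HWE prior
        posts = []
        for ll in lls:
            logw = [math.log(math.comb(m, g)) + g * math.log(p)
                    + (m - g) * math.log(1.0 - p) + ll[g] for g in range(m + 1)]
            top = max(logw)
            w = [math.exp(x - top) for x in logw]
            z = sum(w)
            posts.append([x / z for x in w])
        # M-step: expected reference copies divided by total allele copies
        expected_ref = sum(g * pg for post in posts for g, pg in enumerate(post))
        new_p = min(max(expected_ref / (n * m), 1e-9), 1.0 - 1e-9)  # keep logs finite
        converged = abs(new_p - p) < tol
        p = new_p
        if converged:
            break
    # Most probable dosage per individual under the final E-step posteriors
    genos = [post.index(max(post)) for post in posts]
    return p, genos


if __name__ == "__main__":
    p_hat, genos = em_allele_freq([0, 20, 10, 5, 15], [20] * 5, m=4)
    print(round(p_hat, 4), genos)
```

EM returns a point estimate (the maximum likelihood estimate of p) rather than posterior samples, which keeps it fast when many SNPs must be processed; the clamp on p is a numerical safeguard added in this sketch.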
I also develop a method for assessing the proportion of gene trees that support each species-level quartet (quartet concordance factors; QCFs), and I use these QCFs as input for estimating a species network with the program SNaQ. Phylogenies inferred using both species-tree and network approaches recover subsections Humiles and Proceri as non-monophyletic. There is also strong evidence for hybridization within and between these two groups.
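Quartet concordance factors can be illustrated with a small sketch that makes simplifying assumptions: each gene tree has a single tip per species and is given as its set of taxa plus a list of splits (the taxa on one side of each internal edge). The dissertation's method works from haplotypes of polyploid individuals, which can contribute several tips per species, and that mapping is not shown here. For a species quartet {a, b, c, d}, a gene tree supports ab|cd if some split contains exactly a and b among the four; gene trees missing a taxon, or unresolved for that quartet, are skipped. All function names are hypothetical.

```python
from itertools import combinations


def quartet_split(taxa, splits, quartet):
    """Return the quartet topology {pair, other pair} displayed by one gene tree, or None."""
    q = frozenset(quartet)
    if not q.issubset(taxa):
        return None  # a taxon is missing from this gene tree
    for side in splits:
        pair = q.intersection(side)
        if len(pair) == 2:
            return frozenset({pair, q - pair})
    return None  # unresolved for this quartet (e.g., a polytomy)


def quartet_cfs(gene_trees, quartet):
    """Proportions of informative gene trees supporting ab|cd, ac|bd, ad|bc."""
    a, b, c, d = quartet
    topologies = [
        frozenset({frozenset({a, b}), frozenset({c, d})}),
        frozenset({frozenset({a, c}), frozenset({b, d})}),
        frozenset({frozenset({a, d}), frozenset({b, c})}),
    ]
    counts = [0, 0, 0]
    for taxa, splits in gene_trees:
        topo = quartet_split(taxa, splits, quartet)
        if topo is not None:
            counts[topologies.index(topo)] += 1
    total = sum(counts)
    return [k / total for k in counts] if total else [0.0, 0.0, 0.0]


def all_quartet_cfs(gene_trees, species):
    """Concordance factors for every 4-species subset, keyed by the sorted quartet."""
    return {qt: quartet_cfs(gene_trees, qt) for qt in combinations(sorted(species), 4)}


if __name__ == "__main__":
    # Hypothetical gene trees as (taxa, splits); the last tree lacks species D
    trees = [
        (frozenset("ABCDE"), [frozenset("AB"), frozenset("CD")]),
        (frozenset("ABCDE"), [frozenset("AB"), frozenset("ABE")]),
        (frozenset("ABCDE"), [frozenset("AC"), frozenset("BD")]),
        (frozenset("ABCE"), [frozenset("AB")]),
    ]
    for quartet, cf in all_quartet_cfs(trees, "ABCDE").items():
        print(quartet, [round(x, 3) for x in cf])
```

The resulting dictionary holds, for each quartet, the proportions supporting ab|cd, ac|bd, and ad|bc, which is the kind of table SNaQ takes as input; converting it to SNaQ's file format is not shown.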
Andrea Wolfe (Advisor)
221 p.

Recommended Citations

  • APA Style (7th edition): Blischak, P. D. (2018). Developing Computational Tools for Evolutionary Inferences in Polyploids [Doctoral dissertation, Ohio State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=osu1531400134548368

  • MLA Style (8th edition): Blischak, Paul. Developing Computational Tools for Evolutionary Inferences in Polyploids. 2018. Ohio State University, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=osu1531400134548368.

  • Chicago Manual of Style (17th edition): Blischak, Paul. "Developing Computational Tools for Evolutionary Inferences in Polyploids." Doctoral dissertation, Ohio State University, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=osu1531400134548368