An Investigation into the Evolution of Nucleotide Composition in the Human Genome

2019, Master of Science in Biomedical Sciences (MSBS), University of Toledo, Biomedical Sciences (Bioinformatics and Proteomics/Genomics).
Every human has about 100 novel mutations that are absent from the genomes of his or her parents. This intense influx of mutations degrades the information stored in DNA sequences and, at the same time, provides an opportunity for the creation of new genetic messages. Currently, over one hundred million mutations have been characterized in public databases. The dynamics of mutation have been investigated for decades through both experiments and sophisticated mathematical models, yet our understanding of genome evolution is still ambiguous. In this project, we computationally processed eighty million human mutations to get clear answers to basic questions about DNA evolution. Specifically, how is the non-randomness in nucleotide composition across vast genomic regions maintained? What biological forces preserve sequence non-randomness from being degraded by novel mutations? Our goal was to uncover peculiarities in the dynamics of G+C nucleotide content and to evaluate the equilibrium GC-percentage of the human genome. We found that novel mutations that convert G:C pairs into A:T pairs are 1.39 times more frequent than the opposite mutations, which change A:T → G:C. This effect is more striking if we take into account that the total number of G:C pairs (42%) is significantly smaller than the number of A:T pairs (58%). Hence, calculated per nucleotide pair, G:C → A:T mutations are 1.93 times more frequent than A:T → G:C mutations. Such a bias should create fewer and fewer G:C pairs in genomes from generation to generation, until the GC-composition reaches equilibrium at 34%. However, the GC-percentage of the human genome is stable at 42%. Two biological processes may be responsible for preserving GC-composition from degradation: i) natural selection or ii) biased gene conversion. However, the estimated parameters for both processes are unable to explain the maintenance of the GC-percentage.
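The arithmetic behind the per-pair bias and the 34% equilibrium can be sketched as follows. This is a minimal illustration, not the thesis's actual computation: it assumes a simple two-state model in which only G:C → A:T and A:T → G:C mutations change GC content and no selection or gene conversion acts. The function names are hypothetical. With the rounded inputs quoted in the abstract (1.39, 42%, 58%) the per-pair ratio comes out near 1.92, slightly below the quoted 1.93, presumably because the abstract's figures are themselves rounded.

```python
def per_pair_bias(raw_ratio, gc_fraction):
    """Rescale a raw count ratio (GC->AT over AT->GC) to a per-pair rate ratio.

    Dividing each count by the number of pairs it could have arisen from
    multiplies the raw ratio by (AT fraction / GC fraction).
    """
    at_fraction = 1.0 - gc_fraction
    return raw_ratio * at_fraction / gc_fraction


def equilibrium_gc(rate_ratio):
    """GC fraction f at which the two mutation fluxes balance.

    With per-pair rates u (GC->AT) and v (AT->GC), balance requires
    f * u == (1 - f) * v, so f = 1 / (1 + u / v).
    """
    return 1.0 / (1.0 + rate_ratio)


# Figures quoted in the abstract (assumed here to be rounded values).
bias = per_pair_bias(1.39, 0.42)
print(f"per-pair bias: {bias:.2f}")
print(f"equilibrium GC fraction: {equilibrium_gc(1.93):.2f}")
```

Under these assumptions the predicted equilibrium (about 34%) sits well below the observed 42%, which is the gap the abstract attributes to some counteracting biological force.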
We re-evaluated the biased gene conversion parameters and rates that might explain GC-composition. The vast majority of our genome is represented by intergenic regions and introns. The effects of mutations inside these two noncoding regions are practically impossible to evaluate. We generally cannot classify these mutations as increasing or decreasing fitness, or measure their effects. In contrast, the effects of some mutations in protein-coding regions, which occupy only 1.2% of the human genome, may be quantifiable. Learning to measure the ratio of synonymous to non-synonymous mutations in coding regions was profoundly important and revealed key rules in population genetics. About 5% of the human genome consists of regions with extreme nucleotide compositions. These include chromosomal segments with A+T-rich, G+C-rich, purine-rich (pyrimidine-rich), G+T-rich (A+C-rich) and alternating purine/pyrimidine sequences (that may form Z-DNA structures). We called such sequence patterns, which exhibit profound biases in nucleotide composition, Genomic MRI (Mid-Range Inhomogeneity). Genomic-MRI regions may form special DNA structures (e.g. H-DNA, Z-DNA) and are non-randomly distributed along the genome. At least some of them have known biological roles. The best understood are the G+C-rich sequences that form CpG islands in the promoters of many genes and are targets of DNA methylation. Genomic-MRI regions allow us to quantify the effect of mutations inside them, because mutations may decrease or increase the nucleotide bias in these regions. For example, A→C, A→G, T→C, and T→G mutations increase GC-composition in G+C-rich sequences; G→A, G→T, C→A, and C→T mutations decrease it; and A→T, T→A, G→C, and C→G mutations are neutral with respect to G+C-richness.
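The classification of the twelve point substitutions by their effect on GC-composition can be written out as a short sketch. The function names and the +1 / -1 / 0 encoding are assumptions made for illustration; the abstract does not describe how the thesis implemented this step.

```python
from collections import Counter

BASES = {"A", "C", "G", "T"}
GC_BASES = {"G", "C"}


def gc_effect(ref, alt):
    """Return +1 if ref->alt raises GC content, -1 if it lowers it, 0 if neutral."""
    if ref not in BASES or alt not in BASES or ref == alt:
        raise ValueError(f"not a point substitution: {ref!r} -> {alt!r}")
    # True/False behave as 1/0, so this is (alt is G or C) minus (ref is G or C).
    return (alt in GC_BASES) - (ref in GC_BASES)


def tally_gc_effects(substitutions):
    """Count GC-increasing (+1), GC-decreasing (-1) and neutral (0) substitutions."""
    return Counter(gc_effect(ref, alt) for ref, alt in substitutions)


# Hypothetical input: one substitution of each kind.
counts = tally_gc_effects([("A", "G"), ("C", "T"), ("G", "C")])
print(counts[1], counts[-1], counts[0])
```

The same pattern would extend to the other MRI types (for example purine-rich or G+T-rich regions) by swapping the set of bases that defines the bias.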
In this project, we examined how mutations change genomic-MRI regions, and explored the biological forces that maintain these genomic-MRI structures during evolution despite the constant mutational pressure toward equilibrium and randomness. We found that point mutations in MRI regions preferentially degrade the nucleotide inhomogeneity, decreasing the biases in their nucleotide composition. The level of mutational degradation by novel SNPs was highest for G+C-rich MRIs and lowest for A+T-rich MRIs. Older SNPs (those widespread across populations) showed a lower level of degradation than novel SNPs. Furthermore, we found that re-evaluation of biased gene conversion parameters could explain how the GC content is preserved despite the bias in mutations.
Alexei Fedorov (Committee Chair)
Robert Blumenthal (Committee Member)
Sadik Khuder (Committee Member)
84 p.

Recommended Citations

  • Paudel, R. (2019). An Investigation into the Evolution of Nucleotide Composition in the Human Genome [Master's thesis, University of Toledo]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=mco1564404055416097

    APA Style (7th edition)

  • Paudel, Rajan. An Investigation into the Evolution of Nucleotide Composition in the Human Genome. 2019. University of Toledo, Master's thesis. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=mco1564404055416097.

    MLA Style (8th edition)

  • Paudel, Rajan. "An Investigation into the Evolution of Nucleotide Composition in the Human Genome." Master's thesis, University of Toledo, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=mco1564404055416097

    Chicago Manual of Style (17th edition)