DEVELOPMENT OF MACHINE LEARNING BASED BIOINFORMATICS TOOLS FOR CRISPR DETECTION, PIRNA IDENTIFICATION, AND WHOLE-GENOME BISULFITE SEQUENCING DATA ANALYSIS

2019, Doctor of Philosophy, Miami University, Cell, Molecular and Structural Biology (CMSB).
This dissertation addresses four main questions in molecular biology through new bioinformatics software tools and databases, together with an analysis of Whole-Genome Bisulfite Sequencing data. Chapters I and VI provide the introduction and the conclusions of the dissertation, respectively.
Chapter II, titled "CRF: detection of CRISPR arrays using random forest", addresses the problem of detecting CRISPR arrays in archaeal and bacterial genomes. Here, we developed a new bioinformatics pipeline called "CRF" that detects CRISPR arrays in archaeal and bacterial genomes with a random forest classifier and offers easy-to-use interactive data visualizations to display the results. Compared to the popular CRISPR array detection tools CRT, PILER-CR, and CRISPRDetect, CRF shows more comprehensive results with highly interactive data visualization. The paper is published in PeerJ: Wang, Kai, and Chun Liang. "CRF: detection of CRISPR arrays using random forest." PeerJ 5 (2017): e3219.
Chapter III, titled "SCRISPRdb: a CRISPRs database with sequence secondary structure and the relationship between the spacers and plasmids/virus", addresses the need to detect and display CRISPR arrays in various microbial genomes using web-based bioinformatics tools and databases. We developed a new database called SCRISPRdb that covers more than ten thousand microbial genomes. Our database and web portals provide visualizations of DNA/RNA secondary structure for CRISPR repeats and spacers, repeat sequence logos, and CRISPR architectures. The database also includes a BLAST function that can be used to investigate the origin of CRISPR spacers. Based on the sequence alignment results, the database provides a force-directed network graph.
Chapter IV, titled "piRNN: Deep learning algorithm for piRNA Prediction", addresses the problem of accurately predicting piwi-interacting RNAs (piRNAs) from small RNA data. Here, we developed piRNN, a tool for piRNA prediction based on a convolutional neural network classifier built with the deep learning package TensorFlow. piRNN provides four models trained for four species (Caenorhabditis elegans, Drosophila melanogaster, rat, and human). Users can apply these four pre-trained models directly to small RNA data collected for these species. In addition, our program provides scripts that can be used to train or re-train models on new data, either for new species or for the aforementioned four species. piRNN achieved very high (over 90%) accuracy, sensitivity, precision, specificity, and Matthews correlation coefficient. This paper is published in PeerJ: Wang K, Hoeksema J, Liang C. "piRNN: deep learning algorithm for piRNA prediction." PeerJ 6 (2018): e5429.
Chapter V, titled "Genome wide analysis of methylomes in chicken retina regeneration", investigates DNA methylation levels in chick retinal pigment epithelium (RPE) at stage E4 (developing RPE), at E4 6 hours following retinectomy (injured RPE), and at E4 6 hours following retinectomy in the presence of FGF2 (reprogrammed RPE). Using several Bioconductor packages in R and other tools, we identified more than a thousand differentially methylated regions (DMRs) through comparisons among these three treatments. These DMRs are enriched in a large number of pathways, including retina layer formation and embryonic eye morphogenesis. Our analyses detect the major DNA methylation differences among developing RPE, injured RPE, and reprogrammed RPE.
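The evaluation measures named for piRNN in Chapter IV (accuracy, sensitivity, precision, specificity, and Matthews correlation coefficient) can all be derived from the four counts of a binary confusion matrix. The following is a minimal sketch of those standard formulas, not code from piRNN itself; the function name `binary_metrics` and its argument names are assumptions made for illustration.

```python
import math


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard binary-classification metrics from confusion-matrix counts.

    tp/fp/tn/fn are true positives, false positives, true negatives and
    false negatives. Hypothetical helper, not part of piRNN.
    """

    def safe_div(num: float, den: float) -> float:
        # Return 0.0 when a denominator is zero (e.g. no positive predictions).
        return num / den if den else 0.0

    total = tp + fp + tn + fn
    # The MCC denominator is the square root of the product of the four marginals.
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))

    return {
        "accuracy": safe_div(tp + tn, total),
        "sensitivity": safe_div(tp, tp + fn),   # also called recall
        "precision": safe_div(tp, tp + fp),
        "specificity": safe_div(tn, tn + fp),
        "mcc": safe_div(tp * tn - fp * fn, mcc_den),
    }
```

As an illustration with made-up counts (not results from the dissertation), `binary_metrics(tp=90, fp=10, tn=90, fn=10)` gives 0.9 for accuracy, sensitivity, precision, and specificity, and an MCC of 0.8. MCC is expressed on a scale from -1 to 1, so a value reported as "over 90%" corresponds to an MCC above 0.9.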
Chun Liang (Advisor)
109 p.

Recommended Citations

  • Wang, K. (2019). DEVELOPMENT OF MACHINE LEARNING BASED BIOINFORMATICS TOOLS FOR CRISPR DETECTION, PIRNA IDENTIFICATION, AND WHOLE-GENOME BISULFITE SEQUENCING DATA ANALYSIS [Doctoral dissertation, Miami University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=miami1546437447863901

    APA Style (7th edition)

  • Wang, Kai. DEVELOPMENT OF MACHINE LEARNING BASED BIOINFORMATICS TOOLS FOR CRISPR DETECTION, PIRNA IDENTIFICATION, AND WHOLE-GENOME BISULFITE SEQUENCING DATA ANALYSIS. 2019. Miami University, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=miami1546437447863901.

    MLA Style (8th edition)

  • Wang, Kai. "DEVELOPMENT OF MACHINE LEARNING BASED BIOINFORMATICS TOOLS FOR CRISPR DETECTION, PIRNA IDENTIFICATION, AND WHOLE-GENOME BISULFITE SEQUENCING DATA ANALYSIS." Doctoral dissertation, Miami University, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=miami1546437447863901

    Chicago Manual of Style (17th edition)