Files
KaiWang.dissertation.pdf (3.93 MB)
DEVELOPMENT OF MACHINE LEARNING BASED BIOINFORMATICS TOOLS FOR CRISPR DETECTION, PIRNA IDENTIFICATION, AND WHOLE-GENOME BISULFITE SEQUENCING DATA ANALYSIS
Author Info
Wang, Kai
ORCID® Identifier
http://orcid.org/0000-0002-8541-7563
Permalink:
http://rave.ohiolink.edu/etdc/view?acc_num=miami1546437447863901
Abstract Details
Year and Degree
2019, Doctor of Philosophy, Miami University, Cell, Molecular and Structural Biology (CMSB).
Abstract
This dissertation addresses four main questions in molecular biology through new bioinformatics software tools and databases and through the analysis of whole-genome bisulfite sequencing data. Chapters I and VI provide the introduction and the conclusions of the dissertation, respectively. Chapter II, titled "CRF: detection of CRISPR arrays using random forest", addresses the problem of detecting CRISPR arrays in archaeal and bacterial genomes. Here, we developed a new bioinformatics pipeline called "CRF" that detects CRISPR arrays in archaeal and bacterial genomes with a random forest classifier and offers easy-to-use interactive data visualizations to display the results. Compared with the popular CRISPR array detection tools CRT, PILER-CR, and CRISPRDetect, CRF shows more comprehensive results with highly interactive data visualization. The paper is published in PeerJ: Wang, Kai, and Chun Liang. "CRF: detection of CRISPR arrays using random forest." PeerJ 5 (2017): e3219. Chapter III, titled "SCRISPRdb: a CRISPRs database with sequence secondary structure and the relationship between the spacers and plasmids/virus", addresses the need to detect and display CRISPR arrays in various microbial genomes using web-based bioinformatics tools and databases. We developed a new database called SCRISPRdb that covers over ten thousand microbial genomes. Our database and web portals provide visualizations of DNA/RNA secondary structure for CRISPR repeats and spacers, repeat sequence logos, and CRISPR architectures. The database also implements a BLAST function that can be used to investigate the origin of CRISPR spacers. Based on the sequence alignment results, the database provides a force-directed network graph. Chapter IV, titled "piRNN: Deep learning algorithm for piRNA Prediction", addresses the problem of accurately predicting piwi-interacting RNAs (piRNAs) from small RNA data.
Here, we developed piRNN, a tool for piRNA prediction based on a convolutional neural network classifier built with the deep learning package TensorFlow. We trained four piRNN models for four species (Caenorhabditis elegans, Drosophila melanogaster, rat, and human). Users can apply these four pre-trained models directly to small RNA data collected from these species. In addition, our program provides scripts for training or re-training models on new data from new species or from the four aforementioned species. piRNN achieved very high (over 90%) accuracy, sensitivity, precision, specificity, and Matthews correlation coefficient. This paper is published in PeerJ: Wang K, Hoeksema J, Liang C. "piRNN: deep learning algorithm for piRNA prediction." PeerJ 6 (2018): e5429. Chapter V, titled "Genome wide analysis of methylomes in chicken retina regeneration", investigates DNA methylation levels in chick retinal pigment epithelium (RPE) at stage E4 (developing RPE), at E4 6 hours following retinectomy (injured RPE), and at E4 6 hours following retinectomy in the presence of FGF2 (reprogrammed RPE). Using several Bioconductor packages in R and other tools, we identified over a thousand differentially methylated regions (DMRs) through comparisons among these three treatments. These DMRs are enriched in a large number of pathways, including retina layer formation and embryonic eye morphogenesis. Our analyses detect the major DNA methylation differences among developing RPE, injured RPE, and reprogrammed RPE.
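For readers unfamiliar with the evaluation measures named in the abstract (accuracy, sensitivity, precision, specificity, and Matthews correlation coefficient), the sketch below shows how they are conventionally computed from the four counts of a binary confusion matrix. It is a generic illustration and is not taken from the piRNN code; the function name `binary_metrics` and the example counts are hypothetical.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute standard binary-classification metrics from confusion-matrix counts.

    tp/fp/tn/fn are true positives, false positives, true negatives and
    false negatives. This helper is illustrative only (hypothetical name).
    """
    def ratio(num, den):
        # Return 0.0 instead of raising when a denominator is zero.
        return num / den if den else 0.0

    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": ratio(tp, tp + fn),   # also called recall
        "precision": ratio(tp, tp + fp),
        "specificity": ratio(tn, tn + fp),
        "mcc": ratio(tp * tn - fp * fn, mcc_den),
    }


if __name__ == "__main__":
    # Made-up counts, not results from the dissertation.
    for name, value in binary_metrics(tp=95, fp=5, tn=90, fn=10).items():
        print(f"{name}: {value:.3f}")
```

Unlike the other four measures, the Matthews correlation coefficient ranges from -1 to 1 and takes all four counts into account, which makes it informative when the positive and negative classes are unbalanced.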
Committee
Chun Liang (Advisor)
Pages
109 p.
Subject Headings
Bioinformatics
Citations
Wang, K. (2019). DEVELOPMENT OF MACHINE LEARNING BASED BIOINFORMATICS TOOLS FOR CRISPR DETECTION, PIRNA IDENTIFICATION, AND WHOLE-GENOME BISULFITE SEQUENCING DATA ANALYSIS [Doctoral dissertation, Miami University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=miami1546437447863901
APA Style (7th edition)
Wang, Kai. DEVELOPMENT OF MACHINE LEARNING BASED BIOINFORMATICS TOOLS FOR CRISPR DETECTION, PIRNA IDENTIFICATION, AND WHOLE-GENOME BISULFITE SEQUENCING DATA ANALYSIS. 2019. Miami University, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=miami1546437447863901.
MLA Style (8th edition)
Wang, Kai. "DEVELOPMENT OF MACHINE LEARNING BASED BIOINFORMATICS TOOLS FOR CRISPR DETECTION, PIRNA IDENTIFICATION, AND WHOLE-GENOME BISULFITE SEQUENCING DATA ANALYSIS." Doctoral dissertation, Miami University, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=miami1546437447863901
Chicago Manual of Style (17th edition)
Document number:
miami1546437447863901
Download Count:
146
Copyright Info
© 2019, all rights reserved.
This open access ETD is published by Miami University and OhioLINK.