Files
Kuntala Prashant Kumar.pdf (1.2 MB)
Optimizing Biomarkers From an Ensemble Learning Pipeline
Author Info
Kuntala, Prashant Kumar
ORCID® Identifier
http://orcid.org/0000-0002-3372-0691
Permalink: http://rave.ohiolink.edu/etdc/view?acc_num=ohiou1503592057943043
Abstract Details
Year and Degree
2017, Master of Science (MS), Ohio University, Electrical Engineering & Computer Science (Engineering and Technology).
Abstract
Understanding gene expression patterns is crucial to deciphering observed biological phenotypes. Transcription factors (TFs) are proteins that regulate genes by binding to transcription factor binding sites (TFBSs) within the promoter region of a gene. Motif discovery is a computational approach that conventionally uses stochastic models, enumeration methods, and many other techniques to report candidate motifs (putative TFBSs). These methods generate similar motifs for a TF for various reasons. Motif selection algorithms successfully identify a small set of motifs that address the specificity and coverage problems in motif discovery. However, the selected motifs do not always capture all the binding site preferences of a TF. This study verifies the hypothesis that motif discovery tools generate similar motifs for a transcription factor and that, once these variants (similar motifs) are identified, they can be used to form a super motif set, which may improve the accuracy of motif discovery. This study introduces the concept of the super motif set, a new model for accurately predicting the binding sites of a TF. Two heuristic algorithms are introduced to identify super motif sets, utilizing motif selection algorithms and a motif comparison tool. The identified super motif sets capture the biological diversity in the TFBS preferences of a TF. The algorithms are evaluated on ChIP-seq data for 54 transcription factor groups from the ENCODE project. Moreover, the proposed algorithms are used to optimize the motifs reported by motif selection algorithms and to report super motif sets in three case studies: Chagas disease, pollen-specific HRGP genes in Arabidopsis thaliana, and shigellosis. On average, two motif variants are added to the selected motifs, which improves the accuracy of motif discovery.
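The abstract does not spell out the two heuristic algorithms, so the following is only an illustrative sketch of the general idea of extending a set of selected motifs with similar variants. Everything in it is an assumption made for illustration: the function names (`similarity`, `covered`, `super_motif_set`), the use of plain consensus strings with exact substring matching, the simple position-match similarity score, and the 0.75 threshold. The thesis itself relies on motif selection algorithms and a dedicated motif comparison tool, which this toy code does not reproduce.

```python
from typing import Iterable, List, Set


def similarity(a: str, b: str) -> float:
    """Toy similarity: slide the shorter motif along the longer one (no gaps)
    and return the best count of matching positions divided by the longer length."""
    shorter, longer = (a, b) if len(a) <= len(b) else (b, a)
    best = 0.0
    for offset in range(len(longer) - len(shorter) + 1):
        matches = sum(x == y for x, y in zip(shorter, longer[offset:]))
        best = max(best, matches / len(longer))
    return best


def covered(motifs: Iterable[str], sequences: List[str]) -> Set[int]:
    """Indices of sequences that contain at least one motif as an exact substring."""
    motif_list = list(motifs)
    return {i for i, seq in enumerate(sequences) if any(m in seq for m in motif_list)}


def super_motif_set(selected: List[str], candidates: List[str],
                    foreground: List[str], threshold: float = 0.75) -> List[str]:
    """Hypothetical greedy heuristic: add a candidate when it is similar to a
    selected motif and covers at least one foreground sequence not yet covered."""
    result = list(selected)
    hit = covered(result, foreground)
    for cand in candidates:
        if cand in result:
            continue
        if max((similarity(cand, m) for m in selected), default=0.0) < threshold:
            continue  # not a variant of any selected motif
        gained = covered([cand], foreground) - hit
        if gained:
            result.append(cand)
            hit |= gained
    return result


# Made-up example data, not taken from the thesis.
selected = ["TGACTCA"]
candidates = ["TGAGTCA", "TGACTCA", "GGGCGGG", "TGACGCA"]
foreground = ["AATGACTCATT", "CCTGAGTCAGG", "TTGGGCGGGAA", "ACGTACGT"]
print(super_motif_set(selected, candidates, foreground))
```

In this toy run, only the variant that both resembles the selected motif and covers a new sequence is added. A realistic pipeline would compare position weight matrices and would also weigh matches in background sequences, which this sketch ignores.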
Committee
Frank Drews (Advisor)
Lonnie Welch (Committee Chair)
Jundong Liu (Committee Member)
Erin Murphy (Committee Member)
Pages
89 p.
Subject Headings
Bioinformatics; Biology; Biomedical Research; Computer Engineering; Computer Science; Genetics; Molecular Biology
Keywords
Motif Discovery; Motif Selection; Super Motif Set; Transcription Factor; Heuristic algorithm; DNA Motifs; Ensemble Learning; Genomics; ENCODE; Chagas disease; Shigellosis; Bioinformatics; Computational Biology
Recommended Citations
APA Style (7th edition):
Kuntala, P. K. (2017). Optimizing Biomarkers From an Ensemble Learning Pipeline [Master's thesis, Ohio University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=ohiou1503592057943043
MLA Style (8th edition):
Kuntala, Prashant Kumar. Optimizing Biomarkers From an Ensemble Learning Pipeline. 2017. Ohio University, Master's thesis. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=ohiou1503592057943043.
Chicago Manual of Style (17th edition):
Kuntala, Prashant Kumar. "Optimizing Biomarkers From an Ensemble Learning Pipeline." Master's thesis, Ohio University, 2017. http://rave.ohiolink.edu/etdc/view?acc_num=ohiou1503592057943043
Document number: ohiou1503592057943043
Download Count: 263
Copyright Info
© 2017, all rights reserved.
This open access ETD is published by Ohio University and OhioLINK.