Files
File List
Khavari Sepideh PDF A APPROVED.pdf (1.24 MB)
Predicting Human and Animal Protein Subcellular Location
Author Info
Khavari, Sepideh
Permalink:
http://rave.ohiolink.edu/etdc/view?acc_num=ysu1472463855
Abstract Details
Year and Degree
2016, Master of Science in Mathematics, Youngstown State University, Department of Mathematics and Statistics.
Abstract
An important objective in cell biology is to determine the subcellular locations of proteins and their functions in the cell. The subcellular location of a protein can be identified either through biochemical experiments or with computational predictors. Because biochemical experiments are both time-consuming and expensive, computational predictors offer a more efficient way to solve the problem. They are also well suited to the task because the number of newly discovered proteins has been increasing tremendously as a result of genome sequencing projects. The main objective of this study is to use several different classifiers to predict the subcellular location of animal and human proteins and to determine which classifier performs best. The data were obtained from the Universal Protein Resource (UniProt), a database of protein sequences and annotations. From the UniProt Knowledgebase (UniProtKB), the manually reviewed and annotated (Swiss-Prot) human and animal proteins were chosen for this study. A reliable benchmark dataset was obtained by applying criteria established in earlier studies on predicting protein subcellular location. After these criteria were applied to the original dataset, the working benchmark dataset contains 2944 protein sequences. The subcellular locations of these proteins are the nucleus (1001 proteins), the cytoplasm (540 proteins), secreted (436 proteins), the mitochondria (328 proteins), the cell membrane (286 proteins), the endoplasmic reticulum (207 proteins), the Golgi apparatus (86 proteins), the peroxisome (30 proteins), and the lysosome (30 proteins).
Therefore, there are 9 different subcellular locations for proteins in this dataset. Proteins are represented using the pseudo-amino acid composition (PseAA composition), adapted from earlier studies. The predictors used to predict the subcellular location of animal and human proteins are Random Forest, Adaptive Boosting (AdaBoost), Stage-wise Additive Modeling using a Multi-class Exponential loss function (SAMME), Support Vector Machines (SVMs), and Artificial Neural Networks (ANNs). The results establish that the SVM classifier yielded the best overall accuracy for predicting the subcellular location of proteins. Most of the classifiers produced better prediction results for the nucleus, secreted, and cell membrane locations. The secreted and cell membrane locations had high specificity values with all of the classifiers used in this study. The nucleus had the best prediction results, including a high sensitivity and a high MCC value, when using the Bagging method.
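The abstract reports per-location sensitivity, specificity, and MCC (Matthews correlation coefficient) values. As a minimal sketch of how such figures are commonly computed for a multi-class problem, the Python below treats one location as the positive class and all other locations as negative. The function name `one_vs_rest_metrics` and the toy labels are assumptions made for illustration only; they are not taken from the thesis, which may define or compute these metrics differently.

```python
from math import sqrt


def one_vs_rest_metrics(y_true, y_pred, location):
    """Return (sensitivity, specificity, MCC) for one location vs. all others."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == location and pred == location:
            tp += 1          # correctly assigned to this location
        elif truth != location and pred == location:
            fp += 1          # wrongly assigned to this location
        elif truth == location:
            fn += 1          # belongs here but was assigned elsewhere
        else:
            tn += 1          # correctly kept out of this location

    # Guard against empty denominators (e.g. a location with no examples).
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, specificity, mcc


# Toy labels invented for illustration; NOT data or results from the thesis.
y_true = ["nucleus", "nucleus", "cytoplasm", "secreted", "cytoplasm", "nucleus"]
y_pred = ["nucleus", "cytoplasm", "cytoplasm", "secreted", "nucleus", "nucleus"]

for loc in ("nucleus", "cytoplasm", "secreted"):
    sens, spec, mcc = one_vs_rest_metrics(y_true, y_pred, loc)
    print(f"{loc}: sensitivity={sens:.2f} specificity={spec:.2f} MCC={mcc:.2f}")
```

With an imbalanced dataset such as the one described (1001 nuclear proteins against 30 peroxisomal ones), specificity for a rare location can be high even when few of its proteins are found, which is one reason MCC is often reported alongside it.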
Committee
Andy Chang, PhD (Advisor)
Jay Kerns, PhD (Committee Member)
Jack Min, PhD (Committee Member)
Pages
68 p.
Subject Headings
Biology; Statistics
Keywords
Protein; Subcellular location; Computational predictors
Recommended Citations
Khavari, S. (2016). Predicting Human and Animal Protein Subcellular Location [Master's thesis, Youngstown State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=ysu1472463855
APA Style (7th edition)
Khavari, Sepideh. Predicting Human and Animal Protein Subcellular Location. 2016. Youngstown State University, Master's thesis. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=ysu1472463855.
MLA Style (8th edition)
Khavari, Sepideh. "Predicting Human and Animal Protein Subcellular Location." Master's thesis, Youngstown State University, 2016. http://rave.ohiolink.edu/etdc/view?acc_num=ysu1472463855
Chicago Manual of Style (17th edition)
Document number:
ysu1472463855
Download Count:
575
Copyright Info
© 2016, all rights reserved.
This open access ETD is published by Youngstown State University and OhioLINK.