Amino Acid Properties Provide Insight to a Protein’s Subcellular Location

2016, Master of Computing and Information Systems, Youngstown State University, Department of Computer Science and Information Systems.
Current approaches to predicting the subcellular locations of proteins have made some advances but are far from perfect. Accurately predicting these locations results in better annotation of a protein and provides a clearer picture of its functions. We approach this problem by using a chaos game representation of the sequence based on physical and chemical properties of amino acids. We then split the resulting graph into two related discrete series, which are then subjected to wavelet transformation. The wavelet transformation data is then used as input for our classification algorithms. We measure how accurately each property predicts the correct subcellular location. We aim to exceed 45 percent accuracy, which is the average of existing general subcellular predictors.

For our study, protein sequences were obtained from UniProt’s freely accessible repositories. We parsed data from five different classes, consisting of plant, fungal, mammal, human, and rodent proteins. We accommodate 10 subcellular locations: Nucleus, Membrane, Cytoplasm, Endoplasmic Reticulum, Secreted, Mitochondria, Cell Membrane, Vacuole, Golgi Apparatus, and Chloroplast.

Protein sequences are composed of 20 amino acids, which are sorted into four groups based on the selected property of amino acids. These groups allow the sequence to be plotted using a 2-dimensional chaos game representation. The resulting graph retains the sequence order in numerical form. No information can be deduced from the graph by visual inspection alone. To address this, we split the graph into two related discrete series based on the x-axis and y-axis. We then use a 3-level Haar wavelet transformation. Each level provides us with a detail coefficient vector the length of our sequence. For each detail coefficient vector we calculate the mean, min, max, and standard deviation. This provides us with 24 features (2 series × 3 levels × 4 statistics) to be used as input for classification. We run a variety of classifiers to assess the importance of amino acid properties.
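The feature extraction described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation. The abstract does not give the four-group split, the corner assignment, the starting point, or the boundary handling, so the ones used here are hypothetical. Because the abstract says each detail vector has the length of the sequence, the sketch assumes an undecimated Haar transform. The names `cgr_series`, `haar_details`, and `extract_features` are illustrative.

```python
import numpy as np

# ASSUMPTION: an illustrative polarity/charge grouping; the abstract does not
# list the groups used for any property.
GROUPS = {
    "AILMFPWV": (0.0, 0.0),  # non-polar
    "DE": (0.0, 1.0),        # negatively charged
    "NCQGSTY": (1.0, 1.0),   # uncharged polar
    "RHK": (1.0, 0.0),       # positively charged
}
CORNER = {aa: xy for letters, xy in GROUPS.items() for aa in letters}


def cgr_series(seq):
    """Chaos game walk: each point is the midpoint between the previous
    point and the corner of the current residue's group."""
    x, y = 0.5, 0.5  # ASSUMPTION: start at the centre of the unit square
    xs, ys = [], []
    for aa in seq.upper():
        if aa not in CORNER:  # ASSUMPTION: skip non-standard residues
            continue
        cx, cy = CORNER[aa]
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        xs.append(x)
        ys.append(y)
    return np.array(xs), np.array(ys)


def haar_details(signal, levels=3):
    """Undecimated Haar transform: one detail vector per level, each the
    same length as the input (edge values repeated at the left boundary)."""
    approx = np.asarray(signal, dtype=float)
    n = len(approx)
    details = []
    for level in range(levels):
        step = 2 ** level
        shifted = approx[np.maximum(np.arange(n) - step, 0)]
        details.append((approx - shifted) / 2.0)
        approx = (approx + shifted) / 2.0
    return details


def extract_features(seq):
    """2 series x 3 levels x 4 statistics = 24 features."""
    xs, ys = cgr_series(seq)
    if len(xs) == 0:
        raise ValueError("sequence contains no standard amino acids")
    feats = []
    for series in (xs, ys):
        for d in haar_details(series, levels=3):
            feats.extend([d.mean(), d.min(), d.max(), d.std()])
    return np.array(feats)
```

Calling `extract_features` on a sequence string returns a 24-element vector, one row of a feature matrix that could then be passed to whichever classifiers are being compared.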
Alina Lazar, PhD (Advisor)
Xiangjia Min, PhD (Committee Member)
Feng Yu, PhD (Committee Member)
29 p.

Recommended Citations

  • Powell, B. T. (2016). Amino Acid Properties Provide Insight to a Protein’s Subcellular Location [Master's thesis, Youngstown State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=ysu1484694077480789

    APA Style (7th edition)

  • Powell, Brian. Amino Acid Properties Provide Insight to a Protein’s Subcellular Location. 2016. Youngstown State University, Master's thesis. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=ysu1484694077480789.

    MLA Style (8th edition)

  • Powell, Brian. "Amino Acid Properties Provide Insight to a Protein’s Subcellular Location." Master's thesis, Youngstown State University, 2016. http://rave.ohiolink.edu/etdc/view?acc_num=ysu1484694077480789

    Chicago Manual of Style (17th edition)