The Implementation and Evaluation of Bioinformatics Algorithms for the Classification of Arabinogalactan-Proteins in Arabidopsis thaliana

Yerardi, Jason T.

2011, Master of Science (MS), Ohio University, Computer Science (Engineering and Technology).

As a result of the dynamic and progressive nature of modern biological research, new data-related problems are continuously being uncovered, resulting in a growing need for bioinformatics-based solutions. One current and active research area in bioinformatics is the classification of proteins into distinct protein families and varying levels of subfamilies. As part of this thesis research, a highly extensible, module-based software platform was developed to provide a centralized graphical user interface for configuring, executing, and analyzing the results of in silico biological analyses for the automation of the protein classification process. This comprehensive bioinformatics software, aptly named "BioOhio," and its related biological analysis algorithms are detailed in this thesis.

Since the BioOhio platform and underlying protein analysis algorithms are generally applicable to any plant species, the fully sequenced, widely used model organism Arabidopsis thaliana was chosen as the test data set for the developed bioinformatics software. A significant advantage of this choice was the extensive volume of existing work on this species, including well-defined protein classification criteria for the initial protein classification problems addressed by this thesis.

In particular, this thesis initially focused on the development of algorithms for the classification of the hydroxyproline-rich glycoprotein (HRGP) superfamily of plant cell wall proteins into its three basic protein families: (1) arabinogalactan-proteins (AGPs), (2) extensins (EXTs), and (3) proline-rich proteins (PRPs). These basic classification and related protein analysis algorithms provided a firm foundation for the primary focus of this thesis research, which was the development of techniques and algorithms for the further classification of the AGP protein family into distinct subfamilies.
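The abstract does not state the classification criteria themselves. As a purely illustrative sketch, one way a composition-based screen for candidate AGPs could look is shown below. It assumes that candidates are flagged by the fraction of proline, alanine, serine, and threonine (PAST) residues in the sequence. The 0.5 threshold, the function names, and the toy sequence are all assumptions made for this example and are not taken from the thesis or from BioOhio.

```python
# Illustrative sketch only: the criteria, threshold, and names here are
# assumptions, not the algorithms implemented in the thesis.

PAST = set("PAST")  # proline, alanine, serine, threonine (one-letter codes)


def past_fraction(sequence: str) -> float:
    """Return the fraction of residues in `sequence` that are P, A, S, or T."""
    seq = sequence.upper()
    if not seq:
        return 0.0
    return sum(1 for residue in seq if residue in PAST) / len(seq)


def is_candidate_agp(sequence: str, threshold: float = 0.5) -> bool:
    """Flag a sequence as a candidate AGP under the assumed PAST threshold."""
    return past_fraction(sequence) >= threshold


if __name__ == "__main__":
    toy_sequence = "APSPTAPSKK"  # made-up sequence, not a real protein
    print(past_fraction(toy_sequence))
    print(is_candidate_agp(toy_sequence))
```

A real classifier would need additional checks beyond overall composition, so this should be read only as a minimal starting point under the stated assumptions.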

At the time of this research, the classification criteria and accepted protein family members for each of the basic HRGP protein families of Arabidopsis thaliana were well established. However, research on the classification of AGPs into proper subfamilies was still in its early stages. As a result, there were only basic, generally accepted AGP protein subfamilies and associated characteristics. In particular, the existing work resulted in the general acceptance of the following three primary AGP subfamilies: (1) Classical AGPs, (2) AG-Peptides, and (3) Fasciclin-Like AGPs. However, there was clearly a need for much more thorough research in this particular area, thus making it an excellent prospect for novel biological and bioinformatics research in the HRGP research community.

In determining the formal criteria to be utilized in the evaluation of the AGP subfamily classification algorithms developed in this thesis research, it was generally agreed that the existing work of Carolyn Schultz et al. was the most complete, accurate, and well-known research on AGP subfamilies at the time. As described as part of the thesis results, a 100% accuracy rate was achieved in replicating all of this existing work's findings. In addition, the developed algorithms identified several new members of the Classical AGP and AG-Peptide subfamilies. These results and other primary contributions of this thesis are summarized below.

1. Co-developer of the highly customizable, extensible, and reusable module-based bioinformatics software platform BioOhio

2. The development of comprehensive classification algorithms and software to classify the AGP protein family in Arabidopsis thaliana into distinct subfamilies

3. The identification of an error in the AGP classification results of a seminal AGP publication, caused by an error in the source code of the software used to generate the publication's results

4. A 100% accuracy rate of all developed AGP subfamily classification algorithms in correctly identifying and classifying known AGPs, based on the reported findings of Schultz et al.

5. The identification and classification of nine new members of the AGP protein family in Arabidopsis thaliana (in addition to those reported by Schultz et al.)

* Four new members of the Classical AGP subfamily: At1g31250, At1g63540, At4g16980, and At4g40090

* Five new members of the AG-Peptide subfamily: At1g51915, At2g41905, At3g20865, At5g12880, and At5g24105

Frank Drews, PhD (Advisor)
Frank Drews, PhD (Committee Chair)
Lonnie R. Welch, PhD (Committee Co-Chair)
Jundong Liu, PhD (Committee Member)
Allan M. Showalter, PhD (Committee Member)
100 p.

Recommended Citations

  • Yerardi, J. T. (2011). The Implementation and Evaluation of Bioinformatics Algorithms for the Classification of Arabinogalactan-Proteins in Arabidopsis thaliana [Master's thesis, Ohio University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=ohiou1301069861

    APA Style (7th edition)

  • Yerardi, Jason. The Implementation and Evaluation of Bioinformatics Algorithms for the Classification of Arabinogalactan-Proteins in Arabidopsis thaliana. 2011. Ohio University, Master's thesis. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=ohiou1301069861.

    MLA Style (8th edition)

  • Yerardi, Jason. "The Implementation and Evaluation of Bioinformatics Algorithms for the Classification of Arabinogalactan-Proteins in Arabidopsis thaliana." Master's thesis, Ohio University, 2011. http://rave.ohiolink.edu/etdc/view?acc_num=ohiou1301069861

    Chicago Manual of Style (17th edition)