Application of biclustering algorithms to biological data

2012, Master of Science, Ohio State University, Computer Science and Engineering.

Microarrays have made it possible to cheaply collect large gene expression datasets. Biclustering has become established as a popular method for mining patterns in these datasets. Biclustering algorithms simultaneously cluster rows and columns of the data matrix; this approach is well suited to gene expression data because genes are typically related across only some samples, and samples across only some genes. In the past decade many biclustering algorithms that specifically target gene expression data have been published. However, only a few are commonly used in bioinformatics pipelines. There are several reasons for this: implementations have been published for only a small fraction of these algorithms, the published implementations have different interfaces, and the literature offers few comparisons of algorithms or guidelines for choosing among them.

In this thesis we address three problems: the development of an efficient and effective biclustering algorithm, the development of a software framework for biclustering tasks, and a comprehensive benchmark of biclustering techniques.

We improved the Correlated Patterns Biclustering (CPB) algorithm's running time and accuracy by modifying its heuristic for evaluating rows and columns for inclusion in a bicluster. This calculation was previously performed by an iterative approach, but we developed a more computationally efficient method. We further improved CPB by removing unnecessary parameters and developing a nonparametric method for filtering irrelevant biclusters.
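To make the idea of evaluating rows for inclusion in a bicluster concrete, the sketch below scores every row by its Pearson correlation with a reference row, computed only over the bicluster's columns, in a single vectorized pass. This is an illustrative assumption about what a correlation-based inclusion test can look like, not the heuristic used in CPB or in the thesis; the function name `correlated_rows` and the `threshold` parameter are hypothetical.

```python
import numpy as np

def correlated_rows(data, ref_row, cols, threshold=0.9):
    """Return indices of rows whose Pearson correlation with the reference
    row, restricted to the given columns, is at least `threshold`.
    Hypothetical helper for illustration; not the CPB heuristic."""
    sub = data[:, cols]
    # Center each row over the selected columns.
    centered = sub - sub.mean(axis=1, keepdims=True)
    ref = centered[ref_row]
    denom = np.linalg.norm(centered, axis=1) * np.linalg.norm(ref)
    with np.errstate(divide="ignore", invalid="ignore"):
        corr = (centered @ ref) / denom
    # Constant rows have zero variance; treat their correlation as 0.
    corr = np.nan_to_num(corr, nan=0.0)
    return np.flatnonzero(corr >= threshold)

# Toy matrix: rows 0-2 follow the same pattern up to scale and shift,
# row 3 is anti-correlated, row 4 is constant.
pattern = np.array([1.0, 3.0, 2.0, 5.0, 4.0, 6.0])
rows = [pattern, 2 * pattern + 1, 0.5 * pattern - 3, -pattern, np.full(6, 7.0)]
data = np.hstack([np.vstack(rows), np.zeros((5, 2))])
selected = correlated_rows(data, ref_row=0, cols=np.arange(6))
```

Computing all correlations with one matrix-vector product avoids a per-row loop, which is one generic way such a test can be made cheaper than an iterative evaluation.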

To provide a common interface and also enable comparison of biclustering algorithms, we developed a Python package for bicluster analysis, which we introduce in this thesis. This package, BiBench, provides wrappers to twelve biclustering algorithms, as well as functionality for generating synthetic data, downloading gene expression data, transforming datasets, and validating biclusters.

Using BiBench we compared twelve algorithms, including the modified version of CPB. The algorithms were tested on synthetic datasets for their ability to recover specific bicluster models, resist noise, recover multiple biclusters, and recover overlapping biclusters. They were also tested on gene expression data; gene ontology enrichment was used to identify biologically relevant biclusters.
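A synthetic recovery test of this kind can be sketched as follows: plant a bicluster with known rows and columns in a noisy matrix, then score a found bicluster against the planted one. The example uses plain NumPy and does not reproduce BiBench's interface; the function names, the constant bicluster model, and the cell-level Jaccard score are assumptions chosen for illustration.

```python
import numpy as np

def plant_constant_bicluster(n_rows, n_cols, bic_rows, bic_cols,
                             value=5.0, noise=0.1, seed=0):
    """Generate a standard-normal background matrix and overwrite the
    cells at bic_rows x bic_cols with a noisy constant (hypothetical model)."""
    rng = np.random.default_rng(seed)
    data = rng.normal(0.0, 1.0, size=(n_rows, n_cols))
    block = value + rng.normal(0.0, noise, size=(len(bic_rows), len(bic_cols)))
    data[np.ix_(bic_rows, bic_cols)] = block
    return data

def jaccard_cells(rows_a, cols_a, rows_b, cols_b):
    """Jaccard index between two biclusters, each viewed as a set of cells."""
    cells_a = {(r, c) for r in rows_a for c in cols_a}
    cells_b = {(r, c) for r in rows_b for c in cols_b}
    union = cells_a | cells_b
    return len(cells_a & cells_b) / len(union) if union else 1.0

true_rows, true_cols = [2, 3, 4, 5], [1, 2, 3]
data = plant_constant_bicluster(20, 10, true_rows, true_cols)

# Score a hypothetical result that recovered the planted rows shifted by one.
score = jaccard_cells(true_rows, true_cols, [3, 4, 5, 6], true_cols)
```

A score of 1 means the found bicluster covers exactly the planted cells; raising `noise` or planting several overlapping blocks gives the harder variants of the test.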

Umit Catalyurek, PhD (Advisor)
Srinivasan Parthasarathy, PhD (Committee Member)
91 p.

Recommended Citations

  • Eren, K. (2012). Application of biclustering algorithms to biological data [Master's thesis, Ohio State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=osu1332533492

    APA Style (7th edition)

  • Eren, Kemal. Application of biclustering algorithms to biological data. 2012. Ohio State University, Master's thesis. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=osu1332533492.

    MLA Style (8th edition)

  • Eren, Kemal. "Application of biclustering algorithms to biological data." Master's thesis, Ohio State University, 2012. http://rave.ohiolink.edu/etdc/view?acc_num=osu1332533492

    Chicago Manual of Style (17th edition)