Application of biclustering algorithms to biological data

Eren, Kemal

Permalink:

http://rave.ohiolink.edu/etdc/view?acc_num=osu1332533492

Year and Degree

2012, Master of Science, Ohio State University, Computer Science and Engineering.

Abstract

Microarrays have made it possible to collect large gene expression datasets cheaply. Biclustering has become a popular method for mining patterns in these datasets. Biclustering algorithms cluster the rows and columns of the data matrix simultaneously; this approach is well suited to gene expression data because a group of genes may be related across only a subset of samples, and vice versa. In the past decade, many biclustering algorithms that specifically target gene expression data have been published. However, only a few are commonly used in bioinformatics pipelines. There are several reasons for this: implementations have been published for only a small fraction of these algorithms, those that have been published have different interfaces, and the literature offers few comparisons of algorithms or guidelines for choosing among them.
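The idea that genes may be related on only a subset of samples can be illustrated with a small sketch. It is not taken from the thesis: the matrix sizes, noise levels, and the helper names `make_planted_matrix` and `mean_pairwise_correlation` are assumptions chosen for illustration. The code plants one correlated submatrix in random data and compares row correlation inside the bicluster's columns with correlation across all columns.

```python
import numpy as np

def make_planted_matrix(n_rows=50, n_cols=40, bic_rows=10, bic_cols=8, seed=0):
    """Return a noise matrix with one planted bicluster (hypothetical setup)."""
    rng = np.random.default_rng(seed)
    data = rng.normal(0.0, 1.0, size=(n_rows, n_cols))
    rows = np.arange(bic_rows)   # bicluster rows ("genes")
    cols = np.arange(bic_cols)   # bicluster columns ("samples")
    # Rows in the bicluster share one pattern, but only on the bicluster columns.
    pattern = np.linspace(-2.0, 2.0, bic_cols)
    noise = rng.normal(0.0, 0.1, size=(bic_rows, bic_cols))
    data[np.ix_(rows, cols)] = pattern + noise
    return data, rows, cols

def mean_pairwise_correlation(block):
    """Mean Pearson correlation over all distinct pairs of rows in block."""
    corr = np.corrcoef(block)
    upper = np.triu_indices_from(corr, k=1)
    return float(corr[upper].mean())

data, rows, cols = make_planted_matrix()
within = mean_pairwise_correlation(data[np.ix_(rows, cols)])
overall = mean_pairwise_correlation(data[rows, :])
print(f"correlation on bicluster columns: {within:.2f}")
print(f"correlation on all columns:       {overall:.2f}")
```

Clustering the rows on all columns would see only the weaker overall correlation, which is why selecting rows and columns together can help on data of this kind.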

In this thesis we address three problems: the development of an efficient and effective biclustering algorithm, the development of a software framework for biclustering tasks, and a comprehensive benchmark of biclustering techniques.

We improved the Correlated Patterns Biclustering (CPB) algorithm's running time and accuracy by modifying its heuristic for evaluating rows and columns for inclusion in a bicluster. This calculation was previously performed by an iterative approach, but we developed a more computationally efficient method. We further improved CPB by removing unnecessary parameters and developing a nonparametric method for filtering irrelevant biclusters.

To provide a common interface and also enable comparison of biclustering algorithms, we developed a Python package for bicluster analysis, which we introduce in this thesis. This package, BiBench, provides wrappers to twelve biclustering algorithms, as well as functionality for generating synthetic data, downloading gene expression data, transforming datasets, and validating biclusters.

Using BiBench we compared twelve algorithms, including the modified version of CPB. The algorithms were tested on synthetic datasets for their ability to recover specific bicluster models, resist noise, recover multiple biclusters, and recover overlapping biclusters. They were also tested on gene expression data; gene ontology enrichment was used to identify biologically relevant biclusters.
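One common way to measure how well planted biclusters are recovered on synthetic data is a Jaccard-based score over matrix cells. The sketch below is an assumption about how such a score could be computed; it is not BiBench code and not necessarily the score used in the thesis, and the names `jaccard` and `recovery` are hypothetical.

```python
def jaccard(a, b):
    """Jaccard index of two biclusters, each given as (row_set, col_set).

    A bicluster covers every cell (r, c) with r in its rows and c in its
    columns, so the shared cells are the product of the shared rows and
    shared columns.
    """
    rows_a, cols_a = a
    rows_b, cols_b = b
    inter = len(rows_a & rows_b) * len(cols_a & cols_b)
    union = len(rows_a) * len(cols_a) + len(rows_b) * len(cols_b) - inter
    return inter / union if union else 0.0

def recovery(expected, found):
    """Average, over planted biclusters, of the best match among found ones."""
    if not expected:
        return 0.0
    best = [max((jaccard(e, f) for f in found), default=0.0) for e in expected]
    return sum(best) / len(best)

planted = [(set(range(0, 4)), set(range(0, 4)))]
shifted = [(set(range(0, 4)), set(range(2, 6)))]  # half of the columns overlap
print(recovery(planted, planted))
print(recovery(planted, shifted))
```

Swapping the arguments gives the complementary question: how much of what an algorithm returned corresponds to a planted bicluster.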

Committee

Umit Catalyurek, PhD (Advisor)
Srinivasan Parthasarathy, PhD (Committee Member)

Pages

91 p.

Subject Headings

Computer Science

Keywords

biclustering; data mining; gene expression; microarray

APA Style (7th edition)

Eren, K. (2012). Application of biclustering algorithms to biological data [Master's thesis, Ohio State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=osu1332533492

MLA Style (8th edition)

Eren, Kemal. Application of biclustering algorithms to biological data. 2012. Ohio State University, Master's thesis. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=osu1332533492.

Chicago Manual of Style (17th edition)

Eren, Kemal. "Application of biclustering algorithms to biological data." Master's thesis, Ohio State University, 2012. http://rave.ohiolink.edu/etdc/view?acc_num=osu1332533492

Document number:

osu1332533492
