Pattern Recognition in Large Dimensional and Structured Datasets

Kurra, Goutham

ucin1014322308.pdf (371.08 KB)

Permalink:

http://rave.ohiolink.edu/etdc/view?acc_num=ucin1014322308

Year and Degree

2002, MS, University of Cincinnati, Engineering: Computer Science.

Abstract

Gene expression datasets obtained from DNA microarrays are examples of large-dimensional and structured datasets. In this thesis, we approach the task of applying pattern recognition techniques to gene expression data from an exploratory perspective. First, we develop methods and algorithms for the classification and prediction of cancer classes from large-dimensional gene expression data, together with algorithms for extracting the information content hidden within the compositions of the discovered classifiers. Second, we address the problem of clustering structured gene expression data, such as temporal expression profiles, by introducing a method to cluster genes that have partially similar profiles.

We demonstrate the classification methods on a gene expression dataset containing two acute leukemia classes. A prioritized feature-selection approach is followed to account for incomplete knowledge of gene function and complex inter-gene dependencies. We use a combination of class scatter metrics and heuristic search algorithms to determine all the minimal combinations of genes that have the potential to discriminate between the two leukemia classes. A modified perceptron training algorithm then further trains these discriminant gene sets. This process yields a large number of distinct and accurate classifiers. We then present an algorithm that mines these classifiers to discover ‘core’ patterns in their compositions. These gene cores can be very useful to biologists searching for inter-gene dependencies and gene function.

Most current clustering algorithms group genes using the entire set of conditions as features. However, it is of interest to discover groups of genes that are co-expressed only under certain conditions, especially when the data is structured.
To address this need, we develop an ‘automatic partial-featureset clustering algorithm’ (APCA), together with a set of heuristics, that clusters genes according to partially similar expression profiles. The subset of features relevant to a particular clustering is chosen automatically as part of the clustering process. We apply our algorithm to a synthetic dataset and contrast the results with those obtained by applying a standard K-means clustering algorithm to the same data.
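
The gene-subset search in the classification part of the abstract can be illustrated with a small sketch. It is not the thesis's code: it assumes the two-class Fisher criterion as the class scatter metric, replaces the heuristic search with plain exhaustive enumeration up to a small subset size, and omits the modified perceptron step entirely. The names `solve`, `fisher_score` and `minimal_discriminant_sets`, the `ridge` term, the `threshold` value and the toy data are all hypothetical. A subset is kept only if it passes the threshold and contains no smaller subset that already passed, which mirrors the idea of minimal discriminating gene combinations.

```python
from itertools import combinations


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fisher_score(samples, labels, genes, ridge=1e-6):
    """Two-class Fisher criterion J = d^T (S_W + ridge*I)^-1 d over the selected
    genes, where d is the difference of the class means and S_W is the pooled
    within-class scatter matrix. Labels must be 0 or 1."""
    k = len(genes)
    rows = {0: [], 1: []}
    for x, y in zip(samples, labels):
        rows[y].append([x[g] for g in genes])
    means = {c: [sum(col) / len(r) for col in zip(*r)] for c, r in rows.items()}
    d = [means[1][i] - means[0][i] for i in range(k)]
    # Start from ridge * I so the scatter matrix is always invertible.
    sw = [[ridge if i == j else 0.0 for j in range(k)] for i in range(k)]
    for c, r in rows.items():
        for row in r:
            dev = [row[i] - means[c][i] for i in range(k)]
            for i in range(k):
                for j in range(k):
                    sw[i][j] += dev[i] * dev[j]
    w = solve(sw, d)  # Fisher discriminant direction, w = S_W^-1 d
    return sum(di * wi for di, wi in zip(d, w))


def minimal_discriminant_sets(samples, labels, n_genes, threshold, max_size=3):
    """Enumerate gene subsets by increasing size and keep those that pass the
    threshold and contain no smaller subset that already passed."""
    accepted = []
    for size in range(1, max_size + 1):
        for subset in combinations(range(n_genes), size):
            if any(set(a) <= set(subset) for a in accepted):
                continue  # not minimal: a passing subset is already inside it
            if fisher_score(samples, labels, subset) >= threshold:
                accepted.append(subset)
    return accepted


# Hypothetical toy data: 6 samples x 3 genes; gene 2 is constant.
samples = [
    [0.0, 0.0, 5.0], [1.0, 1.0, 5.0], [2.0, 2.0, 5.0],  # class 0
    [0.0, 1.0, 5.0], [1.0, 2.0, 5.0], [2.0, 3.0, 5.0],  # class 1
]
labels = [0, 0, 0, 1, 1, 1]
print(minimal_discriminant_sets(samples, labels, n_genes=3, threshold=10.0))
# → [(0, 1)]
```

In the toy data, gene 0 has identical class means and gene 1 separates the classes only weakly (score about 0.25), yet the pair (0, 1) passes because the Fisher criterion accounts for the correlation between the two genes. This is why a search for minimal discriminating combinations cannot stop at single genes.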

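To make the contrast between full-profile and partial-profile clustering concrete, here is a minimal sketch of the standard K-means baseline the abstract mentions, in plain Python. It is not APCA: the optional `features` argument (a hypothetical parameter) restricts clustering to a hand-picked subset of conditions, whereas APCA, as described in the abstract, selects that subset automatically. The farthest-first seeding and the toy gene matrix are assumptions made for this example.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, features=None, max_iter=100):
    """Standard (Lloyd's) K-means on expression profiles.

    If `features` is given, only those condition columns are used, so a
    hand-chosen partial profile can be clustered. Returns one label per point."""
    if features is not None:
        points = [[p[f] for f in features] for p in points]
    # Deterministic farthest-first seeding (a choice made for this sketch;
    # random or k-means++ seeding is more common).
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical data: genes 0/1 and 2/3 agree on conditions 0-1
# but have opposite patterns on conditions 2-3.
genes = [[1, 1, 0, 9], [1, 1, 9, 0], [8, 8, 0, 9], [8, 8, 9, 0]]
print(kmeans(genes, 2))                   # full profiles → [0, 1, 0, 1]
print(kmeans(genes, 2, features=[0, 1]))  # conditions 0-1 only → [0, 0, 1, 1]
```

On full profiles, K-means pairs the genes by their pattern on conditions 2-3; restricted to conditions 0-1, it recovers the groups that share a partial profile. Choosing such a subset without supplying it by hand is the problem APCA targets.
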
Committee

Dr. Raj Bhatnagar (Advisor)

Pages

72 p.

Subject Headings

Computer Science

Keywords

feature selection; partial profile clustering; pattern recognition; clustering structured data; gene expression; data analysis

Recommended Citations

APA Style (7th edition)
Kurra, G. (2002). Pattern Recognition in Large Dimensional and Structured Datasets [Master's thesis, University of Cincinnati]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1014322308

MLA Style (8th edition)
Kurra, Goutham. Pattern Recognition in Large Dimensional and Structured Datasets. 2002. University of Cincinnati, Master's thesis. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=ucin1014322308.

Chicago Manual of Style (17th edition)
Kurra, Goutham. "Pattern Recognition in Large Dimensional and Structured Datasets." Master's thesis, University of Cincinnati, 2002. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1014322308

Document number:

ucin1014322308
