Files
ucin1014322308.pdf (371.08 KB)
Pattern Recognition in Large Dimensional and Structured Datasets
Author Info
Kurra, Goutham
Permalink:
http://rave.ohiolink.edu/etdc/view?acc_num=ucin1014322308
Year and Degree
2002, MS, University of Cincinnati, Engineering : Computer Science.
Abstract
Gene expression datasets obtained from DNA microarrays are examples of large-dimensional and structured datasets. In this thesis, we approach the task of applying pattern recognition techniques to gene expression data from an exploratory perspective. First, we develop methods and algorithms for the classification and prediction of cancer classes from large-dimensional gene expression data. We also develop algorithms for extracting the information content hidden within the compositions of the discovered classifiers. Second, we look at the problem of clustering structured gene expression data, such as temporal expression profiles, by introducing a method to cluster genes having partially similar profiles. We demonstrate the classification methods on a gene expression dataset containing two acute leukemia classes. A prioritized feature-selection approach is followed to account for incomplete knowledge of gene function and complex inter-gene dependencies. We utilize a combination of class scatter metrics and heuristic search algorithms to determine all minimal combinations of genes that have the potential to discriminate between the two leukemia classes. A modified perceptron training algorithm further trains the discriminant gene-sets. This process results in a large number of distinct and accurate classifiers. We then present an algorithm that mines these classifiers to discover ‘core’ patterns in their compositions. These gene-cores can be very useful to biologists searching for inter-gene dependencies and gene function. Most current clustering algorithms cluster genes by taking into account the entire feature set of conditions. However, it is of interest to discover groups of genes that are co-expressed only under certain conditions, especially when the data is structured.
To address this need, we develop an ‘automatic partial-featureset clustering algorithm’ (APCA), together with a set of heuristics, that can cluster genes according to partially similar expression profiles. The subset of features relevant to a particular clustering is chosen automatically as part of the clustering process. We apply our algorithm to a synthetic dataset and contrast the results with those obtained by applying a standard K-means clustering algorithm to the same data.
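As a rough illustration of why clustering on the entire feature set can miss genes that are co-expressed only under certain conditions, the minimal Python sketch below compares two profiles over all conditions and over a subset of conditions. It is not the APCA algorithm, whose details are in the thesis itself: the profile values, the function name `distance`, and the hand-picked subset `[0, 1, 2]` are all assumptions made for illustration, whereas APCA is described as choosing the relevant subset automatically.

```python
import math

def distance(a, b, features=None):
    """Euclidean distance between two profiles.

    If `features` is given, only those condition indices are compared;
    otherwise all conditions are used (as a full-feature method would).
    """
    idx = range(len(a)) if features is None else features
    return math.sqrt(sum((a[i] - b[i]) ** 2 for i in idx))

# Hypothetical expression profiles over six conditions (made-up values).
gene_a = [1.0, 2.0, 3.0, 0.0, 5.0, 1.0]
gene_b = [1.0, 2.0, 3.0, 4.0, 0.0, 6.0]

# Distance over all six conditions: the genes look dissimilar.
full = distance(gene_a, gene_b)

# Distance over the first three conditions only (subset chosen by hand here):
# the genes are identical on this partial profile.
partial = distance(gene_a, gene_b, [0, 1, 2])

print(f"full-profile distance:    {full:.3f}")
print(f"partial-profile distance: {partial:.3f}")
```

Under these assumed values the two genes agree exactly on the first three conditions yet are far apart when all six are compared, which is the kind of partial similarity a full-feature method such as standard K-means would not single out.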
Committee
Dr. Raj Bhatnagar (Advisor)
Pages
72 p.
Subject Headings
Computer Science
Keywords
feature selection; partial profile clustering; pattern recognition; clustering structured data; gene expression; data analysis
Recommended Citations
Kurra, G. (2002). Pattern Recognition in Large Dimensional and Structured Datasets [Master's thesis, University of Cincinnati]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1014322308
APA Style (7th edition)
Kurra, Goutham. Pattern Recognition in Large Dimensional and Structured Datasets. 2002. University of Cincinnati, Master's thesis. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=ucin1014322308.
MLA Style (8th edition)
Kurra, Goutham. "Pattern Recognition in Large Dimensional and Structured Datasets." Master's thesis, University of Cincinnati, 2002. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1014322308
Chicago Manual of Style (17th edition)
Document number:
ucin1014322308
Download Count:
766
Copyright Info
© 2002, all rights reserved.
This open access ETD is published by University of Cincinnati and OhioLINK.