Pattern Recognition in Large Dimensional and Structured Datasets

Kurra, Goutham


2002, MS, University of Cincinnati, Engineering : Computer Science.
Gene expression datasets obtained from DNA microarrays are examples of large-dimensional and structured datasets. In this thesis, we approach the task of applying pattern recognition techniques to gene expression data from an exploratory perspective. First, we develop methods and algorithms for classifying and predicting cancer classes from large-dimensional gene expression data, along with algorithms for extracting the information hidden in the compositions of the discovered classifiers. Second, we address the problem of clustering structured gene expression data, such as temporal expression profiles, by introducing a method that clusters genes with partially similar profiles.
We demonstrate the classification methods on a gene expression dataset containing two acute leukemia classes. We follow a prioritized feature-selection approach to account for incomplete knowledge of gene function and complex inter-gene dependencies. We use a combination of class scatter metrics and heuristic search algorithms to determine all minimal combinations of genes that have the potential to discriminate between the two leukemia classes. A modified perceptron training algorithm then further trains these discriminant gene-sets. This process yields a large number of distinct and accurate classifiers. We present an algorithm that mines these classifiers to discover ‘core’ patterns in their compositions. These gene-cores can be very useful to biologists searching for inter-gene dependencies and gene function.
Most current clustering algorithms cluster genes using the entire feature set of conditions. However, it is of interest to discover groups of genes that are co-expressed only under certain conditions, especially when the data is structured. To address this need, we develop an ‘automatic partial-featureset clustering algorithm’ (APCA), and a set of heuristics, that can cluster genes according to partially similar expression profiles. The subset of features relevant to a particular clustering is chosen automatically as part of the clustering process. We apply our algorithm to a synthetic dataset and contrast the results with those obtained by applying a standard K-means clustering algorithm to the same data.
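The motivation for clustering on a partial feature set can be illustrated with a small sketch. This is not the APCA algorithm, whose details the abstract does not give; it is a simplified illustration under stated assumptions. The synthetic data, the helper names `within_group_spread` and `most_coherent_features`, and the lowest-variance rule for picking conditions are all hypothetical. It only shows why a group of genes that agree under a few conditions looks scattered when every condition is used, as a full-feature method such as K-means would.

```python
import numpy as np


def within_group_spread(profiles, features=None):
    """Mean squared deviation from the group centroid, averaged over
    the conditions used (all conditions when features is None)."""
    X = profiles if features is None else profiles[:, features]
    centroid = X.mean(axis=0)
    return float(((X - centroid) ** 2).mean())


def most_coherent_features(profiles, k):
    """Indices of the k conditions with the smallest variance across genes.
    Hypothetical selection rule, used here only for illustration."""
    variances = profiles.var(axis=0)
    return np.sort(np.argsort(variances)[:k])


# Synthetic data (assumed): 20 genes, 10 conditions, all noisy,
# except that every gene follows one shared pattern under conditions 2..5.
rng = np.random.default_rng(0)
n_genes, n_conditions = 20, 10
shared = [2, 3, 4, 5]
profiles = rng.normal(0.0, 3.0, size=(n_genes, n_conditions))
pattern = np.array([4.0, -4.0, 4.0, -4.0])
profiles[:, shared] = pattern + rng.normal(0.0, 0.1, size=(n_genes, len(shared)))

chosen = most_coherent_features(profiles, k=4)
full_spread = within_group_spread(profiles)
partial_spread = within_group_spread(profiles, chosen)

print("conditions chosen:", chosen.tolist())
print("spread over all conditions:", round(full_spread, 3))
print("spread over chosen conditions:", round(partial_spread, 3))
```

In this sketch the group membership is given in advance and only the conditions are selected. The thesis describes the harder setting in which the relevant feature subset is chosen automatically during clustering itself.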
Dr. Raj Bhatnagar (Advisor)
72 p.

Recommended Citations


  • Kurra, G. (2002). Pattern Recognition in Large Dimensional and Structured Datasets [Master's thesis, University of Cincinnati]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1014322308

    APA Style (7th edition)

  • Kurra, Goutham. Pattern Recognition in Large Dimensional and Structured Datasets. 2002. University of Cincinnati, Master's thesis. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=ucin1014322308.

    MLA Style (8th edition)

  • Kurra, Goutham. "Pattern Recognition in Large Dimensional and Structured Datasets." Master's thesis, University of Cincinnati, 2002. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1014322308

    Chicago Manual of Style (17th edition)