Selecting a set of highly discriminant genes for biological samples is an important task in designing efficient classifiers from DNA microarray data. The wavelet transform is a very common tool in signal processing applications, but its potential in the analysis of microarray gene expression data has yet to be fully explored.
In this thesis, a simple wavelet-based feature selection method is presented that assigns scores to genes for differentiating samples between two classes. The term ‘gene expression signal’ is used to refer to the expression levels of a gene across a set of pre-grouped samples. The expression signal is decomposed using several levels of the wavelet transform. The scoring method is based on the observation that the third-level 1-D wavelet approximation of a gene expression signal captures the differential expression levels of the gene between the two classes. The genes with the highest scores are selected to form a feature set to be used for sample classification. The method was implemented using MATLAB®. Experiments based on three real microarray gene expression datasets were carried out to examine the efficiency of the method. The classification performance of the method was compared to that of two standard filter-based methods, the t-test and the BSS/WSS (ratio of between-group to within-group sums of squares) method, using the 3-nearest neighbor classifier. The results show that the wavelet-based method performs at least as well as the t-test and BSS/WSS methods in classifying cancer samples.
The results demonstrate that 1-D wavelet analysis can be a useful tool for studying gene expression patterns across pre-grouped samples.
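As a rough illustration of the idea summarized above, the following Python sketch computes a third-level approximation of a gene expression signal and ranks genes by how strongly that approximation separates the two pre-grouped classes. It is not the thesis implementation, which was written in MATLAB. Several things here are assumptions made only for illustration: the Haar wavelet (the abstract does not name a wavelet family), the padding of odd-length signals by repeating the last value, and the scoring rule itself. The functions `haar_approximation`, `illustrative_score`, and `top_genes` are hypothetical names, and the score (the absolute gap between the mean approximation coefficients of the two classes) is a stand-in for the scoring method defined in the thesis.

```python
import math


def haar_approximation(signal, levels=3):
    """Return the Haar approximation coefficients after `levels` decomposition steps."""
    approx = [float(x) for x in signal]
    for _ in range(levels):
        if len(approx) < 2:
            break  # nothing left to decompose
        if len(approx) % 2 == 1:
            # Assumption: pad odd-length signals by repeating the last value.
            approx.append(approx[-1])
        # One Haar step: scaled sum of each pair of neighbouring values.
        approx = [(approx[i] + approx[i + 1]) / math.sqrt(2.0)
                  for i in range(0, len(approx), 2)]
    return approx


def illustrative_score(signal, n_class1, levels=3):
    """Hypothetical score: gap between the mean approximation coefficients
    of the two classes. Samples of class 1 are assumed to come first."""
    approx = haar_approximation(signal, levels)
    if len(approx) < 2:
        raise ValueError("signal too short for a two-class comparison")
    # Each coefficient summarizes 2**levels consecutive samples, so the class
    # boundary falls near n_class1 / 2**levels in coefficient space.
    boundary = round(n_class1 / 2 ** levels)
    boundary = max(1, min(len(approx) - 1, boundary))
    mean1 = sum(approx[:boundary]) / boundary
    mean2 = sum(approx[boundary:]) / (len(approx) - boundary)
    return abs(mean1 - mean2)


def top_genes(expression, n_class1, k, levels=3):
    """Return the indices of the k genes with the highest illustrative score."""
    scores = [illustrative_score(row, n_class1, levels) for row in expression]
    order = sorted(range(len(scores)), key=lambda g: scores[g], reverse=True)
    return order[:k]


# Toy data (made up): 3 genes, 8 samples of class 1 followed by 8 of class 2.
expression = [
    [1] * 8 + [5] * 8,  # strongly differential gene
    [3] * 16,           # flat gene
    [2] * 8 + [3] * 8,  # mildly differential gene
]
selected = top_genes(expression, n_class1=8, k=2)
```

In this toy example each third-level coefficient covers exactly one class, so the approximation reduces each gene to one value per class. With real data the class sizes are rarely multiples of eight, and a coefficient near the class boundary mixes samples from both classes.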