Supplemental Files

File List

  • dexter.arff (23.52 MB)
  • hrse.plx (36.6 KB)
  • internet_ads.arff (9.84 MB)
  • isolet_test.arff (7.75 MB)
  • isolet_train.arff (30.96 MB)
  • madelon.arff (4.98 MB)
  • multiple_features.arff (6.42 MB)
  • rse.plx (54.34 KB)

Classification in High Dimensional Feature Spaces through Random Subspace Ensembles

Pathical, Santhosh P.

Abstract Details

2010, Master of Science in Engineering, University of Toledo, Engineering (Computer Science).
This thesis presents an empirical simulation study on the application of machine learning ensembles based on the random subspace methodology to classification problems with high-dimensional feature spaces. The motivation is to address challenges associated with algorithm scalability, data sparsity, and information loss due to the so-called curse of dimensionality. A simulation-based empirical study is conducted to assess the performance profile of the random subspace or subsample ensemble classifier for high-dimensional spaces with up to 20,000 features. Subsampling rate and methodology, base learner type, base classifier count, and composition of base learners are among the parameters explored in the simulation study. The study employed the WEKA Machine Learning Workbench and five datasets from the UCI Machine Learning Repository with feature counts of up to 20,000. Naïve Bayes, k-nearest neighbor, and C4.5 decision tree learners served as base classifiers of the random subspace ensemble, which used voting as the combiner method. Homogeneous (all base classifiers based on a single machine learner type) as well as heterogeneous (base classifiers drawn from a mix of machine learner types) random subspace ensembles were evaluated on the datasets for prediction accuracy, SAR, and CPU time performance measures. The simulation study further investigated the effect of random sampling with replacement, random sampling without replacement, and partitioning techniques on the random subspace ensemble. Simulation results indicated that random subspace ensembles employing subsampling rates as low as 10% to 15%, 25 or more base classifiers, a mixed or hybrid composition of base learners, and random sampling without replacement perform competitively with other leading machine learning classifiers on the datasets evaluated. Results also showed, in a more general context, that the random subspace or subsample ensembles scaled with increases in the feature space dimensionality of the datasets with respect to prediction accuracy, SAR, and computational complexity.
Gursel Serpen, PhD (Advisor)
Mansoor Alam, PhD (Committee Member)
Suzan Orra, PhD (Committee Member)
258 p.
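
The abstract describes the random subspace methodology concretely enough that a short sketch can illustrate it. The thesis itself used the WEKA workbench; the Python code below is not that implementation but a minimal illustration of the same idea under assumed choices: each base classifier is trained on a feature subset drawn without replacement at a 10% subsampling rate, a learner pool (Gaussian naïve Bayes, k-nearest neighbor, and a CART decision tree standing in for C4.5) is cycled to form a heterogeneous ensemble, and predictions are combined by unweighted majority vote. The synthetic dataset and all parameter values are illustrative assumptions, not values from the study.

    # Illustrative sketch of a random subspace (feature-subsampling) ensemble.
    # Not the thesis's WEKA implementation; learners and parameters are assumptions.
    import numpy as np
    from sklearn.base import clone
    from sklearn.naive_bayes import GaussianNB
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.tree import DecisionTreeClassifier


    class RandomSubspaceEnsemble:
        """Heterogeneous random subspace ensemble with majority voting."""

        def __init__(self, base_learners, n_estimators=25, subsample_rate=0.10, seed=0):
            self.base_learners = base_learners    # pool of prototype classifiers
            self.n_estimators = n_estimators      # number of base classifiers
            self.subsample_rate = subsample_rate  # fraction of features per classifier
            self.rng = np.random.default_rng(seed)

        def fit(self, X, y):
            n_features = X.shape[1]
            k = max(1, int(self.subsample_rate * n_features))
            self.members_ = []
            for i in range(self.n_estimators):
                # Feature subset drawn at random without replacement
                # (one of the sampling schemes compared in the thesis).
                subset = self.rng.choice(n_features, size=k, replace=False)
                # Cycle through the learner pool -> heterogeneous ensemble.
                learner = clone(self.base_learners[i % len(self.base_learners)])
                learner.fit(X[:, subset], y)
                self.members_.append((subset, learner))
            return self

        def predict(self, X):
            # Unweighted majority vote over the base classifiers' predictions.
            votes = np.array([m.predict(X[:, s]) for s, m in self.members_])
            return np.array([np.bincount(col).argmax() for col in votes.T])


    if __name__ == "__main__":
        from sklearn.datasets import make_classification
        from sklearn.metrics import accuracy_score
        from sklearn.model_selection import train_test_split

        # Synthetic high-dimensional data standing in for the UCI datasets.
        X, y = make_classification(n_samples=600, n_features=500,
                                   n_informative=40, random_state=0)
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

        ensemble = RandomSubspaceEnsemble(
            base_learners=[GaussianNB(), KNeighborsClassifier(), DecisionTreeClassifier()],
            n_estimators=25,
            subsample_rate=0.10,
        ).fit(X_tr, y_tr)

        print("accuracy:", accuracy_score(y_te, ensemble.predict(X_te)))

The other two feature-selection schemes compared in the thesis, random sampling with replacement and partitioning the feature indices into disjoint blocks, would change only how `subset` is drawn inside `fit`.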

Recommended Citations

  • Pathical, S. P. (2010). Classification in High Dimensional Feature Spaces through Random Subspace Ensembles [Master's thesis, University of Toledo]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=toledo1290024890

    APA Style (7th edition)

  • Pathical, Santhosh. Classification in High Dimensional Feature Spaces through Random Subspace Ensembles. 2010. University of Toledo, Master's thesis. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=toledo1290024890.

    MLA Style (8th edition)

  • Pathical, Santhosh. "Classification in High Dimensional Feature Spaces through Random Subspace Ensembles." Master's thesis, University of Toledo, 2010. http://rave.ohiolink.edu/etdc/view?acc_num=toledo1290024890

    Chicago Manual of Style (17th edition)