FUZZY CLASSIFIERS FOR IMBALANCED DATA SETS

2007, PhD, University of Cincinnati, Engineering : Computer Science.
This thesis proposes a fuzzy set-based classifier for imbalanced data sets, that is, data sets in which one class (the majority class), or the data available for it, is much larger than the other (the minority class). Current machine learning classification algorithms are biased toward the majority class and therefore perform poorly at recognizing the minority class. The experiments in this thesis show that the proposed classifier largely eliminates this bias by using a fuzzy set derived from a frequency-based class representation that takes class size into account. The thesis also analyzes how other characteristics of the data, such as overlap, complexity, and size, affect the classifier in combination with the imbalance factor. The capabilities and limitations of the proposed fuzzy classifiers are investigated extensively across a range of data sets that combine imbalance with these factors. The relation between the proposed fuzzy classifier and another widely used frequency-based classifier, the Naive Bayes classifier, is also considered. A theoretical result indicates that the Naive Bayes classifier is a particular case of the fuzzy classifier presented here. More precisely, it is shown that the Bayes classification criterion, the Bayes score, can be obtained as a particular case of constructing the fuzzy set, and hence of the fuzzy classifier. Finally, for cases where data re-balancing is necessary, e.g., extremely imbalanced data, an up-sampling algorithm is proposed that incorporates information about the whole data set, such as the imbalance and the distances between and within classes.
Dr. Anca Ralescu (Advisor)
157 p.
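
The abstract does not spell out how the fuzzy set is built, so the following is only an illustrative sketch of the general idea of a frequency-based class representation that accounts for class size. It is not the construction used in the thesis. The membership rule (relative frequency within a class, rescaled so its peak is 1), the function names, and the toy data are all assumptions made for this example.

```python
from collections import Counter


def class_memberships(values):
    """Turn one class's observed feature values into a fuzzy set.

    Assumption (not from the thesis): membership is the relative
    frequency of a value within its class, rescaled so that the most
    frequent value has membership 1.
    """
    counts = Counter(values)
    n = len(values)
    rel = {v: c / n for v, c in counts.items()}  # dividing by n removes class size
    peak = max(rel.values())
    return {v: r / peak for v, r in rel.items()}


def classify(x, fuzzy_sets):
    """Return the class whose fuzzy set gives x the highest membership."""
    return max(fuzzy_sets, key=lambda label: fuzzy_sets[label].get(x, 0.0))


def classify_by_raw_count(x, data):
    """Baseline that ignores class size: pick the class with the most x's."""
    return max(data, key=lambda label: data[label].count(x))


# Hypothetical toy data: 100 majority points and 10 minority points.
data = {
    "majority": [1] * 60 + [2] * 30 + [3] * 10,
    "minority": [3] * 8 + [2] * 2,
}
fuzzy_sets = {label: class_memberships(vals) for label, vals in data.items()}

print(classify_by_raw_count(3, data))  # raw counts: 10 vs 8
print(classify(3, fuzzy_sets))         # memberships: 1/6 vs 1
```

In this toy data the value 3 occurs 10 times in the majority class and 8 times in the minority class, so the raw-count baseline picks the majority class. After dividing by class size, 3 is typical of the minority class (8 of 10 points) and rare in the majority class (10 of 100), so the size-aware rule picks the minority class instead.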

Recommended Citations

  • VISA, S. (2007). FUZZY CLASSIFIERS FOR IMBALANCED DATA SETS [Doctoral dissertation, University of Cincinnati]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1182226868

    APA Style (7th edition)

  • VISA, SOFIA. FUZZY CLASSIFIERS FOR IMBALANCED DATA SETS. 2007. University of Cincinnati, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=ucin1182226868.

    MLA Style (8th edition)

  • VISA, SOFIA. "FUZZY CLASSIFIERS FOR IMBALANCED DATA SETS." Doctoral dissertation, University of Cincinnati, 2007. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1182226868

    Chicago Manual of Style (17th edition)