Files
ucin1182226868.pdf (2.6 MB)
FUZZY CLASSIFIERS FOR IMBALANCED DATA SETS
Author Info
VISA, SOFIA
Permalink:
http://rave.ohiolink.edu/etdc/view?acc_num=ucin1182226868
Abstract Details
Year and Degree
2007, PhD, University of Cincinnati, Engineering : Computer Science.
Abstract
This thesis proposes a fuzzy set-based classifier for imbalanced data sets, that is, data sets in which one class (the majority class) is represented by many more examples than the other (the minority class). Current machine learning classification algorithms are biased toward the majority class and therefore perform poorly at recognizing the minority class. The experiments in this thesis show that the proposed classifier largely eliminates this bias by deriving a fuzzy set from a frequency-based class representation that takes class size into account. The thesis also analyzes how other data characteristics, such as class overlap, complexity, and size, affect the classifier in combination with the imbalance factor. The capabilities and limitations of the proposed fuzzy classifiers are investigated extensively across a range of data sets that combine imbalance with these factors. The relationship between the proposed fuzzy classifier and another frequently used frequency-based classifier, the Naive Bayes classifier, is also examined. A theoretical result shows that the Naive Bayes classifier is a particular case of the fuzzy classifier presented here: more precisely, the Bayes classification criterion (the Bayes score) can be obtained as a particular way of constructing the fuzzy set, and hence the fuzzy classifier. Finally, for cases where re-balancing the data is necessary, such as extremely imbalanced data, the thesis proposes an up-sampling algorithm that incorporates information about the whole data set, such as the degree of imbalance and the distances between and within classes.
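The abstract describes the classifier only at a high level, so the following is a hypothetical sketch, not the thesis's construction. It assumes categorical attributes and builds one fuzzy set per class and attribute by dividing each value's count by the largest value count observed in that class, so that membership grades do not grow with class size; attributes are combined with the product t-norm. Setting `mode="bayes"` swaps in relative frequencies weighted by the class prior, which turns the same scoring routine into the (unsmoothed) Naive Bayes score P(c)·∏ P(x_j | c). This loosely mirrors the special-case relationship described above, but it is an analogy, not a reproduction of the proof. The class name `FrequencyFuzzyClassifier`, the max-normalization, and the product aggregation are all assumptions.

```python
from collections import Counter, defaultdict


class FrequencyFuzzyClassifier:
    """Hypothetical sketch of a frequency-based fuzzy classifier for categorical data.

    mode="fuzzy": a value's membership in a class is its count divided by the
    largest value count in that class, so grades lie in [0, 1] regardless of
    how many examples the class has (assumed construction).
    mode="bayes": membership is the relative frequency P(x_j | c) and the score
    is multiplied by the prior P(c), which gives the unsmoothed Naive Bayes score.
    """

    def __init__(self, mode="fuzzy"):
        if mode not in ("fuzzy", "bayes"):
            raise ValueError("mode must be 'fuzzy' or 'bayes'")
        self.mode = mode

    def fit(self, X, y):
        self.classes_ = sorted(set(y))
        self.n_ = len(y)
        self.class_counts_ = Counter(y)
        # (class label, attribute index) -> Counter of observed attribute values
        self.value_counts_ = defaultdict(Counter)
        for row, label in zip(X, y):
            for j, value in enumerate(row):
                self.value_counts_[(label, j)][value] += 1
        return self

    def membership(self, label, j, value):
        counts = self.value_counts_[(label, j)]
        if self.mode == "fuzzy":
            # Max-normalization: the most frequent value in the class gets grade 1.
            return counts[value] / max(counts.values())
        # Relative frequency within the class, i.e. an estimate of P(x_j | c).
        return counts[value] / self.class_counts_[label]

    def score(self, row, label):
        # Start from the class prior in Bayes mode, from 1 in fuzzy mode.
        s = self.class_counts_[label] / self.n_ if self.mode == "bayes" else 1.0
        for j, value in enumerate(row):
            s *= self.membership(label, j, value)  # product t-norm across attributes
        return s

    def predict(self, row):
        # Unseen values get membership 0; ties go to the first class in sorted order.
        return max(self.classes_, key=lambda c: self.score(row, c))


# Toy imbalanced data set with one categorical attribute: 90 majority, 10 minority.
X = [["a"]] * 60 + [["b"]] * 30 + [["b"]] * 8 + [["c"]] * 2
y = [0] * 90 + [1] * 10

fuzzy = FrequencyFuzzyClassifier("fuzzy").fit(X, y)
bayes = FrequencyFuzzyClassifier("bayes").fit(X, y)
print(fuzzy.predict(["b"]))  # 1: "b" is the most typical minority value
print(bayes.predict(["b"]))  # 0: the prior P(c) tips the score toward the majority class
```

On this toy data, the value `"b"` has membership 0.5 in the majority class and 1.0 in the minority class under max-normalization, while the prior-weighted Bayes scores are 0.3 and 0.08. Max-normalization is only one simple way to make membership grades independent of class size; the thesis's own construction, and the particular choice under which it yields the Bayes score, may differ.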
Committee
Dr. Anca Ralescu (Advisor)
Pages
157 p.
Subject Headings
Computer Science
Keywords
fuzzy classifiers; imbalanced data; machine learning
Recommended Citations
APA Style (7th edition)
VISA, S. (2007). FUZZY CLASSIFIERS FOR IMBALANCED DATA SETS [Doctoral dissertation, University of Cincinnati]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1182226868

MLA Style (8th edition)
VISA, SOFIA. FUZZY CLASSIFIERS FOR IMBALANCED DATA SETS. 2007. University of Cincinnati, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=ucin1182226868.

Chicago Manual of Style (17th edition)
VISA, SOFIA. "FUZZY CLASSIFIERS FOR IMBALANCED DATA SETS." Doctoral dissertation, University of Cincinnati, 2007. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1182226868
Document number: ucin1182226868
Download Count: 1,439
Copyright Info
© 2007, all rights reserved.
This open access ETD is published by University of Cincinnati and OhioLINK.