Classification arises in a wide range of applications. A variety of
statistical tools have been developed for learning classification
rules from data. Understanding their relative merits helps users
choose an appropriate method in practice. This thesis focuses on a
theoretical comparison, in terms of error rate, of model-based
classification methods in statistics with algorithmic methods in
machine learning.
Extending Efron's comparison of logistic regression with linear
discriminant analysis (LDA) under the normal setting, we compare
classification methods through the limiting behaviour of each
method's classification boundary. In doing so, we contrast
algorithmic methods such as the support vector machine and boosting
with LDA and logistic regression and study their relative
efficiencies. The analytical results also reveal a bias in the
support vector machine and its variants, and we propose a
modification that removes this bias.
Beyond the comparison of classification methods in terms of
efficiency, we study their robustness to model misspecification,
such as non-normal settings and mislabeled data. In addition to the
theoretical study, we present results from numerical experiments
under various settings, comparing finite-sample performance.
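The kind of comparison described above can be illustrated with a minimal simulation. The sketch below, which is illustrative only and not taken from the thesis, draws two Gaussian classes with a shared identity covariance and class means at +mu and -mu (an assumed toy configuration), fits a plug-in LDA rule in closed form and a logistic regression by plain gradient descent, and compares their test error rates; under this normal setting both should approach the Bayes error.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-class normal model (illustrative choice, not from the
# thesis): shared identity covariance, class means at -mu and +mu.
mu = np.array([1.0, 0.0])

def sample(n_per_class, rng):
    """Draw n_per_class points from each Gaussian class."""
    X0 = rng.normal(size=(n_per_class, 2)) - mu
    X1 = rng.normal(size=(n_per_class, 2)) + mu
    X = np.vstack([X0, X1])
    y = np.concatenate([np.zeros(n_per_class), np.ones(n_per_class)])
    return X, y

Xtr, ytr = sample(1000, rng)
Xte, yte = sample(5000, rng)

# --- LDA: plug-in linear discriminant with pooled covariance ---
m0 = Xtr[ytr == 0].mean(axis=0)
m1 = Xtr[ytr == 1].mean(axis=0)
S = (np.cov(Xtr[ytr == 0], rowvar=False)
     + np.cov(Xtr[ytr == 1], rowvar=False)) / 2.0
w_lda = np.linalg.solve(S, m1 - m0)
b_lda = -0.5 * w_lda @ (m0 + m1)  # equal class priors assumed
pred_lda = (Xte @ w_lda + b_lda > 0).astype(float)

# --- Logistic regression fit by plain gradient ascent on the likelihood ---
Z = np.hstack([Xtr, np.ones((len(Xtr), 1))])  # append intercept column
beta = np.zeros(3)
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-Z @ beta))
    beta += 0.1 * Z.T @ (ytr - p) / len(Z)
Zte = np.hstack([Xte, np.ones((len(Xte), 1))])
pred_lr = (Zte @ beta > 0).astype(float)

err_lda = np.mean(pred_lda != yte)
err_lr = np.mean(pred_lr != yte)
print(f"LDA error: {err_lda:.3f}  logistic error: {err_lr:.3f}")
```

With class means two units apart, the Bayes error is Phi(-1), roughly 0.159; both fitted rules land near it on a large test sample, and the small gap between them is the finite-sample analogue of the efficiency comparison the thesis carries out asymptotically.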