Files
lszdissertation-1.pdf (875.89 KB)
K-groups: A Generalization of K-means by Energy Distance
Author Info
Li, Songzi
Permalink: http://rave.ohiolink.edu/etdc/view?acc_num=bgsu1428583805
Year and Degree
2015, Doctor of Philosophy (Ph.D.), Bowling Green State University, Statistics.
Abstract
We propose two distribution-based clustering algorithms called K-groups. Our algorithms assign observations to the same cluster if they come from a common distribution. Energy distance is a non-negative measure of the distance between distributions, based on Euclidean distances between random observations, that is zero if and only if the distributions are identical. We use energy distance to measure the statistical distance between two clusters and search for the partition that maximizes the total between-cluster energy distance. To implement our algorithms, we apply a version of Hartigan and Wong's idea of moving one point at a time, and generalize this idea to moving any m points. We also prove that K-groups is a generalization of the K-means algorithm: K-means is a limiting case of K-groups, with a common objective function and updating formula in that case. K-means is one of the best-known clustering algorithms, but previous research has shown that it has several disadvantages. K-means performs poorly when clusters are skewed or overlapping. K-means cannot handle categorical data, because the mean is not a good estimate of center. K-means cannot be applied when the dimension exceeds the sample size. Our K-groups methods provide a practical and effective solution to these problems. Simulation studies on the performance of clustering algorithms for univariate and multivariate mixture distributions are presented. Four validation indices (diagonal, Kappa, Rand, and corrected Rand) are reported for each example in the simulation study. Results of the empirical studies show that both K-groups algorithms perform as well as K-means when clusters are well separated and spherically shaped, but perform better than K-means when clusters are skewed or overlapping. K-groups algorithms are also more robust than K-means with respect to outliers.
Results are presented for three multivariate data sets: wine cultivars, dermatology diseases, and oncology cases. In these real data examples, both K-groups algorithms perform better than K-means in every case.
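To make the energy distance described in the abstract concrete, the following is a minimal Python sketch of a sample version for two clusters. It uses the common V-statistic form, 2·mean|x − y| − mean|x − x'| − mean|y − y'|, averaged over all pairs. This is an illustrative assumption, not necessarily the exact weighting or objective used in the dissertation, and the function names `mean_pairwise_distance` and `energy_distance` and the toy data are hypothetical.

```python
import math
from itertools import product


def mean_pairwise_distance(a, b):
    """Average Euclidean distance over all pairs (x, y), x in a and y in b."""
    total = sum(math.dist(x, y) for x, y in product(a, b))
    return total / (len(a) * len(b))


def energy_distance(a, b):
    """Sample energy distance between two clusters (V-statistic form).

    Assumed form: 2*mean|x - y| - mean|x - x'| - mean|y - y'|.
    The dissertation may weight or normalize this differently.
    """
    return (2.0 * mean_pairwise_distance(a, b)
            - mean_pairwise_distance(a, a)
            - mean_pairwise_distance(b, b))


# Toy clusters in the plane (made-up points, for illustration only).
cluster_a = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
cluster_b = [(5.0, 5.0), (6.0, 5.0), (5.0, 6.0)]

same = energy_distance(cluster_a, cluster_a)    # identical samples
apart = energy_distance(cluster_a, cluster_b)   # well-separated samples
print(same, apart)
```

In this sketch, comparing a cluster with itself gives zero and comparing two separated clusters gives a positive value, which matches the property stated in the abstract. A partition search in the spirit of K-groups would then reassign points so as to increase the total between-cluster distance; that search step is not shown here.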
Committee
Rizzo, Maria (Advisor)
Rump, Christopher (Other)
Chen, Hanfeng (Committee Member)
Wei, Ning (Committee Member)
Pages
120 p.
Subject Headings
Statistics
Keywords
K-groups; K-means; Clustering analysis
Recommended Citations
Li, S. (2015). K-groups: A Generalization of K-means by Energy Distance [Doctoral dissertation, Bowling Green State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=bgsu1428583805
APA Style (7th edition)
Li, Songzi. K-groups: A Generalization of K-means by Energy Distance. 2015. Bowling Green State University, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=bgsu1428583805.
MLA Style (8th edition)
Li, Songzi. "K-groups: A Generalization of K-means by Energy Distance." Doctoral dissertation, Bowling Green State University, 2015. http://rave.ohiolink.edu/etdc/view?acc_num=bgsu1428583805
Chicago Manual of Style (17th edition)
Document number: bgsu1428583805
Download Count: 2,847
Copyright Info
© 2015, some rights reserved.
K-groups: A Generalization of K-means by Energy Distance by Songzi Li is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License. Based on a work at etd.ohiolink.edu.
This open access ETD is published by Bowling Green State University and OhioLINK.