Using Hadoop to Cluster Data in Energy System

2015, Master of Computer Science (M.C.S.), University of Dayton, Computer Science.
With the large volume of data generated by various devices, data scientists face a major challenge: conventional machine learning algorithms running on a single computer can no longer process or analyze such large data sets. This thesis takes a distributed computing approach built on Apache Hadoop, a distributed data analysis framework that runs on multiple computers. The main components of this work include implementing the k-means machine learning algorithm on the Hadoop MapReduce framework, processing raw data from real energy systems, clustering the data using k-means in Hadoop, and improving seed selection for k-means. Finally, the thesis demonstrates the efficiency and effectiveness of this approach on different data sets.
Zhongmei Yao (Committee Chair)
Mehdi Zargham (Committee Member)
Saverio Perugini (Committee Member)
51 p.
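
To illustrate the general idea behind running k-means on MapReduce, as described in the abstract, the following is a minimal sketch in plain Python. It is not the thesis's implementation and does not use Hadoop: the function names (`kmeans_map`, `kmeans_reduce`, `kmeans_iteration`, `kmeans`), the in-memory grouping that stands in for the shuffle phase, and the caller-supplied seeds are all assumptions made for illustration. The thesis's improved seed selection is not specified in the abstract and is not reproduced here.

```python
from collections import defaultdict
from math import dist


def kmeans_map(point, centroids):
    """Map step: emit (index of the nearest centroid, (point, 1))."""
    nearest = min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))
    return nearest, (point, 1)


def kmeans_reduce(values):
    """Reduce step: average all points that were assigned to one centroid."""
    total = None
    count = 0
    for point, n in values:
        if total is None:
            total = list(point)
        else:
            total = [a + b for a, b in zip(total, point)]
        count += n
    return tuple(s / count for s in total)


def kmeans_iteration(points, centroids):
    """One map/shuffle/reduce round, simulated in memory (no Hadoop involved)."""
    groups = defaultdict(list)
    for p in points:
        key, value = kmeans_map(p, centroids)
        groups[key].append(value)  # stands in for the shuffle phase
    # A centroid that received no points keeps its previous position.
    return [
        kmeans_reduce(groups[i]) if i in groups else centroids[i]
        for i in range(len(centroids))
    ]


def kmeans(points, seeds, max_iter=20, tol=1e-9):
    """Repeat rounds until the centroids stop moving or max_iter is reached."""
    centroids = list(seeds)
    for _ in range(max_iter):
        updated = kmeans_iteration(points, centroids)
        if all(dist(a, b) <= tol for a, b in zip(updated, centroids)):
            return updated
        centroids = updated
    return centroids


if __name__ == "__main__":
    # Hypothetical toy data: two well-separated groups of 2-D points.
    data = [(0, 0), (0, 2), (10, 10), (10, 12)]
    print(kmeans(data, seeds=[(0, 0), (10, 10)]))
```

In a real Hadoop job each round would typically be a separate MapReduce pass, with the current centroids distributed to every mapper; here a single process plays both roles so the data flow is easy to follow.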

Recommended Citations

  • Hou, J. (2015). Using Hadoop to Cluster Data in Energy System [Master's thesis, University of Dayton]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=dayton1430092547

    APA Style (7th edition)

  • Hou, Jun. Using Hadoop to Cluster Data in Energy System. 2015. University of Dayton, Master's thesis. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=dayton1430092547.

    MLA Style (8th edition)

  • Hou, Jun. "Using Hadoop to Cluster Data in Energy System." Master's thesis, University of Dayton, 2015. http://rave.ohiolink.edu/etdc/view?acc_num=dayton1430092547

    Chicago Manual of Style (17th edition)