Skip to Main Content
Frequently Asked Questions
Submit an ETD
Global Search Box
Need Help?
Keyword Search
Participating Institutions
Advanced Search
School Logo
Files
File List
osu1198866625.pdf (1.47 MB)
ETD Abstract Container
Abstract Header
Scalable mining on emerging architectures
Author Info
Buehrer, Gregory T
Permalink:
http://rave.ohiolink.edu/etdc/view?acc_num=osu1198866625
Abstract Details
Year and Degree
2008, Doctor of Philosophy, Ohio State University, Computer and Information Science.
Abstract
Data mining is the process of discovering interesting and previously unknown information from such stores. Recent advances in data collection technology have generated large-scale data stores. However, as the ability to collect and store data increases, our ability to make use of the data degrades. This degradation is due to two main issues. First, the utility of efficient serial algorithms to mine such data is often lowered because this data is distributed on multiple machines. Second, since the complexity of most mining algorithms is polynomial or even exponential with the input size, even on parallel machines the runtimes exceed practical limitations. This dissertation addresses the concerns of mining large data stores by providing improvements to the state of the art in two directions. First, mining algorithms are restructured to glean the benefits of emerging commodity hardware. Second, we identify a set of useful patterns, and formulate algorithms for extracting them in log-linear time, enabling scalable performance on large datasets. We make several contributions towards data mining on emerging commodity hardware. First, we leverage the parallel disks, memory and width and large number of processors to mine for exact global patterns in tera-scale distributed data sets. We provide a 10-fold improvement to the existing state of the art in distributed mining. Second, we leverage the improved computational throughput of emerging CMPs to provide nearly-linear scale-ups for shared-memory parallel structure mining. Third, we explore data mining on the Cell processor, re-architecting clustering, classification and outlier detection to glean significant runtime improvements. In addition, we identify an important class of itemset patterns that can be extracted in log-linear time, improving significantly on the scalability afforded typically by frequent itemset mining algorithms. We then show how our technique can be used to compress large graphs, improving on the state of the art in web compression. Finally, we leverage the techniques developed to build a general-purpose placement service for distributed data mining centers. The placement has low complexity and provides highly scalable solutions to many common data mining queries.
Committee
Srinivasan Parthasarathy (Advisor)
Pages
264 p.
Subject Headings
Computer Science
Keywords
Data mining
;
CMPs
;
high performance
;
log-linear pattern mining
Recommended Citations
Refworks
EndNote
RIS
Mendeley
Citations
Buehrer, G. T. (2008).
Scalable mining on emerging architectures
[Doctoral dissertation, Ohio State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=osu1198866625
APA Style (7th edition)
Buehrer, Gregory.
Scalable mining on emerging architectures.
2008. Ohio State University, Doctoral dissertation.
OhioLINK Electronic Theses and Dissertations Center
, http://rave.ohiolink.edu/etdc/view?acc_num=osu1198866625.
MLA Style (8th edition)
Buehrer, Gregory. "Scalable mining on emerging architectures." Doctoral dissertation, Ohio State University, 2008. http://rave.ohiolink.edu/etdc/view?acc_num=osu1198866625
Chicago Manual of Style (17th edition)
Abstract Footer
Document number:
osu1198866625
Download Count:
1,883
Copyright Info
© 2007, all rights reserved.
This open access ETD is published by The Ohio State University and OhioLINK.