Skip to Main Content
 

Global Search Box

 
 
 
 

Files

ETD Abstract Container

Abstract Header

Scalable mining on emerging architectures

Buehrer, Gregory T

Abstract Details

2008, Doctor of Philosophy, Ohio State University, Computer and Information Science.
Data mining is the process of discovering interesting and previously unknown information from such stores. Recent advances in data collection technology have generated large-scale data stores. However, as the ability to collect and store data increases, our ability to make use of the data degrades. This degradation is due to two main issues. First, the utility of efficient serial algorithms to mine such data is often lowered because this data is distributed on multiple machines. Second, since the complexity of most mining algorithms is polynomial or even exponential with the input size, even on parallel machines the runtimes exceed practical limitations. This dissertation addresses the concerns of mining large data stores by providing improvements to the state of the art in two directions. First, mining algorithms are restructured to glean the benefits of emerging commodity hardware. Second, we identify a set of useful patterns, and formulate algorithms for extracting them in log-linear time, enabling scalable performance on large datasets. We make several contributions towards data mining on emerging commodity hardware. First, we leverage the parallel disks, memory and width and large number of processors to mine for exact global patterns in tera-scale distributed data sets. We provide a 10-fold improvement to the existing state of the art in distributed mining. Second, we leverage the improved computational throughput of emerging CMPs to provide nearly-linear scale-ups for shared-memory parallel structure mining. Third, we explore data mining on the Cell processor, re-architecting clustering, classification and outlier detection to glean significant runtime improvements. In addition, we identify an important class of itemset patterns that can be extracted in log-linear time, improving significantly on the scalability afforded typically by frequent itemset mining algorithms. We then show how our technique can be used to compress large graphs, improving on the state of the art in web compression. Finally, we leverage the techniques developed to build a general-purpose placement service for distributed data mining centers. The placement has low complexity and provides highly scalable solutions to many common data mining queries.
Srinivasan Parthasarathy (Advisor)
264 p.

Recommended Citations

Citations

  • Buehrer, G. T. (2008). Scalable mining on emerging architectures [Doctoral dissertation, Ohio State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=osu1198866625

    APA Style (7th edition)

  • Buehrer, Gregory. Scalable mining on emerging architectures. 2008. Ohio State University, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=osu1198866625.

    MLA Style (8th edition)

  • Buehrer, Gregory. "Scalable mining on emerging architectures." Doctoral dissertation, Ohio State University, 2008. http://rave.ohiolink.edu/etdc/view?acc_num=osu1198866625

    Chicago Manual of Style (17th edition)