Memory- and knowledge-conscious data mining

Ghoting, Amol

2007, Doctor of Philosophy, Ohio State University, Computer and Information Science.
Advances in data collection and storage technologies have allowed organizations to collect increasing amounts of data. Spurred by these advances, the field of knowledge discovery in databases has emerged. The main challenge in the knowledge discovery process is to extract knowledge and insight from massive datasets quickly and efficiently. This process is iterative and involves a human in the loop. Therefore, to facilitate effective data understanding, it is imperative to minimize the response time to a user's query. To address this challenge, research efforts have largely focused on reducing the computation required to process a single data mining query. However, pursuing this direction alone is insufficient. In this dissertation, we explore two new directions to improve the performance of data mining algorithms. The first direction attempts to improve performance by understanding and improving the memory system performance of data mining algorithms. The second direction attempts to improve performance by redesigning a data mining algorithm such that it reuses computation.
In the context of memory-conscious data mining, first, we present the results of a study of the memory system performance of data mining algorithms designed to operate over static datasets. Second, using the knowledge gleaned from that investigation, we look at improving the cache performance of frequent pattern mining algorithms. We expect that the presented methodology will be useful in improving the performance of other data mining algorithms as well. Third, we present a scheduling scheme that is cognizant of the trade-off between response time and memory usage when processing and mining data streams. This scheme allows us to better use the memory system when mining distributed data streams.
In the context of knowledge-conscious data mining, first, we show how one can redesign exploratory kMeans clustering so that it exposes and reuses repeated computation, both across iterations of a single kMeans query and across multiple kMeans queries. Second, we present the design of a knowledge caching service for data mining algorithms. This service is easy to use, scalable, and allows for the reuse of computation across multiple users of a data mining system.
Srinivasan Parthasarathy (Advisor)

Recommended Citations

  • Ghoting, A. (2007). Memory- and knowledge-conscious data mining [Doctoral dissertation, Ohio State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=osu1186979749

    APA Style (7th edition)

  • Ghoting, Amol. Memory- and knowledge-conscious data mining. 2007. Ohio State University, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=osu1186979749.

    MLA Style (8th edition)

  • Ghoting, Amol. "Memory- and knowledge-conscious data mining." Doctoral dissertation, Ohio State University, 2007. http://rave.ohiolink.edu/etdc/view?acc_num=osu1186979749

    Chicago Manual of Style (17th edition)