Files
osu1186979749.pdf (2.24 MB)
Memory- and knowledge-conscious data mining
Author Info
Ghoting, Amol
Permalink:
http://rave.ohiolink.edu/etdc/view?acc_num=osu1186979749
Abstract Details
Year and Degree
2007, Doctor of Philosophy, Ohio State University, Computer and Information Science.
Abstract
Advances in data collection and storage technologies have allowed organizations to collect ever-increasing amounts of data. Spurred by these advances, the field of knowledge discovery in databases has emerged. The main challenge in the knowledge discovery process is to extract knowledge and insight from massive datasets quickly and efficiently. Because this process is iterative and involves a human in the loop, effective data understanding requires minimizing the response time to each user query. Research efforts have largely addressed this challenge by reducing the computation required to process a single data mining query. However, pursuing this direction alone is insufficient. In this dissertation, we explore two new directions for improving the performance of data mining algorithms. The first aims to improve performance by understanding and optimizing the memory system behavior of data mining algorithms. The second aims to improve performance by redesigning data mining algorithms so that they reuse computation.
In the context of memory-conscious data mining, we first present a study of the memory system performance of data mining algorithms designed to operate over static datasets. Second, drawing on the insights from this study, we improve the cache performance of frequent pattern mining algorithms; we expect the presented methodology to be useful for improving the performance of other data mining algorithms as well. Third, we present a scheduling scheme for processing and mining data streams that accounts for the trade-off between response time and memory usage. This scheme makes better use of the memory system when mining distributed data streams.
In the context of knowledge-conscious data mining, we first show how exploratory kMeans clustering can be redesigned to expose and reuse repeated computation, both across the iterations of a single kMeans query and across multiple kMeans queries. Second, we present the design of a knowledge caching service for data mining algorithms that is easy to use, scalable, and enables the reuse of computation across multiple users of a data mining system.
Committee
Srinivasan Parthasarathy (Advisor)
Subject Headings
Computer Science
Recommended Citations
APA Style (7th edition)
Ghoting, A. (2007). Memory- and knowledge-conscious data mining [Doctoral dissertation, Ohio State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=osu1186979749
MLA Style (8th edition)
Ghoting, Amol. Memory- and knowledge-conscious data mining. 2007. Ohio State University, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=osu1186979749.
Chicago Manual of Style (17th edition)
Ghoting, Amol. "Memory- and knowledge-conscious data mining." Doctoral dissertation, Ohio State University, 2007. http://rave.ohiolink.edu/etdc/view?acc_num=osu1186979749
Document number:
osu1186979749
Download Count:
1,039
Copyright Info
© 2007, all rights reserved.
This open access ETD is published by The Ohio State University and OhioLINK.