Files
osu1154717945.pdf (958.04 KB)
Automatic and efficient data virtualization system for scientific datasets
Author Info
WENG, LI
Permalink:
http://rave.ohiolink.edu/etdc/view?acc_num=osu1154717945
Year and Degree
2006, Doctor of Philosophy, Ohio State University, Computer and Information Science.
Abstract
Efficient access to and high-performance processing of scientific datasets are challenging for a number of reasons. First, scientific datasets are typically stored as binary or character flat files. Second, data servers need to efficiently serve an increasing number of clients and types of queries as more data come online. To address these issues, we concentrated on the following areas: 1) Realizing data virtualization through automatically generated data services over scientific datasets. 2) Supporting data analysis processing by means of SQL-3 queries and aggregations in the data virtualization system. 3) Designing new techniques for efficient execution of data analysis queries using space-partitioned partial replicas. 4) Generalizing the functionality of the replica selection module through two significant extensions. 5) Exploring the performance optimization potential of multiple queries over massive datasets.
In view of the first challenge, we have developed a Data Virtualization System based on a meta-data descriptor and compiler techniques. We designed a meta-data description language that is used for specifying low-level characteristics of datasets. A scientist can explore a subset of interest and apply complex processing to it using declarative SQL-3 queries and aggregations. Compiler algorithms that use the meta-data descriptor and analyze aggregations were developed to automatically generate efficient data subsetting and data aggregation services.
In view of the second challenge, we investigated one type of optimization technique, partial replication. We proposed and implemented a greedy algorithm based on a cost metric to choose the best combination of partial replicas. Moreover, to generalize the work to a more realistic environment, we extended it to range and aggregate queries with both space-partitioned and attribute-partitioned partial replicas, which can be stored unevenly or uniformly across distributed storage units. Using a new cost metric, a composite replica selection algorithm comprising a set of dynamic programming and greedy strategies was devised to solve this problem. Finally, we further explored the optimization potential of executing multiple queries over massive datasets. These techniques are implemented in a Replica Selection Module that is tightly coupled with the overall architecture of our Automatic Data Virtualization System.
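The abstract does not state the cost metric or the details of the greedy replica selection, so the following is only a minimal sketch of the general idea under simplifying assumptions: a one-dimensional range query, replicas that each cover one contiguous range, and a linear per-unit read cost. The names `Replica`, `unit_cost`, `base_unit_cost`, and `greedy_select` are hypothetical and are not taken from the dissertation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    name: str
    lo: int            # inclusive start of the range this replica covers
    hi: int            # exclusive end of the range this replica covers
    unit_cost: float   # assumed cost of reading one unit from this replica


def subtract(segments, lo, hi):
    """Remove [lo, hi) from a list of disjoint [a, b) segments."""
    out = []
    for a, b in segments:
        if hi <= a or b <= lo:      # no overlap, keep the segment whole
            out.append((a, b))
            continue
        if a < lo:                  # piece left of the removed range
            out.append((a, lo))
        if hi < b:                  # piece right of the removed range
            out.append((hi, b))
    return out


def overlap(segments, lo, hi):
    """Total length of [lo, hi) that falls inside the given segments."""
    return sum(max(0, min(b, hi) - max(a, lo)) for a, b in segments)


def greedy_select(query, replicas, base_unit_cost):
    """Greedily pick replicas that save the most cost on the uncovered part.

    Returns (plan, leftover, total_cost): plan lists (replica name, units
    served), leftover is the number of units still read from the original
    dataset, and total_cost is the cost under the assumed linear model.
    """
    uncovered = [query]
    remaining = list(replicas)
    plan = []
    total_cost = 0.0

    def saving(r):
        # Cost saved by serving the still-uncovered overlap from replica r.
        return overlap(uncovered, r.lo, r.hi) * (base_unit_cost - r.unit_cost)

    while uncovered and remaining:
        best = max(remaining, key=saving)
        if saving(best) <= 0:       # no replica helps any more
            break
        covered = overlap(uncovered, best.lo, best.hi)
        plan.append((best.name, covered))
        total_cost += covered * best.unit_cost
        uncovered = subtract(uncovered, best.lo, best.hi)
        remaining.remove(best)

    leftover = sum(b - a for a, b in uncovered)
    total_cost += leftover * base_unit_cost
    return plan, leftover, total_cost


if __name__ == "__main__":
    replicas = [
        Replica("A", 0, 60, 0.5),
        Replica("B", 40, 100, 0.25),
        Replica("C", 200, 300, 0.1),
    ]
    print(greedy_select((0, 100), replicas, base_unit_cost=1.0))
```

A greedy choice like this is cheap to compute but is not guaranteed to find the cheapest combination in general, which is consistent with the abstract's later move to a composite of dynamic programming and greedy strategies.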
Committee
Gagan Agrawal (Advisor)
Pages
134 p.
Subject Headings
Computer Science
Recommended Citations
APA Style (7th edition)
WENG, L. (2006). Automatic and efficient data virtualization system for scientific datasets [Doctoral dissertation, Ohio State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=osu1154717945
MLA Style (8th edition)
WENG, LI. Automatic and efficient data virtualization system for scientific datasets. 2006. Ohio State University, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=osu1154717945.
Chicago Manual of Style (17th edition)
WENG, LI. "Automatic and efficient data virtualization system for scientific datasets." Doctoral dissertation, Ohio State University, 2006. http://rave.ohiolink.edu/etdc/view?acc_num=osu1154717945
Document number:
osu1154717945
Download Count:
658
Copyright Info
© 2006, all rights reserved.
This open access ETD is published by The Ohio State University and OhioLINK.