Automatic and efficient data virtualization system for scientific datasets

WENG, LI

osu1154717945.pdf (958.04 KB)

Author Info

WENG, LI

Permalink:

http://rave.ohiolink.edu/etdc/view?acc_num=osu1154717945

Year and Degree

2006, Doctor of Philosophy, Ohio State University, Computer and Information Science.

Abstract

Efficient access to and high-performance processing of scientific datasets are challenging for several reasons. First, scientific datasets are typically stored as binary or character flat files. Second, data servers need to efficiently serve an increasing number of clients and types of queries as more data come online. To address these issues, we concentrated on the following areas: 1) realizing data virtualization through automatically generated data services over scientific datasets; 2) supporting data analysis processing by means of SQL-3 queries and aggregations for the data virtualization system; 3) designing new techniques for efficient execution of data analysis queries using space-partitioned partial replicas; 4) generalizing the functionality of the replica selection module with two significant extensions; 5) exploring the performance optimization potential of multiple queries over massive datasets.

In view of the first challenge, we developed a Data Virtualization System based on meta-data descriptors and compiler techniques. We designed a meta-data description language for specifying low-level characteristics of datasets. A scientist can explore a subset of interest and apply complex processing to it using declarative SQL-3 queries and aggregations. Compiler algorithms that use the meta-data descriptor and analyze aggregations were developed to automatically generate efficient data subsetting and data aggregation services.

In view of the second challenge, we investigated one type of optimization technique, partial replication. We proposed and implemented a greedy algorithm based on a cost metric to choose the best combination of partial replicas. Moreover, to generalize the work to a more realistic environment, we extended it to range and aggregate queries with both space-partitioned and attribute-partitioned partial replicas, which may be stored unevenly or uniformly across distributed storage units. Using a new cost metric, we devised a composite replica selection algorithm comprising a dynamic programming strategy and greedy strategies to solve this problem. Finally, we further explored the optimization potential of executing multiple queries over massive datasets. These techniques are implemented in a Replica Selection Module that is tightly coupled with the overall architecture of our Automatic Data Virtualization System.
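To make the idea of greedy, cost-based replica selection concrete, the following is a minimal sketch and not the dissertation's actual algorithm. It assumes a one-dimensional query range split into integer chunks and a made-up cost metric (a fixed open cost plus a per-chunk read cost). The names `Replica`, `select_replicas`, `open_cost`, and `cost_per_chunk` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    """Hypothetical partial replica covering the half-open chunk range [start, end)."""
    name: str
    start: int
    end: int
    cost_per_chunk: float  # assumed read cost per chunk
    open_cost: float       # assumed fixed cost of using this replica at all


def select_replicas(query_start, query_end, replicas, original_cost_per_chunk):
    """Greedily cover the query range [query_start, query_end).

    Each round picks the replica with the lowest cost per newly covered
    chunk, as long as it beats reading those chunks from the original
    dataset. Chunks left uncovered are read from the original dataset.
    Returns (plan, total_cost); plan maps a source name to its chunks.
    """
    uncovered = set(range(query_start, query_end))
    plan = {}
    total_cost = 0.0
    while uncovered:
        best = None  # (cost per chunk, replica, chunks gained, cost)
        for r in replicas:
            gain = uncovered & set(range(r.start, r.end))
            if not gain:
                continue
            cost = r.open_cost + r.cost_per_chunk * len(gain)
            ratio = cost / len(gain)
            if ratio < original_cost_per_chunk and (best is None or ratio < best[0]):
                best = (ratio, r, gain, cost)
        if best is None:
            break  # no remaining replica is cheaper than the original data
        _, replica, gain, cost = best
        plan[replica.name] = sorted(gain)
        total_cost += cost
        uncovered -= gain
    if uncovered:
        plan["original"] = sorted(uncovered)
        total_cost += original_cost_per_chunk * len(uncovered)
    return plan, total_cost


replicas = [
    Replica("A", 0, 6, cost_per_chunk=1.0, open_cost=2.0),
    Replica("B", 4, 10, cost_per_chunk=0.5, open_cost=1.0),
]
plan, cost = select_replicas(0, 12, replicas, original_cost_per_chunk=3.0)
print(plan, cost)
```

In this example the sketch first takes replica B for chunks 4 to 9, then replica A for chunks 0 to 3, and reads the remaining chunks 10 and 11 from the original dataset. A greedy choice of this kind is not guaranteed to be optimal in general, which is one reason a composite strategy that also uses dynamic programming can be attractive.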

Committee

Gagan Agrawal (Advisor)

Pages

134 p.

Subject Headings

Computer Science

WENG, L. (2006). Automatic and efficient data virtualization system for scientific datasets [Doctoral dissertation, Ohio State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=osu1154717945
APA Style (7th edition)
WENG, LI. Automatic and efficient data virtualization system for scientific datasets. 2006. Ohio State University, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=osu1154717945.
MLA Style (8th edition)
WENG, LI. "Automatic and efficient data virtualization system for scientific datasets." Doctoral dissertation, Ohio State University, 2006. http://rave.ohiolink.edu/etdc/view?acc_num=osu1154717945
Chicago Manual of Style (17th edition)

Document number:

osu1154717945

Download Count:

658
