A Grid-based Middleware for Scalable Processing of Remote Data

Glimcher, Leonid S.

osu1211302238.pdf (1.02 MB)

Author Info

Glimcher, Leonid S.

Permalink:

http://rave.ohiolink.edu/etdc/view?acc_num=osu1211302238

Year and Degree

2008, Doctor of Philosophy, Ohio State University, Computer and Information Science.

Abstract

As scientific simulations generate ever larger amounts of data, analyzing this data to gain insights into scientific phenomena is becoming increasingly challenging. With the emergence of grid computing, the analysis of large, geographically distributed scientific datasets, also referred to as distributed data-intensive science, has become an important area in recent years. We believe that a middleware supporting remote data mining would make the development of remote data analysis applications more efficient and less time-consuming, allowing the programmer to concentrate on specifying the processing to be performed on the data rather than on the efficiency of data retrieval or scalability.

In this thesis, we present the design and evaluation of a middleware that targets the mining of data resident in remote repositories and supports a high-level interface for developing data mining and scientific data processing applications. Our middleware, referred to as FREERIDE-G (FRamework for Rapid Implementation of Datamining Engines in Grids), is based on a precursor system, FREERIDE, which was created to provide run-time parallelization support for generalized reduction computations on locally stored data.
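To illustrate the kind of computation mentioned above, the following is a minimal sketch of a generalized reduction, not the FREERIDE-G API. It assumes the usual structure of such computations: each node folds its own data chunks into a private reduction object, and the per-node objects are then merged with an associative, commutative operation. The histogram application and all function names here are hypothetical.

```python
from collections import Counter
from functools import reduce


def local_reduction(chunk, num_bins, lo, hi):
    """Fold one chunk of data into a private reduction object (a histogram)."""
    hist = Counter()
    width = (hi - lo) / num_bins
    for x in chunk:
        # Clamp the index so that x == hi falls into the last bin.
        b = min(int((x - lo) / width), num_bins - 1)
        hist[b] += 1
    return hist


def global_reduction(partials):
    """Merge per-node reduction objects; Counter addition is associative and commutative."""
    return reduce(lambda a, b: a + b, partials, Counter())


def generalized_reduction(chunks, num_bins=4, lo=0.0, hi=1.0):
    # In a real deployment each chunk would be processed on a different node;
    # here the local reductions simply run one after another.
    partials = [local_reduction(c, num_bins, lo, hi) for c in chunks]
    return global_reduction(partials)
```

Because the merge is order-independent, the result does not depend on how the data is split across chunks or on the order in which partial results arrive, which is what makes this pattern straightforward to parallelize.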

In its final implementation, our middleware mines data resident on SRB-based servers and uses the Storage Resource Broker (a de facto standard for remote data access) for both data retrieval and delivery to the processing site. This implementation was evaluated using five data processing applications developed for our middleware. We also conducted an in-depth study of how the performance of the SRB-based implementation is affected by the size of the unit of remote I/O request, I/O concurrency, and the limited network bandwidth available for data transfer.

In order to make our middleware compliant with the grid computing standards, we have also integrated the compute node client component of our SRB-based implementation with Globus Toolkit and MPICH-G2. As a part of this work we evaluated the overhead of using the pre-WS components of the Globus Toolkit for middleware deployment, and found such overhead to be quite modest.

To facilitate the selection of dataset replicas and computing resources, an accurate performance prediction framework was also developed as a part of our middleware. Our approach to modeling performance breaks application execution time down into data retrieval, data communication, and data processing components, and leverages our familiarity with the structure of computation supported by FREERIDE-G. In addition, depending on where the data to be processed was generated or how it is shared, interesting load balancing and scheduling considerations may arise. Our middleware supports efficient processing of data from geographically distributed sources through a load-balancing resource allocation and scheduling algorithm, which minimizes the total time spent on processing the data. To solve this scheduling problem, we consider a weighted sum of two factors: a load balancing factor and a term that captures the amount of time processing nodes spend waiting for the data. The algorithm also supports data integration in cases of vertical partitioning.
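The two ideas in this paragraph can be sketched as follows. This is an illustrative reading of the abstract, not the thesis's actual model. The abstract states only that predicted execution time is split into retrieval, communication, and processing components, and that the scheduling objective is a weighted sum of a load balancing factor and a data-wait term. The concrete definitions below (imbalance as maximum load minus mean load, waiting as the sum of per-node wait times, the weight `alpha`) and all names are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Estimate:
    """Predicted cost of one (replica, compute resource) pairing, in seconds."""
    name: str
    retrieval_s: float
    communication_s: float
    processing_s: float

    @property
    def total_s(self) -> float:
        # Execution time is modeled as the sum of the three components.
        return self.retrieval_s + self.communication_s + self.processing_s


def pick_configuration(estimates):
    """Select the pairing with the lowest predicted total time."""
    return min(estimates, key=lambda e: e.total_s)


def schedule_cost(node_loads, wait_times, alpha=0.5):
    """Weighted sum of a load-imbalance term and a data-wait term.

    Both term definitions and the weight alpha are hypothetical.
    """
    mean_load = sum(node_loads) / len(node_loads)
    imbalance = max(node_loads) - mean_load  # assumed load balancing factor
    waiting = sum(wait_times)                # assumed total time spent waiting for data
    return alpha * imbalance + (1 - alpha) * waiting
```

A scheduler built on such an objective would evaluate `schedule_cost` for each candidate assignment and keep the cheapest one; the weight controls how much idle waiting is tolerated in exchange for an even load.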

Committee

Gagan Agrawal, PhD (Advisor)
P. Sadayappan, PhD (Committee Member)
Hakan Ferhatosmanoglu, PhD (Committee Member)

Pages

214 p.

Subject Headings

Computer Science

Keywords

grid computing; middleware; remote data mining; remote data processing; data grids

Recommended Citations

APA Style (7th edition)
Glimcher, L. S. (2008). A Grid-based Middleware for Scalable Processing of Remote Data [Doctoral dissertation, Ohio State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=osu1211302238

MLA Style (8th edition)
Glimcher, Leonid. A Grid-based Middleware for Scalable Processing of Remote Data. 2008. Ohio State University, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=osu1211302238.

Chicago Manual of Style (17th edition)
Glimcher, Leonid. "A Grid-based Middleware for Scalable Processing of Remote Data." Doctoral dissertation, Ohio State University, 2008. http://rave.ohiolink.edu/etdc/view?acc_num=osu1211302238

Document number:

osu1211302238

Download Count:

1,054
