Recent advances in digital sensor technology and numerical simulations
of real-world phenomena are resulting in the acquisition of unprecedented
amounts of raw digital data. Terms like ‘data explosion’ and ‘data tsunami’
have come to describe the uncontrolled rate at which scientific datasets
are generated by automated sources ranging from digital microscopes and
telescopes to in-silico models simulating the complex dynamics of physical
and biological processes. Scientists in various domains now have secure,
affordable access to petabyte-scale observational data gathered over time,
the analysis of which is crucial to scientific discovery.
The availability of commodity components has fostered the development of
large distributed systems with high-performance computing resources to
support the execution requirements of scientific data analysis applications.
Increased levels of middleware support over the years have aimed to provide
high scalability of application execution on these systems. However, the
high-resolution, multi-dimensional nature of scientific datasets and the
complexity of analysis requirements present challenges to efficient
application execution on such systems. Traditional brute-force analysis
techniques to extract useful information from scientific datasets
may no longer meet desired performance levels at extreme data scales.
This thesis builds on a comprehensive study of multi-dimensional data
analysis applications at large data scales, and identifies a set of factors,
or parameters, of this class of applications that can be customized
in domain-specific ways to obtain substantial improvements in performance.
A useful property of these applications
is their ability to operate at multiple performance levels based on a set of
trade-off parameters, while providing different levels of quality-of-service
(QoS) specific to the application instance. To realize the performance benefits
brought about by such factors, applications must be configured for execution
in ways specific to each target system. Middleware support for such
domain-specific configuration is limited, and there is typically no integration
across middleware layers to this end. Low-level manual configuration of
applications within a large space of solutions is error-prone and tedious.
This thesis proposes an approach for the development and execution of large
scientific multi-dimensional data analysis applications that takes multiple
performance parameters into account and supports the notion of domain-specific
configuration-as-a-service.
My research identifies various aspects that go into the creation
of a framework for user-guided, system-directed performance optimizations
for such applications. The framework seeks to achieve this goal by
integrating software modules that (i) provide a unified, homogeneous model
for the high-level specification of any conceptual knowledge that
may be used to configure applications within a domain, (ii) perform
application configuration in response to user directives, i.e., use the
specifications to translate high-level requirements into low-level execution
plans optimized for a given system, and (iii) carry out the execution plans
on the underlying system in an efficient and scalable manner.
A prototype implementation of the framework that integrates several middleware
layers is used for evaluating our approach. Experimental results gathered for
real-world application scenarios from the domains of astronomy and biomedical
imaging demonstrate the utility of our framework in meeting scientific
performance requirements at very large data scales.