Efficient data scheduling for real-time large-scale data-intensive distributed applications

Eltayeb, Mohammed Soleiman


2004, Doctor of Philosophy, Ohio State University, Electrical Engineering.
Data staging is a method for organizing data transfer in heterogeneous distributed computing environments. Real-time, large-scale, data-intensive applications are distributed applications that frequently generate, use, and access large data sets under completion-time constraints. These applications are emerging in many areas of science and engineering. Their data-intensive operations, including transfers of large data sets (on the order of hundreds of gigabytes), can quickly consume network and compute resources and thereby degrade overall application performance in these distributed networked environments. This research proposes several solutions that enable efficient data staging to address this problem in heterogeneous distributed computing. Our solutions primarily maximize the satisfiability of the applications through efficient data scheduling heuristics for such systems. Two optimization models are proposed for this maximization: one for optimizing overall satisfiability and the other for optimizing autonomous applications. In this research we introduce and develop deterministic solutions to the problem. We propose three main algorithms: two for the dynamic setting and one for the static setting. Our main static data scheduling heuristic, called Concurrent Scheduling over Extended Partial Path (CS/EPP), is developed from an assumed data staging model for optimizing overall satisfiability. CS/EPP is based on EPP, an independent heuristic that computes schedules dynamically for online input data sets. The resulting schedules are adjusted as the various data sets are staged in the system. CS/EPP also allows concurrent staging of various data sets and hence improves the optimization procedure.
A static non-greedy heuristic called Blocking Analysis Concurrent Scheduling (BACS) is also proposed in this research. It improves data staging performance, at the cost of higher computational complexity, when applications arrive in batches. Two heuristic methods are proposed for BACS that allow the cost of the algorithm to be adjusted. We conclude this dissertation by pointing out avenues for extending this research: investigating the stochastic dynamic approach to data staging and the task mapping problem for large-scale data-intensive applications. The stochastic dynamic approach assumes that the application arrival process can be modeled as a stochastic process. We propose a Stochastic Data Scheduling (SDS) algorithm that uses estimates of the delays on the paths or routes of the staged data sets. These estimates are produced by modeling the staging environment as a network of finite queues and analyzing the queues along these paths.
Fusun Özgüner (Advisor)
Eylem Ekici (Other)
Chang-Gun Lee (Other)

Recommended Citations

  • Eltayeb, M. S. (2004). Efficient data scheduling for real-time large-scale data-intensive distributed applications [Doctoral dissertation, Ohio State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=osu1095719463

    APA Style (7th edition)

  • Eltayeb, Mohammed. Efficient data scheduling for real-time large-scale data-intensive distributed applications. 2004. Ohio State University, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=osu1095719463.

    MLA Style (8th edition)

  • Eltayeb, Mohammed. "Efficient data scheduling for real-time large-scale data-intensive distributed applications." Doctoral dissertation, Ohio State University, 2004. http://rave.ohiolink.edu/etdc/view?acc_num=osu1095719463

    Chicago Manual of Style (17th edition)