Data-Aware Distributed Batch Scheduling

Author(s):  
Tevfik Kosar

As the data requirements of scientific distributed applications grow, access to remote data becomes the main performance bottleneck for these applications. Traditional distributed computing systems closely couple data placement and computation, and treat data placement as a side effect of computation. Data placement is either embedded in the computation, delaying it, or performed by simple scripts that do not have the privileges of a job. The inadequacy of traditional systems and existing CPU-oriented schedulers for the complex problem of data handling has given rise to a new class of schedulers: data-aware schedulers. This chapter discusses the challenges in this area as well as future trends, with a focus on the Stork case study.
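The core idea of data-aware scheduling, treating data placement as a first-class job rather than as a side effect of computation, can be illustrated with a small sketch. The `Job` class, `schedule` function, the `"transfer"`/`"compute"` kinds, and the `max_concurrent_transfers` limit below are all hypothetical illustrations chosen for this sketch; this is not Stork's interface or scheduling algorithm. It assumes a simple dependency graph in which compute jobs wait for their stage-in transfers, and transfers are throttled separately from computation.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    name: str
    kind: str  # hypothetical: "transfer" (data placement) or "compute"
    deps: list = field(default_factory=list)  # names of jobs that must finish first


def schedule(jobs, max_concurrent_transfers=2):
    """Group jobs into execution rounds (a sketch, not Stork's algorithm).

    Data placement jobs are scheduled as first-class jobs with their own
    concurrency limit instead of being hidden inside compute jobs.
    """
    if max_concurrent_transfers < 1:
        raise ValueError("max_concurrent_transfers must be at least 1")
    pending = {j.name: j for j in jobs}
    done = set()
    rounds = []
    while pending:
        # A job is ready once all of its dependencies have completed.
        ready = [j for j in pending.values() if all(d in done for d in j.deps)]
        if not ready:
            raise ValueError("cyclic or unsatisfiable dependencies")
        # Throttle data placement separately from computation.
        transfers = [j for j in ready if j.kind == "transfer"][:max_concurrent_transfers]
        computes = [j for j in ready if j.kind == "compute"]
        batch = transfers + computes
        rounds.append([j.name for j in batch])
        for j in batch:
            done.add(j.name)
            del pending[j.name]
    return rounds


# Hypothetical workflow: stage in three inputs, analyze them, stage out the result.
example_jobs = [
    Job("stage_in_a", "transfer"),
    Job("stage_in_b", "transfer"),
    Job("stage_in_c", "transfer"),
    Job("analyze", "compute", ["stage_in_a", "stage_in_b", "stage_in_c"]),
    Job("stage_out", "transfer", ["analyze"]),
]

if __name__ == "__main__":
    for i, names in enumerate(schedule(example_jobs), start=1):
        print(i, names)
```

With a limit of two concurrent transfers, the third stage-in job is deferred to a second round, and the compute job runs only after all of its inputs are in place. A real data-aware scheduler would also weigh factors such as storage capacity, network load, and transfer failures, which this sketch deliberately omits.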

1998, Vol 1 (1)
Author(s):  
Angela Di Serio

Distributed computing systems are one of the current research areas in computer science. In distributed systems, software is partitioned into modules that are executed concurrently on a number of processors. A major difficulty with distributed and parallel computing systems has been their ease of use: programmers lack a clear methodology for using these systems effectively. This work assesses the viability of using analytic performance analysis to help evaluate candidate algorithms by applying it to a case study. Such analysis helps estimate the total execution time and the optimal number of processors.
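To show how an analytic model can yield both a total execution time estimate and an optimal processor count, here is a minimal sketch. The model itself is an assumption made for illustration, not the one used in this work: a fixed serial time, a parallel part that divides evenly across p processors, and a communication overhead that grows linearly with p, so that T(p) = t_serial + t_parallel / p + c_comm * (p - 1). Setting the derivative to zero gives a continuous optimum of p* = sqrt(t_parallel / c_comm). The function names and parameters are hypothetical.

```python
import math


def predicted_time(p, t_serial, t_parallel, c_comm):
    """Hypothetical model: serial part + evenly divided parallel part
    + communication overhead growing linearly with the processor count."""
    return t_serial + t_parallel / p + c_comm * (p - 1)


def continuous_optimum(t_parallel, c_comm):
    """Minimizer of the model over real p: solves -t_parallel/p**2 + c_comm = 0."""
    return math.sqrt(t_parallel / c_comm)


def optimal_processors(t_serial, t_parallel, c_comm, p_max):
    """Return (p, predicted time) minimizing the model over p = 1..p_max."""
    best = min(range(1, p_max + 1),
               key=lambda p: predicted_time(p, t_serial, t_parallel, c_comm))
    return best, predicted_time(best, t_serial, t_parallel, c_comm)


if __name__ == "__main__":
    # Illustrative numbers only (arbitrary time units), not measurements.
    print(continuous_optimum(1000, 2.5))
    print(optimal_processors(10, 1000, 2.5, 64))
```

With the illustrative values above, the continuous optimum is 20 processors and the integer search also selects 20, predicting a total time of 107.5 units; adding processors beyond that point costs more in communication than it saves in computation. If fewer processors are available (for example `p_max=8`), the model simply recommends using all of them.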
