A Survey of Scheduling and Management Techniques for Data-Intensive Application Workflows

Author(s):  
Suraj Pandey ◽  
Rajkumar Buyya

This chapter presents a comprehensive survey of algorithms, techniques, and frameworks used for scheduling and management of data-intensive application workflows. Many complex scientific experiments are expressed as workflows so that they can be executed in a structured, repeatable, controlled, scalable, and automated way. This chapter focuses on workflows whose tasks process huge amounts of data, usually ranging from hundreds of megabytes to petabytes. Scientists already use Grid systems that schedule these workflows onto globally distributed resources to optimize various objectives: minimizing the total makespan of the workflow, minimizing the cost and usage of network bandwidth, minimizing the cost of computation and storage, meeting the application's deadline, and so forth. This chapter lists and describes the techniques each of these systems uses to process huge amounts of data. A survey of workflow management techniques is useful for understanding how Grid systems work, and it provides insights into the performance optimization of scientific applications with data-intensive workloads.
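The makespan objective can be made concrete with a minimal sketch of a greedy list scheduler: tasks are visited in dependency order, and each is placed on the resource that gives it the earliest finish time, with a transfer delay charged whenever a parent's output must move between resources. This is not one of the surveyed algorithms; the function `schedule`, the work/speed cost model, and the single uniform `bandwidth` value are assumptions made for illustration.

```python
from graphlib import TopologicalSorter


def schedule(tasks, deps, data, speeds, bandwidth):
    """Greedy earliest-finish-time list scheduling of a workflow DAG (illustrative).

    tasks:     {task: work units}
    deps:      {task: set of parent tasks}
    data:      {(parent, child): megabytes transferred along that edge}
    speeds:    {resource: work units per second}
    bandwidth: MB/s between two distinct resources (assumed uniform)
    """
    ready_at = {r: 0.0 for r in speeds}  # time at which each resource becomes free
    placed = {}                          # task -> (resource, finish time)
    graph = {t: deps.get(t, set()) for t in tasks}
    for t in TopologicalSorter(graph).static_order():
        best = None
        for r, speed in speeds.items():
            # The task may start only once every parent's output is available on r.
            arrival = 0.0
            for p in graph[t]:
                p_res, p_finish = placed[p]
                transfer = 0.0 if p_res == r else data.get((p, t), 0.0) / bandwidth
                arrival = max(arrival, p_finish + transfer)
            finish = max(ready_at[r], arrival) + tasks[t] / speed
            if best is None or finish < best[1]:
                best = (r, finish)
        placed[t] = best
        ready_at[best[0]] = best[1]
    makespan = max(finish for _, finish in placed.values())
    return placed, makespan


if __name__ == "__main__":
    # A diamond workflow A -> {B, C} -> D on two equally fast resources.
    placed, makespan = schedule(
        {"A": 10, "B": 20, "C": 20, "D": 10},
        {"B": {"A"}, "C": {"A"}, "D": {"B", "C"}},
        {("A", "B"): 100, ("A", "C"): 100, ("B", "D"): 100, ("C", "D"): 100},
        {"r1": 1.0, "r2": 1.0},
        10.0,
    )
    print(makespan)  # prints 50.0
```

In this example, running B and C in parallel on different resources costs a 10-second transfer but still beats the 60 seconds needed to run all four tasks on one resource; with much larger intermediate data the same greedy rule would keep tasks together, which is the data-versus-parallelism trade-off the surveyed systems optimize.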

2013 ◽  
pp. 1170-1190
Author(s):  
Suraj Pandey ◽  
Rajkumar Buyya



Author(s):  
Simab Hasan Rizvi

In today's age of terascale computing, applications have become more data-intensive than ever. The growing volume of data produced by applications, which now tackle ever-larger problems, has fuelled the need for efficient data management. This paper evaluates a technique for managing large volumes of data called Content Addressable Storage (CAS). The evaluation focuses on the benefits and drawbacks of using CAS, namely: (i) improved application performance through lockless, lightweight synchronization of access to shared storage data; (ii) improved cache performance; (iii) increased storage capacity; and (iv) increased network bandwidth. The presented design of a CAS-based file store, which provides lightweight, lockless, user-defined consistency semantics, significantly improves storage performance: it shows a 28% increase in read bandwidth and a 13% increase in write bandwidth over a popular file system in common use. The paper also estimates the potential benefits of using CAS for a virtual machine and discusses a mobility application for active use and public deployment.
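The core idea of CAS, that a block's address is derived from its content, can be shown with a minimal in-memory sketch. It assumes SHA-256 digests and fixed-size blocks; the class `ContentStore` and its methods are hypothetical and do not reproduce the paper's file-store design. It illustrates two of the listed benefits: identical blocks are stored once (capacity), and a stored block can never change under a given address, so readers need no lock to read it.

```python
import hashlib


class ContentStore:
    """Toy in-memory content-addressable store (illustrative, not the paper's design)."""

    def __init__(self):
        # digest -> block bytes; a stored block is never modified afterwards
        self._blocks = {}

    def put(self, block: bytes) -> str:
        # The address is the SHA-256 digest of the content, so identical
        # blocks share one address and are stored only once (deduplication).
        digest = hashlib.sha256(block).hexdigest()
        self._blocks.setdefault(digest, block)
        return digest

    def get(self, digest: str) -> bytes:
        block = self._blocks[digest]
        # Re-hashing lets a reader verify integrity; because the block behind
        # a given address can never change, reading it needs no lock.
        if hashlib.sha256(block).hexdigest() != digest:
            raise ValueError(f"corrupt block {digest}")
        return block

    def put_file(self, data: bytes, block_size: int = 4096) -> list:
        # A file is represented by the ordered list ("recipe") of its block addresses.
        return [self.put(data[i:i + block_size]) for i in range(0, len(data), block_size)]

    def get_file(self, recipe: list) -> bytes:
        return b"".join(self.get(d) for d in recipe)

    def __len__(self) -> int:
        # Number of unique blocks actually stored.
        return len(self._blocks)
```

For example, storing a 12 KiB file made of two identical 4 KiB blocks followed by a different one yields a three-entry recipe but only two stored blocks. A real system would still need a consistency policy for the mutable mapping from file names to recipes; this sketch leaves that out.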


2021 ◽  
Vol 2021 ◽  
pp. 1-12
Author(s):  
Xiaoxiao Shi

As the Internet enters all walks of life, its continued development and expanding usage demand better performance, especially for 5G networks that adopt the NAS networking mode. Parts of the network lack the bandwidth to fully support current demand, which causes network fluctuations and other problems. This paper proposes a method for optimizing the topology of the underlying layer of the communication network whose outage performance is close to that of the optimal routing scheme. Specifically, paths in areas with poor network conditions are first optimized using the Viterbi algorithm. Then, the network element nodes on each path are optimized using a Bayesian recommendation algorithm to achieve a reasonable traffic distribution. Dual planning with an improved Viterbi algorithm is used to plan the main and standby paths, and a Bayesian recommendation algorithm based on average values is then used to optimize the network elements. In this way, overall performance optimization can be achieved efficiently.
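The abstract does not say how the Viterbi algorithm is mapped onto the network, so the following is only a sketch under stated assumptions: candidate relay nodes are grouped into stages between a source and a destination, each usable link between consecutive stages has a cost (for instance, one reflecting congestion), and the standard Viterbi dynamic program finds the minimum-cost path. The standby path is obtained by rerunning the search with the main path's intermediate nodes excluded; this is an assumed reading of "dual planning", not the paper's improved algorithm. The function `viterbi_path`, the stage layout, and the costs are hypothetical, and the Bayesian step is not illustrated.

```python
import math


def viterbi_path(stages, link_cost, excluded=frozenset()):
    """Minimum-cost path through a layered graph, choosing one node per stage.

    stages:    list of lists of node names; stages[0] holds the source and
               stages[-1] the destination.
    link_cost: {(u, v): cost} for usable links between consecutive stages;
               missing pairs are treated as unusable.
    excluded:  nodes that must not be used (e.g. when planning a standby path).
    """
    best = {n: (0.0, None) for n in stages[0]}  # node -> (cost so far, predecessor)
    history = [best]
    for layer in stages[1:]:
        nxt = {}
        for v in layer:
            if v in excluded:
                continue
            # Keep only the cheapest way of reaching v from the previous stage.
            cands = [(c + link_cost[(u, v)], u)
                     for u, (c, _) in best.items() if (u, v) in link_cost]
            if cands:
                nxt[v] = min(cands)
        best = nxt
        history.append(best)
    if not best:
        return math.inf, None
    node, (cost, _) = min(best.items(), key=lambda kv: kv[1][0])
    path = [node]
    for layer in reversed(history[1:]):  # backtrack through stored predecessors
        node = layer[node][1]
        path.append(node)
    return cost, path[::-1]


if __name__ == "__main__":
    stages = [["S"], ["a", "b"], ["c", "d"], ["T"]]
    link_cost = {("S", "a"): 1, ("S", "b"): 2, ("a", "c"): 1, ("a", "d"): 4,
                 ("b", "c"): 3, ("b", "d"): 1, ("c", "T"): 1, ("d", "T"): 2}
    main_cost, main_path = viterbi_path(stages, link_cost)
    standby = viterbi_path(stages, link_cost, excluded=set(main_path[1:-1]))
    print(main_path, standby[1])  # prints ['S', 'a', 'c', 'T'] ['S', 'b', 'd', 'T']
```

Because each stage keeps only the best predecessor for every node, the search costs time proportional to the number of links between stages rather than the number of complete paths.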


2014 ◽  
Vol 573 ◽  
pp. 571-575
Author(s):  
M.H. Anandbabu ◽  
B. Palanichelvam ◽  
R. Suganya

Grid computing is a major research area in which distributed resources are used. In scheduling, the biggest challenge is obtaining an optimal solution for the jobs submitted to the grid. Because large subtasks require time-intensive computation, this paper introduces a new fault recovery mechanism into grid systems, together with a thorough study of grid services. We propose a new algorithm that takes these factors into account. In our proposed algorithm, Recovery Mutual Scheduling, a catalog is used that is responsible for periodically saving state. Consequently, the throughput of the system is increased by this localized approach.
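The abstract gives few details of Recovery Mutual Scheduling, so the sketch below shows only the general checkpoint/restart idea behind periodically saving state: a long-running job records its progress every `interval` steps, and after a failure it is resubmitted from the last saved step instead of from the beginning. The names `CheckpointStore`, `run_job`, `run_with_recovery`, and the simulated failure are hypothetical, and an in-memory dictionary stands in for stable storage.

```python
class NodeFailure(Exception):
    """Simulated failure of the node running a job."""


class CheckpointStore:
    """Keeps the last saved (step, state) of each job; stands in for stable storage."""

    def __init__(self):
        self._saved = {}

    def save(self, job_id, step, state):
        self._saved[job_id] = (step, state)

    def load(self, job_id):
        return self._saved.get(job_id, (0, 0))  # start fresh if nothing was saved


def run_job(job_id, total_steps, store, interval, counter, fail_at=None):
    """Sum 1..total_steps as stand-in "work", checkpointing every `interval` steps."""
    step, acc = store.load(job_id)       # resume from the last checkpoint, if any
    while step < total_steps:
        step += 1
        if step == fail_at:
            raise NodeFailure(f"node failed at step {step}")
        acc += step
        counter["steps"] += 1            # count every step actually executed
        if step % interval == 0:
            store.save(job_id, step, acc)
    return acc


def run_with_recovery(job_id, total_steps, interval, fail_at):
    """Run once with a simulated failure, then resubmit; return (result, steps executed)."""
    store, counter = CheckpointStore(), {"steps": 0}
    try:
        result = run_job(job_id, total_steps, store, interval, counter, fail_at)
    except NodeFailure:
        # Resubmission restarts from the last checkpoint instead of step 0.
        result = run_job(job_id, total_steps, store, interval, counter)
    return result, counter["steps"]
```

With 100 steps, a checkpoint every 10 steps, and a failure at step 57, the job executes 106 steps in total (56 before the failure, then 51 to 100); with no usable checkpoint it executes 156. The wasted work is bounded by the checkpoint interval, which is the quantity a recovery scheme trades against the overhead of saving state.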


Author(s):  
Sriram Krishnan ◽  
Luca Clementi ◽  
Zhaohui Ding ◽  
Wilfred Li

Grid systems provide mechanisms for single sign-on and uniform APIs for job submission and data transfer, in order to allow the coupling of distributed resources in a seamless manner. However, new users face a daunting barrier to entry due to the high cost of deployment and maintenance. They are often required to learn complex concepts related to Grid infrastructures (credential management, scheduling systems, data staging, etc.). Most scientific users want to run their applications with minimal changes and still get results faster, without having to know much about how the resources are used. Hence, a higher level of abstraction must be provided for the underlying infrastructure to be used effectively. For this purpose, we have developed the Opal toolkit for exposing applications on Grid resources as simple Web services. Opal provides a basic set of Application Programming Interfaces (APIs) that allow users to execute their deployed applications, query job status, and retrieve results. Opal also provides a mechanism for defining command-line arguments and dynamically generates user interfaces for the Web services. In addition, Opal services can be connected to a metascheduler such as CSF4 to leverage a distributed set of resources, and they can be accessed via a multitude of interfaces such as Web browsers, rich desktop environments, workflow tools, and command-line clients.
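The three operations named above (execute a deployed application, query job status, retrieve results) follow a common asynchronous job-service pattern. The sketch below is a toy, in-process illustration of that pattern only; it is not Opal's API. The class `AppService`, its method names, and the status strings are hypothetical, and "deployed applications" are modelled as Python callables that receive command-line-style string arguments and run on a local thread pool rather than on Grid resources.

```python
import itertools
from concurrent.futures import ThreadPoolExecutor


class AppService:
    """Toy job service: submit a deployed app, poll its status, fetch its results."""

    def __init__(self, apps, workers=2):
        self._apps = apps                  # app name -> callable(list of str args)
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._jobs = {}                    # job id -> Future
        self._ids = itertools.count(1)

    def submit(self, app_name, args):
        # Returns immediately with a job id; the app runs in the background.
        job_id = f"job{next(self._ids)}"
        self._jobs[job_id] = self._pool.submit(self._apps[app_name], list(args))
        return job_id

    def status(self, job_id):
        fut = self._jobs[job_id]
        if fut.running():
            return "RUNNING"
        if not fut.done():
            return "QUEUED"
        return "FAILED" if fut.exception() is not None else "DONE"

    def results(self, job_id, timeout=None):
        # Blocks until the job finishes; re-raises the app's exception if it failed.
        return self._jobs[job_id].result(timeout=timeout)

    def close(self):
        self._pool.shutdown(wait=True)
```

A client would call `submit("upper", ["hello", "grid"])` for an app registered as `lambda argv: " ".join(argv).upper()`, poll `status` until it reports `"DONE"`, and then read `"HELLO GRID"` from `results`. Returning a job id instead of blocking is what lets browsers, workflow tools, and command-line clients share one service.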


2020 ◽  
Vol 167 ◽  
pp. 1189-1199 ◽  
Author(s):  
Ankit Thakkar ◽  
Kinjal Chaudhari ◽  
Monika Shah
