Fault-Tolerant Scheduling for Scientific Workflow with Task Replication Method in Cloud

Author(s):  
Zhongjin Li ◽  
Jiacheng Yu ◽  
Haiyang Hu ◽  
Jie Chen ◽  
Hua Hu ◽  
...  
2021 ◽  
Author(s):  
Sridevi S ◽  
Jeevaa Katiravan

Scientific workflows are attracting growing attention in sophisticated large-scale scientific problem-solving environments. Because tasks in workflow-based applications depend on one another, even a single task failure can drastically reduce the reliability of the overall system. Hence proactive measures, rather than reactive fault-tolerant approaches, are vital in scientific workflows. This work explores an Exotic Intelligent Water Drops - Support Vector Regression-based approach to task failure prediction, which facilitates proactive fault tolerance in scientific workflow applications. The failure prediction models in this study are implemented with SVR-based machine learning approaches, their prediction accuracy is optimized by the Intelligent Water Drops algorithm (IWDA), and various performance metrics are evaluated. The experimental results indicate that the proposed approach performs better than the other existing techniques.


Sensors ◽  
2021 ◽  
Vol 21 (21) ◽  
pp. 7238
Author(s):  
Zulfiqar Ahmad ◽  
Ali Imran Jehangiri ◽  
Mohammed Alaa Ala’anzy ◽  
Mohamed Othman ◽  
Arif Iqbal Umar

Cloud computing is a fully fledged, mature and flexible computing paradigm that provides services to scientific and business applications in a subscription-based environment. Scientific applications such as Montage and CyberShake are organized as scientific workflows with data- and compute-intensive tasks, and they have some special characteristics. In particular, the tasks of scientific workflows are executed in integration, disintegration, pipeline, and parallelism patterns, and thus require special attention to task management and to data-oriented resource scheduling and management. Tasks executed in a pipeline are bottlenecks: the failure of any one of them renders the whole execution futile, so they require fault-tolerance-aware execution. Tasks executed in parallel require similar instances of cloud resources, and thus cluster-based execution may improve system performance in terms of make-span and execution cost. Therefore, this research work presents a cluster-based, fault-tolerant and data-intensive (CFD) scheduling strategy for scientific applications in cloud environments. The CFD strategy addresses the data intensiveness of scientific workflow tasks with cluster-based, fault-tolerant mechanisms. The Montage scientific workflow was used in the simulation, and the results of the CFD strategy were compared with three well-known heuristic scheduling policies: (a) MCT, (b) Max-min, and (c) Min-min. The simulation results showed that the CFD strategy reduced the make-span by 14.28%, 20.37%, and 11.77%, respectively, compared with the three existing policies. Similarly, the CFD strategy reduced the execution cost by 1.27%, 5.3%, and 2.21%, respectively. With the CFD strategy, the SLA is not violated with regard to time and cost constraints, whereas the existing policies violate it numerous times.
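For readers unfamiliar with the three baseline policies named above, the following is a minimal sketch of MCT, Min-min and Max-min for independent tasks. It is not the CFD strategy and not the paper's simulation setup; the `etc` matrix (expected time of each task on each machine) and all function names are assumptions made for illustration.

```python
from typing import List

# etc[t][m]: hypothetical expected execution time of task t on machine m.
Matrix = List[List[float]]


def mct(etc: Matrix) -> float:
    """Minimum Completion Time: take tasks in list order and place each
    on the machine that would finish it earliest. Returns the makespan."""
    ready = [0.0] * len(etc[0])
    for row in etc:
        m = min(range(len(ready)), key=lambda j: ready[j] + row[j])
        ready[m] += row[m]
    return max(ready)


def _batch(etc: Matrix, pick_largest: bool) -> float:
    """Shared loop for Min-min and Max-min. Returns the makespan."""
    ready = [0.0] * len(etc[0])
    pending = set(range(len(etc)))
    while pending:
        # Best (completion time, machine) pair for every pending task.
        best = {
            t: min((ready[j] + etc[t][j], j) for j in range(len(ready)))
            for t in pending
        }
        chooser = max if pick_largest else min
        # Min-min schedules the task with the smallest best completion
        # time next; Max-min schedules the one with the largest.
        t = chooser(sorted(pending), key=lambda k: best[k][0])
        finish, m = best[t]
        ready[m] = finish
        pending.remove(t)
    return max(ready)


def min_min(etc: Matrix) -> float:
    return _batch(etc, pick_largest=False)


def max_min(etc: Matrix) -> float:
    return _batch(etc, pick_largest=True)


if __name__ == "__main__":
    # Four tasks on two identical machines (made-up numbers).
    demo = [[4, 4], [3, 3], [2, 2], [9, 9]]
    print(mct(demo), min_min(demo), max_min(demo))
```

On this toy input Max-min yields the shortest makespan because it places the long task first; on other inputs the ranking of the three heuristics can differ.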


Information ◽  
2019 ◽  
Vol 10 (5) ◽  
pp. 169
Author(s):  
Na Wu ◽  
Decheng Zuo ◽  
Zhan Zhang

Improving reliability is one of the major concerns of scientific workflow scheduling in clouds. The ever-growing computational complexity and data size of workflows present challenges to fault-tolerant workflow scheduling. Therefore, it is essential to design a cost-effective fault-tolerant scheduling approach for large-scale workflows. In this paper, we propose a dynamic fault-tolerant workflow scheduling (DFTWS) approach with hybrid spatial and temporal re-execution schemes. First, DFTWS calculates the time attributes of tasks and identifies the critical path of the workflow in advance. Then, in the initial resource allocation phase, DFTWS assigns an appropriate virtual machine (VM) to each task according to the task urgency and budget quota. Finally, DFTWS performs online scheduling, making real-time fault-tolerant decisions based on failure type and task criticality throughout workflow execution. The proposed algorithm is evaluated on real-world workflows. Furthermore, the factors that affect the performance of DFTWS are analyzed. The experimental results demonstrate that DFTWS achieves a trade-off between the objectives of high reliability and low cost in cloud computing environments.
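The first step described above, computing time attributes and identifying the critical path, can be illustrated with a generic longest-path pass over the workflow DAG. This is only a sketch under simple assumptions (fixed task durations, no data-transfer times), not the DFTWS algorithm itself; the task names, durations and the `critical_path` function are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Dict, List, Optional, Set, Tuple


def critical_path(
    durations: Dict[str, float],
    deps: Dict[str, Set[str]],
) -> Tuple[float, List[str]]:
    """Return (length, tasks) of the longest path through a DAG.

    durations: task -> execution time (assumed known and fixed).
    deps: task -> set of tasks that must finish before it starts.
    """
    graph = {t: deps.get(t, set()) for t in durations}
    finish: Dict[str, float] = {}
    prev: Dict[str, Optional[str]] = {}
    # static_order() yields every task after all of its predecessors.
    for t in TopologicalSorter(graph).static_order():
        # The latest-finishing predecessor determines the earliest start.
        p = max(graph[t], key=lambda q: finish[q], default=None)
        start = finish[p] if p is not None else 0.0
        finish[t] = start + durations[t]
        prev[t] = p
    # Walk back from the task that finishes last.
    node: Optional[str] = max(finish, key=lambda q: finish[q])
    length = finish[node]
    path: List[str] = []
    while node is not None:
        path.append(node)
        node = prev[node]
    return length, path[::-1]


if __name__ == "__main__":
    durations = {"a": 3, "b": 2, "c": 4, "d": 1}
    deps = {"b": {"a"}, "c": {"a"}, "d": {"b", "c"}}
    print(critical_path(durations, deps))
```

Tasks on the returned path have no slack, which is one plausible reason a scheduler would treat them as more critical than the rest.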


2018 ◽  
Vol 8 (3) ◽  
pp. 1-19
Author(s):  
Nagaraj V. Dharwadkar ◽  
Shivananda R. Poojara ◽  
Priyanka M. Kadam

Scientific workflows are very complex, large-scale applications and require more computational power for data transmission and execution. In this article, the authors address the problem of scheduling a scientific workflow on a number of virtual machines (VMs) with the objective of reducing both the total makespan of the workflow and failures. This article implements checkpoint and replication strategies with the parallel task execution (PTE) algorithm to schedule scientific workflows for minimum time and cost. To reduce execution overhead and improve the performance of the scientific application, the tasks are grouped using clustering methods. Specifically, the Horizontal Reclustering (HR) method was implemented to reduce failure and scheduling overhead. The authors combined the checkpoint, replication and PTE algorithms and applied them to the HR method. Results show that the proposed strategies and method work efficiently in terms of reducing failure, makespan and execution cost compared to existing methods.


Author(s):  
Antonios Litke ◽  
Konstantinos Tserpes ◽  
Konstantinos Dolkas ◽  
Theodora Varvarigou

Author(s):  
I-LING YEN ◽  
IFTIKHAR AHMED ◽  
RAMANUJAM JAGANNATH ◽  
SREEPARNA KUNDU

While there have been significant advances in fault tolerance research, the effort has focused on the design of individual fault-tolerant systems or methodologies. Recently, some research has been initiated to develop techniques that can provide a spectrum of fault tolerance capabilities. In this paper, we present the design of a fault tolerance framework that can support a wide range of applications with various fault tolerance requirements, various criticality levels, and various system models. The framework is designed to be parameterizable so that the user can configure it to obtain the desired features. Also, the framework is designed to be an off-the-shelf component such that application programs can be integrated within it easily to obtain the fault-tolerant version of the application system. A specialized N-modular redundancy (SNMR) scheme has been developed to serve as the primary approach for achieving efficient and cost-effective fault tolerance for the framework. In most cases, the SNMR scheme yields better performance and lower cost in providing fault tolerance than conventional NMR schemes. It also enhances the scalability and customizability of the general replication method. This paper discusses the main issues in the design and implementation of the SNMR framework, including the major concept of the SNMR framework, various SNMR algorithms that tolerate various types of faults, an object-oriented overall system design and the interface protocol class hierarchy. The interface protocol class hierarchy provides a convenient paradigm for implementing a customizable, highly reusable, and easily extensible SNMR framework.
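As background for the comparison with conventional NMR, the following is a minimal sketch of plain N-modular redundancy with majority voting. It does not implement the SNMR scheme, whose details the abstract does not give; the `nmr_vote` function and the replica callables are hypothetical.

```python
from collections import Counter
from typing import Callable, Hashable, Sequence


def nmr_vote(
    replicas: Sequence[Callable[[int], Hashable]],
    arg: int,
) -> Hashable:
    """Run every replica on the same input and return the majority result.

    A replica that raises is treated as crashed and casts no vote.
    Raises RuntimeError if no value is returned by more than half of
    the N replicas.
    """
    results = []
    for replica in replicas:
        try:
            results.append(replica(arg))
        except Exception:
            continue  # crashed replica contributes no vote
    if not results:
        raise RuntimeError("all replicas failed")
    value, count = Counter(results).most_common(1)[0]
    if count * 2 <= len(replicas):
        raise RuntimeError("no majority among replicas")
    return value


if __name__ == "__main__":
    def good(x: int) -> int:
        return x * x

    def faulty(x: int) -> int:
        return x + 1  # simulated value fault

    # Triple modular redundancy: one faulty replica is outvoted.
    print(nmr_vote([good, good, faulty], 4))
```

With N = 3 this masks one faulty replica at the cost of three executions, which is the kind of overhead a specialized scheme would aim to reduce.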


Author(s):  
Bharanidharan A ◽  
Jahashri RAJ ◽  
Srinivasan K ◽  
Tarun V

Task clustering is an effective method for reducing execution overhead and improving the computational granularity of scientific workflow tasks executing on distributed resources. However, a job composed of many tasks may have a higher risk of suffering from failures than a single-task job. In this paper, we conduct a theoretical investigation of the effect of transient failures on the runtime of scientific workflow executions. We propose a maximum likelihood estimation-based parameter estimation algorithm, used within a general task failure modeling framework, to model workflow performance. We also propose a Dynamic Balanced clustering method, which combines vertical clustering, horizontal clustering and dynamic clustering to reduce the execution overhead of scientific workflow task execution.
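To make the link between failure modeling and clustering concrete, here is a small sketch of maximum likelihood estimation for a failure rate. The abstract does not state which distribution the authors fit, so the exponential model for failure inter-arrival times is purely an assumption here, as are the function names and sample values.

```python
import math
from typing import Sequence


def mle_failure_rate(inter_arrival_times: Sequence[float]) -> float:
    """MLE of the rate of an exponential distribution.

    Assumes failure inter-arrival times are i.i.d. exponential, for
    which the maximum likelihood estimate is n / sum(samples).
    """
    if not inter_arrival_times:
        raise ValueError("need at least one observation")
    return len(inter_arrival_times) / sum(inter_arrival_times)


def job_success_probability(rate: float, tasks: int, task_runtime: float) -> float:
    """Probability that a clustered job of `tasks` sequential tasks
    finishes without any failure, under the same exponential assumption."""
    return math.exp(-rate * tasks * task_runtime)


if __name__ == "__main__":
    observed = [2.0, 4.0, 6.0]  # made-up inter-arrival times
    rate = mle_failure_rate(observed)
    # Larger clusters run longer and are therefore more likely to be hit.
    for k in (1, 4, 16):
        print(k, job_success_probability(rate, k, task_runtime=0.5))
```

Under this model the success probability falls exponentially with cluster size, which illustrates why a job of many tasks carries a higher failure risk than a single-task job.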

