Fault-Tolerant Scheduling for Scientific Workflow with Task Replication Method in Cloud

An Exotic IWD - SVR Based Approach for Failure Prognostication in Cloud-Based Scientific Workflows

10.21203/rs.3.rs-716843/v1 ◽

2021 ◽

Author(s):

Sridevi S ◽

Jeevaa Katiravan

Keyword(s):

Large Scale ◽

Performance Metrics ◽

Prediction Models ◽

Fault Tolerant ◽

Scientific Workflow ◽

Scientific Workflows ◽

Support Vector ◽

Learning Approaches ◽

Task Failure ◽

Proactive Measures

Scientific workflows are attracting growing attention in sophisticated large-scale scientific problem-solving environments. Because tasks in workflow-based applications depend on one another, even a single task failure can drastically reduce the reliability of the overall system. Hence, proactive measures, rather than reactive fault-tolerant approaches, are vital in scientific workflows. This work explores an Exotic Intelligent Water Drops - Support Vector Regression (IWD-SVR) based approach to task failure prognostication, which facilitates proactive fault tolerance in scientific workflow applications. The failure prediction models in this study are implemented with SVR-based machine learning approaches, their prediction accuracy is optimized by the Intelligent Water Drops algorithm (IWDA), and various performance metrics are evaluated. The experimental results indicate that the proposed approach performs better than the other existing techniques.


Fault-Tolerant and Data-Intensive Resource Scheduling and Management for Scientific Applications in Cloud Computing

Sensors ◽

10.3390/s21217238 ◽

2021 ◽

Vol 21 (21) ◽

pp. 7238

Author(s):

Zulfiqar Ahmad ◽

Ali Imran Jehangiri ◽

Mohammed Alaa Ala’anzy ◽

Mohamed Othman ◽

Arif Iqbal Umar

Keyword(s):

Cloud Computing ◽

Fault Tolerant ◽

Research Work ◽

Resource Scheduling ◽

Scientific Workflow ◽

Scientific Workflows ◽

Scientific Applications ◽

Data Intensive ◽

Computing Paradigm ◽

Cost Constraints

Cloud computing is a fully fledged, mature, and flexible computing paradigm that provides services to scientific and business applications in a subscription-based environment. Scientific applications such as Montage and CyberShake are organized as scientific workflows with data- and compute-intensive tasks and have some special characteristics. In particular, the tasks of scientific workflows are executed in patterns of integration, disintegration, pipeline, and parallelism, and thus require special attention to task management and to data-oriented resource scheduling and management. Tasks executed in a pipeline are bottlenecks: the failure of any one of them renders the whole execution futile, so they require fault-tolerance-aware execution. Tasks executed in parallel require similar instances of cloud resources, and thus cluster-based execution may improve system performance in terms of make-span and execution cost. Therefore, this research work presents a cluster-based, fault-tolerant and data-intensive (CFD) scheduling strategy for scientific applications in cloud environments. The CFD strategy addresses the data intensiveness of the tasks of scientific workflows with cluster-based, fault-tolerant mechanisms. The Montage scientific workflow was used as the simulation workload, and the results of the CFD strategy were compared with three well-known heuristic scheduling policies: (a) MCT, (b) Max-min, and (c) Min-min. The simulation results showed that the CFD strategy reduced the make-span by 14.28%, 20.37%, and 11.77%, respectively, compared with the three existing policies. Similarly, the CFD strategy reduced the execution cost by 1.27%, 5.3%, and 2.21%, respectively. With the CFD strategy, the SLA is not violated with regard to time and cost constraints, whereas the existing policies violate it numerous times.
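For readers unfamiliar with the baseline policies named in this abstract, the following is a minimal sketch of the textbook Min-min and Max-min heuristics for independent tasks. It is not the CFD strategy and not the paper's simulation setup; the function name `min_min_schedule`, the `largest_first` flag, and the run-time matrix in the usage note are hypothetical and chosen only for illustration.

```python
def min_min_schedule(exec_time, largest_first=False):
    """Schedule independent tasks with the Min-min (default) or Max-min heuristic.

    exec_time[t][m] is the (hypothetical) run time of task t on machine m.
    Returns (assignment, makespan), where assignment maps task -> machine.
    """
    n_machines = len(exec_time[0])
    ready = [0.0] * n_machines            # time at which each machine becomes free
    unscheduled = set(range(len(exec_time)))
    assignment = {}
    while unscheduled:
        best = None                       # (completion_time, task, machine)
        for t in sorted(unscheduled):
            # Minimum completion time (MCT) of task t over all machines.
            m = min(range(n_machines), key=lambda j: ready[j] + exec_time[t][j])
            ct = ready[m] + exec_time[t][m]
            # Min-min picks the task with the smallest MCT, Max-min the largest.
            if best is None or (ct > best[0] if largest_first else ct < best[0]):
                best = (ct, t, m)
        ct, t, m = best
        assignment[t] = m
        ready[m] = ct
        unscheduled.remove(t)
    return assignment, max(ready)
```

Calling `min_min_schedule([[4, 6], [3, 5], [8, 2]])` schedules three tasks on two machines; passing `largest_first=True` switches to Max-min. Both variants ignore task dependencies and data transfer, which is part of what a workflow-aware strategy such as CFD has to handle.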


A Fault Tolerant Method for Network Distributed Flight Control System based on Task Replication

10.1109/ccdc52312.2021.9601528 ◽

2021 ◽

Author(s):

Cui Yuwei

Keyword(s):

Control System ◽

Flight Control ◽

Fault Tolerant ◽

Flight Control System ◽

Task Replication


Dynamic Fault-Tolerant Workflow Scheduling with Hybrid Spatial-Temporal Re-Execution in Clouds

Information ◽

10.3390/info10050169 ◽

2019 ◽

Vol 10 (5) ◽

pp. 169 ◽

Cited By ~ 2

Author(s):

Na Wu ◽

Decheng Zuo ◽

Zhan Zhang

Keyword(s):

Large Scale ◽

Fault Tolerant ◽

Critical Path ◽

High Reliability ◽

Low Cost ◽

Cost Effective ◽

Scientific Workflow ◽

Workflow Scheduling ◽

Workflow Execution ◽

Computing Environments

Improving reliability is one of the major concerns of scientific workflow scheduling in clouds. The ever-growing computational complexity and data size of workflows present challenges to fault-tolerant workflow scheduling. Therefore, it is essential to design a cost-effective fault-tolerant scheduling approach for large-scale workflows. In this paper, we propose a dynamic fault-tolerant workflow scheduling (DFTWS) approach with hybrid spatial and temporal re-execution schemes. First, DFTWS calculates the time attributes of tasks and identifies the critical path of the workflow in advance. Then, in the initial resource allocation phase, DFTWS assigns an appropriate virtual machine (VM) to each task according to the task urgency and budget quota. Finally, DFTWS performs online scheduling, making real-time fault-tolerant decisions based on failure type and task criticality throughout workflow execution. The proposed algorithm is evaluated on real-world workflows. Furthermore, the factors that affect the performance of DFTWS are analyzed. The experimental results demonstrate that DFTWS achieves a trade-off between high reliability and low cost objectives in cloud computing environments.
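The first step described in this abstract, computing task time attributes and identifying the critical path, can be illustrated with a generic longest-path computation over a workflow DAG. This is a sketch under stated assumptions, not the DFTWS algorithm itself: the function `critical_path`, its arguments, and the example task names are hypothetical, and data transfer times are ignored.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def critical_path(durations, preds):
    """Return (path, length) of the longest path through a workflow DAG.

    durations: task -> run time (hypothetical values)
    preds:     task -> set of predecessor tasks (entry tasks may be omitted)
    """
    graph = {t: set(preds.get(t, ())) for t in durations}
    earliest_finish = {}
    best_pred = {}
    for t in TopologicalSorter(graph).static_order():
        # A task can start only after its latest-finishing predecessor is done.
        start, via = 0.0, None
        for p in sorted(graph[t]):
            if via is None or earliest_finish[p] > start:
                start, via = earliest_finish[p], p
        earliest_finish[t] = start + durations[t]
        best_pred[t] = via
    # Walk back from the task that finishes last.
    end = max(earliest_finish, key=earliest_finish.get)
    path = [end]
    while best_pred[path[-1]] is not None:
        path.append(best_pred[path[-1]])
    path.reverse()
    return path, earliest_finish[end]
```

For a diamond-shaped workflow such as `critical_path({"A": 3, "B": 2, "C": 4, "D": 1}, {"B": {"A"}, "C": {"A"}, "D": {"B", "C"}})`, the tasks on the returned path are the ones whose delay or failure directly extends the overall finish time, which is why a scheduler may treat them as more critical.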


Fault Tolerant and Optimal Task Clustering for Scientific Workflow in Cloud

International Journal of Cloud Applications and Computing ◽

10.4018/ijcac.2018070101 ◽

2018 ◽

Vol 8 (3) ◽

pp. 1-19

Author(s):

Nagaraj V. Dharwadkar ◽

Shivananda R. Poojara ◽

Priyanka M. Kadam

Keyword(s):

Data Transmission ◽

Large Scale ◽

Virtual Machines ◽

Fault Tolerant ◽

Scientific Workflow ◽

Computational Power ◽

Clustering Methods ◽

Improve Performance ◽

Task Execution ◽

Scientific Application

Scientific workflows are very complex, large-scale applications that require substantial computational power for data transmission and execution. In this article, the authors address the problem of scheduling a scientific workflow on a number of virtual machines (VMs) with the objective of reducing the total makespan of the workflow and the number of failures. The article implements checkpoint and replication strategies with the parallel task execution (PTE) algorithm to schedule scientific workflows for minimum time and cost. To reduce execution overhead and improve the performance of the scientific application, tasks are grouped using clustering methods. Specifically, the Horizontal Reclustering (HR) method was implemented to reduce failure and scheduling overhead. The authors combined the checkpoint, replication, and PTE algorithms and applied them to the HR method. Results show that the proposed strategies and method work efficiently in terms of reducing failure, makespan, and execution cost compared to existing methods.


A Task Replication and Fair Resource Management Scheme for Fault Tolerant Grids

Advances in Grid Computing - EGC 2005 - Lecture Notes in Computer Science ◽

10.1007/11508380_104 ◽

2005 ◽

pp. 1022-1031

Author(s):

Antonios Litke ◽

Konstantinos Tserpes ◽

Konstantinos Dolkas ◽

Theodora Varvarigou

Keyword(s):

Resource Management ◽

Fault Tolerant ◽

Management Scheme ◽

Task Replication


THE DESIGN AND IMPLEMENTATION OF A CUSTOMIZABLE FAULT TOLERANCE FRAMEWORK

International Journal of Software Engineering and Knowledge Engineering ◽

10.1142/s0218194099000127 ◽

1999 ◽

Vol 09 (02) ◽

pp. 181-202

Author(s):

I-LING YEN ◽

IFTIKHAR AHMED ◽

RAMANUJAM JAGANNATH ◽

SREEPARNA KUNDU

Keyword(s):

Fault Tolerance ◽

Fault Tolerant ◽

Cost Effective ◽

Class Hierarchy ◽

Replication Method ◽

Design And Implementation ◽

Wide Range ◽

Modular Redundancy ◽

Types Of Faults ◽

Interface Protocol

While there have been significant advances in fault tolerance research, the effort has focused on the design of individual fault-tolerant systems or methodologies. Recently, some research has been initiated to develop techniques that can provide a spectrum of fault tolerance capabilities. In this paper, we present the design of a fault tolerance framework that can support a wide range of applications with various fault tolerance requirements, various criticality levels, and various system models. The framework is designed to be parameterizable so that the user can configure it to obtain the desired features. Also, the framework is designed to be an off-the-shelf component such that application programs can be integrated within it easily to obtain the fault-tolerant version of the application system. A specialized N-modular redundancy (SNMR) scheme has been developed to serve as the primary approach for achieving efficient and cost-effective fault tolerance for the framework. In most cases, the SNMR scheme yields better performance and lower cost in providing fault tolerance as compared with conventional NMR schemes. It also enhances the scalability and customizability of the general replication method. This paper discusses the main issues in the design and implementation of the SNMR framework, including the major concept of the SNMR framework, various SNMR algorithms that tolerate various types of faults, an object-oriented overall system design, and the interface protocol class hierarchy. The interface protocol class hierarchy provides a convenient paradigm for implementing a customizable, highly reusable, and easily extensible SNMR framework.
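As background for the conventional NMR schemes this abstract compares against, here is a minimal sketch of N-modular redundancy with strict-majority voting. It illustrates plain NMR only; the abstract gives no details of the specialized SNMR algorithms, so none are assumed here. The function name `nmr_vote` and the replica callables are hypothetical, and results are assumed to be hashable.

```python
from collections import Counter


def nmr_vote(replicas, task_input):
    """Run each replica on the same input and return the strict-majority result.

    replicas: list of callables standing in for N redundant modules (hypothetical).
    Raises RuntimeError when no value is produced by more than half of the N replicas.
    """
    results = []
    for replica in replicas:
        try:
            results.append(replica(task_input))
        except Exception:
            # A crashed replica contributes no vote.
            continue
    if not results:
        raise RuntimeError("all replicas failed")
    value, votes = Counter(results).most_common(1)[0]
    if votes <= len(replicas) // 2:
        raise RuntimeError("no majority among replicas")
    return value
```

With three replicas (triple modular redundancy), this voter masks one replica that either crashes or returns a wrong value, at the cost of running the task three times. That fixed cost is the kind of overhead a more specialized scheme would aim to reduce.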


A Fault-Tolerant Mechanism for Distributed/Parallel System Based on Task Replication Techniques

International Journal of Computers and Applications ◽

10.1080/1206212x.2002.11441672 ◽

2002 ◽

Vol 24 (3) ◽

pp. 129-135

Author(s):

J. Aguilar ◽

M. Hernandez

Keyword(s):

Fault Tolerant ◽

Parallel System ◽

Task Replication


AN EFFICIENT FAULT TOLERANT CLUSTERING FOR SCIENTIFIC WORKFLOW

International Journal of Advanced Information and Communication Technology ◽

10.46532/ijaict-2020004 ◽

2020 ◽

pp. 16-19

Author(s):

Bharanidharan A ◽

Jahashri RAJ ◽

Srinivasan K ◽

Tarun V

Keyword(s):

Fault Tolerant ◽

Likelihood Estimation ◽

Scientific Workflow ◽

Work Process ◽

Task Execution ◽

Modeling Framework ◽

Distributed Resources ◽

Failure Modeling ◽

Workflow Tasks ◽

Balanced Clustering

Task clustering is an effective method for reducing execution overhead and improving the computational granularity of scientific workflow tasks executing on distributed resources. However, a job composed of many tasks may have a higher risk of suffering from failures than a single-task job. In this paper, we conduct a theoretical investigation of the effect of transient failures on the runtime execution of scientific workflows. We propose a maximum likelihood estimation-based parameter estimation algorithm, used within a general task failure modeling framework, to model workflow performance. We also propose a Dynamic Balanced clustering method, which combines vertical clustering, horizontal clustering, and dynamic clustering to reduce the execution overhead of scientific workflow task execution.
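To make the idea of maximum likelihood estimation for a failure model concrete, the sketch below assumes the simplest possible model: transient failures arrive with exponentially distributed gaps, so the MLE of the failure rate is the number of observed gaps divided by their sum. The abstract does not state which distribution the authors use, so the exponential model, the function names, and the sample numbers in the usage note are all assumptions for illustration.

```python
import math


def mle_failure_rate(inter_failure_times):
    """MLE of the rate of an exponential failure model: n / sum of observed gaps."""
    n = len(inter_failure_times)
    total = sum(inter_failure_times)
    if n == 0 or total <= 0:
        raise ValueError("need at least one positive inter-failure time")
    return n / total


def job_failure_probability(rate, task_runtimes):
    """P(at least one failure while a clustered job runs) under the exponential model.

    The job's exposure is the sum of the run times of the tasks clustered into it.
    """
    return 1.0 - math.exp(-rate * sum(task_runtimes))
```

For example, `mle_failure_rate([10, 20, 30])` estimates the rate from three observed gaps, and `job_failure_probability(rate, [5, 5])` is larger than `job_failure_probability(rate, [5])`. This reflects the abstract's point that a job clustering many tasks is more exposed to failure than a single-task job.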


A partial task replication algorithm for fault-tolerant FPGA-based soft-multiprocessors

2015 CSI Symposium on Real-Time and Embedded Systems and Technologies (RTEST) ◽

10.1109/rtest.2015.7369842 ◽

2015 ◽

Cited By ~ 1

Author(s):

Masoume Zabihi ◽

Hamed Farbeh ◽

Seyed Ghassem Miremadi

Keyword(s):

Fault Tolerant ◽

Task Replication
