Visual programming of fault-tolerant distributed applications

Fault-tolerant coordination services have been widely used in distributed applications in cloud environments. Recent years have witnessed the emergence of time-sensitive applications deployed in edge computing environments, which introduces both challenges and opportunities for coordination services. On one hand, coordination services must recover from failures in a timely manner. On the other hand, edge computing employs local networked platforms that can be exploited to achieve timely recovery. In this work, we first identify the limitations of the leader election and recovery protocols underlying Apache ZooKeeper, the prevailing open-source coordination service. To reduce recovery latency from leader failures, we then design RT-ZooKeeper with a set of novel features including a fast-convergence election protocol, a quorum channel notification mechanism, and a distributed epoch persistence protocol. We have implemented RT-ZooKeeper based on ZooKeeper version 3.5.8. Empirical evaluation shows that RT-ZooKeeper achieves a 91% reduction in maximum recovery latency in comparison to ZooKeeper. Furthermore, a case study demonstrates that fast failure recovery in RT-ZooKeeper can benefit a common messaging service like Kafka in terms of message latency.

Lightweight Fault-tolerant Message Passing System for Parallel and Distributed Applications

International e-Conference of Computer Science 2006 ◽

10.1201/b12168-6 ◽

2007 ◽

pp. 30-33

Keyword(s):

Message Passing ◽

Fault Tolerant ◽

Distributed Applications

A Probabilistic Fault-Tolerant Recovery Mechanism for Task and Result Certification of Large-Scale Distributed Applications

Advances in Grid and Pervasive Computing - Lecture Notes in Computer Science ◽

10.1007/978-3-642-01671-4_42 ◽

2009 ◽

pp. 471-482

Author(s):

Rim Chayeh ◽

Christophe Cerin ◽

Mohamed Jemni

Keyword(s):

Large Scale ◽

Fault Tolerant ◽

Distributed Applications ◽

Recovery Mechanism

Checkpointing Algorithms for Fault-Tolerant Execution of Large-Scale Distributed Applications in Cloud

Wireless Personal Communications ◽

10.1007/s11277-020-07949-0 ◽

2020 ◽

Author(s):

Priti Kumari ◽

Parmeet Kaur

Keyword(s):

Large Scale ◽

Fault Tolerant ◽

Distributed Applications

Reducing Negative Complexity by a Computational Semiotic System

Semiotics and Intelligent Systems Development ◽

10.4018/978-1-59904-063-9.ch012 ◽

2007 ◽

pp. 330-342 ◽

Cited By ~ 1

Author(s):

Gerd Doben-Henisch

Keyword(s):

Virtual Machines ◽

Fault Tolerant ◽

Visual Programming ◽

State Machine ◽

Learning Systems ◽

Semiotic System ◽

Abstract State Machine ◽

Set Up ◽

Self Learning ◽

Programming Interface

The chapter describes the set-up for an experiment in computational semiotics. Starting from a hypothesis about negative complexity in the environment of people today, it describes a strategy for helping people reduce this complexity by using a semiotic system. The basic ingredients of this strategy are a visual programming interface with an appropriate abstract state machine, which has to be realized by distributed virtual machines. The distributed virtual machines must be scalable, allow parallel processing, be fault tolerant, and should have the potential to work in real time. The objects to be processed by these virtual machines are logical models (LModels), which represent dynamic knowledge, including self-learning systems. The descriptions are based on a concrete open-source project called Planet Earth Simulator.

An Adaptable and Generic Fault-Tolerant System for Distributed Applications

2012 International Conference on Advanced Computer Science Applications and Technologies (ACSAT) ◽

10.1109/acsat.2012.63 ◽

2012 ◽

Author(s):

Ouanes Aissaoui ◽

Abdelkrim Amirat ◽

Fadila Atil

Keyword(s):

Fault Tolerant ◽

Distributed Applications ◽

Fault Tolerant System

STAR: a fault-tolerant system for distributed applications

Proceedings of 1993 5th IEEE Symposium on Parallel and Distributed Processing ◽

10.1109/spdp.1993.395471 ◽

2002 ◽

Cited By ~ 2

Author(s):

P. Sens ◽

B. Folliot

Keyword(s):

Fault Tolerant ◽

Distributed Applications ◽

Fault Tolerant System

Programming fault-tolerant distributed applications in HOPS

Proceedings CVPR '89: IEEE Computer Society Conference on Computer Vision and Pattern Recognition ◽

10.1109/pccc.1989.37433 ◽

2003 ◽

Author(s):

J. Silverman ◽

T. Raeuchle ◽

H. Madduri

Keyword(s):

Fault Tolerant ◽

Distributed Applications

Flexible Distributed Workflow Management Systems Design Based on CORBA

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.157-158.839 ◽

2012 ◽

Vol 157-158 ◽

pp. 839-842 ◽

Cited By ~ 3

Author(s):

Ya Li ◽

Hai Rui Wang ◽

Xiong Tong ◽

Li Zhang

Keyword(s):

Fault Tolerant ◽

Workflow Management ◽

Systems Design ◽

Distributed Applications ◽

General Purpose ◽

Management Systems ◽

Workflow Management Systems ◽

Workflow Systems ◽

Workflow System ◽

Distributed Components

The paper addresses the problem of flexible Workflow Management Systems (WFMS) in a distributed environment. Given the serious lack of flexibility in current workflow systems, we describe how our workflow system meets the requirements of interoperability, scalability, flexibility, dependability and adaptability. With an additional route engine, the execution path is adjusted dynamically according to the execution conditions, which improves the flexibility and dependability of the system. A dynamic registration mechanism for domain engines is introduced to improve the scalability and adaptability of the system. The system is general purpose and open: it has been designed and implemented as a set of CORBA services. The system serves as an example of the use of middleware technologies to provide a fault-tolerant execution environment for long-running distributed applications. The system also provides a mechanism for communication among distributed components in order to support inter-organizational WFMS.

A Javaspace-Based Framework for Efficient Fault-Tolerant Master-Worker Distributed Applications

2011 19th International Euromicro Conference on Parallel, Distributed and Network-Based Processing ◽

10.1109/pdp.2011.82 ◽

2011 ◽

Cited By ~ 2

Author(s):

Virginie Galtier ◽

Constantinos Makassikis ◽

Stephane Vialle

Keyword(s):

Fault Tolerant ◽

Distributed Applications
