Lightweight Fault-tolerant Message Passing System for Parallel and Distributed Applications

2021 ◽ Vol 20 (5s) ◽ pp. 1-22
Author(s): Haoran Li ◽ Chenyang Lu ◽ Christopher D. Gill

Fault-tolerant coordination services have been widely used in distributed applications in cloud environments. Recent years have witnessed the emergence of time-sensitive applications deployed in edge computing environments, which introduces both challenges and opportunities for coordination services. On one hand, coordination services must recover from failures in a timely manner. On the other hand, edge computing employs local networked platforms that can be exploited to achieve timely recovery. In this work, we first identify the limitations of the leader election and recovery protocols underlying Apache ZooKeeper, the prevailing open-source coordination service. To reduce recovery latency from leader failures, we then design RT-ZooKeeper with a set of novel features including a fast-convergence election protocol, a quorum channel notification mechanism, and a distributed epoch persistence protocol. We have implemented RT-ZooKeeper based on ZooKeeper version 3.5.8. Empirical evaluation shows that RT-ZooKeeper achieves a 91% reduction in maximum recovery latency in comparison to ZooKeeper. Furthermore, a case study demonstrates that fast failure recovery in RT-ZooKeeper can benefit a common messaging service like Kafka in terms of message latency.


2016 ◽ Vol 13 (6) ◽ pp. 172988141666366
Author(s): Long Peng ◽ Fei Guan ◽ Luc Perneel ◽ Martin Timmerman

Component-based approaches are prevalent in software development for robotic applications due to their reusability and productivity. In this article, we present an Embedded modular Software framework for a networked roBoTic system (EmSBoT) targeting resource-constrained devices such as microcontroller-based robots. EmSBoT is primarily built upon μCOS-III with real-time support. However, its operating system abstraction layer makes it available for various operating systems. It employs a unified port-based communication mechanism to achieve message passing while hiding the heterogeneous distributed environment from applications, which also endows the framework with fault-tolerant capabilities. We describe the design and core features of the EmSBoT framework in this article. The implementation and experimental evaluation show its availability with a small footprint, effectiveness, and OS independence.

