Efficient Fault Tolerance on Cloud Environments

2018 ◽  
Vol 8 (3) ◽  
pp. 20-31 ◽  
Author(s):  
Sam Goundar ◽  
Akashdeep Bhardwaj

With mission-critical web applications and resources being hosted on cloud environments, and cloud services growing fast, the need for a greater level of service assurance regarding fault tolerance for availability and reliability has increased. The high priority now is ensuring a fault-tolerant environment that can keep systems up and running. To minimize the impact of downtime or accessibility failure due to systems, network devices or hardware, such failures need to be anticipated and handled proactively in a fast, intelligent way. This article discusses the fault tolerance system for cloud computing environments and analyzes whether it is effective for cloud environments.


Author(s):  
Wenbing Zhao

The use of good random numbers is crucial to the security of many mission-critical systems. However, when such systems are replicated for Byzantine fault tolerance, a serious issue arises, i.e., how do we preserve the integrity of the systems while ensuring strong replica consistency? Despite the fact that there exists a large body of work on how to render replicas deterministic under the benign fault model, the solutions regarding the random number control are often overly simplistic without regard to the security requirement, and hence, they are not suitable for practical Byzantine fault tolerance. In this chapter, we present a novel integrity-preserving replica coordination algorithm for Byzantine fault tolerant systems. The central idea behind our CD-BFT algorithm is that all random numbers to be used by the replicas are collectively determined, based on the contributions made by a quorum of replicas, at least f+1 of which are not faulty.
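
The abstract does not say how the contributions are combined, so the following is only a minimal sketch of the general idea, not the CD-BFT algorithm itself. It assumes a quorum of 2f+1 replicas (so that at least f+1 contributors are non-faulty when at most f are faulty) and assumes the contributions are hashed together with SHA-256 in replica-id order. The names `make_contribution` and `combine_contributions` are hypothetical, and the agreement and commitment steps a real Byzantine fault tolerant protocol needs are omitted.

```python
import hashlib
import secrets


def make_contribution(num_bytes: int = 32) -> bytes:
    """Each replica draws its own local share of randomness."""
    return secrets.token_bytes(num_bytes)


def combine_contributions(contributions: dict[int, bytes], f: int) -> int:
    """Derive one shared random number from per-replica contributions.

    Assumption (not from the abstract): the quorum size is 2f + 1, so at
    least f + 1 of the contributors are non-faulty when at most f are faulty.
    """
    quorum = 2 * f + 1
    if len(contributions) < quorum:
        raise ValueError(f"need {quorum} contributions, got {len(contributions)}")

    # Use a fixed order so every replica derives the same value from the
    # same set of contributions, regardless of arrival order.
    chosen = sorted(contributions)[:quorum]

    digest = hashlib.sha256()
    for replica_id in chosen:
        digest.update(replica_id.to_bytes(4, "big"))
        digest.update(contributions[replica_id])
    return int.from_bytes(digest.digest(), "big")
```

Because the result depends on every contribution in the quorum, no single replica fixes the outcome on its own, and all replicas that see the same quorum compute the same number, which is the consistency property the abstract asks for. A plain hash like this still lets the last contributor bias the result, which is one reason a real protocol would add commitments or an agreement round.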


2020 ◽  
Vol 174 (3-4) ◽  
pp. 229-258
Author(s):  
Qian Matteo Chen ◽  
Alberto Finzi ◽  
Toni Mancini ◽  
Igor Melatti ◽  
Enrico Tronci

In critical infrastructures like airports, much care has to be devoted to protecting radio communication networks from external electromagnetic interference. Protection of such mission-critical radio communication networks is usually tackled by exploiting radiogoniometers: at least three suitably deployed radiogoniometers, and a gateway gathering information from them, make it possible to monitor and localise sources of electromagnetic emissions that are not supposed to be present in the monitored area. Typically, radiogoniometers are connected to the gateway through relay nodes. As a result, some degree of fault-tolerance for the network of relay nodes is essential in order to offer reliable monitoring. On the other hand, deployment of relay nodes is typically quite expensive. We therefore have two conflicting requirements: minimise costs while guaranteeing a given fault-tolerance. In this paper, we address the problem of computing a deployment for relay nodes that minimises the overall cost while at the same time guaranteeing proper working of the network even when some of the relay nodes (up to a given maximum number) become faulty (fault-tolerance). We show that, by means of a computation-intensive pre-processing on an HPC infrastructure, the above optimisation problem can be encoded as a 0/1 Linear Program, becoming suitable to be approached with standard Artificial Intelligence reasoners like MILP, PB-SAT, and SMT/OMT solvers. Our problem formulation enables us to present experimental results comparing the performance of these three solving technologies on a real case study of a relay node network deployment in areas of the Leonardo da Vinci Airport in Rome, Italy.
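
To make the optimisation problem concrete, here is a toy brute-force sketch. It is not the paper's 0/1 Linear Program encoding: it simply enumerates subsets of candidate relay sites and checks every failure set of size up to k. The graph model, the function names, and the rule that radiogoniometers do not forward traffic are all assumptions made for illustration.

```python
from itertools import combinations


def adjacency(edges):
    """Build an undirected adjacency map from a list of (a, b) links."""
    links: dict[str, set[str]] = {}
    for a, b in edges:
        links.setdefault(a, set()).add(b)
        links.setdefault(b, set()).add(a)
    return links


def connected(sensors, gateway, relays, links):
    """True if every sensor reaches the gateway through working relays."""
    allowed = set(relays) | set(sensors) | {gateway}
    seen = {gateway}
    stack = [gateway]
    while stack:
        node = stack.pop()
        if node in sensors:
            continue  # assumption: sensors are endpoints and do not relay
        for nxt in links.get(node, ()):
            if nxt in allowed and nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return all(s in seen for s in sensors)


def tolerates(sensors, gateway, relays, links, k):
    """True if the network survives any failure of up to k deployed relays."""
    for r in range(k + 1):
        for failed in combinations(relays, r):
            working = set(relays) - set(failed)
            if not connected(sensors, gateway, working, links):
                return False
    return True


def cheapest_deployment(sensors, gateway, cost, links, k):
    """Return (total_cost, relay_set) for the cheapest k-fault-tolerant
    deployment, or None if no subset of candidate sites qualifies."""
    best = None
    sites = sorted(cost)
    for r in range(len(sites) + 1):
        for chosen in combinations(sites, r):
            total = sum(cost[s] for s in chosen)
            if best is not None and total >= best[0]:
                continue  # cannot improve on the best deployment so far
            if tolerates(sensors, gateway, chosen, links, k):
                best = (total, set(chosen))
    return best
```

For example, with one sensor `S1`, a gateway `G`, and three candidate relays `A`, `B`, `C` costing 1, 2, and 5 that each link `S1` to `G`, the cheapest deployment for k = 0 is `A` alone, while k = 1 requires both `A` and `B`. The enumeration is exponential in the number of sites, which is why an encoding suitable for MILP, PB-SAT, or SMT/OMT solvers matters at realistic scale.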


2014 ◽  
Vol 631-632 ◽  
pp. 669-675
Author(s):  
Yong Xiong ◽  
Ji Liang Lin

Taking α-lattice flocking as the research object, this work studies the influence of faults occurring in a flock and a fault tolerance control algorithm for it. The impact on flocking performance is analyzed by means of flocking property indexes when a communication error, actuator failure or sensor malfunction occurs. A flocking fault diagnosis method and a fault tolerance control strategy based on communication and data association are introduced. Treating failed mobile robots as obstacles, an avoidance algorithm for complex-shaped obstacles is proposed. Simulation shows the effectiveness of the method.


Author(s):  
Wenbing Zhao

The use of good random numbers is crucial to the security of many mission-critical systems. However, when such systems are replicated for Byzantine fault tolerance, a serious issue arises: how do we preserve the integrity of the systems while ensuring strong replica consistency? Despite the fact that there exists a large body of work on how to render replicas deterministic under the benign fault model, the solutions regarding the random number control are often overly simplistic without regard to the security requirement, and hence, they are not suitable for practical Byzantine fault tolerance. In this chapter, the authors present a novel integrity-preserving replica coordination algorithm for Byzantine fault tolerant systems. The central idea behind the CD-BFT algorithm is that all random numbers to be used by the replicas are collectively determined, based on the contributions made by a quorum of replicas, at least f+1 of which are not faulty.


2019 ◽  
Vol 2 (1) ◽  
pp. 43-52
Author(s):  
Alireza Alikhani ◽  
Safa Dehghan M ◽  
Iman Shafieenejad

In this study, satellite formation flying guidance in the presence of underactuation using inter-vehicle Coulomb force is investigated. The Coulomb forces are used to stabilize the formation flying mission. For this purpose, the charge of the satellites is determined so as to create appropriate attraction and repulsion and to maintain the distance between satellites. Static Coulomb formation equations for three satellites in a triangular form were developed. Furthermore, the charge value of the Coulomb propulsion system required for such a formation was obtained. Considering underactuation of one of the formation satellites, a fault-tolerance approach is proposed for achieving mission goals. Following this approach, in the first step a fault-tolerant guidance law is designed. Accordingly, the obtained results show stationary formation. In the next step, to maintain the formation shape and dimension, a fault-tolerant control law is designed.


Fault Tolerant Reliable Protocol (FTRP) is proposed as a novel routing protocol designed for Wireless Sensor Networks (WSNs). FTRP offers fault-tolerant, reliable packet exchange and support for dynamic network changes. The key concept is logical clustering of nodes. The protocol delegates routing ownership to the cluster heads, where the fault tolerance functionality is implemented. FTRP utilizes cluster head nodes along with cluster head groups to store packets in transit. In addition, FTRP utilizes broadcast, which reduces the message overhead as compared to classical flooding mechanisms. FTRP manipulates Time to Live values for the various routing messages to control message broadcast. FTRP utilizes jitter in message transmission to reduce the effect of synchronized node states, which in turn reduces collisions. FTRP performance has been extensively evaluated through simulations against the Ad-hoc On-demand Distance Vector (AODV) and Optimized Link State Routing (OLSR) protocols. Packet Delivery Ratio (PDR), Aggregate Throughput and End-to-End delay (E-2-E) were used as performance metrics. In terms of PDR and aggregate throughput, FTRP is found to be an excellent performer in all mobility scenarios, whether the network is sparse or dense. In stationary scenarios, FTRP performed well in sparse networks; in dense networks, however, FTRP's performance degraded, yet remained within an acceptable range. This degradation is attributed to synchronized node states. Reliably delivering a message comes at a cost in terms of E-2-E delay. Results show that FTRP is a good performer in all mobility scenarios where the network is sparse. In the sparse stationary scenario, FTRP is considered a good performer; however, in dense stationary scenarios FTRP's E-2-E delay is not acceptable. There are times when receiving a network message is more important than other costs such as energy or delay.
That makes FTRP suitable for a wide range of WSN applications, such as military applications that monitor soldiers' biological data and supplies on the battlefield and support battle damage assessment. FTRP can also be used in health applications, in addition to a wide range of geo-fencing, environmental monitoring, resource monitoring, production line monitoring, agriculture and animal tracking applications. FTRP should be avoided in dense stationary deployments such as, but not limited to, scenarios where fast application response is critical and lives are at stake, such as biohazard detection or intensive care units.
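
The two broadcast controls described above, Time to Live limits and transmission jitter, can be sketched as follows. This is a generic illustration, not FTRP's actual message format or forwarding logic: the `RoutingMessage` fields, the `schedule_rebroadcast` name, and the uniform jitter distribution are all assumptions.

```python
import random
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class RoutingMessage:
    """Hypothetical routing message; the field names are illustrative only."""
    kind: str
    ttl: int
    payload: bytes = b""


def schedule_rebroadcast(msg: RoutingMessage, max_jitter_s: float,
                         rng: random.Random):
    """Decide whether and when a node should rebroadcast a message.

    Returns None when the TTL is exhausted, otherwise a pair
    (delay_seconds, message_with_decremented_ttl).
    """
    if msg.ttl <= 1:
        return None  # TTL bounds how far the broadcast can spread

    # A random delay desynchronizes neighbours that received the same
    # message at the same moment, which lowers the chance of collisions.
    delay = rng.uniform(0.0, max_jitter_s)
    return delay, replace(msg, ttl=msg.ttl - 1)
```

A node would call `schedule_rebroadcast` for each routing message it receives and, when the result is not `None`, send the returned message after the returned delay. Different TTL values per message kind would then limit how far each kind of broadcast travels.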


2021 ◽  
Vol 10 (2) ◽  
pp. 34
Author(s):  
Alessio Botta ◽  
Jonathan Cacace ◽  
Riccardo De Vivo ◽  
Bruno Siciliano ◽  
Giorgio Ventre

With the advances in networking technologies, robots can use the almost unlimited resources of large data centers, overcoming the severe limitations imposed by onboard resources: this is the vision of Cloud Robotics. In this context, we present DewROS, a framework based on the Robot Operating System (ROS) which embodies the three-layer, Dew-Robotics architecture, where computation and storage can be distributed among the robot, the network devices close to it, and the Cloud. After presenting the design and implementation of DewROS, we show its application in a real use-case called SHERPA, which foresees a mixed ground and aerial robotic platform for search and rescue in an alpine environment. We used DewROS to analyze the video acquired by the drones in the Cloud and quickly spot signs of human beings in danger. We perform a wide experimental evaluation using different network technologies and Cloud services from Google and Amazon. We evaluated the impact of several variables on the performance of the system. Our results show that, for example, the video length has a minimal impact on the response time with respect to the video size. In addition, we show that the response time depends on the Round Trip Time (RTT) of the network connection when the video is already loaded into the Cloud provider side. Finally, we present a model of the annotation time that considers the RTT of the connection used to reach the Cloud, discussing results and insights into how to improve current Cloud Robotics applications.

