A Failure Detection System for Large Scale Distributed Systems

Author(s):  
Andrei Lavinia ◽  
Ciprian Dobre ◽  
Florin Pop ◽  
Valentin Cristea

Failure detection is a fundamental building block for ensuring fault tolerance in large scale distributed systems. It is also a difficult problem. Resources under heavy load can be mistaken for failed ones. The failure of a network link can be detected by the lack of a response, but a missing response also occurs when a computational resource fails, so the two cases are hard to tell apart. Although progress has been made, no existing approach provides a system that covers all essential aspects of a distributed environment. This paper presents a failure detection system based on adaptive, decentralized failure detectors. The system is developed as an independent substrate that works asynchronously and independently of the application flow. It uses a hierarchical protocol that creates a clustering mechanism, which ensures dynamic configuration and traffic optimization. It also uses a gossip strategy for failure detection at local levels to minimize detection time and remove wrong suspicions. Results show that the system scales with the number of monitored resources while still taking into account the QoS requirements of both applications and resources.
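The abstract does not specify how the adaptive detectors compute their timeouts. As a minimal sketch only, assuming a heartbeat-based detector whose timeout follows the recent inter-arrival times, the idea of not mistaking a slow but alive resource for a failed one could look like this. The class name, the window size, and the safety factor are hypothetical choices for illustration, not taken from the paper.

```python
from collections import deque
from statistics import mean, pstdev


class AdaptiveHeartbeatDetector:
    """Suspects a peer when its heartbeat is late relative to recent history.

    Illustrative only: the timeout rule (mean + safety_factor * std) is an
    assumption, not the algorithm described in the paper.
    """

    def __init__(self, window=10, safety_factor=3.0, min_margin=0.1):
        self.intervals = deque(maxlen=window)  # recent inter-arrival times (s)
        self.last_arrival = None
        self.safety_factor = safety_factor
        self.min_margin = min_margin  # floor for the margin when jitter is zero

    def heartbeat(self, now):
        """Record a heartbeat received at time `now` (seconds)."""
        if self.last_arrival is not None:
            self.intervals.append(now - self.last_arrival)
        self.last_arrival = now

    def timeout(self):
        """Current timeout: mean interval plus a margin that grows with jitter."""
        if not self.intervals:
            return None  # not enough history to judge
        margin = max(self.safety_factor * pstdev(self.intervals), self.min_margin)
        return mean(self.intervals) + margin

    def suspects(self, now):
        """True if the peer has been silent longer than the adaptive timeout."""
        limit = self.timeout()
        if limit is None:
            return False
        return (now - self.last_arrival) > limit


# A steady peer gets a tight timeout; a jittery (loaded) peer gets a looser one.
steady = AdaptiveHeartbeatDetector()
for t in (0, 1, 2, 3, 4):
    steady.heartbeat(t)

jittery = AdaptiveHeartbeatDetector()
for t in (0, 1, 3, 4, 6):
    jittery.heartbeat(t)
```

With these inputs, two seconds of silence is enough to suspect the steady peer but not the jittery one, which is the behavior an adaptive detector is meant to provide for resources under heavy load.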

2011 ◽  
Vol 2 (3) ◽  
pp. 64-87


Author(s):  
Ahmad Iwan Fadli ◽  
Selo Sulistyo ◽  
Sigit Wibowo

Traffic accidents are a very difficult problem to handle at a national scale. Indonesia is one of the most populous developing countries, and its people rely on vehicles as their main means of transportation for daily activities. It is also the country with the largest number of car users in Southeast Asia, so driving safety needs to be considered. Using a machine learning classification method to determine whether a driver is driving safely can help reduce the risk of accidents. We created a detection system that classifies driving as safe or unsafe using trip sensor data, which include gyroscope, acceleration, and GPS readings. The classification methods used in this study are Random Forest (RF), Support Vector Machine (SVM), and Multilayer Perceptron (MLP), with data preprocessing improved through feature extraction and oversampling. The study shows that, with the proposed preprocessing stages, RF performs best compared to SVM and MLP, reaching 98% accuracy, 98% precision, and 97% sensitivity.


Author(s):  
Thomas Weise ◽  
Raymond Chiong

The ubiquitous presence of distributed systems has drastically changed the way the world interacts and has affected not only economics and governance but also society at large. It is therefore important for the architecture and infrastructure of the distributed environment to be continuously renewed in order to cope with the rapid changes driven by innovative technologies. However, many problems in distributed computing are dynamic in nature, large in scale, NP-complete, or a combination of these. In most cases, exact solutions can hardly be found. As a result, a number of intelligent nature-inspired algorithms have been used recently, as these algorithms are capable of achieving good-quality solutions in reasonable computational time. Among the nature-inspired algorithms, evolutionary algorithms are the most extensively applied. This chapter presents a systematic review of evolutionary algorithms employed to solve various problems related to distributed systems. The review aims to provide insight into evolutionary approaches, in particular genetic algorithms and genetic programming, for solving problems in five areas of network optimization: network topology, routing, protocol synthesis, network security, and parameter settings and configuration. Some interesting applications from these areas are discussed in detail with illustrative examples.


Author(s):  
Yifeng Zhu ◽  
Hong Jiang

This chapter discusses the false positive and false negative rates of Bloom filters in a distributed environment. A Bloom filter (BF) is a space-efficient data structure that supports probabilistic membership queries. In distributed systems, a Bloom filter is often used to summarize local services or objects, and the filter is replicated to remote hosts. This allows remote hosts to perform fast membership queries without contacting the original host. However, when the services or objects change, the remote Bloom filter replica may become stale. This chapter analyzes the impact of staleness on the false positive and false negative rates of membership queries on a Bloom filter replica. Based on the analytical results, an efficient update control mechanism is then proposed to minimize the updating overhead. The chapter validates the analytical models and the update control mechanism through simulation experiments.
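To make the staleness problem concrete, the following is a minimal sketch of a Bloom filter and a stale replica. It assumes a bit array stored in a Python integer and SHA-256 based hash positions; the class, the `build` helper, the sizes, and the service names are hypothetical and are not the chapter's model or its update control mechanism.

```python
import hashlib


class BloomFilter:
    """A small Bloom filter; sizes and hashing scheme are illustrative choices."""

    def __init__(self, num_bits=256, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # bit array stored in a single Python int

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        # True means "possibly present"; False means "definitely not added".
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


def build(items):
    """Build a fresh filter summarizing `items`."""
    bf = BloomFilter()
    for item in items:
        bf.add(item)
    return bf


# The original host summarizes its services and ships the filter to a remote host.
services = {"svc-a", "svc-b"}
replica = build(services)  # snapshot held by the remote host

# The services change, and the host rebuilds its own filter; the replica is now stale.
services = {"svc-b", "svc-c"}
local = build(services)

# The replica still reports the removed service: a false positive caused by staleness.
stale_hit = "svc-a" in replica
```

A plain Bloom filter cannot delete items, so the sketch rebuilds the local filter after a change. A service added after the snapshot ("svc-c" here) will usually be missed by the replica, which is the staleness-induced false negative the chapter analyzes.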

