A Failure Detection System for Large Scale Distributed Systems

Failure detection is a fundamental building block for ensuring fault tolerance in large scale distributed systems. It is also a difficult problem. Resources under heavy load can be mistaken for failed ones. The failure of a network link can be detected by the lack of a response, but a missing response also occurs when a computational resource fails, so the two cases are hard to tell apart. Although progress has been made, no existing approach provides a system that covers all essential aspects of a distributed environment. This paper presents a failure detection system based on adaptive, decentralized failure detectors. The system is developed as an independent substrate that works asynchronously and independently of the application flow. It uses a hierarchical protocol that creates a clustering mechanism, which ensures dynamic configuration and traffic optimization. It also uses a gossip strategy for failure detection at the local level to minimize detection time and remove wrong suspicions. Results show that the system scales with the number of monitored resources while still taking into account the QoS requirements of both applications and resources.
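To make the gossip idea in the abstract concrete, here is a minimal sketch of a heartbeat-counter gossip detector. It is not the protocol from the paper: the class `GossipNode`, the fixed `timeout` (the paper's detectors are adaptive), the all-to-all exchange, and the `simulate` driver are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class GossipNode:
    """Hypothetical node keeping a heartbeat table for its local cluster."""
    name: str
    timeout: float  # assumed fixed; an adaptive detector would tune this
    heartbeat: int = 0
    # peer name -> (highest heartbeat counter seen, local time it last grew)
    table: dict = field(default_factory=dict)

    def tick(self, now):
        """Increase this node's own heartbeat counter."""
        self.heartbeat += 1
        self.table[self.name] = (self.heartbeat, now)

    def merge(self, remote_table, now):
        """Merge a gossiped table, keeping the larger counter per peer."""
        for peer, (counter, _) in remote_table.items():
            known = self.table.get(peer, (-1, now))[0]
            if counter > known:
                self.table[peer] = (counter, now)

    def suspected(self, now):
        """Peers whose counter has not grown for longer than the timeout."""
        return sorted(
            peer for peer, (_, seen) in self.table.items()
            if peer != self.name and now - seen > self.timeout
        )


def simulate(steps=7, crash_at=3):
    """Three nodes gossip every step; node 'c' stops at step crash_at."""
    nodes = {n: GossipNode(n, timeout=2.0) for n in "abc"}
    alive = {"a", "b", "c"}
    for now in range(steps):
        if now == crash_at:
            alive.discard("c")
        for name in sorted(alive):
            nodes[name].tick(now)
        for name in sorted(alive):
            for other in sorted(alive - {name}):
                nodes[other].merge(nodes[name].table, now)
    return nodes
```

After `simulate()`, calling `suspected(6)` on node `a` or `b` reports only `c`, because its counter stopped growing at step 2. A real deployment would gossip with a random subset of peers rather than all of them, which is what keeps traffic bounded as the cluster grows.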

2010 International Conference on Complex, Intelligent and Software Intensive Systems ◽

10.1109/cisis.2010.29 ◽

2010 ◽

Cited By ~ 7

Author(s):

Andrei Lavinia ◽

Ciprian Dobre ◽

Florin Pop ◽

Valentin Cristea

Keyword(s):

Distributed Systems ◽

Large Scale ◽

Detection System ◽

Failure Detection

Applying Machine Learning for Improving Performance Classification on Driving Behavior

IJITEE (International Journal of Information Technology and Electrical Engineering) ◽

10.22146/ijitee.56919 ◽

2021 ◽

Vol 4 (1) ◽

pp. 8

Author(s):

Ahmad Iwan Fadli ◽

Selo Sulistyo ◽

Sigit Wibowo

Keyword(s):

Machine Learning ◽

Traffic Accident ◽

Large Scale ◽

Detection System ◽

Difficult Problem ◽

Sensor Data ◽

Driving Safety ◽

Support Vector ◽

Classification Methods ◽

Machine Learning Classification

Traffic accidents are a very difficult problem to handle at the scale of a whole country. Indonesia is one of the most populous developing countries, and vehicles are its main means of transportation for daily activities. It is also the country with the largest number of car users in Southeast Asia, so driving safety needs to be considered. Using a machine learning classification method to determine whether a driver is driving safely can help reduce the risk of driving accidents. We created a detection system that classifies whether the driver is driving safely or unsafely using trip sensor data, which includes gyroscope, acceleration, and GPS readings. The classification methods used in this study are the Random Forest (RF) classification algorithm, Support Vector Machine (SVM), and Multilayer Perceptron (MLP), with data preprocessing improved through feature extraction and oversampling. This study shows that, with the proposed preprocessing stages, RF performs best, reaching 98% accuracy, 98% precision, and 97% sensitivity, compared with SVM and MLP.
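As a rough illustration of two pieces the abstract mentions, the sketch below shows random oversampling of a minority class and the computation of accuracy, precision, and sensitivity from predictions. The function names `oversample` and `binary_metrics`, the choice of random duplication as the oversampling method, and the convention that label `1` means "unsafe" are assumptions; the study's actual preprocessing and data are not reproduced here.

```python
import random
from collections import Counter


def oversample(rows, labels, seed=0):
    """Randomly duplicate minority-class rows until all classes match
    the size of the largest class (simple random oversampling)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, and sensitivity (recall) for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
    }
```

Oversampling should be applied to the training split only; duplicating rows before splitting would leak copies of training samples into the test set and inflate the reported metrics.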

Evolutionary Approaches and Their Applications to Distributed Systems

Intelligent Systems for Automated Learning and Adaptation ◽

10.4018/978-1-60566-798-0.ch006 ◽

2010 ◽

pp. 114-149 ◽

Cited By ~ 1

Author(s):

Thomas Weise ◽

Raymond Chiong

Keyword(s):

Distributed Systems ◽

Evolutionary Algorithms ◽

Network Topology ◽

Large Scale ◽

Computational Time ◽

Distributed Environment ◽

Rapid Changes ◽

NP Complete ◽

Nature Inspired Algorithms ◽

Ubiquitous Presence

The ubiquitous presence of distributed systems has drastically changed the way the world interacts and has affected not only economics and governance but also society at large. It is therefore important for the architecture and infrastructure within the distributed environment to be continuously renewed in order to cope with the rapid changes driven by innovative technologies. However, many problems in distributed computing are dynamic in nature, large in scale, NP-complete, or any combination of these. In most cases, exact solutions can hardly be found. As a result, a number of intelligent nature-inspired algorithms have been used recently, as these algorithms are capable of achieving good-quality solutions in reasonable computational time. Among all the nature-inspired algorithms, evolutionary algorithms are arguably the most extensively applied. This chapter presents a systematic review of evolutionary algorithms employed to solve various problems related to distributed systems. The review aims to provide insight into evolutionary approaches, in particular genetic algorithms and genetic programming, for solving problems in five different areas of network optimization: network topology, routing, protocol synthesis, network security, and parameter settings and configuration. Some interesting applications from these areas are discussed in detail with the use of illustrative examples.

Development of Failure Detection System for Network Control using Collective Intelligence of Social Networking Service in Large-Scale Disasters

Proceedings of the 27th ACM Conference on Hypertext and Social Media - HT '16 ◽

10.1145/2914586.2914620 ◽

2016 ◽

Cited By ~ 1

Author(s):

Chihiro Maru ◽

Miki Enoki ◽

Akihiro Nakao ◽

Shu Yamamoto ◽

Saneyasu Yamaguchi ◽

...

Keyword(s):

Social Networking ◽

Large Scale ◽

Collective Intelligence ◽

Detection System ◽

Failure Detection ◽

Network Control ◽

Social Networking Service

Efficient Update Control of Bloom Filter Replicas in Large Scale Distributed Systems

Handbook of Research on Scalable Computing Technologies ◽

10.4018/978-1-60566-661-7.ch034 ◽

2010 ◽

pp. 785-807 ◽

Cited By ~ 2

Author(s):

Yifeng Zhu ◽

Hong Jiang

Keyword(s):

Distributed Systems ◽

Large Scale ◽

Control Mechanism ◽

False Negative ◽

Bloom Filter ◽

Analytical Models ◽

Bloom Filters ◽

Distributed Environment ◽

Membership Query ◽

Efficient Data

This chapter discusses the false positive and false negative rates of Bloom filters in a distributed environment. A Bloom filter (BF) is a space-efficient data structure that supports probabilistic membership queries. In distributed systems, a Bloom filter is often used to summarize local services or objects, and this Bloom filter is replicated to remote hosts. This allows remote hosts to perform fast membership queries without contacting the original host. However, when the services or objects change, the remote Bloom filter replica may become stale. This chapter analyzes the impact of staleness on the false positive and false negative rates of membership queries on a Bloom filter replica. Based on the analytical results, an efficient update control mechanism is then proposed to minimize the updating overhead. The analytical models and the update control mechanism are validated through simulation experiments.
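The staleness effect described above can be seen in a small sketch. This is an illustrative Bloom filter, not the chapter's analytical model or update mechanism: the class `BloomFilter`, its sizes, the SHA-256 based hashing, and the service names are all assumptions.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter; sizes and hashing scheme are illustrative."""

    def __init__(self, num_bits=256, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [False] * num_bits

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = True

    def __contains__(self, item):
        return all(self.bits[pos] for pos in self._positions(item))

    def copy(self):
        clone = BloomFilter(self.num_bits, self.num_hashes)
        clone.bits = list(self.bits)
        return clone


# The original host summarizes its services and ships a replica.
host = BloomFilter()
for service in ("svc-a", "svc-b"):
    host.add(service)
replica = host.copy()

# The host later drops svc-b and adds svc-c, rebuilding its own filter.
host = BloomFilter()
for service in ("svc-a", "svc-c"):
    host.add(service)

stale_positive = "svc-b" in replica   # replica still claims a removed service
likely_missed = "svc-c" not in replica  # new service is usually not visible
```

Here `stale_positive` is always true, because a Bloom filter never forgets an item it was given. `likely_missed` is true unless all of the new item's bit positions happen to be set already, which is the staleness-induced false negative the chapter analyzes.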
