A Formal Approach for Failure Detection in Large-Scale Distributed Systems Using Abstract State Machines

Author(s):  
Andreea Buga ◽  
Sorana Tania Nemeș
2011 ◽  
Vol 2 (3) ◽  
pp. 64-87
Author(s):  
Andrei Lavinia ◽  
Ciprian Dobre ◽  
Florin Pop ◽  
Valentin Cristea

Failure detection is a fundamental building block for ensuring fault tolerance in large-scale distributed systems. It is also a difficult problem. Resources under heavy load can be mistaken for failed ones. The failure of a network link can be detected by the lack of a response, but the same symptom occurs when a computational resource fails. Although progress has been made, no existing approach provides a system that covers all essential aspects related to a distributed environment. This paper presents a failure detection system based on adaptive, decentralized failure detectors. The system is developed as an independent substrate that works asynchronously and independently of the application flow. It uses a hierarchical protocol that creates a clustering mechanism to ensure dynamic configuration and traffic optimization. It also uses a gossip strategy for failure detection at the local level to minimize detection time and remove wrong suspicions. Results show that the system scales with the number of monitored resources while still taking into account the QoS requirements of both applications and resources.
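To make the gossip idea in this abstract concrete, the following is a minimal sketch of a generic gossip-style heartbeat detector. It is not the paper's protocol: the class `GossipNode`, the `fail_after` timeout, and the table layout are hypothetical names chosen for illustration, and time is passed in explicitly so the behavior is deterministic.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class GossipNode:
    """Hypothetical node keeping a heartbeat table for its local cluster."""
    name: str
    fail_after: float  # assumed timeout: seconds without a fresher heartbeat
    # peer name -> (highest heartbeat counter seen, local time it last increased)
    table: Dict[str, Tuple[int, float]] = field(default_factory=dict)

    def beat(self, now: float) -> None:
        """Increment this node's own heartbeat counter."""
        counter, _ = self.table.get(self.name, (0, now))
        self.table[self.name] = (counter + 1, now)

    def merge(self, remote: Dict[str, Tuple[int, float]], now: float) -> None:
        """Merge a gossiped table, keeping the highest counter per peer."""
        for peer, (counter, _) in remote.items():
            known, _ = self.table.get(peer, (-1, now))
            if counter > known:
                # A fresher heartbeat refreshes the peer and clears any suspicion.
                self.table[peer] = (counter, now)

    def suspects(self, now: float) -> List[str]:
        """Peers whose heartbeat has not increased within fail_after seconds."""
        return sorted(
            peer for peer, (_, seen) in self.table.items()
            if peer != self.name and now - seen > self.fail_after
        )


if __name__ == "__main__":
    a, b, c = (GossipNode(n, fail_after=3.0) for n in "abc")
    for node in (a, b, c):
        node.beat(0.0)
    a.merge(b.table, 0.0)
    a.merge(c.table, 0.0)
    for t in range(1, 6):  # c stays silent while a and b keep gossiping
        a.beat(float(t))
        b.beat(float(t))
        a.merge(b.table, float(t))
    print(a.suspects(5.0))
```

In this sketch a peer is suspected only when no fresher heartbeat has reached the node by any path, so a heartbeat relayed through a third node is enough to withdraw a wrong suspicion.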


Energies ◽  
2021 ◽  
Vol 14 (5) ◽  
pp. 1261
Author(s):  
Christopher Gradwohl ◽  
Vesna Dimitrievska ◽  
Federico Pittino ◽  
Wolfgang Muehleisen ◽  
András Montvay ◽  
...  

Photovoltaic (PV) technology allows large-scale investments in a renewable power-generating system at a competitive levelized cost of electricity (LCOE) and with a low environmental impact. Large-scale PV installations operate in a highly competitive market environment where even small performance losses have a high impact on profit margins. Therefore, operation at maximum performance is the key to long-term profitability. This can be achieved through advanced performance monitoring and methodologies for detecting instant or gradual failures. In this paper we present a combined approach: model-based fault detection by means of physical and statistical models, and failure diagnosis based on physics of failure. Both approaches contribute to optimized PV plant operation and maintenance based on typically available supervisory control and data acquisition (SCADA) data. The failure detection and diagnosis capabilities were demonstrated in a case study based on six years of SCADA data from a PV plant in Slovenia. In this case study, underperforming values of the inverters of the PV plant were reliably detected and possible root causes were identified. We conclude that the combined approach can contribute to efficient, long-term operation of photovoltaic power plants with maximum energy yield and can be applied to the monitoring of photovoltaic plants.
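As a rough illustration of model-based fault detection on SCADA-style data, the sketch below compares measured inverter power with the output of a simplified physical model and flags samples that fall short. It is not the method used in the paper: the linear irradiance and temperature model, the default coefficient of -0.004 per degree Celsius, the 10% tolerance, and the function names are all assumptions made for this example.

```python
from typing import List, Sequence, Tuple


def expected_power(irradiance_w_m2: float, cell_temp_c: float,
                   rated_kw: float, temp_coeff: float = -0.004) -> float:
    """Simplified physical model (assumed): power scales linearly with
    irradiance relative to 1000 W/m^2 and is derated linearly with cell
    temperature relative to 25 degrees Celsius."""
    return rated_kw * (irradiance_w_m2 / 1000.0) * (
        1.0 + temp_coeff * (cell_temp_c - 25.0)
    )


def underperforming(samples: Sequence[Tuple[float, float, float]],
                    rated_kw: float, tolerance: float = 0.10,
                    min_irradiance: float = 200.0) -> List[int]:
    """Return indices of samples whose measured power is more than
    `tolerance` below the modelled power.

    Each sample is (irradiance in W/m^2, cell temperature in C, measured kW).
    Low-irradiance samples are skipped because the ratio is unreliable there.
    """
    flagged = []
    for index, (irradiance, temp, measured_kw) in enumerate(samples):
        if irradiance < min_irradiance:
            continue
        if measured_kw < (1.0 - tolerance) * expected_power(irradiance, temp, rated_kw):
            flagged.append(index)
    return flagged


if __name__ == "__main__":
    # Hypothetical readings for a 100 kW inverter.
    readings = [(1000.0, 25.0, 100.0), (800.0, 45.0, 60.0), (100.0, 20.0, 1.0)]
    print(underperforming(readings, rated_kw=100.0))
```

A fixed tolerance is the simplest possible decision rule; a statistical model fitted to historical data would replace it with a threshold learned from the plant's own residuals.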


2007 ◽  
Vol 41 (2) ◽  
pp. 83-88
Author(s):  
Flavio P. Junqueira ◽  
Vassilis Plachouras ◽  
Fabrizio Silvestri ◽  
Ivana Podnar
