On the impact of fault tolerance tactics on architecture patterns

Author(s):  
Neil B. Harrison ◽  
Paris Avgeriou ◽  
Uwe Zdun
2018 ◽  
Vol 8 (3) ◽  
pp. 20-31 ◽  
Author(s):  
Sam Goundar ◽  
Akashdeep Bhardwaj

With mission-critical web applications and resources hosted in cloud environments, and cloud services growing fast, the need for a greater level of service assurance regarding fault tolerance, for both availability and reliability, has increased. The top priority now is ensuring a fault-tolerant environment that can keep systems up and running. To minimize the impact of downtime or accessibility failures caused by systems, network devices or hardware, such failures need to be anticipated and handled proactively in a fast, intelligent way. This article discusses the fault tolerance system for cloud computing environments and analyzes whether it is effective for cloud environments.


2014 ◽  
Vol 644-650 ◽  
pp. 1714-1716
Author(s):  
Xiao Peng Wang

Recent advances in semantic algorithms and pervasive communication are based entirely on the assumption that DHTs and hierarchical databases are not in conflict with write-back caches. Here, we argue the emulation of the partition table that would allow for further study into information retrieval systems. Our focus in this paper is not on whether the famous embedded algorithm for the study of e-business by Harris et al. is NP-complete, but rather on motivating an application for Byzantine fault tolerance (Mesophryon).


2014 ◽  
Vol 631-632 ◽  
pp. 669-675
Author(s):  
Yong Xiong ◽  
Ji Liang Lin

Taking α-lattice flocking as the research object, this paper studies the influence of faults that occur in a flock and a corresponding fault tolerance control algorithm. The impact on flocking performance is analyzed by means of flocking property indexes when a communication error, actuator failure or sensor malfunction occurs. A flocking fault diagnosis method and a fault tolerance control strategy based on communication and data association are introduced. With failed mobile robots treated as obstacles, an avoidance algorithm for complex-shaped obstacles is proposed. Simulation shows the effectiveness of the method.


Author(s):  
Gennaro Rodrigues ◽  
Felipe Rosa ◽  
Adria de Oliveira ◽  
Fernanda Lima Kastensmidt ◽  
Luciano Ost ◽  
...  

2020 ◽  
Vol 20 (2) ◽  
pp. e14
Author(s):  
Diego Montezanti

Reliability and fault tolerance have become aspects of growing relevance in the field of HPC, due to the increased probability that faults of different kinds will occur in these systems. This is fundamentally due to the increasing complexity of processors in the search for better performance, which raises the scale of integration and the number of components that work near their technological limits and are therefore increasingly prone to failures. Another contributing factor is the growth in the size of parallel systems, in terms of number of cores and processing nodes, to obtain greater computational power. As applications demand longer uninterrupted computation times, the impact of faults grows, due to the cost of relaunching an execution that was aborted because of a fault or that concluded with erroneous results. Consequently, it is necessary to run these applications on highly available and reliable systems, which requires strategies capable of providing detection, protection and recovery against faults. Exascale is expected to be reached in the coming years, with supercomputers containing millions of processing cores and capable of performing on the order of 10^18 operations per second. This is a great window of opportunity for HPC applications, but it also increases the risk that they will not complete their executions. Recent studies show that, as systems continue to include more processors, the Mean Time Between Errors decreases, resulting in higher failure rates and increased risk of corrupted results; large parallel applications are expected to deal with errors that occur every few minutes, requiring external help to progress efficiently. Silent Data Corruptions are the most dangerous errors that can occur, since they can generate incorrect results in programs that appear to execute correctly.
Scientific applications and large-scale simulations are the most affected, making silent error handling the main challenge towards resilience in HPC. In message passing applications, a silent error affecting a single task can produce a pattern of corruption that spreads to all communicating processes; in the worst-case scenario, the erroneous final results cannot be detected at the end of the execution and will be taken as correct. Since scientific applications have execution times on the order of hours or even days, it is essential to find strategies that allow applications to reach correct solutions in a bounded time, despite the underlying failures. These strategies also prevent energy consumption from skyrocketing, since without them executions would have to be relaunched from the beginning. However, the most popular parallel programming models used in supercomputers lack support for fault tolerance.


2019 ◽  
Vol 12 (3) ◽  
pp. 59-64 ◽  
Author(s):  
Vladimir Kononov

The features and possibilities of using foreign and domestic foundry technologies to create CMOS ADCs for high-speed multichannel DSP systems with increased fault resistance to the effects of TKCH are considered. The architecture and technique of ADC balancing, which increase the conversion rate when several ADCs operate in alternating mode, are presented. The technique of double redundancy of the "weight" current sources is considered. The necessity of using an additional current source and dual series-connected CMOS transistors instead of single transistors of the same conductivity type is substantiated. It is noted that the proposed solutions effectively mitigate the impact of TKCH with the most "dangerous" energies of 20-100 MeV/nucleon.

