State of Art Survey for Fault Tolerance Feasibility in Distributed Systems

Author(s):
Arshad A. Hussein
Adel AL-zebari
Naaman Omar
Karwan Jameel Merceedi
Abdulraheem Jamil Ahmed
...

The use of technology has grown dramatically, and computer systems are now interconnected via various communication media. The use of distributed systems (DS) in our daily activities has grown along with the distribution of data. This is because distributed systems allow nodes to coordinate and share their resources across linked systems or devices, integrating people with geographically dispersed computing capacity. However, because failures can occur at multiple points, distributed systems may suffer from a lack of service availability. Fault tolerance (FT) techniques are used in distributed systems to avoid such failures by ensuring replication, high redundancy, and high availability of distributed services. This paper presents fault-tolerant systems and their requirements and explains distributed systems and their architecture. It then explains the fault tolerance techniques in use, reviews recent literature on fault tolerance in distributed systems, and finally discusses and compares that literature.

2012
Vol 2 (3)
pp. 98-109
Author(s):
Ahmad Shukri Mohd Noor
Tutut Herawan
Mustafa Mat Deris

High availability is important for large-scale distributed systems. Replication provides effective ways to enhance performance, availability, and fault tolerance in distributed systems, and an efficient and effective replication technique is key to improving availability. Both data and processes can be replicated to recover from failures. Several existing projects have been successfully implemented with the two-replica distribution technique (TRDT), or primary–backup technique. However, these projects suffer from increased cost overhead and inherit irrecoverable scenarios from TRDT, such as double faults, in which both copies of a replicated component are damaged. The authors propose an availability prediction model for the Neighbor Replica Distributed Technique (NRDT). The model focuses on improving high availability by predicting the future availability of interdependent servers in a distributed online system over an extended period of time. The results are discussed further in the article.
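The following minimal Python sketch makes the double-fault scenario concrete. It assumes that each replica fails independently and is up with probability p. The names `Replica`, `read_with_failover`, and `availability_any` are hypothetical. The sketch shows only basic primary–backup (TRDT) behaviour, not the authors' NRDT prediction model.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One copy of a replicated component (hypothetical model)."""
    name: str
    up: bool = True
    data: dict = field(default_factory=dict)


def read_with_failover(replicas, key):
    """Primary-backup read: try the primary first, then each backup in order."""
    for replica in replicas:
        if replica.up:
            return replica.data[key]
    # Every copy is down: with two replicas this is the irrecoverable double fault.
    raise RuntimeError("all replicas are down")


def availability_any(p: float, n: int) -> float:
    """Probability that at least one of n independent replicas is up."""
    return 1.0 - (1.0 - p) ** n


if __name__ == "__main__":
    primary = Replica("primary", data={"x": 1})
    backup = Replica("backup", data={"x": 1})
    primary.up = False
    print(read_with_failover([primary, backup], "x"))  # the backup serves the read
    for n in (1, 2, 3):
        print(n, round(availability_any(0.9, n), 6))
```

Under these assumptions, with p = 0.9, two replicas give about 0.99 availability, so a double fault still occurs about 1% of the time. Adding replicas reduces that probability but never eliminates it.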


Minimum-process coordinated checkpointing is considered an attractive methodology for transparently introducing fault tolerance into mobile systems. We design a minimum-process synchronous checkpointing algorithm for mobile distributed systems that tries to minimize the intrusion on processes during checkpointing. Transitive dependencies are collected at the beginning, so the blocking time of processes is kept to a bare minimum. During the blocking period, processes can perform their normal computations, send messages, and process selected messages. If a failure occurs during checkpointing, all relevant processes are required to discard only their tentative checkpoints. In this way, we try to reduce the loss of checkpointing effort when any process fails to take its checkpoint in coordination with the others. We also try to minimize the synchronization message complexity during checkpointing.
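The core of a minimum-process scheme is deciding which processes must checkpoint together. The Python sketch below is a simplified, sequential simulation, not the authors' algorithm. It assumes each process records the set of processes it has received messages from since its last checkpoint (`depends_on`, a hypothetical structure). The sketch then:

- computes the transitive closure from the initiator;
- commits the tentative checkpoints only if every participant succeeds;
- discards the tentative checkpoints otherwise.

Message passing, blocking, and selective message handling are not modelled.

```python
from collections import deque
from typing import Callable, Dict, Optional, Set


def minimum_checkpoint_set(initiator: str, depends_on: Dict[str, Set[str]]) -> Set[str]:
    """Processes that must checkpoint together with the initiator.

    If p received a message from q since p's last checkpoint, q must also
    checkpoint; otherwise p's new checkpoint would record a receive whose send
    is not recorded (an orphan message). The relation is followed transitively.
    """
    needed = {initiator}
    queue = deque([initiator])
    while queue:
        p = queue.popleft()
        for q in depends_on.get(p, set()):
            if q not in needed:
                needed.add(q)
                queue.append(q)
    return needed


def coordinated_checkpoint(
    initiator: str,
    depends_on: Dict[str, Set[str]],
    take_tentative: Callable[[str], Optional[str]],
    permanent: Dict[str, str],
) -> bool:
    """Commit tentative checkpoints only if all participants succeed."""
    tentative: Dict[str, str] = {}
    for p in sorted(minimum_checkpoint_set(initiator, depends_on)):
        snapshot = take_tentative(p)  # None means p failed to take its checkpoint
        if snapshot is None:
            tentative.clear()  # abort: discard only the tentative checkpoints
            return False
        tentative[p] = snapshot
    permanent.update(tentative)  # commit: tentative checkpoints become permanent
    return True


if __name__ == "__main__":
    deps = {"P1": {"P2"}, "P2": {"P3"}, "P3": set(), "P4": {"P1"}}
    print(sorted(minimum_checkpoint_set("P1", deps)))  # ['P1', 'P2', 'P3']
```

In the example, P4 depends on P1 but not the reverse, so P4 is left out of the round. When any participant fails, the previous permanent checkpoints stay untouched.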


Author(s):
Ahmad Shukri Mohd Noor
Nur Farhah Mat Zian
Fatin Nurhanani M. Shaiful Bahri

Distributed systems mainly provide access to large amounts of data and computational resources through a wide range of interfaces. They are dynamic, meaning that resources may enter and leave the environment at any time. Many distributed applications also run in environments where faults are increasingly likely because of the systems' ever-growing scale and complexity. Given these diverse fault and failure conditions, fault tolerance has become a critical element of distributed computing, so that the system performs its function correctly even in the presence of faults. Replication techniques primarily address fault tolerance in two ways: masking failures and reconfiguring the system in response to them. This paper presents a brief survey of replication techniques, including Read One Write All (ROWA), Quorum Consensus (QC), the Tree Quorum (TQ) protocol, the Grid Configuration (GC) protocol, Two-Replica Distribution Techniques (TRDT), Neighbour Replica Triangular Grid (NRTG), and Neighbour Replication Distributed Techniques (NRDT). Each technique has its own strengths and shortcomings, which form the subject matter of this survey.
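Two of the surveyed techniques trade read availability against write availability in different ways. The Python sketch below computes operation availability for ROWA and for quorum consensus. It assumes that n copies fail independently and that each is up with probability p. The function names are hypothetical, and the tree, grid, and neighbour-replication protocols are not modelled.

```python
from math import comb


def at_least_k_up(n: int, k: int, p: float) -> float:
    """Probability that at least k of n independent copies are up."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def rowa_availability(n: int, p: float):
    """ROWA: a read needs any one copy, a write needs all n copies."""
    return at_least_k_up(n, 1, p), at_least_k_up(n, n, p)


def quorum_availability(n: int, r: int, w: int, p: float):
    """Quorum consensus with read quorum r and write quorum w (one vote per copy)."""
    # r + w > n makes every read quorum overlap every write quorum;
    # 2w > n makes any two write quorums overlap.
    if r + w <= n or 2 * w <= n:
        raise ValueError("quorums must satisfy r + w > n and 2w > n")
    return at_least_k_up(n, r, p), at_least_k_up(n, w, p)


if __name__ == "__main__":
    n, p = 5, 0.9
    print("ROWA   read=%.5f write=%.5f" % rowa_availability(n, p))
    print("QC 3/3 read=%.5f write=%.5f" % quorum_availability(n, 3, 3, p))
```

With n = 5 and p = 0.9, ROWA writes succeed only about 59% of the time (0.9^5 ≈ 0.590). Majority quorums (r = w = 3) keep both reads and writes near 0.991, at the cost of contacting three copies per read.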


2021
pp. 102217
Author(s):
Yu Wu
Duo Liu
Xianzhang Chen
Jinting Ren
Renping Liu
...

2021
Vol 40 (2)
pp. 65-69
Author(s):
Richard Wai

Modern-day cloud-native applications have become broadly representative of distributed systems in the wild. However, unlike traditional distributed system models with conceptually static designs, cloud-native systems emphasize dynamic scaling and online iteration (CI/CD). Cloud-native systems tend to be architected around a networked collection of distinct programs ("microservices") that can be added, removed, and updated in real time. Typically, individual microservices are distinct containerized programs that communicate with the rest of the distributed application through heavyweight protocols. Common communication stacks exchange JSON or XML objects over HTTP via TCP/TLS and incur significant overhead, particularly for small messages. Additionally, interpreted, JIT-compiled, and VM-based languages such as JavaScript (NodeJS/Deno), Java, and Python are dominant in modern microservice programs. These language technologies, combined with high-overhead messaging, can impose superlinear cost increases (hardware demands) on scale-out, particularly towards hyperscale and/or with latency-sensitive workloads.
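One rough way to see the per-message overhead described above is to compare payload sizes. The Python sketch below encodes the same small, hypothetical sensor reading as JSON and as a fixed binary layout built with `struct`. It measures only the payload, not the HTTP headers or TCP/TLS framing that the full stack adds. The binary form also assumes that both sides agree on the layout in advance.

```python
import json
import struct

# Hypothetical small message exchanged between two microservices.
reading = {"id": 42, "temp": 21.5}

as_json = json.dumps(reading).encode("utf-8")
as_compact_json = json.dumps(reading, separators=(",", ":")).encode("utf-8")
# "<Id": little-endian, no padding, 4-byte unsigned int + 8-byte double.
as_binary = struct.pack("<Id", reading["id"], reading["temp"])

# Decoding the binary form requires the same pre-agreed layout.
decoded_id, decoded_temp = struct.unpack("<Id", as_binary)

print(f"json={len(as_json)} compact={len(as_compact_json)} binary={len(as_binary)}")
# json=24 compact=21 binary=12
```

JSON repeats the field names in every message. For small messages, the text encoding here is therefore about twice the size of the binary one, before any transport overhead is counted.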


Author(s):
Asif Imran
Alim Ul Gias
Rayhanur Rahman
Amit Seal
Tajkia Rahman
...

2001
Vol 02 (03)
pp. 317-329
Author(s):
Mustafa Mat Deris
Ali Mamat
Pua Chai Seng
Mohd Yazid Saman

This article addresses the performance of data replication protocols in terms of data availability and communication costs. Specifically, we present a new protocol, the Three Dimensional Grid Structure (TDGS) protocol, to manage data replication in distributed systems. The protocol provides high availability for read and write operations with limited fault tolerance at low communication cost. With the TDGS protocol, a read operation is limited to two data copies, while a write operation requires a minimal number of copies. In comparison to other protocols, TDGS requires a lower communication cost per operation while providing higher data availability.


Author(s):  
В.А. Рудометкин

Nowadays, most services are moving online, which allows users to receive a service at any time. High availability of a service leads to growth in the number of users, which increases the load on the system. Special attention must therefore be paid to the fault tolerance of the system before its development begins. The article considers the main problems of high-load systems and a way to optimize an application by parallelizing tasks across processor cores. It describes the need to migrate to a microservice architecture, the drawbacks of that architecture, and ways to address them. In solving scaling problems, it also addresses the problems of distributed transactions and long server response times.
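The abstract does not say how the article handles distributed transactions. One widely used approach in microservice architectures is the saga pattern: each service performs a local step, and if a later step fails, the previously completed steps are undone by compensating actions. The Python sketch below shows that idea in a single process with hypothetical step names. A real deployment would also need durable logging and retries, which are omitted here.

```python
from typing import Callable, List, Tuple

# (step name, action, compensating action)
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_saga(steps: List[Step], log: List[str]) -> bool:
    """Run steps in order; on failure, compensate completed steps in reverse order."""
    completed: List[Tuple[str, Callable[[], None]]] = []
    for name, action, compensate in steps:
        try:
            action()
        except Exception:
            log.append(f"failed: {name}")
            for done_name, done_compensate in reversed(completed):
                done_compensate()
                log.append(f"compensated: {done_name}")
            return False
        completed.append((name, compensate))
        log.append(f"done: {name}")
    return True


if __name__ == "__main__":
    stock = {"widget": 1}

    def reserve() -> None:
        stock["widget"] -= 1

    def release() -> None:
        stock["widget"] += 1

    def charge() -> None:
        raise RuntimeError("payment service timed out")  # simulated failure

    log: List[str] = []
    ok = run_saga([("reserve", reserve, release), ("charge", charge, lambda: None)], log)
    print(ok, stock, log)
```

The failed payment step leaves no partial effect behind: the stock reservation is released by its compensating action. A saga does not give atomic isolation, though, so other services may briefly observe the intermediate state.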

