Data replication strategies for fault tolerance and availability on commodity clusters

Author(s):  
C. Amza ◽  
A.L. Cox ◽  
W. Zwaenepoel
2019 ◽  
Vol 5 (1) ◽  
pp. 65-79
Author(s):  
Yunhong Ji ◽  
Yunpeng Chai ◽  
Xuan Zhou ◽  
Lipeng Ren ◽  
Yajie Qin

Abstract: Intra-query fault tolerance has increasingly become a concern for online analytical processing, as more and more enterprises migrate data analytical systems from mainframes to commodity computers. Most massively parallel processing (MPP) databases do not support intra-query fault tolerance, so they may suffer from prolonged query latency when running on unreliable commodity clusters. While SQL-on-Hadoop systems can use the fault tolerance support of low-level frameworks such as MapReduce and Spark, their cost-effectiveness is not always acceptable. In this paper, we propose a smart intra-query fault tolerance (SIFT) mechanism for MPP databases. SIFT achieves fault tolerance by checkpointing, i.e., materializing the intermediate results of selected operators. Unlike existing approaches, SIFT aims to raise the query success rate within a given time. To achieve this goal, it needs to (1) minimize the query rerunning time after a failure and (2) introduce as little checkpointing overhead as possible. To evaluate SIFT in a real-world MPP database system, we implemented it in Greenplum. The experimental results indicate that it can effectively improve the success rate of query processing, especially when working with unreliable hardware.
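
The checkpointing idea in this abstract can be illustrated with a minimal sketch. It is not SIFT or Greenplum code: the function `run_with_checkpoints`, the `store` dictionary and the choice of which operators to checkpoint are all hypothetical, and a real system would persist intermediate results to durable storage and choose checkpoints by cost.

```python
from typing import Callable, Dict, List, Set

# Hypothetical operator type: takes a list of rows, returns a list of rows.
Operator = Callable[[list], list]


def run_with_checkpoints(
    operators: List[Operator],
    rows: list,
    checkpoint_after: Set[int],
    store: Dict[int, list],
) -> list:
    """Run operators in order, materializing the output of selected ones.

    `store` maps an operator index to its saved output and is assumed to
    survive a failed attempt (here it is just an in-memory dict).
    """
    # Resume from the latest checkpoint left behind by a previous attempt.
    start = 0
    if store:
        last = max(store)
        rows = store[last]
        start = last + 1
    for i in range(start, len(operators)):
        rows = operators[i](rows)
        if i in checkpoint_after:
            store[i] = list(rows)  # materialize the intermediate result
    return rows
```

If an attempt fails after a checkpointed operator, calling the function again with the same `store` skips every operator up to that checkpoint, which is the source of the shorter rerunning time; each extra checkpoint adds a copy, which is the overhead the abstract wants to keep small.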


Author(s):  
Gianni Pucciani ◽  
Flavia Donno ◽  
Andrea Domenici ◽  
Heinz Stockinger

Data replication is a well-known technique used in distributed systems in order to improve fault tolerance and make data access faster. Several copies of a dataset are created and placed at different nodes, so that users can access the replica closest to them, and at the same time the data access load is distributed among the replicas. In today’s Grid middleware solutions, data management services allow users to replicate datasets (i.e., flat files or databases) among storage elements within a Grid, but replicas are often considered read-only because of the absence of mechanisms able to propagate updates and enforce replica consistency. This entry analyzes the replica consistency problem and provides hints for the development of a Replica Consistency Service, highlighting the main issues and pros and cons of several approaches.


Author(s):  
Ahmad Shukri Mohd Noor ◽  
Nur Farhah Mat Zian ◽  
Noor Hafhizah Abd Rahim ◽  
Rabiei Mamat ◽  
Wan Nur Amira Wan Azman

The availability of data in a distributed system can be increased by implementing a fault tolerance mechanism in the system. Reactive fault tolerance methods deal with restarting failed services, placing redundant copies of data on multiple nodes across the network (in other words, data replication), and migrating data for recovery. Even though the idea of data replication is sound, the challenge is to choose a replication technique that provides better data availability as well as consistency for read and write operations on the redundant copies. The Circular Neighboring Replication (CNR) technique exploits a neighboring policy when replicating the data items in the system, and it performs well in terms of needing fewer copies to keep system availability at its highest. In a performance analysis against existing techniques, the results show that CNR improves system availability by 37% on average while needing only two replicas to maintain data availability and consistency. The study demonstrates the feasibility of the proposed technique and its potential for deployment in larger and more complex environments.
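
A two-replica neighbor placement can be sketched as follows. The abstract does not spell out the CNR neighboring policy, so this sketch assumes the simplest reading: nodes sit on a logical ring and each node's data item is also stored on its successor. The names `ring_placement` and `available_items` are illustrative only.

```python
from typing import Dict, List, Set


def ring_placement(nodes: List[str]) -> Dict[str, List[str]]:
    """Map each node's data item to its two holders: the node and its successor.

    Assumption: "neighbor" means the next node on a logical ring.
    """
    n = len(nodes)
    return {nodes[i]: [nodes[i], nodes[(i + 1) % n]] for i in range(n)}


def available_items(placement: Dict[str, List[str]], failed: Set[str]) -> List[str]:
    """Return the items that still have at least one live holder."""
    return [
        item
        for item, holders in placement.items()
        if any(h not in failed for h in holders)
    ]
```

Under this assumed placement, any single node failure leaves every item readable, and an item is lost only when a node and its successor fail together.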


2015 ◽  
Vol 37 ◽  
pp. 399
Author(s):  
Sogand Sahabi Moghaddam ◽  
Abbas Karimi

Multicast data replication provides a possible solution for improving data accessibility in highly dynamic and fault-prone mobile ad hoc environments. Our novel multicast data replication approach operates in a self-organizing manner: the network nodes that have a unit host detector construct a connected dominating set (CDS) based on the topology graph by collecting information from neighboring nodes using multicast. If the data gathered from the neighbors shows two non-adjacent neighbors, the resulting virtual backbone is used for efficient data replication, data search and routing. In this study, we compare our proposed approach with SCALAR and evaluate it in terms of average hop count and successful delivery ratio for different node numbers and speeds. The results show that the average hop count increases, but at a falling rate, and that a 20 percent successful delivery ratio is achieved, which indicates that PM brings improvements in fault tolerance, power consumption and load balancing.


2018 ◽  
Vol 8 (3) ◽  
pp. 60-77
Author(s):  
Sanjaya Kumar Panda ◽  
Saswati Naik

This article describes how data replication plays an important role in distributed systems. It primarily focuses on the redundancy of data at two or more nodes to achieve both fault tolerance and improved performance. Many researchers have proposed data replication algorithms to manage this redundancy. However, they have not considered the types of faults associated with the nodes, such as permanent, transient and intermittent faults. Moreover, they have not incorporated any recovery approach for rejoining failed nodes. The authors therefore propose a data replication algorithm called dynamic vote-based data replication (DVDR). The main contribution of DVDR is that it considers all of these fault types and rejoins failed nodes. DVDR is based on dynamic vote assignment among the connected nodes and is described as a passive, non-hierarchical approach. The authors perform a rigorous analysis of DVDR and compare it with an existing dynamic vote assignment algorithm. The results show the efficacy of the proposed algorithm.
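
For readers unfamiliar with vote-based replication, the general idea can be sketched with a generic weighted-voting quorum check. This is not the DVDR algorithm, whose details the abstract does not give; `has_quorum` and `reassign_votes` are hypothetical helpers, and zeroing the votes of failed nodes is only one simple illustrative reassignment policy.

```python
from typing import Dict, Set


def has_quorum(votes: Dict[str, int], reachable: Set[str]) -> bool:
    """True if the reachable nodes hold a strict majority of all votes."""
    total = sum(votes.values())
    held = sum(v for node, v in votes.items() if node in reachable)
    return 2 * held > total


def reassign_votes(votes: Dict[str, int], failed: Set[str]) -> Dict[str, int]:
    """Return a new assignment in which failed nodes hold zero votes.

    Assumption: this is only called by a group that currently has a quorum,
    otherwise two partitions could each shrink the vote total independently.
    """
    return {node: (0 if node in failed else v) for node, v in votes.items()}
```

With five single-vote nodes, two reachable nodes cannot update the data under the static assignment; after the surviving majority has removed the votes of two failed nodes, the same pair of reachable nodes holds two of the three remaining votes and can proceed. Rejoining a recovered node would amount to restoring its vote.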


Author(s):  
Vassilios V. Dimakopoulos ◽  
Spiridoula Margariti ◽  
Mirto Ntetsika ◽  
Evaggelia Pitoura

Maintaining multiple copies of data items is a commonly used mechanism for improving the performance and fault-tolerance of any distributed system. By placing copies of data items closer to their requesters, the response time of queries can be improved. An additional reason for replication is load balancing. For instance, by allocating many copies to popular data items, the query load can be evenly distributed among the servers that hold these copies. Similarly, by eliminating hotspots, replication can lead to a better distribution of the communication load over the network links. Besides performance-related reasons, replication improves system availability, since the larger the number of copies of an item, the more site failures can be tolerated. In this chapter we survey replication methods applicable to p2p systems. Although there exist some general techniques, methodologies are distinguished according to the overlay organization (structured and unstructured) they are aimed at. After replicas are created and distributed, a major issue is their maintenance. We present strategies that have been proposed for keeping replicas up to date so as to achieve a desired level of consistency.
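
The link between the number of copies and availability can be made concrete with a small calculation. The sketch assumes that sites fail independently and that each site is up with the same probability `p`; neither assumption comes from the chapter, and the function names are illustrative.

```python
def item_availability(p: float, copies: int) -> float:
    """Probability that at least one of `copies` replicas is on a live site.

    Assumes independent site failures, each site being up with probability p.
    """
    return 1.0 - (1.0 - p) ** copies


def min_copies(p: float, target: float, limit: int = 64) -> int:
    """Smallest number of copies whose availability reaches `target`."""
    for k in range(1, limit + 1):
        if item_availability(p, k) >= target:
            return k
    raise ValueError("target not reachable within the copy limit")
```

Under these assumptions an item with `k` copies tolerates up to `k - 1` simultaneous site failures, and each extra copy multiplies the probability of losing access by `1 - p`.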

