Data replication strategies for fault tolerance and availability on commodity clusters

Abstract: Intra-query fault tolerance has increasingly been a concern for online analytical processing, as more and more enterprises migrate data analytical systems from mainframes to commodity computers. Most massive parallel processing (MPP) databases do not support intra-query fault tolerance, so they may suffer from prolonged query latency when running on unreliable commodity clusters. While SQL-on-Hadoop systems can utilize the fault tolerance support of low-level frameworks, such as MapReduce and Spark, their cost-effectiveness is not always acceptable. In this paper, we propose a smart intra-query fault tolerance (SIFT) mechanism for MPP databases. SIFT achieves fault tolerance by performing checkpointing, i.e., materializing intermediate results of selected operators. Unlike existing approaches, SIFT aims at raising the query success rate within a given time. To achieve this goal, it needs to: (1) minimize query rerunning time after encountering failures and (2) introduce as little checkpointing overhead as possible. To evaluate SIFT in real-world MPP database systems, we implemented it in Greenplum. The experimental results indicate that it can effectively improve the success rate of query processing, especially when working with unreliable hardware.

Consistency of Replicated Datasets in Grid Computing

Handbook of Research on Grid Technologies and Utility Computing ◽ 10.4018/978-1-60566-184-1.ch006 ◽ 2009 ◽ pp. 49-58

Author(s): Gianni Pucciani ◽ Flavia Donno ◽ Andrea Domenici ◽ Heinz Stockinger

Keyword(s): Distributed Systems ◽ Fault Tolerance ◽ Grid Computing ◽ Data Management ◽ Data Replication ◽ Data Access ◽ Grid Middleware ◽ Pros And Cons ◽ Consistency Problem ◽ Replica Consistency

Data replication is a well-known technique used in distributed systems in order to improve fault tolerance and make data access faster. Several copies of a dataset are created and placed at different nodes, so that users can access the replica closest to them, and at the same time the data access load is distributed among the replicas. In today’s Grid middleware solutions, data management services allow users to replicate datasets (i.e., flat files or databases) among storage elements within a Grid, but replicas are often considered read-only because of the absence of mechanisms able to propagate updates and enforce replica consistency. This entry analyzes the replica consistency problem and provides hints for the development of a Replica Consistency Service, highlighting the main issues and pros and cons of several approaches.

Novelty circular neighboring technique using reactive fault tolerance method

International Journal of Electrical and Computer Engineering (IJECE) ◽ 10.11591/ijece.v9i6.pp5211-5217 ◽ 2019 ◽ Vol 9 (6) ◽ pp. 5211

Author(s): Ahmad Shukri Mohd Noor ◽ Nur Farhah Mat Zian ◽ Noor Hafhizah Abd Rahim ◽ Rabiei Mamat ◽ Wan Nur Amira Wan Azman

Keyword(s): Fault Tolerance ◽ Data Replication ◽ Data Availability ◽ Complex Environment ◽ Tolerance Mechanism ◽ System Availability ◽ Replication Technique ◽ The Right ◽ Tolerance Method ◽ Reactive Method

The availability of data in a distributed system can be increased by implementing a fault tolerance mechanism in the system. The reactive method of fault tolerance deals with restarting failed services, placing redundant copies of data on multiple nodes across the network (in other words, data replication), and migrating data for recovery. Even if the idea of data replication is sound, the challenge is to choose a replication technique that provides better data availability as well as consistency across the read and write operations on the redundant copies. The Circular Neighboring Replication (CNR) technique exploits a neighboring policy when replicating data items and performs well in that it needs fewer copies to keep system availability at its highest. In a performance analysis against existing techniques, results show that CNR improves system availability by 37% on average while needing only two replicas to maintain data availability and consistency. The study demonstrates the feasibility of the proposed technique and its potential for deployment in larger and more complex environments.
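The abstract does not spell out CNR's placement rule, so the following is only a minimal sketch of one plausible reading of a circular neighboring policy: nodes sit on a logical ring and each item is stored on a primary node and on the next node clockwise, giving two copies in total. The function names, the ring ordering, and the choice to count the primary as one of the two copies are all assumptions made for illustration.

```python
from __future__ import annotations


def ring_placement(num_nodes: int, primary: int, copies: int = 2) -> list[int]:
    """Return the nodes holding an item: the primary and its clockwise
    neighbours on a logical ring (illustrative rule, not the paper's)."""
    if copies < 1 or num_nodes < copies:
        raise ValueError("need at least as many nodes as copies")
    return [(primary + k) % num_nodes for k in range(copies)]


def is_available(placement: list[int], failed: set[int]) -> bool:
    """An item stays readable while at least one holder is still up."""
    return any(node not in failed for node in placement)


if __name__ == "__main__":
    holders = ring_placement(num_nodes=5, primary=4)
    print(holders)                          # primary 4 wraps to node 0
    print(is_available(holders, {4}))       # one holder down
    print(is_available(holders, {4, 0}))    # both holders down
```

Under this reading an item survives any single node failure and is lost only when both adjacent holders fail together, which is the trade-off a two-copy scheme accepts in exchange for low storage overhead.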

Multicast Data Replication Approach for Improving Fault Tolerance in Mobile Ad hoc Networks

Ciência e Natura ◽ 10.5902/2179460x20801 ◽ 2015 ◽ Vol 37 ◽ pp. 399

Author(s): Sogand Sahabi Moghaddam ◽ Abbas Karimi

Keyword(s): Fault Tolerance ◽ Ad Hoc ◽ Dominating Set ◽ Data Replication ◽ Delivery Ratio ◽ Data Accessibility ◽ Network Nodes ◽ Efficient Data ◽ Mobile Ad Hoc ◽ Hoc Networks

Multicast data replication provides a possible solution for improving data accessibility in highly dynamic and fault-prone mobile ad hoc environments. Our novel multicast data replication approach operates in a self-organizing manner: network nodes that have a unit host detector construct a connected dominating set (CDS) based on the topology graph by collecting information from neighboring nodes using multicast. When the data gathered from neighbors shows two non-adjacent neighbors, that virtual backbone is used for efficient data replication, data search and routing. In this study, we compare our proposed approach with SCALAR and evaluate it in terms of average hop count and successful delivery ratio for different node counts and speeds. The results show that the average hop count increased, but at a falling rate, and that a 20 percent successful delivery ratio was achieved, which suggests that the proposed method (PM) improves fault tolerance, power consumption and load balancing.
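The "two non-adjacent neighbors" condition in the abstract resembles a common marking rule for building a connected dominating set: a node joins the backbone if at least two of its neighbors are not directly linked to each other. The sketch below assumes that reading and an undirected topology given as an adjacency map; the function name and the sample graph are hypothetical, and the paper's actual multicast exchange and unit host detector are not modelled.

```python
from __future__ import annotations

from itertools import combinations


def backbone_nodes(adjacency: dict[str, set[str]]) -> set[str]:
    """Mark every node that has two neighbours which are not adjacent
    to each other (assumed marking rule for the virtual backbone)."""
    marked: set[str] = set()
    for node, neighbours in adjacency.items():
        for u, v in combinations(sorted(neighbours), 2):
            if v not in adjacency[u]:
                marked.add(node)
                break  # one unconnected pair is enough to mark the node
    return marked


if __name__ == "__main__":
    # A four-node path a - b - c - d (undirected, listed both ways).
    path = {
        "a": {"b"},
        "b": {"a", "c"},
        "c": {"b", "d"},
        "d": {"c"},
    }
    print(sorted(backbone_nodes(path)))
```

On the path graph the two interior nodes are marked: they are connected to each other and every other node is adjacent to one of them, which is what makes the marked set usable as a backbone for replication and routing.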

An Efficient Data Replication Algorithm for Distributed Systems

International Journal of Cloud Applications and Computing ◽ 10.4018/ijcac.2018070105 ◽ 2018 ◽ Vol 8 (3) ◽ pp. 60-77

Author(s): Sanjaya Kumar Panda ◽ Saswati Naik

Keyword(s): Distributed Systems ◽ Fault Tolerance ◽ Data Replication ◽ Rigorous Analysis ◽ Assignment Algorithm ◽ Efficient Data ◽ Improved Performance ◽ Recovery Approach ◽ Types Of Faults

This article describes how data replication plays an important role in distributed systems. It primarily focuses on the redundancy of data at two or more nodes, to achieve both fault tolerance and improved performance. Many researchers have proposed data replication algorithms to manage this redundancy. However, they have not considered the faults associated with the nodes, such as permanent, transient and intermittent faults. Moreover, they have not incorporated any recovery approach for rejoining failed nodes. Therefore, the authors propose a data replication algorithm called dynamic vote-based data replication (DVDR). The main contribution of DVDR is that it considers all types of faults and rejoins failed nodes. DVDR is based on dynamic vote assignment among the connected nodes and is characterized as a passive, non-hierarchical algorithm. The authors perform a rigorous analysis of DVDR and compare it with an existing dynamic vote assignment algorithm. The results show the efficacy of the proposed algorithm.
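The abstract names dynamic vote assignment and node rejoining but gives no rules, so the sketch below shows only the general idea behind vote-based replication, not DVDR itself. It assumes a strict-majority quorum, lets a group that currently holds a quorum shrink the electorate to its own members, and restores a vote to a recovered node; `has_quorum`, `reassign_votes`, `rejoin`, and the one-vote-per-node weights are all hypothetical.

```python
from __future__ import annotations


def has_quorum(votes: dict[str, int], connected: set[str]) -> bool:
    """True if the connected nodes hold a strict majority of all votes."""
    total = sum(votes.values())
    held = sum(weight for node, weight in votes.items() if node in connected)
    return 2 * held > total


def reassign_votes(votes: dict[str, int], connected: set[str]) -> dict[str, int]:
    """Shrink the electorate to the connected group (assumed rule).

    Only a group that currently holds a quorum may do this, so two
    disjoint partitions cannot both shrink the electorate.
    """
    if not has_quorum(votes, connected):
        raise RuntimeError("connected group lacks a quorum")
    return {node: weight for node, weight in votes.items() if node in connected}


def rejoin(votes: dict[str, int], node: str, weight: int = 1) -> dict[str, int]:
    """Give a recovered node a vote again (assumed recovery step)."""
    updated = dict(votes)
    updated[node] = weight
    return updated


if __name__ == "__main__":
    votes = {name: 1 for name in "abcde"}
    votes = reassign_votes(votes, {"a", "b", "c"})  # d and e are unreachable
    print(has_quorum(votes, {"a", "b"}))            # c then fails as well
    print(sorted(rejoin(votes, "d")))               # d recovers and rejoins
```

With static votes, two survivors out of five could not proceed; after the electorate shrinks to three, the same two survivors form a majority. That difference is the usual motivation for making vote assignment dynamic.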

An Efficient Data Replication Algorithm for Distributed Systems

Research Anthology on Architectures, Frameworks, and Integration Strategies for Distributed and Cloud Computing ◽ 10.4018/978-1-7998-5339-8.ch065 ◽ 2021 ◽ pp. 1344-1363

Author(s): Sanjaya Kumar Panda ◽ Saswati Naik

Keyword(s): Distributed Systems ◽ Fault Tolerance ◽ Data Replication ◽ Rigorous Analysis ◽ Assignment Algorithm ◽ Efficient Data ◽ Improved Performance ◽ Recovery Approach ◽ Types Of Faults

This article describes how data replication plays an important role in distributed systems. It primarily focuses on the redundancy of data at two or more nodes, to achieve both fault tolerance and improved performance. Many researchers have proposed data replication algorithms to manage this redundancy. However, they have not considered the faults associated with the nodes, such as permanent, transient and intermittent faults. Moreover, they have not incorporated any recovery approach for rejoining failed nodes. Therefore, the authors propose a data replication algorithm called dynamic vote-based data replication (DVDR). The main contribution of DVDR is that it considers all types of faults and rejoins failed nodes. DVDR is based on dynamic vote assignment among the connected nodes and is characterized as a passive, non-hierarchical algorithm. The authors perform a rigorous analysis of DVDR and compare it with an existing dynamic vote assignment algorithm. The results show the efficacy of the proposed algorithm.

An Efficient Data Replication Technique with Fault Tolerance Approach using BVAG with Checkpoint and Rollback-Recovery

International Journal of Advanced Computer Science and Applications ◽ 10.14569/ijacsa.2021.0120155 ◽ 2021 ◽ Vol 12 (1)

Author(s): Sharifah Hafizah Sy Ahmad Ubaidillah ◽ Basem Alkazemi ◽ A. Noraziah

Keyword(s): Fault Tolerance ◽ Data Replication ◽ Rollback Recovery ◽ Tolerance Approach ◽ Efficient Data ◽ Replication Technique ◽ Checkpoint And Rollback

Agile Store: Experience with Quorum-Based Data Replication Techniques for Adaptive Byzantine Fault Tolerance

24th IEEE Symposium on Reliable Distributed Systems (SRDS'05) ◽ 10.1109/reldis.2005.7 ◽ 2006 ◽ Cited By ~ 1

Author(s): Lei Kong ◽ D.J. Manohar ◽ M. Ahamad ◽ A. Subbiah ◽ M. Sun ◽ ...

Keyword(s): Fault Tolerance ◽ Data Replication ◽ Byzantine Fault Tolerance ◽ Byzantine Fault

Data Replication in P2P Systems

Handbook of Research on P2P and Grid Systems for Service-Oriented Computing ◽ 10.4018/978-1-61520-686-5.ch025 ◽ 2010 ◽ pp. 589-615

Author(s): Vassilios V. Dimakopoulos ◽ Spiridoula Margariti ◽ Mirto Ntetsika ◽ Evaggelia Pitoura

Keyword(s): Fault Tolerance ◽ Load Balancing ◽ Response Time ◽ Distributed System ◽ Data Replication ◽ System Availability ◽ P2p Systems ◽ Network Links ◽ Multiple Copies ◽ Additional Reason

Maintaining multiple copies of data items is a commonly used mechanism for improving the performance and fault-tolerance of any distributed system. By placing copies of data items closer to their requesters, the response time of queries can be improved. An additional reason for replication is load balancing. For instance, by allocating many copies to popular data items, the query load can be evenly distributed among the servers that hold these copies. Similarly, by eliminating hotspots, replication can lead to a better distribution of the communication load over the network links. Besides performance-related reasons, replication improves system availability, since the larger the number of copies of an item, the more site failures can be tolerated. In this chapter we survey replication methods applicable to p2p systems. Although there exist some general techniques, methodologies are distinguished according to the overlay organization (structured and unstructured) they are aimed at. After replicas are created and distributed, a major issue is their maintenance. We present strategies that have been proposed for keeping replicas up to date so as to achieve a desired level of consistency.
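The availability claim can be made concrete with a small calculation. Assuming each site fails independently with the same probability (an assumption added here for illustration, not one made in the chapter), an item with n copies is unreachable only when all n sites are down, so its availability is 1 - p^n. The function name and the sample failure probability below are hypothetical.

```python
def item_availability(site_failure_prob: float, copies: int) -> float:
    """Probability that at least one of `copies` replicas is reachable,
    assuming independent and identical site failures (illustrative)."""
    if not 0.0 <= site_failure_prob <= 1.0:
        raise ValueError("site_failure_prob must be in [0, 1]")
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return 1.0 - site_failure_prob ** copies


if __name__ == "__main__":
    # Each extra copy multiplies the unavailability by the failure probability.
    for n in (1, 2, 3):
        print(n, round(item_availability(0.1, n), 6))
```

The gain per extra copy shrinks quickly, while every copy adds storage and update-propagation cost, which is why the maintenance strategies surveyed in the chapter matter as much as the replica count.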
