Efficient Dynamic Replication Algorithm Using Agent for Data Grid

2014 ◽  
Vol 2014 ◽  
pp. 1-10 ◽  
Author(s):  
Priyanka Vashisht ◽  
Rajesh Kumar ◽  
Anju Sharma

In data grids, scientific and business applications produce huge volumes of data that need to be transferred among the distributed and heterogeneous nodes of the grid. Data replication provides a solution for managing data files efficiently in large grids. Replication enhances data availability, which reduces the overall access time of a file. In this paper an algorithm, namely EDRA, using agents for data grids has been proposed and implemented. EDRA performs dynamic replication and takes the hierarchical structure of the grid into account when selecting the best replica. The decision for selecting the best replica is based on scheduling parameters: the bandwidth, load gauge, and computing capacity of the node. Scheduling in a data grid helps reduce the data access time, and considering the scheduling parameters distributes the load evenly across the nodes of the grid. EDRA is implemented using the data grid simulator OptorSim, with the European Data Grid CMS test bed topology used in the experiment. The simulation compares BHR, LRU, No Replication, and EDRA. The results show the efficiency of the EDRA algorithm in terms of mean job execution time, network usage, and storage usage of nodes.
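The abstract names the three scheduling parameters but not how they are combined. A minimal sketch of one possible selection rule follows, assuming a normalized weighted sum; the `ReplicaSite` type, the equal default weights, and the scoring formula are illustrative assumptions, not the EDRA method itself.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class ReplicaSite:
    """A node holding a replica (hypothetical type, for illustration only)."""
    name: str
    bandwidth: float  # available bandwidth to the requester, e.g. Mbit/s
    load: float       # load gauge in [0, 1]; higher means busier
    capacity: float   # relative computing capacity; higher is better


def site_score(site: ReplicaSite, max_bandwidth: float, max_capacity: float,
               weights: Tuple[float, float, float]) -> float:
    """Assumed score: weighted sum of normalized bandwidth, idleness, capacity."""
    w_bw, w_load, w_cap = weights
    return (w_bw * site.bandwidth / max_bandwidth
            + w_load * (1.0 - site.load)
            + w_cap * site.capacity / max_capacity)


def select_best_replica(
    sites: Sequence[ReplicaSite],
    weights: Tuple[float, float, float] = (1 / 3, 1 / 3, 1 / 3),
) -> ReplicaSite:
    """Return the site with the highest assumed score."""
    if not sites:
        raise ValueError("no replica sites to choose from")
    max_bw = max(s.bandwidth for s in sites)
    max_cap = max(s.capacity for s in sites)
    return max(sites, key=lambda s: site_score(s, max_bw, max_cap, weights))


candidates = [
    ReplicaSite("A", bandwidth=100.0, load=0.9, capacity=1.0),
    ReplicaSite("B", bandwidth=80.0, load=0.2, capacity=1.0),
    ReplicaSite("C", bandwidth=10.0, load=0.1, capacity=0.5),
]
best = select_best_replica(candidates)
```

With equal weights the lightly loaded site `B` is preferred over the faster but busy site `A`, which is the kind of load balancing the abstract describes. Different weights would shift that trade-off.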

2015 ◽  
Vol 4 (1) ◽  
pp. 163 ◽  
Author(s):  
Alireza Saleh ◽  
Reza Javidan ◽  
Mohammad Taghi FatehiKhajeh

Nowadays, scientific applications generate huge amounts of data, on the order of terabytes or petabytes. Data grids are currently proposed as solutions to large-scale data management problems, including efficient file transfer and replication. Data is typically replicated in a Data Grid to improve job response time and data availability. Determining a reasonable number of replicas and the right locations for them has become a challenge in the Data Grid. In this paper, a four-phase dynamic data replication algorithm based on temporal and geographical locality is proposed. It includes: 1) evaluating and identifying the popular data and triggering a replication operation when the popularity of the data passes a dynamic threshold; 2) analyzing and modeling the relationship between system availability and the number of replicas, and calculating a suitable number of new replicas; 3) evaluating and identifying the popular data in each site, and placing replicas among them; 4) removing the files with the least cost of average access time when there is insufficient space for replication. The algorithm was tested using a grid simulator, OptorSim, developed by the European Data Grid projects. The simulation results show that the proposed algorithm performs better than other algorithms in terms of job execution time, effective network usage, and percentage of storage filled.
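Phase 2 relates system availability to the number of replicas, but the abstract does not state the model used. A minimal sketch under a common textbook assumption follows: each replica is independently available with the same probability `p`, so a file with `n` replicas is available with probability `1 - (1 - p)**n`. The function name and the `max_replicas` cap are hypothetical.

```python
def min_replicas(p: float, target: float, max_replicas: int = 64) -> int:
    """Smallest n with 1 - (1 - p)**n >= target, assuming independent failures.

    p            -- assumed availability of a single replica, 0 < p <= 1
    target       -- required file availability, 0 < target < 1
    max_replicas -- hypothetical safety cap so the loop always terminates
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    if not 0.0 < target < 1.0:
        raise ValueError("target must be in (0, 1)")
    for n in range(1, max_replicas + 1):
        if 1.0 - (1.0 - p) ** n >= target:
            return n
    raise ValueError("target not reachable within max_replicas")


needed = min_replicas(p=0.5, target=0.9)
```

For example, with `p = 0.5` three replicas give 0.875 availability and four give 0.9375, so four replicas are needed to reach a 0.9 target under this model.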


Author(s):  
Mohammad Shorfuzzaman ◽  
Rasit Eskicioglu ◽  
Peter Graham

Data Grids provide services and infrastructure for distributed data-intensive applications that need to access, transfer and modify massive datasets stored at distributed locations around the world. For example, the next-generation of scientific applications such as many in high-energy physics, molecular modeling, and earth sciences will involve large collections of data created from simulations or experiments. The size of these data collections is expected to be of multi-terabyte or even petabyte scale in many applications. Ensuring efficient, reliable, secure and fast access to such large data is hindered by the high latencies of the Internet. The need to manage and access multiple petabytes of data in Grid environments, as well as to ensure data availability and access optimization are challenges that must be addressed. To improve data access efficiency, data can be replicated at multiple locations so that a user can access the data from a site near where it will be processed. In addition to the reduction of data access time, replication in Data Grids also uses network and storage resources more efficiently. In this chapter, the state of current research on data replication and arising challenges for the new generation of data-intensive grid environments are reviewed and open problems are identified. First, fundamental data replication strategies are reviewed which offer high data availability, low bandwidth consumption, increased fault tolerance, and improved scalability of the overall system. Then, specific algorithms for selecting appropriate replicas and maintaining replica consistency are discussed. The impact of data replication on job scheduling performance in Data Grids is also analyzed. A set of appropriate metrics including access latency, bandwidth savings, server load, and storage overhead for use in making critical comparisons of various data replication techniques is also discussed. 
Overall, this chapter provides a comprehensive study of replication techniques in Data Grids that not only serves as a tool for understanding this evolving research area but also provides a reference to which future efforts may be mapped.


Author(s):  
Ghalem Belalem

Data grids have become an interesting and popular domain in the grid community (Foster and Kesselmann, 2004). Generally, grids are proposed as solutions for large-scale systems, where data replication is a well-known technique used to reduce access latency and bandwidth consumption and to increase availability. In spite of the advantages of replication, many problems remain to be solved, such as:
• Replica placement, which determines the optimal locations of replicated data in order to reduce the storage cost and data access (Xu et al, 2002);
• The problem of determining which replica will be accessed, in terms of consistency, when a read or write operation needs to be executed (Ranganathan and Foster, 2001);
• The problem of the degree of replication, which consists in finding a minimal number of replicas without reducing the performance of user applications;
• The problem of replica consistency, which concerns the consistency of a set of replicated data. This consistency provides a completely coherent view of all the replicas for a user (Gray et al 1996).
Our principal aim in this article is to integrate into the consistency management service an approach based on an economic model for resolving conflicts detected in the data grid.


Author(s):  
Mary Magdalene Jane.F ◽  
R. Nadarajan ◽  
Maytham Safar

Data caching in mobile clients is an important technique to enhance data availability and improve data access time. Due to cache size limitations, cache replacement policies are used to find a suitable subset of items for eviction from the cache. In this paper, the authors study the issues of cache replacement for location-dependent data under a geometric location model and propose a new cache replacement policy RAAR (Re-entry probability, Area of valid scope, Age, Rate of Access) by taking into account the spatial and temporal parameters. Mobile queries experience a popularity drift where the item loses its popularity after the user exhausts the corresponding service, thus calling for a scenario in which once popular documents quickly become cold (small active sets). The experimental evaluations using synthetic datasets for regular and small active sets show that this replacement policy is effective in improving the system performance in terms of the cache hit ratio of mobile clients.


2013 ◽  
Vol 5 (1) ◽  
pp. 70-81 ◽  
Author(s):  
Mohammed K. Madi ◽  
Yuhanis Yusof ◽  
Suhaidi Hassan

A Data Grid is an infrastructure that manages a huge amount of data files and provides intensive computational resources across geographically distributed collaborations. To increase resource availability and to ease resource sharing in such an environment, there is a need for replication services. Data replication is one of the methods used to improve the performance of data access in distributed systems by placing multiple copies of data files at the distributed sites. Replica placement is the process of identifying where to place copies of replicated data files in a Grid system. Existing work identifies suitable sites based on the number of requests and the read cost of the required file. Such approaches consume large bandwidth and increase the computational time. The authors propose a replica placement strategy (RPS) that finds the best locations to store replicas based on four criteria, namely, 1) Read Cost, 2) File Transfer Time, 3) Sites’ Workload, and 4) Replication Sites. OptorSim is used to evaluate the performance of this replica placement strategy. The simulation results show that RPS requires less execution time and incurs less network usage than the existing approaches of Simple Optimizer and LFU (Least Frequently Used).


2018 ◽  
Vol 2018 ◽  
pp. 1-16
Author(s):  
Saadi Hamad Thalij ◽  
Veli Hakkoymaz

Distributed systems offer geographically accessible resources for the large-scale data requests of different users. In many cases, replicating the vital data files and storing the replicas in multiple locations accessible to the requesting clients is vital for improving data availability, reliability, and security, and for reducing the execution time. It is important that real-time distributed databases maintain the consistency constraints and also guarantee the time constraints required by the client requests. However, when the size of the distributed system increases, the user access time also tends to increase, which in turn increases the importance of replica placement. Thus, the primary issues that emerge are deciding upon an optimal number of replicas and identifying the best locations to store the replicated data. These open challenges are considered in this study, which develops a dynamic data replication algorithm for real-time distributed databases using a multiobjective glowworm swarm optimization (MGSO) strategy. The proposed algorithm adapts to the random patterns of the read-write requests and employs a dynamic window mechanism for replication. It also models the replica number and placement problem as a multiobjective optimization problem and utilizes MGSO to solve it. Cost models are presented to ensure that time constraints are satisfied when servicing user requests. The performance of the MGSO dynamic data replication algorithm has been studied using competitive analysis, and the results show the efficiency of the proposed algorithm for distributed databases.


Author(s):  
B. Meroufel ◽  
G. Belalem

Fault tolerance is the ability of a system to perform its function correctly even in the presence of faults. Fault tolerance techniques are therefore critical for improving the efficient utilization of expensive resources in high performance data grid systems. One of the most popular fault tolerance strategies is replication: it creates multiple copies of resources in the system and has been proved to be an effective way to achieve data availability and system reliability. In this paper the authors propose a new adaptive dynamic replication that combines replication based on availability with replication based on popularity. The authors' adaptive dynamic replication uses two types of replicas (primary and ordinary) and two types of placement nodes (best client and best responsible nodes) for the new replicas. In addition to replication, the authors use other strategies such as fault detection, fault prediction, dynamicity management, and self-stabilization. All these services are grouped in one fault tolerance box named Collaborative Services for Fault Tolerance (CSFT), which structures them as hierarchical services and organizes the relationships between them.


2021 ◽  
Vol 2021 ◽  
pp. 1-14
Author(s):  
Mahsa Beigrezaei ◽  
Abolfazel Toroghi Haghighat ◽  
Seyedeh Leili Mirtaheri

The efficiency of data-intensive applications in distributed environments such as Cloud, Fog, and Grid is directly related to data access delay. Delays caused by queue workload and delays caused by failure can decrease data access efficiency. Data replication is a critical technique in reducing access latency. In this paper, a fuzzy-based replication algorithm is proposed, which avoids the mentioned imposed delays by considering a comprehensive set of significant parameters to improve performance. The proposed algorithm selects the appropriate replica using a hierarchical method, taking into account the transmission cost, queue delay, and failure probability. The algorithm determines the best place for replication using a fuzzy inference system considering the queue workload, number of accesses in the future, last access time, and communication capacity. It uses the Simple Exponential Smoothing method to predict future file popularity. The OptorSim simulator evaluates the proposed algorithm in different access patterns. The results show that the algorithm improves performance in terms of the number of replications, the percentage of storage filled, and the mean job execution time. The proposed algorithm has the highest efficiency in random access patterns, especially random Zipf access patterns. It also has good performance when the number of jobs and file size are increased.
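The abstract states that Simple Exponential Smoothing is used to predict future file popularity. A minimal sketch of that standard method follows, assuming popularity is measured as access counts per time interval and that the level is initialized with the first observation; the function name and the default `alpha` are illustrative choices, not values from the paper.

```python
from typing import Sequence


def ses_forecast(access_counts: Sequence[float], alpha: float = 0.5) -> float:
    """One-step-ahead forecast by simple exponential smoothing.

    Recurrence: level_t = alpha * x_t + (1 - alpha) * level_{t-1},
    with level_0 = x_0 (an assumed initialization).
    """
    if not access_counts:
        raise ValueError("need at least one observation")
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    level = float(access_counts[0])
    for x in access_counts[1:]:
        level = alpha * x + (1.0 - alpha) * level
    return level


# Hypothetical access counts of one file over three intervals.
predicted = ses_forecast([10, 20, 30], alpha=0.5)
```

Here the level moves from 10 to 15 to 22.5, so the predicted number of accesses for the next interval is 22.5. A larger `alpha` makes the forecast react faster to recent changes in popularity.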


Author(s):  
Nazanin Saadat ◽  
Amir Masoud Rahmani

One of the challenges of a data grid is to access widely distributed data quickly and efficiently while providing maximum data availability with minimum latency. Data replication is an efficient way to address this challenge: storing replicas makes it possible to access the same data at different locations of the data grid and can shorten the time needed to get the files. However, as the number and storage size of grid sites are limited, an optimized and effective replacement algorithm is needed to improve the efficiency of replication. In this paper, the authors propose a novel two-level replacement algorithm which uses a Fuzzy Replica Preserving Value Evaluator System (FRPVES) for evaluating the value of each replica. The algorithm was tested using a grid simulator, OptorSim, developed by the European Data Grid projects. The simulation results show that the authors' proposed algorithm performs better than other algorithms in terms of job execution time, total number of replications, and effective network usage.

