A Two-Level Fuzzy Value-Based Replica Replacement Algorithm in Data Grids

One of the challenges of data grid is to access widely distributed data fast and efficiently and providing maximum data availability with minimum latency. Data replication is an efficient way used to address this challenge by replicating and storing replicas, making it possible to access similar data in different locations of the data grid and can shorten the time of getting the files. However, as the number and storage size of grid sites is limited and restricted, an optimized and effective replacement algorithm is needed to improve the efficiency of replication. In this paper, the authors propose a novel two-level replacement algorithm which uses Fuzzy Replica Preserving Value Evaluator System (FRPVES) for evaluating the value of each replica. The algorithm was tested using a grid simulator, OptorSim developed by European Data Grid projects. Results from simulation procedure show that the authors' proposed algorithm has better performance in comparison with other algorithms in terms of job execution time, total number of replications and effective network usage.

Download Full-text

ARRA: An Associated Replica Replacement Algorithm Based on Apriori Approach for Data Intensive Jobs in Data Grid

Key Engineering Materials ◽

10.4028/www.scientific.net/kem.439-440.1409 ◽

2010 ◽

Vol 439-440 ◽

pp. 1409-1414 ◽

Cited By ~ 5

Author(s):

Jian Hua Jiang ◽

Hui Fang Ji ◽

Gao Chao Xu ◽

Xiao Hui Wei

Keyword(s):

Data File ◽

Data Grid ◽

Relative Advantage ◽

Data Intensive ◽

File Access ◽

Replacement Algorithm ◽

Data Files ◽

Network Usage ◽

Effective Network ◽

Replica Replacement

Creating many replicas in the processing of data-intensive jobs in data grid is an efficient strategy. Replica replacement is the crucial step to this strategy. Economic model, popularity model and hybrid model etc. have been proposed to solve this issue of replica replacement with analysis and prediction based on each data file, however, these models neglect association relationships among different data files. To find out these association relationships hidden in data-intensive jobs, Apriori algorithm in data mining field is adopted to analyze behaviors of each data-intensive job. An associated replica replacement algorithm based on Apriori approach in data grid is proposed in this paper. This algorithm has two major steps: 1) associated behavior analysis and classification of data files in each node; 2) generation and application of replica replacement rules. Our proposed algorithm is simulated in Optorsim to be compared with LFU algorithm. The experiment shows that there is a relative advantage compared with LFU in mean job times of all jobs, number of remote file access and effective network usage perspectives.

Download Full-text

Predictive File Replication on the Data Grids

Evolving Developments in Grid and Cloud Computing ◽

10.4018/978-1-4666-0056-0.ch005 ◽

2012 ◽

pp. 67-83

Author(s):

ChenHan Liao ◽

Na Helian ◽

Sining Wu ◽

Mamunur M. Rashid

Keyword(s):

Decision Tree ◽

The Other ◽

Resource Usage ◽

Simulation Environment ◽

Data Grids ◽

Access Latency ◽

Replication Strategy ◽

Network Usage ◽

File Replication ◽

Effective Network

Most replication methods either monitor the popularity of files or use complicated functions to calculate the overall cost of whether or not a replication decision or a deletion decision should be issued. However, once the replication decision is issued, the popularity of the files is changed and may have already impacted access latency and resource usage. This article proposes a decision-tree-based predictive file replication strategy that forecasts files’ future popularity based on their characteristics on the Grids. The proposed strategy has shown superb performance in terms of mean job time and effective network usage compared with the other two replication strategies, LRU and Economic under OptorSim simulation environment.

Download Full-text

A four-phase data replication algorithm for data grid

Journal of Advanced Computer Science & Technology ◽

10.14419/jacst.v4i1.4009 ◽

2015 ◽

Vol 4 (1) ◽

pp. 163 ◽

Cited By ~ 2

Author(s):

Alireza Saleh ◽

Reza Javidan ◽

Mohammad Taghi FatehiKhajeh

Keyword(s):

Large Scale ◽

Data Replication ◽

Data Grid ◽

Data Availability ◽

Access Time ◽

File Transfer ◽

System Availability ◽

Large Scale Data ◽

Effective Network ◽

Average Access Time

<p>Nowadays, scientific applications generate a huge amount of data in terabytes or petabytes. Data grids currently proposed solutions to large scale data management problems including efficient file transfer and replication. Data is typically replicated in a Data Grid to improve the job response time and data availability. A reasonable number and right locations for replicas has become a challenge in the Data Grid. In this paper, a four-phase dynamic data replication algorithm based on Temporal and Geographical locality is proposed. It includes: 1) evaluating and identifying the popular data and triggering a replication operation when the popularity data passes a dynamic threshold; 2) analyzing and modeling the relationship between system availability and the number of replicas, and calculating a suitable number of new replicas; 3) evaluating and identifying the popular data in each site, and placing replicas among them; 4) removing files with least cost of average access time when encountering insufficient space for replication. The algorithm was tested using a grid simulator, OptorSim developed by European Data Grid Projects. The simulation results show that the proposed algorithm has better performance in comparison with other algorithms in terms of job execution time, effective network usage and percentage of storage filled.</p>

Download Full-text

Predictive File Replication on the Data Grids

International Journal of Grid and High Performance Computing ◽

10.4018/jghpc.2010092805 ◽

2010 ◽

Vol 2 (1) ◽

pp. 69-86 ◽

Cited By ~ 3

Author(s):

ChenHan Liao ◽

Na Helian ◽

Sining Wu ◽

Mamunur M. Rashid

Keyword(s):

Decision Tree ◽

The Other ◽

Resource Usage ◽

Simulation Environment ◽

Data Grids ◽

Access Latency ◽

Replication Strategy ◽

Network Usage ◽

File Replication ◽

Effective Network

Most replication methods either monitor the popularity of files or use complicated functions to calculate the overall cost of whether or not a replication decision or a deletion decision should be issued. However, once the replication decision is issued, the popularity of the files is changed and may have already impacted access latency and resource usage. This article proposes a decision-tree-based predictive file replication strategy that forecasts files’ future popularity based on their characteristics on the Grids. The proposed strategy has shown superb performance in terms of mean job time and effective network usage compared with the other two replication strategies, LRU and Economic under OptorSim simulation environment.

Download Full-text

Efficient Dynamic Replication Algorithm Using Agent for Data Grid

The Scientific World JOURNAL ◽

10.1155/2014/767016 ◽

2014 ◽

Vol 2014 ◽

pp. 1-10 ◽

Cited By ~ 7

Author(s):

Priyanka Vashisht ◽

Rajesh Kumar ◽

Anju Sharma

Keyword(s):

Data Replication ◽

Data Access ◽

Data Grid ◽

Data Availability ◽

Access Time ◽

Test Bed ◽

Data Grids ◽

Dynamic Replication ◽

Data Files ◽

Using Data

In data grids scientific and business applications produce huge volume of data which needs to be transferred among the distributed and heterogeneous nodes of data grids. Data replication provides a solution for managing data files efficiently in large grids. The data replication helps in enhancing the data availability which reduces the overall access time of the file. In this paper an algorithm, namely, EDRA using agents for data grid, has been proposed and implemented. EDRA consists of dynamic replication of hierarchical structure taken into account for the selection of best replica. Decision for selecting the best replica is based on scheduling parameters. The scheduling parameters are bandwidth, load gauge, and computing capacity of the node. The scheduling in data grid helps in reducing the data access time. The distribution of the load on the nodes of data grid is done evenly by considering scheduling parameters. EDRA is implemented using data grid simulator, namely, OptorSim. European Data Grid CMS test bed topology is used in this experiment. The simulation results are obtained by comparing BHR, LRU, No Replication, and EDRA. The result shows the efficiency of EDRA algorithm in terms of mean job execution time, network usage, and storage usage of node.

Download Full-text

The State of the Art and Open Problems in Data Replication in Grid Environments

Handbook of Research on Scalable Computing Technologies ◽

10.4018/978-1-60566-661-7.ch022 ◽

2010 ◽

pp. 486-516 ◽

Cited By ~ 2

Author(s):

Mohammad Shorfuzzaman ◽

Rasit Eskicioglu ◽

Peter Graham

Keyword(s):

Data Replication ◽

Data Access ◽

The State ◽

Data Availability ◽

Data Grids ◽

Open Problems ◽

Data Intensive ◽

Grid Environments ◽

Bandwidth Savings ◽

And Storage

Data Grids provide services and infrastructure for distributed data-intensive applications that need to access, transfer and modify massive datasets stored at distributed locations around the world. For example, the next-generation of scientific applications such as many in high-energy physics, molecular modeling, and earth sciences will involve large collections of data created from simulations or experiments. The size of these data collections is expected to be of multi-terabyte or even petabyte scale in many applications. Ensuring efficient, reliable, secure and fast access to such large data is hindered by the high latencies of the Internet. The need to manage and access multiple petabytes of data in Grid environments, as well as to ensure data availability and access optimization are challenges that must be addressed. To improve data access efficiency, data can be replicated at multiple locations so that a user can access the data from a site near where it will be processed. In addition to the reduction of data access time, replication in Data Grids also uses network and storage resources more efficiently. In this chapter, the state of current research on data replication and arising challenges for the new generation of data-intensive grid environments are reviewed and open problems are identified. First, fundamental data replication strategies are reviewed which offer high data availability, low bandwidth consumption, increased fault tolerance, and improved scalability of the overall system. Then, specific algorithms for selecting appropriate replicas and maintaining replica consistency are discussed. The impact of data replication on job scheduling performance in Data Grids is also analyzed. A set of appropriate metrics including access latency, bandwidth savings, server load, and storage overhead for use in making critical comparisons of various data replication techniques is also discussed. Overall, this chapter provides a comprehensive study of replication techniques in Data Grids that not only serves as a tool to understanding this evolving research area but also provides a reference to which future e orts may be mapped.

Download Full-text

A survey of dynamic replication strategies for improving data availability in data grids

Future Generation Computer Systems ◽

10.1016/j.future.2011.06.009 ◽

2012 ◽

Vol 28 (2) ◽

pp. 337-349 ◽

Cited By ~ 46

Author(s):

Tehmina Amjad ◽

Muhammad Sher ◽

Ali Daud

Keyword(s):

Data Availability ◽

Data Grids ◽

Dynamic Replication

Download Full-text

Keysystems in large systems implementing distributed data processing and storage technologies

Highly available systems ◽

10.18127/j20729472-202103-01 ◽

2021 ◽

Author(s):

V.G. Belenkov ◽

V.I. Korolev ◽

V.I. Budzko ◽

D.A. Melnikov

Keyword(s):

Data Processing ◽

Distributed Processing ◽

Specific Work ◽

Distributed Data ◽

Distributed Data Processing ◽

Technical Specifications ◽

The Creation ◽

Processing And Storage ◽

Storage Technologies ◽

And Storage

The article discusses the features of the use of the cryptographic information protection means (CIPM)in the environment of distributed processing and storage of data of large information and telecommunication systems (LITS).A brief characteristic is given of the properties of the cryptographic protection control subsystem - the key system (CS). A description is given of symmetric and asymmetric cryptographic systems, required to describe the problem of using KS in LITS.Functional and structural models of the use of KS and CIPM in LITS, are described. Generalized information about the features of using KS in LITS is given. The obtained results form the basis for further work on the development of the architecture and principles of KS construction in LITS that implement distributed data processing and storage technologies. They can be used both as a methodological guide, and when carrying out specific work on the creation and development of systems that implement these technologies, as well as when forming technical specifications for the implementation of work on the creation of such systems.

Download Full-text

Policy Driven Negotiation to Improve the QoS in Data Grid

Encyclopedia of E-Business Development and Management in the Global Economy ◽

10.4018/978-1-61520-611-7.ch105 ◽

2010 ◽

pp. 1041-1056

Author(s):

Ghalem Belalem

Keyword(s):

Large Scale ◽

Data Access ◽

Data Grid ◽

Management Service ◽

Replica Placement ◽

Data Grids ◽

Large Scale Systems ◽

Consistency Management ◽

Access Latency ◽

Replicated Data

Data grids have become an interesting and popular domain in grid community (Foster and Kesselmann, 2004). Generally, the grids are proposed as solutions for large scale systems, where data replication is a well-known technique used to reduce access latency and bandwidth, and increase availability. In splitting of the advantages of replication, there are many problems that should be solved such as, • The replica placement that determines the optimal locations of replicated data in order to reduce the storage cost and data access (Xu et al, 2002); • The problem of determining which replica will be accessed to in terms of consistency when we need to execute a read or write operation (Ranganathan and Foster, 2001); • The problem of degree of replication which consists in finding a minimal number of replicas without reducing the performance of user applications; • The problem of replica consistency that concerns the consistency of a set of replicated data. This consistency provides a completely coherent view of all the replicas for a user (Gray et al 1996). Our principal aim, in this article, is to integrate into consistency management service, an approach based on an economic model for resolving conflicts detected in the data grid.

Download Full-text