Data virtualization in grid environments through the GRelC Data Access and Integration Service

Author(s):  
S. Fiore ◽  
A. Negro ◽  
G. Aloisio
Author(s):  
Sandro Fiore ◽  
Alessandro Negro ◽  
Salvatore Vadacca ◽  
Massimo Cafaro ◽  
Giovanni Aloisio ◽  
...  

Grid computing is an emerging, enabling technology that allows organizations to easily share, integrate and manage resources in a distributed environment. Computational Grids can run millions of jobs in parallel, but the huge amount of data they generate has raised another problem: the management (classification, storage, discovery, etc.) of distributed data, an issue specific to Data Grids. Over the last decade, many efforts concerning data management (grid-storage services, metadata services, grid-database access and integration services, etc.) have identified data management as a real challenge for next-generation petascale grid environments. This work provides an architectural overview of the GRelC DAS, a grid database access service developed in the context of the GRelC Project and currently used for production and tutorial activities in both gLite- and Globus-based grid environments.


Author(s):  
Eduardo Gallo ◽  
Henrique Fabricio Gagliardi ◽  
Fabricio Alves Barbosa da Silva ◽  
Virgilio Cavicchioli Neto ◽  
Domingos Alves

Author(s):  
Mohammad Shorfuzzaman ◽  
Rasit Eskicioglu ◽  
Peter Graham

Data Grids provide services and infrastructure for distributed data-intensive applications that need to access, transfer and modify massive datasets stored at locations around the world. For example, the next generation of scientific applications, such as many in high-energy physics, molecular modeling, and earth sciences, will involve large collections of data created by simulations or experiments. In many applications, these data collections are expected to reach multi-terabyte or even petabyte scale. Efficient, reliable, secure and fast access to such large datasets is hindered by the high latencies of the Internet. Managing and accessing multiple petabytes of data in Grid environments, while ensuring data availability and optimizing access, are challenges that must be addressed. To improve data access efficiency, data can be replicated at multiple locations so that a user can access the data from a site near where it will be processed. Besides reducing data access time, replication in Data Grids also uses network and storage resources more efficiently. In this chapter, the state of current research on data replication and the emerging challenges for the new generation of data-intensive grid environments are reviewed, and open problems are identified. First, fundamental data replication strategies that offer high data availability, low bandwidth consumption, increased fault tolerance, and improved scalability of the overall system are reviewed. Then, specific algorithms for selecting appropriate replicas and maintaining replica consistency are discussed. The impact of data replication on job scheduling performance in Data Grids is also analyzed. A set of appropriate metrics, including access latency, bandwidth savings, server load, and storage overhead, is also discussed for use in making critical comparisons among data replication techniques.
Overall, this chapter provides a comprehensive study of replication techniques in Data Grids that not only serves as a tool for understanding this evolving research area but also provides a reference to which future efforts may be mapped.
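The abstract mentions algorithms for selecting appropriate replicas and lists access latency among the comparison metrics. As a minimal illustration only (a simple cost model assumed here, not a technique taken from the chapter), one selection rule estimates each replica's access time as network latency plus transfer time and picks the replica with the smallest estimate. The `Replica` class, the site names, and all numbers below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    site: str              # hypothetical site identifier
    latency_s: float       # assumed round-trip latency to the site, in seconds
    bandwidth_mbps: float  # assumed available bandwidth, in megabits per second


def estimated_access_time(replica: Replica, size_mb: float) -> float:
    """Assumed cost model: latency plus transfer time for size_mb megabytes."""
    # Convert megabytes to megabits (x8) before dividing by Mbit/s.
    return replica.latency_s + (size_mb * 8) / replica.bandwidth_mbps


def select_replica(replicas: list[Replica], size_mb: float) -> Replica:
    """Return the replica with the lowest estimated access time."""
    if not replicas:
        raise ValueError("no replicas available")
    return min(replicas, key=lambda r: estimated_access_time(r, size_mb))


# Hypothetical replica catalogue for a single logical file.
replicas = [
    Replica("site-a", latency_s=0.150, bandwidth_mbps=100.0),
    Replica("site-b", latency_s=0.020, bandwidth_mbps=10.0),
    Replica("site-c", latency_s=0.080, bandwidth_mbps=1000.0),
]

if __name__ == "__main__":
    # Large files are bandwidth-dominated; tiny files are latency-dominated.
    print(select_replica(replicas, 1000.0).site)
    print(select_replica(replicas, 0.001).site)
```

Under this model a 1000 MB file is served from `site-c` (about 8.08 s versus 80.15 s for `site-a`), while a 1 KB file is served from the low-latency `site-b`. Real replica selection services typically also weigh server load and current network conditions, which this sketch ignores.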


2012 ◽  
pp. 517-527
Author(s):  
Sandro Fiore ◽  
Alessandro Negro ◽  
Salvatore Vadacca ◽  
Massimo Cafaro ◽  
Giovanni Aloisio ◽  
...  


