ArchaeoGRID Science Gateways for Easy Access to Distributed Computing Infrastructure for Large Data Storage and Analysis in Archaeology and History

Author(s):  
Giuliano Pelfer

This article describes how archaeological and historical research has grown into a multidisciplinary and interdisciplinary activity, driven by the availability of larger amounts of data for the reconstruction of historical and archaeological contexts at a global spatio-temporal scale. This increased information, also integrated with data from the Earth Sciences, has led to an exponential growth in complex datasets and in refined methods of analysis. For such purposes, this article discusses the ArchaeoGRID Science Gateway paradigm for accessing the ArchaeoGRID Cyberinfrastructure (CI), a Distributed Computing Infrastructure (DCI) that can supply storage and computing resources for managing and analyzing large amounts of archaeological and historical data. The ArchaeoGRID Science Gateway is emerging as a high-level web environment that provides transparent, simplified access to DCIs, such as local high-performance computing, Grids and Clouds, for non-specialist Virtual Research Communities (VRCs) of archaeologists and historians.


2018
Vol 7 (4.6)
pp. 13
Author(s):
Mekala Sandhya
Ashish Ladda
Dr. Uma N Dulhare
...

In this generation of the Internet, information and data are growing continuously across various Internet services and applications. The amount of information is increasing rapidly: hundreds of billions, even trillions, of web indexes exist. Such large data brings people a mass of information and, at the same time, greater difficulty in discovering useful knowledge within these huge amounts of data. Cloud computing can provide the infrastructure for large data. Cloud computing has two significant characteristics of distributed computing: scalability and high availability. Scalability means the system can seamlessly extend to large-scale clusters. High availability means that cloud computing can tolerate node errors; node failures will not prevent the program from running correctly. Cloud computing combined with data mining performs significant data processing on high-performance machines. Mass data storage and distributed computing provide a new method for mass data mining and become an effective solution to distributed storage and efficient computing in data mining.
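As a rough illustration of how distributed storage and computing can support mass data mining in practice, the sketch below uses Apache Spark (a representative framework, not one named in the article) to run a simple word-frequency mining job over a text corpus held in a distributed file system; the cluster master URL and input path are placeholder assumptions.

```python
from pyspark.sql import SparkSession

# Placeholder master URL and input path; on a real cluster these would point
# to the resource manager (e.g. "yarn") and a distributed store such as HDFS.
spark = (SparkSession.builder
         .appName("mass-data-word-frequency")
         .master("local[*]")                 # swap for a cluster manager URL
         .getOrCreate())

# The corpus is split into partitions processed in parallel; if a node fails,
# lost partitions are recomputed on other nodes, so the job still completes.
lines = spark.sparkContext.textFile("hdfs:///data/corpus/*.txt")

word_counts = (lines.flatMap(lambda line: line.lower().split())
                    .map(lambda word: (word, 1))
                    .reduceByKey(lambda a, b: a + b)   # distributed aggregation
                    .sortBy(lambda pair: pair[1], ascending=False))

for word, count in word_counts.take(20):   # top 20 most frequent terms
    print(word, count)

spark.stop()
```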


Author(s):  
Valentin Cristea ◽  
Ciprian Dobre ◽  
Corina Stratan ◽  
Florin Pop

The latest advances in network and distributed system technologies now allow integration of a vast variety of services with almost unlimited processing power, using large amounts of data. Sharing of resources is often viewed as the key goal for distributed systems, and in this context the sharing of stored data appears as the most important aspect of distributed resource sharing. Scientific applications are the first to take advantage of such environments, as the requirements of current and future high-performance computing experiments are pressing in terms of ever higher volumes of generated data to be stored and managed. While these new environments reveal huge opportunities for large-scale distributed data storage and management, they also raise important technical challenges that need to be addressed. The ability to support persistent storage of data on behalf of users, the consistent distribution of up-to-date data, the reliable replication of fast-changing datasets, and the efficient management of large data transfers are just some of these new challenges. In this chapter we discuss how the existing distributed computing infrastructure is adequate for supporting the required data storage and management functionalities. We highlight the issues raised by storing data over large distributed environments and discuss recent research efforts dealing with the challenges of data retrieval, replication and fast data transfers. The interaction of data management with other data-sensitive, emerging technologies, such as workflow management, is also addressed.


2013
Vol 3 (1)
pp. 13-26
Author(s):
Sanjay P. Ahuja
Sindhu Mani

High Performance Computing (HPC) applications are scientific applications that require significant CPU capabilities. They are also data-intensive applications requiring large data storage. While many researchers have examined the performance of Amazon’s EC2 platform across some HPC benchmarks, an extensive study comparing Amazon’s EC2 and Microsoft’s Windows Azure on metrics such as memory bandwidth, I/O performance, and communication and computational performance has been largely missing. The purpose of this paper is to implement existing benchmarks to evaluate and analyze these metrics for EC2 and Windows Azure, spanning both Infrastructure-as-a-Service and Platform-as-a-Service offerings. This was accomplished by running MPI versions of the STREAM, Interleaved or Random (IOR) and NAS Parallel Benchmarks (NPB) on small and medium instance types. In addition, a new EC2 medium instance type (m1.medium) was also included in the analysis. These benchmarks measure memory bandwidth, I/O performance, and communication and computational performance.
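The suites named above are established benchmarks, but the general shape of an MPI communication-performance measurement can be sketched with mpi4py; the message sizes, repetition count and script name below are illustrative assumptions rather than parameters taken from the paper (run with something like `mpirun -np 2 python bandwidth.py`).

```python
# Minimal MPI point-to-point bandwidth probe between ranks 0 and 1.
import time
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

REPS = 50
for size in (1 << 10, 1 << 20, 1 << 24):        # 1 KiB, 1 MiB, 16 MiB messages
    buf = np.zeros(size, dtype=np.uint8)
    comm.Barrier()                               # synchronize before timing
    start = time.perf_counter()
    for _ in range(REPS):
        if rank == 0:
            comm.Send([buf, MPI.BYTE], dest=1, tag=0)
            comm.Recv([buf, MPI.BYTE], source=1, tag=1)
        elif rank == 1:
            comm.Recv([buf, MPI.BYTE], source=0, tag=0)
            comm.Send([buf, MPI.BYTE], dest=0, tag=1)
    elapsed = time.perf_counter() - start
    if rank == 0:
        # Each repetition moves `size` bytes in each direction.
        mb_per_s = (2 * size * REPS) / elapsed / 1e6
        print(f"{size:>10} bytes: {mb_per_s:10.1f} MB/s")
```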


2017
Vol 898
pp. 092004
Author(s):
C Adam
D Barberis
S Crépé-Renaudin
K De
F Fassi
...

Author(s):
Kyle Chard
Eli Dart
Ian Foster
David Shifflett
Steven Tuecke
...

We describe best practices for providing convenient, high-speed, secure access to large data via research data portals. We capture these best practices in a new design pattern, the Modern Research Data Portal, that disaggregates the traditional monolithic web-based data portal to achieve orders-of-magnitude increases in data transfer performance, support new deployment architectures that decouple control logic from data storage, and reduce development and operations costs. We introduce the design pattern; explain how it leverages high-performance Science DMZs and cloud-based data management services; review representative examples at research laboratories and universities, including both experimental facilities and supercomputer sites; describe how to leverage Python APIs for authentication, authorization, data transfer, and data sharing; and use coding examples to demonstrate how these APIs can be used to implement a range of research data portal capabilities. Sample code at a companion web site, https://docs.globus.org/mrdp, provides application skeletons that readers can adapt to realize their own research data portals.
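The companion site holds the authoritative examples; purely as an illustrative sketch of the kind of Python API usage described, the fragment below uses the Globus SDK to log in with a native-app flow and submit a managed transfer between two endpoints. The client ID, endpoint UUIDs and paths are placeholders, not values from the paper.

```python
import globus_sdk

CLIENT_ID = "YOUR-NATIVE-APP-CLIENT-ID"     # placeholder registered app ID
SRC_ENDPOINT = "source-endpoint-uuid"       # placeholder endpoint UUIDs
DST_ENDPOINT = "destination-endpoint-uuid"

# Interactive native-app login to obtain a transfer access token.
auth_client = globus_sdk.NativeAppAuthClient(CLIENT_ID)
auth_client.oauth2_start_flow(
    requested_scopes="urn:globus:auth:scope:transfer.api.globus.org:all")
print("Log in at:", auth_client.oauth2_get_authorize_url())
auth_code = input("Paste the authorization code here: ").strip()
tokens = auth_client.oauth2_exchange_code_for_tokens(auth_code)
access_token = tokens.by_resource_server["transfer.api.globus.org"]["access_token"]

# Submit an asynchronous transfer; the service manages retries and checksums.
tc = globus_sdk.TransferClient(
    authorizer=globus_sdk.AccessTokenAuthorizer(access_token))
tdata = globus_sdk.TransferData(tc, SRC_ENDPOINT, DST_ENDPOINT,
                                label="portal dataset download",
                                sync_level="checksum")
tdata.add_item("/shared/dataset/", "/~/dataset/", recursive=True)
task = tc.submit_transfer(tdata)
print("Transfer task id:", task["task_id"])
```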

