ArchaeoGRID Science Gateways for Easy Access to Distributed Computing Infrastructure for Large Data Storage and Analysis in Archaeology and History

Author(s):  
Giuliano Pelfer

This article describes how archaeological and historical research has grown into a multidisciplinary and interdisciplinary activity, driven by the availability of larger amounts of data for the reconstruction of historical and archaeological contexts at a global spatio-temporal scale. This increased information, also integrated with data from the Earth Sciences, has led to an exponential growth in complex datasets and in refined methods of analysis. For such purposes, this article discusses the ArchaeoGRID Science Gateway paradigm for accessing the ArchaeoGRID Cyberinfrastructure (CI), a Distributed Computing Infrastructure (DCI) that can supply storage and computing resources for managing and analyzing large amounts of archaeological and historical data. The ArchaeoGRID Science Gateway is emerging as a high-level web environment that provides transparent, simplified access to DCIs, such as local high-performance computing, Grids and Clouds, for non-specialist Virtual Research Communities (VRCs) of archaeologists and historians.


2018
Vol 7 (4.6)
pp. 13
Author(s):
Mekala Sandhya
Ashish Ladda
Dr. Uma N Dulhare
...

In this generation of the Internet, information and data are growing continuously across various Internet services and applications. The amount of information is increasing rapidly: hundreds of billions, even trillions, of web indexes exist. Such large data brings people a mass of information and, at the same time, greater difficulty in discovering useful knowledge within these huge amounts of data. Cloud computing can provide the infrastructure for large data. Cloud computing has two significant characteristics of distributed computing: scalability and high availability. Scalability means the system can seamlessly extend to large-scale clusters. High availability means that cloud computing can tolerate node errors; node failures will not prevent the program from running correctly. Cloud computing combined with data mining performs significant data processing on high-performance machines. Mass data storage and distributed computing provide a new method for mass data mining and become an effective solution to distributed storage and efficient computing in data mining.
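As a rough illustration of how distributed storage and computing can support mass data mining in practice, the sketch below uses Apache Spark (a representative framework, not one named in the article) to run a simple word-frequency mining job over a text corpus held in a distributed file system; the cluster master URL and input path are placeholder assumptions.

```python
from pyspark.sql import SparkSession

# Placeholder master URL and input path; on a real cluster these would point
# to the resource manager (e.g. "yarn") and a distributed store such as HDFS.
spark = (SparkSession.builder
         .appName("mass-data-word-frequency")
         .master("local[*]")                 # swap for a cluster manager URL
         .getOrCreate())

# The corpus is split into partitions processed in parallel; if a node fails,
# lost partitions are recomputed on other nodes, so the job still completes.
lines = spark.sparkContext.textFile("hdfs:///data/corpus/*.txt")

word_counts = (lines.flatMap(lambda line: line.lower().split())
                    .map(lambda word: (word, 1))
                    .reduceByKey(lambda a, b: a + b)   # distributed aggregation
                    .sortBy(lambda pair: pair[1], ascending=False))

for word, count in word_counts.take(20):   # top 20 most frequent terms
    print(word, count)

spark.stop()
```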


Author(s):  
Valentin Cristea ◽  
Ciprian Dobre ◽  
Corina Stratan ◽  
Florin Pop

The latest advances in network and distributed system technologies now allow integration of a vast variety of services with almost unlimited processing power, using large amounts of data. Sharing of resources is often viewed as the key goal for distributed systems, and in this context the sharing of stored data appears as the most important aspect of distributed resource sharing. Scientific applications are the first to take advantage of such environments, as the requirements of current and future high-performance computing experiments are pressing in terms of ever higher volumes of generated data to be stored and managed. While these new environments reveal huge opportunities for large-scale distributed data storage and management, they also raise important technical challenges that need to be addressed. The ability to support persistent storage of data on behalf of users, the consistent distribution of up-to-date data, the reliable replication of fast-changing datasets, and the efficient management of large data transfers are just some of these new challenges. In this chapter we discuss how the existing distributed computing infrastructure is adequate for supporting the required data storage and management functionalities. We highlight the issues raised by storing data over large distributed environments and discuss recent research efforts dealing with the challenges of data retrieval, replication and fast data transfers. The interaction of data management with other data-sensitive, emerging technologies, such as workflow management, is also addressed.


2013
Vol 3 (1)
pp. 13-26
Author(s):
Sanjay P. Ahuja
Sindhu Mani

High Performance Computing (HPC) applications are scientific applications that require significant CPU capabilities. They are also data-intensive applications requiring large data storage. While many researchers have examined the performance of Amazon’s EC2 platform across some HPC benchmarks, an extensive study comparing Amazon’s EC2 and Microsoft’s Windows Azure on metrics such as memory bandwidth, I/O performance, and communication and computational performance has been largely missing. The purpose of this paper is to implement existing benchmarks to evaluate and analyze these metrics for EC2 and Windows Azure, spanning both Infrastructure-as-a-Service and Platform-as-a-Service offerings. This was accomplished by running MPI versions of the STREAM, Interleaved or Random (IOR) and NAS Parallel Benchmarks (NPB) on small and medium instance types. In addition, a new EC2 medium instance type (m1.medium) was also included in the analysis. These benchmarks measure memory bandwidth, I/O performance, and communication and computational performance.
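The suites named above are established benchmarks, but the general shape of an MPI communication-performance measurement can be sketched with mpi4py; the message sizes, repetition count and script name below are illustrative assumptions rather than parameters taken from the paper (run with something like `mpirun -np 2 python bandwidth.py`).

```python
# Minimal MPI point-to-point bandwidth probe between ranks 0 and 1.
import time
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

REPS = 50
for size in (1 << 10, 1 << 20, 1 << 24):        # 1 KiB, 1 MiB, 16 MiB messages
    buf = np.zeros(size, dtype=np.uint8)
    comm.Barrier()                               # synchronize before timing
    start = time.perf_counter()
    for _ in range(REPS):
        if rank == 0:
            comm.Send([buf, MPI.BYTE], dest=1, tag=0)
            comm.Recv([buf, MPI.BYTE], source=1, tag=1)
        elif rank == 1:
            comm.Recv([buf, MPI.BYTE], source=0, tag=0)
            comm.Send([buf, MPI.BYTE], dest=0, tag=1)
    elapsed = time.perf_counter() - start
    if rank == 0:
        # Each repetition moves `size` bytes in each direction.
        mb_per_s = (2 * size * REPS) / elapsed / 1e6
        print(f"{size:>10} bytes: {mb_per_s:10.1f} MB/s")
```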


2017
Vol 898
pp. 092004
Author(s):
C Adam
D Barberis
S Crépé-Renaudin
K De
F Fassi
...

Author(s):
Kyle Chard
Eli Dart
Ian Foster
David Shifflett
Steven Tuecke
...

We describe best practices for providing convenient, high-speed, secure access to large data via research data portals. We capture these best practices in a new design pattern, the Modern Research Data Portal, that disaggregates the traditional monolithic web-based data portal to achieve orders-of-magnitude increases in data transfer performance, support new deployment architectures that decouple control logic from data storage, and reduce development and operations costs. We introduce the design pattern; explain how it leverages high-performance Science DMZs and cloud-based data management services; review representative examples at research laboratories and universities, including both experimental facilities and supercomputer sites; describe how to leverage Python APIs for authentication, authorization, data transfer, and data sharing; and use coding examples to demonstrate how these APIs can be used to implement a range of research data portal capabilities. Sample code at a companion web site, https://docs.globus.org/mrdp, provides application skeletons that readers can adapt to realize their own research data portals.
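The companion site holds the authoritative examples; purely as an illustrative sketch of the kind of Python API usage described, the fragment below uses the Globus SDK to log in with a native-app flow and submit a managed transfer between two endpoints. The client ID, endpoint UUIDs and paths are placeholders, not values from the paper.

```python
import globus_sdk

CLIENT_ID = "YOUR-NATIVE-APP-CLIENT-ID"     # placeholder registered app ID
SRC_ENDPOINT = "source-endpoint-uuid"       # placeholder endpoint UUIDs
DST_ENDPOINT = "destination-endpoint-uuid"

# Interactive native-app login to obtain a transfer access token.
auth_client = globus_sdk.NativeAppAuthClient(CLIENT_ID)
auth_client.oauth2_start_flow(
    requested_scopes="urn:globus:auth:scope:transfer.api.globus.org:all")
print("Log in at:", auth_client.oauth2_get_authorize_url())
auth_code = input("Paste the authorization code here: ").strip()
tokens = auth_client.oauth2_exchange_code_for_tokens(auth_code)
access_token = tokens.by_resource_server["transfer.api.globus.org"]["access_token"]

# Submit an asynchronous transfer; the service manages retries and checksums.
tc = globus_sdk.TransferClient(
    authorizer=globus_sdk.AccessTokenAuthorizer(access_token))
tdata = globus_sdk.TransferData(tc, SRC_ENDPOINT, DST_ENDPOINT,
                                label="portal dataset download",
                                sync_level="checksum")
tdata.add_item("/shared/dataset/", "/~/dataset/", recursive=True)
task = tc.submit_transfer(tdata)
print("Transfer task id:", task["task_id"])
```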

