JAZELLE: An enhanced data management system for high energy physics

1990 ◽  
Author(s):  
A. S. Johnson ◽  
M. I. Breidenbach ◽  
H. Hissen ◽  
P. F. Kunz ◽  
D. J. Sherden ◽  
...  

2020 ◽  
Vol 226 ◽  
pp. 01007
Author(s):  
Alexei Klimentov ◽  
Douglas Benjamin ◽  
Alessandro Di Girolamo ◽  
Kaushik De ◽  
Johannes Elmsheuser ◽  
...  

The ATLAS experiment at CERN’s Large Hadron Collider uses the Worldwide LHC Computing Grid (WLCG) as its distributed computing infrastructure. Through the workload management system PanDA and the distributed data management system Rucio, ATLAS provides thousands of physicists with seamless access to hundreds of WLCG grid and cloud based resources distributed worldwide. PanDA annually processes more than an exabyte of data using an average of 350,000 distributed batch slots, enabling hundreds of new scientific results from ATLAS. However, the resources available to the experiment have been insufficient to meet ATLAS simulation needs over the past few years as the volume of data from the LHC has grown. The problem will be even more severe for the next LHC phases: the High-Luminosity LHC will be a multi-exabyte challenge where the envisaged storage and compute needs are a factor of 10 to 100 above the expected technology evolution. The High Energy Physics (HEP) community needs to evolve its current computing and data organization models, changing the way it uses and manages the infrastructure, with a focus on optimizations that bring performance and efficiency while also simplifying operations. In this paper we highlight recent R&D projects in HEP related to data lake prototypes, federated data storage and the data carousel.
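As a hedged illustration of the carousel-style staging mentioned above, the sketch below uses the Rucio Python client to request a temporary disk copy of a tape-resident dataset. The scope, dataset name, and RSE expression are placeholders, and the snippet assumes a configured and authenticated Rucio client with sufficient quota; it is not the exact ATLAS production setup.

```python
from rucio.client import Client

# Assumes a configured Rucio client environment (rucio.cfg plus X.509 or token auth).
client = Client()

# Placeholder DID: replace with a real scope and dataset name.
dids = [{"scope": "data18_13TeV", "name": "data18_13TeV.periodB.RAW.example"}]

# Request one disk copy of the tape-resident dataset for two weeks; the Rucio
# rule engine then schedules the staging and transfers, which is the essence of
# carousel-style processing: data flows from tape to disk in managed waves.
rule_ids = client.add_replication_rule(
    dids=dids,
    copies=1,
    rse_expression="type=DATADISK",   # placeholder RSE expression for disk endpoints
    lifetime=14 * 24 * 3600,          # seconds; replicas are cleaned up afterwards
    comment="carousel-style staging example",
)
print(rule_ids)
```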


2019 ◽  
Vol 214 ◽  
pp. 04054
Author(s):  
Martin Barisits ◽  
Thomas Beermann ◽  
Joaquin Bogado ◽  
Vincent Garonne ◽  
Tomas Javurek ◽  
...  

Rucio, the distributed data management system of the ATLAS experiment, already manages more than 400 petabytes of physics data on the grid. Rucio was incrementally improved throughout LHC Run-2 and is currently being prepared for the HL-LHC era of the experiment. Alongside these improvements, the system is evolving into a full-scale generic data management system for applications beyond ATLAS, or even beyond high-energy physics. This contribution focuses on the development roadmap of Rucio for LHC Run-3, such as event-level data management, generic metadata support, and increased usage of networks and tapes. At the same time, Rucio is evolving beyond the original ATLAS requirements. This includes additional authentication mechanisms, generic database compatibility, deployment and packaging of the software stack in containers, and a project paradigm shift to a full-scale open source project.
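To make the generic metadata support mentioned in the roadmap concrete, the following sketch sets a custom key on a dataset and later selects DIDs by that key through the Rucio Python client. The scope, dataset name, and metadata key are placeholders, and the calls assume an authenticated client against a Rucio server that permits custom metadata keys.

```python
from rucio.client import Client

client = Client()  # assumes a configured and authenticated Rucio client

scope, name = "user.jdoe", "example.dataset"   # placeholder DID

# Attach a custom metadata key/value pair to the dataset ...
client.set_metadata(scope=scope, name=name, key="campaign", value="run3_test")

# ... and later select datasets carrying that key/value.
for did in client.list_dids(scope, filters={"campaign": "run3_test"}, did_type="dataset"):
    print(did)
```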


2020 ◽  
Vol 245 ◽  
pp. 11006 ◽  
Author(s):  
Mario Lassnig ◽  
Martin Barisits ◽  
Paul J Laycock ◽  
Cédric Serfon ◽  
Eric W Vaandering ◽  
...  

For many scientific projects, data management is an increasingly complicated challenge. The number of data-intensive instruments generating unprecedented volumes of data is growing, and their accompanying workflows are becoming more complex. Their storage and computing resources are heterogeneous and are distributed at numerous geographical locations belonging to different administrative domains and organisations. These locations do not necessarily coincide with the places where data is produced, nor with the places where data is stored, analysed by researchers, or archived for safe long-term storage. To fulfil these needs, the data management system Rucio has been developed to allow the high-energy physics experiment ATLAS at the LHC to manage its large volumes of data in an efficient and scalable way. But ATLAS is not alone, and several diverse scientific projects have started evaluating, adopting, and adapting the Rucio system for their own needs. As the Rucio community has grown, many improvements have been introduced, customisations have been added, and many bugs have been fixed. Additionally, new dataflows have been investigated and operational experiences have been documented. In this article we collect the common successes, pitfalls, and oddities that arose in the evaluation efforts of multiple diverse experiments, and compare them with the ATLAS experience. This includes the high-energy physics experiments Belle II and CMS, the neutrino experiment DUNE, the scattering radar experiment EISCAT3D, the gravitational wave observatories LIGO and VIRGO, the SKA radio telescope, and the dark matter search experiment XENON.


2019 ◽  
Vol 214 ◽  
pp. 04020 ◽  
Author(s):  
Martin Barisits ◽  
Fernando Barreiro ◽  
Thomas Beermann ◽  
Karan Bhatia ◽  
Kaushik De ◽  
...  

Transparent use of commercial cloud resources for scientific experiments is a hard problem. In this article, we describe the first steps of the Data Ocean R&D collaboration between the high-energy physics experiment ATLAS and Google Cloud Platform, which aims to allow seamless use of Google Compute Engine and Google Cloud Storage for physics analysis. We start by describing the three preliminary use cases that were identified at the beginning of the project. The following sections then detail the work done in the data management system Rucio and the workflow management systems PanDA and Harvester to interface Google Cloud Platform with the ATLAS distributed computing environment, and show the results of the integration tests. Afterwards, we describe the setup and results of a full ATLAS user analysis that was executed natively on Google Cloud Platform, and give estimates of projected costs. We close with a summary and an outlook on future work.
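As a hedged sketch of how cloud storage appears to a user once such an integration is in place, the snippet below replicates a dataset to a Google Cloud Storage backed Rucio storage element and then lists the resulting replicas. The DID and the RSE name GOOGLE_EU are placeholders for whatever the deployment actually declares, and a configured Rucio client is assumed.

```python
from rucio.client import Client

client = Client()  # assumes a configured and authenticated Rucio client

# Placeholder DID and RSE name; "GOOGLE_EU" stands in for a Google Cloud
# Storage backed RSE declared by the deployment.
dids = [{"scope": "user.jdoe", "name": "analysis.input.dataset"}]

rule_ids = client.add_replication_rule(
    dids=dids,
    copies=1,
    rse_expression="GOOGLE_EU",
    comment="replicate analysis input to cloud storage",
)
print(rule_ids)

# Once the rule is satisfied, replicas on the cloud RSE are listed like any other.
for replica in client.list_replicas(dids, rse_expression="GOOGLE_EU"):
    print(replica["name"], list(replica["rses"].keys()))
```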


2014 ◽  
Vol 25 (02) ◽  
pp. 1430001 ◽  
Author(s):  
ALBERTO PACE

In recent years, intensive use of computing has been a main strategy of investigation in several scientific research projects. Progress in computing technology has opened unprecedented opportunities for the systematic collection of experimental data and the associated analysis that were considered impossible only a few years ago. This paper focuses on the strategies in use: it reviews the various components that are necessary for an effective solution ensuring the storage, long-term preservation, and worldwide distribution of the large quantities of data required by a large scientific research project. The paper also mentions several examples of data management solutions used in High Energy Physics for the CERN Large Hadron Collider (LHC) experiments in Geneva, Switzerland, which generate more than 30,000 terabytes of data every year that need to be preserved, analyzed, and made available to a community of several tens of thousands of scientists worldwide.
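A quick back-of-the-envelope calculation makes the scale quoted in the abstract tangible: 30,000 terabytes per year corresponds to roughly a gigabyte per second of sustained ingest, before counting any additional replicas kept for preservation.

```python
# Rates implied by the figure quoted above: the LHC experiments generate
# more than 30,000 terabytes of new data every year.
terabytes_per_year = 30_000
seconds_per_year = 365 * 24 * 3600

avg_rate_gb_per_s = terabytes_per_year * 1_000 / seconds_per_year  # TB -> GB
print(f"Sustained average ingest rate: {avg_rate_gb_per_s:.2f} GB/s")   # ~0.95 GB/s

# Archive volume after a decade of running, counting a single copy only;
# real deployments keep additional replicas for safe keeping.
print(f"Ten-year archive (single copy): {terabytes_per_year * 10 / 1_000:.0f} PB")  # 300 PB
```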


2020 ◽  
Vol 245 ◽  
pp. 03035
Author(s):  
Federico Stagni ◽  
Andrei Tsaregorodtsev ◽  
André Sailer ◽  
Christophe Haen

Efficient access to distributed computing and storage resources is mandatory for the success of current and future High Energy and Nuclear Physics experiments. DIRAC is an interware to build and operate distributed computing systems. It provides a development framework and a rich set of services for the workload, data, and production management tasks of large scientific communities. A single DIRAC installation provides a complete solution for the distributed computing of one or more collaborations. The DIRAC Workload Management System (WMS) provides a transparent, uniform interface for managing computing resources. The DIRAC Data Management System (DMS) offers all the necessary tools for data handling operations: it supports transparent access to storage resources based on multiple technologies and is easily expandable. Distributed data management can be performed, also using third-party services, and operations are resilient with respect to failures. DIRAC is highly customizable and can be easily extended. For these reasons, a vast and heterogeneous set of scientific collaborations have adopted DIRAC as the basis for their computing models. Users from different experiments can interact with the system in different ways, depending on their specific tasks, expertise level, and previous experience, using command-line tools, Python APIs, or web portals. The requirements of the diverse DIRAC user communities and hosting infrastructures have triggered multiple developments to improve the system’s usability: examples include the adoption of industry-standard authorization and authentication infrastructure solutions, the management of diverse computing resources (cloud, HPC, GPGPU, etc.), the handling of high-intensity work and data flows, and also advanced monitoring and accounting using NoSQL-based solutions and message queues. This contribution highlights DIRAC’s current, upcoming and planned capabilities and technologies.
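As a minimal sketch of the uniform job interface described above, the snippet below submits a trivial job through the DIRAC Python API. It assumes an installed DIRAC client with a valid proxy, and the job name and executable are placeholders rather than any experiment's production workflow.

```python
# Minimal DIRAC job submission sketch; assumes a configured DIRAC client
# installation and a valid proxy (for example obtained with dirac-proxy-init).
from DIRAC.Core.Base import Script
Script.parseCommandLine(ignoreErrors=True)   # initialise DIRAC before further imports

from DIRAC.Interfaces.API.Dirac import Dirac
from DIRAC.Interfaces.API.Job import Job

job = Job()
job.setName("hello-dirac")                                   # placeholder job name
job.setExecutable("/bin/echo", arguments="Hello from DIRAC")

dirac = Dirac()
result = dirac.submitJob(job)
print(result)   # S_OK/S_ERROR dictionary with the job ID on success
```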


2021 ◽  
Vol 251 ◽  
pp. 02048
Author(s):  
Igor Mandrichenko

Metadata management is one of three major areas of scientific data management, along with replica management and workflow management. Metadata is the information describing the data stored in a data item, a file or an object. It includes the data item’s provenance, recording conditions, format and other attributes. MetaCat is a metadata management database designed and developed for High Energy Physics experiments. As a component of a data management system, its main objectives are to provide efficient metadata storage and management and fast data selection functionality. MetaCat is required to work at the scale of 100 million files (or objects) and beyond. The article will discuss the functionality of MetaCat and the technological solutions used to implement the product.
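To illustrate the kind of attribute-based file selection a catalogue like MetaCat provides, the sketch below filters an in-memory list of file records by metadata values. This is a conceptual illustration with made-up attribute names, not the MetaCat API or its query language.

```python
# Conceptual illustration only: NOT the MetaCat API. It mimics selecting files
# by metadata attributes (provenance, recording conditions, format, ...) from a
# small in-memory store; a real catalogue does this server-side over ~10^8 files.
files = [
    {"name": "run001.root", "run": 1001, "data_tier": "raw",  "beam_energy_gev": 6500},
    {"name": "run002.root", "run": 1002, "data_tier": "reco", "beam_energy_gev": 6500},
    {"name": "run003.root", "run": 1003, "data_tier": "reco", "beam_energy_gev": 6800},
]

def select(records, **criteria):
    """Return records whose metadata matches every attribute=value pair given."""
    return [r for r in records if all(r.get(k) == v for k, v in criteria.items())]

print(select(files, data_tier="reco", beam_energy_gev=6500))   # -> run002.root only
```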

