An integrated storage and data management system for a high energy physics experiment

Author(s):
Paolo Calafiura
Gerhard Wirrer
Bernd Panzer-Steindel

1990
Author(s):
A. S. Johnson
M. Breidenbach
H. Hissen
P. F. Kunz
D. J. Sherden
...

2020
Vol 245
pp. 11006
Author(s):  
Mario Lassnig ◽  
Martin Barisits ◽  
Paul J Laycock ◽  
Cédric Serfon ◽  
Eric W Vaandering ◽  
...  

For many scientific projects, data management is an increasingly complicated challenge. The number of data-intensive instruments generating unprecedented volumes of data is growing and their accompanying workflows are becoming more complex. Their storage and computing resources are heterogeneous and are distributed at numerous geographical locations belonging to different administrative domains and organisations. These locations do not necessarily coincide with the places where data is produced nor where data is stored, analysed by researchers, or archived for safe long-term storage. To fulfil these needs, the data management system Rucio has been developed to allow the high-energy physics experiment ATLAS at LHC to manage its large volumes of data in an efficient and scalable way. But ATLAS is not alone, and several diverse scientific projects have started evaluating, adopting, and adapting the Rucio system for their own needs. As the Rucio community has grown, many improvements have been introduced, customisations have been added, and many bugs have been fixed. Additionally, new dataflows have been investigated and operational experiences have been documented. In this article we collect and compare the common successes, pitfalls, and oddities that arose in the evaluation efforts of multiple diverse experiments, and compare them with the ATLAS experience. This includes the high-energy physics experiments Belle II and CMS, the neutrino experiment DUNE, the scattering radar experiment EISCAT3D, the gravitational wave observatories LIGO and VIRGO, the SKA radio telescope, and the dark matter search experiment XENON.
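At the heart of Rucio's scalability is its declarative model: instead of issuing individual transfers, users state a replication rule (how many copies of a dataset must exist, on which classes of storage endpoints), and the system works out the transfers needed to converge. A toy sketch of that idea follows; all names are illustrative and this is not the actual Rucio client API.

```python
# Toy model of declarative replica management, loosely inspired by
# Rucio's replication rules. Function and endpoint names here are
# illustrative only; the real Rucio API differs.

def plan_transfers(replicas, copies, allowed_rses):
    """Given the current replica locations of a dataset, the desired
    number of copies, and the storage endpoints (RSEs) permitted by the
    rule, return the destinations still needed to satisfy the rule."""
    current = [rse for rse in replicas if rse in allowed_rses]
    missing = copies - len(current)
    if missing <= 0:
        return []  # rule already satisfied, nothing to transfer
    # pick destinations from the allowed endpoints that lack a replica
    candidates = [rse for rse in allowed_rses if rse not in current]
    return candidates[:missing]

# Example: one replica exists at CERN; the rule asks for two copies
# spread across three permitted sites.
replicas = ["CERN-PROD_DATADISK"]
allowed = ["CERN-PROD_DATADISK", "BNL-OSG2_DATADISK", "FZK-LCG2_DATADISK"]
print(plan_transfers(replicas, copies=2, allowed_rses=allowed))
# -> ['BNL-OSG2_DATADISK']
```

The real system layers much more on top (RSE expressions, quotas, lifetimes, transfer retries), but the declarative rule-to-transfer-plan step is the core loop that lets one model serve experiments as different as ATLAS, DUNE, and SKA.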


2019
Vol 214
pp. 04020
Author(s):  
Martin Barisits ◽  
Fernando Barreiro ◽  
Thomas Beermann ◽  
Karan Bhatia ◽  
Kaushik De ◽  
...  

Transparent use of commercial cloud resources for scientific experiments is a hard problem. In this article, we describe the first steps of the Data Ocean R&D collaboration between the high-energy physics experiment ATLAS and Google Cloud Platform, to allow seamless use of Google Compute Engine and Google Cloud Storage for physics analysis. We start by describing the three preliminary use cases that were identified at the beginning of the project. The following sections then detail the work done in the data management system Rucio and the workflow management systems PanDA and Harvester to interface Google Cloud Platform with the ATLAS distributed computing environment, and show the results of the integration tests. Afterwards, we describe the setup and results from a full ATLAS user analysis that was executed natively on Google Cloud Platform, and give estimates on projected costs. We close with a summary and an outlook on future work.


2020
Vol 226
pp. 01007
Author(s):  
Alexei Klimentov ◽  
Douglas Benjamin ◽  
Alessandro Di Girolamo ◽  
Kaushik De ◽  
Johannes Elmsheuser ◽  
...  

The ATLAS experiment at CERN’s Large Hadron Collider uses the Worldwide LHC Computing Grid, the WLCG, for its distributed computing infrastructure. Through the workload management system PanDA and the distributed data management system Rucio, ATLAS provides seamless access to hundreds of WLCG grid and cloud based resources that are distributed worldwide, to thousands of physicists. PanDA annually processes more than an exabyte of data using an average of 350,000 distributed batch slots, to enable hundreds of new scientific results from ATLAS. However, the resources available to the experiment have been insufficient to meet ATLAS simulation needs over the past few years as the volume of data from the LHC has grown. The problem will be even more severe for the next LHC phases. High Luminosity LHC will be a multi-exabyte challenge where the envisaged storage and compute needs are a factor of 10 to 100 above the expected technology evolution. The High Energy Physics (HEP) community needs to evolve its current computing and data organization models in order to change the way it uses and manages the infrastructure, with a focus on optimizations that improve performance and efficiency while also simplifying operations. In this paper we highlight recent R&D projects in HEP related to data lake prototypes, federated data storage, and the data carousel.
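The data carousel mentioned above addresses the storage gap by processing a large tape-resident dataset through a sliding window of disk: only a bounded slice is staged at any time, processed, and then released before the next slice is recalled, so disk usage stays constant regardless of dataset size. A minimal conceptual sketch of that loop, with illustrative names only (the real ATLAS implementation coordinates Rucio staging with PanDA job brokering):

```python
# Conceptual sketch of the data carousel pattern: keep disk usage
# bounded at `window` files while working through a tape-resident
# dataset. Illustrative only; not the ATLAS/Rucio implementation.

def carousel(files, window):
    """Yield successive staging windows over `files`. Each window
    models one carousel cycle:
      1) stage the batch from tape to disk,
      2) run jobs against the staged copies,
      3) release the disk space before the next recall."""
    for start in range(0, len(files), window):
        yield files[start:start + window]

# Example: a 10-file dataset processed with room for 4 files on disk.
dataset = [f"AOD.{i:05d}.root" for i in range(10)]
print([len(batch) for batch in carousel(dataset, window=4)])
# -> [4, 4, 2]
```

The design trade-off is throughput versus footprint: a larger window keeps tape drives streaming and jobs busy, while a smaller window frees scarce disk for other workflows, which is exactly the balance the carousel R&D explores.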


2019
Vol 214
pp. 04054
Author(s):  
Martin Barisits ◽  
Thomas Beermann ◽  
Joaquin Bogado ◽  
Vincent Garonne ◽  
Tomas Javurek ◽  
...  

Rucio, the distributed data management system of the ATLAS experiment, already manages more than 400 petabytes of physics data on the grid. Rucio was incrementally improved throughout LHC Run-2 and is currently being prepared for the HL-LHC era of the experiment. Alongside these improvements, the system is evolving into a full-scale generic data management system for applications beyond ATLAS, or even beyond high-energy physics. This contribution focuses on the development roadmap of Rucio for LHC Run-3, covering event-level data management, generic metadata support, and increased usage of networks and tapes. At the same time, Rucio is evolving beyond the original ATLAS requirements. This includes additional authentication mechanisms, generic database compatibility, deployment and packaging of the software stack in containers, and a paradigm shift to a full-scale open source project.


2019
Author(s):  
Juan Carlos Cabanillas Noris ◽  
Ildefonso León Monzón ◽  
Mario Iván Martínez Hernández ◽  
Solangel Rojas Torres
