computing grid: Recently Published Documents

TOTAL DOCUMENTS: 174 (five years: 39)
H-INDEX: 11 (five years: 1)

2022 ◽ Vol 4 ◽ Author(s): Alessandro Di Girolamo, Federica Legger, Panos Paparrigopoulos, Jaroslava Schovancová, Thomas Beermann, ...

As a joint effort of the various communities involved in the Worldwide LHC Computing Grid, the Operational Intelligence project aims at increasing the level of automation in computing operations and reducing human interventions. The distributed computing systems currently deployed by the LHC experiments have proven to be mature and capable of meeting the experimental goals, allowing timely delivery of scientific results. However, a substantial number of interventions from software developers, shifters, and operational teams are needed to efficiently manage such heterogeneous infrastructures. Under the scope of the Operational Intelligence project, experts from several areas have gathered to propose and work on “smart” solutions. Machine learning, data mining, log analysis, and anomaly detection are only some of the tools we have evaluated for our use cases. In this community study contribution, we report on the development of a suite of operational intelligence services covering several use cases: workload management, data management, and site operations.
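As an illustration of the kind of log-analysis tooling the project evaluates, the sketch below groups similar operational error messages with TF-IDF features and k-means clustering so that a shifter reviews one representative per cluster instead of every line. It is a minimal example using scikit-learn with invented sample messages, not the project's actual pipeline.

```python
# Minimal sketch: grouping similar operational error messages.
# Sample messages are invented; a real pipeline would read service logs.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

log_lines = [
    "Transfer failed: connection timed out to storage endpoint",
    "Transfer failed: connection timed out to storage endpoint (retry 2)",
    "Job killed: exceeded memory limit on worker node",
    "Job killed: exceeded wallclock limit on worker node",
    "Authentication error: proxy certificate expired",
]

# Turn free-text messages into sparse TF-IDF vectors.
features = TfidfVectorizer(stop_words="english").fit_transform(log_lines)

# Cluster the messages; the number of clusters is a tunable assumption.
labels = KMeans(n_clusters=3, random_state=0, n_init=10).fit_predict(features)

for label, line in sorted(zip(labels, log_lines)):
    print(label, line)
```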


2021 ◽ Author(s): Andrii Salnikov, Balázs Kónya

Distributed e-Infrastructures are a key component of modern Big Science. Service discovery in e-Science environments such as the Worldwide LHC Computing Grid (WLCG) is a crucial functionality that relies on a service registry. In this paper we re-formulate the requirements for a service endpoint registry based on our more than 10 years of experience with the many systems designed for or used within the WLCG e-Infrastructure. To satisfy those requirements, the paper proposes a novel idea: using the existing, well-established Domain Name System (DNS) infrastructure, together with a suitable data model, as a service endpoint registry. The presented ARC Hierarchical Endpoints Registry (ARCHERY) system consists of a minimalistic data model representing services and their endpoints within e-Infrastructures, a rendering of the data model embedded into DNS records, and a lightweight software layer for DNS-record management and client-side data discovery. Our approach required minimal software development and inherits all the benefits of one of the most reliable distributed information discovery sources on the internet: the DNS infrastructure. In particular, deployment, management and operation of ARCHERY rely fully on DNS. Results from ARCHERY deployment use cases are provided together with a performance analysis.
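To illustrate the client side of a DNS-based endpoint registry, the following sketch resolves TXT records with the dnspython library and parses simple key-value pairs. The registry name and the key-value rendering are assumptions for illustration, not the actual ARCHERY record format.

```python
# Minimal sketch of client-side endpoint discovery over DNS TXT records,
# assuming endpoints are rendered as key=value pairs inside the records.
# The registry name and rendering are illustrative, not ARCHERY's data model.
import dns.resolver  # pip install dnspython

REGISTRY_NAME = "_archery.example.org"  # hypothetical registry entry point

def discover_endpoints(name: str) -> list[dict]:
    endpoints = []
    for rdata in dns.resolver.resolve(name, "TXT"):
        # Each TXT record may consist of several character strings.
        text = b" ".join(rdata.strings).decode()
        record = dict(
            field.split("=", 1) for field in text.split() if "=" in field
        )
        if record:
            endpoints.append(record)
    return endpoints

if __name__ == "__main__":
    for endpoint in discover_endpoints(REGISTRY_NAME):
        print(endpoint)
```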


Author(s): Quentin Guilloteau, Olivier Richard, Bogdan Robu, Eric Rutten

2021 ◽ Vol 5 (1) ◽ Author(s): Valentin Kuznetsov, Luca Giommi, Daniele Bonacorsi

Machine Learning (ML) will play a significant role in the success of the upcoming High-Luminosity LHC (HL-LHC) program at CERN. An unprecedented amount of data at the exascale will be collected by the LHC experiments in the next decade, and this effort will require novel approaches to training and using ML models. In this paper, we discuss a Machine Learning as a Service pipeline for HEP (MLaaS4HEP) that provides three independent layers: a data streaming layer to read High-Energy Physics (HEP) data in their native ROOT data format; a data training layer to train ML models using distributed ROOT files; and a data inference layer to serve predictions from pre-trained ML models via the HTTP protocol. Such a modular design opens up the possibility of training models on large-scale data by reading ROOT files from remote storage facilities, e.g., the Worldwide LHC Computing Grid (WLCG) infrastructure, and feeding the data to the user's favorite ML framework. The inference layer, implemented as TensorFlow as a Service (TFaaS), may provide easy access to pre-trained ML models in existing infrastructure and applications inside or outside of the HEP domain. In particular, we demonstrate the use of the MLaaS4HEP architecture for a physics use case, namely the $t\bar{t}$ Higgs analysis in CMS, originally performed using custom-made Ntuples. We provide details on the training of the ML model using distributed ROOT files, discuss the performance of the MLaaS and TFaaS approaches for the selected physics analysis, and compare the results with traditional methods.
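The sketch below illustrates the flavour of the streaming and inference layers: it reads branches from ROOT files in chunks with uproot and posts a feature vector to an HTTP inference endpoint. File paths, branch names, and the endpoint URL are placeholders, and the request payload is an assumption rather than the actual TFaaS API.

```python
# Minimal sketch of the two outer layers of an MLaaS-style pipeline:
# (1) stream branches out of ROOT files in chunks, (2) send features to an
# HTTP inference service. Paths, branch names, URL and payload format are
# illustrative placeholders.
import uproot    # pip install uproot
import requests

FILES = ["root://some.storage.site//store/user/sample.root:Events"]  # placeholder
BRANCHES = ["Muon_pt", "Muon_eta"]                                    # placeholder
INFERENCE_URL = "https://tfaas.example.org/predict"                   # placeholder

def stream_chunks():
    """Yield dictionaries of numpy arrays for the requested branches, chunk by chunk."""
    for chunk in uproot.iterate(FILES, BRANCHES, step_size="100 MB", library="np"):
        yield chunk

def predict(features: list[float]) -> dict:
    """Post one feature vector to the inference service and return its reply."""
    reply = requests.post(INFERENCE_URL, json={"inputs": features}, timeout=30)
    reply.raise_for_status()
    return reply.json()

if __name__ == "__main__":
    for chunk in stream_chunks():
        # A training layer would feed `chunk` to the user's ML framework here;
        # for brevity we only print the chunk shapes.
        print({name: arr.shape for name, arr in chunk.items()})
        break
```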


2021 ◽ Vol 5 (1) ◽ Author(s): Yutaro Iiyama, Benedikt Maier, Daniel Abercrombie, Maxim Goncharov, Christoph Paus

Dynamo is a full-stack software solution for scientific data management. Dynamo's architecture is modular, extensible, and customizable, making the software suitable for managing data over a wide range of installation scales, from a few terabytes stored at a single location to hundreds of petabytes distributed across a worldwide computing grid. This article documents the core system design of Dynamo and describes the applications that implement various data management tasks. A brief report is also given on the operational experience with the system at the CMS experiment at the CERN Large Hadron Collider and at a small-scale analysis facility.
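As a rough illustration of what a modular, policy-driven data-management design looks like, the sketch below lets pluggable policies vote on which replicas to keep. Class and method names are invented for illustration and do not reflect Dynamo's actual interfaces.

```python
# Conceptual sketch of a policy-driven, pluggable data-management loop.
# Names are invented; this is not Dynamo's API.
from dataclasses import dataclass

@dataclass
class Replica:
    dataset: str
    site: str
    size_tb: float
    last_access_days: int

class Policy:
    """A pluggable rule that votes on whether a replica should be kept."""
    def keep(self, replica: Replica) -> bool:
        raise NotImplementedError

class KeepRecentlyUsed(Policy):
    def __init__(self, max_idle_days: int = 180):
        self.max_idle_days = max_idle_days
    def keep(self, replica: Replica) -> bool:
        return replica.last_access_days <= self.max_idle_days

def plan_deletions(replicas: list[Replica], policies: list[Policy]) -> list[Replica]:
    """A replica is scheduled for deletion only if no policy wants to keep it."""
    return [r for r in replicas if not any(p.keep(r) for p in policies)]

if __name__ == "__main__":
    inventory = [
        Replica("datasetA", "SiteX", 12.0, 10),
        Replica("datasetB", "SiteY", 40.0, 400),
    ]
    print(plan_deletions(inventory, [KeepRecentlyUsed()]))
```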


2021 ◽ Vol 251 ◽ pp. 02005 ◽ Author(s): Fernando Barreiro Megino, Singh Bawa Harinder, Kaushik De, Johannes Elmsheuser, Alexei Klimentov, ...

The CERN ATLAS Experiment successfully uses a worldwide distributed computing Grid infrastructure to support its physics programme at the Large Hadron Collider (LHC). The Grid workflow system PanDA routinely manages up to 700,000 concurrently running production and analysis jobs to process simulation and detector data. In total, more than 500 PB of data are distributed over more than 150 sites in the WLCG and handled by the ATLAS data management system Rucio. To prepare for the ever-growing data rates of future LHC runs, new developments are underway to embrace industry-accepted protocols and technologies and to utilize opportunistic resources in a standard way. This paper reviews how the Google and Amazon cloud computing services have been seamlessly integrated as a Grid site within PanDA and Rucio. Performance and brief cost evaluations are discussed. Such setups could offer advanced cloud tool-sets and provide added value for the analysis facilities under discussion for LHC Run 4.
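One building block commonly used to expose commercial object stores to Grid data-management and transfer tools is the pre-signed URL, which grants time-limited access to an object without distributing cloud credentials. The sketch below generates such a URL with boto3 for a hypothetical bucket, key and endpoint; it illustrates the mechanism only and is not the actual PanDA/Rucio integration code.

```python
# Minimal sketch: generating a time-limited pre-signed URL for an object in an
# S3-compatible cloud store. Bucket, key, endpoint and credentials are
# hypothetical placeholders.
import boto3  # pip install boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://s3.example-cloud.org",   # placeholder endpoint
    aws_access_key_id="EXAMPLEKEY",                # placeholder credentials
    aws_secret_access_key="EXAMPLESECRET",
)

url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "atlas-example-bucket", "Key": "data/sample.root"},
    ExpiresIn=3600,  # URL valid for one hour
)
print(url)
```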


Author(s): Low Tang Jung, Ahmed Abba Haruna

In a computing grid environment, job scheduling is fundamentally the process of allocating computing jobs to the available resources. As the scale of grid computing systems grows over time, an exponential increase in energy consumption is foreseen. As such, large data centers (DC) are embarking on green computing initiatives to address the environmental impact of IT operations. The component within a computing system that consumes the most electricity and generates the most heat is the microprocessor, and the heat generated by high-performance microprocessors translates into a large CO2 footprint. Therefore, job scheduling with thermal considerations (thermal-aware scheduling) for the microprocessors is important in DC grid operations. This chapter proposes a job scheduling approach for reducing electricity usage (green computing) in a DC grid. The approach is the outcome of R&D work based on the DC grid environment at Universiti Teknologi PETRONAS, Malaysia.
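A simplified way to convey the thermal-aware idea is a greedy heuristic that places each job on the coolest node that stays below a thermal threshold. The sketch below shows that heuristic; the temperatures, the per-job heating model, and the threshold are invented for illustration and are not the chapter's actual algorithm.

```python
# Simplified sketch of a thermal-aware scheduling heuristic: each job goes to
# the coolest node that remains below a thermal limit. Numbers are invented.
THERMAL_LIMIT_C = 75.0
HEAT_PER_JOB_C = 2.5  # assumed temperature rise per scheduled job

def schedule(jobs: list[str], node_temps: dict[str, float]) -> dict[str, str]:
    """Return a mapping job -> node, updating node temperatures as jobs land."""
    placement = {}
    for job in jobs:
        # Pick the coolest node that can still accept the extra heat.
        candidates = [n for n, t in node_temps.items()
                      if t + HEAT_PER_JOB_C <= THERMAL_LIMIT_C]
        if not candidates:
            raise RuntimeError(f"no thermally safe node for {job}")
        node = min(candidates, key=node_temps.get)
        placement[job] = node
        node_temps[node] += HEAT_PER_JOB_C
    return placement

if __name__ == "__main__":
    temps = {"node1": 60.0, "node2": 72.0, "node3": 65.0}
    print(schedule(["job1", "job2", "job3"], temps))
```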


2021 ◽ Vol 251 ◽ pp. 02058 ◽ Author(s): Lorena Lobato Pardavila, Burt Holzman, Edward Karavakis, Lincoln Bryant, Steven Timm

The File Transfer Service (FTS3) is a data movement service developed at CERN which is used to distribute the majority of the Large Hadron Collider's data across the Worldwide LHC Computing Grid (WLCG) infrastructure. At Fermilab, we have deployed FTS3 instances for Intensity Frontier experiments (e.g. DUNE) to transfer data across America and Europe, using a container-based strategy. In this article we summarize our experience building Docker images based on work from the SLATE project (slateci.io) and deploying them in OKD, the community distribution of Red Hat OpenShift. Additionally, we discuss our method of certificate management and maintenance using Kubernetes CronJobs. Finally, we report on the configuration currently running at Fermilab.
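To make the CronJob-based certificate maintenance concrete, the sketch below builds a Kubernetes CronJob manifest as a Python dictionary and prints it as JSON (which Kubernetes accepts alongside YAML). The schedule, image, script path, and secret names are illustrative placeholders and do not reflect the actual Fermilab configuration.

```python
# Sketch of a Kubernetes CronJob manifest for periodic certificate renewal,
# built as a Python dict. All names and values are illustrative placeholders.
import json

cronjob = {
    "apiVersion": "batch/v1",
    "kind": "CronJob",
    "metadata": {"name": "fts3-cert-renewal"},
    "spec": {
        "schedule": "0 */6 * * *",  # run every six hours
        "jobTemplate": {"spec": {"template": {"spec": {
            "restartPolicy": "OnFailure",
            "containers": [{
                "name": "renew-certs",
                "image": "example.org/fts3-cert-tools:latest",
                "command": ["/usr/local/bin/renew-certificates.sh"],
                "volumeMounts": [{"name": "certs",
                                  "mountPath": "/etc/grid-security"}],
            }],
            "volumes": [{"name": "certs",
                         "secret": {"secretName": "fts3-host-certs"}}],
        }}}},
    },
}

print(json.dumps(cronjob, indent=2))
```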


2021 ◽ Vol 251 ◽ pp. 02028 ◽ Author(s): Brian Bockelman, Andrea Ceccanti, Thomas Dack, Dave Dykstra, Maarten Litmaath, ...

Since 2017, the Worldwide LHC Computing Grid (WLCG) has been working towards enabling token-based authentication and authorisation throughout its entire middleware stack. Following the publication of the WLCG Common JSON Web Token (JWT) Schema v1.0 [1] in 2019, middleware developers have been able to enhance their services to consume and validate JWT-based [2] OAuth 2.0 [3] tokens and to process the authorisation information they convey. Complex scenarios, involving multiple delegation steps and command-line flows, are a key challenge that must be addressed for the system to become fully operational. This paper expands on the anticipated token-based workflows, with a particular focus on the local storage of tokens and their discovery by services. The authors include a walk-through of this token flow in the Rucio-managed data-transfer scenario, including delegation to FTS and authorised access to storage elements. Next steps are presented, including the current target of submitting production jobs authorised by tokens within 2021.
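As an example of how a client or service can locate a locally stored token, the sketch below follows the lookup order described, to the best of our understanding, in the WLCG Bearer Token Discovery recommendation: a BEARER_TOKEN environment variable, then a file named by BEARER_TOKEN_FILE, then bt_u<uid> under XDG_RUNTIME_DIR or /tmp. Treat it as an illustration rather than a reference implementation.

```python
# Sketch of local bearer-token discovery (environment variable, then a file
# pointed to by BEARER_TOKEN_FILE, then bt_u<uid> in XDG_RUNTIME_DIR or /tmp).
# Illustration only; consult the WLCG recommendation for the normative rules.
import os

def discover_bearer_token() -> str | None:
    token = os.environ.get("BEARER_TOKEN")
    if token:
        return token.strip()

    token_file = os.environ.get("BEARER_TOKEN_FILE")
    candidates = [token_file] if token_file else []

    uid = os.getuid()
    runtime_dir = os.environ.get("XDG_RUNTIME_DIR")
    if runtime_dir:
        candidates.append(os.path.join(runtime_dir, f"bt_u{uid}"))
    candidates.append(f"/tmp/bt_u{uid}")

    for path in candidates:
        try:
            with open(path, encoding="ascii") as handle:
                return handle.read().strip()
        except OSError:
            continue
    return None

if __name__ == "__main__":
    print(discover_bearer_token())
```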


2021 ◽ Vol 251 ◽ pp. 02039 ◽ Author(s): Michael Böhler, René Caspart, Max Fischer, Oliver Freyermuth, Manuel Giffels, ...

The inclusion of opportunistic resources, for example from High Performance Computing (HPC) centers or cloud providers, is an important contribution to bridging the gap between existing resources and the future needs of the LHC collaborations, especially for the HL-LHC era. However, the integration of these resources poses new challenges and often needs to happen in a highly dynamic manner. To enable an effective and lightweight integration of these resources, the tools COBalD and TARDIS are being developed at KIT. In this contribution we report on the infrastructure we use to dynamically offer opportunistic resources to collaborations in the Worldwide LHC Computing Grid (WLCG). The core components are COBalD/TARDIS, HTCondor, CVMFS and modern virtualization technology. The challenging task of managing the opportunistic resources is performed by COBalD/TARDIS. We showcase the challenges, employed solutions and experience gained with the provisioning of opportunistic resources from several resource providers such as university clusters, HPC centers and cloud setups in a multi-VO environment. This work can serve as a blueprint for approaching the provisioning of resources from other resource providers.
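The core idea behind dynamically offering opportunistic resources is a feedback loop that grows or shrinks the pool of provisioned resources based on how well the currently provisioned ones are used. The sketch below shows that control idea in a few lines of Python; it is a conceptual illustration only, with invented thresholds, and does not use the actual COBalD/TARDIS interfaces.

```python
# Conceptual sketch of the feedback idea behind opportunistic-resource
# provisioning: grow the pool while resources are well used, shrink it when
# they sit idle. Thresholds and step size are invented; not COBalD/TARDIS code.
def adjust_pool(current_drones: int, utilisation: float,
                scale_up_at: float = 0.9, scale_down_at: float = 0.5,
                step: int = 5, minimum: int = 0, maximum: int = 200) -> int:
    """Return the new target number of resources for one control cycle."""
    if utilisation >= scale_up_at:
        target = current_drones + step      # demand is high: request more
    elif utilisation <= scale_down_at:
        target = current_drones - step      # resources idle: release some
    else:
        target = current_drones             # within the comfort band
    return max(minimum, min(maximum, target))

if __name__ == "__main__":
    drones = 20
    for util in [0.95, 0.97, 0.6, 0.3]:     # example utilisation readings
        drones = adjust_pool(drones, util)
        print(f"utilisation={util:.2f} -> target resources={drones}")
```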

