computing grid: Recently Published Documents

TOTAL DOCUMENTS: 174 (five years: 39)
H-INDEX: 11 (five years: 1)

2022 ◽ Vol 4 ◽ Author(s): Alessandro Di Girolamo, Federica Legger, Panos Paparrigopoulos, Jaroslava Schovancová, Thomas Beermann, ...

As a joint effort of the various communities involved in the Worldwide LHC Computing Grid, the Operational Intelligence project aims at increasing the level of automation in computing operations and reducing human interventions. The distributed computing systems currently deployed by the LHC experiments have proven to be mature and capable of meeting the experimental goals, allowing timely delivery of scientific results. However, a substantial number of interventions from software developers, shifters, and operational teams are needed to efficiently manage such heterogeneous infrastructures. Under the scope of the Operational Intelligence project, experts from several areas have gathered to propose and work on “smart” solutions. Machine learning, data mining, log analysis, and anomaly detection are only some of the tools we have evaluated for our use cases. In this community study contribution, we report on the development of a suite of operational intelligence services covering several use cases: workload management, data management, and site operations.
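As an illustration of the kind of log-analysis tooling the project evaluates, the sketch below groups similar operational error messages with TF-IDF features and k-means clustering so that a shifter reviews one representative per cluster instead of every line. It is a minimal example using scikit-learn with invented sample messages, not the project's actual pipeline.

```python
# Minimal sketch: grouping similar operational error messages.
# Sample messages are invented; a real pipeline would read service logs.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

log_lines = [
    "Transfer failed: connection timed out to storage endpoint",
    "Transfer failed: connection timed out to storage endpoint (retry 2)",
    "Job killed: exceeded memory limit on worker node",
    "Job killed: exceeded wallclock limit on worker node",
    "Authentication error: proxy certificate expired",
]

# Turn free-text messages into sparse TF-IDF vectors.
features = TfidfVectorizer(stop_words="english").fit_transform(log_lines)

# Cluster the messages; the number of clusters is a tunable assumption.
labels = KMeans(n_clusters=3, random_state=0, n_init=10).fit_predict(features)

for label, line in sorted(zip(labels, log_lines)):
    print(label, line)
```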


2021 ◽ Author(s): Andrii Salnikov, Balázs Kónya

Distributed e-Infrastructures are a key component of modern Big Science. Service discovery in e-Science environments such as the Worldwide LHC Computing Grid (WLCG) is a crucial functionality that relies on a service registry. In this paper we re-formulate the requirements for a service endpoint registry based on our more than 10 years of experience with the many systems designed for or used within the WLCG e-Infrastructure. To satisfy those requirements, the paper proposes a novel idea: using the existing, well-established Domain Name System (DNS) infrastructure, together with a suitable data model, as a service endpoint registry. The presented ARC Hierarchical Endpoints Registry (ARCHERY) system consists of a minimalistic data model representing services and their endpoints within e-Infrastructures, a rendering of the data model embedded into DNS records, and a lightweight software layer for DNS-record management and client-side data discovery. Our approach required minimal software development and inherits all the benefits of one of the most reliable distributed information discovery sources on the internet: the DNS infrastructure. In particular, deployment, management and operation of ARCHERY rely fully on DNS. Results from ARCHERY deployment use cases are provided together with a performance analysis.
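To illustrate the client side of a DNS-based endpoint registry, the following sketch resolves TXT records with the dnspython library and parses simple key-value pairs. The registry name and the key-value rendering are assumptions for illustration, not the actual ARCHERY record format.

```python
# Minimal sketch of client-side endpoint discovery over DNS TXT records,
# assuming endpoints are rendered as key=value pairs inside the records.
# The registry name and rendering are illustrative, not ARCHERY's data model.
import dns.resolver  # pip install dnspython

REGISTRY_NAME = "_archery.example.org"  # hypothetical registry entry point

def discover_endpoints(name: str) -> list[dict]:
    endpoints = []
    for rdata in dns.resolver.resolve(name, "TXT"):
        # Each TXT record may consist of several character strings.
        text = b" ".join(rdata.strings).decode()
        record = dict(
            field.split("=", 1) for field in text.split() if "=" in field
        )
        if record:
            endpoints.append(record)
    return endpoints

if __name__ == "__main__":
    for endpoint in discover_endpoints(REGISTRY_NAME):
        print(endpoint)
```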


Author(s): Quentin Guilloteau, Olivier Richard, Bogdan Robu, Eric Rutten

2021 ◽ Vol 5 (1) ◽ Author(s): Valentin Kuznetsov, Luca Giommi, Daniele Bonacorsi

Machine Learning (ML) will play a significant role in the success of the upcoming High-Luminosity LHC (HL-LHC) program at CERN. An unprecedented amount of data at the exascale will be collected by the LHC experiments in the next decade, and this effort will require novel approaches to training and using ML models. In this paper, we discuss a Machine Learning as a Service pipeline for HEP (MLaaS4HEP) that provides three independent layers: a data streaming layer to read High-Energy Physics (HEP) data in their native ROOT data format; a data training layer to train ML models using distributed ROOT files; and a data inference layer to serve predictions from pre-trained ML models via the HTTP protocol. Such a modular design opens up the possibility of training models on large-scale data by reading ROOT files from remote storage facilities, e.g., the Worldwide LHC Computing Grid (WLCG) infrastructure, and feeding the data to the user's favorite ML framework. The inference layer, implemented as TensorFlow as a Service (TFaaS), may provide easy access to pre-trained ML models in existing infrastructure and applications inside or outside of the HEP domain. In particular, we demonstrate the use of the MLaaS4HEP architecture for a physics use case, namely the $t\bar{t}$ Higgs analysis in CMS, originally performed using custom-made Ntuples. We provide details on the training of the ML model using distributed ROOT files, discuss the performance of the MLaaS and TFaaS approaches for the selected physics analysis, and compare the results with traditional methods.
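The sketch below illustrates the flavour of the streaming and inference layers: it reads branches from ROOT files in chunks with uproot and posts a feature vector to an HTTP inference endpoint. File paths, branch names, and the endpoint URL are placeholders, and the request payload is an assumption rather than the actual TFaaS API.

```python
# Minimal sketch of the two outer layers of an MLaaS-style pipeline:
# (1) stream branches out of ROOT files in chunks, (2) send features to an
# HTTP inference service. Paths, branch names, URL and payload format are
# illustrative placeholders.
import uproot    # pip install uproot
import requests

FILES = ["root://some.storage.site//store/user/sample.root:Events"]  # placeholder
BRANCHES = ["Muon_pt", "Muon_eta"]                                    # placeholder
INFERENCE_URL = "https://tfaas.example.org/predict"                   # placeholder

def stream_chunks():
    """Yield dictionaries of numpy arrays for the requested branches, chunk by chunk."""
    for chunk in uproot.iterate(FILES, BRANCHES, step_size="100 MB", library="np"):
        yield chunk

def predict(features: list[float]) -> dict:
    """Post one feature vector to the inference service and return its reply."""
    reply = requests.post(INFERENCE_URL, json={"inputs": features}, timeout=30)
    reply.raise_for_status()
    return reply.json()

if __name__ == "__main__":
    for chunk in stream_chunks():
        # A training layer would feed `chunk` to the user's ML framework here;
        # for brevity we only print the chunk shapes.
        print({name: arr.shape for name, arr in chunk.items()})
        break
```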


2021 ◽ Vol 5 (1) ◽ Author(s): Yutaro Iiyama, Benedikt Maier, Daniel Abercrombie, Maxim Goncharov, Christoph Paus

Dynamo is a full-stack software solution for scientific data management. Dynamo's architecture is modular, extensible, and customizable, making the software suitable for managing data over a wide range of installation scales, from a few terabytes stored at a single location to hundreds of petabytes distributed across a worldwide computing grid. This article documents the core system design of Dynamo and describes the applications that implement various data management tasks. A brief report is also given on the operational experience with the system at the CMS experiment at the CERN Large Hadron Collider and at a small-scale analysis facility.
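As a rough illustration of what a modular, policy-driven data-management design looks like, the sketch below lets pluggable policies vote on which replicas to keep. Class and method names are invented for illustration and do not reflect Dynamo's actual interfaces.

```python
# Conceptual sketch of a policy-driven, pluggable data-management loop.
# Names are invented; this is not Dynamo's API.
from dataclasses import dataclass

@dataclass
class Replica:
    dataset: str
    site: str
    size_tb: float
    last_access_days: int

class Policy:
    """A pluggable rule that votes on whether a replica should be kept."""
    def keep(self, replica: Replica) -> bool:
        raise NotImplementedError

class KeepRecentlyUsed(Policy):
    def __init__(self, max_idle_days: int = 180):
        self.max_idle_days = max_idle_days
    def keep(self, replica: Replica) -> bool:
        return replica.last_access_days <= self.max_idle_days

def plan_deletions(replicas: list[Replica], policies: list[Policy]) -> list[Replica]:
    """A replica is scheduled for deletion only if no policy wants to keep it."""
    return [r for r in replicas if not any(p.keep(r) for p in policies)]

if __name__ == "__main__":
    inventory = [
        Replica("datasetA", "SiteX", 12.0, 10),
        Replica("datasetB", "SiteY", 40.0, 400),
    ]
    print(plan_deletions(inventory, [KeepRecentlyUsed()]))
```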


2021 ◽ Vol 251 ◽ pp. 02005 ◽ Author(s): Fernando Barreiro Megino, Singh Bawa Harinder, Kaushik De, Johannes Elmsheuser, Alexei Klimentov, ...

The CERN ATLAS Experiment successfully uses a worldwide distributed computing Grid infrastructure to support its physics programme at the Large Hadron Collider (LHC). The Grid workflow system PanDA routinely manages up to 700,000 concurrently running production and analysis jobs to process simulation and detector data. In total, more than 500 PB of data are distributed over more than 150 sites in the WLCG and handled by the ATLAS data management system Rucio. To prepare for the ever-growing data rates of future LHC runs, new developments are underway to embrace industry-accepted protocols and technologies and to utilize opportunistic resources in a standard way. This paper reviews how the Google and Amazon cloud computing services have been seamlessly integrated as a Grid site within PanDA and Rucio. Performance and brief cost evaluations are discussed. Such setups could offer advanced cloud tool-sets and provide added value for the analysis facilities under discussion for LHC Run 4.
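One building block commonly used to expose commercial object stores to Grid data-management and transfer tools is the pre-signed URL, which grants time-limited access to an object without distributing cloud credentials. The sketch below generates such a URL with boto3 for a hypothetical bucket, key and endpoint; it illustrates the mechanism only and is not the actual PanDA/Rucio integration code.

```python
# Minimal sketch: generating a time-limited pre-signed URL for an object in an
# S3-compatible cloud store. Bucket, key, endpoint and credentials are
# hypothetical placeholders.
import boto3  # pip install boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://s3.example-cloud.org",   # placeholder endpoint
    aws_access_key_id="EXAMPLEKEY",                # placeholder credentials
    aws_secret_access_key="EXAMPLESECRET",
)

url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "atlas-example-bucket", "Key": "data/sample.root"},
    ExpiresIn=3600,  # URL valid for one hour
)
print(url)
```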


Author(s): Low Tang Jung, Ahmed Abba Haruna

In a computing grid environment, job scheduling is fundamentally the process of allocating computing jobs to the available resources. As the scale of grid computing systems grows over time, an exponential increase in energy consumption is foreseen. As such, large data centers (DC) are embarking on green computing initiatives to address the environmental impact of IT operations. The component within a computing system that consumes the most electricity and generates the most heat is the microprocessor, and the heat generated by high-performance microprocessors translates into a large CO2 footprint. Therefore, job scheduling with thermal considerations (thermal-aware scheduling) for the microprocessors is important in DC grid operations. This chapter proposes a job scheduling approach for reducing electricity usage (green computing) in a DC grid. The approach is the outcome of R&D work based on the DC grid environment at Universiti Teknologi PETRONAS, Malaysia.
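A simplified way to convey the thermal-aware idea is a greedy heuristic that places each job on the coolest node that stays below a thermal threshold. The sketch below shows that heuristic; the temperatures, the per-job heating model, and the threshold are invented for illustration and are not the chapter's actual algorithm.

```python
# Simplified sketch of a thermal-aware scheduling heuristic: each job goes to
# the coolest node that remains below a thermal limit. Numbers are invented.
THERMAL_LIMIT_C = 75.0
HEAT_PER_JOB_C = 2.5  # assumed temperature rise per scheduled job

def schedule(jobs: list[str], node_temps: dict[str, float]) -> dict[str, str]:
    """Return a mapping job -> node, updating node temperatures as jobs land."""
    placement = {}
    for job in jobs:
        # Pick the coolest node that can still accept the extra heat.
        candidates = [n for n, t in node_temps.items()
                      if t + HEAT_PER_JOB_C <= THERMAL_LIMIT_C]
        if not candidates:
            raise RuntimeError(f"no thermally safe node for {job}")
        node = min(candidates, key=node_temps.get)
        placement[job] = node
        node_temps[node] += HEAT_PER_JOB_C
    return placement

if __name__ == "__main__":
    temps = {"node1": 60.0, "node2": 72.0, "node3": 65.0}
    print(schedule(["job1", "job2", "job3"], temps))
```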


2021 ◽ Vol 251 ◽ pp. 02058 ◽ Author(s): Lorena Lobato Pardavila, Burt Holzman, Edward Karavakis, Lincoln Bryant, Steven Timm

The File Transfer Service (FTS3) is a data movement service developed at CERN which is used to distribute the majority of the Large Hadron Collider's data across the Worldwide LHC Computing Grid (WLCG) infrastructure. At Fermilab, we have deployed FTS3 instances for Intensity Frontier experiments (e.g. DUNE) to transfer data across America and Europe, using a container-based strategy. In this article we summarize our experience building Docker images based on work from the SLATE project (slateci.io) and deploying them in OKD, the community distribution of Red Hat OpenShift. Additionally, we discuss our method of certificate management and maintenance using Kubernetes CronJobs. Finally, we report on the configuration currently running at Fermilab.
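To make the CronJob-based certificate maintenance concrete, the sketch below builds a Kubernetes CronJob manifest as a Python dictionary and prints it as JSON (which Kubernetes accepts alongside YAML). The schedule, image, script path, and secret names are illustrative placeholders and do not reflect the actual Fermilab configuration.

```python
# Sketch of a Kubernetes CronJob manifest for periodic certificate renewal,
# built as a Python dict. All names and values are illustrative placeholders.
import json

cronjob = {
    "apiVersion": "batch/v1",
    "kind": "CronJob",
    "metadata": {"name": "fts3-cert-renewal"},
    "spec": {
        "schedule": "0 */6 * * *",  # run every six hours
        "jobTemplate": {"spec": {"template": {"spec": {
            "restartPolicy": "OnFailure",
            "containers": [{
                "name": "renew-certs",
                "image": "example.org/fts3-cert-tools:latest",
                "command": ["/usr/local/bin/renew-certificates.sh"],
                "volumeMounts": [{"name": "certs",
                                  "mountPath": "/etc/grid-security"}],
            }],
            "volumes": [{"name": "certs",
                         "secret": {"secretName": "fts3-host-certs"}}],
        }}}},
    },
}

print(json.dumps(cronjob, indent=2))
```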


2021 ◽ Vol 251 ◽ pp. 02028 ◽ Author(s): Brian Bockelman, Andrea Ceccanti, Thomas Dack, Dave Dykstra, Maarten Litmaath, ...

Since 2017, the Worldwide LHC Computing Grid (WLCG) has been working towards enabling token-based authentication and authorisation throughout its entire middleware stack. Following the publication of the WLCG Common JSON Web Token (JWT) Schema v1.0 [1] in 2019, middleware developers have been able to enhance their services to consume and validate JWT-based [2] OAuth 2.0 [3] tokens and to process the authorisation information they convey. Complex scenarios, involving multiple delegation steps and command-line flows, are a key challenge that must be addressed for the system to become fully operational. This paper expands on the anticipated token-based workflows, with a particular focus on the local storage of tokens and their discovery by services. The authors include a walk-through of this token flow in the Rucio-managed data-transfer scenario, including delegation to FTS and authorised access to storage elements. Next steps are presented, including the current target of submitting production jobs authorised by tokens within 2021.
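As an example of how a client or service can locate a locally stored token, the sketch below follows the lookup order described, to the best of our understanding, in the WLCG Bearer Token Discovery recommendation: a BEARER_TOKEN environment variable, then a file named by BEARER_TOKEN_FILE, then bt_u<uid> under XDG_RUNTIME_DIR or /tmp. Treat it as an illustration rather than a reference implementation.

```python
# Sketch of local bearer-token discovery (environment variable, then a file
# pointed to by BEARER_TOKEN_FILE, then bt_u<uid> in XDG_RUNTIME_DIR or /tmp).
# Illustration only; consult the WLCG recommendation for the normative rules.
import os

def discover_bearer_token() -> str | None:
    token = os.environ.get("BEARER_TOKEN")
    if token:
        return token.strip()

    token_file = os.environ.get("BEARER_TOKEN_FILE")
    candidates = [token_file] if token_file else []

    uid = os.getuid()
    runtime_dir = os.environ.get("XDG_RUNTIME_DIR")
    if runtime_dir:
        candidates.append(os.path.join(runtime_dir, f"bt_u{uid}"))
    candidates.append(f"/tmp/bt_u{uid}")

    for path in candidates:
        try:
            with open(path, encoding="ascii") as handle:
                return handle.read().strip()
        except OSError:
            continue
    return None

if __name__ == "__main__":
    print(discover_bearer_token())
```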


2021 ◽ Vol 251 ◽ pp. 02039 ◽ Author(s): Michael Böhler, René Caspart, Max Fischer, Oliver Freyermuth, Manuel Giffels, ...

The inclusion of opportunistic resources, for example from High Performance Computing (HPC) centers or cloud providers, is an important contribution to bridging the gap between existing resources and the future needs of the LHC collaborations, especially for the HL-LHC era. However, the integration of these resources poses new challenges and often needs to happen in a highly dynamic manner. To enable an effective and lightweight integration of these resources, the tools COBalD and TARDIS are being developed at KIT. In this contribution we report on the infrastructure we use to dynamically offer opportunistic resources to collaborations in the Worldwide LHC Computing Grid (WLCG). The core components are COBalD/TARDIS, HTCondor, CVMFS and modern virtualization technology. The challenging task of managing the opportunistic resources is performed by COBalD/TARDIS. We showcase the challenges, employed solutions and experience gained with the provisioning of opportunistic resources from several resource providers such as university clusters, HPC centers and cloud setups in a multi-VO environment. This work can serve as a blueprint for approaching the provisioning of resources from other resource providers.
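The core idea behind dynamically offering opportunistic resources is a feedback loop that grows or shrinks the pool of provisioned resources based on how well the currently provisioned ones are used. The sketch below shows that control idea in a few lines of Python; it is a conceptual illustration only, with invented thresholds, and does not use the actual COBalD/TARDIS interfaces.

```python
# Conceptual sketch of the feedback idea behind opportunistic-resource
# provisioning: grow the pool while resources are well used, shrink it when
# they sit idle. Thresholds and step size are invented; not COBalD/TARDIS code.
def adjust_pool(current_drones: int, utilisation: float,
                scale_up_at: float = 0.9, scale_down_at: float = 0.5,
                step: int = 5, minimum: int = 0, maximum: int = 200) -> int:
    """Return the new target number of resources for one control cycle."""
    if utilisation >= scale_up_at:
        target = current_drones + step      # demand is high: request more
    elif utilisation <= scale_down_at:
        target = current_drones - step      # resources idle: release some
    else:
        target = current_drones             # within the comfort band
    return max(minimum, min(maximum, target))

if __name__ == "__main__":
    drones = 20
    for util in [0.95, 0.97, 0.6, 0.3]:     # example utilisation readings
        drones = adjust_pool(drones, util)
        print(f"utilisation={util:.2f} -> target resources={drones}")
```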

