MLaaS4HEP: Machine Learning as a Service for HEP

2021, Vol 5 (1)
Author(s): Valentin Kuznetsov, Luca Giommi, Daniele Bonacorsi

Machine Learning (ML) will play a significant role in the success of the upcoming High-Luminosity LHC (HL-LHC) program at CERN. An unprecedented amount of data at the exascale will be collected by LHC experiments in the next decade, and this effort will require novel approaches to train and use ML models. In this paper, we discuss a Machine Learning as a Service pipeline for HEP (MLaaS4HEP) which provides three independent layers: a data streaming layer to read High-Energy Physics (HEP) data in their native ROOT data format; a data training layer to train ML models using distributed ROOT files; and a data inference layer to serve predictions from pre-trained ML models via the HTTP protocol. Such a modular design opens up the possibility to train models at large scale by reading ROOT files from remote storage facilities, e.g., the Worldwide LHC Computing Grid (WLCG) infrastructure, and to feed the data to the user's favorite ML framework. The inference layer, implemented as TensorFlow as a Service (TFaaS), may provide easy access to pre-trained ML models in existing infrastructure and applications inside or outside of the HEP domain. In particular, we demonstrate the usage of the MLaaS4HEP architecture for a physics use case, namely the $t\bar{t}H$ analysis in CMS, originally performed using custom-made Ntuples. We provide details on the training of the ML model using distributed ROOT files, discuss the performance of the MLaaS and TFaaS approaches for the selected physics analysis, and compare the results with traditional methods.
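To make the layered design more concrete, here is a minimal sketch, not the MLaaS4HEP code itself, of how such a pipeline could be exercised from Python: streaming branches from a (possibly remote) ROOT file with uproot, handing the arrays to a user-chosen ML framework, and querying a TFaaS-like HTTP endpoint for inference. The file URL, tree and branch names, payload schema, and endpoint are illustrative placeholders.

```python
# Minimal sketch of a streaming -> training -> inference flow, inspired by the
# MLaaS4HEP layering; paths, tree/branch names and the HTTP endpoint are
# hypothetical placeholders, not the actual MLaaS4HEP API.
import numpy as np
import requests
import uproot  # reads ROOT files natively, including remote access via XRootD
from sklearn.linear_model import LogisticRegression

# --- data streaming layer: read branches from a (possibly remote) ROOT file ---
FILE_URL = "root://eospublic.cern.ch//eos/path/to/events.root"  # placeholder path
with uproot.open(FILE_URL) as f:
    tree = f["Events"]  # placeholder tree name
    arrays = tree.arrays(["feat1", "feat2", "label"], library="np")  # placeholder branches

X = np.stack([arrays["feat1"], arrays["feat2"]], axis=1)
y = arrays["label"]

# --- data training layer: hand the NumPy arrays to any ML framework ---
model = LogisticRegression(max_iter=1000).fit(X, y)
print("training accuracy:", model.score(X, y))

# --- inference layer: a TFaaS-like service accepts requests over HTTP ---
payload = {"keys": ["feat1", "feat2"], "values": X[0].tolist(), "model": "demo"}  # hypothetical schema
resp = requests.post("https://tfaas.example.org/json", json=payload, timeout=10)  # placeholder URL
print("inference HTTP status:", resp.status_code)
```

The sketch only mirrors the separation of concerns described in the abstract: reading native ROOT data, training with any framework, and serving predictions over HTTP.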

2005, Vol 20 (16), pp. 3874-3876
Author(s): B. Abbott, P. Baringer, T. Bolton, Z. Greenwood, E. Gregores, ...

The DØ experiment at Fermilab's Tevatron will record several petabytes of data over the next five years in pursuing the goals of understanding nature and searching for the origin of mass. The computing resources required to analyze these data far exceed the capabilities of any one institution. Moreover, the widely scattered geographical distribution of DØ collaborators poses further serious difficulties for the optimal use of human and computing resources. These difficulties will be exacerbated in future high-energy physics experiments such as those at the LHC. The computing grid has long been recognized as a solution to these problems. This technology is being made a more immediate reality for end users in DØ by developing a grid in the DØ Southern Analysis Region (DØSAR), DØSAR-Grid, using all available resources within it and a home-grown local task manager, McFarm. We present the architecture in which the DØSAR-Grid is implemented, the technology used and the functionality of the grid, and the experience from operating the grid in simulation, reprocessing, and data analyses for a currently running HEP experiment.


2021, Vol 36 (10), pp. 2150070
Author(s): Maria Grigorieva, Dmitry Grin

Large-scale distributed computing infrastructures ensure the operation and maintenance of scientific experiments at the LHC: more than 160 computing centers all over the world execute tens of millions of computing jobs per day. ATLAS, the largest experiment at the LHC, creates an enormous flow of data which has to be recorded and analyzed by a complex, heterogeneous, and distributed computing environment. Statistically, about 10–12% of computing jobs end in failure: network faults, service failures, authorization failures, and other error conditions trigger error messages which provide detailed information about the issue and can be used for diagnosis and proactive fault handling. However, this analysis is complicated by the sheer scale of the textual log data and is often exacerbated by the lack of a well-defined structure: human experts have to interpret the detected messages and create parsing rules manually, which is time-consuming and does not allow previously unknown error conditions to be identified without further human intervention. This paper describes a pipeline of methods for the unsupervised clustering of multi-source error messages. The pipeline is data-driven, based on machine learning algorithms, and executed fully automatically, allowing error messages to be categorized according to their textual patterns and meaning.
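As an illustration of the kind of data-driven categorization described above, and not the authors' actual pipeline, the sketch below clusters a handful of invented error messages using a TF-IDF representation and k-means from scikit-learn; the sample messages and the number of clusters are made up for the example.

```python
# Illustrative clustering of error messages into textual categories; the
# messages and cluster count are invented, and TF-IDF + k-means stands in
# for the paper's full multi-source pipeline.
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

messages = [
    "Connection to storage element timed out after 300 seconds",
    "Connection to storage element timed out after 120 seconds",
    "Authorization failed: proxy certificate expired",
    "Authorization failed: invalid VOMS attribute",
    "Service unavailable: pilot lost heartbeat",
]

# Vectorize the free-text messages; keeping only alphabetic tokens means that
# "300 seconds" and "120 seconds" map onto the same textual pattern.
vectorizer = TfidfVectorizer(token_pattern=r"[A-Za-z]+", lowercase=True)
X = vectorizer.fit_transform(messages)

# Group messages by similarity of their textual patterns.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
for msg, label in zip(messages, kmeans.labels_):
    print(label, msg)
```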


2018, Vol 68 (1), pp. 161-181
Author(s): Dan Guest, Kyle Cranmer, Daniel Whiteson

Machine learning has played an important role in the analysis of high-energy physics data for decades. The emergence of deep learning in 2012 allowed for machine learning tools which could adeptly handle higher-dimensional and more complex problems than previously feasible. This review is aimed at the reader who is familiar with high-energy physics but not machine learning. The connections between machine learning and high-energy physics data analysis are explored, followed by an introduction to the core concepts of neural networks, examples of the key results demonstrating the power of deep learning for analysis of LHC data, and discussion of future prospects and concerns.
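For readers coming to this from the HEP side, a minimal sketch of the kind of network such reviews introduce is shown below: a small fully connected binary classifier on tabular event features, written here with PyTorch. The input dimension, layer sizes, and random training data are arbitrary choices for illustration, not anything taken from the review.

```python
# A minimal fully connected signal-vs-background classifier; the input
# dimension, layer sizes and random data are illustrative only.
import torch
import torch.nn as nn

n_features = 8  # e.g. a handful of kinematic variables per event
model = nn.Sequential(
    nn.Linear(n_features, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 1), nn.Sigmoid(),  # output: probability of being signal
)

# One dummy training step on random data, just to show the mechanics.
X = torch.randn(256, n_features)
y = torch.randint(0, 2, (256, 1)).float()
loss_fn = nn.BCELoss()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

opt.zero_grad()
loss = loss_fn(model(X), y)
loss.backward()
opt.step()
print("loss after one step:", float(loss))
```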


2020, Vol 245, pp. 07036
Author(s): Christoph Beyer, Stefan Bujack, Stefan Dietrich, Thomas Finnern, Martin Flemming, ...

DESY is one of the largest accelerator laboratories in Europe. It develops and operates state-of-the-art accelerators for fundamental science in the areas of high-energy physics, photon science, and accelerator development. While for decades high-energy physics (HEP) has been the most prominent user of the DESY compute, storage, and network infrastructure, other scientific areas such as photon science and accelerator development have caught up and now dominate the demands on the DESY infrastructure resources, with significant consequences for IT resource provisioning. In this contribution, we present an overview of the computational, storage, and network resources covering the various physics communities on site, ranging from high-throughput computing (HTC) batch-like offline processing in the Grid and interactive user-analysis resources in the National Analysis Factory (NAF) for the HEP community, to the computing needs of accelerator development or of photon-science facilities such as PETRA III or the European XFEL. Since DESY is involved in these experiments and their data taking, the requirements include fast, low-latency online processing for data taking and calibration as well as offline processing, i.e. high-performance computing (HPC) workloads, which run on the dedicated Maxwell HPC cluster. As all communities face significant challenges due to changing environments and increasing data rates in the coming years, we discuss how this will be reflected in the necessary changes to the computing and storage infrastructures. We present DESY's compute cloud and container orchestration plans as a basis for infrastructure and platform services. We show examples of Jupyter notebooks for small-scale interactive analysis, as well as their integration into large-scale resources such as batch systems or Spark clusters. To overcome the fragmentation of the various resources for all scientific communities at DESY, we explore how to integrate them into a seamless user experience in an Interdisciplinary Data Analysis Facility.
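As a toy illustration of the notebook-driven analysis on shared resources mentioned above, and not DESY's actual setup, the snippet below runs a small aggregation with PySpark as one might do from a Jupyter notebook attached to a cluster; the data and column names are placeholders.

```python
# Toy PySpark aggregation of the kind one might run interactively from a
# Jupyter notebook attached to a shared cluster; data and columns are made up.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("toy-analysis").getOrCreate()

# A tiny in-memory dataset standing in for real detector or accelerator records.
df = spark.createDataFrame(
    [("run1", 12.3), ("run1", 11.8), ("run2", 15.1)],
    ["run", "value"],
)

# Aggregate per run, as a notebook user would before plotting.
df.groupBy("run").avg("value").show()
spark.stop()
```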


2004, Vol 13 (03), pp. 391-502
Author(s): Massimo Giovannini

Cosmology, high-energy physics and astrophysics are today converging to the study of large scale magnetic fields. While the experimental evidence for the existence of large scale magnetization in galaxies, clusters and super-clusters is rather compelling, the origin of the phenomenon remains puzzling especially in light of the most recent observations. The purpose of the present review is to describe the physical motivations and the open theoretical problems related to the existence of large scale magnetic fields.


2019, Vol 214, pp. 06037
Author(s): Moritz Kiehn, Sabrina Amrouche, Paolo Calafiura, Victor Estrade, Steven Farrell, ...

The High-Luminosity LHC (HL-LHC) is expected to reach unprecedented collision intensities, which in turn will greatly increase the complexity of tracking within the event reconstruction. To reach out to computer-science specialists, a tracking machine learning challenge (TrackML) was set up on Kaggle by a team of ATLAS, CMS, and LHCb tracking experts and computer scientists, building on the experience of the successful Higgs Machine Learning challenge in 2014. A training dataset based on a simulation of a generic HL-LHC experiment tracker has been created, listing for each event the measured 3D points and the list of 3D points associated with each true track. Participants in the challenge must find the tracks in the test dataset, which means building the list of 3D points belonging to each track. The emphasis is on exposing innovative approaches rather than hyper-optimising known approaches. A metric reflecting the accuracy of a model at finding the associations that matter most to physics analysis will allow good candidates to be selected to augment or replace existing algorithms.
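A naive baseline of the kind a participant might start from, rather than an official challenge solution, is sketched below: hits are read from a CSV file of 3D points and grouped with DBSCAN in a normalized coordinate space. The file name, column names, and clustering parameters are assumptions for illustration only.

```python
# Naive track-building baseline for a TrackML-style dataset: cluster 3D hits
# with DBSCAN; file name, columns and parameters are illustrative assumptions.
import numpy as np
import pandas as pd
from sklearn.cluster import DBSCAN

hits = pd.read_csv("event000001000-hits.csv")  # assumed columns: hit_id, x, y, z

# Work in a crudely normalized space; real solutions use detector-aware features.
xyz = hits[["x", "y", "z"]].to_numpy()
r = np.linalg.norm(xyz, axis=1, keepdims=True)
features = xyz / np.clip(r, 1e-6, None)

# Each cluster of hits is treated as one track candidate; -1 marks noise hits.
labels = DBSCAN(eps=0.01, min_samples=3).fit_predict(features)
hits["track_id"] = labels
print(hits.groupby("track_id").size().head())
```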


2005, Vol 20 (14), pp. 3021-3032
Author(s): Ian M. Fisk

In this review, the computing challenges facing the current and next generation of high-energy physics experiments will be discussed. High-energy physics computing represents an interesting infrastructure challenge as the use of large-scale commodity computing clusters has increased. The causes and ramifications of these infrastructure challenges will be outlined. Increasing requirements, limited physical infrastructure at computing facilities, and limited budgets have driven many experiments to deploy distributed computing solutions to meet the growing computing needs for analysis, reconstruction, and simulation. The current generation of experiments has developed and integrated a number of solutions to facilitate distributed computing. The current work of the running experiments gives an insight into the challenges that will be faced by the next generation of experiments and the infrastructure that will be needed.


2020, Vol 245, pp. 07001
Author(s): Laura Sargsyan, Filipe Martins

Large experiments in high-energy physics require efficient and scalable monitoring solutions to digest data from the detector control system. Plotting multiple graphs in the slow-control system and extracting historical data for long time periods are resource-intensive tasks. The proposed solution leverages new virtualization, data analytics, and visualization technologies: the InfluxDB time-series database for faster access to large-scale data, Grafana to visualize time-series data, and an OpenShift container platform to automate the build, deployment, and management of the application. The monitoring service runs separately from the control system, thus reducing the workload on the control system's computing resources. As an example, a test version of the new monitoring was applied to the ATLAS Tile Calorimeter using the CERN Cloud Process as a Service platform. Many dashboards have been created in Grafana to monitor and analyse the behaviour of the High Voltage distribution system. They visualize not only values measured by the control system, but also run information and analytics data (differences, deviations, etc.). The new monitoring, with feature-rich visualization, filtering possibilities, and analytics tools, extends detector control and monitoring capabilities and can help experts working on large-scale experiments.
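To give a flavour of how such a time-series monitoring stack can be fed and queried programmatically, here is a sketch using the InfluxDB 1.x Python client rather than the actual ATLAS Tile Calorimeter setup; the host, database, measurement, and tag names are placeholders.

```python
# Sketch of writing and querying time-series monitoring data with the
# InfluxDB 1.x Python client; host, database and measurement are placeholders.
from influxdb import InfluxDBClient

client = InfluxDBClient(host="influxdb.example.org", port=8086, database="dcs_monitoring")

# Write one high-voltage reading, tagged by channel.
point = {
    "measurement": "hv_readings",
    "tags": {"channel": "LBA01"},
    "fields": {"voltage": 830.5},
}
client.write_points([point])

# Query the recent history, much as a Grafana dashboard panel would.
result = client.query(
    "SELECT mean(voltage) FROM hv_readings WHERE time > now() - 1h GROUP BY time(5m)"
)
for row in result.get_points():
    print(row)
```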

