MLaaS4HEP: Machine Learning as a Service for HEP

2021, Vol 5 (1)
Author(s): Valentin Kuznetsov, Luca Giommi, Daniele Bonacorsi

Machine Learning (ML) will play a significant role in the success of the upcoming High-Luminosity LHC (HL-LHC) program at CERN. An unprecedented amount of data at the exascale will be collected by LHC experiments in the next decade, and this effort will require novel approaches to train and use ML models. In this paper, we discuss a Machine Learning as a Service pipeline for HEP (MLaaS4HEP) which provides three independent layers: a data streaming layer to read High-Energy Physics (HEP) data in their native ROOT data format; a data training layer to train ML models using distributed ROOT files; and a data inference layer to serve predictions from pre-trained ML models via the HTTP protocol. Such a modular design opens up the possibility to train models at large scale by reading ROOT files from remote storage facilities, e.g., the Worldwide LHC Computing Grid (WLCG) infrastructure, and to feed the data to the user's favorite ML framework. The inference layer, implemented as TensorFlow as a Service (TFaaS), may provide easy access to pre-trained ML models in existing infrastructure and applications inside or outside of the HEP domain. In particular, we demonstrate the usage of the MLaaS4HEP architecture for a physics use case, namely the $t\bar{t}H$ analysis in CMS, originally performed using custom-made Ntuples. We provide details on the training of the ML model using distributed ROOT files, discuss the performance of the MLaaS and TFaaS approaches for the selected physics analysis, and compare the results with traditional methods.
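To make the layered design more concrete, here is a minimal sketch, not the MLaaS4HEP code itself, of how such a pipeline could be exercised from Python: streaming branches from a (possibly remote) ROOT file with uproot, handing the arrays to a user-chosen ML framework, and querying a TFaaS-like HTTP endpoint for inference. The file URL, tree and branch names, payload schema, and endpoint are illustrative placeholders.

```python
# Minimal sketch of a streaming -> training -> inference flow, inspired by the
# MLaaS4HEP layering; paths, tree/branch names and the HTTP endpoint are
# hypothetical placeholders, not the actual MLaaS4HEP API.
import numpy as np
import requests
import uproot  # reads ROOT files natively, including remote access via XRootD
from sklearn.linear_model import LogisticRegression

# --- data streaming layer: read branches from a (possibly remote) ROOT file ---
FILE_URL = "root://eospublic.cern.ch//eos/path/to/events.root"  # placeholder path
with uproot.open(FILE_URL) as f:
    tree = f["Events"]  # placeholder tree name
    arrays = tree.arrays(["feat1", "feat2", "label"], library="np")  # placeholder branches

X = np.stack([arrays["feat1"], arrays["feat2"]], axis=1)
y = arrays["label"]

# --- data training layer: hand the NumPy arrays to any ML framework ---
model = LogisticRegression(max_iter=1000).fit(X, y)
print("training accuracy:", model.score(X, y))

# --- inference layer: a TFaaS-like service accepts requests over HTTP ---
payload = {"keys": ["feat1", "feat2"], "values": X[0].tolist(), "model": "demo"}  # hypothetical schema
resp = requests.post("https://tfaas.example.org/json", json=payload, timeout=10)  # placeholder URL
print("inference HTTP status:", resp.status_code)
```

The sketch only mirrors the separation of concerns described in the abstract: reading native ROOT data, training with any framework, and serving predictions over HTTP.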

2005, Vol 20 (16), pp. 3874-3876
Author(s): B. Abbott, P. Baringer, T. Bolton, Z. Greenwood, E. Gregores, ...

The DØ experiment at Fermilab's Tevatron will record several petabytes of data over the next five years in pursuing the goals of understanding nature and searching for the origin of mass. The computing resources required to analyze these data far exceed the capabilities of any one institution. Moreover, the widely scattered geographical distribution of DØ collaborators poses further serious difficulties for the optimal use of human and computing resources. These difficulties will be exacerbated in future high-energy physics experiments such as those at the LHC. The computing grid has long been recognized as a solution to these problems. This technology is being made a more immediate reality for end users in DØ by developing a grid in the DØ Southern Analysis Region (DØSAR), DØSAR-Grid, using all available resources within it and a home-grown local task manager, McFarm. We present the architecture in which the DØSAR-Grid is implemented, the technology used and the functionality of the grid, and the experience from operating the grid in simulation, reprocessing, and data analyses for a currently running HEP experiment.


2021, Vol 36 (10), pp. 2150070
Author(s): Maria Grigorieva, Dmitry Grin

Large-scale distributed computing infrastructures ensure the operation and maintenance of scientific experiments at the LHC: more than 160 computing centers all over the world execute tens of millions of computing jobs per day. ATLAS, the largest experiment at the LHC, creates an enormous flow of data which has to be recorded and analyzed by a complex, heterogeneous, and distributed computing environment. Statistically, about 10–12% of computing jobs end in failure: network faults, service failures, authorization failures, and other error conditions trigger error messages which provide detailed information about the issue and can be used for diagnosis and proactive fault handling. However, this analysis is complicated by the sheer scale of the textual log data and is often exacerbated by the lack of a well-defined structure: human experts have to interpret the detected messages and create parsing rules manually, which is time-consuming and does not allow previously unknown error conditions to be identified without further human intervention. This paper describes a pipeline of methods for the unsupervised clustering of multi-source error messages. The pipeline is data-driven, based on machine learning algorithms, and executed fully automatically, allowing error messages to be categorized according to their textual patterns and meaning.
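As an illustration of the kind of data-driven categorization described above, and not the authors' actual pipeline, the sketch below clusters a handful of invented error messages using a TF-IDF representation and k-means from scikit-learn; the sample messages and the number of clusters are made up for the example.

```python
# Illustrative clustering of error messages into textual categories; the
# messages and cluster count are invented, and TF-IDF + k-means stands in
# for the paper's full multi-source pipeline.
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

messages = [
    "Connection to storage element timed out after 300 seconds",
    "Connection to storage element timed out after 120 seconds",
    "Authorization failed: proxy certificate expired",
    "Authorization failed: invalid VOMS attribute",
    "Service unavailable: pilot lost heartbeat",
]

# Vectorize the free-text messages; keeping only alphabetic tokens means that
# "300 seconds" and "120 seconds" map onto the same textual pattern.
vectorizer = TfidfVectorizer(token_pattern=r"[A-Za-z]+", lowercase=True)
X = vectorizer.fit_transform(messages)

# Group messages by similarity of their textual patterns.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
for msg, label in zip(messages, kmeans.labels_):
    print(label, msg)
```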


2018, Vol 68 (1), pp. 161-181
Author(s): Dan Guest, Kyle Cranmer, Daniel Whiteson

Machine learning has played an important role in the analysis of high-energy physics data for decades. The emergence of deep learning in 2012 allowed for machine learning tools which could adeptly handle higher-dimensional and more complex problems than previously feasible. This review is aimed at the reader who is familiar with high-energy physics but not machine learning. The connections between machine learning and high-energy physics data analysis are explored, followed by an introduction to the core concepts of neural networks, examples of the key results demonstrating the power of deep learning for analysis of LHC data, and discussion of future prospects and concerns.
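For readers coming to this from the HEP side, a minimal sketch of the kind of network such reviews introduce is shown below: a small fully connected binary classifier on tabular event features, written here with PyTorch. The input dimension, layer sizes, and random training data are arbitrary choices for illustration, not anything taken from the review.

```python
# A minimal fully connected signal-vs-background classifier; the input
# dimension, layer sizes and random data are illustrative only.
import torch
import torch.nn as nn

n_features = 8  # e.g. a handful of kinematic variables per event
model = nn.Sequential(
    nn.Linear(n_features, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 1), nn.Sigmoid(),  # output: probability of being signal
)

# One dummy training step on random data, just to show the mechanics.
X = torch.randn(256, n_features)
y = torch.randint(0, 2, (256, 1)).float()
loss_fn = nn.BCELoss()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

opt.zero_grad()
loss = loss_fn(model(X), y)
loss.backward()
opt.step()
print("loss after one step:", float(loss))
```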


2020, Vol 245, pp. 07036
Author(s): Christoph Beyer, Stefan Bujack, Stefan Dietrich, Thomas Finnern, Martin Flemming, ...

DESY is one of the largest accelerator laboratories in Europe. It develops and operates state-of-the-art accelerators for fundamental science in the areas of high-energy physics, photon science, and accelerator development. While for decades high-energy physics (HEP) has been the most prominent user of the DESY compute, storage, and network infrastructure, other scientific areas such as photon science and accelerator development have caught up and now dominate the demands on the DESY infrastructure resources, with significant consequences for IT resource provisioning. In this contribution, we present an overview of the computational, storage, and network resources covering the various physics communities on site, ranging from high-throughput computing (HTC) batch-like offline processing in the Grid and interactive user-analysis resources in the National Analysis Factory (NAF) for the HEP community, to the computing needs of accelerator development or of photon-science facilities such as PETRA III or the European XFEL. Since DESY is involved in these experiments and their data taking, the requirements include fast, low-latency online processing for data taking and calibration as well as offline processing, i.e. high-performance computing (HPC) workloads, which run on the dedicated Maxwell HPC cluster. As all communities face significant challenges due to changing environments and increasing data rates in the coming years, we discuss how this will be reflected in the necessary changes to the computing and storage infrastructures. We present DESY's compute cloud and container orchestration plans as a basis for infrastructure and platform services. We show examples of Jupyter notebooks for small-scale interactive analysis, as well as their integration into large-scale resources such as batch systems or Spark clusters. To overcome the fragmentation of the various resources for all scientific communities at DESY, we explore how to integrate them into a seamless user experience in an Interdisciplinary Data Analysis Facility.
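As a toy illustration of the notebook-driven analysis on shared resources mentioned above, and not DESY's actual setup, the snippet below runs a small aggregation with PySpark as one might do from a Jupyter notebook attached to a cluster; the data and column names are placeholders.

```python
# Toy PySpark aggregation of the kind one might run interactively from a
# Jupyter notebook attached to a shared cluster; data and columns are made up.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("toy-analysis").getOrCreate()

# A tiny in-memory dataset standing in for real detector or accelerator records.
df = spark.createDataFrame(
    [("run1", 12.3), ("run1", 11.8), ("run2", 15.1)],
    ["run", "value"],
)

# Aggregate per run, as a notebook user would before plotting.
df.groupBy("run").avg("value").show()
spark.stop()
```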


2004, Vol 13 (03), pp. 391-502
Author(s): Massimo Giovannini

Cosmology, high-energy physics and astrophysics are today converging to the study of large scale magnetic fields. While the experimental evidence for the existence of large scale magnetization in galaxies, clusters and super-clusters is rather compelling, the origin of the phenomenon remains puzzling especially in light of the most recent observations. The purpose of the present review is to describe the physical motivations and the open theoretical problems related to the existence of large scale magnetic fields.


2019, Vol 214, pp. 06037
Author(s): Moritz Kiehn, Sabrina Amrouche, Paolo Calafiura, Victor Estrade, Steven Farrell, ...

The High-Luminosity LHC (HL-LHC) is expected to reach unprecedented collision intensities, which in turn will greatly increase the complexity of tracking within the event reconstruction. To reach out to computer-science specialists, a tracking machine learning challenge (TrackML) was set up on Kaggle by a team of ATLAS, CMS, and LHCb tracking experts and computer scientists, building on the experience of the successful Higgs Machine Learning challenge in 2014. A training dataset based on a simulation of a generic HL-LHC experiment tracker has been created, listing for each event the measured 3D points and the list of 3D points associated with each true track. Participants in the challenge must find the tracks in the test dataset, which means building the list of 3D points belonging to each track. The emphasis is on exposing innovative approaches rather than hyper-optimising known approaches. A metric reflecting the accuracy of a model at finding the associations that matter most to physics analysis will allow good candidates to be selected to augment or replace existing algorithms.
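A naive baseline of the kind a participant might start from, rather than an official challenge solution, is sketched below: hits are read from a CSV file of 3D points and grouped with DBSCAN in a normalized coordinate space. The file name, column names, and clustering parameters are assumptions for illustration only.

```python
# Naive track-building baseline for a TrackML-style dataset: cluster 3D hits
# with DBSCAN; file name, columns and parameters are illustrative assumptions.
import numpy as np
import pandas as pd
from sklearn.cluster import DBSCAN

hits = pd.read_csv("event000001000-hits.csv")  # assumed columns: hit_id, x, y, z

# Work in a crudely normalized space; real solutions use detector-aware features.
xyz = hits[["x", "y", "z"]].to_numpy()
r = np.linalg.norm(xyz, axis=1, keepdims=True)
features = xyz / np.clip(r, 1e-6, None)

# Each cluster of hits is treated as one track candidate; -1 marks noise hits.
labels = DBSCAN(eps=0.01, min_samples=3).fit_predict(features)
hits["track_id"] = labels
print(hits.groupby("track_id").size().head())
```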


2005, Vol 20 (14), pp. 3021-3032
Author(s): Ian M. Fisk

In this review, the computing challenges facing the current and next generation of high-energy physics experiments will be discussed. High-energy physics computing represents an interesting infrastructure challenge as the use of large-scale commodity computing clusters has increased. The causes and ramifications of these infrastructure challenges will be outlined. Increasing requirements, limited physical infrastructure at computing facilities, and limited budgets have driven many experiments to deploy distributed computing solutions to meet the growing computing needs for analysis, reconstruction, and simulation. The current generation of experiments has developed and integrated a number of solutions to facilitate distributed computing. The current work of the running experiments gives an insight into the challenges that will be faced by the next generation of experiments and the infrastructure that will be needed.


2020, Vol 245, pp. 07001
Author(s): Laura Sargsyan, Filipe Martins

Large experiments in high-energy physics require efficient and scalable monitoring solutions to digest data from the detector control system. Plotting multiple graphs in the slow-control system and extracting historical data for long time periods are resource-intensive tasks. The proposed solution leverages new virtualization, data analytics, and visualization technologies: the InfluxDB time-series database for faster access to large-scale data, Grafana to visualize time-series data, and an OpenShift container platform to automate the build, deployment, and management of the application. The monitoring service runs separately from the control system, thus reducing the workload on the control system's computing resources. As an example, a test version of the new monitoring was applied to the ATLAS Tile Calorimeter using the CERN Cloud Process as a Service platform. Many dashboards have been created in Grafana to monitor and analyse the behaviour of the High Voltage distribution system. They visualize not only values measured by the control system, but also run information and analytics data (differences, deviations, etc.). The new monitoring, with feature-rich visualization, filtering possibilities, and analytics tools, extends detector control and monitoring capabilities and can help experts working on large-scale experiments.
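To give a flavour of how such a time-series monitoring stack can be fed and queried programmatically, here is a sketch using the InfluxDB 1.x Python client rather than the actual ATLAS Tile Calorimeter setup; the host, database, measurement, and tag names are placeholders.

```python
# Sketch of writing and querying time-series monitoring data with the
# InfluxDB 1.x Python client; host, database and measurement are placeholders.
from influxdb import InfluxDBClient

client = InfluxDBClient(host="influxdb.example.org", port=8086, database="dcs_monitoring")

# Write one high-voltage reading, tagged by channel.
point = {
    "measurement": "hv_readings",
    "tags": {"channel": "LBA01"},
    "fields": {"voltage": 830.5},
}
client.write_points([point])

# Query the recent history, much as a Grafana dashboard panel would.
result = client.query(
    "SELECT mean(voltage) FROM hv_readings WHERE time > now() - 1h GROUP BY time(5m)"
)
for row in result.get_points():
    print(row)
```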

