China-EU scientific cooperation on JUNO distributed computing

2020 ◽  
Vol 245 ◽  
pp. 03038
Author(s):  
Giuseppe Andronico

The Jiangmen Underground Neutrino Observatory (JUNO) is an underground 20 kton liquid scintillator detector being built in the south of China. Targeting an unprecedented relative energy resolution of 3% at 1 MeV, JUNO will be able to study neutrino oscillation phenomena and determine the neutrino mass ordering with a statistical significance of 3-4 sigma within six years of running time. These physics challenges are addressed by a large Collaboration spread over three continents. In this context, key to the success of JUNO will be the realization of a distributed computing infrastructure to fulfill the foreseen computing needs. Computing infrastructure development is performed jointly by the Institute of High Energy Physics (IHEP), part of the Chinese Academy of Sciences (CAS), and a number of Italian, French and Russian data centers already part of WLCG (Worldwide LHC Computing Grid). Once operational, JUNO is expected to deliver no less than 2 PB of data per year, to be stored in data centers throughout China and Europe. Data analysis activities will also be carried out in cooperation. This contribution reports on the China-EU cooperation to jointly design and build the JUNO computing infrastructure and describes its main characteristics and requirements.
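As a rough, assumption-laden cross-check of the stated data volume (not a figure from the contribution itself), 2 PB per year corresponds to an average sustained replication rate of roughly 60 MB/s, i.e. about 0.5 Gb/s, before accounting for bursts, reprocessing or multiple replicas:

    # Back-of-the-envelope estimate only: average rate implied by ~2 PB/year.
    SECONDS_PER_YEAR = 365.25 * 24 * 3600
    data_per_year_bytes = 2e15                      # 2 PB/year, from the abstract
    avg_rate_MB_s = data_per_year_bytes / SECONDS_PER_YEAR / 1e6
    avg_rate_Gb_s = avg_rate_MB_s * 8 / 1e3
    print(f"average rate: {avg_rate_MB_s:.0f} MB/s (~{avg_rate_Gb_s:.2f} Gb/s)")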

2021 ◽  
Vol 36 (10) ◽  
pp. 2150070
Author(s):  
Maria Grigorieva ◽  
Dmitry Grin

Large-scale distributed computing infrastructures ensure the operation and maintenance of scientific experiments at the LHC: more than 160 computing centers all over the world execute tens of millions of computing jobs per day. ATLAS, the largest experiment at the LHC, creates an enormous flow of data which has to be recorded and analyzed by a complex, heterogeneous and distributed computing environment. Statistically, about 10-12% of computing jobs end in failure: network faults, service failures, authorization failures, and other error conditions trigger error messages which provide detailed information about the issue and can be used for diagnosis and proactive fault handling. However, this analysis is complicated by the sheer scale of the textual log data and is often exacerbated by the lack of a well-defined structure: human experts have to interpret the detected messages and create parsing rules manually, which is time-consuming and does not allow previously unknown error conditions to be identified without further human intervention. This paper describes a pipeline of methods for the unsupervised clustering of multi-source error messages. The pipeline is data-driven, based on machine learning algorithms, and executed fully automatically, categorizing error messages according to their textual patterns and meaning.
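As a minimal illustration of the idea of unsupervised clustering of error messages (grouping by textual pattern without hand-written parsing rules), the following Python sketch uses scikit-learn TF-IDF features and DBSCAN; the messages are invented toy examples, and this is not the pipeline described in the paper:

    # Illustrative sketch only: cluster short error messages by textual pattern.
    # Assumes scikit-learn is available; the messages below are invented.
    from sklearn.cluster import DBSCAN
    from sklearn.feature_extraction.text import TfidfVectorizer

    messages = [
        "Transfer failed: connection timed out to se01.example.org",
        "Transfer failed: connection timed out to se07.example.org",
        "Authorization failure: proxy certificate expired",
        "Authorization failure: proxy certificate not found",
        "Service unavailable: server returned HTTP 503",
    ]

    # Letter-only tokens, so volatile parts such as host numbers drop out;
    # a real pipeline would mask paths, IDs and hostnames more carefully.
    vectors = TfidfVectorizer(token_pattern=r"[A-Za-z]{2,}").fit_transform(messages)

    # Density-based clustering groups messages sharing the same pattern;
    # label -1 marks messages matching no cluster (potentially new error types).
    labels = DBSCAN(eps=0.8, min_samples=2, metric="cosine").fit_predict(vectors)
    for label, msg in zip(labels, messages):
        print(label, msg)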


2005 ◽  
Vol 20 (14) ◽  
pp. 3021-3032
Author(s):  
Ian M. Fisk

In this review, the computing challenges facing the current and next generation of high energy physics experiments will be discussed. High energy physics computing represents an interesting infrastructure challenge as the use of large-scale commodity computing clusters has increased. The causes and ramifications of these infrastructure challenges will be outlined. Increasing requirements, limited physical infrastructure at computing facilities, and limited budgets have driven many experiments to deploy distributed computing solutions to meet the growing computing needs for analysis, reconstruction, and simulation. The current generation of experiments has developed and integrated a number of solutions to facilitate distributed computing. The current work of the running experiments gives insight into the challenges that will be faced by the next generation of experiments and the infrastructure that will be needed.


2019 ◽  
Vol 214 ◽  
pp. 03018
Author(s):  
Wojciech Krzemien ◽  
Federico Stagni ◽  
Christophe Haen ◽  
Zoltan Mathe ◽  
Andrew McNab ◽  
...  

The Message Queue (MQ) architecture is an asynchronous communication scheme that provides an attractive solution for certain scenarios in a distributed computing model. Introducing an MQ as an intermediate component between the interacting processes decouples the end-points, making the system more flexible and providing high scalability and redundancy. DIRAC is general-purpose interware software for distributed computing systems, which offers a common interface to a number of heterogeneous providers and guarantees transparent and reliable usage of the resources. The DIRAC platform has been adopted by several scientific projects, including High Energy Physics communities such as LHCb, the Linear Collider and Belle II. A generic Message Queue interface has been incorporated into the DIRAC framework to help address the scalability challenges of LHC Run 3, starting in 2021. It allows the MQ scheme to be used for message exchange among the DIRAC components or for communication with third-party services. In this contribution we describe the integration of MQ systems with DIRAC and present several use cases. Message Queues are foreseen to be used in the pilot logging system and as a backbone of the DIRAC component logging and monitoring system.
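As a generic illustration of how an intermediate queue decouples producers from consumers (for example, pilots emitting log records from a service that stores them), here is a minimal Python sketch using the pika client; it assumes a RabbitMQ broker on localhost and is not the DIRAC MQ interface, whose queue names and message formats are project-specific:

    # Illustrative publish/consume sketch, NOT the DIRAC MQ interface.
    # Assumes a RabbitMQ broker reachable on localhost and the pika client.
    import json
    import pika

    QUEUE = "pilot.logging"  # hypothetical queue name

    def publish(record: dict) -> None:
        # The producer only needs to know the broker and queue name,
        # not which consumers exist or whether they are currently running.
        conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
        channel = conn.channel()
        channel.queue_declare(queue=QUEUE, durable=True)
        channel.basic_publish(exchange="", routing_key=QUEUE,
                              body=json.dumps(record))
        conn.close()

    def consume() -> None:
        # Consumers can be added, removed or restarted independently,
        # which is what gives the scheme its scalability and redundancy.
        conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
        channel = conn.channel()
        channel.queue_declare(queue=QUEUE, durable=True)

        def on_message(ch, method, properties, body):
            print("received:", json.loads(body))
            ch.basic_ack(delivery_tag=method.delivery_tag)

        channel.basic_consume(queue=QUEUE, on_message_callback=on_message)
        channel.start_consuming()

    if __name__ == "__main__":
        publish({"pilot_id": "example", "level": "INFO", "msg": "pilot started"})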


1992 ◽  
Author(s):  
Paul Avery ◽  
Chandra Chegireddy ◽  
John Brothers ◽  
Theodore Johnson ◽  
Aric Zion

2019 ◽  
Vol 214 ◽  
pp. 03009
Author(s):  
Vladimir Korenkov ◽  
Andrei Dolbilov ◽  
Valeri Mitsyn ◽  
Ivan Kashunin ◽  
Nikolay Kutovskiy ◽  
...  

Computing in the field of high energy physics requires the use of heterogeneous computing resources and technologies, such as grid, high-performance computing, cloud computing and big data analytics, for data processing and analysis. The core of the distributed computing environment at the Joint Institute for Nuclear Research (JINR) is the Multifunctional Information and Computing Complex (MICC). It includes a Tier-1 site for the CMS experiment, a Tier-2 site for all LHC experiments and other non-LHC grid VOs, such as BIOMED, COMPASS, NICA/MPD, NOvA, STAR and BESIII, as well as cloud and HPC infrastructures. A brief status overview of each component is presented. Particular attention is given to the development of distributed computations performed in collaboration with CERN, BNL, FNAL, FAIR, China, and the JINR Member States. One of the directions for the cloud infrastructure is the development of methods for integrating the various cloud resources of the JINR Member State organizations in order to perform common tasks and to distribute the load across the integrated resources. We have integrated the cloud resources of scientific centers in Armenia, Azerbaijan, Belarus, Kazakhstan and Russia. Extension of the HPC component will be carried out through a specialized HPC engineering infrastructure being created at the MICC, which makes use of the contact liquid cooling technology implemented by the Russian company JSC "RSC Technologies". Current plans are to further develop the MICC as a center for scientific computing within the multidisciplinary research environment of JINR and the JINR Member States, primarily for the NICA mega-science project.


2021 ◽  
Vol 2021 (3) ◽  
Author(s):  
Konstantin T. Matchev ◽  
Prasanth Shyamsundar

We provide a prescription called ThickBrick to train optimal machine-learning-based event selectors and categorizers that maximize the statistical significance of a potential signal excess in high energy physics (HEP) experiments, as quantified by any of six different performance measures. For analyses where the signal search is performed in the distribution of some event variables, our prescription ensures that only the information complementary to those event variables is used in event selection and categorization. This eliminates a major misalignment with the physics goals of the analysis (maximizing the significance of an excess) that exists in the training of typical ML-based event selectors and categorizers. In addition, this decorrelation of event selectors from the relevant event variables prevents the background distribution from becoming peaked in the signal region as a result of event selection, thereby ameliorating the challenges imposed on signal searches by systematic uncertainties. Our event selectors (categorizers) use the output of machine-learning-based classifiers as input and apply optimal selection cutoffs (categorization thresholds) that are functions of the event variables being analyzed, as opposed to flat cutoffs (thresholds). These optimal cutoffs and thresholds are learned iteratively, using a novel approach with connections to Lloyd's k-means clustering algorithm. We provide a public Python implementation of our prescription, also called ThickBrick, along with usage examples.
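To make the contrast between flat and event-variable-dependent cutoffs concrete, here is a minimal numpy sketch with invented toy inputs; the variable names and the piecewise-constant thresholds are assumptions for illustration only, not the ThickBrick API or its iterative training procedure:

    # Toy contrast between a flat classifier cutoff and a cutoff that varies
    # with the event variable used in the signal search. Not the ThickBrick API.
    import numpy as np

    rng = np.random.default_rng(0)
    mass = rng.uniform(100.0, 180.0, size=10_000)   # hypothetical search variable
    score = rng.uniform(0.0, 1.0, size=10_000)      # hypothetical classifier output

    # Flat selection: one cutoff for all events.
    flat_selected = score > 0.8

    # Variable-dependent selection: a per-bin cutoff t(m); here a fixed
    # piecewise-constant toy function stands in for thresholds that would
    # be learned iteratively from the data.
    bin_edges = np.linspace(100.0, 180.0, 9)
    per_bin_cut = np.linspace(0.7, 0.9, len(bin_edges) - 1)
    bin_index = np.clip(np.digitize(mass, bin_edges) - 1, 0, len(per_bin_cut) - 1)
    variable_selected = score > per_bin_cut[bin_index]

    print("flat selection efficiency:", flat_selected.mean())
    print("variable-cut selection efficiency:", variable_selected.mean())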


2021 ◽  
Vol 16 (11) ◽  
pp. C11003
Author(s):  
Q. Wu ◽  
S. Qian ◽  
Y. Cao ◽  
G. Huang ◽  
M. Jin ◽  
...  

The Jiangmen Underground Neutrino Observatory (JUNO) in China, which aims to determine the neutrino mass hierarchy, is under construction. A new kind of large-area microchannel-plate photomultiplier tube (MCP-PMT) was proposed for JUNO by researchers at the Institute of High Energy Physics (IHEP) in China. After breaking through several core technical barriers, the MCP-PMT group in China successfully produced a 20-inch MCP-PMT prototype with excellent performance and won 75% of the PMT order (15,000 pieces) from JUNO. The mass production line and batch test system were completed at North Night Vision Technology Co., Ltd. (NNVT). The performance of the MCP-PMTs, including the gain, the quantum efficiency, the peak-to-valley (P/V) ratio, the dark count rate and the transit time spread, can be batch tested. During mass production, progress in the cathode deposition technique improved the quantum efficiency of the photocathode from 30% to 35%. The aging behaviour, temperature effect, after-pulse distribution and flash signal of the 20-inch MCP-PMT have all been studied in detail. By August 2020, the 15,000 MCP-PMTs, which will be installed on the central liquid scintillator detector of JUNO, had been completed and delivered to Jiangmen. The average QE at 400 nm for the 15,000 MCP-PMTs is 32%.


Author(s):  
Jeremy Cohen ◽  
Ioannis Filippis ◽  
Mark Woodbridge ◽  
Daniela Bauer ◽  
Neil Chue Hong ◽  
...  

Cloud computing infrastructure is now widely used in many domains, but one area where there has been more limited adoption is research computing, in particular for running scientific high-performance computing (HPC) software. The Robust Application Porting for HPC in the Cloud (RAPPORT) project took advantage of existing links between computing researchers and application scientists in the fields of bioinformatics, high-energy physics (HEP) and digital humanities, to investigate running a set of scientific HPC applications from these domains on cloud infrastructure. In this paper, we focus on the bioinformatics and HEP domains, describing the applications and target cloud platforms. We conclude that, while there are many factors that need consideration, there is no fundamental impediment to the use of cloud infrastructure for running many types of HPC applications and, in some cases, there is potential for researchers to benefit significantly from the flexibility offered by cloud platforms.

