Exploring Object Stores for High-Energy Physics Data Storage

Over the last two decades, ROOT TTree has been used for storing over one exabyte of High-Energy Physics (HEP) events. The TTree columnar on-disk layout has been proved to be ideal for analyses of HEP data that typically require access to many events, but only a subset of the information stored for each of them. Future colliders, and particularly HL-LHC, will bring an increase of at least one order of magnitude in the volume of generated data. Therefore, the use of modern storage hardware, such as low-latency high-bandwidth NVMe devices and distributed object stores, becomes more important. However, TTree was not designed to optimally exploit modern hardware and may become a bottleneck for data retrieval. The ROOT RNTuple I/O system aims at overcoming TTree’s limitations and at providing improved effciency for modern storage systems. In this paper, we extend RNTuple with a backend that uses Intel DAOS as the underlying storage, demonstrating that the RNTuple architecture can accommodate high-performance object stores. From the user perspective, data can be accessed with minimal changes to the code, that is by replacing a filesystem path by a DAOS URI. Our performance evaluation shows that the new backend can be used for realistic analyses, while outperforming the compatibility solution provided by the DAOS project.

Download Full-text

Beyond HEP: Photon and accelerator science computing infrastructure at DESY

EPJ Web of Conferences ◽

10.1051/epjconf/202024507036 ◽

2020 ◽

Vol 245 ◽

pp. 07036

Author(s):

Christoph Beyer ◽

Stefan Bujack ◽

Stefan Dietrich ◽

Thomas Finnern ◽

Martin Flemming ◽

...

Keyword(s):

High Performance ◽

Large Scale ◽

High Energy Physics ◽

High Energy ◽

Resource Provisioning ◽

Small Scale ◽

Online Processing ◽

Offline Processing ◽

National Analysis ◽

Energy Physics

DESY is one of the largest accelerator laboratories in Europe. It develops and operates state of the art accelerators for fundamental science in the areas of high energy physics, photon science and accelerator development. While for decades high energy physics (HEP) has been the most prominent user of the DESY compute, storage and network infrastructure, various scientific areas as science with photons and accelerator development have caught up and are now dominating the demands on the DESY infrastructure resources, with significant consequences for the IT resource provisioning. In this contribution, we will present an overview of the computational, storage and network resources covering the various physics communities on site. Ranging from high-throughput computing (HTC) batch-like offline processing in the Grid and the interactive user analyses resources in the National Analysis Factory (NAF) for the HEP community, to the computing needs of accelerator development or of photon sciences such as PETRA III or the European XFEL. Since DESY is involved in these experiments and their data taking, their requirements include fast low-latency online processing for data taking and calibration as well as offline processing, thus high-performance computing (HPC) workloads, that are run on the dedicated Maxwell HPC cluster. As all communities face significant challenges due to changing environments and increasing data rates in the following years, we will discuss how this will reflect in necessary changes to the computing and storage infrastructures. We will present DESY compute cloud and container orchestration plans as a basis for infrastructure and platform services. We will show examples of Jupyter notebooks for small scale interactive analysis, as well as its integration into large scale resources such as batch systems or Spark clusters. To overcome the fragmentation of the various resources for all scientific communities at DESY, we explore how to integrate them into a seamless user experience in an Interdisciplinary Data Analysis Facility.

Download Full-text

Striped Data Analysis Framework

EPJ Web of Conferences ◽

10.1051/epjconf/202024506042 ◽

2020 ◽

Vol 245 ◽

pp. 06042

Author(s):

Oliver Gutsche ◽

Igor Mandrichenko

Keyword(s):

Data Analysis ◽

Data Storage ◽

High Energy Physics ◽

Data Access ◽

High Energy ◽

Data Representation ◽

General Idea ◽

Common Data Model ◽

Local File ◽

Energy Physics

A columnar data representation is known to be an efficient way for data storage, specifically in cases when the analysis is often done based only on a small fragment of the available data structures. A data representation like Apache Parquet is a step forward from a columnar representation, which splits data horizontally to allow for easy parallelization of data analysis. Based on the general idea of columnar data storage, working on the [LDRD Project], we have developed a striped data representation, which, we believe, is better suited to the needs of High Energy Physics data analysis. A traditional columnar approach allows for efficient data analysis of complex structures. While keeping all the benefits of columnar data representations, the striped mechanism goes further by enabling easy parallelization of computations without requiring special hardware. We will present an implementation and some performance characteristics of such a data representation mechanism using a distributed no-SQL database or a local file system, unified under the same API and data representation model. The representation is efficient and at the same time simple so that it allows for a common data model and APIs for wide range of underlying storage mechanisms such as distributed no-SQL databases and local file systems. Striped storage adopts Numpy arrays as its basic data representation format, which makes it easy and efficient to use in Python applications. The Striped Data Server is a web service, which allows to hide the server implementation details from the end user, easily exposes data to WAN users, and allows to utilize well known and developed data caching solutions to further increase data access efficiency. We are considering the Striped Data Server as the core of an enterprise scale data analysis platform for High Energy Physics and similar areas of data processing. We have been testing this architecture with a 2TB dataset from a CMS dark matter search and plan to expand it to multiple 100 TB or even PB scale. We will present the striped format, Striped Data Server architecture and performance test results.

Download Full-text

High Performance Numerical Computing for High Energy Physics: A New Challenge for Big Data Science

Advances in High Energy Physics ◽

10.1155/2014/507690 ◽

2014 ◽

Vol 2014 ◽

pp. 1-13 ◽

Cited By ~ 3

Author(s):

Florin Pop

Keyword(s):

Monte Carlo ◽

Big Data ◽

Numerical Methods ◽

High Performance ◽

Data Science ◽

Experimental Validation ◽

High Energy Physics ◽

High Energy ◽

Performance Computing ◽

Energy Physics

Modern physics is based on both theoretical analysis and experimental validation. Complex scenarios like subatomic dimensions, high energy, and lower absolute temperature are frontiers for many theoretical models. Simulation with stable numerical methods represents an excellent instrument for high accuracy analysis, experimental validation, and visualization. High performance computing support offers possibility to make simulations at large scale, in parallel, but the volume of data generated by these experiments creates a new challenge for Big Data Science. This paper presents existing computational methods for high energy physics (HEP) analyzed from two perspectives: numerical methods and high performance computing. The computational methods presented are Monte Carlo methods and simulations of HEP processes, Markovian Monte Carlo, unfolding methods in particle physics, kernel estimation in HEP, and Random Matrix Theory used in analysis of particles spectrum. All of these methods produce data-intensive applications, which introduce new challenges and requirements for ICT systems architecture, programming paradigms, and storage capabilities.

Download Full-text

Comparison of data storage and analysis throughput in the light of high energy physics experiment MACE

Astronomy and Computing ◽

10.1016/j.ascom.2020.100409 ◽

2020 ◽

Vol 33 ◽

pp. 100409

Author(s):

D. Sarkar ◽

Mahesh P. ◽

Padmini S. ◽

N. Chouhan ◽

C. Borwankar ◽

...

Keyword(s):

Data Storage ◽

High Energy Physics ◽

High Energy ◽

Physics Experiment ◽

Energy Physics

Download Full-text

High energy physics from high performance computing

Journal of Physics Conference Series ◽

10.1088/1742-6596/180/1/012066 ◽

2009 ◽

Vol 180 ◽

pp. 012066 ◽

Cited By ~ 3

Author(s):

T Blum

Keyword(s):

High Performance Computing ◽

High Performance ◽

High Energy Physics ◽

High Energy ◽

Performance Computing ◽

Energy Physics

Download Full-text

Secure and efficient high-performance PROOF-based cluster system for high-energy physics

The Journal of Supercomputing ◽

10.1007/s11227-014-1146-5 ◽

2014 ◽

Vol 70 (1) ◽

pp. 166-176 ◽

Cited By ~ 4

Author(s):

Sang Un Ahn ◽

Il Yeon Yeo ◽

Sang Oh Park

Keyword(s):

High Performance ◽

High Energy Physics ◽

High Energy ◽

Cluster System ◽

Energy Physics

Download Full-text

Data storage and management for high-energy physics: technical and sociological requirements

Proceedings Thirteenth IEEE Symposium on Mass Storage Systems. Toward Distributed Storage and Data Management Systems ◽

10.1109/mass.1994.373023 ◽

2002 ◽

Author(s):

R.P. Mount

Keyword(s):

Data Storage ◽

High Energy Physics ◽

High Energy ◽

Energy Physics

Download Full-text

HTS Accelerator Magnets and Conductor development in Europe

10.20944/preprints202012.0278.v1 ◽

2020 ◽

Author(s):

Lucio Rossi ◽

Carmine Senatore

Keyword(s):

European Community ◽

High Field ◽

High Performance ◽

High Energy Physics ◽

Background Field ◽

High Energy ◽

Race Track ◽

Accelerator Magnets ◽

Very High ◽

Energy Physics

In view of the preparation for a post-LHC collider, the high-energy physics (HEP) community started from 2010 to discuss various options, including the use of HTS for very high field dipoles. Therefore, a small program was set in Europe aiming at exploring the possibility of using HTS for accelerator quality magnets. Based on various EU funded programs, though at modest levels, has enabled the European community of accelerator magnets to start getting experience in HTS and addressing a few issues. The program was based on use of REBCO tapes to form 10 kA Roebel cables, to be used to wind small dipoles of 30-40 mm aperture in the 5 T range. The dipoles are designed to be later inserted in a background dipole field (in Nb3Sn), to reach eventually a field level in the 16-20 T range, beyond the reach of LTS. The program is currently underway: more than 1 km tape of high performance (Je &gt; 500 A/mm2 at 20 T, 4.2 K has been manufactured and characterized, various 30 m long Roebel cables have been assembled and validated up to 13 kA, a few dipoles have been wound and tested, reaching at present 4.5 T in stand-alone (while a dipole made from race track coils with no-bore exceeded 5 T using stacked tape cable) and a test in a background field is being organized.

Download Full-text

Hardware Implementation Study of Particle Tracking Algorithm on FPGAs

Electronics ◽

10.3390/electronics10202546 ◽

2021 ◽

Vol 10 (20) ◽

pp. 2546

Author(s):

Alessandro Gabrielli ◽

Fabrizio Alfonsi ◽

Alberto Annovi ◽

Alessandra Camplani ◽

Alessandro Cerri

Keyword(s):

High Speed ◽

High Performance ◽

Hardware Implementation ◽

High Energy Physics ◽

Medical Image Analysis ◽

Computation Time ◽

High Energy ◽

Spatial Transformation ◽

Flip Flop ◽

Energy Physics

In recent years, the technological node used to implement FPGA devices has led to very high performance in terms of computational capacity and in some applications these can be much more efficient than CPUs or other programmable devices. The clock managers and the enormous versatility of communication technology through digital transceivers place FPGAs in a prime position for many applications. For example, from real-time medical image analysis to high energy physics particle trajectory recognition, where computation time can be crucial, the benefits of using frontier FPGA capabilities are even more relevant. This paper shows an example of FPGA hardware implementation, via a firmware design, of a complex analytical algorithm: The Hough transform. This is a mathematical spatial transformation used here to facilitate on-the-fly recognition of the trajectories of ionising particles as they pass through the so-called tracker apparatus within high-energy physics detectors. This is a general study to demonstrate that this technique is not only implementable via software-based systems, but can also be exploited using consumer hardware devices. In this context the latter are known as hardware accelerators. In this article in particular, the Xilinx UltraScale+ FPGA is investigated as it belongs to one of the frontier family devices on the market. These FPGAs make it possible to reach high-speed clock frequencies at the expense of acceptable energy consumption thanks to the 14 nm technological node used by the vendor. These devices feature a huge number of gates, high-bandwidth memories, transceivers and other high-performance electronics in a single chip, enabling the design of large, complex and scalable architectures. In particular the Xilinx Alveo U250 has been investigated. A target frequency of 250 MHz and a total latency of 30 clock periods have been achieved using only the 17 ÷ 53% of LUTs, the 8 ÷ 12% of DSPs, the 1 ÷ 3% of Block Rams and a Flip Flop occupancy range of 9 ÷ 28%.

Download Full-text

A Feasibility Study on workload integration between HT-Condor and Slurm Clusters

EPJ Web of Conferences ◽

10.1051/epjconf/201921408004 ◽

2019 ◽

Vol 214 ◽

pp. 08004 ◽

Cited By ~ 1

Author(s):

R. Du ◽

J. Shi ◽

J. Zou ◽

X. Jiang ◽

Z. Sun ◽

...

Keyword(s):

Resource Utilization ◽

High Performance ◽

High Energy Physics ◽

Job Scheduling ◽

High Energy ◽

The Other ◽

Workload Manager ◽

High Degree ◽

Performance Computing ◽

Energy Physics

There are two production clusters co-existed in the Institute of High Energy Physics (IHEP). One is a High Throughput Computing (HTC) cluster with HTCondor as the workload manager, the other is a High Performance Computing (HPC) cluster with Slurm as the workload manager. The resources of the HTCondor cluster are funded by multiple experiments, and the resource utilization reached more than 90% by adopting a dynamic resource share mechanism. Nevertheless, there is a bottleneck if more resources are requested by multiple experiments at the same moment. On the other hand, parallel jobs running on the Slurm cluster reflect some specific attributes, such as high degree of parallelism, low quantity and long wall time. Such attributes make it easy to generate free resource slots which are suitable for jobs from the HTCondor cluster. As a result, if there is a mechanism to schedule jobs from the HTCon-dor cluster to the Slurm cluster transparently, it would improve the resource utilization of the Slurm cluster, and reduce job queue time for the HTCondor cluster. In this proceeding, we present three methods to migrate HTCondor jobs to the Slurm cluster, and concluded that HTCondor-C is more preferred. Furthermore, because design philosophy and application scenes are di↵erent between HTCondor and Slurm, some issues and possible solutions related with job scheduling are presented.

Download Full-text