PandAna: A Python Analysis Framework for Scalable High Performance Computing in High Energy Physics

2021, Vol 251, pp. 03033
Author(s): Micah Groh, Norman Buchanan, Derek Doyle, James B. Kowalkowski, Marc Paterno, ...

Modern experiments in high energy physics analyze millions of events recorded in particle detectors to select the events of interest and make measurements of physics parameters. These data are often stored as tabular data in files containing detector information and reconstructed quantities. Most current techniques for event selection in these files lack the scalability needed for high performance computing environments. We describe our work to develop a high energy physics analysis framework suitable for high performance computing. This new framework utilizes modern tools for reading files and implicit data parallelism. Framework users analyze tabular data using standard, easy-to-use data analysis techniques in Python while the framework handles the file manipulation and parallelism without requiring advanced experience in parallel programming. In future versions, we hope to provide a framework that can be used on a personal computer or a high performance computing cluster with little change to the user code.
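
The user-facing workflow described above can be illustrated with ordinary pandas operations. The snippet below is a minimal sketch of the kind of tabular selection code such a framework parallelizes on behalf of the user; the file name, HDF5 key, column names, and cut values are hypothetical, and this is not PandAna's actual API.

    import pandas as pd

    # Read one table of reconstructed quantities from an HDF5 file
    # (file name and key are hypothetical).
    events = pd.read_hdf("events.h5", key="rec_events")

    # An event selection is an ordinary boolean mask over columns
    # (column names and cut values are hypothetical).
    selected = events[(events["track_length"] > 50.0) & (events["calo_energy"] > 1.5)]

    # A simple summary of the selected events.
    print("selected events:", len(selected))
    print("mean calorimeter energy:", selected["calo_energy"].mean())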

Author(s): Jeremy Cohen, Ioannis Filippis, Mark Woodbridge, Daniela Bauer, Neil Chue Hong, ...

Cloud computing infrastructure is now widely used in many domains, but one area where there has been more limited adoption is research computing, in particular for running scientific high-performance computing (HPC) software. The Robust Application Porting for HPC in the Cloud (RAPPORT) project took advantage of existing links between computing researchers and application scientists in the fields of bioinformatics, high-energy physics (HEP) and digital humanities, to investigate running a set of scientific HPC applications from these domains on cloud infrastructure. In this paper, we focus on the bioinformatics and HEP domains, describing the applications and target cloud platforms. We conclude that, while there are many factors that need consideration, there is no fundamental impediment to the use of cloud infrastructure for running many types of HPC applications and, in some cases, there is potential for researchers to benefit significantly from the flexibility offered by cloud platforms.


2014, Vol 2014, pp. 1-13
Author(s): Florin Pop

Modern physics is based on both theoretical analysis and experimental validation. Complex scenarios such as subatomic dimensions, high energies, and low absolute temperatures are frontiers for many theoretical models. Simulation with stable numerical methods is an excellent instrument for high accuracy analysis, experimental validation, and visualization. High performance computing makes it possible to run such simulations at large scale and in parallel, but the volume of data generated by these experiments creates new challenges for Big Data science. This paper presents existing computational methods for high energy physics (HEP), analyzed from two perspectives: numerical methods and high performance computing. The computational methods presented are Monte Carlo methods and simulations of HEP processes, Markovian Monte Carlo, unfolding methods in particle physics, kernel estimation in HEP, and Random Matrix Theory as used in the analysis of particle spectra. All of these methods produce data-intensive applications, which introduce new challenges and requirements for ICT system architectures, programming paradigms, and storage capabilities.
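
As a concrete illustration of the Monte Carlo flavour of these methods, the sketch below simulates a toy exponential decay spectrum and estimates the lifetime from the generated sample. The lifetime value, event count, and seed are arbitrary; the example is not drawn from the paper.

    import numpy as np

    rng = np.random.default_rng(seed=12345)
    tau_true = 2.2e-6      # toy lifetime in seconds (arbitrary)
    n_events = 1_000_000   # number of simulated decays

    # Generate decay times from an exponential distribution and estimate
    # the lifetime and its statistical uncertainty from the sample.
    decay_times = rng.exponential(scale=tau_true, size=n_events)
    tau_hat = decay_times.mean()
    tau_err = decay_times.std(ddof=1) / np.sqrt(n_events)

    print(f"tau = {tau_hat:.4e} +/- {tau_err:.1e} s")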


2019, Vol 214, pp. 08004
Author(s): R. Du, J. Shi, J. Zou, X. Jiang, Z. Sun, ...

Two production clusters coexist at the Institute of High Energy Physics (IHEP). One is a High Throughput Computing (HTC) cluster with HTCondor as the workload manager; the other is a High Performance Computing (HPC) cluster with Slurm as the workload manager. The resources of the HTCondor cluster are funded by multiple experiments, and resource utilization has exceeded 90% through a dynamic resource-sharing mechanism. Nevertheless, a bottleneck arises when multiple experiments request additional resources at the same time. On the other hand, parallel jobs running on the Slurm cluster exhibit specific characteristics, such as a high degree of parallelism, low job counts, and long wall times. These characteristics tend to leave free resource slots that are suitable for jobs from the HTCondor cluster. As a result, a mechanism that transparently schedules jobs from the HTCondor cluster onto the Slurm cluster would improve the resource utilization of the Slurm cluster and reduce job queue times for the HTCondor cluster. In this paper, we present three methods for migrating HTCondor jobs to the Slurm cluster and conclude that HTCondor-C is the preferred one. Furthermore, because the design philosophies and application scenarios of HTCondor and Slurm differ, some job-scheduling issues and possible solutions are presented.
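
A minimal sketch of what an HTCondor-C style submission can look like from the htcondor Python bindings is shown below. The gateway host names and job files are hypothetical, and the actual IHEP configuration may well differ; this is illustrative only.

    import htcondor

    # Submit description for a "grid" universe (HTCondor-C) job that is
    # forwarded to a remote schedd/collector pair fronting the Slurm side.
    # Host names and file names are hypothetical.
    job = htcondor.Submit({
        "universe": "grid",
        "grid_resource": "condor slurm-gateway.example.org slurm-pool.example.org",
        "executable": "analysis.sh",
        "output": "job.out",
        "error": "job.err",
        "log": "job.log",
    })

    schedd = htcondor.Schedd()            # schedd of the local HTC cluster
    result = schedd.submit(job, count=1)  # requires the HTCondor >= 9.0 bindings
    print("submitted cluster", result.cluster())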


2020, Vol 245, pp. 05012
Author(s): Venkitesh Ayyar, Wahid Bhimji, Maria Elena Monzani, Andrew Naylor, Simon Patton, ...

High Energy Physics experiments like the LUX-ZEPLIN (LZ) dark matter experiment face unique challenges when running their computation on High Performance Computing resources. In this paper, we describe strategies for optimizing the memory usage of simulation codes with the help of profiling tools; employing this approach, we achieved memory reductions of 10-30%. While this work was performed in the context of the LZ experiment, it has wider applicability to other HEP experimental codes that face these challenges on modern computer architectures.
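
As an illustration of the kind of information a memory profiler provides (not the specific tools or code used for LZ), the sketch below uses Python's tracemalloc to rank allocation sites in a stand-in workload.

    import tracemalloc

    def build_event_buffers(n_events):
        # Stand-in for a memory-hungry simulation step (hypothetical).
        return [[0.0] * 1000 for _ in range(n_events)]

    tracemalloc.start()
    buffers = build_event_buffers(5000)
    snapshot = tracemalloc.take_snapshot()

    # Report the top allocation sites by line number.
    for stat in snapshot.statistics("lineno")[:5]:
        print(stat)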


2020, Vol 245, pp. 07036
Author(s): Christoph Beyer, Stefan Bujack, Stefan Dietrich, Thomas Finnern, Martin Flemming, ...

DESY is one of the largest accelerator laboratories in Europe. It develops and operates state-of-the-art accelerators for fundamental science in the areas of high energy physics, photon science and accelerator development. While for decades high energy physics (HEP) has been the most prominent user of the DESY compute, storage and network infrastructure, other scientific areas such as photon science and accelerator development have caught up and now dominate the demands on DESY's infrastructure resources, with significant consequences for IT resource provisioning. In this contribution, we present an overview of the computational, storage and network resources serving the various physics communities on site. These range from high-throughput computing (HTC) batch-like offline processing in the Grid and interactive user analysis resources in the National Analysis Factory (NAF) for the HEP community, to the computing needs of accelerator development and of photon science facilities such as PETRA III or the European XFEL. Since DESY is involved in these experiments and their data taking, their requirements include fast, low-latency online processing for data taking and calibration as well as offline processing, i.e. high-performance computing (HPC) workloads that run on the dedicated Maxwell HPC cluster. As all communities face significant challenges from changing environments and increasing data rates in the coming years, we discuss how this will be reflected in the necessary changes to the computing and storage infrastructures. We present DESY's compute cloud and container orchestration plans as a basis for infrastructure and platform services, and we show examples of Jupyter notebooks for small-scale interactive analysis as well as their integration into large-scale resources such as batch systems or Spark clusters. To overcome the fragmentation of the various resources for the scientific communities at DESY, we explore how to integrate them into a seamless user experience in an Interdisciplinary Data Analysis Facility.
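
As an illustration of the notebook-to-cluster integration mentioned above, the sketch below attaches interactive PySpark code to an external Spark cluster. The master URL and data path are hypothetical and do not describe the DESY deployment.

    from pyspark.sql import SparkSession

    # Connect notebook-style code to an external Spark cluster
    # (master URL and data path are hypothetical).
    spark = (
        SparkSession.builder
        .master("spark://spark-master.example.org:7077")
        .appName("interactive-analysis")
        .getOrCreate()
    )

    events = spark.read.parquet("/data/example/events.parquet")
    events.groupBy("run_number").count().show()

    spark.stop()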


10.14311/1718, 2013, Vol 53 (1)
Author(s): Aleksander Filip Żarnecki, Lech Wiktor Piotrowski, Lech Mankiewicz, Sebastian Małek

The Luiza analysis framework for GLORIA is based on the Marlin package, which was originally developed for data analysis in a new High Energy Physics (HEP) project, the International Linear Collider (ILC). HEP experiments have to deal with enormous amounts of data, so distributed data analysis is essential, and the Marlin framework concept seems well suited to the needs of GLORIA. The idea (and large parts of the code) taken from Marlin is that every computing task is implemented as a processor (module) that analyzes data stored in an internal data structure and adds its output to that same collection. The advantage of this modular approach is that it keeps things as simple as possible. The full analysis chain, e.g. from raw images to light curves, can be processed step by step, and the output of each step remains self-consistent and can be fed into the next step without any manipulation.
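
A minimal sketch of this processor pattern, assumed rather than taken from the Luiza or Marlin code, is shown below: each processor reads from a shared store and appends its own output, so the chain runs step by step. Class names, keys, and values are hypothetical.

    class EventStore(dict):
        """Shared data structure passed through the processing chain."""

    class Processor:
        def process(self, store):
            raise NotImplementedError

    class PedestalSubtraction(Processor):
        def process(self, store):
            # Keys and the constant pedestal are hypothetical.
            store["calibrated_image"] = [x - 100 for x in store["raw_image"]]

    class LightCurveExtraction(Processor):
        def process(self, store):
            # Toy "light curve" derived from the previous step's output.
            store["light_curve"] = [sum(store["calibrated_image"])]

    def run_chain(processors, store):
        # Each processor reads from the store and adds its output to it.
        for processor in processors:
            processor.process(store)
        return store

    store = run_chain([PedestalSubtraction(), LightCurveExtraction()],
                      EventStore(raw_image=[120, 130, 125]))
    print(store["light_curve"])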

