MadFlow: towards the automation of Monte Carlo simulation on GPU for particle physics processes

2021, Vol. 251, pp. 03022
Author(s): Stefano Carrazza, Juan Cruz-Martinez, Marco Rossi, Marco Zaro

In these proceedings we present MadFlow, a new framework for the automation of Monte Carlo (MC) simulation on graphics processing units (GPUs) for particle physics processes. In order to automate MC simulation for a generic number of processes, we design a program which allows the user to simulate custom processes through the MadGraph5_aMC@NLO framework. The pipeline includes a first stage where the analytic expressions for the matrix elements and the phase space are generated and exported in a GPU-like format. The simulation is then performed using the VegasFlow and PDFFlow libraries, which automatically deploy the full simulation on systems with different hardware acceleration capabilities, such as multi-threaded CPU, single-GPU and multi-GPU setups. We show some preliminary results for leading-order simulations on different hardware configurations.
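To make the vectorized-integration idea above concrete, the sketch below integrates a toy, fully vectorized "matrix element" with vegasflow. It is only an illustration: the integrand is a smooth stand-in, not code generated by MadFlow, and the vegas_wrapper call follows the convenience interface shown in the vegasflow documentation (argument names and defaults may differ between versions).

import tensorflow as tf
from vegasflow import vegas_wrapper  # assumption: documented convenience entry point

def toy_matrix_element(xarr, **kwargs):
    # xarr has shape (n_events, n_dim): one row per phase-space point.
    # A real MadFlow run would evaluate the exported matrix element and
    # phase-space weight here; this smooth stand-in integrates to 1
    # over the unit hypercube.
    return tf.reduce_prod(2.0 * xarr, axis=1)

n_dim = 4
n_iter = 5
n_events = int(1e5)

# vegasflow distributes the n_events evaluations over the visible devices
# (multi-threaded CPU, single GPU or several GPUs) automatically.
result = vegas_wrapper(toy_matrix_element, n_dim, n_iter, n_events)
print(result)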

2021, Vol. 81 (7)
Author(s): Stefano Carrazza, Juan Cruz-Martinez, Marco Rossi, Marco Zaro

Abstract: We present MadFlow, a first general multi-purpose framework for Monte Carlo (MC) event simulation of particle physics processes designed to take full advantage of hardware accelerators, in particular graphics processing units (GPUs). Automating the generation of all the components required for the MC simulation of a generic physics process, and their deployment on hardware accelerators, is still a major challenge. In order to address it, we design a workflow and code library which allows the user to simulate custom processes through the MadGraph5_aMC@NLO framework, together with a plugin for generating and exporting specialized code in a GPU-like format. The exported code includes analytic expressions for the matrix elements and the phase space. The simulation is performed using the VegasFlow and PDFFlow libraries, which automatically deploy the full simulation on systems with different hardware acceleration capabilities, such as multi-threaded CPU, single-GPU and multi-GPU setups. The package also provides an asynchronous unweighted-event procedure for storing simulation results. Crucially, although only leading order is automated, the library provides all the ingredients necessary to build full, complex Monte Carlo simulators in a modern, extensible and maintainable way. We show leading-order simulation results for multiple processes on different hardware configurations.
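Since the abstract mentions an unweighted-event procedure, here is a minimal sketch of hit-or-miss unweighting, the standard technique behind unweighted event generation. It is a plain numpy illustration of the idea; MadFlow's actual asynchronous storage machinery is more elaborate, and the function and variable names below are made up for this example.

import numpy as np

def unweight(weights, rng=None):
    # Hit-or-miss unweighting: an event with weight w is kept with
    # probability w / w_max, so the survivors behave as unit-weight events.
    rng = rng or np.random.default_rng()
    w = np.asarray(weights, dtype=float)
    keep = rng.random(w.shape) < w / w.max()
    return np.nonzero(keep)[0]

# Usage with a mock set of event weights from a weighted MC run.
weights = np.random.default_rng(1).exponential(size=100_000)
kept = unweight(weights)
print(f"unweighting efficiency: {kept.size / weights.size:.3f}")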


2020
Author(s): Shijie Yan, Qianqian Fang

Abstract: Over the past decade, an increasing body of evidence has suggested that three-dimensional (3-D) Monte Carlo (MC) light transport simulations are affected by the inherent limitations and errors of voxel-based domain boundaries. In this work, we specifically address this challenge using a hybrid MC algorithm, namely split-voxel MC (SVMC), that combines both mesh and voxel domain information to greatly improve MC simulation accuracy while remaining highly flexible and efficient on parallel hardware, such as graphics processing units (GPUs). We achieve this by applying a marching-cubes algorithm to a pre-segmented domain to extract and encode sub-voxel information on curved surfaces, which is then used to inform the ray-tracing computation within boundary voxels. This preservation of curved boundaries in a voxel data structure demonstrates significantly improved accuracy in several benchmarks, including a human brain atlas. The accuracy of the SVMC algorithm is comparable to that of mesh-based MC (MMC), but it runs 2x-6x faster and requires only a lightweight preprocessing step. The proposed algorithm has been implemented in our open-source software and is freely available at http://mcx.space.
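As an illustration of the core geometric step suggested above, the sketch below approximates the curved interface inside a boundary voxel by a plane (a point on the surface plus a normal, the kind of information marching cubes can provide) and intersects the photon ray with that plane instead of the voxel wall. The names and data layout are hypothetical and are not the MCX/SVMC implementation.

import numpy as np

def plane_hit_distance(origin, direction, plane_point, plane_normal):
    # Distance along the ray to the in-voxel plane, or np.inf if it is missed.
    denom = np.dot(direction, plane_normal)
    if abs(denom) < 1e-12:          # ray parallel to the interface
        return np.inf
    t = np.dot(plane_point - origin, plane_normal) / denom
    return t if t > 0.0 else np.inf

# Usage: a photon inside a boundary voxel, with the interface encoded as a plane.
photon_pos = np.array([0.2, 0.5, 0.5])
photon_dir = np.array([1.0, 0.0, 0.0])        # unit direction vector
surface_point = np.array([0.6, 0.5, 0.5])     # sample point from marching cubes
surface_normal = np.array([-1.0, 0.0, 0.0])   # normal pointing toward the photon side

t_hit = plane_hit_distance(photon_pos, photon_dir, surface_point, surface_normal)
# If t_hit is shorter than the sampled scattering distance, the photon is
# reflected/refracted at the curved interface rather than at the voxel wall.
print(t_hit)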


Author(s): Alan Gray, Kevin Stratford

Leading high-performance computing systems achieve their status through the use of highly parallel devices such as NVIDIA graphics processing units (GPUs) or Intel Xeon Phi many-core CPUs. The concept of performance portability across such architectures, as well as traditional CPUs, is vital for the application programmer. In this paper we describe targetDP, a lightweight abstraction layer which allows grid-based applications to target data-parallel hardware in a platform-agnostic manner. We demonstrate the effectiveness of our pragmatic approach by presenting performance results for a complex-fluid application (with which the model was co-designed), plus a separate lattice quantum chromodynamics (particle physics) code. For each application, a single source code base is seen to achieve portable performance, as assessed within the context of the Roofline model. TargetDP can be combined with the Message Passing Interface (MPI) to allow use on systems containing multiple nodes; we demonstrate this through scaling results on traditional and GPU-accelerated large-scale supercomputers.
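targetDP itself is a C-based abstraction layer, so the following Python sketch only mimics the single-source idea in a different language: one grid-update kernel written once against whichever array module (numpy on the CPU, cupy on a GPU, if installed) is selected at start-up. It illustrates the concept of a platform-agnostic data-parallel kernel, not targetDP's API.

import numpy as np
try:
    import cupy as xp          # GPU backend, if available
except ImportError:
    xp = np                    # fall back to the CPU backend

def jacobi_step(field):
    # One data-parallel stencil update, written once for either backend.
    out = field.copy()
    out[1:-1, 1:-1] = 0.25 * (field[:-2, 1:-1] + field[2:, 1:-1]
                              + field[1:-1, :-2] + field[1:-1, 2:])
    return out

grid = xp.zeros((512, 512), dtype=xp.float64)
grid[0, :] = 1.0               # a fixed boundary condition
for _ in range(100):
    grid = jacobi_step(grid)
print(float(grid.sum()))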


Author(s): Pascal R. Bähr, Bruno Lang, Peer Ueberholz, Marton Ady, Roberto Kersevan

Molflow+ is a Monte Carlo (MC) simulation software for ultra-high vacuum, mainly used to simulate pressure in particle accelerators. In this article, we present and discuss the design choices arising in a new implementation of its ray-tracing-based simulation unit for Nvidia RTX graphics processing units (GPUs). The GPU simulation kernel was designed with Nvidia's OptiX 7 API to make use of the modern hardware-accelerated ray-tracing units found in recent RTX-series GPUs based on the Turing and Ampere architectures. Even with the challenges posed by switching to 32-bit computations, our kernel runs much faster than on comparable CPUs, at the expense of a marginal drop in calculation precision.
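To illustrate the 32-bit precision trade-off mentioned above, the sketch below evaluates the same ray-triangle intersection (Moller-Trumbore, with the inside-triangle test omitted for brevity) in float32, the precision the RTX ray-tracing cores operate in, and in float64 for reference. It is a generic example, not Molflow+'s kernel.

import numpy as np

def hit_distance(orig, direction, v0, v1, v2, dtype):
    # Moller-Trumbore distance along the ray to the triangle's plane;
    # the (u, v) inside-triangle test is skipped for brevity.
    orig, direction, v0, v1, v2 = (np.asarray(a, dtype=dtype)
                                   for a in (orig, direction, v0, v1, v2))
    edge1, edge2 = v1 - v0, v2 - v0
    pvec = np.cross(direction, edge2)
    det = np.dot(edge1, pvec)
    tvec = orig - v0
    qvec = np.cross(tvec, edge1)
    return np.dot(edge2, qvec) / det

ray_origin = [0.1, 0.2, -3.0]
ray_dir = [0.0, 0.0, 1.0]
tri = ([-1.0, -1.0, 1.0], [1.0, -1.0, 1.0], [0.0, 1.0, 1.0000001])

t32 = hit_distance(ray_origin, ray_dir, *tri, dtype=np.float32)
t64 = hit_distance(ray_origin, ray_dir, *tri, dtype=np.float64)
print(t32, t64, abs(float(t32) - t64))   # the gap is the single-precision error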


2018, Vol. 21 (06), pp. 1850030
Author(s): Lokman A. Abbas-Turki, Stéphane Crépey, Babacar Diallo

We present a nested Monte Carlo (NMC) approach, implemented on graphics processing units (GPUs), to X-valuation adjustments (XVAs), where X ranges over C for credit, F for funding, M for margin, and K for capital. The overall XVA suite involves five compound layers of dependence. Higher layers are launched first, and they trigger nested simulations on the fly whenever required to compute an item from a lower layer. If the user is only interested in some of the XVA components, then only the sub-tree corresponding to the outermost XVA needs to be processed computationally. Inner layers only need a number of simulations equal to the square root of the number used in the outermost layer, and some of the layers exhibit a smaller variance. As a result, with GPUs at least, error-controlled NMC XVA computations are feasible. However, although NMC is naturally suited to parallelization, a GPU implementation of NMC XVA computations requires various optimizations. This is illustrated on XVA computations involving equity, interest-rate, and credit derivatives, for both bilateral and central-clearing XVA metrics.
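A minimal sketch of the two-layer nested simulation and the square-root rule described above: each outer scenario triggers an inner simulation of roughly sqrt(n_outer) paths. The toy quantity E[(E[Y|X])+] with X ~ N(0,1) and Y|X ~ N(X,1), whose exact value is 1/sqrt(2*pi), about 0.3989, stands in for an XVA-style conditional expectation; it is not one of the paper's actual payoffs.

import numpy as np

rng = np.random.default_rng(0)
n_outer = 10_000
n_inner = int(np.sqrt(n_outer))           # square-root rule: 100 inner paths per scenario

x = rng.standard_normal(n_outer)                          # outer scenarios
y = x[:, None] + rng.standard_normal((n_outer, n_inner))  # inner paths, conditionally N(x, 1)
cond_mean = y.mean(axis=1)                                # inner estimate of E[Y | X]
nmc_estimate = np.maximum(cond_mean, 0.0).mean()          # outer average of the positive part

print(nmc_estimate, 1.0 / np.sqrt(2.0 * np.pi))           # NMC estimate vs exact value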

