Massive-Parallel Trajectory Calculations version 2.2 (MPTRAC-2.2): Lagrangian transport simulations on Graphics Processing Units (GPUs)

2021 ◽  
Author(s):  
Lars Hoffmann ◽  
Paul F. Baumeister ◽  
Zhongyin Cai ◽  
Jan Clemens ◽  
Sabine Griessbach ◽  
...  

Abstract. Lagrangian models are fundamental tools to study atmospheric transport processes and for practical applications such as dispersion modeling for anthropogenic and natural emission sources. However, conducting large-scale Lagrangian transport simulations with millions of air parcels or more can become numerically rather costly. In this study, we assessed the potential of exploiting graphics processing units (GPUs) to accelerate Lagrangian transport simulations. We ported the Massive-Parallel Trajectory Calculations (MPTRAC) model to GPUs using the open accelerator (OpenACC) programming model. The trajectory calculations conducted within the MPTRAC model were fully ported to GPUs, i.e., except for feeding in the meteorological input data and for extracting the particle output data, the code operates entirely on the GPU devices without frequent data transfers between CPU and GPU memory. Model verification, performance analyses, and scaling tests of the MPI/OpenMP/OpenACC hybrid parallelization of MPTRAC were conducted on the JUWELS Booster supercomputer operated by the Jülich Supercomputing Centre, Germany. The JUWELS Booster comprises 3744 NVIDIA A100 Tensor Core GPUs, providing a peak performance of 71.0 PFlop/s. As of June 2021, it is the most powerful supercomputer in Europe and listed among the most energy-efficient systems internationally. For large-scale simulations comprising 10⁸ particles driven by the European Centre for Medium-Range Weather Forecasts' ERA5 reanalysis, the performance evaluation showed a maximum speedup of a factor of 16 due to the utilization of GPUs compared to CPU-only runs on the JUWELS Booster. In the large-scale GPU run, about 67 % of the runtime is spent on the physics calculations, conducted on the GPUs. Another 15 % of the runtime is required for file I/O, mostly to read the large ERA5 data set from disk. Meteorological data preprocessing on the CPUs also requires about 15 % of the runtime. Although this study identified potential for further improvements of the GPU code, we consider the MPTRAC model ready for production runs on the JUWELS Booster in its present form. The GPU code provides a much faster time to solution than the CPU code, which is particularly relevant for near-real-time applications of a Lagrangian transport model.
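To illustrate the porting approach the abstract describes, below is a minimal sketch of a GPU-resident advection step using OpenACC directives in C. The function and variable names (advect, npart, lon, lat, u, v, dt) are illustrative assumptions, not MPTRAC's actual data structures, and the simple Euler step stands in for the model's real integration scheme.

/* Minimal sketch of GPU-resident trajectory advection with OpenACC.
   Names are illustrative, not MPTRAC's actual data structures. */
#include <stddef.h>

void advect(size_t npart, double *restrict lon, double *restrict lat,
            const double *restrict u, const double *restrict v, double dt)
{
  /* Particle arrays are assumed resident on the GPU between time steps
     (e.g., via an earlier "acc enter data" region), so only input and
     output stages copy data between CPU and GPU memory. */
  #pragma acc parallel loop present(lon[0:npart], lat[0:npart], \
                                    u[0:npart], v[0:npart])
  for (size_t ip = 0; ip < npart; ip++) {
    lon[ip] += dt * u[ip];   /* simple Euler step; the real model   */
    lat[ip] += dt * v[ip];   /* uses a higher-order scheme          */
  }
}

Because the loop carries no dependencies between particles, each iteration maps to one GPU thread, which is what makes trajectory calculations such a good fit for this hardware.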


2019 ◽  
Author(s):  
Wout Bittremieux ◽  
Kris Laukens ◽  
William Stafford Noble

Abstract. Open modification searching (OMS) is a powerful search strategy to identify peptides with any type of modification. OMS works by using a very wide precursor mass window to allow modified spectra to match against their unmodified variants, after which the modification types can be inferred from the corresponding precursor mass differences. A disadvantage of this strategy, however, is the large computational cost, because each query spectrum has to be compared against a multitude of candidate peptides. We have previously introduced the ANN-SoLo tool for fast and accurate open spectral library searching. ANN-SoLo uses approximate nearest neighbor indexing to speed up OMS by selecting only a limited number of the most relevant library spectra to compare to an unknown query spectrum. Here we demonstrate how this candidate selection procedure can be further optimized using graphics processing units. Additionally, we introduce a feature hashing scheme to convert high-resolution spectra to low-dimensional vectors. Based on these algorithmic advances, along with low-level code optimizations, the new version of ANN-SoLo is up to an order of magnitude faster than its initial version. This makes it possible to efficiently perform open searches on a large scale to gain a deeper understanding about the protein modification landscape. We demonstrate the computational efficiency and identification performance of ANN-SoLo based on a large data set of the draft human proteome. ANN-SoLo is implemented in Python and C++. It is freely available under the Apache 2.0 license at https://github.com/bittremieux/ANN-SoLo.
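As a rough illustration of the feature hashing idea, the sketch below bins each peak's m/z value at fine resolution and hashes the bin index into a fixed low-dimensional vector (shown in C for illustration only; ANN-SoLo itself is implemented in Python and C++). The hash function, bin width, and vector length are assumptions, not ANN-SoLo's actual choices.

/* Hypothetical sketch of feature hashing for mass spectra: each peak is
   discretized into a fine m/z bin, and the bin index is hashed into a
   much smaller vector, so near-identical spectra yield similar vectors. */
#include <stdint.h>
#include <string.h>

#define HASH_DIM 800   /* low-dimensional target vector length (assumed) */

/* Simple multiplicative integer hash; the real tool may use another. */
static uint32_t hash_bin(uint32_t bin) {
  return (bin * 2654435761u) % HASH_DIM;
}

void hash_spectrum(const double *mz, const float *intensity, int npeaks,
                   double bin_width, float vec[HASH_DIM])
{
  memset(vec, 0, sizeof(float) * HASH_DIM);
  for (int i = 0; i < npeaks; i++) {
    uint32_t bin = (uint32_t)(mz[i] / bin_width); /* fine-grained bin  */
    vec[hash_bin(bin)] += intensity[i];           /* collisions add up */
  }
}

The payoff is that the nearest neighbor index operates on short dense vectors of fixed length rather than on sparse high-resolution spectra, at the cost of occasional hash collisions.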


Author(s):  
Alan Gray ◽  
Kevin Stratford

Leading high performance computing systems achieve their status through the use of highly parallel devices such as NVIDIA graphics processing units or Intel Xeon Phi many-core CPUs. The concept of performance portability across such architectures, as well as traditional CPUs, is vital for the application programmer. In this paper we describe targetDP, a lightweight abstraction layer which allows grid-based applications to target data parallel hardware in a platform agnostic manner. We demonstrate the effectiveness of our pragmatic approach by presenting performance results for a complex fluid application (with which the model was co-designed), plus a separate lattice quantum chromodynamics particle physics code. For each application, a single source code base is seen to achieve portable performance, as assessed within the context of the Roofline model. TargetDP can be combined with the Message Passing Interface (MPI) to allow use on systems containing multiple nodes: we demonstrate this through provision of scaling results on traditional and graphics processing unit-accelerated large scale supercomputers.
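The following hedged sketch conveys the core idea behind such an abstraction layer: a single loop construct that compiles to a threaded CPU loop under one build configuration and a device-parallel loop under another. The macro names here are invented for illustration and do not reflect targetDP's real API.

/* Sketch of a lightweight performance-portability layer: one loop macro,
   two compilation paths. Macro names are invented for illustration. */
#ifdef TARGET_GPU
  /* On an accelerator build, map iterations to device threads. */
  #define PARALLEL_FOR(i, n) \
    _Pragma("acc parallel loop") for (int i = 0; i < (n); i++)
#else
  /* On a CPU build, the same source expands to an OpenMP loop. */
  #define PARALLEL_FOR(i, n) \
    _Pragma("omp parallel for") for (int i = 0; i < (n); i++)
#endif

void scale_grid(int n, double *field, double alpha) {
  PARALLEL_FOR(ip, n) {
    field[ip] *= alpha;   /* identical source for both architectures */
  }
}

The application code is written once against the macro; the choice of hardware target is deferred to compile time, which is what makes the single source code base portable.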


2009 ◽  
Vol 9 (19) ◽  
pp. 7313-7323 ◽  
Author(s):  
H. Wang ◽  
D. J. Jacob ◽  
M. Kopacz ◽  
D. B. A. Jones ◽  
P. Suntharalingam ◽  
...  

Abstract. Inverse modeling of CO₂ satellite observations to better quantify carbon surface fluxes requires a chemical transport model (CTM) to relate the fluxes to the observed column concentrations. CTM transport error is a major source of uncertainty. We show that its effect can be reduced by using CO satellite observations as an additional constraint in a joint CO₂-CO inversion. CO is measured from space with high precision, is strongly correlated with CO₂, and is more sensitive than CO₂ to CTM transport errors on synoptic and smaller scales. Exploiting this constraint requires statistics for the CTM transport error correlation between CO₂ and CO, which is significantly different from the correlation between the concentrations themselves. We estimate the error correlation globally and for different seasons by a paired-model method (comparing GEOS-Chem CTM simulations of CO₂ and CO columns using different assimilated meteorological data sets for the same meteorological year) and a paired-forecast method (comparing 48- vs. 24-h GEOS-5 CTM forecasts of CO₂ and CO columns for the same forecast time). We find strong error correlations (r² > 0.5) between CO₂ and CO columns over much of the extra-tropical Northern Hemisphere throughout the year, and strong consistency between different methods to estimate the error correlation. Application of the averaging kernels used in the retrieval for thermal IR CO measurements weakens the correlation coefficients by 15 % on average (mostly due to variability in the averaging kernels) but preserves the large-scale correlation structure. We present a simple inverse modeling application to demonstrate that CO₂-CO error correlations can indeed significantly reduce uncertainty on surface carbon fluxes in a joint CO₂-CO inversion vs. a CO₂-only inversion.
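The paired-model error estimate boils down to correlating column differences between two simulations of the same year. A minimal sketch in C, with illustrative names: given per-grid-cell differences dco2 and dco between the two model runs, compute their Pearson correlation.

/* Sketch of the paired-model error-correlation estimate: correlate the
   CO₂ and CO column differences between two runs of the same year that
   use different assimilated meteorology. Names are illustrative. */
#include <math.h>

double error_correlation(const double *dco2, const double *dco, int n)
{
  double sx = 0, sy = 0, sxx = 0, syy = 0, sxy = 0;
  for (int i = 0; i < n; i++) {
    sx  += dco2[i];            sy  += dco[i];
    sxx += dco2[i] * dco2[i];  syy += dco[i] * dco[i];
    sxy += dco2[i] * dco[i];
  }
  double cov = sxy - sx * sy / n;   /* un-normalized covariance */
  double vx  = sxx - sx * sx / n;   /* un-normalized variances   */
  double vy  = syy - sy * sy / n;
  return cov / sqrt(vx * vy);       /* r; the paper reports r² > 0.5 */
}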


2006 ◽  
Vol 41 (1) ◽  
pp. 24-36 ◽  
Author(s):  
Karl-Erich Lindenschmidt ◽  
René Wodrich ◽  
Cornelia Hesse

Abstract. A hypothesis stating that more complex descriptions of processes in models simulate reality better (less error) but with less reliable predictability (more sensitivity) is tested using a river water quality model. This hypothesis was extended by stating that applying the model on a domain of smaller scale requires greater complexity to capture the same accuracy as in large-scale model applications, which, however, leads to increased model sensitivity. The sediment and pollutant transport model TOXI, a module in the WASP5 package, was applied to two case studies of different scale: a 90-km course of the 5th-order (sensu Strahler 1952) lower Saale river, Germany (large scale), and the lock-and-weir system at Calbe (small scale) situated on the same river course. A sensitivity analysis of several parameters relating to the physical and chemical transport processes of suspended solids, chloride, arsenic, iron and zinc shows that the coefficient that partitions the total heavy metal mass into its dissolved and sorbed fractions is a very sensitive parameter. Hence, the complexity of the sorptive process was varied to test the hypotheses.
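For context, the partitioning that the sensitivity analysis identifies as critical is commonly expressed through the linear-equilibrium relation between a distribution coefficient Kd and the suspended solids concentration. The sketch below shows this textbook relation; it is not necessarily TOXI's exact formulation.

/* Standard linear-equilibrium partitioning of a total heavy metal mass
   into dissolved and sorbed fractions, given a distribution coefficient
   kd (L/kg) and suspended solids concentration ss (kg/L). A textbook
   relation, not necessarily TOXI's exact code. */
double dissolved_fraction(double kd, double ss)
{
  return 1.0 / (1.0 + kd * ss);             /* f_d = 1 / (1 + Kd*SS)      */
}

double sorbed_fraction(double kd, double ss)
{
  return 1.0 - dissolved_fraction(kd, ss);  /* f_s = Kd*SS / (1 + Kd*SS)  */
}

Because Kd enters the fractions nonlinearly, small changes in its value can shift a large share of the metal mass between the dissolved and particle-bound phases, which is consistent with the high sensitivity the analysis reports.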


2020 ◽  
Author(s):  
Vera Thiemig ◽  
Peter Salamon ◽  
Goncalo N. Gomes ◽  
Jon O. Skøien ◽  
Markus Ziese ◽  
...  

We present EMO-5, a pan-European high-resolution (5 km), (sub-)daily, multi-variable meteorological data set developed especially to meet the needs of an operational, pan-European hydrological service (EFAS; European Flood Awareness System). The data set is built on historic and real-time observations from 18,964 meteorological in-situ stations, collected from 24 data providers, and 10,632 virtual stations from four high-resolution regional observational grids (CombiPrecip, ZAMG - INCA, EURO4M-APGD and CarpatClim) as well as one global reanalysis product (ERA-Interim-land). This multi-variable data set covers precipitation, temperature (average, minimum and maximum), wind speed, solar radiation and vapor pressure, all at daily resolution, with additional 6-hourly resolution for precipitation and average temperature. The original observations were thoroughly quality controlled before we used the Spheremap interpolation method to estimate the variable values for each of the 5 x 5 km grid cells and their associated uncertainty. EMO-5 v1 grids covering the time period from 1990 to 2019 will be released as a free and open Copernicus product in mid-2020 (with near-real-time release of the latest gridded observations to follow). We would like to present the great potential EMO-5 holds for the hydrological modelling community.

Footnote: EMO = European Meteorological Observations
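As a simplified illustration of distance-weighted gridding on a sphere, the sketch below computes an inverse-distance-weighted estimate over great-circle distances. The actual Spheremap scheme, a modified Shepard method, additionally applies directional weighting and gradient corrections that are omitted here; all names and parameters are illustrative.

/* Simplified sketch of the core idea behind Spheremap-style gridding:
   each grid cell gets a weighted mean of nearby station values, with
   weights decaying with great-circle distance. The real scheme adds
   directional weighting and gradient terms omitted here. */
#include <math.h>

#define R_EARTH 6371.0  /* mean Earth radius in km */

/* Great-circle distance between two points given in radians. */
static double gc_dist(double lat1, double lon1, double lat2, double lon2)
{
  double c = sin(lat1) * sin(lat2)
           + cos(lat1) * cos(lat2) * cos(lon1 - lon2);
  return R_EARTH * acos(fmax(-1.0, fmin(1.0, c)));  /* clamp rounding */
}

/* Inverse-distance estimate at (lat0, lon0) from n station values. */
double idw_estimate(double lat0, double lon0, const double *lat,
                    const double *lon, const double *val, int n)
{
  double wsum = 0.0, vsum = 0.0;
  for (int i = 0; i < n; i++) {
    double d = gc_dist(lat0, lon0, lat[i], lon[i]);
    double w = 1.0 / ((d + 1e-6) * (d + 1e-6));  /* avoid divide by zero */
    wsum += w;
    vsum += w * val[i];
  }
  return vsum / wsum;
}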

