An automatic implementation of the mixed precision in NEMO 4.2

Author(s):  
Stella Valentina Paronuzzi Ticco ◽  
Oriol Tintó Prims ◽  
Mario Acosta Cobos ◽  
Miguel Castrillo Melguizo

At the beginning of 2021 a mixed-precision version of the NEMO code was included in the official NEMO repository. The implementation followed the approach presented in Tintó et al. (2019). The proposed optimization, while far from trivial, is not new and is quite popular nowadays. For historical reasons, many computational models over-engineer the numerical precision, which leads to a sub-optimal exploitation of computational infrastructures. Correcting this mismatch yields a considerable payback in terms of efficiency and throughput: we are not only taking a step toward a more environmentally friendly science, we are sometimes pushing the horizon of experiment feasibility a little further. To include the necessary changes smoothly in the official release, an automatic workflow has been implemented: we attempt to minimize the number of changes required and, at the same time, maximize the number of variables that can be computed in single precision. Here we present a general sketch of the tool and workflow used.

Starting from the original code, we automatically produce a new version in which the user can specify the precision of each declared real variable. With this new executable, a numerical-precision analysis can be performed: a search algorithm specially designed for this task drives a workflow manager toward a list of variables that are safe to switch to single precision. The algorithm compares the result of each intermediate step of the workflow with reliable results from a double-precision version of the same code, detecting which variables need to retain higher accuracy.

The result of this analysis is then used to make the modifications needed in the code to produce the desired working mixed-precision version, while keeping the number of necessary changes low. Finally, the previous double-precision and the new mixed-precision versions are compared, including a computational comparison and a scientific validation, to prove that the new version can be used for operational configurations without losing accuracy while increasing the computational performance dramatically.
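The search step can be pictured as a divide-and-conquer loop over the list of real variables. The sketch below is illustrative only: `run_and_validate` is a hypothetical callable standing in for the workflow manager, assumed to run the emulated binary with the given set of variables in single precision (everything else in double) and to report whether all intermediate outputs stay within tolerance of the double-precision reference.

```python
from typing import Callable, List, Set


def find_double_precision_vars(all_vars: List[str],
                               run_and_validate: Callable[[Set[str]], bool]) -> Set[str]:
    """Divide-and-conquer search for variables that must stay in double precision.

    run_and_validate(single_vars) is assumed to run the emulated binary with
    `single_vars` in single precision (everything else in double) and return
    True if every intermediate output stays within tolerance of the
    double-precision reference.
    """
    must_stay_double: Set[str] = set()

    def search(candidates: List[str]) -> None:
        if not candidates:
            return
        if run_and_validate(set(candidates)):
            return                                   # whole group tolerates single precision
        if len(candidates) == 1:
            must_stay_double.add(candidates[0])      # isolated a precision-sensitive variable
            return
        mid = len(candidates) // 2                   # otherwise split the group and recurse
        search(candidates[:mid])
        search(candidates[mid:])

    search(list(all_vars))
    return must_stay_double
```

Groups that validate as a whole are accepted in one shot, so the number of model runs grows with the number of sensitive variables rather than with the total number of variables; a complete analysis would also re-check the combined accepted set for interactions between groups, which this simple sketch omits.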

2020 ◽  
Author(s):  
Oriol Tintó ◽  
Stella Valentina Paronuzzi Ticco ◽  
Mario C. Acosta ◽  
Miguel Castrillo ◽  
Kim Serradell ◽  
...  

One of the requirements for continuing to improve the science produced with NEMO is to enhance its computational performance. The interest in improving its capability to use the computational infrastructure efficiently is two-fold: on one side, there are experiments that would only be possible if a certain throughput threshold is achieved; on the other, any development that increases efficiency helps save resources while reducing the environmental impact of our experiments. One of the opportunities that has raised interest in the last few years is the optimization of numerical precision. Historical reasons led many computational models to over-engineer the numerical precision: correcting this mismatch can pay off in terms of efficiency and throughput. In this direction, research was carried out to safely reduce the numerical precision in NEMO, which led to a mixed-precision version of the model. The implementation has been developed following the approach proposed by Tintó et al. (2019), in which the variables that require double precision are identified automatically and the remaining ones are switched to single precision. The implementation will be released in 2020, and this work presents its evaluation in terms of both performance and scientific results.


2019 ◽  
Vol 12 (7) ◽  
pp. 3135-3148 ◽  
Author(s):  
Oriol Tintó Prims ◽  
Mario C. Acosta ◽  
Andrew M. Moore ◽  
Miguel Castrillo ◽  
Kim Serradell ◽  
...  

Abstract. Mixed-precision approaches can provide substantial speed-ups for both computing- and memory-bound codes with little effort. Most scientific codes have overengineered the numerical precision, leading to a situation in which models are using more resources than required without knowing where they are required and where they are not. Consequently, it is possible to improve computational performance by establishing a more appropriate choice of precision. The only input that is needed is a method to determine which real variables can be represented with fewer bits without affecting the accuracy of the results. This paper presents a novel method that enables modern and legacy codes to benefit from a reduction of the precision of certain variables without sacrificing accuracy. It consists of a simple idea: we reduce the precision of a group of variables and measure how it affects the outputs. Then we can evaluate the level of precision that they truly need. Modifying and recompiling the code for each case that has to be evaluated would require a prohibitive amount of effort. Instead, the method presented in this paper relies on the use of a tool called a reduced-precision emulator (RPE) that can significantly streamline the process. Using the RPE and a list of parameters containing the precisions that will be used for each real variable in the code, it is possible within a single binary to emulate the effect on the outputs of a specific choice of precision. When we are able to emulate the effects of reduced precision, we can proceed with the design of the tests that will give us knowledge of the sensitivity of the model variables regarding their numerical precision. The number of possible combinations is prohibitively large and therefore impossible to explore. The alternative of performing a screening of the variables individually can provide certain insight about the required precision of variables, but, on the other hand, other complex interactions that involve several variables may remain hidden. Instead, we use a divide-and-conquer algorithm that identifies the parts that require high precision and establishes a set of variables that can handle reduced precision. This method has been tested using two state-of-the-art ocean models, the Nucleus for European Modelling of the Ocean (NEMO) and the Regional Ocean Modeling System (ROMS), with very promising results. Obtaining this information is crucial to build an actual mixed-precision version of the code in the next phase that will bring the promised performance benefits.
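The emulation idea can be illustrated with a few lines of NumPy. This is not the RPE itself, which is a Fortran library with its own rounding behaviour and exponent handling; the function below is only a minimal stand-in that zeroes trailing significand bits of double-precision values.

```python
import numpy as np


def truncate_significand(x, bits):
    """Zero the trailing (52 - bits) significand bits of float64 values.

    Illustrative stand-in for a reduced-precision emulator: bits=52 keeps
    full double precision, bits=23 roughly mimics single precision.  The
    real RPE also rounds and restricts the exponent range, which this
    plain truncation does not.
    """
    x = np.ascontiguousarray(x, dtype=np.float64)
    mask = np.uint64(~((1 << (52 - bits)) - 1) & 0xFFFFFFFFFFFFFFFF)
    return (x.view(np.uint64) & mask).view(np.float64)


# Emulate a field held at single-precision significand width.
field = np.array([3.141592653589793, 2.718281828459045])
print(truncate_significand(field, 23))
```

In the actual workflow, wrapping selected real variables in such a reduced-precision type is what allows a single binary, driven by a list of per-variable precisions, to mimic many different precision choices without recompilation.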


2019 ◽  
Author(s):  
Oriol Tintó Prims ◽  
Mario C. Acosta ◽  
Andrew M. Moore ◽  
Miguel Castrillo ◽  
Kim Serradell ◽  
...  

Abstract. Mixed-precision approaches can provide substantial speed-ups for both computing- and memory-bound codes with little effort. Most scientific codes have overengineered the numerical precision, leading to a situation where models are using more resources than required without knowing where these resources are unnecessary and where they are really needed. Consequently, there is the possibility of obtaining performance benefits from a more appropriate choice of precision, and the only thing that is needed is a method to determine which real variables can be represented with fewer bits without affecting the accuracy of the results. This paper presents a novel method to enable modern and legacy codes to benefit from a reduction of precision without sacrificing accuracy. It consists of a simple idea: if we can measure how reducing the precision of a group of variables affects the outputs, we can evaluate the level of precision this group of variables needs. Modifying and recompiling the code for each case that has to be evaluated would require a prohibitive amount of effort. Instead, the method presented in this paper relies on the use of a tool called the Reduced Precision Emulator (RPE) that can significantly streamline the process. Using the RPE and a list of parameters containing the precisions that will be used for each real variable in the code, it is possible within a single binary to emulate the effect on the outputs of a specific choice of precision. Once we are able to emulate the effects of reduced precision, we can proceed with the design of the tests required to obtain knowledge about all the variables in the model. The number of possible combinations is prohibitively large and impossible to explore. The alternative of performing a screening of the variables individually can give some insight into the precision needed by each variable, but more complex interactions that involve several variables may remain hidden. Instead, we use a divide-and-conquer algorithm that identifies the parts that cannot handle reduced precision and builds a set of variables that can. The method has been put to the test using two state-of-the-art ocean models, NEMO and ROMS, with very promising results. Obtaining this information is crucial for subsequently building an actual mixed-precision version of the code that will bring the promised performance benefits.


2020 ◽  
Author(s):  
Jiayi Lai

The next generation of weather and climate models will have an unprecedented level of resolution and model complexity, which also increases the requirements on computation and memory speed. Reducing the precision of certain variables and using mixed-precision methods in atmospheric models can greatly improve computing and memory speed. However, in order to ensure the accuracy of the results, most models have over-designed their numerical precision, with the result that the resources occupied are much larger than those actually required. Previous studies have shown that the precision necessary for an accurate weather model has a clear scale dependence, with large spatial scales requiring higher precision than small scales; even at large scales the necessary precision is far below that of double precision. However, it is difficult to find a principled method for assigning different precisions to different variables so as to avoid unnecessary waste. This paper takes CESM1.2.1 as a research object, conducts a large number of reduced-precision tests, and proposes a new discrimination method similar to the CFL criterion. This method allows the verification of individual variables, thereby determining which variables can use a lower level of precision without degrading the accuracy of the results.


2020 ◽  
Vol 148 (4) ◽  
pp. 1541-1552 ◽  
Author(s):  
Sam Hatfield ◽  
Andrew McRae ◽  
Tim Palmer ◽  
Peter Düben

Abstract The use of single-precision arithmetic in ECMWF’s forecasting model gave a 40% reduction in wall-clock time over double precision, with no decrease in forecast quality. However, the use of reduced precision in 4D-Var data assimilation is relatively unexplored, and there are potential issues with using single precision in the tangent-linear and adjoint models. Here, we present the results of reducing numerical precision in an incremental 4D-Var data assimilation scheme, with an underlying two-layer quasigeostrophic model. The minimizer used is the conjugate gradient method. We show how reducing precision increases the asymmetry between the tangent-linear and adjoint models. For ill-conditioned problems, this leads to a loss of orthogonality among the residuals of the conjugate gradient algorithm, which slows the convergence of the minimization procedure. However, we also show that a standard technique, reorthogonalization, eliminates these issues and therefore could allow the use of single-precision arithmetic. This work is carried out within ECMWF’s data assimilation framework, the Object-Oriented Prediction System.
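The reorthogonalization mentioned above can be sketched for a generic symmetric positive-definite system: keep the normalized residuals and project each new residual against them (modified Gram-Schmidt) before continuing. The Python sketch below is illustrative and is not the ECMWF OOPS code; the incremental 4D-Var problem is replaced by a plain system `A`, `b`.

```python
import numpy as np


def cg_reorth(A, b, tol=1e-6, max_iter=200, dtype=np.float32):
    """Conjugate gradients with explicit reorthogonalization of the residuals.

    In exact arithmetic the residuals are mutually orthogonal; in reduced
    precision this degrades for ill-conditioned A and slows convergence.
    Projecting each new residual against the stored, normalized residuals
    (modified Gram-Schmidt) restores orthogonality at the cost of extra
    memory and dot products.
    """
    A = A.astype(dtype)
    b = b.astype(dtype)
    x = np.zeros_like(b)
    r = b - A @ x
    p = r.copy()
    residuals = [r / np.linalg.norm(r)]              # normalized residual history
    for _ in range(max_iter):
        Ap = A @ p
        alpha = (r @ r) / (p @ Ap)
        x = x + alpha * p
        r_new = r - alpha * Ap
        for q in residuals:                          # reorthogonalization step
            r_new = r_new - (q @ r_new) * q
        if np.linalg.norm(r_new) < tol * np.linalg.norm(b):
            break
        beta = (r_new @ r_new) / (r @ r)
        p = r_new + beta * p
        residuals.append(r_new / np.linalg.norm(r_new))
        r = r_new
    return x


A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
print(cg_reorth(A, b))   # ~ [0.0909, 0.6364]
```

The price is one stored vector and a handful of extra dot products per iteration, the overhead traded against being able to run the minimization in reduced precision.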


Geophysics ◽  
1996 ◽  
Vol 61 (2) ◽  
pp. 357-364 ◽  
Author(s):  
Horst Holstein ◽  
Ben Ketteridge

Analytical formulas for the gravity anomaly of a uniform polyhedral body are subject to numerical error that increases with distance from the target, while the anomaly decreases. This leads to a limited range of target distances in which the formulas are operational, beyond which the calculations are dominated by rounding error. We analyze the sources of error and propose a combination of numerical and analytical procedures that exhibit advantages over existing methods, namely (1) errors that diminish with distance, (2) enhanced operating range, and (3) algorithmic simplicity. The latter is achieved by avoiding the need to transform coordinates and the need to discriminate between projected observation points that lie inside, on, or outside a target facet boundary. Our error analysis is verified in computations based on a published code and on a code implementing our methods. The former requires a numerical precision of one part in [Formula: see text] (double precision) in problems of geophysical interest, whereas our code requires a precision of one part in [Formula: see text] (single precision) to give comparable results, typically in half the execution time.


Author(s):  
Diego Rossinelli ◽  
Christian Conti ◽  
Petros Koumoutsakos

Particle–mesh interpolations are fundamental operations for particle-in-cell codes, as implemented in vortex methods, plasma dynamics and electrostatics simulations. In these simulations, the mesh is used to solve the field equations and the gradients of the fields are used in order to advance the particles. The time integration of particle trajectories is performed through an extensive resampling of the flow field at the particle locations. The computational performance of this resampling turns out to be limited by the memory bandwidth of the underlying computer architecture. We investigate how mesh–particle interpolation can be efficiently performed on graphics processing units (GPUs) and multicore central processing units (CPUs), and we present two implementation techniques. The single-precision results for the multicore CPU implementation show an acceleration of 45–70×, depending on system size, and an acceleration of 85–155× for the GPU implementation over an efficient single-threaded C++ implementation. In double precision, we observe a performance improvement of 30–40× for the multicore CPU implementation and 20–45× for the GPU implementation. With respect to the 16-threaded standard C++ implementation, the present CPU technique leads to a performance increase of roughly 2.8–3.7× in single precision and 1.7–2.4× in double precision, whereas the GPU technique leads to an improvement of 9× in single precision and 2.2–2.8× in double precision.
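Stripped of the kernel order and dimensionality, the mesh-to-particle resampling is a gather weighted by interpolation coefficients. The sketch below shows the 1-D linear case in NumPy purely to fix ideas; the implementations discussed in the paper are multi-dimensional, higher order, and tuned for GPU and multicore memory systems.

```python
import numpy as np


def gather_linear_1d(field, h, xp):
    """Sample a 1-D mesh field at particle positions with linear (hat) weights.

    field -- values on mesh nodes located at i * h, i = 0 .. field.size - 1
    xp    -- particle positions inside the mesh
    """
    s = xp / h                                              # positions in mesh units
    i = np.clip(np.floor(s).astype(np.int64), 0, field.size - 2)
    w = s - i                                               # fractional offset from node i
    return (1.0 - w) * field[i] + w * field[i + 1]          # memory-bound gather


# A single-precision field on 11 nodes, sampled at two particle locations.
mesh = np.linspace(0.0, 1.0, 11, dtype=np.float32) ** 2
print(gather_linear_1d(mesh, h=0.1, xp=np.array([0.25, 0.73], dtype=np.float32)))
```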


2018 ◽  
Author(s):  
Pavel Pokhilko ◽  
Evgeny Epifanovsky ◽  
Anna I. Krylov

Using a single-precision floating-point representation reduces the size of data and computation time by a factor of two relative to the double precision conventionally used in electronic structure programs. For large-scale calculations, such as those encountered in many-body theories, the reduced memory footprint alleviates memory and input/output bottlenecks. The reduced size of data can lead to additional gains due to improved parallel performance on CPUs and various accelerators. However, using single precision can potentially reduce the accuracy of computed observables. Here we report an implementation of coupled-cluster and equation-of-motion coupled-cluster methods with single and double excitations in single precision. We consider both the standard implementation and one using Cholesky decomposition or resolution-of-the-identity representations of the electron-repulsion integrals. Numerical tests illustrate that when single precision is used in correlated calculations, the loss of accuracy is insignificant, and a pure single-precision implementation can be used for computing energies, analytic gradients, excited states, and molecular properties. In addition to pure single-precision calculations, our implementation allows one to follow a single-precision calculation with clean-up iterations, fully recovering double-precision results while retaining significant savings.
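The clean-up strategy can be illustrated on a far simpler fixed-point iteration than the coupled-cluster amplitude equations: converge in single precision first, then restart the same iteration in double precision from the single-precision answer. The Jacobi solver below is only an analogy under that framing and is not taken from the paper's implementation.

```python
import numpy as np


def jacobi(A, b, x0, tol, max_iter, dtype):
    """Plain Jacobi iterations carried out at the requested precision."""
    A = A.astype(dtype)
    b = b.astype(dtype)
    x = x0.astype(dtype)
    D = np.diag(A)                        # diagonal part
    R = A - np.diag(D)                    # off-diagonal part
    for _ in range(max_iter):
        x_new = (b - R @ x) / D
        if np.linalg.norm(x_new - x) < tol * np.linalg.norm(b):
            return x_new
        x = x_new
    return x


# Converge cheaply in single precision, then run a few "clean-up"
# iterations in double precision starting from the single-precision answer.
rng = np.random.default_rng(0)
A = np.eye(50) * 10 + rng.standard_normal((50, 50)) * 0.1   # diagonally dominant
b = rng.standard_normal(50)
x_sp = jacobi(A, b, np.zeros(50), tol=1e-6, max_iter=200, dtype=np.float32)
x_dp = jacobi(A, b, x_sp, tol=1e-14, max_iter=200, dtype=np.float64)
print(np.linalg.norm(A @ x_dp - b))      # residual at full double precision
```

Because the double-precision phase starts close to the solution, only a few extra iterations are needed to recover full accuracy, which mirrors the savings reported for the clean-up iterations.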


2021 ◽  
Vol 21 (11) ◽  
pp. 281
Author(s):  
Qiao Wang ◽  
Chen Meng

Abstract We present a GPU-accelerated cosmological simulation code, PhotoNs-GPU, based on the Particle-Mesh Fast Multipole Method (PM-FMM), and focus on GPU utilization and optimization. A suitable interpolation method for the truncated gravity is introduced to speed up the special functions in the kernels. We verify the GPU code in mixed precision and at different levels of the interpolation method on the GPU. A run with single precision is roughly two times faster than double precision for current practical cosmological simulations, but it can introduce a small unbiased noise in the power spectrum. Compared with the CPU version of PhotoNs and Gadget-2, the efficiency of the new code is significantly improved. With all the optimizations of memory access, kernel functions and concurrency management activated, the peak performance of our test runs reaches 48% of the theoretical speed and the average performance approaches ∼35% on the GPU.


2020 ◽  
Author(s):  
Alessandro Cotronei ◽  
Thomas Slawig

Abstract. We converted the radiation part of the atmospheric model ECHAM to single-precision arithmetic. We analyzed different conversion strategies and finally used a step-by-step change of all modules, subroutines and functions. We found that a small code portion still requires higher-precision arithmetic. We generated code that can be easily changed from double to single precision and vice versa, essentially via a simple switch in one module. We compared the output of the single-precision version at coarse resolution with observational data and with the original double-precision code. The results of both versions are comparable. We extensively tested different parallelization options with respect to the possible performance gain, in both coarse and low resolution. The single-precision radiation itself was accelerated by about 40%, whereas the speed-up for the whole ECHAM model using the converted radiation reached 18% in the best configuration. We further measured the energy consumption, which could also be reduced.
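A precision switch concentrated in one module can be mimicked in a few lines; the snippet below is a hypothetical Python analogue of a Fortran working-precision kind parameter, not ECHAM code.

```python
# precision.py -- the single place where the working precision is chosen,
# loosely analogous to a Fortran working-precision kind parameter.
# All names here are illustrative, not taken from ECHAM.
import numpy as np

USE_SINGLE = True
WP = np.float32 if USE_SINGLE else np.float64   # working precision
DP = np.float64                                 # for the few parts that must stay double


def zeros(shape):
    """Allocate model arrays in the working precision."""
    return np.zeros(shape, dtype=WP)
```

Every array allocation and constant then refers to `WP`, so flipping `USE_SINGLE` converts the whole code at once, except for the code paths explicitly kept in `DP`.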

