Single precision arithmetic in ECHAM radiation reduces runtime and energy consumption


2020 ◽  
Vol 13 (6) ◽  
pp. 2783-2804 ◽  
Author(s):  
Alessandro Cotronei ◽  
Thomas Slawig

Abstract. We converted the radiation part of the atmospheric model ECHAM to single-precision arithmetic. We analyzed different conversion strategies and finally used a step-by-step change of all modules, subroutines and functions. We found that a small code portion still requires higher-precision arithmetic. We generated code that can easily be changed from double to single precision and vice versa, essentially via a single switch in one module. We compared the output of the single-precision version at the coarse resolution with observational data and with the original double-precision code; the results of the two versions are comparable. We extensively tested different parallelization options with respect to the possible runtime reduction, at both coarse and low resolution. The single-precision radiation itself was accelerated by about 40 %, whereas the runtime reduction for the whole ECHAM model using the converted radiation reached 18 % in the best configuration. We further measured the energy consumption, which could also be reduced.
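
The "single switch in one module" pattern can be sketched independently of ECHAM itself. The paper's code is Fortran and selects a real kind in one module; the NumPy sketch below (with the assumed name `wp` for the working precision) only illustrates the same idea of concentrating the precision choice in one place.

```python
import numpy as np

# Minimal sketch of a one-module precision switch (the actual ECHAM code is
# Fortran and selects a real kind in a single module; `wp` is an assumed name).
USE_SINGLE_PRECISION = True                                # the single switch
wp = np.float32 if USE_SINGLE_PRECISION else np.float64    # working precision

def toy_radiative_attenuation(optical_depth):
    """Any routine builds its arrays in the working precision `wp`."""
    tau = np.asarray(optical_depth, dtype=wp)
    return np.exp(-tau)                                    # stays in `wp`

print(toy_radiative_attenuation([0.1, 1.0, 10.0]).dtype)   # float32 with the switch on
```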



2021 ◽  
Author(s):  
Sam Hatfield ◽  
Kristian Mogensen ◽  
Peter Dueben ◽  
Nils Wedi ◽  
Michail Diamantakis

Earth-System models traditionally use double-precision, 64-bit floating-point numbers to perform arithmetic. According to orthodoxy, we must use such a relatively high level of precision in order to minimise the potential impact of rounding errors on the physical fidelity of the model. However, given the inherently imperfect formulation of our models and the computational benefits of lower-precision arithmetic, we must question this orthodoxy. At ECMWF, a single-precision, 32-bit variant of the atmospheric model IFS has been undergoing rigorous testing in preparation for operations for around 5 years. The single-precision simulations have been found to have effectively the same forecast skill as the double-precision simulations while finishing in 40% less time, thanks to the memory and cache benefits of single-precision numbers. Following these positive results, other modelling groups are now also considering single precision as a way to accelerate their simulations.

In this presentation I will explain the rationale behind the move to lower-precision floating-point arithmetic and give up-to-date results from the single-precision atmospheric model at ECMWF, which will be operational imminently. I will then provide an update on the development of the single-precision ocean component at ECMWF, based on the NEMO ocean model, including a verification of quarter-degree simulations. I will also present new results from running ECMWF's coupled atmosphere-ocean-sea-ice-wave forecasting system entirely in single precision. Finally, I will discuss the feasibility of even lower levels of precision, such as half precision, which are now becoming available through GPU- and ARM-based systems such as Summit and Fugaku, respectively. The use of reduced-precision floating-point arithmetic will be an essential consideration for developing high-resolution, storm-resolving Earth-System models.
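
To make the precision trade-off mentioned above concrete, the generic NumPy snippet below (unrelated to the IFS or NEMO code bases) prints the rounding granularity and range of the double-, single- and half-precision formats discussed in the abstract.

```python
import numpy as np

# Rounding unit and dynamic range of the three IEEE formats discussed above
# (a generic NumPy illustration, not code from IFS or NEMO).
for dtype in (np.float64, np.float32, np.float16):
    info = np.finfo(dtype)
    print(f"{info.dtype}: machine eps = {info.eps:.3e}, max = {info.max:.3e}, bits = {info.bits}")
```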



2014 ◽  
Vol 2014 ◽  
pp. 1-8
Author(s):  
Hasitha Muthumala Waidyasooriya ◽  
Masanori Hariyama ◽  
Yasuhiro Takei ◽  
Michitaka Kameyama

Acceleration of FDTD (finite-difference time-domain) computation is very important in fields such as computational electromagnetic simulation. We consider an FDTD simulation model for cylindrical resonator design that requires double-precision floating-point arithmetic and cannot be carried out in single precision alone. Conventional FDTD acceleration methods share a common problem of memory-bandwidth limitation due to the large amount of parallel data access. To overcome this problem, we propose a hybrid single- and double-precision floating-point computation method that reduces the amount of data transfer. We analyze the characteristics of the FDTD simulation to determine where single precision can be used instead of double precision. According to the experimental results, we achieved a speed-up of over 15 times compared to a single-core CPU implementation and of over 1.52 times compared to a conventional GPU-based implementation.
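
A minimal illustration of the hybrid idea, not the paper's cylindrical-resonator or GPU code: in the 1D FDTD sketch below the field arrays are stored in float32 to reduce memory traffic, while one update is promoted to float64 before the result is stored back. All array names and parameters are assumptions for the example.

```python
import numpy as np

# 1D FDTD sketch of a single/double hybrid: fields live in float32 to cut
# memory traffic; the E-field update is carried out in float64 and stored back.
nx, nt = 200, 500
c = 0.5                                   # Courant number (assumed)
ez = np.zeros(nx, dtype=np.float32)       # stored in single precision
hy = np.zeros(nx - 1, dtype=np.float32)

for n in range(nt):
    hy += np.float32(c) * np.diff(ez)     # pure single-precision update
    # promote to double for the sensitive update, then store back as float32
    ez_inner = ez[1:-1].astype(np.float64) + c * np.diff(hy.astype(np.float64))
    ez[1:-1] = ez_inner.astype(np.float32)
    ez[nx // 2] += np.exp(-((n - 30) ** 2) / 100.0)   # soft Gaussian source

print(ez.dtype, float(np.abs(ez).max()))
```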



2018 ◽  
Author(s):  
Pavel Pokhilko ◽  
Evgeny Epifanovsky ◽  
Anna I. Krylov

Using single-precision floating-point representation reduces the size of data and the computation time by a factor of two relative to the double precision conventionally used in electronic-structure programs. For large-scale calculations, such as those encountered in many-body theories, the reduced memory footprint alleviates memory and input/output bottlenecks. The reduced size of data can lead to additional gains due to improved parallel performance on CPUs and various accelerators. However, using single precision can potentially reduce the accuracy of computed observables. Here we report an implementation of coupled-cluster and equation-of-motion coupled-cluster methods with single and double excitations in single precision. We consider both the standard implementation and one using Cholesky decomposition or resolution-of-the-identity representations of the electron-repulsion integrals. Numerical tests illustrate that when single precision is used in correlated calculations, the loss of accuracy is insignificant, and a pure single-precision implementation can be used for computing energies, analytic gradients, excited states, and molecular properties. In addition to pure single-precision calculations, our implementation allows one to follow a single-precision calculation with clean-up iterations, fully recovering double-precision results while retaining significant savings.
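
The "clean-up iterations" pattern is closely related to mixed-precision iterative refinement. The generic linear-solve sketch below (not the authors' coupled-cluster code) shows the shape of the idea: obtain a cheap single-precision solution, then correct it with residuals evaluated in double precision.

```python
import numpy as np

# Mixed-precision refinement sketch: solve in single precision, then apply
# "clean-up" iterations with double-precision residuals. A generic
# linear-algebra analogue of the strategy described above, not CCSD code.
rng = np.random.default_rng(0)
n = 200
A64 = rng.standard_normal((n, n)) + n * np.eye(n)    # well-conditioned test matrix
b64 = rng.standard_normal(n)

A32, b32 = A64.astype(np.float32), b64.astype(np.float32)
x = np.linalg.solve(A32, b32).astype(np.float64)     # cheap single-precision solve

for it in range(5):                                  # clean-up iterations
    r = b64 - A64 @ x                                # residual in double precision
    x += np.linalg.solve(A32, r.astype(np.float32)).astype(np.float64)
    print(it, np.linalg.norm(r))                     # residual shrinks each pass
```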



2021 ◽  
Vol 21 (11) ◽  
pp. 281
Author(s):  
Qiao Wang ◽  
Chen Meng

Abstract. We present a GPU-accelerated cosmological simulation code, PhotoNs-GPU, based on the Particle-Mesh Fast Multipole Method (PM-FMM) algorithm, and focus on GPU utilization and optimization. An interpolation method for the truncated gravity is introduced to speed up the special functions in the kernels. We verify the GPU code in mixed precision and at different levels of the interpolation method on the GPU. A run in single precision is roughly two times faster than one in double precision for current practical cosmological simulations, but it can induce a small unbiased noise in the power spectrum. Compared with the CPU version of PhotoNs and with Gadget-2, the efficiency of the new code is significantly improved. With all optimizations of memory access, kernel functions and concurrency management activated, the peak performance of our test runs reaches 48% of the theoretical speed and the average performance approaches ∼35% on the GPU.
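
The interpolation idea for the truncated-gravity special function can be sketched generically: precompute a table once and replace the expensive function call in the force kernel by a lookup. The snippet below uses erfc as a stand-in truncation factor (a common choice in PM/tree force splitting); the actual PhotoNs-GPU kernel and its truncation function are not reproduced here.

```python
import math
import numpy as np

# Sketch of replacing an expensive special function in a force kernel by a
# precomputed interpolation table. erfc serves as a stand-in truncation factor;
# the real PhotoNs-GPU kernel and its truncated-gravity function are not shown.
r_grid = np.linspace(1e-3, 5.0, 4096)                       # tabulation grid
table = np.array([math.erfc(r) for r in r_grid])            # built once, reused

def truncation_interp(r):
    return np.interp(r, r_grid, table)                      # cheap table lookup

r_test = np.random.default_rng(1).uniform(0.01, 4.9, 10_000)
exact = np.array([math.erfc(r) for r in r_test])
print(f"max interpolation error: {np.max(np.abs(truncation_interp(r_test) - exact)):.2e}")
```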



2019 ◽  
Vol 8 (2S11) ◽  
pp. 2990-2993

Multiplication of floating-point numbers is a major operation in digital signal processing, so the performance of floating-point multipliers plays a key role in any digital design. Floating-point numbers are represented using the IEEE 754 standard in single-precision (32-bit), double-precision (64-bit) and quadruple-precision (128-bit) formats. Multiplication of these floating-point numbers can be carried out using Vedic mathematics, which comprises sixteen distinct algorithms, or Sutras; the Urdhva Triyagbhyam Sutra is most commonly applied to the multiplication of binary numbers. This paper compares the work done by different researchers towards the design of IEEE 754 standard single-precision floating-point multipliers using Vedic mathematics.
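
For reference, the sketch below shows what an IEEE 754 single-precision multiplier has to do: extract sign, exponent and 24-bit significand from each operand, multiply the significands as integers (the part an Urdhva Triyagbhyam unit would implement in hardware), and renormalise. Rounding, zeros, infinities and subnormals are deliberately omitted, so this is only an illustration of the data path, not a hardware design.

```python
import struct

def fields(x):
    """Sign, biased exponent, and 24-bit significand of a float32 value."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31
    exp = (bits >> 23) & 0xFF
    frac = bits & 0x7FFFFF
    return sign, exp, frac | 0x800000              # restore the implicit leading 1

def multiply(a, b):
    """Simplified single-precision multiply: no rounding, no special cases."""
    sa, ea, ma = fields(a)
    sb, eb, mb = fields(b)
    sign = sa ^ sb
    exp = ea + eb - 127                            # remove one bias
    mant = ma * mb                                 # 24x24-bit integer multiply
    if mant & (1 << 47):                           # product in [2, 4): renormalise
        mant >>= 24
        exp += 1
    else:                                          # product in [1, 2)
        mant >>= 23
    mant &= 0x7FFFFF                               # drop the implicit 1 (truncation)
    bits = (sign << 31) | (exp << 23) | mant
    return struct.unpack(">f", struct.pack(">I", bits))[0]

print(multiply(3.5, -2.25), 3.5 * -2.25)           # both print -7.875
```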



2011 ◽  
Vol 11 (02n03) ◽  
pp. 569-591 ◽  
Author(s):  
HOONG CHIEH YEONG ◽  
JUN HYUN PARK ◽  
N. SRI NAMACHCHIVAYA

The study of random dynamical systems involves understanding the evolution of state variables that contain uncertainties and that are usually hidden, or not directly observable. State variables therefore have to be estimated and updated, based on system models, using information from observational data, which are themselves noisy in the sense that they contain uncertainties and disturbances due to imperfections in observational devices and disturbances in the environment within which the data are collected. The development of efficient data assimilation methods for integrating observational data in predicting the evolution of random state variables is thus an important aspect of the study of random dynamical systems. In this paper, we consider a particle filtering approach to nonlinear filtering in multiscale dynamical systems. Particle filtering methods [1–3] utilize ensembles of particles to represent the conditional density of the state variables by particle positions distributed over a sample space. The distribution of an ensemble of particles is updated using observational data to obtain the best representation of the conditional density of the state variables of interest. Homogenization theory [4, 5], on the other hand, allows us to estimate the coarse-grained (slow) dynamics of a multiscale system on a larger timescale without having to explicitly resolve the fast-variable evolution on the small timescale. The filter-convergence results presented in [6] show the convergence of the filter of the actual state variable to a homogenized solution of the original multiscale system, and we therefore develop a particle filtering scheme for multiscale random dynamical systems that utilizes this convergence result. This particle filtering method, called the Homogenized Hybrid Particle Filter, incorporates a multiscale computation scheme, the Heterogeneous Multiscale Method developed in [7], with the novel branching particle filter described in [8–10]. By incorporating a multiscale scheme based on homogenization of the original system, estimation of the coarse-grained dynamics using observational data is performed over a larger timescale, resulting in reductions of computational time and cost both for the evolution of the state variables and for the functional evaluations of the filtering step. We describe the theory behind this combined scheme and its general algorithm, and conclude with an application to the Lorenz-96 [11] atmospheric model, which mimics midlatitude geophysical dynamics with microscopic convective processes.
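
As background for the propagate/weight/resample loop described above, here is a minimal bootstrap particle filter for a scalar autoregressive state; all model parameters are invented for the example. The Homogenized Hybrid Particle Filter adds homogenization of the fast scales, the Heterogeneous Multiscale Method and a branching resampling step, none of which is reproduced here.

```python
import numpy as np

# Minimal bootstrap particle filter: particles represent the conditional
# density, observations reweight them, and resampling keeps the ensemble alive.
rng = np.random.default_rng(42)
n_particles, n_steps = 500, 50
sigma_model, sigma_obs = 0.3, 0.5

truth = 0.0
particles = rng.normal(0.0, 1.0, n_particles)

for k in range(n_steps):
    truth = 0.9 * truth + rng.normal(0.0, sigma_model)           # hidden dynamics
    y = truth + rng.normal(0.0, sigma_obs)                       # noisy observation
    particles = 0.9 * particles + rng.normal(0.0, sigma_model, n_particles)
    w = np.exp(-0.5 * ((y - particles) / sigma_obs) ** 2)        # likelihood weights
    w /= w.sum()
    particles = particles[rng.choice(n_particles, n_particles, p=w)]  # resample

print(f"truth={truth:.3f}  filter mean={particles.mean():.3f}")
```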



2001 ◽  
Vol 01 (02) ◽  
pp. 217-230 ◽  
Author(s):  
M. GAVRILOVA ◽  
J. ROKNE

The main result of the paper is a new and efficient algorithm to compute the closest representable intersection point between two lines in the plane, where the coordinates of the points that define the lines are given as single-precision floating-point numbers. The novelty of the algorithm is the method for deriving the best representable floating-point result: instead of solving the equations for the line-intersection coordinates exactly, which is a computationally expensive procedure, an iterative binary-search procedure is applied, and the algorithm stops when the required precision is achieved. Only exact comparison tests are needed, and interval arithmetic is applied to further speed up the process. Experimental results demonstrate that the proposed algorithm is on average ten times faster than an implementation of the line-intersection subroutine using the exact arithmetic of the CORE library.
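
A much-simplified sketch of the "binary search over representable values with exact comparison tests" idea, restricted to the x-coordinate: candidate single-precision values are bisected, and each comparison against the true intersection is made exactly with rational arithmetic. Unlike the paper, the exact rational intersection is formed directly here and no interval-arithmetic filtering is used, so this only illustrates the search structure.

```python
import math
from fractions import Fraction
import numpy as np

def exact_intersection_x(p1, p2, p3, p4):
    """Exact x-coordinate of the intersection of lines p1p2 and p3p4 (a Fraction)."""
    (x1, y1), (x2, y2), (x3, y3), (x4, y4) = (map(Fraction, p) for p in (p1, p2, p3, p4))
    num = (x1 * y2 - y1 * x2) * (x3 - x4) - (x1 - x2) * (x3 * y4 - y3 * x4)
    den = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    return num / den

def closest_float32(exact):
    """Bisect over float32 values, deciding each step with an exact comparison."""
    lo, hi = np.float32(math.floor(exact)), np.float32(math.ceil(exact))
    while lo != hi and np.nextafter(lo, hi) != hi:          # stop when lo, hi are adjacent
        mid = np.float32((np.float64(lo) + np.float64(hi)) / 2)
        if mid <= lo:
            mid = np.nextafter(lo, hi)                      # force progress near the ends
        elif mid >= hi:
            mid = np.nextafter(hi, lo)
        if Fraction(float(mid)) <= exact:                   # exact comparison test
            lo = mid
        else:
            hi = mid
    return lo if exact - Fraction(float(lo)) <= Fraction(float(hi)) - exact else hi

x_star = exact_intersection_x((0.0, 0.0), (1.0, 1.0), (0.0, 1.0), (1.0, 0.25))
print(closest_float32(x_star), float(x_star))               # nearest float32 to 4/7
```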



2014 ◽  
Vol 522-524 ◽  
pp. 1822-1825
Author(s):  
Jun Song Jia ◽  
Qiong Chen ◽  
Lin Lin Hu

Taking the construction industry of Beijing as an example, we first accounted for the energy consumption (EC) and carbon emissions (CE) over 1990-2012, and then used the Partial Least Squares (PLS) method to analyze the drivers of the CE. The EC and CE of Beijing's construction industry in 1990 were 8.1 PJ and 0.99 Mt, respectively; by 2012 they had grown to 30.6 PJ and 4.52 Mt, increases of 22.5 PJ and 3.53 Mt at average annual growth rates of 6.23% and 7.15%. The CE arose mainly from electricity, diesel, gasoline and raw coal. The driving effect of GDP per capita (A) was larger than that of population (P): a 1% increase in A or P would increase the CE by 1.758% or 0.105%, respectively. The classical Environmental Kuznets curve hypothesis did hold for the CE of Beijing's construction industry. A 1% increase in the urbanization rate would, to some extent, decrease the CE by 0.421%. Improvement of scientific research by itself cannot reduce the CE; only when research achievements are transformed into concrete technology and applied in practice might the CE decrease. It is therefore necessary to speed up new-type urbanization strategies and the upgrading and transformation of the economic development pattern, and to continue to implement national policies on controlling population growth.
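
The reported average annual growth rates follow from the 1990 and 2012 endpoints as compound annual growth rates over 22 years; the short check below reproduces that arithmetic only (the PLS driver analysis is not reproduced here).

```python
# Compound annual growth rate over the 22 years from 1990 to 2012,
# reproducing the reported ~6.23% (energy) and ~7.15% (carbon) figures.
for name, v1990, v2012 in [("energy (PJ)", 8.1, 30.6), ("carbon (Mt)", 0.99, 4.52)]:
    cagr = (v2012 / v1990) ** (1 / 22) - 1
    print(f"{name}: {100 * cagr:.2f}% per year")
```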


