FDTD Acceleration for Cylindrical Resonator Design Based on the Hybrid of Single and Double Precision Floating-Point Computation

2014, Vol 2014, pp. 1-8
Author(s): Hasitha Muthumala Waidyasooriya, Masanori Hariyama, Yasuhiro Takei, Michitaka Kameyama

Acceleration of the FDTD (finite-difference time-domain) method is very important for fields such as computational electromagnetics. We consider an FDTD simulation model for cylindrical resonator design that requires double-precision floating-point arithmetic and cannot be computed in single precision alone. Conventional FDTD acceleration methods share a common problem: memory-bandwidth limitation due to the large amount of parallel data access. To overcome this problem, we propose a hybrid single- and double-precision floating-point computation method that reduces the amount of data transferred. We analyze the characteristics of the FDTD simulation to determine where single precision can be used instead of double precision. According to the experimental results, we achieved a speed-up of over 15 times compared to a single-core CPU implementation and over 1.52 times compared to a conventional GPU-based implementation.
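To make the idea concrete, the sketch below stores the field arrays of a 1D FDTD update in single precision (halving the data transferred from memory) while carrying out the update arithmetic in double precision. The 1D layout, function name, and this particular single/double split are assumptions for illustration, not the authors' exact scheme.

    /* Hybrid-precision 1D FDTD update sketch: float storage, double arithmetic.
     * Illustrative only; boundary cells are simply left fixed. */
    #include <stddef.h>

    void fdtd_1d_step(float *ez, float *hy, size_t n, double ce, double ch)
    {
        /* Update magnetic field Hy from the spatial difference of Ez. */
        for (size_t i = 0; i + 1 < n; ++i) {
            double curl_e = (double)ez[i + 1] - (double)ez[i];
            hy[i] = (float)((double)hy[i] + ch * curl_e);
        }
        /* Update electric field Ez from the spatial difference of Hy. */
        for (size_t i = 1; i < n; ++i) {
            double curl_h = (double)hy[i] - (double)hy[i - 1];
            ez[i] = (float)((double)ez[i] + ce * curl_h);
        }
    }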

2020
Author(s): Alessandro Cotronei, Thomas Slawig

Abstract. We converted the radiation part of the atmospheric model ECHAM to single-precision arithmetic. We analyzed different conversion strategies and finally used a step-by-step change of all modules, subroutines, and functions. We found that a small portion of the code still requires higher-precision arithmetic. We generated code that can easily be changed from double to single precision and vice versa, essentially using a simple switch in one module. We compared the output of the single-precision version in the coarse resolution with observational data and with the original double-precision code. The results of both versions are comparable. We extensively tested different parallelization options with respect to the possible performance gain, in both coarse and low resolution. The single-precision radiation itself was accelerated by about 40%, whereas the speed-up for the whole ECHAM model using the converted radiation reached 18% in the best configuration. We further measured the energy consumption, which could also be reduced.
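The "simple switch in one module" idea can be sketched as follows. ECHAM is Fortran and presumably toggles a working-precision kind parameter; the C typedef, macro name, and example routine below are illustrative assumptions only.

    /* One-place precision switch: every numerical routine is written against
     * real_wp, so flipping a single compile-time flag converts the component. */
    #ifdef USE_SINGLE_PRECISION
    typedef float  real_wp;   /* converted, single-precision build */
    #else
    typedef double real_wp;   /* original, double-precision build  */
    #endif

    /* Illustrative routine written entirely against the working type. */
    real_wp relax_toward(real_wp value, real_wp target, real_wp alpha)
    {
        return value + alpha * (target - value);
    }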


2019, Vol 8 (2S11), pp. 2990-2993

Multiplication of floating-point numbers is a major operation in digital signal processing, so the performance of floating-point multipliers plays a central role in any digital design. Floating-point numbers are represented using the IEEE 754 standard in single-precision (32-bit), double-precision (64-bit), and quadruple-precision (128-bit) formats. Multiplication of these floating-point numbers can be carried out using Vedic mathematics, which comprises sixteen principal algorithms, or Sutras. The Urdhva Tiryagbhyam Sutra is the one most commonly applied to the multiplication of binary numbers. This paper surveys the work done by different researchers towards the design of IEEE 754 single-precision floating-point multipliers using Vedic mathematics.
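For reference, the Urdhva Tiryagbhyam ("vertically and crosswise") pattern forms each product column from all cross products whose digit positions sum to that column index, and then propagates the carries. The sketch below shows this pattern in software for little-endian bit arrays; in an IEEE 754 single-precision multiplier, such a 24x24-bit block would multiply the hidden-bit-extended significands, with sign and exponent handled separately. The array bound and the sequential formulation are illustrative; hardware designs evaluate the columns with parallel partial-product and adder trees.

    /* Urdhva Tiryagbhyam ("vertically and crosswise") binary multiplication
     * sketch. a and b are little-endian bit arrays of length n (n <= 24,
     * e.g. IEEE 754 single-precision significands); p receives 2*n bits. */
    #include <stdint.h>

    void urdhva_multiply(const uint8_t *a, const uint8_t *b, int n, uint8_t *p)
    {
        uint32_t column[48] = {0};              /* one column per output bit  */

        for (int i = 0; i < n; ++i)             /* crosswise partial products */
            for (int j = 0; j < n; ++j)
                column[i + j] += (uint32_t)a[i] * b[j];

        uint32_t carry = 0;                     /* vertical carry propagation */
        for (int k = 0; k < 2 * n; ++k) {
            uint32_t s = column[k] + carry;
            p[k] = (uint8_t)(s & 1u);
            carry = s >> 1;
        }
    }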


2021
Author(s): Sam Hatfield, Kristian Mogensen, Peter Dueben, Nils Wedi, Michail Diamantakis

Earth-System models traditionally use double-precision, 64-bit floating-point numbers to perform arithmetic. According to orthodoxy, we must use such a relatively high level of precision in order to minimise the potential impact of rounding errors on the physical fidelity of the model. However, given the inherently imperfect formulation of our models, and the computational benefits of lower-precision arithmetic, we must question this orthodoxy. At ECMWF, a single-precision, 32-bit variant of the atmospheric model IFS has been undergoing rigorous testing in preparation for operations for around 5 years. The single-precision simulations have been found to have effectively the same forecast skill as the double-precision simulations while finishing in 40% less time, thanks to the memory and cache benefits of single-precision numbers. Following these positive results, other modelling groups are now also considering single precision as a way to accelerate their simulations.

In this presentation I will present the rationale behind the move to lower-precision floating-point arithmetic and up-to-date results from the single-precision atmospheric model at ECMWF, which will be operational imminently. I will then provide an update on the development of the single-precision ocean component at ECMWF, based on the NEMO ocean model, including a verification of quarter-degree simulations. I will also present new results from running ECMWF's coupled atmosphere-ocean-sea-ice-wave forecasting system entirely with single precision. Finally I will discuss the feasibility of even lower levels of precision, like half-precision, which are now becoming available through GPU- and ARM-based systems such as Summit and Fugaku, respectively. The use of reduced-precision floating-point arithmetic will be an essential consideration for developing high-resolution, storm-resolving Earth-System models.


2001, Vol 01 (02), pp. 217-230
Author(s): M. Gavrilova, J. Rokne

The main result of the paper is a new and efficient algorithm to compute the closest possible representable intersection point between two lines in the plane. The coordinates of the points that define the lines are given as single-precision floating-point numbers. The novelty of the algorithm is the method for deriving the best representable floating-point numbers: instead of solving the equations to compute the line intersection coordinates exactly, which is a computationally expensive procedure, an iterative binary search procedure is applied. When the required precision is achieved, the algorithm stops. Only exact comparison tests are needed. Interval arithmetic is applied to further speed up the process. Experimental results demonstrate that the proposed algorithm is on average ten times faster than an implementation of the line intersection computation subroutine using the CORE library exact arithmetic.
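The binary-search idea can be sketched as follows for the x-coordinate of the intersection of two lines given in slope-intercept form. The sign test here is evaluated in plain double precision, which merely stands in for the exact comparison tests and interval-arithmetic filtering described in the paper; the slope-intercept representation and the bracketing interval are likewise assumptions for illustration.

    /* Binary search over the single-precision grid for the representable x
     * closest to the intersection of y = a1*x + b1 and y = a2*x + b2.
     * Assumes a1 != a2 and that [lo, hi] brackets the intersection. */
    #include <math.h>

    float intersect_x(float a1, float b1, float a2, float b2, float lo, float hi)
    {
        while (nextafterf(lo, hi) < hi) {              /* until lo, hi are adjacent */
            float mid = (float)(0.5 * ((double)lo + (double)hi));
            if (mid <= lo) mid = nextafterf(lo, hi);           /* guarantee progress */
            else if (mid >= hi) mid = nextafterf(hi, lo);

            double r = ((double)a1 - a2) * mid + ((double)b1 - b2);
            if (r == 0.0) return mid;                  /* residual vanishes at mid   */
            if ((r > 0.0) == ((double)a1 - a2 > 0.0))
                hi = mid;                              /* intersection lies below mid */
            else
                lo = mid;                              /* intersection lies above mid */
        }
        /* Return whichever neighbouring float is closer to the intersection. */
        double x = ((double)b2 - b1) / ((double)a1 - a2);
        return (fabs(lo - x) <= fabs(hi - x)) ? lo : hi;
    }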


2016
Author(s): Charles S. Zender

Abstract. Lossy compression schemes can help reduce the space required to store the false precision (i.e., scientifically meaningless data bits) that geoscientific models and measurements generate. We introduce, implement, and characterize a new lossy compression scheme suitable for IEEE floating-point data. Our new Bit Grooming algorithm alternately shaves (to zero) and sets (to one) the least significant bits of consecutive values to preserve a desired precision. This is a symmetric, two-sided variant of an algorithm sometimes called Bit Shaving, which quantizes values solely by zeroing bits. Our variation eliminates the artificial low bias produced by always zeroing bits, and makes Bit Grooming more suitable for arrays and multi-dimensional fields whose mean statistics are important. Bit Grooming relies on standard lossless compression schemes to achieve the actual reduction in storage space, so we tested Bit Grooming by applying the DEFLATE compression algorithm to bit-groomed and full-precision climate data stored in netCDF3, netCDF4, HDF4, and HDF5 formats. Bit Grooming reduces the storage space required by uncompressed and compressed climate data by up to 50 % and 20 %, respectively, for single-precision data (the most common case for climate data). When used aggressively (i.e., preserving only 1–3 decimal digits of precision), Bit Grooming produces storage reductions comparable to other quantization techniques such as linear packing. Unlike linear packing, Bit Grooming works on the full representable range of floating-point data. Bit Grooming reduces the volume of single-precision compressed data by roughly 10 % per decimal digit quantized (or "groomed") after the third such digit, up to a maximum reduction of about 50 %. The potential reduction is greater for double-precision datasets. Data quantization by Bit Grooming is irreversible (i.e., lossy) yet transparent, meaning that no extra processing is required by data users/readers. Hence Bit Grooming can easily reduce data storage volume without sacrificing scientific precision or imposing extra burdens on users.
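The core quantization step for single-precision values can be sketched as below: trailing significand bits are alternately shaved to zero and set to one for consecutive array elements. The mapping from decimal digits to retained bits and the treatment of special values are simplified assumptions; a production implementation handles these cases, and the lossless compression that follows, far more carefully.

    /* Bit Grooming sketch for single-precision arrays: alternately shave
     * (zero) and set (one) the trailing significand bits of consecutive
     * values, leaving roughly 'decimal_digits' digits of precision. */
    #include <math.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    void bit_groom(float *data, size_t n, int decimal_digits)
    {
        int keep_bits = (int)ceil(decimal_digits * 3.32);  /* ~3.32 bits/digit */
        if (keep_bits >= 23) return;                       /* nothing to groom */

        uint32_t tail = (1u << (23 - keep_bits)) - 1u;     /* trailing bits    */

        for (size_t i = 0; i < n; ++i) {
            if (data[i] == 0.0f || !isfinite(data[i]))
                continue;                                  /* keep 0, Inf, NaN */
            uint32_t bits;
            memcpy(&bits, &data[i], sizeof bits);          /* type-pun safely  */
            bits = (i % 2 == 0) ? (bits & ~tail)           /* shave to zero    */
                                : (bits |  tail);          /* set to one       */
            memcpy(&data[i], &bits, sizeof bits);
        }
    }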


2011, Vol 2011, pp. 1-12
Author(s): Nikolaos Alachiotis, Alexandros Stamatakis

The use of reconfigurable computing for accelerating floating-point intensive codes is becoming common due to the availability of DSPs in new-generation FPGAs. We present the design of an efficient, pipelined floating-point datapath for calculating the logarithm function on reconfigurable devices. We integrate the datapath into a stand-alone, LUT-based (lookup table) component, the LAU (Logarithm Approximation Unit). We extended the LAU by integrating two architecturally independent, LAU-based datapaths into a larger component, the VLAU (vector-like LAU). The VLAU produces 2 results/cycle, while occupying the same amount of memory as the LAU. Under single precision, one LAU is 12 and 1.7 times faster than the GNU and Intel Math Kernel Library (MKL) implementations, respectively. The LAU is also 1.6 times faster than the FloPoCo reconfigurable logarithm architecture. Under double precision, one LAU is 20 and 2.6 times faster than the respective GNU and MKL functions and 1.4 times faster than the FloPoCo logarithm. The VLAU is approximately twice as fast as the LAU, under both single and double precision.
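The underlying decomposition can be sketched in software as follows: a positive normal input x = m·2^e gives log2(x) = e + log2(m), with log2(m) read from a small table indexed by the leading mantissa bits and refined by linear interpolation. The table size, the interpolation step, and the double-precision table are illustrative choices, not the LAU's actual pipelined datapath.

    /* Table-driven log2 sketch: exponent extraction plus a LUT over [1, 2). */
    #include <math.h>
    #include <stdint.h>
    #include <string.h>

    #define LUT_BITS 8
    #define LUT_SIZE (1 << LUT_BITS)

    static double lut[LUT_SIZE + 1];

    void lut_log2_init(void)
    {
        for (int i = 0; i <= LUT_SIZE; ++i)
            lut[i] = log2(1.0 + (double)i / LUT_SIZE);     /* grid over [1, 2]  */
    }

    double lut_log2(float x)        /* assumes x is a positive normal number */
    {
        uint32_t bits;
        memcpy(&bits, &x, sizeof bits);

        int      e    = (int)((bits >> 23) & 0xFFu) - 127; /* unbiased exponent */
        uint32_t frac = bits & 0x7FFFFFu;                  /* 23-bit fraction   */

        uint32_t idx  = frac >> (23 - LUT_BITS);           /* table index       */
        double   t    = (double)(frac & ((1u << (23 - LUT_BITS)) - 1u))
                        / (double)(1u << (23 - LUT_BITS)); /* interpolation     */

        return e + lut[idx] + t * (lut[idx + 1] - lut[idx]);
    }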


Currently, each CPU has one or more Floating Point Units (FPUs) integrated into it. FPUs are typically used in math-intensive applications such as digital signal processing, and they appear in engineering, medical, and military systems, as well as in other fields requiring audio, image, or video processing. A high-speed, energy-efficient floating-point unit is therefore needed in the electronics industry as an arithmetic unit in microprocessors. Multiplication and addition account for about 95% of the operations in a conventional FPU, and many applications require these arithmetic operations to execute quickly. In existing designs, the FPM (floating-point multiplication) and FPA (floating-point addition) units suffer from higher delay, lower speed, and lower throughput. To meet the demand for high speed and throughput, the multiplier and adder blocks within the FPM and FPA are designed for single-precision and double-precision floating-point formats, are internally pipelined to achieve high throughput, and follow the IEEE 754 standard floating-point representations. The design is written in Verilog, and the Xilinx ISE 14.5 software tool is used to code it and verify the resulting waveforms.
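The stages of such a single-precision multiplier can be sketched as follows: XOR the signs, add the biased exponents and subtract one bias, multiply the hidden-bit-extended 24-bit significands, then normalize. Rounding, subnormals, overflow, and NaN/Inf handling are omitted here; a hardware FPM pipelines these stages and adds round-to-nearest-even logic.

    /* IEEE 754 single-precision multiplication sketch (normal inputs only,
     * truncation instead of rounding, no special-case handling). */
    #include <stdint.h>
    #include <string.h>

    float fp32_mul_sketch(float a, float b)
    {
        uint32_t ua, ub;
        memcpy(&ua, &a, sizeof ua);
        memcpy(&ub, &b, sizeof ub);

        uint32_t sign = (ua ^ ub) & 0x80000000u;               /* sign stage     */
        int32_t  exp  = (int32_t)((ua >> 23) & 0xFFu)
                      + (int32_t)((ub >> 23) & 0xFFu) - 127;   /* exponent stage */

        uint64_t ma = (ua & 0x7FFFFFu) | 0x800000u;            /* hidden 1       */
        uint64_t mb = (ub & 0x7FFFFFu) | 0x800000u;
        uint64_t prod = ma * mb;                               /* 48-bit product */

        if (prod & (1ull << 47)) {                             /* normalize      */
            prod >>= 1;
            exp += 1;
        }
        uint32_t frac = (uint32_t)((prod >> 23) & 0x7FFFFFu);  /* truncate       */

        uint32_t out = sign | ((uint32_t)exp << 23) | frac;
        float r;
        memcpy(&r, &out, sizeof r);
        return r;
    }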


2014, Vol 142 (10), pp. 3809-3829
Author(s): Peter D. Düben, T. N. Palmer

Abstract. A reduction of computational cost would allow higher resolution in numerical weather predictions within the same budget for computation. This paper investigates two approaches that promise significant savings in computational cost: the use of reduced-precision hardware, which reduces floating-point precision beyond the standard double- and single-precision arithmetic, and the use of stochastic processors, which allow hardware faults in a trade-off between reduced precision and savings in power consumption and computing time. Reduced precision is emulated within simulations of a spectral dynamical core of a global atmosphere model and a detailed study of the sensitivity of different parts of the model to inexact hardware is performed. Afterward, benchmark simulations were performed for which as many parts of the model as possible were put onto inexact hardware. Results show that large parts of the model could be integrated with inexact hardware at error rates that are surprisingly high or with reduced precision to only a couple of bits in the significand of floating-point numbers. However, the sensitivities to inexact hardware of different parts of the model need to be respected, for example, via scale separation. In the last part of the paper, simulations with a full operational weather forecast model in single precision are presented. It is shown that differences in accuracy between the single- and double-precision forecasts are smaller than differences between ensemble members of the ensemble forecast at the resolution of the standard ensemble forecasting system. The simulations prove that the trade-off between precision and performance is a worthwhile effort, already on existing hardware.
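Emulating a reduced-precision significand in software can be sketched as below: each intermediate value is rounded to a chosen number of significand bits before it is used again. The frexp/ldexp-based rounding is an illustrative stand-in for a reduced-precision emulator, not the specific tool used in this study.

    /* Round x to 'bits' significand bits to emulate reduced-precision hardware. */
    #include <math.h>

    double reduce_precision(double x, int bits)
    {
        if (x == 0.0 || !isfinite(x) || bits >= 53)
            return x;                          /* nothing to reduce              */

        int exp;
        double m = frexp(x, &exp);             /* x = m * 2^exp, |m| in [0.5, 1) */
        double scale = ldexp(1.0, bits);       /* 2^bits                         */
        m = nearbyint(m * scale) / scale;      /* keep 'bits' significand bits   */
        return ldexp(m, exp);
    }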

