scholarly journals Improving the Accuracy of the Fast Inverse Square Root by Modifying Newton–Raphson Corrections

Entropy ◽  
2021 ◽  
Vol 23 (1) ◽  
pp. 86
Author(s):  
Cezary J. Walczyk ◽  
Leonid V. Moroz ◽  
Jan L. Cieśliński

Direct computation of functions using low-complexity algorithms can be applied both for hardware constraints and in systems where storage capacity is a challenge for processing a large volume of data. We present improved algorithms for fast calculation of the inverse square root function for single-precision and double-precision floating-point numbers. Higher precision is also discussed. Our approach consists in minimizing maximal errors by finding optimal magic constants and modifying the Newton–Raphson coefficients. The obtained algorithms are much more accurate than the original fast inverse square root algorithm and have similar very low computational costs.


2021 ◽  
Vol 1 (2) ◽  
pp. 1-30
Author(s):  
William B. Langdon ◽  
Oliver Krauss

We use continuous optimisation and manual code changes to evolve up to 1024 Newton-Raphson numerical values embedded in an open source GNU C library glibc square root sqrt to implement a double precision cube root routine cbrt, binary logarithm log2 and reciprocal square root function for C in seconds. The GI inverted square root x -1/2 is far more accurate than Quake’s InvSqrt, Quare root. GI shows potential for automatically creating mobile or low resource mote smart dust bespoke custom mathematical libraries with new functionality.



Author(s):  
Cezary J. Walczyk ◽  
Leonid V. Moroz ◽  
Jan L. Cieśliński

We present an improved algorithm for fast calculation of the inverse square root for single-precision floating-point numbers. The algorithm is much more accurate than the famous fast inverse square root algorithm and has a similar computational cost. The presented modification concern Newton-Raphson corrections and can be applied when the distribution of these corrections is not symmetric (for instance, in our case they are always negative).



Computation ◽  
2019 ◽  
Vol 7 (3) ◽  
pp. 41 ◽  
Author(s):  
Cezary J. Walczyk ◽  
Leonid V. Moroz ◽  
Jan L. Cieśliński

We present a new algorithm for the approximate evaluation of the inverse square root for single-precision floating-point numbers. This is a modification of the famous fast inverse square root code. We use the same “magic constant” to compute the seed solution, but then, we apply Newton–Raphson corrections with modified coefficients. As compared to the original fast inverse square root code, the new algorithm is two-times more accurate in the case of one Newton–Raphson correction and almost seven-times more accurate in the case of two corrections. We discuss relative errors within our analytical approach and perform numerical tests of our algorithm for all numbers of the type float.



Energies ◽  
2021 ◽  
Vol 14 (4) ◽  
pp. 1058
Author(s):  
Leonid Moroz ◽  
Volodymyr Samotyy ◽  
Cezary J. Walczyk ◽  
Jan L. Cieśliński

We develop a bit manipulation technique for single precision floating point numbers which leads to new algorithms for fast computation of the cube root and inverse cube root. It uses the modified iterative Newton–Raphson method (the first order of convergence) and Householder method (the second order of convergence) to increase the accuracy of the results. The proposed algorithms demonstrate high efficiency and reduce error several times in the first iteration in comparison with known algorithms. After two iterations 22.84 correct bits were obtained for single precision. Experimental tests showed that our novel algorithm is faster and more accurate than library functions for microcontrollers.



2018 ◽  
Author(s):  
Pavel Pokhilko ◽  
Evgeny Epifanovsky ◽  
Anna I. Krylov

Using single precision floating point representation reduces the size of data and computation time by a factor of two relative to double precision conventionally used in electronic structure programs. For large-scale calculations, such as those encountered in many-body theories, reduced memory footprint alleviates memory and input/output bottlenecks. Reduced size of data can lead to additional gains due to improved parallel performance on CPUs and various accelerators. However, using single precision can potentially reduce the accuracy of computed observables. Here we report an implementation of coupled-cluster and equation-of-motion coupled-cluster methods with single and double excitations in single precision. We consider both standard implementation and one using Cholesky decomposition or resolution-of-the-identity of electron-repulsion integrals. Numerical tests illustrate that when single precision is used in correlated calculations, the loss of accuracy is insignificant and pure single-precision implementation can be used for computing energies, analytic gradients, excited states, and molecular properties. In addition to pure single-precision calculations, our implementation allows one to follow a single-precision calculation by clean-up iterations, fully recovering double-precision results while retaining significant savings.



Author(s):  
R. A. Morozov ◽  
P. V. Trifonov

Introduction:Practical implementation of a communication system which employs a family of polar codes requires either to store a number of large specifications or to construct the codes by request. The first approach assumes extensive memory consumption, which is inappropriate for many applications, such as those for mobile devices. The second approach can be numerically unstable and hard to implement in low-end hardware. One of the solutions is specifying a family of codes by a sequence of subchannels sorted by reliability. However, this solution makes it impossible to separately optimize each code from the family.Purpose:Developing a method for compact specifications of polar codes and subcodes.Results:A method is proposed for compact specification of polar codes. It can be considered a trade-off between real-time construction and storing full-size specifications in memory. We propose to store compact specifications of polar codes which contain frozen set differences between the original pre-optimized polar codes and the polar codes constructed for a binary erasure channel with some erasure probability. Full-size specification needed for decoding can be restored from a compact one by a low-complexity hardware-friendly procedure. The proposed method can work with either polar codes or polar subcodes, allowing you to reduce the memory consumption by 15–50 times.Practical relevance:The method allows you to use families of individually optimized polar codes in devices with limited storage capacity. 



1993 ◽  
Vol 301 ◽  
Author(s):  
J. L. Benton ◽  
D. J. Eaglesham ◽  
M. Almonte ◽  
P. H. Citrin ◽  
M. A. Marcus ◽  
...  

ABSTRACTAn understanding of the electrical, structural, and optical properites of Er in Si is necessary to evaluate this system as an opto-electronic material. Extended x-ray absorption fine structure, EXAFS, measurements of Er-implanted Si show that the optically active impurity complex is Er surrounded by an O cage of 6 atoms. The Er photoluminescence intensity is a square root function of excitation power, while the free exciton intensity increases linearly. The square root dependence of the 1.54μm-intensity is independent of measurement temperature and independent of co-implanted species. Ion-implantation of Er in Si introduces donor activity, but spreading resistance carrier concentration profiles indicate that these donors do not effect the optical activity of the Er.



2020 ◽  
Author(s):  
Alessandro Cotronei ◽  
Thomas Slawig

Abstract. We converted the radiation part of the atmospheric model ECHAM to single precision arithmetic. We analyzed different conversion strategies and finally used a step by step change of all modules, subroutines and functions. We found out that a small code portion still requires higher precision arithmetic. We generated code that can be easily changed from double to single precision and vice versa, basically using a simple switch in one module. We compared the output of the single precision version in the coarse resolution with observational data and with the original double precision code. The results of both versions are comparable. We extensively tested different parallelization options with respect to the possible performance gain, in both coarse and low resolution. The single precision radiation itself was accelerated by about 40%, whereas the speed-up for the whole ECHAM model using the converted radiation achieved 18% in the best configuration. We further measured the energy consumption, which could also be reduced.



2019 ◽  
Vol 8 (2S11) ◽  
pp. 2990-2993

Duplication of the coasting element numbers is the big activity in automated signal handling. So the exhibition of drifting problem multipliers count on a primary undertaking in any computerized plan. Coasting factor numbers are spoken to utilizing IEEE 754 modern day in single precision(32-bits), Double precision(sixty four-bits) and Quadruple precision(128-bits) organizations. Augmentation of those coasting component numbers can be completed via using Vedic generation. Vedic arithmetic encompass sixteen wonderful calculations or Sutras. Urdhva Triyagbhyam Sutra is most usually applied for growth of twofold numbers. This paper indicates the compare of tough work finished via exceptional specialists in the direction of the plan of IEEE 754 ultra-modern-day unmarried accuracy skimming thing multiplier the usage of Vedic technological statistics.



Sign in / Sign up

Export Citation Format

Share Document