Improving the Accuracy of the Fast Inverse Square Root by Modifying Newton–Raphson Corrections

Cezary J. Walczyk; Leonid V. Moroz; Jan L. Cieśliński

doi:10.3390/e23010086

Entropy ◽

10.3390/e23010086 ◽

2021 ◽

Vol 23 (1) ◽

pp. 86

Author(s):

Cezary J. Walczyk ◽

Leonid V. Moroz ◽

Jan L. Cieśliński

Keyword(s):

Storage Capacity ◽

Root Function ◽

Low Complexity ◽

Double Precision ◽

Square Root ◽

Direct Computation ◽

Single Precision ◽

Fast Calculation ◽

Computational Costs ◽

Newton Raphson

Direct computation of functions using low-complexity algorithms can be applied both under hardware constraints and in systems where storage capacity is a challenge for processing a large volume of data. We present improved algorithms for fast calculation of the inverse square root function for single-precision and double-precision floating-point numbers. Higher precision is also discussed. Our approach consists in minimizing maximal errors by finding optimal magic constants and modifying the Newton–Raphson coefficients. The obtained algorithms are much more accurate than the original fast inverse square root algorithm and have similar, very low computational costs.

Genetic Improvement of Data for Maths Functions

ACM Transactions on Evolutionary Learning and Optimization ◽

10.1145/3461016 ◽

2021 ◽

Vol 1 (2) ◽

pp. 1-30

Author(s):

William B. Langdon ◽

Oliver Krauss

Keyword(s):

Open Source ◽

Genetic Improvement ◽

Root Function ◽

Cube Root ◽

Double Precision ◽

Square Root ◽

Smart Dust ◽

Code Changes ◽

Newton Raphson ◽

Binary Logarithm

We use continuous optimisation and manual code changes to evolve up to 1024 Newton-Raphson numerical values embedded in an open source GNU C library glibc square root sqrt to implement a double precision cube root routine cbrt, binary logarithm log2 and reciprocal square root function for C in seconds. The GI inverted square root x^(-1/2) is far more accurate than Quake’s InvSqrt, Quare root. GI shows potential for automatically creating mobile or low resource mote smart dust bespoke custom mathematical libraries with new functionality.

A Modification of the Fast Inverse Square Root Algorithm

10.20944/preprints201908.0045.v1 ◽

2019 ◽

Author(s):

Cezary J. Walczyk ◽

Leonid V. Moroz ◽

Jan L. Cieśliński

Keyword(s):

Computational Cost ◽

Floating Point ◽

Square Root ◽

Single Precision ◽

Fast Calculation ◽

Newton Raphson ◽

Floating Point Numbers ◽

Improved Algorithm

We present an improved algorithm for fast calculation of the inverse square root for single-precision floating-point numbers. The algorithm is much more accurate than the famous fast inverse square root algorithm and has a similar computational cost. The presented modification concerns Newton-Raphson corrections and can be applied when the distribution of these corrections is not symmetric (for instance, in our case they are always negative).

A Modification of the Fast Inverse Square Root Algorithm

Computation ◽

10.3390/computation7030041 ◽

2019 ◽

Vol 7 (3) ◽

pp. 41 ◽

Cited By ~ 1

Author(s):

Cezary J. Walczyk ◽

Leonid V. Moroz ◽

Jan L. Cieśliński

Keyword(s):

Analytical Approach ◽

Floating Point ◽

Square Root ◽

Approximate Evaluation ◽

Single Precision ◽

Seed Solution ◽

Numerical Tests ◽

Newton Raphson ◽

Relative Errors ◽

Magic Constant

We present a new algorithm for the approximate evaluation of the inverse square root for single-precision floating-point numbers. This is a modification of the famous fast inverse square root code. We use the same “magic constant” to compute the seed solution, but then we apply Newton–Raphson corrections with modified coefficients. Compared to the original fast inverse square root code, the new algorithm is two times more accurate in the case of one Newton–Raphson correction and almost seven times more accurate in the case of two corrections. We discuss relative errors within our analytical approach and perform numerical tests of our algorithm for all numbers of the type float.
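As a rough illustration of the scheme this abstract describes, the sketch below reproduces the classic fast inverse square root: a seed obtained by integer arithmetic on the float's bit pattern, followed by Newton–Raphson corrections of the form y ← y(a − b·x·y²). The function name `fast_inv_sqrt` and its parameters are assumptions made for this example. The defaults (magic constant 0x5F3759DF, a = 1.5, b = 0.5) are the well-known original values, not the modified coefficients proposed in the paper, which are not reproduced here.

```python
import struct


def float_to_bits(x):
    """Reinterpret a value rounded to IEEE 754 single precision as a 32-bit unsigned int."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_float(i):
    """Reinterpret a 32-bit unsigned int as an IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<I", i))[0]


def fast_inv_sqrt(x, magic=0x5F3759DF, a=1.5, b=0.5, corrections=1):
    """Approximate 1/sqrt(x) for positive, normal x.

    magic, a and b default to the classic values; a modified algorithm
    would pass different (tuned) values, which are NOT supplied here.
    """
    # Seed: subtract half of the bit pattern from the magic constant.
    y = bits_to_float(magic - (float_to_bits(x) >> 1))
    # Newton-Raphson corrections: y <- y * (a - b * x * y^2).
    # Note: Python evaluates these in double precision, unlike C float code.
    for _ in range(corrections):
        y = y * (a - b * x * y * y)
    return y


if __name__ == "__main__":
    for x in (0.25, 2.0, 100.0):
        approx = fast_inv_sqrt(x)
        exact = x ** -0.5
        print(x, approx, abs(approx - exact) / exact)
```

With the classic values, one correction leaves a relative error of roughly 0.2% or less. Exposing `magic`, `a` and `b` as parameters is a design choice of this sketch so that alternative constants can be tried without changing the code.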

Fast Calculation of Cube and Inverse Cube Roots Using a Magic Constant and Its Implementation on Microcontrollers

Energies ◽

10.3390/en14041058 ◽

2021 ◽

Vol 14 (4) ◽

pp. 1058

Author(s):

Leonid Moroz ◽

Volodymyr Samotyy ◽

Cezary J. Walczyk ◽

Jan L. Cieśliński

Keyword(s):

Order Of Convergence ◽

High Efficiency ◽

Experimental Tests ◽

Cube Root ◽

Single Precision ◽

Fast Calculation ◽

First Order ◽

Newton Raphson ◽

Raphson Method ◽

New Algorithms

We develop a bit manipulation technique for single precision floating point numbers which leads to new algorithms for fast computation of the cube root and inverse cube root. It uses the modified iterative Newton–Raphson method (the first order of convergence) and Householder method (the second order of convergence) to increase the accuracy of the results. The proposed algorithms demonstrate high efficiency and reduce error several times in the first iteration in comparison with known algorithms. After two iterations 22.84 correct bits were obtained for single precision. Experimental tests showed that our novel algorithm is faster and more accurate than library functions for microcontrollers.
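The general idea behind this abstract (a seed from bit manipulation, then iterative refinement) can be sketched as follows. This is only an illustration under stated assumptions: the names `cbrt_seed` and `fast_cbrt` are hypothetical, the magic constant is a simple unoptimized value derived from the exponent bias rather than the constant from the paper, and the refinement is the plain Newton–Raphson step for y³ = x, not the modified Newton–Raphson or Householder iterations the authors use.

```python
import struct


def cbrt_seed(x):
    """Crude cube-root seed for positive, normal x via integer arithmetic on float bits."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Illustrative, unoptimized constant (assumption): two thirds of the
    # bit pattern of 1.0f, so that the biased exponent is divided by three.
    magic = (2 * 127 * 2**23) // 3
    return struct.unpack("<f", struct.pack("<I", bits // 3 + magic))[0]


def fast_cbrt(x, iterations=2):
    """Approximate the cube root of positive x: bit-level seed plus plain Newton steps."""
    y = cbrt_seed(x)
    for _ in range(iterations):
        # Newton-Raphson step for f(y) = y^3 - x.
        y = (2.0 * y + x / (y * y)) / 3.0
    return y


if __name__ == "__main__":
    for x in (0.5, 8.0, 1000.0):
        print(x, fast_cbrt(x), x ** (1.0 / 3.0))
```

Unlike the division-free inverse square root correction, this plain Newton step contains a division, which is one reason tuned algorithms of the kind described in the abstract take a different form.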

Double Precision Is Not Needed for Many-Body Calculations: New Conventional Wisdom

10.26434/chemrxiv.6104804.v1 ◽

2018 ◽

Author(s):

Pavel Pokhilko ◽

Evgeny Epifanovsky ◽

Anna I. Krylov

Keyword(s):

Large Scale ◽

Computation Time ◽

Coupled Cluster ◽

Double Precision ◽

Many Body ◽

Single Precision ◽

Parallel Performance ◽

Point Representation ◽

Electron Repulsion Integrals ◽

Cluster Methods

Using single precision floating point representation reduces the size of data and computation time by a factor of two relative to double precision conventionally used in electronic structure programs. For large-scale calculations, such as those encountered in many-body theories, reduced memory footprint alleviates memory and input/output bottlenecks. Reduced size of data can lead to additional gains due to improved parallel performance on CPUs and various accelerators. However, using single precision can potentially reduce the accuracy of computed observables. Here we report an implementation of coupled-cluster and equation-of-motion coupled-cluster methods with single and double excitations in single precision. We consider both standard implementation and one using Cholesky decomposition or resolution-of-the-identity of electron-repulsion integrals. Numerical tests illustrate that when single precision is used in correlated calculations, the loss of accuracy is insignificant and pure single-precision implementation can be used for computing energies, analytic gradients, excited states, and molecular properties. In addition to pure single-precision calculations, our implementation allows one to follow a single-precision calculation by clean-up iterations, fully recovering double-precision results while retaining significant savings.

Compact specification of polar codes

Information and Control Systems ◽

10.31799/1684-8853-2019-1-40-47 ◽

2019 ◽

pp. 40-47

Author(s):

R. A. Morozov ◽

P. V. Trifonov

Keyword(s):

Storage Capacity ◽

Low Complexity ◽

Practical Implementation ◽

Polar Codes ◽

Practical Relevance ◽

Memory Consumption ◽

Full Size ◽

The Family ◽

Erasure Probability ◽

Binary Erasure Channel

Introduction: Practical implementation of a communication system that employs a family of polar codes requires either storing a number of large specifications or constructing the codes on request. The first approach involves extensive memory consumption, which is inappropriate for many applications, such as those for mobile devices. The second approach can be numerically unstable and hard to implement in low-end hardware. One solution is to specify a family of codes by a sequence of subchannels sorted by reliability. However, this solution makes it impossible to optimize each code of the family separately. Purpose: To develop a method for compact specification of polar codes and subcodes. Results: A method is proposed for compact specification of polar codes. It can be considered a trade-off between real-time construction and storing full-size specifications in memory. We propose to store compact specifications of polar codes which contain frozen set differences between the original pre-optimized polar codes and the polar codes constructed for a binary erasure channel with some erasure probability. The full-size specification needed for decoding can be restored from a compact one by a low-complexity, hardware-friendly procedure. The proposed method can work with either polar codes or polar subcodes, reducing memory consumption by a factor of 15–50. Practical relevance: The method makes it possible to use families of individually optimized polar codes in devices with limited storage capacity.
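The idea of storing only frozen-set differences can be sketched as follows. This is a minimal illustration under assumptions, not the authors' procedure: the function names, the natural (non-bit-reversed) subchannel indexing, the tie-breaking rule and the example "optimized" frozen set are all hypothetical. The only standard ingredient is the binary erasure channel recursion, in which a subchannel with erasure probability z splits into subchannels with probabilities 2z − z² and z².

```python
def bec_erasure_probs(n_levels, eps):
    """Erasure probabilities of the 2**n_levels polarized subchannels of a BEC(eps)."""
    z = [eps]
    for _ in range(n_levels):
        nxt = []
        for p in z:
            nxt.append(2 * p - p * p)  # degraded subchannel
            nxt.append(p * p)          # upgraded subchannel
        z = nxt
    return z


def bec_frozen_set(n_levels, k, eps):
    """Frozen set of an (N, k) code built for BEC(eps): the N - k least reliable indices."""
    z = bec_erasure_probs(n_levels, eps)
    worst_first = sorted(range(len(z)), key=lambda i: (-z[i], i))
    return set(worst_first[: len(z) - k])


def compress(optimized_frozen, n_levels, k, eps):
    """Compact specification: indices where the optimized frozen set differs from the BEC one."""
    return sorted(set(optimized_frozen) ^ bec_frozen_set(n_levels, k, eps))


def restore(diff, n_levels, k, eps):
    """Rebuild the full frozen set from the compact specification."""
    return bec_frozen_set(n_levels, k, eps) ^ set(diff)


if __name__ == "__main__":
    optimized = {0, 1, 2, 3}  # hypothetical pre-optimized frozen set for an (8, 4) code
    diff = compress(optimized, n_levels=3, k=4, eps=0.5)
    print(diff, restore(diff, n_levels=3, k=4, eps=0.5) == optimized)
```

In this sketch only the short difference list and the parameters (code length, dimension, erasure probability) would need to be stored; the full frozen set is recomputed by a cheap recursion when needed.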

Low-Complexity and High-Speed Architecture Design Methodology for Complex Square Root

Circuits Systems and Signal Processing ◽

10.1007/s00034-021-01738-1 ◽

2021 ◽

Author(s):

Suresh Mopuri ◽

Amit Acharyya

Keyword(s):

High Speed ◽

Design Methodology ◽

Low Complexity ◽

Architecture Design ◽

Square Root

Correlation of Electrical, Structural, and Optical Properties of Erbium In Silicon

MRS Proceedings ◽

10.1557/proc-301-119 ◽

1993 ◽

Vol 301 ◽

Cited By ~ 5

Author(s):

J. L. Benton ◽

D. J. Eaglesham ◽

M. Almonte ◽

P. H. Citrin ◽

M. A. Marcus ◽

...

Keyword(s):

Root Function ◽

Free Exciton ◽

Excitation Power ◽

Optically Active ◽

Photoluminescence Intensity ◽

Square Root ◽

Structural And Optical Properties ◽

X Ray ◽

X Ray Absorption ◽

Donor Activity

An understanding of the electrical, structural, and optical properties of Er in Si is necessary to evaluate this system as an opto-electronic material. Extended x-ray absorption fine structure (EXAFS) measurements of Er-implanted Si show that the optically active impurity complex is Er surrounded by an O cage of 6 atoms. The Er photoluminescence intensity is a square root function of excitation power, while the free exciton intensity increases linearly. The square root dependence of the 1.54 μm intensity is independent of measurement temperature and independent of co-implanted species. Ion implantation of Er in Si introduces donor activity, but spreading resistance carrier concentration profiles indicate that these donors do not affect the optical activity of the Er.

Single precision arithmetic in ECHAM radiation reduces runtime and energy consumption

10.5194/gmd-2020-3 ◽

2020 ◽

Author(s):

Alessandro Cotronei ◽

Thomas Slawig

Keyword(s):

Energy Consumption ◽

Observational Data ◽

Atmospheric Model ◽

Step Change ◽

Double Precision ◽

Performance Gain ◽

Low Resolution ◽

Single Precision ◽

Speed Up ◽

Echam Model

Abstract. We converted the radiation part of the atmospheric model ECHAM to single precision arithmetic. We analyzed different conversion strategies and finally used a step by step change of all modules, subroutines and functions. We found out that a small code portion still requires higher precision arithmetic. We generated code that can be easily changed from double to single precision and vice versa, basically using a simple switch in one module. We compared the output of the single precision version in the coarse resolution with observational data and with the original double precision code. The results of both versions are comparable. We extensively tested different parallelization options with respect to the possible performance gain, in both coarse and low resolution. The single precision radiation itself was accelerated by about 40%, whereas the speed-up for the whole ECHAM model using the converted radiation achieved 18% in the best configuration. We further measured the energy consumption, which could also be reduced.

Research on IEEE 754 Standard Single Precision Floating Point Multipliers Designed using Urdhva Triyagbhyam Sutra of Vedic Mathematics

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.b1382.0982s1119 ◽

2019 ◽

Vol 8 (2S11) ◽

pp. 2990-2993

Keyword(s):

Floating Point ◽

Double Precision ◽

Single Precision ◽

Vedic Mathematics ◽

Quadruple Precision

Multiplication of floating-point numbers is a major operation in digital signal processing, so the performance of floating-point multipliers plays a primary role in any digital design. Floating-point numbers are represented using the IEEE 754 standard in single-precision (32-bit), double-precision (64-bit) and quadruple-precision (128-bit) formats. Multiplication of these floating-point numbers can be carried out using Vedic mathematics. Vedic mathematics comprises sixteen different algorithms, or Sutras. The Urdhva Triyagbhyam Sutra is the one most commonly applied to the multiplication of binary numbers. This paper presents a comparison of the work done by different researchers on the design of IEEE 754 standard single-precision floating-point multipliers using Vedic mathematics.