Modified Fast Inverse Square Root and Square Root Approximation Algorithms: The Method of Switching Magic Constants

Many low-cost platforms that support floating-point arithmetic, such as microcontrollers and field-programmable gate arrays, do not include fast hardware or software methods for calculating the square root and/or reciprocal square root. Typically, such functions are implemented using direct lookup tables or polynomial approximations, with a subsequent application of the Newton–Raphson method. Other, more complex solutions include high-radix digit-recurrence and bipartite or multipartite table-based methods. In contrast, this article proposes a simple modification of the fast inverse square root method that has high accuracy and relatively low latency. Algorithms are given in C/C++ for single- and double-precision numbers in the IEEE 754 format for both square root and reciprocal square root functions. These are based on switching the magic constant used in the initial approximation according to the input interval of the normalized floating-point number, in order to minimize the maximum relative error on each subinterval after the first iteration, which gives 13 correct bits of the result. Our experimental results show that the proposed algorithms provide a good trade-off between accuracy and latency after two iterations for numbers of type float, and after three iterations for numbers of type double when using fused multiply–add instructions, giving almost complete accuracy.

Simple Effective Fast Inverse Square Root Algorithm with Two Magic Constants

International Journal of Computing ◽

10.47839/ijc.18.4.1616 ◽

2019 ◽

pp. 461-470

Author(s):

Oleh Horyachyy ◽

Leonid Moroz ◽

Viktor Otenko

Keyword(s):

Computer Game ◽

Initial Approximation ◽

Floating Point ◽

Square Root ◽

Original Algorithm ◽

Gate Arrays ◽

Field Programmable ◽

Floating Point Number ◽

Programmable Gate Arrays ◽

Relative Errors

The purpose of this paper is to introduce a modification of the Fast Inverse Square Root (FISR) approximation algorithm with reduced relative errors. The original algorithm applies a magic-constant trick to the bits of the input floating-point number to obtain an initial approximation and then refines it with the classical Newton-Raphson iteration. It was first used in the computer game Quake III Arena, prompted widespread discussion among scientists and programmers, and is now found in many scientific applications, although it has some drawbacks. The proposed algorithm chooses the parameters of the modified inverse square root algorithm so as to minimize the relative error, and it uses two magic constants in order to avoid one floating-point multiplication. In addition, we use the fused multiply-add function and higher-order iterative methods in the second iteration to improve the accuracy. Such algorithms do not require storage of large tables for the initial approximation and can be used effectively on field-programmable gate arrays (FPGAs) and other platforms without hardware support for this function.
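
For orientation, the original algorithm that this abstract refers to can be sketched as below. This is the widely circulated single-constant version with the constant `0x5f3759df`, not the paper's modified algorithm with two magic constants, whose parameters are not given in the abstract. The function name `fisr_classic` and the `iterations` parameter are choices made for this sketch, and `memcpy` is used for the bit reinterpretation to stay within defined C behavior.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Classic fast inverse square root: a magic-constant initial guess
   followed by Newton-Raphson steps. Assumes x is a positive, normal,
   finite IEEE 754 single-precision number. */
float fisr_classic(float x, int iterations)
{
    assert(x > 0.0f);

    uint32_t i;
    memcpy(&i, &x, sizeof i);          /* reinterpret the float's bits   */
    i = 0x5f3759dfu - (i >> 1);        /* magic-constant initial guess   */

    float y;
    memcpy(&y, &i, sizeof y);          /* back to a float estimate       */

    const float half_x = 0.5f * x;
    for (int k = 0; k < iterations; ++k) {
        y = y * (1.5f - half_x * y * y);   /* Newton-Raphson refinement  */
    }
    return y;
}
```

With one iteration the relative error of this version stays below roughly 0.2%; a second iteration reduces it to a few parts per million.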

Design and Implementation of Double Precision Floating Point Division and Square Root on FPGAs

2006 IEEE Aerospace Conference ◽

10.1109/aero.2006.1655961 ◽

2006 ◽

Cited By ~ 8

Author(s):

A.J. Thakkar ◽

A. Ejnioui

Keyword(s):

Floating Point ◽

Double Precision ◽

Square Root ◽

Design And Implementation

Implementation of Embedded Floating Point Arithmetic Units on FPGA

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.550.126 ◽

2014 ◽

Vol 550 ◽

pp. 126-136

Author(s):

N. Ramya Rani

Keyword(s):

High Speed ◽

High Performance ◽

Floating Point ◽

Double Precision ◽

Embedded Computing ◽

Floating Point Arithmetic ◽

Field Programmable ◽

Programmable Gate Arrays ◽

Arithmetic Units ◽

Point Arithmetic

Floating point arithmetic plays a major role in scientific and embedded computing applications. However, the performance of field programmable gate arrays (FPGAs) used for floating point applications is poor due to the complexity of floating point arithmetic. Implementing floating point units on FPGAs consumes a large amount of resources, which has led to the development of embedded floating point units in FPGAs. Embedded applications such as multimedia, communication and DSP algorithms use floating point arithmetic in processing graphics, Fourier transformation, coding, etc. In this paper, methodologies are presented for the implementation of embedded floating point units on FPGA. The work aims to achieve high computation speed and to reduce the power needed for evaluating expressions. An application that demands high performance floating point computation can achieve better speed and density by incorporating embedded floating point units. Additionally, this paper describes a comparative study of the design of single precision and double precision pipelined floating point arithmetic units for evaluating expressions. The modules are designed using VHDL simulation in Xilinx software and implemented on VIRTEX and SPARTAN FPGAs.

Double Precision Floating Point Square Root Computation

International Journal of Engineering Trends and Technology ◽

10.14445/22315381/ijett-v13p259 ◽

2014 ◽

Vol 13 (6) ◽

pp. 294-298

Author(s):

Najib Ghatte ◽

Shilpa Patil ◽

Deepak Bhoir

Keyword(s):

Floating Point ◽

Double Precision ◽

Square Root

An Area-Efficient Unified Architecture for Multi-Functional Double-Precision Floating-Point Computation

Journal of Circuits, Systems and Computers ◽

10.1142/s0218126615501510 ◽

2015 ◽

Vol 24 (10) ◽

pp. 1550151

Author(s):

Wei Guo ◽

KwangHyok Ri ◽

Luping Cui ◽

Jizeng Wei

Keyword(s):

Logic Gates ◽

Lookup Table ◽

Floating Point ◽

Cmos Process ◽

Double Precision ◽

Square Root ◽

Area Efficiency ◽

Area Reduction ◽

Efficiency Performance ◽

Performance Area

In this paper, we propose a unified architecture for computation of double-precision floating-point division, reciprocal, square root, inverse square root and multiplication with a significant area reduction. First, a double-precision multiplication-based divider, the common datapath shared by these arithmetic computations, is optimized by a modified Goldschmidt algorithm to achieve better area efficiency. In this algorithm, a first-degree (linear) minimax approximation is used instead of a second-degree one to obtain a 15-bit precision estimate of the reciprocal, which gives a rather small lookup table (LUT) and reduces the amount of computation when accumulating the partial products. Two Goldschmidt iterations specially designed for hardware reuse are performed to obtain the final accurate result of division. By virtue of the pipelined processing, the time cost for the two iterations is minimized. Second, a reconfigurable datapath with little extra area cost is introduced to dynamically support multiple double-precision computations by executing the optimized divider iteratively. The design is finally implemented and synthesized in SMIC 0.13-μm CMOS process. The experimental results show that the proposed design can achieve a speed of 400 MHz with an area of 61.6 K logic gates and a 9-Kb LUT. Compared with other works, the area efficiency (performance/area ratio) of the proposed unified architecture is increased by about 20% on average, which is a better performance-area trade-off for embedded microprocessors.
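
The Goldschmidt division scheme named in this abstract can be illustrated in software roughly as follows. This is a sketch under stated assumptions, not the paper's hardware design: it uses a single textbook linear estimate of the reciprocal, 48/17 - (32/17)·d for d in [0.5, 1), in place of the paper's LUT-based 15-bit estimate, so it needs about four iterations rather than two to reach double precision. The function name `goldschmidt_divide` and the `iterations` parameter are hypothetical, and the divisor is assumed to be positive and finite.

```c
#include <assert.h>
#include <math.h>

/* Goldschmidt division: multiply numerator and denominator by the same
   correction factors until the denominator converges to 1; the numerator
   then holds the quotient. Assumes d is positive and finite. */
double goldschmidt_divide(double n, double d, int iterations)
{
    assert(d > 0.0);

    /* Scale the divisor into [0.5, 1): d = dm * 2^e. */
    int e;
    double dm = frexp(d, &e);
    double q = ldexp(n, -e);           /* n / 2^e, so that q/dm == n/d */

    /* Linear initial estimate of 1/dm on [0.5, 1) (assumed for this sketch). */
    double f = 48.0 / 17.0 - (32.0 / 17.0) * dm;
    q *= f;
    dm *= f;

    /* Each iteration squares the error of dm relative to 1. */
    for (int k = 0; k < iterations; ++k) {
        f = 2.0 - dm;
        q *= f;
        dm *= f;
    }
    return q;
}
```

Because the error is squared at every step, a more accurate initial estimate (such as the 15-bit one described above) directly reduces the number of iterations required.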

Pipelining of double precision floating point division and square root operations

Proceedings of the 44th annual southeast regional conference on - ACM-SE 44 ◽

10.1145/1185448.1185555 ◽

2006 ◽

Cited By ~ 9

Author(s):

Anuja Jayraj Thakkar ◽

Abdel Ejnioui

Keyword(s):

Floating Point ◽

Double Precision ◽

Square Root

A digit-set-interleaved radix-8 division/square root kernel for double-precision floating point

2010 International Symposium on System on Chip ◽

10.1109/issoc.2010.5625547 ◽

2010 ◽

Cited By ~ 2

Author(s):

Ingo Rust ◽

Tobias G. Noll

Keyword(s):

Floating Point ◽

Double Precision ◽

Square Root ◽

Digit Set

A Low-Cost High Radix Floating-Point Square-Root Circuit

Electronics ◽

10.3390/electronics10161988 ◽

2021 ◽

Vol 10 (16) ◽

pp. 1988

Author(s):

Yuheng Yang ◽

Qing Yuan ◽

Jian Liu

Keyword(s):

Power Consumption ◽

Low Cost ◽

Floating Point ◽

Square Root ◽

Single Precision ◽

Dynamic Power ◽

Low Area ◽

High Radix

In this paper, we propose an efficient architecture for a floating-point square-root circuit with low area cost, in accordance with the IEEE-754 standard. We extend the principle of the standard SRT algorithm so that the latency and area cost of the proposed circuit are linear with the radix. In addition, no extra computation cycles are required. With 65 nm technology, the area cost of the single-precision floating-point square-root circuit based on the proposed architecture is only 6450.84 μm², and the dynamic power consumption is only 0.764 mW at 300 MHz. The implementation results show that the proposed square-root circuit can reduce the area cost by 60% to 90% compared with other designs in the literature.
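
As background for the digit-recurrence family to which SRT belongs, the sketch below computes an integer square root one result bit per step. It is the simple radix-2 restoring recurrence, not the SRT algorithm or the high-radix circuit described in the abstract: SRT uses a redundant digit set and a selection function, which are omitted here. The function name `isqrt_digit_recurrence` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Radix-2 restoring digit recurrence for floor(sqrt(x)).
   Each loop pass decides one bit of the root by a trial subtraction. */
uint32_t isqrt_digit_recurrence(uint32_t x)
{
    const uint32_t original = x;
    uint32_t root = 0;
    uint32_t bit = 1u << 30;           /* largest power of four in 32 bits */

    while (bit > x) {
        bit >>= 2;                     /* skip leading zero digit pairs    */
    }
    while (bit != 0) {
        if (x >= root + bit) {
            x -= root + bit;           /* trial subtraction succeeds       */
            root = (root >> 1) + bit;  /* next result bit is 1             */
        } else {
            root >>= 1;                /* next result bit is 0             */
        }
        bit >>= 2;
    }

    /* Sanity check: root*root must not exceed the input. */
    assert((uint64_t)root * root <= original);
    return root;
}
```

Higher-radix variants retire several result bits per step, trading a more complex digit selection for fewer iterations.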

Design of Double-Precision Floating-Point Division and Square Root Based on SRT Algorithm

Proceedings of the 2014 International Conference on Computer, Communications and Information Technology ◽

10.2991/ccit-14.2014.92 ◽

2014 ◽

Author(s):

Jiyang Chen ◽

Yuanxi Peng ◽

Yuanwu Lei ◽

Ziye Deng

Keyword(s):

Floating Point ◽

Double Precision ◽

Square Root

Online Signature Verification Systems on a Low-Cost FPGA

Applied Sciences ◽

10.3390/app12010378 ◽

2021 ◽

Vol 12 (1) ◽

pp. 378

Author(s):

Enrique Cantó Navarro ◽

Rafael Ramos Lara ◽

Mariano López García

Keyword(s):

Embedded System ◽

Execution Time ◽

Low Cost ◽

Computing System ◽

Floating Point ◽

Signature Verification ◽

Double Precision ◽

Hardware Accelerator ◽

Online Signature ◽

Online Signature Verification

This paper describes three different approaches for the implementation of an online signature verification system on a low-cost FPGA. The system is based on an algorithm that operates on real numbers using the double-precision floating-point IEEE 754 format. The double-precision computations are replaced by simpler formats, without affecting the biometric performance, in order to permit efficient implementations on low-cost FPGA families. The first approach is an embedded system based on MicroBlaze, a 32-bit soft-core microprocessor designed for Xilinx FPGAs, which can be configured to include a single-precision floating-point unit (FPU). The second implementation attaches a hardware accelerator to the embedded system to reduce the execution time on floating-point vectors. The last approach is a custom computing system, built from a large set of arithmetic circuits that replace the floating-point data with a more efficient representation based on fixed-point format. The latter system provides a very high runtime acceleration factor at the expense of a large number of FPGA resources, a complex development cycle and a lack of flexibility, since it cannot be adapted to other biometric algorithms. By contrast, the first system has the opposite characteristics, while the second approach is a compromise between the two. The experimental results show that the hardware accelerator and the custom computing system reduce the execution time by factors of 7.6 and 201 but increase the FPGA logic resources by factors of 2.3 and 5.2, respectively, in comparison with the MicroBlaze embedded system.
