scholarly journals Modified Fast Inverse Square Root and Square Root Approximation Algorithms: The Method of Switching Magic Constants

Computation ◽  
2021 ◽  
Vol 9 (2) ◽  
pp. 21 ◽  
Author(s):  
Leonid V. Moroz ◽  
Volodymyr V. Samotyy ◽  
Oleh Y. Horyachyy

Many low-cost platforms that support floating-point arithmetic, such as microcontrollers and field-programmable gate arrays, do not include fast hardware or software methods for calculating the square root and/or reciprocal square root. Typically, such functions are implemented using direct lookup tables or polynomial approximations, with a subsequent application of the Newton–Raphson method. Other, more complex solutions include high-radix digit-recurrence and bipartite or multipartite table-based methods. In contrast, this article proposes a simple modification of the fast inverse square root method that has high accuracy and relatively low latency. Algorithms are given in C/C++ for single- and double-precision numbers in the IEEE 754 format for both square root and reciprocal square root functions. These are based on the switching of magic constants in the initial approximation, depending on the input interval of the normalized floating-point numbers, in order to minimize the maximum relative error on each subinterval after the first iteration—giving 13 correct bits of the result. Our experimental results show that the proposed algorithms provide a fairly good trade-off between accuracy and latency after two iterations for numbers of type float, and after three iterations for numbers of type double when using fused multiply–add instructions—giving almost complete accuracy.

2019 ◽  
pp. 461-470
Author(s):  
Oleh Horyachyy ◽  
Leonid Moroz ◽  
Viktor Otenko

The purpose of this paper is to introduce a modification of Fast Inverse Square Root (FISR) approximation algorithm with reduced relative errors. The original algorithm uses a magic constant trick with input floating-point number to obtain a clever initial approximation and then utilizes the classical iterative Newton-Raphson formula. It was first used in the computer game Quake III Arena, causing widespread discussion among scientists and programmers, and now it can be frequently found in many scientific applications, although it has some drawbacks. The proposed algorithm has such parameters of the modified inverse square root algorithm that minimize the relative error and includes two magic constants in order to avoid one floating-point multiplication. In addition, we use the fused multiply-add function and iterative methods of higher order in the second iteration to improve the accuracy. Such algorithms do not require storage of large tables for initial approximation and can be effectively used on field-programmable gate arrays (FPGAs) and other platforms without hardware support for this function.


2014 ◽  
Vol 550 ◽  
pp. 126-136
Author(s):  
N. Ramya Rani

:Floating point arithmetic plays a major role in scientific and embedded computing applications. But the performance of field programmable gate arrays (FPGAs) used for floating point applications is poor due to the complexity of floating point arithmetic. The implementation of floating point units on FPGAs consumes a large amount of resources and that leads to the development of embedded floating point units in FPGAs. Embedded applications like multimedia, communication and DSP algorithms use floating point arithmetic in processing graphics, Fourier transformation, coding, etc. In this paper, methodologies are presented for the implementation of embedded floating point units on FPGA. The work is focused with the aim of achieving high speed of computations and to reduce the power for evaluating expressions. An application that demands high performance floating point computation can achieve better speed and density by incorporating embedded floating point units. Additionally this paper describes a comparative study of the design of single precision and double precision pipelined floating point arithmetic units for evaluating expressions. The modules are designed using VHDL simulation in Xilinx software and implemented on VIRTEX and SPARTAN FPGAs.


Author(s):  
Najib Ghatte ◽  
◽  
Shilpa Patil ◽  
Deepak Bhoir

2015 ◽  
Vol 24 (10) ◽  
pp. 1550151
Author(s):  
Wei Guo ◽  
KwangHyok Ri ◽  
Luping Cui ◽  
Jizeng Wei

In this paper, we propose a unified architecture for computation of double-precision floating-point division, reciprocal, square root, inverse square root and multiplication with a significant area reduction. First, a double-precision multiplication-based divider, the common datapath shared with these arithmetic computations, is optimized by a modified Goldschmidt algorithm to achieve better area efficiency. In this algorithm, a linear-degree minimax approximation instead of second-degree is used to obtain a 15-bit precision estimate of the reciprocal so that we can get a rather small lookup table (LUT) as well as reduced amount of computation when accumulating the partial products. Two Goldschmidt iterations specially designed for hardware reuse are performed to gain the final accurate result of division. By virtue of the pipelined processing, the time cost for the two iterations is minimized. Second, a reconfigurable datapath with a little extra area cost is introduced to dynamically support multiple double-precision computations by executing the optimized divider iteratively. The design is finally implemented and synthesized in SMIC 0.13-μm CMOS process. The experimental results show that the proposed design can achieve a speed of 400 MHz with area of 61.6 K logic gates and 9-Kb LUT. Compared with other works, the area efficiency (performance/area ratio) of the proposed unified architecture is increased by about 20% in average, which is a better performance-area trade-off for embedded microprocessors.


Electronics ◽  
2021 ◽  
Vol 10 (16) ◽  
pp. 1988
Author(s):  
Yuheng Yang ◽  
Qing Yuan ◽  
Jian Liu

In this paper, we propose an efficient architecture of floating-point square-root circuit with low area cost, which is in accordance with the IEEE-754 standard. We extend the principle of the standard SRT algorithm so that the latency and area cost of the proposed circuit are linear with the radix. In addition, no extra computation cycles are required. With 65 nm technology, the area cost of the single-precision floating-point square-root circuit based on proposed architecture is only 6450.84 μm2, and the dynamic power consumption is only 0.764 mW at 300 MHz. The implementation results show that the proposed square-root circuit can reduce the area cost by 60%~90% compared with other designs in the literature.


2021 ◽  
Vol 12 (1) ◽  
pp. 378
Author(s):  
Enrique Cantó Navarro ◽  
Rafael Ramos Lara ◽  
Mariano López García

This paper describes three different approaches for the implementation of an online signature verification system on a low-cost FPGA. The system is based on an algorithm, which operates on real numbers using the double-precision floating-point IEEE 754 format. The double-precision computations are replaced by simpler formats, without affecting the biometrics performance, in order to permit efficient implementations on low-cost FPGA families. The first approach is an embedded system based on MicroBlaze, a 32-bit soft-core microprocessor designed for Xilinx FPGAs, which can be configured by including a single-precision floating-point unit (FPU). The second implementation attaches a hardware accelerator to the embedded system to reduce the execution time on floating-point vectors. The last approach is a custom computing system, which is built from a large set of arithmetic circuits that replace the floating-point data with a more efficient representation based on fixed-point format. The latter system provides a very high runtime acceleration factor at the expense of using a large number of FPGA resources, a complex development cycle and no flexibility since it cannot be adapted to other biometric algorithms. By contrast, the first system provides just the opposite features, while the second approach is a mixed solution between both of them. The experimental results show that both the hardware accelerator and the custom computing system reduce the execution time by a factor ×7.6 and ×201 but increase the logic FPGA resources by a factor ×2.3 and ×5.2, respectively, in comparison with the MicroBlaze embedded system.


Sign in / Sign up

Export Citation Format

Share Document