T-Count Optimized Quantum Circuit Designs for Single-Precision Floating-Point Division

Electronics ◽  
2021 ◽  
Vol 10 (6) ◽  
pp. 703
Author(s):  
S. S. Gayathri ◽  
R. Kumar ◽  
Samiappan Dhanalakshmi ◽  
Gerard Dooly ◽  
Dinesh Babu Duraibabu

The implementation of quantum computing processors for scientific applications includes quantum floating-point circuits for arithmetic operations. This work adapts the standard restoring, non-restoring, and Goldschmidt division algorithms to floating-point numbers with single-precision inputs. The proposed designs use the quantum Clifford+T gate set, and resource estimates in terms of qubit count, T-count, and T-depth are provided for the proposed circuits. By improving the structure of the leading-zero detector (LZD) unit, the proposed division circuits show a significant reduction in T-count compared with existing works on floating-point division.
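As a point of reference for the last of these algorithms, the following is a minimal classical sketch in C of Goldschmidt division, the iteration that a quantum design would realize reversibly over quantum registers; the function name, the iteration count, and the assumption that the divisor is pre-scaled into [0.5, 1) (as a floating-point significand would be) are illustrative, not details taken from the paper.

#include <stdio.h>

/* Classical Goldschmidt iteration for q = n / d.  Both running values are
 * multiplied by the same correction factor f = 2 - d, so d converges
 * quadratically to 1 while n converges to the quotient. */
double goldschmidt_div(double n, double d, int iterations) {
    for (int i = 0; i < iterations; ++i) {
        double f = 2.0 - d;   /* correction factor */
        n *= f;               /* numerator -> n/d  */
        d *= f;               /* denominator -> 1  */
    }
    return n;
}

int main(void) {
    /* 0.75 / 0.6 = 1.25; a handful of iterations already converges. */
    printf("%.10f\n", goldschmidt_div(0.75, 0.6, 5));
    return 0;
}

Because the error roughly squares on every pass, a few iterations suffice for single-precision accuracy, which is what makes the algorithm attractive in hardware arithmetic units.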

2001 ◽  
Vol 01 (02) ◽  
pp. 217-230 ◽  
Author(s):  
M. GAVRILOVA ◽  
J. ROKNE

The main result of the paper is a new and efficient algorithm to compute the closest possible representable intersection point of two lines in the plane. The coordinates of the points that define the lines are given as single-precision floating-point numbers. The novelty of the algorithm is the method for deriving the best representable floating-point result: instead of solving the equations to compute the line intersection coordinates exactly, which is a computationally expensive procedure, an iterative binary search is applied that stops once the required precision is achieved. Only exact comparison tests are needed, and interval arithmetic is applied to further speed up the process. Experimental results demonstrate that the proposed algorithm is on average ten times faster than an implementation of the line intersection subroutine using the CORE library's exact arithmetic.
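A minimal sketch in C of the binary-search idea for one coordinate is given below, assuming the intersection's x-coordinate is bracketed by two positive single-precision bounds; double precision stands in for the paper's exact comparison tests, parallel lines are not handled, and the interval-arithmetic filtering is omitted.

#include <stdint.h>
#include <string.h>

/* For positive floats, the IEEE 754 bit patterns are ordered like the values,
 * so binary search over bit patterns enumerates representable numbers. */
static uint32_t float_to_bits(float f) { uint32_t u; memcpy(&u, &f, 4); return u; }
static float bits_to_float(uint32_t u) { float f; memcpy(&f, &u, 4); return f; }

/* Is the candidate xc to the left of the x-coordinate of the intersection of
 * the line through (x1,y1)-(x2,y2) with the line through (x3,y3)-(x4,y4)?
 * Double precision is used here as a stand-in for an exact predicate. */
static int left_of_intersection(float xc,
                                float x1, float y1, float x2, float y2,
                                float x3, float y3, float x4, float y4) {
    double num = ((double)x1 * y2 - (double)y1 * x2) * ((double)x3 - x4)
               - ((double)x1 - x2) * ((double)x3 * y4 - (double)y3 * x4);
    double den = ((double)x1 - x2) * ((double)y3 - y4)
               - ((double)y1 - y2) * ((double)x3 - x4);
    /* xc < num/den  <=>  xc*den < num when den > 0 (reversed when den < 0). */
    return den > 0.0 ? (double)xc * den < num : (double)xc * den > num;
}

/* Binary search over the representable floats in [lo, hi]; lo is assumed to
 * lie to the left of the intersection and hi to its right.  On return, the
 * result and its successor bracket the true intersection x-coordinate. */
float closest_intersection_x(float lo, float hi,
                             float x1, float y1, float x2, float y2,
                             float x3, float y3, float x4, float y4) {
    uint32_t a = float_to_bits(lo), b = float_to_bits(hi);
    while (b - a > 1) {
        uint32_t mid = a + (b - a) / 2;
        if (left_of_intersection(bits_to_float(mid),
                                 x1, y1, x2, y2, x3, y3, x4, y4))
            a = mid;
        else
            b = mid;
    }
    return bits_to_float(a);
}

The search evaluates at most about 32 candidates, which is why such an approach can outperform exact rational computation of the intersection.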


Author(s):  
Cezary J. Walczyk ◽  
Leonid V. Moroz ◽  
Jan L. Cieśliński

We present an improved algorithm for fast calculation of the inverse square root of single-precision floating-point numbers. The algorithm is much more accurate than the famous fast inverse square root algorithm and has a similar computational cost. The presented modification concerns the Newton-Raphson corrections and can be applied when the distribution of these corrections is not symmetric (for instance, in our case they are always negative).
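For context, this is the classic single-precision routine the abstract calls the famous fast inverse square root, written with the textbook Newton-Raphson coefficients 1.5 and 0.5; the modified correction constants derived in the paper are not reproduced here.

#include <stdint.h>
#include <string.h>

/* Magic-constant seed followed by one Newton-Raphson correction for 1/sqrt(x).
 * memcpy is used for the bit reinterpretation to avoid undefined behaviour. */
float fast_rsqrt(float x) {
    uint32_t i;
    float y = x;
    memcpy(&i, &y, sizeof i);           /* reinterpret the float's bits    */
    i = 0x5F3759DFu - (i >> 1);         /* initial guess from the exponent */
    memcpy(&y, &i, sizeof y);
    y = y * (1.5f - 0.5f * x * y * y);  /* one Newton-Raphson correction   */
    return y;
}

Because the corrected result always undershoots the true value, the correction coefficients can be re-tuned to centre the error around zero, which is the kind of asymmetry the paper exploits.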


2018 ◽  
Author(s):  
Matheus M. Susin ◽  
Lucas Wanner

In this work, we compared the precision, speed, and power consumption of computing the reciprocal square root of a single-precision floating-point number using different approximation techniques. We also devised an equivalent approximation for half-precision floating-point numbers and evaluated its performance across the whole range of positive non-zero 16-bit floating-point values.
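Since binary16 has only 65,536 bit patterns, an exhaustive sweep of the kind described is cheap; the harness below illustrates the idea in C, with a stand-in candidate (the classic single-precision magic-constant seed plus one Newton step after widening to float) rather than the half-precision approximation devised in this work.

#include <math.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Converts a positive, finite binary16 pattern (0x0001..0x7BFF) to float. */
static float half_to_float(uint16_t h) {
    uint32_t exp = (h >> 10) & 0x1Fu;
    uint32_t man = h & 0x3FFu;
    if (exp == 0)                                      /* subnormal: man * 2^-24 */
        return (float)man * (1.0f / 16777216.0f);
    uint32_t bits = ((exp + 112u) << 23) | (man << 13);    /* rebias 15 -> 127   */
    float f; memcpy(&f, &bits, 4);
    return f;
}

/* Stand-in approximation: magic-constant seed plus one Newton-Raphson step. */
static float rsqrt_candidate(float x) {
    uint32_t i; float y = x;
    memcpy(&i, &y, 4);
    i = 0x5F3759DFu - (i >> 1);
    memcpy(&y, &i, 4);
    return y * (1.5f - 0.5f * x * y * y);
}

int main(void) {
    double max_rel_err = 0.0;
    for (uint16_t h = 0x0001; h <= 0x7BFF; ++h) {  /* every positive finite half */
        double x   = (double)half_to_float(h);
        double ref = 1.0 / sqrt(x);
        double err = fabs((double)rsqrt_candidate((float)x) - ref) / ref;
        if (err > max_rel_err) max_rel_err = err;
    }
    printf("max relative error: %.3e\n", max_rel_err);
    return 0;
}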


Geophysics ◽  
2020 ◽  
Vol 85 (3) ◽  
pp. F65-F76
Author(s):  
Gabriel Fabien-Ouellet

New processors are increasingly supporting half-precision floating-point numbers, often with a significant throughput gain over single-precision operations. Seismic modeling, imaging, and inversion could benefit from such an acceleration, but it is not obvious how the accuracy of the solution can be preserved with a very narrow 16-bit representation. By scaling the finite-difference expression of the isotropic elastic wave equation, we have found that a stable solution can be obtained despite the very narrow dynamic range of the half-precision format. We develop an implementation with the CUDA platform which, on the most recent graphics processing units (GPUs), is nearly twice as fast and uses half the memory of the equivalent single-precision version. The error on seismograms caused by the reduced precision is shown to correspond to a negligible fraction of the total seismic energy and is mostly incoherent with seismic phases. Finally, we find that this noise does not adversely impact full-waveform inversion or reverse time migration, both of which benefit from the higher throughput of half-precision computation.
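The scaling idea can be illustrated with a 1-D velocity-stress update standing in for the paper's 3-D isotropic elastic kernels: the fields are stored as scaled variables V = a*v and S = b*s so that their magnitudes stay near unity, well inside the roughly 6e-5 to 65504 range of binary16, while the fused update coefficients remain in single precision. The names and factorization below are an illustrative C sketch, not the CUDA implementation described in the work.

#include <stddef.h>

/* One time step of a scaled 1-D staggered-grid velocity-stress update.  On an
 * FP16-capable GPU the arrays V and S would be stored in half precision;
 * plain float is used here for portability. */
void scaled_step_1d(float *V, float *S, size_t n,
                    float c_v,   /* (a/b) * dt / (rho * dx) */
                    float c_s)   /* (b/a) * dt * mu / dx    */
{
    for (size_t i = 1; i < n; ++i)        /* scaled particle-velocity update */
        V[i] += c_v * (S[i] - S[i - 1]);
    for (size_t i = 0; i + 1 < n; ++i)    /* scaled stress update */
        S[i] += c_s * (V[i + 1] - V[i]);
}

Folding the scale factors into the coefficients introduces no extra multiplications, so the throughput advantage of half precision is preserved.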


2020 ◽  
Vol 33 (109) ◽  
pp. 21-31
Author(s):  
I. Ya. Zeleneva ◽  
T. V. Golub ◽  
T. S. Diachuk ◽  
A. Ye. Didenko

The purpose of this study is to develop an effective structure and the internal functional blocks of a digital computing device, an adder, that performs addition and subtraction on floating-point numbers presented in the IEEE Std 754-2008 format. To improve the characteristics of the adder, the circuit uses pipelining, i.e., division into stages, each of which performs a specific action on the numbers. This allows addition/subtraction operations on several numbers to proceed simultaneously, which increases computational performance and also makes the adder suitable for use in modern synchronous circuits. Each block of the adder's pipeline structure on the FPGA is synthesized as a separate digital functional unit project, so the overall task is divided into separate subtasks, which facilitates experimental testing and staged debugging of the entire device. Experimental studies were performed with the Quartus II EDA tools. The developed circuit was modeled on FPGAs of the Stratix III and Cyclone III families and compared against a functionally similar device from Altera. The comparative analysis supports the conclusion that the performance improvement is achieved thanks to the pipelined structure of the adder.

Implementing floating-point arithmetic on programmable logic devices, in particular FPGAs, offers flexibility of use and low production cost, and also makes it possible to solve problems for which no ready-made solutions exist among the standard devices available on the market. The developed adder has a wide range of applications, since most modern computing devices need to process floating-point numbers. The proposed pipelined adder model is quite simple to implement on an FPGA and can be an alternative to built-in multipliers and processor cores in cases where the complex functionality of these devices is redundant for a specific task.
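As an informal illustration of the stages such a pipeline passes through, the following C model decomposes a binary32 addition into unpack, align, add, normalize, and pack steps; it is a heavily simplified sketch (positive normal operands, addition only, truncation instead of IEEE rounding, no overflow or special-value handling) and not a description of the FPGA design itself.

#include <stdint.h>
#include <string.h>

/* Software model of the pipeline stages of a binary32 adder; in hardware each
 * stage would be one pipeline level with registers in between. */
float fp32_add_model(float fa, float fb) {
    uint32_t a, b;
    memcpy(&a, &fa, 4); memcpy(&b, &fb, 4);

    /* Stage 1: unpack exponents and significands (restore the hidden bit). */
    int32_t  ea = (int32_t)((a >> 23) & 0xFFu), eb = (int32_t)((b >> 23) & 0xFFu);
    uint32_t ma = (a & 0x7FFFFFu) | 0x800000u;
    uint32_t mb = (b & 0x7FFFFFu) | 0x800000u;

    /* Stage 2: align the operand with the smaller exponent. */
    if (ea < eb) {
        uint32_t tm = ma; ma = mb; mb = tm;
        int32_t  te = ea; ea = eb; eb = te;
    }
    uint32_t shift = (uint32_t)(ea - eb);
    mb = shift < 32 ? mb >> shift : 0;

    /* Stage 3: add the significands. */
    uint32_t m = ma + mb;
    int32_t  e = ea;

    /* Stage 4: normalize (a carry out shifts right and bumps the exponent). */
    if (m & 0x1000000u) { m >>= 1; e += 1; }

    /* Stage 5: pack the result. */
    uint32_t r = ((uint32_t)e << 23) | (m & 0x7FFFFFu);
    float f; memcpy(&f, &r, 4);
    return f;
}

In a pipelined implementation, a new pair of operands can enter the first stage while earlier pairs are still in flight, which is the source of the throughput gain described above.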


2016 ◽  
Vol 51 (1) ◽  
pp. 555-567
Author(s):  
Marc Andrysco ◽  
Ranjit Jhala ◽  
Sorin Lerner
