Low-Latency Hardware Implementation of High-Precision Hyperbolic Functions sinhx and coshx Based on Improved CORDIC Algorithm

The CORDIC algorithm is used for low-cost hardware implementations that calculate transcendental functions. This paper proposes a low-latency, high-precision architecture for computing the hyperbolic functions sinh x and cosh x based on an improved CORDIC algorithm, the QH-CORDIC. The principle, structure, and range of convergence of the QH-CORDIC are discussed, and the hardware circuit architecture for sinh x and cosh x using the QH-CORDIC is presented. The proposed architecture is implemented on an FPGA device, showing that it has 75% and 50% latency overhead over the two latest prior works. In synthesis with the TSMC 65 nm standard cell library, ASIC implementation results show that the proposed architecture is also superior to the two latest prior works in terms of total time (latency × period), ATP (area × total time), total energy (power × total time), energy efficiency (total energy / efficient bits), and area efficiency (efficient bits / area / total time). A comparison with related works indicates that the proposed architecture is much better suited to high-precision floating-point computation of sinh x and cosh x than the LUT method, stochastic computing, and other CORDIC algorithms.
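For orientation, the baseline that such improved algorithms start from can be sketched in a few lines of Python. This is the conventional serial hyperbolic CORDIC in rotation mode, not the QH-CORDIC of the paper, whose details the abstract does not give. The function name `hyperbolic_cordic`, the iteration count, and the use of floating-point arithmetic in place of fixed-point shift-and-add hardware are all assumptions made for illustration.

```python
import math

def hyperbolic_cordic(theta, n_iter=40):
    """Approximate (sinh(theta), cosh(theta)) with conventional hyperbolic CORDIC.

    Illustrative sketch only: the classic one-step-per-iteration algorithm,
    valid for |theta| up to about 1.118.
    """
    if abs(theta) > 1.118:
        raise ValueError("theta outside the basic range of convergence")

    # Shift sequence 1, 2, 3, 4, 4, 5, ..., 13, 13, ...: indices 4, 13, 40, ...
    # are repeated, which the hyperbolic variant needs in order to converge.
    shifts = []
    i, next_repeat = 1, 4
    while len(shifts) < n_iter:
        shifts.append(i)
        if i == next_repeat:
            shifts.append(i)
            next_repeat = 3 * next_repeat + 1
        i += 1

    # Each micro-rotation scales the vector by sqrt(1 - 2^(-2i)); start from
    # 1/gain so that no final multiplication is required.
    gain = 1.0
    for s in shifts:
        gain *= math.sqrt(1.0 - 2.0 ** (-2 * s))

    x, y, z = 1.0 / gain, 0.0, theta
    for s in shifts:
        d = 1.0 if z >= 0 else -1.0      # rotate so the residual angle z goes to 0
        t = 2.0 ** (-s)                  # a right shift by s bits in hardware
        x, y = x + d * y * t, y + d * x * t
        z -= d * math.atanh(t)           # table constant in hardware
    return y, x                          # (sinh, cosh)
```

Each loop pass corresponds to one serial iteration, so the iteration count sets the latency of this baseline; that serial dependence is what low-latency variants aim to shorten.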

Design of DDS Based on Improved CORDIC Algorithm

Advanced Materials Research ◽ 10.4028/www.scientific.net/amr.588-589.727 ◽ 2012 ◽ Vol 588-589 ◽ pp. 727-730

Author(s): Zong Yao Liu ◽ Wei Hua Zhu ◽ Zhen Hua Qu

Keyword(s): Parallel Computing ◽ High Precision ◽ Hardware Implementation ◽ Cordic Algorithm ◽ Traditional Algorithm ◽ Multi Level ◽ Simulation Results ◽ Computing Method ◽ Computation Speed ◽ Computing Speed

To address the shortcoming that the computation speed of a DDS decreases as the number of CORDIC iterations increases, this paper replaces the traditional multiple-iteration algorithm with a method that uses point decomposition to predict the directions of rotation, combined with multi-level iterative parallel computing. Functional simulation results show that the improved algorithm enhances computation speed while maintaining high data precision. The design offers high computing speed, high precision, and a simple hardware implementation.
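To make the bottleneck concrete, a minimal Python sketch of a DDS built on the traditional serial CORDIC follows. It is not the improved method of the paper: the rotation direction `d` is decided one iteration at a time, which is exactly the serial dependence that direction prediction and parallel iteration are meant to remove. The names `cordic_sin_cos` and `dds_samples`, the 16-bit phase accumulator, and the floating-point arithmetic are assumptions for illustration.

```python
import math

def cordic_sin_cos(angle, n_iter=24):
    """Serial circular CORDIC (rotation mode) for angle in [-pi/2, pi/2]."""
    gain = 1.0
    for i in range(n_iter):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))
    x, y, z = 1.0 / gain, 0.0, angle
    for i in range(n_iter):
        d = 1.0 if z >= 0 else -1.0      # direction depends on the previous step
        t = 2.0 ** (-i)
        x, y = x - d * y * t, y + d * x * t
        z -= d * math.atan(t)
    return y, x                          # (sin, cos)

def dds_samples(tuning_word, n_samples, acc_bits=16):
    """Generate sine samples from a phase accumulator plus CORDIC."""
    modulus = 1 << acc_bits
    phase = 0
    out = []
    for _ in range(n_samples):
        angle = 2.0 * math.pi * phase / modulus      # in [0, 2*pi)
        # Fold the phase into [-pi/2, pi/2], where CORDIC converges.
        if angle > 1.5 * math.pi:
            s, _ = cordic_sin_cos(angle - 2.0 * math.pi)
        elif angle > 0.5 * math.pi:
            s, _ = cordic_sin_cos(math.pi - angle)   # sin(pi - a) = sin(a)
        else:
            s, _ = cordic_sin_cos(angle)
        out.append(s)
        phase = (phase + tuning_word) % modulus      # accumulator wraps around
    return out
```

With these assumed parameters, the output frequency is `tuning_word / 2**acc_bits` cycles per sample, and every sample costs `n_iter` dependent iterations.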

Low-Latency and Minor-Error Architecture for Parallel Computing X^Y-Like Functions with High-Precision Floating-Point Inputs

Electronics ◽ 10.3390/electronics11010069 ◽ 2021 ◽ Vol 11 (1) ◽ pp. 69

Author(s): Ming Liu ◽ Wenjia Fu ◽ Jincheng Xia

Keyword(s): High Precision ◽ Relative Error ◽ State Of The Art ◽ Latency Period ◽ Limited Range ◽ The State ◽ Floating Point ◽ Cordic Algorithm ◽ Maximum Relative Error ◽ Large Area

This paper proposes a novel architecture for the computation of X^Y-like functions based on the QH CORDIC (Quadruple-Step-Ahead Hyperbolic Coordinate Rotation Digital Computer) methodology. The proposed architecture converts the direct computation of X^Y into logarithm, multiplication, and exponent operations. The QH CORDIC methodology is a parallel variant of the traditional CORDIC algorithm. Traditional CORDIC suffers from long latency and large area, while the QH CORDIC has much lower latency. The functions ln x and e^x are computed with the QH CORDIC. To overcome the limited range of convergence of the QH CORDIC, this paper employs two specific techniques that enlarge the range of convergence for ln x and e^x, making it possible to handle high-precision floating-point inputs. A hardware model of the function X^Y using the QH CORDIC is presented. A reference circuit is designed and synthesized with the TSMC 65 nm standard cell library. The ASIC implementation results show that the proposed architecture improves on the state of the art by 30 orders of magnitude in both maximum relative error and average relative error. In addition, the proposed architecture is superior to the state of the art in terms of latency, word length, and energy efficiency (power × latency × period / efficient bits).
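The decomposition into logarithm, multiplication, and exponent rests on the identity X^Y = e^(Y · ln X) for X > 0. The Python sketch below shows that pipeline together with one standard way of widening the input range, namely exponent/mantissa range reduction. The abstract does not say which two range-extension techniques the paper uses, so the reduction here is an assumption, as are the names `ln_reduced`, `exp_reduced`, and `power`. `math.log` and `math.exp` stand in for narrow-range hardware kernels.

```python
import math

LN2 = math.log(2.0)

def ln_reduced(x):
    """ln x via x = m * 2**e, so the kernel only sees m in [0.5, 1)."""
    m, e = math.frexp(x)
    return math.log(m) + e * LN2         # math.log stands in for the kernel

def exp_reduced(t):
    """e**t via t = k*ln2 + r, so the kernel only sees |r| <= ln2 / 2."""
    k = round(t / LN2)
    r = t - k * LN2
    return math.ldexp(math.exp(r), k)    # exp(r) * 2**k

def power(x, y):
    """X^Y for X > 0 as logarithm, multiplication, then exponent."""
    if x <= 0:
        raise ValueError("this sketch only handles x > 0")
    return exp_reduced(y * ln_reduced(x))
```

Because the logarithm is multiplied by Y before exponentiation, any error in ln x is amplified by |Y|, which is one reason such designs need wide internal word lengths.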

Low-Latency Bit-Accurate Architecture for Configurable Precision Floating-Point Division

Applied Sciences ◽ 10.3390/app11114988 ◽ 2021 ◽ Vol 11 (11) ◽ pp. 4988

Author(s): Jincheng Xia ◽ Wenjia Fu ◽ Ming Liu ◽ Mingjiang Wang

Keyword(s): Average Energy ◽ Latency Period ◽ Floating Point ◽ Low Latency ◽ Partial Quotient ◽ Guaranteed Accuracy ◽ Standard Cell Library ◽ Cell Library ◽ Speed Performance ◽ Time Latency

Floating-point division is indispensable and increasingly important in many modern applications. To improve the speed of floating-point division in actual microprocessors, this paper proposes a low-latency, multi-precision architecture for floating-point division that complies with the IEEE-754 standard. The design has three parts: pre-configuration, mantissa division, and quotient normalization. For mantissa division, a Predict–Correct algorithm based on the fast division algorithm is employed, which produces more partial quotient bits per cycle without consuming too much circuit area. A detailed analysis is presented to support the guaranteed accuracy per cycle with no restriction to specific parameters. In synthesis with the TSMC 90 nm standard cell library, the results show that the proposed architecture has ≈63.6% latency, ≈30.23% total time (latency × period), ≈31.8% total energy (power × latency × period), and ≈44.6% efficient average energy (power × latency × period / efficient length) overhead over the latest floating-point division structure. In terms of latency, the proposed division architecture is much faster than several classic processors.
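The three-part structure can be illustrated with a small Python sketch. It is not the Predict–Correct algorithm of the paper: the mantissa stage below is plain restoring division, which yields one quotient bit per step, where the paper's method yields several. The names `unpack` and `divide`, the 53-bit mantissa width, and the restriction to finite inputs are assumptions, and the result is truncated rather than rounded per IEEE-754, so it may differ from a bit-accurate divider in the last place.

```python
import math

MANT_BITS = 53  # assumed width: double-precision significand

def unpack(v):
    """Pre-configuration: split a finite nonzero float into sign, exponent, mantissa."""
    m, e = math.frexp(abs(v))                 # abs(v) = m * 2**e, m in [0.5, 1)
    return v < 0, e, int(m * (1 << MANT_BITS))

def divide(a, b):
    """Divide two finite floats in three stages (illustrative, truncating)."""
    if b == 0:
        raise ZeroDivisionError("division by zero")
    if a == 0:
        return 0.0
    sign_a, exp_a, mant_a = unpack(a)
    sign_b, exp_b, mant_b = unpack(b)

    # Mantissa division: restoring division, one quotient bit per step.
    # The first step gives the integer bit, the rest give fraction bits.
    quotient, rem = 0, mant_a
    for _ in range(MANT_BITS + 1):
        quotient <<= 1
        if rem >= mant_b:
            rem -= mant_b
            quotient |= 1
        rem <<= 1

    # Quotient normalization: quotient ~= (mant_a / mant_b) * 2**MANT_BITS,
    # so rescale it and apply the exponent difference and the sign.
    result = math.ldexp(float(quotient), exp_a - exp_b - MANT_BITS)
    return -result if sign_a != sign_b else result
```

In this baseline the loop count, and hence the latency, grows linearly with the mantissa width; producing more quotient bits per cycle shortens that loop.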
