scholarly journals Low-Latency Bit-Accurate Architecture for Configurable Precision Floating-Point Division

2021 ◽  
Vol 11 (11) ◽  
pp. 4988
Author(s):  
Jincheng Xia ◽  
Wenjia Fu ◽  
Ming Liu ◽  
Mingjiang Wang

Floating-point division is indispensable and becoming increasingly important in many modern applications. To improve speed performance of floating-point division in actual microprocessors, this paper proposes a low-latency architecture with a multi-precision architecture for floating-point division which will meet the IEEE-754 standard. There are three parts in the floating-point division design: pre-configuration, mantissa division, and quotient normalization. In the part of mantissa division, based on the fast division algorithm, a Predict–Correct algorithm is employed which brings about more partial quotient bits per cycle without consuming too much circuit area. Detailed analysis is presented to support the guaranteed accuracy per cycle with no restriction to specific parameters. In the synthesis using TSMC, 90 nm standard cell library, the results show that the proposed architecture has ≈63.6% latency, ≈30.23% total time (latency × period), ≈31.8% total energy (power × latency × period), and ≈44.6% efficient average energy (power × latency × period/efficient length) overhead over the latest floating-point division structure. In terms of latency, the proposed division architecture is much faster than several classic processors.

Electronics ◽  
2021 ◽  
Vol 10 (20) ◽  
pp. 2533
Author(s):  
Wenjia Fu ◽  
Jincheng Xia ◽  
Xu Lin ◽  
Ming Liu ◽  
Mingjiang Wang

CORDIC algorithm is used for low-cost hardware implementation to calculate transcendental functions. This paper proposes a low-latency high-precision architecture for the computation of hyperbolic functions sinhx and coshx based on an improved CORDIC algorithm, that is, the QH-CORDIC. The principle, structure, and range of convergence of the QH-CORDIC are discussed, and the hardware circuit architecture of functions sinhx and coshx using the QH-CORDIC is plotted in this paper. The proposed architecture is implemented using an FPGA device, showing that it has 75% and 50% latency overhead over the two latest prior works. In the synthesis using TSMC 65 nm standard cell library, ASIC implementation results show that the proposed architecture is also superior to the two latest prior works in terms of total time (latency × period), ATP (area × total time), total energy (power × total time), energy efficiency (total energy/efficient bits), and area efficiency (efficient bits/area/total time). Comparison of related works indicates that it is much more favorable for the proposed architecture to perform high-precision floating-point computations on functions sinhx and coshx than the LUT method, stochastic computing, and other CORDIC algorithms.


Author(s):  
Sukanya Sagarika Meher ◽  
Jushya Ravi ◽  
Mustafa Eren Celik ◽  
Stephen Miller ◽  
Anubhav Sahu ◽  
...  

Author(s):  
Laysson Oliveira Luz ◽  
Jose Augusto M. Nacif ◽  
Ricardo S. Ferreira ◽  
Omar P. Vilela Neto

VLSI Design ◽  
2013 ◽  
Vol 2013 ◽  
pp. 1-15
Author(s):  
Shadi Traboulsi ◽  
Valerio Frascolla ◽  
Nils Pohl ◽  
Josef Hausner ◽  
Attila Bilgic

In this paper, we present and compare efficient low-power hardware architectures for accelerating the Packet Data Convergence Protocol (PDCP) in LTE and LTE-Advanced mobile terminals. Specifically, our work proposes the design of two cores: a crypto engine for the Evolved Packet System Encryption Algorithm (128-EEA2) that is based on the AES cipher and a coprocessor for the Least Significant Bit (LSB) encoding mechanism of the Robust Header Compression (ROHC) algorithm. With respect to the former, first we propose a reference architecture, which reflects a basic implementation of the algorithm, then we identify area and power bottle-necks in the design and finally we introduce and compare several architectures targeting the most power-consuming operations. With respect to the LSB coprocessor, we propose a novel implementation based on a one-hot encoding, thereby reducing hardware’s logic switching rate. Architectural hardware analysis is performed using Faraday’s 90 nm standard-cell library. The obtained results, when compared against the reference architecture, show that these novel architectures achieve significant improvements, namely, 25% in area and 35% in power consumption for the 128-EEA2 crypto-core, and even more important reductions are seen for the LSB coprocessor, that is, 36% in area and 50% in power consumption.


Author(s):  
M. C. Parameshwara

This paper proposes six novel approximate 1-bit full adders (AFAs) for inexact computing. The six novel AFAs namely AFA1, AFA2, AFA3, AFA4, AFA5, and AFA6 are derived from state-of-the-art exact 1-bit full adder (EFA) architectures. The performance of these AFAs is compared with reported AFAs (RAAs) in terms of design metrics (DMs) and peak-signal-to-noise-ratio (PSNR). The DMs under consideration are power, delay, power-delay-product (PDP), energy-delay-product (EDP), and area. For a fair comparison, the EFAs and proposed AFAs along with RAAs are described in Verilog, simulated, and synthesized using Cadences’ RC tool, using generic 180 nm standard cell library. The unconstrained synthesis results show that: among all the proposed AFAs, the AFA1 and AFA2 are found to be energy-efficient adders with high PSNR. The AFA1 has a total [Formula: see text][Formula: see text][Formula: see text]W, [Formula: see text][Formula: see text]ps, [Formula: see text][Formula: see text]fJ, [Formula: see text][Formula: see text]Js, [Formula: see text][Formula: see text][Formula: see text]m2, and [Formula: see text][Formula: see text]dB. And the AFA2 has the total [Formula: see text][Formula: see text][Formula: see text]W, [Formula: see text][Formula: see text]ps, [Formula: see text][Formula: see text]fJ, [Formula: see text][Formula: see text]Js, [Formula: see text][Formula: see text][Formula: see text]m2, and [Formula: see text][Formula: see text]dB.


Sign in / Sign up

Export Citation Format

Share Document