Low-Latency Bit-Accurate Architecture for Configurable Precision Floating-Point Division

Floating-point division is indispensable and increasingly important in many modern applications. To speed up floating-point division in real microprocessors, this paper proposes a low-latency, multi-precision architecture for floating-point division that meets the IEEE-754 standard. The design has three parts: pre-configuration, mantissa division, and quotient normalization. For mantissa division, a Predict–Correct algorithm based on the fast division algorithm is employed, which yields more partial quotient bits per cycle without consuming too much circuit area. Detailed analysis is presented to support the guaranteed accuracy per cycle with no restriction to specific parameters. In synthesis using the TSMC 90 nm standard cell library, the results show that the proposed architecture has ≈63.6% latency, ≈30.23% total time (latency × period), ≈31.8% total energy (power × latency × period), and ≈44.6% efficient average energy (power × latency × period/efficient length) overhead over the latest floating-point division structure. In terms of latency, the proposed division architecture is much faster than several classic processors.
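The comparison metrics named in this abstract are simple products and ratios, and a minimal sketch may make their definitions concrete. The function names, the units (cycles, ns, mW, bits), and the sample numbers below are assumptions chosen for illustration; they are not values from the paper.

```python
def total_time(latency_cycles, period_ns):
    """Total time = latency x period (result in ns)."""
    return latency_cycles * period_ns


def total_energy(power_mw, latency_cycles, period_ns):
    """Total energy = power x latency x period (mW x ns gives pJ)."""
    return power_mw * total_time(latency_cycles, period_ns)


def efficient_average_energy(power_mw, latency_cycles, period_ns, efficient_length_bits):
    """Efficient average energy = total energy / efficient length (pJ per bit)."""
    return total_energy(power_mw, latency_cycles, period_ns) / efficient_length_bits


if __name__ == "__main__":
    # Hypothetical design point, not taken from the paper.
    print(total_time(10, 2.0))                          # ns
    print(total_energy(5.0, 10, 2.0))                   # pJ
    print(efficient_average_energy(5.0, 10, 2.0, 20))   # pJ per bit
```

Dividing each metric of one design by the same metric of a baseline gives percentage figures of the kind quoted in the abstract.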

Low-Latency Hardware Implementation of High-Precision Hyperbolic Functions sinhx and coshx Based on Improved CORDIC Algorithm

Electronics ◽ 10.3390/electronics10202533 ◽ 2021 ◽ Vol 10 (20) ◽ pp. 2533

Author(s): Wenjia Fu ◽ Jincheng Xia ◽ Xu Lin ◽ Ming Liu ◽ Mingjiang Wang

Keyword(s): Total Energy ◽ High Precision ◽ Hardware Implementation ◽ Low Cost ◽ Latency Period ◽ Low Latency ◽ Cordic Algorithm ◽ Hyperbolic Functions ◽ Area Efficiency ◽ Cell Library

The CORDIC algorithm is widely used for low-cost hardware implementation of transcendental functions. This paper proposes a low-latency, high-precision architecture for computing the hyperbolic functions sinhx and coshx based on an improved CORDIC algorithm, the QH-CORDIC. The principle, structure, and range of convergence of the QH-CORDIC are discussed, and the hardware circuit architecture of the functions sinhx and coshx using the QH-CORDIC is presented. The proposed architecture is implemented on an FPGA device, showing that it has 75% and 50% latency overhead over the two latest prior works. In synthesis using the TSMC 65 nm standard cell library, ASIC implementation results show that the proposed architecture is also superior to the two latest prior works in terms of total time (latency × period), ATP (area × total time), total energy (power × total time), energy efficiency (total energy/efficient bits), and area efficiency (efficient bits/area/total time). Comparison with related works indicates that the proposed architecture is much better suited to high-precision floating-point computation of sinhx and coshx than the LUT method, stochastic computing, and other CORDIC algorithms.
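For readers unfamiliar with the baseline that this work improves on, the following is a minimal floating-point sketch of the classic hyperbolic CORDIC in rotation mode. It is not the QH-CORDIC of the paper; the function name and the iteration count are assumptions made for illustration, and the sketch only converges for arguments of magnitude up to roughly 1.118.

```python
import math


def hyperbolic_cordic(theta, n_iters=40):
    """Classic hyperbolic CORDIC (rotation mode); returns (sinh, cosh) of theta."""
    # Iteration indices start at 1; indices 4, 13, 40, ... are executed
    # twice, which the classic algorithm needs in order to converge.
    indices = []
    i, next_repeat = 1, 4
    while len(indices) < n_iters:
        indices.append(i)
        if i == next_repeat and len(indices) < n_iters:
            indices.append(i)
            next_repeat = 3 * next_repeat + 1
        i += 1

    # Gain of the shift-add iterations; starting from 1/gain compensates it.
    gain = 1.0
    for i in indices:
        gain *= math.sqrt(1.0 - 2.0 ** (-2 * i))

    x, y, z = 1.0 / gain, 0.0, theta
    for i in indices:
        d = 1.0 if z >= 0 else -1.0   # rotate so the residual angle z goes to 0
        t = 2.0 ** -i                 # a right shift by i bits in hardware
        x, y = x + d * y * t, y + d * x * t
        z -= d * math.atanh(t)        # atanh(2^-i) would come from a small table
    return y, x


if __name__ == "__main__":
    s, c = hyperbolic_cordic(0.5)
    print(s, math.sinh(0.5))
    print(c, math.cosh(0.5))
```

In hardware the multiplications by `t` become shifts and the `atanh` values are precomputed constants, which is why CORDIC is considered low cost.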

Power and Variation Improved Near-Vt Standard Cell Library for 28-nm FDSOI

2019 IEEE SOI-3D-Subthreshold Microelectronics Technology Unified Conference (S3S) ◽ 10.1109/s3s46989.2019.9320687 ◽ 2019

Author(s): Wing-Tsi Wong ◽ Kamlesh Singh ◽ Jos Huisken ◽ Jose Pineda de Gyvez

Keyword(s): Standard Cell ◽ Standard Cell Library ◽ Cell Library ◽ 28 Nm

Superconductor Standard Cell Library for Advanced EDA Design Flow

IEEE Transactions on Applied Superconductivity ◽ 10.1109/tasc.2021.3061024 ◽ 2021 ◽ pp. 1-1

Author(s): Sukanya Sagarika Meher ◽ Jushya Ravi ◽ Mustafa Eren Celik ◽ Stephen Miller ◽ Anubhav Sahu ◽ ...

Keyword(s): Design Flow ◽ Standard Cell ◽ Standard Cell Library ◽ Cell Library

NMLib: A Nanomagnetic Logic Standard Cell Library

2021 IEEE International Symposium on Circuits and Systems (ISCAS) ◽ 10.1109/iscas51556.2021.9401107 ◽ 2021

Author(s): Laysson Oliveira Luz ◽ Jose Augusto M. Nacif ◽ Ricardo S. Ferreira ◽ Omar P. Vilela Neto

Keyword(s): Standard Cell ◽ Standard Cell Library ◽ Nanomagnetic Logic ◽ Cell Library

Reduction of LSI Maximum Power Consumption with Standard Cell Library of Stack Structured Cells

IEICE Transactions on Fundamentals of Electronics Communications and Computer Sciences ◽ 10.1587/transfun.2021vlp0014 ◽ 2021

Author(s): Yuki IMAI ◽ Shinichi NISHIZAWA ◽ Kazuhito ITO

Keyword(s): Power Consumption ◽ Maximum Power ◽ Standard Cell ◽ Standard Cell Library ◽ Cell Library

Organic-Flow: An Open-Source Organic Standard Cell Library and Process Development Kit

2020 Design, Automation & Test in Europe Conference & Exhibition (DATE) ◽ 10.23919/date48585.2020.9116540 ◽ 2020

Author(s): Ting-Jung Chang ◽ Zhuozhi Yao ◽ Barry P. Rand ◽ David Wentzlaff

Keyword(s): Open Source ◽ Process Development ◽ Standard Cell ◽ Standard Cell Library ◽ Cell Library

Energy-Efficient Hardware Architectures for the Packet Data Convergence Protocol in LTE-Advanced Mobile Terminals

VLSI Design ◽ 10.1155/2013/369627 ◽ 2013 ◽ Vol 2013 ◽ pp. 1-15

Author(s): Shadi Traboulsi ◽ Valerio Frascolla ◽ Nils Pohl ◽ Josef Hausner ◽ Attila Bilgic

Keyword(s): Power Consumption ◽ Energy Efficient ◽ Reference Architecture ◽ Least Significant Bit ◽ Mobile Terminals ◽ Standard Cell Library ◽ Cell Library ◽ Lte Advanced ◽ Packet Data ◽ Hardware Architectures

In this paper, we present and compare efficient low-power hardware architectures for accelerating the Packet Data Convergence Protocol (PDCP) in LTE and LTE-Advanced mobile terminals. Specifically, our work proposes the design of two cores: a crypto engine for the Evolved Packet System Encryption Algorithm (128-EEA2), which is based on the AES cipher, and a coprocessor for the Least Significant Bit (LSB) encoding mechanism of the Robust Header Compression (ROHC) algorithm. For the former, we first propose a reference architecture that reflects a basic implementation of the algorithm, then identify area and power bottlenecks in the design, and finally introduce and compare several architectures targeting the most power-consuming operations. For the LSB coprocessor, we propose a novel implementation based on one-hot encoding, thereby reducing the hardware’s logic switching rate. Architectural hardware analysis is performed using Faraday’s 90 nm standard-cell library. Compared against the reference architecture, these novel architectures achieve significant improvements: 25% in area and 35% in power consumption for the 128-EEA2 crypto-core, and even larger reductions for the LSB coprocessor, namely 36% in area and 50% in power consumption.
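As background on what the LSB coprocessor computes, here is a minimal software sketch of LSB encoding with an interpretation interval, assuming the definition used in the ROHC specification: the compressor sends only the k least significant bits of a field, and the decompressor recovers the full value as the unique candidate in the interval [v_ref − p, v_ref + 2^k − 1 − p]. The function names and the `max_bits` limit are hypothetical, and this sketch does not reflect the one-hot hardware design proposed in the paper.

```python
def lsb_encode(value, v_ref, p=0, max_bits=16):
    """Return (k, bits): the smallest k whose interval contains value, and its k LSBs."""
    low = v_ref - p
    for k in range(max_bits + 1):
        high = v_ref + (1 << k) - 1 - p
        if low <= value <= high:
            return k, value & ((1 << k) - 1)
    raise ValueError("value is outside every interpretation interval up to max_bits")


def lsb_decode(bits, k, v_ref, p=0):
    """Recover the unique value in the interval whose k LSBs equal bits."""
    low = v_ref - p
    mask = (1 << k) - 1
    # Smallest value >= low with the requested low-order bits.
    return low + ((bits - low) & mask)


if __name__ == "__main__":
    k, bits = lsb_encode(105, v_ref=100)
    print(k, bits)                         # number of bits sent and their value
    print(lsb_decode(bits, k, v_ref=100))  # reconstructed field value
```

The search for the smallest k is the part that a hardware implementation has to evaluate for every header field, which is why its switching activity matters for power.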

Standard Cell Library Enhancement For Mixed Multi-Height Cell Design Implementation

10.1109/ewdts52692.2021.9581045 ◽ 2021

Author(s): Suren Abazyan

Keyword(s): Cell Design ◽ Standard Cell ◽ Standard Cell Library ◽ Cell Library

A high flexibility BiCMOS standard cell library for mixed analogue-digital ASICs

Analogue-digital ASICs: circuit techniques, design tools and applications ◽ 10.1049/pbcs003e_ch9 ◽ 2011 ◽ pp. 197-212

Author(s): Christian Caillon

Keyword(s): Standard Cell ◽ Standard Cell Library ◽ Cell Library ◽ High Flexibility

Approximate Full Adders for Energy Efficient Image Processing Applications

Journal of Circuits System and Computers ◽ 10.1142/s0218126621502352 ◽ 2021 ◽ pp. 2150235

Author(s): M. C. Parameshwara

Keyword(s): Energy Efficient ◽ State Of The Art ◽ Signal To Noise Ratio ◽ Full Adder ◽ Signal To Noise ◽ Design Metrics ◽ Standard Cell Library ◽ Cell Library ◽ Fair Comparison ◽ Power Delay Product

This paper proposes six novel approximate 1-bit full adders (AFAs) for inexact computing. The six AFAs, namely AFA1, AFA2, AFA3, AFA4, AFA5, and AFA6, are derived from state-of-the-art exact 1-bit full adder (EFA) architectures. The performance of these AFAs is compared with reported AFAs (RAAs) in terms of design metrics (DMs) and peak signal-to-noise ratio (PSNR). The DMs under consideration are power, delay, power-delay product (PDP), energy-delay product (EDP), and area. For a fair comparison, the EFAs and the proposed AFAs, along with the RAAs, are described in Verilog, simulated, and synthesized with Cadence’s RC tool using a generic 180 nm standard cell library. The unconstrained synthesis results show that, among all the proposed AFAs, AFA1 and AFA2 are energy-efficient adders with high PSNR. AFA1 has a total power of [Formula: see text] W, delay of [Formula: see text] ps, PDP of [Formula: see text] fJ, EDP of [Formula: see text] Js, area of [Formula: see text] m2, and PSNR of [Formula: see text] dB. AFA2 has a total power of [Formula: see text] W, delay of [Formula: see text] ps, PDP of [Formula: see text] fJ, EDP of [Formula: see text] Js, area of [Formula: see text] m2, and PSNR of [Formula: see text] dB.
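To illustrate what an approximate full adder trades away, the sketch below compares an exact 1-bit full adder with one simple approximation in which the sum output is taken as the inverted carry. This approximation is a hypothetical example chosen for clarity; it is not claimed to be any of AFA1 to AFA6. The PDP and EDP helpers follow the usual definitions (power × delay, and PDP × delay), and all names are assumptions.

```python
from itertools import product


def exact_full_adder(a, b, cin):
    """Exact 1-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def approx_full_adder(a, b, cin):
    """Hypothetical approximation: exact carry, sum approximated as NOT carry."""
    cout = (a & b) | (cin & (a ^ b))
    return 1 - cout, cout


def error_cases(approx, exact=exact_full_adder):
    """List the input triples (a, b, cin) on which the approximation is wrong."""
    return [bits for bits in product((0, 1), repeat=3)
            if approx(*bits) != exact(*bits)]


def power_delay_product(power_w, delay_s):
    """PDP = power x delay (joules)."""
    return power_w * delay_s


def energy_delay_product(power_w, delay_s):
    """EDP = PDP x delay (joule-seconds)."""
    return power_delay_product(power_w, delay_s) * delay_s


if __name__ == "__main__":
    print(error_cases(approx_full_adder))
```

Counting the erroneous input combinations gives a first-order error rate; image-level quality figures such as PSNR depend on how those errors propagate through a multi-bit adder in the target application.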
