VLSI Architecture of S-box with High Area Efficiency Based on Composite Field Arithmetic

This paper presents a novel low-complexity parallel quasi-cyclic low-density parity-check (QC-LDPC) encoding algorithm that is compatible with 5th generation (5G) new radio (NR). Based on this algorithm, we propose a highly area-efficient parallel encoder with a compatible architecture. The proposed encoder supports parallel encoding and pipelined operation. Furthermore, it is designed as a configurable encoding structure that is fully compatible with the different base graphs of 5G LDPC, so the encoder architecture adapts flexibly to various 5G LDPC codes. The proposed encoder was synthesized in a 65 nm CMOS technology. Following this architecture, we implemented nine encoders for distributed lifting sizes of the two base graphs. The experimental results show that the encoder achieves high performance and significant area efficiency, outperforming related prior art. This work provides a complete encoding algorithm together with encoders that are fully compatible with the different base graphs of 5G LDPC codes, and therefore adapts flexibly to various 5G application scenarios.
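
A core operation in QC-LDPC encoders is multiplying Z-bit blocks by circulant permutation matrices, which hardware realizes with cyclic shifters and XOR trees rather than a dense matrix multiply. The sketch below is not the paper's encoding algorithm; it only illustrates that equivalence. It assumes the 5G NR convention that a base-graph entry s ≥ 0 denotes the Z × Z identity cyclically shifted right by s and that −1 denotes an all-zero block. The function names and the toy base graph are hypothetical.

```python
from typing import List


def circulant_times_block(shift: int, block: List[int]) -> List[int]:
    """Multiply a Z-bit block by the circulant permutation matrix I(shift).

    Assumed convention: I(shift) is the Z x Z identity with its columns
    cyclically shifted right by `shift`, so output bit i equals input bit
    (i + shift) mod Z, i.e. a left cyclic rotation. shift == -1 denotes an
    all-zero block (no edge in the base graph).
    """
    z = len(block)
    if shift < 0:
        return [0] * z
    s = shift % z
    return block[s:] + block[:s]


def expand_base_graph(base: List[List[int]], z: int) -> List[List[int]]:
    """Build the full binary matrix H by replacing each entry with I(shift)."""
    h = []
    for base_row in base:
        for i in range(z):
            row = []
            for shift in base_row:
                row.extend(
                    1 if shift >= 0 and j == (i + shift) % z else 0
                    for j in range(z)
                )
            h.append(row)
    return h


def blockwise_product(base: List[List[int]], x: List[int], z: int) -> List[int]:
    """Compute H x over GF(2) using only cyclic shifts and XORs of Z-bit blocks."""
    blocks = [x[k * z:(k + 1) * z] for k in range(len(base[0]))]
    result = []
    for base_row in base:
        acc = [0] * z
        for shift, block in zip(base_row, blocks):
            shifted = circulant_times_block(shift, block)
            acc = [a ^ b for a, b in zip(acc, shifted)]
        result.extend(acc)
    return result


def dense_product(h: List[List[int]], x: List[int]) -> List[int]:
    """Reference H x over GF(2) using the expanded matrix."""
    return [sum(hij & xj for hij, xj in zip(row, x)) % 2 for row in h]


# Toy base graph (hypothetical, not a 5G NR base graph), lifting size Z = 4
base = [[0, 2, -1],
        [3, -1, 1]]
z = 4
x = [1, 0, 1, 1,  0, 1, 0, 0,  1, 1, 0, 1]
assert blockwise_product(base, x, z) == dense_product(expand_base_graph(base, z), x)
print(blockwise_product(base, x, z))  # [1, 0, 1, 0, 0, 1, 1, 0]
```

The shift-and-XOR form never builds the Z·m × Z·n matrix, which is why configurable QC-LDPC encoders can serve several lifting sizes with the same shifter hardware.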

RiSA: A Reinforced Systolic Array for Depthwise Convolutions and Embedded Tensor Reshaping

ACM Transactions on Embedded Computing Systems ◽

10.1145/3476984 ◽

2021 ◽

Vol 20 (5s) ◽

pp. 1-20

Author(s):

Hyungmin Cho

Keyword(s):

Neural Networks ◽

Convolutional Neural Networks ◽

Language Processing ◽

Systolic Array ◽

Data Reuse ◽

Systolic Arrays ◽

High Data ◽

Area Efficiency ◽

High Area ◽

Accelerator Design

Depthwise convolutions are widely used in convolutional neural networks (CNNs) targeting mobile and embedded systems. Depthwise convolution layers reduce the computational load and the number of parameters compared with conventional convolution layers. Many deep neural network (DNN) accelerators adopt an architecture, such as a systolic array, that exploits the high data-reuse factor of DNN computations. However, depthwise convolutions have a low data-reuse factor and under-utilize the processing elements (PEs) in systolic arrays. In this paper, we present a DNN accelerator design called RiSA, which provides a novel mechanism that boosts PE utilization for depthwise convolutions on a systolic array with minimal overhead. In addition, the PEs in systolic arrays can be used efficiently only if the data items (tensors) are arranged in the desired layout. Typical DNN accelerators provide various types of PE interconnects or additional modules to flexibly rearrange the data items and manage data movement during DNN computations. RiSA provides a lightweight set of tensor management tasks within the PE array itself, eliminating the need for an additional tensor-reshaping module. Using this embedded tensor reshaping, RiSA supports various DNN models, including convolutional neural networks and natural language processing models, while maintaining high area efficiency. Compared to Eyeriss v2, RiSA improves the area and energy efficiency for MobileNet-V1 inference by 1.91× and 1.31×, respectively.
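
The claim that depthwise layers cut computation and parameters, yet offer little data reuse, can be made concrete by counting. The sketch below assumes stride 1, "same" padding, no bias, and an illustrative 112 × 112 × 64 layer with 3 × 3 kernels (hypothetical shapes, not taken from the paper). "Reuse per input" here means the average number of MACs that consume each input activation; it is a simplified stand-in for the data-reuse factor, not RiSA's own model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConvCost:
    params: int              # number of weights (bias ignored)
    macs: int                # multiply-accumulate operations per layer
    reuse_per_input: float   # average MACs that consume each input activation


def standard_conv(h: int, w: int, c_in: int, c_out: int, k: int) -> ConvCost:
    """Conventional K x K convolution, stride 1, 'same' padding (output h x w)."""
    params = k * k * c_in * c_out
    macs = h * w * params    # zero-padded border MACs are counted too
    return ConvCost(params, macs, macs / (h * w * c_in))


def depthwise_conv(h: int, w: int, c: int, k: int) -> ConvCost:
    """Depthwise K x K convolution: one filter per channel, no cross-channel sum."""
    params = k * k * c
    macs = h * w * params
    return ConvCost(params, macs, macs / (h * w * c))


std = standard_conv(112, 112, 64, 64, 3)
dw = depthwise_conv(112, 112, 64, 3)
print(std.params, dw.params)                    # 36864 576
print(std.reuse_per_input, dw.reuse_per_input)  # 576.0 9.0
```

Each input activation in the standard layer feeds K²·C_out MACs, but in the depthwise layer only K², because there is no accumulation across output channels. That missing dimension is what leaves columns of a weight-stationary systolic array idle.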

A floating high-voltage level-shifter with high area efficiency for biomedical implants

2016 12th Conference on Ph.D. Research in Microelectronics and Electronics (PRIME) ◽

10.1109/prime.2016.7519451 ◽

2016 ◽

Cited By ~ 1

Author(s):

Michael Haas ◽

Maurits Ortmanns

Keyword(s):

High Voltage ◽

Biomedical Implants ◽

Voltage Level ◽

Area Efficiency ◽

High Area ◽

Level Shifter

A scalable ASIP for BP Polar decoding with multiple code lengths

MATEC Web of Conferences ◽

10.1051/matecconf/201823201046 ◽

2018 ◽

Vol 232 ◽

pp. 01046

Author(s):

Wan Qiao ◽

Dake Liu

Keyword(s):

CMOS Technology ◽

Single Instruction Multiple Data ◽

Instruction Set ◽

Maximum Throughput ◽

Specific Instruction ◽

Area Efficiency ◽

Multiple Data ◽

High Area ◽

Multiple Code ◽

Application Specific

In this paper, we propose a flexible, scalable BP Polar decoding application-specific instruction set processor (PASIP) that supports multiple code lengths (64 to 4096) and any code rate. High throughput and sufficient programmability are achieved by a single-instruction-multiple-data (SIMD) based architecture and specially designed Polar decoding acceleration instructions. Synthesis in 65 nm CMOS technology shows that the total area of PASIP is 2.71 mm². PASIP provides a maximum throughput of 1563 Mbps (for N = 1024) at an operating frequency of 400 MHz. Comparison with state-of-the-art Polar decoders shows PASIP's high area efficiency.
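
Area efficiency in decoder comparisons is commonly expressed as throughput divided by area (Mbps/mm²). The sketch below applies that metric to the figures quoted above as an illustrative calculation only; the abstract does not state which metric or normalization (for example, scaling to a common technology node) the paper's comparison uses, and the function names are hypothetical.

```python
def area_efficiency(throughput_mbps: float, area_mm2: float) -> float:
    """Throughput per unit silicon area, in Mbps/mm^2."""
    return throughput_mbps / area_mm2


def bits_per_cycle(throughput_mbps: float, freq_mhz: float) -> float:
    """Average decoded bits per clock cycle (Mbps / MHz)."""
    return throughput_mbps / freq_mhz


# Figures quoted in the PASIP abstract (N = 1024, 65 nm, 400 MHz)
print(f"{area_efficiency(1563, 2.71):.1f} Mbps/mm^2")  # 576.8 Mbps/mm^2
print(f"{bits_per_cycle(1563, 400):.2f} bits/cycle")   # 3.91 bits/cycle
```

Because 1563 Mbps is the maximum throughput for N = 1024 only, the same calculation at other code lengths would give different values.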

Area and Power Potent VLSI Architecture for Modified CSLA with A Logic Optimization Technique

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.l3051.1081219 ◽

2019 ◽

Vol 8 (12) ◽

pp. 2873-2879 ◽

Cited By ~ 1

Keyword(s):

Optimization Technique ◽

VLSI Architecture ◽

Performance Constraints ◽

Area Efficiency ◽

Logic Optimization ◽

Ripple Carry Adder ◽

Overall Performance ◽

Carry Select Adder ◽

The Cost ◽

Area Efficient

Adders are among the most important circuits in VLSI design, since they are basic building blocks that determine a system's overall performance. A wide variety of adders is used in signal processing and VLSI systems. The most widely used speed-efficient architecture for n-bit addition in VLSI applications is the Square Root Carry Select Adder (SQRT-CSLA), which precomputes the carry and sum assuming an input carry of '0' and of '1'. However, its overall area is high because it uses more full adders than a Ripple Carry Adder. Although existing adder design techniques are area efficient, there is still scope to improve area efficiency, since area determines the cost of VLSI systems. Architectures that are not only area efficient but also power efficient are required to improve the overall performance of VLSI systems. To meet these objectives, this paper proposes an efficient VLSI architecture for the carry select adder that uses a logic optimization technique to address performance constraints. The proposed architecture is designed and implemented with the Cadence Encounter tool for data widths ranging from 16 bits to 128 bits. The proposed 128-bit architecture achieves an area improvement of 63.43% and a power improvement of 71.00923% compared with the 128-bit SQRT-CSLA architecture.
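
The precompute-and-select idea behind the conventional CSLA baseline can be modeled at bit level. This sketch builds each block from two ripple carry adders (carry-in 0 and carry-in 1) and lets the real incoming carry pick one result, as a multiplexer would. It is not the paper's logic-optimized design. The 2-2-3-4-5 grouping is an assumed 16-bit SQRT-CSLA partition, and all names are hypothetical.

```python
from typing import List, Tuple


def ripple_carry_add(a: List[int], b: List[int], cin: int) -> Tuple[List[int], int]:
    """Ripple carry adder on LSB-first bit lists; one full adder per bit."""
    total, carry = [], cin
    for x, y in zip(a, b):
        total.append(x ^ y ^ carry)
        carry = (x & y) | (carry & (x ^ y))
    return total, carry


def to_bits(value: int, n: int) -> List[int]:
    return [(value >> i) & 1 for i in range(n)]


def from_bits(bits: List[int]) -> int:
    return sum(bit << i for i, bit in enumerate(bits))


def carry_select_add(a: int, b: int, block_sizes: List[int]) -> Tuple[int, int]:
    """Conventional carry select adder built from dual ripple carry adders.

    Each block precomputes its sum and carry-out for carry-in 0 and 1;
    the real carry from the previous block then selects one result (the mux).
    """
    n = sum(block_sizes)
    a_bits, b_bits = to_bits(a, n), to_bits(b, n)
    result, carry, pos = [], 0, 0
    for size in block_sizes:
        blk_a, blk_b = a_bits[pos:pos + size], b_bits[pos:pos + size]
        sum0, cout0 = ripple_carry_add(blk_a, blk_b, 0)  # assumes carry-in = 0
        sum1, cout1 = ripple_carry_add(blk_a, blk_b, 1)  # assumes carry-in = 1
        block_sum, carry = (sum1, cout1) if carry else (sum0, cout0)
        result.extend(block_sum)
        pos += size
    return from_bits(result), carry


SQRT_16 = [2, 2, 3, 4, 5]  # assumed 16-bit SQRT-CSLA grouping
s, cout = carry_select_add(0xBEEF, 0x1234, SQRT_16)
print(hex(s), cout)  # 0xd123 0
```

Because every block in this sketch instantiates two ripple carry adders, it uses 2n full adders versus n for a plain ripple carry adder (practical designs often use a single adder for the first block). That duplication is the area overhead the abstract refers to.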

All-directional Electrostatic-discharge Protection Circuit with High Area-efficiency

JSTS Journal of Semiconductor Technology and Science ◽

10.5573/jsts.2021.21.4.270 ◽

2021 ◽

Vol 21 (4) ◽

pp. 270-278

Author(s):

Kyoung-Il Do ◽

Byung-Seok Lee ◽

Seung-Hoo Jin ◽

Yong-Seo Koo

Keyword(s):

Electrostatic Discharge ◽

Area Efficiency ◽

High Area ◽

Protection Circuit

High speed and high-area efficiency non-volatile look-up table design based on magnetic tunnel junction

2017 17th Non-Volatile Memory Technology Symposium (NVMTS) ◽

10.1109/nvmts.2017.8171280 ◽

2017 ◽

Cited By ~ 1

Author(s):

Rana Alhalabi ◽

Gregory Di Pendina ◽

Ioan-lucian Prejbeanu ◽

Etienne Nowak

Keyword(s):

Tunnel Junction ◽

High Speed ◽

Magnetic Tunnel Junction ◽

Area Efficiency ◽

Look Up Table ◽

High Area

Power-Rail ESD Clamp Circuit With Ultralow Standby Leakage Current and High Area Efficiency in Nanometer CMOS Technology

IEEE Transactions on Electron Devices ◽

10.1109/ted.2012.2209120 ◽

2012 ◽

Vol 59 (10) ◽

pp. 2626-2634 ◽

Cited By ~ 4

Author(s):

Chih-Ting Yeh ◽

Ming-Dou Ker

Keyword(s):

Leakage Current ◽

CMOS Technology ◽

Area Efficiency ◽

Nanometer CMOS ◽

High Area

Design and Implementation of FIR Filter using Efficient MAC

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.b7341.019320 ◽

2020 ◽

Vol 9 (3) ◽

pp. 878-881

Keyword(s):

Finite Impulse Response ◽

Optimization Technique ◽

VLSI Architecture ◽

Building Blocks ◽

FIR Filter ◽

Total Power ◽

Area Efficiency ◽

Logic Optimization ◽

Gate Count ◽

Impulse Response Filter

The design and realization of an efficient multiplication and accumulation (MAC) unit has a substantial influence on the design of a well-organized digital finite impulse response (FIR) filter, because the MAC unit computes the filter response. Area efficiency in an FIR filter can be achieved by reducing the gate count of the multiplier unit, the adder unit, or both, since they are the basic building blocks of the filter. This paper presents a VLSI architecture for a 4-tap FIR filter designed with an efficient adder and multiplier that employ a logic optimization technique. The area of the MAC-based FIR filter employing Vedic-CSLALOT is improved by 11.959% compared with Hierarchy-SQRT-CSLA, and its total power is improved by 13.15%.
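
The role of the MAC unit in a 4-tap FIR filter can be shown with a direct-form model, y[n] = h[0]x[n] + h[1]x[n−1] + h[2]x[n−2] + h[3]x[n−3]. In this sketch the multiplier and adder that the paper optimizes (Vedic multiplier, CSLA) are abstracted as Python's `*` and `+` inside `mac`. It assumes integer samples and a zero initial state, and the coefficients are hypothetical.

```python
from typing import List


def mac(acc: int, coeff: int, sample: int) -> int:
    """One multiply-accumulate step: the multiplier and adder abstracted as * and +."""
    return acc + coeff * sample


def fir_4tap(x: List[int], h: List[int]) -> List[int]:
    """Direct-form 4-tap FIR: y[n] = h[0]x[n] + h[1]x[n-1] + h[2]x[n-2] + h[3]x[n-3]."""
    if len(h) != 4:
        raise ValueError("expected exactly 4 coefficients")
    delay_line = [0, 0, 0, 0]  # x[n], x[n-1], x[n-2], x[n-3]; zero initial state
    y = []
    for sample in x:
        delay_line = [sample] + delay_line[:-1]
        acc = 0
        for coeff, tap in zip(h, delay_line):
            acc = mac(acc, coeff, tap)
        y.append(acc)
    return y


h = [3, -1, 2, 5]  # hypothetical integer coefficients
print(fir_4tap([1, 0, 0, 0, 0], h))  # [3, -1, 2, 5, 0]
print(fir_4tap([1, 1, 1, 1, 1], h))  # [3, 2, 4, 9, 9]
```

Each output sample costs four MAC steps, so any gate-count saving in the multiplier or adder is paid back four times per output in a fully parallel implementation.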

A Novel Memristor-Reusable Mapping Methodology of In-memory Logic Implementation for High Area-Efficiency

2019 IEEE/ACM International Symposium on Nanoscale Architectures (NANOARCH) ◽

10.1109/nanoarch47378.2019.181207 ◽

2019 ◽

Author(s):

Yongjie Lu ◽

Yanan Sun ◽

Weifeng He ◽

Zhigang Mao

Keyword(s):

Area Efficiency ◽

High Area