Enhanced Technology Mapping for FPGAs with Exploration of Cell Configurations

In the state-of-the-art field-programmable gate arrays (FPGAs), logic circuits are synthesized and mapped on clusters of look-up tables. However, arithmetic operations benefit from an existing dedicated adder along with a carry chain used to ensure a fast carry propagation. This carry chain is a dedicated wire available in the architecture of the FPGA and is as such independent of the external programmable routing resources. In this paper, we propose a variable-structure Boolean matching technology mapper with embedded decomposition techniques to map nonarithmetic logic functions on carry chains. Previously synthesized and mapped logic functions are adapted so that their outputs are routed using the dedicated carry chains instead of the external programmable interconnects. The experimental results show a reduction in the used routing resources as well as the circuit area when using this Boolean matching-based mapper on the Altera Stratix-III FPGA.

Download Full-text

Mixed Encoding of Collections of Output Variables for LUT-Based Mealy FSMs

Journal of Circuits System and Computers ◽

10.1142/s0218126619501317 ◽

2019 ◽

Vol 28 (08) ◽

pp. 1950131 ◽

Cited By ~ 1

Author(s):

Alexander Barkalov ◽

Larysa Titarenko ◽

Sławomir Chmielewski

Keyword(s):

Formal Method ◽

Field Programmable Gate Arrays ◽

Finite State Machines ◽

Logic Circuits ◽

State Machines ◽

Gate Arrays ◽

Finite State ◽

Field Programmable ◽

Programmable Gate Arrays

A method is proposed targeting the decrease of the number of look-up tables (LUTs) in logic circuits of field programmable gate arrays (FPGA)-based Mealy finite state machines. The method is based on constructing a partition for the set of output variables. It diminishes the number of additional variables encoding the collections of output variables (COVs). A formal method is proposed for finding the partition. An example of synthesis is given, as well as the results of investigations. The investigations were conducted for standard benchmarks.

Download Full-text

BPR-TCAM—Block and Partial Reconfiguration based TCAM on Xilinx FPGAs

Electronics ◽

10.3390/electronics9020353 ◽

2020 ◽

Vol 9 (2) ◽

pp. 353 ◽

Cited By ~ 1

Author(s):

Anees Ullah ◽

Ali Zahir ◽

Noaman A. Khan ◽

Waleed Ahmad ◽

Alexis Ramos ◽

...

Keyword(s):

Resource Utilization ◽

High Speed ◽

State Of The Art ◽

Field Programmable Gate Arrays ◽

Partial Reconfiguration ◽

Gate Arrays ◽

Content Addressable Memories ◽

Field Programmable ◽

Programmable Gate Arrays

Field Programmable Gate Arrays (FPGAs) based Ternary Content Addressable Memories (TCAMs) are widely used in high-speed networking applications.However, TCAMs are not present on state-of-the-art FPGAs and need to be emulated on SRAM-based memories (i.e., LUTRAMs and Block RAMs) which requires a large amount of FPGA resources. In this paper, we present an efficient methodology to implement FPGA-based TCAMs with significant resource savings compared to existing schemes. The proposed methodology exploits the fracturable nature of Look Up Tables (LUTs) and the built-in slice carry-chains for simultaneous mapping of two rules and its matching logic to a single FPGA slice. Multiple slices can be stacked together to build deeper and wider TCAMs in a modular way. The combination of all these techniques results in significant savings in resource utilization compared to existing approaches.

Download Full-text

Maple: A Simultaneous Technology Mapping, Placement, And Global Routing Algorithm For Field-programmable Gate Arrays

IEEE/ACM International Conference on Computer-Aided Design ◽

10.1109/iccad.1994.629759 ◽

2005 ◽

Cited By ~ 7

Author(s):

N. Togawa ◽

M. Sato ◽

T. Ohtsuki

Keyword(s):

Routing Algorithm ◽

Field Programmable Gate Arrays ◽

Global Routing ◽

Technology Mapping ◽

Gate Arrays ◽

Field Programmable ◽

Programmable Gate Arrays

Download Full-text

Exploring Shared SRAM Tables in FPGAs for Larger LUTs and Higher Degree of Sharing

International Journal of Reconfigurable Computing ◽

10.1155/2017/7021056 ◽

2017 ◽

Vol 2017 ◽

pp. 1-9 ◽

Cited By ~ 2

Author(s):

Ali Asghar ◽

Muhammad Mazher Iqbal ◽

Waqar Ahmed ◽

Mujahid Ali ◽

Husain Parvez ◽

...

Keyword(s):

High Performance ◽

Critical Path ◽

Path Delay ◽

Gate Arrays ◽

Area Reduction ◽

Area Overhead ◽

Logic Block ◽

Field Programmable ◽

Boolean Matching ◽

Programmable Gate Arrays

In modern SRAM based Field Programmable Gate Arrays, a Look-Up Table (LUT) is the principal constituent logic element which can realize every possible Boolean function. However, this flexibility of LUTs comes with a heavy area penalty. A part of this area overhead comes from the increased amount of configuration memory which rises exponentially as the LUT size increases. In this paper, we first present a detailed analysis of a previously proposed FPGA architecture which allows sharing of LUTs memory (SRAM) tables among NPN-equivalent functions, to reduce the area as well as the number of configuration bits. We then propose several methods to improve the existing architecture. A new clustering technique has been proposed which packs NPN-equivalent functions together inside a Configurable Logic Block (CLB). We also make use of a recently proposed high performance Boolean matching algorithm to perform NPN classification. To enhance area savings further, we evaluate the feasibility of more than two LUTs sharing the same SRAM table. Consequently, this work explores the SRAM table sharing approach for a range of LUT sizes (4–7), while varying the cluster sizes (4–16). Experimental results on MCNC benchmark circuits set show an overall area reduction of ~7% while maintaining the same critical path delay.

Download Full-text

Design of EMB-Based Moore FSMs

Journal of Circuits System and Computers ◽

10.1142/s0218126617501250 ◽

2017 ◽

Vol 26 (07) ◽

pp. 1750125 ◽

Cited By ~ 7

Author(s):

Małgorzata Kołopieńczyk ◽

Larysa Titarenko ◽

Alexander Barkalov

Keyword(s):

Control Unit ◽

Logic Circuits ◽

Embedded Memory ◽

Complexity Of Algorithms ◽

Gate Arrays ◽

Control Units ◽

Finite State ◽

Field Programmable ◽

Programmable Gate Arrays ◽

Effective Use

The complexity of algorithms implemented in digital systems grows. Methods are developed for most effective use of both hardware resources and energy. For engineers the problem of hardware resources optimization in design of control units is still an important issue. The standard way of implementing the control unit as a finite-state machine (FSM) is not satisfactory as it consumes considerable amounts of field-programmable gate arrays (FPGA) resources. This paper is devoted to the design of a Moore FSM in FPGA structure using look-up tables and embedded memory blocks (EMB) elements. The problem background is discussed. The method of the design of Moore FSM logic circuits with EMB based on splitting the set of logical conditions and the encoding of logical conditions is presented. Examples of design and research results are given.

Download Full-text

Technology mapping algorithm for heterogeneous field programmable gate arrays

IEE Proceedings - Computers and Digital Techniques ◽

10.1049/ip-cdt:20020748 ◽

2002 ◽

Vol 149 (6) ◽

pp. 249 ◽

Cited By ~ 2

Author(s):

Y.-T. Lai ◽

C.-C. Kao

Keyword(s):

Field Programmable Gate Arrays ◽

Technology Mapping ◽

Mapping Algorithm ◽

Gate Arrays ◽

Field Programmable ◽

Programmable Gate Arrays

Download Full-text

A Fast Approach for Generating Efficient Parsers on FPGAs

Symmetry ◽

10.3390/sym11101265 ◽

2019 ◽

Vol 11 (10) ◽

pp. 1265 ◽

Cited By ~ 1

Author(s):

Zhuang Cao ◽

Huiguo Zhang ◽

Junnan Li ◽

Mei Wen ◽

Chunyuan Zhang

Keyword(s):

High Performance ◽

State Of The Art ◽

Field Programmable Gate Arrays ◽

Hardware Architecture ◽

Clock Rate ◽

Gate Arrays ◽

Fast Approach ◽

Field Programmable ◽

Programmable Gate Arrays ◽

Vhdl Code

The development of modern networking requires that high-performance network processors be designed quickly and efficiently to support new protocols. As a very important part of the processor, the parser parses the headers of the packets—this is the precondition for further processing and finally forwarding these packets. This paper presents a framework designed to transform P4 programs to VHDL and to generate parsers on Field Programmable Gate Arrays (FPGAs). The framework includes a pipeline-based hardware architecture and a back-end compiler. The hardware architecture comprises many components with varying functionality, each of which has its own optimized VHDL template. By using the output of a standard frontend P4 compiler, our proposed compiler extracts the parameters and relationships from within the used components, which can then be mapped to corresponding templates by configuring, optimizing, and instantiating them. Finally, these templates are connected to output VHDL code. When a prototype of this framework is implemented and evaluated, the results demonstrate that the throughputs of the generated parsers achieve nearly 320 Gbps at a clock rate of around 300 MHz. Compared with state-of-the-art solutions, our proposed parsers achieve an average of twice the throughput when similar amounts of resources are being used.

Download Full-text

General technology mapping for field-programmable gate arrays based on lookup tables

ACM Transactions on Design Automation of Electronic Systems ◽

10.1145/504914.504915 ◽

2002 ◽

Vol 7 (1) ◽

pp. 1-32

Author(s):

Amit Chowdhary ◽

John P. Hayes

Keyword(s):

Field Programmable Gate Arrays ◽

Technology Mapping ◽

Gate Arrays ◽

Lookup Tables ◽

Field Programmable ◽

Programmable Gate Arrays ◽

General Technology

Download Full-text

RTN: Reparameterized Ternary Network

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i04.5912 ◽

2020 ◽

Vol 34 (04) ◽

pp. 4780-4787

Author(s):

Yuhang Li ◽

Xin Dong ◽

Sai Qian Zhang ◽

Haoli Bai ◽

Yuanpeng Chen ◽

...

Keyword(s):

Deep Neural Networks ◽

State Of The Art ◽

Hardware Acceleration ◽

Field Programmable Gate Arrays ◽

Accuracy Improvement ◽

Gate Arrays ◽

Resource Limited ◽

Field Programmable ◽

Programmable Gate Arrays ◽

Speed Up

To deploy deep neural networks on resource-limited devices, quantization has been widely explored. In this work, we study the extremely low-bit networks which have tremendous speed-up, memory saving with quantized activation and weights. We first bring up three omitted issues in extremely low-bit networks: the squashing range of quantized values; the gradient vanishing during backpropagation and the unexploited hardware acceleration of ternary networks. By reparameterizing quantized activation and weights vector with full precision scale and offset for fixed ternary vector, we decouple the range and magnitude from direction to extenuate above problems. Learnable scale and offset can automatically adjust the range of quantized values and sparsity without gradient vanishing. A novel encoding and computation pattern are designed to support efficient computing for our reparameterized ternary network (RTN). Experiments on ResNet-18 for ImageNet demonstrate that the proposed RTN finds a much better efficiency between bitwidth and accuracy and achieves up to 26.76% relative accuracy improvement compared with state-of-the-art methods. Moreover, we validate the proposed computation pattern on Field Programmable Gate Arrays (FPGA), and it brings 46.46 × and 89.17 × savings on power and area compared with the full precision convolution.

Download Full-text

Technology mapping and circuit depth optimization for field programmable gate arrays

Proceedings of IEEE Custom Integrated Circuits Conference - CICC '93 ◽

10.1109/cicc.1993.590373 ◽

2002 ◽

Author(s):

Shih-Chieh Chang ◽

M. Marek-Sadowska

Keyword(s):

Field Programmable Gate Arrays ◽

Technology Mapping ◽

Gate Arrays ◽

Field Programmable ◽

Programmable Gate Arrays ◽

Circuit Depth

Download Full-text