Exploring Shared SRAM Tables in FPGAs for Larger LUTs and Higher Degree of Sharing

In modern SRAM based Field Programmable Gate Arrays, a Look-Up Table (LUT) is the principal constituent logic element which can realize every possible Boolean function. However, this flexibility of LUTs comes with a heavy area penalty. A part of this area overhead comes from the increased amount of configuration memory which rises exponentially as the LUT size increases. In this paper, we first present a detailed analysis of a previously proposed FPGA architecture which allows sharing of LUTs memory (SRAM) tables among NPN-equivalent functions, to reduce the area as well as the number of configuration bits. We then propose several methods to improve the existing architecture. A new clustering technique has been proposed which packs NPN-equivalent functions together inside a Configurable Logic Block (CLB). We also make use of a recently proposed high performance Boolean matching algorithm to perform NPN classification. To enhance area savings further, we evaluate the feasibility of more than two LUTs sharing the same SRAM table. Consequently, this work explores the SRAM table sharing approach for a range of LUT sizes (4–7), while varying the cluster sizes (4–16). Experimental results on MCNC benchmark circuits set show an overall area reduction of ~7% while maintaining the same critical path delay.

Download Full-text

High Performance Low Cost Implementation of FPGA-Based Fractional-Order Operators

Volume 6: 5th International Conference on Multibody Systems, Nonlinear Dynamics, and Control, Parts A, B, and C ◽

10.1115/detc2005-84796 ◽

2005 ◽

Cited By ~ 3

Author(s):

Cindy X. Jiang ◽

Tom T. Hartley ◽

Joan E. Carletta

Keyword(s):

Fractional Order ◽

Word Length ◽

High Performance ◽

Low Cost ◽

Careful Consideration ◽

Order System ◽

System Quality ◽

Gate Arrays ◽

Field Programmable ◽

Programmable Gate Arrays

Hardware implementation of fractional-order differentiators and integrators requires careful consideration of issues of system quality, hardware cost, and speed. This paper proposes using field programmable gate arrays (FPGAs) to implement fractional-order systems, and demonstrates the advantages that FPGAs provide. As an illustration, the fundamental operators to a real power is approximated via the binomial expansion of the backward difference. The resulting high-order FIR filter is implemented in a pipelined multiplierless architecture on a low-cost Spartan-3 FPGA. Unlike common digital implementations in which all filter coefficients have the same word length, this approach exploits variable word length for each coefficient. Our system requires twenty percent less hardware than a system of comparable quality generated by Xilinx’s System Generator on its most area-efficient multiplierless setting. The work shows an effective way to implement a high quality, high throughput approximation to a fractional-order system, while maintaining less cost than traditional FPGA-based designs.

Download Full-text

FaaM: FPGA-as-a-Microservice - A Case Study for Data Compression

EPJ Web of Conferences ◽

10.1051/epjconf/201921407029 ◽

2019 ◽

Vol 214 ◽

pp. 07029

Author(s):

David Ojika ◽

Ann Gordon-Ross ◽

Herman Lam ◽

Bhavesh Patel

Keyword(s):

High Performance ◽

Network Function Virtualization ◽

Communication Overhead ◽

Network Function ◽

Gate Arrays ◽

Emerging Trends ◽

Field Programmable ◽

Amazon Web Services ◽

Programmable Gate Arrays

Field-programmable gate arrays (FPGAs) have largely been used in communication and high-performance computing and given the recent advances in big data and emerging trends in cloud computing (e.g., serverless [18]), FPGAs are increasingly being introduced into these domains (e.g., Microsoft’s datacenters [6] and Amazon Web Services [10]). To address these domains’ processing needs, recent research has focused on using FPGAs to accelerate workloads, ranging from analytics and machine learning to databases and network function virtualization. In this paper, we present an ongoing effort to realize a high-performance FPGA-as-a-microservice (FaaM) architecture for the cloud. We discuss some of the technical challenges and propose several solutions for efficiently integrating FPGAs into virtualized environments. Our case study deploying a multithreaded, multi-user compression as a microservice using the FaaM architecture indicate that microservices-based FPGA acceleration can sustain high-performance compared to straightforward implementation with minimal to no communication overhead despite the hardware abstraction.

Download Full-text

A P4-Enabled RINA Interior Router for Software-Defined Data Centers

Computers ◽

10.3390/computers9030070 ◽

2020 ◽

Vol 9 (3) ◽

pp. 70

Author(s):

Carolina Fernández ◽

Sergio Giménez ◽

Eduard Grasa ◽

Steve Bunch

Keyword(s):

Integrated Circuit ◽

High Performance ◽

Data Transfer ◽

Great Promise ◽

Gate Arrays ◽

Field Programmable ◽

Programmable Gate Arrays ◽

Application Specific Integrated Circuit ◽

Networking Technologies ◽

Application Specific

The lack of high-performance RINA (Recursive InterNetwork Architecture) implementations to date makes it hard to experiment with RINA as an underlay networking fabric solution for different types of networks, and to assess RINA’s benefits in practice on scenarios with high traffic loads. High-performance router implementations typically require dedicated hardware support, such as FPGAs (Field Programmable Gate Arrays) or specialized ASICs (Application Specific Integrated Circuit). With the advance of hardware programmability in recent years, new possibilities unfold to prototype novel networking technologies. In particular, the use of the P4 programming language for programmable ASICs holds great promise for developing a RINA router. This paper details the design and part of the implementation of the first P4-based RINA interior router, which reuses the layer management components of the IRATI Linux-based RINA implementation and implements the data-transfer components using a P4 program. We also describe the configuration and testing of our initial deployment scenarios, using ancillary open-source tools such as the P4 reference test software switch (BMv2) or the P4Runtime API.

Download Full-text

Design and Implementation of DDS Module on FPGA

International Journal of Advanced Trends in Computer Science and Engineering ◽

10.30534/ijatcse/2021/1141022021 ◽

2021 ◽

Vol 10 (2) ◽

pp. 1296-1299

Keyword(s):

High Performance ◽

Frequency Conversion ◽

Sine Wave ◽

Direct Digital Synthesis ◽

Output Frequency ◽

Gate Arrays ◽

Construction Scheme ◽

Field Programmable ◽

Digital Synthesis ◽

Programmable Gate Arrays

he paper concerns the construction scheme of Direct Digital Synthesis (DDS) generator based on widely developed Field Programmable Gate Arrays (FPGA) technology. based on (DDS) it generates sine wave that frequency and phase is manageable is designed with direct digital synthesis(DDS) technology. It is showed that the design based on FPGA with DDS is dependable and practicable. The output wave by test reaches the essential aims, easy control and high performance. The DDS produce sinusoidal signal owns the features of modest circuit, easy to be measured, unchanging performance, high frequency conversion speed and fine accuracy etc. And its output frequency falls within the range of 0Hz ~ 150KHz with 5 Hz of steps

Download Full-text

High-Performance Spectral Element Methods on Field-Programmable Gate Arrays : Implementation, Evaluation, and Future Projection

2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS) ◽

10.1109/ipdps49936.2021.00116 ◽

2021 ◽

Author(s):

Martin Karp ◽

Artur Podobas ◽

Niclas Jansson ◽

Tobias Kenter ◽

Christian Plessl ◽

...

Keyword(s):

High Performance ◽

Field Programmable Gate Arrays ◽

Spectral Element ◽

Future Projection ◽

Spectral Element Methods ◽

Gate Arrays ◽

Implementation Evaluation ◽

Field Programmable ◽

Programmable Gate Arrays

Download Full-text

Enhanced Technology Mapping for FPGAs with Exploration of Cell Configurations

Journal of Circuits System and Computers ◽

10.1142/s0218126615500395 ◽

2015 ◽

Vol 24 (03) ◽

pp. 1550039 ◽

Cited By ~ 1

Author(s):

Grace Zgheib ◽

Iyad Ouaiss

Keyword(s):

State Of The Art ◽

Variable Structure ◽

Logic Circuits ◽

Technology Mapping ◽

Decomposition Techniques ◽

Gate Arrays ◽

Field Programmable ◽

Boolean Matching ◽

Programmable Gate Arrays ◽

Logic Functions

In the state-of-the-art field-programmable gate arrays (FPGAs), logic circuits are synthesized and mapped on clusters of look-up tables. However, arithmetic operations benefit from an existing dedicated adder along with a carry chain used to ensure a fast carry propagation. This carry chain is a dedicated wire available in the architecture of the FPGA and is as such independent of the external programmable routing resources. In this paper, we propose a variable-structure Boolean matching technology mapper with embedded decomposition techniques to map nonarithmetic logic functions on carry chains. Previously synthesized and mapped logic functions are adapted so that their outputs are routed using the dedicated carry chains instead of the external programmable interconnects. The experimental results show a reduction in the used routing resources as well as the circuit area when using this Boolean matching-based mapper on the Altera Stratix-III FPGA.

Download Full-text

A Fast Approach for Generating Efficient Parsers on FPGAs

Symmetry ◽

10.3390/sym11101265 ◽

2019 ◽

Vol 11 (10) ◽

pp. 1265 ◽

Cited By ~ 1

Author(s):

Zhuang Cao ◽

Huiguo Zhang ◽

Junnan Li ◽

Mei Wen ◽

Chunyuan Zhang

Keyword(s):

High Performance ◽

State Of The Art ◽

Field Programmable Gate Arrays ◽

Hardware Architecture ◽

Clock Rate ◽

Gate Arrays ◽

Fast Approach ◽

Field Programmable ◽

Programmable Gate Arrays ◽

Vhdl Code

The development of modern networking requires that high-performance network processors be designed quickly and efficiently to support new protocols. As a very important part of the processor, the parser parses the headers of the packets—this is the precondition for further processing and finally forwarding these packets. This paper presents a framework designed to transform P4 programs to VHDL and to generate parsers on Field Programmable Gate Arrays (FPGAs). The framework includes a pipeline-based hardware architecture and a back-end compiler. The hardware architecture comprises many components with varying functionality, each of which has its own optimized VHDL template. By using the output of a standard frontend P4 compiler, our proposed compiler extracts the parameters and relationships from within the used components, which can then be mapped to corresponding templates by configuring, optimizing, and instantiating them. Finally, these templates are connected to output VHDL code. When a prototype of this framework is implemented and evaluated, the results demonstrate that the throughputs of the generated parsers achieve nearly 320 Gbps at a clock rate of around 300 MHz. Compared with state-of-the-art solutions, our proposed parsers achieve an average of twice the throughput when similar amounts of resources are being used.

Download Full-text