High Performance Low Cost Implementation of FPGA-Based Fractional-Order Operators

Hardware implementation of fractional-order differentiators and integrators requires careful consideration of system quality, hardware cost, and speed. This paper proposes using field programmable gate arrays (FPGAs) to implement fractional-order systems, and demonstrates the advantages that FPGAs provide. As an illustration, the fundamental operator raised to a real power is approximated via the binomial expansion of the backward difference. The resulting high-order FIR filter is implemented in a pipelined multiplierless architecture on a low-cost Spartan-3 FPGA. Unlike common digital implementations in which all filter coefficients have the same word length, this approach uses a separate word length for each coefficient. Our system requires twenty percent less hardware than a system of comparable quality generated by Xilinx’s System Generator on its most area-efficient multiplierless setting. The work shows an effective way to implement a high-quality, high-throughput approximation to a fractional-order system at lower cost than traditional FPGA-based designs.
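The binomial expansion of the backward difference mentioned in the abstract can be sketched in software. The following is a minimal floating-point illustration, not the paper's hardware design: it expands (1 - z^-1)^alpha into FIR taps using the standard recurrence w_0 = 1, w_k = w_(k-1) * (1 - (alpha + 1) / k), and scales the result by T^(-alpha). The function names, the tap count, and the sample period are assumptions chosen for illustration; the pipelined multiplierless architecture and the per-coefficient word lengths are not modeled.

```python
def gl_coefficients(alpha, num_taps):
    """First num_taps coefficients of the binomial expansion of (1 - z^-1)**alpha."""
    coeffs = [1.0]
    for k in range(1, num_taps):
        # Recurrence for (-1)**k * binom(alpha, k); avoids evaluating factorials.
        coeffs.append(coeffs[-1] * (1.0 - (alpha + 1.0) / k))
    return coeffs


def fractional_fir(samples, alpha, sample_period, num_taps):
    """Apply the truncated expansion as a causal FIR filter.

    alpha > 0 approximates a fractional derivative and alpha < 0 a fractional
    integral. Samples before the start of the input are treated as zero.
    """
    coeffs = gl_coefficients(alpha, num_taps)
    scale = sample_period ** (-alpha)
    output = []
    for n in range(len(samples)):
        acc = 0.0
        for k in range(min(n + 1, num_taps)):
            acc += coeffs[k] * samples[n - k]
        output.append(scale * acc)
    return output


if __name__ == "__main__":
    print(gl_coefficients(0.5, 4))  # [1.0, -0.5, -0.125, -0.0625]
    print(fractional_fir([0.0, 1.0, 2.0, 3.0], 1.0, 1.0, 4))  # [0.0, 1.0, 1.0, 1.0]
```

With alpha = 1 the taps collapse to the ordinary first difference [1, -1, 0, ...], which is a quick sanity check. For non-integer alpha the taps decay slowly, which is consistent with the abstract's description of the resulting filter as high-order.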


FaaM: FPGA-as-a-Microservice - A Case Study for Data Compression

EPJ Web of Conferences ◽

10.1051/epjconf/201921407029 ◽

2019 ◽

Vol 214 ◽

pp. 07029

Author(s):

David Ojika ◽

Ann Gordon-Ross ◽

Herman Lam ◽

Bhavesh Patel

Keyword(s):

High Performance ◽

Network Function Virtualization ◽

Communication Overhead ◽

Network Function ◽

Gate Arrays ◽

Emerging Trends ◽

Field Programmable ◽

Amazon Web Services ◽

Programmable Gate Arrays

Field-programmable gate arrays (FPGAs) have largely been used in communication and high-performance computing. Given the recent advances in big data and emerging trends in cloud computing (e.g., serverless [18]), FPGAs are increasingly being introduced into these domains (e.g., Microsoft’s datacenters [6] and Amazon Web Services [10]). To address these domains’ processing needs, recent research has focused on using FPGAs to accelerate workloads, ranging from analytics and machine learning to databases and network function virtualization. In this paper, we present an ongoing effort to realize a high-performance FPGA-as-a-microservice (FaaM) architecture for the cloud. We discuss some of the technical challenges and propose several solutions for efficiently integrating FPGAs into virtualized environments. Our case study, which deploys multithreaded, multi-user compression as a microservice using the FaaM architecture, indicates that microservices-based FPGA acceleration can sustain high performance relative to a straightforward implementation, with minimal to no communication overhead despite the hardware abstraction.


Implementation of Fractional-order Operators on Field Programmable Gate Arrays

Advances in Fractional Calculus ◽

10.1007/978-1-4020-6042-7_23 ◽

2007 ◽

pp. 333-346 ◽

Cited By ~ 18

Author(s):

Cindy X. Jiang ◽

Joan E. Carletta ◽

Tom T. Hartley

Keyword(s):

Fractional Order ◽

Field Programmable Gate Arrays ◽

Gate Arrays ◽

Field Programmable ◽

Programmable Gate Arrays


A P4-Enabled RINA Interior Router for Software-Defined Data Centers

Computers ◽

10.3390/computers9030070 ◽

2020 ◽

Vol 9 (3) ◽

pp. 70

Author(s):

Carolina Fernández ◽

Sergio Giménez ◽

Eduard Grasa ◽

Steve Bunch

Keyword(s):

Integrated Circuit ◽

High Performance ◽

Data Transfer ◽

Great Promise ◽

Gate Arrays ◽

Field Programmable ◽

Programmable Gate Arrays ◽

Application Specific Integrated Circuit ◽

Networking Technologies ◽

Application Specific

The lack of high-performance RINA (Recursive InterNetwork Architecture) implementations to date makes it hard to experiment with RINA as an underlay networking fabric solution for different types of networks, and to assess RINA’s benefits in practice in scenarios with high traffic loads. High-performance router implementations typically require dedicated hardware support, such as FPGAs (Field Programmable Gate Arrays) or specialized ASICs (Application Specific Integrated Circuits). With the advance of hardware programmability in recent years, new possibilities unfold to prototype novel networking technologies. In particular, the use of the P4 programming language for programmable ASICs holds great promise for developing a RINA router. This paper details the design and part of the implementation of the first P4-based RINA interior router, which reuses the layer management components of the IRATI Linux-based RINA implementation and implements the data-transfer components using a P4 program. We also describe the configuration and testing of our initial deployment scenarios, using ancillary open-source tools such as the P4 reference test software switch (BMv2) and the P4Runtime API.


A low-cost remote laboratory of field programmable gate arrays

Proceedings of 2015 12th International Conference on Remote Engineering and Virtual Instrumentation (REV) ◽

10.1109/rev.2015.7087286 ◽

2015 ◽

Cited By ~ 1

Author(s):

Huy Nguyen Quang ◽

Thang Manh Hoang

Keyword(s):

Low Cost ◽

Field Programmable Gate Arrays ◽

Remote Laboratory ◽

Gate Arrays ◽

Field Programmable ◽

Programmable Gate Arrays


Exploring Shared SRAM Tables in FPGAs for Larger LUTs and Higher Degree of Sharing

International Journal of Reconfigurable Computing ◽

10.1155/2017/7021056 ◽

2017 ◽

Vol 2017 ◽

pp. 1-9 ◽

Cited By ~ 2

Author(s):

Ali Asghar ◽

Muhammad Mazher Iqbal ◽

Waqar Ahmed ◽

Mujahid Ali ◽

Husain Parvez ◽

...

Keyword(s):

High Performance ◽

Critical Path ◽

Path Delay ◽

Gate Arrays ◽

Area Reduction ◽

Area Overhead ◽

Logic Block ◽

Field Programmable ◽

Boolean Matching ◽

Programmable Gate Arrays

In modern SRAM-based Field Programmable Gate Arrays, a Look-Up Table (LUT) is the principal constituent logic element, which can realize every possible Boolean function of its inputs. However, this flexibility of LUTs comes with a heavy area penalty. A part of this area overhead comes from the amount of configuration memory, which rises exponentially as the LUT size increases. In this paper, we first present a detailed analysis of a previously proposed FPGA architecture which allows sharing of LUT memory (SRAM) tables among NPN-equivalent functions, to reduce the area as well as the number of configuration bits. We then propose several methods to improve the existing architecture. A new clustering technique is proposed which packs NPN-equivalent functions together inside a Configurable Logic Block (CLB). We also make use of a recently proposed high-performance Boolean matching algorithm to perform NPN classification. To enhance area savings further, we evaluate the feasibility of more than two LUTs sharing the same SRAM table. Consequently, this work explores the SRAM table sharing approach for a range of LUT sizes (4–7), while varying the cluster sizes (4–16). Experimental results on the MCNC benchmark circuit set show an overall area reduction of ~7% while maintaining the same critical path delay.
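To make the notion of NPN equivalence concrete, here is a small brute-force sketch. Two Boolean functions are NPN-equivalent when one can be turned into the other by negating inputs, permuting inputs, and optionally negating the output, which is why they can in principle share one SRAM table. The truth-table encoding (bit i of an integer holds the output for input assignment i) and the function names are assumptions for illustration. The exhaustive search below is only practical for small LUTs and is not the high-performance Boolean matching algorithm the abstract refers to.

```python
from itertools import permutations


def transform(table, num_inputs, perm, input_neg_mask, output_neg):
    """Return the truth table of g(x) = f(permute(x XOR mask)) XOR output_neg."""
    result = 0
    for idx in range(1 << num_inputs):
        negated = idx ^ input_neg_mask  # negate the selected inputs
        src = 0
        for j in range(num_inputs):
            if (negated >> j) & 1:
                src |= 1 << perm[j]  # route input j to position perm[j]
        bit = (table >> src) & 1
        if output_neg:
            bit ^= 1
        result |= bit << idx
    return result


def npn_canonical(table, num_inputs):
    """Smallest truth table reachable by any NPN transform (class representative)."""
    best = None
    for perm in permutations(range(num_inputs)):
        for mask in range(1 << num_inputs):
            for output_neg in (False, True):
                candidate = transform(table, num_inputs, perm, mask, output_neg)
                if best is None or candidate < best:
                    best = candidate
    return best


def npn_equivalent(table_a, table_b, num_inputs):
    """True when both functions fall into the same NPN class."""
    return npn_canonical(table_a, num_inputs) == npn_canonical(table_b, num_inputs)


if __name__ == "__main__":
    AND2, OR2, XOR2 = 0b1000, 0b1110, 0b0110
    print(npn_equivalent(AND2, OR2, 2))   # True (De Morgan)
    print(npn_equivalent(AND2, XOR2, 2))  # False
```

The search visits n! * 2^(n+1) transforms per function, so it grows quickly with LUT size; this is the cost that dedicated Boolean matching algorithms are designed to avoid.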


Design and Implementation of DDS Module on FPGA

International Journal of Advanced Trends in Computer Science and Engineering ◽

10.30534/ijatcse/2021/1141022021 ◽

2021 ◽

Vol 10 (2) ◽

pp. 1296-1299

Keyword(s):

High Performance ◽

Frequency Conversion ◽

Sine Wave ◽

Direct Digital Synthesis ◽

Output Frequency ◽

Gate Arrays ◽

Construction Scheme ◽

Field Programmable ◽

Digital Synthesis ◽

Programmable Gate Arrays

The paper concerns the construction scheme of a Direct Digital Synthesis (DDS) generator based on the widely used Field Programmable Gate Array (FPGA) technology. Using DDS, a sine-wave generator with controllable frequency and phase is designed. It is shown that the FPGA-based DDS design is dependable and practicable. In testing, the output wave meets the essential aims of easy control and high performance. The sinusoidal signal produced by the DDS features a simple circuit, ease of measurement, stable performance, high frequency-conversion speed, and fine accuracy. Its output frequency falls within the range of 0 Hz to 150 kHz in steps of 5 Hz.
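A common way to build a DDS generator is a phase accumulator that indexes a sine look-up table, with output frequency f_out = FTW * f_clk / 2^N for an N-bit accumulator and frequency tuning word FTW. The sketch below models that textbook structure in software. The abstract does not give the clock frequency, accumulator width, or table size, so the values used here (a 20-bit accumulator clocked at 5,242,880 Hz, which happens to give a 5 Hz step) are hypothetical, as are all function names.

```python
import math


def frequency_step(f_clk_hz, acc_bits):
    """Smallest output-frequency increment of an acc_bits-wide phase accumulator."""
    return f_clk_hz / (1 << acc_bits)


def tuning_word(f_out_hz, f_clk_hz, acc_bits):
    """Frequency tuning word whose output frequency is closest to f_out_hz."""
    return round(f_out_hz * (1 << acc_bits) / f_clk_hz)


def build_sine_lut(addr_bits, amp_bits):
    """One full sine period quantized to signed amp_bits-wide integers."""
    size = 1 << addr_bits
    peak = (1 << (amp_bits - 1)) - 1
    return [round(peak * math.sin(2 * math.pi * i / size)) for i in range(size)]


def dds_samples(ftw, acc_bits, lut, addr_bits, count, phase_offset=0):
    """Generate count samples; the top addr_bits of the phase address the table."""
    mask = (1 << acc_bits) - 1
    phase = phase_offset & mask
    samples = []
    for _ in range(count):
        samples.append(lut[phase >> (acc_bits - addr_bits)])
        phase = (phase + ftw) & mask  # wrap-around models accumulator overflow
    return samples


if __name__ == "__main__":
    F_CLK = 5_242_880  # hypothetical clock, chosen so the step is exactly 5 Hz
    ACC_BITS = 20      # hypothetical accumulator width
    print(frequency_step(F_CLK, ACC_BITS))        # 5.0
    print(tuning_word(150_000, F_CLK, ACC_BITS))  # 30000
    lut = build_sine_lut(addr_bits=8, amp_bits=8)
    print(dds_samples(1 << (ACC_BITS - 2), ACC_BITS, lut, 8, 4))  # [0, 127, 0, -127]
```

Changing `ftw` changes the frequency and changing `phase_offset` shifts the phase, which corresponds to the controllable frequency and phase described in the abstract.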


Comparison of reconfigurable structures for flexible word-length multiplication

Advances in Radio Science ◽

10.5194/ars-6-113-2008 ◽

2008 ◽

Vol 6 ◽

pp. 113-118 ◽

Cited By ~ 2

Author(s):

O. A. Pfänder ◽

R. Nopper ◽

H.-J. Pfleiderer ◽

S. Zhou ◽

A. Bermak

Keyword(s):

Power Dissipation ◽

Word Length ◽

Digital Signal ◽

Gate Arrays ◽

Computing Unit ◽

Field Programmable ◽

Embedded Blocks ◽

Programmable Gate Arrays ◽

Array Structures ◽

Reconfigurable Structures

Binary multiplication continues to be one of the essential arithmetic operations in digital circuits. Even though field-programmable gate arrays (FPGAs) are becoming more and more powerful these days, the vendors cannot avoid implementing multiplications with high word-lengths using embedded blocks instead of configurable logic. On the other hand, the circuit's efficiency decreases if the provided word-length of the hard-wired multipliers exceeds the precision requirements of the algorithm mapped into the FPGA. Thus it is beneficial to use multiplier blocks with configurable word-length, optimized for area, speed and power dissipation, e.g. for digital signal processing (DSP) applications. In this contribution, we present different approaches and structures for the realization of a multiplication with variable precision and perform an objective comparison. This includes one approach based on a modified Baugh and Wooley algorithm and three structures using Booth's arithmetic operand recoding with different array structures. All modules have the option to compute signed two's complement fixed-point numbers either as an individual computing unit or interconnected to a superior array. Therefore, a high throughput at low precision through parallelism, or a high precision through concatenation, can be achieved.
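As a software illustration of Booth operand recoding for signed two's complement numbers, the sketch below recodes the multiplier into radix-4 digits in {-2, -1, 0, 1, 2} and sums the shifted partial products. The abstract does not state which radix or array structure the compared designs use, so radix-4 is an assumption, and the function names are hypothetical. The configurable word-length and the concatenation of units into a larger array are not modeled.

```python
def booth_radix4_digits(multiplier, bits):
    """Recode a signed multiplier into radix-4 Booth digits, least significant first.

    bits must be even and multiplier must fit in bits-wide two's complement.
    Each digit is prev_bit + bit(2i) - 2 * bit(2i + 1), so the weighted sum of
    digits (weights 4**i) equals the signed multiplier.
    """
    if bits % 2 != 0:
        raise ValueError("bits must be even for radix-4 recoding")
    if not -(1 << (bits - 1)) <= multiplier < (1 << (bits - 1)):
        raise ValueError("multiplier does not fit in the given word length")
    pattern = multiplier & ((1 << bits) - 1)  # two's complement bit pattern
    digits = []
    prev = 0  # implicit bit to the right of the least significant bit
    for i in range(0, bits, 2):
        low = (pattern >> i) & 1
        high = (pattern >> (i + 1)) & 1
        digits.append(prev + low - 2 * high)
        prev = high
    return digits


def booth_multiply(multiplicand, multiplier, bits):
    """Multiply by accumulating one shifted partial product per Booth digit."""
    product = 0
    for i, digit in enumerate(booth_radix4_digits(multiplier, bits)):
        product += (digit * multiplicand) << (2 * i)
    return product


if __name__ == "__main__":
    print(booth_radix4_digits(-3, 4))  # [1, -1]
    print(booth_radix4_digits(7, 4))   # [-1, 2]
    print(booth_multiply(-5, 7, 4))    # -35
```

Radix-4 recoding halves the number of partial products compared with one partial product per multiplier bit, which is the usual motivation for using it in array multipliers.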


A High Resolution Vernier Digital-to-Time Converter Implemented with 65 nm FPGA

Applied Sciences ◽

10.3390/app9132705 ◽

2019 ◽

Vol 9 (13) ◽

pp. 2705

Author(s):

Chenggang Yan ◽

Chen Hu ◽

Jianhui Wu

Keyword(s):

High Resolution ◽

Low Cost ◽

Phase Locked Loops ◽

Least Significant Bit ◽

Gate Arrays ◽

Differential Nonlinearity ◽

Field Programmable ◽

Programmable Gate Arrays ◽

Fpga Chip ◽

Fine Resolution

In this paper, a digital-to-time converter (DTC) based on the three delay lines (3D) Vernier principle is proposed and implemented with field programmable gate arrays (FPGAs). Based on the 3D Vernier principle, the DTC is realized by three period-approximate phase locked loops (PLLs). The theoretical fine resolution of the proposed DTC is improved by calculating the period difference two times. The achieved resolution of the proposed DTC is 203 fs, realized with an Altera Stratix III FPGA chip, which is about tenfold higher than that of traditional FPGA-based DTCs implemented with FPGAs of the same series. The worst absolute differential nonlinearity (DNL) and integral nonlinearity (INL) are verified to be smaller than 0.88 least significant bit (LSB) and 4.4 LSB, respectively. With optimized computation logic, only 448 adaptive look-up tables (ALUTs), 237 registers, and three PLLs are used for the circuit implementation. Experimental results show that the proposed DTC features high resolution at low cost.
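The arithmetic behind a Vernier scheme can be sketched briefly. In the classic two-clock case, the time step equals the difference between two nearly equal periods. One possible reading of "calculating the period difference two times" is that a second difference is taken between two such first-order differences; this reading is an assumption, not the paper's confirmed formula. The period values below are made-up integers in arbitrary time units and are unrelated to the 203 fs result reported in the abstract.

```python
def vernier_resolution(period_a, period_b):
    """Time step of a classic two-clock Vernier: the difference of the periods."""
    return abs(period_a - period_b)


def second_order_resolution(period_1, period_2, period_3):
    """Difference of two first-order Vernier steps (assumed reading of 3D Vernier)."""
    return abs((period_1 - period_2) - (period_2 - period_3))


def vernier_delay(cycles, period_slow, period_fast):
    """Delay accumulated after cycles periods of two free-running clocks."""
    return cycles * (period_slow - period_fast)


if __name__ == "__main__":
    # Hypothetical periods in arbitrary integer time units.
    p1, p2, p3 = 10_000, 9_900, 9_810
    print(vernier_resolution(p1, p2))           # 100
    print(vernier_resolution(p2, p3))           # 90
    print(second_order_resolution(p1, p2, p3))  # 10
    print(vernier_delay(7, p1, p2))             # 700
```

The example shows why a second difference can give a finer step than either pair of clocks alone: the two first-order steps (100 and 90 units) differ by only 10 units.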
