A P4-Enabled RINA Interior Router for Software-Defined Data Centers

Carolina Fernández; Sergio Giménez; Eduard Grasa; Steve Bunch

doi:10.3390/computers9030070

A P4-Enabled RINA Interior Router for Software-Defined Data Centers

Computers ◽

10.3390/computers9030070 ◽

2020 ◽

Vol 9 (3) ◽

pp. 70

Author(s):

Carolina Fernández ◽

Sergio Giménez ◽

Eduard Grasa ◽

Steve Bunch

Keyword(s):

Integrated Circuit ◽

High Performance ◽

Data Transfer ◽

Great Promise ◽

Gate Arrays ◽

Field Programmable ◽

Programmable Gate Arrays ◽

Application Specific Integrated Circuit ◽

Networking Technologies ◽

Application Specific

The lack of high-performance RINA (Recursive InterNetwork Architecture) implementations to date makes it hard to experiment with RINA as an underlay networking fabric solution for different types of networks, and to assess RINA’s benefits in practice on scenarios with high traffic loads. High-performance router implementations typically require dedicated hardware support, such as FPGAs (Field Programmable Gate Arrays) or specialized ASICs (Application Specific Integrated Circuits). With the advance of hardware programmability in recent years, new possibilities unfold to prototype novel networking technologies. In particular, the use of the P4 programming language for programmable ASICs holds great promise for developing a RINA router. This paper details the design and part of the implementation of the first P4-based RINA interior router, which reuses the layer management components of the IRATI Linux-based RINA implementation and implements the data-transfer components using a P4 program. We also describe the configuration and testing of our initial deployment scenarios, using ancillary open-source tools such as the P4 reference test software switch (BMv2) and the P4Runtime API.

An efficient look up table based approximate adder for field programmable gate array

Indonesian Journal of Electrical Engineering and Computer Science ◽

10.11591/ijeecs.v25.i1.pp144-151 ◽

2022 ◽

Vol 25 (1) ◽

pp. 144

Author(s):

Hadise Ramezani ◽

Majid Mohammadi ◽

Amir Sabbagh Molahosseini

Keyword(s):

Integrated Circuits ◽

High Performance ◽

Efficient Implementation ◽

Approximate Computing ◽

Gaussian Filter ◽

Gate Arrays ◽

Output Quality ◽

Field Programmable ◽

Programmable Gate Arrays ◽

Application Specific

Approximate computing is an alternative computing approach that can lead to high-performance implementations of audio and image processing as well as deep learning applications. However, most of the available approximate adders have been designed for application specific integrated circuits (ASICs), and they do not result in an efficient implementation on field programmable gate arrays (FPGAs). In this paper, we design a new approximate adder customized for efficient implementation on FPGAs and use it to build a Gaussian filter. Experimental results from implementing the Gaussian filter based on the proposed approximate adder on a Virtex-7 FPGA indicate that resource utilization decreased by 20-51%, and that the delay of the designed filter based on the modified design methodology for building approximate adders for FPGA-based systems (MDeMAS) adder improved by 10-35%, due to the obtained output quality.
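
To make the idea of trading accuracy for hardware cost concrete, the following is a minimal Python sketch of a different, well-known approximate adder, the lower-part OR adder (LOA). It is not the LUT-based adder proposed in this paper. The function name `loa_add` and the parameter `approx_bits` are hypothetical, chosen only for illustration.

```python
def loa_add(a: int, b: int, approx_bits: int) -> int:
    """Lower-part OR adder (LOA) on non-negative integers.

    The lowest `approx_bits` bits are approximated with a bitwise OR,
    and the remaining upper bits are added exactly. The carry into the
    upper part is the AND of the most significant bits of the two
    lower parts. This illustrates approximate addition in general; it
    is NOT the adder proposed in the paper.
    """
    if approx_bits == 0:
        return a + b  # no approximate part: exact addition

    low_mask = (1 << approx_bits) - 1
    low = (a | b) & low_mask  # approximate lower sum bits

    # Carry estimate: both top bits of the lower parts are set.
    top = approx_bits - 1
    carry_in = (a >> top) & (b >> top) & 1

    high = (a >> approx_bits) + (b >> approx_bits) + carry_in
    return (high << approx_bits) | low


if __name__ == "__main__":
    for a, b in [(5, 2), (3, 3), (100, 27)]:
        approx = loa_add(a, b, approx_bits=2)
        print(a, b, "exact:", a + b, "approx:", approx, "error:", approx - (a + b))
```

Increasing `approx_bits` shortens the exact carry chain at the price of a larger worst-case error, which is the kind of trade-off against output quality that the abstract refers to.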

Architecture design and application mapping on reconfigurable platforms using software tools

10.12681/eadd/38607 ◽

2016 ◽

Author(s):

Χαράλαμπος Σιδηρόπουλος

Keyword(s):

Integrated Circuits ◽

High Performance ◽

Field Programmable Gate Arrays ◽

Gate Arrays ◽

Field Programmable ◽

Programmable Gate Arrays ◽

On Chip ◽

High Level ◽

Application Specific

In recent years, reconfigurable architectures, and more specifically Field Programmable Gate Arrays (FPGAs), have become viable alternatives to Application Specific Integrated Circuits (ASICs). The defining characteristic of FPGA technology is that it supports the implementation of applications through appropriate (re)configuration of the functionality of the hardware resources. This allows FPGAs to provide greater flexibility, to support rapid prototyping of products, and to significantly reduce non-recurring engineering (NRE) costs compared with ASIC devices. The features and capabilities of these architectures have changed and improved significantly over the last two decades. From arrays of Look-Up Tables (LUTs), we have arrived at heterogeneous devices that integrate a range of hardware components (e.g., LUTs of different sizes, microprocessors, DSP and RAM blocks, etc.). The logic fabric of an FPGA has gradually changed from a homogeneous and regular architecture to a heterogeneous System on Chip (SoC) device. The complexity of today's applications usually imposes constraints on the architectural organization of FPGAs. Even if the demand for additional logic resources is met by platforms composed of more complex logic blocks, or CLBs (e.g., with more LUTs), this problem persists for the more communication-intensive applications (e.g., telecommunications, cryptography, and image and video processing), since their performance usually depends on the available I/O bandwidth. This doctoral thesis investigates the challenges and proposes new solutions in the field of mapping an application onto Field Programmable Gate Arrays. The goal is to outline and analyze the obstacles that limit the efficiency of the mapping process and to propose new solutions aimed at increasing it.
Toward this goal, a novel methodology was developed that enables rapid architecture-level exploration of different memory organizations and hierarchies in heterogeneous FPGAs. Alongside the methodology, a software framework was developed that supports mapping an application onto the aforementioned architectures. The proposed framework enables the exploration of hierarchies of any type of architectural block, not only memories. On the architecture side, to alleviate the I/O bandwidth problem that appears in more complex applications and to increase performance in general, a new three-dimensional FPGA architectural template was proposed. This three-dimensional architecture consists of heterogeneous layers, in contrast to previous approaches in which each layer is a copy of the previous one. The case study used consists of three layers, on each of which the logic, the memory, and the I/O blocks are placed separately. The choice of three layers with these specific architectural elements does not limit the generality of the proposed solution. In addition, an appropriate software framework was developed that supports the exploration of such architectures and the mapping of applications onto such reconfigurable architectures. Beyond the well-known challenges at the physical level caused by transistor shrinking, the increased complexity of both the applications and the FPGA architecture makes the effectiveness and efficiency of the CAD tools used even more critical. Techniques that accelerate the core CAD algorithms can bring significant changes to a product's design time, while many designers may be willing to accept a small degradation in solution quality in exchange for an improved runtime of the CAD tools.
To be integrated effectively into this new landscape, FPGAs must support rapid application development and mapping. Industry has taken steps toward faster application development, exploring various solutions such as High Level Synthesis (HLS). FPGAs have been explored as a viable platform for various High Performance Computing (HPC) and embedded-systems applications, mainly because of their inherent parallelism and their reprogrammability, which can be applied either at design time or at run time. To address these limitations, this doctoral thesis introduces a new methodology aimed at the rapid mapping of applications onto FPGAs. The goal of this approach is to significantly reduce execution time without at the same time significantly degrading the application's performance. For the same purpose, a cloud methodology and the corresponding software framework were developed to enable the efficient mapping of multiple applications at run time onto one or more FPGAs. The proposed solution removes the aforementioned problems by offering fast execution times and allowing the mapping process to scale across many cores. To exploit FPGAs in a dynamic environment, a new methodology and the necessary tools were proposed that enable the efficient mapping of multiple applications onto heterogeneous FPGAs. Through the use of dynamic virtual cores, custom memory allocators, and memory-management optimizations, the limitations imposed by the CAD tools were overcome, and it was shown theoretically that mapping applications onto FPGAs can be done at run time, even on embedded systems.

High Performance Low Cost Implementation of FPGA-Based Fractional-Order Operators

Volume 6: 5th International Conference on Multibody Systems, Nonlinear Dynamics, and Control, Parts A, B, and C ◽

10.1115/detc2005-84796 ◽

2005 ◽

Cited By ~ 3

Author(s):

Cindy X. Jiang ◽

Tom T. Hartley ◽

Joan E. Carletta

Keyword(s):

Fractional Order ◽

Word Length ◽

High Performance ◽

Low Cost ◽

Careful Consideration ◽

Order System ◽

System Quality ◽

Gate Arrays ◽

Field Programmable ◽

Programmable Gate Arrays

Hardware implementation of fractional-order differentiators and integrators requires careful consideration of system quality, hardware cost, and speed. This paper proposes using field programmable gate arrays (FPGAs) to implement fractional-order systems, and demonstrates the advantages that FPGAs provide. As an illustration, the fundamental operator raised to a real power is approximated via the binomial expansion of the backward difference. The resulting high-order FIR filter is implemented in a pipelined multiplierless architecture on a low-cost Spartan-3 FPGA. Unlike common digital implementations in which all filter coefficients have the same word length, this approach exploits a variable word length for each coefficient. Our system requires twenty percent less hardware than a system of comparable quality generated by Xilinx’s System Generator on its most area-efficient multiplierless setting. The work shows an effective way to implement a high-quality, high-throughput approximation to a fractional-order system at a lower cost than traditional FPGA-based designs.
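
The binomial expansion of the backward difference mentioned above can be sketched in a few lines of Python. This is a floating-point illustration under stated assumptions, not the pipelined, multiplierless, variable-word-length design from the paper. The function names `gl_coefficients` and `fractional_difference` are hypothetical.

```python
def gl_coefficients(alpha: float, n_taps: int) -> list:
    """First n_taps coefficients of the binomial expansion of (1 - z^-1)^alpha.

    Uses the recurrence w_0 = 1, w_k = w_{k-1} * (1 - (alpha + 1) / k),
    which yields w_k = (-1)^k * binomial(alpha, k).
    """
    coeffs = [1.0]
    for k in range(1, n_taps):
        coeffs.append(coeffs[-1] * (1.0 - (alpha + 1.0) / k))
    return coeffs


def fractional_difference(samples, alpha: float, step: float, n_taps: int) -> list:
    """Apply the truncated expansion as an FIR filter to a sampled signal.

    alpha > 0 approximates differentiation of order alpha and alpha < 0
    approximates integration. `step` is the sampling interval.
    """
    coeffs = gl_coefficients(alpha, n_taps)
    scale = step ** (-alpha)
    out = []
    for n in range(len(samples)):
        acc = 0.0
        # Convolve with the taps that overlap the available history.
        for k in range(min(n + 1, n_taps)):
            acc += coeffs[k] * samples[n - k]
        out.append(scale * acc)
    return out


if __name__ == "__main__":
    print(gl_coefficients(0.5, 4))
    print(fractional_difference([0.0, 1.0, 2.0, 3.0], alpha=1.0, step=1.0, n_taps=4))
```

For `alpha = 1` the coefficients reduce to the ordinary first difference, which gives a quick sanity check. In a hardware design each coefficient would be quantized, and the abstract's point is that the word length can differ per coefficient.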

Soft Core Processor Generated Based on the Machine Code of the Application

Journal of Circuits, Systems and Computers ◽

10.1142/s0218126616500298 ◽

2016 ◽

Vol 25 (04) ◽

pp. 1650029 ◽

Cited By ~ 11

Author(s):

Adam Ziebinski ◽

Stanislaw Swierc

Keyword(s):

Embedded System ◽

Soft Core ◽

Machine Code ◽

Correct Operation ◽

Gate Arrays ◽

Field Programmable ◽

Programmable Gate Arrays ◽

Instruction Set Extensions ◽

The Cost ◽

Application Specific

Currently, embedded system designs aim to improve areas such as speed, energy efficiency and the cost of an application. Application-specific instruction set extensions on reconfigurable hardware provide such opportunities. The article presents a new approach for generating soft core processors that are optimized for specific tasks. In this work, we describe an automatic method for selecting custom instructions for generating soft core processors based on the machine code of the application program. As a result, a soft core processor contains only the logic that is absolutely necessary. This solution requires fewer gates to be synthesized in the field programmable gate array (FPGA) and has the potential to increase the speed of the information processing performed by the system in the target FPGA. Experiments have confirmed the correct operation of the method that was used. After the reduction mechanism was enabled, the total number of occupied slice blocks decreased to 47% of its initial value in the best case for the Xilinx Spartan3 (xc3s200), and the maximum frequency increased by approximately 44% in the best case for the Xilinx Spartan6 (xc6slx4).

FPGAs in Client Compute Hardware

10.36227/techrxiv.13604699.v1 ◽

2021 ◽

Author(s):

Michael Mattioli

Keyword(s):

Integrated Circuits ◽

Field Programmable Gate Arrays ◽

Gate Arrays ◽

Application Specific Integrated Circuits ◽

Field Programmable ◽

Programmable Gate Arrays ◽

And Performance ◽

Application Specific

Field-programmable gate arrays (FPGAs) are remarkably versatile. FPGAs are used in a wide variety of applications and industries where use of application-specific integrated circuits (ASICs) is less economically feasible. Despite the area, cost, and power challenges designers face when integrating FPGAs into devices, they provide significant security and performance benefits. Many of these benefits can be realized in client compute hardware such as laptops, tablets, and smartphones.

FaaM: FPGA-as-a-Microservice - A Case Study for Data Compression

EPJ Web of Conferences ◽

10.1051/epjconf/201921407029 ◽

2019 ◽

Vol 214 ◽

pp. 07029

Author(s):

David Ojika ◽

Ann Gordon-Ross ◽

Herman Lam ◽

Bhavesh Patel

Keyword(s):

High Performance ◽

Network Function Virtualization ◽

Communication Overhead ◽

Network Function ◽

Gate Arrays ◽

Emerging Trends ◽

Field Programmable ◽

Amazon Web Services ◽

Programmable Gate Arrays

Field-programmable gate arrays (FPGAs) have largely been used in communication and high-performance computing, and given the recent advances in big data and emerging trends in cloud computing (e.g., serverless [18]), FPGAs are increasingly being introduced into these domains (e.g., Microsoft’s datacenters [6] and Amazon Web Services [10]). To address these domains’ processing needs, recent research has focused on using FPGAs to accelerate workloads ranging from analytics and machine learning to databases and network function virtualization. In this paper, we present an ongoing effort to realize a high-performance FPGA-as-a-microservice (FaaM) architecture for the cloud. We discuss some of the technical challenges and propose several solutions for efficiently integrating FPGAs into virtualized environments. Our case study, which deploys multithreaded, multi-user compression as a microservice using the FaaM architecture, indicates that microservices-based FPGA acceleration can sustain high performance compared with a straightforward implementation, with minimal to no communication overhead despite the hardware abstraction.

Exploring Shared SRAM Tables in FPGAs for Larger LUTs and Higher Degree of Sharing

International Journal of Reconfigurable Computing ◽

10.1155/2017/7021056 ◽

2017 ◽

Vol 2017 ◽

pp. 1-9 ◽

Cited By ~ 2

Author(s):

Ali Asghar ◽

Muhammad Mazher Iqbal ◽

Waqar Ahmed ◽

Mujahid Ali ◽

Husain Parvez ◽

...

Keyword(s):

High Performance ◽

Critical Path ◽

Path Delay ◽

Gate Arrays ◽

Area Reduction ◽

Area Overhead ◽

Logic Block ◽

Field Programmable ◽

Boolean Matching ◽

Programmable Gate Arrays

In modern SRAM-based Field Programmable Gate Arrays, a Look-Up Table (LUT) is the principal constituent logic element, which can realize every possible Boolean function. However, this flexibility of LUTs comes with a heavy area penalty. A part of this area overhead comes from the increased amount of configuration memory, which rises exponentially as the LUT size increases. In this paper, we first present a detailed analysis of a previously proposed FPGA architecture that allows sharing of LUT memory (SRAM) tables among NPN-equivalent functions, to reduce the area as well as the number of configuration bits. We then propose several methods to improve the existing architecture. A new clustering technique is proposed that packs NPN-equivalent functions together inside a Configurable Logic Block (CLB). We also make use of a recently proposed high-performance Boolean matching algorithm to perform NPN classification. To enhance area savings further, we evaluate the feasibility of more than two LUTs sharing the same SRAM table. Consequently, this work explores the SRAM table sharing approach for a range of LUT sizes (4–7), while varying the cluster sizes (4–16). Experimental results on the MCNC benchmark circuit set show an overall area reduction of ~7% while maintaining the same critical path delay.
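
Two Boolean functions are NPN-equivalent when one can be turned into the other by negating inputs (N), permuting inputs (P), and optionally negating the output (N). The sketch below classifies small truth tables by brute force. It only illustrates the equivalence relation and is not the high-performance Boolean matching algorithm the paper relies on. The function name `npn_canonical` and the integer truth-table encoding are assumptions made for this example.

```python
from itertools import permutations


def npn_canonical(truth_table: int, n_inputs: int) -> int:
    """Return the smallest truth table in the NPN class of `truth_table`.

    Bit i of `truth_table` is the function output for input assignment i
    (input j is bit j of i). Every input permutation, input negation mask
    and output negation is tried, so the cost grows as n! * 2^n * 2^n.
    That is only practical for small LUT sizes.
    """
    size = 1 << n_inputs
    full_mask = (1 << size) - 1
    best = None
    for perm in permutations(range(n_inputs)):
        for neg_mask in range(size):
            transformed = 0
            for minterm in range(size):
                # Negate the selected inputs, then reroute them via perm.
                flipped = minterm ^ neg_mask
                source = 0
                for new_pos, old_pos in enumerate(perm):
                    if (flipped >> new_pos) & 1:
                        source |= 1 << old_pos
                if (truth_table >> source) & 1:
                    transformed |= 1 << minterm
            # Consider the function and its output-negated version.
            for candidate in (transformed, transformed ^ full_mask):
                if best is None or candidate < best:
                    best = candidate
    return best


if __name__ == "__main__":
    AND2, OR2, XOR2 = 0b1000, 0b1110, 0b0110
    print(npn_canonical(AND2, 2) == npn_canonical(OR2, 2))   # same class (De Morgan)
    print(npn_canonical(AND2, 2) == npn_canonical(XOR2, 2))  # different classes
```

Functions that map to the same canonical value could, in principle, share one SRAM table if the surrounding logic supplies the required input and output inversions and input routing, which is the sharing idea the abstract describes.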

Sparse Cholesky Factorization on FPGA Using Parameterized Model

Mathematical Problems in Engineering ◽

10.1155/2017/3021591 ◽

2017 ◽

Vol 2017 ◽

pp. 1-11

Author(s):

Yichun Sun ◽

Hengzhu Liu ◽

Tong Zhou

Keyword(s):

Integrated Circuit ◽

Sparse Matrix ◽

Fundamental Problem ◽

Performance Model ◽

Cholesky Factorization ◽

Field Programmable ◽

Programmable Gate Arrays ◽

Application Specific Integrated Circuit ◽

Parameterized Model ◽

And Performance

Cholesky factorization is a fundamental problem in most engineering and scientific computing applications. When dealing with a large sparse matrix, numerical decomposition consumes the most time. We present a vector architecture to parallelize the numerical decomposition step of Cholesky factorization. We construct an integrated analytical parameterized performance model to accurately predict the execution times of typical matrices under varying parameters. Our proposed approach is general for accelerators and is limited to neither field-programmable gate arrays (FPGAs) nor application-specific integrated circuits (ASICs). We implement a simplified module on FPGAs to verify the accuracy of the model. The experiments show that, in most cases, the difference between the predicted and measured execution times is less than 10%. Based on the performance model, we optimize the parameters and obtain a balance of resources and performance after analyzing the performance of varied parameter settings. Compared with state-of-the-art implementations on CPU and GPU, the performance with the optimal parameters is 2x that of the CPU. Our model offers several advantages, particularly in power consumption. It provides guidance for the design of future acceleration components.
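
For reference, the numerical decomposition step computes a lower-triangular factor L with A = L Lᵀ. The following is a minimal dense, column-by-column Python sketch. It assumes a small symmetric positive-definite matrix stored as nested lists and does not reflect the paper's sparse data structures or vector architecture. The function name `cholesky` is hypothetical.

```python
import math


def cholesky(a):
    """Dense Cholesky factorization: returns lower-triangular L with A = L * L^T.

    `a` is a symmetric positive-definite matrix given as a list of rows.
    Raises ValueError if a non-positive pivot is encountered.
    """
    n = len(a)
    lower = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Diagonal entry: subtract the squares already placed in row j.
        diag = a[j][j] - sum(lower[j][k] ** 2 for k in range(j))
        if diag <= 0.0:
            raise ValueError("matrix is not positive definite")
        lower[j][j] = math.sqrt(diag)
        # Entries below the diagonal in column j.
        for i in range(j + 1, n):
            dot = sum(lower[i][k] * lower[j][k] for k in range(j))
            lower[i][j] = (a[i][j] - dot) / lower[j][j]
    return lower


if __name__ == "__main__":
    matrix = [[4.0, 12.0, -16.0],
              [12.0, 37.0, -43.0],
              [-16.0, -43.0, 98.0]]
    for row in cholesky(matrix):
        print(row)
```

The update of each column below the diagonal is a set of independent dot products, which is the kind of work a vector architecture can parallelize. A sparse implementation would additionally skip the structurally zero entries.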

High-performance fieldbus application-specific integrated circuit design for industrial smart sensor networks

The Journal of Supercomputing ◽

10.1007/s11227-017-2010-1 ◽

2017 ◽

Vol 74 (9) ◽

pp. 4451-4469 ◽

Cited By ~ 1

Author(s):

Ching-Han Chen ◽

Ming-Yi Lin ◽

Xing-Chen Guo

Keyword(s):

Sensor Networks ◽

Circuit Design ◽

Integrated Circuit ◽

High Performance ◽

Smart Sensor ◽

Integrated Circuit Design ◽

Application Specific Integrated Circuit ◽

Application Specific

Design and Implementation of DDS Module on FPGA

International Journal of Advanced Trends in Computer Science and Engineering ◽

10.30534/ijatcse/2021/1141022021 ◽

2021 ◽

Vol 10 (2) ◽

pp. 1296-1299

Keyword(s):

High Performance ◽

Frequency Conversion ◽

Sine Wave ◽

Direct Digital Synthesis ◽

Output Frequency ◽

Gate Arrays ◽

Construction Scheme ◽

Field Programmable ◽

Digital Synthesis ◽

Programmable Gate Arrays

The paper concerns the construction scheme of a Direct Digital Synthesis (DDS) generator based on the widely used Field Programmable Gate Array (FPGA) technology. Using DDS, the design generates a sine wave whose frequency and phase are controllable. The design based on an FPGA with DDS is shown to be dependable and practicable. Tests show that the output wave meets the essential aims of easy control and high performance. The sinusoidal signal produced by the DDS features a modest circuit, ease of measurement, stable performance, high frequency-conversion speed, and fine accuracy. Its output frequency falls within the range of 0 Hz to 150 kHz in steps of 5 Hz.
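
A DDS generator is commonly built from a phase accumulator that indexes a sine look-up table. The output frequency is f_out = FTW × f_clk / 2^N for an N-bit accumulator and frequency tuning word FTW, so the frequency step is f_clk / 2^N. The Python sketch below models that general scheme. The clock frequency, accumulator width, and table sizes are hypothetical values chosen only so that the step works out to 5 Hz. They are not taken from the paper.

```python
import math


def make_sine_lut(address_bits: int, amplitude_bits: int) -> list:
    """One full sine period as signed integers, 2**address_bits entries."""
    size = 1 << address_bits
    peak = (1 << (amplitude_bits - 1)) - 1
    return [round(peak * math.sin(2 * math.pi * i / size)) for i in range(size)]


def tuning_word(f_out_hz: float, f_clk_hz: float, acc_bits: int) -> int:
    """Frequency tuning word: the phase increment added on every clock."""
    return round(f_out_hz * (1 << acc_bits) / f_clk_hz)


def dds_samples(ftw: int, n_samples: int, acc_bits: int, lut: list, address_bits: int) -> list:
    """Run the phase accumulator and look up one output sample per clock."""
    mask = (1 << acc_bits) - 1
    phase = 0
    out = []
    for _ in range(n_samples):
        # The top address_bits of the phase select the table entry.
        out.append(lut[phase >> (acc_bits - address_bits)])
        phase = (phase + ftw) & mask  # wrap around at 2**acc_bits
    return out


if __name__ == "__main__":
    ACC_BITS = 16              # hypothetical accumulator width
    F_CLK = 5 * (1 << 16)      # hypothetical 327680 Hz clock, giving a 5 Hz step
    lut = make_sine_lut(address_bits=8, amplitude_bits=8)
    ftw = tuning_word(150_000, F_CLK, ACC_BITS)
    print("tuning word for 150 kHz:", ftw)
    print(dds_samples(ftw, 8, ACC_BITS, lut, 8))
```

Changing the tuning word changes the frequency on the next clock edge, which is why DDS offers fast frequency switching, and adding a constant offset to the accumulator output would give phase control.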
