High Speed Implementation of a SHA-3 Core on Virtex-5 and Virtex-6 FPGAs

This work presents a novel technique for a high-speed implementation of the newly selected cryptographic hash function, Secure Hash Algorithm-3 (SHA-3), on Xilinx Virtex-5 and Virtex-6 Field Programmable Gate Arrays (FPGAs). The proposed technique consists of a two-phase implementation approach. In the first phase, all steps of the SHA-3 core are logically combined, which eliminates the intermediate states of the core function; these states consume additional area and slow execution. The second phase deals with the hardware implementation of the first-phase equations using Xilinx Look-Up-Table (LUT) primitives. This two-phase implementation technique results in a throughput of 19.241 Gbps on a Virtex-6 FPGA; this is the highest reported throughput to date for an FPGA implementation of SHA-3. This high throughput makes the technique well suited to providing Bump In The Wire (BITW) security for Internet of Things (IoT) applications.
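For readers unfamiliar with the "steps of the SHA-3 core" mentioned above, the following is a minimal software sketch of the standard Keccak-f[1600] round, which applies the steps theta, rho, pi, chi, and iota in sequence. It follows the publicly specified SHA-3 algorithm and is not the combined-equation hardware design of the paper; the function names (`rol64`, `keccak_f1600`, `sha3_256`) are illustrative choices.

```python
import hashlib

MASK64 = (1 << 64) - 1

def rol64(value, shift):
    """Rotate a 64-bit lane left by `shift` bits."""
    shift %= 64
    return ((value << shift) | (value >> (64 - shift))) & MASK64

def keccak_f1600(lanes):
    """Apply the 24-round Keccak-f[1600] permutation to a 5x5 array of 64-bit lanes."""
    lfsr = 1  # small LFSR that generates the iota round constants
    for _ in range(24):
        # theta: mix each column parity into neighbouring columns
        c = [lanes[x][0] ^ lanes[x][1] ^ lanes[x][2] ^ lanes[x][3] ^ lanes[x][4]
             for x in range(5)]
        d = [c[(x + 4) % 5] ^ rol64(c[(x + 1) % 5], 1) for x in range(5)]
        lanes = [[lanes[x][y] ^ d[x] for y in range(5)] for x in range(5)]
        # rho and pi: rotate each lane and move it to a new position
        x, y = 1, 0
        current = lanes[x][y]
        for t in range(24):
            x, y = y, (2 * x + 3 * y) % 5
            current, lanes[x][y] = lanes[x][y], rol64(current, (t + 1) * (t + 2) // 2)
        # chi: the only non-linear step, applied row by row
        for y in range(5):
            row = [lanes[x][y] for x in range(5)]
            for x in range(5):
                lanes[x][y] = row[x] ^ ((~row[(x + 1) % 5]) & row[(x + 2) % 5])
        # iota: XOR the round constant into lane (0, 0)
        for j in range(7):
            lfsr = ((lfsr << 1) ^ ((lfsr >> 7) * 0x71)) % 256
            if lfsr & 2:
                lanes[0][0] ^= 1 << ((1 << j) - 1)
    return lanes

def sha3_256(message):
    """SHA3-256 built on the permutation above (rate = 136 bytes)."""
    rate = 136
    padded = bytearray(message) + b"\x06"   # SHA-3 domain suffix plus first pad bit
    padded += bytes(-len(padded) % rate)    # zero fill up to a whole block
    padded[-1] |= 0x80                      # final pad bit
    state = bytearray(200)
    for offset in range(0, len(padded), rate):
        for i in range(rate):
            state[i] ^= padded[offset + i]
        lanes = [[int.from_bytes(state[8 * (x + 5 * y): 8 * (x + 5 * y) + 8], "little")
                  for y in range(5)] for x in range(5)]
        lanes = keccak_f1600(lanes)
        for x in range(5):
            for y in range(5):
                state[8 * (x + 5 * y): 8 * (x + 5 * y) + 8] = lanes[x][y].to_bytes(8, "little")
    return bytes(state[:32])

# Cross-check against the standard library implementation.
print(sha3_256(b"abc") == hashlib.sha3_256(b"abc").digest())
```

In this software form every step writes a full 1600-bit intermediate state; a hardware design that merges the steps into one set of equations, as the abstract describes, would avoid materializing those intermediates.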

Layout Aware Optimization of High Speed Fixed Coefficient FIR Filters for FPGAs

International Journal of Reconfigurable Computing ◽

10.1155/2010/697625 ◽

2010 ◽

Vol 2010 ◽

pp. 1-17 ◽

Cited By ~ 17

Author(s):

Shahnam Mirzaei ◽

Ryan Kastner ◽

Anup Hosangadi

Keyword(s):

High Speed ◽

Finite Impulse Response ◽

Fir Filters ◽

Second Phase ◽

Distributed Arithmetic ◽

Gate Arrays ◽

Length Estimation ◽

Field Programmable ◽

Programmable Gate Arrays ◽

Coefficient Multipliers

We present a method for implementing high-speed finite impulse response (FIR) filters on field programmable gate arrays (FPGAs). Our algorithm is a multiplierless technique in which fixed-coefficient multipliers are replaced with a series of add and shift operations. The first phase of our algorithm uses registered adders and hardwired shifts. Here, a modified common subexpression elimination (CSE) algorithm reduces the number of adders while maintaining performance. The second phase optimizes routing delay using prelayout wire length estimation techniques to improve the final placed and routed design. The optimization target platforms are Xilinx Virtex FPGA devices, where we compare the implementation results with those produced by Xilinx Coregen, which is based on distributed arithmetic (DA). We observed up to a 50% reduction in the number of slices and up to a 75% reduction in the number of look-up tables (LUTs) for fully parallel implementations compared to the DA method. There is also a 50% reduction in the total dynamic power consumption of the filters. Our designs perform up to 27% faster than the multiply-accumulate (MAC) filters implemented by the Xilinx Coregen tool using DSP blocks. For placement, there is a saving of up to 20% in the number of routing channels. This results in lower congestion and up to an 8% reduction in average wirelength.
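The idea of replacing a fixed-coefficient multiplier with shifts and adds can be sketched in software as follows. This sketch uses a canonical signed digit (CSD) decomposition as a simple stand-in; it is not the modified CSE algorithm of the paper, and the helper names (`csd_terms`, `shift_add_multiply`, `fir_shift_add`) are hypothetical.

```python
def csd_terms(coeff):
    """Decompose an integer into (shift, sign) terms so that
    coeff == sum(sign * 2**shift), with no two adjacent nonzero digits."""
    if coeff < 0:
        return [(shift, -sign) for shift, sign in csd_terms(-coeff)]
    terms, shift = [], 0
    while coeff:
        if coeff & 1:
            sign = 2 - (coeff & 3)  # +1 if coeff % 4 == 1, -1 if coeff % 4 == 3
            terms.append((shift, sign))
            coeff -= sign
        coeff >>= 1
        shift += 1
    return terms

def shift_add_multiply(x, terms):
    """Multiply x by a constant using only shifts and additions/subtractions."""
    return sum(sign * (x << shift) for shift, sign in terms)

def fir_shift_add(samples, coeffs):
    """Direct-form FIR filter with integer coefficients and zero initial state."""
    term_table = [csd_terms(c) for c in coeffs]
    history = [0] * len(coeffs)
    outputs = []
    for sample in samples:
        history = [sample] + history[:-1]
        outputs.append(sum(shift_add_multiply(x, terms)
                           for x, terms in zip(history, term_table)))
    return outputs

print(csd_terms(7))                            # 7 = 8 - 1, one subtraction
print(fir_shift_add([1, 2, 3, 4], [7, -3, 5]))
```

Each `(shift, sign)` term corresponds to one hardwired shift feeding an adder or subtractor; sharing common subexpressions across coefficients would reduce the adder count further.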

BPR-TCAM—Block and Partial Reconfiguration based TCAM on Xilinx FPGAs

Electronics ◽

10.3390/electronics9020353 ◽

2020 ◽

Vol 9 (2) ◽

pp. 353 ◽

Cited By ~ 1

Author(s):

Anees Ullah ◽

Ali Zahir ◽

Noaman A. Khan ◽

Waleed Ahmad ◽

Alexis Ramos ◽

...

Keyword(s):

Resource Utilization ◽

High Speed ◽

State Of The Art ◽

Field Programmable Gate Arrays ◽

Partial Reconfiguration ◽

Gate Arrays ◽

Content Addressable Memories ◽

Field Programmable ◽

Programmable Gate Arrays

Ternary Content Addressable Memories (TCAMs) based on Field Programmable Gate Arrays (FPGAs) are widely used in high-speed networking applications. However, TCAMs are not present on state-of-the-art FPGAs and need to be emulated on SRAM-based memories (i.e., LUTRAMs and Block RAMs), which requires a large amount of FPGA resources. In this paper, we present an efficient methodology to implement FPGA-based TCAMs with significant resource savings compared to existing schemes. The proposed methodology exploits the fracturable nature of Look-Up Tables (LUTs) and the built-in slice carry chains to map two rules and their matching logic simultaneously to a single FPGA slice. Multiple slices can be stacked together to build deeper and wider TCAMs in a modular way. The combination of all these techniques results in significant savings in resource utilization compared to existing approaches.
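As background, the behavior a TCAM must provide (ternary matching with "don't care" bits and priority by rule order) can be modeled in a few lines of software. This is only a functional sketch, assuming a value/mask encoding of rules; it does not model the LUT and carry-chain mapping of the paper, and `parse_rule` and `tcam_lookup` are hypothetical names.

```python
def parse_rule(pattern):
    """Convert a ternary pattern such as '10x1' into a (value, mask) pair.
    Mask bits are 1 where the key must match and 0 for don't-care ('x')."""
    value = mask = 0
    for char in pattern:
        value <<= 1
        mask <<= 1
        if char in "01":
            value |= int(char)
            mask |= 1
        elif char not in "xX":
            raise ValueError(f"invalid ternary digit: {char!r}")
    return value, mask

def tcam_lookup(rules, key):
    """Return the index of the first (highest-priority) matching rule, or None."""
    for index, (value, mask) in enumerate(rules):
        if ((key ^ value) & mask) == 0:
            return index
    return None

rules = [parse_rule(p) for p in ("10x1", "1xxx", "xxxx")]
print(tcam_lookup(rules, 0b1011), tcam_lookup(rules, 0b1100), tcam_lookup(rules, 0b0000))
```

A hardware TCAM evaluates all rules in parallel and resolves priority with an encoder; the sequential loop here only reproduces the result.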

Nonvolatile Nanoelectromechanical Memory Switches for Low-Power and High-Speed Field-Programmable Gate Arrays

IEEE Transactions on Electron Devices ◽

10.1109/ted.2014.2380992 ◽

2015 ◽

Vol 62 (2) ◽

pp. 673-679 ◽

Cited By ~ 14

Author(s):

Yong Jun Kim ◽

Woo Young Choi

Keyword(s):

Low Power ◽

High Speed ◽

Field Programmable Gate Arrays ◽

Gate Arrays ◽

Field Programmable ◽

Programmable Gate Arrays

A Comparison of Filtering Approaches Using Low-Speed DACs for Hardware-in-the-Loop Implemented in FPGAs

Electronics ◽

10.3390/electronics8101116 ◽

2019 ◽

Vol 8 (10) ◽

pp. 1116 ◽

Cited By ~ 4

Author(s):

Yushkova ◽

Sanchez ◽

de Castro ◽

Martínez-García

Keyword(s):

High Speed ◽

Hardware In The Loop ◽

Low Speed ◽

Digital To Analog Converters ◽

Gate Arrays ◽

Simulation Techniques ◽

Input Signals ◽

Field Programmable ◽

Programmable Gate Arrays ◽

Input Waveform

The use of Hardware-in-the-Loop (HIL) systems implemented in Field Programmable Gate Arrays (FPGAs) is constantly increasing because of their advantages over traditional simulation techniques. This growth has raised new challenges related to improving their performance and features, such as the number of output channels, while the price of HIL systems is falling. At present, the use of low-speed Digital-to-Analog Converters (DACs) is becoming commercially viable for two reasons. One is their lower price; the other is their lower pin count, which determines the number and price of the FPGAs needed to handle those DACs. This paper compares four filtering approaches for providing suitable data to low-speed DACs. These approaches filter high-speed input signals, removing the need for expensive high-speed DACs and therefore decreasing the total cost of HIL implementations. Results show that the appropriate filter should be selected based on the type of input waveform and the relative importance of the dynamics versus the area.

FPGA TECHNOLOGY IN PROCESS TOMOGRAPHY

Jurnal Teknologi ◽

10.11113/jt.v78.9431 ◽

2016 ◽

Vol 78 (7-4) ◽

Author(s):

Lean Thiam Siow ◽

Mohd Hafiz Fazalul Rahiman ◽

Ruzairi Abdul Rahim ◽

Mohd Shukry Abdul Majid ◽

Salman Sayyidi Hamzah ◽

...

Keyword(s):

High Speed ◽

Low Cost ◽

Gate Arrays ◽

Tomography System ◽

Process Tomography ◽

Ultrasonic Process ◽

Field Programmable ◽

Programmable Gate Arrays ◽

Future Work

This paper aims to review process tomography applications that employ field programmable gate arrays (FPGAs) and to survey current FPGA-related research, in order to explore the possibility of applying FPGA technology in an ultrasonic process tomography system. FPGAs allow users to implement complete systems on a programmable chip; the five main benefits of applying FPGA technology are performance, time to market, cost, reliability, and long-term maintenance. These advantages could help advance process tomography, especially ultrasonic process tomography and electrical process tomography. Future work will focus on ultrasonic process tomography for chemical process column investigation using FPGAs, with attention to low cost, high speed, and reconstructed image quality.

A highly reliable metal-to-metal antifuse for high-speed field programmable gate arrays

Proceedings of IEEE International Electron Devices Meeting ◽

10.1109/iedm.1993.347405 ◽

2002 ◽

Cited By ~ 10

Author(s):

M.T. Takagi ◽

I. Yoshii ◽

N. Ikeda ◽

H. Yasuda ◽

K. Hama

Keyword(s):

High Speed ◽

Field Programmable Gate Arrays ◽

Gate Arrays ◽

Field Programmable ◽

Programmable Gate Arrays

Evaluation of the Different Numerical Formats for HIL Models of Power Converters after the Adoption of VHDL-2008 by Xilinx

Electronics ◽

10.3390/electronics10161952 ◽

2021 ◽

Vol 10 (16) ◽

pp. 1952

Author(s):

Eva M. Cirugeda-Roldán ◽

María Sofía Martínez-García ◽

Alberto Sanchez ◽

Angel de Castro

Keyword(s):

High Speed ◽

Power Converters ◽

Low Cost ◽

Hardware In The Loop ◽

Main Design ◽

Gate Arrays ◽

The World ◽

Field Programmable ◽

Programmable Gate Arrays ◽

Area Optimization

Hardware in the loop (HIL) is a widely used technique in power electronics that allows testing and debugging in real time (RT) at low cost. In this context, field-programmable gate arrays (FPGAs) play an important role due to the high-speed requirements of RT simulations, in which area optimization is also crucial. Both characteristics, area and speed, are affected by the numerical formats (NFs) and their rounding modes. Regarding FPGAs, Xilinx is one of the largest manufacturers in the world, offering Vivado as its main design suite, but it was not until the release of Vivado 2020.2 that support for the IEEE NF libraries of VHDL-2008 was included. This work presents an exhaustive evaluation of the performance of Vivado 2020.2 in terms of area and speed using the native IEEE libraries of VHDL-2008 for NFs. Results show that, even though fixed-point NFs optimize area and speed, a user who prefers floating-point NFs can now synthesize them with this release, which was not possible in previous versions of Vivado. Although support for the native IEEE libraries of VHDL-2008 was included in Vivado 2020.2, it still has some issues regarding NF conversion during synthesis, and support for simulation is not yet included.
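To illustrate why the rounding mode of a fixed-point format matters, the sketch below quantizes a real value to a given number of fractional bits using either truncation or round-to-nearest. It is a generic software model under assumed names (`quantize`, `frac_bits`, `mode`) and does not reproduce the VHDL-2008 library behavior evaluated in the paper.

```python
import math

def quantize(value, frac_bits, mode="round"):
    """Quantize `value` to a fixed-point grid with `frac_bits` fractional bits.

    mode="truncate" drops the extra bits (rounds toward minus infinity);
    mode="round" rounds to the nearest representable value (ties go up).
    """
    scale = 1 << frac_bits
    scaled = value * scale
    if mode == "round":
        code = math.floor(scaled + 0.5)
    elif mode == "truncate":
        code = math.floor(scaled)
    else:
        raise ValueError(f"unknown rounding mode: {mode}")
    return code / scale

for mode in ("truncate", "round"):
    approx = quantize(0.7, 2, mode)
    print(mode, approx, abs(approx - 0.7))
```

Truncation needs no extra logic but has a worst-case error of one least significant bit, while rounding halves that bound at the cost of an additional adder, which is one way rounding modes affect both area and speed.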

High-performance FPGA implementation of the secure hash algorithm 3 for single and multi-message processing

International Journal of Electrical and Computer Engineering (IJECE) ◽

10.11591/ijece.v12i2.pp1324-1333 ◽

2022 ◽

Vol 12 (2) ◽

pp. 1324

Author(s):

Fatimazahraa Assad ◽

Mohamed Fettach ◽

Fadwa El Otmani ◽

Abderrahim Tragha

Keyword(s):

Integrated Circuits ◽

High Speed ◽

High Performance ◽

Message Processing ◽

Maximum Throughput ◽

Design Data ◽

Hash Algorithm ◽

Field Programmable ◽

Secure Hash Algorithm ◽

Hardware Description

Secure hash functions have become the default choice for information security, especially in applications that require data storage or manipulation. Consequently, implementations of these functions optimized for throughput or area are in high demand. In this work we propose a new design of the secure hash algorithm 3 (SHA-3) that aims to increase the performance of this function by using pipelining. Four pipelined variants are proposed, with two, three, four, and six pipeline stages. This approach allows us to design SHA-3 data paths with higher throughput and higher clock frequencies. The design reaches a maximum throughput of 102.98 Gbps on Virtex-5 and 115.124 Gbps on Virtex-6 in the six-stage case, for a 512-bit output length. However, resource utilization increases with the number of cores used in each case. The proposed designs are coded in very high-speed integrated circuit (VHSIC) hardware description language (VHDL), implemented on Xilinx Virtex-5 and Virtex-6 field-programmable gate array (FPGA) devices, and compared to existing FPGA implementations.
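Throughput figures for hash cores are commonly derived from the block size, the clock frequency, and the number of clock cycles spent per block. The sketch below shows that arithmetic under stated assumptions: the abstract does not give the formula or the clock frequencies used, so the function name and the numbers in the example (a 100 MHz clock, one round per cycle) are purely illustrative. The 576-bit rate of SHA3-512 and the 24 rounds of Keccak-f[1600] come from the SHA-3 standard.

```python
def throughput_gbps(block_bits, clock_mhz, cycles_per_block):
    """Throughput in Gbps for a core that absorbs `block_bits` of message
    every `cycles_per_block` clock cycles at `clock_mhz` MHz."""
    if cycles_per_block <= 0:
        raise ValueError("cycles_per_block must be positive")
    return block_bits * clock_mhz / cycles_per_block / 1000.0

# Illustrative only: SHA3-512 rate (576 bits), hypothetical 100 MHz clock,
# one round per cycle (24 cycles per block).
print(throughput_gbps(576, 100, 24))
```

Under this model, pipelining raises throughput either by allowing a higher clock frequency or by keeping several independent messages in flight, which is consistent with the single-message and multi-message distinction in the paper's title.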

Digital design and implementation of an encryption card for secret communication based on FPGA technology

10.12681/eadd/30142 ◽

2009 ◽

Author(s):

Παναγιώτης Μαργαρώνης

Keyword(s):

Integrated Circuits ◽

High Speed ◽

Linear Feedback ◽

Description Language ◽

Gate Arrays ◽

Field Programmable ◽

Programmable Gate Arrays ◽

Hardware Description ◽

Feedback Shift Register ◽

Very High

This dissertation presents the design and implementation process of a complete, standalone encryption card. The card, named LAM, introduces a digital integrated circuit based on the Peripheral Component Interconnect (PCI) bus. The encryption card was implemented using a Field Programmable Gate Array (FPGA) programmable integrated circuit. The objective of the dissertation is to provide in-depth knowledge of the process of designing and implementing a digital encryption circuit based on FPGA programmable integrated circuit technology, using the Very High Speed Integrated Circuits Hardware Description Language (VHDL). This digital circuit can be used as a personal computer card. The card was designed and implemented as a complete, transparent device capable of symmetric encryption/decryption, incorporating a system for generating and managing encryption keys as well as for synchronizing with other communicating devices. The dissertation involved study in the following research areas. In the first stage, FPGA circuits, the VHDL hardware description language, the floorplan and design space involved in implementing the circuit inside the chip, and the implementation and development tools were studied. In the second stage, the principles of data transmission over the Internet, the Ethernet interface card, and real-time communication over the TCP/IP protocol were studied. In the third stage, the transformation and transfer of keys from external memory to the internal memory of the encryption card with the help of a Linear Feedback Shift Register (LFSR), LFSR programming, and key selection (weak keys) were studied.
In the fourth stage, research issues relating to the generation and management of symmetric cryptography keys were studied. The transmission of digital data over the DVB/DAB protocols was then studied, followed by user authorization with smart cards and the smart card reading protocol. In addition, the architecture and communication principles of the PCI bus and the system timing were studied, and existing symmetric encryption algorithms that have been implemented in hardware were analyzed. A further area of study was the synchronization of encryption cards in remote systems, as well as the duration of secure communication. Finally, the basic principles of protection against external interference caused by electromagnetic radiation were studied, along with the requirements on external circuits for meeting the electrical requirements of the card.
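As background on the Linear Feedback Shift Register (LFSR) mentioned in this abstract, the following is a minimal sketch of a 16-bit Fibonacci LFSR. The tap positions (polynomial x^16 + x^14 + x^13 + x^11 + 1, a well-known maximal-length choice) and the function names are assumptions for illustration; the abstract does not state the register width or taps used in the thesis.

```python
def lfsr16_step(state):
    """Advance a 16-bit Fibonacci LFSR by one step (taps at bits 16, 14, 13, 11)."""
    bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
    return (state >> 1) | (bit << 15)

def lfsr_period(seed):
    """Count the steps needed for the register to return to `seed`."""
    state, steps = lfsr16_step(seed), 1
    while state != seed:
        state = lfsr16_step(state)
        steps += 1
    return steps

print(lfsr_period(0xACE1))  # every nonzero seed lies on one cycle of length 2**16 - 1
print(lfsr16_step(0))       # the all-zero state never changes
```

The all-zero state is a simple example of a seed that must be excluded, since the register then produces a constant output; this is one reason seed or key selection matters when an LFSR is involved.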

Combining Multiple Optimized FPGA-based Pulsar Search Modules Using OpenCL

Journal of Astronomical Instrumentation ◽

10.1142/s2251171719500089 ◽

2019 ◽

Vol 08 (03) ◽

pp. 1950008 ◽

Cited By ~ 1

Author(s):

Haomiao Wang ◽

Prabu Thiagaraj ◽

Oliver Sinnen

Keyword(s):

High Speed ◽

Design Space ◽

Hardware Accelerators ◽

Gate Arrays ◽

Fast Prototyping ◽

Multiple Input ◽

Field Programmable ◽

Programmable Gate Arrays ◽

Square Kilometer Array ◽

High Level

Field-Programmable Gate Arrays (FPGAs) are widely used as hardware accelerators in the central signal processing design of the Square Kilometer Array (SKA). The frequency domain acceleration search (FDAS) module is an important part of the SKA1-MID pulsar search engine. To develop for hardware that is yet to be finalized, to support cross-discipline interoperability, and to achieve fast prototyping, OpenCL is employed as a high-level FPGA synthesis approach to create the sub-modules of FDAS. The FT convolution, the harmonic summing, and some other minor sub-modules are elements of the FDAS module that have previously been well optimized separately. In this paper, we explore the design space of combining well-optimized designs, dealing with the ensuing need to trade off and compromise. Pipeline computing is employed to handle multiple input arrays at high speed. The hardware target is to employ multiple high-end FPGAs to process the combined FDAS module. The results show interesting consequences: the best individual solutions are not necessarily the best solutions for the speed of a pipeline in which FPGA resources and memory bandwidth need to be shared. By applying multiple-buffering techniques to the pipeline, the combined FDAS module can achieve up to a 2× speedup over implementations without pipeline computing. We perform an extensive experimental evaluation on multiple high-end FPGA cards hosted in a workstation and compare against a technology-comparable mid-range GPU.
