High Speed Implementation of a SHA-3 Core on Virtex-5 and Virtex-6 FPGAs

2016 ◽  
Vol 25 (07) ◽  
pp. 1650069 ◽  
Author(s):  
Muzaffar Rao ◽  
Thomas Newe ◽  
Ian Grout ◽  
Avijit Mathur

This work presents a novel technique for a high-speed implementation of the newly selected cryptographic hash function, Secure Hash Algorithm-3 (SHA-3) on Xilinx’s Virtex-5 and Virtex-6 Field Programmable Gate Arrays (FPGAs). The proposed technique consists of a two-phase implementation approach. In the first phase, all steps of the SHA-3 core are logically combined, which helps to eliminate the intermediate states of core function, these states utilize more area and also slow the execution. The second phase deals with the hardware implementation of the first phase equations using Xilinx Look-Up-Table (LUT) primitives. This two phase implementation technique results in a throughput of 19.241[Formula: see text]Gbps on a Virtex-6 FPGA; this is the highest reported throughput to date for an FPGA implementation of SHA-3. This high throughput makes this technique ideally suited for the provision of Bump In The Wire (BITW) security for Internet of Things (IoT) applications.

2010 ◽  
Vol 2010 ◽  
pp. 1-17 ◽  
Author(s):  
Shahnam Mirzaei ◽  
Ryan Kastner ◽  
Anup Hosangadi

We present a method for implementing high speed finite impulse response (FIR) filters on field programmable gate arrays (FPGAs). Our algorithm is a multiplierless technique where fixed coefficient multipliers are replaced with a series of add and shift operations. The first phase of our algorithm uses registered adders and hardwired shifts. Here, a modified common subexpression elimination (CSE) algorithm reduces the number of adders while maintaining performance. The second phase optimizes routing delay using prelayout wire length estimation techniques to improve the final placed and routed design. The optimization target platforms are Xilinx Virtex FPGA devices where we compare the implementation results with those produced by Xilinx Coregen, which is based on distributed arithmetic (DA). We observed up to 50% reduction in the number of slices and up to 75% reduction in the number of look up tables (LUTs) for fully parallel implementations compared to DA method. Also, there is 50% reduction in the total dynamic power consumption of the filters. Our designs perform up to 27% faster than the multiply accumulate (MAC) filters implemented by Xilinx Coregen tool using DSP blocks. For placement, there is a saving up to 20% in number of routing channels. This results in lower congestion and up to 8% reduction in average wirelength.


Electronics ◽  
2020 ◽  
Vol 9 (2) ◽  
pp. 353 ◽  
Author(s):  
Anees Ullah ◽  
Ali Zahir ◽  
Noaman A. Khan ◽  
Waleed Ahmad ◽  
Alexis Ramos ◽  
...  

Field Programmable Gate Arrays (FPGAs) based Ternary Content Addressable Memories (TCAMs) are widely used in high-speed networking applications.However, TCAMs are not present on state-of-the-art FPGAs and need to be emulated on SRAM-based memories (i.e., LUTRAMs and Block RAMs) which requires a large amount of FPGA resources. In this paper, we present an efficient methodology to implement FPGA-based TCAMs with significant resource savings compared to existing schemes. The proposed methodology exploits the fracturable nature of Look Up Tables (LUTs) and the built-in slice carry-chains for simultaneous mapping of two rules and its matching logic to a single FPGA slice. Multiple slices can be stacked together to build deeper and wider TCAMs in a modular way. The combination of all these techniques results in significant savings in resource utilization compared to existing approaches.


Electronics ◽  
2019 ◽  
Vol 8 (10) ◽  
pp. 1116 ◽  
Author(s):  
Yushkova ◽  
Sanchez ◽  
de Castro ◽  
Martínez-García

The use of Hardware-in-the-Loop (HIL) systems implemented in Field Programmable Gate Arrays (FPGAs) is constantly increasing because of its advantages compared to traditional simulation techniques. This increase in usage has caused new challenges related to the improvement of their performance and features like the number of output channels, while the price of HIL systems is diminishing. At present, the use of low-speed Digital-to-Analog Converters (DACs) is starting to be a commercial possibility because of two reasons. One is their lower price and the other is their lower pin count, which determines the number and price of the FPGAs that are necessary to handle those DACs. This paper compares four filtering approaches for providing suitable data to low-speed DACs, which help to filter high-speed input signals, discarding the need of using expensive high-speed DACS, and therefore decreasing the total cost of HIL implementations. Results show that the selection of the appropriate filter should be based on the type of the input waveform and the relative importance of the dynamics versus the area.


2016 ◽  
Vol 78 (7-4) ◽  
Author(s):  
Lean Thiam Siow ◽  
Mohd Hafiz Fazalul Rahiman ◽  
Ruzairi Abdul Rahim ◽  
Mohd Shukry Abdul Majid ◽  
Salman Sayyidi Hamzah ◽  
...  

The aims of this paper are to provide a review of the process tomography applications employing field programmable gate arrays (FPGA) and to understand current FPGA related researches, in order to seek for the possibility to applied FPGA technology in an ultrasonic process tomography system. FPGA allows users to implement complete systems on a programmable chip, meanwhile, five main benefits of applying the FPGA technology are performance, time to market, cost, reliability, and long-term maintenance. These advantages definitely could help in the revolution of process tomography, especially for ultrasonic process tomography and electrical process tomography. Future work is focused on the ultrasonic process tomography for chemical process column investigation using FPGA for the aspects of low cost, high speed and reconstructed image quality.


Electronics ◽  
2021 ◽  
Vol 10 (16) ◽  
pp. 1952
Author(s):  
Eva M. Cirugeda-Roldán ◽  
María Sofía Martínez-García ◽  
Alberto Sanchez ◽  
Angel de Castro

Hardware in the loop is a widely used technique in power electronics, allowing to test and debug in real time (RT) at a low cost. In this context, field-programmable gate arrays (FPGAs) play an important role due to the high-speed requirements of RT simulations, in which area optimization is also crucial. Both characteristics, area and speed, are affected by the numerical formats (NFs) and their rounding modes. Regarding FPGAs, Xilinx is one of the largest manufacturers in the world, offering Vivado as its main design suite, but it was not until the release of Vivado 2020.2 that support for the IEEE NF libraries of VHDL-2008 was included. This work presents an exhaustive evaluation of the performance of Vivado 2020.2 in terms of area and speed using the native IEEE libraries of VHDL-2008 regarding NF. Results show that even though fixed-point NFs optimize area and speed, if a user prefers the use of floating-point NFs, with this new release, it can be synthesized—which could not be done in previous versions of Vivado. Although support for the native IEEE libraries of VHDL-2008 was included in Vivado 2020.2, it still lacks some issues regarding NF conversion during synthesis while support for simulation is not yet included.


Author(s):  
Fatimazahraa Assad ◽  
Mohamed Fettach ◽  
Fadwa El Otmani ◽  
Abderrahim Tragha

<span>The secure hash function has become the default choice for information security, especially in applications that require data storing or manipulation. Consequently, optimized implementations of these functions in terms of Throughput or Area are in high demand. In this work we propose a new conception of the secure hash algorithm 3 (SHA-3), which aim to increase the performance of this function by using pipelining, four types of pipelining are proposed two, three, four, and six pipelining stages. This approach allows us to design data paths of SHA-3 with higher Throughput and higher clock frequencies. The design reaches a maximum Throughput of 102.98 Gbps on Virtex 5 and 115.124 Gbps on Virtex 6 in the case of the 6 stages, for 512 bits output length. Although the utilization of the resource increase with the increase of the number of the cores used in each one of the cases. The proposed designs are coded in very high-speed integrated circuits program (VHSIC) hardware description language (VHDL) and implemented in Xilinx Virtex-5 and Virtex-6 A field-programmable gate array (FPGA) devices and compared to existing FPGA implementations.</span>


2009 ◽  
Author(s):  
Παναγιώτης Μαργαρώνης

Η παρούσα διατριβή παρουσιάζει τη διαδικασία σχεδίασης και υλοποίησης μιας ολοκληρωμένης και αυτόνομης κάρτας κρυπτογράφησης. Η συγκεκριμένη κάρτα έχει ονομαστεί LAM και εισάγει ένα ψηφιακό ολοκληρωμένο κύκλωμα το οποίο βασίζεται στο Peripheral Component Interconnection (PCI) δίαυλο. Η υλοποίηση της παραπάνω κάρτας κρυπτογράφησης σχεδιάστηκε με τη χρήση προγραμματιζόμενου ολοκληρωμένου κυκλώματος Field Programmable Gate Arrays (FPGA). Ο αντικειμενικός σκοπός της διατριβής είναι να προσφέρει σε βάθος γνώση αναφορικά με τη διαδικασία σχεδίασης και υλοποίησης ενός ψηφιακού κυκλώματος κρυπτογράφησης που βασίζεται στην τεχνολογία των ολοκληρωμένων προγραμματιζόμενων κυκλωμάτων FPGA με χρήση της γλώσσας περιγραφής υλικού Very High Speed Integrated Circuits Hardware Description Language (VHDL). Το συγκεκριμένο ψηφιακό κύκλωμα μπορεί να αξιοποιηθεί σαν κάρτα προσωπικού υπολογιστή. Η προαναφερόμενη κάρτα σχεδιάστηκε και υλοποιήθηκε σαν μια ολοκληρωμένη διαφανής συσκευή με δυνατότητα συμμετρικής κρυπτογράφησης/αποκρυπτογράφησης, ενσωματώνοντας ένα σύστημα δημιουργίας και διαχείρισης κλειδιών κρυπτογράφησης καθώς και συγχρονισμού με άλλες επικοινωνούντες συσκευές. Για την εκπόνηση της διατριβής πραγματοποιήθηκε μελέτη στα παρακάτω ερευνητικά πεδία. Στο πρώτο στάδιο μελετήθηκαν τα κυκλώματα FPGA, η γλώσσα περιγραφής υλικού VHDL, η κατανομή και ο χώρος σχεδίασης που περιλαμβάνει η υλοποίηση του κυκλώματος εσωτερικά στο Chip και τα εργαλεία υλοποίησης και ανάπτυξης. Στο δεύτερο στάδιο έγινε μελέτη των αρχών μετάδοσης δεδομένων μέσω του Internet, της κάρτας διασύνδεσης Ethernet και της επικοινωνίας πραγματικού χρόνου μέσω TCP/IP πρωτοκόλλου. Στο τρίτο στάδιο πραγματοποιήθηκε μελέτη στο μετασχηματισμό και μεταφορά κλειδιών από εξωτερική μνήμη στην εσωτερική μνήμη της κάρτας κρυπτογράφησης με τη βοήθεια Linear Feedback Shift Register (LFSR), στον προγραμματισμό LFSR και στην επιλογή κλειδιών (αδύναμα κλειδιά). Στο τέταρτο στάδιο μελετήθηκαν ερευνητικά θέματα που άπτονται της δημιουργίας και διαχείρισης κλειδιών συμμετρικής κρυπτογραφίας. Έπειτα έγινε μελέτη στη μετάδοση ψηφιακών δεδομένων μέσω πρωτοκόλλων DVB/DAB. Στη συνέχεια μελετήθηκε η εξουσιοδότηση χρήστη με Έξυπνες Κάρτες (Smart Cards) και το πρωτόκολλο ανάγνωσης των έξυπνων καρτών. Επιπλέον μελετήθηκαν η αρχιτεκτονική, οι αρχές επικοινωνίας του PCI διαύλου και ο χρονισμός του συστήματος, ενώ έγινε και ανάλυση των υπαρχόντων συμμετρικών αλγορίθμων κρυπτογράφησης που έχουν υλοποιηθεί σε επίπεδο υλικού. Ένα ακόμη πεδίο μελέτης υπήρξε ο συγχρονισμός των καρτών κρυπτογράφησης σε απομακρυσμένα συστήματα καθώς και η διάρκεια της ασφαλούς επικοινωνίας. Τέλος μελετήθηκαν οι βασικές αρχές για την προστασία από εξωτερικές παρεμβολές λόγω ηλεκτρομαγνητικής ακτινοβολίας καθώς και οι απαιτήσεις από εξωτερικά κυκλώματα για την ικανοποίηση των ηλεκτρικών απαιτήσεων της κάρτας.


2019 ◽  
Vol 08 (03) ◽  
pp. 1950008 ◽  
Author(s):  
Haomiao Wang ◽  
Prabu Thiagaraj ◽  
Oliver Sinnen

Field-Programmable Gate Arrays (FPGAs) are widely used in the central signal processing design of the Square Kilometer Array (SKA) as hardware accelerators. The frequency domain acceleration search (FDAS) module is an important part of the SKA1-MID pulsar search engine. To develop for a yet to be finalized hardware, for cross-discipline interoperability and to achieve fast prototyping, OpenCL as a high-level FPGA synthesis approaches employed to create the sub-modules of FDAS. The FT convolution and the harmonic-summing plus some other minor sub-modules are elements in the FDAS module that have been well-optimized separately before. In this paper, we explore the design space of combining well-optimized designs, dealing with the ensuing need to trade-off and compromise. Pipeline computing is employed to handle multiple input arrays at high speed. The hardware target is to employ multiple high-end FPGAs to process the combined FDAS module. The results show interesting consequences, where the best individual solutions are not necessarily the best solutions for the speed of a pipeline where FPGA resources and memory bandwidth need to be shared. By proposing multiple buffering techniques to the pipeline, the combined FDAS module can achieve up to 2[Formula: see text] speedup over implementations without pipeline computing. We perform an extensive experimental evaluation on multiple high-end FPGA cards hosted in a workstation and compare to a technology comparable mid-range GPU.


Sign in / Sign up

Export Citation Format

Share Document