AREEBA: An Area Efficient Binary Huff-Curve Architecture

Elliptic curve cryptography is the most widely employed class of asymmetric cryptographic algorithms. However, it is exposed to simple power analysis attacks because point doubling and point addition are not unified. Unified cryptosystems such as binary Edwards, Hessian, and Huff curves provide resistance against power analysis attacks. Furthermore, Huff curves are more secure than Edwards and Hessian curves but require more computational resources. Therefore, this article presents a low-area hardware architecture for point multiplication on binary Huff curves over GF(2^163) and GF(2^233). To achieve this, a segmented least significant digit multiplier for polynomial multiplication is proposed. To provide a realistic and reasonable comparison with state-of-the-art solutions, the proposed architecture is modeled in Verilog and synthesized for different field programmable gate arrays. For Virtex-4, Virtex-5, Virtex-6, and Virtex-7 devices, the utilized hardware resources in terms of slices over GF(2^163) are 5302, 2412, 2982, and 3508, respectively. The corresponding values over GF(2^233) are 11,557, 10,065, 4370, and 4261, respectively. These low area values make this work suitable for area-constrained applications.
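The digit-serial idea behind a least significant digit multiplier can be illustrated in software. The following is a minimal sketch, not the architecture from the article: the reduction polynomial (the NIST polynomial x^163 + x^7 + x^6 + x^3 + 1), the 32-bit digit size, and all function names are assumptions made for illustration, and the segmentation scheme of the proposed multiplier is not reproduced.

```python
# Illustrative sketch only: field polynomial, digit size, and names are assumptions.
M = 163
# NIST reduction polynomial for the 163-bit binary field: x^163 + x^7 + x^6 + x^3 + 1
F = (1 << 163) | (1 << 7) | (1 << 6) | (1 << 3) | 1


def reduce_poly(p: int, m: int = M, f: int = F) -> int:
    """Reduce binary polynomial p (bit i = coefficient of x^i) modulo f of degree m."""
    while p.bit_length() > m:
        shift = p.bit_length() - 1 - m
        p ^= f << shift  # cancel the current leading term
    return p


def clmul(a: int, b: int) -> int:
    """Carry-less (XOR) product of two binary polynomials, without reduction."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def lsd_multiply(a: int, b: int, digit_bits: int = 32) -> int:
    """Multiply a and b in GF(2^M), consuming b one digit at a time, lowest digit first."""
    acc = 0
    mask = (1 << digit_bits) - 1
    while b:
        digit = b & mask                   # next digit of b, least significant first
        acc ^= clmul(a, digit)             # accumulate the partial product
        a = reduce_poly(a << digit_bits)   # a <- a * x^digit_bits mod f
        b >>= digit_bits
    return reduce_poly(acc)
```

A larger digit size means fewer iterations but a wider partial-product unit, which is the area versus latency trade-off that digit-serial hardware multipliers expose.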

An Overview of Power Analysis Attacks Against Field Programmable Gate Arrays

Proceedings of the IEEE ◽

10.1109/jproc.2005.862437 ◽

2006 ◽

Vol 94 (2) ◽

pp. 383-394 ◽

Cited By ~ 76

Author(s):

O.-X. Standaert ◽

E. Peeters ◽

G. Rouvroy ◽

J.-J. Quisquater

Keyword(s):

Power Analysis ◽

Field Programmable Gate Arrays ◽

Power Analysis Attacks ◽

Gate Arrays ◽

Field Programmable ◽

Programmable Gate Arrays

Side-Channel Power Resistance for Encryption Algorithms Using Implementation Diversity

Cryptography ◽

10.3390/cryptography4020013 ◽

2020 ◽

Vol 4 (2) ◽

pp. 13

Author(s):

Ivan Bow ◽

Nahome Bete ◽

Fareena Saqib ◽

Wenjie Che ◽

Chintan Patel ◽

...

Keyword(s):

Power Analysis ◽

Side Channel ◽

Side Channel Attacks ◽

Dynamic Partial Reconfiguration ◽

Channel Power ◽

Diversity Techniques ◽

Gate Arrays ◽

Field Programmable ◽

Programmable Gate Arrays ◽

Encryption Algorithms

This paper investigates countermeasures to side-channel attacks. A dynamic partial reconfiguration (DPR) method is proposed for field programmable gate arrays (FPGAs) to make techniques such as differential power analysis (DPA) and correlation power analysis (CPA) difficult and ineffective. We call the technique side-channel power resistance for encryption algorithms using DPR, or SPREAD. SPREAD is designed to reduce cryptographic key related signal correlations in power supply transients by changing components of the hardware implementation on-the-fly using DPR. Replicated primitives within the advanced encryption standard (AES) algorithm, in particular the substitution boxes (SBOXs), are synthesized to multiple and distinct gate-level implementations. The different implementations change the delay characteristics of the SBOXs, reducing correlations in the power traces, which, in turn, increases the difficulty of side-channel attacks. The effectiveness of the proposed countermeasures depends greatly on this principle; therefore, the focus of this paper is on the evaluation of implementation diversity techniques.

BPR-TCAM—Block and Partial Reconfiguration based TCAM on Xilinx FPGAs

Electronics ◽

10.3390/electronics9020353 ◽

2020 ◽

Vol 9 (2) ◽

pp. 353 ◽

Cited By ~ 1

Author(s):

Anees Ullah ◽

Ali Zahir ◽

Noaman A. Khan ◽

Waleed Ahmad ◽

Alexis Ramos ◽

...

Keyword(s):

Resource Utilization ◽

High Speed ◽

State Of The Art ◽

Field Programmable Gate Arrays ◽

Partial Reconfiguration ◽

Gate Arrays ◽

Content Addressable Memories ◽

Field Programmable ◽

Programmable Gate Arrays

Field Programmable Gate Array (FPGA) based Ternary Content Addressable Memories (TCAMs) are widely used in high-speed networking applications. However, TCAMs are not present on state-of-the-art FPGAs and need to be emulated on SRAM-based memories (i.e., LUTRAMs and Block RAMs), which requires a large amount of FPGA resources. In this paper, we present an efficient methodology to implement FPGA-based TCAMs with significant resource savings compared to existing schemes. The proposed methodology exploits the fracturable nature of Look Up Tables (LUTs) and the built-in slice carry-chains for simultaneous mapping of two rules and their matching logic to a single FPGA slice. Multiple slices can be stacked together to build deeper and wider TCAMs in a modular way. The combination of all these techniques results in significant savings in resource utilization compared to existing approaches.
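For readers unfamiliar with ternary matching, the behaviour that such a TCAM emulates can be sketched in software. This is a minimal functional model only: the (value, care mask) rule encoding, the first-match priority, and the names `Rule` and `tcam_lookup` are assumptions for illustration and say nothing about how the paper maps rules onto LUTs and carry chains.

```python
from typing import List, Optional, Tuple

# Each rule is (value, care_mask): a mask bit of 1 means "must match",
# a mask bit of 0 means "don't care". This encoding is an assumption.
Rule = Tuple[int, int]


def tcam_lookup(rules: List[Rule], key: int) -> Optional[int]:
    """Return the index of the first (highest-priority) rule matching key, or None."""
    for index, (value, care_mask) in enumerate(rules):
        # Bits that differ are only relevant where the mask says "care".
        if ((key ^ value) & care_mask) == 0:
            return index
    return None


example_rules: List[Rule] = [
    (0b1010_0000, 0b1111_0000),  # matches 1010xxxx
    (0b1000_0000, 0b1100_0000),  # matches 10xxxxxx
    (0b0000_0000, 0b0000_0000),  # matches every key (default rule)
]
```

In hardware all rules are compared in parallel and a priority encoder selects the winner; the sequential loop above only models the result of that comparison.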

BIST Architecture using Area Efficient Low Current LFSR for Embedded Memory Testing Applications

International Journal of Reconfigurable and Embedded Systems (IJRES) ◽

10.11591/ijres.v7.i1.pp1-11 ◽

2018 ◽

Vol 7 (1) ◽

pp. 1

Author(s):

M. Parvathi ◽

N. Vasantha ◽

K. Satya Prasad

Keyword(s):

Systems Design ◽

Digital Signal ◽

Low Area ◽

Gate Arrays ◽

Current Limit ◽

Maximum Current ◽

Field Programmable ◽

Programmable Gate Arrays ◽

Layout Area ◽

Signal Processors

One of the important blocks of a BIST controller is the LFSR, and the speed at which the BIST operates depends on the LFSR design. LFSRs can be implemented using field programmable gate arrays (FPGAs) or digital signal processors (DSPs). The BIST controller speed is then limited by the FPGA or DSP, which may influence other parameters such as overall area, maximum current limit, and power dissipation. This paper proposes a technique to achieve an efficient BIST controller by redesigning the LFSR using GDI-based D flip-flops, resulting in low area and low current. This paper presents three different techniques for implementing flip-flops for an efficient LFSR so that the layout area is minimized and the maximum current drawn is lower.
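As background, the behaviour of an LFSR can be modelled in a few lines of software. The sketch below is a generic 4-bit Fibonacci-style LFSR chosen for illustration; the register width, the feedback taps, and the function names are assumptions and are not taken from the paper, whose contribution is the circuit-level flip-flop design rather than the shift-register logic.

```python
def lfsr4_step(state: int) -> int:
    """Advance a 4-bit LFSR by one clock; each state bit models one D flip-flop."""
    feedback = (state ^ (state >> 1)) & 1   # XOR of the two lowest bits
    return (state >> 1) | (feedback << 3)   # shift right, insert feedback at the top


def lfsr4_sequence(seed: int = 0b0001) -> list:
    """Return the states visited from seed until the register returns to seed."""
    if not 0 < seed < 16:
        raise ValueError("seed must be a non-zero 4-bit value")
    states = []
    state = seed
    while True:
        state = lfsr4_step(state)
        states.append(state)
        if state == seed:
            return states
```

With these taps the register cycles through all 15 non-zero states before repeating, which is the property that makes an LFSR useful as a pseudo-random test pattern generator in a BIST controller.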

Enhanced Technology Mapping for FPGAs with Exploration of Cell Configurations

Journal of Circuits, Systems and Computers ◽

10.1142/s0218126615500395 ◽

2015 ◽

Vol 24 (03) ◽

pp. 1550039 ◽

Cited By ~ 1

Author(s):

Grace Zgheib ◽

Iyad Ouaiss

Keyword(s):

State Of The Art ◽

Variable Structure ◽

Logic Circuits ◽

Technology Mapping ◽

Decomposition Techniques ◽

Gate Arrays ◽

Field Programmable ◽

Boolean Matching ◽

Programmable Gate Arrays ◽

Logic Functions

In the state-of-the-art field-programmable gate arrays (FPGAs), logic circuits are synthesized and mapped on clusters of look-up tables. However, arithmetic operations benefit from an existing dedicated adder along with a carry chain used to ensure a fast carry propagation. This carry chain is a dedicated wire available in the architecture of the FPGA and is as such independent of the external programmable routing resources. In this paper, we propose a variable-structure Boolean matching technology mapper with embedded decomposition techniques to map nonarithmetic logic functions on carry chains. Previously synthesized and mapped logic functions are adapted so that their outputs are routed using the dedicated carry chains instead of the external programmable interconnects. The experimental results show a reduction in the used routing resources as well as the circuit area when using this Boolean matching-based mapper on the Altera Stratix-III FPGA.

A Fast Approach for Generating Efficient Parsers on FPGAs

Symmetry ◽

10.3390/sym11101265 ◽

2019 ◽

Vol 11 (10) ◽

pp. 1265 ◽

Cited By ~ 1

Author(s):

Zhuang Cao ◽

Huiguo Zhang ◽

Junnan Li ◽

Mei Wen ◽

Chunyuan Zhang

Keyword(s):

High Performance ◽

State Of The Art ◽

Field Programmable Gate Arrays ◽

Hardware Architecture ◽

Clock Rate ◽

Gate Arrays ◽

Fast Approach ◽

Field Programmable ◽

Programmable Gate Arrays ◽

Vhdl Code

The development of modern networking requires that high-performance network processors be designed quickly and efficiently to support new protocols. As a very important part of the processor, the parser parses the headers of the packets—this is the precondition for further processing and finally forwarding these packets. This paper presents a framework designed to transform P4 programs to VHDL and to generate parsers on Field Programmable Gate Arrays (FPGAs). The framework includes a pipeline-based hardware architecture and a back-end compiler. The hardware architecture comprises many components with varying functionality, each of which has its own optimized VHDL template. By using the output of a standard frontend P4 compiler, our proposed compiler extracts the parameters and relationships from within the used components, which can then be mapped to corresponding templates by configuring, optimizing, and instantiating them. Finally, these templates are connected to output VHDL code. When a prototype of this framework is implemented and evaluated, the results demonstrate that the throughputs of the generated parsers achieve nearly 320 Gbps at a clock rate of around 300 MHz. Compared with state-of-the-art solutions, our proposed parsers achieve an average of twice the throughput when similar amounts of resources are being used.

RTN: Reparameterized Ternary Network

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i04.5912 ◽

2020 ◽

Vol 34 (04) ◽

pp. 4780-4787

Author(s):

Yuhang Li ◽

Xin Dong ◽

Sai Qian Zhang ◽

Haoli Bai ◽

Yuanpeng Chen ◽

...

Keyword(s):

Deep Neural Networks ◽

State Of The Art ◽

Hardware Acceleration ◽

Field Programmable Gate Arrays ◽

Accuracy Improvement ◽

Gate Arrays ◽

Resource Limited ◽

Field Programmable ◽

Programmable Gate Arrays ◽

Speed Up

To deploy deep neural networks on resource-limited devices, quantization has been widely explored. In this work, we study extremely low-bit networks, which offer tremendous speed-up and memory savings through quantized activations and weights. We first raise three overlooked issues in extremely low-bit networks: the squashing range of quantized values, gradient vanishing during backpropagation, and the unexploited hardware acceleration of ternary networks. By reparameterizing the quantized activation and weight vectors with a full-precision scale and offset for a fixed ternary vector, we decouple the range and magnitude from the direction to alleviate the above problems. A learnable scale and offset can automatically adjust the range of quantized values and the sparsity without gradient vanishing. A novel encoding and computation pattern is designed to support efficient computing for our reparameterized ternary network (RTN). Experiments on ResNet-18 for ImageNet demonstrate that the proposed RTN finds a much better trade-off between bitwidth and accuracy and achieves up to 26.76% relative accuracy improvement compared with state-of-the-art methods. Moreover, we validate the proposed computation pattern on Field Programmable Gate Arrays (FPGAs), where it brings 46.46× and 89.17× savings in power and area compared with full-precision convolution.
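The scale-and-offset idea can be written down as a toy example. This is only a sketch of the general form "scale × ternary vector + offset" suggested by the abstract: the fixed threshold rule, the parameter values, and the function names are assumptions for illustration, and the paper's actual quantizer and training procedure are not reproduced here.

```python
from typing import List


def ternarize(values: List[float], threshold: float) -> List[int]:
    """Map each value to -1, 0, or +1 using a symmetric threshold (assumed rule)."""
    return [1 if v > threshold else -1 if v < -threshold else 0 for v in values]


def reparameterize(ternary: List[int], scale: float, offset: float) -> List[float]:
    """Reconstruct approximate full-precision values as scale * t + offset."""
    return [scale * t + offset for t in ternary]


weights = [0.9, -0.1, -0.7, 0.2]
codes = ternarize(weights, threshold=0.5)                    # ternary direction
approx = reparameterize(codes, scale=0.5, offset=0.25)       # range and shift
```

Here `codes` carries only the ternary pattern, while `scale` and `offset` carry the range and shift; in a trained network these two values would be learned rather than fixed by hand.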

New First-Order Secure AES Performance Records

IACR Transactions on Cryptographic Hardware and Embedded Systems ◽

10.46586/tches.v2021.i2.304-327 ◽

2021 ◽

pp. 304-327

Author(s):

Aein Rezaei Shahmirzadi ◽

Dušan Božilov ◽

Amir Moradi

Keyword(s):

Integrated Circuit ◽

State Of The Art ◽

Large Set ◽

First Order ◽

Low Area ◽

Wide Range ◽

Field Programmable ◽

Programmable Gate Arrays ◽

Application Specific Integrated Circuit ◽

The Cost

Being based on a sound theoretical basis, masking schemes are commonly applied to protect cryptographic implementations against Side-Channel Analysis (SCA) attacks. Constructing SCA-protected AES, as the most widely deployed block cipher, has been naturally the focus of several research projects, with a direct application in industry. The majority of SCA-secure AES implementations introduced to the community opted for low area and latency overheads considering Application-Specific Integrated Circuit (ASIC) platforms. Albeit a few, those which particularly targeted Field Programmable Gate Arrays (FPGAs) as the implementation platform yield either a low throughput or a not-highly secure design. In this work, we fill this gap by introducing first-order glitch-extended probing secure masked AES implementations highly optimized for FPGAs, which support both encryption and decryption. Compared to the state of the art, our designs efficiently map the critical non-linear parts of the masked S-box into the built-in Block RAMs (BRAMs). The most performant variant of our constructions accomplishes five first-order secure AES encryptions/decryptions simultaneously in 50 clock cycles. Compared to the equivalent state-of-the-art designs, this leads to at least 70% reduction in utilization of FPGA resources (slices) at the cost of occupying BRAMs. Last but not least, we provide a wide range of such secure and efficient implementations supporting a large set of applications, ranging from low-area to high-throughput.

Hardware implementation of an α-level based binary search and shifting fuzzifier (α-BSSF)

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-190291 ◽

2020 ◽

Vol 39 (5) ◽

pp. 6671-6685

Author(s):

César Barrón-Romero ◽

Antonio Hernández-Zavala

Keyword(s):

Low Cost ◽

Mechatronic Systems ◽

Gate Arrays ◽

High Processing Speed ◽

Short Development Time ◽

Field Programmable ◽

Programmable Gate Arrays ◽

Design Characteristics ◽

α Level ◽

Computational Resources

Fuzzy processors are used for control actions in nonlinear mechatronic systems where high processing speed is required. Field Programmable Gate Arrays (FPGAs) are a good option for implementing low-cost fuzzy hardware in a short development time. A very important block in fuzzy hardware is the fuzzifier, since it directly affects the accuracy of the result and the processing time for obtaining a fuzzy number. Many design methodologies have been proposed to enhance the performance of this block. This paper presents a parallel fuzzifier circuit called α-BSSF. Its main design characteristics are the use of α-levels for membership representation, the use of integer numbers, and the avoidance of time-consuming operations. As a result, we obtained a fuzzifier that reduces response time and computational resources compared with existing sequential fuzzification methods. This proposal targets not only T1FS but also T2FS, since the membership calculation through the fuzzifier is applied in the same way but twice.

FPGAs in The Cloud

10.22541/au.163647170.02504770/v1 ◽

2021 ◽

Author(s):

Miriam Leeser ◽

Suranga Handagala ◽

Michael Zink

Keyword(s):

Cloud Computing ◽

High Performance ◽

State Of The Art ◽

Field Programmable Gate Arrays ◽

Processing Elements ◽

Gate Arrays ◽

Field Programmable ◽

Programmable Gate Arrays ◽

Performance Computing ◽

Computing Models

As cloud computing grows, the types of computational hardware available in the cloud are diversifying. Field Programmable Gate Arrays (FPGAs) are a relatively new addition to high-performance computing in the cloud, with the ability to accelerate a range of different applications, and the flexibility to offer different cloud computing models. A new and growing configuration is to have the FPGAs directly connected to the network and thus reduce the latency in delivering data to processing elements. We survey the state-of-the-art in FPGAs in the cloud and present the Open Cloud Testbed (OCT), a testbed for research and experimentation into new cloud platforms, which includes network-attached FPGAs in the cloud.
