scholarly journals High Performance and Low Latency ECC Processor for Cryptography

Author(s):  
Mohan Rao Thokala

Elliptic curve cryptography processor implemented for point multiplication on field programmable gate array. Segmented pipelined full-precision multiplier is used to reduce the latency and also data dependency can be avoided by modifying Lopez-Dahab Montgomery PM Algorithm, results in drastic reduction in the number of clock cycles required. The proposed ECC processor is implemented on Xilinx FPGA families i.e. virtex-4, vitrtex-5, virtex-7.single and three multiplier based designs show the fastest performance compared with reported work individually. Our three multiplier based ECC processor implementation is taking the lowest number of clock cycles on FPGA based design processor.

Electronics ◽  
2021 ◽  
Vol 10 (21) ◽  
pp. 2698
Author(s):  
Muhammad Rashid ◽  
Mohammad Mazyad Hazzazi  ◽  
Sikandar Zulqarnain Khan ◽  
Adel R. Alharbi  ◽  
Asher Sajid  ◽  
...  

This paper presents a Point Multiplication (PM) architecture of Elliptic-Curve Cryptography (ECC) over GF(2163) with a focus on the optimization of hardware resources and latency at the same time. The hardware resources are reduced with the use of a bit-serial (traditional schoolbook) multiplication method. Similarly, the latency is optimized with the reduction in a critical path using pipeline registers. To cope with the pipelining, we propose to reschedule point addition and double instructions, required for the computation of a PM operation in ECC. Subsequently, the proposed architecture over GF(2163) is modeled in Verilog Hardware Description Language (HDL) using Vivado Design Suite. To provide a fair performance evaluation, we synthesize our design on various FPGA (field-programmable gate array) devices. These FPGA devices are Virtex-4, Virtex-5, Virtex-6, Virtex-7, Spartan-7, Artix-7, and Kintex-7. The lowest area (433 FPGA slices) is achieved on Spartan-7. The highest speed is realized on Virtex-7, where our design achieves 391 MHz clock frequency and requires 416 μs for one PM computation (latency). For power, the lowest values are achieved on the Artix-7 (56 μW) and Kintex-7 (61 μW) devices. A ratio of throughput over area value of 4.89 is reached for Virtex-7. Our design outperforms most recent state-of-the-art solutions (in terms of area) with an overhead of latency.


2021 ◽  
Vol 2021 ◽  
pp. 1-8
Author(s):  
Yong Xiao ◽  
Weibin Lin ◽  
Yun Zhao ◽  
Chao Cui ◽  
Ziwen Cai

Teleoperated robotic systems are those in which human operators control remote robots through a communication network. The deployment and integration of teleoperated robot’s systems in the medical operation have been hampered by many issues, such as safety concerns. Elliptic curve cryptography (ECC), an asymmetric cryptographic algorithm, is widely applied to practical applications because its far significantly reduced key length has the same level of security as RSA. The efficiency of ECC on GF (p) is dictated by two critical factors, namely, modular multiplication (MM) and point multiplication (PM) scheduling. In this paper, the high-performance ECC architecture of SM2 is presented. MM is composed of multiplication and modular reduction (MR) in the prime field. A two-stage modular reduction (TSMR) algorithm in the SCA-256 prime field is introduced to achieve low latency, which avoids more iterative subtraction operations than traditional algorithms. To cut down the run time, a schedule is put forward when exploiting the parallelism of multiplication and MR inside PM. Synthesized with a 0.13 um CMOS standard cell library, the proposed processor consumes 341.98k gate areas, and each PM takes 0.092 ms.


2019 ◽  
Vol 28 (03) ◽  
pp. 1950050
Author(s):  
Yuhao Dou ◽  
Yisu Zhou ◽  
Bo Xin

In securities trading, low latency helps investors take the leading position in the market. Conventionally, market data is decoded with software running on general computers. However, the serial structure of software and complex operating system scheduling cause high latency. This paper designs an accelerator for decoding market data based on field-programmable gate array (FPGA). We propose a pipeline in the accelerator, where every part works independently and parallelly. Furthermore, we present a mechanism for encoding templates, which avoids reconstructing the accelerator and decreases the cost when the template is renewed. We evaluate this accelerator with real Financial Information eXchange (FIX) messages and FIX Adapting for Streaming (FAST) templates, attaining an average latency of 447[Formula: see text]ns.


2019 ◽  
Vol 45 (3) ◽  
pp. 1-35 ◽  
Author(s):  
Armando Faz-Hernández ◽  
Julio López ◽  
Ricardo Dahab

Robotica ◽  
2011 ◽  
Vol 30 (6) ◽  
pp. 891-912 ◽  
Author(s):  
Damien C. Browne ◽  
Lindsay Kleeman

SUMMARYMatched filtering optimally estimates the arrival time for a sonar sensor by correlating received signals with templates. This paper presents a sonar ring with continuous matched filtering on 48 receiver channels sampled at 500kHz. The design dynamically switches the matched filter templates to account for pulse shape variations with range. To achieve real-time, low-latency and optimal performance, processing is implemented on an field-programmable gate array (FPGA) transmitting sonar pulses (2 periods of a 45kHz sine wave) at repetition rate of 30-Hz to 5.7-m range. The paper describes the removal of secondary peaks of the correlation output of matched filtering and template selection. Results include sonar maps, accuracy measurements and localization of weak targets.


Sign in / Sign up

Export Citation Format

Share Document