Design and Implementation of High-Performance ECC Processor with Unified Point Addition on Twisted Edwards Curve

Sensors ◽  
2020 ◽  
Vol 20 (18) ◽  
pp. 5148
Author(s):  
Md. Mainul Islam ◽  
Md. Selim Hossain ◽  
Moh. Khalid Hasan ◽  
Md. Shahjalal ◽  
Yeong Min Jang

With the swift evolution of wireless technologies, the demand for Internet of Things (IoT) security is rising immensely. Elliptic curve cryptography (ECC) provides an attractive solution to fulfill this demand. In recent years, Edwards curves have gained widespread acceptance in digital signatures and ECC because they offer faster group operations and higher resistance against side-channel attacks (SCAs) than the Weierstrass form of elliptic curves. In this paper, we propose a high-speed, low-area, simple power analysis (SPA)-resistant field-programmable gate array (FPGA) implementation of an ECC processor with unified point addition on a twisted Edwards curve, namely Edwards25519. Efficient hardware architectures for modular multiplication, modular inversion, unified point addition, and elliptic curve point multiplication (ECPM) are proposed. To reduce the computational complexity of ECPM, the ECPM scheme is designed in projective coordinates instead of affine coordinates. The proposed ECC processor performs 256-bit point multiplication over a prime field in 198,715 clock cycles and takes 1.9 ms, with a throughput of 134.5 kbps, occupying only 6543 slices on the Xilinx Virtex-7 FPGA platform. It supports high-speed public-key generation using fewer hardware resources without compromising the security level, which is a challenging requirement for IoT security.
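
The unified addition and projective-coordinate ECPM described above can be illustrated in software. The following Python sketch is an assumption-based illustration, not the authors' hardware design: it uses the projective twisted Edwards addition formulas of Bernstein et al. (2008), which are complete on Edwards25519 (a = -1, d = -121665/121666) and therefore also perform doubling, together with a Montgomery ladder that executes one addition and one doubling per key bit. The ladder stands in for whatever ECPM schedule the paper uses, and the names `unified_add`, `ladder`, and `to_affine` are hypothetical.

```python
# Minimal sketch (assumption): unified projective addition + Montgomery-ladder ECPM
# on Edwards25519. Not the paper's hardware architecture.

P = 2**255 - 19                                       # field prime
A = P - 1                                             # a = -1 (mod p)
D = (-121665 * pow(121666, P - 2, P)) % P             # d = -121665/121666 (mod p)
L = 2**252 + 27742317777372353535851937790883648493   # prime order of the base point
IDENTITY = (0, 1, 1)                                  # neutral element as (X : Y : Z)


def inv(z):
    """Field inversion via Fermat's little theorem: z^(p-2) mod p."""
    return pow(z, P - 2, P)


def recover_x(y, sign):
    """Solve -x^2 + y^2 = 1 + d*x^2*y^2 for x (square-root method of RFC 8032)."""
    x2 = (y * y - 1) * inv(D * y * y + 1) % P
    x = pow(x2, (P + 3) // 8, P)
    if (x * x - x2) % P != 0:
        x = x * pow(2, (P - 1) // 4, P) % P           # multiply by sqrt(-1)
    if x % 2 != sign:
        x = P - x
    return x


BASE_Y = 4 * inv(5) % P                               # y = 4/5 (mod p)
BASE = (recover_x(BASE_Y, 0), BASE_Y, 1)              # standard base point with Z = 1


def unified_add(p1, p2):
    """Projective twisted Edwards addition; the same formula also doubles (p1 == p2)."""
    X1, Y1, Z1 = p1
    X2, Y2, Z2 = p2
    zz = Z1 * Z2 % P
    zz2 = zz * zz % P
    xx = X1 * X2 % P
    yy = Y1 * Y2 % P
    e = D * xx % P * yy % P
    f = (zz2 - e) % P
    g = (zz2 + e) % P
    X3 = zz * f % P * ((X1 + Y1) * (X2 + Y2) - xx - yy) % P
    Y3 = zz * g % P * (yy - A * xx) % P
    Z3 = f * g % P
    return (X3, Y3, Z3)


def to_affine(pt):
    """Map (X : Y : Z) to (X/Z, Y/Z); only one inversion, at the very end."""
    X, Y, Z = pt
    z_inv = inv(Z)
    return (X * z_inv % P, Y * z_inv % P)


def on_curve(xy):
    x, y = xy
    return (-x * x + y * y - 1 - D * x * x * y * y) % P == 0


def ladder(k, pt, bits=256):
    """Montgomery ladder: one addition and one doubling per key bit."""
    r0, r1 = IDENTITY, pt                             # invariant: r1 - r0 == pt
    for i in reversed(range(bits)):
        if (k >> i) & 1:
            r0, r1 = unified_add(r0, r1), unified_add(r1, r1)
        else:
            r0, r1 = unified_add(r0, r0), unified_add(r0, r1)
    return r0


Q = to_affine(ladder(2**200 + 12345, BASE))
print(on_curve(Q))                   # True
print(to_affine(ladder(L, BASE)))    # (0, 1): [L]B is the neutral element
```

Because every key bit triggers the same pair of operations, the operation sequence does not depend on the key, which is the property SPA resistance relies on. Python's branching and big-integer arithmetic are not constant time, so this sketch only shows the operation pattern.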

Author(s):  
Mrs. Lakshmidevi TR ◽  
Ms. Kavana Shree C ◽  
Ms. Arshitha S ◽  
Ms. Kavya L

Creating a high-speed elliptic curve cryptographic (ECC) processor capable of performing fast point multiplication with low hardware utilisation is a critical requirement in cryptography and network security. This paper describes a high-speed field-programmable gate array (FPGA) implementation of such an ECC processor. A high-security digital signature technique is implemented using Edwards25519, a recently approved twisted Edwards curve. For point addition and point doubling on the twisted Edwards curve, advanced hardware architectures are developed that require only 516 and 1029 clock cycles, respectively. The ECC processor presented in this paper performs a single point multiplication in 1.48 ms. Key sizes and their ratios are compared for the ECC and RSA processors to show how key size affects the processing of each. The delay and the number of slices used by the ECC processor are also reported. The developed solution saves time by providing rapid scalar multiplication with low hardware consumption without compromising security.
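
The ECC/RSA key-size comparison mentioned above can be made concrete with a short calculation. This Python sketch assumes the comparable-strength pairs from NIST SP 800-57 Part 1 (taking the smallest ECC size in each range); the paper's own table may use different sizes, and `key_size_ratios` is a hypothetical helper.

```python
# Comparable security strengths from NIST SP 800-57 Part 1 (smallest ECC size in each
# range). Illustrative assumption: the paper's own comparison may list other sizes.
COMPARABLE_SIZES = [
    # (security strength in bits, ECC key size in bits, RSA modulus size in bits)
    (80, 160, 1024),
    (112, 224, 2048),
    (128, 256, 3072),
    (192, 384, 7680),
    (256, 512, 15360),
]


def key_size_ratios(table=COMPARABLE_SIZES):
    """Return {security strength: RSA modulus size / ECC key size}."""
    return {strength: rsa / ecc for strength, ecc, rsa in table}


for strength, ratio in key_size_ratios().items():
    print(f"{strength}-bit security: RSA/ECC key-size ratio = {ratio:.1f}")
```

At the 128-bit level that Edwards25519 targets, a 3072-bit RSA modulus is 12 times longer than a 256-bit ECC key, and the ratio keeps growing at higher security levels.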


2021 ◽  
Vol 2021 ◽  
pp. 1-8
Author(s):  
Yong Xiao ◽  
Weibin Lin ◽  
Yun Zhao ◽  
Chao Cui ◽  
Ziwen Cai

Teleoperated robotic systems are those in which human operators control remote robots through a communication network. The deployment and integration of teleoperated robotic systems in medical operations have been hampered by many issues, such as safety concerns. Elliptic curve cryptography (ECC), an asymmetric cryptographic algorithm, is widely used in practical applications because it provides the same level of security as RSA with a significantly shorter key length. The efficiency of ECC over GF(p) is dictated by two critical factors, namely, modular multiplication (MM) and point multiplication (PM) scheduling. In this paper, a high-performance ECC architecture for SM2 is presented. MM consists of multiplication and modular reduction (MR) in the prime field. A two-stage modular reduction (TSMR) algorithm over the SCA-256 prime field is introduced to achieve low latency; it avoids many of the iterative subtraction operations required by traditional algorithms. To cut down the run time, a schedule is proposed that exploits the parallelism between multiplication and MR inside PM. Synthesized with a 0.13 μm CMOS standard cell library, the proposed processor occupies 341.98k gates, and each PM takes 0.092 ms.
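
The reason special-form primes allow fast reduction can be sketched for the SM2 (SCA-256) prime p = 2^256 - 2^224 - 2^96 + 2^64 - 1. The following Python sketch is a generic folding reduction based on 2^256 ≡ 2^224 + 2^96 - 2^64 + 1 (mod p); it is an illustrative assumption and not the paper's two-stage TSMR algorithm. `fold_reduce` is a hypothetical name.

```python
# Illustrative assumption: generic folding reduction for the SM2 (SCA-256) prime.
# This is NOT the paper's two-stage TSMR algorithm.
P_SM2 = 2**256 - 2**224 - 2**96 + 2**64 - 1
FOLD = 2**256 - P_SM2            # = 2^224 + 2^96 - 2^64 + 1, so 2^256 ≡ FOLD (mod p)
MASK256 = (1 << 256) - 1


def fold_reduce(z):
    """Reduce 0 <= z < p^2 modulo P_SM2.

    Multiplying by FOLD is a sum of four shifted copies, so each fold needs only
    shifts and additions in hardware.
    """
    while z >> 256:
        z = (z & MASK256) + (z >> 256) * FOLD   # replace hi * 2^256 by hi * FOLD
    # Now z < 2^256 < 2p, so one conditional subtraction completes the reduction.
    return z - P_SM2 if z >= P_SM2 else z


a, b = 3**160 % P_SM2, 7**91 % P_SM2
print(fold_reduce(a * b) == a * b % P_SM2)     # True
```

Each fold only shrinks the operand by about 32 bits because FOLD is close to 2^224, which is why purpose-built algorithms such as the TSMR aim to bound the intermediate result and minimize the remaining subtractions.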


IEEE Access ◽  
2019 ◽  
Vol 7 ◽  
pp. 178811-178826 ◽  
Author(s):  
Md. Mainul Islam ◽  
Md. Selim Hossain ◽  
Moh. Khalid Hasan ◽  
Md. Shahjalal ◽  
Yeong Min Jang

2019 ◽  
Vol 28 (03) ◽  
pp. 1950037 ◽  
Author(s):  
A. Bellemou ◽  
N. Benblidia ◽  
M. Anane ◽  
M. Issad

In this paper, we present MicroBlaze-based parallel architectures of Elliptic Curve Scalar Multiplication (ECSM) computation for embedded Elliptic Curve Cryptosystems (ECC) on Xilinx FPGAs. The proposed implementations support arbitrary Elliptic Curve (EC) forms defined over large prime fields with different security-level sizes. ECSM is performed using the Montgomery Power Ladder (MPL) algorithm in the Chudnovsky projective coordinate system. At the low abstraction level, Montgomery Modular Multiplication (MMM) is considered the critical operation. It is implemented within a hardware Accelerator MMM (AccMMM) core based on a modified high-radix MMM algorithm. The efficiency of our parallel implementations is achieved by combining a mixed SW/HW approach with Multi-Processor System on Programmable Chip (MPSoPC) design. Integrating multiple MicroBlaze processors in a single architecture not only makes the overall system flexible but also allows the parallelism in ECSM computation to be exploited at several degrees. The Virtex-5 parallel implementations of 256-bit and 521-bit ECSM computations run at 100 MHz and consume between 2,739 and 6,533 slices, between 22 and 72 RAMs, and between 16 and 48 DSP48E cores. For the considered security-level sizes, the delays to perform a single ECSM are between 115 ms and 14.72 ms.
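
The high-radix Montgomery modular multiplication described above can be sketched in Python. The radix used by the paper's AccMMM core did not survive extraction, so the digit width `w = 16` below is an arbitrary assumption, and `mont_mul` and `montgomery_radix` are hypothetical names; the sketch shows the generic word-serial technique, not the authors' modified algorithm.

```python
# Sketch (assumption): word-serial Montgomery modular multiplication in radix 2^w.
# The paper's actual radix is not recoverable here; w = 16 is an arbitrary choice.


def montgomery_radix(n, w=16):
    """R = 2^(w*s), where s is the number of radix-2^w digits needed for n."""
    return 1 << (w * -(-n.bit_length() // w))


def mont_mul(a, b, n, w=16):
    """Return a * b * R^-1 mod n. Requires n odd and 0 <= a, b < n."""
    s = -(-n.bit_length() // w)                  # number of radix-2^w digits
    mask = (1 << w) - 1
    n_prime = (-pow(n, -1, 1 << w)) & mask       # -n^-1 mod 2^w (Python 3.8+)
    t = 0
    for i in range(s):
        a_i = (a >> (w * i)) & mask              # next digit of a, least significant first
        t += a_i * b
        m = (t * n_prime) & mask                 # chosen so that t + m*n ≡ 0 (mod 2^w)
        t = (t + m * n) >> w                     # exact division by the radix
    return t - n if t >= n else t                # t < 2n, so one subtraction suffices


P256 = 2**256 - 2**224 + 2**192 + 2**96 - 1      # example 256-bit modulus
R = montgomery_radix(P256)
a, b = 3**150 % P256, 7**90 % P256
a_m, b_m = a * R % P256, b * R % P256            # convert into the Montgomery domain
c_m = mont_mul(a_m, b_m, P256)                   # stays in the domain: a*b*R mod p
print(mont_mul(c_m, 1, P256) == a * b % P256)    # leave the domain: True
```

A larger `w` means fewer loop iterations but a wider digit multiplier per step, which is the same area/latency trade-off a hardware high-radix MMM core has to balance.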


2019 ◽  
Vol 28 (09) ◽  
pp. 1950149
Author(s):  
Bahram Rashidi ◽  
Mohammad Abedini

This paper presents efficient lightweight hardware implementations of complete point multiplication on binary Edwards curves (BECs). The implementations are based on general and special cases of binary Edwards curves. The complete differential addition formulas have different costs for the general and special cases of BECs, counted in field multiplications, field squarings, and field multiplications by a constant. In the general case of BECs, the structure is implemented with 3 concurrent multipliers. In the special case of BECs, two structures employing 3 and 2 field multipliers are proposed to achieve, respectively, the highest degree of parallelization and the best utilization of resources. The field multipliers are implemented with the proposed efficient digit–digit polynomial basis multiplier, in which both input operands are processed at the digit level. This property reduces hardware consumption and critical path delay. In addition, the number of clock cycles and input words changes as the input digit size varies from low to high, so the multiplier can be adapted to different cryptographic requirements such as low-area or high-speed implementations. Because point multiplication requires field inversion, a low-cost Extended Euclidean Algorithm (EEA)-based inversion is used for this field operation. The proposed architectures are implemented on a Virtex-5 XC5VLX110 FPGA for two binary fields. The results show improvements in area and efficiency for the proposed structures compared to previous works.
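
The digit–digit polynomial basis multiplication and EEA-based inversion can be sketched in software. The two fields used in the paper were lost in extraction, so this Python sketch assumes GF(2^163) with the NIST pentanomial x^163 + x^7 + x^6 + x^3 + 1; the digit size parameter and all function names are hypothetical. Polynomials are stored as integers, one bit per coefficient.

```python
# Sketch (assumption): GF(2^163) arithmetic with f(x) = x^163 + x^7 + x^6 + x^3 + 1.
# The paper's fields and exact multiplier schedule are not reproduced here.
F = (1 << 163) | (1 << 7) | (1 << 6) | (1 << 3) | 1


def clmul(a, b):
    """Carry-less (GF(2)[x]) product of two bit-packed polynomials."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r


def poly_mod(r, f=F):
    """Reduce r modulo f by repeatedly cancelling its leading term."""
    m = f.bit_length() - 1
    while r.bit_length() > m:
        r ^= f << (r.bit_length() - 1 - m)
    return r


def digit_digit_mul(a, b, digit=8, f=F):
    """Both operands are consumed digit by digit; digit partial products are accumulated."""
    m = f.bit_length() - 1
    n = -(-m // digit)                           # digits per operand
    mask = (1 << digit) - 1
    acc = 0
    for i in range(n):
        a_i = (a >> (i * digit)) & mask
        for j in range(n):
            b_j = (b >> (j * digit)) & mask
            acc ^= clmul(a_i, b_j) << ((i + j) * digit)
    return poly_mod(acc, f)


def eea_inverse(a, f=F):
    """Inverse of a != 0 via the binary-polynomial extended Euclidean algorithm."""
    u, v = a, f
    g1, g2 = 1, 0                                # invariants: a*g1 ≡ u, a*g2 ≡ v (mod f)
    while u != 1:
        j = u.bit_length() - v.bit_length()
        if j < 0:
            u, v = v, u
            g1, g2 = g2, g1
            j = -j
        u ^= v << j                              # cancels the leading term of u
        g1 ^= g2 << j
    return poly_mod(g1, f)


x = (1 << 162) | 0xDEADBEEF
print(digit_digit_mul(x, eea_inverse(x)) == 1)   # True
```

Changing `digit` trades the number of loop steps (clock cycles in hardware) against the width of each digit product, mirroring the low-area versus high-speed flexibility described in the abstract.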


Information ◽  
2019 ◽  
Vol 10 (9) ◽  
pp. 285 ◽  
Author(s):  
Mohamad Ali Mehrabi ◽  
Christophe Doche

Twisted Edwards curves have been at the center of attention since their introduction by Bernstein et al. in 2007. The curve ED25519, used for the Edwards-curve Digital Signature Algorithm (EdDSA), provides faster digital signatures than existing schemes without sacrificing security. CURVE25519 is a Montgomery curve that is closely related to ED25519. It provides simple, constant-time, and fast point multiplication, which is used by the key exchange protocol X25519. Software implementations of EdDSA and X25519 are used in many web-based PC and mobile applications. In this paper, we introduce a low-power, low-area FPGA implementation of ED25519 and CURVE25519 scalar multiplication that is particularly relevant for Internet of Things (IoT) applications. The efficiency of arithmetic modulo the prime $2^{255} - 19$, in particular modular reduction and modular multiplication, is key to the efficiency of both EdDSA and X25519. To reduce the complexity of the hardware implementation, we propose a high-radix interleaved modular multiplication algorithm. One benefit of this architecture is that it avoids large-integer multipliers relying on FPGA DSP modules.
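
The idea of high-radix interleaved modular multiplication modulo $2^{255} - 19$ can be sketched as follows. This Python sketch is a generic Horner-style interleaving, assumed for illustration and not taken from the paper: the multiplier is consumed in `k`-bit digits, and after each step the partial result is reduced with the identity 2^255 ≡ 19 (mod p). The digit size and function names are hypothetical.

```python
# Sketch (assumption): generic high-radix interleaved modular multiplication mod 2^255 - 19.
# Digit size k and the function names are hypothetical, not taken from the paper.
P = 2**255 - 19
MASK255 = (1 << 255) - 1


def reduce_p25519(z):
    """Fold with 2^255 ≡ 19 (mod p), then one conditional subtraction."""
    while z >> 255:
        z = (z & MASK255) + 19 * (z >> 255)
    return z - P if z >= P else z                # z < 2^255 = p + 19 here


def interleaved_mulmod(a, b, k=16):
    """Compute a*b mod p, consuming a in k-bit digits from the most significant end."""
    n_digits = -(-256 // k)
    digit_mask = (1 << k) - 1
    acc = 0
    for i in reversed(range(n_digits)):
        digit = (a >> (i * k)) & digit_mask
        # Each step needs only a k-bit x 255-bit product plus a cheap reduction,
        # so no full 255 x 255-bit multiplier is required.
        acc = reduce_p25519((acc << k) + digit * b)
    return acc


a, b = 3**160 % P, 7**120 % P
print(interleaved_mulmod(a, b) == a * b % P)     # True
```

Because the reduction constant 19 is tiny, the fold is a small multiply-add rather than a general modular reduction, which is what makes this prime attractive for compact hardware.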


2020 ◽  
Author(s):  
Hari Krishna Modalavalasa

Multiplication and accumulation are vital operations in almost all Digital Signal Processing applications. With the advent of new technology in the domains of VLSI, communication, and signal processing, there is an ever-growing demand for high-speed processing and low-area design. In today's technology, Add-Multiply (AM) operators or Multiply-Accumulate (MAC) units are generally employed in all high-performance digital signal processors (DSPs) and controllers. The performance of an AM operator mainly depends on the speed of its multiplier. Much research has been contributed in this area, and conventional multipliers have been modified to provide good speed, but they need further improvement along with area optimization. The Urdhwa-Tiryakbhyam Multiplier (UTM) architecture is adopted from ancient Indian mathematics (the "Vedas") and can generate the partial products and sums in one step, which reduces carry propagation from LSB to MSB. UTM can be used to implement high-performance AM operators but results in larger silicon area. This increased area can be minimized by using a modified compressor-based design of UTM. In this work, a carry-look-ahead (CLA) adder is adopted instead of parallel adders for high-speed accumulation. The resulting Compressor-Based-Urdhwa-Tiryakbhyam (CB-UT) multiplier with CLA optimizes both the area and the performance of the Add-Multiply operator. The functionality of this architecture is evaluated by comparing it with a Modified Booth (MB) multiplier-based AM operator in terms of performance parameters such as propagation delay, power consumption, and silicon area. The design is implemented and verified using a Xilinx Spartan-3E FPGA and the ISE Simulator.
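
The "vertically and crosswise" rule behind the Urdhwa-Tiryakbhyam multiplier can be shown in a few lines of Python. This sketch is an illustrative assumption, not the paper's compressor-based circuit: each output column collects all crosswise digit products at once (formed in parallel in hardware), and a single carry-propagation pass follows. The helper names are hypothetical.

```python
# Sketch (assumption): Urdhwa-Tiryakbhyam ("vertically and crosswise") multiplication.
# In hardware every column sum is formed in parallel; the paper's compressor tree and
# CLA accumulator are not modelled here.


def urdhva_multiply(a_digits, b_digits, base=2):
    """Multiply two equal-length digit lists (least significant digit first)."""
    n = len(a_digits)
    if len(b_digits) != n:
        raise ValueError("pad operands to the same number of digits")
    # Column i collects every crosswise product a[j] * b[i - j].
    columns = [
        sum(a_digits[j] * b_digits[i - j] for j in range(max(0, i - n + 1), min(i, n - 1) + 1))
        for i in range(2 * n - 1)
    ]
    result, carry = [], 0
    for col in columns:                          # carry propagation, LSB to MSB
        total = col + carry
        result.append(total % base)
        carry = total // base
    while carry:
        result.append(carry % base)
        carry //= base
    return result


def to_digits(x, n, base=2):
    return [(x // base**i) % base for i in range(n)]


def from_digits(digits, base=2):
    return sum(d * base**i for i, d in enumerate(digits))


# Multiply-accumulate with the UT multiplier on 8-bit operands: acc += a * b
acc = 0
for a, b in [(13, 11), (200, 155), (255, 255)]:
    acc += from_digits(urdhva_multiply(to_digits(a, 8), to_digits(b, 8)))
print(acc)   # 143 + 31000 + 65025 = 96168
```

The accumulation loop mirrors an AM/MAC operator: the multiplier output feeds an adder, which is the stage the paper speeds up with a CLA.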


Electronics ◽  
2019 ◽  
Vol 8 (4) ◽  
pp. 431 ◽  
Author(s):  
Xianghong Hu ◽  
Xin Zheng ◽  
Shengshi Zhang ◽  
Weijun Li ◽  
Shuting Cai ◽  
...  

Elliptic curve cryptography (ECC) is widely used in practical applications because ECC needs far fewer operand bits than other public-key cryptosystems such as RSA for the same level of security. The performance of an ECC processor is usually determined by modular multiplication (MM) and point multiplication (PM) operations. For recommended prime fields, the MM operation can consist of multiplication and fast reduction operations. In this paper, a 256-bit multiplication is implemented with a 129-bit (half-word) multiplier using the Karatsuba–Ofman multiplication algorithm. The fast reduction is a modulo operation that takes 512-bit input data from the multiplication and outputs a 256-bit result (0 ≤ Z < p). We propose a two-stage fast reduction algorithm (TSFR) over the SCA-256 prime field, which obtains an intermediate result of 0 ≤ Z < 2p instead of 0 ≤ Z < 14p as in the traditional algorithm, avoiding many repetitive subtraction operations. The PM operation is implemented with the width nonadjacent form (NAF) algorithm, and its operational schedules are improved to increase the parallelism of multiplication and fast reduction operations. Synthesized with a 0.13 μm complementary metal oxide semiconductor (CMOS) standard cell library, the proposed processor costs an area of 280k gates, and a PM operation takes 0.057 ms at a frequency of 250 MHz. The design is also implemented on the Xilinx Virtex-6 platform, where it consumes 27.655k LUTs and takes 0.37 ms to perform one 256-bit PM operation, attaining a six-times speed-up over the state of the art. The processor achieves a good trade-off between area and performance and thus outperforms other methods.
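
Two building blocks named above, one-level Karatsuba–Ofman multiplication and width-w NAF recoding of the scalar, can be sketched in Python. This is an illustrative assumption and not the paper's datapath or schedule; the window width default `w = 4` and the function names are hypothetical. The comment in `karatsuba_256` shows why a 129-bit multiplier appears: the half-word sums can carry into bit 128.

```python
# Sketch (assumption): one level of Karatsuba-Ofman for 256-bit operands, plus width-w
# NAF recoding of the scalar. Illustrative only; the paper's datapath and schedule differ.
HALF = 128
HALF_MASK = (1 << HALF) - 1


def karatsuba_256(a, b):
    """256 x 256-bit product from three half-word products instead of four."""
    a1, a0 = a >> HALF, a & HALF_MASK
    b1, b0 = b >> HALF, b & HALF_MASK
    high = a1 * b1
    low = a0 * b0
    # a1 + a0 and b1 + b0 can carry into bit 128, which is why a 129-bit
    # multiplier is needed for the middle product.
    middle = (a1 + a0) * (b1 + b0) - high - low
    return (high << (2 * HALF)) + (middle << HALF) + low


def wnaf(k, w=4):
    """Width-w NAF digits of k >= 0, least significant first.

    Each nonzero digit is odd with |d| < 2^(w-1), and any w consecutive digits
    contain at most one nonzero digit, so fewer point additions are needed.
    """
    digits = []
    while k > 0:
        if k & 1:
            d = k & ((1 << w) - 1)               # k mod 2^w
            if d >= 1 << (w - 1):
                d -= 1 << w                      # choose the signed residue
            k -= d
        else:
            d = 0
        digits.append(d)
        k >>= 1
    return digits


print(wnaf(7, 3))   # [-1, 0, 0, 1], i.e. 7 = 2^3 - 1
```

In a PM loop, each nonzero digit d triggers one addition of a precomputed point [d]P (negation is cheap on elliptic curves), while every digit position costs one doubling.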

