High Performance Montgomery Modular Multiplier with a New Recoding Method

Modular calculations are widely used in many applications, especially in public key cryptography. Such operations are very time consuming because of their long operands, and many methods have been introduced to improve their performance. Montgomery modular multiplication is one such method; it speeds up both modular multiplication and modular exponentiation. The radix-2 version of this method is simple and fast in hardware, but its implementation requires multi-operand adders. So far, the carry-save adder (CSA) has given the best performance for multi-operand addition. In this paper, we propose a new recoding method for the Montgomery modular multiplier to enhance its performance. This is done by replacing the CSA blocks with new blocks that perform better than CSA in multi-operand addition. With this replacement, we can theoretically obtain up to a 40% reduction in area. In our experiments, we obtained a 5.8% area reduction and a 3% speed improvement in a hardware implementation. The idea behind our proposed method is the use of a bitwise subtraction operator, for which no carry propagation is needed. This operand recoding method can also be used in other areas of computer arithmetic, algorithms and computational hardware, such as multiplication and exponentiation, to enhance their performance.
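For orientation, the textbook radix-2 Montgomery multiplication that the abstract builds on can be sketched in software as follows. This is a minimal illustration only: it does not implement the proposed recoding method or any CSA structure, and the function names `montgomery_multiply_radix2` and `modmul_via_montgomery` are hypothetical, not taken from the paper.

```python
def montgomery_multiply_radix2(a, b, modulus, nbits):
    """Return a * b * 2^(-nbits) mod modulus.

    Assumes an odd modulus with modulus < 2^nbits and 0 <= a, b < modulus.
    """
    if modulus % 2 == 0:
        raise ValueError("Montgomery multiplication needs an odd modulus")
    acc = 0
    for i in range(nbits):
        # Add b when bit i of a is set (one partial product per iteration).
        if (a >> i) & 1:
            acc += b
        # Make the accumulator even by adding the modulus, then halve it.
        if acc & 1:
            acc += modulus
        acc >>= 1
    # The accumulator stays below 2 * modulus, so one subtraction suffices.
    if acc >= modulus:
        acc -= modulus
    return acc


def modmul_via_montgomery(a, b, modulus, nbits):
    """Compute a * b mod modulus by going through the Montgomery domain."""
    r_squared = (1 << (2 * nbits)) % modulus
    a_bar = montgomery_multiply_radix2(a, r_squared, modulus, nbits)  # a * R
    b_bar = montgomery_multiply_radix2(b, r_squared, modulus, nbits)  # b * R
    product_bar = montgomery_multiply_radix2(a_bar, b_bar, modulus, nbits)
    return montgomery_multiply_radix2(product_bar, 1, modulus, nbits)
```

In hardware, the two conditional additions inside the loop are the multi-operand additions that the abstract refers to; here they are ordinary Python integer additions.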


Efficient PSoC Implementation of Modular Multiplication and Exponentiation Based on Serial-Parallel Combination

Journal of Circuits System and Computers ◽

10.1142/s0218126619502293 ◽

2019 ◽

Vol 28 (13) ◽

pp. 1950229 ◽

Cited By ~ 1

Author(s):

M. Issad ◽

B. Boudraa ◽

M. Anane ◽

A. M. Bellemou

Keyword(s):

Input Data ◽

Public Key Cryptography ◽

Modular Exponentiation ◽

Modular Multiplication ◽

Montgomery Modular Multiplication ◽

Implementation Approach ◽

Security Levels ◽

Data Bus ◽

On Chip ◽

Data Length

This paper presents an FPGA implementation of the most critical operations of Public Key Cryptography (PKC), namely Modular Exponentiation (ME) and Modular Multiplication (MM). Both operations are integrated as a Programmable System on Chip (PSoC), where the Xilinx MicroBlaze processor is used for flexibility. Our objective is to achieve the best trade-off between execution time, occupied area and flexibility. Implementing these operations in such an environment requires taking several criteria into account: the data bus of the Hardware (HW) architectures should be smaller than the input data length, the design must be scalable to support different security levels, and the implementation should achieve optimal execution time and HW resource usage. To satisfy these constraints, the Montgomery Power Ladder (MPL) and Montgomery Modular Multiplication (MMM) algorithms are used as HW accelerators for the ME and MM implementations, respectively. Our implementation approach is based on the digit-serial method for performing the basic arithmetic operations. Efficient parallel and pipeline strategies are developed at the digit level to optimize the execution times. For a 1024-bit data length, the MMM runs in 6.24 µs and requires 647 slices. The ME is executed in 6.75 ms using 2881 slices.
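The Montgomery powering ladder named above can be sketched in a few lines of software. This is a generic textbook version, assuming plain Python integers and the built-in `%` operator in place of the paper's MMM accelerator; the function name `montgomery_ladder_pow` is hypothetical.

```python
def montgomery_ladder_pow(base, exponent, modulus):
    """Return base ** exponent mod modulus using the Montgomery powering ladder.

    Assumes exponent >= 0 and modulus >= 1.
    """
    r0 = 1 % modulus
    r1 = base % modulus
    # Scan the exponent from the most significant bit down.
    # Invariant: r1 == r0 * base (mod modulus) after every iteration.
    for i in reversed(range(exponent.bit_length())):
        if (exponent >> i) & 1:
            r0 = (r0 * r1) % modulus
            r1 = (r1 * r1) % modulus
        else:
            r1 = (r0 * r1) % modulus
            r0 = (r0 * r0) % modulus
    return r0
```

Each iteration performs one multiplication and one squaring regardless of the exponent bit, and the two operations do not depend on each other, which is what makes the ladder a natural fit for parallel hardware multipliers.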


A Hardware-Accelerated ECDLP with High-Performance Modular Multiplication

International Journal of Reconfigurable Computing ◽

10.1155/2012/439021 ◽

2012 ◽

Vol 2012 ◽

pp. 1-14 ◽

Cited By ~ 4

Author(s):

Lyndon Judge ◽

Suvarna Mane ◽

Patrick Schaumont

Keyword(s):

Elliptic Curve ◽

Elliptic Curve Cryptography ◽

High Performance ◽

Design Space ◽

Discrete Logarithm ◽

Public Key Cryptography ◽

Modular Multiplication ◽

Polynomial Representation ◽

Prime Field ◽

Modular Multiplier

Elliptic curve cryptography (ECC) has become a popular public key cryptography standard. The security of ECC rests on the difficulty of solving the elliptic curve discrete logarithm problem (ECDLP). In this paper, we demonstrate a successful attack on ECC over a prime field using the Pollard rho algorithm implemented on a hardware-software cointegrated platform. We propose a high-performance architecture for multiplication over a prime field using specialized DSP blocks in the FPGA. We characterize this architecture by exploring the design space to determine the optimal integer basis for polynomial representation, and we demonstrate an efficient mapping of this design to multiple standard prime field elliptic curves. We use the resulting modular multiplier to demonstrate low-latency multiplications for curves secp112r1 and P-192. We apply our modular multiplier to implement a complete attack on secp112r1 using a Nallatech FSB-Compute platform with a Virtex-5 FPGA. The measured performance of the resulting design is 114 cycles per Pollard rho step at 100 MHz, which gives 878 K iterations per second per ECC core. We extend this design to a multicore ECDLP implementation that achieves 14.05 M iterations per second with 16 parallel point addition cores.
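To illustrate what one "Pollard rho step" involves, here is a small software sketch of the Pollard rho discrete logarithm algorithm. It deliberately swaps the elliptic curve group for a prime-order subgroup of the integers modulo a small prime, so it is not the paper's ECDLP attack; the function name `pollard_rho_dlog`, the three-way partition and any toy parameters are assumptions made for illustration.

```python
def pollard_rho_dlog(g, h, p, n):
    """Find x with pow(g, x, p) == h, where g has prime order n modulo p.

    Assumes h lies in the subgroup generated by g. Returns None if every
    starting point yields a degenerate collision.
    """
    def step(x, a, b):
        # Maintain the invariant x == g^a * h^b (mod p).
        branch = x % 3
        if branch == 0:
            return (x * x) % p, (2 * a) % n, (2 * b) % n
        if branch == 1:
            return (x * g) % p, (a + 1) % n, b
        return (x * h) % p, a, (b + 1) % n

    for start in range(1, n):
        slow = (pow(g, start, p), start, 0)
        fast = slow
        while True:
            # Floyd cycle detection: slow moves one step, fast moves two.
            slow = step(*slow)
            fast = step(*step(*fast))
            if slow[0] == fast[0]:
                break
        diff_b = (fast[2] - slow[2]) % n
        if diff_b == 0:
            continue  # degenerate collision, retry from another start
        # g^a h^b == g^A h^B  implies  x == (a - A) / (B - b) mod n.
        return ((slow[1] - fast[1]) * pow(diff_b, -1, n)) % n
    return None
```

In the elliptic curve setting, each call to `step` corresponds to a point addition or doubling, which is where a fast modular multiplier determines the iteration rate.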


Efficient FPGA Implementation of Modular Multiplication and Exponentiation

Malaysian Journal of Computing and Applied Mathematics ◽

10.37231/myjcam.2020.3.1.37 ◽

2020 ◽

Vol 3 (1) ◽

pp. 1-13

Author(s):

M Issad ◽

M Anane ◽

B Boudraa ◽

A M Bellemou ◽

N Anane

Keyword(s):

Execution Time ◽

Public Key Cryptography ◽

Fpga Implementation ◽

Public Key ◽

Modular Exponentiation ◽

Modular Multiplication ◽

Montgomery Modular Multiplication ◽

Implementation Approach ◽

On Chip ◽

Data Length

This paper presents an FPGA implementation of the most critical operations of Public Key Cryptography (PKC), namely Modular Exponentiation (ME) and Modular Multiplication (MM). Both operations are integrated in Hardware (HW) as a Programmable System on Chip (PSoC). The Xilinx MicroBlaze processor is used for flexibility. Our objective is to achieve the best trade-off between execution time, occupied area and flexibility. To satisfy this constraint, the Montgomery Power Ladder and Montgomery Modular Multiplication (MMM) algorithms are used as HW accelerators for the ME and MM implementations, respectively. Our implementation approach is based on the digit-serial method for performing the basic arithmetic operations. Efficient parallel and pipeline strategies are developed at the digit level to optimize the execution time. For a 1024-bit data length, the MMM runs in 6.24 µs and requires 647 slices. The ME is executed in 6.75 ms using 2881 slices.
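The digit-serial idea can be illustrated with a generic Montgomery multiplication that consumes one operand a digit at a time. This is a minimal sketch under stated assumptions, not the paper's architecture: the 16-bit digit width, the function name `montgomery_multiply_digit_serial` and the use of Python big integers for the accumulator are all choices made here for illustration.

```python
def montgomery_multiply_digit_serial(a, b, modulus, num_digits, digit_bits=16):
    """Return a * b * R^(-1) mod modulus with R = 2^(digit_bits * num_digits).

    Assumes an odd modulus with modulus < R and 0 <= a, b < modulus.
    """
    base = 1 << digit_bits
    mask = base - 1
    # Precompute -modulus^(-1) mod base (exists because modulus is odd).
    m_prime = (-pow(modulus, -1, base)) % base
    acc = 0
    for i in range(num_digits):
        # Take the i-th digit of a, least significant first.
        a_digit = (a >> (i * digit_bits)) & mask
        acc += a_digit * b
        # Choose q so that the lowest digit of acc + q * modulus is zero.
        q = ((acc & mask) * m_prime) & mask
        acc = (acc + q * modulus) >> digit_bits
    # The accumulator stays below 2 * modulus, so one subtraction suffices.
    if acc >= modulus:
        acc -= modulus
    return acc
```

A narrower digit keeps the data path smaller than the operand length at the cost of more iterations, which is the area versus time trade-off the abstract describes.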


High-performance, low-power architecture for scalable radix 2 Montgomery modular multiplication algorithm

Canadian Journal of Electrical and Computer Engineering ◽

10.1109/cjece.2009.5599422 ◽

2009 ◽

Vol 34 (4) ◽

pp. 152-157 ◽

Cited By ~ 9

Author(s):

Atef Ibrahim ◽

Fayez Gebali ◽

Hamed El-Simary ◽

Amin Nassar

Keyword(s):

Low Power ◽

High Performance ◽

Modular Multiplication ◽

Multiplication Algorithm ◽

Montgomery Modular Multiplication ◽

Power Architecture


High-Performance RNS Modular Exponentiation by Sum-Residue Reduction

10.21203/rs.3.rs-86431/v1 ◽

2020 ◽

Author(s):

Tao Wu

Keyword(s):

High Performance ◽

Computer Arithmetic ◽

Number System ◽

Residue Number System ◽

Modular Exponentiation ◽

Number Systems ◽

Diffie Hellman ◽

Residue Number ◽

Diffie Hellman Key Exchange ◽

Rsa Cryptography

Modular exponentiation is fundamental in computer arithmetic and is widely applied in cryptography, for example in ElGamal cryptography, the Diffie-Hellman key exchange protocol, and RSA cryptography. Implementing modular exponentiation in a residue number system (RNS) leads to high parallelism in computation, and has been applied in many hardware architectures. While most RNS-based architectures use the RNS Montgomery algorithm with two residue number systems, the recent modular multiplication algorithm with sum-residues performs modular reduction in only one residue number system with about the same parallelism. In this work, it is shown that high-performance modular exponentiation and RSA cryptography can be implemented in RNS. Both the algorithm and the architecture are improved to achieve high performance with extra area overheads: a 1024-bit modular exponentiation can be completed in 0.567 ms on a Xilinx XC6VLX195t-3 platform, costing 26,489 slices, 87,357 LUTs, 363 dedicated multipliers of 18 × 18 bits, and 65 Block RAMs.
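The parallelism that RNS offers can be seen in a short sketch: a large product is computed independently in each residue channel and recovered with the Chinese Remainder Theorem. This shows only basic RNS multiplication, not the sum-residue reduction or the RNS Montgomery algorithm from the abstract; the helper names and the example moduli are assumptions chosen for illustration.

```python
from math import prod


def to_rns(value, moduli):
    """Represent value by its residues modulo pairwise coprime moduli."""
    return [value % m for m in moduli]


def rns_multiply(xs, ys, moduli):
    """Multiply channel by channel; each channel is independent of the others."""
    return [(x * y) % m for x, y, m in zip(xs, ys, moduli)]


def from_rns(residues, moduli):
    """Reconstruct the integer in [0, product of moduli) with the CRT."""
    total = prod(moduli)
    result = 0
    for residue, m in zip(residues, moduli):
        partial = total // m
        # pow(partial, -1, m) is the inverse of partial modulo m.
        result += residue * partial * pow(partial, -1, m)
    return result % total


# Hypothetical 8-bit channel moduli, pairwise coprime.
EXAMPLE_MODULI = [251, 253, 255, 256]
```

The reconstruction is exact only while the true product is smaller than the product of the moduli, so a real design sizes its moduli set to the operand length.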
