Efficient architectures for implementing montgomery modular multiplication and RSA modular exponentiation on reconfigurable logic

Author(s):  
Alan Daly ◽  
William Marnane
2019 ◽  
Vol 28 (13) ◽  
pp. 1950229 ◽  
Author(s):  
M. Issad ◽  
B. Boudraa ◽  
M. Anane ◽  
A. M. Bellemou

This paper presents an FPGA implementation of the most critical operations of Public Key Cryptography (PKC), namely the Modular Exponentiation (ME) and the Modular Multiplication (MM). Both operations are integrated as Programmable System on Chip (PSoC) where the processor Microblaze of Xilinx is used for flexibility. Our objective is to achieve a best trade-off between time execution, occupied area and flexibility. The implementation of these operations on such environment requires taking into account several criteria. Indeed, the Hardware (HW) architectures data bus should be smaller than the input data length. The design must be scalable to support different security levels. The implementation achieves optimums execution time and HW resources number. In order to satisfy these constraints, Montgomery Power Ladder (MPL) and Montgomery Modular Multiplication (MMM) algorithms are utilized for the ME and the MM implementations as HW accelerators, respectively. Our implementation approach is based on the digit-serial method for performing the basic arithmetic operations. Efficient parallel and pipeline strategies are developed at the digit level for the optimization of the execution times. The application for 1024-bits data length shows that the MMM run in 6.24[Formula: see text][Formula: see text]s and requires 647 slices. The ME is executed in 6.75[Formula: see text]ms using 2881 slices.


2011 ◽  
Vol 20 (03) ◽  
pp. 531-548 ◽  
Author(s):  
KOOROUSH MANOCHEHRI ◽  
BABAK SADEGHIYAN ◽  
SAADAT POURMOZAFARI

Modular calculations are widely used in many applications, especially in public key cryptography. Such operations are very time consuming, due to their long operands. To improve the performance of these calculations, many methods have been introduced. Montgomery modular multiplication is an example of such a solution to enhance the performance of modular multiplication and modular exponentiation. The radix-2 version of this method is simple and fast for hardware implementation, where multi-operand adders are required for its implementation. So far, Carry-Save-Adder (CSA) gives the best performance for multi-addition. In this paper, we propose a new recoding method for the Montgomery modular multiplier to enhance its performance. This is done through replacing CSA blocks with new blocks that have better performances than CSA in multi-addition calculations. With this replacement, we can theoretically have up to 40% reduction in area gates. In our experiments, we obtained 5.8% area reduction and 3% speed improvement in a hardware implementation. The idea behind our proposed method is the use of bitwise subtraction operator, where no carry propagation is needed. This recoding method of operands can also be used in many aspects of computer arithmetic, algorithms and computational hardware, such as multiplication, exponentiation and etc., in order to enhance their performances.


2011 ◽  
Vol 2011 ◽  
pp. 1-10 ◽  
Author(s):  
Guilherme Perin ◽  
Daniel Gomes Mesquita ◽  
João Baptista Martins

This paper describes a comparison of two Montgomery modular multiplication architectures: a systolic and a multiplexed. Both implementations target FPGA devices. The modular multiplication is employed in modular exponentiation processes, which are the most important operations of some public-key cryptographic algorithms, including the most popular of them, the RSA. The proposed systolic architecture presents a high-radix implementation with a one-dimensional array of Processing Elements. The multiplexed implementation is a new alternative and is composed of multiplier blocks in parallel with the new simplified Processing Elements, and it provides a pipelined operation mode. We compare thetime×areaefficiency for both architectures as well as an RSA application. The systolic implementation can run the 1024 bits RSA decryption process in just 3.23 ms, and the multiplexed architecture executes the same operation in 4.36 ms, but the second approach saves up to 28% of logical resources. These results are competitive with the state-of-the-art performance.


2020 ◽  
Vol 3 (1) ◽  
pp. 1-13
Author(s):  
M Issad ◽  
M Anane ◽  
B Boudraa ◽  
A M Bellemou ◽  
N Anane

This paper presents an FPGA implementation of the most critical operations of Public Key Cryptography (PKC), namely the Modular Exponentiation (ME) and the Modular Multiplication (MM). Both operations are integrated in Hardware (HW) as Programmable System on Chip (PSoC). The processor Microblaze of Xilinx is used for flexibility. Our objective is to achieve a best trade-off between execution time, occupied area and flexibility. In order to satisfy this constraint, Montgomery Power Ladder and Montgomery Modular Multiplication (MMM) algorithms are utilized for the ME and for the MM implementations as HW accelerators, respectively. Our implementation approach is based on the digit-serial method for performing the basic arithmetic operations. Efficient parallel and pipeline strategies are developed at the digit level for the optimization of the execution time. The application for 1024-bits data length shows that the MMM run in 6.24 µs and requires 647 slices. The ME is executed in 6.75 ms, using 2881 slices.


2017 ◽  
Vol E100.B (5) ◽  
pp. 680-690 ◽  
Author(s):  
Yang LI ◽  
Jinlin WANG ◽  
Xuewen ZENG ◽  
Xiaozhou YE

Author(s):  
Johannes Mittmann ◽  
Werner Schindler

AbstractMontgomery’s and Barrett’s modular multiplication algorithms are widely used in modular exponentiation algorithms, e.g. to compute RSA or ECC operations. While Montgomery’s multiplication algorithm has been studied extensively in the literature and many side-channel attacks have been detected, to our best knowledge no thorough analysis exists for Barrett’s multiplication algorithm. This article closes this gap. For both Montgomery’s and Barrett’s multiplication algorithm, differences of the execution times are caused by conditional integer subtractions, so-called extra reductions. Barrett’s multiplication algorithm allows even two extra reductions, and this feature increases the mathematical difficulties significantly. We formulate and analyse a two-dimensional Markov process, from which we deduce relevant stochastic properties of Barrett’s multiplication algorithm within modular exponentiation algorithms. This allows to transfer the timing attacks and local timing attacks (where a second side-channel attack exhibits the execution times of the particular modular squarings and multiplications) on Montgomery’s multiplication algorithm to attacks on Barrett’s algorithm. However, there are also differences. Barrett’s multiplication algorithm requires additional attack substeps, and the attack efficiency is much more sensitive to variations of the parameters. We treat timing attacks on RSA with CRT, on RSA without CRT, and on Diffie–Hellman, as well as local timing attacks against these algorithms in the presence of basis blinding. Experiments confirm our theoretical results.


Sign in / Sign up

Export Citation Format

Share Document