High-performance NTT architecture for large integer multiplication

Author(s):  
Jheng-Hao Ye ◽  
Ming-Der Shieh
2011 ◽  
Vol 21 (03) ◽  
pp. 359-375 ◽  
Author(s):  
NIALL EMMART ◽  
CHARLES C. WEEMS

We have improved our prior implementation of Strassens algorithm for high performance multiplication of very large integers on a general purpose graphics processor (GPU). A combination of algorithmic and implementation optimizations result in a factor of up to 13.9 speed improvement over our previous work, running on an NVIDIA 295. We have also reoptimized the implementation for an NVIDIA 480, from which we obtain a factor of up to 19 speedup in comparison with a Core i7 processor core of the same technology generation. To provide a fairer chip to chip comparison, we also determined total GPU throughput on a set of multiplications relative to all of the cores on a multicore chip running in parallel. We find that the GTX 480 provides a factor of six higher throughput than all four cores/eight threads of the Core i7. This paper discusses how we adapted the algorithm to operate within the limitations of the GPU and how we dealt with other issues encountered in the implementation process, including details of the memory layout of our FFTs. Compared with our earlier work, which used Karatsuba's algorithm to guide multiplication of different operand sizes built on top of Strassen's algorithm being applied to fixed-size segments of the operands, we are now able to apply Strassen's algorithm directly to operands ranging in size from 255K bits to 16,320K bits.


Author(s):  
Martin R. Albrecht ◽  
Christian Hanser ◽  
Andrea Hoeller ◽  
Thomas Pöppelmann ◽  
Fernando Virdia ◽  
...  

We repurpose existing RSA/ECC co-processors for (ideal) lattice-based cryptography by exploiting the availability of fast long integer multiplication. Such co-processors are deployed in smart cards in passports and identity cards, secured microcontrollers and hardware security modules (HSM). In particular, we demonstrate an implementation of a variant of the Module-LWE-based Kyber Key Encapsulation Mechanism (KEM) that is tailored for high performance on a commercially available smart card chip (SLE 78). To benefit from the RSA/ECC co-processor we use Kronecker substitution in combination with schoolbook and Karatsuba polynomial multiplication. Moreover, we speed-up symmetric operations in our Kyber variant using the AES co-processor to implement a PRNG and a SHA-256 co-processor to realise hash functions. This allows us to execute CCA-secure Kyber768 key generation in 79.6 ms, encapsulation in 102.4 ms and decapsulation in 132.7 ms.


2012 ◽  
Vol 241-244 ◽  
pp. 2417-2423 ◽  
Author(s):  
Shahram Jahani ◽  
Azman Samsudin

The number theory based cryptography algorithms are the most commonly used public-key cryptosystems. One of the fundamental arithmetic operations for such systems is the large integer multiplication. The efficiency of these cryptosystems is directly related to the efficiency of this large integer multiplication operation. Classical multiplication algorithm and Karatsuba multiplication algorithm, and their hybrid, are among the most popular multiplication algorithms used for this purpose. In this paper, we propose a hybrid of Karatsuba and a classical-based multiplication algorithm, enhanced by a new number representation system. The new number representation, known as "Big-Digits”, is used to carry out the sub-multiplication operation in the new multiplication algorithm. Big-Digits has a compact representation with lower Hamming weight. As the result, the number of sub-multiplication operations for the multiplication algorithm that is based on the Big-Digits representation is significantly reduced. Our results show that the proposed multiplication algorithm is significantly faster than the classical, Karasuba and the hybrid of Karatsuba-Classical multiplication algorithms within the implementation domain of the public-key cryptography.


Sign in / Sign up

Export Citation Format

Share Document