High-performance NTT architecture for large integer multiplication

We have improved our prior implementation of Strassen's algorithm for high-performance multiplication of very large integers on a general-purpose graphics processing unit (GPU). A combination of algorithmic and implementation optimizations results in a speed improvement of up to a factor of 13.9 over our previous work, running on an NVIDIA 295. We have also reoptimized the implementation for an NVIDIA 480, on which we obtain a speedup of up to a factor of 19 compared with a single Core i7 processor core of the same technology generation. To provide a fairer chip-to-chip comparison, we also measured total GPU throughput on a set of multiplications relative to all of the cores of a multicore chip running in parallel. We find that the GTX 480 provides six times the throughput of all four cores (eight threads) of the Core i7. This paper discusses how we adapted the algorithm to operate within the limitations of the GPU and how we dealt with other issues encountered during implementation, including details of the memory layout of our FFTs. Our earlier work used Karatsuba's algorithm to handle different operand sizes, building on Strassen's algorithm applied to fixed-size segments of the operands. We are now able to apply Strassen's algorithm directly to operands ranging in size from 255K bits to 16,320K bits.
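
The transform-based multiplication described in this abstract can be illustrated with a minimal CPU sketch. It is not the authors' GPU implementation: the prime `P = 998244353`, the generator `G = 3`, the 8-bit digit size, and the function names `ntt` and `ntt_multiply` are all assumptions chosen for illustration, and the sketch handles non-negative integers only.

```python
# Illustrative number-theoretic transform (NTT) multiplication.
# Assumptions (not from the abstract): prime P, generator G, 8-bit digits.
P = 998244353  # 119 * 2**23 + 1, supports transform lengths up to 2**23
G = 3          # a primitive root modulo P


def ntt(values, invert=False):
    """Iterative radix-2 NTT over Z_P; len(values) must be a power of two."""
    a = list(values)
    n = len(a)
    # Bit-reversal permutation so butterflies can run in place.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j ^= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    length = 2
    while length <= n:
        w_len = pow(G, (P - 1) // length, P)  # primitive length-th root of unity
        if invert:
            w_len = pow(w_len, P - 2, P)      # modular inverse of the root
        half = length // 2
        for start in range(0, n, length):
            w = 1
            for k in range(start, start + half):
                u = a[k]
                v = a[k + half] * w % P
                a[k] = (u + v) % P
                a[k + half] = (u - v) % P
                w = w * w_len % P
        length <<= 1
    if invert:
        n_inv = pow(n, P - 2, P)
        a = [x * n_inv % P for x in a]
    return a


def to_digits(x, base_bits):
    """Split a non-negative integer into little-endian base-2**base_bits digits."""
    mask = (1 << base_bits) - 1
    digits = []
    while x:
        digits.append(x & mask)
        x >>= base_bits
    return digits or [0]


def ntt_multiply(x, y, base_bits=8):
    """Multiply non-negative integers via pointwise products in the NTT domain."""
    a, b = to_digits(x, base_bits), to_digits(y, base_bits)
    n = 1
    while n < len(a) + len(b):
        n <<= 1
    # Every convolution coefficient must stay below P to be recovered exactly.
    assert min(len(a), len(b)) * ((1 << base_bits) - 1) ** 2 < P
    assert n <= 1 << 23
    fa = ntt(a + [0] * (n - len(a)))
    fb = ntt(b + [0] * (n - len(b)))
    c = ntt([u * v % P for u, v in zip(fa, fb)], invert=True)
    # Carry propagation: weight each coefficient by its digit position.
    return sum(coeff << (base_bits * i) for i, coeff in enumerate(c))
```

The coefficient bound is what limits operand size for a given prime and digit width. A production implementation would choose the modulus and digit size together so that the operand range of interest fits.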

Distributed high performance large integer arithmetic

Proceedings. International Conference on Parallel Processing Workshop ◽

10.1109/icppw.2002.1039739 ◽

2003 ◽

Cited By ~ 1

Author(s):

L. Lundberg

Keyword(s):

High Performance ◽

Large Integer ◽

Integer Arithmetic

Large-Integer Multiplication Based on Homogeneous Polynomials

International Journal of Communications Network and System Sciences ◽

10.4236/ijcns.2012.58054 ◽

2012 ◽

Vol 05 (08) ◽

pp. 437-445

Author(s):

Boris S. Verkhovsky

Keyword(s):

Large Integer ◽

Homogeneous Polynomials ◽

Integer Multiplication

Implementing RLWE-based Schemes Using an RSA Co-Processor

IACR Transactions on Cryptographic Hardware and Embedded Systems ◽

10.46586/tches.v2019.i1.169-208 ◽

2018 ◽

pp. 169-208

Author(s):

Martin R. Albrecht ◽

Christian Hanser ◽

Andrea Hoeller ◽

Thomas Pöppelmann ◽

Fernando Virdia ◽

...

Keyword(s):

High Performance ◽

Key Generation ◽

Ideal Lattice ◽

Polynomial Multiplication ◽

Key Encapsulation Mechanism ◽

Speed Up ◽

Identity Cards ◽

Integer Multiplication ◽

Lattice Based Cryptography ◽

Hardware Security Modules

We repurpose existing RSA/ECC co-processors for (ideal) lattice-based cryptography by exploiting the availability of fast long integer multiplication. Such co-processors are deployed in smart cards in passports and identity cards, in secured microcontrollers, and in hardware security modules (HSMs). In particular, we demonstrate an implementation of a variant of the Module-LWE-based Kyber Key Encapsulation Mechanism (KEM) that is tailored for high performance on a commercially available smart card chip (SLE 78). To benefit from the RSA/ECC co-processor, we use Kronecker substitution in combination with schoolbook and Karatsuba polynomial multiplication. Moreover, we speed up symmetric operations in our Kyber variant by using the AES co-processor to implement a PRNG and a SHA-256 co-processor to realise hash functions. This allows us to execute CCA-secure Kyber768 key generation in 79.6 ms, encapsulation in 102.4 ms and decapsulation in 132.7 ms.
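
Kronecker substitution, as mentioned in this abstract, packs polynomial coefficients into one long integer so that a single big-integer multiplication yields the polynomial product. The following is a minimal sketch of the idea, not the paper's implementation: the function names `kronecker_mul` and `reduce_negacyclic`, the modulus `q`, and the assumption that coefficients lie in `[0, q)` are illustrative choices, and Python's built-in integer multiplication stands in for the co-processor.

```python
def kronecker_mul(f, g, q):
    """Multiply polynomials f and g (little-endian coefficient lists in [0, q))
    using one big-integer multiplication, returning coefficients modulo q."""
    terms = min(len(f), len(g))
    # Each product coefficient is a sum of at most `terms` values <= (q-1)**2,
    # so a slot of this many bits can never overflow into its neighbour.
    bits = (terms * (q - 1) ** 2).bit_length()

    def pack(poly):
        # Evaluate the polynomial at x = 2**bits.
        return sum(c << (bits * i) for i, c in enumerate(poly))

    product = pack(f) * pack(g)  # the single long multiplication
    mask = (1 << bits) - 1
    out = []
    for _ in range(len(f) + len(g) - 1):
        out.append((product & mask) % q)  # unpack one slot, reduce mod q
        product >>= bits
    return out


def reduce_negacyclic(h, n, q):
    """Reduce h modulo x**n + 1 and q, assuming len(h) <= 2 * n."""
    r = [0] * n
    for i, c in enumerate(h):
        if i < n:
            r[i] = (r[i] + c) % q
        else:
            r[i - n] = (r[i - n] - c) % q  # x**n is congruent to -1
    return r
```

For example, with the toy modulus `q = 17`, `kronecker_mul([1, 2, 3], [4, 5], 17)` computes the coefficients of (1 + 2x + 3x²)(4 + 5x) modulo 17. The slot width grows with both the modulus and the polynomial length, which is why splitting with schoolbook or Karatsuba steps can be needed when the multiplier has a fixed operand size.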

Large integer multiplication on massively parallel processors

[1990 Proceedings] The Third Symposium on the Frontiers of Massively Parallel Computation ◽

10.1109/fmpc.1990.89434 ◽

2002 ◽

Cited By ~ 3

Author(s):

B.S. Fagin

Keyword(s):

Parallel Processors ◽

Massively Parallel ◽

Large Integer ◽

Integer Multiplication

The optimal split method of large integer multiplication for smart low-end devices on P2P ubiquitous networks

Peer-to-Peer Networking and Applications ◽

10.1007/s12083-012-0189-8 ◽

2012 ◽

Vol 7 (4) ◽

pp. 655-664

Author(s):

Ren-Junn Hwang ◽

Loang-Shing Huang

Keyword(s):

Large Integer ◽

Split Method ◽

Ubiquitous Networks ◽

Integer Multiplication

Karatsuba-ZOT Multiplication Algorithm and its Application in Cryptography

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.241-244.2417 ◽

2012 ◽

Vol 241-244 ◽

pp. 2417-2423 ◽

Cited By ~ 1

Author(s):

Shahram Jahani ◽

Azman Samsudin

Keyword(s):

Public Key Cryptography ◽

Public Key ◽

Large Integer ◽

Compact Representation ◽

Number Representation ◽

The Public ◽

Public Key Cryptosystems ◽

Multiplication Algorithm ◽

Multiplication Operation ◽

Integer Multiplication

Number-theory-based cryptographic algorithms are the most commonly used public-key cryptosystems. One of the fundamental arithmetic operations in such systems is large integer multiplication, and the efficiency of these cryptosystems is directly related to the efficiency of this operation. The classical multiplication algorithm, the Karatsuba multiplication algorithm, and their hybrid are among the most popular multiplication algorithms used for this purpose. In this paper, we propose a hybrid of Karatsuba and a classical-based multiplication algorithm, enhanced by a new number representation system. The new number representation, known as "Big-Digits", is used to carry out the sub-multiplication operations in the new multiplication algorithm. Big-Digits has a compact representation with lower Hamming weight. As a result, the number of sub-multiplication operations in the multiplication algorithm based on the Big-Digits representation is significantly reduced. Our results show that the proposed multiplication algorithm is significantly faster than the classical, Karatsuba, and hybrid Karatsuba-classical multiplication algorithms within the implementation domain of public-key cryptography.
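
The baseline Karatsuba-classical hybrid that this abstract compares against can be sketched as follows. This does not implement the paper's Big-Digits representation, whose details are not given here. The 16-bit limb size, the 64-bit threshold, and the names `classical_mul` and `karatsuba` are assumptions for illustration, and the sketch handles non-negative integers only.

```python
LIMB_BITS = 16  # assumed limb size for the classical base case


def classical_mul(x, y):
    """Schoolbook multiplication: scan y one limb at a time and accumulate."""
    mask = (1 << LIMB_BITS) - 1
    result, shift = 0, 0
    while y:
        result += (x * (y & mask)) << shift  # partial product for this limb
        y >>= LIMB_BITS
        shift += LIMB_BITS
    return result


def karatsuba(x, y, threshold_bits=64):
    """Hybrid multiply: Karatsuba recursion with a classical base case."""
    if x.bit_length() <= threshold_bits or y.bit_length() <= threshold_bits:
        return classical_mul(x, y)
    half = max(x.bit_length(), y.bit_length()) // 2
    mask = (1 << half) - 1
    x_hi, x_lo = x >> half, x & mask
    y_hi, y_lo = y >> half, y & mask
    z0 = karatsuba(x_lo, y_lo, threshold_bits)  # low halves
    z2 = karatsuba(x_hi, y_hi, threshold_bits)  # high halves
    # One extra multiplication replaces the two cross products.
    z1 = karatsuba(x_lo + x_hi, y_lo + y_hi, threshold_bits) - z0 - z2
    return (z2 << (2 * half)) + (z1 << half) + z0
```

The threshold controls where the recursion hands over to the classical routine. In practice it is tuned per platform, since Karatsuba's extra additions only pay off above a certain operand size.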
