On the Efficiency of Polynomial Multiplication for Lattice-Based Cryptography on GPUs Using CUDA

Polynomial multiplication on embedded vector architectures

IACR Transactions on Cryptographic Hardware and Embedded Systems

10.46586/tches.v2022.i1.482-505

2021

pp. 482-505

Author(s):

Hanno Becker

Jose Maria Bermudo Mera

Angshuman Karmakar

Joseph Yiu

Ingrid Verbauwhede

Keyword(s):

Instruction Scheduling

Polynomial Multiplication

Performance Improvements

Low Area

Memory Efficiency

Key Encapsulation Mechanism

Profile Vector

And Performance

Lattice Based Cryptography

High Degree

High-degree, low-precision polynomial arithmetic is a fundamental computational primitive underlying structured lattice-based cryptography. Its algorithmic properties and its suitability for implementation on different compute platforms are an active area of research, and this article contributes to this line of work. First, we present memory-efficiency and performance improvements for the Toom-Cook/Karatsuba polynomial multiplication strategy. Second, we provide implementations of those improvements on the Arm® Cortex®-M4 CPU, as well as on the newer Cortex-M55 processor, the first M-profile core implementing the M-profile Vector Extension (MVE), also known as Arm® Helium™ technology. We also implement the Number Theoretic Transform (NTT) on the Cortex-M55 processor. The Cortex-M55 is single-issue and in-order, and it offers only 8 vector registers compared to 32 on A-profile SIMD architectures like Arm® Neon™ technology and the Scalable Vector Extension (SVE). We show that, despite these constraints, careful register management and instruction scheduling yield a 3× to 5× performance improvement over already highly optimized implementations on Cortex-M4, while maintaining the low area and energy profile necessary for use in the embedded market. Finally, as a real-world application, we integrate our multiplication techniques into the post-quantum key-encapsulation mechanism Saber.
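The Toom-Cook/Karatsuba strategy named above replaces four half-size products with three. The sketch below is a minimal recursive Karatsuba over plain coefficient lists. It is not the authors' Cortex-M4/M55 code. The cutoff parameter `threshold` and the fallback to schoolbook for small or odd lengths are illustrative assumptions.

```python
def schoolbook(a, b):
    """Full (unreduced) product of two coefficient lists, lowest degree first."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def karatsuba(a, b, threshold=8):
    """Recursive Karatsuba product of two equal-length coefficient lists.

    Splits a = a0 + x^h*a1 and b = b0 + x^h*b1, then uses three half-size
    products a0*b0, a1*b1 and (a0+a1)*(b0+b1) instead of four.
    `threshold` (hypothetical tuning knob) switches to schoolbook for small inputs.
    """
    n = len(a)
    assert n == len(b)
    if n <= threshold or n % 2:
        return schoolbook(a, b)
    h = n // 2
    a0, a1 = a[:h], a[h:]
    b0, b1 = b[:h], b[h:]
    lo = karatsuba(a0, b0, threshold)
    hi = karatsuba(a1, b1, threshold)
    mid = karatsuba([x + y for x, y in zip(a0, a1)],
                    [x + y for x, y in zip(b0, b1)], threshold)
    # (a0+a1)(b0+b1) - a0*b0 - a1*b1 = a0*b1 + a1*b0
    mid = [m - l - u for m, l, u in zip(mid, lo, hi)]
    # Recombine: lo + x^h * mid + x^(2h) * hi
    out = [0] * (2 * n - 1)
    for i, c in enumerate(lo):
        out[i] += c
    for i, c in enumerate(mid):
        out[i + h] += c
    for i, c in enumerate(hi):
        out[i + 2 * h] += c
    return out


if __name__ == "__main__":
    print(karatsuba([1, 2], [3, 4], threshold=1))  # [3, 10, 8]
```

Coefficients are kept as exact integers here. A scheme such as Saber would additionally reduce them modulo its power-of-two modulus and reduce the product modulo the ring polynomial. Both reductions are linear, so they can be applied after the product.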

Towards efficient polynomial multiplication for lattice-based cryptography

2016 IEEE International Symposium on Circuits and Systems (ISCAS)

10.1109/iscas.2016.7527456

2016

Cited By ~ 5

Author(s):

Chaohui Du

Guoqiang Bai

Keyword(s):

Polynomial Multiplication

Lattice Based Cryptography

Implementing RLWE-based Schemes Using an RSA Co-Processor

IACR Transactions on Cryptographic Hardware and Embedded Systems

10.46586/tches.v2019.i1.169-208

2018

pp. 169-208

Author(s):

Martin R. Albrecht

Christian Hanser

Andrea Hoeller

Thomas Pöppelmann

Fernando Virdia

...

Keyword(s):

High Performance

Key Generation

Ideal Lattice

Polynomial Multiplication

Key Encapsulation Mechanism

Speed Up

Identity Cards

Integer Multiplication

Lattice Based Cryptography

Hardware Security Modules

We repurpose existing RSA/ECC co-processors for (ideal) lattice-based cryptography by exploiting the availability of fast long-integer multiplication. Such co-processors are deployed in smart cards in passports and identity cards, secured microcontrollers and hardware security modules (HSMs). In particular, we demonstrate an implementation of a variant of the Module-LWE-based Kyber Key Encapsulation Mechanism (KEM) that is tailored for high performance on a commercially available smart card chip (SLE 78). To benefit from the RSA/ECC co-processor, we use Kronecker substitution in combination with schoolbook and Karatsuba polynomial multiplication. Moreover, we speed up the symmetric operations in our Kyber variant by using the AES co-processor to implement a PRNG and a SHA-256 co-processor to realise hash functions. This allows us to execute CCA-secure Kyber768 key generation in 79.6 ms, encapsulation in 102.4 ms and decapsulation in 132.7 ms.
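Kronecker substitution is what lets a long-integer multiplier compute a polynomial product. Each polynomial is evaluated at x = 2^k, the two resulting integers are multiplied once, and the product coefficients are read back in k-bit slots. The sketch below assumes non-negative coefficients in [0, q). Python's arbitrary-precision `int` multiplication stands in for the RSA/ECC co-processor. The function name `kronecker_mul` is hypothetical. Signed coefficients, the co-processor's operand size and the combination with schoolbook/Karatsuba described in the abstract are not modelled.

```python
def kronecker_mul(a, b, q):
    """Multiply two polynomials with coefficients in [0, q) using a single
    big-integer multiplication (Kronecker substitution), then reduce mod q."""
    # With n the longer input length, every exact product coefficient is at
    # most n*(q-1)^2, so a k-bit slot with 2^k > n*(q-1)^2 never carries over.
    n = max(len(a), len(b))
    k = max(1, (n * (q - 1) ** 2).bit_length())

    def pack(p):
        # Evaluate p at x = 2^k: coefficient i occupies bits [k*i, k*(i+1)).
        v = 0
        for c in reversed(p):
            v = (v << k) + c
        return v

    # One long-integer product (the co-processor's job in the real setting).
    prod = pack(a) * pack(b)

    # Unpack k bits at a time, lowest-degree coefficient first.
    mask = (1 << k) - 1
    out = []
    for _ in range(len(a) + len(b) - 1):
        out.append((prod & mask) % q)
        prod >>= k
    return out
```

For example, `kronecker_mul([1, 2], [3, 4], 17)` packs the inputs into 2049 and 4099 (k = 10). Their product 8398851 unpacks to `[3, 10, 8]`. The slot width grows with both the length and q, which is why the real implementation's slot sizing depends on the co-processor's operand size.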

Sparse polynomial multiplication for lattice-based cryptography with small complexity

The Journal of Supercomputing

10.1007/s11227-015-1570-1

2015

Vol 72 (2)

pp. 438-450

Cited By ~ 2

Author(s):

Sedat Akleylek

Erdem Alkım

Zaliha Yüce Tok

Keyword(s):

Polynomial Multiplication

Sparse Polynomial

Lattice Based Cryptography

Computational Aspects of Lattice-Based Cryptography on Graphical Processing Unit

Improving Information Security Practices through Computational Intelligence - Advances in Information Security, Privacy, and Ethics

10.4018/978-1-4666-9426-2.ch010

2016

pp. 255-284

Cited By ~ 1

Author(s):

Sedat Akleylek

Zaliha Yüce Tok

Keyword(s):

Quotient Ring

Graphical Processing Unit

Processing Unit

Compute Unified Device Architecture

Signature Scheme

Polynomial Multiplication

Device Architecture

Computational Aspects

Graphical Processing

Lattice Based Cryptography

In this chapter, the aim is to discuss computational aspects of lattice-based cryptographic schemes, with a focus on NTRU, in terms of time complexity on a graphical processing unit (GPU). Polynomial multiplication algorithms, which play a central role in lattice-based cryptographic schemes, are implemented on the GPU using the Compute Unified Device Architecture (CUDA) platform, in both serial and parallel form. Compact and efficient implementation architectures of polynomial multiplication for lattice-based cryptographic schemes are presented for both quotient rings $\mathbb{Z}_p[x]/(x^n - 1)$ and $\mathbb{Z}_p[x]/(x^n + 1)$, where $p$ is a prime. These implementations are then used to implement NTRUEncrypt and a signature scheme working over $\mathbb{Z}_p[x]/(x^n + 1)$ on the GPU using the CUDA platform. Implementation details are also discussed.
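The two quotient rings differ only in how terms of degree n or higher wrap around. In $\mathbb{Z}_p[x]/(x^n - 1)$ the relation x^n = 1 holds, so these terms wrap with a plus sign (cyclic convolution). In $\mathbb{Z}_p[x]/(x^n + 1)$ the relation x^n = -1 holds, so they wrap with a minus sign (negacyclic convolution). The sketch below is a schoolbook product written output-first. Each result coefficient is computed independently from the inputs, which is the property a one-thread-per-coefficient CUDA kernel could exploit. That thread mapping and the function name `ring_mul` are assumptions for illustration, not the chapter's design.

```python
def ring_mul(a, b, p, negacyclic=True):
    """Schoolbook product in Z_p[x]/(x^n + 1) (negacyclic=True)
    or Z_p[x]/(x^n - 1) (negacyclic=False)."""
    n = len(a)
    assert len(b) == n

    def coeff(k):
        # Coefficient k depends only on a and b, so every k could be handled
        # by its own GPU thread; here we simply loop over k.
        s = 0
        for i in range(n):
            j = k - i
            if j >= 0:
                s += a[i] * b[j]  # i + j = k: no wrap-around
            else:
                # i + (j + n) = k + n: wrapped term, x^n = -1 or x^n = +1
                term = a[i] * b[j + n]
                s += -term if negacyclic else term
        return s % p

    return [coeff(k) for k in range(n)]
```

With n = 2, (1 + 2x)(3 + 4x) = 3 + 10x + 8x^2. Modulo 17 this reduces to `[12, 10]` in the negacyclic ring (3 - 8 = -5 ≡ 12) and to `[11, 10]` in the cyclic ring (3 + 8 = 11).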

An efficient and light weight polynomial multiplication for ideal lattice-based cryptography

Multimedia Tools and Applications

10.1007/s11042-020-09706-8

2020

Author(s):

Vijay Kumar Yadav

Shekhar Verma

S. Venkatesan

Keyword(s):

Light Weight

Ideal Lattice

Polynomial Multiplication

Lattice Based Cryptography

Time-memory trade-off in Toom-Cook multiplication: an application to module-lattice based cryptography

IACR Transactions on Cryptographic Hardware and Embedded Systems

10.46586/tches.v2020.i2.222-244

2020

pp. 222-244

Author(s):

Jose Maria Bermudo Mera

Angshuman Karmakar

Ingrid Verbauwhede

Keyword(s):

Linear Transformations

Polynomial Multiplication

Matrix Vector Multiplication

Long Time

Key Encapsulation Mechanism

Learning With Errors

Lattice Based Cryptography

Learning With Errors Problem

Matrix Vector

Processing Steps

Since the introduction of the ring-learning with errors problem, the number theoretic transform (NTT) based polynomial multiplication algorithm has been studied extensively. Owing to its quasilinear time complexity, it has been the preferred choice of cryptographers for realizing ring-learning with errors cryptographic schemes. Compared to NTT, Toom-Cook or Karatsuba based polynomial multiplication algorithms, though known for a long time, still have a fledgling presence in the context of post-quantum cryptography. In this work, we observe that the pre- and post-processing steps in Toom-Cook based multiplications can be expressed as linear transformations. Based on this observation, we propose two novel techniques that can increase the efficiency of Toom-Cook based polynomial multiplications. The first, which we call precomputation, reduces evaluation by a factor of 2. The second, which we call lazy interpolation, reduces interpolation from quadratic to linear. As a practical application, we apply our algorithms to the Saber post-quantum key-encapsulation mechanism and discuss in detail the various implementation aspects of doing so. We show that our algorithm can improve the efficiency of the computationally costly matrix-vector multiplication by 12–37% compared to previous methods on their respective platforms. We also propose different methods to reduce the memory footprint of Saber on Cortex-M4 microcontrollers. Our implementation reduces memory usage by 2.6 to 5.7 KB with respect to the smallest implementation in the literature.
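Both techniques rest on linearity. Evaluating a fixed operand once and reusing it avoids repeated evaluations. Because interpolation is linear, the evaluated products of a whole row of a matrix-vector product can be summed first and interpolated once. The sketch below is our reading of these ideas, not the authors' implementation. It uses a one-level Karatsuba (Toom-2, evaluation at 0, 1 and ∞) in place of the paper's Toom-Cook variants. Each entry of s is evaluated once and reused by every row. Each row is interpolated once instead of once per product. The modulus `Q`, the function names and the omission of reduction modulo the ring polynomial are illustrative assumptions.

```python
Q = 2 ** 13  # illustrative power-of-two modulus (assumption)


def schoolbook(a, b):
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def evaluate(p):
    """Toom-2 (Karatsuba) evaluation at 0, 1, infinity: a linear map."""
    h = len(p) // 2
    p0, p1 = p[:h], p[h:]
    return (p0, [x + y for x, y in zip(p0, p1)], p1)


def pointwise(ea, eb):
    """Multiply corresponding evaluated halves (schoolbook on each)."""
    return tuple(schoolbook(x, y) for x, y in zip(ea, eb))


def add_eval(acc, e):
    """Accumulate in the evaluated domain, component by component."""
    return tuple([u + v for u, v in zip(x, y)] for x, y in zip(acc, e))


def interpolate(e, n):
    """Toom-2 (Karatsuba) interpolation, also linear; result reduced mod Q."""
    lo, mid, hi = e
    h = n // 2
    mid = [m - l - u for m, l, u in zip(mid, lo, hi)]
    out = [0] * (2 * n - 1)
    for i, c in enumerate(lo):
        out[i] += c
    for i, c in enumerate(mid):
        out[i + h] += c
    for i, c in enumerate(hi):
        out[i + 2 * h] += c
    return [c % Q for c in out]


def matvec_lazy(A, s):
    """Row i of the result is sum_j A[i][j] * s[j] (unreduced polynomial products, mod Q)."""
    n = len(s[0])
    assert n % 2 == 0
    es = [evaluate(sj) for sj in s]  # evaluate s once, reuse for every row
    result = []
    for row in A:
        acc = None
        for aij, esj in zip(row, es):
            prod = pointwise(evaluate(aij), esj)
            acc = prod if acc is None else add_eval(acc, prod)
        result.append(interpolate(acc, n))  # one interpolation per row
    return result
```

For an l × l matrix this performs l interpolations instead of l^2. It evaluates s l times instead of l^2 times. In a real Saber-style implementation, the accumulated sums must still fit the chosen word size before interpolation, which this Python sketch does not have to worry about.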

Optimized Schoolbook Polynomial Multiplication for Compact Lattice-Based Cryptography on FPGA

IEEE Transactions on Very Large Scale Integration (VLSI) Systems

10.1109/tvlsi.2019.2922999

2019

Vol 27 (10)

pp. 2459-2463

Cited By ~ 8

Author(s):

Weiqiang Liu

Sailong Fan

Ayesha Khalid

Ciara Rafferty

Maire O'Neill

Keyword(s):

Polynomial Multiplication

Lattice Based Cryptography

Compact Lattice

Efficient Three-Way Split Formulas for Binary Polynomial Multiplication and Toeplitz Matrix Vector Product

IEICE Transactions on Fundamentals of Electronics Communications and Computer Sciences

10.1587/transfun.e101.a.239

2018

Vol E101.A (1)

pp. 239-248

Author(s):

Sun-Mi Park

Ku-Young Chang

Dowon Hong

Changho Seo

Keyword(s):

Toeplitz Matrix

Vector Product

Polynomial Multiplication

Matrix Vector

A High Speed NTT Accelerator for Lattice-Based Cryptography

2021 International Conference on Communications, Information System and Computer Engineering (CISCE)

10.1109/cisce52179.2021.9445982

2021

Author(s):

Chongyang Li

Wenping Zhu

Leibo Liu

Keyword(s):

High Speed

Lattice Based Cryptography
