An efficient and light weight polynomial multiplication for ideal lattice-based cryptography

Implementing RLWE-based Schemes Using an RSA Co-Processor

IACR Transactions on Cryptographic Hardware and Embedded Systems ◽

10.46586/tches.v2019.i1.169-208 ◽

2018 ◽

pp. 169-208

Author(s):

Martin R. Albrecht ◽

Christian Hanser ◽

Andrea Hoeller ◽

Thomas Pöppelmann ◽

Fernando Virdia ◽

...

Keyword(s):

High Performance ◽

Key Generation ◽

Ideal Lattice ◽

Polynomial Multiplication ◽

Key Encapsulation Mechanism ◽

Speed Up ◽

Identity Cards ◽

Integer Multiplication ◽

Lattice Based Cryptography ◽

Hardware Security Modules

We repurpose existing RSA/ECC co-processors for (ideal) lattice-based cryptography by exploiting the availability of fast long integer multiplication. Such co-processors are deployed in smart cards in passports and identity cards, secured microcontrollers, and hardware security modules (HSMs). In particular, we demonstrate an implementation of a variant of the Module-LWE-based Kyber Key Encapsulation Mechanism (KEM) that is tailored for high performance on a commercially available smart card chip (SLE 78). To benefit from the RSA/ECC co-processor, we use Kronecker substitution in combination with schoolbook and Karatsuba polynomial multiplication. Moreover, we speed up symmetric operations in our Kyber variant by using the AES co-processor to implement a PRNG and a SHA-256 co-processor to realise hash functions. This allows us to execute CCA-secure Kyber768 key generation in 79.6 ms, encapsulation in 102.4 ms and decapsulation in 132.7 ms.
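The Kronecker substitution mentioned in this abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names `required_bits` and `kronecker_multiply` are hypothetical, coefficients are assumed to be non-negative integers, and Python's built-in big-integer product stands in for the long integer multiplier of an RSA/ECC co-processor.

```python
def required_bits(a, b):
    """Slot width (in bits) that holds any coefficient of a*b without overflow.

    Assumes non-negative coefficients; each product coefficient is a sum of at
    most min(len(a), len(b)) terms, each at most max(a) * max(b).
    """
    bound = min(len(a), len(b)) * max(a) * max(b)
    return max(1, bound.bit_length())


def kronecker_multiply(a, b, bits):
    """Multiply polynomials a and b (lowest degree first) via one integer product."""
    # Pack: evaluate each polynomial at x = 2**bits, giving one large integer.
    big_a = sum(c << (bits * i) for i, c in enumerate(a))
    big_b = sum(c << (bits * i) for i, c in enumerate(b))

    # A single long integer multiplication replaces the polynomial product.
    big_c = big_a * big_b

    # Unpack: each bits-wide slot of the product holds one result coefficient.
    mask = (1 << bits) - 1
    n_out = len(a) + len(b) - 1
    return [(big_c >> (bits * i)) & mask for i in range(n_out)]


a = [1, 2, 3]          # 1 + 2x + 3x^2
b = [4, 5, 6]          # 4 + 5x + 6x^2
bits = required_bits(a, b)
product = kronecker_multiply(a, b, bits)
```

The slot width is the key design choice: if it is too small, neighbouring coefficients overlap in the integer product and unpacking returns wrong values, while a larger width makes the integers (and the multiplication) longer than necessary.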


Polynomial multiplication on embedded vector architectures

IACR Transactions on Cryptographic Hardware and Embedded Systems ◽

10.46586/tches.v2022.i1.482-505 ◽

2021 ◽

pp. 482-505

Author(s):

Hanno Becker ◽

Jose Maria Bermudo Mera ◽

Angshuman Karmakar ◽

Joseph Yiu ◽

Ingrid Verbauwhede

Keyword(s):

Instruction Scheduling ◽

Polynomial Multiplication ◽

Performance Improvements ◽

Low Area ◽

Memory Efficiency ◽

Key Encapsulation Mechanism ◽

Profile Vector ◽

And Performance ◽

Lattice Based Cryptography ◽

High Degree

High-degree, low-precision polynomial arithmetic is a fundamental computational primitive underlying structured lattice-based cryptography. Its algorithmic properties and suitability for implementation on different compute platforms are an active area of research, and this article contributes to this line of work. Firstly, we present memory-efficiency and performance improvements for the Toom-Cook/Karatsuba polynomial multiplication strategy. Secondly, we provide implementations of those improvements on the Arm® Cortex®-M4 CPU, as well as on the newer Cortex-M55 processor, the first M-profile core implementing the M-profile Vector Extension (MVE), also known as Arm® Helium™ technology. We also implement the Number Theoretic Transform (NTT) on the Cortex-M55 processor. We show that, despite the core being single-issue and in-order and offering only 8 vector registers compared to 32 on A-profile SIMD architectures like Arm® Neon™ technology and the Scalable Vector Extension (SVE), careful register management and instruction scheduling yield a 3× to 5× performance improvement over already highly optimized implementations on Cortex-M4, while maintaining the low area and energy profile necessary for use in the embedded market. Finally, as a real-world application, we integrate our multiplication techniques into the post-quantum key-encapsulation mechanism Saber.
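For readers unfamiliar with the Karatsuba strategy named in this abstract, the following is a minimal recursive sketch in Python. It is a textbook version, not the memory-optimized or vectorized variant the article describes; the function names and the `threshold` parameter are illustrative assumptions, and inputs are assumed to have equal, power-of-two length.

```python
def schoolbook(a, b):
    """Quadratic-time polynomial product, coefficients lowest degree first."""
    res = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            res[i + j] += x * y
    return res


def karatsuba(a, b, threshold=4):
    """Karatsuba product of two equal-length polynomials (length a power of two)."""
    n = len(a)
    if n != len(b) or n & (n - 1) != 0:
        raise ValueError("inputs must have equal power-of-two length")
    if n <= threshold:
        return schoolbook(a, b)  # small inputs: fall back to schoolbook

    h = n // 2
    a0, a1 = a[:h], a[h:]        # a = a0 + x^h * a1
    b0, b1 = b[:h], b[h:]        # b = b0 + x^h * b1

    z0 = karatsuba(a0, b0, threshold)                  # low product
    z2 = karatsuba(a1, b1, threshold)                  # high product
    sa = [x + y for x, y in zip(a0, a1)]
    sb = [x + y for x, y in zip(b0, b1)]
    zm = karatsuba(sa, sb, threshold)                  # (a0+a1)*(b0+b1)

    # Recombine: a*b = z0 + x^h * (zm - z0 - z2) + x^(2h) * z2
    res = [0] * (2 * n - 1)
    for i in range(2 * h - 1):
        res[i] += z0[i]
        res[i + h] += zm[i] - z0[i] - z2[i]
        res[i + 2 * h] += z2[i]
    return res


a = list(range(1, 17))
b = list(range(16, 0, -1))
result = karatsuba(a, b)
```

Each level replaces four half-size products with three, at the cost of extra additions; the threshold at which to switch back to schoolbook multiplication is platform dependent and would need tuning on a real target.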


Towards efficient polynomial multiplication for lattice-based cryptography

2016 IEEE International Symposium on Circuits and Systems (ISCAS) ◽

10.1109/iscas.2016.7527456 ◽

2016 ◽

Cited By ~ 5

Author(s):

Chaohui Du ◽

Guoqiang Bai

Keyword(s):

Polynomial Multiplication ◽

Lattice Based Cryptography


High-Performance Ideal Lattice-Based Cryptography on 8-Bit AVR Microcontrollers

ACM Transactions on Embedded Computing Systems ◽

10.1145/3092951 ◽

2017 ◽

Vol 16 (4) ◽

pp. 1-24 ◽

Cited By ~ 7

Author(s):

Zhe Liu ◽

Thomas Pöppelmann ◽

Tobias Oder ◽

Hwajeong Seo ◽

Sujoy Sinha Roy ◽

...

Keyword(s):

High Performance ◽

Ideal Lattice ◽

Lattice Based Cryptography


Speeding up the Number Theoretic Transform for Faster Ideal Lattice-Based Cryptography

Cryptology and Network Security - Lecture Notes in Computer Science ◽

10.1007/978-3-319-48965-0_8 ◽

2016 ◽

pp. 124-139 ◽

Cited By ~ 31

Author(s):

Patrick Longa ◽

Michael Naehrig

Keyword(s):

Ideal Lattice ◽

Lattice Based Cryptography


Sparse polynomial multiplication for lattice-based cryptography with small complexity

The Journal of Supercomputing ◽

10.1007/s11227-015-1570-1 ◽

2015 ◽

Vol 72 (2) ◽

pp. 438-450 ◽

Cited By ~ 2

Author(s):

Sedat Akleylek ◽

Erdem Alkım ◽

Zaliha Yüce Tok

Keyword(s):

Polynomial Multiplication ◽

Sparse Polynomial ◽

Lattice Based Cryptography


Computational Aspects of Lattice-Based Cryptography on Graphical Processing Unit

Improving Information Security Practices through Computational Intelligence - Advances in Information Security, Privacy, and Ethics ◽

10.4018/978-1-4666-9426-2.ch010 ◽

2016 ◽

pp. 255-284 ◽

Cited By ~ 1

Author(s):

Sedat Akleylek ◽

Zaliha Yuce Tok

Keyword(s):

Quotient Ring ◽

Graphical Processing Unit ◽

Processing Unit ◽

Compute Unified Device Architecture ◽

Signature Scheme ◽

Polynomial Multiplication ◽

Device Architecture ◽

Computational Aspects ◽

Graphical Processing ◽

Lattice Based Cryptography

This chapter discusses computational aspects of lattice-based cryptographic schemes, with a focus on NTRU, in terms of their time complexity on a graphical processing unit (GPU). Polynomial multiplication algorithms, which play a very important role in lattice-based cryptographic schemes, are implemented on the GPU using the compute unified device architecture (CUDA) platform, in both serial and parallel form. Compact and efficient implementation architectures of polynomial multiplication for lattice-based cryptographic schemes are presented for both quotient rings $\mathbb{Z}_p[x]/(x^n-1)$ and $\mathbb{Z}_p[x]/(x^n+1)$, where $p$ is a prime number. Using these implementations, NTRUEncrypt and a signature scheme working over $\mathbb{Z}_p[x]/(x^n+1)$ are then implemented on the GPU using the CUDA platform. Implementation details are also discussed.
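The difference between the two quotient rings named in this abstract comes down to the sign applied when a product term wraps past degree n - 1. The serial Python sketch below is only a reference illustration under that reading, not the chapter's CUDA implementation; the function name `ring_multiply` and its `sign` parameter are hypothetical.

```python
def ring_multiply(a, b, p, sign):
    """Schoolbook product of a and b in Z_p[x]/(x^n + sign), with sign = +1 or -1.

    a and b are coefficient lists of equal length n, lowest degree first.
    In Z_p[x]/(x^n - 1), x^n wraps to +1 (cyclic convolution).
    In Z_p[x]/(x^n + 1), x^n wraps to -1 (negacyclic convolution).
    """
    if sign not in (1, -1):
        raise ValueError("sign must be +1 or -1")
    n = len(a)
    if len(b) != n:
        raise ValueError("inputs must have the same length")

    wrap = -sign  # value that x^n reduces to in the chosen ring
    res = [0] * n
    for i in range(n):
        for j in range(n):
            k = i + j
            term = a[i] * b[j]
            if k >= n:
                # x^k = x^n * x^(k-n), so the term lands at k-n with factor wrap
                res[k - n] = (res[k - n] + wrap * term) % p
            else:
                res[k] = (res[k] + term) % p
    return res


a = [1, 2, 3]
b = [4, 5, 6]
cyclic = ring_multiply(a, b, p=17, sign=-1)      # Z_17[x]/(x^3 - 1)
negacyclic = ring_multiply(a, b, p=17, sign=+1)  # Z_17[x]/(x^3 + 1)
```

The inner loop iterations are independent apart from the accumulation into `res`, which is one reason this kind of product is a natural candidate for parallel execution.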
