An efficient and light weight polynomial multiplication for ideal lattice-based cryptography

Author(s):  
Vijay Kumar Yadav ◽  
Shekhar Verma ◽  
S. Venkatesan

Author(s):  
Martin R. Albrecht ◽  
Christian Hanser ◽  
Andrea Hoeller ◽  
Thomas Pöppelmann ◽  
Fernando Virdia ◽  
...  

We repurpose existing RSA/ECC co-processors for (ideal) lattice-based cryptography by exploiting the availability of fast long integer multiplication. Such co-processors are deployed in smart cards in passports and identity cards, secured microcontrollers and hardware security modules (HSM). In particular, we demonstrate an implementation of a variant of the Module-LWE-based Kyber Key Encapsulation Mechanism (KEM) that is tailored for high performance on a commercially available smart card chip (SLE 78). To benefit from the RSA/ECC co-processor, we use Kronecker substitution in combination with schoolbook and Karatsuba polynomial multiplication. Moreover, we speed up symmetric operations in our Kyber variant by using the AES co-processor to implement a PRNG and a SHA-256 co-processor to realise hash functions. This allows us to execute CCA-secure Kyber768 key generation in 79.6 ms, encapsulation in 102.4 ms and decapsulation in 132.7 ms.
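The Kronecker substitution named in this abstract can be illustrated with a minimal Python sketch. It is not the authors' SLE 78 implementation: the function name `kronecker_mul`, the toy parameters, and the reduction modulo x^n + 1 are assumptions chosen for illustration. The idea is to pack the coefficients of each polynomial into one large integer, with enough padding bits per slot that products cannot overlap, perform a single long-integer multiplication (the operation an RSA/ECC co-processor accelerates), and then unpack the result.

```python
def kronecker_mul(a, b, q):
    """Multiply a and b in Z_q[x]/(x^n + 1) via one big-integer product.

    a, b: coefficient lists of equal length n, entries in [0, q).
    Illustrative sketch only; names and ring choice are assumptions.
    """
    n = len(a)
    assert len(b) == n
    # Every coefficient of the unreduced product is at most n * (q - 1)^2,
    # so k bits per slot are enough to prevent carries between slots.
    k = (n * (q - 1) ** 2).bit_length()

    # Evaluate both polynomials at x = 2^k (pack coefficients into integers).
    big_a = sum(c << (k * i) for i, c in enumerate(a))
    big_b = sum(c << (k * i) for i, c in enumerate(b))

    # The single long-integer multiplication.
    big_c = big_a * big_b

    # Unpack the 2n - 1 product coefficients from their k-bit slots.
    mask = (1 << k) - 1
    prod = [(big_c >> (k * i)) & mask for i in range(2 * n - 1)]

    # Reduce modulo x^n + 1 (x^n = -1) and modulo q.
    res = [0] * n
    for i, c in enumerate(prod):
        if i < n:
            res[i] = (res[i] + c) % q
        else:
            res[i - n] = (res[i - n] - c) % q
    return res


print(kronecker_mul([1, 2], [3, 4], 17))  # [12, 10]
```

Here (1 + 2x)(3 + 4x) = 3 + 10x + 8x^2, and replacing x^2 by -1 gives -5 + 10x, which is 12 + 10x modulo 17. In practice the packed integers may exceed the co-processor's operand size, which is one reason to combine the substitution with schoolbook or Karatsuba splitting, as the abstract describes.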


Author(s):  
Hanno Becker ◽  
Jose Maria Bermudo Mera ◽  
Angshuman Karmakar ◽  
Joseph Yiu ◽  
Ingrid Verbauwhede

High-degree, low-precision polynomial arithmetic is a fundamental computational primitive underlying structured lattice-based cryptography. Its algorithmic properties and suitability for implementation on different compute platforms are an active area of research, and this article contributes to this line of work: Firstly, we present memory-efficiency and performance improvements for the Toom-Cook/Karatsuba polynomial multiplication strategy. Secondly, we provide implementations of those improvements on the Arm® Cortex®-M4 CPU, as well as on the newer Cortex-M55 processor, the first M-profile core implementing the M-profile Vector Extension (MVE), also known as Arm® Helium™ technology. We also implement the Number Theoretic Transform (NTT) on the Cortex-M55 processor. We show that, despite the Cortex-M55 being single-issue, in-order and offering only 8 vector registers compared to 32 on A-profile SIMD architectures like Arm® Neon™ technology and the Scalable Vector Extension (SVE), careful register management and instruction scheduling yield a 3× to 5× performance improvement over already highly optimized implementations on Cortex-M4, while maintaining the low area and energy profile necessary for use in the embedded market. Finally, as a real-world application, we integrate our multiplication techniques into the post-quantum key-encapsulation mechanism Saber.
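For readers unfamiliar with the Karatsuba strategy mentioned in this abstract, the following is a minimal recursive sketch in Python. It shows only the textbook algorithm, not the memory-efficient variants or the Cortex-M4/M55 code the article describes; the function names and the `threshold` parameter are hypothetical.

```python
def schoolbook(a, b):
    """Plain O(n^2) polynomial product over the integers."""
    res = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            res[i + j] += x * y
    return res


def karatsuba(a, b, threshold=4):
    """Product of two equal-length coefficient lists using Karatsuba.

    Falls back to schoolbook for short or odd-length inputs.
    """
    n = len(a)
    assert len(b) == n
    if n <= threshold or n % 2:
        return schoolbook(a, b)

    h = n // 2
    a0, a1 = a[:h], a[h:]  # a = a0 + x^h * a1
    b0, b1 = b[:h], b[h:]  # b = b0 + x^h * b1

    # Three half-size products instead of four.
    z0 = karatsuba(a0, b0, threshold)
    z2 = karatsuba(a1, b1, threshold)
    zm = karatsuba([x + y for x, y in zip(a0, a1)],
                   [x + y for x, y in zip(b0, b1)], threshold)

    # a*b = z0 + x^h * (zm - z0 - z2) + x^(2h) * z2
    res = [0] * (2 * n - 1)
    for i in range(2 * h - 1):
        res[i] += z0[i]
        res[i + h] += zm[i] - z0[i] - z2[i]
        res[i + 2 * h] += z2[i]
    return res
```

The result is the unreduced product of length 2n - 1; reduction modulo the ring polynomial and the coefficient modulus would follow as a separate step.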


2017 ◽  
Vol 16 (4) ◽  
pp. 1-24 ◽  
Author(s):  
Zhe Liu ◽  
Thomas Pöppelmann ◽  
Tobias Oder ◽  
Hwajeong Seo ◽  
Sujoy Sinha Roy ◽  
...  

Author(s):  
Sedat Akleylek ◽  
Zaliha Yuce Tok

This chapter discusses computational aspects of lattice-based cryptographic schemes, with a focus on NTRU, in terms of their time complexity on a graphics processing unit (GPU). Polynomial multiplication algorithms, which play a very important role in lattice-based cryptographic schemes, are implemented on the GPU using the Compute Unified Device Architecture (CUDA) platform, both serially and in parallel. Compact and efficient implementation architectures of polynomial multiplication for lattice-based cryptographic schemes are presented for both quotient rings Z_p[x]/(x^n - 1) and Z_p[x]/(x^n + 1), where p is a prime number. Using these implementations, the NTRUEncrypt and signature schemes working over Z_p[x]/(x^n + 1) are then implemented on the GPU using the CUDA platform. Implementation details are also discussed.
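The difference between the two quotient rings named above can be made concrete with a short serial sketch in Python. This is not the chapter's CUDA code; the function name `ring_mul` and its `negacyclic` flag are assumptions for illustration. In Z_p[x]/(x^n - 1) a term that overflows degree n - 1 wraps around unchanged (x^n = 1), whereas in Z_p[x]/(x^n + 1) it wraps around with a sign flip (x^n = -1).

```python
def ring_mul(a, b, p, negacyclic):
    """Schoolbook product in Z_p[x]/(x^n - 1) or Z_p[x]/(x^n + 1).

    negacyclic=False reduces modulo x^n - 1 (cyclic convolution);
    negacyclic=True  reduces modulo x^n + 1 (negacyclic convolution).
    """
    n = len(a)
    assert len(b) == n
    res = [0] * n
    for i in range(n):
        for j in range(n):
            term = a[i] * b[j]
            k = i + j
            if k >= n:
                k -= n  # wrap x^(i+j) back to x^(i+j-n)
                if negacyclic:
                    term = -term  # x^n = -1 in Z_p[x]/(x^n + 1)
            res[k] = (res[k] + term) % p
    return res


print(ring_mul([1, 2], [3, 4], 17, negacyclic=False))  # [11, 10]
print(ring_mul([1, 2], [3, 4], 17, negacyclic=True))   # [12, 10]
```

Each output coefficient depends only on its own index k, so the computation for different k is independent, which is what makes this kind of loop a natural candidate for one-thread-per-coefficient parallelization on a GPU.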

