Implementing RLWE-based Schemes Using an RSA Co-Processor

Author(s):  
Martin R. Albrecht ◽  
Christian Hanser ◽  
Andrea Hoeller ◽  
Thomas Pöppelmann ◽  
Fernando Virdia ◽  
...  

We repurpose existing RSA/ECC co-processors for (ideal) lattice-based cryptography by exploiting the availability of fast long integer multiplication. Such co-processors are deployed in smart cards in passports and identity cards, secured microcontrollers and hardware security modules (HSMs). In particular, we demonstrate an implementation of a variant of the Module-LWE-based Kyber Key Encapsulation Mechanism (KEM) that is tailored for high performance on a commercially available smart card chip (SLE 78). To benefit from the RSA/ECC co-processor, we use Kronecker substitution in combination with schoolbook and Karatsuba polynomial multiplication. Moreover, we speed up symmetric operations in our Kyber variant by using the AES co-processor to implement a PRNG and a SHA-256 co-processor to realise hash functions. This allows us to execute CCA-secure Kyber768 key generation in 79.6 ms, encapsulation in 102.4 ms and decapsulation in 132.7 ms.
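As an illustration of the Kronecker substitution mentioned in this abstract, the following is a minimal Python sketch, not the authors' smart card implementation. It packs each polynomial into one large integer by evaluating it at a power of two, performs a single big-integer multiplication (the operation an RSA co-processor accelerates), and unpacks the product coefficients. The function names and the slot-width rule are assumptions made for this example, and coefficients are assumed to be non-negative.

```python
def schoolbook_mul(a, b):
    """Reference product of two coefficient lists (lowest degree first)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def slot_bits(a, b):
    """Slot width (in bits) large enough that product coefficients never overlap."""
    # Every product coefficient is at most min(len) * max(a) * max(b).
    bound = min(len(a), len(b)) * max(a) * max(b)
    return max(1, bound.bit_length())


def kronecker_mul(a, b):
    """Multiply polynomials with non-negative coefficients via one integer product."""
    bits = slot_bits(a, b)
    # Pack: evaluate each polynomial at x = 2**bits.
    big_a = sum(c << (i * bits) for i, c in enumerate(a))
    big_b = sum(c << (i * bits) for i, c in enumerate(b))
    big_c = big_a * big_b  # the single long multiplication
    # Unpack: each product coefficient occupies its own slot of `bits` bits.
    mask = (1 << bits) - 1
    return [(big_c >> (i * bits)) & mask for i in range(len(a) + len(b) - 1)]
```

Reduction modulo the ring polynomial and modulo the coefficient modulus (q = 3329 in standard Kyber) would follow as a separate step. On real hardware the slot width also has to fit the co-processor's operand size, which is one reason to combine the substitution with schoolbook or Karatsuba splitting.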

Author(s):  
Hanno Becker ◽  
Jose Maria Bermudo Mera ◽  
Angshuman Karmakar ◽  
Joseph Yiu ◽  
Ingrid Verbauwhede

High-degree, low-precision polynomial arithmetic is a fundamental computational primitive underlying structured lattice-based cryptography. Its algorithmic properties and suitability for implementation on different compute platforms are an active area of research, and this article contributes to this line of work: Firstly, we present memory-efficiency and performance improvements for the Toom-Cook/Karatsuba polynomial multiplication strategy. Secondly, we provide implementations of those improvements on the Arm® Cortex®-M4 CPU, as well as the newer Cortex-M55 processor, the first M-profile core implementing the M-profile Vector Extension (MVE), also known as Arm® Helium™ technology. We also implement the Number Theoretic Transform (NTT) on the Cortex-M55 processor. We show that despite being single-issue, in-order and offering only 8 vector registers compared to 32 on A-profile SIMD architectures like Arm® Neon™ technology and the Scalable Vector Extension (SVE), by careful register management and instruction scheduling, we can obtain a 3× to 5× performance improvement over already highly optimized implementations on Cortex-M4, while maintaining the low area and energy profile necessary for use in the embedded market. Finally, as a real-world application, we integrate our multiplication techniques into the post-quantum key-encapsulation mechanism Saber.
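For readers unfamiliar with the Karatsuba strategy referred to above, here is a plain Python sketch of the basic recursion. It does not reproduce the memory or vectorisation improvements described in the abstract. The function names, the `threshold` parameter and the power-of-two length restriction are assumptions of this example. The final helper reduces a product in Z_q[x]/(x^n + 1), the ring used by Saber (n = 256, q = 2^13).

```python
def schoolbook(a, b):
    """Quadratic-time product of two coefficient lists (lowest degree first)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def karatsuba(a, b, threshold=4):
    """Product of two equal-length polynomials; length must be a power of two."""
    n = len(a)
    assert n == len(b) and n & (n - 1) == 0, "equal power-of-two lengths expected"
    if n <= threshold:
        return schoolbook(a, b)
    h = n // 2
    a0, a1 = a[:h], a[h:]
    b0, b1 = b[:h], b[h:]
    lo = karatsuba(a0, b0, threshold)  # a0 * b0
    hi = karatsuba(a1, b1, threshold)  # a1 * b1
    mid = karatsuba([x + y for x, y in zip(a0, a1)],
                    [x + y for x, y in zip(b0, b1)], threshold)
    out = [0] * (2 * n - 1)
    for i in range(2 * h - 1):
        out[i] += lo[i]
        out[i + h] += mid[i] - lo[i] - hi[i]  # cross terms a0*b1 + a1*b0
        out[i + 2 * h] += hi[i]
    return out


def reduce_negacyclic(c, n, q):
    """Reduce a product modulo x^n + 1 and modulo q (x^n wraps around as -1)."""
    out = c[:n]
    for i in range(n, len(c)):
        out[i - n] -= c[i]
    return [v % q for v in out]
```

Three half-size multiplications replace four, which is where the sub-quadratic cost comes from; the recursion falls back to schoolbook multiplication below `threshold`.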


2017 ◽  
Vol 16 (4) ◽  
pp. 1-24 ◽  
Author(s):  
Zhe Liu ◽  
Thomas Pöppelmann ◽  
Tobias Oder ◽  
Hwajeong Seo ◽  
Sujoy Sinha Roy ◽  
...  

Author(s):  
Jose Maria Bermudo Mera ◽  
Angshuman Karmakar ◽  
Ingrid Verbauwhede

Since the introduction of the ring-learning with errors problem, the number theoretic transform (NTT) based polynomial multiplication algorithm has been studied extensively. Due to its faster quasilinear time complexity, it has been the preferred choice of cryptographers to realize ring-learning with errors cryptographic schemes. Compared to NTT, Toom-Cook or Karatsuba based polynomial multiplication algorithms, though being known for a long time, still have a fledgling presence in the context of post-quantum cryptography. In this work, we observe that the pre- and post-processing steps in Toom-Cook based multiplications can be expressed as linear transformations. Based on this observation we propose two novel techniques that can increase the efficiency of Toom-Cook based polynomial multiplications. Evaluation is reduced by a factor of 2, and we call this method precomputation, and interpolation is reduced from quadratic to linear, and we call this method lazy interpolation. As a practical application, we applied our algorithms to the Saber post-quantum key-encapsulation mechanism. We discuss in detail the various implementation aspects of applying our algorithms to Saber. We show that our algorithm can improve the efficiency of the computationally costly matrix-vector multiplication by 12−37% compared to previous methods on their respective platforms. Secondly, we propose different methods to reduce the memory footprint of Saber for Cortex-M4 microcontrollers. Our implementation shows between 2.6 and 5.7 KB reduction in the memory usage with respect to the smallest implementation in the literature.
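The observation that Toom-Cook evaluation and interpolation are linear maps can be made concrete with a small Python sketch. This is one reading of the abstract, not the authors' code. It uses Toom-3 over the integers with exact rational interpolation, whereas Saber works modulo a power of two and the choice of Toom variant there is not stated in the abstract. All names (`EVAL`, `VAND`, `inner_product_lazy`, and so on) are assumptions of this example. Because both steps are matrix applications, an operand can be evaluated once and reused, and the pointwise products of an inner product can be summed before a single interpolation.

```python
from fractions import Fraction

# Toom-3 with evaluation points 0, 1, -1, 2 and infinity.
EVAL = [[1, 0, 0], [1, 1, 1], [1, -1, 1], [1, 2, 4], [0, 0, 1]]
VAND = [[1, 0, 0, 0, 0], [1, 1, 1, 1, 1], [1, -1, 1, -1, 1],
        [1, 2, 4, 8, 16], [0, 0, 0, 0, 1]]


def invert(matrix):
    """Exact Gauss-Jordan inverse over the rationals."""
    n = len(matrix)
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        piv = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[piv] = aug[piv], aug[col]
        lead = aug[col][col]
        aug[col] = [v / lead for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


INTERP = invert(VAND)  # interpolation is the inverse linear map


def apply(matrix, parts):
    """Apply a matrix to a list of equal-length coefficient vectors."""
    width = len(parts[0])
    return [[sum(m * p[t] for m, p in zip(row, parts)) for t in range(width)]
            for row in matrix]


def schoolbook(a, b):
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def evaluate(a):
    """Split into three limbs and evaluate at the five points."""
    k = len(a) // 3
    return apply(EVAL, [a[:k], a[k:2 * k], a[2 * k:]])


def pointwise(ea, eb):
    return [schoolbook(x, y) for x, y in zip(ea, eb)]


def interpolate(prods, n):
    """Recover the 2n-1 product coefficients from five pointwise products."""
    k = n // 3
    out = [Fraction(0)] * (2 * n - 1)
    for i, part in enumerate(apply(INTERP, prods)):
        for j, v in enumerate(part):
            out[i * k + j] += v  # overlap-add limb i at offset i*k
    assert all(v.denominator == 1 for v in out)
    return [int(v) for v in out]


def toom3_mul(a, b):
    assert len(a) == len(b) and len(a) % 3 == 0
    return interpolate(pointwise(evaluate(a), evaluate(b)), len(a))


def inner_product_lazy(xs, ys_evaluated, n):
    """Sum of products x_i * y_i with one interpolation for the whole sum."""
    acc = [[0] * (2 * (n // 3) - 1) for _ in range(5)]
    for x, ey in zip(xs, ys_evaluated):
        prods = pointwise(evaluate(x), ey)
        acc = [[s + t for s, t in zip(p, q)] for p, q in zip(acc, prods)]
    return interpolate(acc, n)
```

In a matrix-vector product, the vector entries would be passed through `evaluate` once and the results reused for every matrix row, while each row needs only one call to `interpolate`.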


Author(s):  
Jan Richter-Brockmann ◽  
Ming-Shing Chen ◽  
Santosh Ghosh ◽  
Tim Güneysu

BIKE is a Key Encapsulation Mechanism selected as an alternate candidate in NIST’s PQC standardization process, in which performance plays a significant role in the third round. This paper presents FPGA implementations of BIKE with the best area-time performance reported in literature. We optimize two key arithmetic operations, which are the sparse polynomial multiplication and the polynomial inversion. Our sparse multiplier achieves time-constancy for sparse polynomials of indefinite Hamming weight used in BIKE’s encapsulation. The polynomial inversion is based on the extended Euclidean algorithm, which is unprecedented in current BIKE implementations. Our optimized design results in a 5.5 times faster key generation compared to previous implementations based on Fermat’s little theorem. Besides the arithmetic optimizations, we present a united hardware design of BIKE with shared resources and shared sub-modules among KEM functionalities. On Xilinx Artix-7 FPGAs, our light-weight implementation consumes only 3 777 slices and performs a key generation, encapsulation, and decapsulation in 3 797 μs, 443 μs, and 6 896 μs, respectively. Our high-speed design requires 7 332 slices and performs the three KEM operations in 1 672 μs, 132 μs, and 1 892 μs, respectively.
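The sparse polynomial multiplication mentioned here takes place in F_2[x]/(x^r − 1), where multiplying by x^s is a cyclic rotation by s positions. The following Python sketch shows only that arithmetic idea. It is not the paper's FPGA design and it is not constant-time. The representation (a dense polynomial as an integer bitmask, a sparse one as a list of exponents) and the function names are assumptions of this example.

```python
def rotl(value, shift, r):
    """Cyclic left rotation of an r-bit value: multiplication by x^shift mod x^r - 1."""
    shift %= r
    mask = (1 << r) - 1
    return ((value << shift) | (value >> (r - shift))) & mask


def sparse_mul(support, dense, r):
    """Multiply a sparse polynomial (list of exponents) by a dense one over GF(2)."""
    acc = 0
    for pos in support:
        acc ^= rotl(dense, pos, r)  # addition over GF(2) is XOR
    return acc


def dense_mul(a, b, r):
    """Bit-by-bit reference multiplication in GF(2)[x]/(x^r - 1)."""
    acc = 0
    for i in range(r):
        if (a >> i) & 1:
            acc ^= rotl(b, i, r)
    return acc
```

The cost of `sparse_mul` grows with the number of exponents in `support` rather than with r, which is why sparse operands are attractive. A hardware design aiming at time-constancy would additionally have to hide that weight and the positions themselves.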


Author(s):  
Erdem Alkim ◽  
Dean Yun-Li Cheng ◽  
Chi-Ming Marvin Chung ◽  
Hülya Evkan ◽  
Leo Wei-Lun Huang ◽  
...  

This paper proposes two different methods to perform NTT-based polynomial multiplication in polynomial rings that do not naturally support such a multiplication. We demonstrate these methods on the NTRU Prime key-encapsulation mechanism (KEM) proposed by Bernstein, Chuengsatiansup, Lange, and Vredendaal, which uses a polynomial ring that is, by design, not amenable to use with NTT. One of our approaches is using Good’s trick and focuses on speed and supporting more than one parameter set with a single implementation. The other approach is using a mixed radix NTT and focuses on the use of smaller multipliers and less memory. On an ARM Cortex-M4 microcontroller, we show that our three NTT-based implementations, one based on Good’s trick and two mixed radix NTTs, provide between 32% and 17% faster polynomial multiplication. For the parameter-set ntrulpr761, this results in between 16% and 9% faster total operations (sum of key generation, encapsulation, and decapsulation) and requires between 15% and 39% less memory than the current state-of-the-art NTRU Prime implementation on this platform, which is using Toom-Cook-based polynomial multiplication.
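The general idea of using an NTT for a ring that does not support one can be sketched in Python as follows. This is a simpler, swapped-in technique and not the Good's trick or mixed-radix NTT of the paper: the product is computed over the integers with a zero-padded power-of-two NTT modulo an auxiliary NTT-friendly prime, and only afterwards reduced modulo x^p − x − 1 and modulo q. The auxiliary prime 998244353 (with primitive root 3) and all function names are assumptions of this example. The approach is only valid while the true integer coefficients stay below half the auxiliary prime in absolute value, which the code asserts.

```python
P = 998244353  # auxiliary NTT-friendly prime, 119 * 2**23 + 1
G = 3          # primitive root modulo P


def ntt(a, invert=False):
    """Recursive radix-2 NTT modulo P; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return a[:]
    w_n = pow(G, (P - 1) // n, P)
    if invert:
        w_n = pow(w_n, P - 2, P)
    even = ntt(a[0::2], invert)
    odd = ntt(a[1::2], invert)
    out = [0] * n
    w = 1
    for i in range(n // 2):
        t = w * odd[i] % P
        out[i] = (even[i] + t) % P
        out[i + n // 2] = (even[i] - t) % P
        w = w * w_n % P
    return out


def reduce_ring(c, p, q):
    """Reduce modulo x^p - x - 1 (so x^i = x^(i-p+1) + x^(i-p)) and modulo q."""
    c = c[:]
    for i in range(len(c) - 1, p - 1, -1):
        c[i - p] += c[i]
        c[i - p + 1] += c[i]
    return [v % q for v in c[:p]]


def ntt_mul(a, b, p, q):
    """Multiply in Z_q[x]/(x^p - x - 1) via an NTT over the auxiliary prime."""
    bound = p * max(abs(x) for x in a) * max(abs(y) for y in b)
    assert bound < P // 2, "integer product would wrap modulo P"
    size = 1
    while size < 2 * p - 1:
        size *= 2
    fa = ntt([x % P for x in a] + [0] * (size - p))
    fb = ntt([y % P for y in b] + [0] * (size - p))
    prod = ntt([x * y % P for x, y in zip(fa, fb)], invert=True)
    inv_size = pow(size, P - 2, P)
    c = [v * inv_size % P for v in prod[:2 * p - 1]]
    c = [v - P if v > P // 2 else v for v in c]  # lift to signed integers
    return reduce_ring(c, p, q)


def reference_mul(a, b, p, q):
    """Schoolbook product followed by the same ring reduction."""
    c = [0] * (2 * p - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            c[i + j] += x * y
    return reduce_ring(c, p, q)
```

Zero-padding to a power of two wastes transform length; the methods named in the abstract target exactly that cost and memory overhead.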


Author(s):  
James Howe ◽  
Marco Martinoli ◽  
Elisabeth Oswald ◽  
Francesco Regazzoni

FrodoKEM is a lattice-based key encapsulation mechanism, currently a semi-finalist in NIST’s post-quantum standardisation effort. A condition for these candidates is to use NIST standards for sources of randomness (i.e. seed-expanding), and as such most candidates utilise SHAKE, an XOF defined in the SHA-3 standard. However, for many of the candidates, this module is a significant implementation bottleneck. Trivium is a lightweight, ISO standard stream cipher which performs well in hardware and has been used in previous hardware designs for lattice-based cryptography. This research proposes optimised designs for FrodoKEM, concentrating on high throughput by parallelising the matrix multiplication operations within the cryptographic scheme. This process is eased by the use of Trivium due to its higher throughput and lower area consumption. The parallelisations proposed also complement the addition of first-order masking to the decapsulation module. Overall, we significantly increase the throughput of FrodoKEM; for encapsulation we see a 16× speed-up, achieving 825 operations per second, and for decapsulation we see a 14× speed-up, achieving 763 operations per second, compared to the previous state of the art, whilst also maintaining a similar FPGA area footprint of less than 2000 slices.


Author(s):  
Hai-Yan Yin ◽  
Ya-Peng Fan ◽  
Juan Liu ◽  
Dao-Tong Li ◽  
Jing Guo ◽  
...  

Purinergic signalling via adenosine and its A1 receptors has been shown to be involved in the mechanism of acupuncture (needling therapy) analgesia. However, it remains unclear whether purinergic signalling is also responsible for the local analgesic effect of moxibustion therapy, a predominant member of the acupuncture family of procedures that can likewise produce an analgesic effect in pain diseases. In this study, we applied moxibustion to generate an analgesic effect in rats with complete Freund’s adjuvant (CFA)-induced inflammatory pain and detected the purines released from the moxibustion-treated acupoint by high-performance liquid chromatography (HPLC). Intramuscular injection of ARL67156 into the acupoint Zusanli (ST36) to inhibit the breakdown of ATP increased the analgesic effect of moxibustion, while intramuscular injection of ATPase to speed up ATP hydrolysis reduced moxibustion-induced analgesia. These data imply that purinergic ATP at the ST36 acupoint is a potentially beneficial factor for moxibustion-induced analgesia.

