Implementing RLWE-based Schemes Using an RSA Co-Processor

We repurpose existing RSA/ECC co-processors for (ideal) lattice-based cryptography by exploiting the availability of fast long integer multiplication. Such co-processors are deployed in smart cards in passports and identity cards, secured microcontrollers and hardware security modules (HSMs). In particular, we demonstrate an implementation of a variant of the Module-LWE-based Kyber Key Encapsulation Mechanism (KEM) that is tailored for high performance on a commercially available smart card chip (SLE 78). To benefit from the RSA/ECC co-processor we use Kronecker substitution in combination with schoolbook and Karatsuba polynomial multiplication. Moreover, we speed up symmetric operations in our Kyber variant using the AES co-processor to implement a PRNG and a SHA-256 co-processor to realise hash functions. This allows us to execute CCA-secure Kyber768 key generation in 79.6 ms, encapsulation in 102.4 ms and decapsulation in 132.7 ms.
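
As a rough illustration of the Kronecker substitution mentioned in this abstract, the sketch below packs each polynomial into one large integer, performs a single integer multiplication (the step a long-integer co-processor would accelerate), and unpacks the product coefficients. The function name `kronecker_multiply`, the slot-width calculation and the modulus used in the usage note are assumptions made for this example; it does not reproduce the paper's implementation and omits the reduction modulo the ring polynomial.

```python
def kronecker_multiply(a, b, q):
    """Multiply polynomials a and b (coefficient lists, lowest degree first,
    entries in [0, q)) using one big-integer multiplication.

    Hypothetical helper for illustration only; returns the plain product
    with coefficients reduced mod q, without any reduction mod x^n + 1.
    """
    terms = min(len(a), len(b))
    # Every product coefficient is at most terms * (q - 1)^2, so slots of
    # this width never overflow into their neighbours.
    slot_bits = (terms * (q - 1) ** 2).bit_length()
    # Pack: evaluate each polynomial at x = 2^slot_bits.
    packed_a = sum(c << (i * slot_bits) for i, c in enumerate(a))
    packed_b = sum(c << (i * slot_bits) for i, c in enumerate(b))
    # The single long multiplication.
    packed_c = packed_a * packed_b
    # Unpack: read each slot and reduce it mod q.
    mask = (1 << slot_bits) - 1
    return [((packed_c >> (i * slot_bits)) & mask) % q
            for i in range(len(a) + len(b) - 1)]
```

For instance, `kronecker_multiply([1, 2, 3], [4, 5], 3329)` multiplies 1 + 2x + 3x^2 by 4 + 5x; the modulus 3329 is only an illustrative choice here.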

Polynomial multiplication on embedded vector architectures

IACR Transactions on Cryptographic Hardware and Embedded Systems ◽

10.46586/tches.v2022.i1.482-505 ◽

2021 ◽

pp. 482-505

Author(s):

Hanno Becker ◽

Jose Maria Bermudo Mera ◽

Angshuman Karmakar ◽

Joseph Yiu ◽

Ingrid Verbauwhede

Keyword(s):

Instruction Scheduling ◽

Polynomial Multiplication ◽

Performance Improvements ◽

Low Area ◽

Memory Efficiency ◽

Key Encapsulation Mechanism ◽

Profile Vector ◽

And Performance ◽

Lattice Based Cryptography ◽

High Degree

High-degree, low-precision polynomial arithmetic is a fundamental computational primitive underlying structured lattice-based cryptography. Its algorithmic properties and suitability for implementation on different compute platforms are an active area of research, and this article contributes to this line of work: Firstly, we present memory-efficiency and performance improvements for the Toom-Cook/Karatsuba polynomial multiplication strategy. Secondly, we provide implementations of those improvements on Arm® Cortex®-M4 CPU, as well as the newer Cortex-M55 processor, the first M-profile core implementing the M-profile Vector Extension (MVE), also known as Arm® Helium™ technology. We also implement the Number Theoretic Transform (NTT) on the Cortex-M55 processor. We show that despite being single-issue, in-order and offering only 8 vector registers compared to 32 on A-profile SIMD architectures like Arm® Neon™ technology and the Scalable Vector Extension (SVE), by careful register management and instruction scheduling, we can obtain a 3× to 5× performance improvement over already highly optimized implementations on Cortex-M4, while maintaining the low area and energy profile necessary for use in the embedded market. Finally, as a real-world application we integrate our multiplication techniques into the post-quantum key-encapsulation mechanism Saber.
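
For readers unfamiliar with the Karatsuba strategy this abstract builds on, here is a minimal recursive sketch. It is a textbook version and not the memory-efficient or vectorised variant the article describes; the function names and the `threshold` parameter are assumptions of this example, and the input length is assumed to be a power of two.

```python
def schoolbook(a, b):
    """Quadratic-time reference product of two coefficient lists."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def karatsuba(a, b, threshold=4):
    """Multiply two equal-length coefficient lists (lowest degree first).

    Assumes len(a) == len(b) and that the length is a power of two.
    Below `threshold` coefficients it falls back to schoolbook.
    """
    n = len(a)
    if n <= threshold:
        return schoolbook(a, b)
    h = n // 2
    a0, a1 = a[:h], a[h:]
    b0, b1 = b[:h], b[h:]
    # Three half-size products instead of four.
    low = karatsuba(a0, b0, threshold)
    high = karatsuba(a1, b1, threshold)
    mid = karatsuba([x + y for x, y in zip(a0, a1)],
                    [x + y for x, y in zip(b0, b1)], threshold)
    out = [0] * (2 * n - 1)
    for i in range(2 * h - 1):
        out[i] += low[i]
        out[i + 2 * h] += high[i]
        # The cross term a0*b1 + a1*b0 equals mid - low - high.
        out[i + h] += mid[i] - low[i] - high[i]
    return out
```

The result holds full-precision integer coefficients; a caller working with a low-precision modulus would reduce them afterwards.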

High-Performance Ideal Lattice-Based Cryptography on 8-Bit AVR Microcontrollers

ACM Transactions on Embedded Computing Systems ◽

10.1145/3092951 ◽

2017 ◽

Vol 16 (4) ◽

pp. 1-24 ◽

Cited By ~ 7

Author(s):

Zhe Liu ◽

Thomas Pöppelmann ◽

Tobias Oder ◽

Hwajeong Seo ◽

Sujoy Sinha Roy ◽

...

Keyword(s):

High Performance ◽

Ideal Lattice ◽

Lattice Based Cryptography

High-Performance Ideal Lattice-Based Cryptography on 8-Bit ATxmega Microcontrollers

Progress in Cryptology -- LATINCRYPT 2015 - Lecture Notes in Computer Science ◽

10.1007/978-3-319-22174-8_19 ◽

2015 ◽

pp. 346-365 ◽

Cited By ~ 32

Author(s):

Thomas Pöppelmann ◽

Tobias Oder ◽

Tim Güneysu

Keyword(s):

High Performance ◽

Ideal Lattice ◽

Lattice Based Cryptography

An efficient and light weight polynomial multiplication for ideal lattice-based cryptography

Multimedia Tools and Applications ◽

10.1007/s11042-020-09706-8 ◽

2020 ◽

Author(s):

Vijay Kumar Yadav ◽

Shekhar Verma ◽

S. Venkatesan

Keyword(s):

Light Weight ◽

Ideal Lattice ◽

Polynomial Multiplication ◽

Lattice Based Cryptography

Time-memory trade-off in Toom-Cook multiplication: an application to module-lattice based cryptography

IACR Transactions on Cryptographic Hardware and Embedded Systems ◽

10.46586/tches.v2020.i2.222-244 ◽

2020 ◽

pp. 222-244

Author(s):

Jose Maria Bermudo Mera ◽

Angshuman Karmakar ◽

Ingrid Verbauwhede

Keyword(s):

Linear Transformations ◽

Polynomial Multiplication ◽

Matrix Vector Multiplication ◽

Long Time ◽

Key Encapsulation Mechanism ◽

Learning With Errors ◽

Lattice Based Cryptography ◽

Learning With Errors Problem ◽

Matrix Vector ◽

Processing Steps

Since the introduction of the ring-learning with errors problem, the number theoretic transform (NTT) based polynomial multiplication algorithm has been studied extensively. Due to its faster quasilinear time complexity, it has been the preferred choice of cryptographers to realize ring-learning with errors cryptographic schemes. Compared to NTT, Toom-Cook or Karatsuba based polynomial multiplication algorithms, though being known for a long time, still have a fledgling presence in the context of post-quantum cryptography. In this work, we observe that the pre- and post-processing steps in Toom-Cook based multiplications can be expressed as linear transformations. Based on this observation we propose two novel techniques that can increase the efficiency of Toom-Cook based polynomial multiplications. Evaluation is reduced by a factor of 2, and we call this method precomputation, and interpolation is reduced from quadratic to linear, and we call this method lazy interpolation. As a practical application, we applied our algorithms to the Saber post-quantum key-encapsulation mechanism. We discuss in detail the various implementation aspects of applying our algorithms to Saber. We show that our algorithm can improve the efficiency of the computationally costly matrix-vector multiplication by 12−37% compared to previous methods on their respective platforms. Secondly, we propose different methods to reduce the memory footprint of Saber for Cortex-M4 microcontrollers. Our implementation shows between 2.6 and 5.7 KB reduction in the memory usage with respect to the smallest implementation in the literature.
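
The observation that Toom-Cook evaluation and interpolation are linear transformations can be made concrete with a small sketch. The example below builds the evaluation matrix for Toom-3 at the points 0, 1, -1, 2 and infinity, and obtains the interpolation matrix by inverting the corresponding 5×5 matrix. The choice of points, all function names, and the use of single-coefficient limbs with exact rational arithmetic are assumptions of this example; it does not show the paper's precomputation or lazy interpolation techniques.

```python
from fractions import Fraction

POINTS = [0, 1, -1, 2]  # finite evaluation points; infinity is handled separately


def eval_matrix(width):
    """Rows evaluate a polynomial with `width` coefficients at each point."""
    rows = [[p ** j for j in range(width)] for p in POINTS]
    rows.append([0] * (width - 1) + [1])  # "infinity" selects the top coefficient
    return rows


def mat_vec(matrix, vector):
    return [sum(x * y for x, y in zip(row, vector)) for row in matrix]


def invert(matrix):
    """Exact Gauss-Jordan inversion over the rationals."""
    n = len(matrix)
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [x / scale for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


EVAL3 = eval_matrix(3)            # 5x3: pre-processing as a linear map
INTERP = invert(eval_matrix(5))   # 5x5: post-processing as a linear map


def toom3(a, b):
    """Multiply two 3-coefficient polynomials with 5 pointwise products."""
    pointwise = [x * y for x, y in zip(mat_vec(EVAL3, a), mat_vec(EVAL3, b))]
    coeffs = mat_vec(INTERP, pointwise)
    assert all(c.denominator == 1 for c in coeffs)
    return [int(c) for c in coeffs]
```

In a practical implementation each limb would itself be a polynomial and the five pointwise products would be recursive multiplications; scalar limbs are used here only to keep the linear-algebra view visible.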

Racing BIKE: Improved Polynomial Multiplication and Inversion in Hardware

IACR Transactions on Cryptographic Hardware and Embedded Systems ◽

10.46586/tches.v2022.i1.557-588 ◽

2021 ◽

pp. 557-588

Author(s):

Jan Richter-Brockmann ◽

Ming-Shing Chen ◽

Santosh Ghosh ◽

Tim Güneysu

Keyword(s):

High Speed ◽

Optimized Design ◽

Key Generation ◽

Shared Resources ◽

Polynomial Multiplication ◽

Sparse Polynomial ◽

Sparse Polynomials ◽

Key Encapsulation Mechanism ◽

Standardization Process ◽

High Speed Design

BIKE is a Key Encapsulation Mechanism selected as an alternate candidate in NIST’s PQC standardization process, in which performance plays a significant role in the third round. This paper presents FPGA implementations of BIKE with the best area-time performance reported in literature. We optimize two key arithmetic operations, which are the sparse polynomial multiplication and the polynomial inversion. Our sparse multiplier achieves time-constancy for sparse polynomials of indefinite Hamming weight used in BIKE’s encapsulation. The polynomial inversion is based on the extended Euclidean algorithm, which is unprecedented in current BIKE implementations. Our optimized design results in a 5.5 times faster key generation compared to previous implementations based on Fermat’s little theorem. Besides the arithmetic optimizations, we present a united hardware design of BIKE with shared resources and shared sub-modules among KEM functionalities. On Xilinx Artix-7 FPGAs, our light-weight implementation consumes only 3 777 slices and performs a key generation, encapsulation, and decapsulation in 3 797 μs, 443 μs, and 6 896 μs, respectively. Our high-speed design requires 7 332 slices and performs the three KEM operations in 1 672 μs, 132 μs, and 1 892 μs, respectively.
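
To show what a sparse polynomial multiplication of the kind named in this abstract computes, the sketch below multiplies a dense binary polynomial by a sparse one in GF(2)[x]/(x^r - 1): each nonzero exponent of the sparse operand contributes one cyclic rotation of the dense operand, and the rotations are combined with XOR. The bitmask representation and the function names are assumptions of this example. It is a software illustration only and makes no attempt at the constant-time behaviour or the hardware architecture the paper describes.

```python
def rotate_left(value, shift, r):
    """Cyclically rotate an r-bit value left by `shift` positions,
    which corresponds to multiplying by x^shift modulo x^r - 1."""
    shift %= r
    mask = (1 << r) - 1
    return ((value << shift) | (value >> (r - shift))) & mask


def sparse_mul(dense, sparse_exponents, r):
    """Multiply in GF(2)[x]/(x^r - 1).

    dense            -- integer bitmask, bit i is the coefficient of x^i
    sparse_exponents -- list of exponents with coefficient 1
    """
    acc = 0
    for exponent in sparse_exponents:
        acc ^= rotate_left(dense, exponent, r)  # addition in GF(2) is XOR
    return acc
```

With r = 7, multiplying 1 + x^6 by 1 + x yields x + x^6, because x^7 wraps around to 1 and cancels the constant term.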

Polynomial Multiplication in NTRU Prime

IACR Transactions on Cryptographic Hardware and Embedded Systems ◽

10.46586/tches.v2021.i1.217-238 ◽

2020 ◽

pp. 217-238

Author(s):

Erdem Alkim ◽

Dean Yun-Li Cheng ◽

Chi-Ming Marvin Chung ◽

Hülya Evkan ◽

Leo Wei-Lun Huang ◽

...

Keyword(s):

Polynomial Ring ◽

State Of The Art ◽

The Other ◽

Polynomial Rings ◽

Key Generation ◽

Polynomial Multiplication ◽

Current State ◽

Key Encapsulation Mechanism

This paper proposes two different methods to perform NTT-based polynomial multiplication in polynomial rings that do not naturally support such a multiplication. We demonstrate these methods on the NTRU Prime key-encapsulation mechanism (KEM) proposed by Bernstein, Chuengsatiansup, Lange, and Vredendaal, which uses a polynomial ring that is, by design, not amenable to use with NTT. One of our approaches is using Good’s trick and focuses on speed and supporting more than one parameter set with a single implementation. The other approach is using a mixed radix NTT and focuses on the use of smaller multipliers and less memory. On an ARM Cortex-M4 microcontroller, we show that our three NTT-based implementations, one based on Good’s trick and two mixed radix NTTs, provide between 32% and 17% faster polynomial multiplication. For the parameter-set ntrulpr761, this results in between 16% and 9% faster total operations (sum of key generation, encapsulation, and decapsulation) and requires between 15% and 39% less memory than the current state-of-the-art NTRU Prime implementation on this platform, which is using Toom-Cook-based polynomial multiplication.
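
The index mapping behind Good's trick can be sketched on a toy example. For coprime n1 and n2, sending index i to the pair (i mod n1, i mod n2) turns a cyclic convolution of length n1·n2 into a two-dimensional cyclic convolution of size n1 × n2, so that a transform of an awkward length can be split into smaller ones. The sketch below checks this equivalence using plain quadratic convolutions in place of NTTs; the sizes, function names and the absence of any modulus are assumptions of this example and do not reflect the paper's parameters.

```python
from math import gcd


def cyclic_convolution(a, b):
    """Reference product in Z[x]/(x^n - 1)."""
    n = len(a)
    out = [0] * n
    for i in range(n):
        for j in range(n):
            out[(i + j) % n] += a[i] * b[j]
    return out


def good_permute(a, n1, n2):
    """Rearrange a length n1*n2 vector into an n1 x n2 grid via the CRT map."""
    assert gcd(n1, n2) == 1 and len(a) == n1 * n2
    grid = [[0] * n2 for _ in range(n1)]
    for i, coeff in enumerate(a):
        grid[i % n1][i % n2] = coeff
    return grid


def good_unpermute(grid, n1, n2):
    return [grid[i % n1][i % n2] for i in range(n1 * n2)]


def cyclic_convolution_2d(A, B, n1, n2):
    """Product in Z[y, z]/(y^n1 - 1, z^n2 - 1), computed naively."""
    out = [[0] * n2 for _ in range(n1)]
    for i1 in range(n1):
        for i2 in range(n2):
            for j1 in range(n1):
                for j2 in range(n2):
                    out[(i1 + j1) % n1][(i2 + j2) % n2] += A[i1][i2] * B[j1][j2]
    return out


def good_multiply(a, b, n1, n2):
    """Cyclic product of length n1*n2 computed through the 2D representation."""
    product = cyclic_convolution_2d(good_permute(a, n1, n2),
                                    good_permute(b, n1, n2), n1, n2)
    return good_unpermute(product, n1, n2)
```

An optimised implementation would replace the naive two-dimensional convolution with small transforms along each axis; the naive version is kept here so that the permutation itself is the only nontrivial step.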

Exploring Parallelism to Improve the Performance of FrodoKEM in Hardware

Journal of Cryptographic Engineering ◽

10.1007/s13389-021-00258-7 ◽

2021 ◽

Author(s):

James Howe ◽

Marco Martinoli ◽

Elisabeth Oswald ◽

Francesco Regazzoni

Keyword(s):

Stream Cipher ◽

State Of The Art ◽

Matrix Multiplication ◽

First Order ◽

The Matrix ◽

Key Encapsulation Mechanism ◽

Speed Up ◽

Previous State ◽

Lattice Based Cryptography ◽

Hardware Designs

FrodoKEM is a lattice-based key encapsulation mechanism, currently a semi-finalist in NIST’s post-quantum standardisation effort. A condition for these candidates is to use NIST standards for sources of randomness (i.e. seed-expanding), and as such most candidates utilise SHAKE, an XOF defined in the SHA-3 standard. However, for many of the candidates, this module is a significant implementation bottleneck. Trivium is a lightweight, ISO standard stream cipher which performs well in hardware and has been used in previous hardware designs for lattice-based cryptography. This research proposes optimised designs for FrodoKEM, concentrating on high throughput by parallelising the matrix multiplication operations within the cryptographic scheme. This process is eased by the use of Trivium due to its higher throughput and lower area consumption. The parallelisations proposed also complement the addition of first-order masking to the decapsulation module. Overall, we significantly increase the throughput of FrodoKEM; for encapsulation we see a 16× speed-up, achieving 825 operations per second, and for decapsulation we see a 14× speed-up, achieving 763 operations per second, compared to the previous state of the art, whilst also maintaining a similar FPGA area footprint of less than 2000 slices.

Purinergic ATP triggers moxibustion-induced local anti-nociceptive effect on inflammatory pain model

Purinergic Signalling ◽

10.1007/s11302-021-09815-5 ◽

2021 ◽

Author(s):

Hai-Yan Yin ◽

Ya-Peng Fan ◽

Juan Liu ◽

Dao-Tong Li ◽

Jing Guo ◽

...

Keyword(s):

High Performance Liquid Chromatography ◽

Liquid Chromatography ◽

Inflammatory Pain ◽

Analgesic Effect ◽

Intramuscular Injection ◽

High Performance ◽

Atp Hydrolysis ◽

Purinergic Signalling ◽

Pain Model ◽

Speed Up

Purinergic signalling adenosine and its A1 receptors have been demonstrated to be involved in the mechanism of acupuncture (needling therapy) analgesia. However, it remains unclear whether purinergic signalling is also responsible for the local analgesic effect of moxibustion therapy, a predominant member of the acupuncture family of procedures that can likewise trigger an analgesic effect in pain conditions. In this study, we applied moxibustion to generate an analgesic effect in rats with complete Freund’s adjuvant (CFA)-induced inflammatory pain and detected the purines released from the moxibustioned acupoint by high-performance liquid chromatography (HPLC). Intramuscular injection of ARL67156 into the acupoint Zusanli (ST36) to inhibit the breakdown of ATP increased the analgesic effect of moxibustion, while intramuscular injection of ATPase to speed up ATP hydrolysis reduced moxibustion-induced analgesia. These data imply that purinergic ATP at the location of the ST36 acupoint is a potentially beneficial factor for moxibustion-induced analgesia.

Graphic processors to speed-up simulations for the design of high performance solar receptors

2007 IEEE International Conf. on Application-specific Systems, Architectures and Processors (ASAP) ◽

10.1109/asap.2007.4459293 ◽

2007 ◽

Cited By ~ 2

Author(s):

Sylvain Collange ◽

Marc Daumas ◽

David Defour

Keyword(s):

High Performance ◽

Speed Up
