polynomial multiplication
Recently Published Documents


TOTAL DOCUMENTS

132
(FIVE YEARS 37)

H-INDEX

14
(FIVE YEARS 3)

Author(s):  
Hanno Becker ◽  
Jose Maria Bermudo Mera ◽  
Angshuman Karmakar ◽  
Joseph Yiu ◽  
Ingrid Verbauwhede

High-degree, low-precision polynomial arithmetic is a fundamental computational primitive underlying structured lattice based cryptography. Its algorithmic properties and suitability for implementation on different compute platforms is an active area of research, and this article contributes to this line of work: Firstly, we present memory-efficiency and performance improvements for the Toom-Cook/Karatsuba polynomial multiplication strategy. Secondly, we provide implementations of those improvements on Arm® Cortex®-M4 CPU, as well as the newer Cortex-M55 processor, the first M-profile core implementing the M-profile Vector Extension (MVE), also known as Arm® Helium™ technology. We also implement the Number Theoretic Transform (NTT) on the Cortex-M55 processor. We show that despite being singleissue, in-order and offering only 8 vector registers compared to 32 on A-profile SIMD architectures like Arm® Neon™ technology and the Scalable Vector Extension (SVE), by careful register management and instruction scheduling, we can obtain a 3× to 5× performance improvement over already highly optimized implementations on Cortex-M4, while maintaining a low area and energy profile necessary for use in embedded market. Finally, as a real-world application we integrate our multiplication techniques to post-quantum key-encapsulation mechanism Saber


Author(s):  
Hanno Becker ◽  
Vincent Hwang ◽  
Matthias J. Kannwischer ◽  
Bo-Yin Yang ◽  
Shang-Yi Yang

We present new speed records on the Armv8-A architecture for the latticebased schemes Dilithium, Kyber, and Saber. The core novelty in this paper is the combination of Montgomery multiplication and Barrett reduction resulting in “Barrett multiplication” which allows particularly efficient modular one-known-factor multiplication using the Armv8-A Neon vector instructions. These novel techniques combined with fast two-unknown-factor Montgomery multiplication, Barrett reduction sequences, and interleaved multi-stage butterflies result in significantly faster code. We also introduce “asymmetric multiplication” which is an improved technique for caching the results of the incomplete NTT, used e.g. for matrix-to-vector polynomial multiplication. Our implementations target the Arm Cortex-A72 CPU, on which our speed is 1.7× that of the state-of-the-art matrix-to-vector polynomial multiplication in kyber768 [Nguyen–Gaj 2021]. For Saber, NTTs are far superior to Toom–Cook multiplication on the Armv8-A architecture, outrunning the matrix-to-vector polynomial multiplication by 2.0×. On the Apple M1, our matrix-vector products run 2.1× and 1.9× faster for Kyber and Saber respectively.


Author(s):  
Jan Richter-Brockmann ◽  
Ming-Shing Chen ◽  
Santosh Ghosh ◽  
Tim Güneysu

BIKE is a Key Encapsulation Mechanism selected as an alternate candidate in NIST’s PQC standardization process, in which performance plays a significant role in the third round. This paper presents FPGA implementations of BIKE with the best area-time performance reported in literature. We optimize two key arithmetic operations, which are the sparse polynomial multiplication and the polynomial inversion. Our sparse multiplier achieves time-constancy for sparse polynomials of indefinite Hamming weight used in BIKE’s encapsulation. The polynomial inversion is based on the extended Euclidean algorithm, which is unprecedented in current BIKE implementations. Our optimized design results in a 5.5 times faster key generation compared to previous implementations based on Fermat’s little theorem.Besides the arithmetic optimizations, we present a united hardware design of BIKE with shared resources and shared sub-modules among KEM functionalities. On Xilinx Artix-7 FPGAs, our light-weight implementation consumes only 3 777 slices and performs a key generation, encapsulation, and decapsulation in 3 797 μs, 443 μs, and 6 896 μs, respectively. Our high-speed design requires 7 332 slices and performs the three KEM operations in 1 672 μs, 132 μs, and 1 892 μs, respectively.


2021 ◽  
Author(s):  
Jigang Yang ◽  
Zhenmin Li ◽  
Jingwei Ren ◽  
Xiaolei Wang ◽  
Wei Ni ◽  
...  

2021 ◽  
Vol 2021 ◽  
pp. 1-14
Author(s):  
Mingyang Song ◽  
Yingpeng Sang ◽  
Yuying Zeng ◽  
Shunchao Luo

The efficiency of fully homomorphic encryption has always affected its practicality. With the dawn of Internet of things, the demand for computation and encryption on resource-constrained devices is increasing. Complex cryptographic computing is a major burden for those devices, while outsourcing can provide great convenience for them. In this paper, we firstly propose a generic blockchain-based framework for secure computation outsourcing and then propose an algorithm for secure outsourcing of polynomial multiplication into the blockchain. Our algorithm for polynomial multiplication can reduce the local computation cost to O n . Previous work based on Fast Fourier Transform can only achieve O n log n for the local cost. Finally, we integrate the two secure outsourcing schemes for polynomial multiplication and modular exponentiation into the fully homomorphic encryption using hidden ideal lattice and get an outsourcing scheme of fully homomorphic encryption. Through security analysis, our schemes achieve the goals of privacy protection against passive attackers and cheating detection against active attackers. Experiments also demonstrate our schemes are more efficient in comparisons with the corresponding nonoutsourcing schemes.


Sign in / Sign up

Export Citation Format

Share Document