Parallel modular multiplication using 512-bit advanced vector instructions

Author(s):  
Benjamin Buhrow ◽  
Barry Gilbert ◽  
Clifton Haider

Abstract

Applications such as public-key cryptography are critically reliant on the speed of modular multiplication for their performance. This paper introduces a new block-based variant of Montgomery multiplication, the Block Product Scanning (BPS) method, which is particularly efficient using new 512-bit advanced vector instructions (AVX-512) on modern Intel processor families. Our parallel-multiplication approach also allows for squaring and sub-quadratic Karatsuba enhancements. We demonstrate a 1.9× improvement in decryption throughput in comparison with OpenSSL and a 1.5× improvement in modular exponentiation throughput compared to GMP-6.1.2 on an Intel Xeon CPU. In addition, we show a 1.4× improvement in decryption throughput in comparison with state-of-the-art vector implementations on many-core Knights Landing Xeon Phi hardware. Finally, we show how interleaving Chinese remainder theorem-based RSA calculations within our parallel BPS technique halves decryption latency while providing protection against fault-injection attacks.
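For readers unfamiliar with the underlying primitive, the following is a minimal sketch of textbook Montgomery multiplication using Python big integers. It is not the BPS method and is not vectorized; it only illustrates the reduction step that block-based variants reorganize. The function names (`montgomery_setup`, `redc`, `mont_mul`, `mod_mul`) and the parameter `k` (the bit width of R = 2^k) are illustrative assumptions, not taken from the paper. Python 3.8 or later is assumed for the modular inverse via `pow`.

```python
def montgomery_setup(n, k):
    """Precompute constants for an odd modulus n with R = 2**k > n."""
    assert n % 2 == 1 and n < (1 << k)
    r = 1 << k
    n_prime = (-pow(n, -1, r)) % r   # satisfies n * n_prime ≡ -1 (mod R)
    r2 = (r * r) % n                 # R^2 mod n, used to enter Montgomery form
    return r, n_prime, r2


def redc(t, n, k, n_prime):
    """Montgomery reduction: return t * R^-1 mod n for 0 <= t < n * R."""
    mask = (1 << k) - 1
    m = ((t & mask) * n_prime) & mask   # m = t * n' mod R
    u = (t + m * n) >> k                # t + m*n is divisible by R
    return u - n if u >= n else u       # one conditional subtraction suffices


def mont_mul(a_bar, b_bar, n, k, n_prime):
    """Multiply two values that are already in Montgomery form."""
    return redc(a_bar * b_bar, n, k, n_prime)


def mod_mul(a, b, n, k):
    """Compute a * b mod n (with a, b < n) by a round trip through Montgomery form."""
    r, n_prime, r2 = montgomery_setup(n, k)
    a_bar = redc(a * r2, n, k, n_prime)   # a * R mod n
    b_bar = redc(b * r2, n, k, n_prime)   # b * R mod n
    c_bar = mont_mul(a_bar, b_bar, n, k, n_prime)
    return redc(c_bar, n, k, n_prime)     # leave Montgomery form
```

In a modular exponentiation, the conversion into and out of Montgomery form happens once, and every intermediate multiplication uses `mont_mul`, which replaces division by the modulus with shifts and masks. That property is what makes the method a natural target for wide vector units.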

