Data Flow Oriented Hardware Design of RNS-based Polynomial Multiplication for SHE Acceleration

Author(s):  
Joël Cathébras ◽  
Alexandre Carbon ◽  
Peter Milder ◽  
Renaud Sirdey ◽  
Nicolas Ventroux

This paper presents a hardware implementation of a Residue Polynomial Multiplier (RPM), designed to accelerate the full Residue Number System (RNS) variant of the Fan-Vercauteren scheme proposed by Bajard et al. [BEHZ16]. Our design speeds up polynomial multiplication via a Negative Wrapped Convolution (NWC) that locally computes the required RNS channel-dependent twiddle factors. Compared to related works, this design is more versatile with respect to the addressable parameter sets of the BFV scheme. This versatility mainly stems from our proposed twiddle factor generator, which makes the design's BRAM utilization independent of the RNS basis size, with negligible communication bandwidth spent on non-payload data. Furthermore, we explore the generalization of a DFT hardware generator in order to produce RNS-friendly NTT architectures. This approach lets us validate our RPM design over parameter sets from the work of Halevi et al. [HPS18]. For the depth-20 setting, we achieve an estimated speed-up of the residue polynomial multiplications greater than 76× during ciphertext multiplication and greater than 16× during relinearization. This results in a single-threaded Mult&Relin ciphertext operation in 109.4 ms (3.19× faster than [HPS18]), with the RPM accounting for less than 15% of the new computation time. Our RPM design scales up with reasonable hardware resource usage and realistic bandwidth requirements, and it can also be exploited for other RNS-based implementations of RLWE cryptosystems.
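
For readers less familiar with the NWC, the following is a minimal Python sketch of negative wrapped convolution in a single RNS channel: inputs are pre-scaled by powers of a 2n-th root of unity psi so that a cyclic NTT-based convolution yields the negacyclic product modulo x^n + 1. The toy parameters (n = 4, q = 17, psi = 2) and the quadratic-time transform are illustrative assumptions and do not reflect the paper's hardware pipeline or its twiddle factor generator.

```python
# Minimal negative wrapped convolution (NWC) model for one RNS channel.
# Toy parameters only; a hardware RPM would use a fast NTT and large moduli.

def ntt(a, root, q):
    """Naive O(n^2) number-theoretic transform of a modulo q."""
    n = len(a)
    return [sum(a[j] * pow(root, i * j, q) for j in range(n)) % q
            for i in range(n)]

def nwc_multiply(a, b, q, psi):
    """Product of a and b in Z_q[x]/(x^n + 1), psi a primitive 2n-th root of unity."""
    n = len(a)
    w = pow(psi, 2, q)                       # n-th root of unity for the NTT
    # Pre-scaling by psi^i turns the cyclic convolution into the negacyclic one.
    a_t = ntt([a[i] * pow(psi, i, q) % q for i in range(n)], w, q)
    b_t = ntt([b[i] * pow(psi, i, q) % q for i in range(n)], w, q)
    c_t = [(x * y) % q for x, y in zip(a_t, b_t)]
    c = ntt(c_t, pow(w, -1, q), q)           # unnormalised inverse transform
    n_inv, psi_inv = pow(n, -1, q), pow(psi, -1, q)
    return [c[i] * n_inv % q * pow(psi_inv, i, q) % q for i in range(n)]

# Toy channel: q = 17, n = 4, psi = 2 (2^4 ≡ -1 mod 17, so 2 is a 2n-th root of unity).
print(nwc_multiply([1, 2, 3, 4], [5, 6, 7, 8], 17, 2))   # -> [12, 15, 2, 9]
```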

2020 ◽  
Vol 29 (11) ◽  
pp. 2030008
Author(s):  
Raj Kumar ◽  
Ritesh Kumar Jaiswal ◽  
Ram Awadh Mishra

Modulo multipliers have been attracting considerable attention as one of the essential components of residue number system (RNS)-based computational circuits. This paper contributes the first comprehensive review of the design of modulo [Formula: see text] multipliers. Modulo multipliers can be implemented using ROM (look-up tables) as well as VLSI components (memoryless); the former is preferable for lower word-lengths and the latter for larger word-lengths. The modular and parallelism properties of RNS are used to improve the performance of memoryless multipliers. Moreover, a Booth-encoding algorithm is used to speed up the multipliers. An advanced modulo [Formula: see text] multiplier based on redundant RNS (RRNS) can further be chosen for very high dynamic ranges. These perspectives on modulo [Formula: see text] multipliers are studied extensively against the recent state of the art and analyzed using the Synopsys Design Compiler tool.
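
The modular and parallelism properties mentioned above can be illustrated with a short Python model, in which a multiplication is carried out independently in each residue channel and the result is recombined with the Chinese Remainder Theorem. The toy moduli and function names below are our own illustrative choices and are not tied to any of the reviewed designs.

```python
# Illustrative RNS multiplication: independent channel products + CRT recombination.
from math import prod

def to_rns(x, moduli):
    """Forward conversion: the residue of x in each channel."""
    return [x % m for m in moduli]

def rns_mul(xs, ys, moduli):
    """Channel-wise modular multiplication; every channel is independent."""
    return [(x * y) % m for x, y, m in zip(xs, ys, moduli)]

def from_rns(residues, moduli):
    """Reverse conversion via the Chinese Remainder Theorem."""
    M = prod(moduli)
    return sum(r * (M // m) * pow(M // m, -1, m)
               for r, m in zip(residues, moduli)) % M

moduli = [3, 5, 7, 11]             # pairwise coprime, dynamic range M = 1155
a, b = 29, 37
c = from_rns(rns_mul(to_rns(a, moduli), to_rns(b, moduli), moduli), moduli)
assert c == a * b                  # 1073 < 1155, so no wrap-around occurs
print(c)
```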


2021 ◽  
pp. 15-21
Author(s):  
Pavel Alekseyevich Lyakhov ◽  
Andrey Sergeevich Ionisyan ◽  
Violetta Vladimirovna Masaeva ◽  
Maria Vasilevna Valueva

Author(s):  
Carlos Arturo Gayoso ◽  
Claudio Gonzalez ◽  
Leonardo Arnone ◽  
Miguel Rabini ◽  
Jorge Castineira Moreira

Author(s):  
Shamim Akhter ◽  
Divya Bareja ◽  
Satyendra Kumar

Notice of Retraction: After careful and considered review of the content of this paper by a duly constituted expert committee, this paper has been found to be in violation of IAES's Publication Principles. We hereby retract the content of this paper. Reasonable effort should be made to remove all past references to this paper. The presenting author of this paper has the option to appeal this decision by contacting [email protected].

The Residue Number System (RNS) is a very old number system, proposed around 1500 AD. The parallel nature of mathematical operations in RNS results in faster computation. This paper deals with the design of modulo multiplication in RNS. Direct computation of |AB|_m requires a multiplier to obtain A·B first and then a mod-m calculator to produce the final result. We use the Vedic multiplication technique along with RNS to improve the computation time of modulo multiplication. This paper is aimed at the design and analysis of modulo multipliers for the special moduli set {3, 5, 7}. A comparative analysis in terms of area and delay between the proposed technique and direct computation is performed for input data sizes N = 8, 16 and 32 bits using Xilinx ISE 14.1. The design has also been compared using the Synopsys Design Compiler with a 32 nm Std_Cell library. The proposed technique is found to be more efficient in terms of speed as the input data size increases.
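
As a behavioral reference for the direct computation |AB|_m described above, the Python sketch below multiplies first and then reduces modulo m; the chunk-summing reduction exploits the standard identity 2^k ≡ 1 (mod m) for m in {3, 5, 7} and is not the authors' Vedic design. Function names and the exhaustive test range are illustrative assumptions.

```python
# Behavioral reference: direct |A*B|_m (multiply, then reduce) for m in {3, 5, 7}.
CHUNK_BITS = {3: 2, 5: 4, 7: 3}    # smallest k with 2**k ≡ 1 (mod m)

def reduce_by_chunks(x, m):
    """Reduce x mod m by summing k-bit chunks, valid because 2**k ≡ 1 (mod m)."""
    k = CHUNK_BITS[m]
    mask = (1 << k) - 1
    while x > mask:                # fold until the value fits in one chunk
        s = 0
        while x:
            s += x & mask
            x >>= k
        x = s
    while x >= m:                  # final correction within a single chunk
        x -= m
    return x

def modmul_direct(a, b, m):
    """Direct |A*B|_m: full multiplication first, then the mod-m reduction."""
    return reduce_by_chunks(a * b, m)

# Exhaustive check of the reference model against Python's % operator.
for m in (3, 5, 7):
    assert all(modmul_direct(a, b, m) == (a * b) % m
               for a in range(64) for b in range(64))
print("direct |A*B|_m reference matches % for the moduli set {3, 5, 7}")
```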


2019 ◽  
Vol 43 (5) ◽  
pp. 857-868 ◽  
Author(s):  
N.I. Chervyakov ◽  
P.A. Lyakhov ◽  
N.N. Nagornov ◽  
M.V. Valueva ◽  
G.V. Valuev

Modern convolutional neural network architectures are very resource-intensive, which limits the possibilities for their wide practical application. We propose a convolutional neural network architecture in which the neural network is divided into hardware and software parts to increase performance and reduce the cost of implementation resources. We also propose to use the residue number system in the hardware part to implement the convolutional layer of the neural network and thereby reduce resource costs. A numerical method for quantizing the filter coefficients of a convolutional layer is proposed to minimize the influence of quantization noise on the calculation result in the residue number system and to determine the bit-width of the filter coefficients. This method is based on scaling the coefficients by a fixed number of bits and rounding up and down. The operations used make it possible to reduce hardware implementation resources by simplifying their execution. All calculations in the convolutional layer are performed on numbers in a fixed-point format. Software simulations using Matlab 2017b showed that a convolutional neural network with a minimum number of layers can be trained quickly and successfully. Hardware implementation using the Kintex7 xc7k70tfbg484-2 field-programmable gate array showed that the use of the residue number system in the convolutional layer of the neural network reduces hardware costs by 32.6% compared with the traditional approach based on two's complement representation. The research results can be applied to create effective video surveillance systems and to recognize handwriting, individuals, objects and terrain.
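
The quantization step described above can be sketched in a few lines of Python: filter coefficients are scaled by a fixed number of fractional bits and rounded up or down so that the convolutional layer operates on fixed-point integers. The bit-widths, rounding rule and helper names below are illustrative assumptions; the paper's numerical method selects them so as to bound the quantization noise in the residue number system.

```python
# Sketch of fixed-point quantization of convolution filter coefficients.
import math

def quantize_filter(coeffs, frac_bits, round_up=True):
    """Scale real coefficients by 2**frac_bits and round up or down to integers."""
    scale = 1 << frac_bits
    rounding = math.ceil if round_up else math.floor
    return [rounding(c * scale) for c in coeffs]

def dequantize(q_coeffs, frac_bits):
    """Map the fixed-point integers back to reals (only to inspect the error)."""
    return [q / (1 << frac_bits) for q in q_coeffs]

coeffs = [0.1275, -0.0432, 0.5001]             # example filter coefficients
for bits in (4, 8):
    q = quantize_filter(coeffs, bits)
    err = max(abs(c - d) for c, d in zip(coeffs, dequantize(q, bits)))
    print(f"{bits} fractional bits: {q}, worst-case error {err:.5f}")
```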

