High-Performance Computation in Residue Number System Using Floating-Point Arithmetic

Computation
2021
Vol 9 (2)
pp. 9
Author(s):  
Konstantin Isupov

Residue number system (RNS) is known for its parallel arithmetic and has been used in recent decades in various important applications, from digital signal processing and deep neural networks to cryptography and high-precision computation. However, comparison, sign identification, overflow detection, and division are still hard to implement in RNS. For such operations, most of the methods proposed in the literature only support small dynamic ranges (up to several tens of bits), so they are only suitable for low-precision applications. We recently proposed a method that supports arbitrary moduli sets with cryptographically sized dynamic ranges, up to several thousand bits. The practical interest of our method compared to existing methods is that it relies only on very fast standard floating-point operations, so it is suitable for multiple-precision applications and can be efficiently implemented on many general-purpose platforms that support IEEE 754 arithmetic. In this paper, we make further improvements to this method and demonstrate that it can successfully be applied to implement efficient data-parallel primitives operating in the RNS domain, namely finding the maximum element of an array of RNS numbers on graphics processing units. Our experimental results on an NVIDIA RTX 2080 GPU show that for random residues and a 128-moduli set with a 2048-bit dynamic range, the proposed implementation reduces the running time by a factor of 39 and the memory consumption by a factor of 13 compared to an implementation based on mixed-radix conversion.
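As a rough illustration of the underlying idea (not the authors' GPU kernels), the following C++ sketch compares RNS numbers through a floating-point estimate of their relative value X/M computed from the residues and precomputed constants; the helper names (relative_value, rns_max) and the tiny moduli set are illustrative assumptions.

```cpp
#include <cmath>
#include <cstdio>
#include <vector>

// Small example moduli set {3, 5, 7}; the paper targets sets with up to 128 moduli.
const std::vector<long> MODULI = {3, 5, 7};
// Precomputed weights w_i = ((M/m_i)^(-1) mod m_i) / m_i for M = 105.
const std::vector<double> WEIGHTS = {2.0 / 3, 1.0 / 5, 1.0 / 7};

// Estimate the relative value X/M in [0, 1) of an RNS number from its residues.
double relative_value(const std::vector<long>& x) {
    double s = 0.0;
    for (size_t i = 0; i < MODULI.size(); ++i)
        s += std::fmod(x[i] * WEIGHTS[i], 1.0);
    return s - std::floor(s);                  // keep only the fractional part
}

// Index of the largest element of an array of RNS numbers, decided purely by
// the floating-point estimates (a sequential stand-in for the GPU reduction).
size_t rns_max(const std::vector<std::vector<long>>& a) {
    size_t best = 0;
    for (size_t i = 1; i < a.size(); ++i)
        if (relative_value(a[i]) > relative_value(a[best])) best = i;
    return best;
}

int main() {
    // 10 -> (1, 0, 3), 52 -> (1, 2, 3), 31 -> (1, 1, 3) in RNS {3, 5, 7}.
    std::vector<std::vector<long>> a = {{1, 0, 3}, {1, 2, 3}, {1, 1, 3}};
    std::printf("max at index %zu\n", rns_max(a));   // expected: 1 (the value 52)
}
```

The actual method additionally uses directed rounding to obtain guaranteed bounds and resolves ambiguous cases exactly; the plain point estimate above is adequate only for a toy moduli set like this one.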

2020
Author(s):
Konstantin Isupov
Vladimir Knyazkov

The binary32 and binary64 floating-point formats provide good performance on current hardware, but also introduce a rounding error in almost every arithmetic operation. Consequently, the accumulation of rounding errors in large computations can cause accuracy issues. One way to prevent these issues is to use multiple-precision floating-point arithmetic. This preprint, submitted to Russian Supercomputing Days 2020, presents a new library of basic linear algebra operations with multiple precision for graphics processing units. The library is written in CUDA C/C++ and uses the residue number system to represent multiple-precision significands of floating-point numbers. The supported data types, memory layout, and main features of the library are considered. Experimental results are presented showing the performance of the library.
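A minimal sketch of how such a representation might be laid out is given below; the type and field names, the moduli, and the mul helper are assumptions for illustration, not the library's actual API, and normalization and rounding of results are omitted.

```cpp
#include <array>
#include <cstdint>
#include <cstdio>

constexpr int RNS_SIZE = 4;                                  // illustrative moduli count
constexpr std::array<uint32_t, RNS_SIZE> MODULI = {251, 241, 239, 233};

// One multiple-precision value: sign * significand * 2^exp, where the integer
// significand is stored as residues modulo MODULI[i] (one "digit" per modulus).
struct mp_float {
    int sign;                                  // -1, 0 or +1
    int exp;                                   // binary exponent
    std::array<uint32_t, RNS_SIZE> digits;     // significand in RNS form
};

// Multiplying significands needs no carry propagation between RNS channels.
mp_float mul(const mp_float& a, const mp_float& b) {
    mp_float r{a.sign * b.sign, a.exp + b.exp, {}};
    for (int i = 0; i < RNS_SIZE; ++i)
        r.digits[i] = (uint64_t)a.digits[i] * b.digits[i] % MODULI[i];
    return r;
}

int main() {
    mp_float a{+1, 0, {10, 10, 10, 10}};       // significand 10, exponent 0
    mp_float b{+1, 2, {7, 7, 7, 7}};           // significand 7, exponent 2
    mp_float c = mul(a, b);                    // significand 70, exponent 2
    std::printf("sign=%d exp=%d digit0=%u\n", c.sign, c.exp, c.digits[0]);
}
```

The key point is visible in mul: each residue channel is processed independently, so the per-digit multiplications map naturally onto parallel GPU threads.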


2000
Vol 10 (01n02)
pp. 85-99
Author(s):
A. P. VINOD
A. BENJAMIN PREMKUMAR

This paper presents a residue number system to binary converter for the four-moduli set {2^n - 1, 2^n, 2^n + 1, 2^(n+1) - 1}, valid for even values of n. This moduli set is an extension of the popular set {2^n - 1, 2^n, 2^n + 1}. The number-theoretic properties of moduli of the form 2^n ± 1 are exploited to design the converter. The main challenge of dealing with fractions in the residue number system is overcome by using the fraction compensation technique. A hardware implementation using only adders is also proposed. When compared to common three-moduli reverse converters, this four-moduli converter offers a larger dynamic range and higher parallelism, which makes it useful for high-performance computing.
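For orientation, a generic CRT-based reverse conversion for a small instance of this moduli set (n = 2, i.e., {3, 4, 5, 7}) is sketched below; this plain software version is only a baseline and does not reproduce the adder-only, fraction-compensated design proposed in the paper.

```cpp
#include <cstdint>
#include <cstdio>
#include <vector>

// Modular inverse by brute force (adequate for the small moduli of this sketch).
uint64_t inv_mod(uint64_t a, uint64_t m) {
    for (uint64_t x = 1; x < m; ++x)
        if (a % m * x % m == 1) return x;
    return 0;
}

// X = sum_i x_i * M_i * (M_i^(-1) mod m_i) mod M, where M_i = M / m_i.
uint64_t rns_to_binary(const std::vector<uint64_t>& x, const std::vector<uint64_t>& m) {
    uint64_t M = 1;
    for (uint64_t mi : m) M *= mi;
    uint64_t X = 0;
    for (size_t i = 0; i < m.size(); ++i) {
        uint64_t Mi = M / m[i];
        X = (X + x[i] % m[i] * Mi % M * inv_mod(Mi % m[i], m[i])) % M;
    }
    return X;
}

int main() {
    // Moduli of the paper's form with n = 2: {3, 4, 5, 7}, dynamic range M = 420.
    std::vector<uint64_t> m = {3, 4, 5, 7};
    std::vector<uint64_t> x = {2, 3, 0, 4};    // residues of 95
    std::printf("%llu\n", (unsigned long long)rns_to_binary(x, m));  // expected: 95
}
```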


2017
Vol 27 (01)
pp. 1850004
Author(s):
Konstantin Isupov
Vladimir Knyazkov

Residue number system (RNS), due to its carry-free nature, is popular in many applications of high-speed computer arithmetic, especially in digital signal processing and cryptography. However, the main limiting factor of RNS is the high complexity of operations such as magnitude comparison, sign determination and overflow detection. These operations have, for many years, been a major obstacle to more widespread use of parallel residue arithmetic. This paper presents a new efficient method to perform these operations, based on the computation and analysis of an interval estimate of the relative value of an RNS number. The estimate, which is called the interval floating-point characteristic (IFC), is represented by two directed-rounded bounds that are fixed-precision numbers. Generally, the time complexities of serial and parallel computation of the IFC are linear and logarithmic functions of the size of the moduli set, respectively. The new method requires only small-integer and fixed-precision floating-point operations and targets arbitrary moduli sets with large dynamic ranges ([Formula: see text]). Experiments indicate that the performance of the proposed method is significantly higher than that of methods based on mixed-radix conversion.
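The comparison logic can be pictured with the simplified C++ sketch below: each RNS number is enclosed in a floating-point interval for its relative value, and a comparison is decided whenever the intervals do not overlap. Widening a point estimate by a fixed number of ulps is an illustrative stand-in for the directed rounding used by the IFC, and the helper names are assumptions.

```cpp
#include <cmath>
#include <cstdio>

// A floating-point interval enclosing the relative value X/M of an RNS number.
struct interval { double low, high; };

// Widen a point estimate by a few ulps in each direction to absorb the rounding
// errors accumulated while summing the per-modulus terms (illustrative only).
interval enclose(double estimate, int ulps = 4) {
    interval r{estimate, estimate};
    for (int i = 0; i < ulps; ++i) {
        r.low  = std::nextafter(r.low, -1.0);
        r.high = std::nextafter(r.high, 2.0);
    }
    return r;
}

// Returns -1 or +1 when the intervals decide the comparison and 0 when they
// overlap, in which case a slower exact method (e.g., MRC) must be used.
int compare(interval a, interval b) {
    if (a.high < b.low) return -1;
    if (a.low > b.high) return 1;
    return 0;                                  // ambiguous case
}

int main() {
    interval a = enclose(0.300), b = enclose(0.301);
    std::printf("%d\n", compare(a, b));        // expected: -1 (a < b)
}
```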


2018
Vol 27 (05)
pp. 1850075
Author(s):
Ritesh Kumar Jaiswal
Raj Kumar
Ram Awadh Mishra

The efficiency of a residue number system depends largely on the reverse converter, since the arithmetic operations themselves (addition, subtraction and multiplication) are simple modulo operations. In this paper, the design of a new four-moduli set [Formula: see text] reverse converter is presented. The moduli set has moduli with lengths ranging from ([Formula: see text]) to ([Formula: see text]) bits. The reverse conversion for the moduli set [Formula: see text] has already been optimized in the existing state of the art. Thus, the proposed converter is based on two new moduli sets [Formula: see text] and utilizes mixed-radix conversion. The converter is memoryless and occupies the least area. It is based on carry-save adders (CSA) and modulo adders, enabling higher speed and lower hardware complexity for a dynamic range of [Formula: see text] bits, and offers a good area-delay product.
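Since the converter relies on mixed-radix conversion, a generic software sketch of MRC is given below to fix the idea; the function names are illustrative, and the paper's CSA and modulo-adder hardware structure is not reproduced.

```cpp
#include <cstdint>
#include <cstdio>
#include <vector>

int64_t inv_mod(int64_t a, int64_t m) {           // brute-force inverse (small moduli)
    a %= m;
    for (int64_t x = 1; x < m; ++x)
        if (a * x % m == 1) return x;
    return 0;
}

// Convert residues x (w.r.t. moduli m) to the weighted value via mixed-radix digits.
int64_t mrc_to_binary(std::vector<int64_t> x, const std::vector<int64_t>& m) {
    const size_t k = m.size();
    for (size_t i = 0; i < k; ++i)                // compute mixed-radix digits in place
        for (size_t j = i + 1; j < k; ++j) {
            x[j] = ((x[j] - x[i]) % m[j] + m[j]) % m[j];
            x[j] = x[j] * inv_mod(m[i], m[j]) % m[j];
        }
    int64_t value = 0, weight = 1;
    for (size_t i = 0; i < k; ++i) {              // X = a0 + a1*m0 + a2*m0*m1 + ...
        value += x[i] * weight;
        weight *= m[i];
    }
    return value;
}

int main() {
    std::vector<int64_t> m = {3, 5, 7};
    std::vector<int64_t> x = {1, 2, 3};           // residues of 52
    std::printf("%lld\n", (long long)mrc_to_binary(x, m));   // expected: 52
}
```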


2007
Vol 16 (02)
pp. 267-286
Author(s):
ALEXANDER SKAVANTZOS
MOHAMMAD ABDALLAH
THANOS STOURAITIS

The Residue Number System (RNS) is an integer system appropriate for implementing fast digital signal processors. It can support high-speed arithmetic by operating in parallel channels without the need to exchange information among the channels. In this paper, two novel RNS are proposed. First, a new RNS based on the moduli set {2^(n+1), 2^n - 1, 2^n + 1, 2^n + 2^((n+1)/2) + 1, 2^n - 2^((n+1)/2) + 1}, n odd, is developed, along with an efficient implementation of its residue-to-weighted converter. The new RNS is a balanced five-modulus system, appropriate for large dynamic ranges. The proposed residue-to-binary converter is fast and hardware-efficient and is based on a one's complement multi-operand adder that adds operands whose size is only 80% of the size dictated by the system's dynamic range. Second, a new class of multi-modulus RNS is proposed. These systems are based on sets consisting of two groups of moduli, with the modulus product within one group being of the form 2^a(2^b - 1), while the modulus product within the other group is of the form 2^c - 1. Their RNS-to-weighted converters are based on efficient combinations of the Chinese Remainder Theorem and mixed-radix conversion decoding techniques. Systems based on four, five, and seven moduli are constructed and analyzed. The new systems allow efficient implementations of their RNS-to-weighted decoders, imply fast and balanced RNS arithmetic, and can achieve large dynamic ranges. The presented residue-to-weighted converters for these systems rely on simple mod (2^x - 1) hardware, which can easily be implemented as one's complement hardware.
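A small software model of the mod (2^x - 1) addition underlying such one's complement hardware is sketched below; the function name is hypothetical and the snippet is illustrative only.

```cpp
#include <cstdint>
#include <cstdio>

// Add a and b modulo 2^x - 1 (0 <= a, b < 2^x - 1, x <= 32) using an end-around carry.
uint32_t add_mod_2x_minus_1(uint32_t a, uint32_t b, unsigned x) {
    uint64_t s = (uint64_t)a + b;
    uint64_t mask = (1ull << x) - 1;           // the modulus 2^x - 1
    s = (s & mask) + (s >> x);                 // end-around carry
    return (uint32_t)(s == mask ? 0 : s);      // map the redundant value 2^x - 1 to 0
}

int main() {
    // Example: x = 5, modulus 31; 20 + 25 = 45 = 14 mod 31.
    std::printf("%u\n", add_mod_2x_minus_1(20, 25, 5));   // expected: 14
}
```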


2020
Vol 29 (11)
pp. 2030008
Author(s):
Raj Kumar
Ritesh Kumar Jaiswal
Ram Awadh Mishra

Modulo multipliers have been attracting considerable attention as essential components of residue number system (RNS)-based computational circuits. This paper contributes, for the first time, a comprehensive review of the design of modulo [Formula: see text] multipliers. Modulo multipliers can be implemented using ROM (look-up tables) as well as VLSI components (memoryless); the former approach is preferable for small word lengths and the latter for large word lengths. The modularity and parallelism properties of RNS are used to improve the performance of memoryless multipliers, and a Booth-encoding algorithm is used to speed up the multipliers. In addition, an advanced modulo [Formula: see text] multiplier based on the redundant RNS (RRNS) can be chosen for very high dynamic ranges. These perspectives on modulo [Formula: see text] multipliers are extensively studied against the recent state of the art and analyzed using the Synopsys Design Compiler tool.
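To fix the operation that the surveyed designs implement in hardware, a generic shift-and-add modular multiplication is sketched below; it is a plain software model, not Booth-encoded and not specialized to the modulus forms covered by the review.

```cpp
#include <cstdint>
#include <cstdio>

// Compute a * b mod m without intermediate overflow (m < 2^63), one bit of b per step.
uint64_t mul_mod(uint64_t a, uint64_t b, uint64_t m) {
    uint64_t r = 0;
    a %= m;
    while (b) {
        if (b & 1) r = (r + a) % m;    // add the current partial product
        a = (a + a) % m;               // shift the multiplicand (times 2 mod m)
        b >>= 1;
    }
    return r;
}

int main() {
    // Example with modulus 2^8 + 1 = 257, a common RNS modulus form.
    std::printf("%llu\n", (unsigned long long)mul_mod(200, 123, 257));  // expected: 185
}
```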


IEEE Access
2020
Vol 8
pp. 209181-209190
Author(s):
Pavel Lyakhov
Maria Valueva
Georgii Valuev
Nikolai Nagornov

2020
Author(s):  
Tao Wu

Modular exponentiation is fundamental in computer arithmetic and is widely applied in cryptography, for example in ElGamal cryptography, the Diffie-Hellman key exchange protocol, and RSA cryptography. Implementing modular exponentiation in the residue number system leads to high parallelism in computation and has been applied in many hardware architectures. While most RNS-based architectures use the RNS Montgomery algorithm with two residue number systems, the recent modular multiplication algorithm with sum-residues performs modular reduction in only one residue number system with about the same parallelism. In this work, it is shown that high-performance modular exponentiation and RSA cryptography can be implemented in RNS. Both the algorithm and the architecture are improved to achieve high performance at the cost of extra area overhead: a 1024-bit modular exponentiation can be completed in 0.567 ms on a Xilinx XC6VLX195T-3 platform, using 26,489 slices, 87,357 LUTs, 363 dedicated $18\times 18$-bit multipliers, and 65 Block RAMs.
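For reference, the basic square-and-multiply modular exponentiation that such architectures accelerate is sketched below as a plain sequential C++ model; the RNS Montgomery and sum-residues reduction machinery of the paper is omitted.

```cpp
#include <cstdint>
#include <cstdio>

// a * b mod m without overflow for m < 2^63 (shift-and-add).
uint64_t mul_mod(uint64_t a, uint64_t b, uint64_t m) {
    uint64_t r = 0;
    a %= m;
    while (b) {
        if (b & 1) r = (r + a) % m;
        a = (a + a) % m;
        b >>= 1;
    }
    return r;
}

// base^exp mod m by right-to-left binary (square-and-multiply) exponentiation.
uint64_t pow_mod(uint64_t base, uint64_t exp, uint64_t m) {
    uint64_t r = 1 % m;
    base %= m;
    while (exp) {
        if (exp & 1) r = mul_mod(r, base, m);
        base = mul_mod(base, base, m);
        exp >>= 1;
    }
    return r;
}

int main() {
    // Textbook RSA example: n = 61 * 53 = 3233, e = 17, message 65.
    std::printf("%llu\n", (unsigned long long)pow_mod(65, 17, 3233));   // expected: 2790
}
```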

