An Integrated Prime-Field ECDLP Hardware Accelerator with High-Performance Modular Arithmetic Units

Teleoperated robotic systems are those in which human operators control remote robots through a communication network. The deployment and integration of teleoperated robotic systems in medical operations have been hampered by many issues, such as safety concerns. Elliptic curve cryptography (ECC), an asymmetric cryptographic algorithm, is widely used in practical applications because it offers the same level of security as RSA with a significantly shorter key. The efficiency of ECC over GF(p) is dictated by two critical factors: modular multiplication (MM) and point multiplication (PM) scheduling. In this paper, a high-performance ECC architecture for SM2 is presented. MM is composed of multiplication and modular reduction (MR) in the prime field. A two-stage modular reduction (TSMR) algorithm in the SCA-256 prime field is introduced to achieve low latency; it requires fewer iterative subtraction operations than traditional algorithms. To reduce run time, a schedule is proposed that exploits the parallelism between multiplication and MR inside PM. Synthesized with a 0.13 µm CMOS standard cell library, the proposed processor occupies an area of 341.98k gates, and each PM takes 0.092 ms.
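
To make the decomposition MM = multiplication + MR concrete, here is a minimal Python sketch. It assumes that the SCA-256 prime is the SM2 recommended prime p = 2^256 - 2^224 - 2^96 + 2^64 - 1, and it uses a generic folding reduction that exploits the sparse form of 2^256 mod p. It is not the paper's TSMR algorithm, and the names `reduce_sm2` and `mod_mul` are hypothetical.

```python
# ASSUMPTION: the "SCA-256" prime is the SM2 recommended prime below.
# This is a generic folding reduction, not the paper's two-stage (TSMR) method.
P = 2**256 - 2**224 - 2**96 + 2**64 - 1
# Since 2^256 = P + C, we have 2^256 ≡ C (mod P) for this sparse constant C.
C = 2**256 - P  # equals 2^224 + 2^96 - 2^64 + 1
MASK = 2**256 - 1


def reduce_sm2(x: int) -> int:
    """Reduce a non-negative x (e.g. a product below P^2) modulo P."""
    # Fold the bits above position 256 back down: hi*2^256 + lo ≡ hi*C + lo.
    # Each fold strictly shrinks x, so the loop terminates after a few passes.
    while x >> 256:
        hi, lo = x >> 256, x & MASK
        x = lo + hi * C
    # Now x < 2^256 < 2P, so one conditional subtraction finishes the job.
    if x >= P:
        x -= P
    return x


def mod_mul(a: int, b: int) -> int:
    """MM: full 256x256-bit multiplication followed by modular reduction."""
    return reduce_sm2(a * b)


if __name__ == "__main__":
    # (-2) * (-3) = 6 in the field.
    print(mod_mul(P - 2, P - 3))  # 6
```

Because C is sparse, a hardware datapath could replace `hi * C` with a few shifted additions and subtractions; the sketch simply relies on Python's arbitrary-precision integers.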

High Performance and Fault Tolerance Double Precision Floating Point Arithmetic Units

Journal of Artificial Intelligence, Vol 6 (2), pp. 154-160, 2013. DOI: 10.3923/jai.2013.154.160

Author(s): N. Vinothkuma, M.S. Ravi, Kittur Harish Maillikarj

Keyword(s): Fault Tolerance, High Performance, Floating Point, Double Precision, Floating Point Arithmetic, Arithmetic Units, Point Arithmetic

High Level Design of a Flexible PCA Hardware Accelerator Using a New Block-Streaming Method

Electronics, Vol 9 (3), pp. 449, 2020. DOI: 10.3390/electronics9030449

Author(s): Mohammad Amir Mansoori, Mario R. Casu

Keyword(s): High Performance, Principal Component, Hardware Acceleration, Design Flow, Hardware Accelerator, Field Programmable, Point Solution, Active Research, High Level, Many Core

Principal Component Analysis (PCA) is a technique for dimensionality reduction that is useful for removing redundant information from data in applications such as Microwave Imaging (MI) and Hyperspectral Imaging (HI). The computational complexity of PCA has made its hardware acceleration an active research topic in recent years. Although the hardware design flow can be optimized using High Level Synthesis (HLS) tools, efficient high-performance solutions for complex embedded systems still require careful design. In this paper we propose a flexible PCA hardware accelerator for Field-Programmable Gate Arrays (FPGAs) that we designed entirely in HLS. To make the internal PCA computations more efficient, a new block-streaming method is also introduced. Several HLS optimization strategies are adopted to produce efficient hardware. The flexibility of our design allows us to use it on different FPGA targets with flexible input data dimensions, and it also lets us easily switch from a more accurate floating-point implementation to a faster fixed-point solution. The results demonstrate the efficiency of our design relative to state-of-the-art implementations on GPUs, many-core CPUs, and other FPGA approaches in terms of resource usage, execution time, and power consumption.
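
The core PCA computation can be sketched in plain Python: accumulate a covariance matrix block by block, then extract the leading principal component by power iteration. This is an illustrative assumption of what block-wise processing might look like, not the authors' block-streaming method, and `blocked_covariance` and `top_component` are hypothetical names.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def blocked_covariance(rows: Sequence[Sequence[float]], block_size: int) -> Matrix:
    """Sample covariance built from per-block partial sums.

    Only the running feature sums and sums of outer products persist between
    blocks, so the whole data matrix never has to be held at once.
    """
    n = len(rows[0])
    count = 0
    total = [0.0] * n
    outer = [[0.0] * n for _ in range(n)]
    for start in range(0, len(rows), block_size):
        block = rows[start:start + block_size]
        # Partial accumulators for this block only.
        b_sum = [0.0] * n
        b_outer = [[0.0] * n for _ in range(n)]
        for r in block:
            for i in range(n):
                b_sum[i] += r[i]
                for j in range(n):
                    b_outer[i][j] += r[i] * r[j]
        # Merge the block's partial results into the running totals.
        count += len(block)
        for i in range(n):
            total[i] += b_sum[i]
            for j in range(n):
                outer[i][j] += b_outer[i][j]
    mean = [t / count for t in total]
    return [[(outer[i][j] - count * mean[i] * mean[j]) / (count - 1)
             for j in range(n)] for i in range(n)]


def top_component(cov: Matrix, iters: int = 100) -> Tuple[float, List[float]]:
    """Dominant eigenpair of a covariance matrix by power iteration."""
    n = len(cov)
    v = [1.0] * n  # assumes this start vector is not orthogonal to the answer
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v^T C v gives the eigenvalue because v has unit length.
    eigval = sum(v[i] * sum(cov[i][j] * v[j] for j in range(n)) for i in range(n))
    return eigval, v


if __name__ == "__main__":
    data = [[float(t), 2.0 * t] for t in range(10)]
    value, vector = top_component(blocked_covariance(data, block_size=3))
    print(value, vector)  # direction proportional to (1, 2)
```

The block size only changes how the sums are grouped, so any block size yields the same covariance up to floating-point rounding.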

QAT: Evaluation of a dedicated hardware accelerator for high performance web service

2018 20th International Conference on Advanced Communication Technology (ICACT), 2018. DOI: 10.23919/icact.2018.8323723

Author(s): Xue Shuai, Liu Yao, Zhang Wang

Keyword(s): Web Service, High Performance, Hardware Accelerator, Dedicated Hardware

FPGA-based hardware accelerator for high-performance data-stream processing

Pattern Recognition and Image Analysis, Vol 23 (1), pp. 26-34, 2013. DOI: 10.1134/s1054661812030054. Cited by ~8

Author(s): K. F. Lysakov, M. Yu. Shadrin

Keyword(s): Data Stream, High Performance, Stream Processing, Performance Data, Hardware Accelerator, Data Stream Processing

Implementation of Embedded Floating Point Arithmetic Units on FPGA

Applied Mechanics and Materials, Vol 550, pp. 126-136, 2014. DOI: 10.4028/www.scientific.net/amm.550.126

Author(s): N. Ramya Rani

Keyword(s): High Speed, High Performance, Floating Point, Double Precision, Embedded Computing, Floating Point Arithmetic, Field Programmable, Programmable Gate Arrays, Arithmetic Units, Point Arithmetic

Floating point arithmetic plays a major role in scientific and embedded computing applications. However, the performance of field programmable gate arrays (FPGAs) in floating point applications is poor because of the complexity of floating point arithmetic. Implementing floating point units in FPGA logic consumes a large amount of resources, which has led to the development of embedded floating point units in FPGAs. Embedded applications such as multimedia, communication, and DSP algorithms use floating point arithmetic for graphics processing, Fourier transforms, coding, etc. In this paper, methodologies are presented for implementing embedded floating point units on FPGAs. The work aims to achieve high computation speed and to reduce the power needed to evaluate expressions. An application that demands high-performance floating point computation can achieve better speed and density by incorporating embedded floating point units. Additionally, this paper presents a comparative study of the design of single precision and double precision pipelined floating point arithmetic units for evaluating expressions. The modules are designed in VHDL, simulated with Xilinx software, and implemented on Virtex and Spartan FPGAs.
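
The single and double precision units compared above operate on IEEE 754 bit fields. As a hedged illustration that is not taken from the paper, the sketch below uses Python's standard `struct` module to split a value into sign, biased exponent, and fraction, and rebuilds normal numbers from those fields; `fields`, `value_of`, and `FORMATS` are hypothetical names.

```python
import struct
from typing import Tuple

# precision -> (float pack format, int pack format, exponent bits, fraction bits, bias)
FORMATS = {
    "single": (">f", ">I", 8, 23, 127),
    "double": (">d", ">Q", 11, 52, 1023),
}


def fields(x: float, precision: str) -> Tuple[int, int, int]:
    """Split x into IEEE 754 (sign, biased exponent, fraction) fields."""
    fmt_float, fmt_int, exp_bits, frac_bits, _ = FORMATS[precision]
    # Reinterpret the encoded bytes as an unsigned integer of the same width.
    bits = struct.unpack(fmt_int, struct.pack(fmt_float, x))[0]
    sign = bits >> (exp_bits + frac_bits)
    exponent = (bits >> frac_bits) & ((1 << exp_bits) - 1)
    fraction = bits & ((1 << frac_bits) - 1)
    return sign, exponent, fraction


def value_of(sign: int, exponent: int, fraction: int, precision: str) -> float:
    """Rebuild a normal number: (-1)^s * (1 + f / 2^F) * 2^(e - bias)."""
    _, _, _, frac_bits, bias = FORMATS[precision]
    return (-1) ** sign * (1 + fraction / 2 ** frac_bits) * 2.0 ** (exponent - bias)


if __name__ == "__main__":
    print(fields(-2.5, "single"))  # (1, 128, 2097152)
    print(fields(1.0, "double"))   # (0, 1023, 0)
```

A hardware adder or multiplier works on exactly these fields (aligning exponents, operating on the significands, then renormalizing), which is why double precision units need wider exponent and significand datapaths than single precision ones.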

A High Performance and Full Utilization Hardware Implementation of Floating Point Arithmetic Units

2021. DOI: 10.1109/icecs53924.2021.9665459

Author(s): Chen Yang, Siwei Xiang, Jiaxing Wang, Liyan Liang

Keyword(s): High Performance, Hardware Implementation, Floating Point, Floating Point Arithmetic, Arithmetic Units, Full Utilization, Point Arithmetic

A Low-Power Scalable Stream Compute Accelerator for General Matrix Multiply (GEMM)

VLSI Design, Vol 2014, pp. 1-11, 2014. DOI: 10.1155/2014/712085. Cited by ~1

Author(s): Antony Savich, Shawki Areibi

Keyword(s): Low Power, High Performance, Matrix Multiplication, Hardware Accelerator, Power Performance, Matrix Operations, Simulated System, Scalable Hardware, Functional Prototype, High Performance Computation

Many applications, ranging from machine learning, image processing, and machine vision to optimization, use matrix multiplication as a fundamental building block, so matrix operations play an important role in determining the performance of such applications. This paper proposes a novel, efficient, and highly scalable hardware accelerator whose performance is equivalent to that of a 2 GHz quad-core PC but which can be used in low-power applications targeting embedded systems that require high-performance computation. Power, performance, and resource consumption are demonstrated on a fully functional prototype. The proposed hardware accelerator is 36× more energy efficient per unit of computation than a state-of-the-art Xeon processor of the same vintage, and 14× more efficient as a stand-alone platform with equivalent performance. An important comparison between simulated-system estimates and real-system performance is also carried out.
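
GEMM accelerators commonly rely on tiling so that small blocks of A, B, and C can stay in on-chip storage while they are reused. The sketch below shows generic loop tiling in Python; it is an assumption for illustration only and does not reproduce the paper's stream compute architecture. `gemm_tiled`, `gemm_naive`, and the `tile` parameter are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def gemm_naive(a: Sequence[Sequence[float]], b: Sequence[Sequence[float]]) -> Matrix:
    """Reference C = A @ B using the textbook triple loop."""
    k_dim, m = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(k_dim)) for j in range(m)] for row in a]


def gemm_tiled(a: Sequence[Sequence[float]], b: Sequence[Sequence[float]], tile: int = 4) -> Matrix:
    """C = A @ B with the i, j, k loops blocked into tile x tile chunks."""
    n, k_dim, m = len(a), len(b), len(b[0])
    assert all(len(row) == k_dim for row in a), "inner dimensions must agree"
    c = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, tile):
        for j0 in range(0, m, tile):
            for k0 in range(0, k_dim, tile):
                # Multiply one tile of A by one tile of B and accumulate into C's tile.
                for i in range(i0, min(i0 + tile, n)):
                    for k in range(k0, min(k0 + tile, k_dim)):
                        a_ik = a[i][k]  # reused across the whole j-range of the tile
                        for j in range(j0, min(j0 + tile, m)):
                            c[i][j] += a_ik * b[k][j]
    return c


if __name__ == "__main__":
    print(gemm_tiled([[1, 2], [3, 4]], [[5, 6], [7, 8]], tile=1))  # [[19.0, 22.0], [43.0, 50.0]]
```

Tiling leaves the arithmetic unchanged and only reorders it, so results match the naive loop exactly for small integer inputs; in hardware, the tile size would be chosen to match the available local buffers.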
