Modeling and Execution of Floating Point Parallel Processing Operation for RISC Processor

Various suggestions have been made regarding a precise definition of RISC (Reduced Instruction Set Computer), but the general concept is that such a computer has a small set of simple, general instructions instead of a large set of complex, specialized ones. RISC is considered the foundation for designing high-performance processors. A RISC processor has a reduced number of instructions, a fixed instruction length, more general-purpose registers organized into a register file, a load-store architecture, and simplified addressing modes, which together make individual instructions execute faster and yield a net gain in performance. A uniform instruction format, identical general-purpose registers, and simple addressing modes are further features. This project proposes the design of a high-speed 64-bit RISC processor that consumes less power and operates at high speed. The processor comprises three sections: the Instruction Fetch section, the Instruction Decode section, and the Execution section. The ALU within the Execution section contains a double-precision floating-point multiplier designed with a parallel architecture, which improves the speed and accuracy of execution. All sections are designed in Verilog. The main aim of this paper is therefore to achieve accuracy while consuming less power and area with minimal delay, which is done by replacing the single-precision section of the floating-point ALU with a double-precision section. Video processing, telecommunications, and image processing are the high-end applications targeted by the architecture.
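
The abstract above mentions a double-precision floating-point multiplier but gives no detail of its datapath. As a rough software illustration only (not the authors' Verilog design), the following Python sketch walks through the textbook steps of an IEEE 754 binary64 multiply: XOR the signs, add the biased exponents, multiply the 53-bit significands, normalise, and round to nearest even. The function names `bits_of`, `float_of`, and `fp64_multiply` are hypothetical, and the sketch assumes normal operands and a normal result.

```python
import struct

FRACTION_MASK = (1 << 52) - 1  # low 52 bits hold the stored fraction
BIAS = 1023                    # binary64 exponent bias


def bits_of(x: float) -> int:
    """Return the 64-bit pattern of a Python float (IEEE 754 binary64)."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]


def float_of(bits: int) -> float:
    """Return the float whose 64-bit pattern is `bits`."""
    return struct.unpack(">d", struct.pack(">Q", bits))[0]


def fp64_multiply(a: float, b: float) -> float:
    """Multiply two normal binary64 values step by step (illustrative only)."""
    ba, bb = bits_of(a), bits_of(b)
    sign = (ba >> 63) ^ (bb >> 63)            # sign of the product
    ea, eb = (ba >> 52) & 0x7FF, (bb >> 52) & 0x7FF
    if not (0 < ea < 0x7FF and 0 < eb < 0x7FF):
        raise ValueError("sketch handles normal operands only")

    # Restore the implicit leading 1 to get 53-bit significands.
    ma = (ba & FRACTION_MASK) | (1 << 52)
    mb = (bb & FRACTION_MASK) | (1 << 52)

    product = ma * mb                          # 105 or 106 bits wide
    exponent = ea + eb - BIAS                  # remove one copy of the bias

    # Normalise: the significand product lies in [1, 4).
    if product >> 105:
        shift = 53
        exponent += 1
    else:
        shift = 52

    significand = product >> shift             # keep the top 53 bits
    remainder = product & ((1 << shift) - 1)   # discarded low bits
    half = 1 << (shift - 1)

    # Round to nearest, ties to even.
    if remainder > half or (remainder == half and (significand & 1)):
        significand += 1
        if significand >> 53:                  # rounding carried out
            significand >>= 1
            exponent += 1

    if not (0 < exponent < 0x7FF):
        raise OverflowError("result is not a normal number")

    return float_of((sign << 63) | (exponent << 52) | (significand & FRACTION_MASK))
```

For normal inputs and results, the output should agree bit for bit with the host's native `a * b`, which makes the native multiply a convenient reference when experimenting with the sketch.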

Implementation of Embedded Floating Point Arithmetic Units on FPGA

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.550.126 ◽

2014 ◽

Vol 550 ◽

pp. 126-136

Author(s):

N. Ramya Rani

Keyword(s):

High Speed ◽

High Performance ◽

Floating Point ◽

Double Precision ◽

Embedded Computing ◽

Floating Point Arithmetic ◽

Field Programmable ◽

Programmable Gate Arrays ◽

Arithmetic Units ◽

Point Arithmetic

Floating point arithmetic plays a major role in scientific and embedded computing applications. However, the performance of field programmable gate arrays (FPGAs) used for floating point applications is poor due to the complexity of floating point arithmetic. Implementing floating point units on FPGAs consumes a large amount of resources, which has led to the development of embedded floating point units in FPGAs. Embedded applications such as multimedia, communication, and DSP algorithms use floating point arithmetic in processing graphics, Fourier transformation, coding, etc. In this paper, methodologies are presented for the implementation of embedded floating point units on FPGA. The work aims to achieve high computation speed and to reduce the power needed for evaluating expressions. An application that demands high-performance floating point computation can achieve better speed and density by incorporating embedded floating point units. Additionally, this paper describes a comparative study of the design of single precision and double precision pipelined floating point arithmetic units for evaluating expressions. The modules are designed and simulated in VHDL using Xilinx software and implemented on VIRTEX and SPARTAN FPGAs.

A Novel Rounding Algorithm for a High Performance IEEE 754 Double-Precision Floating-Point Multiplier

2020 IEEE 38th International Conference on Computer Design (ICCD) ◽

10.1109/iccd50377.2020.00081 ◽

2020 ◽

Author(s):

S. Ross Thompson ◽

James E. Stine

Keyword(s):

High Performance ◽

Floating Point ◽

Double Precision ◽

Rounding Algorithm

High Performance and Fault Tolerance Double Precision Floating Point Arithmetic Units

Journal of Artificial Intelligence ◽

10.3923/jai.2013.154.160 ◽

2013 ◽

Vol 6 (2) ◽

pp. 154-160

Author(s):

N. Vinothkuma ◽

M.S. Ravi ◽

Kittur Harish Maillikarj

Keyword(s):

Fault Tolerance ◽

High Performance ◽

Floating Point ◽

Double Precision ◽

Floating Point Arithmetic ◽

Arithmetic Units ◽

Point Arithmetic

Implementation of Low Power Pipelined 64-bit RISC Processor with Unbiased FPU on CPLD

International Journal of Reconfigurable and Embedded Systems (IJRES) ◽

10.11591/ijres.v5.i2.pp118-123 ◽

2016 ◽

Vol 5 (2) ◽

pp. 118

Author(s):

J. Vijay Kumar ◽

B. Naga Raju ◽

M. Vasu Babu ◽

T. Ramanjappa

Keyword(s):

Low Power ◽

Arithmetic Operation ◽

Floating Point ◽

Double Precision ◽

Verilog Hdl ◽

Logical Function ◽

Floating Point Arithmetic ◽

Risc Processor ◽

Operation Results ◽

Point Arithmetic

This article presents the implementation of a low-power pipelined 64-bit RISC processor on an Altera MAXV CPLD device. The design is verified for arithmetic operations on both fixed and floating point numbers, and for the branch and logical functions of the RISC processor. For every jump instruction, the processor architecture automatically flushes the data in the pipeline to avoid misbehavior. The processor contains an FPU that supports double precision IEEE-754 format operations. The simulation results have been verified using ModelSim software. The results of the ALU operations and the double precision floating point arithmetic operations are displayed on seven-segment displays. The necessary code is written in Verilog HDL.
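
To make the double precision IEEE-754 format mentioned above concrete, here is a minimal Python sketch that splits a 64-bit value into its three fields (1 sign bit, 11 biased exponent bits, 52 fraction bits) and reassembles it. The helper names `decompose_double` and `recompose_double` are hypothetical and are not taken from the article.

```python
import struct


def decompose_double(x: float) -> tuple[int, int, int]:
    """Split a binary64 value into (sign, biased exponent, fraction)."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    sign = bits >> 63                    # 1 bit
    exponent = (bits >> 52) & 0x7FF      # 11 bits, biased by 1023
    fraction = bits & ((1 << 52) - 1)    # 52 bits, implicit leading 1 omitted
    return sign, exponent, fraction


def recompose_double(sign: int, exponent: int, fraction: int) -> float:
    """Rebuild the binary64 value from its three fields."""
    bits = (sign << 63) | (exponent << 52) | fraction
    return struct.unpack(">d", struct.pack(">Q", bits))[0]
```

Calling `decompose_double(1.0)` yields a biased exponent of 1023 with a zero fraction, since 1.0 is stored as 1.0 × 2^0 with the leading 1 left implicit.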

High performance and energy efficient single‐precision and double‐precision merged floating‐point adder on FPGA

IET Computers & Digital Techniques ◽

10.1049/iet-cdt.2016.0200 ◽

2017 ◽

Vol 12 (1) ◽

pp. 20-29 ◽

Cited By ~ 4

Author(s):

Hao Zhang ◽

Dongdong Chen ◽

Seok‐Bum Ko

Keyword(s):

Energy Efficient ◽

High Performance ◽

Floating Point ◽

Double Precision ◽

Single Precision

Design of a Reconfigurable Coprocessor for Double Precision Floating Point Matrix Algorithms

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.58-60.1037 ◽

2011 ◽

Vol 58-60 ◽

pp. 1037-1042

Author(s):

Sheng Long Li ◽

Zhao Lin Li ◽

Qing Wei Zheng

Keyword(s):

High Performance ◽

Cmos Technology ◽

General Purpose ◽

Floating Point ◽

Double Precision ◽

Synthesis Time ◽

Matrix Algorithms ◽

Matrix Operations ◽

On Chip ◽

Software Execution

Double precision floating point matrix operations are widely used in a variety of engineering and scientific computing applications. However, it is inefficient to perform these operations using software approaches on general purpose processors. In order to reduce the processing time and satisfy real-time demands, a reconfigurable coprocessor for double precision floating point matrix algorithms is proposed in this paper. The coprocessor is embedded in a Multi-Processor System on Chip (MPSoC) and cooperates with an ARM core and a DSP core for high-performance control and calculation. One algorithm from GPS applications is taken as an example to illustrate the efficiency of the proposed coprocessor. The experimental result shows that the coprocessor achieves a speedup of a factor of 50 for the quaternion algorithm of attitude solution in an inertial navigation application, compared with the software execution time on a TI C6713 DSP. The coprocessor is implemented in SMIC 0.13μm CMOS technology; the synthesis time delay is 9.75 ns, and the power consumption is 63.69 mW when it works at 100 MHz.

Stochastic rounding and reduced-precision fixed-point arithmetic for solving neural ordinary differential equations

Philosophical Transactions of The Royal Society A Mathematical Physical and Engineering Sciences ◽

10.1098/rsta.2019.0052 ◽

2020 ◽

Vol 378 (2166) ◽

pp. 20190052 ◽

Cited By ~ 4

Author(s):

Michael Hopkins ◽

Mantas Mikaitis ◽

Dave R. Lester ◽

Steve Furber

Keyword(s):

Fixed Point ◽

Differential Equations ◽

Ordinary Differential Equations ◽

High Performance ◽

Floating Point ◽

Double Precision ◽

Least Significant Bit ◽

Fixed Point Arithmetic ◽

Solution Algorithms ◽

Point Arithmetic

Although double-precision floating-point arithmetic currently dominates high-performance computing, there is increasing interest in smaller and simpler arithmetic types. The main reasons are potential improvements in energy efficiency and memory footprint and bandwidth. However, simply switching to lower-precision types typically results in increased numerical errors. We investigate approaches to improving the accuracy of reduced-precision fixed-point arithmetic types, using examples in an important domain for numerical computation in neuroscience: the solution of ordinary differential equations (ODEs). The Izhikevich neuron model is used to demonstrate that rounding has an important role in producing accurate spike timings from explicit ODE solution algorithms. In particular, fixed-point arithmetic with stochastic rounding consistently results in smaller errors compared to single-precision floating-point and fixed-point arithmetic with round-to-nearest across a range of neuron behaviours and ODE solvers. A computationally much cheaper alternative is also investigated, inspired by the concept of dither that is a widely understood mechanism for providing resolution below the least significant bit in digital signal processing. These results will have implications for the solution of ODEs in other subject areas, and should also be directly relevant to the huge range of practical problems that are represented by partial differential equations. This article is part of a discussion meeting issue ‘Numerical algorithms for high-performance computational science’.
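
The contrast between round-to-nearest and stochastic rounding described above can be sketched in a few lines of Python. This is a generic illustration of the two rounding rules on a fixed-point grid, not the authors' implementation; the function names and the `frac_bits` parameter are assumptions made for the example.

```python
import math
import random


def to_fixed_nearest(x: float, frac_bits: int) -> float:
    """Round x to the nearest multiple of 2**-frac_bits (ties round up)."""
    scale = 1 << frac_bits
    return math.floor(x * scale + 0.5) / scale


def to_fixed_stochastic(x: float, frac_bits: int, rng: random.Random) -> float:
    """Round x up with probability equal to its fractional residue.

    The expected value of the result equals x, so rounding errors tend to
    average out over many operations instead of accumulating as a bias.
    """
    scale = 1 << frac_bits
    scaled = x * scale
    lower = math.floor(scaled)
    residue = scaled - lower            # in [0, 1)
    round_up = rng.random() < residue   # true with probability `residue`
    return (lower + (1 if round_up else 0)) / scale
```

With 4 fractional bits, 0.3 lies between the grid points 0.25 and 0.3125. Round-to-nearest always returns 0.3125, whereas stochastic rounding returns 0.3125 about 80% of the time and 0.25 otherwise, so the average over many roundings approaches 0.3.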

FPC: A High-Speed Compressor for Double-Precision Floating-Point Data

IEEE Transactions on Computers ◽

10.1109/tc.2008.131 ◽

2009 ◽

Vol 58 (1) ◽

pp. 18-31 ◽

Cited By ~ 107

Author(s):

Martin Burtscher ◽

Paruj Ratanaworabhan

Keyword(s):

High Speed ◽

Floating Point ◽

Double Precision ◽

Point Data

A design of high speed double precision floating point adder using macro modules

Proceedings of the ASP-DAC 2005. Asia and South Pacific Design Automation Conference, 2005. ◽

10.1109/aspdac.2005.1466603 ◽

2005 ◽

Author(s):

Chi Huang ◽

Xinyu Wu ◽

Jinmei Lai ◽

Chengshou Sun ◽

Gang Li

Keyword(s):

High Speed ◽

Floating Point ◽

Double Precision

DGX-A100 Face to Face DGX-2—Performance, Power and Thermal Behavior Evaluation

Energies ◽

10.3390/en14020376 ◽

2021 ◽

Vol 14 (2) ◽

pp. 376

Author(s):

Matej Špeťko ◽

Ondřej Vysocký ◽

Branislav Jansík ◽

Lubomír Říha

Keyword(s):

Artificial Intelligence ◽

Thermal Behavior ◽

Energy Efficient ◽

High Performance ◽

Floating Point ◽

Double Precision ◽

Face To Face ◽

Scientific Simulations ◽

Performance Computing ◽

Dynamic Frequency

Nvidia is a leading producer of GPUs for high-performance computing and artificial intelligence, bringing top performance and energy-efficiency. We present performance, power consumption, and thermal behavior analysis of the new Nvidia DGX-A100 server equipped with eight A100 Ampere microarchitecture GPUs. The results are compared against the previous generation of the server, Nvidia DGX-2, based on Tesla V100 GPUs. We developed a synthetic benchmark to measure the raw performance of floating-point computing units including Tensor Cores. Furthermore, thermal stability was investigated. In addition, Dynamic Frequency and Voltage Scaling (DVFS) analysis was performed to determine the best energy-efficient configuration of the GPUs executing workloads of various arithmetical intensities. Under the energy-optimal configuration the A100 GPU reaches efficiency of 51 GFLOPS/W for double-precision workload and 91 GFLOPS/W for tensor core double precision workload, which makes the A100 the most energy-efficient server accelerator for scientific simulations in the market.
