A DESIGN AND IMPLEMENTATION OF HIGH SPEED IEEE-754 DOUBLE PRECISION FLOATING POINT UNIT BASED ON VEDIC TECHNIQUES

2017 ◽  
Vol 8 (1) ◽  
pp. 19
Author(s):  
AGRAWAL RAHUL KUMAR ◽  
TYAGI RHYTHM ◽  
TRIPATHI MADAN MOHAN ◽  
...  

Currently, each CPU has one or more Floating Point Units (FPUs) integrated inside it. The FPU is typically used in math-intensive applications such as digital signal processing, and it is found in engineering, medical and military fields, in addition to other fields requiring audio, image or video processing. A high-speed and energy-efficient floating point unit is therefore needed in the electronics industry as an arithmetic unit in microprocessors. Multiplication and addition account for roughly 95% of the operations in a conventional FPU, and many applications require fast execution of these arithmetic operations. In the existing system, the FPM (Floating Point Multiplication) and FPA (Floating Point Addition) units suffer from higher delay, lower speed and lower throughput. To meet the demand for high speed and throughput, the multiplier and adder blocks within the FPM and FPA are designed for both single-precision and double-precision floating point formats, are internally pipelined to achieve high throughput, and follow the IEEE 754 standard floating point representations. The design is written in Verilog, and the Xilinx ISE 14.5 software tool is employed to code and verify the resulting waveforms of the design.
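The operations described above rest on the IEEE 754 double-precision layout (1 sign bit, 11 exponent bits, 52 mantissa bits). As a point of reference, here is a minimal C sketch, an illustrative software model rather than the paper's Verilog design, of unpacking that layout and of the sign/exponent/mantissa handling at the heart of a floating-point multiply; the struct and helper names are assumptions, and rounding, normalization, subnormals and exceptions are omitted.

```c
/* Illustrative software model of IEEE 754 double-precision unpacking and the
 * core of a floating-point multiply (sign/exponent/mantissa handling only;
 * rounding, normalization, subnormals and exceptions are omitted). */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef struct {
    uint64_t sign;      /* 1 bit  */
    uint64_t exponent;  /* 11 bits, biased by 1023 */
    uint64_t mantissa;  /* 52 bits, implicit leading 1 for normal numbers */
} fp64_fields;

static fp64_fields unpack(double x) {
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);          /* reinterpret the 64-bit pattern */
    fp64_fields f;
    f.sign     = bits >> 63;
    f.exponent = (bits >> 52) & 0x7FF;
    f.mantissa = bits & 0xFFFFFFFFFFFFFULL;
    return f;
}

int main(void) {
    fp64_fields a = unpack(6.5), b = unpack(-2.0);

    /* Multiplication outline: XOR the signs, add the biased exponents
     * (subtracting one bias), multiply the hidden-bit-extended mantissas.
     * A full design keeps the 106-bit product and normalizes/rounds it. */
    uint64_t sign = a.sign ^ b.sign;
    int64_t  exp  = (int64_t)a.exponent + (int64_t)b.exponent - 1023;
    printf("sign=%llu biased_exp=%lld\n",
           (unsigned long long)sign, (long long)exp);
    return 0;
}
```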


2009 ◽  
Vol 17 (1-2) ◽  
pp. 43-57 ◽  
Author(s):  
Michael Kistler ◽  
John Gunnels ◽  
Daniel Brokenshire ◽  
Brad Benton

In this paper we present the design and implementation of the Linpack benchmark for the IBM BladeCenter QS22, which incorporates two IBM PowerXCell 8i processors. The PowerXCell 8i is a new implementation of the Cell Broadband Engine™ architecture and contains a set of special-purpose processing cores known as Synergistic Processing Elements (SPEs). The SPEs can be used as computational accelerators to augment the main PowerPC processor. The added computational capability of the SPEs results in a peak double precision floating point capability of 108.8 GFLOPS. We explain how we modified the standard open source implementation of Linpack to accelerate key computational kernels using the SPEs of the PowerXCell 8i processors. We describe in detail the implementation and performance of the computational kernels and also explain how we employed the SPEs for high-speed data movement and reformatting. The result of these modifications is a Linpack benchmark optimized for the IBM PowerXCell 8i processor that achieves 170.7 GFLOPS on a BladeCenter QS22 with 32 GB of DDR2 SDRAM memory. Our implementation of Linpack also supports clusters of QS22s, and was used to achieve a result of 11.1 TFLOPS on a cluster of 84 QS22 blades. We compare our results on a single BladeCenter QS22 with the base Linpack implementation without SPE acceleration to illustrate the benefits of our optimizations.
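The computational kernel that dominates Linpack, and the one such ports typically offload to accelerators, is the blocked trailing-matrix update of LU factorization, essentially a DGEMM. Below is a plain, unoptimized C sketch of that update (C := C − A·B over panels of width k); it is a generic illustration under assumed row-major storage and invented names, not IBM's SPE implementation, which would additionally tile the loops to fit the SPEs' local stores and overlap DMA transfers with computation.

```c
/* Generic blocked trailing-matrix update used in LU factorization
 * (C := C - A * B), the DGEMM-like kernel a Linpack port typically
 * offloads to accelerators. Plain C; no SIMD, tiling or double-buffering. */
#include <stddef.h>

void trailing_update(double *C, const double *A, const double *B,
                     size_t m, size_t n, size_t k, size_t ld)
{
    /* C is m x n, A is m x k (the factored column panel), B is k x n
     * (the row panel); all stored row-major with leading dimension ld. */
    for (size_t i = 0; i < m; i++) {
        for (size_t j = 0; j < n; j++) {
            double acc = C[i * ld + j];
            for (size_t p = 0; p < k; p++)
                acc -= A[i * ld + p] * B[p * ld + j];
            C[i * ld + j] = acc;
        }
    }
}
```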


2009 ◽  
Vol 19 (3) ◽  
pp. 634-639 ◽  
Author(s):  
Heejoung Park ◽  
Y. Yamanashi ◽  
K. Taketomi ◽  
N. Yoshikawa ◽  
M. Tanaka ◽  
...  

2014 ◽  
Vol 550 ◽  
pp. 126-136
Author(s):  
N. Ramya Rani

Floating point arithmetic plays a major role in scientific and embedded computing applications, but the performance of field programmable gate arrays (FPGAs) used for floating point applications is poor due to the complexity of floating point arithmetic. The implementation of floating point units on FPGAs consumes a large amount of resources, which has led to the development of embedded floating point units in FPGAs. Embedded applications such as multimedia, communication and DSP algorithms use floating point arithmetic in processing graphics, Fourier transformation, coding, etc. In this paper, methodologies are presented for the implementation of embedded floating point units on FPGAs. The work focuses on achieving high computation speed and reducing the power consumed in evaluating expressions. An application that demands high-performance floating point computation can achieve better speed and density by incorporating embedded floating point units. Additionally, this paper presents a comparative study of single-precision and double-precision pipelined floating point arithmetic units for evaluating expressions. The modules are designed and simulated in VHDL using Xilinx software and implemented on VIRTEX and SPARTAN FPGAs.
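To make the pipelining idea concrete, the C sketch below is a purely hypothetical cycle-level model (not the paper's VHDL modules) of a floating-point adder with a fixed 3-stage latency: one operand pair is issued per cycle, and after the pipeline fills, one result emerges per cycle, which is the throughput benefit the abstract refers to. The stage count and data are invented for illustration.

```c
/* Hypothetical cycle-level model of a 3-stage pipelined FP adder:
 * one operand pair enters per cycle, one result leaves per cycle
 * after a 3-cycle fill. Illustration only, not the paper's VHDL. */
#include <stdio.h>

#define STAGES 3
#define N 8

int main(void) {
    double a[N], b[N], pipe[STAGES];
    int valid[STAGES] = {0};

    for (int i = 0; i < N; i++) { a[i] = i + 0.5; b[i] = 2.0 * i; }

    for (int cycle = 0; cycle < N + STAGES; cycle++) {
        /* Drain: a value leaving the last stage is a finished sum. */
        if (valid[STAGES - 1])
            printf("cycle %2d: result %.1f\n", cycle, pipe[STAGES - 1]);

        /* Advance the pipeline registers by one stage. */
        for (int s = STAGES - 1; s > 0; s--) {
            pipe[s]  = pipe[s - 1];
            valid[s] = valid[s - 1];
        }

        /* Issue a new addition into stage 0 while operands remain. */
        if (cycle < N) {
            pipe[0]  = a[cycle] + b[cycle];  /* stand-in for stage-0 work */
            valid[0] = 1;
        } else {
            valid[0] = 0;
        }
    }
    return 0;
}
```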


2009 ◽  
Vol 58 (1) ◽  
pp. 18-31 ◽  
Author(s):  
Martin Burtscher ◽  
Paruj Ratanaworabhan

2011 ◽  
Vol 2011 ◽  
pp. 1-12 ◽  
Author(s):  
Nikolaos Alachiotis ◽  
Alexandros Stamatakis

The use of reconfigurable computing for accelerating floating-point intensive codes is becoming common due to the availability of DSPs in new-generation FPGAs. We present the design of an efficient, pipelined floating-point datapath for calculating the logarithm function on reconfigurable devices. We integrate the datapath into a stand-alone LUT-based (Lookup Table) component, the LAU (Logarithm Approximation Unit). We extended the LAU by integrating two architecturally independent, LAU-based datapaths into a larger component, the VLAU (vector-like LAU). The VLAU produces 2 results/cycle, while occupying the same amount of memory as the LAU. Under single precision, one LAU is 12 and 1.7 times faster than the GNU and Intel Math Kernel Library (MKL) implementations, respectively. The LAU is also 1.6 times faster than the FloPoCo reconfigurable logarithm architecture. Under double precision, one LAU is 20 and 2.6 times faster than the respective GNU and MKL functions and 1.4 times faster than the FloPoCo logarithm. The VLAU is approximately twice as fast as the LAU, both under single and double precision.
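A generic way to build a LUT-based logarithm approximator, sketched here in C as a hedged illustration of the underlying idea rather than the LAU or FloPoCo designs themselves, is to split the input into exponent e and mantissa m so that log2(x) = e + log2(m), and to approximate log2(m) with a small table plus linear interpolation. The table size, the interpolation scheme, and the choice of log2 (natural log follows by a constant factor ln 2) are assumptions for illustration.

```c
/* Sketch of a table-based log2 approximation: split x into exponent and
 * mantissa (x = m * 2^e with 1 <= m < 2), then log2(x) = e + log2(m),
 * where log2(m) comes from a small table with linear interpolation.
 * Valid for x > 0; table size and interpolation are illustrative choices. */
#include <math.h>
#include <stdio.h>

#define TBL_BITS 8
#define TBL_SIZE (1 << TBL_BITS)

static double tbl[TBL_SIZE + 1];

static void init_table(void) {
    for (int i = 0; i <= TBL_SIZE; i++)
        tbl[i] = log2(1.0 + (double)i / TBL_SIZE);   /* log2 over the mantissa grid */
}

static double log2_lut(double x) {
    int e;
    double m = frexp(x, &e);        /* x = m * 2^e with 0.5 <= m < 1 */
    m *= 2.0; e -= 1;               /* renormalize so that 1 <= m < 2 */
    double idx = (m - 1.0) * TBL_SIZE;
    int i = (int)idx;
    double frac = idx - i;          /* interpolate between adjacent entries */
    return e + tbl[i] + frac * (tbl[i + 1] - tbl[i]);
}

int main(void) {
    init_table();
    double x = 123.456;
    printf("log2 approx %.6f  exact %.6f\n", log2_lut(x), log2(x));
    return 0;
}
```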


Various suggestions have been made regarding a precise definition of RISC, but the common concept is that such a computer has a small set of simple, general instructions rather than a large set of complex and specialized instructions. This project proposes the design of a high-speed 64-bit RISC processor that consumes less power while operating at high speed. The processor comprises three sections, namely the Instruction Fetch section, the Instruction Decode section, and the Execution section. The ALU within the Execution section contains a double-precision floating-point multiplier designed with a corollary architecture, thereby improving the speed and accuracy of execution. All sections are designed using Verilog coding. A uniform instruction format, general-purpose registers, and simple addressing modes are the processor's other features. RISC stands for Reduced Instruction Set Computer and is considered the foundation for designing high-performance processors. The RISC processor has a reduced number of instructions, a fixed instruction length, more general-purpose registers organized into a register file, a load-store architecture, and simple addressing modes, all of which make individual instructions execute faster and achieve a net gain in performance. The principal aim of this paper is to achieve correct operation with less power, less area and minimal delay, which is accomplished by replacing the single-precision floating-point ALU section with a double-precision floating-point section. Video processing, telecommunications and image processing are the high-end applications targeted by this architecture.
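Since the abstract emphasizes a fixed instruction length and simple addressing modes, the C sketch below decodes a hypothetical fixed-width 32-bit instruction word into opcode and register fields, the kind of work the Instruction Decode section performs. The field widths, layout and opcode value are invented for illustration and are not the paper's actual encoding.

```c
/* Decode of a hypothetical fixed-width 32-bit RISC instruction word.
 * The field layout (6-bit opcode, three 5-bit register indices, 11-bit
 * immediate) is an assumption for illustration, not the paper's encoding. */
#include <stdint.h>
#include <stdio.h>

typedef struct {
    unsigned opcode; /* bits 31..26 */
    unsigned rd;     /* bits 25..21 destination register   */
    unsigned rs1;    /* bits 20..16 first source register  */
    unsigned rs2;    /* bits 15..11 second source register */
    unsigned imm;    /* bits 10..0  short immediate        */
} instr_t;

static instr_t decode(uint32_t word) {
    instr_t d;
    d.opcode = (word >> 26) & 0x3F;
    d.rd     = (word >> 21) & 0x1F;
    d.rs1    = (word >> 16) & 0x1F;
    d.rs2    = (word >> 11) & 0x1F;
    d.imm    =  word        & 0x7FF;
    return d;
}

int main(void) {
    /* opcode=0x08 (say, a double-precision multiply), rd=r3, rs1=r1, rs2=r2 */
    uint32_t word = (0x08u << 26) | (3u << 21) | (1u << 16) | (2u << 11);
    instr_t d = decode(word);
    printf("opcode=%u rd=%u rs1=%u rs2=%u imm=%u\n",
           d.opcode, d.rd, d.rs1, d.rs2, d.imm);
    return 0;
}
```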

