A DESIGN AND IMPLEMENTATION OF HIGH SPEED IEEE-754 DOUBLE PRECISION FLOATING POINT UNIT BASED ON VEDIC TECHNIQUES

2017 ◽  
Vol 8 (1) ◽  
pp. 19
Author(s):  
AGRAWAL RAHUL KUMAR ◽  
TYAGI RHYTHM ◽  
TRIPATHI MADAN MOHAN ◽  
...  

Currently, each CPU has one or more Floating Point Units (FPUs) integrated inside it. The FPU is typically used in math-intensive applications such as digital signal processing, and it is found in engineering, medical and military fields, in addition to other fields requiring audio, image or video processing. A high-speed and energy-efficient floating point unit is therefore needed in the electronics industry as an arithmetic unit in microprocessors. Multiplication and addition account for roughly 95% of the operations in a conventional FPU, and many applications require fast execution of these arithmetic operations. In the existing system, the FPM (Floating Point Multiplication) and FPA (Floating Point Addition) units suffer from higher delay, lower speed and lower throughput. To meet the demand for high speed and throughput, the multiplier and adder blocks within the FPM and FPA are designed for both single-precision and double-precision floating point formats, are internally pipelined to achieve high throughput, and follow the IEEE 754 standard floating point representations. The design is written in Verilog, and the Xilinx ISE 14.5 software tool is employed to code and verify the resulting waveforms of the design.
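The operations described above rest on the IEEE 754 double-precision layout (1 sign bit, 11 exponent bits, 52 mantissa bits). As a point of reference, here is a minimal C sketch, an illustrative software model rather than the paper's Verilog design, of unpacking that layout and of the sign/exponent/mantissa handling at the heart of a floating-point multiply; the struct and helper names are assumptions, and rounding, normalization, subnormals and exceptions are omitted.

```c
/* Illustrative software model of IEEE 754 double-precision unpacking and the
 * core of a floating-point multiply (sign/exponent/mantissa handling only;
 * rounding, normalization, subnormals and exceptions are omitted). */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef struct {
    uint64_t sign;      /* 1 bit  */
    uint64_t exponent;  /* 11 bits, biased by 1023 */
    uint64_t mantissa;  /* 52 bits, implicit leading 1 for normal numbers */
} fp64_fields;

static fp64_fields unpack(double x) {
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);          /* reinterpret the 64-bit pattern */
    fp64_fields f;
    f.sign     = bits >> 63;
    f.exponent = (bits >> 52) & 0x7FF;
    f.mantissa = bits & 0xFFFFFFFFFFFFFULL;
    return f;
}

int main(void) {
    fp64_fields a = unpack(6.5), b = unpack(-2.0);

    /* Multiplication outline: XOR the signs, add the biased exponents
     * (subtracting one bias), multiply the hidden-bit-extended mantissas.
     * A full design keeps the 106-bit product and normalizes/rounds it. */
    uint64_t sign = a.sign ^ b.sign;
    int64_t  exp  = (int64_t)a.exponent + (int64_t)b.exponent - 1023;
    printf("sign=%llu biased_exp=%lld\n",
           (unsigned long long)sign, (long long)exp);
    return 0;
}
```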


2009 ◽  
Vol 17 (1-2) ◽  
pp. 43-57 ◽  
Author(s):  
Michael Kistler ◽  
John Gunnels ◽  
Daniel Brokenshire ◽  
Brad Benton

In this paper we present the design and implementation of the Linpack benchmark for the IBM BladeCenter QS22, which incorporates two IBM PowerXCell 8i processors. The PowerXCell 8i is a new implementation of the Cell Broadband Engine™ architecture and contains a set of special-purpose processing cores known as Synergistic Processing Elements (SPEs). The SPEs can be used as computational accelerators to augment the main PowerPC processor. The added computational capability of the SPEs results in a peak double precision floating point capability of 108.8 GFLOPS. We explain how we modified the standard open source implementation of Linpack to accelerate key computational kernels using the SPEs of the PowerXCell 8i processors. We describe in detail the implementation and performance of the computational kernels and also explain how we employed the SPEs for high-speed data movement and reformatting. The result of these modifications is a Linpack benchmark optimized for the IBM PowerXCell 8i processor that achieves 170.7 GFLOPS on a BladeCenter QS22 with 32 GB of DDR2 SDRAM memory. Our implementation of Linpack also supports clusters of QS22s, and was used to achieve a result of 11.1 TFLOPS on a cluster of 84 QS22 blades. We compare our results on a single BladeCenter QS22 with the base Linpack implementation without SPE acceleration to illustrate the benefits of our optimizations.
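The computational kernel that dominates Linpack, and the one such ports typically offload to accelerators, is the blocked trailing-matrix update of LU factorization, essentially a DGEMM. Below is a plain, unoptimized C sketch of that update (C := C − A·B over panels of width k); it is a generic illustration under assumed row-major storage and invented names, not IBM's SPE implementation, which would additionally tile the loops to fit the SPEs' local stores and overlap DMA transfers with computation.

```c
/* Generic blocked trailing-matrix update used in LU factorization
 * (C := C - A * B), the DGEMM-like kernel a Linpack port typically
 * offloads to accelerators. Plain C; no SIMD, tiling or double-buffering. */
#include <stddef.h>

void trailing_update(double *C, const double *A, const double *B,
                     size_t m, size_t n, size_t k, size_t ld)
{
    /* C is m x n, A is m x k (the factored column panel), B is k x n
     * (the row panel); all stored row-major with leading dimension ld. */
    for (size_t i = 0; i < m; i++) {
        for (size_t j = 0; j < n; j++) {
            double acc = C[i * ld + j];
            for (size_t p = 0; p < k; p++)
                acc -= A[i * ld + p] * B[p * ld + j];
            C[i * ld + j] = acc;
        }
    }
}
```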


2009 ◽  
Vol 19 (3) ◽  
pp. 634-639 ◽  
Author(s):  
Heejoung Park ◽  
Y. Yamanashi ◽  
K. Taketomi ◽  
N. Yoshikawa ◽  
M. Tanaka ◽  
...  

2014 ◽  
Vol 550 ◽  
pp. 126-136
Author(s):  
N. Ramya Rani

Floating point arithmetic plays a major role in scientific and embedded computing applications, but the performance of field programmable gate arrays (FPGAs) used for floating point applications is poor due to the complexity of floating point arithmetic. The implementation of floating point units on FPGAs consumes a large amount of resources, which has led to the development of embedded floating point units in FPGAs. Embedded applications such as multimedia, communication and DSP algorithms use floating point arithmetic in processing graphics, Fourier transformation, coding, etc. In this paper, methodologies are presented for the implementation of embedded floating point units on FPGAs. The work focuses on achieving high computation speed and reducing the power consumed in evaluating expressions. An application that demands high-performance floating point computation can achieve better speed and density by incorporating embedded floating point units. Additionally, this paper presents a comparative study of single-precision and double-precision pipelined floating point arithmetic units for evaluating expressions. The modules are designed and simulated in VHDL using Xilinx software and implemented on VIRTEX and SPARTAN FPGAs.
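To make the pipelining idea concrete, the C sketch below is a purely hypothetical cycle-level model (not the paper's VHDL modules) of a floating-point adder with a fixed 3-stage latency: one operand pair is issued per cycle, and after the pipeline fills, one result emerges per cycle, which is the throughput benefit the abstract refers to. The stage count and data are invented for illustration.

```c
/* Hypothetical cycle-level model of a 3-stage pipelined FP adder:
 * one operand pair enters per cycle, one result leaves per cycle
 * after a 3-cycle fill. Illustration only, not the paper's VHDL. */
#include <stdio.h>

#define STAGES 3
#define N 8

int main(void) {
    double a[N], b[N], pipe[STAGES];
    int valid[STAGES] = {0};

    for (int i = 0; i < N; i++) { a[i] = i + 0.5; b[i] = 2.0 * i; }

    for (int cycle = 0; cycle < N + STAGES; cycle++) {
        /* Drain: a value leaving the last stage is a finished sum. */
        if (valid[STAGES - 1])
            printf("cycle %2d: result %.1f\n", cycle, pipe[STAGES - 1]);

        /* Advance the pipeline registers by one stage. */
        for (int s = STAGES - 1; s > 0; s--) {
            pipe[s]  = pipe[s - 1];
            valid[s] = valid[s - 1];
        }

        /* Issue a new addition into stage 0 while operands remain. */
        if (cycle < N) {
            pipe[0]  = a[cycle] + b[cycle];  /* stand-in for stage-0 work */
            valid[0] = 1;
        } else {
            valid[0] = 0;
        }
    }
    return 0;
}
```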


2009 ◽  
Vol 58 (1) ◽  
pp. 18-31 ◽  
Author(s):  
Martin Burtscher ◽  
Paruj Ratanaworabhan

2011 ◽  
Vol 2011 ◽  
pp. 1-12 ◽  
Author(s):  
Nikolaos Alachiotis ◽  
Alexandros Stamatakis

The use of reconfigurable computing for accelerating floating-point intensive codes is becoming common due to the availability of DSPs in new-generation FPGAs. We present the design of an efficient, pipelined floating-point datapath for calculating the logarithm function on reconfigurable devices. We integrate the datapath into a stand-alone LUT-based (Lookup Table) component, the LAU (Logarithm Approximation Unit). We extended the LAU by integrating two architecturally independent, LAU-based datapaths into a larger component, the VLAU (vector-like LAU). The VLAU produces 2 results/cycle, while occupying the same amount of memory as the LAU. Under single precision, one LAU is 12 and 1.7 times faster than the GNU and Intel Math Kernel Library (MKL) implementations, respectively. The LAU is also 1.6 times faster than the FloPoCo reconfigurable logarithm architecture. Under double precision, one LAU is 20 and 2.6 times faster than the respective GNU and MKL functions and 1.4 times faster than the FloPoCo logarithm. The VLAU is approximately twice as fast as the LAU, both under single and double precision.
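A generic way to build a LUT-based logarithm approximator, sketched here in C as a hedged illustration of the underlying idea rather than the LAU or FloPoCo designs themselves, is to split the input into exponent e and mantissa m so that log2(x) = e + log2(m), and to approximate log2(m) with a small table plus linear interpolation. The table size, the interpolation scheme, and the choice of log2 (natural log follows by a constant factor ln 2) are assumptions for illustration.

```c
/* Sketch of a table-based log2 approximation: split x into exponent and
 * mantissa (x = m * 2^e with 1 <= m < 2), then log2(x) = e + log2(m),
 * where log2(m) comes from a small table with linear interpolation.
 * Valid for x > 0; table size and interpolation are illustrative choices. */
#include <math.h>
#include <stdio.h>

#define TBL_BITS 8
#define TBL_SIZE (1 << TBL_BITS)

static double tbl[TBL_SIZE + 1];

static void init_table(void) {
    for (int i = 0; i <= TBL_SIZE; i++)
        tbl[i] = log2(1.0 + (double)i / TBL_SIZE);   /* log2 over the mantissa grid */
}

static double log2_lut(double x) {
    int e;
    double m = frexp(x, &e);        /* x = m * 2^e with 0.5 <= m < 1 */
    m *= 2.0; e -= 1;               /* renormalize so that 1 <= m < 2 */
    double idx = (m - 1.0) * TBL_SIZE;
    int i = (int)idx;
    double frac = idx - i;          /* interpolate between adjacent entries */
    return e + tbl[i] + frac * (tbl[i + 1] - tbl[i]);
}

int main(void) {
    init_table();
    double x = 123.456;
    printf("log2 approx %.6f  exact %.6f\n", log2_lut(x), log2(x));
    return 0;
}
```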


Various suggestions have been made regarding a precise definition of RISC, but the common concept is that such a computer has a small set of simple, general instructions rather than a large set of complex and specialized instructions. This project proposes the design of a high-speed 64-bit RISC processor that consumes less power while operating at high speed. The processor comprises three sections, namely the Instruction Fetch section, the Instruction Decode section, and the Execution section. The ALU within the Execution section contains a double-precision floating-point multiplier designed with a corollary architecture, thereby improving the speed and accuracy of execution. All sections are designed using Verilog coding. A uniform instruction format, general-purpose registers, and simple addressing modes are the processor's other features. RISC stands for Reduced Instruction Set Computer and is considered the foundation for designing high-performance processors. The RISC processor has a reduced number of instructions, a fixed instruction length, more general-purpose registers organized into a register file, a load-store architecture, and simple addressing modes, all of which make individual instructions execute faster and achieve a net gain in performance. The principal aim of this paper is to achieve correct operation with less power, less area and minimal delay, which is accomplished by replacing the single-precision floating-point ALU section with a double-precision floating-point section. Video processing, telecommunications and image processing are the high-end applications targeted by this architecture.
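Since the abstract emphasizes a fixed instruction length and simple addressing modes, the C sketch below decodes a hypothetical fixed-width 32-bit instruction word into opcode and register fields, the kind of work the Instruction Decode section performs. The field widths, layout and opcode value are invented for illustration and are not the paper's actual encoding.

```c
/* Decode of a hypothetical fixed-width 32-bit RISC instruction word.
 * The field layout (6-bit opcode, three 5-bit register indices, 11-bit
 * immediate) is an assumption for illustration, not the paper's encoding. */
#include <stdint.h>
#include <stdio.h>

typedef struct {
    unsigned opcode; /* bits 31..26 */
    unsigned rd;     /* bits 25..21 destination register   */
    unsigned rs1;    /* bits 20..16 first source register  */
    unsigned rs2;    /* bits 15..11 second source register */
    unsigned imm;    /* bits 10..0  short immediate        */
} instr_t;

static instr_t decode(uint32_t word) {
    instr_t d;
    d.opcode = (word >> 26) & 0x3F;
    d.rd     = (word >> 21) & 0x1F;
    d.rs1    = (word >> 16) & 0x1F;
    d.rs2    = (word >> 11) & 0x1F;
    d.imm    =  word        & 0x7FF;
    return d;
}

int main(void) {
    /* opcode=0x08 (say, a double-precision multiply), rd=r3, rs1=r1, rs2=r2 */
    uint32_t word = (0x08u << 26) | (3u << 21) | (1u << 16) | (2u << 11);
    instr_t d = decode(word);
    printf("opcode=%u rd=%u rs1=%u rs2=%u imm=%u\n",
           d.opcode, d.rd, d.rs1, d.rs2, d.imm);
    return 0;
}
```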

