floating point computation
Recently Published Documents


TOTAL DOCUMENTS: 47 (FIVE YEARS: 4)
H-INDEX: 9 (FIVE YEARS: 1)

2021 ◽  
Vol 10 (2) ◽  
pp. 1
Author(s):  
Amira Bibo Sallow

The rapid growth of floating-point computing capacity and memory in recent years has made graphics processing units (GPUs) an increasingly attractive platform for accelerating scientific applications, and they have quickly become popular because they can process large amounts of data in a timely manner. Fractals have many implementations that require fast computation and massive amounts of floating-point computation. In this paper, a fractal image construction algorithm is implemented in both sequential and parallel versions using the Mandelbrot and Julia sets. The CPU was used for execution in sequential mode, while GPUArray and a CUDA kernel were used for the parallel mode. The performance of the sequential implementation on CPUs (2.20 GHz and 2.60 GHz) and of the parallel implementation on two GPU models (GeForce GTX 1060 and GeForce GTX 1660 Ti) was evaluated in terms of execution time and speedup, in order to compare the maximum capability of the CPU and the GPU. The results showed that execution on the GPU using GPUArray or a CUDA kernel is faster than the sequential implementation on the CPU, and that execution using the CUDA kernel is faster than execution using GPUArray. Execution time also differed between GPU devices: the GPU from the Ti series executed faster than the other model.
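For orientation, the sequential side of such a comparison can be sketched with the standard escape-time iteration. This is a minimal illustration, not the code evaluated in the paper: the function names, the viewing window, and the Julia constant are assumptions chosen for the example. Each pixel is computed independently of every other pixel, which is the property that lets the same loop body be mapped to one GPU thread per pixel.

```python
def escape_time(z: complex, c: complex, max_iter: int = 100) -> int:
    """Iterate z -> z*z + c and return the step at which |z| first exceeds 2,
    or max_iter if z stays bounded for the whole iteration budget."""
    for i in range(max_iter):
        if abs(z) > 2.0:
            return i
        z = z * z + c
    return max_iter


def mandelbrot(c: complex, max_iter: int = 100) -> int:
    # Mandelbrot set: z starts at 0 and the pixel supplies c.
    return escape_time(0j, c, max_iter)


def julia(z: complex, c: complex = -0.8 + 0.156j, max_iter: int = 100) -> int:
    # Julia set: the pixel supplies z and c is fixed.
    # The default c is an arbitrary illustrative constant (assumption).
    return escape_time(z, c, max_iter)


def render(width: int, height: int, max_iter: int = 100):
    """Sequential Mandelbrot render over the window [-2, 1] x [-1, 1].
    Requires width >= 2 and height >= 2. Returns rows of iteration counts."""
    rows = []
    for py in range(height):
        im = -1.0 + 2.0 * py / (height - 1)
        row = []
        for px in range(width):
            re = -2.0 + 3.0 * px / (width - 1)
            row.append(mandelbrot(complex(re, im), max_iter))
        rows.append(row)
    return rows
```

Calling `render(640, 480)` gives a grid of iteration counts that can be mapped to colors. Timing that call against a GPU version of the same per-pixel loop is the kind of measurement the abstract describes.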


2020 ◽  
pp. 23-44
Author(s):  
William J. Kennedy ◽  
James E. Gentle

In complex signal processing applications, floating-point (FP) arithmetic is a complex but extremely accurate representation that needs to be optimized through architectural modification. This paper describes discrete-to-fused arithmetic implementation using two-, three-, and four-operand FP methodology. Area, power, and delay (APD) are the parameters considered in the analysis. An exhaustive analysis is carried out, from the basic FP components to the complete structure of the Four-Term Dot Product FP (FTDPFP) unit. The analysis shows that FTDPFP computation improves speed by 89-91% compared with three-term and two-term computation. The area overhead increases in FTDPFP, and it is optimized by using new exponent, dual reduction, early normalization, leading zero anticipator (LZA), rounding, and compounding techniques. Power consumption is optimized to a level comparable with the Two- and Three-Term Dot Product Floating Point (TTDPFP) units.
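The numerical difference between discrete and fused evaluation can be illustrated in software. The sketch below assumes the usual meaning of "fused" (the whole expression is rounded once at the end) and emulates it with exact rational arithmetic. It says nothing about the area, power, or delay of the hardware design, and the function names are hypothetical.

```python
from fractions import Fraction


def dot2_discrete(a: float, b: float, c: float, d: float) -> float:
    """a*b + c*d with separate operations: each multiply and the final add
    is rounded to double precision, so there are three roundings."""
    return a * b + c * d


def dot2_fused(a: float, b: float, c: float, d: float) -> float:
    """Emulated fused two-term dot product: compute a*b + c*d exactly with
    rationals, then round to double once."""
    exact = Fraction(a) * Fraction(b) + Fraction(c) * Fraction(d)
    return float(exact)


# (1 + 2^-30) * (1 - 2^-30) = 1 - 2^-60, which rounds to 1.0 in double.
a, b = 1.0 + 2.0**-30, 1.0 - 2.0**-30
c, d = -1.0, 1.0
discrete = dot2_discrete(a, b, c, d)  # the small term is lost to rounding
fused = dot2_fused(a, b, c, d)        # the small term survives
```

Here the discrete form returns exactly zero because the product is rounded before the subtraction, while the single-rounding form keeps the residual of -2^-60.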


2015 ◽  
Vol 2015 ◽  
pp. 1-7 ◽  
Author(s):  
Xiujie Qu ◽  
Cuimei Ma ◽  
Shixin Zhang ◽  
Sitong Lian

Because in-place fast Fourier transforms have poor real-time performance, a reconfigurable radix-4 FFT processor based on decimation-in-time and single-precision floating-point computation is studied and designed. The proposed method adopts a “pipeline and parallel” structure for accessing multiple memories to improve FFT processing speed, and it is then applied to digital pulse compression. The experimental results show that the proposed radix-4 FFT can perform digital pulse compression rapidly without additional hardware resources. The proposed method can also be applied to FFTs of other radices.
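As a reference for the arithmetic involved, a radix-4 decimation-in-time FFT can be sketched recursively. This is a textbook formulation in double precision, not the pipelined single-precision hardware design described above. The function names are illustrative, and the input length is assumed to be a power of 4.

```python
import cmath


def fft_radix4(x):
    """Recursive radix-4 decimation-in-time FFT. len(x) must be a power of 4."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    if n % 4 != 0:
        raise ValueError("input length must be a power of 4")
    q = n // 4
    # Split into four interleaved subsequences x[4m + r], r = 0..3.
    f0, f1, f2, f3 = (fft_radix4(x[r::4]) for r in range(4))
    out = [0j] * n
    for k in range(q):
        w1 = cmath.exp(-2j * cmath.pi * k / n)  # twiddle factor W_N^k
        w2 = w1 * w1
        w3 = w2 * w1
        t0, t1, t2, t3 = f0[k], w1 * f1[k], w2 * f2[k], w3 * f3[k]
        # Radix-4 butterfly: the inner 4-point DFT uses only +/-1 and +/-j.
        out[k] = t0 + t1 + t2 + t3
        out[k + q] = t0 - 1j * t1 - t2 + 1j * t3
        out[k + 2 * q] = t0 - t1 + t2 - t3
        out[k + 3 * q] = t0 + 1j * t1 - t2 - 1j * t3
    return out


def dft_naive(x):
    """O(N^2) reference DFT for checking the FFT output."""
    n = len(x)
    return [
        sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
        for k in range(n)
    ]
```

Because the butterfly multiplies only by ±1 and ±j, the nontrivial complex multiplications per stage are the three twiddle products, which is one reason radix-4 is attractive for hardware.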


2014 ◽  
Vol 2014 ◽  
pp. 1-8
Author(s):  
Hasitha Muthumala Waidyasooriya ◽  
Masanori Hariyama ◽  
Yasuhiro Takei ◽  
Michitaka Kameyama

Acceleration of the finite-difference time-domain (FDTD) method is very important for fields such as computational electromagnetic simulation. We consider an FDTD simulation model of a cylindrical resonator design that requires double-precision floating-point computation and cannot be carried out using single precision alone. Conventional FDTD acceleration methods share a common problem: memory bandwidth is limited by the large amount of parallel data access. To overcome this problem, we propose a hybrid single- and double-precision floating-point computation method that reduces the amount of data transferred. We analyze the characteristics of the FDTD simulation to determine when single precision can be used instead of double precision. According to the experimental results, we achieved a speed-up of over 15 times compared with the single-core CPU implementation and over 1.52 times compared with the conventional GPU-based implementation.
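The trade-off behind a hybrid scheme can be illustrated with a toy accumulation. This is a sketch under stated assumptions and not the FDTD criterion used by the authors: small updates are stored in single precision (half the bytes to move), and the only question examined is whether the running value itself can also be kept in single precision. The helper names are hypothetical.

```python
import struct


def to_f32(x: float) -> float:
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def accumulate(increment: float, steps: int, single_accumulator: bool) -> float:
    """Add a small increment to 1.0 repeatedly.

    The increment is always stored in single precision. The running total is
    kept either in single precision (rounded after every add) or in double.
    """
    inc = to_f32(increment)
    total = 1.0
    for _ in range(steps):
        total = total + inc
        if single_accumulator:
            total = to_f32(total)
    return total


# An increment of 2^-30 is below half the single-precision spacing near 1.0
# (2^-24), so a single-precision total never moves away from 1.0.
all_single = accumulate(2.0**-30, 1000, single_accumulator=True)
hybrid = accumulate(2.0**-30, 1000, single_accumulator=False)
```

In this example the all-single version stalls at 1.0, while the hybrid version (single-precision increments, double-precision total) tracks the sum exactly. A precision analysis of this kind, applied to the actual update equations, is what decides which quantities can safely be demoted.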


Robotica ◽  
2013 ◽  
Vol 32 (6) ◽  
pp. 867-887
Author(s):  
Nattee Niparnan ◽  
Thanathorn Phoka ◽  
Yuttana Suttasupa ◽  
Attawith Sudsang

SUMMARY: This paper proposes an efficient implementation of a force-closure test for frictional three-finger grasps. The implementation is based on a condition that transforms force-closure testing into the problem of convex hull intersection in projective space. The proposed implementation further reduces this to the problem of determining whether a line segment intersects a convex hull of at most four points. Implementation results are presented along with a thorough performance analysis and a comparison with several existing methods. The results are also verified with arbitrary-precision floating-point computation, which provides a comparison of the qualitative error resulting from floating-point roundoff. The results show that the proposed implementation outperforms other methods in terms of speed and precision.
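Verification against exact arithmetic can be illustrated with the 2D orientation predicate, a basic building block of convex hull and segment intersection tests. This is a simplified planar sketch, not the projective-space test from the paper, and it uses exact rationals as a stand-in for an arbitrary-precision reference. The function names are illustrative.

```python
from fractions import Fraction


def sign(v) -> int:
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def orient_float(a, b, c) -> int:
    """Orientation of c relative to the directed line a -> b in double
    precision: 1 = left (counterclockwise), -1 = right, 0 = collinear."""
    det = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return sign(det)


def orient_exact(a, b, c) -> int:
    """The same predicate evaluated with exact rationals, so the sign is
    never affected by roundoff."""
    ax, ay, bx, by, cx, cy = (Fraction(v) for v in (*a, *b, *c))
    det = (bx - ax) * (cy - ay) - (by - ay) * (cx - ax)
    return sign(det)


# Nearly collinear points: the exact determinant is -2^-80, but in double
# precision the first product rounds to 1.0 and the determinant becomes 0.
a = (0.0, 0.0)
b = (1.0 + 2.0**-40, 1.0)
c = (1.0, 1.0 - 2.0**-40)
float_result = orient_float(a, b, c)
exact_result = orient_exact(a, b, c)
```

Here the floating-point predicate reports the points as collinear while the exact one reports a clockwise turn. Counting disagreements of this kind over many inputs is one way to compare the qualitative error of different implementations.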

