A General-Purpose Method for Faithfully Rounded Floating-Point Function Approximation in FPGAs

Double-precision floating-point matrix operations are widely used in a variety of engineering and scientific computing applications. However, performing these operations in software on general-purpose processors is inefficient. To reduce processing time and meet real-time demands, this paper proposes a reconfigurable coprocessor for double-precision floating-point matrix algorithms. The coprocessor is embedded in a Multi-Processor System on Chip (MPSoC) and cooperates with an ARM core and a DSP core for high-performance control and calculation. An algorithm from GPS applications is used as an example to illustrate the coprocessor's efficiency. Experimental results show that the coprocessor achieves a speedup of a factor of 50 for the quaternion algorithm of attitude solution in an inertial navigation application, compared with the software execution time on a TI C6713 DSP. The coprocessor is implemented in SMIC 0.13 μm CMOS technology; the synthesis time delay is 9.75 ns, and the power consumption is 63.69 mW at 100 MHz.

Implementation of Efficient Exponential Function Approximation Algorithm Using Format Converter Based on Floating Point Operation in FPGA

Journal of Institute of Control Robotics and Systems ◽

10.5302/j.icros.2009.15.11.1137 ◽

2009 ◽

Vol 15 (11) ◽

pp. 1137-1143

Keyword(s):

Approximation Algorithm ◽

Exponential Function ◽

Function Approximation ◽

Floating Point ◽

Format Converter

Bayesian regularization in the problem of point-by-point function approximation using an orthogonalized basis

Mathematical Models and Computer Simulations ◽

10.1134/s2070048212020111 ◽

2012 ◽

Vol 4 (2) ◽

pp. 203-209 ◽

Cited By ~ 1

Author(s):

A. S. Nuzhny

Keyword(s):

Function Approximation ◽

Bayesian Regularization ◽

Point Function

Accelerating BLAS and LAPACK via Efficient Floating Point Architecture Design

Parallel Processing Letters ◽

10.1142/s0129626417500062 ◽

2017 ◽

Vol 27 (03n04) ◽

pp. 1750006 ◽

Cited By ~ 4

Author(s):

Farhad Merchant ◽

Anupam Chattopadhyay ◽

Soumyendu Raha ◽

S. K. Nandy ◽

Ranjani Narayan

Keyword(s):

Linear Algebra ◽

High Performance ◽

Graphics Processing Unit ◽

Building Blocks ◽

General Purpose ◽

Performance Tuning ◽

Floating Point ◽

Processing Unit ◽

Field Programmable ◽

The Impact

Basic Linear Algebra Subprograms (BLAS) and Linear Algebra Package (LAPACK) form the basic building blocks of several High Performance Computing (HPC) applications and hence dictate the performance of those applications. Performance in such tuned packages is attained by tuning several algorithmic and architectural parameters, such as the number of parallel operations in the Directed Acyclic Graph of the BLAS/LAPACK routines, the sizes of the memories in the memory hierarchy of the underlying platform, the bandwidth of the memory, and the structure of the compute resources in the underlying platform. In this paper, we closely investigate the impact of the Floating Point Unit (FPU) micro-architecture on performance tuning of BLAS and LAPACK. We present a theoretical analysis of the pipeline depth of different floating-point operations, such as multiplication, addition, square root, and division, followed by a characterization of BLAS and LAPACK to determine several parameters required in the theoretical framework for deciding the optimum pipeline depth of the floating-point operations. A simple design of a Processing Element (PE) is presented and shown to outperform the most recent custom realizations of BLAS and LAPACK by 1.1x to 1.5x in GFlops/W and 1.9x to 2.1x in GFlops/mm². Compared to multicore, General Purpose Graphics Processing Unit (GPGPU), Field Programmable Gate Array (FPGA), and ClearSpeed CSX700 platforms, a performance improvement of 1.8x to 80x is reported for the PE.
