floating point arithmetic Latest Research Papers

Abstract: This paper concerns test matrices for numerical linear algebra using an error-free transformation of floating-point arithmetic. For specified eigenvalues given by a user, we propose methods of generating a matrix whose eigenvalues are exactly known based on, for example, Schur or Jordan normal form and a block diagonal form. It is also possible to produce a real matrix with specified complex eigenvalues. Such test matrices with exactly known eigenvalues are useful for numerical algorithms in checking the accuracy of computed results. In particular, exact errors of eigenvalues can be monitored. To generate test matrices, we first propose an error-free transformation for the product of three matrices $YSX$. We approximate $S$ by ${S^{\prime }}$ to compute ${YS^{\prime }X}$ without a rounding error. Next, the error-free transformation is applied to the generation of test matrices with exactly known eigenvalues. Note that the exactly known eigenvalues of the constructed matrix may differ from the anticipated given eigenvalues. Finally, numerical examples are introduced in checking the accuracy of numerical computations for symmetric and unsymmetric eigenvalue problems.
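To make the term "error-free transformation" concrete, here is a minimal Python sketch of the classical scalar case, Knuth's TwoSum. It is not the matrix-product method proposed in the paper. The function names `two_sum` and `is_exact` are illustrative assumptions, and the check relies on Python's `fractions.Fraction` to compare exact rational values.

```python
from fractions import Fraction


def two_sum(a, b):
    """Knuth's TwoSum: return (s, e) where s is the rounded sum a + b
    and e is the rounding error, so that a + b == s + e exactly
    (barring overflow)."""
    s = a + b
    bb = s - a                      # the part of b that made it into s
    e = (a - (s - bb)) + (b - bb)   # what was lost to rounding
    return s, e


def is_exact(a, b, s, e):
    """Verify a + b == s + e in exact rational arithmetic (hypothetical helper)."""
    return Fraction(a) + Fraction(b) == Fraction(s) + Fraction(e)


s, e = two_sum(0.1, 0.2)
print(s, e, is_exact(0.1, 0.2, s, e))
```

The pair `(s, e)` carries the same information as the exact sum, which is the sense in which the transformation is error-free.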

Floating Point Arithmetic Unit with Multi-Precision for DSP Applications

10.1109/icecct52121.2021.9616759 ◽

2021 ◽

Author(s):

M. VishnuPriya ◽

B. Nancharaiah

Keyword(s):

Floating Point ◽

Arithmetic Unit ◽

Floating Point Arithmetic ◽

Point Arithmetic ◽

Dsp Applications

Best Practices for the Deployment of Edge Inference: The Conclusions to Start Designing

Electronics ◽

10.3390/electronics10161912 ◽

2021 ◽

Vol 10 (16) ◽

pp. 1912

Author(s):

Georgios Flamis ◽

Stavros Kalapothas ◽

Paris Kitsos

Keyword(s):

Best Practices ◽

Back Propagation ◽

The Other ◽

Floating Point ◽

Training Phase ◽

Floating Point Arithmetic ◽

Complete Development ◽

Design Cycle ◽

Point Arithmetic ◽

Trained Network

The number of Artificial Intelligence (AI) and Machine Learning (ML) designs is rapidly increasing, raising questions about how to start an AI design for edge systems, which steps to follow, and which pieces are critical for optimal performance. The complete development flow has two distinct phases: training and inference. During training, all the weights are calculated through optimization and back propagation of the network. The training phase is executed with 32-bit floating point arithmetic, as this is the convenient format for GPU platforms. The inference phase, on the other hand, uses a trained network with new data. The sensitive optimization and back propagation phases are removed and only forward propagation is used. A much lower bit-width and fixed point arithmetic are used, aiming at a good result with reduced footprint and power consumption. This study follows a survey-based process and aims to clarify all aspects of AI edge hardware design, from the concept to the final implementation and evaluation. The technology, in the form of frameworks and procedures, is presented in the order of execution for a complete design cycle with guaranteed success.
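The move from 32-bit floating point weights to low bit-width fixed point can be sketched in a few lines of Python. This is only an illustration under assumed parameters (a signed 8-bit format with 7 fractional bits, saturating on overflow). The function names `quantize_q` and `dequantize_q` are hypothetical and do not come from the surveyed frameworks.

```python
def quantize_q(values, frac_bits=7, total_bits=8):
    """Map floats to signed fixed-point integer codes.

    Each value is scaled by 2**frac_bits, rounded to the nearest integer,
    and saturated to the range of a signed total_bits-wide integer.
    """
    scale = 1 << frac_bits
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    return [max(lo, min(hi, round(v * scale))) for v in values]


def dequantize_q(codes, frac_bits=7):
    """Recover the real values represented by the fixed-point codes."""
    scale = 1 << frac_bits
    return [c / scale for c in codes]


weights = [0.5, -0.25, 1.5, -2.0]
codes = quantize_q(weights)
print(codes)                 # values outside [-1, 127/128] saturate
print(dequantize_q(codes))
```

Values that fit the format survive the round trip unchanged, while out-of-range weights are clipped, which is one source of the accuracy loss that has to be weighed against the footprint and power savings.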

Low Bitwidth CNN Accelerator on FPGA Using Winograd and Block Floating Point Arithmetic

10.1109/isvlsi51109.2021.00048 ◽

2021 ◽

Author(s):

Yuk Wong ◽

Zhenjiang Dong ◽

Wei Zhang

Keyword(s):

Floating Point ◽

Floating Point Arithmetic ◽

Point Arithmetic

A Comparative Study between Discrete Stochastic Arithmetic and Floating-Point Arithmetic to Validate the Results of Fractional Order Model of Malaria Infection

Mathematics ◽

10.3390/math9121435 ◽

2021 ◽

Vol 9 (12) ◽

pp. 1435

Author(s):

Samad Noeiaghdam ◽

Aliona Dreglea ◽

Hüseyin Işık ◽

Muhammad Suleman

Keyword(s):

Exact Solution ◽

Malaria Infection ◽

Floating Point ◽

Successive Approximations ◽

Order Model ◽

Cestac Method ◽

Stochastic Arithmetic ◽

Floating Point Arithmetic ◽

Fractional Order Model ◽

Point Arithmetic

The researchers aimed to study the nonlinear fractional order model of malaria infection based on the Caputo-Fabrizio fractional derivative. The homotopy analysis transform method (HATM) is applied based on the floating-point arithmetic (FPA) and the discrete stochastic arithmetic (DSA). In the FPA, to show the accuracy of the method we use the absolute error, which depends on the exact solution and a positive value ε. Because in real-life problems we do not have the exact solution and the optimal value of ε, we need to introduce a new condition and arithmetic to show the efficiency of the method. Thus the CESTAC (Controle et Estimation Stochastique des Arrondis de Calculs) method and the CADNA (Control of Accuracy and Debugging for Numerical Applications) library are applied. The CESTAC method is based on the DSA. Also, a new termination criterion is used which is based on two successive approximations. Using the CESTAC method we can find the optimal approximation, the optimal error and the optimal iteration of the method. The main theorem of the CESTAC method is proved to show that the number of common significant digits (NCSDs) between two successive approximations is almost equal to the NCSDs of the exact and approximate solutions. Plotting several graphs, the regions of convergence are demonstrated for different numbers of iterations k = 5, 10. The numerical results based on the simulated data show the advantages of the DSA in comparison with the FPA.
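The idea of stopping on two successive approximations can be sketched in ordinary Python floats. The sketch assumes the definition of the number of common significant digits commonly used in the CESTAC literature, $C_{x,y} = \log_{10}\left|\frac{x+y}{2(x-y)}\right|$ for $x \neq y$. It does not reproduce DSA or the CADNA library: the fixed digit threshold, the function names, and the Newton iteration used as a test case are all assumptions made for illustration.

```python
import math


def common_significant_digits(x, y):
    """Number of common significant digits of x and y,
    log10(|(x + y) / (2 (x - y))|), assumed definition from the
    CESTAC literature. Returns inf when x == y."""
    if x == y:
        return math.inf
    if x + y == 0.0:
        return 0.0  # convention chosen here: no digits in common
    return math.log10(abs((x + y) / (2.0 * (x - y))))


def iterate_until_stable(step, x0, digits=12.0, max_iter=50):
    """Apply step repeatedly until two successive approximations share
    at least `digits` significant digits. Returns (value, iterations)."""
    x = x0
    for k in range(1, max_iter + 1):
        x_next = step(x)
        if common_significant_digits(x, x_next) >= digits:
            return x_next, k
        x = x_next
    return x, max_iter


# Hypothetical test problem: Newton's iteration for sqrt(2).
root, iterations = iterate_until_stable(lambda x: 0.5 * (x + 2.0 / x), 1.0)
print(root, iterations)
```

The criterion needs no exact solution and no user-chosen ε on the error itself, which is the practical advantage the abstract attributes to working with successive approximations.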

Development of Library Components for Floating Point Processor

Journal of Ultra Scientist of Physical Sciences Section A ◽

10.22147/jusps-a/330402 ◽

2021 ◽

Vol 33 (4) ◽

pp. 42-50

Author(s):

SUBHASH KUMAR SHARMA ◽

SHRI PRAKASH DUBEY ◽

ANIL KUMAR MISHRA ◽

...

Keyword(s):

Hardware Description Language ◽

Floating Point ◽

Description Language ◽

Floating Point Arithmetic ◽

Arithmetic Logic Unit ◽

Hardware Description ◽

Point Arithmetic ◽

Logic Unit

This paper deals with the development of n-bit binary-to-decimal conversion, decimal-to-n-bit binary conversion, and decimal-to-IEEE-754 conversion for a floating point arithmetic logic unit (FPALU) using VHDL. Most industries nowadays use either 4-bit or 8-bit ALU conversions; we have generalized this so that the bit size of the ALU conversion is no longer a concern. This addresses the problems of 4-bit, 8-bit, 16-bit ALU conversions and so on. We used the VHSIC Hardware Description Language and Xilinx to develop these ALU conversion processes.
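For readers unfamiliar with the decimal-to-IEEE-754 step, the following Python sketch shows the single-precision bit layout (1 sign bit, 8 exponent bits, 23 fraction bits). It is a software illustration built on the standard `struct` module, not the authors' VHDL design, and the function name `float_to_ieee754_bits` is an assumption.

```python
import struct


def float_to_ieee754_bits(x):
    """Return the IEEE-754 single-precision encoding of x as a string
    'sign exponent fraction' (1, 8, and 23 bits respectively)."""
    # Pack as a big-endian 32-bit float, then reinterpret the same bytes
    # as a big-endian unsigned 32-bit integer.
    (raw,) = struct.unpack(">I", struct.pack(">f", x))
    bits = f"{raw:032b}"
    return f"{bits[0]} {bits[1:9]} {bits[9:]}"


print(float_to_ieee754_bits(0.15625))  # 0 01111100 01000000000000000000000
print(float_to_ieee754_bits(-2.5))     # 1 10000000 01000000000000000000000
```

For 0.15625 = 1.25 × 2⁻³ the stored exponent is −3 + 127 = 124 and the fraction field encodes the 0.25 after the implicit leading 1.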

On fixpoints of Higham’s function

Carpathian Journal of Mathematics ◽

10.37193/cjm.2021.02.20 ◽

2021 ◽

Vol 37 (2) ◽

pp. 355-360

Author(s):

RADU T. TRÎMBIŢAŞ

Keyword(s):

Floating Point ◽

Double Precision ◽

Square Roots ◽

Floating Point Arithmetic ◽

Ieee Standard ◽

Point Arithmetic ◽

Floating Point Numbers

We study the strange behavior in floating-point arithmetic of a function proposed by Nicholas Higham, consisting of repeated square-root extraction followed by the same number of squarings, and find its fixpoints. For IEEE standard double precision floating point numbers the fixpoints have the form \[ x \in \left\{\left( 1+k\mathrm{eps}\right) ^{\frac{1}{\mathrm{eps}}},\quad k=\left[ -745:\frac{1}{2}:-\frac{1}{2},0:709\right]\right\} \cup \{0\} , \] where $\mathrm{eps}$ is the machine epsilon.
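The function itself is easy to reproduce. The sketch below is a plain Python version that assumes 52 repetitions, chosen here to match the exponent $1/\mathrm{eps} = 2^{52}$ in the formula; the abstract does not state the count. The name `higham` is illustrative.

```python
import math


def higham(x, n=52):
    """Take the square root n times, then square n times.

    In exact arithmetic this is the identity on nonnegative x; in double
    precision the repeated roots collapse x onto a value very close to 1,
    so most of the information in x is lost before squaring.
    """
    for _ in range(n):
        x = math.sqrt(x)
    for _ in range(n):
        x = x * x
    return x


for value in (0.0, 1.0, 2.0, 100.0):
    print(value, higham(value))
```

The inputs 0 and 1 are returned unchanged, since every intermediate operation on them is exact, while a generic input such as 2.0 does not come back as itself.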

floating point arithmetic
Recently Published Documents

A High Performance and Full Utilization Hardware Implementation of Floating Point Arithmetic Units

NetFC: Enabling Accurate Floating-point Arithmetic on Programmable Switches

Precision Exploration of Floating-Point Arithmetic for Spiking Neural Networks

Generation of test matrices with specified eigenvalues using floating-point arithmetic

Floating Point Arithmetic Unit with Multi-Precision for DSP Applications

Best Practices for the Deployment of Edge Inference: The Conclusions to Start Designing

Low Bitwidth CNN Accelerator on FPGA Using Winograd and Block Floating Point Arithmetic

A Comparative Study between Discrete Stochastic Arithmetic and Floating-Point Arithmetic to Validate the Results of Fractional Order Model of Malaria Infection

Development of Library Components for Floating Point Processor

On fixpoints of Higham’s function
