floating point arithmetic
Recently Published Documents


TOTAL DOCUMENTS: 460 (FIVE YEARS 73)

H-INDEX: 33 (FIVE YEARS 3)

2021 ◽  
Author(s):  
Penglai Cui ◽  
Heng Pan ◽  
Zhenyu Li ◽  
Jiaoren Wu ◽  
Shengzhuo Zhang ◽  
...  

Author(s):  
Katsuhisa Ozaki ◽  
Takeshi Ogita

This paper concerns test matrices for numerical linear algebra using an error-free transformation of floating-point arithmetic. For specified eigenvalues given by a user, we propose methods of generating a matrix whose eigenvalues are exactly known based on, for example, Schur or Jordan normal form and a block diagonal form. It is also possible to produce a real matrix with specified complex eigenvalues. Such test matrices with exactly known eigenvalues are useful for numerical algorithms in checking the accuracy of computed results. In particular, exact errors of eigenvalues can be monitored. To generate test matrices, we first propose an error-free transformation for the product of three matrices $YSX$. We approximate $S$ by $S^{\prime}$ to compute $YS^{\prime}X$ without a rounding error. Next, the error-free transformation is applied to the generation of test matrices with exactly known eigenvalues. Note that the exactly known eigenvalues of the constructed matrix may differ from the anticipated given eigenvalues. Finally, numerical examples are introduced in checking the accuracy of numerical computations for symmetric and unsymmetric eigenvalue problems.
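
The abstract does not spell out its transformation for the product $YSX$, so the following is only a minimal sketch of the general idea of an error-free transformation, using the classic TwoSum algorithm for a single addition. The function name `two_sum` is chosen here for illustration and is not taken from the paper.

```python
from fractions import Fraction


def two_sum(a: float, b: float) -> tuple[float, float]:
    """Return (s, e) where s is the rounded sum fl(a + b) and e is the
    rounding error, so that a + b == s + e holds exactly (barring overflow)."""
    s = a + b
    b_virtual = s - a          # the part of b that actually made it into s
    a_virtual = s - b_virtual  # the part of a that actually made it into s
    e = (a - a_virtual) + (b - b_virtual)
    return s, e


# Check the identity with exact rational arithmetic.
s, e = two_sum(0.1, 0.2)
assert Fraction(s) + Fraction(e) == Fraction(0.1) + Fraction(0.2)
```

The pair `(s, e)` carries the same information as the exact sum, which is the property an error-free transformation is meant to provide.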


Electronics ◽  
2021 ◽  
Vol 10 (16) ◽  
pp. 1912
Author(s):  
Georgios Flamis ◽  
Stavros Kalapothas ◽  
Paris Kitsos

The number of Artificial Intelligence (AI) and Machine Learning (ML) designs is rapidly increasing, raising questions about how to start an AI design for edge systems, which steps to follow, and which pieces are critical for optimal performance. The complete development flow has two distinct phases: training and inference. During training, all the weights are calculated through optimization and back propagation of the network. The training phase is executed in 32-bit floating point arithmetic, as this is the convenient format for GPU platforms. The inference phase, on the other hand, uses a trained network with new data. The sensitive optimization and back propagation steps are removed and only forward propagation is used. A much lower bit-width and fixed point arithmetic are used, aiming for a good result with reduced footprint and power consumption. This study follows a survey-based process and aims to clarify all aspects of AI edge hardware design, from concept to final implementation and evaluation. The relevant frameworks and procedures are presented in order of execution for a complete design cycle with guaranteed success.
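
As a rough illustration of the move from floating point weights to low bit-width fixed point for inference, the sketch below quantizes values to a signed 8-bit format with 7 fractional bits. The function names, the bit widths, and the saturation behavior are assumptions made for this example; the survey does not prescribe a particular scheme.

```python
def quantize(values, frac_bits=7, total_bits=8):
    """Map floats to signed fixed-point integer codes.

    Each value is scaled by 2**frac_bits, rounded to the nearest integer
    (Python's round() resolves ties to even), and saturated to the range
    representable in total_bits.
    """
    scale = 1 << frac_bits
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    return [max(lo, min(hi, round(v * scale))) for v in values]


def dequantize(codes, frac_bits=7):
    """Map fixed-point integer codes back to floats."""
    scale = 1 << frac_bits
    return [c / scale for c in codes]


weights = [0.5, -0.25, 2.0]   # 2.0 is out of range and saturates
codes = quantize(weights)
restored = dequantize(codes)
```

Values outside the representable range are clamped rather than wrapped, which is one common design choice for inference hardware.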


Mathematics ◽  
2021 ◽  
Vol 9 (12) ◽  
pp. 1435
Author(s):  
Samad Noeiaghdam ◽  
Aliona Dreglea ◽  
Hüseyin Işık ◽  
Muhammad Suleman

The researchers aimed to study the nonlinear fractional order model of malaria infection based on the Caputo-Fabrizio fractional derivative. The homotopy analysis transform method (HATM) is applied based on the floating-point arithmetic (FPA) and the discrete stochastic arithmetic (DSA). In the FPA, to show the accuracy of the method we use the absolute error, which depends on the exact solution and a positive value ε. Because in real-life problems we have neither the exact solution nor the optimal value of ε, we need to introduce a new condition and arithmetic to show the efficiency of the method. Thus the CESTAC (Contrôle et Estimation Stochastique des Arrondis de Calculs) method and the CADNA (Control of Accuracy and Debugging for Numerical Applications) library are applied. The CESTAC method is based on the DSA. Also, a new termination criterion is used which is based on two successive approximations. Using the CESTAC method we can find the optimal approximation, the optimal error and the optimal iteration of the method. The main theorem of the CESTAC method is proved to show that the number of common significant digits (NCSDs) between two successive approximations is almost equal to the NCSDs of the exact and approximate solutions. Plotting several graphs, the regions of convergence are demonstrated for different numbers of iterations, k = 5, 10. The numerical results based on the simulated data show the advantages of the DSA in comparison with the FPA.
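
To make the idea of stopping on two successive approximations concrete, here is a minimal sketch in ordinary floating point. It assumes the definition of the number of common significant digits usually given in the CESTAC literature, $C_{a,b} = \log_{10}\left|\frac{a+b}{2(a-b)}\right|$ for $a \neq b$. It does not use the CADNA library or stochastic arithmetic, and the Newton iteration for $\sqrt{2}$ is a stand-in problem, not the malaria model.

```python
import math


def common_significant_digits(a: float, b: float) -> float:
    """Number of decimal significant digits shared by a and b
    (assumed definition: log10 |(a + b) / (2 (a - b))|)."""
    if a == b:
        return math.inf
    if a + b == 0.0:
        return -math.inf
    return math.log10(abs((a + b) / (2.0 * (a - b))))


def newton_sqrt2(min_digits: float = 15.0, max_iter: int = 50) -> float:
    """Newton iteration for sqrt(2) that stops once two successive
    approximations agree to at least min_digits significant digits."""
    x = 1.0
    for _ in range(max_iter):
        x_next = 0.5 * (x + 2.0 / x)
        if common_significant_digits(x, x_next) >= min_digits:
            return x_next
        x = x_next
    return x
```

The stopping test needs no knowledge of the exact solution or of a tolerance ε tied to it, which is the practical point the abstract makes.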


2021 ◽  
Vol 33 (4) ◽  
pp. 42-50
Author(s):  
SUBHASH KUMAR SHARMA ◽  
SHRI PRAKASH DUBEY ◽  
ANIL KUMAR MISHRA ◽  
...  

This paper deals with the development of n-bit binary to decimal conversion, decimal to n-bit binary conversion, and decimal to IEEE-754 conversion for a floating point arithmetic logic unit (FPALU) using VHDL. Most industries nowadays use either 4-bit or 8-bit ALU conversions; we have generalized this so that the bit size of the ALU conversion need not be a concern. The generalization covers 4-bit, 8-bit, 16-bit and wider ALU conversions. We used the VHSIC Hardware Description Language (VHDL) and Xilinx to develop these ALU conversion processes.
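
The paper's implementation is in VHDL and is not reproduced in the abstract. Purely as a software illustration of the three conversions it names, the sketch below uses Python's standard `struct` module and assumes the IEEE-754 single precision (binary32) format; the function names are hypothetical.

```python
import struct


def decimal_to_binary(value: int, n_bits: int) -> str:
    """Unsigned decimal integer to an n-bit binary string."""
    if not 0 <= value < (1 << n_bits):
        raise ValueError("value does not fit in n_bits")
    return format(value, f"0{n_bits}b")


def binary_to_decimal(bits: str) -> int:
    """Binary string of any length to an unsigned decimal integer."""
    return int(bits, 2)


def decimal_to_ieee754(x: float) -> str:
    """Decimal value to its binary32 fields: 'sign exponent fraction'."""
    (word,) = struct.unpack(">I", struct.pack(">f", x))
    bits = f"{word:032b}"
    return f"{bits[0]} {bits[1:9]} {bits[9:]}"
```

Because the width is a parameter rather than a fixed 4 or 8 bits, the integer conversions mirror the generalization to arbitrary n described above.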


2021 ◽  
Vol 37 (2) ◽  
pp. 355-360
Author(s):  
RADU T. TRÎMBIŢAŞ

We study the strange behavior in floating-point arithmetic of a function proposed by Nicholas Higham, consisting of repeated square root extraction followed by the same number of squarings, and find its fixpoints. For IEEE standard double precision floating point numbers the fixpoints have the form \[ x \in \left\{\left( 1+k\mathrm{eps}\right) ^{\frac{1}{\mathrm{eps}}},\quad k=\left[ -745:\frac{1}{2}:-\frac{1}{2},0:709\right]\right\} \cup \{0\} , \] where $\mathrm{eps}$ is the machine epsilon.
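
The function is easy to try out. The sketch below is one possible rendering in Python; the repetition count `n = 52` is an assumption chosen so that $2^{n} = 1/\mathrm{eps}$ in double precision, matching the exponent in the fixpoint formula, and the abstract itself does not state the count.

```python
import math


def sqrt_then_square(x: float, n: int = 52) -> float:
    """Take the square root n times, then square the result n times.

    In exact arithmetic this is the identity on x >= 0; in floating point
    the repeated roots collapse x onto a few values near 1, and the
    squarings cannot recover the lost information.
    """
    for _ in range(n):
        x = math.sqrt(x)
    for _ in range(n):
        x = x * x
    return x


for x in (0.0, 0.5, 1.0, 2.0, 10.0):
    print(x, sqrt_then_square(x))
```

Running the loop shows that inputs such as 2.0 do not come back unchanged, while 0.0 and 1.0 do.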

