High-Level Description and Synthesis of Floating-Point Accumulators on FPGA

Author(s): Marc-Andre Daigneault, Jean Pierre David

Author(s): Esther Andres, Maria C. Molina, Guillermo Botella, Alberto del Barrio, Jose M. Mendias

2012, Vol 2012, pp. 1-14
Author(s): Daniel Menard, Nicolas Herve, Olivier Sentieys, Hai-Nam Nguyen

Implementing signal processing applications in embedded systems generally requires the use of fixed-point arithmetic. The main obstacle slowing down the hardware implementation flow is the lack of high-level development tools that can target these architectures from an algorithmic specification written with floating-point data types. In this paper, a new method to automatically implement a floating-point algorithm on an FPGA or an ASIC using fixed-point arithmetic is proposed. An iterative process combining high-level synthesis and data word-length optimization is used to improve both of these interdependent steps. Indeed, high-level synthesis requires knowledge of operator word-lengths to correctly perform its allocation, scheduling, and resource binding steps, while word-length optimization requires resource binding and scheduling information to correctly group operations. To dramatically reduce the optimization time compared to fixed-point simulation-based methods, the accuracy evaluation is carried out with an analytical method. Experiments on several signal processing algorithms show the efficiency of the proposed method: compared to classical methods, the average architecture area reduction is between 10% and 28%.
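For intuition only, here is a minimal Python sketch (not the authors' tool) of the kind of analytical, simulation-free word-length optimization the abstract describes: each quantizer is modeled by a (2^-f)^2/12 noise source propagated to the output by a path gain, and a greedy loop trims fractional word-lengths while the modeled output noise stays under a budget. The noise model, the area proxy, and all names below are illustrative assumptions.

```python
# Illustrative sketch: greedy word-length optimization driven by an
# analytical quantization-noise model instead of fixed-point simulation.
# All names, the noise model, and the area proxy are hypothetical.

def noise_power(frac_bits, path_gains):
    """Analytical output noise power: each quantizer contributes
    (2^-f)^2 / 12, scaled by the squared gain of its path to the output."""
    return sum((2.0 ** (-2 * f)) / 12.0 * g ** 2
               for f, g in zip(frac_bits, path_gains))

def area_cost(frac_bits, int_bits=4):
    """Crude area proxy: total operator word-length."""
    return sum(int_bits + f for f in frac_bits)

def optimize_wordlengths(path_gains, noise_budget, start_bits=24, min_bits=4):
    """Greedily shrink fractional word-lengths while the analytical
    output noise power stays below the accuracy budget."""
    frac_bits = [start_bits] * len(path_gains)
    improved = True
    while improved:
        improved = False
        for i in range(len(frac_bits)):
            if frac_bits[i] <= min_bits:
                continue
            trial = frac_bits.copy()
            trial[i] -= 1
            if noise_power(trial, path_gains) <= noise_budget:
                frac_bits = trial
                improved = True
    return frac_bits

if __name__ == "__main__":
    gains = [1.0, 0.5, 0.25]   # per-operator path gains to the output
    budget = 1e-9              # output noise power constraint
    bits = optimize_wordlengths(gains, budget)
    print("fractional bits:", bits, "area proxy:", area_cost(bits))
```

In the paper's flow this optimization would additionally exchange scheduling and binding information with high-level synthesis at each iteration; the sketch only shows the analytical accuracy side of that loop.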


2017, Vol 65 (6), pp. 935-947
Author(s): M. Pietras, P. Klęsk

Abstract: This paper presents a programmable system-on-chip implementation for accelerating computations within hidden Markov models. High-level synthesis (HLS) and "divide-and-conquer" approaches are presented for parallelization of the Baum-Welch and Viterbi algorithms. To avoid arithmetic underflows, all computations are performed in logarithmic space. Additionally, in order to carry out computations efficiently (i.e. directly in an FPGA system or a processor cache), we propose reducing the floating-point representations of HMMs. We state and prove a lemma about the length of numerically unsafe sequences for such reduced-precision models. Finally, special attention is devoted to the design of a multiple logarithm and exponent approximation unit (MLEAU). Using associative mapping, this unit allows simultaneous conversion of multiple values and thereby compensates for the computational effort of logarithmic-space operations. The design evaluation reveals the absolute stall delay incurred by multiple hardware conversions to logarithms and exponents, and the experimental evaluation reveals HMM computation boundaries related to their probabilities and floating-point representation. The performance differences at each stage of computation are summarized in a comparison between hardware acceleration using the MLEAU and typical software implementations on ARM and Intel processors.
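As a rough illustration of the logarithmic-space idea mentioned above, the following Python/NumPy sketch runs the Viterbi recursion on log-probabilities, so products become sums and underflow on long sequences is avoided. The MLEAU hardware unit, the reduced-precision HMM format, and the FPGA mapping are not modeled; all names here are illustrative.

```python
# Minimal log-space Viterbi sketch (software stand-in, not the paper's design).
import numpy as np

def viterbi_log(log_pi, log_A, log_B, obs):
    """log_pi: (N,) initial log-probs; log_A: (N,N) transition log-probs;
    log_B: (N,M) emission log-probs; obs: sequence of observation indices."""
    N, T = len(log_pi), len(obs)
    delta = np.empty((T, N))          # best log-probability per state and time
    psi = np.zeros((T, N), dtype=int) # backpointers
    delta[0] = log_pi + log_B[:, obs[0]]
    for t in range(1, T):
        # scores[i, j] = delta[t-1, i] + log_A[i, j]; additions replace products
        scores = delta[t - 1][:, None] + log_A
        psi[t] = np.argmax(scores, axis=0)
        delta[t] = scores[psi[t], np.arange(N)] + log_B[:, obs[t]]
    # Backtrack the most likely state path.
    path = np.empty(T, dtype=int)
    path[-1] = int(np.argmax(delta[-1]))
    for t in range(T - 2, -1, -1):
        path[t] = psi[t + 1, path[t + 1]]
    return path, float(np.max(delta[-1]))

if __name__ == "__main__":
    A = np.log(np.array([[0.7, 0.3], [0.4, 0.6]]))
    B = np.log(np.array([[0.9, 0.1], [0.2, 0.8]]))
    pi = np.log(np.array([0.6, 0.4]))
    path, logp = viterbi_log(pi, A, B, [0, 1, 1, 0])
    print("state path:", path, "log-likelihood of path:", logp)
```

In the hardware described by the abstract, the conversions into and out of log space are the costly step, which is what the MLEAU's associative-mapping approximation is meant to amortize across multiple values.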

