High-Level Description and Synthesis of Floating-Point Accumulators on FPGA

Author(s): Marc-Andre Daigneault, Jean Pierre David

Author(s): Esther Andres, Maria C. Molina, Guillermo Botella, Alberto del Barrio, Jose M. Mendias

2012, Vol 2012, pp. 1-14
Author(s): Daniel Menard, Nicolas Herve, Olivier Sentieys, Hai-Nam Nguyen

Implementing signal processing applications in embedded systems generally requires the use of fixed-point arithmetic. The main obstacle slowing down the hardware implementation flow is the lack of high-level development tools that can target these architectures from an algorithmic specification written with floating-point data types. In this paper, a new method to automatically implement a floating-point algorithm on an FPGA or an ASIC using fixed-point arithmetic is proposed. An iterative process combining high-level synthesis and data word-length optimization is used to improve both of these interdependent steps. Indeed, high-level synthesis requires knowledge of operator word-lengths to correctly perform its allocation, scheduling, and resource binding steps, while word-length optimization requires resource binding and scheduling information to correctly group operations. To dramatically reduce the optimization time compared to fixed-point simulation-based methods, the accuracy evaluation is carried out with an analytical method. Experiments on several signal processing algorithms show the efficiency of the proposed method: compared to classical methods, the average architecture area reduction is between 10% and 28%.
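For intuition only, here is a minimal Python sketch (not the authors' tool) of the kind of analytical, simulation-free word-length optimization the abstract describes: each quantizer is modeled by a (2^-f)^2/12 noise source propagated to the output by a path gain, and a greedy loop trims fractional word-lengths while the modeled output noise stays under a budget. The noise model, the area proxy, and all names below are illustrative assumptions.

```python
# Illustrative sketch: greedy word-length optimization driven by an
# analytical quantization-noise model instead of fixed-point simulation.
# All names, the noise model, and the area proxy are hypothetical.

def noise_power(frac_bits, path_gains):
    """Analytical output noise power: each quantizer contributes
    (2^-f)^2 / 12, scaled by the squared gain of its path to the output."""
    return sum((2.0 ** (-2 * f)) / 12.0 * g ** 2
               for f, g in zip(frac_bits, path_gains))

def area_cost(frac_bits, int_bits=4):
    """Crude area proxy: total operator word-length."""
    return sum(int_bits + f for f in frac_bits)

def optimize_wordlengths(path_gains, noise_budget, start_bits=24, min_bits=4):
    """Greedily shrink fractional word-lengths while the analytical
    output noise power stays below the accuracy budget."""
    frac_bits = [start_bits] * len(path_gains)
    improved = True
    while improved:
        improved = False
        for i in range(len(frac_bits)):
            if frac_bits[i] <= min_bits:
                continue
            trial = frac_bits.copy()
            trial[i] -= 1
            if noise_power(trial, path_gains) <= noise_budget:
                frac_bits = trial
                improved = True
    return frac_bits

if __name__ == "__main__":
    gains = [1.0, 0.5, 0.25]   # per-operator path gains to the output
    budget = 1e-9              # output noise power constraint
    bits = optimize_wordlengths(gains, budget)
    print("fractional bits:", bits, "area proxy:", area_cost(bits))
```

In the paper's flow this optimization would additionally exchange scheduling and binding information with high-level synthesis at each iteration; the sketch only shows the analytical accuracy side of that loop.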


2017, Vol 65 (6), pp. 935-947
Author(s): M. Pietras, P. Klęsk

Abstract: This paper presents a programmable system-on-chip implementation for accelerating computations within hidden Markov models. High-level synthesis (HLS) and "divide-and-conquer" approaches are presented for parallelization of the Baum-Welch and Viterbi algorithms. To avoid arithmetic underflows, all computations are performed in logarithmic space. Additionally, in order to carry out computations efficiently (i.e. directly in an FPGA system or a processor cache), we propose reducing the floating-point representations of HMMs. We state and prove a lemma about the length of numerically unsafe sequences for such reduced-precision models. Finally, special attention is devoted to the design of a multiple logarithm and exponent approximation unit (MLEAU). Using associative mapping, this unit allows simultaneous conversion of multiple values and thereby compensates for the computational effort of logarithmic-space operations. The design evaluation reveals the absolute stall delay incurred by multiple hardware conversions to logarithms and exponents, and the experimental evaluation reveals HMM computation boundaries related to their probabilities and floating-point representation. The performance differences at each stage of computation are summarized in a comparison between hardware acceleration using the MLEAU and typical software implementations on ARM and Intel processors.
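As a rough illustration of the logarithmic-space idea mentioned above, the following Python/NumPy sketch runs the Viterbi recursion on log-probabilities, so products become sums and underflow on long sequences is avoided. The MLEAU hardware unit, the reduced-precision HMM format, and the FPGA mapping are not modeled; all names here are illustrative.

```python
# Minimal log-space Viterbi sketch (software stand-in, not the paper's design).
import numpy as np

def viterbi_log(log_pi, log_A, log_B, obs):
    """log_pi: (N,) initial log-probs; log_A: (N,N) transition log-probs;
    log_B: (N,M) emission log-probs; obs: sequence of observation indices."""
    N, T = len(log_pi), len(obs)
    delta = np.empty((T, N))          # best log-probability per state and time
    psi = np.zeros((T, N), dtype=int) # backpointers
    delta[0] = log_pi + log_B[:, obs[0]]
    for t in range(1, T):
        # scores[i, j] = delta[t-1, i] + log_A[i, j]; additions replace products
        scores = delta[t - 1][:, None] + log_A
        psi[t] = np.argmax(scores, axis=0)
        delta[t] = scores[psi[t], np.arange(N)] + log_B[:, obs[t]]
    # Backtrack the most likely state path.
    path = np.empty(T, dtype=int)
    path[-1] = int(np.argmax(delta[-1]))
    for t in range(T - 2, -1, -1):
        path[t] = psi[t + 1, path[t + 1]]
    return path, float(np.max(delta[-1]))

if __name__ == "__main__":
    A = np.log(np.array([[0.7, 0.3], [0.4, 0.6]]))
    B = np.log(np.array([[0.9, 0.1], [0.2, 0.8]]))
    pi = np.log(np.array([0.6, 0.4]))
    path, logp = viterbi_log(pi, A, B, [0, 1, 1, 0])
    print("state path:", path, "log-likelihood of path:", logp)
```

In the hardware described by the abstract, the conversions into and out of log space are the costly step, which is what the MLEAU's associative-mapping approximation is meant to amortize across multiple values.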

