Evaluation of Static Mapping for Dynamic Space-Shared Multi-task Processing on FPGAs

Abstract: Whilst FPGAs have been used in cloud ecosystems, it is still extremely challenging to achieve high compute density when mapping heterogeneous multi-tasks onto shared resources at runtime. This work addresses the problem by treating the FPGA resource as a service and employing multi-task processing at the high level, design space exploration and static off-line partitioning in order to allow more efficient mapping of heterogeneous tasks onto the FPGA. In addition, a new, comprehensive runtime functional simulator is used to evaluate the effect of various spatial and temporal constraints on both the existing and new approaches when varying system design parameters. A comprehensive suite of real high performance computing tasks was implemented on a Nallatech 385 FPGA card, and the results show that our approach can provide on average 2.9× and 2.3× higher system throughput for compute and mixed intensity tasks, but 0.2× lower throughput for memory intensive tasks due to external memory access latency and bandwidth limitations. The work has been extended by introducing a novel scheduling scheme to enhance temporal utilization of resources when using the proposed approach. Additional results for large queues of mixed intensity tasks (compute and memory) show that the proposed partitioning and scheduling approach can provide more than 3× system speedup over previous schemes.

A Memetic Algorithm-Based Design Space Exploration for Datapath Resource Allocation During High-Level Synthesis

Journal of Circuits, Systems and Computers ◽

10.1142/s0218126620500012 ◽

2019 ◽

Vol 29 (01) ◽

pp. 2050001

Author(s):

Shathanaa Rajmohan ◽

N. Ramasubramanian

Keyword(s):

Design Space Exploration ◽

Design Space ◽

Memetic Algorithm ◽

Space Exploration ◽

Memetic Algorithms ◽

Convergence Time ◽

High Level Synthesis ◽

Design Parameters ◽

Levels Of Abstraction ◽

High Level

System designers have started adopting high-level synthesis (HLS) for architectural design because of the higher levels of abstraction it offers. HLS tools provide multiple design choices with trade-offs among different design parameters. Design Space Exploration (DSE) involves optimizing the synthesis options to achieve the best trade-offs among the metrics of interest. With the aim of exploring the design space in a feasible amount of time, we present a novel automated DSE approach. In particular, meeting the constraints presented by different parameters of interest is modeled as a multi-objective problem and solved using a memetic algorithm. The effectiveness of different variations of the memetic algorithm in solving the DSE problem is studied, and a Firefly algorithm-based solution is proposed with a novel probabilistic local search mechanism. The proposed approach is compared with existing solutions, and the results show that it outperforms both existing solutions and other variations of memetic algorithms in terms of convergence time and quality of results. In addition, a case study is included to demonstrate the applicability of the approach. Results show that the proposed approach achieves a 33% improvement in cost, [Formula: see text] improvement in speed and [Formula: see text] improvement in hypervolume.
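
The multi-objective framing in this abstract can be illustrated with a small Pareto filter. This is only a sketch of the general idea of non-dominated design points, not the authors' memetic or Firefly-based method; the design labels and the (area cost, latency) values are hypothetical, and both objectives are assumed to be minimized.

```python
from typing import List, Tuple

# (label, area cost, latency); all values are hypothetical illustrations.
Design = Tuple[str, float, float]

def dominates(a: Design, b: Design) -> bool:
    """True if a is no worse than b in both objectives and strictly better in one."""
    return a[1] <= b[1] and a[2] <= b[2] and (a[1] < b[1] or a[2] < b[2])

def pareto_front(designs: List[Design]) -> List[Design]:
    """Keep only the designs that no other design dominates."""
    return [d for d in designs if not any(dominates(o, d) for o in designs)]

designs = [
    ("unroll1", 100.0, 40.0),
    ("unroll2", 150.0, 22.0),
    ("unroll4", 260.0, 12.0),
    ("unroll2_nopipe", 160.0, 30.0),  # costs more and is slower than "unroll2"
]
front = pareto_front(designs)
print([d[0] for d in front])
```

A DSE heuristic such as the one described would search for points on or near this front without enumerating every synthesis configuration.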

ARMv8 micro-architectural design space exploration for high performance computing using fractional factorial

Proceedings of the 6th International Workshop on Performance Modeling, Benchmarking, and Simulation of High Performance Computing Systems - PMBS '15 ◽

10.1145/2832087.2832095 ◽

2015 ◽

Cited By ~ 3

Author(s):

Roxana Rusitoru

Keyword(s):

High Performance Computing ◽

High Performance ◽

Architectural Design ◽

Design Space Exploration ◽

Design Space ◽

Space Exploration ◽

Fractional Factorial ◽

Performance Computing

Layout Modeling and Design Space Exploration in PSS1 System

VLSI Design ◽

10.1155/1997/42849 ◽

1997 ◽

Vol 5 (2) ◽

pp. 211-221

Author(s):

Fur-Shing Tsai ◽

Yu-Chin Hsu

Keyword(s):

Design Methodology ◽

Design Space Exploration ◽

Design Space ◽

Space Exploration ◽

High Level Synthesis ◽

Design Parameters ◽

Information Feedback ◽

Synthesis System ◽

Hardware Costs ◽

High Level

This paper presents the design methodology used in PSS1, a high-level synthesis system designed for computation-dominated applications. It includes a behavior synthesizer and an area optimizer. Based on a pre-defined architecture, the behavior synthesizer translates a description into a number of designs with different delays and hardware costs. Based on a two-level layout model, the area optimizer fine-tunes the physical design using the information fed back from the layout tools. All the tools are linked by an X-window interface in which users can move between tools and interactively change the design parameters. The output is linked to the Lager system [7], a silicon assembler. The layout model allows a designer to interactively merge or split modules, change the shape of modules, and define the pin positions of modules. Experiments show that a considerable area improvement has been achieved using this methodology.

Local Memory Design Space Exploration for High-Performance Computing

The Computer Journal ◽

10.1093/comjnl/bxq026 ◽

2010 ◽

Vol 54 (5) ◽

pp. 786-799 ◽

Cited By ~ 7

Author(s):

R. Bertran ◽

M. Gonzalez ◽

X. Martorell ◽

N. Navarro ◽

E. Ayguade

Keyword(s):

High Performance Computing ◽

High Performance ◽

Design Space Exploration ◽

Design Space ◽

Space Exploration ◽

Local Memory ◽

Memory Design ◽

Performance Computing

Performance Modeling for FPGAs: Extending the Roofline Model with High-Level Synthesis Tools

International Journal of Reconfigurable Computing ◽

10.1155/2013/428078 ◽

2013 ◽

Vol 2013 ◽

pp. 1-10 ◽

Cited By ~ 18

Author(s):

Bruno da Silva ◽

An Braeken ◽

Erik H. D’Hollander ◽

Abdellah Touhafi

Keyword(s):

High Performance ◽

Design Space Exploration ◽

Performance Model ◽

High Level Synthesis ◽

Performance Expectations ◽

Proposed Model ◽

Roofline Model ◽

High Level ◽

Performance Computing ◽

Selection Of

The potential of FPGAs as accelerators for high-performance computing applications is very large, but many factors are involved in their performance. Designing for FPGAs and selecting the proper optimizations when mapping computations to FPGAs lead to prohibitively long development times. Two alternatives exist: high-level synthesis (HLS) tools, which promise fast design space exploration because the design is expressed at a high level, and analytical performance models, which provide realistic performance expectations, identify potential impediments to performance, and offer optimization guidelines. In this paper we propose combining both in order to construct a performance model for FPGAs that visually condenses all the helpful information for the designer. Our proposed model extends the roofline model by considering the resource consumption and the parameters used in the HLS tools, to maximize the performance and the resource utilization within the area of the FPGA. The proposed model is applied to optimize the design exploration of a class of window-based image processing applications using two different HLS tools. The results show the accuracy of the model as well as its flexibility to be combined with any HLS tool.
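
For readers unfamiliar with the roofline model that this work extends, the classic bound is attainable performance = min(peak compute, memory bandwidth × operational intensity). The sketch below shows that bound, plus one simple way a resource-dependent compute ceiling could be derived. The resource-scaling function, the DSP counts, and all numbers are assumptions for illustration and are not taken from the paper.

```python
def attainable_gflops(intensity: float, peak_gflops: float, bandwidth_gbs: float) -> float:
    """Classic roofline: performance is capped by compute or by memory traffic."""
    return min(peak_gflops, bandwidth_gbs * intensity)

def resource_bound_peak(gflops_per_unit: float, dsps_per_unit: int, total_dsps: int) -> float:
    """Assumed ceiling: replicate a kernel unit until the DSP budget is exhausted."""
    units = total_dsps // dsps_per_unit
    return units * gflops_per_unit

# Hypothetical device and kernel figures.
peak = resource_bound_peak(gflops_per_unit=10.0, dsps_per_unit=300, total_dsps=1500)
bandwidth = 20.0  # GB/s, hypothetical

for intensity in (0.5, 2.5, 4.0):  # FLOP per byte
    perf = attainable_gflops(intensity, peak, bandwidth)
    bound = "memory" if bandwidth * intensity < peak else "compute"
    print(f"intensity={intensity}: {perf} GFLOP/s ({bound}-bound)")
```

With these figures the ridge point sits at 2.5 FLOP/byte: kernels below it are limited by bandwidth, and kernels above it are limited by how many units fit on the device.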

A Modified KNN Algorithm for High-Performance Computing on FPGA of Real-Time m-QAM Demodulators

Electronics ◽

10.3390/electronics10050627 ◽

2021 ◽

Vol 10 (5) ◽

pp. 627

Author(s):

David Marquez-Viloria ◽

Luis Castano-Londono ◽

Neil Guerrero-Gonzalez

Keyword(s):

Real Time ◽

High Performance ◽

Interference Mitigation ◽

Parallel Implementation ◽

Computational Time ◽

Successful Implementation ◽

Interchannel Interference ◽

The Difference ◽

High Level ◽

Performance Computing

A methodology for scalable and concurrent real-time implementation of highly recurrent algorithms is presented and experimentally validated using the AWS-FPGA. This paper presents a parallel implementation of a KNN algorithm focused on m-QAM demodulators, using high-level synthesis for fast prototyping, parameterization, and scalability of the design. The proposed design shows the successful implementation of the KNN algorithm for interchannel interference mitigation in a 3 × 16 Gbaud 16-QAM Nyquist WDM system. Additionally, we present a modified version of the KNN algorithm in which comparisons among data symbols are reduced by identifying the closest neighbor using the rule of the 8-connected clusters used in image processing. The real-time implementation of the modified KNN on a Xilinx Virtex UltraScale+ VU9P AWS-FPGA board was compared with the results obtained in previous work, which used the same data from the same experimental setup but offline DSP in Matlab. The results show that the difference is negligible below the FEC limit. Additionally, the modified KNN reduces the number of operations by between 43% and 75%, depending on the symbol's position in the constellation, achieving a 47.25% reduction in total computational time for 100 K input symbols processed on 20 parallel cores compared to the KNN algorithm.
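
As background, a plain KNN demodulator classifies each received symbol by majority vote among its k nearest labeled training symbols. The sketch below shows that baseline for an ideal 16-QAM grid. It is not the paper's modified 8-connected-cluster variant, and the training offsets, the received sample, and the function name `knn_demodulate` are hypothetical.

```python
from collections import Counter
from typing import List, Tuple

Labeled = Tuple[complex, int]  # (training symbol, constellation index)

def knn_demodulate(rx: complex, training: List[Labeled], k: int = 3) -> int:
    """Return the majority label among the k training symbols closest to rx."""
    nearest = sorted(training, key=lambda item: abs(rx - item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Ideal 16-QAM grid; index = 4 * (I position) + (Q position).
levels = (-3, -1, 1, 3)
constellation = [complex(i, q) for i in levels for q in levels]

# Hypothetical training set: three perturbed copies of every ideal point.
offsets = (0.2 + 0.1j, -0.15 + 0.2j, 0.05 - 0.25j)
training = [(p + o, idx) for idx, p in enumerate(constellation) for o in offsets]

received = complex(0.8, -1.3)  # hypothetical noisy sample near 1 - 1j
symbol = knn_demodulate(received, training)
print(symbol, constellation[symbol])
```

This baseline compares the received sample against every training symbol. The modified algorithm described in the abstract reduces those comparisons by restricting the search to neighboring clusters.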

Implementation and Design Space Exploration of a Turbo Decoder in High-Level Synthesis

2019 International Conference on ReConFigurable Computing and FPGAs (ReConFig) ◽

10.1109/reconfig48160.2019.8994787 ◽

2019 ◽

Author(s):

Wesley Stirk ◽

Jeff Goeders

Keyword(s):

Design Space Exploration ◽

Design Space ◽

Space Exploration ◽

High Level Synthesis ◽

Turbo Decoder ◽

High Level

Core-Level Modeling and Frequency Prediction for DSP Applications on FPGAs

International Journal of Reconfigurable Computing ◽

10.1155/2015/784672 ◽

2015 ◽

Vol 2015 ◽

pp. 1-20

Author(s):

Gongyu Wang ◽

Greg Stitt ◽

Herman Lam ◽

Alan George

Keyword(s):

High Performance ◽

Design Space Exploration ◽

Design Space ◽

Space Exploration ◽

Core Level ◽

Prediction Methods ◽

Clock Frequency ◽

Worst Case ◽

Model Based ◽

DSP Applications

Field-programmable gate arrays (FPGAs) provide a promising technology that can improve the performance of many high-performance computing and embedded applications. However, unlike software design tools, the relatively immature state of FPGA tools significantly limits productivity and consequently prevents widespread adoption of the technology. For example, the lengthy design-translate-execute (DTE) process often must be iterated to meet the application requirements. Previous works have enabled model-based design-space exploration to reduce DTE iterations but are limited by a lack of accurate model-based prediction of key design parameters, the most important of which is clock frequency. In this paper, we present a core-level modeling and design (CMD) methodology that enables modeling of FPGA applications at an abstract level and yet produces accurate predictions of parameters such as clock frequency, resource utilization (i.e., area), and latency. We evaluate CMD's prediction methods using several high-performance DSP applications on various families of FPGAs and show an average clock-frequency prediction error of 3.6%, with a worst-case error of 20.4%, compared with a 13.9% average error and a 48.2% worst-case error for the best existing high-level prediction method. We also demonstrate how such prediction enables accurate design-space exploration without coding in a hardware-description language (HDL), significantly reducing the total design time.
