Evaluation of Static Mapping for Dynamic Space-Shared Multi-task Processing on FPGAs

Author(s):  
Umar Ibrahim Minhas ◽  
Roger Woods ◽  
Georgios Karakonstantis

Whilst FPGAs have been used in cloud ecosystems, it is still extremely challenging to achieve high compute density when mapping heterogeneous multi-tasks on shared resources at runtime. This work addresses this by treating the FPGA resource as a service and employing multi-task processing at the high level, design space exploration and static off-line partitioning in order to allow more efficient mapping of heterogeneous tasks onto the FPGA. In addition, a new, comprehensive runtime functional simulator is used to evaluate the effect of various spatial and temporal constraints on both the existing and new approaches when varying system design parameters. A comprehensive suite of real high performance computing tasks was implemented on a Nallatech 385 FPGA card, and the results show that our approach can provide on average 2.9× and 2.3× higher system throughput for compute and mixed intensity tasks, respectively, while 0.2× lower for memory intensive tasks due to external memory access latency and bandwidth limitations. The work has been extended by introducing a novel scheduling scheme to enhance temporal utilization of resources when using the proposed approach. Additional results for large queues of mixed intensity tasks (compute and memory) show that the proposed partitioning and scheduling approach can provide more than 3× system speedup over previous schemes.

2019 ◽  
Vol 29 (01) ◽  
pp. 2050001
Author(s):  
Shathanaa Rajmohan ◽  
N. Ramasubramanian

System designers have started adopting high-level synthesis (HLS) for architectural design because of the higher levels of abstraction offered. HLS tools provide multiple design choices with tradeoffs among different design parameters. Design Space Exploration (DSE) involves optimizing the synthesis options to achieve the best tradeoffs among the metrics of interest. With the aim of exploring the design space in a feasible amount of time, we present a novel automated DSE approach. In particular, meeting the constraints presented by different parameters of interest is modeled as a multi-objective problem and solved using a Memetic algorithm. The effectiveness of different variations of the Memetic algorithm in solving the DSE problem is studied and a Firefly algorithm-based solution is proposed with a novel probabilistic local search mechanism. The proposed approach is compared with existing solutions and the results prove that the proposed approach outperforms both existing solutions and other variations of Memetic algorithms in terms of convergence time and quality of results. In addition, a case study has been included to demonstrate the applicability of the approach. Results show that the proposed approach achieves a 33% improvement in cost, [Formula: see text] improvement in speed and [Formula: see text] improvement in hypervolume.
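As a rough illustration of the multi-objective framing in this abstract, the sketch below filters a set of candidate designs down to its Pareto front, i.e. the designs that no other candidate beats on every metric. It is not the Memetic or Firefly search described above; the two metrics (area cost and latency, both minimised), the function names, and the sample values are assumptions made for the example.

```python
from typing import List, Tuple

# Hypothetical design point: (area_cost, latency), both to be minimised.
Design = Tuple[float, float]


def dominates(a: Design, b: Design) -> bool:
    """Return True if a is no worse than b in every objective and strictly better in one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(designs: List[Design]) -> List[Design]:
    """Keep only the designs that no other design dominates."""
    return [d for d in designs if not any(dominates(o, d) for o in designs)]


# Made-up candidates, as an HLS tool might report for different synthesis options.
candidates = [(100.0, 8.0), (120.0, 5.0), (150.0, 5.0), (90.0, 12.0), (130.0, 9.0)]
front = pareto_front(candidates)
print(front)
```

A search heuristic such as the one in the abstract would generate the candidates; a filter like this only decides which of them represent distinct tradeoffs worth keeping.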


VLSI Design ◽  
1997 ◽  
Vol 5 (2) ◽  
pp. 211-221
Author(s):  
Fur-Shing Tsai ◽  
Yu-Chin Hsu

This paper presents the design methodology used in PSS1, a high level synthesis system designed for computation dominated applications. It includes a behavior synthesizer and an area optimizer. Based on a pre-defined architecture, the behavior synthesizer translates a description into a number of designs with different delays and hardware costs. Based on a two-level layout model, the area optimizer fine-tunes the physical design using the information feedback from the layout tools. All the tools are linked by an X-window interface in which users can traverse among different tools and interactively change the design parameters. The output is linked to Lager system [7], a silicon assembler. The layout model allows a designer to interactively merge/split modules, change the shape of modules, and define the pin positions of modules. Experiments show that a considerable area improvement has been achieved using this methodology.


2010 ◽  
Vol 54 (5) ◽  
pp. 786-799
Author(s):  
R. Bertran ◽  
M. Gonzalez ◽  
X. Martorell ◽  
N. Navarro ◽  
E. Ayguade

2013 ◽  
Vol 2013 ◽  
pp. 1-10
Author(s):  
Bruno da Silva ◽  
An Braeken ◽  
Erik H. D’Hollander ◽  
Abdellah Touhafi

The potential of FPGAs as accelerators for high-performance computing applications is very large, but many factors are involved in their performance. The design for FPGAs and the selection of the proper optimizations when mapping computations to FPGAs lead to prohibitively long development time. Alternatives are high-level synthesis (HLS) tools, which promise fast design space exploration through design at a high level, or analytical performance models, which provide realistic performance expectations, potential impediments to performance, and optimization guidelines. In this paper we propose the combination of both, in order to construct a performance model for FPGAs which is able to visually condense all the helpful information for the designer. Our proposed model extends the roofline model, by considering the resource consumption and the parameters used in the HLS tools, to maximize the performance and the resource utilization within the area of the FPGA. The proposed model is applied to optimize the design exploration of a class of window-based image processing applications using two different HLS tools. The results show the accuracy of the model as well as its flexibility to be combined with any HLS tool.
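For readers unfamiliar with the roofline model that this abstract builds on, the sketch below computes the classic roofline bound: attainable performance is the smaller of the peak compute rate and the memory bandwidth multiplied by the operational intensity. It does not reproduce the authors' FPGA extension (resource consumption and HLS parameters); the function names and the platform figures are assumptions chosen only for illustration.

```python
def attainable_gflops(peak_gflops: float,
                      bandwidth_gbs: float,
                      intensity_flops_per_byte: float) -> float:
    """Classic roofline bound: min(compute roof, bandwidth * operational intensity)."""
    return min(peak_gflops, bandwidth_gbs * intensity_flops_per_byte)


def ridge_point(peak_gflops: float, bandwidth_gbs: float) -> float:
    """Operational intensity at which a kernel stops being memory bound."""
    return peak_gflops / bandwidth_gbs


# Hypothetical platform: 100 GFLOP/s peak compute, 10 GB/s memory bandwidth.
PEAK, BANDWIDTH = 100.0, 10.0

memory_bound = attainable_gflops(PEAK, BANDWIDTH, 2.0)    # below the ridge point
compute_bound = attainable_gflops(PEAK, BANDWIDTH, 20.0)  # above the ridge point
print(memory_bound, compute_bound, ridge_point(PEAK, BANDWIDTH))
```

A kernel whose operational intensity sits below the ridge point gains from reducing memory traffic, while one above it gains only from more compute.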


Electronics ◽  
2021 ◽  
Vol 10 (5) ◽  
pp. 627
Author(s):  
David Marquez-Viloria ◽  
Luis Castano-Londono ◽  
Neil Guerrero-Gonzalez

A methodology for scalable and concurrent real-time implementation of highly recurrent algorithms is presented and experimentally validated using the AWS-FPGA. This paper presents a parallel implementation of a KNN algorithm focused on m-QAM demodulators using high-level synthesis for fast prototyping, parameterization, and scalability of the design. The proposed design shows the successful implementation of the KNN algorithm for interchannel interference mitigation in a 3 × 16 Gbaud 16-QAM Nyquist WDM system. Additionally, we present a modified version of the KNN algorithm in which comparisons among data symbols are reduced by identifying the closest neighbor using the rule of the 8-connected clusters used for image processing. Real-time implementation of the modified KNN on a Xilinx Virtex UltraScale+ VU9P AWS-FPGA board was compared with the results obtained in previous work using the same data from the same experimental setup but offline DSP using Matlab. The results show that the difference is negligible below the FEC limit. Additionally, the modified KNN shows a reduction of operations from 43 percent to 75 percent, depending on the symbol’s position in the constellation, achieving a 47.25% reduction in total computational time for 100 K input symbols processed on 20 parallel cores compared to the KNN algorithm.
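To make the baseline concrete, the sketch below shows a plain k-nearest-neighbour classifier applied to received symbols represented as complex numbers, which is the general technique the abstract starts from. It is neither the authors' HLS implementation nor their modified 8-connected-cluster variant; the function name, the labels, and the training points are assumptions invented for the example.

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical training entry: (received complex symbol, constellation label).
Labeled = Tuple[complex, str]


def knn_classify(sample: complex, training: List[Labeled], k: int = 3) -> str:
    """Label a received symbol by majority vote among its k nearest training symbols."""
    # Euclidean distance in the I/Q plane is the magnitude of the complex difference.
    nearest = sorted(training, key=lambda item: abs(sample - item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Made-up noisy observations around two constellation points.
training_set = [
    (1.1 + 0.9j, "A"), (0.9 + 1.2j, "A"), (1.0 + 1.0j, "A"),
    (-1.0 - 1.1j, "B"), (-0.8 - 0.9j, "B"), (-1.2 - 1.0j, "B"),
]

decision = knn_classify(0.8 + 0.9j, training_set)
print(decision)
```

This brute-force version compares each sample against every training symbol; cutting down the number of such comparisons is the point of the modified algorithm the abstract describes.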


2015 ◽  
Vol 2015 ◽  
pp. 1-20
Author(s):  
Gongyu Wang ◽  
Greg Stitt ◽  
Herman Lam ◽  
Alan George

Field-programmable gate arrays (FPGAs) provide a promising technology that can improve performance of many high-performance computing and embedded applications. However, unlike software design tools, the relatively immature state of FPGA tools significantly limits productivity and consequently prevents widespread adoption of the technology. For example, the lengthy design-translate-execute (DTE) process often must be iterated to meet the application requirements. Previous works have enabled model-based, design-space exploration to reduce DTE iterations but are limited by a lack of accurate model-based prediction of key design parameters, the most important of which is clock frequency. In this paper, we present a core-level modeling and design (CMD) methodology that enables modeling of FPGA applications at an abstract level and yet produces accurate predictions of parameters such as clock frequency, resource utilization (i.e., area), and latency. We evaluate CMD’s prediction methods using several high-performance DSP applications on various families of FPGAs and show an average clock-frequency prediction error of 3.6%, with a worst-case error of 20.4%, compared to the best of existing high-level prediction methods, 13.9% average error with 48.2% worst-case error. We also demonstrate how such prediction enables accurate design-space exploration without coding in a hardware-description language (HDL), significantly reducing the total design time.

