Mapping high level algorithms onto massively parallel reconfigurable hardware

Author(s): I. Damaj, J. Hawkins, A. Abdallah

2012, Vol 2012, pp. 1-10
Author(s): Christoph Starke, Vasco Grossmann, Lars Wienbrandt, Sven Koschnicke, John Carstens, ...

The hardware structure of a processing element for optimizing an investment strategy for financial markets is presented. It is shown how this processing element can be replicated many times on the massively parallel FPGA machine RIVYERA, yielding a speedup factor of about 17,000 over a single high-performance PC while reducing energy consumption by more than 99%. Furthermore, it is shown for a particular security and for different time periods that the optimized investment strategy outperforms a buy-and-hold strategy by between 2 and 14 percent.
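The abstract does not detail the processing element's internals, but the underlying pattern is an embarrassingly parallel parameter sweep: each replicated PE evaluates one candidate parameter set of the strategy against historical prices. Below is a minimal software analogue in C++ under assumed details; all names (`evaluate_strategy`, the moving-average parameters, the synthetic price series) are hypothetical illustrations, not the paper's actual PE design.

```cpp
#include <algorithm>
#include <cmath>
#include <execution>
#include <iostream>
#include <numeric>
#include <vector>

// Hypothetical strategy parameters; in hardware, each PE would hold one set.
struct Params { int fast_window; int slow_window; };

// Toy moving-average-crossover backtest; stands in for the PE's evaluation logic.
double evaluate_strategy(const std::vector<double>& prices, Params p) {
    double cash = 1.0, position = 0.0;
    for (size_t t = static_cast<size_t>(p.slow_window); t < prices.size(); ++t) {
        auto mean = [&](int w) {
            return std::accumulate(prices.begin() + t - w, prices.begin() + t, 0.0) / w;
        };
        bool buy = mean(p.fast_window) > mean(p.slow_window);
        if (buy && position == 0.0) { position = cash / prices[t]; cash = 0.0; }
        if (!buy && position > 0.0) { cash = position * prices[t]; position = 0.0; }
    }
    return cash + position * prices.back();  // final portfolio value
}

int main() {
    std::vector<double> prices(1000);
    for (size_t i = 0; i < prices.size(); ++i)
        prices[i] = 100.0 + 10.0 * std::sin(i * 0.05);  // synthetic price series

    // Enumerate candidate parameter sets -- one per "processing element".
    std::vector<Params> grid;
    for (int f = 2; f < 50; ++f)
        for (int s = f + 1; s < 200; ++s)
            grid.push_back({f, s});

    // Evaluate all candidates in parallel, as the replicated PEs do in hardware.
    std::vector<double> results(grid.size());
    std::transform(std::execution::par, grid.begin(), grid.end(), results.begin(),
                   [&](Params p) { return evaluate_strategy(prices, p); });

    auto best = std::max_element(results.begin(), results.end());
    std::cout << "best final value: " << *best << '\n';
}
```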


Electronics, 2020, Vol 9 (9), pp. 1482
Author(s): Rafał Kiełbik, Kamil Rudnicki, Zbigniew Mudza, Jarosław Jung

ARUZ is a large-scale, massively parallel, FPGA-based reconfigurable computational system dedicated primarily to molecular analysis. This paper presents a methodology for ARUZ firmware development that simplifies the process, offers low-level optimization, and facilitates verification. According to this methodology, an expanded, generic, all-in-one VHDL description of the variable Processing Elements (PEs) is first developed manually. GCC preprocessing is then used to extract only the desired target functionality. Dedicated software instantiates and connects the PEs in the form of a scalable network, divides the network into subsets for individual chips, and generates its HDL description. As a result, an individual HDL-coded specification, optimized for the particular analysis, is provided to the synthesis tool. Code reuse and automated generation of up to 81% of the code reduce the workload. Describing the cores in well-optimized VHDL rather than relying on High-Level Synthesis eliminates unnecessary overhead. The PE network can be scaled inversely with PE complexity in order to use the available resources efficiently. Moreover, downscaling the problem simplifies verification during HDL simulation and testing of the prototype systems.
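The extraction step rests on ordinary C-preprocessor conditional compilation: the all-in-one source carries every PE variant behind `#ifdef` guards, and a preprocessor-only pass (e.g. something like `gcc -E -P`; the abstract does not give the exact invocation) strips everything but the selected variant. The sketch below shows the same pattern in C++ rather than in the paper's VHDL, with hypothetical variant names, purely to illustrate the mechanism.

```cpp
// all_in_one_pe.cpp -- hypothetical "all-in-one" source holding every PE variant.
// Selecting one variant at preprocessing time mirrors the ARUZ extraction step:
//   g++ -E -P -DPE_VARIANT_DIFFUSION all_in_one_pe.cpp > pe_diffusion.cpp
#include <cstdint>
#include <iostream>

struct PEState { std::uint32_t site; std::uint32_t neighbor; };

// Only one of these blocks survives preprocessing; the rest are stripped,
// just as unused functionality is stripped from the VHDL before synthesis.
#ifdef PE_VARIANT_DIFFUSION
// Variant A: toy "diffusion" update rule (hypothetical).
std::uint32_t pe_step(const PEState& s) { return (s.site + s.neighbor) / 2; }
#endif

#ifdef PE_VARIANT_REACTION
// Variant B: toy "reaction" update rule (hypothetical).
std::uint32_t pe_step(const PEState& s) { return s.site ^ s.neighbor; }
#endif

int main() {
    PEState s{8, 2};
    std::cout << "pe_step -> " << pe_step(s) << '\n';
}
```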


2020, Vol 10 (5), pp. 1656
Author(s): Woosuk Shin, Kwan-Hee Yoo, Nakhoon Baek

Today, many big-data applications require massively parallel tasks to compute complicated mathematical operations. To perform such tasks, platforms like CUDA (Compute Unified Device Architecture) and OpenCL (Open Computing Language) are widely used and continually developed to enhance the throughput of massively parallel workloads. There is also a need for high-level abstraction and platform independence over these massively parallel computing platforms. Recently, the Khronos Group announced SYCL (C++ Single-source Heterogeneous Programming for OpenCL), a new cross-platform abstraction layer that provides an efficient way to write single-source heterogeneous programs with C++-template-level abstractions. However, since there is no single official implementation of SYCL, several different implementations from various vendors currently coexist. In this paper, we analyse the characteristics of those SYCL implementations. We also present performance measurements of these implementations on well-known massively parallel tasks, and show that each implementation has its own strengths for different types of mathematical operations and different data sizes. Our analysis provides fundamental measurements for the cost-effective use of massively parallel computation at this abstraction level, especially in big-data applications.
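The "single-source" style the abstract refers to places host and device code in one C++ translation unit. Below is a minimal SYCL 2020 vector-addition sketch of that style; it is our own illustration, not one of the paper's benchmarks, and older implementations from the paper's era may expect the `<CL/sycl.hpp>` header instead of `<sycl/sycl.hpp>`.

```cpp
#include <sycl/sycl.hpp>
#include <iostream>
#include <vector>

int main() {
    constexpr size_t N = 1024;
    std::vector<float> a(N, 1.0f), b(N, 2.0f), c(N, 0.0f);

    // The queue targets whatever device the SYCL runtime selects
    // (GPU, CPU, or accelerator), which is what makes the source portable.
    sycl::queue q;

    {   // Buffers hand the data over to the runtime for the scope below.
        sycl::buffer<float> buf_a(a.data(), sycl::range<1>(N));
        sycl::buffer<float> buf_b(b.data(), sycl::range<1>(N));
        sycl::buffer<float> buf_c(c.data(), sycl::range<1>(N));

        // Host and device code live in the same C++ source: the lambda
        // below is compiled for the device by the SYCL implementation.
        q.submit([&](sycl::handler& h) {
            sycl::accessor A(buf_a, h, sycl::read_only);
            sycl::accessor B(buf_b, h, sycl::read_only);
            sycl::accessor C(buf_c, h, sycl::write_only);
            h.parallel_for(sycl::range<1>(N), [=](sycl::id<1> i) {
                C[i] = A[i] + B[i];
            });
        });
    }   // Buffer destruction synchronizes and copies results back to c.

    std::cout << "c[0] = " << c[0] << '\n';  // expected: 3
}
```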

