Transparent Runtime Migration of Loop-Based Traces of Processor Instructions to Reconfigurable Processing Units

2013 ◽  
Vol 2013 ◽  
pp. 1-20 ◽  
Author(s):  
João Bispo ◽  
Nuno Paulino ◽  
João M. P. Cardoso ◽  
João Canas Ferreira

The ability to map instructions running in a microprocessor to a reconfigurable processing unit (RPU), acting as a coprocessor, enables the runtime acceleration of applications and ensures code portability and possibly performance portability. In this work, we focus on the mapping of loop-based instruction traces (called Megablocks) to RPUs. The proposed approach considers offline partitioning and mapping stages without ignoring their future runtime applicability. We present a toolchain that automatically extracts Megablocks from MicroBlaze instruction traces and generates an RPU for executing those loops. Our hardware infrastructure is able to move loop execution from the microprocessor to the RPU transparently, at runtime, and without changing the executable binaries. The toolchain and the system are fully operational. Three FPGA implementations of the system, differing in the hardware interfaces used, were tested and evaluated with a set of 15 application kernels. Speedups ranging from 1.26 to 3.69 were achieved for the best alternative using a MicroBlaze processor with local memory.

Author(s):  
Alan Gray ◽  
Kevin Stratford

Leading high-performance computing systems achieve their status through the use of highly parallel devices such as NVIDIA graphics processing units or Intel Xeon Phi many-core CPUs. The concept of performance portability across such architectures, as well as traditional CPUs, is vital for the application programmer. In this paper we describe targetDP, a lightweight abstraction layer which allows grid-based applications to target data-parallel hardware in a platform-agnostic manner. We demonstrate the effectiveness of our pragmatic approach by presenting performance results for a complex fluid application (with which the model was co-designed), plus a separate lattice quantum chromodynamics particle physics code. For each application, a single source code base is seen to achieve portable performance, as assessed within the context of the Roofline model. TargetDP can be combined with the Message Passing Interface (MPI) to allow use on systems containing multiple nodes: we demonstrate this by providing scaling results on traditional and graphics processing unit-accelerated large-scale supercomputers.


2015 ◽  
Vol 17 (10) ◽  
pp. 1706-1720 ◽  
Author(s):  
Leibo Liu ◽  
Dong Wang ◽  
Min Zhu ◽  
Yansheng Wang ◽  
Shouyi Yin ◽  
...  

2009 ◽  
Vol 8 (4) ◽  
pp. 93-100 ◽  
Author(s):  
Corliss A. O'Bryan ◽  
Philip G. Crandall ◽  
Katrina Shores-Ellis ◽  
Donald M. Johnson ◽  
Steven C. Ricke ◽  
...  
