CHiPReP—A Compiler for the HiPReP High-Performance Reconfigurable Processor

This article presents CHiPReP, a C compiler for the HiPReP processor, which is a high-performance Coarse-Grained Reconfigurable Array employing Floating-Point Units. CHiPReP is an extension of the LLVM and CCF compiler frameworks. Its main contributions are (i) a Splitting Algorithm for Data Dependence Graphs, which distributes the computations of a C loop to Address-Generator Units and Processing Elements; (ii) a novel instruction clustering and scheduling heuristic; and (iii) an integrated placement, pipeline balancing and routing optimization method based on Simulated Annealing. The compiler was verified and analyzed using a cycle-accurate HiPReP simulation model.
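As a rough illustration of contribution (iii), the sketch below uses Simulated Annealing to place the nodes of a small data-flow graph on a grid of processing elements. It is not CHiPReP's actual optimizer. The cost function (total Manhattan wire length), the cooling schedule, and the names `wire_length` and `anneal_placement` are assumptions chosen for brevity, and pipeline balancing and routing are left out entirely.

```python
import math
import random


def wire_length(placement, edges):
    """Total Manhattan distance over all graph edges (assumed cost function)."""
    return sum(
        abs(placement[a][0] - placement[b][0]) + abs(placement[a][1] - placement[b][1])
        for a, b in edges
    )


def anneal_placement(nodes, edges, rows, cols, steps=20000,
                     t_start=5.0, t_end=0.01, seed=0):
    """Place each node on a distinct grid cell, minimizing wire_length."""
    rng = random.Random(seed)
    cells = [(r, c) for r in range(rows) for c in range(cols)]
    assert len(nodes) <= len(cells), "grid too small for the graph"
    assert steps >= 2
    rng.shuffle(cells)

    occupant = {cell: None for cell in cells}  # cell -> node or None
    placement = {}                             # node -> cell
    for node, cell in zip(nodes, cells):
        placement[node] = cell
        occupant[cell] = node

    cost = wire_length(placement, edges)
    best, best_cost = dict(placement), cost

    for step in range(steps):
        # Geometric cooling from t_start down to t_end.
        t = t_start * (t_end / t_start) ** (step / (steps - 1))
        node = rng.choice(nodes)
        source = placement[node]
        target = rng.choice(cells)
        if target == source:
            continue
        other = occupant[target]

        # Move the node to the target cell, swapping with any occupant.
        placement[node] = target
        occupant[target] = node
        occupant[source] = other
        if other is not None:
            placement[other] = source

        new_cost = wire_length(placement, edges)
        delta = new_cost - cost
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            cost = new_cost
            if cost < best_cost:
                best, best_cost = dict(placement), cost
        else:
            # Rejected: undo the move.
            placement[node] = source
            occupant[source] = node
            occupant[target] = other
            if other is not None:
                placement[other] = target

    return best, best_cost
```

For a four-node chain on a 2x2 grid, `anneal_placement(["a", "b", "c", "d"], [("a", "b"), ("b", "c"), ("c", "d")], 2, 2)` returns a placement in which each node sits on its own cell, together with that placement's wire length.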

Configuration Approaches to Enhance Computing Efficiency of Coarse-Grained Reconfigurable Array

Journal of Circuits, Systems and Computers ◽

10.1142/s0218126615500437 ◽

2015 ◽

Vol 24 (03) ◽

pp. 1550043 ◽

Cited By ~ 1

Author(s):

Chen Yang ◽

Leibo Liu ◽

Yansheng Wang ◽

Shouyi Yin ◽

Peng Cao ◽

...

Keyword(s):

Input Data ◽

Multimedia System ◽

Coarse Grained ◽

Reconfigurable Processor ◽

Decoding Algorithms ◽

High Profile ◽

Context Switching ◽

Reconfigurable Arrays ◽

Reconfigurable Array ◽

Major Bottleneck

The major bottleneck of coarse-grained reconfigurable arrays (CGRAs) is excessive configuration overhead, which prevents their computing potential from being fully utilized. At run time, the function of a CGRA can be fully and dynamically reconfigured by changing contexts, so context switching is very frequent, yet the configuration time of a CGRA is very long. This paper proposes three configuration approaches to reduce the interval latency when switching configuration contexts: input data relocation (IDR), line-based context switching (LCS), and loop interval minimization (LIM). IDR relocates input data to the first stage of the pipeline, which reduces the delay before the input data of the next data flow graph (DFG) is available. LCS is a line-based context-switching mechanism for adjacent independent DFGs that reduces the interval of context switching, thereby expanding the depth of the pipeline. LIM minimizes the interval of loops. Simulations on a coarse-grained reconfigurable processor called the reconfigurable multimedia system (REMUS) show that 1080p@30 fps H.264 high profile video decoding can be achieved at a 200 MHz working frequency. For the AVS and MPEG2 decoding algorithms, higher performance can be achieved: 1080p@39 fps and 1080p@41 fps, respectively.

H.264/AVC Intra Predictor on a Coarse-Grained Reconfigurable Multi-Media System

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.546-547.469 ◽

2012 ◽

Vol 546-547 ◽

pp. 469-474

Author(s):

Xing Wang ◽

Lei Bo Liu ◽

Shou Yi Yin ◽

Min Zhu ◽

Shao Jun Wei

Keyword(s):

High Performance ◽

Coarse Grained ◽

Reconfigurable Architectures ◽

Prediction Algorithm ◽

Worst Case ◽

Reconfigurable Processor ◽

High Profile ◽

Media System ◽

Multi Media ◽

Simulation Results

Coarse-Grained Reconfigurable Architectures (CGRAs) have proved to be promising candidates for meeting the high performance, low power, and flexibility requirements of embedded systems. In this paper, we implement a High Profile intra predictor for an H.264/AVC decoder on a novel coarse-grained reconfigurable processor, REMUS (Reconfigurable Multi-media System). We propose a pipelined and parallel scheduling process for the intra prediction algorithm. Simulation results show that the worst case for intra macroblocks consumes 548 clock cycles.

Hardware Virtualization on Dynamically Reconfigurable Processors

Reconfigurable Embedded Control Systems ◽

10.4018/978-1-60960-086-0.ch004 ◽

2011 ◽

pp. 82-109 ◽

Cited By ~ 1

Author(s):

Christian Plessl ◽

Marco Platzner

Keyword(s):

Hardware Acceleration ◽

Software Tool ◽

Coarse Grained ◽

Future Research ◽

Hardware Virtualization ◽

Reconfigurable Processor ◽

Dynamically Reconfigurable ◽

Reconfigurable Array ◽

Execution Model ◽

Application Specific

Numerous research efforts in reconfigurable embedded processors have shown that augmenting a CPU core with a coarse-grained reconfigurable array for application-specific hardware acceleration can greatly increase performance and energy efficiency. The traditional execution model for such reconfigurable co-processors, however, requires the accelerated function to fit onto the reconfigurable array as a whole, which restricts its applicability to rather small functions. In this chapter, the authors study hardware virtualization approaches that overcome this restriction by leveraging dynamic reconfiguration. They present two hardware virtualization methods, virtualized execution and temporal partitioning, and introduce the Zippy reconfigurable processor architecture, which has been designed with specific hardware virtualization support. They also outline the corresponding hardware and software tool flows. Finally, they demonstrate the potential of hardware virtualization with two case studies and discuss directions for future research.

A Coarse-Grained Reconfigurable Array for High-Performance Computing Applications

2018 International Conference on ReConFigurable Computing and FPGAs (ReConFig) ◽

10.1109/reconfig.2018.8641720 ◽

2018 ◽

Cited By ~ 3

Author(s):

Philipp S. Käsgen ◽

Markus Weinhardt ◽

Christian Hochberger

Keyword(s):

High Performance Computing ◽

High Performance ◽

Coarse Grained ◽

Reconfigurable Array ◽

Performance Computing

A Fully Parameterized Virtual Coarse Grained Reconfigurable Array for High Performance Computing Applications

2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) ◽

10.1109/ipdpsw.2016.13 ◽

2016 ◽

Cited By ~ 6

Author(s):

Amit Kulkarni ◽

Elias Vansteenkiste ◽

Dirk Stroobandt ◽

Andreas Brokalakis ◽

Antonios Nikitakis

Keyword(s):

High Performance Computing ◽

High Performance ◽

Coarse Grained ◽

Reconfigurable Array ◽

Performance Computing

Design space exploration and implementation of a high performance and low area Coarse Grained Reconfigurable Processor

2012 International Conference on Field-Programmable Technology ◽

10.1109/fpt.2012.6412114 ◽

2012 ◽

Cited By ~ 16

Author(s):

Dongkwan Suh ◽

Kiseok Kwon ◽

Sukjin Kim ◽

Soojung Ryu ◽

Jeongwook Kim

Keyword(s):

High Performance ◽

Design Space Exploration ◽

Design Space ◽

Space Exploration ◽

Coarse Grained ◽

Reconfigurable Processor ◽

Low Area

Address Generator for High Performance WiMAX Deinterleaver

International Journal of Scientific Research ◽

10.15373/22778179/may2014/56 ◽

2012 ◽

Vol 3 (5) ◽

pp. 184-186

Author(s):

AmalDas GH

Keyword(s):

High Performance ◽

Address Generator

High-Performance Image Filters via Sparse Approximations

Proceedings of the ACM on Computer Graphics and Interactive Techniques ◽

10.1145/3406182 ◽

2020 ◽

Vol 3 (2) ◽

pp. 1-19

Author(s):

Kersten Schuster ◽

Philip Trettner ◽

Leif Kobbelt

Keyword(s):

High Performance ◽

Hardware Acceleration ◽

Optimization Method ◽

Translation Invariant ◽

Approximation Quality ◽

Trade Offs ◽

Sparse Approximations ◽

Image Filters ◽

Good Trade ◽

And Performance

We present a numerical optimization method to find highly efficient (sparse) approximations for convolutional image filters. Using a modified parallel tempering approach, we solve a constrained optimization that maximizes approximation quality while strictly staying within a user-prescribed performance budget. The results are multi-pass filters where each pass computes a weighted sum of bilinearly interpolated sparse image samples, exploiting hardware acceleration on the GPU. We systematically decompose the target filter into a series of sparse convolutions, trying to find good trade-offs between approximation quality and performance. Since our sparse filters are linear and translation-invariant, they do not exhibit the aliasing and temporal coherence issues that often appear in filters working on image pyramids. We show several applications, ranging from simple Gaussian or box blurs to the emulation of sophisticated Bokeh effects with user-provided masks. Our filters achieve high performance as well as high quality, often providing significant speed-up at acceptable quality even for separable filters. The optimized filters can be baked into shaders and used as a drop-in replacement for filtering tasks in image processing or rendering pipelines.
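The abstract's point that each pass sums interpolated sparse samples rests on a simple identity: one linearly interpolated read between two neighboring pixels equals a weighted sum of both. The 1D sketch below merges adjacent taps of a 5-tap binomial kernel into three interpolated reads. It only illustrates that identity. It is not the authors' parallel-tempering optimizer, and the helper names (`lerp_sample`, `merge_taps`, `apply_taps`) are hypothetical.

```python
import math


def lerp_sample(signal, x):
    """Linearly interpolated read at fractional position x (clamped to the signal)."""
    x = min(max(x, 0.0), len(signal) - 1.0)
    i = int(math.floor(x))
    j = min(i + 1, len(signal) - 1)
    frac = x - i
    return (1.0 - frac) * signal[i] + frac * signal[j]


def merge_taps(w0, w1, offset):
    """Merge tap w0 at `offset` and tap w1 at `offset + 1` into one interpolated tap.

    Assumes both weights have the same sign, so the merged position lies
    between the two original taps.
    """
    total = w0 + w1
    return offset + w1 / total, total


def apply_taps(signal, center, taps):
    """Weighted sum of (offset, weight) taps around `center`."""
    return sum(w * lerp_sample(signal, center + o) for o, w in taps)


# Dense 5-tap binomial kernel [1, 4, 6, 4, 1] / 16 at offsets -2..2.
dense = [(-2, 1 / 16), (-1, 4 / 16), (0, 6 / 16), (1, 4 / 16), (2, 1 / 16)]

# Sparse version: the outer pairs collapse into one interpolated read each.
sparse = [
    merge_taps(1 / 16, 4 / 16, -2),  # read at offset -1.2, weight 5/16
    (0, 6 / 16),
    merge_taps(4 / 16, 1 / 16, 1),   # read at offset 1.2, weight 5/16
]

signal = [0.0, 1.0, 4.0, 9.0, 16.0, 25.0, 36.0]
dense_value = apply_taps(signal, 3, dense)
sparse_value = apply_taps(signal, 3, sparse)
```

Here the two results agree up to floating-point rounding while the sparse version needs three reads instead of five. In 2D, a bilinear fetch can combine up to four neighboring pixels in a single read.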
