CHiPReP—A Compiler for the HiPReP High-Performance Reconfigurable Processor

Electronics ◽  
2021 ◽  
Vol 10 (21) ◽  
pp. 2590
Author(s):  
Markus Weinhardt ◽  
Mohamed Messelka ◽  
Philipp Käsgen

This article presents CHiPReP, a C compiler for the HiPReP processor, which is a high-performance Coarse-Grained Reconfigurable Array employing Floating-Point Units. CHiPReP is an extension of the LLVM and CCF compiler frameworks. Its main contributions are (i) a Splitting Algorithm for Data Dependence Graphs, which distributes the computations of a C loop to Address-Generator Units and Processing Elements; (ii) a novel instruction clustering and scheduling heuristic; and (iii) an integrated placement, pipeline balancing and routing optimization method based on Simulated Annealing. The compiler was verified and analyzed using a cycle-accurate HiPReP simulation model.
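The abstract gives no details of the Simulated Annealing step, so the following is only a minimal sketch of annealing-based placement in general, not CHiPReP's method. It assumes a toy cost (total Manhattan distance between connected operations on a rectangular grid) and omits pipeline balancing and routing entirely; the names `wire_length` and `anneal_placement`, the cooling schedule, and all parameters are hypothetical.

```python
import math
import random


def wire_length(placement, edges):
    """Toy cost: sum of Manhattan distances between connected nodes."""
    total = 0
    for a, b in edges:
        (ra, ca), (rb, cb) = placement[a], placement[b]
        total += abs(ra - rb) + abs(ca - cb)
    return total


def anneal_placement(nodes, edges, rows, cols, steps=4000,
                     t_start=2.0, t_end=0.05, seed=0):
    """Place nodes on a rows x cols grid by simulated annealing.

    Returns (placement, best_cost, initial_cost). The cost model and the
    geometric cooling schedule are illustrative assumptions.
    """
    rng = random.Random(seed)
    n_cells = rows * cols
    if len(nodes) > n_cells:
        raise ValueError("more nodes than grid cells")

    # One slot per grid cell; None marks an unused cell.
    slots = list(nodes) + [None] * (n_cells - len(nodes))
    rng.shuffle(slots)

    def positions(current):
        return {n: divmod(i, cols) for i, n in enumerate(current) if n is not None}

    cost = wire_length(positions(slots), edges)
    initial_cost = cost
    best_slots, best_cost = list(slots), cost

    for step in range(steps):
        # Geometric cooling from t_start down towards t_end.
        t = t_start * (t_end / t_start) ** (step / steps)
        # Candidate move: swap the contents of two random cells.
        i, j = rng.randrange(n_cells), rng.randrange(n_cells)
        slots[i], slots[j] = slots[j], slots[i]
        new_cost = wire_length(positions(slots), edges)
        delta = new_cost - cost
        # Always accept improvements; accept worse moves with
        # probability exp(-delta / t) (Metropolis criterion).
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            cost = new_cost
            if cost < best_cost:
                best_slots, best_cost = list(slots), cost
        else:
            slots[i], slots[j] = slots[j], slots[i]  # reject: undo the swap

    return positions(best_slots), best_cost, initial_cost


if __name__ == "__main__":
    ops = ["ld_a", "ld_b", "mul", "add", "acc", "st"]
    deps = [("ld_a", "mul"), ("ld_b", "mul"), ("mul", "add"),
            ("add", "acc"), ("acc", "st")]
    placement, best, initial = anneal_placement(ops, deps, rows=3, cols=3)
    print(initial, "->", best, placement)
```

A compiler that integrates placement with pipeline balancing and routing, as the abstract describes, would presumably fold routing congestion and path-latency mismatch into the cost function; the sketch above leaves those terms out.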

2015 ◽  
Vol 24 (03) ◽  
pp. 1550043 ◽  
Author(s):  
Chen Yang ◽  
Leibo Liu ◽  
Yansheng Wang ◽  
Shouyi Yin ◽  
Peng Cao ◽  
...  

The major bottleneck of coarse-grained reconfigurable arrays (CGRAs) is excessive configuration overhead, which prevents their computing potential from being fully utilized. At run-time, the function of a CGRA can be fully and dynamically reconfigured by changing contexts, so context switches occur very frequently, while each configuration takes a long time. This paper proposes three configuration approaches to reduce interval latency when switching configuration contexts: input data relocation (IDR), line-based context switching (LCS), and loop interval minimization (LIM). IDR relocates input data to the first stage of the pipeline, which reduces the delay of the input data for the next data flow graph (DFG). LCS is a line-based context-switching mechanism for adjacent independent DFGs that reduces the interval of context switching, thereby expanding the depth of the pipeline. LIM minimizes the interval of loops. Simulations on a coarse-grained reconfigurable processor called reconfigurable multimedia system (REMUS) show that 1080p@30 fps H.264 high profile video decoding can be achieved at a 200 MHz working frequency. For the AVS and MPEG2 decoding algorithms, higher performance of 1080p@39 fps and 1080p@41 fps, respectively, can be achieved.


2012 ◽  
Vol 546-547 ◽  
pp. 469-474
Author(s):  
Xing Wang ◽  
Lei Bo Liu ◽  
Shou Yi Yin ◽  
Min Zhu ◽  
Shao Jun Wei

Coarse-Grained Reconfigurable Architectures (CGRAs) have proven to be promising candidates for meeting the high performance, low power, and flexibility requirements of embedded systems. In this paper, we implement a High Profile intra predictor for an H.264/AVC decoder on a novel coarse-grained reconfigurable processor, Remus (Reconfigurable Multi-media System). We propose a pipelined and parallel scheduling process for the intra prediction algorithm, and simulation results show that 548 clock cycles are consumed in the worst case for intra macroblocks.


Author(s):  
Christian Plessl ◽  
Marco Platzner

Numerous research efforts in reconfigurable embedded processors have shown that augmenting a CPU core with a coarse-grained reconfigurable array for application-specific hardware acceleration can greatly increase performance and energy efficiency. The traditional execution model for such reconfigurable co-processors, however, requires the accelerated function to fit onto the reconfigurable array as a whole, which restricts its applicability to rather small functions. In the research presented in this chapter, the authors study hardware virtualization approaches that overcome this restriction by leveraging dynamic reconfiguration. They present two hardware virtualization methods, virtualized execution and temporal partitioning, and introduce the Zippy reconfigurable processor architecture, which has been designed with specific hardware virtualization support. Further, the authors outline the corresponding hardware and software tool flows. Finally, they demonstrate the potential of hardware virtualization with two case studies and discuss directions for future research.


Author(s):  
Kersten Schuster ◽  
Philip Trettner ◽  
Leif Kobbelt

We present a numerical optimization method to find highly efficient (sparse) approximations for convolutional image filters. Using a modified parallel tempering approach, we solve a constrained optimization that maximizes approximation quality while strictly staying within a user-prescribed performance budget. The results are multi-pass filters where each pass computes a weighted sum of bilinearly interpolated sparse image samples, exploiting hardware acceleration on the GPU. We systematically decompose the target filter into a series of sparse convolutions, trying to find good trade-offs between approximation quality and performance. Since our sparse filters are linear and translation-invariant, they do not exhibit the aliasing and temporal coherence issues that often appear in filters working on image pyramids. We show several applications, ranging from simple Gaussian or box blurs to the emulation of sophisticated Bokeh effects with user-provided masks. Our filters achieve high performance as well as high quality, often providing significant speed-up at acceptable quality even for separable filters. The optimized filters can be baked into shaders and used as a drop-in replacement for filtering tasks in image processing or rendering pipelines.
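To illustrate what a single pass of such a filter computes, here is a minimal CPU sketch that evaluates a weighted sum of bilinearly interpolated samples at fractional offsets. It is not the authors' implementation: it does not perform the parallel tempering optimization that finds the taps, the tap values are hand-picked, and the names `bilinear` and `sparse_pass` as well as the clamp-to-edge border handling are assumptions.

```python
import math


def bilinear(img, y, x):
    """Sample a list-of-lists image at fractional (y, x), clamping at borders."""
    h, w = len(img), len(img[0])
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    fy, fx = y - y0, x - x0
    top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
    bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def sparse_pass(img, taps):
    """One filter pass: each output pixel is a weighted sum of a few
    bilinear samples. `taps` is a list of (dy, dx, weight) tuples."""
    h, w = len(img), len(img[0])
    return [
        [sum(wgt * bilinear(img, y + dy, x + dx) for dy, dx, wgt in taps)
         for x in range(w)]
        for y in range(h)
    ]


if __name__ == "__main__":
    # Two half-pixel taps reproduce the 3-tap kernel [0.25, 0.5, 0.25],
    # because each bilinear fetch already averages two neighbouring pixels.
    taps = [(0.0, -0.5, 0.5), (0.0, 0.5, 0.5)]
    row = [[0.0, 0.0, 4.0, 0.0, 0.0]]
    print(sparse_pass(row, taps))  # [[0.0, 1.0, 2.0, 1.0, 0.0]]
```

A multi-pass filter in this sense would feed the output of one `sparse_pass` call into the next, each with its own tap list; on a GPU each `bilinear` call corresponds to one hardware-filtered texture fetch.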

