FPGA Memory Optimization in High-Level Synthesis

Author(s):  
Mingjie Lin ◽  
Juan Escobedo

High-level synthesis (HLS) with FPGA can achieve significant performance improvements through effective memory partitioning and meticulous data reuse. In this chapter, the authors will first explore techniques that have been adopted directly from systems that possess a fixed memory subsystem such as CPUs and GPUs (Section 2). Section 3 will focus on techniques that have been developed specifically for reconfigurable architectures which generate custom memory subsystems to take advantage of the peculiarities of a family of affine code called stencil code. The authors will focus on techniques that exploit memory banking to allow for parallel, conflict-free memory accesses in Section 3.1 and techniques that generate an optimal memory micro-architecture for data reuse in Section 3.2. Finally, Section 4 will explore the technique handling code still belonging to the affine family but the relative distance between the addresses.

Author(s):  
Panadda Solod ◽  
Nattha Jindapetch ◽  
Kiattisak Sengchuai ◽  
Apidet Booranawong ◽  
Pakpoom Hoyingcharoen ◽  
...  

2002 ◽  
Vol 10 (1) ◽  
pp. 45-53 ◽  
Author(s):  
Jie Tao ◽  
Wolfgang Karl ◽  
Martin Schulz

Shared memory applications running transparently on top of NUMA architectures often face severe performance problems due to bad data locality and excessive remote memory accesses. Optimizations with respect to data locality are therefore necessary, but require a fundamental understanding of an application's memory access behavior. The information necessary for this cannot be obtained using simple code instrumentation due to the implicit nature of the communication handled by the NUMA hardware, the large amount of traffic produced at runtime, and the fine access granularity in shared memory codes. In this paper an approach to overcome these problems and thereby to enable an easy and efficient optimization process is presented. Based on a low-level hardware monitoring facility in coordination with a comprehensive visualization tool, it enables the generation of memory access histograms capable of showing all memory accesses across the complete address space of an application's working set. This information can be used to identify access hot spots, to understand the dynamic behavior of shared memory applications, and to optimize applications using an application specific data layout resulting in significant performance improvements.


Sign in / Sign up

Export Citation Format

Share Document