Subblock-Based BPE Scheme to Conquer Mismatch in Memory Access Pattern

This article presents a solution to path tracing of massive scenes on multiple GPUs. Our approach analyzes the memory access pattern of a path tracer and defines how the scene data should be distributed across up to 16 GPUs with minimal effect on performance. The key concept is that the parts of the scene with the highest number of memory accesses are replicated on all GPUs. We propose two methods for maximizing the performance of path tracing when working with partially distributed scene data. Both methods work at the memory management level, so the path tracer's data structures do not have to be redesigned, making our approach applicable to other path tracers with only minor changes to their code. As a proof of concept, we have enhanced the open-source Blender Cycles path tracer. The approach was validated on scenes of up to 169 GB. We show that only 1–5% of the scene data needs to be replicated to all machines for such large scenes. On smaller scenes, we have verified that the performance is very close to that of rendering a fully replicated scene. In terms of scalability, we have achieved a parallel efficiency of over 94% using up to 16 GPUs.
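The replication idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the function name `plan_placement`, the per-chunk access counts, and the round-robin fallback for non-replicated chunks are all assumptions made for illustration.

```python
def plan_placement(access_counts, chunk_sizes, num_gpus, replicate_fraction):
    """Hypothetical helper: replicate the most-accessed chunks, distribute the rest.

    access_counts: chunk id -> number of recorded memory accesses
    chunk_sizes:   chunk id -> size in bytes
    replicate_fraction: share of the total scene size that may be replicated
    """
    budget = replicate_fraction * sum(chunk_sizes.values())
    # Visit chunks from most to least accessed.
    hottest_first = sorted(
        chunk_sizes, key=lambda c: access_counts.get(c, 0), reverse=True
    )
    replicated = set()   # chunks stored on every GPU
    owner = {}           # chunk id -> the single GPU that stores it
    used = 0
    next_gpu = 0
    for chunk in hottest_first:
        if used + chunk_sizes[chunk] <= budget:
            replicated.add(chunk)
            used += chunk_sizes[chunk]
        else:
            # Assumption: plain round-robin for everything that is not replicated.
            owner[chunk] = next_gpu
            next_gpu = (next_gpu + 1) % num_gpus
    return replicated, owner


if __name__ == "__main__":
    counts = {"a": 100, "b": 5, "c": 50, "d": 1}
    sizes = {"a": 10, "b": 10, "c": 10, "d": 10}
    print(plan_placement(counts, sizes, num_gpus=2, replicate_fraction=0.25))
```

With a small replication budget, only the hottest chunk ends up on every GPU, which mirrors the reported observation that a few percent of the scene data can account for most accesses.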

Efficient Rate Conversion Filtering on GPUs with Shared Memory Access Pattern Scrambling

2016 IEEE International Workshop on Signal Processing Systems (SiPS) ◽ 10.1109/sips.2016.57 ◽ 2016

Cited By ~ 1

Author(s): Mrugesh Gajjar, Ismayil Guracar

Keyword(s): Shared Memory, Memory Access, Access Pattern, Rate Conversion

Memory Access Pattern Protection for Resource-Constrained Devices

Smart Card Research and Advanced Applications - Lecture Notes in Computer Science ◽ 10.1007/978-3-642-37288-9_13 ◽ 2013 ◽ pp. 188-202

Cited By ~ 3

Author(s): Yuto Nakano, Carlos Cid, Shinsaku Kiyomoto, Yutaka Miyake

Keyword(s): Memory Access, Access Pattern, Resource Constrained, Resource Constrained Devices, Constrained Devices

Memory access pattern analysis and stream cache design for multimedia applications

Proceedings of the ASP-DAC Asia and South Pacific Design Automation Conference, 2003. ◽ 10.1109/aspdac.2003.1194988 ◽ 2003

Author(s): Junghee Lee, Chanik Park, Soonhoi Ha

Keyword(s): Pattern Analysis, Multimedia Applications, Memory Access, Access Pattern, Cache Design

Trace-Driven Memory Access Pattern Recognition in Computational Kernels

Proceedings of the Second Workshop on Optimizing Stencil Computations - WOSC '14 ◽ 10.1145/2686745.2686748 ◽ 2014

Author(s): Eunjung Park, Christos Kartsaklis, Tomislav Janjusic, John Cavazos

Keyword(s): Pattern Recognition, Memory Access, Access Pattern

APPLICATION-ADAPTIVE RECONFIGURATION OF MEMORY ADDRESS SHUFFLER FOR FPGA-EMBEDDED INSTRUCTION-SET PROCESSOR

Journal of Circuits, Systems and Computers ◽ 10.1142/s0218126610006748 ◽ 2010 ◽ Vol 19 (07) ◽ pp. 1435-1447

Author(s): YOUNG-SU KWON, NAK-WOONG EUM

Keyword(s): Memory System, Memory Access, Conflict Graph, The Novel, Access Pattern, Processor Core, Parallel Memory, Memory Address, Viable Solution, Media Applications

The programmability requirement in reconfigurable systems necessitates the integration of soft processors in FPGAs. High memory bandwidth demand is a major performance bottleneck for soft processors running media applications. While a parallel memory system is a viable way to handle the large number of memory transactions in media processors, memory access conflicts caused by multiple memory buses limit the overall performance. We propose and evaluate a configurable memory address shuffler integrated in the memory access arbiter of the parallel memory system in a soft processor. The novel address shuffling algorithm profiles the memory access pattern of the application, produces the access conflict graph, relocates decomposed memory sub-pages based on that graph, and finally generates synthesizable code for the address shuffler. The address shuffler translates the requested memory addresses into shuffled addresses so that the number of simultaneous accesses to the same physical memory block decreases. The reconfigurability of the address shuffler enables adaptive address shuffling depending on the memory access pattern of the application running on the soft processor. The configurable address shuffler removes 80% of access conflicts on average across the benchmarks, with a hardware overhead of 1592 LUTs, which is 14% of the LUT count of the processor core.
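The profile, conflict graph, and relocation steps described in this abstract can be sketched in software. The sketch below is an illustration under stated assumptions, not the paper's algorithm: the trace format (one list of sub-page ids per cycle), the function names, and the greedy placement heuristic are all hypothetical, and the final step of generating synthesizable shuffler code is omitted.

```python
from collections import defaultdict
from itertools import combinations


def build_conflict_graph(trace):
    """Count how often each pair of sub-pages is accessed in the same cycle.

    trace: list of cycles; each cycle is a list of integer sub-page ids
    (assumed format) that are requested simultaneously.
    """
    weights = defaultdict(int)
    for cycle in trace:
        for a, b in combinations(sorted(set(cycle)), 2):
            weights[(a, b)] += 1
    return weights


def assign_banks(subpages, weights, num_banks):
    """Greedy heuristic (an assumption, not the published method):
    place the most conflict-prone sub-pages first, each into the bank
    where it conflicts least with the sub-pages already placed."""
    degree = defaultdict(int)
    for (a, b), w in weights.items():
        degree[a] += w
        degree[b] += w
    order = sorted(subpages, key=lambda p: (-degree[p], p))
    bank_of = {}
    for page in order:
        cost = [0] * num_banks
        for other, bank in bank_of.items():
            key = (min(page, other), max(page, other))
            cost[bank] += weights.get(key, 0)
        bank_of[page] = cost.index(min(cost))
    return bank_of


def count_conflicts(trace, bank_of):
    """Number of simultaneous accesses that collide on the same bank."""
    conflicts = 0
    for cycle in trace:
        banks = [bank_of[p] for p in set(cycle)]
        conflicts += len(banks) - len(set(banks))
    return conflicts


if __name__ == "__main__":
    trace = [[0, 1], [0, 1], [2, 3], [0, 2]]
    naive = {p: p % 2 for p in range(4)}  # plain interleaving across 2 banks
    shuffled = assign_banks(range(4), build_conflict_graph(trace), num_banks=2)
    print(count_conflicts(trace, naive), count_conflicts(trace, shuffled))
```

On this toy trace, plain interleaving leaves one bank conflict while the greedy relocation removes it. A real shuffler would work from profiled traces and would be constrained by the hardware cost of the address translation.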