On the Applicability of PEBS based Online Memory Access Tracking for Heterogeneous Memory Management at Scale

Author(s):  
Aleix Roca Nonell ◽  
Balazs Gerofi ◽  
Leonardo Bautista-Gomez ◽  
Dominique Martinet ◽  
Vicenç Beltran Querol ◽  
...  
Author(s):  
Eduardo H. M. Cruz ◽  
Matthias Diener ◽  
Laércio L. Pilla ◽  
Philippe O. A. Navaux

Current and future architectures rely on thread-level parallelism to sustain performance growth. These architectures have introduced a complex memory hierarchy, consisting of several cores organized hierarchically with multiple cache levels and NUMA nodes. Such memory hierarchies affect the performance and energy efficiency of parallel applications because they increase the importance of memory access locality. To improve locality, analyzing the memory access behavior of parallel applications is critical for mapping threads and data. Nevertheless, most previous work relies on indirect information about the memory accesses, or does not combine thread and data mapping, resulting in less accurate mappings. In this paper, we propose the Sharing-Aware Memory Management Unit (SAMMU), an extension to the memory management unit that allows it to detect the memory access behavior in hardware. With this information, the operating system can perform online mapping without any previous knowledge about the behavior of the application. In the evaluation with a wide range of parallel applications (NAS Parallel Benchmarks and PARSEC Benchmark Suite), performance was improved by up to 35.7% (10.0% on average) and energy efficiency was improved by up to 11.9% (4.1% on average). These improvements stem from a substantial reduction of cache misses and interconnection traffic.
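The idea of data mapping from observed accesses can be illustrated with a small software sketch. It is not the SAMMU hardware mechanism, which the abstract does not detail. It assumes a hypothetical trace of `(thread, page)` accesses and a hypothetical `thread_node` table giving each thread's NUMA node, and it places each page on the node that accesses it most.

```python
from collections import Counter, defaultdict


def map_pages_to_nodes(trace, thread_node):
    """Pick a NUMA node for each page from an access trace.

    trace       -- iterable of (thread_id, page_id) accesses (hypothetical input)
    thread_node -- dict mapping thread_id to the NUMA node it runs on (assumed known)
    Returns a dict page_id -> node with the most accesses to that page.
    """
    per_page = defaultdict(Counter)
    for thread, page in trace:
        per_page[page][thread_node[thread]] += 1

    placement = {}
    for page, counts in per_page.items():
        # Most accesses wins; ties go to the lowest node id for determinism.
        placement[page] = min(counts, key=lambda node: (-counts[node], node))
    return placement


if __name__ == "__main__":
    nodes = {0: 0, 1: 0, 2: 1, 3: 1}
    accesses = [(0, "A"), (1, "A"), (2, "A"), (2, "B"), (3, "B"), (0, "B")]
    print(map_pages_to_nodes(accesses, nodes))
```

A real online mapper would also have to decide when to migrate pages and threads, which this sketch leaves out.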


2021 ◽  
Vol 40 (2) ◽  
pp. 1-17
Author(s):  
Milan Jaroš ◽  
Lubomír Říha ◽  
Petr Strakoš ◽  
Matěj Špeťko

This article presents a solution to path tracing of massive scenes on multiple GPUs. Our approach analyzes the memory access pattern of a path tracer and defines how the scene data should be distributed across up to 16 GPUs with minimal effect on performance. The key concept is that the parts of the scene with the highest number of memory accesses are replicated on all GPUs. We propose two methods for maximizing the performance of path tracing when working with partially distributed scene data. Both methods work on the memory management level, so path tracer data structures do not have to be redesigned, making our approach applicable to other path tracers with only minor changes in their code. As a proof of concept, we have enhanced the open-source Blender Cycles path tracer. The approach was validated on scenes of sizes up to 169 GB. We show that only 1–5% of the scene data needs to be replicated to all machines for such large scenes. On smaller scenes we have verified that the performance is very close to rendering a fully replicated scene. In terms of scalability we have achieved a parallel efficiency of over 94% using up to 16 GPUs.
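The key concept, replicating the most frequently accessed parts of the scene on every GPU and distributing the rest, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the chunk granularity, the `replicate_fraction` budget, and the round-robin assignment of the remaining chunks are all hypothetical choices.

```python
def plan_placement(access_counts, chunk_sizes, num_gpus, replicate_fraction=0.05):
    """Split scene chunks into a replicated hot set and a distributed rest.

    access_counts      -- accesses observed per chunk (hypothetical profiling data)
    chunk_sizes        -- size of each chunk, in the same order
    num_gpus           -- number of GPUs sharing the scene
    replicate_fraction -- assumed share of total scene size that may be replicated
    Returns (replicated, owner): a set of chunk indices copied to every GPU and
    a dict mapping each remaining chunk index to the single GPU that holds it.
    """
    budget = replicate_fraction * sum(chunk_sizes)

    # Visit chunks from hottest to coldest and replicate until the budget is full.
    hottest_first = sorted(range(len(access_counts)),
                           key=lambda i: access_counts[i], reverse=True)
    replicated, used = set(), 0
    for i in hottest_first:
        if used + chunk_sizes[i] > budget:
            break
        replicated.add(i)
        used += chunk_sizes[i]

    # Spread the remaining chunks round-robin (an assumed, simple policy).
    owner, next_gpu = {}, 0
    for i in range(len(chunk_sizes)):
        if i not in replicated:
            owner[i] = next_gpu % num_gpus
            next_gpu += 1
    return replicated, owner


if __name__ == "__main__":
    print(plan_placement([100, 5, 900, 1, 50], [10] * 5,
                         num_gpus=2, replicate_fraction=0.4))
```

With a small replication budget, most memory per GPU stays free for its own share of the scene while the hottest data never needs a remote access.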


2021 ◽  
Author(s):  
Zhen Yu

With the development of modern computers, memory latencies have become a key bottleneck for the performance of computer systems. Since then, much research has targeted improving the performance of the memory hierarchy. In this thesis, we examine the behavior of dynamically allocated data structures (DADS) and programs with irregular access patterns (PIAP). DADS and PIAP use dynamic memory management or algorithms with unpredictable behaviour. By simulating some applications that use DADS and PIAP, we found that general cache management policies cannot effectively use the valuable cache resources for DADS and PIAP. We explored the use of mathematical formulas applied in signal processing to improve the performance of the memory hierarchy.


2011 ◽  
Vol 374-377 ◽  
pp. 2078-2081
Author(s):  
Guo Fu Feng ◽  
Ming Wang ◽  
Ming Chen ◽  
Tao Chi

Heterogeneous multi-core processors are attractive for power-efficient green computing because of their ability to meet varied resource requirements. The multi-level memory hierarchy of the Cell Broadband Engine Architecture (CBEA), which requires explicit management by software, poses significant challenges to both performance and programming. In this paper, based on an analysis of the architecture's characteristics, we implemented four access methods and a corresponding access library with a uniform memory access interface. Besides improving performance over current techniques, the memory access library can collect profile information about memory management for further performance optimization. Experimental results show that the proposed method performs better than related work and that the profile information it provides helps programmers optimize application performance.


2013 ◽  
Vol 41 (3) ◽  
pp. 380-391 ◽  
Author(s):  
Young Hoon Son ◽  
O. Seongil ◽  
Yuhwan Ro ◽  
Jae W. Lee ◽  
Jung Ho Ahn