On the Applicability of PEBS based Online Memory Access Tracking for Heterogeneous Memory Management at Scale

Current and future architectures rely on thread-level parallelism to sustain performance growth. These architectures have introduced complex memory hierarchies, consisting of several cores organized hierarchically with multiple cache levels and NUMA nodes. Such hierarchies affect the performance and energy efficiency of parallel applications because memory access locality becomes increasingly important. To improve locality, analyzing the memory access behavior of parallel applications is critical for mapping threads and data. However, most previous work relies on indirect information about memory accesses or does not combine thread and data mapping, which results in less accurate mappings. In this paper, we propose the Sharing-Aware Memory Management Unit (SAMMU), an extension to the memory management unit that detects memory access behavior in hardware. With this information, the operating system can perform online mapping without any prior knowledge of the application's behavior. In an evaluation with a wide range of parallel applications (NAS Parallel Benchmarks and PARSEC Benchmark Suite), performance improved by up to 35.7% (10.0% on average) and energy efficiency improved by up to 11.9% (4.1% on average). These improvements result from a substantial reduction in cache misses and interconnection traffic.
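The abstract does not spell out how the operating system turns sharing information into thread and data placements, so the following is only a minimal software sketch of the general idea, not SAMMU's mechanism (which detects accesses in hardware). It assumes access information arrives as hypothetical `(thread_id, page_id)` samples, counts the pages each thread pair shares, greedily pairs the threads that share the most (so each pair could run on cores sharing a cache), and homes each page on the NUMA node whose threads touched it most. The function names and the greedy pairing rule are illustrative assumptions.

```python
from collections import defaultdict
from itertools import combinations


def sharing_matrix(samples):
    """Count, for every thread pair, how many pages both threads touched.

    `samples` is an iterable of (thread_id, page_id) pairs, e.g. sampled
    memory accesses (hypothetical input format).
    """
    threads_per_page = defaultdict(set)
    for tid, page in samples:
        threads_per_page[page].add(tid)
    sharing = defaultdict(int)
    for threads in threads_per_page.values():
        for a, b in combinations(sorted(threads), 2):
            sharing[(a, b)] += 1
    return dict(sharing)


def pair_threads(sharing, threads):
    """Greedily pair still-unpaired threads that share the most pages.

    Each resulting pair would be placed on cores that share a cache.
    """
    unpaired = set(threads)
    pairs = []
    for (a, b), _count in sorted(sharing.items(), key=lambda kv: -kv[1]):
        if a in unpaired and b in unpaired:
            pairs.append((a, b))
            unpaired -= {a, b}
    return pairs, sorted(unpaired)


def home_nodes(samples, thread_node):
    """Assign each page to the NUMA node whose threads accessed it most."""
    counts = defaultdict(lambda: defaultdict(int))
    for tid, page in samples:
        counts[page][thread_node[tid]] += 1
    return {page: max(nodes, key=nodes.get) for page, nodes in counts.items()}


samples = [(0, "A"), (1, "A"), (0, "B"), (1, "B"), (2, "C"), (3, "C"), (0, "C")]
sharing = sharing_matrix(samples)
pairs, leftover = pair_threads(sharing, [0, 1, 2, 3])
print(pairs, leftover)  # [(0, 1), (2, 3)] []
print(home_nodes(samples, {0: 0, 1: 0, 2: 1, 3: 1}))  # {'A': 0, 'B': 0, 'C': 1}
```

Greedy pairing is cheap enough to rerun periodically, which suits an online setting; a real system would also weigh migration cost before moving threads or pages.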


GPU Accelerated Path Tracing of Massive Scenes

ACM Transactions on Graphics, Vol 40 (2), pp. 1-17, 2021. DOI: 10.1145/3447807

Author(s): Milan Jaroš, Lubomír Říha, Petr Strakoš, Matěj Špeťko

Keyword(s): Data Structures, Memory Management, Memory Access, Minimal Effect, Proof of Concept, Access Pattern, Multiple GPUs, Management Level, Path Tracing, Memory Accesses

This article presents a solution for path tracing massive scenes on multiple GPUs. Our approach analyzes the memory access pattern of a path tracer and defines how the scene data should be distributed across up to 16 GPUs with minimal effect on performance. The key concept is that the parts of the scene with the highest number of memory accesses are replicated on all GPUs. We propose two methods for maximizing path tracing performance when working with partially distributed scene data. Both methods operate at the memory management level, so the path tracer's data structures do not have to be redesigned; this makes our approach applicable to other path tracers with only minor changes to their code. As a proof of concept, we have enhanced the open-source Blender Cycles path tracer. The approach was validated on scenes of up to 169 GB. We show that for such large scenes only 1–5% of the scene data needs to be replicated on all GPUs. On smaller scenes, we have verified that performance is very close to that of rendering a fully replicated scene. In terms of scalability, we achieved a parallel efficiency of over 94% using up to 16 GPUs.
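The abstract states the principle (replicate the most-accessed scene data, distribute the rest) but not the selection procedure, and the paper's two methods work at the memory management level rather than through an explicit planner. The sketch below is therefore only an assumed illustration of the replication idea: it ranks hypothetical scene chunks by accesses per byte, replicates them until a byte budget is used (a 5% budget is picked to echo the 1–5% range reported above), and places every other chunk on the least-loaded GPU, largest first. Chunk names, sizes, and access counts are made up.

```python
def plan_distribution(chunks, num_gpus, replicate_fraction):
    """Split scene chunks into a replicated set and a per-GPU distributed set.

    chunks: dict of name -> (size_bytes, access_count); the access counts
    would come from a profiling pass (hypothetical input).
    """
    total = sum(size for size, _ in chunks.values())
    budget = replicate_fraction * total

    # Hottest data first: accesses per byte, so small, heavily used chunks win.
    by_density = sorted(chunks, key=lambda n: chunks[n][1] / chunks[n][0], reverse=True)
    replicated, used = [], 0
    for name in by_density:
        size = chunks[name][0]
        if used + size <= budget:
            replicated.append(name)
            used += size

    # Everything else lives on exactly one GPU: largest chunk to least-loaded GPU.
    load = [0] * num_gpus
    owner = {}
    rest = sorted((n for n in chunks if n not in replicated),
                  key=lambda n: chunks[n][0], reverse=True)
    for name in rest:
        gpu = load.index(min(load))
        owner[name] = gpu
        load[gpu] += chunks[name][0]
    return replicated, owner, load


scene = {
    "bvh": (10, 900),        # small but touched by almost every ray
    "textures": (400, 100),
    "mesh_a": (300, 50),
    "mesh_b": (290, 40),
}
replicated, owner, load = plan_distribution(scene, num_gpus=2, replicate_fraction=0.05)
print(replicated)  # ['bvh']
print(owner)       # {'textures': 0, 'mesh_a': 1, 'mesh_b': 1}
print(load)        # [400, 590]
```

Ranking by access density rather than raw access count keeps the replication budget small, which is the property the abstract emphasizes; any remote accesses to distributed chunks would still cost interconnect bandwidth.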


Memory Management Support for Multi-Programmed Remote Direct Memory Access (RDMA) Systems

2005 IEEE International Conference on Cluster Computing, 2005. DOI: 10.1109/clustr.2005.347031

Cited By ~ 2

Author(s): Kostas Magoutis

Keyword(s): Memory Management, Direct Memory Access, Memory Access, Management Support


Memory access behavior of dynamically allocated data structures and programs with irregular access patterns

2021. DOI: 10.32920/ryerson.14648538.v1

Author(s): Zhen Yu

Keyword(s): Data Structures, Memory Management, Memory Hierarchy, Research Work, Memory Access, Mathematical Formula, Dynamic Memory, Dynamic Memory Management, Management Policies, Access Patterns

With the development of modern computers, memory latency has become a key bottleneck for the performance of computer systems. Consequently, much research has targeted improving the performance of the memory hierarchy. In this thesis, we examine the behaviour of dynamically allocated data structures (DADS) and programs with irregular access patterns (PIAP). DADS and PIAP rely on dynamic memory management or on algorithms with unpredictable behaviour. Simulations of several DADS and PIAP applications show that general-purpose cache management policies cannot make effective use of valuable cache resources for these workloads. We explored applying mathematical formulas from signal processing to improve the performance of the memory hierarchy.
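The abstract does not name the cache management policies that were simulated. As a generic illustration only, assuming LRU as a representative general-purpose policy, the sketch below replays a cyclic trace slightly larger than the cache, the kind of pattern a repeated walk over a linked list larger than the cache produces. LRU then evicts exactly the block needed next and scores zero hits, while Belady's optimal (offline) policy on the same trace and capacity does get hits, so the capacity itself is not the problem. Both simulators are simplified, fully associative models, not the thesis's simulator.

```python
from collections import OrderedDict


def lru_hits(trace, capacity):
    """Count hits for a fully associative LRU cache holding `capacity` blocks."""
    cache = OrderedDict()
    hits = 0
    for block in trace:
        if block in cache:
            hits += 1
            cache.move_to_end(block)  # mark as most recently used
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the least recently used block
            cache[block] = None
    return hits


def opt_hits(trace, capacity):
    """Count hits for Belady's optimal policy (evict the block reused farthest ahead)."""
    cache = set()
    hits = 0
    for i, block in enumerate(trace):
        if block in cache:
            hits += 1
            continue
        if len(cache) >= capacity:
            def next_use(b):
                try:
                    return trace.index(b, i + 1)
                except ValueError:
                    return float("inf")  # never used again
            cache.remove(max(cache, key=next_use))
        cache.add(block)
    return hits


# A loop over 5 blocks with room for only 4: LRU always evicts the block needed next.
trace = list(range(5)) * 10
print(lru_hits(trace, 4))      # 0
print(opt_hits(trace, 4) > 0)  # True
```

The optimal policy needs future knowledge and is only a yardstick; the gap between the two numbers is the headroom a smarter policy could try to recover.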


Memory access behavior of dynamically allocated data structures and programs with irregular access patterns

2021. DOI: 10.32920/ryerson.14648538

Author(s): Zhen Yu


Study on Explicit Memory Management for CBEA Green Computing Architecture

Advanced Materials Research, Vol 374-377, pp. 2078-2081, 2011. DOI: 10.4028/www.scientific.net/amr.374-377.2078

Author(s): Guo Fu Feng, Ming Wang, Ming Chen, Tao Chi

Keyword(s): Performance Optimization, Memory Management, Green Computing, Memory Access, Application Performance, Access Methods, Power Efficient, Profile Information, Resource Requirements, Better Than

Heterogeneous multi-core processors are attractive for power-efficient green computing because they can meet varied resource requirements. The multi-level memory hierarchy of the Cell Broadband Engine Architecture (CBEA), which must be managed explicitly by software, poses significant challenges for both performance and programming. In this paper, based on an analysis of the architecture's characteristics, we implemented four access methods and a corresponding access library with a uniform memory access interface. Besides delivering performance improvements over current techniques, the access library with its uniform interface can collect memory management profile information for further performance optimization. Experimental results show that the proposed method outperforms related work and that the profile information it provides helps programmers optimize application performance.
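The abstract mentions four access methods behind one uniform interface, plus profile collection, but does not describe the methods. The sketch below is a hypothetical, simplified illustration of that design, not the paper's library: a Python list stands in for main memory, each `_transfer()` call stands in for one DMA into local store, and two made-up methods ("direct", one transfer per element, and "cached", a line-based software cache) sit behind a single `read()`. The counters show the kind of profile data such a layer could report to guide a programmer's choice of method.

```python
class UniformAccess:
    """One read() interface over interchangeable access methods, with profiling.

    Hypothetical sketch: `main_memory` is a list standing in for off-chip
    memory, and each _transfer() call stands in for one DMA transfer.
    """

    METHODS = ("direct", "cached")

    def __init__(self, main_memory, method="direct", line_size=4):
        if method not in self.METHODS:
            raise ValueError(f"unknown access method: {method}")
        self.mem = main_memory
        self.method = method
        self.line_size = line_size
        self.local = {}  # software-cache lines kept in "local store": line -> values
        self.profile = {"reads": 0, "transfers": 0, "elements_moved": 0}

    def _transfer(self, start, count):
        self.profile["transfers"] += 1
        chunk = self.mem[start:start + count]
        self.profile["elements_moved"] += len(chunk)
        return chunk

    def read(self, addr):
        self.profile["reads"] += 1
        if self.method == "direct":
            return self._transfer(addr, 1)[0]  # one transfer per element
        line = addr // self.line_size  # "cached": fetch whole lines and reuse them
        if line not in self.local:
            self.local[line] = self._transfer(line * self.line_size, self.line_size)
        return self.local[line][addr % self.line_size]


memory = list(range(100))
profiles = {}
for method in UniformAccess.METHODS:
    acc = UniformAccess(memory, method=method)
    values = [acc.read(a) for a in range(16)]  # sequential scan of 16 elements
    profiles[method] = acc.profile
    print(method, values == list(range(16)), acc.profile)
# direct True {'reads': 16, 'transfers': 16, 'elements_moved': 16}
# cached True {'reads': 16, 'transfers': 4, 'elements_moved': 16}
```

Because callers only ever use `read()`, switching methods after inspecting the profile requires no change to application code, which is the benefit a uniform interface is meant to provide.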
