On the Detectability of Control Flow Using Memory Access Patterns

Author(s):  
Robert Buhren ◽  
Felicitas Hetzelt ◽  
Niklas Pirnay
2021 ◽  
pp. 054-062
Author(s):  
D.V. Rahozin ◽  
A.Yu. Doroshenko

Modern workloads, whether parallel or sequential, are often limited by memory and compute performance. A common way to improve workload performance is to use complex functional units or coprocessors, which not only accelerate computation but also fetch data from memory independently, generating complex address patterns with or without support for control flow operations. Optimizing compilers usually do not target such coprocessors, so they must be driven by hand through special application interfaces. Memory bottlenecks, on the other hand, can be avoided by proper use of processor prefetch capabilities, which load data ahead of the time it is actually used; compilers apply prefetching only in simple cases, so programmers usually have to insert it by hand as well. As workloads migrate rapidly to embedded applications, the problem arises of how to exploit all hardware capabilities to speed up a workload with moderate effort. This requires precise analysis of memory access patterns at program run time and identification of the hot spots where most memory accesses are issued. A precise memory access model can be obtained with simulators such as Valgrind, which can run very large workloads, for example neural network inference, in reasonable time. However, simulators and hardware performance analyzers fail to attribute the full number of memory references and cache misses to particular modules, because this requires analysis of the program call graph. We extend the Valgrind cache simulator so that it accounts for memory accesses per software module and renders a realistic distribution of hot spots in a program. In addition, analysis of address sequences in the simulator makes it possible to recover array access patterns and to propose effective prefetching schemes. Motivating examples are provided to illustrate the use of the Valgrind tool.
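
The idea of recovering array access patterns from an address sequence can be illustrated with a minimal sketch. It is not the authors' Valgrind extension: the function names `dominant_stride` and `prefetch_candidates`, the `min_share` threshold, and the `distance` parameter are all hypothetical, and the trace is synthetic. The sketch only assumes that a regular array walk shows up as a constant delta between consecutive addresses.

```python
from collections import Counter


def dominant_stride(addresses, min_share=0.9):
    """Return the most common delta between consecutive addresses.

    Returns None when the trace is too short, the dominant delta is zero,
    or it covers less than `min_share` of all deltas (assumed threshold).
    """
    if len(addresses) < 3:
        return None
    deltas = [b - a for a, b in zip(addresses, addresses[1:])]
    stride, count = Counter(deltas).most_common(1)[0]
    if stride == 0 or count / len(deltas) < min_share:
        return None
    return stride


def prefetch_candidates(addresses, stride, distance=4):
    """Addresses a prefetch issued `distance` accesses ahead would target."""
    return [a + stride * distance for a in addresses]


# Synthetic trace: 16 accesses walking an array in 64-byte steps.
trace = [0x1000 + 64 * i for i in range(16)]
stride = dominant_stride(trace)
if stride is not None:
    print("stride:", stride)
    print("first prefetch target:", hex(prefetch_candidates(trace, stride)[0]))
```

A real tool would have to separate interleaved access streams, for example per instruction address, before looking for strides; this sketch treats the whole trace as a single stream.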


2012 ◽  
Vol 21 (02) ◽  
pp. 1240006 ◽  
Author(s):  
RAGAVENDRA NATARAJAN ◽  
VINEETH MEKKAT ◽  
WEI-CHUNG HSU ◽  
ANTONIA ZHAI

For today's increasingly power-constrained multicore systems, integrating simpler and more energy-efficient in-order cores becomes attractive. However, since in-order processors lack complex hardware support for tolerating long-latency memory accesses, developing compiler technologies to hide such latencies becomes critical. Compiler-directed prefetching has been shown to be effective on some applications. On the application side, a large class of data-centric applications has emerged to explore the underlying properties of explosively growing data. These applications, in contrast to traditional benchmarks, are characterized by substantial thread-level parallelism, complex and unpredictable control flow, and intensive, irregular memory access patterns. They are expected to be the dominant workloads on future microprocessors. In this paper, we therefore investigate the effectiveness of compiler-directed prefetching on data mining applications in in-order multicore systems. Our study reveals that although properly inserted prefetch instructions can often reduce memory access latencies for data mining applications, the compiler is not always able to exploit this potential. Compiler-directed prefetching can become inefficient in the presence of complex control flow, complex memory access patterns, and architecture-dependent behaviors. The integration of multithreaded execution onto a single die makes it even more difficult for the compiler to insert prefetch instructions, since optimizations that are effective for single-threaded execution may or may not be effective in multithreaded execution. Thus, compiler-directed prefetching must be deployed judiciously to avoid creating performance bottlenecks that would otherwise not exist. Our experience suggests that dynamic performance tuning techniques that adjust to the behavior of a program can potentially facilitate the deployment of aggressive optimizations in data mining applications.


2018 ◽  
Vol 78 ◽  
pp. 1-14 ◽  
Author(s):  
Harald Servat ◽  
Jesús Labarta ◽  
Hans-Christian Hoppe ◽  
Judit Giménez ◽  
Antonio J. Peña

2019 ◽  
Vol 16 (3) ◽  
pp. 1-24
Author(s):  
Bingchao Li ◽  
Jizeng Wei ◽  
Jizhou Sun ◽  
Murali Annavaram ◽  
Nam Sung Kim

2020 ◽  
Vol 16 (4) ◽  
pp. 1-27 ◽  
Author(s):  
Leeor Peled ◽  
Uri Weiser ◽  
Yoav Etsion

2014 ◽  
Vol 80 ◽  
pp. 440-456
Author(s):  
Alain Ketterlin ◽  
Philippe Clauss
