cache line
Recently Published Documents


Total documents: 70 (five years: 17)
H-index: 9 (five years: 1)

2022, Vol 15 (2), pp. 1-33
Author(s): Mikhail Asiatici, Paolo Ienne

Applications such as large-scale sparse linear algebra and graph analytics are challenging to accelerate on FPGAs because their short, irregular memory accesses result in low cache hit rates. Nonblocking caches reduce the bandwidth required by misses by requesting each cache line only once, even when multiple misses correspond to it. However, such a reuse mechanism is traditionally implemented using an associative lookup, which limits the number of misses considered for reuse to a few tens at most. In this article, we present an efficient pipeline that can process and store thousands of outstanding misses in cuckoo hash tables in on-chip SRAM with minimal stalls. This brings the same bandwidth advantage as a larger cache for a fraction of the area budget, because outstanding misses do not need a data array, and it can significantly speed up irregular, memory-bound, latency-insensitive applications. In addition, we extend nonblocking caches to generate variable-length bursts to memory, which increases the bandwidth delivered by DRAMs and their controllers. The resulting miss-optimized memory system provides up to 25% speedup with 24× area reduction on 15 large sparse matrix-vector multiplication benchmarks evaluated on an embedded and a datacenter FPGA system.
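The miss-reuse idea described in this abstract can be illustrated with a minimal software sketch. It is not the authors' pipeline: a Python dict stands in for the cuckoo hash tables, and the names `MissTable`, `miss`, `fill`, and the 64-byte `LINE_BYTES` are assumptions made for illustration only.

```python
LINE_BYTES = 64  # assumed cache line size (not stated in the abstract)


class MissTable:
    """Toy table of outstanding misses, keyed by cache line address."""

    def __init__(self):
        self.pending = {}         # line address -> ids of requests waiting on it
        self.memory_requests = 0  # number of line requests actually sent to memory

    def miss(self, req_id, addr):
        """Record a miss; return True only if a new memory request is issued."""
        line = addr // LINE_BYTES
        if line in self.pending:
            # Secondary miss: the line is already in flight, so just queue up.
            self.pending[line].append(req_id)
            return False
        # Primary miss: first request for this line, so go to memory once.
        self.pending[line] = [req_id]
        self.memory_requests += 1
        return True

    def fill(self, line):
        """Memory returned `line`; hand back every request that waited on it."""
        return self.pending.pop(line, [])


table = MissTable()
table.miss(0, 0)    # primary miss on line 0
table.miss(1, 8)    # same line, merged with the request already in flight
table.miss(2, 64)   # different line, second memory request
```

Because each entry holds only an address and a list of waiters, not the line's data, tracking many outstanding misses is cheap relative to enlarging the cache, which is the trade-off the abstract points to.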


2022, Vol 19 (1), pp. 1-25
Author(s): Muhammad Aditya Sasongko, Milind Chabbi, Mandana Bagheri Marzijarani, Didem Unat

One widely used metric that measures data locality is reuse distance: the number of unique memory locations that are accessed between two consecutive accesses to a particular memory location. State-of-the-art techniques that measure reuse distance in parallel applications rely on simulators or binary instrumentation tools that incur large performance and memory overheads. Moreover, the existing sampling-based tools are limited to measuring reuse distances of a single thread and discard interactions among threads in multi-threaded programs. In this work, we propose ReuseTracker, a fast and accurate reuse distance analyzer that leverages existing hardware features in commodity CPUs. ReuseTracker is designed for multi-threaded programs and takes cache-coherence effects into account. By utilizing hardware features like performance monitoring units and debug registers, ReuseTracker can accurately profile reuse distance in parallel applications with much lower overheads than existing tools. It introduces only 2.9× runtime and 2.8× memory overheads. Our tool achieves 92% accuracy when verified against a newly developed configurable benchmark that can generate a variety of different reuse distance patterns. We demonstrate the tool’s functionality with two use-case scenarios using the PARSEC, Rodinia, and Synchrobench benchmark suites. In the first, ReuseTracker guides code refactoring in these benchmarks by detecting spatial reuses in shared caches that are also instances of false sharing. In the second, it successfully predicts whether some benchmarks in these suites can benefit from adjacent cache line prefetch optimization.
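The definition of reuse distance given above can be made concrete with a small exact computation over an address trace. This is a naive sketch of the metric only, not of ReuseTracker, which samples hardware events rather than inspecting a full trace; the function name `reuse_distances` is hypothetical.

```python
def reuse_distances(trace):
    """Return the reuse distance of each access in `trace`.

    The distance is the number of unique locations touched between this
    access and the previous access to the same location; first-time
    accesses have no previous access and are reported as None.
    """
    last_seen = {}   # location -> index of its most recent access
    distances = []
    for i, addr in enumerate(trace):
        if addr in last_seen:
            between = trace[last_seen[addr] + 1 : i]
            distances.append(len(set(between)))
        else:
            distances.append(None)
        last_seen[addr] = i
    return distances


print(reuse_distances(["a", "b", "c", "b", "a"]))
# prints [None, None, None, 1, 2]
```

This version rescans the trace for every reuse, so it is quadratic in the worst case; it is meant to pin down the definition rather than to be used on real traces.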


2022, Vol 355, pp. 03054
Author(s): Dehua Wu, Wan’ang Xiao, Shan Gao, Wanlin Gao

Spectre attacks exploit speculative execution vulnerabilities to exfiltrate private information by building a leakage channel. Creating a leakage channel is the basic element of Spectre attacks, and the cache-tag side channel is considered the most serious such channel. To block the leakage channels, this paper presents DmCache, a novel cache that applies dynamic mapping technology. DmCache uses a dynamic mapping mechanism to temporarily store all the cache lines polluted by speculative execution and keeps them invisible to accesses. It then monitors the head of the reorder buffer to determine which polluted cache lines can become visible. Based on an analysis of the circuit behaviour of a processor equipped with DmCache while under a Spectre attack, we demonstrate that Spectre attacks have no impact on such a system.


2021, pp. 2150010
Author(s): Shane Carroll, Wei-Ming Lin

In a CPU cache utilizing least recently used (LRU) replacement, each cache set manages a buffer which orders all cache lines in the set from LRU to most recently used (MRU). When a cache line is brought into the cache, it is placed at the MRU position and the LRU line is evicted. When re-accessed, a line is promoted to the MRU position. LRU replacement provides a simple heuristic to predict the optimal cache line to evict. However, LRU utilizes only simple, short-term access patterns. In this paper, we propose a method that uses a buffer called the history queue to record longer-term access-eviction patterns than the LRU buffer can capture. Using this information, we make a simple modification to the LRU insertion policy such that recently recalled blocks have priority over others. As lines are evicted, their addresses are recorded in a FIFO history queue. Incoming lines that have recently been evicted and are now recalled (those in the history queue at recall time) remain at the MRU position for an extended period of time, because non-recalled lines entering the cache thereafter are placed below the MRU position. We show that the proposed LRU insertion prioritization increases performance in single-threaded and multi-threaded workloads in simulations with simple adjustments to baseline LRU.
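The insertion policy described in this abstract might be sketched for a single cache set as follows. The abstract does not give the exact rules, so the class name `HistoryLRUSet`, the history length, and the choice to insert every non-recalled line one position below the MRU are all simplifying assumptions rather than the authors' design.

```python
from collections import deque


class HistoryLRUSet:
    """One cache set with LRU replacement and a FIFO history of evictions."""

    def __init__(self, ways, history_len):
        self.ways = ways
        self.lines = []                           # index 0 = MRU, last = LRU
        self.history = deque(maxlen=history_len)  # recently evicted addresses

    def access(self, addr):
        """Access `addr`; return True on a hit and False on a miss."""
        if addr in self.lines:
            # Hit: promote to the MRU position, as in baseline LRU.
            self.lines.remove(addr)
            self.lines.insert(0, addr)
            return True
        if len(self.lines) == self.ways:
            # Set is full: evict the LRU line and remember its address.
            self.history.append(self.lines.pop())
        if addr in self.history:
            # Recalled line: it was evicted recently, so insert at the MRU.
            self.history.remove(addr)
            self.lines.insert(0, addr)
        else:
            # Non-recalled line (assumption): insert just below the MRU.
            self.lines.insert(min(1, len(self.lines)), addr)
        return False


s = HistoryLRUSet(ways=2, history_len=4)
for a in ["A", "B", "C", "B", "D"]:
    s.access(a)
print(s.lines)  # prints ['B', 'D']
```

In the trace above, B is evicted, recalled, and then keeps the MRU position when the non-recalled line D arrives, which is the prioritization the abstract describes.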

