cache line Latest Research Papers

Request, Coalesce, Serve, and Forget: Miss-Optimized Memory Systems for Bandwidth-Bound Cache-Unfriendly Applications on FPGAs

ACM Transactions on Reconfigurable Technology and Systems ◽

10.1145/3466823 ◽

2022 ◽

Vol 15 (2) ◽

pp. 1-33

Author(s):

Mikhail Asiatici ◽

Paolo Ienne

Keyword(s):

Large Scale ◽

Sparse Matrix ◽

Memory Systems ◽

Graph Analytics ◽

Matrix Vector Multiplication ◽

Area Reduction ◽

Cache Line ◽

Speed Up ◽

Memory Accesses ◽

On Chip

Applications such as large-scale sparse linear algebra and graph analytics are challenging to accelerate on FPGAs because their short, irregular memory accesses result in low cache hit rates. Nonblocking caches reduce the bandwidth required by misses by requesting each cache line only once, even when multiple misses correspond to it. However, this reuse mechanism is traditionally implemented using an associative lookup, which limits the number of misses considered for reuse to a few tens at most. In this article, we present an efficient pipeline that can process and store thousands of outstanding misses in cuckoo hash tables in on-chip SRAM with minimal stalls. This brings the same bandwidth advantage as a larger cache for a fraction of the area budget, because outstanding misses do not need a data array, and it can significantly speed up irregular, memory-bound, latency-insensitive applications. In addition, we extend nonblocking caches to generate variable-length bursts to memory, which increases the bandwidth delivered by DRAMs and their controllers. The resulting miss-optimized memory system provides up to 25% speedup with 24× area reduction on 15 large sparse matrix-vector multiplication benchmarks evaluated on an embedded and a datacenter FPGA system.
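The miss-coalescing idea in this abstract (request each cache line once, however many misses map to it) can be illustrated with a small sketch. This is not the authors' pipeline: a plain Python dict stands in for the cuckoo hash tables, the 64-byte line size is an assumption, and the names `MissTable`, `miss`, and `fill` are hypothetical. Stalls, pipelining, and variable-length bursts are not modeled.

```python
LINE_BYTES = 64  # assumed cache-line size, not taken from the paper


class MissTable:
    """Toy table of outstanding misses, keyed by cache-line address."""

    def __init__(self):
        self.pending = {}          # line address -> ids of requests waiting on it
        self.memory_requests = []  # line addresses actually sent to memory

    def miss(self, request_id, byte_addr):
        """Record a miss; go to memory only for the first miss on a line."""
        line = byte_addr // LINE_BYTES
        if line not in self.pending:
            self.pending[line] = []
            self.memory_requests.append(line)
        self.pending[line].append(request_id)

    def fill(self, line):
        """Memory returned `line`: release every request waiting on it."""
        return self.pending.pop(line, [])


table = MissTable()
for request_id, addr in enumerate([0, 8, 64, 40]):
    table.miss(request_id, addr)
print(table.memory_requests)  # two memory requests serve four misses
```

Because an entry holds only request identifiers and no line data, tracking many more outstanding misses costs far less storage than enlarging the cache, which is the trade-off the abstract describes.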

ReuseTracker: Fast Yet Accurate Multicore Reuse Distance Analyzer

ACM Transactions on Architecture and Code Optimization ◽

10.1145/3484199 ◽

2022 ◽

Vol 19 (1) ◽

pp. 1-25

Author(s):

Muhammad Aditya Sasongko ◽

Milind Chabbi ◽

Mandana Bagheri Marzijarani ◽

Didem Unat

Keyword(s):

Performance Monitoring ◽

State Of The Art ◽

Data Locality ◽

Parallel Applications ◽

Use Case ◽

Memory Location ◽

Reuse Distance ◽

Shared Caches ◽

Code Refactoring ◽

Cache Line

One widely used metric that measures data locality is reuse distance: the number of unique memory locations that are accessed between two consecutive accesses to a particular memory location. State-of-the-art techniques that measure reuse distance in parallel applications rely on simulators or binary instrumentation tools that incur large performance and memory overheads. Moreover, the existing sampling-based tools are limited to measuring reuse distances of a single thread and discard interactions among threads in multi-threaded programs. In this work, we propose ReuseTracker, a fast and accurate reuse distance analyzer that leverages existing hardware features in commodity CPUs. ReuseTracker is designed for multi-threaded programs and takes cache-coherence effects into account. By utilizing hardware features like performance monitoring units and debug registers, ReuseTracker can accurately profile reuse distance in parallel applications with much lower overheads than existing tools. It introduces only 2.9× runtime and 2.8× memory overheads. Our tool achieves 92% accuracy when verified against a newly developed configurable benchmark that can generate a variety of different reuse distance patterns. We demonstrate the tool’s functionality with two use-case scenarios using the PARSEC, Rodinia, and Synchrobench benchmark suites. In the first, ReuseTracker guides code refactoring in these benchmarks by detecting spatial reuses in shared caches that also constitute false sharing. In the second, it successfully predicts whether some benchmarks in these suites can benefit from adjacent cache line prefetch optimization.
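The definition of reuse distance given in this abstract can be made concrete with a short sketch. This is a naive, single-threaded, exact computation over a recorded trace, not ReuseTracker's method (which samples with hardware counters and debug registers). The function name `reuse_distances` and the example trace are hypothetical.

```python
def reuse_distances(trace):
    """Return one reuse distance per access; None marks a first access."""
    last_seen = {}  # address -> index of its most recent access
    distances = []
    for i, addr in enumerate(trace):
        if addr in last_seen:
            # Distinct addresses touched strictly between the two accesses.
            between = trace[last_seen[addr] + 1 : i]
            distances.append(len(set(between)))
        else:
            distances.append(None)
        last_seen[addr] = i
    return distances


print(reuse_distances(["a", "b", "c", "b", "a"]))
# → [None, None, None, 1, 2]
```

In the example, the second access to `b` has distance 1 (only `c` intervenes), and the second access to `a` has distance 2 (`b` and `c` intervene, with `b` counted once). This quadratic scan is only meant to pin down the metric; it is far too slow for real traces.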

A novel cache based on dynamic mapping against speculative execution attacks

MATEC Web of Conferences ◽

10.1051/matecconf/202235503054 ◽

2022 ◽

Vol 355 ◽

pp. 03054

Author(s):

Dehua Wu ◽

Wan’ang Xiao ◽

Shan Gao ◽

Wanlin Gao

Keyword(s):

Private Information ◽

Basic Element ◽

Speculative Execution ◽

Side Channel ◽

Dynamic Mapping ◽

Mapping Technology ◽

Cache Line ◽

Mapping Mechanism

Spectre attacks exploit speculative execution vulnerabilities to exfiltrate private information by building a leakage channel. Creating a leakage channel is the basic element of Spectre attacks, and the cache-tag side channel is considered the most serious of these channels. To block the leakage channels, this paper presents DmCache, a novel cache that applies dynamic mapping technology. DmCache uses a dynamic mapping mechanism to temporarily store all the cache lines polluted by speculative execution and keeps them invisible to accesses. It then monitors the head of the reorder buffer to determine which polluted cache lines can become visible. By analysing the circuit behaviour of a processor equipped with DmCache while under a Spectre attack, we demonstrate that Spectre attacks have no impact on such a system.

Virtual-Cache: A cache-line borrowing technique for efficient GPU cache architectures

Microprocessors and Microsystems ◽

10.1016/j.micpro.2021.104301 ◽

2021 ◽

pp. 104301

Author(s):

Bingchao Li ◽

Jizeng Wei ◽

Nam Sung Kim

Keyword(s):

Cache Line

Exploiting Long-Term Temporal Cache Access Patterns for LRU Insertion Prioritization

Parallel Processing Letters ◽

10.1142/s0129626421500109 ◽

2021 ◽

pp. 2150010

Author(s):

Shane Carroll ◽

Wei-Ming Lin

Keyword(s):

Extended Period ◽

Short Term ◽

Simple Modification ◽

Cache Access ◽

Simple Heuristic ◽

Recall Time ◽

Cache Line ◽

Access Patterns

In a CPU cache utilizing least recently used (LRU) replacement, each cache set maintains a buffer that orders all cache lines in the set from LRU to most recently used (MRU). When a cache line is brought into the cache, it is placed at the MRU position and the LRU line is evicted. When re-accessed, a line is promoted to the MRU position. LRU replacement provides a simple heuristic to predict the optimal cache line to evict. However, LRU utilizes only simple, short-term access patterns. In this paper, we propose a method that uses a buffer called the history queue to record longer-term access-eviction patterns than the LRU buffer can capture. Using this information, we make a simple modification to the LRU insertion policy such that recently recalled blocks have priority over others. As lines are evicted, their addresses are recorded in a FIFO history queue. Incoming lines that have recently been evicted and are now recalled (those in the history queue at recall time) remain at the MRU position for an extended period of time, because non-recalled lines entering the cache thereafter are placed below the MRU position. We show that the proposed LRU insertion prioritization increases performance in single-threaded and multi-threaded workloads in simulations with simple adjustments to baseline LRU.
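The insertion policy described in this abstract can be sketched for a single cache set as follows. The abstract does not give the exact insertion position for non-recalled lines, so this sketch makes a simplifying assumption: a non-recalled line is always inserted directly below the MRU position. The class name `HistoryLRUSet` and its parameters are hypothetical, and this is not the authors' implementation.

```python
from collections import deque


class HistoryLRUSet:
    """One cache set with LRU replacement and a FIFO history queue."""

    def __init__(self, ways, history_len):
        self.ways = ways
        self.lines = []  # index 0 is the MRU position, the last index is LRU
        self.history = deque(maxlen=history_len)  # recently evicted addresses

    def access(self, addr):
        """Return True on a hit, False on a miss."""
        if addr in self.lines:
            # Hit: promote the line to the MRU position, as in plain LRU.
            self.lines.remove(addr)
            self.lines.insert(0, addr)
            return True
        if len(self.lines) == self.ways:
            # Set is full: evict the LRU line and remember its address.
            self.history.append(self.lines.pop())
        if addr in self.history:
            # Recalled line: it was evicted recently, so insert it at MRU.
            self.history.remove(addr)
            self.lines.insert(0, addr)
        else:
            # Assumption: non-recalled lines go just below the MRU position.
            self.lines.insert(min(1, len(self.lines)), addr)
        return False


cache_set = HistoryLRUSet(ways=2, history_len=4)
hits = [cache_set.access(a) for a in ["A", "B", "C", "B", "D", "B"]]
print(hits, cache_set.lines)
```

In this trace, `B` is evicted, recalled, and inserted at MRU. The later non-recalled line `D` is placed below it, so the final access to `B` hits.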

Cache Tag Array Fault Tolerance Method Based on Redundancy and Similarity of Adjacent Cache Line Tag Bits

10.1145/3474198.3478212 ◽

2021 ◽

Author(s):

Xiaozhi Du ◽

Honglei Dong ◽

Hehe Yue

Keyword(s):

Fault Tolerance ◽

Cache Line ◽

Tolerance Method

BCD deduplication: effective memory compression using partial cache-line deduplication

Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems ◽

10.1145/3445814.3446722 ◽

2021 ◽

Author(s):

Sungbo Park ◽

Ingab Kang ◽

Yaebin Moon ◽

Jung Ho Ahn ◽

G. Edward Suh

Keyword(s):

Memory Compression ◽

Cache Line

DPCLS: Improving Partial Cache Line Sparing with Dynamics for Memory Error Prevention

2020 IEEE 38th International Conference on Computer Design (ICCD) ◽

10.1109/iccd50377.2020.00045 ◽

2020 ◽

Author(s):

Xiaoming Du ◽

Cong Li

Keyword(s):

Error Prevention ◽

Memory Error ◽

Cache Line

Isle-Tree: A B+-Tree with Intra-Cache Line Sorted Leaves for Non-volatile Memory

2020 IEEE 38th International Conference on Computer Design (ICCD) ◽

10.1109/iccd50377.2020.00101 ◽

2020 ◽

Author(s):

Chundong Wang ◽

Sudipta Chattopadhyay

Keyword(s):

Non Volatile Memory ◽

Cache Line ◽

Volatile Memory

Design of an open-source bridge between non-coherent burst-based and coherent cache-line-based memory systems

Proceedings of the 17th ACM International Conference on Computing Frontiers ◽

10.1145/3387902.3392631 ◽

2020 ◽

Author(s):

Matheus Cavalcante ◽

Andreas Kurth ◽

Fabian Schuiki ◽

Luca Benini

Keyword(s):

Open Source ◽

Memory Systems ◽

Cache Line
