Virtual-Cache: A cache-line borrowing technique for efficient GPU cache architectures

2021 ◽  
pp. 104301
Author(s):  
Bingchao Li ◽  
Jizeng Wei ◽  
Nam Sung Kim
2018 ◽  
pp. 47-53
Author(s):  
B. Z. Shmeylin ◽  
E. A. Alekseeva

This paper addresses the tasks of managing the directory in coherence-maintenance systems for multiprocessor systems with a large number of processors (MSLP). In such systems the problem of maintaining the coherence of processor caches is significantly complicated, owing to increased traffic on the memory buses and the increased complexity of interprocessor communications, and it has been attacked in various ways. Here we propose the use of Bloom filters, structures that accelerate determining whether an element belongs to a given set. In this article, such filters are used to establish that a processor belongs to some subset of the processors and to determine whether a processor holds a particular cache line. The paper discusses in detail the processes of writing and reading data shared between processors, as well as the replacement of data from private caches; it also shows how cache-line addresses and processor numbers are removed from the Bloom filters. The proposed system significantly speeds up coherence-maintenance operations in MSLP compared with conventional systems. In terms of performance and additional hardware and software costs, it is not inferior to the most efficient of similar systems, and on some applications it significantly exceeds them.
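To make the membership test concrete, below is a minimal C++ sketch of a Bloom filter tracking which processors may hold a cache line. The counting variant is our assumption: the abstract says addresses and processor numbers are removed from the filters, and plain Bloom filters cannot delete entries while counting ones can. All sizes, hash choices, and names (SharerFilter, mayHold) are illustrative, not taken from the paper.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <functional>

// Counting Bloom filter tracking which processors may hold a cache line.
// Counters (rather than single bits) make removal possible, which the
// directory needs when a line is evicted from a private cache. Hardware
// would use small saturating counters; uint8_t keeps the sketch simple.
class SharerFilter {
    static constexpr std::size_t kCells  = 1024; // filter size (illustrative)
    static constexpr int         kHashes = 3;    // hash count (illustrative)
    std::array<std::uint8_t, kCells> counters_{}; // zero-initialized

    // Map a (cache-line address, processor id) pair to the i-th cell index
    // via double hashing; real designs use cheap hardware-friendly hashes.
    std::size_t index(std::uint64_t line, int proc, int i) const {
        std::uint64_t a =
            std::hash<std::uint64_t>{}(line ^ (std::uint64_t(proc) << 48));
        std::uint64_t b =
            std::hash<std::uint64_t>{}(line * 0x9E3779B97F4A7C15ULL + proc);
        return (a + std::uint64_t(i) * (b | 1)) % kCells;
    }

public:
    // Record that `proc` has fetched `line` into its private cache.
    void insert(std::uint64_t line, int proc) {
        for (int i = 0; i < kHashes; ++i) ++counters_[index(line, proc, i)];
    }

    // Remove the pair again, e.g. when `proc` evicts `line`.
    void remove(std::uint64_t line, int proc) {
        for (int i = 0; i < kHashes; ++i) --counters_[index(line, proc, i)];
    }

    // Membership test: false positives only cause spurious invalidation
    // traffic; the absence of false negatives keeps coherence correct.
    bool mayHold(std::uint64_t line, int proc) const {
        for (int i = 0; i < kHashes; ++i)
            if (counters_[index(line, proc, i)] == 0) return false;
        return true;
    }
};
```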


Author(s):  
Rodrigo Machniewicz Sokulski ◽  
Emmanuell Diaz Carreno ◽  
Marco Antonio Zanata Alves

2022 ◽  
Vol 15 (2) ◽  
pp. 1-33
Author(s):  
Mikhail Asiatici ◽  
Paolo Ienne

Applications such as large-scale sparse linear algebra and graph analytics are challenging to accelerate on FPGAs due to their short, irregular memory accesses, which result in low cache hit rates. Nonblocking caches reduce the bandwidth consumed by misses by requesting each cache line only once, even when multiple misses correspond to it. However, such a reuse mechanism is traditionally implemented with an associative lookup, which limits the number of misses considered for reuse to a few tens at most. In this article, we present an efficient pipeline that can process and store thousands of outstanding misses in cuckoo hash tables in on-chip SRAM with minimal stalls. This brings the same bandwidth advantage as a larger cache for a fraction of the area budget, because outstanding misses do not need a data array, and it can significantly speed up irregular, memory-bound, latency-insensitive applications. In addition, we extend nonblocking caches to generate variable-length bursts to memory, which increases the bandwidth delivered by DRAMs and their controllers. The resulting miss-optimized memory system provides up to 25% speedup with a 24× area reduction on 15 large sparse matrix-vector multiplication benchmarks evaluated on an embedded and a datacenter FPGA system.
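To illustrate the miss-reuse idea, here is a minimal C++ sketch of a cuckoo hash table holding outstanding misses keyed by cache-line address: a hit in the table merges the new request into an existing miss (so the line is requested only once), while an insertion that finds both candidate slots taken displaces a resident entry to its alternate slot. This is our own software illustration under stated assumptions; table sizes, hash constants, and names such as MissTable and recordMiss are hypothetical, and the paper's actual design is a hardware pipeline, not this data structure.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// One outstanding miss: the line being fetched plus the IDs of all
// requests that have been merged into it while the fetch is in flight.
struct MissEntry {
    std::uint64_t line_addr = 0;
    bool          valid = false;
    std::vector<std::uint32_t> waiters;
};

class MissTable {
    static constexpr std::size_t kSlots = 4096; // per-table (illustrative)
    std::vector<MissEntry> t0_ = std::vector<MissEntry>(kSlots);
    std::vector<MissEntry> t1_ = std::vector<MissEntry>(kSlots);

    // Two independent hash functions, one per table (constants illustrative).
    static std::size_t h0(std::uint64_t a) { return (a * 0x9E3779B97F4A7C15ULL) % kSlots; }
    static std::size_t h1(std::uint64_t a) { return (a * 0xC2B2AE3D27D4EB4FULL) % kSlots; }

public:
    // Returns true if the miss was merged into an outstanding one (no new
    // memory request needed); false if a new entry was created, in which
    // case the caller issues a single request for this line.
    bool recordMiss(std::uint64_t line, std::uint32_t req_id) {
        for (MissEntry* e : {&t0_[h0(line)], &t1_[h1(line)]}) {
            if (e->valid && e->line_addr == line) {
                e->waiters.push_back(req_id); // reuse: line already requested
                return true;
            }
        }
        // Not outstanding yet: insert with bounded cuckoo displacement.
        MissEntry incoming{line, true, {req_id}};
        std::size_t slot = h0(incoming.line_addr);
        for (int kicks = 0; kicks < 32; ++kicks) {
            std::vector<MissEntry>& table = (kicks % 2 == 0) ? t0_ : t1_;
            std::swap(incoming, table[slot]);
            if (!incoming.valid) return false;   // landed in a free slot
            // Displaced entry moves to its slot in the other table.
            slot = (kicks % 2 == 0) ? h1(incoming.line_addr)
                                    : h0(incoming.line_addr);
        }
        // A real pipeline would stall or buffer on overflow; the sketch
        // simply drops the last displaced entry for brevity.
        return false;
    }

    // On data return from memory, retire the entry and hand back the
    // merged requests so they can all be served from the arriving line.
    std::vector<std::uint32_t> complete(std::uint64_t line) {
        for (MissEntry* e : {&t0_[h0(line)], &t1_[h1(line)]}) {
            if (e->valid && e->line_addr == line) {
                e->valid = false;
                return std::move(e->waiters);
            }
        }
        return {};
    }
};
```

Because each key has exactly two candidate slots, a lookup is two constant-time probes, which is what lets such a table hold thousands of outstanding misses without the associativity limits of a CAM-style MSHR file.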


2019 ◽  
Vol E102.D (12) ◽  
pp. 2441-2450
Author(s):  
Dokeun LEE ◽  
Seongjin LEE ◽  
Youjip WON

Author(s):  
Stefanos Kaxiras ◽  
Zhigang Hu ◽  
Girija Narlikar ◽  
Rae McLellan

2008 ◽  
Vol 32 (7) ◽  
pp. 394-404 ◽  
Author(s):  
Ismail Kadayif ◽  
Ayhan Zorlubas ◽  
Selcuk Koyuncu ◽  
Olcay Kabal ◽  
Davut Akcicek ◽  
...  
