cache block
Recently Published Documents

TOTAL DOCUMENTS: 20 (FIVE YEARS: 8)
H-INDEX: 3 (FIVE YEARS: 0)

Electronics ◽  
2022 ◽  
Vol 11 (2) ◽  
pp. 240
Author(s):  
Beomjun Kim ◽  
Yongtae Kim ◽  
Prashant Nair ◽  
Seokin Hong

STT-RAM (Spin-Transfer Torque Random Access Memory) appears to be a viable alternative to SRAM-based on-chip caches. Due to its high density and low leakage power, STT-RAM can be used to build large-capacity last-level caches (LLCs). Unfortunately, STT-RAM has a much longer write latency and much greater write energy than SRAM. Researchers have developed hybrid caches made up of SRAM and STT-RAM regions to cope with these challenges. In hybrid caches, an intelligent block placement policy is essential to store as many write-intensive blocks in the SRAM region as possible. This paper proposes ADAM, an adaptive block placement framework for hybrid caches that incorporates metadata embedding. When a cache block is evicted from the LLC, ADAM embeds metadata (i.e., write intensity) into the block. The metadata embedded in the cache block are then extracted and used to determine the block's write intensity when it is fetched from main memory. Our research demonstrates that ADAM can enhance performance by 26% (on average) when compared to a baseline block placement scheme.
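The placement idea in the abstract can be sketched as follows. This is a minimal illustration, not ADAM's actual implementation: all class names, the one-bit hint, and the write-intensity threshold are assumptions made for the sketch.

```python
# Toy sketch of write-intensity-guided placement in a hybrid SRAM/STT-RAM LLC.
# A one-bit hint is "embedded" with the block on eviction and consulted on the
# next fetch to choose the region. Names and threshold are illustrative only.

WRITE_INTENSITY_THRESHOLD = 4  # writes per cache residency; assumed value

class Block:
    def __init__(self, addr):
        self.addr = addr
        self.writes = 0   # writes observed during this residency
        self.hint = False # embedded metadata: was the block write-intensive?

class HybridLLC:
    def __init__(self):
        self.sram = {}          # small region, cheap writes
        self.sttram = {}        # large region, expensive writes
        self.memory_hints = {}  # stands in for metadata stored with the block in DRAM

    def fetch(self, addr):
        blk = Block(addr)
        blk.hint = self.memory_hints.get(addr, False)
        # Placement decision: write-intensive blocks go to the SRAM region.
        region = self.sram if blk.hint else self.sttram
        region[addr] = blk
        return blk

    def write(self, addr):
        blk = self.sram.get(addr) or self.sttram.get(addr) or self.fetch(addr)
        blk.writes += 1

    def evict(self, addr):
        blk = self.sram.pop(addr, None) or self.sttram.pop(addr, None)
        if blk:
            # Embed the write-intensity metadata into the evicted block.
            self.memory_hints[addr] = blk.writes >= WRITE_INTENSITY_THRESHOLD
```

On the block's first residency it lands in STT-RAM; if it proves write-intensive, the embedded hint steers its next fetch into SRAM.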


2021 ◽  
Vol 18 (4) ◽  
pp. 1-27
Author(s):  
Matthew Tomei ◽  
Shomit Das ◽  
Mohammad Seyedzadeh ◽  
Philip Bedoukian ◽  
Bradford Beckmann ◽  
...  

Cache-block compression is a highly effective technique both for reducing accesses to lower levels in the memory hierarchy (cache compression) and for minimizing data transfers (link compression). While many effective cache-block compression algorithms have been proposed, their design is largely ad hoc and manual and relies on human recognition of patterns. In this article, we take an entirely different approach. We introduce a class of "byte-select" compression algorithms, as well as an automated methodology for generating compression algorithms in this class. We argue that, based on upper bounds within the class, the study of byte-select algorithms has the potential to yield algorithms with better performance than existing cache-block compression algorithms. The upper bound we establish on the compression ratio is 2X that of any existing algorithm. We then offer a generalized representation of a subset of byte-select compression algorithms and search through the resulting space guided by a set of training data traces. Using this automated process, we find efficient and effective algorithms for various hardware applications, and we find that the resulting algorithms exploit novel patterns that can inform future algorithm designs. The generated byte-select algorithms are evaluated against a separate set of traces; evaluations show that Byte-Select has a 23% higher compression ratio on average. While no previous algorithm performs best across all our data sets, which include CPU and GPU applications, our generated algorithms do. Using an automated hardware generator for these algorithms, we show that their decompression and compression latencies are one and two cycles, respectively, much lower than any existing algorithm with a competitive compression ratio.
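To make the "byte-select" idea concrete, here is one very simple member of that class, sketched under assumptions: it is not one of the article's generated algorithms, just zero-byte elision, where the compressed form keeps only the non-zero bytes plus a one-bit-per-byte mask, and decompression selects each output byte either from the stored payload or from a constant zero.

```python
# Minimal byte-select-style compressor (illustrative, not from the article):
# drop zero bytes, remember their positions in a bitmask.

def compress(block: bytes):
    mask = 0
    kept = bytearray()
    for i, b in enumerate(block):
        if b != 0:
            mask |= 1 << i      # bit i set -> byte i is stored in the payload
            kept.append(b)
    return mask, bytes(kept)

def decompress(mask: int, kept: bytes, size: int) -> bytes:
    out = bytearray(size)       # defaults every byte to the constant zero
    it = iter(kept)
    for i in range(size):
        if mask & (1 << i):
            out[i] = next(it)   # byte-select: take the next stored byte
    return bytes(out)
```

For a 64-byte cache block this costs an 8-byte mask plus the non-zero payload, so mostly-zero blocks compress well; the generated algorithms in the article select bytes under far richer learned patterns than this single zero-byte rule.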


Author(s):  
A. A. Prihozhy

This paper is devoted to the reduction of data transfer between main memory and a direct-mapped cache for blocked shortest-paths algorithms (BSPA), which represent data by a D[M×M] matrix of blocks. For large graphs, the cache size S = δ×M2, δ < 1 is smaller than the matrix size. The cache assigns a group of main memory blocks to a single cache block. BSPA performs multiple recalculations of a block over one or two other blocks and may access up to three blocks simultaneously. If these blocks are assigned to the same cache block, conflicts occur among them, which cause heavy transfer of data between memory levels. The distribution of blocks over groups and the block conflict count depend strongly on the allocation and ordering of the matrix blocks in main memory. To solve the problem of optimal block allocation, the paper introduces a block conflict weighted graph and recognizes two cases of block mapping: non-conflict and minimum-conflict. In the first case, it formulates an equitable color-class-size constrained coloring problem on the conflict graph and solves it by developing deterministic and random algorithms. In the second case, the paper formulates a problem of weighted defective color-count constrained coloring of the conflict graph and solves it by developing a random algorithm. Experimental results show that the equitable random algorithm provides an upper bound of the cache size that is very close to the lower bound estimated over the size of a complete subgraph, and that a non-conflict matrix allocation is possible at δ = 0.5 for M = 4 and at δ = 0.1 for M = 20. For a low cache size, the weighted defective algorithm gives a number of remaining conflicts that is up to 8.8 times less than the original BSPA gives. The proposed model and algorithms are applicable to set-associative caches as well.
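The conflict-graph-plus-coloring formulation above can be sketched as follows. This is an illustration under assumptions, not Prihozhy's algorithms: the conflict edges come from the blocked Floyd-Warshall access pattern (block (i, j) is recalculated over (i, k) and (k, j)), and the coloring is a simple capacity-limited greedy rather than the paper's deterministic and random schemes.

```python
# Illustrative sketch: block-conflict graph for a blocked shortest-paths
# step, plus an equitable-style greedy coloring where a color = a cache
# block (group) and each color class gets an equal share of matrix blocks.

from math import ceil

def conflict_graph(M):
    """Connect blocks that one blocked step touches simultaneously:
    (i, j) is updated over (i, k) and (k, j)."""
    adj = {(i, j): set() for i in range(M) for j in range(M)}
    for i in range(M):
        for j in range(M):
            for k in range(M):
                for a, b in (((i, j), (i, k)), ((i, j), (k, j)), ((i, k), (k, j))):
                    if a != b:
                        adj[a].add(b)
                        adj[b].add(a)
    return adj

def equitable_coloring(adj, num_colors):
    """Map each block to a cache block; None marks an unavoidable conflict."""
    cap = ceil(len(adj) / num_colors)   # equal color-class-size constraint
    used = [0] * num_colors
    color = {}
    for v in sorted(adj, key=lambda v: -len(adj[v])):  # hardest blocks first
        banned = {color[u] for u in adj[v] if u in color}
        options = [c for c in range(num_colors) if c not in banned and used[c] < cap]
        if options:
            color[v] = min(options, key=lambda c: used[c])
            used[color[v]] += 1
        else:
            color[v] = None  # a remaining conflict the defective variant would weigh
    return color
```

A proper (conflict-free) coloring with G colors means the matrix can be laid out so that no two simultaneously accessed blocks compete for the same cache block in a cache of G block frames.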


2021 ◽  
Author(s):  
Nikolaus Jeremic ◽  
Helge Parzyjegla ◽  
Gero Mühl
Keyword(s):  

2019 ◽  
Vol 29 (08) ◽  
pp. 2050120
Author(s):  
Suvadip Hazra ◽  
Mamata Dalui

Nowadays, Hardware Trojan threats have become inevitable due to the growing complexity of Integrated Circuits (ICs) as well as the current trend of Intellectual Property (IP)-based hardware designs. An adversary can insert a Hardware Trojan during any of the chip's life-cycle phases: design, fabrication, or even manufacturing. Once a Trojan is inserted into a system, it can cause an unwanted modification of system functionality, which may degrade system performance or leak secret information. Inserted Trojans are hard to detect and impossible to remove from the system, as they are already fabricated into the chip. In this paper, we propose three stealthy Trojan models that affect the coherence mechanism of a Chip Multiprocessor's (CMP's) cache system by arbitrarily modifying cache block states, which may leave cache lines incoherent. We evaluate the payload of such modeled Trojans and propose a cellular automaton (CA)-based solution for detecting them.
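The kind of damage such a Trojan does can be modeled in a few lines. This is a toy illustration, not the paper's Trojan designs or its CA-based detector: the class names, the trigger-free payload, and the single invariant checked are all assumptions for the sketch.

```python
# Toy model: a Trojan payload that forces a cache line into an arbitrary
# MESI state, and a checker for the basic single-writer coherence invariant
# (if any core holds a line Modified or Exclusive, all others must be Invalid).

class CoreCache:
    def __init__(self):
        self.lines = {}  # addr -> MESI state ("M", "E", "S", or "I")

def trojan_flip(cache, addr, forced_state):
    """Malicious payload: silently overwrite a line's coherence state."""
    cache.lines[addr] = forced_state

def coherent(caches, addr):
    """True iff the MESI single-writer invariant holds for this address."""
    states = [c.lines.get(addr, "I") for c in caches]
    exclusive = sum(1 for s in states if s in ("M", "E"))
    sharers = sum(1 for s in states if s == "S")
    return not (exclusive and (exclusive > 1 or sharers))
```

A detector (CA-based in the paper) would watch for exactly such invariant violations, which a correct coherence protocol can never produce on its own.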


2013 ◽  
Vol 462-463 ◽  
pp. 884-890
Author(s):  
Bin Tang ◽  
Guo Yin Zhang ◽  
Zhi Jing Xing ◽  
Yan Xia Wu ◽  
Xiang Hui Wang

In-network caching is one of the key aspects of content-centric networks (CCN), yet the LRU cache replacement algorithm does not consider the relation between a node's cached contents and those of its neighbor nodes during replacement, which leaves low-value cache blocks in the cache and reduces cache efficiency. An enhanced LRU cache replacement strategy (A-LRU) is proposed, which promptly replaces cache blocks that are not requested by other nodes and improves the effective utilization of cache space. Simulation results show that the A-LRU strategy increases the cache hit rate, shortens data request delay, and improves overall network performance, verifying the validity of A-LRU in CCN.
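The neighbor-aware eviction idea can be sketched as a small LRU variant. This is an illustration under assumptions, not the paper's exact A-LRU algorithm: the neighbor-interest set and all names are hypothetical.

```python
# Sketch of an A-LRU-style policy: plain LRU, except that on replacement,
# blocks no neighboring node has requested are evicted before the LRU victim.

from collections import OrderedDict

class ALRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()     # content name -> data, oldest first
        self.neighbor_requests = set()  # names recently requested by neighbors

    def note_neighbor_request(self, name):
        self.neighbor_requests.add(name)

    def insert(self, name, data):
        if name in self.blocks:
            self.blocks.move_to_end(name)
        self.blocks[name] = data
        if len(self.blocks) > self.capacity:
            # Prefer evicting (in LRU order) a block with no neighbor interest;
            # fall back to the plain LRU victim if every block is wanted.
            victim = next((n for n in self.blocks if n not in self.neighbor_requests),
                          next(iter(self.blocks)))
            del self.blocks[victim]
```

Compared to plain LRU, a block that neighbors keep requesting survives replacement even when it is the least recently used locally, which is the mechanism the abstract credits for the higher hit rate.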

