cache block
Recently Published Documents

TOTAL DOCUMENTS: 20 (FIVE YEARS: 8)
H-INDEX: 3 (FIVE YEARS: 0)

Electronics ◽  
2022 ◽  
Vol 11 (2) ◽  
pp. 240
Author(s):  
Beomjun Kim ◽  
Yongtae Kim ◽  
Prashant Nair ◽  
Seokin Hong

STT-RAM (Spin-Transfer Torque Random Access Memory) appears to be a viable alternative to SRAM-based on-chip caches. Due to its high density and low leakage power, STT-RAM can be used to build large-capacity last-level caches (LLCs). Unfortunately, STT-RAM has a much longer write latency and much greater write energy than SRAM. Researchers have developed hybrid caches made up of SRAM and STT-RAM regions to cope with these challenges. In hybrid caches, an intelligent block placement policy is essential to store as many write-intensive blocks in the SRAM region as possible. This paper proposes ADAM, an adaptive block placement framework for hybrid caches that incorporates metadata embedding. When a cache block is evicted from the LLC, ADAM embeds metadata (i.e., write intensity) into the block. The metadata embedded in the cache block are then extracted and used to determine the block's write intensity when it is fetched from main memory. Our research demonstrates that ADAM can enhance performance by 26% (on average) when compared to a baseline block placement scheme.
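The placement idea in the abstract can be sketched as follows. This is a minimal illustration, not ADAM's actual implementation: all class names, the one-bit hint, and the write-intensity threshold are assumptions made for the sketch.

```python
# Toy sketch of write-intensity-guided placement in a hybrid SRAM/STT-RAM LLC.
# A one-bit hint is "embedded" with the block on eviction and consulted on the
# next fetch to choose the region. Names and threshold are illustrative only.

WRITE_INTENSITY_THRESHOLD = 4  # writes per cache residency; assumed value

class Block:
    def __init__(self, addr):
        self.addr = addr
        self.writes = 0   # writes observed during this residency
        self.hint = False # embedded metadata: was the block write-intensive?

class HybridLLC:
    def __init__(self):
        self.sram = {}          # small region, cheap writes
        self.sttram = {}        # large region, expensive writes
        self.memory_hints = {}  # stands in for metadata stored with the block in DRAM

    def fetch(self, addr):
        blk = Block(addr)
        blk.hint = self.memory_hints.get(addr, False)
        # Placement decision: write-intensive blocks go to the SRAM region.
        region = self.sram if blk.hint else self.sttram
        region[addr] = blk
        return blk

    def write(self, addr):
        blk = self.sram.get(addr) or self.sttram.get(addr) or self.fetch(addr)
        blk.writes += 1

    def evict(self, addr):
        blk = self.sram.pop(addr, None) or self.sttram.pop(addr, None)
        if blk:
            # Embed the write-intensity metadata into the evicted block.
            self.memory_hints[addr] = blk.writes >= WRITE_INTENSITY_THRESHOLD
```

On the block's first residency it lands in STT-RAM; if it proves write-intensive, the embedded hint steers its next fetch into SRAM.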


2021 ◽  
Vol 18 (4) ◽  
pp. 1-27
Author(s):  
Matthew Tomei ◽  
Shomit Das ◽  
Mohammad Seyedzadeh ◽  
Philip Bedoukian ◽  
Bradford Beckmann ◽  
...  

Cache-block compression is a highly effective technique both for reducing accesses to lower levels in the memory hierarchy (cache compression) and for minimizing data transfers (link compression). While many effective cache-block compression algorithms have been proposed, their design is largely ad hoc and manual and relies on human recognition of patterns. In this article, we take an entirely different approach. We introduce a class of "byte-select" compression algorithms, as well as an automated methodology for generating compression algorithms in this class. We argue that, based on upper bounds within the class, the study of byte-select algorithms has the potential to yield algorithms with better performance than existing cache-block compression algorithms. The upper bound we establish on the compression ratio is 2X that of any existing algorithm. We then offer a generalized representation of a subset of byte-select compression algorithms and search through the resulting space guided by a set of training data traces. Using this automated process, we find efficient and effective algorithms for various hardware applications, and we find that the resulting algorithms exploit novel patterns that can inform future algorithm designs. The generated byte-select algorithms are evaluated against a separate set of traces; evaluations show that Byte-Select has a 23% higher compression ratio on average. While no previous algorithm performs best across all our data sets, which include CPU and GPU applications, our generated algorithms do. Using an automated hardware generator for these algorithms, we show that their decompression and compression latencies are one and two cycles, respectively, much lower than any existing algorithm with a competitive compression ratio.
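To make the "byte-select" idea concrete, here is one very simple member of that class, sketched under assumptions: it is not one of the article's generated algorithms, just zero-byte elision, where the compressed form keeps only the non-zero bytes plus a one-bit-per-byte mask, and decompression selects each output byte either from the stored payload or from a constant zero.

```python
# Minimal byte-select-style compressor (illustrative, not from the article):
# drop zero bytes, remember their positions in a bitmask.

def compress(block: bytes):
    mask = 0
    kept = bytearray()
    for i, b in enumerate(block):
        if b != 0:
            mask |= 1 << i      # bit i set -> byte i is stored in the payload
            kept.append(b)
    return mask, bytes(kept)

def decompress(mask: int, kept: bytes, size: int) -> bytes:
    out = bytearray(size)       # defaults every byte to the constant zero
    it = iter(kept)
    for i in range(size):
        if mask & (1 << i):
            out[i] = next(it)   # byte-select: take the next stored byte
    return bytes(out)
```

For a 64-byte cache block this costs an 8-byte mask plus the non-zero payload, so mostly-zero blocks compress well; the generated algorithms in the article select bytes under far richer learned patterns than this single zero-byte rule.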


Author(s):  
A. A. Prihozhy

This paper is devoted to the reduction of data transfer between main memory and a direct-mapped cache for blocked shortest-paths algorithms (BSPA), which represent data by a D[M×M] matrix of blocks. For large graphs, the cache size S = δ×M2, δ < 1 is smaller than the matrix size. The cache assigns a group of main memory blocks to a single cache block. BSPA performs multiple recalculations of a block over one or two other blocks and may access up to three blocks simultaneously. If these blocks are assigned to the same cache block, conflicts occur among them, which cause heavy transfer of data between memory levels. The distribution of blocks over groups and the block conflict count depend strongly on the allocation and ordering of the matrix blocks in main memory. To solve the problem of optimal block allocation, the paper introduces a block conflict weighted graph and recognizes two cases of block mapping: non-conflict and minimum-conflict. In the first case, it formulates an equitable color-class-size constrained coloring problem on the conflict graph and solves it by developing deterministic and random algorithms. In the second case, the paper formulates a problem of weighted defective color-count constrained coloring of the conflict graph and solves it by developing a random algorithm. Experimental results show that the equitable random algorithm provides an upper bound of the cache size that is very close to the lower bound estimated over the size of a complete subgraph, and that a non-conflict matrix allocation is possible at δ = 0.5 for M = 4 and at δ = 0.1 for M = 20. For a low cache size, the weighted defective algorithm gives a number of remaining conflicts that is up to 8.8 times less than the original BSPA gives. The proposed model and algorithms are applicable to set-associative caches as well.
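The conflict-graph-plus-coloring formulation above can be sketched as follows. This is an illustration under assumptions, not Prihozhy's algorithms: the conflict edges come from the blocked Floyd-Warshall access pattern (block (i, j) is recalculated over (i, k) and (k, j)), and the coloring is a simple capacity-limited greedy rather than the paper's deterministic and random schemes.

```python
# Illustrative sketch: block-conflict graph for a blocked shortest-paths
# step, plus an equitable-style greedy coloring where a color = a cache
# block (group) and each color class gets an equal share of matrix blocks.

from math import ceil

def conflict_graph(M):
    """Connect blocks that one blocked step touches simultaneously:
    (i, j) is updated over (i, k) and (k, j)."""
    adj = {(i, j): set() for i in range(M) for j in range(M)}
    for i in range(M):
        for j in range(M):
            for k in range(M):
                for a, b in (((i, j), (i, k)), ((i, j), (k, j)), ((i, k), (k, j))):
                    if a != b:
                        adj[a].add(b)
                        adj[b].add(a)
    return adj

def equitable_coloring(adj, num_colors):
    """Map each block to a cache block; None marks an unavoidable conflict."""
    cap = ceil(len(adj) / num_colors)   # equal color-class-size constraint
    used = [0] * num_colors
    color = {}
    for v in sorted(adj, key=lambda v: -len(adj[v])):  # hardest blocks first
        banned = {color[u] for u in adj[v] if u in color}
        options = [c for c in range(num_colors) if c not in banned and used[c] < cap]
        if options:
            color[v] = min(options, key=lambda c: used[c])
            used[color[v]] += 1
        else:
            color[v] = None  # a remaining conflict the defective variant would weigh
    return color
```

A proper (conflict-free) coloring with G colors means the matrix can be laid out so that no two simultaneously accessed blocks compete for the same cache block in a cache of G block frames.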


2021 ◽  
Author(s):  
Nikolaus Jeremic ◽  
Helge Parzyjegla ◽  
Gero Mühl
Keyword(s):  

2019 ◽  
Vol 29 (08) ◽  
pp. 2050120
Author(s):  
Suvadip Hazra ◽  
Mamata Dalui

Nowadays, Hardware Trojan threats have become inevitable due to the growing complexity of Integrated Circuits (ICs) as well as the current trend of Intellectual Property (IP)-based hardware designs. An adversary can insert a Hardware Trojan during any of the chip's life-cycle phases: design, fabrication, or even manufacturing. Once a Trojan is inserted into a system, it can cause an unwanted modification of system functionality, which may degrade system performance or leak secret information. Inserted Trojans are hard to detect and impossible to remove from the system, as they are already fabricated into the chip. In this paper, we propose three stealthy Trojan models that affect the coherence mechanism of a Chip Multiprocessor's (CMP's) cache system by arbitrarily modifying cache block states, which may leave cache lines incoherent. We evaluate the payload of such modeled Trojans and propose a cellular automaton (CA)-based solution for detecting them.
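The kind of damage such a Trojan does can be modeled in a few lines. This is a toy illustration, not the paper's Trojan designs or its CA-based detector: the class names, the trigger-free payload, and the single invariant checked are all assumptions for the sketch.

```python
# Toy model: a Trojan payload that forces a cache line into an arbitrary
# MESI state, and a checker for the basic single-writer coherence invariant
# (if any core holds a line Modified or Exclusive, all others must be Invalid).

class CoreCache:
    def __init__(self):
        self.lines = {}  # addr -> MESI state ("M", "E", "S", or "I")

def trojan_flip(cache, addr, forced_state):
    """Malicious payload: silently overwrite a line's coherence state."""
    cache.lines[addr] = forced_state

def coherent(caches, addr):
    """True iff the MESI single-writer invariant holds for this address."""
    states = [c.lines.get(addr, "I") for c in caches]
    exclusive = sum(1 for s in states if s in ("M", "E"))
    sharers = sum(1 for s in states if s == "S")
    return not (exclusive and (exclusive > 1 or sharers))
```

A detector (CA-based in the paper) would watch for exactly such invariant violations, which a correct coherence protocol can never produce on its own.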


2013 ◽  
Vol 462-463 ◽  
pp. 884-890
Author(s):  
Bin Tang ◽  
Guo Yin Zhang ◽  
Zhi Jing Xing ◽  
Yan Xia Wu ◽  
Xiang Hui Wang

In-network caching is one of the key aspects of content-centric networks (CCN), yet the LRU cache replacement algorithm does not consider the relation between a node's cached contents and those of its neighbor nodes during replacement, which leaves low-value cache blocks in the cache and reduces cache efficiency. An enhanced LRU cache replacement strategy (A-LRU) is proposed, which promptly replaces cache blocks that are not requested by other nodes and improves the effective utilization of cache space. Simulation results show that the A-LRU strategy increases the cache hit rate, shortens data request delay, and improves overall network performance, verifying the validity of A-LRU in CCN.
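The neighbor-aware eviction idea can be sketched as a small LRU variant. This is an illustration under assumptions, not the paper's exact A-LRU algorithm: the neighbor-interest set and all names are hypothetical.

```python
# Sketch of an A-LRU-style policy: plain LRU, except that on replacement,
# blocks no neighboring node has requested are evicted before the LRU victim.

from collections import OrderedDict

class ALRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()     # content name -> data, oldest first
        self.neighbor_requests = set()  # names recently requested by neighbors

    def note_neighbor_request(self, name):
        self.neighbor_requests.add(name)

    def insert(self, name, data):
        if name in self.blocks:
            self.blocks.move_to_end(name)
        self.blocks[name] = data
        if len(self.blocks) > self.capacity:
            # Prefer evicting (in LRU order) a block with no neighbor interest;
            # fall back to the plain LRU victim if every block is wanted.
            victim = next((n for n in self.blocks if n not in self.neighbor_requests),
                          next(iter(self.blocks)))
            del self.blocks[victim]
```

Compared to plain LRU, a block that neighbors keep requesting survives replacement even when it is the least recently used locally, which is the mechanism the abstract credits for the higher hit rate.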

