cache access
Recently Published Documents


TOTAL DOCUMENTS: 51 (five years: 5)
H-INDEX: 8 (five years: 0)

Author(s):  
Siamak Biglari Ardabili ◽  
Gholamreza Zare Fatin

As the number of streaming multiprocessors (SMs) in GPUs grows to deliver better performance, the reply network faces heavy traffic, causing congestion in Network-on-Chip (NoC) routers and memory controller (MC) buffers. Because cooperative thread arrays (CTAs) are scheduled locally within clusters, there is a high probability of finding a copy of the requested data in another SM's L1 cache in the same cluster. To make this feasible, SMs must be able to access the local L1 caches of neighboring SMs. The NoC suffers considerable congestion due to its characteristic many-to-few-to-many traffic pattern. By reducing the number of requests, our proposed Intra-Cluster Locality-Aware (ICLA) unit turns this congested reply traffic into a many-to-many pattern, and the replied data travels over the less-utilized core-to-core links, mitigating NoC traffic. The proposed architecture has been evaluated using 15 workloads from the CUDA SDK, Rodinia, and ISPASS2009 benchmarks, with the ICLA unit modeled and simulated in GPGPU-Sim. The results show about a 23.79% (up to 49.82%) reduction in average network latency, a 15.49% (up to 36.82%) reduction in average L2 cache accesses, and an 18.18% (up to 58.1%) average improvement in instructions per cycle (IPC).
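As a rough illustration of the lookup order the abstract describes, here is a minimal Python sketch; the class names, cluster structure, and return labels are illustrative assumptions, not the paper's ICLA implementation:

```python
# Minimal sketch of intra-cluster L1 lookup (illustrative; not the
# paper's ICLA hardware). An SM checks its own L1, then its cluster
# neighbours' L1s over core-to-core links, and only then the NoC.

class SM:
    def __init__(self, sm_id):
        self.sm_id = sm_id
        self.l1 = {}  # address -> data, stands in for the SM's L1 cache

class Cluster:
    def __init__(self, sms):
        self.sms = sms  # SMs whose CTAs are scheduled together

    def load(self, requester, addr):
        if addr in requester.l1:                 # 1. local L1 hit
            return requester.l1[addr], "local L1 hit"
        for sm in self.sms:                      # 2. probe neighbour L1s
            if sm is not requester and addr in sm.l1:
                requester.l1[addr] = sm.l1[addr]
                return sm.l1[addr], "intra-cluster hit (core-to-core)"
        data = f"data@{addr:#x}"                 # 3. NoC round trip to the MC
        requester.l1[addr] = data
        return data, "NoC request to memory controller"
```

Steps 1 and 2 never touch the reply network, which is how the scheme converts the many-to-few-to-many pattern into lighter many-to-many traffic.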



2021 ◽  
pp. 2150010
Author(s):  
Shane Carroll ◽  
Wei-Ming Lin

In a CPU cache using least recently used (LRU) replacement, each cache set maintains a buffer that orders all cache lines in the set from LRU to most recently used (MRU). When a cache line is brought into the cache, it is placed at the MRU position and the LRU line is evicted; when a line is re-accessed, it is promoted to the MRU position. LRU replacement provides a simple heuristic for predicting the optimal cache line to evict, but it captures only simple, short-term access patterns. In this paper, we propose a method that uses a buffer called the history queue to record longer-term access-eviction patterns than the LRU buffer can capture. Using this information, we make a simple modification to the LRU insertion policy such that recently recalled blocks have priority over others. As lines are evicted, their addresses are recorded in a FIFO history queue. Incoming lines that were recently evicted and are now recalled (those found in the history queue at recall time) remain at the MRU position for an extended period, because non-recalled lines entering the cache thereafter are placed below the MRU. We show in simulations that the proposed LRU insertion prioritization increases performance in single-threaded and multi-threaded workloads with simple adjustments to baseline LRU.
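A minimal Python sketch of one cache set under this insertion policy; the 8-way set size and 16-entry history queue are our assumptions, not values from the paper:

```python
from collections import OrderedDict, deque

class HistoryLRUSet:
    """One cache set: LRU replacement plus a FIFO history queue of
    recently evicted addresses (sketch, not the authors' code)."""

    def __init__(self, ways=8, history_len=16):
        self.ways = ways
        self.lines = OrderedDict()         # front = LRU, back = MRU
        self.history = deque(maxlen=history_len)

    def access(self, addr):
        if addr in self.lines:             # hit: promote to MRU
            self.lines.move_to_end(addr)
            return True
        if len(self.lines) >= self.ways:   # miss: evict LRU, remember it
            victim, _ = self.lines.popitem(last=False)
            self.history.append(victim)
        self.lines[addr] = None            # insert at MRU for now
        if addr not in self.history and len(self.lines) >= 2:
            # Non-recalled fill: demote it below the current MRU so a
            # recently recalled line keeps the MRU position longer.
            keys = list(self.lines)
            self.lines.move_to_end(keys[-2])
        return False
```

Recalled lines thus enter at the MRU, while ordinary fills slot in one position below it, which is the extended-residency effect the abstract describes.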



2021 ◽  
Author(s):  
Lubna Badri Mohammed ◽  
Alagan Anpalagan ◽  
Muhammad Jaseemuddin

Future wireless networks pose research challenges as smart devices multiply and mobile data traffic grows exponentially. The advent of computation-heavy, real-time applications causes a huge expansion in traffic volume. The need to bring data closer to users and to offload traffic from the macrocell base station (MBS) motivates caches at the edge of the network. Storing the most popular files at the edge of mobile edge networks (MENs), in user terminal (UT) and small base station (SBS) caches, is a promising answer to the challenges facing data-rich wireless networks. Caching at the mobile UT allows requested content to be obtained directly from nearby UTs' caches through device-to-device (D2D) communication.

In this survey article, solutions for mobile edge computing and caching challenges in terms of energy and latency are presented, along with comparisons between different caching techniques in MENs. The survey also illustrates research on cache development for wireless networks that applies intelligent and learning techniques (ILTs) in specific design domains. We summarize the challenges facing the design of caching systems in MENs. Finally, some future research directions are discussed for the development of cache placement, cache access, and delivery in MENs.
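As a toy example of popularity-based placement with a D2D fallback, here is a short Python sketch; the Zipf popularity model, parameter values, and function names are our assumptions, not taken from the survey:

```python
# Sketch: cache the most popular files at the edge, then serve requests
# from the local UT cache, a nearby UT over D2D, or the base station.

def zipf_popularity(n_files, alpha=0.8):
    """Zipf-like request probabilities, a common assumption in caching work."""
    weights = [1 / (rank ** alpha) for rank in range(1, n_files + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def place_top_k(popularity, cache_size):
    """Place the cache_size most popular file indices in a cache."""
    ranked = sorted(range(len(popularity)), key=lambda f: -popularity[f])
    return set(ranked[:cache_size])

def serve(request, own_cache, neighbour_caches):
    if request in own_cache:
        return "local UT cache"
    for cache in neighbour_caches:        # D2D: ask nearby user terminals
        if request in cache:
            return "D2D from nearby UT"
    return "fetched from SBS/MBS"         # fall back to the base station

pop = zipf_popularity(100)
ut = place_top_k(pop, 5)
print(serve(3, ut, [place_top_k(pop, 10)]))  # -> "local UT cache"
```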



2020 ◽  
Vol 34 (23) ◽  
pp. 2050242
Author(s):  
Yao Wang ◽  
Lijun Sun ◽  
Haibo Wang ◽  
Lavanya Gopalakrishnan ◽  
Ronald Eaton

Cache sharing is critical in multi-core and multi-threaded systems, but it can delay the execution of real-time applications and makes predicting their worst-case execution time (WCET) more challenging. Prioritized caches have been demonstrated as a promising approach to this challenge. Instead of conventional prioritized cache schemes realized at the architecture level with cache controllers, this work presents two prioritized least recently used (LRU) cache replacement circuits that accomplish the prioritization directly inside the cache circuits, thereby significantly reducing cache access latency. The performance, hardware, and power overheads of the proposed prioritized LRU circuits are investigated in a 65 nm CMOS technology, and the results show very low overhead compared to conventional cache circuits. The presented techniques enable more effective prioritized shared-cache implementations and benefit the development of high-performance real-time systems.
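The replacement policy itself (as opposed to the paper's circuit-level realization) can be stated compactly; this Python sketch assumes the victim is the oldest line among those at the lowest priority level present in the set, which is one common way to define prioritized LRU:

```python
# Sketch of a prioritized LRU victim-selection policy. The paper
# implements this in circuitry; this models only the policy decision.

def choose_victim(lines):
    """lines: list of (address, priority, lru_age) tuples,
    where larger lru_age means older. Returns the address to evict."""
    lowest = min(priority for _, priority, _ in lines)
    candidates = [(addr, age) for addr, prio, age in lines if prio == lowest]
    return max(candidates, key=lambda entry: entry[1])[0]

# Example: the real-time task's line (priority 1) survives, and the
# oldest low-priority line (priority 0) is evicted.
print(choose_victim([("a", 1, 5), ("b", 0, 3), ("c", 0, 7)]))  # -> "c"
```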



2019 ◽  
Vol 29 (2) ◽  
pp. 17-27
Author(s):  
D. V. Znamenskiy ◽  
V. N. Kutsevol

The increasing complexity of modern microprocessors, combined with the slowdown of semiconductor technology progress, makes further performance gains more difficult. Under these circumstances, estimating the performance of prospective microprocessors through cycle-accurate simulation before they are produced in silicon is of growing importance. This paper presents an approach to implementing a cycle-accurate simulator of the core memory subsystem for the Elbrus architecture, driven by the existing functional simulator of that architecture. A method for validating the cycle-accurate simulator by comparison against simulation of the RTL description of the prospective microprocessor is considered. Data on the simulator's speed and the main optimization methods used to achieve acceptable performance are presented. Preliminary estimates, obtained with the cycle-accurate simulator, of the performance impact of several changes in the prospective processor core, including cache access latency and hardware support for virtualization, are given. These assessments are important for making architectural decisions when designing prospective Elbrus-architecture processors.



IEEE Access ◽  
2018 ◽  
Vol 6 ◽  
pp. 42984-42996 ◽  
Author(s):  
Zicong Wang ◽  
Xiaowen Chen ◽  
Zhonghai Lu ◽  
Yang Guo
Keyword(s):  
3D Mesh ◽  


Author(s):  
◽  

Data and instructions that are used regularly are kept in the cache so that they can be retrieved quickly, improving performance. When evaluating the execution of multi-core systems, the cache memory plays a very important role. A multi-core processor is a single circuit on which two or more processor cores are integrated to enhance performance and execute multiple tasks. This paper describes cache memory performance in terms of cache access time, miss rate, and miss penalty. Cache mapping methods are designed to increase cache performance but face many difficulties, and several methods and algorithms are used to mitigate them. The paper presents a study of recent competing processors to evaluate their cache memory performance.
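These three quantities combine in the standard average memory access time (AMAT) formula, AMAT = hit time + miss rate × miss penalty, illustrated below with numbers that are purely illustrative, not taken from the paper:

```python
# Average memory access time (AMAT): the standard way access time,
# miss rate, and miss penalty combine into a single metric.

def amat(hit_time_cycles, miss_rate, miss_penalty_cycles):
    return hit_time_cycles + miss_rate * miss_penalty_cycles

# Example (illustrative): a 2-cycle L1 with a 5% miss rate and a
# 100-cycle miss penalty averages 7 cycles per access.
print(amat(2, 0.05, 100))  # -> 7.0
```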


