cache capacity
Recently Published Documents

TOTAL DOCUMENTS: 29 (five years: 5)
H-INDEX: 7 (five years: 0)

2022 · Vol 19 (1) · pp. 1-26
Author(s): Aditya Ukarande, Suryakant Patidar, Ram Rangan

The compute work rasterizer, or GigaThread Engine, of a modern NVIDIA GPU focuses on maximizing compute work occupancy across all streaming multiprocessors in a GPU while retaining design simplicity. In this article, we identify the operational aspects of the GigaThread Engine that help it meet those goals but also lead to less-than-ideal cache locality for texture accesses in 2D compute shaders, which are an important optimization target for gaming applications. We develop three software techniques, namely LargeCTAs, Swizzle, and Agents, to show that it is possible to effectively exploit the texture data working-set overlap intrinsic to 2D compute shaders. We evaluate these techniques on gaming applications across two generations of NVIDIA GPUs, the RTX 2080 and RTX 3080, and find that they are effective on both. The bandwidth savings from our software techniques on the RTX 2080 far exceed the savings that baseline execution gains from the inter-generational cache capacity increase from the RTX 2080 to the RTX 3080. Our best-performing technique, Agents, records up to a 4.7% average full-frame speedup by reducing the bandwidth demand of targeted shaders at the L1-L2 and L2-DRAM interfaces by 23% and 32%, respectively, on the latest-generation RTX 3080. These results acutely highlight the sensitivity of cache locality to compute work rasterization order and the importance of locality-aware cooperative thread array (CTA) scheduling for gaming applications.
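The abstract does not spell out the exact Swizzle mapping, so the short Python sketch below only illustrates the general idea behind swizzled CTA scheduling: remap the linear launch order so that consecutively issued CTAs land in compact 2D tiles whose texture footprints overlap in the L1/L2 caches. The grid and tile dimensions (GRID_W, GRID_H, GROUP_W, GROUP_H) and the mapping itself are hypothetical, not the paper's.

```python
# Hedged sketch of a swizzled CTA launch order for a 2D compute shader.
# Row-major order issues one long strip of CTAs; the swizzled order below
# issues them tile-by-tile, so neighbors in launch order are neighbors in 2D.

GRID_W, GRID_H = 8, 8    # CTAs per dispatch row/column (hypothetical)
GROUP_W, GROUP_H = 4, 2  # CTAs grouped into one locality tile (hypothetical)

def swizzle(linear_id: int) -> tuple[int, int]:
    """Map a linear CTA id to (x, y) so ids walk tile-by-tile, not row-by-row."""
    per_tile = GROUP_W * GROUP_H
    tiles_per_row = GRID_W // GROUP_W
    tile, within = divmod(linear_id, per_tile)
    tile_y, tile_x = divmod(tile, tiles_per_row)
    in_y, in_x = divmod(within, GROUP_W)
    return tile_x * GROUP_W + in_x, tile_y * GROUP_H + in_y

# The first eight ids cover a compact 4x2 tile instead of a 1x8 strip.
print([swizzle(i) for i in range(8)])
```

CTAs that execute close together in time then share texture cache lines along both axes, which is the working-set overlap the paper's techniques exploit.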


2022 · Vol 19 (1) · pp. 1-23
Author(s): Yaosheng Fu, Evgeny Bolotin, Niladrish Chatterjee, David Nellans, Stephen W. Keckler

As GPUs scale their low-precision matrix math throughput to boost deep learning (DL) performance, they upset the balance between math throughput and memory system capabilities. We demonstrate that a converged GPU design trying to address the diverging architectural requirements of FP32 (or larger)-based HPC and FP16 (or smaller)-based DL workloads results in sub-optimal configurations for both application domains. We argue that a Composable On-PAckage GPU (COPA-GPU) architecture providing domain-specialized GPU products is the most practical solution to these diverging requirements. A COPA-GPU leverages multi-chip-module disaggregation to support maximal design reuse, along with per-domain memory system specialization. We show how a COPA-GPU enables DL-specialized products by modular augmentation of the baseline GPU architecture with up to 4× higher off-die bandwidth, 32× larger on-package cache, and 2.3× higher DRAM bandwidth and capacity, while conveniently supporting scaled-down HPC-oriented designs. This work explores the microarchitectural design necessary to enable composable GPUs and evaluates the benefits composability can provide to HPC, DL training, and DL inference. We show that, compared to a converged GPU design, a DL-optimized COPA-GPU combining 16× larger cache capacity with 1.6× higher DRAM bandwidth scales per-GPU training and inference performance by 31% and 35%, respectively, and reduces the number of GPU instances by 50% in scale-out training scenarios.
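To make the notion of composability concrete, here is a toy Python model assuming nothing beyond the multipliers quoted in the abstract: one shared baseline die, packaged with different memory-system modules per domain. The structure and names are illustrative, not NVIDIA's design.

```python
# Toy model of COPA-style composability: the same compute die is packaged
# with domain-specific memory-system modules. Multipliers are relative to
# the converged baseline and are taken from the abstract.

from dataclasses import dataclass, replace

@dataclass(frozen=True)
class GPUConfig:
    name: str
    offdie_bw_x: float = 1.0    # off-die bandwidth vs. baseline
    onpkg_cache_x: float = 1.0  # on-package cache capacity vs. baseline
    dram_bw_x: float = 1.0      # DRAM bandwidth and capacity vs. baseline

BASELINE = GPUConfig("converged-baseline")
HPC_COPA = replace(BASELINE, name="COPA-HPC")  # scaled-down memory system
DL_COPA = replace(BASELINE, name="COPA-DL",
                  offdie_bw_x=4.0, onpkg_cache_x=32.0, dram_bw_x=2.3)

for cfg in (BASELINE, HPC_COPA, DL_COPA):
    print(cfg)
```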


2021 · Vol 12 (1) · pp. 344
Author(s): Salman Rashid, Shukor Abd Razak, Fuad A. Ghaleb

In-network caching is an essential part of Content-Centric Networking (CCN). The main aim of a CCN caching module is data distribution within the network: each CCN node can cache content according to its placement policy, which makes CCN well equipped to meet the demands of future networks. The placement strategy decides where to cache content so as to optimize its location and minimize content redundancy within the network. When the cache is full, the content eviction policy decides which content stays in the cache and which is evicted. Hence, network performance and the cache hit ratio depend almost equally on the content placement and replacement policies. Content eviction policies face diverse requirements due to limited cache capacity, high request rates, and rapidly changing cache states. Many replacement policies rely on low or high popularity and data freshness to select content for eviction. However, content that loses its popularity after being very popular for a certain period remains in the cache, while other content is evicted before it ever becomes popular. To handle this issue, we introduce the concept of content maturity/immaturity. The proposed policy, named Immature Used (IMU), computes a content maturity index from the content's arrival time and its request frequency within a specific time frame, and determines the maturity level through a maturity classifier. When the cache is full, the least immature content is evicted. We performed extensive simulations in the Icarus simulator to compare the performance (cache hit ratio, path stretch, latency, and link load) of the proposed policy with well-known cache replacement policies in CCN. The results, obtained with varying popularity and cache sizes, indicate that our policy achieves up to 14.31% more cache hits, 5.91% lower latency, 3.82% better path stretch, and 9.53% lower link load than a recently proposed technique, and performs significantly better than the other baseline approaches.
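A minimal sketch of how an IMU-style eviction decision might look, assuming a placeholder maturity index; the paper's actual index formula and maturity classifier are not reproduced here, and we read "least immature" as the entry with the highest maturity score.

```python
import time

class IMUCache:
    """Hedged sketch of an Immature Used (IMU)-style cache."""

    def __init__(self, capacity: int, window: float = 60.0):
        self.capacity = capacity
        self.window = window   # time frame for counting requests (seconds)
        self.entries = {}      # name -> (arrival_time, request timestamps)

    def _maturity(self, name: str, now: float) -> float:
        arrival, hits = self.entries[name]
        recent = sum(1 for t in hits if now - t <= self.window)
        lifetime = now - arrival
        # Placeholder index, not the paper's: content counts as "mature"
        # once it has lived long and accumulated many requests while its
        # recent demand has faded.
        return lifetime * len(hits) / (recent + 1)

    def request(self, name: str) -> bool:
        """Record a request; on a miss, cache the content, evicting if full."""
        now = time.monotonic()
        if name in self.entries:
            self.entries[name][1].append(now)
            return True   # cache hit
        if len(self.entries) >= self.capacity:
            # Evict the "least immature" (highest-maturity) entry.
            victim = max(self.entries, key=lambda n: self._maturity(n, now))
            del self.entries[victim]
        self.entries[name] = (now, [now])
        return False      # cache miss
```

Under this placeholder index, content that was very popular but has since gone cold accumulates a high maturity score and is evicted first, while newly arrived content is protected long enough to reveal its popularity: the two failure cases the abstract highlights.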


2020 · Vol 2020 · pp. 1-11
Author(s): Chenyu Wu, Shuo Shi, Shushi Gu, Lingyan Zhang, Xuemai Gu

Cache-enabled unmanned aerial vehicles (UAVs) have been envisioned as a promising technology for many applications in future urban wireless communication. However, utilizing UAVs properly is challenging due to their limited endurance and storage capacity as well as the continuous roaming of mobile users. To meet the diversity of urban communication services, it is essential to exploit the mobility and storage resources of UAVs. Toward this end, we consider an urban cache-enabled communication network where UAVs serve mobile users under energy and cache capacity constraints. We formulate an optimization problem to maximize the sum achievable throughput in this system. To solve it, we propose a deep reinforcement learning-based joint content placement and trajectory design algorithm (DRL-JCT) that proceeds in two stages: an offline content placement stage and an online user tracking stage. First, we present a link-based scheme to maximize the cache hit rate of all users' file requests under the cache capacity constraint; the NP-hard problem is solved by approximation and convex optimization. Then, we leverage a Double Deep Q-Network (DDQN) to track mobile users online through their instantaneous two-dimensional coordinates under the energy constraint. Numerical results show that our algorithm converges after a small number of iterations. Compared with several benchmark schemes, it adapts to dynamic conditions and delivers significant gains in sum achievable throughput.
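As a rough illustration of the offline content placement stage, the sketch below greedily fills a UAV's cache budget with the most popular content per unit of size. The paper instead solves the NP-hard problem via approximation and convex optimization; the greedy knapsack heuristic and all demand numbers here are invented stand-ins.

```python
# Hedged sketch: greedy content placement under a cache capacity budget,
# a simple stand-in for the paper's approximation + convex optimization.

def place_contents(popularity: dict[str, float],
                   size: dict[str, int],
                   capacity: int) -> set[str]:
    """Greedily cache files with the best popularity-per-unit-size ratio."""
    cached, used = set(), 0
    for f in sorted(popularity, key=lambda f: popularity[f] / size[f],
                    reverse=True):
        if used + size[f] <= capacity:
            cached.add(f)
            used += size[f]
    return cached

# Hypothetical Zipf-like demand over five files and a 3-unit cache.
pop = {"f1": 0.40, "f2": 0.25, "f3": 0.15, "f4": 0.12, "f5": 0.08}
size = {"f1": 2, "f2": 1, "f3": 1, "f4": 2, "f5": 1}
print(place_contents(pop, size, capacity=3))  # caches f1 and f2
```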


2018 · Vol 2018 · pp. 1-12
Author(s): Jiequ Ji, Kun Zhu, Ran Wang, Bing Chen, Chen Dai

Caching popular contents at base stations (BSs) has been regarded as an effective approach to alleviate the backhaul load and improve quality of service. To meet explosive data traffic demand while saving energy, energy efficiency (EE) has become an extremely important performance index for 5th-generation (5G) cellular networks. In general, there are two ways to improve the EE of caching: improving the cache-hit rate and optimizing the cache size. In this work, we investigate the energy-efficient caching problem in backhaul-aware cellular networks, jointly considering both approaches. Most existing works assume that the content catalog and popularity are static; in practice, however, content popularity is dynamic. To estimate the dynamic content popularity in a timely manner, we propose a method based on the shot noise model (SNM). We then propose a distributed caching policy to improve the cache-hit rate in such a dynamic environment. Furthermore, we analyze the tradeoff between energy efficiency and cache capacity, for which an optimization problem is formulated. We prove its convexity and derive a closed-form optimal cache capacity that maximizes the EE. Simulation results validate the proposed scheme and show that the EE can be improved with an appropriate choice of cache capacity.
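To illustrate the EE-versus-cache-capacity tradeoff, the sketch below assumes a concave caching benefit (diminishing hit-rate returns) and a linearly growing cache power cost, then locates the optimum numerically; the paper's actual model differs and admits a closed-form solution.

```python
# Hedged sketch of the EE-vs-cache-capacity tradeoff: benefit is concave
# in capacity c, power cost is linear, so EE(c) peaks at an interior point.
# All constants and functional forms are illustrative assumptions.

import math

R0, P0, K = 100.0, 10.0, 0.5  # hypothetical rate and power constants

def energy_efficiency(c: float) -> float:
    rate = R0 * math.log(1.0 + c)  # concave benefit of a larger cache
    power = P0 + K * c             # linear energy cost of cache capacity
    return rate / power

# Ternary search over the unimodal EE curve; the paper instead proves
# convexity and derives the optimal capacity in closed form.
lo, hi = 0.0, 1000.0
for _ in range(200):
    m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
    if energy_efficiency(m1) < energy_efficiency(m2):
        lo = m1
    else:
        hi = m2
print(f"optimal capacity ≈ {lo:.1f}, EE ≈ {energy_efficiency(lo):.2f}")
```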


2015 · Vol 50 · pp. 101-113
Author(s): Dabin Kim, Sung-Won Lee, Young-Bae Ko, Jae-Hoon Kim
