Dynamic Partition of Shared Cache for Multi-Thread Application in Multi-Core System

2010 ◽  
Vol 439-440 ◽  
pp. 1587-1594
Author(s):  
Shuo Li ◽  
Feng Wu

In a chip multiprocessor with a shared cache, competing accesses from different applications degrade system performance and result in unpredictable execution times. Cache partitioning techniques can exclusively partition the shared cache among multiple competing applications. In this paper, the authors design Process Priority-based Multithread Cache Partitioning (PP-MCP), a dynamic shared-cache partitioning framework that improves the performance of multi-threaded, multi-programmed workloads. The framework includes a miss rate monitor, called the Application-oriented Miss Rate Monitor (AMRM), which dynamically collects the miss rates of multiple multi-threaded applications under different cache partitions, and a process priority-based weighted cache partitioning algorithm, which extends traditional miss-rate-oriented partitioning algorithms. The algorithm allocates cache in order of process priority, ensuring that the highest-priority process gets enough cache space, and applications with more threads tend to receive more of the shared cache, improving overall system performance. Experiments show that PP-MCP delivers better IPC throughput and weighted speedup: for multi-threaded, multi-programmed scientific computing workloads, PP-MCP-1 improves throughput over PP-MCP-0 by up to 20% and by 10% on average.
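
The priority-and-thread-count weighting described above can be illustrated with a minimal sketch. The proportional rule (priority times thread count), the 16-way cache, and all identifiers below are assumptions made for illustration only; the paper's actual algorithm is additionally driven by the AMRM miss-rate measurements and may allocate differently.

```c
/* Sketch of a priority-weighted split of shared-cache ways, in the
 * spirit of PP-MCP. Assumes NUM_WAYS is at least the number of apps. */
#include <stdio.h>

#define NUM_WAYS 16   /* associativity of the shared last-level cache */

struct app {
    const char *name;
    int priority;     /* larger value = higher process priority */
    int threads;      /* number of threads in the application */
};

/* Allocate ways proportionally to priority * thread count, guaranteeing
 * each application at least one way; leftover ways from integer division
 * go to the highest-priority application. */
static void partition_ways(const struct app apps[], int n, int ways_out[])
{
    int total_weight = 0, assigned = 0, hi = 0;

    for (int i = 0; i < n; i++)
        total_weight += apps[i].priority * apps[i].threads;

    for (int i = 0; i < n; i++) {
        int w = apps[i].priority * apps[i].threads;
        ways_out[i] = (NUM_WAYS * w) / total_weight;
        if (ways_out[i] < 1)
            ways_out[i] = 1;      /* every application keeps one way */
        assigned += ways_out[i];
        if (apps[i].priority > apps[hi].priority)
            hi = i;
    }
    ways_out[hi] += NUM_WAYS - assigned;  /* hand leftovers to top priority */
}

int main(void)
{
    const struct app apps[] = {
        { "A (hi-prio, 4 threads)",  3, 4 },
        { "B (mid-prio, 2 threads)", 2, 2 },
        { "C (lo-prio, 1 thread)",   1, 1 },
    };
    int ways[3];
    partition_ways(apps, 3, ways);
    for (int i = 0; i < 3; i++)
        printf("%s -> %d ways\n", apps[i].name, ways[i]);
    return 0;
}
```

For these example inputs the weights are 12, 4, and 1 out of 17, so the 16 ways split as 12/3/1: the lowest-priority application is held at its one-way minimum, and the leftover way from integer division goes to the highest-priority application.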

2010 ◽  
Vol 439-440 ◽  
pp. 1223-1229
Author(s):  
Shuo Li ◽  
Gao Chao Xu ◽  
Yu Shuang Dong ◽  
Feng Wu

With the development of microelectronics technology, chip multiprocessor (CMP), or multi-core, design has become the mainstream choice for major microprocessor vendors. In a chip multiprocessor with a shared cache, however, competing accesses from different applications degrade system performance, resulting in suboptimal performance and unpredictable execution times. Cache partitioning techniques can exclusively partition the shared cache among multiple competing applications. In this paper, we first introduce the problems caused by cache pollution in multicore processors; we then survey the different cache partitioning methods for multicore processors, categorizing them by the metrics they optimize; finally, we discuss possible directions for future research in the area.


Electronics ◽  
2018 ◽  
Vol 7 (9) ◽  
pp. 172 ◽  
Author(s):  
Kai Huang ◽  
Ke Wang ◽  
Dandan Zheng ◽  
Xiaoxu Zhang ◽  
Xiaolang Yan

Cache partitioning is a successful technique for saving energy in a shared cache, and existing studies focus on multi-program workloads running on multicore systems. In this paper, we are motivated by the fact that a multi-thread application generally executes faster than its single-thread counterpart and that its cache access behavior is quite different. Based on this observation, we study applications running in multi-thread mode and classify the data of multi-thread applications into shared and private categories, which reduces the interference between shared and private data and enables a more efficient cache partitioning scheme. We also propose a hardware structure to support these operations. We then propose an access-adaptive and thread-aware cache partitioning (ATCP) scheme, which assigns separate cache portions to shared and private data to avoid evictions caused by conflicts between the two data categories in the shared cache. ATCP achieves lower energy consumption while improving application performance compared with the least-recently-used (LRU) managed, core-based evenly partitioning (EVEN), and utility-based cache partitioning (UCP) schemes. The experimental results show that ATCP achieves 29.6% and 19.9% average energy savings over the LRU and UCP schemes, respectively, in a quad-core system. Moreover, the average speedup of multi-thread ATCP with respect to single-thread LRU is 1.89.
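
A minimal sketch of the shared/private classification that ATCP relies on: each block remembers which cores have touched it and is promoted to the shared category on its second distinct accessor, after which it competes only for the ways reserved for shared data. The way split, mask values, and field names below are illustrative assumptions, not the paper's hardware design.

```c
/* Sketch of ATCP-style shared/private block classification with
 * disjoint way masks for the two categories (assumed 16-way cache). */
#include <stdint.h>
#include <stdio.h>

enum category { PRIVATE = 0, SHARED = 1 };

struct block_meta {
    uint8_t accessor_mask;   /* bit i set => core i has accessed the block */
    enum category cat;
};

/* Assumed split: ways 0-11 reserved for private data, ways 12-15 for
 * shared data. Victim selection is restricted to the category's ways. */
static const uint16_t way_mask[2] = { 0x0FFF, 0xF000 };

/* Record an access by `core` and reclassify the block if needed. */
static enum category on_access(struct block_meta *b, int core)
{
    b->accessor_mask |= (uint8_t)(1u << core);
    /* More than one bit set => touched by multiple cores => shared. */
    if (b->accessor_mask & (b->accessor_mask - 1))
        b->cat = SHARED;
    return b->cat;
}

int main(void)
{
    struct block_meta b = { 0, PRIVATE };
    on_access(&b, 0);                    /* first touch by core 0: private */
    enum category c = on_access(&b, 2);  /* touch by core 2: now shared   */
    printf("category=%s, eligible ways=0x%04x\n",
           c == SHARED ? "shared" : "private", (unsigned)way_mask[c]);
    return 0;
}
```

Keeping the two categories in disjoint way sets is what prevents a burst of private fills from one core from evicting hot shared data, which is exactly the cross-category conflict the abstract describes ATCP avoiding.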


2005 ◽  
Vol 36 (9) ◽  
pp. 1-13
Author(s):  
Takahiro Sasaki ◽  
Tomohiro Inoue ◽  
Nobuhiko Omori ◽  
Tetsuo Hironaka ◽  
Hans J. Mattausch ◽  
...  

2018 ◽  
Vol 27 (07) ◽  
pp. 1850114
Author(s):  
Cheng Qian ◽  
Libo Huang ◽  
Qi Yu ◽  
Zhiying Wang

Hardware prefetching has always been a crucial mechanism for improving processor performance. However, an efficient prefetch operation requires high prefetch accuracy; otherwise, it may degrade system performance. Prior studies propose an adaptive priority-controlling method to make better use of prefetch accesses, which improves performance in two-level cache systems. However, this method does not perform well in more complex memory hierarchies, such as three-level cache systems, so it is still necessary to explore prefetch efficiency in complex hierarchical memory systems. In this paper, we propose a composite hierarchy-aware method called CHAM, which works at the middle-level cache (MLC). Using prefetch accuracy as the evaluation criterion, CHAM improves the efficiency of prefetch accesses through (1) a dynamic adaptive prefetch control mechanism that schedules the priority and data transfer of prefetch accesses across the cache hierarchy at runtime and (2) a prefetch-efficiency-oriented hybrid cache replacement policy that selects the most suitable policy. To demonstrate its effectiveness, we performed extensive experiments on 28 benchmarks from SPEC CPU2006 and two benchmarks from BioBench. Compared with a similar adaptive method, CHAM improves the MLC demand hit rate by 9.2% and system performance by 1.4% on average in a single-core system. On a 4-core system, CHAM improves the demand hit rate by 33.06% and system performance by 10.1% on average.
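
A minimal sketch of accuracy-driven prefetch control in the spirit of CHAM's first component: the controller periodically measures prefetch accuracy and gives prefetched lines high replacement priority (near MRU) when the prefetcher has proven accurate, or low priority (near LRU) so mispredicted lines are evicted quickly when it has not. The counters, the 50% threshold, and the insertion-position policy are assumptions for illustration; CHAM's actual mechanism additionally schedules data transfer across cache levels and switches among replacement policies.

```c
/* Sketch of an accuracy-based prefetch priority controller at the MLC. */
#include <stdio.h>

struct prefetch_stats {
    unsigned issued;   /* prefetches brought into the MLC this interval   */
    unsigned useful;   /* prefetched lines later hit by a demand access   */
};

enum insert_pos { INSERT_LRU = 0, INSERT_MRU = 1 };

/* Recompute accuracy at the end of each interval and pick an insertion
 * position: accurate streams are kept long (MRU), inaccurate streams
 * are made easy victims (LRU). The 50% threshold is an assumption. */
static enum insert_pos choose_position(const struct prefetch_stats *s)
{
    if (s->issued == 0)
        return INSERT_MRU;                 /* no evidence yet: optimistic */
    return (2 * s->useful >= s->issued) ? INSERT_MRU : INSERT_LRU;
}

int main(void)
{
    struct prefetch_stats good = { 100, 80 };   /* 80% accuracy */
    struct prefetch_stats bad  = { 100, 20 };   /* 20% accuracy */
    printf("accurate stream   -> insert at %s\n",
           choose_position(&good) == INSERT_MRU ? "MRU" : "LRU");
    printf("inaccurate stream -> insert at %s\n",
           choose_position(&bad)  == INSERT_MRU ? "MRU" : "LRU");
    return 0;
}
```

The design point is that demand hits are never sacrificed for speculative data: only prefetches whose measured usefulness justifies cache residency are allowed to displace demand-fetched lines high in the recency stack.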

