Dynamic Partition of Shared Cache for Multi-Thread Application in Multi-Core System

2010 ◽  
Vol 439-440 ◽  
pp. 1587-1594
Author(s):  
Shuo Li ◽  
Feng Wu

In a chip multiprocessor with a shared cache, competing accesses from different applications degrade system performance and result in unpredictable execution times. Cache partitioning techniques can exclusively partition the shared cache among multiple competing applications. In this paper, the authors design Process Priority-based Multithread Cache Partitioning (PP-MCP), a dynamic shared-cache partitioning framework that improves the performance of multi-threaded, multi-programmed workloads. The framework includes a miss rate monitor, called the Application-oriented Miss Rate Monitor (AMRM), which dynamically collects the miss rates of multiple multi-threaded applications under different cache partitions, and a process priority-based weighted cache partitioning algorithm, which extends traditional miss-rate-oriented partitioning algorithms. The algorithm allocates cache in order of process priority, ensuring that the highest-priority process gets enough cache space, and applications with more threads tend to receive more of the shared cache, improving overall system performance. Experiments show that PP-MCP delivers better IPC throughput and weighted speedup: for multi-threaded, multi-programmed scientific computing workloads, PP-MCP-1 improves throughput over PP-MCP-0 by up to 20% and by 10% on average.
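
The priority-and-thread-count weighting described above can be illustrated with a minimal sketch. The proportional rule (priority times thread count), the 16-way cache, and all identifiers below are assumptions made for illustration only; the paper's actual algorithm is additionally driven by the AMRM miss-rate measurements and may allocate differently.

```c
/* Sketch of a priority-weighted split of shared-cache ways, in the
 * spirit of PP-MCP. Assumes NUM_WAYS is at least the number of apps. */
#include <stdio.h>

#define NUM_WAYS 16   /* associativity of the shared last-level cache */

struct app {
    const char *name;
    int priority;     /* larger value = higher process priority */
    int threads;      /* number of threads in the application */
};

/* Allocate ways proportionally to priority * thread count, guaranteeing
 * each application at least one way; leftover ways from integer division
 * go to the highest-priority application. */
static void partition_ways(const struct app apps[], int n, int ways_out[])
{
    int total_weight = 0, assigned = 0, hi = 0;

    for (int i = 0; i < n; i++)
        total_weight += apps[i].priority * apps[i].threads;

    for (int i = 0; i < n; i++) {
        int w = apps[i].priority * apps[i].threads;
        ways_out[i] = (NUM_WAYS * w) / total_weight;
        if (ways_out[i] < 1)
            ways_out[i] = 1;      /* every application keeps one way */
        assigned += ways_out[i];
        if (apps[i].priority > apps[hi].priority)
            hi = i;
    }
    ways_out[hi] += NUM_WAYS - assigned;  /* hand leftovers to top priority */
}

int main(void)
{
    const struct app apps[] = {
        { "A (hi-prio, 4 threads)",  3, 4 },
        { "B (mid-prio, 2 threads)", 2, 2 },
        { "C (lo-prio, 1 thread)",   1, 1 },
    };
    int ways[3];
    partition_ways(apps, 3, ways);
    for (int i = 0; i < 3; i++)
        printf("%s -> %d ways\n", apps[i].name, ways[i]);
    return 0;
}
```

For these example inputs the weights are 12, 4, and 1 out of 17, so the 16 ways split as 12/3/1: the lowest-priority application is held at its one-way minimum, and the leftover way from integer division goes to the highest-priority application.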

2010 ◽  
Vol 439-440 ◽  
pp. 1223-1229
Author(s):  
Shuo Li ◽  
Gao Chao Xu ◽  
Yu Shuang Dong ◽  
Feng Wu

With the development of microelectronics technology, chip multiprocessor (CMP), or multi-core, design has become the mainstream choice for major microprocessor vendors. In a chip multiprocessor with a shared cache, however, competing accesses from different applications degrade system performance, resulting in suboptimal performance and unpredictable execution times. Cache partitioning techniques can exclusively partition the shared cache among multiple competing applications. In this paper, we first introduce the problems caused by cache pollution in multicore processors; we then survey the different cache partitioning methods for multicore processors, categorizing them by the metrics they optimize; finally, we discuss possible directions for future research in the area.


Electronics ◽  
2018 ◽  
Vol 7 (9) ◽  
pp. 172 ◽  
Author(s):  
Kai Huang ◽  
Ke Wang ◽  
Dandan Zheng ◽  
Xiaoxu Zhang ◽  
Xiaolang Yan

Cache partitioning is a successful technique for saving energy in a shared cache, and existing studies focus on multi-program workloads running on multicore systems. In this paper, we are motivated by the fact that a multi-thread application generally executes faster than its single-thread counterpart and that its cache access behavior is quite different. Based on this observation, we study applications running in multi-thread mode and classify the data of multi-thread applications into shared and private categories, which reduces the interference between shared and private data and enables a more efficient cache partitioning scheme. We also propose a hardware structure to support these operations. We then propose an access-adaptive and thread-aware cache partitioning (ATCP) scheme, which assigns separate cache portions to shared and private data to avoid evictions caused by conflicts between the two data categories in the shared cache. ATCP achieves lower energy consumption while improving application performance compared with the least-recently-used (LRU) managed, core-based evenly partitioning (EVEN), and utility-based cache partitioning (UCP) schemes. The experimental results show that ATCP achieves 29.6% and 19.9% average energy savings over the LRU and UCP schemes, respectively, in a quad-core system. Moreover, the average speedup of multi-thread ATCP with respect to single-thread LRU is 1.89.
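
A minimal sketch of the shared/private classification that ATCP relies on: each block remembers which cores have touched it and is promoted to the shared category on its second distinct accessor, after which it competes only for the ways reserved for shared data. The way split, mask values, and field names below are illustrative assumptions, not the paper's hardware design.

```c
/* Sketch of ATCP-style shared/private block classification with
 * disjoint way masks for the two categories (assumed 16-way cache). */
#include <stdint.h>
#include <stdio.h>

enum category { PRIVATE = 0, SHARED = 1 };

struct block_meta {
    uint8_t accessor_mask;   /* bit i set => core i has accessed the block */
    enum category cat;
};

/* Assumed split: ways 0-11 reserved for private data, ways 12-15 for
 * shared data. Victim selection is restricted to the category's ways. */
static const uint16_t way_mask[2] = { 0x0FFF, 0xF000 };

/* Record an access by `core` and reclassify the block if needed. */
static enum category on_access(struct block_meta *b, int core)
{
    b->accessor_mask |= (uint8_t)(1u << core);
    /* More than one bit set => touched by multiple cores => shared. */
    if (b->accessor_mask & (b->accessor_mask - 1))
        b->cat = SHARED;
    return b->cat;
}

int main(void)
{
    struct block_meta b = { 0, PRIVATE };
    on_access(&b, 0);                    /* first touch by core 0: private */
    enum category c = on_access(&b, 2);  /* touch by core 2: now shared   */
    printf("category=%s, eligible ways=0x%04x\n",
           c == SHARED ? "shared" : "private", (unsigned)way_mask[c]);
    return 0;
}
```

Keeping the two categories in disjoint way sets is what prevents a burst of private fills from one core from evicting hot shared data, which is exactly the cross-category conflict the abstract describes ATCP avoiding.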


2005 ◽  
Vol 36 (9) ◽  
pp. 1-13
Author(s):  
Takahiro Sasaki ◽  
Tomohiro Inoue ◽  
Nobuhiko Omori ◽  
Tetsuo Hironaka ◽  
Hans J. Mattausch ◽  
...  

2018 ◽  
Vol 27 (07) ◽  
pp. 1850114
Author(s):  
Cheng Qian ◽  
Libo Huang ◽  
Qi Yu ◽  
Zhiying Wang

Hardware prefetching has always been a crucial mechanism for improving processor performance. However, an efficient prefetch operation requires high prefetch accuracy; otherwise, it may degrade system performance. Prior studies propose an adaptive priority-controlling method to make better use of prefetch accesses, which improves performance in two-level cache systems. However, this method does not perform well in more complex memory hierarchies, such as three-level cache systems, so it is still necessary to explore prefetch efficiency in complex hierarchical memory systems. In this paper, we propose a composite hierarchy-aware method called CHAM, which works at the middle-level cache (MLC). Using prefetch accuracy as the evaluation criterion, CHAM improves the efficiency of prefetch accesses through (1) a dynamic adaptive prefetch control mechanism that schedules the priority and data transfer of prefetch accesses across the cache hierarchy at runtime and (2) a prefetch-efficiency-oriented hybrid cache replacement policy that selects the most suitable policy. To demonstrate its effectiveness, we performed extensive experiments on 28 benchmarks from SPEC CPU2006 and two benchmarks from BioBench. Compared with a similar adaptive method, CHAM improves the MLC demand hit rate by 9.2% and system performance by 1.4% on average in a single-core system. On a 4-core system, CHAM improves the demand hit rate by 33.06% and system performance by 10.1% on average.
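
A minimal sketch of accuracy-driven prefetch control in the spirit of CHAM's first component: the controller periodically measures prefetch accuracy and gives prefetched lines high replacement priority (near MRU) when the prefetcher has proven accurate, or low priority (near LRU) so mispredicted lines are evicted quickly when it has not. The counters, the 50% threshold, and the insertion-position policy are assumptions for illustration; CHAM's actual mechanism additionally schedules data transfer across cache levels and switches among replacement policies.

```c
/* Sketch of an accuracy-based prefetch priority controller at the MLC. */
#include <stdio.h>

struct prefetch_stats {
    unsigned issued;   /* prefetches brought into the MLC this interval   */
    unsigned useful;   /* prefetched lines later hit by a demand access   */
};

enum insert_pos { INSERT_LRU = 0, INSERT_MRU = 1 };

/* Recompute accuracy at the end of each interval and pick an insertion
 * position: accurate streams are kept long (MRU), inaccurate streams
 * are made easy victims (LRU). The 50% threshold is an assumption. */
static enum insert_pos choose_position(const struct prefetch_stats *s)
{
    if (s->issued == 0)
        return INSERT_MRU;                 /* no evidence yet: optimistic */
    return (2 * s->useful >= s->issued) ? INSERT_MRU : INSERT_LRU;
}

int main(void)
{
    struct prefetch_stats good = { 100, 80 };   /* 80% accuracy */
    struct prefetch_stats bad  = { 100, 20 };   /* 20% accuracy */
    printf("accurate stream   -> insert at %s\n",
           choose_position(&good) == INSERT_MRU ? "MRU" : "LRU");
    printf("inaccurate stream -> insert at %s\n",
           choose_position(&bad)  == INSERT_MRU ? "MRU" : "LRU");
    return 0;
}
```

The design point is that demand hits are never sacrificed for speculative data: only prefetches whose measured usefulness justifies cache residency are allowed to displace demand-fetched lines high in the recency stack.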

