HIGH LATENCY AND CONTENTION ON SHARED L2-CACHE FOR MANY-CORE ARCHITECTURES

2011 ◽  
Vol 21 (01) ◽  
pp. 85-106 ◽  
Author(s):  
MARCO A. Z. ALVES ◽  
HENRIQUE C. FREITAS ◽  
PHILIPPE O. A. NAVAUX

Several studies point out the benefits of a shared L2 cache, but other properties of shared caches must also be considered to reach a thorough understanding of all chip multiprocessor (CMP) bottlenecks. Our paper evaluates and explains shared-cache bottlenecks, which are increasingly important given the rise of many-core processors. The results of our simulations with 32 cores show low performance when the L2 cache is shared between 2 or 4 cores. In both cases, increased L2 cache latency and contention are the main causes of the longer execution time.
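As a rough illustration of why latency grows when more cores share an L2 bank, the following Python sketch models one shared bank as a single-server queue: requests from all sharers serialize on the bank, so queueing delay rises with the number of sharers. The base latency and request rate are assumptions for illustration, not figures from the paper.

```python
# Toy model: cores sharing one L2 bank queue up behind each other,
# so the effective access latency grows with the number of sharers.
import random

BASE_L2_LATENCY = 10        # cycles for an uncontended L2 hit (assumed)
ACCESSES_PER_CORE = 10_000  # requests issued by each core (assumed)

def average_latency(sharers: int, seed: int = 0) -> float:
    """Simulate one L2 bank serving `sharers` cores, one request at a time."""
    rng = random.Random(seed)
    # Merge the request streams of all sharers into one arrival sequence.
    arrivals = sorted(rng.randrange(ACCESSES_PER_CORE * 50)
                      for _ in range(ACCESSES_PER_CORE * sharers))
    bank_free_at = 0      # cycle at which the bank finishes its current request
    total_latency = 0
    for t in arrivals:
        start = max(t, bank_free_at)          # queue up if the bank is busy
        bank_free_at = start + BASE_L2_LATENCY
        total_latency += bank_free_at - t     # queueing delay + service time
    return total_latency / len(arrivals)

for n in (1, 2, 4):
    print(f"{n} core(s) per L2 bank -> avg latency {average_latency(n):.1f} cycles")
```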

2019 ◽  
Vol 98 ◽  
pp. 424-433 ◽  
Author(s):  
Pengfei Yang ◽  
Quan Wang ◽  
Hongwei Ye ◽  
Zhiqiang Zhang

Author(s):  
Lavanya Dhanesh ◽  
P. Murugesan

Scheduling tasks to meet real-time requirements is a major issue in heterogeneous multicore systems for micro-grid power management. A heterogeneous multicore processor schedules serial tasks on the high-performance core, while parallel tasks are executed on the low-performance cores. The aim of this paper is to implement a fuzzy-logic-based scheduling algorithm for a heterogeneous multicore processor targeting effective micro-grid applications. Real-time tasks generally differ in execution time and deadline. The main idea is a two-stage fuzzy-logic scheduling algorithm: first, priorities are assigned based on each task's execution time and deadline; second, the tasks assigned higher priority are allotted to the high-performance core, while the remaining low-priority tasks are allotted to the low-performance cores. The objective of this scheduling algorithm is to increase throughput and improve CPU utilization, thereby reducing the overall power consumption of the micro-grid power management system. Test cases with different task execution times and deadlines were generated to evaluate the algorithms using MATLAB.
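A minimal Python sketch of the two-stage idea follows: a fuzzy score ranks tasks by execution time and deadline, then the highest-scoring tasks are mapped to the fast core. The membership functions, rule base, and thresholds are invented for illustration and are not the authors' actual fuzzy rules.

```python
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    exec_time: float   # estimated execution time
    deadline: float    # relative deadline

def tight_deadline(deadline: float, horizon: float = 100.0) -> float:
    """Membership of 'deadline is tight' in [0, 1] (triangular, assumed)."""
    return max(0.0, min(1.0, 1.0 - deadline / horizon))

def long_exec(exec_time: float, horizon: float = 50.0) -> float:
    """Membership of 'execution time is long' in [0, 1] (assumed)."""
    return max(0.0, min(1.0, exec_time / horizon))

def fuzzy_priority(t: Task) -> float:
    # Rule 1: tight deadline AND long execution -> high priority (min models AND)
    r1 = min(tight_deadline(t.deadline), long_exec(t.exec_time))
    # Rule 2: a tight deadline alone still raises priority somewhat
    r2 = 0.5 * tight_deadline(t.deadline)
    return max(r1, r2)   # max models OR over the rule outputs

def assign(tasks, fast_slots: int = 1):
    """Stage 2: highest-priority tasks go to the high-performance core."""
    ranked = sorted(tasks, key=fuzzy_priority, reverse=True)
    return {"fast_core": ranked[:fast_slots], "slow_cores": ranked[fast_slots:]}

tasks = [Task("t1", 40, 30), Task("t2", 5, 90), Task("t3", 25, 20)]
for core, group in assign(tasks).items():
    print(core, [t.name for t in group])
```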


2016 ◽  
Vol 6 (6) ◽  
pp. 1241-1244 ◽  
Author(s):  
M. Faridi Masouleh ◽  
M. A. Afshar Kazemi ◽  
M. Alborzi ◽  
A. Toloie Eshlaghy

Extraction, Transformation and Loading (ETL) is one of the notable subjects in the optimization, management, improvement and acceleration of processes and operations in databases and data warehouses. Creating ETL processes is potentially one of the greatest tasks in data warehousing, and so their production is a time-consuming and complicated procedure. Without optimization of these processes, implementing data warehouse projects is costly, complicated and time-consuming. The present paper combines parallelization methods with shared cache memory in distributed data-warehouse systems. According to the conducted assessment, the proposed method achieved a 7.1% speed improvement over the Kettle optimization tool and 7.9% over the Talend tool in terms of ETL process execution time. Therefore, parallelization can notably improve the ETL process, ultimately allowing big-data management and integration processes to be implemented simply and at acceptable speed.
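To make the combination of parallelization and a shared cache concrete, here is a minimal Python sketch that fans the transform stage of an ETL pipeline out across worker processes while sharing a memoization cache between them. The records, transform, and cache policy are illustrative assumptions, not the paper's Kettle/Talend setup.

```python
# Parallel transform stage with a process-shared lookup cache:
# each expensive lookup is computed once and reused by all workers.
from multiprocessing import Pool, Manager

def transform(args):
    record, cache = args
    key = record["country"]
    if key not in cache:              # shared cache: compute once, reuse
        cache[key] = key.upper()      # (benign race: recompute is idempotent)
    return {**record, "country": cache[key]}

if __name__ == "__main__":
    rows = [{"id": i, "country": "br" if i % 2 else "fr"} for i in range(8)]
    with Manager() as mgr:
        cache = mgr.dict()            # cache visible to every worker process
        with Pool(4) as pool:
            loaded = pool.map(transform, [(r, cache) for r in rows])
    print(loaded)
```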


2005 ◽  
Vol 36 (9) ◽  
pp. 1-13 ◽
Author(s):  
Takahiro Sasaki ◽  
Tomohiro Inoue ◽  
Nobuhiko Omori ◽  
Tetsuo Hironaka ◽  
Hans J. Mattausch ◽  
...  

2005 ◽  
Vol 14 (03) ◽  
pp. 605-617 ◽  
Author(s):  
SUNG WOO CHUNG ◽  
HYONG-SHIK KIM ◽  
CHU SHIK JHON

In scalable CC-NUMA multiprocessors, it is crucial to reduce the average memory access time. For applications where the second-level (L2) cache is large enough, we propose a split L2 cache to utilize the surplus space. The split L2 cache is composed of a traditional LRU cache and an RVC (Remote Victim Cache), which stores only data from the remote memory address range. It thus reduces the average L2 cache miss time by keeping remote blocks that would otherwise be discarded. Though the split cache does not reduce miss rates, it is observed to reduce total execution time effectively, by up to 27%; it even outperforms an LRU cache of twice the size.
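The split-cache mechanism can be sketched in Python as a main LRU structure backed by a victim cache that accepts only blocks evicted from the remote address range, so a later remote access can hit locally instead of crossing the interconnect. The sizes and the local/remote address boundary below are assumptions for illustration.

```python
from collections import OrderedDict

LOCAL_LIMIT = 0x1000          # addresses below this are "local" (assumed)

class SplitL2:
    def __init__(self, main_size=4, rvc_size=4):
        self.main = OrderedDict()   # traditional LRU portion
        self.rvc = OrderedDict()    # victim cache for remote blocks only
        self.main_size, self.rvc_size = main_size, rvc_size

    def access(self, addr):
        if addr in self.main:
            self.main.move_to_end(addr)      # LRU hit: refresh recency
            return "hit:main"
        if addr in self.rvc:
            self.rvc.pop(addr)               # RVC hit: promote back to main
            self._fill(addr)
            return "hit:rvc"
        self._fill(addr)                     # miss: fetch from memory
        return "miss"

    def _fill(self, addr):
        self.main[addr] = True
        if len(self.main) > self.main_size:
            victim, _ = self.main.popitem(last=False)   # evict LRU block
            if victim >= LOCAL_LIMIT:                   # remote block: keep it
                self.rvc[victim] = True
                if len(self.rvc) > self.rvc_size:
                    self.rvc.popitem(last=False)

l2 = SplitL2()
for a in [0x2000, 0x2040, 0x10, 0x20, 0x30, 0x40, 0x2000]:
    print(hex(a), l2.access(a))   # the final 0x2000 access hits in the RVC
```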


2016 ◽  
Vol 25 (06) ◽  
pp. 1650062 ◽  
Author(s):  
Gang Chen ◽  
Kai Huang ◽  
Long Cheng ◽  
Biao Hu ◽  
Alois Knoll

Shared-cache interference in multi-core architectures has been recognized as one of the major factors that degrade the predictability of a mixed-critical real-time system. Because of this unpredictable interference, the behavior of the shared cache is hard to predict and analyze statically on multi-core architectures executing mixed-critical tasks, which not only makes it difficult to estimate the worst-case execution time (WCET) but also introduces significant worst-case timing penalties for critical tasks. Cache management in mixed-critical multi-core systems has therefore become a challenging task. In this paper, we present a dynamically partitioned cache memory for mixed-critical real-time multi-core systems. In this architecture, critical tasks can dynamically allocate and release cache resources during their execution intervals according to the real-time workload. This dynamic partitioning provides, on the one hand, predictable cache performance for critical tasks; on the other hand, the released cache can be dynamically used by non-critical tasks to improve their average performance. We demonstrate and prototype our system design on an embedded FPGA platform. Measurements from the prototype clearly demonstrate the benefits of the dynamically partitioned cache for mixed-critical real-time multi-core systems.
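A minimal Python sketch of the dynamic way-partitioning idea: a critical task reserves cache ways for the duration of a phase (giving it exclusive, predictable capacity) and releases them afterwards, at which point non-critical tasks may use the freed ways. The way counts and the allocate/release API are illustrative assumptions, not the paper's FPGA design.

```python
TOTAL_WAYS = 8   # ways in the shared cache (assumed)

class WayPartitioner:
    def __init__(self):
        self.reserved = {}                 # task -> number of reserved ways

    def allocate(self, task: str, ways: int) -> bool:
        """Reserve `ways` exclusively for a critical task, if available."""
        if ways <= self.free_ways():
            self.reserved[task] = self.reserved.get(task, 0) + ways
            return True                    # exclusive ways -> predictable hits
        return False

    def release(self, task: str) -> None:
        self.reserved.pop(task, None)      # freed ways become shareable again

    def free_ways(self) -> int:
        """Ways currently usable by non-critical (best-effort) tasks."""
        return TOTAL_WAYS - sum(self.reserved.values())

p = WayPartitioner()
p.allocate("critical_A", 4)                 # critical phase begins
print("non-critical ways:", p.free_ways())  # 4 left for best-effort tasks
p.release("critical_A")                     # critical phase over
print("non-critical ways:", p.free_ways())  # all 8 available again
```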

