HIGH LATENCY AND CONTENTION ON SHARED L2-CACHE FOR MANY-CORE ARCHITECTURES

2011 ◽  
Vol 21 (01) ◽  
pp. 85-106 ◽  
Author(s):  
MARCO A. Z. ALVES ◽  
HENRIQUE C. FREITAS ◽  
PHILIPPE O. A. NAVAUX

Several studies point out the benefits of a shared L2 cache, but other properties of shared caches must also be considered to reach a thorough understanding of all chip multiprocessor (CMP) bottlenecks. Our paper evaluates and explains shared-cache bottlenecks, which are increasingly important given the rise of many-core processors. The results of our simulations with 32 cores show low performance when the L2 cache is shared between 2 or 4 cores. In both cases, increased L2 cache latency and contention are the main causes of the longer execution time.
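As a rough illustration of why latency grows when more cores share an L2 bank, the following Python sketch models one shared bank as a single-server queue: requests from all sharers serialize on the bank, so queueing delay rises with the number of sharers. The base latency and request rate are assumptions for illustration, not figures from the paper.

```python
# Toy model: cores sharing one L2 bank queue up behind each other,
# so the effective access latency grows with the number of sharers.
import random

BASE_L2_LATENCY = 10        # cycles for an uncontended L2 hit (assumed)
ACCESSES_PER_CORE = 10_000  # requests issued by each core (assumed)

def average_latency(sharers: int, seed: int = 0) -> float:
    """Simulate one L2 bank serving `sharers` cores, one request at a time."""
    rng = random.Random(seed)
    # Merge the request streams of all sharers into one arrival sequence.
    arrivals = sorted(rng.randrange(ACCESSES_PER_CORE * 50)
                      for _ in range(ACCESSES_PER_CORE * sharers))
    bank_free_at = 0      # cycle at which the bank finishes its current request
    total_latency = 0
    for t in arrivals:
        start = max(t, bank_free_at)          # queue up if the bank is busy
        bank_free_at = start + BASE_L2_LATENCY
        total_latency += bank_free_at - t     # queueing delay + service time
    return total_latency / len(arrivals)

for n in (1, 2, 4):
    print(f"{n} core(s) per L2 bank -> avg latency {average_latency(n):.1f} cycles")
```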

2019 ◽  
Vol 98 ◽  
pp. 424-433 ◽  
Author(s):  
Pengfei Yang ◽  
Quan Wang ◽  
Hongwei Ye ◽  
Zhiqiang Zhang

Author(s):  
Lavanya Dhanesh ◽  
P. Murugesan

Scheduling tasks to meet real-time requirements is a major issue in heterogeneous multicore systems for micro-grid power management. A heterogeneous multicore processor schedules serial tasks on the high-performance core, while parallel tasks are executed on the low-performance cores. The aim of this paper is to implement a fuzzy-logic-based scheduling algorithm for a heterogeneous multicore processor targeting effective micro-grid applications. Real-time tasks generally differ in execution time and deadline. The main idea is a two-stage fuzzy-logic scheduling algorithm: first, priorities are assigned based on each task's execution time and deadline; second, the tasks assigned higher priority are allotted to the high-performance core, while the remaining low-priority tasks are allotted to the low-performance cores. The objective of this scheduling algorithm is to increase throughput and improve CPU utilization, thereby reducing the overall power consumption of the micro-grid power management system. Test cases with different task execution times and deadlines were generated to evaluate the algorithms using MATLAB.
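A minimal Python sketch of the two-stage idea follows: a fuzzy score ranks tasks by execution time and deadline, then the highest-scoring tasks are mapped to the fast core. The membership functions, rule base, and thresholds are invented for illustration and are not the authors' actual fuzzy rules.

```python
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    exec_time: float   # estimated execution time
    deadline: float    # relative deadline

def tight_deadline(deadline: float, horizon: float = 100.0) -> float:
    """Membership of 'deadline is tight' in [0, 1] (triangular, assumed)."""
    return max(0.0, min(1.0, 1.0 - deadline / horizon))

def long_exec(exec_time: float, horizon: float = 50.0) -> float:
    """Membership of 'execution time is long' in [0, 1] (assumed)."""
    return max(0.0, min(1.0, exec_time / horizon))

def fuzzy_priority(t: Task) -> float:
    # Rule 1: tight deadline AND long execution -> high priority (min models AND)
    r1 = min(tight_deadline(t.deadline), long_exec(t.exec_time))
    # Rule 2: a tight deadline alone still raises priority somewhat
    r2 = 0.5 * tight_deadline(t.deadline)
    return max(r1, r2)   # max models OR over the rule outputs

def assign(tasks, fast_slots: int = 1):
    """Stage 2: highest-priority tasks go to the high-performance core."""
    ranked = sorted(tasks, key=fuzzy_priority, reverse=True)
    return {"fast_core": ranked[:fast_slots], "slow_cores": ranked[fast_slots:]}

tasks = [Task("t1", 40, 30), Task("t2", 5, 90), Task("t3", 25, 20)]
for core, group in assign(tasks).items():
    print(core, [t.name for t in group])
```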


2016 ◽  
Vol 6 (6) ◽  
pp. 1241-1244 ◽  
Author(s):  
M. Faridi Masouleh ◽  
M. A. Afshar Kazemi ◽  
M. Alborzi ◽  
A. Toloie Eshlaghy

Extraction, Transformation and Loading (ETL) is one of the notable subjects in the optimization, management, improvement and acceleration of processes and operations in databases and data warehouses. Creating ETL processes is potentially one of the greatest tasks in data warehousing, and so their production is a time-consuming and complicated procedure. Without optimization of these processes, implementing data warehouse projects is costly, complicated and time-consuming. The present paper combines parallelization methods with shared cache memory in distributed data-warehouse systems. According to the conducted assessment, the proposed method achieved a 7.1% speed improvement over the Kettle optimization tool and 7.9% over the Talend tool in terms of ETL process execution time. Therefore, parallelization can notably improve the ETL process, ultimately allowing big-data management and integration processes to be implemented simply and at acceptable speed.
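To make the combination of parallelization and a shared cache concrete, here is a minimal Python sketch that fans the transform stage of an ETL pipeline out across worker processes while sharing a memoization cache between them. The records, transform, and cache policy are illustrative assumptions, not the paper's Kettle/Talend setup.

```python
# Parallel transform stage with a process-shared lookup cache:
# each expensive lookup is computed once and reused by all workers.
from multiprocessing import Pool, Manager

def transform(args):
    record, cache = args
    key = record["country"]
    if key not in cache:              # shared cache: compute once, reuse
        cache[key] = key.upper()      # (benign race: recompute is idempotent)
    return {**record, "country": cache[key]}

if __name__ == "__main__":
    rows = [{"id": i, "country": "br" if i % 2 else "fr"} for i in range(8)]
    with Manager() as mgr:
        cache = mgr.dict()            # cache visible to every worker process
        with Pool(4) as pool:
            loaded = pool.map(transform, [(r, cache) for r in rows])
    print(loaded)
```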


2005 ◽  
Vol 36 (9) ◽  
pp. 1-13 ◽
Author(s):  
Takahiro Sasaki ◽  
Tomohiro Inoue ◽  
Nobuhiko Omori ◽  
Tetsuo Hironaka ◽  
Hans J. Mattausch ◽  
...  

2005 ◽  
Vol 14 (03) ◽  
pp. 605-617 ◽  
Author(s):  
SUNG WOO CHUNG ◽  
HYONG-SHIK KIM ◽  
CHU SHIK JHON

In scalable CC-NUMA multiprocessors, it is crucial to reduce the average memory access time. For applications where the second-level (L2) cache is large enough, we propose a split L2 cache to utilize the surplus space. The split L2 cache is composed of a traditional LRU cache and an RVC (Remote Victim Cache), which stores only data from the remote memory address range. It thus reduces the average L2 cache miss time by keeping remote blocks that would otherwise be discarded. Though the split cache does not reduce miss rates, it is observed to reduce total execution time effectively, by up to 27%; it even outperforms an LRU cache of twice the size.
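The split-cache mechanism can be sketched in Python as a main LRU structure backed by a victim cache that accepts only blocks evicted from the remote address range, so a later remote access can hit locally instead of crossing the interconnect. The sizes and the local/remote address boundary below are assumptions for illustration.

```python
from collections import OrderedDict

LOCAL_LIMIT = 0x1000          # addresses below this are "local" (assumed)

class SplitL2:
    def __init__(self, main_size=4, rvc_size=4):
        self.main = OrderedDict()   # traditional LRU portion
        self.rvc = OrderedDict()    # victim cache for remote blocks only
        self.main_size, self.rvc_size = main_size, rvc_size

    def access(self, addr):
        if addr in self.main:
            self.main.move_to_end(addr)      # LRU hit: refresh recency
            return "hit:main"
        if addr in self.rvc:
            self.rvc.pop(addr)               # RVC hit: promote back to main
            self._fill(addr)
            return "hit:rvc"
        self._fill(addr)                     # miss: fetch from memory
        return "miss"

    def _fill(self, addr):
        self.main[addr] = True
        if len(self.main) > self.main_size:
            victim, _ = self.main.popitem(last=False)   # evict LRU block
            if victim >= LOCAL_LIMIT:                   # remote block: keep it
                self.rvc[victim] = True
                if len(self.rvc) > self.rvc_size:
                    self.rvc.popitem(last=False)

l2 = SplitL2()
for a in [0x2000, 0x2040, 0x10, 0x20, 0x30, 0x40, 0x2000]:
    print(hex(a), l2.access(a))   # the final 0x2000 access hits in the RVC
```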


2016 ◽  
Vol 25 (06) ◽  
pp. 1650062 ◽  
Author(s):  
Gang Chen ◽  
Kai Huang ◽  
Long Cheng ◽  
Biao Hu ◽  
Alois Knoll

Shared-cache interference in multi-core architectures has been recognized as one of the major factors that degrade the predictability of a mixed-critical real-time system. Because of this unpredictable interference, the behavior of the shared cache is hard to predict and analyze statically on multi-core architectures executing mixed-critical tasks, which not only makes it difficult to estimate the worst-case execution time (WCET) but also introduces significant worst-case timing penalties for critical tasks. Cache management in mixed-critical multi-core systems has therefore become a challenging task. In this paper, we present a dynamically partitioned cache memory for mixed-critical real-time multi-core systems. In this architecture, critical tasks can dynamically allocate and release cache resources during their execution intervals according to the real-time workload. This dynamic partitioning provides, on the one hand, predictable cache performance for critical tasks; on the other hand, the released cache can be dynamically used by non-critical tasks to improve their average performance. We demonstrate and prototype our system design on an embedded FPGA platform. Measurements from the prototype clearly demonstrate the benefits of the dynamically partitioned cache for mixed-critical real-time multi-core systems.
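A minimal Python sketch of the dynamic way-partitioning idea: a critical task reserves cache ways for the duration of a phase (giving it exclusive, predictable capacity) and releases them afterwards, at which point non-critical tasks may use the freed ways. The way counts and the allocate/release API are illustrative assumptions, not the paper's FPGA design.

```python
TOTAL_WAYS = 8   # ways in the shared cache (assumed)

class WayPartitioner:
    def __init__(self):
        self.reserved = {}                 # task -> number of reserved ways

    def allocate(self, task: str, ways: int) -> bool:
        """Reserve `ways` exclusively for a critical task, if available."""
        if ways <= self.free_ways():
            self.reserved[task] = self.reserved.get(task, 0) + ways
            return True                    # exclusive ways -> predictable hits
        return False

    def release(self, task: str) -> None:
        self.reserved.pop(task, None)      # freed ways become shareable again

    def free_ways(self) -> int:
        """Ways currently usable by non-critical (best-effort) tasks."""
        return TOTAL_WAYS - sum(self.reserved.values())

p = WayPartitioner()
p.allocate("critical_A", 4)                 # critical phase begins
print("non-critical ways:", p.free_ways())  # 4 left for best-effort tasks
p.release("critical_A")                     # critical phase over
print("non-critical ways:", p.free_ways())  # all 8 available again
```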

