Novel prioritized LRU circuits for shared cache in computer systems

2020 ◽  
Vol 34 (23) ◽  
pp. 2050242
Author(s):  
Yao Wang ◽  
Lijun Sun ◽  
Haibo Wang ◽  
Lavanya Gopalakrishnan ◽  
Ronald Eaton

Cache sharing is critical in multi-core and multi-threading systems. However, it potentially delays the execution of real-time applications and makes predicting their worst-case execution time (WCET) more challenging. Prioritized caches have been demonstrated as a promising approach to address this challenge. Instead of the conventional prioritized cache schemes realized at the architecture level using cache controllers, this work presents two prioritized least recently used (LRU) cache replacement circuits that accomplish the prioritization directly inside the cache circuits, hence significantly reducing the cache access latency. The performance, hardware, and power overheads of the proposed prioritized LRU circuits are investigated based on a 65 nm CMOS technology. The results show that the proposed circuits have very low overhead compared to conventional cache circuits. The presented techniques will lead to more effective prioritized shared cache implementations and benefit the development of high-performance real-time systems.
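As an illustration only (the paper implements this selection directly in the replacement circuitry rather than in software), the following C sketch shows one way a prioritized LRU policy could choose a victim way within a set: higher-priority lines are never evicted while a lower-priority candidate exists, and LRU age breaks ties. The structure `way_t` and function `select_victim` are hypothetical names, not taken from the paper.

```c
#include <stdint.h>

#define NUM_WAYS 8

/* Hypothetical per-way state for one cache set. */
typedef struct {
    uint8_t priority;  /* higher value = more important (e.g., real-time task) */
    uint8_t lru_age;   /* larger value = less recently used */
    uint8_t valid;
} way_t;

/* Pick the victim way: among the valid ways with the lowest priority,
 * choose the least recently used one.  Invalid ways are used first. */
static int select_victim(const way_t set[NUM_WAYS])
{
    int victim = 0;
    for (int w = 0; w < NUM_WAYS; w++) {
        if (!set[w].valid)
            return w;                       /* free way: no eviction needed */
        if (set[w].priority < set[victim].priority ||
            (set[w].priority == set[victim].priority &&
             set[w].lru_age > set[victim].lru_age))
            victim = w;
    }
    return victim;
}
```

Performing this selection in dedicated replacement circuits instead of in a cache controller is what allows the proposed design to avoid the additional access latency of controller-level prioritization.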

Author(s):  
Federico Reghenzani

Abstract The difficulties in estimating the Worst-Case Execution Time (WCET) of applications make the use of modern computing architectures limited in real-time systems. Critical embedded systems require the tasks of hard real-time applications to meet their deadlines, and formal proofs on the validity of this condition are usually required by certification authorities. In the last decade, researchers proposed the use of probabilistic measurement-based methods to estimate the WCET instead of traditional static methods. In this chapter, we summarize recent theoretical and quantitative results on the use of probabilistic approaches to estimate the WCET presented in the PhD thesis of the author, including possible exploitation scenarios, open challenges, and future directions.
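As a minimal sketch of the measurement-based idea (not the estimators studied in the thesis, which fit extreme-value models to the observations), the following C fragment reports a high empirical quantile of measured execution times as a crude probabilistic WCET figure; all names and numbers are illustrative.

```c
#include <stdio.h>
#include <stdlib.h>

/* Comparison function for qsort: ascending execution times (in cycles). */
static int cmp_u64(const void *a, const void *b)
{
    unsigned long long x = *(const unsigned long long *)a;
    unsigned long long y = *(const unsigned long long *)b;
    return (x > y) - (x < y);
}

/* Crude pWCET estimate: the empirical q-quantile of n measured
 * execution times.  Real MBPTA fits an extreme-value model instead. */
unsigned long long pwcet_quantile(unsigned long long *times, size_t n, double q)
{
    qsort(times, n, sizeof times[0], cmp_u64);
    size_t idx = (size_t)(q * (double)(n - 1));
    return times[idx];
}

int main(void)
{
    unsigned long long t[] = { 1200, 1180, 1250, 1310, 1190, 1275, 1400, 1220 };
    printf("pWCET (0.99 quantile) = %llu cycles\n",
           pwcet_quantile(t, sizeof t / sizeof t[0], 0.99));
    return 0;
}
```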


High-performance VLSI systems are essential in real-time applications. To increase their performance, approximate computing techniques are employed, in which circuit performance is improved by trading it off against a slight loss in accuracy. These approximate circuits are used in error-tolerant applications, where the output need not be exact. This paper concentrates mainly on approximate adders, as they are major building blocks of DSP systems. It presents an analysis of the Lower-part OR Adder (LOA) for 4-bit addition and compares it with a precise adder, the Ripple Carry Adder, using the Mentor Graphics tool in a 90 nm CMOS technology. Our experimental results show savings of 17%-70% in power dissipation, 4%-32% in area, and 19%-84% in delay for the approximate adder. Since LOA-2 and LOA-3 perform best, either of these adders can be selected for error-tolerant applications depending on the requirements.
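To make the LOA construction concrete, here is a minimal C sketch (a behavioural model, not the paper's gate-level 90 nm design): the lower k bits are approximated by a bitwise OR, the upper bits are added exactly, and the carry into the upper part is taken as the AND of bit k-1 of the two operands, following the usual LOA definition. The function name and the reading of "LOA-2" as a 2-bit approximate lower part are assumptions.

```c
#include <stdint.h>
#include <stdio.h>

/* Lower-part OR Adder (LOA) sketch: the lowest k bits are approximated
 * by a bitwise OR, the upper bits are added exactly, and the carry into
 * the upper part is the AND of the operands' bit (k-1). */
static uint32_t loa_add(uint32_t a, uint32_t b, unsigned k)
{
    uint32_t low_mask = (1u << k) - 1u;
    uint32_t low  = (a | b) & low_mask;                      /* approximate lower part */
    uint32_t cin  = ((a >> (k - 1)) & (b >> (k - 1))) & 1u;  /* approximate carry */
    uint32_t high = ((a >> k) + (b >> k) + cin) << k;        /* exact upper part */
    return high | low;
}

int main(void)
{
    /* 4-bit example with k = 2 (a hypothetical reading of "LOA-2"). */
    uint32_t a = 0xB, b = 0x6;          /* 11 + 6 = 17 exactly */
    printf("exact = %u, LOA-2 = %u\n", a + b, loa_add(a, b, 2));
    return 0;
}
```

The example prints 17 for the exact sum and 19 for the approximate one, showing the bounded error introduced in the lower bits in exchange for the reported power, area, and delay savings.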


2021 ◽  
Author(s):  
Jessica Junia Santillo Costa ◽  
Romulo Silva de Oliveira ◽  
Luis Fernando Arcaro

2003 ◽  
Vol 4 (4) ◽  
pp. 437-455 ◽  
Author(s):  
Jakob Engblom ◽  
Andreas Ermedahl ◽  
Mikael Sjödin ◽  
Jan Gustafsson ◽  
Hans Hansson

Author(s):  
Laurent George ◽  
Pierre Courbin

In this chapter the authors focus on the problem of reconfiguring embedded real-time systems. Such a reconfiguration can be decided either off-line, to determine whether a given application can run on a different platform while preserving the timeliness constraints imposed by the application, or on-line, where a reconfiguration is done to adapt the system to the execution context or to handle hardware or software faults. The task model considered in this chapter is the classical sporadic task model defined by a Worst-Case Execution Time (WCET), a minimum inter-arrival time (also denoted the minimum period), and a late termination deadline. The authors consider two preemptive scheduling strategies: Fixed Priority highest priority first (FP) and Earliest Deadline First (EDF). They propose a sensitivity analysis to handle reconfiguration issues. Sensitivity analysis aims at determining acceptable deviations from the specifications of a problem due to evolutions in system characteristics (reconfiguration or performance tuning). They present a state of the art for sensitivity analysis in the case of WCET, period, and deadline reconfigurations and study to what extent sensitivity analysis can be used to decide on the possibility of reconfiguring a system.
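As a simplified illustration of WCET sensitivity analysis (restricted to implicit-deadline sporadic tasks under preemptive EDF on one processor, a much narrower setting than the chapter covers), the C sketch below computes the largest uniform factor by which all WCETs can be scaled while the task set stays schedulable; for this model the bound is simply 1/U, with U the total utilization. The type and function names are illustrative.

```c
#include <stdio.h>
#include <stddef.h>

/* Sporadic task with WCET C and minimum inter-arrival time T
 * (implicit deadline D = T). */
typedef struct {
    double wcet;    /* C_i */
    double period;  /* T_i */
} task_t;

/* Largest uniform scaling factor alpha such that the task set with
 * WCETs alpha*C_i stays EDF-schedulable on one processor.  For
 * implicit deadlines, EDF is schedulable iff U = sum(C_i/T_i) <= 1,
 * so the margin is 1/U; a result below 1 means the WCETs must shrink. */
static double wcet_scaling_margin(const task_t *ts, size_t n)
{
    double u = 0.0;
    for (size_t i = 0; i < n; i++)
        u += ts[i].wcet / ts[i].period;
    return (u > 0.0) ? 1.0 / u : 0.0;
}

int main(void)
{
    task_t ts[] = { { 2.0, 10.0 }, { 3.0, 20.0 }, { 5.0, 50.0 } };
    printf("max WCET scaling factor = %.3f\n",
           wcet_scaling_margin(ts, sizeof ts / sizeof ts[0]));
    return 0;
}
```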


2010 ◽  
Vol 46 (2) ◽  
pp. 251-300 ◽  
Author(s):  
Heiko Falk ◽  
Paul Lokuciejewski

Abstract The current practice of designing software for real-time systems is tedious. There is almost no tool support that assists the designer in automatically deriving safe bounds on the worst-case execution time (WCET) of a system during code generation and in systematically optimizing code to reduce the WCET. This article presents concepts and infrastructures for WCET-aware code generation and optimization techniques for WCET reduction. Altogether, they help to obtain code explicitly optimized for its worst-case timing, to automate large parts of the real-time software design flow, and to reduce the cost of a real-time system by allowing the use of tailored hardware.


Mathematics ◽  
2020 ◽  
Vol 8 (2) ◽  
pp. 184
Author(s):  
Alba Pedro-Zapater ◽  
Clemente Rodríguez ◽  
Juan Segarra ◽  
Rubén Gran Tejero ◽  
Víctor Viñals-Yúfera

Matrix transposition is a fundamental operation, but it may present a very low and hardly predictable data cache hit ratio for large matrices. Safe (worst-case) hit ratio predictability is required in real-time systems. In this paper, we obtain the relations among the cache parameters that guarantee the ideal (predictable) data hit ratio, assuming a Least-Recently-Used (LRU) data cache. Based on our analytical assessments, we compare a tiling matrix transposition to a cache-oblivious algorithm modified with phantom padding to improve its data hit ratio. Our results show that, with an adequate tile size, the tiling version achieves an equal or better data hit ratio. We also analyze the energy consumption and execution time of matrix transposition on real hardware with pseudo-LRU (PLRU) caches. Our analytical hit/miss assessment enables the use of a data cache for matrix transposition in real-time systems, since the number of misses in the worst case is bounded. In general-purpose and high-performance computing, our analysis makes it possible to restrict the cache resources devoted to matrix transposition, with no negative impact, in order to reduce both the energy consumption and the pollution of other computations.
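As a hedged sketch of the tiling approach compared in the paper (with a placeholder tile size rather than one derived from the paper's cache-parameter relations), a blocked out-of-place transposition in C might look like this:

```c
#include <stddef.h>

/* Out-of-place tiled transposition of an n x n matrix stored row-major.
 * Each B x B tile is transposed while it is resident in the data cache,
 * so every element is touched while its cache line is still present.
 * B is a placeholder; the paper derives suitable sizes from the cache
 * parameters (line size, associativity, number of sets). */
#define B 32

void transpose_tiled(size_t n, const double *src, double *dst)
{
    for (size_t ii = 0; ii < n; ii += B)
        for (size_t jj = 0; jj < n; jj += B)
            for (size_t i = ii; i < ii + B && i < n; i++)
                for (size_t j = jj; j < jj + B && j < n; j++)
                    dst[j * n + i] = src[i * n + j];
}
```

Choosing the tile size from the cache geometry is what bounds the worst-case number of misses and, in turn, makes the hit ratio predictable.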


2014 ◽  
Vol 651-653 ◽  
pp. 624-629
Author(s):  
Liang Liang Kong ◽  
Lin Xiang Shi ◽  
Lin Chen

Most embedded systems are real-time systems, so real-time behavior is an important performance metric for embedded systems. Worst-case execution time (WCET) estimation for embedded programs satisfies the requirements of hard real-time evaluation and is therefore widely used in the evaluation of embedded systems. Based on a broad survey of the progress of WCET estimation worldwide, this paper proposes a new classification of WCET estimation techniques. After introducing the principles of WCET estimation, it describes the various techniques for estimating WCET and classifies them into two main streams, namely static and dynamic WCET estimation. Finally, it reviews the development of WCET analysis tools.


2016 ◽  
Vol 25 (06) ◽  
pp. 1650062 ◽  
Author(s):  
Gang Chen ◽  
Kai Huang ◽  
Long Cheng ◽  
Biao Hu ◽  
Alois Knoll

Shared cache interference in multi-core architectures has been recognized as one of the major factors that degrade the predictability of a mixed-critical real-time system. Due to unpredictable cache interference, the behavior of the shared cache is hard to predict and analyze statically in multi-core architectures executing mixed-critical tasks, which not only makes estimating the worst-case execution time (WCET) difficult but also introduces significant worst-case timing penalties for critical tasks. Therefore, cache management in mixed-critical multi-core systems has become a challenging task. In this paper, we present a dynamic partitioned cache memory for mixed-critical real-time multi-core systems. In this architecture, critical tasks can dynamically allocate and release cache resources during their execution interval according to the real-time workload. This dynamic partitioned cache can, on the one hand, provide predictable cache performance for critical tasks. On the other hand, the released cache can be dynamically used by non-critical tasks to improve their average performance. We demonstrate and prototype our system design on an embedded FPGA platform. Measurements from the prototype clearly demonstrate the benefits of the dynamic partitioned cache for mixed-critical real-time multi-core systems.
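The mechanism itself is implemented in hardware on an FPGA in the paper; purely as a software-level sketch of the bookkeeping it implies, the C fragment below lets a critical task reserve cache ways for an execution interval and release them afterwards, with released ways returning to a pool usable by non-critical tasks. The bitmask representation and the function names are assumptions for illustration.

```c
#include <stdint.h>

#define NUM_WAYS 16u

/* Ways currently reserved by critical tasks; the remaining ways are
 * available to non-critical tasks as shared, best-effort capacity. */
static uint32_t reserved_ways = 0;

/* Reserve 'count' cache ways for a critical task; returns a bitmask of
 * the granted ways, or 0 if not enough ways are free. */
uint32_t ways_reserve(unsigned count)
{
    uint32_t grant = 0;
    unsigned granted = 0;
    for (unsigned w = 0; w < NUM_WAYS && granted < count; w++) {
        if (!(reserved_ways & (1u << w))) {
            grant |= 1u << w;
            granted++;
        }
    }
    if (granted < count)
        return 0;               /* insufficient isolated capacity */
    reserved_ways |= grant;
    return grant;
}

/* Release previously reserved ways back to the shared pool. */
void ways_release(uint32_t grant)
{
    reserved_ways &= ~grant;
}
```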

