Bounding the execution time of parallel applications on unrelated multiprocessors

Heterogeneous multiprocessors can offer high performance at low energy expenditure. However, to use them in hard real-time systems, timing guarantees must be provided, and the main challenge is to determine the worst-case schedule length (also known as makespan) of an application. Previous works that estimate the makespan focus mainly on the independent-task application model or the related multiprocessor model, which limits the applicability of the resulting makespan. In contrast, the directed acyclic graph (DAG) application model and the unrelated multiprocessor model are general and can cover most of today’s platforms and applications. In this work, we propose a simple work-conserving method for scheduling the tasks in a DAG and two new approaches to finding the makespan. A set of representative OpenMP task-based parallel applications from the BOTS benchmark suite and synthetic DAGs are used to evaluate the proposed method. Based on the empirical results, the proposed approach calculates a makespan close to that of the exhaustive method and with low pessimism relative to a lower bound on the actual makespan.
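
To make the terms in this abstract concrete, the following is a minimal sketch of a work-conserving list scheduler for a DAG on unrelated processors, plus a simple lower bound. It is not the method proposed in the paper: it simulates a single schedule for fixed execution times, whereas the paper bounds the worst case. The function names, the tie-breaking rule (lowest task index first, fastest idle processor first) and the example data are all assumptions made for illustration.

```python
import heapq


def work_conserving_makespan(wcet, preds):
    """Simulate one work-conserving schedule and return its length.

    wcet[t][p] is the execution time of task t on processor p (unrelated
    model: times on different processors need not be proportional).
    preds[t] is the set of tasks that must finish before task t starts.
    """
    n, m = len(wcet), len(wcet[0])
    succs = [[] for _ in range(n)]
    for t in range(n):
        for p in preds[t]:
            succs[p].append(t)
    missing = [len(preds[t]) for t in range(n)]
    ready = [t for t in range(n) if missing[t] == 0]
    idle = set(range(m))
    running = []  # min-heap of (finish_time, task, processor)
    now = 0
    while ready or running:
        # Work conservation: no processor stays idle while a task is ready.
        while ready and idle:
            task = ready.pop(0)
            proc = min(idle, key=lambda p: (wcet[task][p], p))
            idle.remove(proc)
            heapq.heappush(running, (now + wcet[task][proc], task, proc))
        # Jump to the next completion and release everything finishing then.
        now = running[0][0]
        while running and running[0][0] == now:
            _, task, proc = heapq.heappop(running)
            idle.add(proc)
            for s in succs[task]:
                missing[s] -= 1
                if missing[s] == 0:
                    ready.append(s)
        ready.sort()
    return now


def makespan_lower_bound(wcet, preds):
    """Max of critical path and average load, using each task's fastest
    processor. Assumes tasks are indexed in topological order."""
    best = [min(row) for row in wcet]
    finish = []
    for t in range(len(wcet)):
        start = max((finish[p] for p in preds[t]), default=0)
        finish.append(start + best[t])
    return max(max(finish), sum(best) / len(wcet[0]))


# Hypothetical diamond-shaped DAG (0 -> 1, 2 -> 3) on two processors.
example_wcet = [[2, 3], [4, 2], [3, 6], [1, 1]]
example_preds = [set(), {0}, {0}, {1, 2}]
schedule_length = work_conserving_makespan(example_wcet, example_preds)
lower_bound = makespan_lower_bound(example_wcet, example_preds)
```

The gap between `schedule_length` and `lower_bound` gives a rough feel for the kind of pessimism the abstract refers to, although the paper's own bound and evaluation are defined differently.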

Reliability-Aware Proactive Energy Management in Hard Real-Time Systems

International Journal of Adaptive Resilient and Autonomic Systems ◽

10.4018/jaras.2010100101 ◽

2010 ◽

Vol 1 (4) ◽

pp. 1-11

Author(s):

Satyakiran Munaga ◽

Francky Catthoor

Keyword(s):

Energy Management ◽

Ad Hoc ◽

Failure Criteria ◽

Worst Case ◽

Time Control ◽

Run Time ◽

Cost Penalty ◽

Hard Real Time ◽

The Cost ◽

Time Systems

Advanced technologies such as sub-45nm CMOS and 3D integration are known to exhibit a larger number of reliability failure mechanisms, which also progress faster. Classical reliability assessment methodology, which assumes ad-hoc failure criteria and the worst case for all influencing dynamic aspects, is no longer viable in these technologies. In this paper, the authors advocate that managing temperature and reliability at run-time is necessary to overcome this reliability wall without incurring a significant cost penalty. The nonlinear nature of modern systems, however, makes run-time control very challenging. The authors suggest that full cost-consciousness requires a truly proactive controller that can efficiently manage system slack with the future in perspective. This paper introduces the concept of “gas-pedal,” which enhances the effectiveness of the proactive controller in minimizing cost without sacrificing the hard guarantees required by the constraints. Reliability-aware dynamic energy management of a processor running an AVC motion compensation task is used as a motivational case study to illustrate the proposed concepts.

Best-case analysis for improving the worst-case schedulability test for distributed hard real-time systems

Proceeding. 10th EUROMICRO Workshop on Real-Time Systems (Cat. No.98EX168) ◽

10.1109/emwrts.1998.684945 ◽

2002 ◽

Cited By ~ 25

Author(s):

J.C. Palencia Gutierrez ◽

J.J. Gutierrez Garcia ◽

M. Gonzalez Harbour

Keyword(s):

Real Time ◽

Case Analysis ◽

Real Time Systems ◽

Worst Case ◽

Schedulability Test ◽

Hard Real Time ◽

Time Systems

Design and Implementation of the Multimedia Operation Mechanism for Responsive Multithreaded Processor

Journal of Robotics and Mechatronics ◽

10.20965/jrm.2005.p0456 ◽

2005 ◽

Vol 17 (4) ◽

pp. 456-462 ◽

Cited By ~ 3

Author(s):

Tsutomu Itou ◽

Nobuyuki Yamasaki

Keyword(s):

Real Time ◽

High Performance ◽

Multimedia Data ◽

Multimedia Processing ◽

Operation Mechanism ◽

Vector Operation ◽

Multithreaded Processor ◽

Hard Real Time ◽

Time Systems ◽

Processing Architecture

The Responsive Multithreaded (RMT) Processor is designed for distributed real-time systems. This paper focuses on the multimedia processing architecture of the RMT Processor. Multimedia processing requires high-throughput calculation over bulky data. The RMT Processor architecture is based on eight-way prioritized simultaneous multithreading, which executes threads in order of priority. Since hard real-time threads have higher priority than multimedia processing threads, few instruction issue slots are left for the multimedia processing threads when hard real-time threads execute simultaneously. Therefore, multimedia processing threads need to use instruction issue slots effectively to achieve high performance. We have designed a novel vector operation mechanism to process multimedia data efficiently in parallel. Because the same instructions are iterated in multimedia processing, the compound operation mechanism is designed to process more data per instruction.

Evaluation of a Connectionless Technique for System-on-Chip Interconnection

Journal of Circuits System and Computers ◽

10.1142/s0218126616300051 ◽

2016 ◽

Vol 25 (10) ◽

pp. 1630005 ◽

Cited By ~ 2

Author(s):

Marcelo Daniel Berejuck ◽

Antônio A. Fröhlich

Keyword(s):

Real Time ◽

High Performance ◽

Communication Channel ◽

Resource Reservation ◽

Multimedia Applications ◽

Worst Case ◽

Average Latency ◽

On Chip ◽

Hard Real Time

We present the design and evaluation of a high-performance network-on-chip (NoC) focused on telecommunication and multimedia applications that tolerate latency and bandwidth variations. The design is based on a connectionless strategy in which flits from different communication flows are interleaved in the same communication channel. Each flit carries routing information that is used by routers to perform arbitration and scheduling of the corresponding output ports in order to balance channel utilization. In order to compare our approach with others, we introduce an analytic model for the worst-case latency (WCL) of our NoC and recall those of related approaches. Analytic comparisons and experimental data show that our approach keeps average WCL lower for variable-bit-rate multimedia applications than a network based on resource reservation. For these applications, the overall throughput is larger than that of networks that perform resource reservation. A case study based on the proposed NoC shows that the average latency was 28% lower than the WCL expected for the experiment. Indeed, hard real-time flows designed considering the absolute WCL of the network will always meet the requirements of the associated hard real-time tasks, so no deadline can be lost due to network contention.

Novel prioritized LRU circuits for shared cache in computer systems

Modern Physics Letters B ◽

10.1142/s0217984920502425 ◽

2020 ◽

Vol 34 (23) ◽

pp. 2050242

Author(s):

Yao Wang ◽

Lijun Sun ◽

Haibo Wang ◽

Lavanya Gopalakrishnan ◽

Ronald Eaton

Keyword(s):

Real Time ◽

High Performance ◽

Cmos Technology ◽

Cache Replacement ◽

Worst Case ◽

Shared Cache ◽

Cache Access ◽

Worst Case Execution Time ◽

Real Time Applications ◽

Time Systems

Cache sharing is critical in multi-core and multi-threading systems. It can delay the execution of real-time applications and makes predicting their worst-case execution time (WCET) more challenging. Prioritized caching has been demonstrated to be a promising approach to this challenge. Instead of the conventional prioritized cache schemes realized at the architecture level by using cache controllers, this work presents two prioritized least recently used (LRU) cache replacement circuits that accomplish the prioritization directly inside the cache circuits and hence significantly reduce the cache access latency. The performance, hardware, and power overheads of the proposed prioritized LRU circuits are investigated using a 65 nm CMOS technology. The results show that the proposed circuits have very low overhead compared to conventional cache circuits. The presented techniques will lead to more effective prioritized shared cache implementations and benefit the development of high-performance real-time systems.
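
As a software illustration of what prioritized LRU replacement can mean, here is a minimal sketch of a single cache set. The abstract does not specify the replacement rule implemented by the circuits, so the policy below is an assumption: evict the least recently used low-priority line first, and fall back to plain LRU only when every line is high priority. The class and method names are hypothetical.

```python
from collections import OrderedDict


class PrioritizedLRUSet:
    """One cache set whose LRU replacement protects high-priority lines.

    Assumed policy, for illustration only; not the paper's circuit design.
    """

    def __init__(self, ways):
        self.ways = ways
        # Maps tag -> is_high_priority, ordered from least to most recent.
        self.lines = OrderedDict()

    def access(self, tag, high_priority=False):
        """Return True on a hit, False on a miss (the line is then filled)."""
        if tag in self.lines:
            self.lines.move_to_end(tag)  # mark as most recently used
            return True
        if len(self.lines) >= self.ways:
            self._evict()
        self.lines[tag] = high_priority
        return False

    def _evict(self):
        # Prefer the least recently used low-priority line as the victim.
        victim = next((t for t, high in self.lines.items() if not high), None)
        if victim is None:
            # Only high-priority lines remain: plain LRU eviction.
            self.lines.popitem(last=False)
        else:
            del self.lines[victim]


# A real-time line survives two low-priority fills in a 2-way set.
cache_set = PrioritizedLRUSet(ways=2)
cache_set.access("rt_task_line", high_priority=True)
cache_set.access("other_a")
cache_set.access("other_b")  # evicts "other_a", not the real-time line
still_cached = cache_set.access("rt_task_line", high_priority=True)
```

Under plain LRU the real-time line would have been the victim of the third access; protecting it is what makes the hit behavior of the high-priority task easier to predict.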

Interactive WCET Prediction with Warning for Timeout Risk

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s0218001417500124 ◽

2017 ◽

Vol 31 (05) ◽

pp. 1750012 ◽

Cited By ~ 5

Author(s):

Fanqi Meng ◽

Xiaohong Su ◽

Zhaoyang Qu

Keyword(s):

Execution Time ◽

Control Flow ◽

Real Time Systems ◽

Wcet Analysis ◽

Running Speed ◽

Worst Case ◽

Worst Case Execution Time ◽

Level Model ◽

Hard Real Time ◽

Time Systems

Worst-case execution time (WCET) analysis is essential for exposing timeliness defects when developing hard real-time systems. However, developers generally perform WCET analysis in a final verification phase, when it is too late to fix timeliness defects cheaply. To help developers quickly identify real timeliness defects in an early programming phase, a novel interactive WCET prediction with warning for timeout risk is proposed. The novelty is that the approach not only quickly estimates the WCET based on a control flow tree (CFT), but also assigns the estimated WCET a trusted level by means of a lightweight false path analysis. According to the trusted levels, corresponding warnings are triggered once the estimated WCET exceeds a preset safe threshold. Hence developers can identify real timeliness defects earlier and more efficiently. To this end, we first analyze the reasons for the overestimation of CFT-based WCET calculation; then we propose a trusted level model of timeout risks; for recognizing the structural patterns of timeout risks, we develop a risk data counting algorithm; and we also give some tactics for applying our approach more effectively. Experimental results show that our approach has almost the same running speed as the fast and interactive WCET analysis, but it saves more time in identifying real timeliness defects.
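
The overestimation mentioned in this abstract can be illustrated with a generic tree-based WCET calculation: sequences add, branches take the maximum of their alternatives, and loops multiply by their bound. The sketch below is a textbook-style formulation under assumed node types (`Block`, `Seq`, `Branch`, `Loop`) and made-up cycle counts. It is not the paper's CFT representation or its trusted level model.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Block:
    cycles: int  # cost of a straight-line basic block


@dataclass
class Seq:
    parts: List["Node"]


@dataclass
class Branch:
    condition: "Node"
    alternatives: List["Node"]


@dataclass
class Loop:
    bound: int  # maximum number of iterations
    body: "Node"


Node = Union[Block, Seq, Branch, Loop]


def tree_wcet(node):
    """Bottom-up WCET estimate over the tree; ignores path feasibility."""
    if isinstance(node, Block):
        return node.cycles
    if isinstance(node, Seq):
        return sum(tree_wcet(part) for part in node.parts)
    if isinstance(node, Branch):
        worst = max(tree_wcet(alt) for alt in node.alternatives)
        return tree_wcet(node.condition) + worst
    if isinstance(node, Loop):
        return node.bound * tree_wcet(node.body)
    raise TypeError(f"unknown node type: {type(node).__name__}")


# Two branches that test the same flag c (each test costs 1 cycle):
#   if c: 10 cycles else: 2 cycles
#   if c: 3 cycles  else: 20 cycles
program = Seq([
    Branch(Block(1), [Block(10), Block(2)]),
    Branch(Block(1), [Block(3), Block(20)]),
])
estimate = tree_wcet(program)

# The feasible paths take the same side of both branches.
feasible_worst = max(1 + 10 + 1 + 3, 1 + 2 + 1 + 20)
```

Here `estimate` combines the 10-cycle and 20-cycle alternatives, a path that can never execute because both branches depend on the same flag, so it exceeds `feasible_worst`. False path analysis aims to detect this kind of infeasible combination.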

Ideal and Predictable Hit Ratio for Matrix Transposition in Data Caches

Mathematics ◽

10.3390/math8020184 ◽

2020 ◽

Vol 8 (2) ◽

pp. 184

Author(s):

Alba Pedro-Zapater ◽

Clemente Rodríguez ◽

Juan Segarra ◽

Rubén Gran Tejero ◽

Víctor Viñals-Yúfera

Keyword(s):

Energy Consumption ◽

Real Time ◽

High Performance ◽

Negative Impact ◽

Data Cache ◽

Real Time Systems ◽

Worst Case ◽

Tile Size ◽

Matrix Transposition ◽

Time Systems

Matrix transposition is a fundamental operation, but it may present a very low and hardly predictable data cache hit ratio for large matrices. Safe (worst-case) hit ratio predictability is required in real-time systems. In this paper, we obtain the relations among the cache parameters that guarantee the ideal (predictable) data hit ratio assuming a Least-Recently-Used (LRU) data cache. Considering our analytical assessments, we compare a tiling matrix transposition to a cache-oblivious algorithm, modified with phantom padding to improve its data hit ratio. Our results show that, with an adequate tile size, the tiling version results in an equal or better data hit ratio. We also analyze the energy consumption and execution time of matrix transposition on real hardware with pseudo-LRU (PLRU) caches. Our analytical hit/miss assessment enables the usage of a data cache for matrix transposition in real-time systems, since the number of misses in the worst case is bounded. In general and high-performance computation, our analysis enables us to restrict the cache resources devoted to matrix transposition with no negative impact, in order to reduce both the energy consumption and the pollution to other computations.
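
For readers unfamiliar with tiling, the loop structure of a tiled transposition can be sketched as follows. This is a generic out-of-place version written for illustration; the function name and the choice of tile size are assumptions, and Python lists do not reproduce the cache behavior that the paper analyzes. The sketch only shows the order in which elements are visited.

```python
def transpose_tiled(matrix, tile):
    """Return the transpose of a rectangular matrix, visiting it tile by tile.

    Each tile x tile block of the source is copied completely before moving
    on, so both the rows read and the rows written stay within a small
    working set.
    """
    rows, cols = len(matrix), len(matrix[0])
    result = [[None] * rows for _ in range(cols)]
    for i0 in range(0, rows, tile):
        for j0 in range(0, cols, tile):
            # Copy one block; min() handles partial tiles at the borders.
            for i in range(i0, min(i0 + tile, rows)):
                for j in range(j0, min(j0 + tile, cols)):
                    result[j][i] = matrix[i][j]
    return result


# 5 x 7 example whose dimensions are not multiples of the tile size.
source = [[10 * r + c for c in range(7)] for r in range(5)]
transposed = transpose_tiled(source, tile=4)
```

In a compiled implementation, the tile size would be chosen from the cache parameters (line size, associativity, number of sets), which is the relationship the paper derives analytically.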

An Overview of Worst-Case Execution Time Estimation for Embedded Programs

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.651-653.624 ◽

2014 ◽

Vol 651-653 ◽

pp. 624-629

Author(s):

Liang Liang Kong ◽

Lin Xiang Shi ◽

Lin Chen

Keyword(s):

Embedded Systems ◽

Real Time ◽

Execution Time ◽

Time Estimation ◽

Worst Case ◽

Performance Metric ◽

Worst Case Execution Time ◽

Hard Real Time ◽

Time Systems

Most embedded systems are real-time systems, so real-time behavior is an important performance metric for embedded systems. Worst-case execution time (WCET) estimation for embedded programs can satisfy the requirements of hard real-time evaluation, so it is widely used in embedded systems evaluation. Based on a survey of the progress of WCET estimation around the world, this paper proposes a new classification of WCET estimation. After introducing the principle of WCET estimation, it presents various techniques for estimating WCET and classifies them into two main streams, namely static and dynamic WCET estimation. Finally, it reviews the development of WCET analysis tools.

PBench: A Parallel, Real-Time Benchmark Suite

Academic Perspective Procedia ◽

10.33793/acperpro.01.01.37 ◽

2018 ◽

Vol 1 (1) ◽

pp. 178-186 ◽

Cited By ~ 1

Author(s):

Sevil Serttaş ◽

Veysel Harun Şahin

Keyword(s):

Real Time ◽

Execution Time ◽

Matrix Multiplication ◽

Multiprocessor Systems ◽

Time Analysis ◽

Worst Case ◽

Analysis Methods ◽

Worst Case Execution Time ◽

Benchmark Suite ◽

Time Systems

Real-time systems are widely used, from the automotive industry to the aerospace industry. The scientists, researchers, and engineers who develop real-time platforms and worst-case execution time analysis methods and tools need to compare their solutions to alternatives. For this purpose, they use benchmark applications. Today many of our computing systems are multicore and/or multiprocessor systems. Therefore, to be able to compare the effectiveness of real-time platforms and worst-case execution time analysis methods and tools, the research community needs multi-threaded benchmark applications that scale on multicore and/or multiprocessor systems. In this paper, we present the first version of PBench, a parallel, real-time benchmark suite. PBench includes different types of multi-threaded applications that implement various algorithms, from searching to sorting and from matrix multiplication to probability distribution calculation. In addition, PBench provides single-threaded versions of all programs to allow side-by-side comparisons.

Beyond the Traditional Analyses and Resource Management in Real-Time Systems

Special Topics in Information Technology - SpringerBriefs in Applied Sciences and Technology ◽

10.1007/978-3-030-85918-3_6 ◽

2022 ◽

pp. 67-77

Author(s):

Federico Reghenzani

Keyword(s):

Real Time ◽

Real Time Systems ◽

Formal Proofs ◽

Worst Case ◽

Worst Case Execution Time ◽

Quantitative Results ◽

Real Time Applications ◽

Hard Real Time ◽

Phd Thesis ◽

Time Systems

The difficulties in estimating the Worst-Case Execution Time (WCET) of applications make the use of modern computing architectures limited in real-time systems. Critical embedded systems require the tasks of hard real-time applications to meet their deadlines, and formal proofs on the validity of this condition are usually required by certification authorities. In the last decade, researchers proposed the use of probabilistic measurement-based methods to estimate the WCET instead of traditional static methods. In this chapter, we summarize recent theoretical and quantitative results on the use of probabilistic approaches to estimate the WCET presented in the PhD thesis of the author, including possible exploitation scenarios, open challenges, and future directions.
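
One common way to realize a probabilistic measurement-based estimate is to fit an extreme-value distribution to the maxima of measured execution times. The abstract does not say which estimator the author uses, so the sketch below is an assumption: block maxima fitted with a Gumbel distribution by the method of moments. The function name, the synthetic measurements, and the block size are hypothetical, and the statistical checks a real analysis needs (independence, identical distribution, goodness of fit) are omitted.

```python
import math
import random
import statistics

EULER_GAMMA = 0.5772156649015329


def pwcet_gumbel(samples, block_size, exceedance_prob):
    """Estimate the execution time exceeded with the given probability.

    Splits the measurements into blocks, keeps each block's maximum, fits a
    Gumbel distribution by the method of moments, and returns its quantile
    at 1 - exceedance_prob (a probability per block, not per run).
    """
    maxima = [
        max(samples[i:i + block_size])
        for i in range(0, len(samples) - block_size + 1, block_size)
    ]
    # Gumbel moments: variance = (pi * beta)^2 / 6, mean = mu + gamma * beta.
    beta = statistics.stdev(maxima) * math.sqrt(6) / math.pi
    mu = statistics.mean(maxima) - EULER_GAMMA * beta
    # Invert the Gumbel CDF F(x) = exp(-exp(-(x - mu) / beta)).
    return mu - beta * math.log(-math.log1p(-exceedance_prob))


# Synthetic measurements in cycles: a fixed base cost plus random jitter.
random.seed(0)
measurements = [1000 + random.expovariate(1 / 50) for _ in range(2000)]
bound_1e6 = pwcet_gumbel(measurements, block_size=50, exceedance_prob=1e-6)
bound_1e9 = pwcet_gumbel(measurements, block_size=50, exceedance_prob=1e-9)
```

Lowering the exceedance probability yields a larger bound, which reflects the trade-off between confidence and pessimism that probabilistic WCET approaches expose.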
