scholarly journals Bounding the execution time of parallel applications on unrelated multiprocessors

2021 ◽  
Author(s):  
Petros Voudouris ◽  
Per Stenström ◽  
Risat Pathan

AbstractHeterogeneous multiprocessors can offer high performance at low energy expenditures. However, to be able to use them in hard real-time systems, timing guarantees need to be provided, and the main challenge is to determine the worst-case schedule length (also known as makespan) of an application. Previous works that estimate the makespan focus mainly on the independent-task application model or the related multiprocessor model that limits the applicability of the makespan. On the other hand, the directed acyclic graph (DAG) application model and the unrelated multiprocessor model are general and can cover most of today’s platforms and applications. In this work, we propose a simple work-conserving scheduling method of the tasks in a DAG and two new approaches to finding the makespan. A set of representative OpenMP task-based parallel applications from the BOTS benchmark suite and synthetic DAGs are used to evaluate the proposed method. Based on the empirical results, the proposed approach calculates the makespan close to the exhaustive method and with low pessimism compared to a lower bound of the actual makespan calculation.

Author(s):  
Satyakiran Munaga ◽  
Francky Catthoor

Advanced technologies such as sub-45nm CMOS and 3D integration are known to have more accelerated and increased number of reliability failure mechanisms. Classical reliability assessment methodology, which assumes ad-hoc failure criteria and worst-case for all influencing dynamic aspects, is no longer viable in these technologies. In this paper, the authors advocate that managing temperature and reliability at run-time is necessary to overcome this reliability-wall without incurring significant cost penalty. Nonlinear nature of modern systems, however, makes the run-time control very challenging. The authors suggest that full cost-consciousness requires a truly proactive controller that can efficiently manage system slack with future in perspective. This paper introduces the concept of “gas-pedal,” which enhances the effectiveness of the proactive controller in minimizing the cost without sacrificing the hard guarantees required by the constraints. Reliability-aware dynamic energy management of a processor running AVC motion compensation task is used as a motivational case study to illustrate the proposed concepts.


2005 ◽  
Vol 17 (4) ◽  
pp. 456-462 ◽  
Author(s):  
Tsutomu Itou ◽  
◽  
Nobuyuki Yamasaki ◽  

<I>Responsive Multithreaded (RMT) Processor</I> is designed for distributed real-time systems. This paper focuses on the multimedia processing architecture of <I>RMT Processor</I>. Multimedia processing requires high-throughput calculation for bulky data processing. <I>RMT Processor</I> architecture is based on eight-way prioritized simultaneous multithreading, which executes each thread in order of priority. Since the priority of hard real-time threads is higher than that of multimedia processing threads, instruction issue slots used by the multimedia processing threads are few in <I>RMT Processor</I> when hard real-time threads are executed simultaneously. Therefore multimedia processing threads need to utilize instruction issue slots effectively to achieve high performance. We have designed a novel vector operation mechanism to process multimedia data efficiently in parallel. Because the same instructions are iterated in multimedia processing, the compound operation mechanism is designed to calculate more data per instruction in multimedia processing.


2016 ◽  
Vol 25 (10) ◽  
pp. 1630005 ◽  
Author(s):  
Marcelo Daniel Berejuck ◽  
Antônio A. Fröhlich

We present the design and evaluation of a high-performance network-on-chip (NoC) focused on telecommunication and multimedia applications that tolerate latency and bandwidth variations. The design is based on a connectionless strategy in which flits from different communication flows are interleaved in the same communication channel. Each flit carries routing information that is used by routers to perform arbitration and scheduling of the corresponding output ports in order to balance channel utilization. In order to compare our approach with others, we introduce an analytic model for the worst-case latency (WCL) of our NoC and recall those of related approaches. Analytic comparisons and experimental data show that our approach keeps average WCL lower for variable-bit-rate multimedia applications than a network based on resource reservation. For these applications, the overall throughput is larger than that of networks that perform resource reservation. A case study based on the proposed NoC shows that the average latency was 28% lower than the WCL expected for the experiment. Indeed, hard real-time flows designed considering the absolute WCL of the network will always meet the requirements of the associated hard real-time tasks, so no deadline can be lost due to network contention.


2020 ◽  
Vol 34 (23) ◽  
pp. 2050242
Author(s):  
Yao Wang ◽  
Lijun Sun ◽  
Haibo Wang ◽  
Lavanya Gopalakrishnan ◽  
Ronald Eaton

Cache sharing technique is critical in multi-core and multi-threading systems. It potentially delays the execution of real-time applications and makes the prediction of the worst-case execution time (WCET) of real-time applications more challenging. Prioritized cache has been demonstrated as a promising approach to address this challenge. Instead of the conventional prioritized cache schemes realized at the architecture level by using cache controllers, this work presents two prioritized least recently used (LRU) cache replacement circuits that directly accomplish the prioritization inside the cache circuits, hence significantly reduces the cache access latency. The performance, hardware and power overheads due to the proposed prioritized LRU circuits are investigated based on a 65 nm CMOS technology. It shows that the proposed circuits have very low overhead compared to conventional cache circuits. The presented techniques will lead to more effective prioritized shared cache implementations and benefit the development of high-performance real-time systems.


Author(s):  
Fanqi Meng ◽  
Xiaohong Su ◽  
Zhaoyang Qu

Worst case execution time (WCET) analysis is essential for exposing timeliness defects when developing hard real-time systems. However, it is too late to fix timeliness defects cheaply since developers generally perform WCET analysis in a final verification phase. To help developers quickly identify real timeliness defects in an early programming phase, a novel interactive WCET prediction with warning for timeout risk is proposed. The novelty is that the approach not only fast estimates WCET based on a control flow tree (CFT), but also assesses the estimated WCET with a trusted level by a lightweight false path analysis. According to the trusted levels, corresponding warnings will be triggered once the estimated WCET exceeds a preset safe threshold. Hence developers can identify real timeliness defects more timely and efficiently. To this end, we first analyze the reasons of the overestimation of CFT-based WCET calculation; then we propose a trusted level model of timeout risks; for recognizing the structural patterns of timeout risks, we develop a risk data counting algorithm; and we also give some tactics for applying our approach more effectively. Experimental results show that our approach has almost the same running speed compared with the fast and interactive WCET analysis, but it saves more time in identifying real timeliness defects.


Mathematics ◽  
2020 ◽  
Vol 8 (2) ◽  
pp. 184
Author(s):  
Alba Pedro-Zapater ◽  
Clemente Rodríguez ◽  
Juan Segarra ◽  
Rubén Gran Tejero ◽  
Víctor Viñals-Yúfera

Matrix transposition is a fundamental operation, but it may present a very low and hardly predictable data cache hit ratio for large matrices. Safe (worst-case) hit ratio predictability is required in real-time systems. In this paper, we obtain the relations among the cache parameters that guarantee the ideal (predictable) data hit ratio assuming a Least-Recently-Used (LRU) data cache. Considering our analytical assessments, we compare a tiling matrix transposition to a cache oblivious algorithm, modified with phantom padding to improve its data hit ratio. Our results show that, with an adequate tile size, the tiling version results in an equal or better data hit ratio. We also analyze the energy consumption and execution time of matrix transposition on real hardware with pseudo-LRU (PLRU) caches. Our analytical hit/miss assessment enables the usage of a data cache for matrix transposition in real-time systems, since the number of misses in the worst case is bound. In general and high-performance computation, our analysis enables us to restrict the cache resources devoted to matrix transposition with no negative impact, in order to reduce both the energy consumption and the pollution to other computations.


2014 ◽  
Vol 651-653 ◽  
pp. 624-629
Author(s):  
Liang Liang Kong ◽  
Lin Xiang Shi ◽  
Lin Chen

Most embedded systems are real-time systems, so real-time is an important performance metric for embedded systems. The worst-case execution time (WCET) estimation for embedded programs could satisfy the requirement of hard real-time evaluation, so it is widely used in embedded systems evaluation. Based on sufficient survey on the progress of WCET estimation around the world, it proposes a new classification of WCET estimation. After introducing the principle of WCET estimation, it mainly demonstrates various types of technologies to estimate WCET and classifies them into two main streams, namely, static and dynamic WCET estimations. Finally, it shows the development of WCET analysis tools.


2018 ◽  
Vol 1 (1) ◽  
pp. 178-186 ◽  
Author(s):  
Sevil Serttaş ◽  
Veysel Harun Şahin

Real-time systems are widely used from the automotive industry to the aerospace industry. The scientists, researchers, and engineers who develop real-time platforms, worst-case execution time analysis methods and tools need to compare their solutions to alternatives. For this purpose, they use benchmark applications. Today many of our computing systems are multicore and/or multiprocessor systems. Therefore, to be able to compare the effectiveness of real-time platforms, worst-case execution time analysis methods and tools, the research community need multi-threaded benchmark applications which scale on multicore and/or multiprocessor systems. In this paper, we present the first version of PBench, a parallel, real-time benchmark suite. PBench includes different types of multi-threaded applications which implement various algorithms from searching to sorting, matrix multiplication to probability distribution calculation. In addition, PBench provides single-threaded versions of all programs to allow side by side comparisons.


Author(s):  
Federico Reghenzani

AbstractThe difficulties in estimating the Worst-Case Execution Time (WCET) of applications make the use of modern computing architectures limited in real-time systems. Critical embedded systems require the tasks of hard real-time applications to meet their deadlines, and formal proofs on the validity of this condition are usually required by certification authorities. In the last decade, researchers proposed the use of probabilistic measurement-based methods to estimate the WCET instead of traditional static methods. In this chapter, we summarize recent theoretical and quantitative results on the use of probabilistic approaches to estimate the WCET presented in the PhD thesis of the author, including possible exploitation scenarios, open challenges, and future directions.


Sign in / Sign up

Export Citation Format

Share Document