Optimizing Job Coscheduling by Adaptive Deadlock-Free Scheduler

2018 · Vol 2018 · pp. 1-18
Author(s): Zhishuo Zheng, Deyu Qi, Mincong Yu, Xinyang Wang, Naqin Zhou, ...

With tens or hundreds of cores on a single chip, it is now common for multiple jobs to coexist on the same machine. To run multiple jobs efficiently, schedulers should provide flexible scheduling logic. In addition, corunning jobs may compete for shared resources, which can degrade performance. Although many scheduling algorithms have been proposed to support different scheduling logic schemes and to alleviate this contention, coscheduling jobs on the same machine without performance degradation remains a challenging problem. In this paper, we propose a novel adaptive deadlock-free scheduler that provides flexible scheduling logic schemes and adopts an optimistic lock control mechanism to coordinate resource competition among corunning jobs. The scheduler exposes all underlying resource information to corunning jobs and gives them the utensils needed to use that information to compete for resources in a free-for-all manner. To further relieve the performance degradation of coscheduling, the scheduler automatically controls the number of active utensils when frequent conflicts become the performance bottleneck. We justify our adaptive deadlock-free scheduling and present simulation results for synthetic and real-world workloads, comparing the proposed scheduler with two prevalent schedulers. The results indicate that our approach outperforms the compared schedulers in scheduling efficiency and scalability. They also show that the adaptive deadlock-free control yields significant improvements in the parallelism of node-level scheduling and in workload performance.
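The abstract does not include code, but the optimistic, deadlock-free coordination it describes can be illustrated with a minimal sketch: jobs snapshot the exposed resource state and try to claim resources with a compare-and-swap-style retry instead of blocking, acquiring them in a fixed global order so the free-for-all competition cannot deadlock. All names here (Resource, try_claim, run_job) and the backoff/retry details are illustrative assumptions, not the authors' implementation.

```python
import threading
import random
import time

class Resource:
    """A shared resource with an optimistic, version-stamped owner field."""
    def __init__(self, rid):
        self.rid = rid
        self.version = 0          # bumped on every successful claim/release
        self.owner = None         # job currently holding the resource
        self._cas_lock = threading.Lock()  # only guards the compare-and-swap itself

    def snapshot(self):
        # Jobs are given full visibility of resource state (owner + version).
        return self.version, self.owner

    def try_claim(self, expected_version, job_id):
        # Optimistic claim: succeeds only if nobody changed the resource
        # since the snapshot was taken; otherwise the caller retries.
        with self._cas_lock:
            if self.version == expected_version and self.owner is None:
                self.owner = job_id
                self.version += 1
                return True
            return False

    def release(self, job_id):
        with self._cas_lock:
            if self.owner == job_id:
                self.owner = None
                self.version += 1


def run_job(job_id, resources, max_retries=100):
    """Acquire resources one at a time in a fixed global order (rid),
    which keeps the free-for-all competition deadlock-free."""
    held = []
    for res in sorted(resources, key=lambda r: r.rid):
        for _ in range(max_retries):
            version, owner = res.snapshot()
            if owner is None and res.try_claim(version, job_id):
                held.append(res)
                break
            time.sleep(random.uniform(0, 0.001))  # back off, then retry
        else:
            # Could not claim: release everything and give up (no deadlock).
            for r in held:
                r.release(job_id)
            return False
    # ... do work with all resources held ...
    for r in held:
        r.release(job_id)
    return True
```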

Author(s): Zhenyang Lei, Xiangdong Lei, Jun Long

Shared resources on a multicore chip, such as main memory, are increasingly becoming a point of contention. Traditional real-time task scheduling policies focus solely on the CPU and do not take memory access and cache effects into account. In this paper, we propose a parallel real-time task scheduling (PRTTS) policy for multicore platforms. Each task set is represented as a directed acyclic graph (DAG), and task priorities are assigned according to task periods using Rate Monotonic (RM) scheduling. Each task consists of three phases: a memory-read phase, an execution phase, and a memory-write phase. Tasks use locks and critical sections to protect data access. The global scheduler maintains a pool of ready tasks that can run on any core. The PRTTS policy operates at two levels: the first level schedules ready tasks from the task pool onto cores, and the second level schedules tasks on each core. A task may preempt a core that is running a lower-priority task. The priority of a task that needs to access memory is dynamically raised above that of all tasks that do not access memory. When the data accessed by a task is already in the cache, the task's priority is raised to the highest level and it is immediately scheduled, preempting a core that is running a task not accessing memory. After the memory access completes, the task's priority is restored to its original value and the task is suspended, allowing the preempted task to resume on the core. This paper analyzes the schedulability of the PRTTS policy and derives an upper bound on the worst-case response time of parallel real-time tasks. A series of extensive simulation experiments evaluates the performance of the proposed policy. The simulation results show that PRTTS offers better performance in terms of core utilization and task schedulability rate.
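As a rough illustration of the priority rule described above (RM base priorities, a boost for tasks entering their memory-access phase, and the highest priority for tasks whose data is already cached), the following sketch shows one way such an effective-priority function and global dispatch step could look. The constants and class layout are assumptions for illustration, not the paper's PRTTS implementation.

```python
MEMORY_BOOST = 1_000         # boost above every base (RM) priority -- illustrative constant
CACHE_HIT_PRIORITY = 10_000  # "highest priority" for tasks whose data is already cached

class Task:
    def __init__(self, tid, period):
        self.tid = tid
        self.period = period               # shorter period => higher RM priority
        self.base_priority = 1.0 / period
        self.wants_memory = False
        self.data_in_cache = False

def effective_priority(task):
    """Priority rule sketched from the abstract: RM base priority,
    boosted while the task is in its memory-access phase, and raised
    to the top when its data is already resident in the cache."""
    if task.wants_memory and task.data_in_cache:
        return CACHE_HIT_PRIORITY
    if task.wants_memory:
        return MEMORY_BOOST + task.base_priority
    return task.base_priority

def pick_next(task_pool):
    """Global (first-level) scheduler: dispatch the ready task with the
    highest effective priority; it may preempt a lower-priority task."""
    return max(task_pool, key=effective_priority) if task_pool else None

# Example: a memory-bound task with a long period preempts a CPU-bound
# task with a short period while it is accessing memory.
cpu_task = Task(tid=1, period=10)
mem_task = Task(tid=2, period=50)
mem_task.wants_memory = True
assert pick_next([cpu_task, mem_task]) is mem_task
```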


2000 · Vol 8 (3) · pp. 111-126
Author(s): Henri Casanova, Graziano Obertelli, Francine Berman, Richard Wolski

The Computational Grid is a promising platform for the efficient execution of parameter sweep applications over large parameter spaces. To achieve performance on the Grid, such applications must be scheduled so that shared data files are strategically placed to maximize re-use, and so that the application execution can adapt to the deliverable performance potential of target heterogeneous, distributed and shared resources. Parameter sweep applications are an important class of applications and would greatly benefit from the development of Grid middleware that embeds a scheduler for performance and targets Grid resources transparently. In this paper we describe a user-level Grid middleware project, the AppLeS Parameter Sweep Template (APST), that uses application-level scheduling techniques [1] and various Grid technologies to allow the efficient deployment of parameter sweep applications over the Grid. We discuss several possible scheduling algorithms and detail our software design. We then describe our current implementation of APST using systems like Globus [2], NetSolve [3] and the Network Weather Service [4], and present experimental results.
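The concrete scheduling algorithms are detailed in the paper; purely as an illustration of the kind of placement decision application-level scheduling for parameter sweeps makes (preferring hosts where a shared input file is already staged, and adapting to per-host speed and bandwidth), here is a hedged greedy sketch. The data layout, cost model, and constant file size are assumptions, not APST's actual algorithm.

```python
def schedule_sweep(tasks, hosts):
    """Greedy sketch of a parameter-sweep placement heuristic: each task is
    mapped to the host with the earliest estimated completion time, and a
    host that already caches the task's shared input file avoids paying the
    transfer cost again.

    tasks: list of dicts {"id": str, "input_file": str, "compute_cost": float}
    hosts: list of dicts {"name": str, "speed": float, "bandwidth": float}
    """
    ready_time = {h["name"]: 0.0 for h in hosts}   # when each host is free
    cached = {h["name"]: set() for h in hosts}     # files already staged per host
    file_size = 100.0                              # MB, illustrative constant
    plan = []

    for task in tasks:
        best_host, best_finish = None, float("inf")
        for h in hosts:
            transfer = 0.0 if task["input_file"] in cached[h["name"]] \
                       else file_size / h["bandwidth"]
            finish = ready_time[h["name"]] + transfer + task["compute_cost"] / h["speed"]
            if finish < best_finish:
                best_host, best_finish = h["name"], finish
        ready_time[best_host] = best_finish
        cached[best_host].add(task["input_file"])  # data file is now reusable there
        plan.append((task["id"], best_host, best_finish))
    return plan
```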


2014 · Vol 2014 · pp. 1-11
Author(s): Juan Fang, Zhicheng Yao, Xiufeng Sui, Yungang Bao

Datacenters consolidate diverse applications to improve utilization. However, when multiple applications are colocated on such platforms, contention for shared resources like networks-on-chip (NoCs) can degrade the performance of latency-critical online services (high-priority applications). Recently proposed bufferless NoCs (Nychis et al.) have the advantage of requiring less area and power, but they pose challenges for quality-of-service (QoS) support, which usually relies on buffer-based virtual channels (VCs). We propose QBLESS, a QoS-aware bufferless NoC scheme for datacenters. QBLESS consists of two components: a routing mechanism (QBLESS-R) that substantially reduces flit deflection for high-priority applications, and a congestion-control mechanism (QBLESS-CC) that guarantees performance for high-priority applications and improves overall system throughput. Using trace-driven simulation of a 64-core system, we find that, compared to BLESS, a previous state-of-the-art bufferless NoC design, QBLESS improves the performance of high-priority applications by an average of 33.2% and reduces network hops by an average of 42.8%.
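To make the bufferless, priority-aware idea concrete, the sketch below shows one simple way a router could arbitrate output ports: flits of high-priority applications win their desired ports first, and losers are deflected to any remaining free port rather than buffered. This is only an illustration of deflection routing with priority; it is not the QBLESS-R or QBLESS-CC mechanism.

```python
def arbitrate(flits, output_ports):
    """Sketch of priority-aware bufferless (deflection) arbitration.

    flits: list of dicts {"id": str, "high_priority": bool, "desired_port": str}
    output_ports: list of available output-port names at this router
    Returns: {flit_id: assigned_port}
    """
    free = set(output_ports)
    assignment = {}
    # High-priority flits pick their productive (desired) ports first.
    for flit in sorted(flits, key=lambda f: not f["high_priority"]):
        port = flit["desired_port"] if flit["desired_port"] in free else None
        if port is None and free:
            port = next(iter(free))   # deflection: take any free port instead of buffering
        if port is not None:
            assignment[flit["id"]] = port
            free.discard(port)
    return assignment

# Example: the high-priority flit keeps its desired port; the other is deflected.
print(arbitrate(
    [{"id": "a", "high_priority": False, "desired_port": "north"},
     {"id": "b", "high_priority": True,  "desired_port": "north"}],
    ["north", "east"]))
```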


2001 · Vol 12 (05) · pp. 629-643
Author(s): J. SANTOSO, G. D. VAN ALBADA, P. M. A. SLOOT, B. A. A. NAZIEF

Optimal scheduling in meta-computing environments is still an open research question. Various resource management (RM) architectures have been proposed in the literature (e.g., [2], [13], [12]). In the present paper we explore, through simulation, various multi-level scheduling strategies for compound computing environments comprising several clusters of workstations. We study global and local RM and their interaction. The local RM comprises both the cluster management and operating system schedulers. Each level refines the scheduling decisions of the layer above it, taking into account the latest resource information. Our experiments explore conventional strategies such as First Come, First Served (FCFS) and Shortest Job First (SJF) at the global RM level. At all levels, the schedulers strive to maintain a good load balance. The unit of load balancing at the global level is the job, consisting of one or more parallel tasks; at the local level it is the task. The results of our simulations indicate that, especially at high system loads, the use of a global RM can result in a significant performance gain.
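For readers unfamiliar with the global policies named above, the following toy sketch shows FCFS and SJF orderings feeding a least-loaded cluster assignment, in the spirit of the two-level scheme described (the global RM orders jobs, and each job is handed to a cluster as a unit). The job and cluster data structures and the load model are illustrative assumptions, not the simulated system.

```python
def fcfs(jobs):
    """First Come, First Served: dispatch jobs strictly in arrival order."""
    return sorted(jobs, key=lambda j: j["arrival"])

def sjf(jobs):
    """Shortest Job First: dispatch the shortest estimated job first
    (ties broken by arrival time)."""
    return sorted(jobs, key=lambda j: (j["runtime"], j["arrival"]))

def dispatch(jobs, clusters, policy=fcfs):
    """Toy two-level dispatch: the global RM orders jobs with the chosen
    policy, then hands each job (one or more parallel tasks) to the
    currently least-loaded cluster to keep the load balanced."""
    load = {c: 0.0 for c in clusters}
    placement = []
    for job in policy(jobs):
        target = min(load, key=load.get)
        load[target] += job["runtime"] * job["tasks"]
        placement.append((job["id"], target))
    return placement

jobs = [
    {"id": "A", "arrival": 0, "runtime": 30, "tasks": 4},
    {"id": "B", "arrival": 1, "runtime": 5,  "tasks": 2},
    {"id": "C", "arrival": 2, "runtime": 12, "tasks": 8},
]
print(dispatch(jobs, ["cluster0", "cluster1"], policy=sjf))
```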


2021
Author(s): Alberto Scarampi

In the framework of resource-competition models, it has been argued that the number of species stably coexisting in an ecosystem cannot exceed the number of shared resources. However, plankton seems to be an exception to this so-called "competitive exclusion principle": in planktic ecosystems, a large number of different species stably coexist in an environment with limited resources. This contradiction between theoretical expectations and empirical observations is often referred to as "the Paradox of the Plankton". This project investigates biophysical models that can account for the large biodiversity observed in real ecosystems in order to resolve this paradox. A model is proposed that combines classical resource-competition models, metabolic trade-offs, and stochastic ecosystem assembly. Simulations of the model match empirical observations while relaxing some unrealistic assumptions of previous models.
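A minimal numerical sketch of the classical ingredients named above (resource competition plus a metabolic trade-off in which every species' uptake strategy has the same fixed budget) is given below; it omits the stochastic-assembly component and is not the model proposed in the project, only an assumed MacArthur-style baseline with illustrative parameters.

```python
import numpy as np

def simulate(alpha, supply, steps=20000, dt=0.01, death=0.1):
    """Euler integration of a consumer-resource model with a metabolic
    trade-off: each row of alpha (a species' uptake strategy over the
    p resources) sums to the same fixed budget.

    dn_i/dt = n_i * (sum_j alpha_ij * c_j - death)
    dc_j/dt = supply_j - c_j * sum_i n_i * alpha_ij
    """
    n = np.full(alpha.shape[0], 0.1)   # species abundances
    c = np.full(alpha.shape[1], 1.0)   # resource concentrations
    for _ in range(steps):
        growth = alpha @ c - death
        n = np.maximum(n + dt * n * growth, 0.0)
        c = np.maximum(c + dt * (supply - c * (n @ alpha)), 0.0)
    return n, c

# Three species, two resources: strategies obey a trade-off (rows sum to 1),
# so more species than resources can coexist for suitable supply rates.
alpha = np.array([[0.9, 0.1],
                  [0.5, 0.5],
                  [0.1, 0.9]])
supply = np.array([0.5, 0.5])
print(simulate(alpha, supply))
```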


2011 · Vol 3 (4) · pp. 14-28
Author(s): Jitendra Kumar Rai, Atul Negi, Rajeev Wankar

Sharing of resources among the cores of multi-core processors raises performance issues for the system. The majority of the shared resources belong to the memory hierarchy subsystem of the processor, such as last-level caches, prefetchers, and memory buses. Programs co-running on the cores of a multi-core processor may interfere with each other through their use of these shared resources, causing the co-running programs to suffer performance degradation. Previous research has sought to characterize and classify the memory behaviors of programs in order to predict performance. Such knowledge can be used to create workloads for performance studies on multi-core processors, and to form system-level policies that mitigate the interference between co-running programs caused by shared resources. In this work, machine learning techniques are used to predict performance on multi-core processors. The main contribution of the study is the enumeration of solo-run program attributes that can be used to predict concurrent-run performance even when the number of co-running programs sharing the resources changes. The concurrent run involves interference between co-running programs due to their use of shared resources.
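As an illustration of the kind of learning setup described (predicting concurrent-run performance from solo-run attributes), the sketch below trains a regression model on synthetic features such as solo-run LLC misses and an aggregate of co-runner demand. The feature names, the synthetic data, and the choice of a random forest are all assumptions; the paper enumerates the actual attribute set.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical features: solo-run memory-hierarchy attributes of a program
# plus an aggregate describing its co-runners.
rng = np.random.default_rng(0)
n_samples = 200
solo_llc_mpki   = rng.uniform(0, 40, n_samples)   # LLC misses per kilo-instruction
solo_bandwidth  = rng.uniform(0, 10, n_samples)   # GB/s consumed when running alone
corunner_demand = rng.uniform(0, 30, n_samples)   # summed pressure from co-runners

X = np.column_stack([solo_llc_mpki, solo_bandwidth, corunner_demand])
# Synthetic target just to make the sketch runnable: slowdown grows with
# both a program's own memory sensitivity and its co-runners' demand.
y = 1.0 + 0.02 * solo_llc_mpki * corunner_demand / 30 + rng.normal(0, 0.05, n_samples)

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X[:150], y[:150])
print("held-out R^2:", model.score(X[150:], y[150:]))
```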


2021
Author(s): Cameron D McBride, Domitilla Del Vecchio

The design of genetic circuits typically relies on characterizing constituent modules in isolation to predict the behavior of their composition. However, it has been shown that the behavior of a genetic module changes when other modules are present in the cell, due to competition for shared resources. To engineer multi-module circuits that behave as intended, it is therefore necessary to predict how a genetic module's behavior changes when other modules load cellular resources. Here, we introduce two characteristics of circuit modules: the demand for cellular resources and the sensitivity to resource loading. When both are known for every genetic module in a circuit, they can be used to predict any module's behavior upon addition of any other module to the cell. We develop an experimental approach to measure both characteristics for any circuit module using a resource sensor module. Using the measured resource demand and sensitivity for each module in a library, the outputs of the modules can be accurately predicted when they are inserted in the cell in arbitrary combinations. These resource competition characteristics may be used to inform the design of genetic circuits that perform as predicted despite resource competition.
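As a back-of-the-envelope illustration of how the two characteristics could be combined, the sketch below assumes, to first order, that a module's fractional output drop is its sensitivity multiplied by the summed resource demand of the other modules. The paper derives the actual prediction rule; this formula and all numbers here are purely hypothetical.

```python
def predict_outputs(isolated_outputs, demands, sensitivities):
    """First-order sketch (an assumption, not the paper's derivation):
    the fractional drop in module i's output is approximated by its
    sensitivity to resource loading times the summed resource demand
    of the other modules.

    isolated_outputs[i]: output of module i characterized alone
    demands[i]:          resource demand of module i
    sensitivities[i]:    sensitivity of module i to resource loading
    """
    total_demand = sum(demands)
    predicted = []
    for y0, J_i, S_i in zip(isolated_outputs, demands, sensitivities):
        added_demand = total_demand - J_i   # demand contributed by the other modules
        predicted.append(y0 * max(0.0, 1.0 - S_i * added_demand))
    return predicted

# Hypothetical three-module library characterized with a resource sensor:
print(predict_outputs(isolated_outputs=[100.0, 80.0, 60.0],
                      demands=[0.2, 0.4, 0.1],
                      sensitivities=[0.5, 0.3, 0.8]))
```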

