Demystifying the Placement Policies of the NVIDIA GPU Thread Block Scheduler for Concurrent Kernels

2021, Vol 48 (3), pp. 81-88
Author(s): Guin Gilman, Samuel S. Ogden, Tian Guo, Robert J. Walls

In this work, we empirically derive the scheduler's behavior under concurrent workloads for NVIDIA's Pascal, Volta, and Turing microarchitectures. In contrast to past studies that suggest the scheduler uses a round-robin policy to assign thread blocks to streaming multiprocessors (SMs), we instead find that the scheduler chooses the next SM based on the SM's local resource availability. We show how this scheduling policy can lead to significant, and seemingly counter-intuitive, performance degradation; for example, a decrease of one thread per block resulted in a 3.58X increase in execution time for one kernel in our experiments. We hope that our work will be useful for improving the accuracy of GPU simulators and aid in the development of novel scheduling algorithms.
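A minimal Python sketch (not the authors' measurement code) contrasting a round-robin placement policy with the resource-availability policy described above. The SM capacities, per-block resource demands, and the tie-breaking rule are illustrative assumptions.

```python
# Toy model contrasting two thread-block placement policies.
# SM capacities and block resource demands are illustrative assumptions,
# not values measured in the paper.

from dataclasses import dataclass, field

@dataclass
class SM:
    sm_id: int
    free_threads: int = 2048
    free_smem: int = 96 * 1024  # bytes of shared memory
    blocks: list = field(default_factory=list)

    def fits(self, block):
        return self.free_threads >= block["threads"] and self.free_smem >= block["smem"]

    def place(self, block):
        self.free_threads -= block["threads"]
        self.free_smem -= block["smem"]
        self.blocks.append(block)

def round_robin_place(sms, blocks):
    """Commonly assumed policy: cycle through SMs in order."""
    i = 0
    for b in blocks:
        for _ in range(len(sms)):
            sm = sms[i % len(sms)]
            i += 1
            if sm.fits(b):
                sm.place(b)
                break

def most_available_place(sms, blocks):
    """Policy suggested by the paper: pick the SM with the most free resources."""
    for b in blocks:
        candidates = [sm for sm in sms if sm.fits(b)]
        if candidates:
            # Tie-breaking by free thread count is an assumption for illustration.
            max(candidates, key=lambda sm: sm.free_threads).place(b)

if __name__ == "__main__":
    blocks = [{"threads": 512, "smem": 16 * 1024} for _ in range(10)]
    sms_rr = [SM(i) for i in range(4)]
    sms_ra = [SM(i) for i in range(4)]
    round_robin_place(sms_rr, blocks)
    most_available_place(sms_ra, blocks)
    print([len(sm.blocks) for sm in sms_rr], [len(sm.blocks) for sm in sms_ra])
```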

2020, Vol 10 (15), pp. 5134
Author(s): Samih M. Mostafa, Hirofumi Amano

Minimizing time cost in time-shared operating systems is the main aim of researchers interested in CPU scheduling. CPU scheduling is a basic job within any operating system. Scheduling criteria (e.g., waiting time, turnaround time, and number of context switches (NCS)) are used to compare CPU scheduling algorithms. Round robin (RR) is the most common preemptive scheduling policy used in time-shared operating systems. In this paper, a modified version of the RR algorithm is introduced that combines the advantage of favoring short processes with the low scheduling overhead of RR, for the sake of minimizing average waiting time, turnaround time, and NCS. The proposed work starts by clustering the processes into clusters, where each cluster contains processes that are similar in their attributes (e.g., CPU service period, weight, and number of allocations to the CPU). Every process in a cluster is assigned the same time slice, depending on the weight of its cluster and its CPU service period. The authors performed a comparative study of the proposed approach and popular scheduling algorithms on nine groups of processes that vary in their attributes. The evaluation was measured in terms of waiting time, turnaround time, and NCS. The experiments showed that the proposed approach gives better results.
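A rough Python sketch of the clustering idea as read from the abstract: processes are grouped by similarity of their attributes, and each process receives a time slice derived from its cluster's weight and its own CPU service period. The burst-based grouping and the slice formula are assumptions for illustration, not the authors' exact method.

```python
# Rough sketch of cluster-based time-slice assignment.
# The grouping method and the slice formula are illustrative assumptions.

from statistics import mean

def cluster_by_burst(processes, num_clusters=3):
    """Group processes into equal-sized buckets by CPU service period
    (a simple stand-in for the attribute-based clustering in the paper)."""
    ordered = sorted(processes, key=lambda p: p["burst"])
    size = max(1, len(ordered) // num_clusters)
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]

def assign_time_slices(clusters, base_quantum=4):
    """Give every process in a cluster a slice scaled by the cluster's
    mean burst relative to the overall mean (assumed weighting)."""
    overall = mean(p["burst"] for c in clusters for p in c)
    for cluster in clusters:
        weight = mean(p["burst"] for p in cluster) / overall
        for p in cluster:
            p["quantum"] = max(1, round(base_quantum * weight))

processes = [{"pid": i, "burst": b} for i, b in enumerate([3, 5, 8, 12, 20, 24])]
clusters = cluster_by_burst(processes)
assign_time_slices(clusters)
print([(p["pid"], p["quantum"]) for c in clusters for p in c])
```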


In this paper, we have created a chat application that uses socket programming for communication, with all messages saved in MongoDB. We packaged the application with Docker and hosted it on a three-node swarm cluster. The cluster uses Docker swarm to create a private network through which the nodes communicate over a specified RPC port. The application runs on each node as a service, and all load coming to the application is balanced across the three IP addresses in the swarm. This creates a distributed system in which each node can act as a manager or a worker. This technique helps decrease the execution time of servers running in the cloud and can improve the feasibility of the online servers provided by IT companies.
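A minimal sketch of the socket-plus-MongoDB pattern described above, not the authors' application: the hostnames, port, database and collection names are placeholders, and the pymongo client library is assumed to be available.

```python
# Minimal chat-style server that persists each received message to MongoDB.
# Hostnames, port, and collection names are placeholders, not the paper's values.

import socket
from datetime import datetime, timezone

from pymongo import MongoClient  # assumed dependency

client = MongoClient("mongodb://mongo:27017")   # e.g. a service name on the swarm network
messages = client["chatdb"]["messages"]

def serve(host="0.0.0.0", port=5000):
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        srv.bind((host, port))
        srv.listen()
        while True:
            conn, addr = srv.accept()
            with conn:
                data = conn.recv(4096)
                if not data:
                    continue
                # Save the message, then acknowledge the sender.
                messages.insert_one({
                    "from": addr[0],
                    "text": data.decode("utf-8", errors="replace"),
                    "at": datetime.now(timezone.utc),
                })
                conn.sendall(b"OK")

if __name__ == "__main__":
    serve()
```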


Author(s): INDURAJ. P. R

This paper presents a new scheduler capable of scheduling aperiodic tasks in real time on a multiprocessor system. The algorithm proposes a new way to dynamically determine which tasks have high and low priority, using the elapsed execution time, the remaining execution time, the amount of resource availability, and the deadline of each task, with no prior knowledge of task arrival times. It also ensures that no processor remains idle, thus utilizing the processors at all times.
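The abstract does not give the priority formula, so the Python sketch below uses one plausible rule (less slack before the deadline, given the remaining work, means higher priority) and a simple dispatcher that keeps idle processors busy. Both are assumptions for illustration, not the paper's algorithm.

```python
# Illustrative dynamic-priority rule for aperiodic tasks; the exact formula
# is not given in the abstract, so this slack-based rule is an assumption.

from dataclasses import dataclass

@dataclass
class Task:
    name: str
    deadline: float        # absolute deadline
    exec_time: float       # total execution time required
    elapsed: float = 0.0   # execution time already received

    def remaining(self):
        return self.exec_time - self.elapsed

def priority(task, now):
    """Smaller value = higher priority: tasks with little slack before
    their deadline (given their remaining work) are favoured."""
    return (task.deadline - now) - task.remaining()

def dispatch(tasks, free_processors, now):
    """Assign the highest-priority ready tasks to idle processors so that
    no processor is left idle while work is pending."""
    ready = sorted((t for t in tasks if t.remaining() > 0),
                   key=lambda t: priority(t, now))
    return ready[:free_processors]

tasks = [Task("A", deadline=10, exec_time=4, elapsed=1),
         Task("B", deadline=6, exec_time=3),
         Task("C", deadline=20, exec_time=8, elapsed=5)]
print([t.name for t in dispatch(tasks, free_processors=2, now=2.0)])
```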


Author(s): Sonia Zouaoui, Lotfi Boussaid, Abdellatif Mtibaa

This paper introduces a new approach to scheduling algorithms that aims to improve real-time operating system CPU performance. The new CPU scheduling algorithm is based on a combination of the round-robin (RR) and priority-based (PB) scheduling algorithms. This solution maintains the advantage of the simple round-robin scheduling algorithm, which reduces starvation, and integrates the advantage of priority scheduling. The proposed algorithm implements the concept of a time quantum and also assigns a priority index to each process. The existing round-robin CPU scheduling algorithm cannot be dedicated to real-time operating systems because of its large waiting time, large response time, large turnaround time, and low throughput. The new algorithm addresses these drawbacks of the round-robin CPU scheduling algorithm. In addition, this paper presents an analysis comparing the proposed algorithm with the existing round-robin scheduling algorithm, focusing on average waiting time and average turnaround time.
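A compact sketch of one way to combine a time quantum with a priority index, as interpreted from the description above; the queue-ordering and re-queueing rules are assumptions, not the authors' implementation.

```python
# Sketch of a priority-aware round-robin dispatcher: the ready queue is kept
# ordered by priority index, and every dispatch runs for at most one quantum.
# The ordering and re-queueing rules are assumptions based on the abstract.

from collections import deque

def priority_round_robin(processes, quantum=4):
    """processes: list of dicts with 'pid', 'burst', 'priority' (lower = higher)."""
    queue = deque(sorted(processes, key=lambda p: p["priority"]))
    time, schedule = 0, []
    while queue:
        p = queue.popleft()
        run = min(quantum, p["burst"])
        schedule.append((time, p["pid"], run))
        time += run
        p["burst"] -= run
        if p["burst"] > 0:
            queue.append(p)          # unfinished work goes to the back, RR-style
    return schedule

procs = [{"pid": 1, "burst": 10, "priority": 2},
         {"pid": 2, "burst": 4,  "priority": 1},
         {"pid": 3, "burst": 7,  "priority": 3}]
for start, pid, run in priority_round_robin(procs):
    print(f"t={start:2d}  P{pid} runs {run}")
```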


Author(s): Hasta Triangga, Ilham Faisal, Imran Lubis

In IT networking, load balancing is used to distribute traffic among backend servers; the idea is to make load sharing effective and efficient. Load balancing relies on scheduling algorithms, including the static round-robin and least-connection algorithms. HAProxy is a load balancer that can perform this load balancing and runs on Linux operating systems. In this research, HAProxy uses four Nginx web servers as backend servers. HAProxy acts as a reverse proxy accessed by the clients, while the backend servers handle the HTTP requests. The experiment involves 20 client PCs that perform HTTP requests simultaneously, using the static round-robin and least-connection algorithms on the HAProxy load balancer alternately. With the static round-robin algorithm, the average CPU usage over 1 minute, 5 minutes, and 15 minutes was 0.1%, 0.25%, and 1.15%, respectively, with an average throughput of 14.74 kbps; the average total delay and jitter were 181.3 ms and 11.1 ms, respectively. With the least-connection algorithm, the average CPU usage over 1 minute, 5 minutes, and 15 minutes was 0.1%, 0.3%, and 1.25%, respectively, with an average throughput of 14.66 kbps; the average total delay and jitter were 350.3 ms and 24.5 ms, respectively. This means the static round-robin algorithm is more efficient than the least-connection algorithm, because it produces greater throughput with less CPU load and less total delay.
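A small Python sketch of the two backend-selection policies compared in the experiment. This is not HAProxy's implementation; the backend names and connection counts are made up for illustration.

```python
# Toy versions of the two backend-selection policies compared above.
# Not HAProxy's implementation; backend names and counts are illustrative.

import itertools

backends = ["nginx1", "nginx2", "nginx3", "nginx4"]
active = {b: 0 for b in backends}        # current connections per backend

_rr_cycle = itertools.cycle(backends)

def static_round_robin():
    """Each new request goes to the next backend in a fixed rotation."""
    return next(_rr_cycle)

def least_connection():
    """Each new request goes to the backend with the fewest active connections."""
    return min(backends, key=lambda b: active[b])

for policy in (static_round_robin, least_connection):
    chosen = policy()
    active[chosen] += 1
    print(policy.__name__, "->", chosen)
```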


2018, Vol 7 (2.7), pp. 868
Author(s): B Thirumala Rao, M Susmitha, T Swathi, G Akhil

This paper focuses on a priority-based round-robin scheduling algorithm for scheduling jobs in a Hadoop environment. Using the proposed scheduling algorithm reduces the starvation of jobs, and the advantage of priority scheduling is that the job with the highest priority is executed first. By combining the strategies of the round-robin and priority scheduling algorithms, an optimized algorithm is implemented that works more efficiently even after considering all the parameters of a scheduling algorithm. The proposed algorithm is also compared with the existing round-robin and priority scheduling algorithms.
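An illustrative multi-level variant of the idea above: jobs are queued per priority level, and each pass grants more quanta to higher-priority levels while still visiting every level, which limits starvation. This is one reading of the abstract, not the paper's Hadoop implementation or any Hadoop scheduler API.

```python
# Illustrative priority-aware round robin for jobs: each priority level has
# its own queue; one pass grants more quanta to higher-priority levels while
# still visiting every level, which limits starvation. The level weighting
# is an assumption, not the paper's implementation.

from collections import defaultdict, deque

def schedule_jobs(jobs, quantum=2):
    """jobs: list of dicts with 'job', 'work', 'priority' (1 = highest)."""
    levels = defaultdict(deque)
    for j in sorted(jobs, key=lambda j: j["priority"]):
        levels[j["priority"]].append(j)

    order = []
    while any(levels.values()):
        for prio in sorted(levels):                 # visit every level each pass
            queue = levels[prio]
            turns = max(1, 4 - prio)                # assumed weighting: higher priority, more turns
            for _ in range(min(turns, len(queue))):
                job = queue.popleft()
                run = min(quantum, job["work"])
                order.append((job["job"], run))
                job["work"] -= run
                if job["work"] > 0:
                    queue.append(job)
    return order

jobs = [{"job": "J1", "work": 6, "priority": 1},
        {"job": "J2", "work": 4, "priority": 2},
        {"job": "J3", "work": 5, "priority": 3}]
print(schedule_jobs(jobs))
```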

