Influence of Job Runtime Prediction on Scheduling Quality

2021 ◽ Vol 42 (11) ◽ pp. 2562-2570
Author(s): G. I. Savin, D. S. Lyakhovets, A. V. Baranov
2019 ◽ Vol 35 (18) ◽ pp. 3453-3460
Author(s): Anastasia Tyryshkina, Nate Coraor, Anton Nekrutenko

Abstract
Motivation: One of the many technical challenges that arise when scheduling bioinformatics analyses at scale is determining the appropriate amount of memory and processing resources. Both over- and under-allocation lead to inefficient use of computational infrastructure: over-allocation locks resources that could otherwise be used for other analyses, while under-allocation causes job failures and requires analyses to be repeated with a larger memory or runtime allowance. We address this challenge by using a historical dataset of bioinformatics analyses run on the Galaxy platform to demonstrate the feasibility of an online service for resource requirement estimation.
Results: Here we introduce the Galaxy job run dataset and test popular machine learning models on the task of resource usage prediction. We include three popular forest models, the extra trees regressor, the gradient boosting regressor and the random forest regressor, and find that random forests perform best on the runtime prediction task. We also present two methods of choosing walltimes for previously unseen jobs. Quantile regression forests are more accurate in their predictions and allow performance to be improved by changing the confidence of the estimates; however, the sizes of the confidence intervals are variable and cannot be absolutely constrained. Random forest classifiers address this problem by providing control over the size of the prediction intervals with an accuracy comparable to that of the regressor. We show that estimating the memory requirements of a job is possible using the same methods, which, as far as we know, has not been done before. Such estimation can be highly beneficial for accurate resource allocation.
Availability and implementation: Source code is available at https://github.com/atyryshkina/algorithm-performance-analysis, implemented in Python.
Supplementary information: Supplementary data are available at Bioinformatics online.
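To make the forest-based approach concrete, the sketch below shows how a runtime predictor of this kind might be trained with scikit-learn and how a conservative walltime could be derived from the spread of per-tree predictions (a rough stand-in for a quantile regression forest). This is not the authors' implementation; the CSV file name and feature columns are hypothetical.

```python
# Minimal sketch, not the paper's code: random-forest runtime prediction
# plus a quantile-based walltime suggestion. File and column names are
# assumptions for illustration only.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Hypothetical historical job log: one row per finished job.
jobs = pd.read_csv("galaxy_job_history.csv")                   # assumed file
features = ["input_size_mb", "tool_id_encoded", "cpu_cores"]   # assumed columns
X = jobs[features].values
y = jobs["runtime_seconds"].values

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

# Point estimate of runtime for unseen jobs.
runtime_pred = model.predict(X_test)

# Crude upper bound for a walltime: take a high quantile over the
# individual trees' predictions instead of the forest mean.
per_tree = np.stack([tree.predict(X_test) for tree in model.estimators_])
walltime = np.quantile(per_tree, 0.95, axis=0)

print("mean predicted runtime:", runtime_pred.mean())
print("mean suggested walltime:", walltime.mean())
```

Raising or lowering the 0.95 quantile trades fewer walltime violations against tighter resource reservations, mirroring the confidence-level trade-off described in the abstract.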


2011 ◽ Vol 6 (7)
Author(s): Peifeng Li, Qiaoming Zhu, Qin Ji, Xiaoxu Zhu

2021 ◽ Vol 11 (20) ◽ pp. 9448
Author(s): Qiqi Wang, Hongjie Zhang, Cheng Qu, Yu Shen, Xiaohui Liu, ...

The job scheduler plays a vital role in high-performance computing (HPC) platforms. It determines the execution order of jobs and the allocation of resources, which in turn affect the resource utilization of the entire system. As the scale and complexity of HPC systems continue to grow, job scheduling becomes increasingly important and difficult. Existing studies rely on user-specified estimates or regression techniques to produce fixed runtime prediction values and use those values in static heuristic scheduling algorithms. However, these approaches require very accurate runtime predictions to produce good results, and fixed heuristic scheduling strategies cannot adapt to changes in the workload. In this work, we propose RLSchert, a job scheduler based on deep reinforcement learning and remaining-runtime prediction. First, RLSchert estimates the state of the system with a dynamic job remaining-runtime predictor, thereby providing an accurate spatiotemporal view of the cluster status. Second, RLSchert learns the optimal policy for selecting or killing jobs according to that status through imitation learning and the proximal policy optimization (PPO) algorithm. Extensive experiments on real-world job logs from the USTC Supercomputing Center show that RLSchert is superior to static heuristic policies and outperforms the learning-based scheduler DeepRM. In addition, the dynamic predictor gives a more accurate remaining-runtime prediction, which is essential for most learning-based schedulers.
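As a rough illustration of how a learned scheduler can consume a remaining-runtime prediction, the sketch below defines a small policy network that scores queued jobs from feature vectors (one feature being the predicted remaining runtime) and samples the job to schedule next, as a PPO-style agent would. This is not RLSchert itself; the feature layout, queue size, and network dimensions are assumptions.

```python
# Illustrative sketch only, not RLSchert: a policy network that ranks queued
# jobs using features that include a predicted remaining runtime, then samples
# an action from a categorical distribution (the form a PPO scheduler uses).
import torch
import torch.nn as nn

class SchedulerPolicy(nn.Module):
    def __init__(self, job_feature_dim: int, hidden: int = 64):
        super().__init__()
        # Scores one queued job at a time; the same weights are shared
        # across every slot in the queue.
        self.net = nn.Sequential(
            nn.Linear(job_feature_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, queue: torch.Tensor) -> torch.distributions.Categorical:
        # queue: (num_queued_jobs, job_feature_dim); one feature is the
        # predicted remaining runtime of the job.
        logits = self.net(queue).squeeze(-1)      # (num_queued_jobs,)
        return torch.distributions.Categorical(logits=logits)

# Hypothetical queue of 5 jobs, each described by 4 features
# (e.g. requested cores, requested memory, waiting time, predicted runtime).
queue_state = torch.rand(5, 4)
policy = SchedulerPolicy(job_feature_dim=4)
dist = policy(queue_state)
action = dist.sample()                # index of the job to schedule next
log_prob = dist.log_prob(action)      # stored for the later PPO update
print("schedule job:", action.item())
```

Because the remaining-runtime estimate is recomputed as jobs run, the state the policy sees stays current, which is the property the abstract credits for RLSchert's advantage over fixed-prediction heuristics.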

