workload manager Latest Research Papers

AbstractContainerisation demonstrates its efficiency in application deployment in Cloud Computing. Containers can encapsulate complex programs with their dependencies in isolated environments making applications more portable, hence are being adopted in High Performance Computing (HPC) clusters. Singularity, initially designed for HPC systems, has become their de facto standard container runtime. Nevertheless, conventional HPC workload managers lack micro-service support and deeply-integrated container management, as opposed to container orchestrators. We introduce a Torque-Operator which serves as a bridge between HPC workload manager (TORQUE) and container orchestrator (Kubernetes). We propose a hybrid architecture that integrates HPC and Cloud clusters seamlessly with little interference to HPC systems where container orchestration is performed on two levels.

Download Full-text

Union: An Automatic Workload Manager for Accelerating Network Simulation

2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS) ◽

10.1109/ipdps47924.2020.00089 ◽

2020 ◽

Author(s):

Xin Wang ◽

Misbah Mubarak ◽

Yao Kang ◽

Robert B. Ross ◽

Zhiling Lan

Keyword(s):

Network Simulation ◽

Workload Manager

Download Full-text

A Feasibility Study on workload integration between HT-Condor and Slurm Clusters

EPJ Web of Conferences ◽

10.1051/epjconf/201921408004 ◽

2019 ◽

Vol 214 ◽

pp. 08004 ◽

Cited By ~ 1

Author(s):

R. Du ◽

J. Shi ◽

J. Zou ◽

X. Jiang ◽

Z. Sun ◽

...

Keyword(s):

Resource Utilization ◽

High Performance ◽

High Energy Physics ◽

Job Scheduling ◽

High Energy ◽

The Other ◽

Workload Manager ◽

High Degree ◽

Performance Computing ◽

Energy Physics

There are two production clusters co-existed in the Institute of High Energy Physics (IHEP). One is a High Throughput Computing (HTC) cluster with HTCondor as the workload manager, the other is a High Performance Computing (HPC) cluster with Slurm as the workload manager. The resources of the HTCondor cluster are funded by multiple experiments, and the resource utilization reached more than 90% by adopting a dynamic resource share mechanism. Nevertheless, there is a bottleneck if more resources are requested by multiple experiments at the same moment. On the other hand, parallel jobs running on the Slurm cluster reflect some specific attributes, such as high degree of parallelism, low quantity and long wall time. Such attributes make it easy to generate free resource slots which are suitable for jobs from the HTCondor cluster. As a result, if there is a mechanism to schedule jobs from the HTCon-dor cluster to the Slurm cluster transparently, it would improve the resource utilization of the Slurm cluster, and reduce job queue time for the HTCondor cluster. In this proceeding, we present three methods to migrate HTCondor jobs to the Slurm cluster, and concluded that HTCondor-C is more preferred. Furthermore, because design philosophy and application scenes are di↵erent between HTCondor and Slurm, some issues and possible solutions related with job scheduling are presented.

Download Full-text

A Workload Manager: The Pre-assessment in Sincere Software Agent Environment

2018 International Symposium on Agent, Multi-Agent Systems and Robotics (ISAMSR) ◽

10.1109/isamsr.2018.8540547 ◽

2018 ◽

Author(s):

Nur Huda Jaafar ◽

Azhana Ahmad ◽

Mohd Sharifuddin Ahmad ◽

Nurzeatul Hamimah Abdul Hamid

Keyword(s):

Software Agent ◽

Workload Manager

Download Full-text

Design characteristics of a workload manager to aid drivers in safety–critical situations

Cognition Technology & Work ◽

10.1007/s10111-018-0490-2 ◽

2018 ◽

Vol 20 (3) ◽

pp. 401-412 ◽

Cited By ~ 3

Author(s):

Evona Teh ◽

Samantha Jamson ◽

Oliver Carsten

Keyword(s):

Safety Critical ◽

Design Characteristics ◽

Workload Manager ◽

Critical Situations

Download Full-text

A GPU-based high performance computing infrastructure for specialized NGS analyses

10.7287/peerj.preprints.2175v1 ◽

2016 ◽

Author(s):

Andrea Manconi ◽

Marco Moscatelli ◽

Matteo Gnocchi ◽

Giuliano Armano ◽

Luciano Milanesi

Keyword(s):

Gpu Computing ◽

Scientific Workflow ◽

Biological Data ◽

Single Server ◽

Web Based ◽

Continuous Increase ◽

Gpu Cluster ◽

Workflow System ◽

The Galaxy ◽

Workload Manager

Motivation Recent advances in genome sequencing and biological data analysis technologies used in bioinformatics have led to a fast and continuous increase in biological data. The difficulty of managing the huge amounts of data currently available to researchers and the need to have results within a reasonable time have led to the use of distributed and parallel computing infrastructures for their analysis. Recently, bioinformatics is exploring new approaches based on the use of hardware accelerators as GPUs. From an architectural perspective, GPUs are very different from traditional CPUs. Indeed, the latter are devices composed of few cores with lots of cache memory able to handle a few software threads at a time. Conversely, the former are devices equipped with hundreds of cores able to handle thousands of threads simultaneously, so that a very high level of parallelism can be reached. Use of GPUs over the last years has resulted in significant increases in the performance of certain applications. Despite GPUs are increasingly used in bioinformatics most laboratories do not have access to a GPU cluster or server. In this context, it is very important to provide useful services to use these tools. Methods A web-based platform has been implemented with the aim to enable researchers to perform their analysis through dedicated GPU-based computing resources. To this end, a GPU cluster equipped with 16 NVIDIA Tesla k20c cards has been configured. The infrastructure has been built upon the Galaxy technology [1]. Galaxy is an open web-based scientific workflow system for data intensive biomedical research accessible to researchers that do not have programming experience. Let us recall that Galaxy provides a public server, but it does not provide support to GPU-computing. By default, Galaxy is designed to run jobs on local systems. However, it can also be configured to run jobs on a cluster. The front-end Galaxy application runs on a single server, but tools are run on cluster nodes instead. To this end, Galaxy supports different distributed resource managers with the aim to enable different clusters. For the specific case, in our opinion SLURM [2] represents the most suitable workload manager to manage and control jobs. SLURM is a highly configurable workload and resource manager and it is currently used on six of the ten most powerful computers in the world including the Piz Daint, utilizing over 5000 NVIDIA Tesla K20 GPUs. Results GPU-based tools [3] devised by our group for quality control of NGS data have been used to test the infrastructure. Initially, this activity required to make changes to the tools with the aim to optimize the parallelization on the cluster according to the adopted workload manager. Successively, the tools have been converted into web-based services accessible through the Galaxy portal. Abstract truncated at 3,000 characters - the full version is available in the pdf file.

Download Full-text

A GPU-based high performance computing infrastructure for specialized NGS analyses

10.7287/peerj.preprints.2175 ◽

2016 ◽

Author(s):

Andrea Manconi ◽

Marco Moscatelli ◽

Matteo Gnocchi ◽

Giuliano Armano ◽

Luciano Milanesi

Keyword(s):

Gpu Computing ◽

Scientific Workflow ◽

Biological Data ◽

Single Server ◽

Web Based ◽

Continuous Increase ◽

Gpu Cluster ◽

Workflow System ◽

The Galaxy ◽

Workload Manager

Motivation Recent advances in genome sequencing and biological data analysis technologies used in bioinformatics have led to a fast and continuous increase in biological data. The difficulty of managing the huge amounts of data currently available to researchers and the need to have results within a reasonable time have led to the use of distributed and parallel computing infrastructures for their analysis. Recently, bioinformatics is exploring new approaches based on the use of hardware accelerators as GPUs. From an architectural perspective, GPUs are very different from traditional CPUs. Indeed, the latter are devices composed of few cores with lots of cache memory able to handle a few software threads at a time. Conversely, the former are devices equipped with hundreds of cores able to handle thousands of threads simultaneously, so that a very high level of parallelism can be reached. Use of GPUs over the last years has resulted in significant increases in the performance of certain applications. Despite GPUs are increasingly used in bioinformatics most laboratories do not have access to a GPU cluster or server. In this context, it is very important to provide useful services to use these tools. Methods A web-based platform has been implemented with the aim to enable researchers to perform their analysis through dedicated GPU-based computing resources. To this end, a GPU cluster equipped with 16 NVIDIA Tesla k20c cards has been configured. The infrastructure has been built upon the Galaxy technology [1]. Galaxy is an open web-based scientific workflow system for data intensive biomedical research accessible to researchers that do not have programming experience. Let us recall that Galaxy provides a public server, but it does not provide support to GPU-computing. By default, Galaxy is designed to run jobs on local systems. However, it can also be configured to run jobs on a cluster. The front-end Galaxy application runs on a single server, but tools are run on cluster nodes instead. To this end, Galaxy supports different distributed resource managers with the aim to enable different clusters. For the specific case, in our opinion SLURM [2] represents the most suitable workload manager to manage and control jobs. SLURM is a highly configurable workload and resource manager and it is currently used on six of the ten most powerful computers in the world including the Piz Daint, utilizing over 5000 NVIDIA Tesla K20 GPUs. Results GPU-based tools [3] devised by our group for quality control of NGS data have been used to test the infrastructure. Initially, this activity required to make changes to the tools with the aim to optimize the parallelization on the cluster according to the adopted workload manager. Successively, the tools have been converted into web-based services accessible through the Galaxy portal. Abstract truncated at 3,000 characters - the full version is available in the pdf file.

Download Full-text

Towards Scalable Distributed Workload Manager with Monitoring-Based Weakly Consistent Resource Stealing

Proceedings of the 24th International Symposium on High-Performance Parallel and Distributed Computing - HPDC '15 ◽

10.1145/2749246.2749249 ◽

2015 ◽

Cited By ~ 19

Author(s):

Ke Wang ◽

Xiaobing Zhou ◽

Kan Qiao ◽

Michael Lang ◽

Benjamin McClelland ◽

...

Keyword(s):

Workload Manager

Download Full-text

General Workload Manager: A task manager as a service

2015 IEEE International Conference on Communication Workshop (ICCW) ◽

10.1109/iccw.2015.7247451 ◽

2015 ◽

Author(s):

G. Indalecio ◽

F. Gomez-Folgar ◽

A.J. Garcia-Loureiro

Keyword(s):

Workload Manager

Download Full-text

The Crew Workload Manager: An Open-loop Adaptive System Design for Next Generation Flight Decks

Proceedings of the Human Factors and Ergonomics Society Annual Meeting ◽

10.1177/1071181311551004 ◽

2011 ◽

Vol 55 (1) ◽

pp. 16-20 ◽

Cited By ~ 1

Author(s):

M. C. Dorneich ◽

B. Passinger ◽

C. Hamblin ◽

C. Keinrath ◽

J. Vasek ◽

...

Keyword(s):

System Design ◽

Adaptive System ◽

Open Loop ◽

Next Generation ◽

Flight Decks ◽

Workload Manager

Download Full-text

workload manager
Recently Published Documents

TOTAL DOCUMENTS

H-INDEX

Container orchestration on HPC systems through Kubernetes

Union: An Automatic Workload Manager for Accelerating Network Simulation

A Feasibility Study on workload integration between HT-Condor and Slurm Clusters

A Workload Manager: The Pre-assessment in Sincere Software Agent Environment

Design characteristics of a workload manager to aid drivers in safety–critical situations

A GPU-based high performance computing infrastructure for specialized NGS analyses

A GPU-based high performance computing infrastructure for specialized NGS analyses

Towards Scalable Distributed Workload Manager with Monitoring-Based Weakly Consistent Resource Stealing

General Workload Manager: A task manager as a service

The Crew Workload Manager: An Open-loop Adaptive System Design for Next Generation Flight Decks

Export Citation Format

workload managerRecently Published Documents

TOTAL DOCUMENTS

H-INDEX

Container orchestration on HPC systems through Kubernetes

Union: An Automatic Workload Manager for Accelerating Network Simulation

A Feasibility Study on workload integration between HT-Condor and Slurm Clusters

A Workload Manager: The Pre-assessment in Sincere Software Agent Environment

Design characteristics of a workload manager to aid drivers in safety–critical situations

A GPU-based high performance computing infrastructure for specialized NGS analyses

A GPU-based high performance computing infrastructure for specialized NGS analyses

Towards Scalable Distributed Workload Manager with Monitoring-Based Weakly Consistent Resource Stealing

General Workload Manager: A task manager as a service

The Crew Workload Manager: An Open-loop Adaptive System Design for Next Generation Flight Decks

workload manager
Recently Published Documents