HPC cloud services based on the Proxmox VE platform

Author(s):  
А.В. Баранов ◽  
Е.А. Киселёв

Организация облачных сервисов для высокопроизводительных вычислений затруднена, во-первых, по причине высоких накладных расходов на виртуализацию, во-вторых, из-за специфики систем управления заданиями и ресурсами в научных суперкомпьютерных центрах. В настоящей работе рассмотрен подход к построению облачных сервисов видов PaaS и SaaS, основанных на совместном функционировании облачной платформы Proxmox VE и системы управления прохождением параллельных заданий, применяемой в качестве менеджера ресурсов в Межведомственном суперкомпьютерном центре РАН. Purpose. The purpose of this paper is to develop methods and technologies for building high-performance computing cloud services in scientific supercomputer centers. Methodology.To build a cloud environment for high-performance scientific calculations (HPC), the corresponding three-level model and the method of combining flows of supercomputer tasks of various types were applied. Results.A high-level HPC cloud services technology based on the free Proxmox VE software platform has been developed. The Proxmox VE platform has been integrated with the domestic supercomputer job management system called SUPPZ. Experimental estimates of the overheads introduced in the high-performance computing process by the Proxmox components are obtained. Findings.An approach to the integration a supercomputer job management system and a virtualization platform is proposed. The presented approach is based on the representation of the supercomputer jobs as virtual machines or containers. Using the Proxmox VE platform as an example, the influence of a virtual environment on the execution time of parallel programs is investigated experimentally. The possibility of applying the proposed approach to building cloud services of the PaaS and SaaS type in scientific supercomputing centers of collective use is substantiated for a class of applications for which the overhead costs introduced by the Proxmox components are acceptable.

2021 ◽  
Vol 11 (3) ◽  
pp. 923
Author(s):  
Guohua Li ◽  
Joon Woo ◽  
Sang Boem Lim

The complexity of high-performance computing (HPC) workflows is an important issue in the provision of HPC cloud services in most national supercomputing centers. This complexity problem is especially critical because it affects HPC resource scalability, management efficiency, and convenience of use. To solve this problem, while exploiting the advantage of bare-metal-level high performance, container-based cloud solutions have been developed. However, various problems still exist, such as an isolated environment between HPC and the cloud, security issues, and workload management issues. We propose an architecture that reduces this complexity by using Docker and Singularity, which are the container platforms most often used in the HPC cloud field. This HPC cloud architecture integrates both image management and job management, which are the two main elements of HPC cloud workflows. To evaluate the serviceability and performance of the proposed architecture, we developed and implemented a platform in an HPC cluster experiment. Experimental results indicated that the proposed HPC cloud architecture can reduce complexity to provide supercomputing resource scalability, high performance, user convenience, various HPC applications, and management efficiency.


2016 ◽  
Vol 31 (6) ◽  
pp. 1985-1996 ◽  
Author(s):  
David Siuta ◽  
Gregory West ◽  
Henryk Modzelewski ◽  
Roland Schigas ◽  
Roland Stull

Abstract As cloud-service providers like Google, Amazon, and Microsoft decrease costs and increase performance, numerical weather prediction (NWP) in the cloud will become a reality not only for research use but for real-time use as well. The performance of the Weather Research and Forecasting (WRF) Model on the Google Cloud Platform is tested and configurations and optimizations of virtual machines that meet two main requirements of real-time NWP are found: 1) fast forecast completion (timeliness) and 2) economic cost effectiveness when compared with traditional on-premise high-performance computing hardware. Optimum performance was found by using the Intel compiler collection with no more than eight virtual CPUs per virtual machine. Using these configurations, real-time NWP on the Google Cloud Platform is found to be economically competitive when compared with the purchase of local high-performance computing hardware for NWP needs. Cloud-computing services are becoming viable alternatives to on-premise compute clusters for some applications.


Sign in / Sign up

Export Citation Format

Share Document