High Performance MPI Library for Container-Based HPC Cloud on InfiniBand Clusters

Author(s):  
Jie Zhang ◽  
Xiaoyi Lu ◽  
Dhabaleswar K. Panda

2021 ◽  
Vol 11 (3) ◽  
pp. 923
Author(s):  
Guohua Li ◽  
Joon Woo ◽  
Sang Boem Lim

The complexity of high-performance computing (HPC) workflows is an important issue in the provision of HPC cloud services in most national supercomputing centers. This complexity problem is especially critical because it affects HPC resource scalability, management efficiency, and convenience of use. To solve this problem while retaining bare-metal-level performance, container-based cloud solutions have been developed. However, various problems still exist, such as an isolated environment between HPC and the cloud, security issues, and workload management issues. We propose an architecture that reduces this complexity by using Docker and Singularity, which are the container platforms most often used in the HPC cloud field. This HPC cloud architecture integrates both image management and job management, which are the two main elements of HPC cloud workflows. To evaluate the serviceability and performance of the proposed architecture, we developed and implemented a platform in an HPC cluster experiment. Experimental results indicated that the proposed HPC cloud architecture can reduce complexity while providing supercomputing resource scalability, high performance, user convenience, support for various HPC applications, and management efficiency.



Charity ◽  
2020 ◽  
Vol 3 (2) ◽  
Author(s):  
Heru Suhartanto ◽  
Arry Yanuar ◽  
Ari Wibisono ◽  
Yohanes Gultom

The first problem addressed by this activity is that using high-performance computing (HPC) resources requires supercomputer facilities that are very expensive both to procure and to maintain. As a result, HPC facilities are owned only by institutions with substantial funding. In Indonesia in particular, perhaps only a handful of educational and research institutions can afford them. Consequently, the use of HPC for research is limited, because very few research activities have access to an HPC facility. This is an obstacle in its own right, especially for research that demands large computing resources. The second problem is that researchers, who generally come from a wide range of disciplines, often lack the skills to use HPC infrastructure. Typically, HPC cloud users are given several virtual servers, which they must then set up themselves according to the needs of their applications. This setup involves installing the operating system, middleware, and applications, as well as several non-trivial configuration steps (Rajan et al., 2011). Researchers therefore have to spend extra work and time learning a fairly complicated skill outside the core of their research in order to use the IaaS cloud. To address the first problem, one alternative solution has emerged: Infrastructure-as-a-Service (IaaS) cloud services that provide HPC infrastructure. These infrastructure services cover processors, memory, storage, internet connectivity, electricity, and maintenance. Many IaaS vendors have now emerged, such as Amazon EC2 (Elastic Compute Cloud, for computing services), S3 (Simple Storage Service), Microsoft Azure (PaaS), Google App Engine, and others.
The authors have developed a prototype HPC resource portal for molecular dynamics simulation as an output of research conducted over the past several years. In this activity, the prototype was trialed with its intended users, namely researchers, both lecturers and students. Introductory outreach sessions and trials of the prototype were held with lecturers, researchers, and students at Universitas Padjadjaran and Institut Teknologi Bandung. According to the questionnaire results, all participants were satisfied with the outreach activity and considered that the prototype could help improve their situation. All participants also regarded the system as suitable for advancing the potential of their fields (pharmacy/chemistry). Most participants were also satisfied with the event and felt capable of using the system independently, without assistance from the UI team.



Author(s):  
Manoj Himmatrao Devare

Scientists, engineers, and researchers rely heavily on high-performance computing (HPC) services to run energy, engineering, environmental science, weather, and life science simulations. Virtual machine (VM) or Docker-enabled HPC cloud services provide the advantages of consolidation and support for multiple users in a public cloud environment. Adding a hypervisor on top of bare-metal hardware brings a few challenges, such as the computational overhead of virtualization, especially in an HPC environment. This chapter discusses the challenges, solutions, and opportunities arising from input-output, VMM overheads, interconnection overheads, VM migration problems, and scalability problems in the HPC cloud. It portrays the HPC cloud as a highly complex distributed environment made up of heterogeneous architectures, with different processor architectures and interconnect techniques, and covers the problems of shared-memory, distributed-memory, and hybrid architectures in distributed computing, such as resilience, scalability, checkpointing, and fault tolerance.



Author(s):  
А.В. Баранов ◽  
Е.А. Киселёв

Organizing cloud services for high-performance computing is difficult, first, because of the high overhead of virtualization and, second, because of the specifics of job and resource management systems in scientific supercomputer centers. This paper considers an approach to building PaaS and SaaS cloud services based on the joint operation of the Proxmox VE cloud platform and the parallel job management system used as the resource manager at the Joint Supercomputer Center of the Russian Academy of Sciences.
Purpose. The purpose of this paper is to develop methods and technologies for building high-performance computing cloud services in scientific supercomputer centers. Methodology. To build a cloud environment for high-performance computing (HPC), the corresponding three-level model and a method of combining flows of supercomputer jobs of various types were applied. Results. A high-level HPC cloud services technology based on the free Proxmox VE software platform has been developed. The Proxmox VE platform has been integrated with the domestic supercomputer job management system called SUPPZ. Experimental estimates of the overheads introduced into the high-performance computing process by the Proxmox components were obtained. Findings. An approach to integrating a supercomputer job management system with a virtualization platform is proposed. The approach is based on representing supercomputer jobs as virtual machines or containers. Using the Proxmox VE platform as an example, the influence of a virtual environment on the execution time of parallel programs is investigated experimentally. The applicability of the proposed approach to building PaaS and SaaS cloud services in shared-use scientific supercomputing centers is substantiated for the class of applications for which the overhead introduced by the Proxmox components is acceptable.





Author(s):  
S. Sur ◽  
U. K. R. Bondhugula ◽  
A. Mamidala ◽  
H. -W. Jin ◽  
D. K. Panda

