High Performance MPI Library for Container-Based HPC Cloud on InfiniBand Clusters

HPC Cloud Architecture to Reduce HPC Workflow Complexity in Containerized Environments

Applied Sciences ◽

10.3390/app11030923 ◽

2021 ◽

Vol 11 (3) ◽

pp. 923

Author(s):

Guohua Li ◽

Joon Woo ◽

Sang Boem Lim

Keyword(s):

High Performance ◽

Cloud Services ◽

Workload Management ◽

Job Management ◽

Security Issues ◽

Cloud Architecture ◽

Management Efficiency ◽

Complexity Problem ◽

And Performance ◽

HPC Cloud

The complexity of high-performance computing (HPC) workflows is an important issue in the provision of HPC cloud services in most national supercomputing centers. This complexity problem is especially critical because it affects HPC resource scalability, management efficiency, and convenience of use. To solve this problem, while exploiting the advantage of bare-metal-level high performance, container-based cloud solutions have been developed. However, various problems still exist, such as an isolated environment between HPC and the cloud, security issues, and workload management issues. We propose an architecture that reduces this complexity by using Docker and Singularity, which are the container platforms most often used in the HPC cloud field. This HPC cloud architecture integrates both image management and job management, which are the two main elements of HPC cloud workflows. To evaluate the serviceability and performance of the proposed architecture, we developed and implemented a platform in an HPC cluster experiment. Experimental results indicated that the proposed HPC cloud architecture can reduce complexity to provide supercomputing resource scalability, high performance, user convenience, various HPC applications, and management efficiency.

High performance MPI design using unreliable datagram for ultra-scale InfiniBand clusters

Proceedings of the 21st annual international conference on Supercomputing - ICS '07 ◽

10.1145/1274971.1274997 ◽

2007 ◽

Cited By ~ 31

Author(s):

Matthew J. Koop ◽

Sayantan Sur ◽

Qi Gao ◽

Dhabaleswar K. Panda

Keyword(s):

High Performance ◽

InfiniBand Clusters

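Sosialisasi Implementasi Prototype Portal Manajemen Sumber Daya High Performance Computing (HPC): Simulasi Dinamika Molekular [Dissemination of a Prototype High Performance Computing (HPC) Resource Management Portal Implementation: Molecular Dynamics Simulation]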

Charity ◽

10.25124/charity.v3i1.2369 ◽

2020 ◽

Vol 3 (2) ◽

Author(s):

Heru Suhartanto ◽

Arry Yanuar ◽

Ari Wibisono ◽

Yohanes Gultom

Keyword(s):

High Performance Computing ◽

High Performance ◽

Memory Storage ◽

Storage Service ◽

Elastic Computing ◽

Microsoft Azure ◽

Amazon EC2 ◽

Performance Computing ◽

HPC Cloud

The first problem addressed by this activity is that using High Performance Computing (HPC) resources requires supercomputer facilities that are very expensive both to procure and to maintain. As a result, such HPC facilities are owned only by institutions with substantial funding. In Indonesia in particular, perhaps only a handful of educational and research institutions can afford them. Consequently, the use of HPC for research is limited, because very few research activities have access to an HPC facility. This is an obstacle in its own right, especially for research that demands large computing resources. The second problem is that researchers, who generally come from a wide range of scientific disciplines, often lack the skills to use HPC infrastructure. Typically, HPC cloud users are given several virtual servers, which they must then set up themselves according to the needs of their applications. This setup involves installing the operating system, middleware, and applications, along with a number of non-trivial configurations (Rajan et al., 2011). Researchers therefore have to spend extra work and time learning a fairly complicated skill set outside the essence of the research itself in order to use the IaaS cloud. To address the first problem, one alternative solution has emerged: the use of Infrastructure-as-a-Service (IaaS) cloud services, in which the cloud service provides the HPC infrastructure. These infrastructure services cover processors, memory, storage, internet networking, electricity, and maintenance. Many IaaS vendors have now emerged, such as Amazon EC2 (Elastic Compute Cloud, for computing services), S3 (Simple Storage Service), Microsoft Azure (PaaS), Google AppEngine, and others.
The authors have developed a prototype HPC resource portal for molecular dynamics simulation as an output of research activities over the past several years. In this activity, the prototype implementation was trialed with its users, namely researchers, both lecturers and students. The introduction and trial of the prototype were carried out with several fellow lecturers, researchers, and students at Universitas Padjadjaran and Institut Teknologi Bandung. Based on the questionnaire results, all participants were satisfied with the dissemination activity and considered that the prototype could help improve their situation. All participants also considered the system suitable for raising the potential of their fields (pharmacy/chemistry). Most participants were also satisfied with the event and felt sufficiently able to use the system independently without help or assistance from the UI team.

High-Performance Virtual Machine Migration Framework for MPI Applications on SR-IOV Enabled InfiniBand Clusters

2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS) ◽

10.1109/ipdps.2017.43 ◽

2017 ◽

Cited By ~ 8

Author(s):

Jie Zhang ◽

Xiaoyi Lu ◽

Dhabaleswar K. Panda

Keyword(s):

Virtual Machine ◽

High Performance ◽

Virtual Machine Migration ◽

InfiniBand Clusters ◽

MPI Applications

Challenges and Opportunities in High Performance Cloud Computing

Advances in Wireless Technologies and Telecommunication - Handbook of Research on the IoT, Cloud Computing, and Wireless Network Optimization ◽

10.4018/978-1-5225-7335-7.ch005 ◽

2019 ◽

pp. 85-114 ◽

Cited By ~ 1

Author(s):

Manoj Himmatrao Devare

Keyword(s):

High Performance ◽

Cloud Service ◽

Cloud Environment ◽

Public Cloud ◽

Distributed Environment ◽

Processor Architectures ◽

Challenges And Opportunities ◽

Performance Computing ◽

Energy Engineering ◽

HPC Cloud

Scientists, engineers, and researchers rely heavily on high-performance computing (HPC) services to run energy, engineering, environmental science, weather, and life science simulations. A virtual machine (VM) or Docker-enabled HPC cloud service offers the advantages of consolidation and multi-user support in a public cloud environment. Adding a hypervisor on top of bare-metal hardware brings a few challenges, such as the computational overhead of virtualization, especially in an HPC environment. This chapter discusses the challenges, solutions, and opportunities arising from input-output, VMM overheads, interconnection overheads, VM migration problems, and scalability problems in the HPC cloud. It portrays the HPC cloud as a highly complex distributed environment made up of heterogeneous architectures, with different processor architectures and interconnect techniques, and covers the problems of shared-memory, distributed-memory, and hybrid architectures in distributed computing, such as resilience, scalability, checkpointing, and fault tolerance.

Challenges and Opportunities in High Performance Cloud Computing

Research Anthology on Architectures, Frameworks, and Integration Strategies for Distributed and Cloud Computing ◽

10.4018/978-1-7998-5339-8.ch096 ◽

2021 ◽

pp. 1989-2018

Author(s):

Manoj Himmatrao Devare

Keyword(s):

High Performance ◽

Cloud Service ◽

Cloud Environment ◽

Public Cloud ◽

Distributed Environment ◽

Processor Architectures ◽

Challenges And Opportunities ◽

Performance Computing ◽

Energy Engineering ◽

HPC Cloud

Scientists, engineers, and researchers rely heavily on high-performance computing (HPC) services to run energy, engineering, environmental science, weather, and life science simulations. A virtual machine (VM) or Docker-enabled HPC cloud service offers the advantages of consolidation and multi-user support in a public cloud environment. Adding a hypervisor on top of bare-metal hardware brings a few challenges, such as the computational overhead of virtualization, especially in an HPC environment. This chapter discusses the challenges, solutions, and opportunities arising from input-output, VMM overheads, interconnection overheads, VM migration problems, and scalability problems in the HPC cloud. It portrays the HPC cloud as a highly complex distributed environment made up of heterogeneous architectures, with different processor architectures and interconnect techniques, and covers the problems of shared-memory, distributed-memory, and hybrid architectures in distributed computing, such as resilience, scalability, checkpointing, and fault tolerance.

HPC cloud services based on the Proxmox VE platform

Вычислительные технологии (Computational Technologies) ◽

10.25743/ict.2019.24.6.002 ◽

2019 ◽

Author(s):

А.В. Баранов ◽

Е.А. Киселёв

Keyword(s):

High Performance Computing ◽

Management System ◽

High Performance ◽

Virtual Machines ◽

Cloud Services ◽

Job Management ◽

Job Management System ◽

High Level ◽

Performance Computing ◽

HPC Cloud

Organizing cloud services for high-performance computing is difficult, first, because of the high overhead of virtualization and, second, because of the specific nature of job and resource management systems in scientific supercomputer centers. This paper considers an approach to building PaaS and SaaS cloud services based on the joint operation of the Proxmox VE cloud platform and the parallel job management system used as the resource manager at the Joint Supercomputer Center of the Russian Academy of Sciences. Purpose. The purpose of this paper is to develop methods and technologies for building high-performance computing cloud services in scientific supercomputer centers. Methodology. To build a cloud environment for high-performance computing (HPC), a corresponding three-level model and a method of combining flows of supercomputer jobs of various types were applied. Results. A high-level HPC cloud services technology based on the free Proxmox VE software platform has been developed. The Proxmox VE platform has been integrated with the domestic supercomputer job management system called SUPPZ. Experimental estimates of the overheads introduced into high-performance computing by the Proxmox components were obtained. Findings. An approach to integrating a supercomputer job management system with a virtualization platform is proposed. The approach represents supercomputer jobs as virtual machines or containers. Using the Proxmox VE platform as an example, the influence of a virtual environment on the execution time of parallel programs is investigated experimentally. The applicability of the proposed approach to building PaaS and SaaS cloud services in shared-use scientific supercomputing centers is substantiated for the class of applications for which the overhead introduced by the Proxmox components is acceptable.

A high performance broadcast design with hardware multicast and GPUDirect RDMA for streaming applications on Infiniband clusters

2014 21st International Conference on High Performance Computing (HiPC) ◽

10.1109/hipc.2014.7116875 ◽

2014 ◽

Cited By ~ 6

Author(s):

A. Venkatesh ◽

H. Subramoni ◽

K. Hamidouche ◽

Dhabaleswar K. Panda

Keyword(s):

High Performance ◽

Streaming Applications ◽

InfiniBand Clusters

High Performance Computing on the Cloud via HPC+Cloud software framework

2016 Fifth International Conference on Eco-friendly Computing and Communication Systems (ICECCS) ◽

10.1109/eco-friendly.2016.7893240 ◽

2016 ◽

Author(s):

Suresh Reuben Balakrishnan ◽

Shanmugam Veeramani ◽

John Alan Leong ◽

Iain Murray ◽

Amandeep S. Sidhu

Keyword(s):

High Performance Computing ◽

High Performance ◽

Software Framework ◽

Performance Computing ◽

HPC Cloud

High Performance RDMA Based All-to-All Broadcast for InfiniBand Clusters

Lecture Notes in Computer Science - High Performance Computing – HiPC 2005 ◽

10.1007/11602569_19 ◽

2005 ◽

pp. 148-157 ◽

Cited By ~ 13

Author(s):

S. Sur ◽

U. K. R. Bondhugula ◽

A. Mamidala ◽

H. -W. Jin ◽

D. K. Panda

Keyword(s):

High Performance ◽

InfiniBand Clusters