The Sicilian Grid Infrastructure for High Performance Computing

2010, Vol 1 (1), pp. 40-54

Author(s):
Carmelo Marcello Iacono-Manno
Marco Fargetta
Roberto Barbera
Alberto Falzone
Giuseppe Andronico
...  

The conjugation of High Performance Computing (HPC) and the Grid paradigm with applications based on commercial software is one of the major challenges of today's e-Infrastructures. Several research communities, from both industry and academia, need to run highly parallel applications based on licensed software over hundreds of CPU cores; satisfactory fulfillment of such requests is one of the keys to the penetration of this computing paradigm into the industrial world and to the sustainability of Grid infrastructures. This problem has been tackled in the context of the PI2S2 project, which created a regional e-Infrastructure in Sicily, the first in Italy to cover a regional area. This paper describes the features added to integrate an HPC facility into the PI2S2 Grid infrastructure: the adoption of the InfiniBand low-latency network connection, the extension of the gLite middleware to support MPI/MPI2 jobs, the newly developed license server, and the specific scheduling policy adopted. Moreover, it shows the results of some relevant use cases from Computational Fluid Dynamics (Fluent, OpenFOAM), Chemistry (GAMESS), Astrophysics (FLASH), and Bioinformatics (ClustalW).
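The abstract mentions gLite extended to support MPI/MPI2 jobs over an InfiniBand fabric. For context, below is a minimal sketch of the kind of multi-node MPI program such an infrastructure schedules: a token passed around a ring of ranks, the point-to-point pattern a low-latency interconnect accelerates. It is a generic illustration, not code from the PI2S2 project, and assumes at least two ranks.

/* Minimal MPI sketch of the kind of multi-node job the extended gLite
 * middleware dispatches: a token passed around a ring of ranks. The
 * point-to-point hops are what a low-latency fabric such as InfiniBand
 * accelerates. Generic illustration, not code from the PI2S2 project;
 * run with at least two ranks. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, token;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {
        token = 42;                      /* arbitrary payload */
        MPI_Send(&token, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        MPI_Recv(&token, 1, MPI_INT, size - 1, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("token returned to rank 0 after %d hops\n", size);
    } else {
        MPI_Recv(&token, 1, MPI_INT, rank - 1, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        MPI_Send(&token, 1, MPI_INT, (rank + 1) % size, 0, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}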


Author(s):  
Dazhong Wu
Xi Liu
Steve Hebert
Wolfgang Gentzsch
Janis Terpenny

Cloud computing is an innovative computing paradigm that can potentially bridge the gap between the increasing computing demands of computer-aided engineering (CAE) applications and the limited scalability, flexibility, and agility of traditional computing paradigms. In light of the benefits of cloud computing, high performance computing (HPC) in the cloud has the potential to let users not only accelerate computationally expensive CAE simulations (e.g., finite element analysis), but also reduce costs by utilizing on-demand, scalable cloud computing resources. The objective of this research is to evaluate the performance of running a large finite element simulation in a public cloud. Specifically, an experiment is performed to identify the individual and interaction effects of several factors (e.g., CPU core count, memory size, solver computational rate, and input/output rate) on run time using statistical methods. Our experimental results show that the performance of HPC in the cloud is sufficient for a large finite element analysis, and that run time can be optimized by properly selecting a configuration of CPU, memory, and interconnect.
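The factor-screening arithmetic behind such an experiment can be shown in a few lines. In a two-level factorial design, a factor's main effect is the mean run time at its high setting minus the mean at its low setting, and an interaction measures how one factor's effect shifts when another changes. A minimal sketch follows; the four run times are invented for illustration and are not measurements from the paper.

/* Main-effect arithmetic of a 2x2 factorial run-time experiment.
 * Factor A = CPU core count (low/high), factor B = memory size.
 * t[a][b] = run time in minutes; values are made up for illustration. */
#include <stdio.h>

int main(void)
{
    double t[2][2] = { {120.0, 105.0},   /* A low : B low, B high */
                       { 70.0,  52.0} }; /* A high: B low, B high */

    /* Main effect: mean at the high level minus mean at the low level. */
    double effA = (t[1][0] + t[1][1]) / 2.0 - (t[0][0] + t[0][1]) / 2.0;
    double effB = (t[0][1] + t[1][1]) / 2.0 - (t[0][0] + t[1][0]) / 2.0;
    /* Interaction: how much A's effect changes when B moves low -> high. */
    double effAB = ((t[1][1] - t[0][1]) - (t[1][0] - t[0][0])) / 2.0;

    printf("main effect of cores:  %+.1f min\n", effA);
    printf("main effect of memory: %+.1f min\n", effB);
    printf("interaction:           %+.1f min\n", effAB);
    return 0;
}

Adding factors such as solver computational rate or I/O rate extends the same arithmetic over a larger design matrix, which is the scaled-up version of the analysis the abstract describes.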


Author(s):  
Masahiro Nakao
Hitoshi Murai
Hidetoshi Iwashita
Taisuke Boku
Mitsuhisa Sato

To improve productivity when developing parallel applications on high performance computing systems, the XcalableMP PGAS language has been proposed. XcalableMP supports both typical parallelization under the “global-view memory model”, which uses directives, and flexible parallelization under the “local-view memory model”, which uses coarray features. The goal of the present paper is to clarify XcalableMP's productivity and performance. To do so, we implement and evaluate the HPC Challenge benchmarks, namely EP STREAM Triad, High-Performance Linpack, Global Fast Fourier Transform, and RandomAccess, on the K computer using up to 16,384 compute nodes and on a generic cluster system using up to 128 compute nodes. We found the benchmarks easier to implement in XcalableMP than in MPI. Moreover, most of the XcalableMP performance results were almost the same as those obtained with MPI.
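For context, the EP STREAM Triad benchmark mentioned above is a purely local kernel, a[i] = b[i] + s*c[i], run independently on every node. The sketch below shows it in plain MPI + C, the baseline style against which the paper measures XcalableMP's productivity; under XcalableMP's global-view model the same loop would instead be a directive-annotated sequential loop over a distributed array. Sizes and names are illustrative, not taken from the paper.

/* EP STREAM Triad kernel in plain MPI + C. Each rank owns a local slice,
 * so the benchmark is embarrassingly parallel (EP). Illustrative only. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define N_LOCAL (1 << 20)   /* elements per rank; not a size from the paper */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double *a = malloc(N_LOCAL * sizeof *a);
    double *b = malloc(N_LOCAL * sizeof *b);
    double *c = malloc(N_LOCAL * sizeof *c);
    const double scalar = 3.0;
    for (long i = 0; i < N_LOCAL; i++) { b[i] = 1.0; c[i] = 2.0; }

    double t0 = MPI_Wtime();
    for (long i = 0; i < N_LOCAL; i++)   /* the Triad: a = b + s*c */
        a[i] = b[i] + scalar * c[i];
    double dt = MPI_Wtime() - t0;

    /* Report the slowest rank; aggregate bandwidth is bound by it. */
    double worst;
    MPI_Reduce(&dt, &worst, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("triad, slowest rank: %.6f s\n", worst);

    free(a); free(b); free(c);
    MPI_Finalize();
    return 0;
}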


2020, Vol 7 (1)

Author(s):
E. A. Huerta
Asad Khan
Edward Davis
Colleen Bushell
William D. Gropp
...  

Significant investments to upgrade and construct large-scale scientific facilities demand commensurate investments in R&D to design algorithms and computing approaches that enable scientific and engineering breakthroughs in the big data era. Innovative Artificial Intelligence (AI) applications have powered transformational solutions for big data challenges in industry and technology that now drive a multi-billion-dollar industry and play an ever-increasing role in shaping human social patterns. As AI continues to evolve into a computing paradigm endowed with statistical and mathematical rigor, it has become apparent that single-GPU solutions for training, validation, and testing are no longer sufficient for the computational grand challenges brought about by scientific facilities that produce data at a rate and volume that outstrip the computing capabilities of available cyberinfrastructure platforms. This realization has been driving the confluence of AI and high performance computing (HPC) to reduce time-to-insight and to enable systematic study of domain-inspired AI architectures and optimization schemes for data-driven discovery. In this article, we present a summary of recent developments in this field and describe specific advances that the authors are spearheading to accelerate and streamline the use of HPC platforms for designing and applying accelerated AI algorithms in academia and industry.
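The single-GPU-to-HPC transition described above typically rests on data-parallel training: each worker computes gradients on its own shard of the data, and a collective allreduce averages them before every update. Below is a minimal MPI sketch of that averaging step; it is a generic illustration of the pattern, not code from the article, and the parameter count is a stand-in.

/* Data-parallel training in one line of communication: average per-worker
 * gradients with MPI_Allreduce. Generic sketch of the pattern behind
 * multi-GPU/multi-node training, not code from the article. */
#include <mpi.h>
#include <stdio.h>

#define N_PARAMS 4   /* tiny stand-in for a model's parameter count */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Pretend each rank computed these gradients on its own data shard. */
    double grad[N_PARAMS];
    for (int i = 0; i < N_PARAMS; i++)
        grad[i] = (double)(rank + 1) * (i + 1);

    /* Sum gradients across all ranks in place, then divide to average. */
    MPI_Allreduce(MPI_IN_PLACE, grad, N_PARAMS, MPI_DOUBLE, MPI_SUM,
                  MPI_COMM_WORLD);
    for (int i = 0; i < N_PARAMS; i++)
        grad[i] /= size;

    if (rank == 0)
        printf("averaged grad[0] = %g\n", grad[0]);

    MPI_Finalize();
    return 0;
}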


Author(s):  
Camille Coti

This chapter gives an overview of techniques used to tolerate failures in high-performance distributed applications. We describe basic replication techniques, automatic rollback recovery, and application-based fault tolerance. We present the challenges raised specifically by distributed high performance computing and the performance overhead that fault-tolerance mechanisms are likely to incur. Finally, we give an example of a fault-tolerant algorithm that exploits specific properties of a recent algorithm.
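As a concrete illustration of the rollback-recovery idea mentioned above, the sketch below periodically saves a loop's state to disk and, after a crash and restart, resumes from the last checkpoint rather than from iteration zero. The file name and checkpoint interval are invented for the example; this is not code from the chapter.

/* Minimal rollback-recovery sketch: periodically checkpoint loop state;
 * on restart, resume from the last checkpoint. Generic illustration. */
#include <stdio.h>

#define STEPS      1000000
#define CKPT_EVERY 100000
#define CKPT_FILE  "state.ckpt"   /* illustrative file name */

int main(void)
{
    long step = 0;
    double x = 0.0;

    /* Recovery: if a checkpoint exists, roll back to it. */
    FILE *f = fopen(CKPT_FILE, "rb");
    if (f) {
        if (fread(&step, sizeof step, 1, f) == 1 &&
            fread(&x, sizeof x, 1, f) == 1)
            printf("recovered at step %ld\n", step);
        fclose(f);
    }

    for (; step < STEPS; step++) {
        x += 1e-6;                      /* stand-in for real work */
        if ((step + 1) % CKPT_EVERY == 0) {
            /* Real checkpointers write to a temp file and rename it, so a
             * crash mid-write cannot corrupt the previous checkpoint. */
            f = fopen(CKPT_FILE, "wb");
            if (f) {
                long next = step + 1;
                fwrite(&next, sizeof next, 1, f);
                fwrite(&x, sizeof x, 1, f);
                fclose(f);
            }
        }
    }
    printf("done: x = %f\n", x);
    return 0;
}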

