The Sicilian Grid Infrastructure for High Performance Computing

2010, Vol 1 (1), pp. 40-54

Author(s):
Carmelo Marcello Iacono-Manno
Marco Fargetta
Roberto Barbera
Alberto Falzone
Giuseppe Andronico
...  

The conjugation of High Performance Computing (HPC) and the Grid paradigm with applications based on commercial software is one of the major challenges of today's e-Infrastructures. Several research communities, from both industry and academia, need to run highly parallel applications based on licensed software over hundreds of CPU cores; satisfactory fulfillment of such requests is one of the keys to the penetration of this computing paradigm into the industrial world and to the sustainability of Grid infrastructures. This problem has been tackled in the context of the PI2S2 project, which created a regional e-Infrastructure in Sicily, the first in Italy to cover a regional area. This paper describes the features added to integrate an HPC facility into the PI2S2 Grid infrastructure: the adoption of the InfiniBand low-latency network connection, the extension of the gLite middleware to support MPI/MPI2 jobs, the newly developed license server, and the specific scheduling policy adopted. Moreover, it shows the results of some relevant use cases from Computational Fluid Dynamics (Fluent, OpenFOAM), Chemistry (GAMESS), Astrophysics (FLASH), and Bioinformatics (ClustalW).
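The abstract mentions gLite extended to support MPI/MPI2 jobs over an InfiniBand fabric. For context, below is a minimal sketch of the kind of multi-node MPI program such an infrastructure schedules: a token passed around a ring of ranks, the point-to-point pattern a low-latency interconnect accelerates. It is a generic illustration, not code from the PI2S2 project, and assumes at least two ranks.

/* Minimal MPI sketch of the kind of multi-node job the extended gLite
 * middleware dispatches: a token passed around a ring of ranks. The
 * point-to-point hops are what a low-latency fabric such as InfiniBand
 * accelerates. Generic illustration, not code from the PI2S2 project;
 * run with at least two ranks. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, token;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {
        token = 42;                      /* arbitrary payload */
        MPI_Send(&token, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        MPI_Recv(&token, 1, MPI_INT, size - 1, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("token returned to rank 0 after %d hops\n", size);
    } else {
        MPI_Recv(&token, 1, MPI_INT, rank - 1, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        MPI_Send(&token, 1, MPI_INT, (rank + 1) % size, 0, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}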


Author(s):  
Dazhong Wu
Xi Liu
Steve Hebert
Wolfgang Gentzsch
Janis Terpenny

Cloud computing is an innovative computing paradigm that can potentially bridge the gap between the increasing computing demands of computer-aided engineering (CAE) applications and the limited scalability, flexibility, and agility of traditional computing paradigms. In light of the benefits of cloud computing, high performance computing (HPC) in the cloud has the potential to let users not only accelerate computationally expensive CAE simulations (e.g., finite element analysis), but also reduce costs by utilizing on-demand, scalable cloud computing resources. The objective of this research is to evaluate the performance of running a large finite element simulation in a public cloud. Specifically, an experiment is performed to identify the individual and interaction effects of several factors (e.g., CPU core count, memory size, solver computational rate, and input/output rate) on run time using statistical methods. Our experimental results show that the performance of HPC in the cloud is sufficient for a large finite element analysis, and that run time can be optimized by properly selecting a configuration of CPU, memory, and interconnect.
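The factor-screening arithmetic behind such an experiment can be shown in a few lines. In a two-level factorial design, a factor's main effect is the mean run time at its high setting minus the mean at its low setting, and an interaction measures how one factor's effect shifts when another changes. A minimal sketch follows; the four run times are invented for illustration and are not measurements from the paper.

/* Main-effect arithmetic of a 2x2 factorial run-time experiment.
 * Factor A = CPU core count (low/high), factor B = memory size.
 * t[a][b] = run time in minutes; values are made up for illustration. */
#include <stdio.h>

int main(void)
{
    double t[2][2] = { {120.0, 105.0},   /* A low : B low, B high */
                       { 70.0,  52.0} }; /* A high: B low, B high */

    /* Main effect: mean at the high level minus mean at the low level. */
    double effA = (t[1][0] + t[1][1]) / 2.0 - (t[0][0] + t[0][1]) / 2.0;
    double effB = (t[0][1] + t[1][1]) / 2.0 - (t[0][0] + t[1][0]) / 2.0;
    /* Interaction: how much A's effect changes when B moves low -> high. */
    double effAB = ((t[1][1] - t[0][1]) - (t[1][0] - t[0][0])) / 2.0;

    printf("main effect of cores:  %+.1f min\n", effA);
    printf("main effect of memory: %+.1f min\n", effB);
    printf("interaction:           %+.1f min\n", effAB);
    return 0;
}

Adding factors such as solver computational rate or I/O rate extends the same arithmetic over a larger design matrix, which is the scaled-up version of the analysis the abstract describes.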


Author(s):  
Masahiro Nakao
Hitoshi Murai
Hidetoshi Iwashita
Taisuke Boku
Mitsuhisa Sato

To improve productivity when developing parallel applications on high performance computing systems, the XcalableMP PGAS language has been proposed. XcalableMP supports both typical parallelization under the “global-view memory model”, which uses directives, and flexible parallelization under the “local-view memory model”, which uses coarray features. The goal of the present paper is to clarify XcalableMP's productivity and performance. To do so, we implement and evaluate the HPC Challenge benchmarks, namely EP STREAM Triad, High-Performance Linpack, Global Fast Fourier Transform, and RandomAccess, on the K computer using up to 16,384 compute nodes and on a generic cluster system using up to 128 compute nodes. We found the benchmarks easier to implement in XcalableMP than in MPI. Moreover, most of the XcalableMP performance results were almost the same as those obtained with MPI.
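For context, the EP STREAM Triad benchmark mentioned above is a purely local kernel, a[i] = b[i] + s*c[i], run independently on every node. The sketch below shows it in plain MPI + C, the baseline style against which the paper measures XcalableMP's productivity; under XcalableMP's global-view model the same loop would instead be a directive-annotated sequential loop over a distributed array. Sizes and names are illustrative, not taken from the paper.

/* EP STREAM Triad kernel in plain MPI + C. Each rank owns a local slice,
 * so the benchmark is embarrassingly parallel (EP). Illustrative only. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define N_LOCAL (1 << 20)   /* elements per rank; not a size from the paper */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double *a = malloc(N_LOCAL * sizeof *a);
    double *b = malloc(N_LOCAL * sizeof *b);
    double *c = malloc(N_LOCAL * sizeof *c);
    const double scalar = 3.0;
    for (long i = 0; i < N_LOCAL; i++) { b[i] = 1.0; c[i] = 2.0; }

    double t0 = MPI_Wtime();
    for (long i = 0; i < N_LOCAL; i++)   /* the Triad: a = b + s*c */
        a[i] = b[i] + scalar * c[i];
    double dt = MPI_Wtime() - t0;

    /* Report the slowest rank; aggregate bandwidth is bound by it. */
    double worst;
    MPI_Reduce(&dt, &worst, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("triad, slowest rank: %.6f s\n", worst);

    free(a); free(b); free(c);
    MPI_Finalize();
    return 0;
}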


2020, Vol 7 (1)

Author(s):
E. A. Huerta
Asad Khan
Edward Davis
Colleen Bushell
William D. Gropp
...  

Significant investments to upgrade and construct large-scale scientific facilities demand commensurate investments in R&D to design algorithms and computing approaches that enable scientific and engineering breakthroughs in the big data era. Innovative Artificial Intelligence (AI) applications have powered transformational solutions for big data challenges in industry and technology that now drive a multi-billion-dollar industry and play an ever-increasing role in shaping human social patterns. As AI continues to evolve into a computing paradigm endowed with statistical and mathematical rigor, it has become apparent that single-GPU solutions for training, validation, and testing are no longer sufficient for the computational grand challenges brought about by scientific facilities that produce data at a rate and volume that outstrip the computing capabilities of available cyberinfrastructure platforms. This realization has been driving the confluence of AI and high performance computing (HPC) to reduce time-to-insight and to enable systematic study of domain-inspired AI architectures and optimization schemes for data-driven discovery. In this article, we present a summary of recent developments in this field and describe specific advances that the authors are spearheading to accelerate and streamline the use of HPC platforms for designing and applying accelerated AI algorithms in academia and industry.
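The single-GPU-to-HPC transition described above typically rests on data-parallel training: each worker computes gradients on its own shard of the data, and a collective allreduce averages them before every update. Below is a minimal MPI sketch of that averaging step; it is a generic illustration of the pattern, not code from the article, and the parameter count is a stand-in.

/* Data-parallel training in one line of communication: average per-worker
 * gradients with MPI_Allreduce. Generic sketch of the pattern behind
 * multi-GPU/multi-node training, not code from the article. */
#include <mpi.h>
#include <stdio.h>

#define N_PARAMS 4   /* tiny stand-in for a model's parameter count */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Pretend each rank computed these gradients on its own data shard. */
    double grad[N_PARAMS];
    for (int i = 0; i < N_PARAMS; i++)
        grad[i] = (double)(rank + 1) * (i + 1);

    /* Sum gradients across all ranks in place, then divide to average. */
    MPI_Allreduce(MPI_IN_PLACE, grad, N_PARAMS, MPI_DOUBLE, MPI_SUM,
                  MPI_COMM_WORLD);
    for (int i = 0; i < N_PARAMS; i++)
        grad[i] /= size;

    if (rank == 0)
        printf("averaged grad[0] = %g\n", grad[0]);

    MPI_Finalize();
    return 0;
}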


Author(s):  
Camille Coti

This chapter gives an overview of techniques used to tolerate failures in high-performance distributed applications. We describe basic replication techniques, automatic rollback recovery, and application-based fault tolerance. We present the challenges raised specifically by distributed high performance computing and the performance overhead that fault-tolerance mechanisms are likely to incur. Finally, we give an example of a fault-tolerant algorithm that exploits specific properties of a recent algorithm.
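As a concrete illustration of the rollback-recovery idea mentioned above, the sketch below periodically saves a loop's state to disk and, after a crash and restart, resumes from the last checkpoint rather than from iteration zero. The file name and checkpoint interval are invented for the example; this is not code from the chapter.

/* Minimal rollback-recovery sketch: periodically checkpoint loop state;
 * on restart, resume from the last checkpoint. Generic illustration. */
#include <stdio.h>

#define STEPS      1000000
#define CKPT_EVERY 100000
#define CKPT_FILE  "state.ckpt"   /* illustrative file name */

int main(void)
{
    long step = 0;
    double x = 0.0;

    /* Recovery: if a checkpoint exists, roll back to it. */
    FILE *f = fopen(CKPT_FILE, "rb");
    if (f) {
        if (fread(&step, sizeof step, 1, f) == 1 &&
            fread(&x, sizeof x, 1, f) == 1)
            printf("recovered at step %ld\n", step);
        fclose(f);
    }

    for (; step < STEPS; step++) {
        x += 1e-6;                      /* stand-in for real work */
        if ((step + 1) % CKPT_EVERY == 0) {
            /* Real checkpointers write to a temp file and rename it, so a
             * crash mid-write cannot corrupt the previous checkpoint. */
            f = fopen(CKPT_FILE, "wb");
            if (f) {
                long next = step + 1;
                fwrite(&next, sizeof next, 1, f);
                fwrite(&x, sizeof x, 1, f);
                fclose(f);
            }
        }
    }
    printf("done: x = %f\n", x);
    return 0;
}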

