An Optimized Parallel FDTD Topology for Challenging Electromagnetic Simulations on Supercomputers

It may not be a challenge to run a Finite-Difference Time-Domain (FDTD) code for electromagnetic simulations on a supercomputer with more than 10 thousands of CPU cores; however, to make FDTD code work with the highest efficiency is a challenge. In this paper, the performance of parallel FDTD is optimized through MPI (message passing interface) virtual topology, based on which a communication model is established. The general rules of optimal topology are presented according to the model. The performance of the method is tested and analyzed on three high performance computing platforms with different architectures in China. Simulations including an airplane with a 700-wavelength wingspan, and a complex microstrip antenna array with nearly 2000 elements are performed very efficiently using a maximum of 10240 CPU cores.

Download Full-text

Message passing interface and multithreading hybrid for parallel molecular docking of large databases on petascale high performance computing machines

Journal of Computational Chemistry ◽

10.1002/jcc.23214 ◽

2013 ◽

Vol 34 (11) ◽

pp. 915-927 ◽

Cited By ~ 43

Author(s):

Xiaohua Zhang ◽

Sergio E. Wong ◽

Felice C. Lightstone

Keyword(s):

Molecular Docking ◽

High Performance Computing ◽

Message Passing ◽

High Performance ◽

Message Passing Interface ◽

Large Databases ◽

Computing Machines ◽

Performance Computing

Download Full-text

Improved Failure Detection and Propagation Mechanisms for MPI

10.5753/eradsp.2021.16702 ◽

2021 ◽

Author(s):

Pedro Henrique Di Francia Rosso ◽

Emilio Francesquini

Keyword(s):

Fault Tolerance ◽

High Performance Computing ◽

Message Passing ◽

High Performance ◽

Message Passing Interface ◽

Failure Detection ◽

Essential Components ◽

Failure Propagation ◽

Performance Computing

The Message Passing Interface (MPI) standard is largely used in High-Performance Computing (HPC) systems. Such systems employ a large number of computing nodes. Thus, Fault Tolerance (FT) is a concern since a large number of nodes leads to more frequent failures. Two essential components of FT are Failure Detection (FD) and Failure Propagation (FP). This paper proposes improvements to existing FD and FP mechanisms to provide more portability, scalability, and low overhead. Results show that the methods proposed can achieve better or at least similar results to existing methods while providing portability to any MPI standard-compliant distribution.

Download Full-text

Enabling High Performance Computing for Java Applications using the Message-Passing Interface

Proceedings of the Second International Conference on Parallel, Distributed, Grid and Cloud Computing for Engineering ◽

10.4203/ccp.95.66 ◽

2011 ◽

Cited By ~ 1

Author(s):

A. Cheptsov ◽

M. Assel ◽

B. Koller ◽

G. Gallizo

Keyword(s):

High Performance Computing ◽

Message Passing ◽

High Performance ◽

Message Passing Interface ◽

Performance Computing

Download Full-text

Exascale Message Passing Interface based Program Deadlock Detection

International Journal of Electrical and Computer Engineering (IJECE) ◽

10.11591/ijece.v6i2.9575 ◽

2016 ◽

Vol 6 (2) ◽

pp. 887

Author(s):

Raed AlDhubhani ◽

Fathy Eassa ◽

Faisal Saeed

Keyword(s):

Message Passing ◽

High Performance ◽

Message Passing Interface ◽

Deadlock Detection ◽

Efficient Manner ◽

Parallel Processes ◽

Critical Issues ◽

Near Future ◽

Performance Computing ◽

Standard Library

Deadlock detection is one of the main issues of software testing in High Performance Computing (HPC) and also inexascale computing areas in the near future. Developing and testing programs for machines which have millions of cores is not an easy task. HPC program consists of thousands (or millions) of parallel processes which need to communicate with each other in the runtime. Message Passing Interface (MPI) is a standard library which provides this communication capability and it is frequently used in the HPC. Exascale programs are expected to be developed using MPI standard library. For parallel programs, deadlock is one of the expected problems. In this paper, we discuss the deadlock detection for exascale MPI-based programs where the scalability and efficiency are critical issues. The proposed method detects and flags the processes and communication operations which are potential to cause deadlocks in a scalable and efficient manner. MPI benchmark programs were used to test the proposed method.

Download Full-text

Scaling modeling and simulation on high-performance computing clusters

SIMULATION ◽

10.1177/0037549719878249 ◽

2019 ◽

Vol 96 (2) ◽

pp. 221-232

Author(s):

Mike Mikailov ◽

Junshan Qiu ◽

Fu-Jyh Luo ◽

Stephen Whitney ◽

Nicholas Petrick

Keyword(s):

Modeling And Simulation ◽

High Performance Computing ◽

Message Passing ◽

High Performance ◽

Large Scale ◽

Message Passing Interface ◽

The United States ◽

United States Food ◽

States Food ◽

Performance Computing

Large-scale modeling and simulation (M&S) applications that do not require run-time inter-process communications can exhibit scaling problems when migrated to high-performance computing (HPC) clusters if traditional software parallelization techniques, such as POSIX multi-threading and the message passing interface, are used. A comprehensive approach for scaling M&S applications on HPC clusters has been developed and is called “computation segmentation.” The computation segmentation is based on the built-in array job facility of job schedulers. If used correctly for appropriate applications, the array job approach provides significant benefits that are not obtainable using other methods. The parallelization illustrated in this paper becomes quite complex in its own right when applied to extremely large M&S tasks, particularly due to the need for nested loops. At the United States Food and Drug Administration, the approach has provided unsurpassed efficiency, flexibility, and scalability for work that can be performed using embarrassingly parallel algorithms.

Download Full-text

Interacción de los componentes del clúster Microsoft Hpc (High Performance Computing) Server 2008, con aplicaciones MPI

I3+ ◽

10.24267/23462329.76 ◽

2015 ◽

Vol 2 (1) ◽

pp. 96

Author(s):

Mauricio Ochoa Echeverría ◽

Daniel Alejandro Soto Beltrán

Keyword(s):

High Performance Computing ◽

Message Passing ◽

High Performance ◽

Message Passing Interface ◽

Performance Computing

La computación de alto rendimiento o HPC (High Performace Computing), hace referencia a la solución de problemas complejos por medio de un grupo de servidores, llamado clúster. El clúster en su totalidad se utiliza para la resolución de un problema individual o bien a la resolución de un grupo de problemas relacionados entre sí. Inicialmente, las soluciones facilitadas por HPC estaban limitadas a la investigación científica, pero debido a la reducción de costos y a las nuevas necesidades en los negocios, ya se puede aplicar HPC a centros de datos, simulaciones de software, procesamiento de transacciones y a cualquier resolución de problemas complejos para negocios. En relación a lo anterior la Universidad de Boyacá desarrolló el proyecto de investigación titulado “Interacción de los componentes del clúster Microsoft HPC (High Performance Computing) Server 2008 con aplicaciones MPI”. Se describe la forma en que se relacionan entre sí los componentes que hacen parte del clúster de procesamiento de información Microsoft HPC (High Performance Computing) Server 2008, para resolver un problema de alta complejidad con aplicaciones desarrolladas en MPI (Message Passing Interface, Interfaz de paso de mensajes). Para el desarrollo del proyecto un clúster de alto desempeño mediante el uso de Microsoft HPC Server 2008, utilizando máquinas virtuales, para observar su funcionamiento y determinar los reportes de rendimiento que estos sistemas ofrecen a los usuarios, para lo cual se utilizaron pruebas con aplicaciones desarrolladas en MPI. Este artículo describe: El clúster HP Server incluyendo los conceptos referentes a él (Clústeres, computación de alto desempeño y MPI), todos los requerimientos de infraestructura para el desarrollo del proyecto, el proceso de creación del clúster desde la virtualización de nodos, pasando por la creación del dominio hasta llegar a la implementación de los programas MPI y el análisis de los resultados obtenidos.

Download Full-text

Comparative Study between Parallel K-Means and Parallel K-Medoids with Message Passing Interface (MPI)

International Journal on Information and Communication Technology (IJoICT) ◽

10.21108/ijoict.2016.22.86 ◽

2017 ◽

Vol 2 (2) ◽

pp. 27

Author(s):

Fhira Nhita

Keyword(s):

Data Mining ◽

Message Passing ◽

High Performance ◽

Message Passing Interface ◽

Clustering Algorithms ◽

Computation Time ◽

Sequential Algorithm ◽

Data Mining Technique ◽

Combination Technology ◽

Performance Computing

<p>Data mining is a combination technology for analyze a useful information from dataset using some technique such as classification, clustering, and etc. Clustering is one of the most used data mining technique these day. K-Means and K-Medoids is one of clustering algorithms that mostly used because it’s easy implementation, efficient, and also present good results. Besides mining important information, the needs of time spent when mining data is also a concern in today era considering the real world applications produce huge volume of data. This research analyzed the result from K-Means and K-Medoids algorithm and time performance using High Performance Computing (HPC) Cluster to parallelize K-Means and K-Medoids algorithms and using Message Passing Interface (MPI) library. The results shown that K-Means algorithm gives smaller SSE than K-Medoids. And also parallel algorithm that used MPI gives faster computation time than sequential algorithm.</p>

Download Full-text

GPU Computation and Platforms

Advances in Systems Analysis, Software Engineering, and High Performance Computing - Emerging Research Surrounding Power Consumption and Performance Issues in Utility Computing ◽

10.4018/978-1-4666-8853-7.ch007 ◽

2016 ◽

pp. 136-174

Author(s):

K. Bhargavi ◽

Sathish Babu B.

Keyword(s):

Message Passing ◽

High Performance ◽

Message Passing Interface ◽

Gpu Computing ◽

Graphics Processing Unit ◽

General Purpose ◽

Processing Unit ◽

Computing Platforms ◽

Computationally Intensive ◽

Graphics Processing

The GPUs (Graphics Processing Unit) were mainly used to speed up computation intensive high performance computing applications. There are several tools and technologies available to perform general purpose computationally intensive application. This chapter primarily discusses about GPU parallelism, applications, probable challenges and also highlights some of the GPU computing platforms, which includes CUDA, OpenCL (Open Computing Language), OpenMPC (Open MP extended for CUDA), MPI (Message Passing Interface), OpenACC (Open Accelerator), DirectCompute, and C++ AMP (C++ Accelerated Massive Parallelism). Each of these platforms is discussed briefly along with their advantages and disadvantages.

Download Full-text

Study of Particle Swarm Optimization Algorithms Using Message Passing Interface and Graphical Processing Units Employing a High Performance Computing Cluster

Communications in Computer and Information Science - High Performance Computer Applications ◽

10.1007/978-3-319-32243-8_8 ◽

2016 ◽

pp. 116-131 ◽

Cited By ~ 1

Author(s):

Manuel-H. Santana-Castolo ◽

J. Alejandro Morales ◽

Sulema Torres-Ramos ◽

Alma Y. Alanis

Keyword(s):

Particle Swarm Optimization ◽

High Performance Computing ◽

Message Passing ◽

High Performance ◽

Message Passing Interface ◽

Swarm Optimization ◽

Graphical Processing Units ◽

Graphical Processing ◽

High Performance Computing Cluster ◽

Performance Computing

Download Full-text

A three-phase workflow for general and expressive representations of nondeterminism in HPC applications

The International Journal of High Performance Computing Applications ◽

10.1177/1094342019868826 ◽

2019 ◽

Vol 33 (6) ◽

pp. 1175-1184

Author(s):

Dylan Chapp ◽

Danny Rorabaugh ◽

Kento Sato ◽

Dong H Ahn ◽

Michela Taufer

Keyword(s):

Machine Learning ◽

Message Passing ◽

High Performance ◽

Message Passing Interface ◽

Structural Similarity ◽

Machine Learning Techniques ◽

Graph Kernels ◽

Learning Techniques ◽

Three Phase ◽

Performance Computing

Nondeterminism is an increasingly entrenched property of high-performance computing (HPC) applications and has recently been shown to seriously hamper debugging and reproducibility efforts. Tools for addressing the nondeterministic debugging problem have emerged, but they do not provide methods for systematically cataloging the nondeterminism in a given application. We propose a three-phase workflow for representing executions of nondeterministic message passing interface programs as event graphs, quantifying their structural similarity with graph kernels, and applying machine learning techniques to investigate shared properties across applications. We present an empirical study comparing two graph kernels’ suitability for this task and propose future uses of the methodology.

Download Full-text