Comparative Study between Parallel K-Means and Parallel K-Medoids with Message Passing Interface (MPI)

Author(s):  
Fhira Nhita

Data mining combines techniques such as classification and clustering to extract useful information from datasets. Clustering is one of the most widely used data mining techniques today, and K-Means and K-Medoids are among the most popular clustering algorithms because they are easy to implement, efficient, and produce good results. Besides mining important information, the time spent mining data is also a concern in today's era, given that real-world applications produce huge volumes of data. This research compares the clustering results of the K-Means and K-Medoids algorithms and their time performance, using a High-Performance Computing (HPC) cluster to parallelize both algorithms with the Message Passing Interface (MPI) library. The results show that K-Means gives a smaller Sum of Squared Errors (SSE) than K-Medoids, and that the parallel MPI algorithms give faster computation times than the sequential algorithms.
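The abstract does not include the authors' code, but a minimal sketch of how one K-Means update step is commonly parallelized with MPI is shown below, assuming the data points have already been scattered across ranks; the array names, dimension D, and cluster count K are illustrative, not taken from the paper.

```c
/* Hedged sketch: one iteration of MPI-parallel K-Means (not the paper's code).
 * Assumes each rank already holds `local_n` points of dimension D in `points`,
 * and all ranks hold the same K centroids in `centroids`. */
#include <mpi.h>
#include <float.h>

#define D 2   /* point dimension (illustrative) */
#define K 3   /* number of clusters (illustrative) */

void kmeans_step(double points[][D], int local_n, double centroids[K][D]) {
    double local_sum[K][D] = {{0}}, global_sum[K][D];
    int local_cnt[K] = {0}, global_cnt[K];

    /* Assign each local point to its nearest centroid. */
    for (int i = 0; i < local_n; i++) {
        int best = 0; double best_d = DBL_MAX;
        for (int k = 0; k < K; k++) {
            double d = 0.0;
            for (int j = 0; j < D; j++) {
                double diff = points[i][j] - centroids[k][j];
                d += diff * diff;
            }
            if (d < best_d) { best_d = d; best = k; }
        }
        for (int j = 0; j < D; j++) local_sum[best][j] += points[i][j];
        local_cnt[best]++;
    }

    /* Combine partial sums from all ranks, then recompute global centroids. */
    MPI_Allreduce(local_sum, global_sum, K * D, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
    MPI_Allreduce(local_cnt, global_cnt, K, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
    for (int k = 0; k < K; k++)
        if (global_cnt[k] > 0)
            for (int j = 0; j < D; j++)
                centroids[k][j] = global_sum[k][j] / global_cnt[k];
}
```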

2021 ◽  
Author(s):  
Oluvaseun Owojaiye

Advancement in technology has brought considerable improvement to processor design, and manufacturers now place multiple processors on a single chip. Supercomputers today consist of clusters of interconnected nodes that collaborate to solve complex and advanced computational problems. Message Passing Interface (MPI) and Open Multiprocessing (OpenMP) are the most widely used programming models for optimizing sequential codes by parallelizing them on the different multiprocessor architectures that exist today. In this thesis, we parallelize the non-slicing floorplan algorithm based on Multilevel Floorplanning/placement of large-scale modules using B*-trees (MB*-tree) with MPI and OpenMP on distributed- and shared-memory architectures, respectively. In VLSI (Very Large Scale Integration) design automation, floorplanning is an initial and vital task performed in the early design stage. Experimental results using MCNC benchmark circuits show that our parallel algorithm produced better results than the corresponding sequential algorithm; we were able to speed up the algorithm by up to 4 times, reducing computation time while maintaining floorplan solution quality. We also compared the two parallel versions, and the OpenMP results were slightly better than the corresponding MPI results.
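The thesis itself is not reproduced here, but a hedged sketch of the two programming models it compares is shown below: the same independent-candidate evaluation loop expressed once with OpenMP work sharing (shared memory) and once with an MPI rank-strided partition (distributed memory). The evaluate_candidate function is a placeholder standing in for a real B*-tree floorplan cost evaluation, not code from the thesis.

```c
/* Hedged sketch contrasting OpenMP and MPI parallelization of an
 * independent-candidate evaluation loop; evaluate_candidate() is an
 * illustrative placeholder, not a real floorplan evaluator. */
#include <mpi.h>
#include <omp.h>
#include <float.h>

static double evaluate_candidate(int i) {
    /* Placeholder cost function (illustrative only). */
    return (double)((i * 2654435761u) % 1000);
}

/* Shared memory: OpenMP threads split the loop and reduce the best cost. */
double best_cost_openmp(int n) {
    double best = DBL_MAX;
    #pragma omp parallel for reduction(min:best)
    for (int i = 0; i < n; i++) {
        double c = evaluate_candidate(i);
        if (c < best) best = c;
    }
    return best;
}

/* Distributed memory: each MPI rank evaluates a strided subset,
 * then MPI_Allreduce combines the per-rank minima. */
double best_cost_mpi(int n) {
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    double local_best = DBL_MAX, global_best;
    for (int i = rank; i < n; i += size) {
        double c = evaluate_candidate(i);
        if (c < local_best) local_best = c;
    }
    MPI_Allreduce(&local_best, &global_best, 1, MPI_DOUBLE, MPI_MIN, MPI_COMM_WORLD);
    return global_best;
}
```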


2021 ◽  
Author(s):  
Pedro Henrique Di Francia Rosso ◽  
Emilio Francesquini

The Message Passing Interface (MPI) standard is widely used in High-Performance Computing (HPC) systems. Such systems employ a large number of computing nodes, so Fault Tolerance (FT) is a concern: a large number of nodes leads to more frequent failures. Two essential components of FT are Failure Detection (FD) and Failure Propagation (FP). This paper proposes improvements to existing FD and FP mechanisms to provide more portability, better scalability, and low overhead. Results show that the proposed methods achieve results better than, or at least comparable to, existing methods while remaining portable to any standard-compliant MPI distribution.
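The paper's FD/FP mechanisms are not reproduced here, but the hedged sketch below illustrates the general heartbeat idea behind many failure detectors, written with only standard MPI calls (MPI_Isend, MPI_Iprobe, MPI_Recv, MPI_Wtime). It only shows timeout-based suspicion of a neighboring rank; the tag, period, and timeout values are illustrative, and a real fault-tolerant runtime needs far more machinery than this.

```c
/* Hedged sketch: ring heartbeat failure detection using only standard MPI.
 * Each rank sends periodic heartbeats to its successor and suspects its
 * predecessor if none arrive within TIMEOUT seconds. Illustrative only. */
#include <mpi.h>
#include <stdio.h>

#define HB_TAG   77
#define PERIOD   0.5   /* seconds between heartbeats (illustrative) */
#define TIMEOUT  2.0   /* seconds of silence before suspicion (illustrative) */

void monitor(MPI_Comm comm, double run_for) {
    int rank, size;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);
    int succ = (rank + 1) % size;          /* rank we send heartbeats to */
    int pred = (rank - 1 + size) % size;   /* rank we expect heartbeats from */

    double start = MPI_Wtime(), last_sent = 0.0, last_heard = MPI_Wtime();
    while (MPI_Wtime() - start < run_for) {
        double now = MPI_Wtime();

        /* Send a heartbeat to the successor every PERIOD seconds. */
        if (now - last_sent >= PERIOD) {
            MPI_Request req;
            int beat = 1;
            MPI_Isend(&beat, 1, MPI_INT, succ, HB_TAG, comm, &req);
            MPI_Request_free(&req);   /* fire-and-forget heartbeat */
            last_sent = now;
        }

        /* Drain any heartbeats that arrived from the predecessor. */
        int flag = 1;
        while (flag) {
            MPI_Iprobe(pred, HB_TAG, comm, &flag, MPI_STATUS_IGNORE);
            if (flag) {
                int beat;
                MPI_Recv(&beat, 1, MPI_INT, pred, HB_TAG, comm, MPI_STATUS_IGNORE);
                last_heard = MPI_Wtime();
            }
        }

        /* Flag the predecessor as suspected if it has gone silent. */
        if (MPI_Wtime() - last_heard > TIMEOUT)
            fprintf(stderr, "rank %d suspects rank %d has failed\n", rank, pred);
    }
}
```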


Author(s):  
Raed AlDhubhani ◽  
Fathy Eassa ◽  
Faisal Saeed

Deadlock detection is one of the main issues in software testing for High-Performance Computing (HPC), and it will remain so for exascale computing in the near future. Developing and testing programs for machines with millions of cores is not an easy task. An HPC program consists of thousands (or millions) of parallel processes that need to communicate with each other at runtime. Message Passing Interface (MPI) is a standard library that provides this communication capability and is frequently used in HPC; exascale programs are expected to be developed using the MPI standard library. For parallel programs, deadlock is one of the expected problems. In this paper, we discuss deadlock detection for exascale MPI-based programs, where scalability and efficiency are critical issues. The proposed method detects and flags, in a scalable and efficient manner, the processes and communication operations that can potentially cause deadlocks. MPI benchmark programs were used to test the proposed method.
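The detection method itself is not shown in the abstract, but the hedged sketch below shows the kind of communication pattern such a tool must flag: two ranks that each post a blocking MPI_Recv before their matching MPI_Send. Neither receive can complete, so both ranks hang; this is a standard textbook example, not code from the paper.

```c
/* Hedged sketch of a classic MPI deadlock pattern (not the paper's detector):
 * ranks 0 and 1 both block in MPI_Recv before either reaches MPI_Send.
 * Swapping the call order on one rank, or using MPI_Sendrecv, removes it. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, sendbuf = 42, recvbuf = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    int peer = 1 - rank;   /* assumes exactly 2 ranks, for illustration */

    /* Both ranks wait here forever: neither matching send is ever posted. */
    MPI_Recv(&recvbuf, 1, MPI_INT, peer, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    MPI_Send(&sendbuf, 1, MPI_INT, peer, 0, MPI_COMM_WORLD);

    printf("rank %d received %d\n", rank, recvbuf);  /* never reached */
    MPI_Finalize();
    return 0;
}
```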


SIMULATION ◽  
2019 ◽  
Vol 96 (2) ◽  
pp. 221-232
Author(s):  
Mike Mikailov ◽  
Junshan Qiu ◽  
Fu-Jyh Luo ◽  
Stephen Whitney ◽  
Nicholas Petrick

Large-scale modeling and simulation (M&S) applications that do not require run-time inter-process communication can exhibit scaling problems when migrated to high-performance computing (HPC) clusters if traditional software parallelization techniques, such as POSIX multi-threading and the Message Passing Interface, are used. A comprehensive approach for scaling M&S applications on HPC clusters, called "computation segmentation," has been developed. It is based on the built-in array-job facility of job schedulers. When used correctly for appropriate applications, the array-job approach provides significant benefits that are not obtainable with other methods. The parallelization illustrated in this paper becomes quite complex in its own right when applied to extremely large M&S tasks, particularly because of the need for nested loops. At the United States Food and Drug Administration, the approach has provided unsurpassed efficiency, flexibility, and scalability for work that can be performed using embarrassingly parallel algorithms.
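A hedged sketch of the array-job idea behind computation segmentation follows: the scheduler launches many independent copies of the same executable, and each copy reads its array task index from the environment (SLURM_ARRAY_TASK_ID on Slurm; other schedulers use different variable names, e.g. SGE_TASK_ID on Grid Engine) and processes only its own slice of the workload. The slicing scheme, case counts, and run_case placeholder below are illustrative, not the authors' implementation.

```c
/* Hedged sketch: computation segmentation via a scheduler array job.
 * Each array task processes one contiguous slice of NUM_CASES independent
 * simulation cases, selected by its array index; no inter-process
 * communication is needed. */
#include <stdio.h>
#include <stdlib.h>

#define NUM_CASES  100000   /* total independent simulation cases (illustrative) */
#define NUM_TASKS  256      /* array size used at submission time (illustrative) */

static void run_case(long c) { (void)c; /* placeholder for one independent simulation */ }

int main(void) {
    const char *id = getenv("SLURM_ARRAY_TASK_ID");   /* Slurm; SGE uses SGE_TASK_ID */
    if (!id) { fprintf(stderr, "not running inside an array job\n"); return 1; }

    long task = strtol(id, NULL, 10);   /* index within the submitted array range */
    long per_task = (NUM_CASES + NUM_TASKS - 1) / NUM_TASKS;
    long begin = task * per_task;
    long end = begin + per_task > NUM_CASES ? NUM_CASES : begin + per_task;

    for (long c = begin; c < end; c++)
        run_case(c);

    printf("task %ld handled cases [%ld, %ld)\n", task, begin, end);
    return 0;
}
```

Submitted as an array of NUM_TASKS tasks (for example, with sbatch --array=0-255 on Slurm), each segment is scheduled and, if necessary, rerun independently by the scheduler.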


2015 ◽  
Vol 2015 ◽  
pp. 1-10 ◽  
Author(s):  
Shugang Jiang ◽  
Yu Zhang ◽  
Zhongchao Lin ◽  
Xunwang Zhao

It may not be a challenge to run a Finite-Difference Time-Domain (FDTD) code for electromagnetic simulations on a supercomputer with more than ten thousand CPU cores; making the FDTD code run with the highest efficiency, however, is a challenge. In this paper, the performance of parallel FDTD is optimized through the MPI (Message Passing Interface) virtual topology, on the basis of which a communication model is established. General rules for the optimal topology are presented according to the model. The performance of the method is tested and analyzed on three high-performance computing platforms with different architectures in China. Simulations including an airplane with a 700-wavelength wingspan and a complex microstrip antenna array with nearly 2000 elements are performed very efficiently using a maximum of 10,240 CPU cores.
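The paper's communication model and topology rules are not reproduced here, but a hedged sketch of the MPI virtual-topology machinery they build on is shown below: a 3-D Cartesian communicator with neighbor lookup for halo exchange, using MPI_Dims_create, MPI_Cart_create, and MPI_Cart_shift. The grid setup is a generic illustration, not the paper's optimized topology.

```c
/* Hedged sketch: 3-D Cartesian virtual topology for FDTD domain decomposition.
 * MPI_Dims_create picks a balanced process grid; MPI_Cart_shift yields the
 * neighbor ranks needed for halo (boundary field) exchange along each axis. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int size, rank;
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int dims[3] = {0, 0, 0}, periods[3] = {0, 0, 0};
    MPI_Dims_create(size, 3, dims);            /* factor `size` into a 3-D grid */

    MPI_Comm cart;
    MPI_Cart_create(MPI_COMM_WORLD, 3, dims, periods, 1 /* allow reordering */, &cart);
    MPI_Comm_rank(cart, &rank);

    int coords[3];
    MPI_Cart_coords(cart, rank, 3, coords);

    /* Neighbors along x, y, z for exchanging boundary planes of the E/H fields. */
    int lo[3], hi[3];
    for (int axis = 0; axis < 3; axis++)
        MPI_Cart_shift(cart, axis, 1, &lo[axis], &hi[axis]);

    printf("rank %d at (%d,%d,%d): x-neighbors %d/%d\n",
           rank, coords[0], coords[1], coords[2], lo[0], hi[0]);

    MPI_Comm_free(&cart);
    MPI_Finalize();
    return 0;
}
```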


Author(s):  
Thomas J. Plower ◽  
Kevin Manalo ◽  
Mireille A. Rowe

Current 3-D reactor burnup simulation codes typically utilize either transport-corrected diffusion theory or Monte Carlo methods to perform the flux calculations necessary for fuel depletion. Monte Carlo codes, particularly the Monte Carlo N-Particle Transport Code (MCNP) from Los Alamos, have become increasingly popular with the growth of parallel computing. While achieving a criticality eigenvalue is relatively straightforward, run times for large models requiring converged fission sources for proper burnup computation quickly become very long. Additionally, past analyses have shown difficulties in source convergence for lattice problems using Monte Carlo [1]. To provide an alternative means of computing core burnup and decrease computation time for large models, a deterministic tool such as the PENTRAN/PENBURN suite is necessary. PENTRAN is a multi-group, anisotropic Sn code for 3-D Cartesian geometries; it has been specifically designed for distributed-memory, scalable parallel computer architectures using the MPI (Message Passing Interface) library. Automatic domain decomposition among the angular, energy, and spatial variables, together with an adaptive differencing algorithm and other numerical enhancements, makes PENTRAN an extremely robust solver with a 0.975 parallel code fraction (based on Amdahl's law). PENBURN (Parallel Environment BURNup), a recently developed fuel depletion solver, works in conjunction with PENTRAN and performs 3-D zone-based fuel burnup using the direct Bateman chain solution method. The aim of this paper is to demonstrate the capabilities and unique features of the PENTRAN/PENBURN suite through a fuel burnup study on a 3 wt% enriched UO2 fuel pin and a 17×17 Westinghouse OFA assembly.
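For context on the quoted 0.975 parallel code fraction, Amdahl's law gives the ideal speedup on N processors as a function of the parallel fraction p; the N = 64 evaluation below is an illustrative calculation, not a result reported in the paper.

```latex
% Amdahl's law with parallel fraction p = 0.975 (serial fraction 1 - p = 0.025)
S(N) = \frac{1}{(1 - p) + \dfrac{p}{N}}, \qquad
S(64) = \frac{1}{0.025 + \dfrac{0.975}{64}} \approx 24.9, \qquad
\lim_{N \to \infty} S(N) = \frac{1}{0.025} = 40
```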


I3+ ◽  
2015 ◽  
Vol 2 (1) ◽  
pp. 96
Author(s):  
Mauricio Ochoa Echeverría ◽  
Daniel Alejandro Soto Beltrán

High-performance computing (HPC) refers to the solution of complex problems by means of a group of servers called a cluster. The cluster as a whole is used to solve a single problem or a group of related problems. Initially, HPC solutions were limited to scientific research, but thanks to falling costs and new business needs, HPC can now be applied to data centers, software simulations, transaction processing, and the solution of any complex business problem. In this context, the Universidad de Boyacá carried out the research project entitled "Interacción de los componentes del clúster Microsoft HPC (High Performance Computing) Server 2008 con aplicaciones MPI." This article describes how the components of the Microsoft HPC (High Performance Computing) Server 2008 information-processing cluster relate to one another in order to solve a highly complex problem with applications developed in MPI (Message Passing Interface). For the project, a high-performance cluster was built with Microsoft HPC Server 2008 using virtual machines, in order to observe its operation and examine the performance reports these systems offer to users; tests with applications developed in MPI were used for this purpose. The article covers the HPC Server cluster and its underlying concepts (clusters, high-performance computing, and MPI), the infrastructure requirements for the project, the process of building the cluster from node virtualization through domain creation to the deployment of the MPI programs, and the analysis of the results obtained.
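The article's own MPI test applications are not listed in the abstract, but a hedged sketch of the kind of small MPI program typically used to exercise such a cluster (a numerically integrated estimate of pi, combined across ranks with MPI_Reduce) is shown below; it is illustrative and not taken from the project.

```c
/* Hedged sketch: a minimal MPI test application (pi by midpoint integration).
 * Each rank integrates a strided subset of intervals; MPI_Reduce combines the
 * partial sums on rank 0. Illustrative of an MPI cluster test program only. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const long n = 10000000;            /* number of intervals (illustrative) */
    const double h = 1.0 / (double)n;
    double local_sum = 0.0, pi = 0.0;

    /* Midpoint rule on 4/(1+x^2) over [0,1]; each rank takes every size-th interval. */
    for (long i = rank; i < n; i += size) {
        double x = h * ((double)i + 0.5);
        local_sum += 4.0 / (1.0 + x * x);
    }
    local_sum *= h;

    MPI_Reduce(&local_sum, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("pi ~= %.12f with %d ranks\n", pi, size);

    MPI_Finalize();
    return 0;
}
```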


