Comparative Study between Parallel K-Means and Parallel K-Medoids with Message Passing Interface (MPI)

Author(s):  
Fhira Nhita

Data mining combines techniques such as classification and clustering to extract useful information from datasets. Clustering is one of the most widely used data mining techniques today, and K-Means and K-Medoids are among the most popular clustering algorithms because they are easy to implement, efficient, and produce good results. Besides mining important information, the time spent mining data is also a concern in today's era, given that real-world applications produce huge volumes of data. This research compares the clustering results of the K-Means and K-Medoids algorithms and their time performance, using a High-Performance Computing (HPC) cluster to parallelize both algorithms with the Message Passing Interface (MPI) library. The results show that K-Means gives a smaller Sum of Squared Errors (SSE) than K-Medoids, and that the parallel MPI algorithms give faster computation times than the sequential algorithms.
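The abstract does not include the authors' code, but a minimal sketch of how one K-Means update step is commonly parallelized with MPI is shown below, assuming the data points have already been scattered across ranks; the array names, dimension D, and cluster count K are illustrative, not taken from the paper.

```c
/* Hedged sketch: one iteration of MPI-parallel K-Means (not the paper's code).
 * Assumes each rank already holds `local_n` points of dimension D in `points`,
 * and all ranks hold the same K centroids in `centroids`. */
#include <mpi.h>
#include <float.h>

#define D 2   /* point dimension (illustrative) */
#define K 3   /* number of clusters (illustrative) */

void kmeans_step(double points[][D], int local_n, double centroids[K][D]) {
    double local_sum[K][D] = {{0}}, global_sum[K][D];
    int local_cnt[K] = {0}, global_cnt[K];

    /* Assign each local point to its nearest centroid. */
    for (int i = 0; i < local_n; i++) {
        int best = 0; double best_d = DBL_MAX;
        for (int k = 0; k < K; k++) {
            double d = 0.0;
            for (int j = 0; j < D; j++) {
                double diff = points[i][j] - centroids[k][j];
                d += diff * diff;
            }
            if (d < best_d) { best_d = d; best = k; }
        }
        for (int j = 0; j < D; j++) local_sum[best][j] += points[i][j];
        local_cnt[best]++;
    }

    /* Combine partial sums from all ranks, then recompute global centroids. */
    MPI_Allreduce(local_sum, global_sum, K * D, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
    MPI_Allreduce(local_cnt, global_cnt, K, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
    for (int k = 0; k < K; k++)
        if (global_cnt[k] > 0)
            for (int j = 0; j < D; j++)
                centroids[k][j] = global_sum[k][j] / global_cnt[k];
}
```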

2021 ◽  
Author(s):  
Oluvaseun Owojaiye

Advancement in technology has brought considerable improvement to processor design, and manufacturers now place multiple processors on a single chip. Supercomputers today consist of clusters of interconnected nodes that collaborate to solve complex and advanced computational problems. Message Passing Interface (MPI) and Open Multiprocessing (OpenMP) are the most widely used programming models for optimizing sequential codes by parallelizing them on the different multiprocessor architectures that exist today. In this thesis, we parallelize the non-slicing floorplan algorithm based on Multilevel Floorplanning/placement of large-scale modules using B*-trees (MB*-tree) with MPI and OpenMP on distributed- and shared-memory architectures, respectively. In VLSI (Very Large Scale Integration) design automation, floorplanning is an initial and vital task performed in the early design stage. Experimental results using MCNC benchmark circuits show that our parallel algorithm produced better results than the corresponding sequential algorithm; we were able to speed up the algorithm by up to 4 times, reducing computation time while maintaining floorplan solution quality. We also compared the two parallel versions, and the OpenMP results were slightly better than the corresponding MPI results.
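The thesis itself is not reproduced here, but a hedged sketch of the two programming models it compares is shown below: the same independent-candidate evaluation loop expressed once with OpenMP work sharing (shared memory) and once with an MPI rank-strided partition (distributed memory). The evaluate_candidate function is a placeholder standing in for a real B*-tree floorplan cost evaluation, not code from the thesis.

```c
/* Hedged sketch contrasting OpenMP and MPI parallelization of an
 * independent-candidate evaluation loop; evaluate_candidate() is an
 * illustrative placeholder, not a real floorplan evaluator. */
#include <mpi.h>
#include <omp.h>
#include <float.h>

static double evaluate_candidate(int i) {
    /* Placeholder cost function (illustrative only). */
    return (double)((i * 2654435761u) % 1000);
}

/* Shared memory: OpenMP threads split the loop and reduce the best cost. */
double best_cost_openmp(int n) {
    double best = DBL_MAX;
    #pragma omp parallel for reduction(min:best)
    for (int i = 0; i < n; i++) {
        double c = evaluate_candidate(i);
        if (c < best) best = c;
    }
    return best;
}

/* Distributed memory: each MPI rank evaluates a strided subset,
 * then MPI_Allreduce combines the per-rank minima. */
double best_cost_mpi(int n) {
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    double local_best = DBL_MAX, global_best;
    for (int i = rank; i < n; i += size) {
        double c = evaluate_candidate(i);
        if (c < local_best) local_best = c;
    }
    MPI_Allreduce(&local_best, &global_best, 1, MPI_DOUBLE, MPI_MIN, MPI_COMM_WORLD);
    return global_best;
}
```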


2021 ◽  
Author(s):  
Pedro Henrique Di Francia Rosso ◽  
Emilio Francesquini

The Message Passing Interface (MPI) standard is widely used in High-Performance Computing (HPC) systems. Such systems employ a large number of computing nodes, so Fault Tolerance (FT) is a concern: a large number of nodes leads to more frequent failures. Two essential components of FT are Failure Detection (FD) and Failure Propagation (FP). This paper proposes improvements to existing FD and FP mechanisms to provide more portability, better scalability, and low overhead. Results show that the proposed methods achieve results better than, or at least comparable to, existing methods while remaining portable to any standard-compliant MPI distribution.
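The paper's FD/FP mechanisms are not reproduced here, but the hedged sketch below illustrates the general heartbeat idea behind many failure detectors, written with only standard MPI calls (MPI_Isend, MPI_Iprobe, MPI_Recv, MPI_Wtime). It only shows timeout-based suspicion of a neighboring rank; the tag, period, and timeout values are illustrative, and a real fault-tolerant runtime needs far more machinery than this.

```c
/* Hedged sketch: ring heartbeat failure detection using only standard MPI.
 * Each rank sends periodic heartbeats to its successor and suspects its
 * predecessor if none arrive within TIMEOUT seconds. Illustrative only. */
#include <mpi.h>
#include <stdio.h>

#define HB_TAG   77
#define PERIOD   0.5   /* seconds between heartbeats (illustrative) */
#define TIMEOUT  2.0   /* seconds of silence before suspicion (illustrative) */

void monitor(MPI_Comm comm, double run_for) {
    int rank, size;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);
    int succ = (rank + 1) % size;          /* rank we send heartbeats to */
    int pred = (rank - 1 + size) % size;   /* rank we expect heartbeats from */

    double start = MPI_Wtime(), last_sent = 0.0, last_heard = MPI_Wtime();
    while (MPI_Wtime() - start < run_for) {
        double now = MPI_Wtime();

        /* Send a heartbeat to the successor every PERIOD seconds. */
        if (now - last_sent >= PERIOD) {
            MPI_Request req;
            int beat = 1;
            MPI_Isend(&beat, 1, MPI_INT, succ, HB_TAG, comm, &req);
            MPI_Request_free(&req);   /* fire-and-forget heartbeat */
            last_sent = now;
        }

        /* Drain any heartbeats that arrived from the predecessor. */
        int flag = 1;
        while (flag) {
            MPI_Iprobe(pred, HB_TAG, comm, &flag, MPI_STATUS_IGNORE);
            if (flag) {
                int beat;
                MPI_Recv(&beat, 1, MPI_INT, pred, HB_TAG, comm, MPI_STATUS_IGNORE);
                last_heard = MPI_Wtime();
            }
        }

        /* Flag the predecessor as suspected if it has gone silent. */
        if (MPI_Wtime() - last_heard > TIMEOUT)
            fprintf(stderr, "rank %d suspects rank %d has failed\n", rank, pred);
    }
}
```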


Author(s):  
Raed AlDhubhani ◽  
Fathy Eassa ◽  
Faisal Saeed

Deadlock detection is one of the main issues in software testing for High-Performance Computing (HPC), and it will remain so for exascale computing in the near future. Developing and testing programs for machines with millions of cores is not an easy task. An HPC program consists of thousands (or millions) of parallel processes that need to communicate with each other at runtime. Message Passing Interface (MPI) is a standard library that provides this communication capability and is frequently used in HPC; exascale programs are expected to be developed using the MPI standard library. For parallel programs, deadlock is one of the expected problems. In this paper, we discuss deadlock detection for exascale MPI-based programs, where scalability and efficiency are critical issues. The proposed method detects and flags, in a scalable and efficient manner, the processes and communication operations that can potentially cause deadlocks. MPI benchmark programs were used to test the proposed method.
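The detection method itself is not shown in the abstract, but the hedged sketch below shows the kind of communication pattern such a tool must flag: two ranks that each post a blocking MPI_Recv before their matching MPI_Send. Neither receive can complete, so both ranks hang; this is a standard textbook example, not code from the paper.

```c
/* Hedged sketch of a classic MPI deadlock pattern (not the paper's detector):
 * ranks 0 and 1 both block in MPI_Recv before either reaches MPI_Send.
 * Swapping the call order on one rank, or using MPI_Sendrecv, removes it. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, sendbuf = 42, recvbuf = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    int peer = 1 - rank;   /* assumes exactly 2 ranks, for illustration */

    /* Both ranks wait here forever: neither matching send is ever posted. */
    MPI_Recv(&recvbuf, 1, MPI_INT, peer, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    MPI_Send(&sendbuf, 1, MPI_INT, peer, 0, MPI_COMM_WORLD);

    printf("rank %d received %d\n", rank, recvbuf);  /* never reached */
    MPI_Finalize();
    return 0;
}
```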


SIMULATION ◽  
2019 ◽  
Vol 96 (2) ◽  
pp. 221-232
Author(s):  
Mike Mikailov ◽  
Junshan Qiu ◽  
Fu-Jyh Luo ◽  
Stephen Whitney ◽  
Nicholas Petrick

Large-scale modeling and simulation (M&S) applications that do not require run-time inter-process communication can exhibit scaling problems when migrated to high-performance computing (HPC) clusters if traditional software parallelization techniques, such as POSIX multi-threading and the Message Passing Interface, are used. A comprehensive approach for scaling M&S applications on HPC clusters, called "computation segmentation," has been developed. It is based on the built-in array-job facility of job schedulers. When used correctly for appropriate applications, the array-job approach provides significant benefits that are not obtainable with other methods. The parallelization illustrated in this paper becomes quite complex in its own right when applied to extremely large M&S tasks, particularly because of the need for nested loops. At the United States Food and Drug Administration, the approach has provided unsurpassed efficiency, flexibility, and scalability for work that can be performed using embarrassingly parallel algorithms.
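A hedged sketch of the array-job idea behind computation segmentation follows: the scheduler launches many independent copies of the same executable, and each copy reads its array task index from the environment (SLURM_ARRAY_TASK_ID on Slurm; other schedulers use different variable names, e.g. SGE_TASK_ID on Grid Engine) and processes only its own slice of the workload. The slicing scheme, case counts, and run_case placeholder below are illustrative, not the authors' implementation.

```c
/* Hedged sketch: computation segmentation via a scheduler array job.
 * Each array task processes one contiguous slice of NUM_CASES independent
 * simulation cases, selected by its array index; no inter-process
 * communication is needed. */
#include <stdio.h>
#include <stdlib.h>

#define NUM_CASES  100000   /* total independent simulation cases (illustrative) */
#define NUM_TASKS  256      /* array size used at submission time (illustrative) */

static void run_case(long c) { (void)c; /* placeholder for one independent simulation */ }

int main(void) {
    const char *id = getenv("SLURM_ARRAY_TASK_ID");   /* Slurm; SGE uses SGE_TASK_ID */
    if (!id) { fprintf(stderr, "not running inside an array job\n"); return 1; }

    long task = strtol(id, NULL, 10);   /* index within the submitted array range */
    long per_task = (NUM_CASES + NUM_TASKS - 1) / NUM_TASKS;
    long begin = task * per_task;
    long end = begin + per_task > NUM_CASES ? NUM_CASES : begin + per_task;

    for (long c = begin; c < end; c++)
        run_case(c);

    printf("task %ld handled cases [%ld, %ld)\n", task, begin, end);
    return 0;
}
```

Submitted as an array of NUM_TASKS tasks (for example, with sbatch --array=0-255 on Slurm), each segment is scheduled and, if necessary, rerun independently by the scheduler.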


2015 ◽  
Vol 2015 ◽  
pp. 1-10 ◽  
Author(s):  
Shugang Jiang ◽  
Yu Zhang ◽  
Zhongchao Lin ◽  
Xunwang Zhao

It may not be a challenge to run a Finite-Difference Time-Domain (FDTD) code for electromagnetic simulations on a supercomputer with more than ten thousand CPU cores; making the FDTD code run with the highest efficiency, however, is a challenge. In this paper, the performance of parallel FDTD is optimized through the MPI (Message Passing Interface) virtual topology, on the basis of which a communication model is established. General rules for the optimal topology are presented according to the model. The performance of the method is tested and analyzed on three high-performance computing platforms with different architectures in China. Simulations including an airplane with a 700-wavelength wingspan and a complex microstrip antenna array with nearly 2000 elements are performed very efficiently using a maximum of 10,240 CPU cores.
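The paper's communication model and topology rules are not reproduced here, but a hedged sketch of the MPI virtual-topology machinery they build on is shown below: a 3-D Cartesian communicator with neighbor lookup for halo exchange, using MPI_Dims_create, MPI_Cart_create, and MPI_Cart_shift. The grid setup is a generic illustration, not the paper's optimized topology.

```c
/* Hedged sketch: 3-D Cartesian virtual topology for FDTD domain decomposition.
 * MPI_Dims_create picks a balanced process grid; MPI_Cart_shift yields the
 * neighbor ranks needed for halo (boundary field) exchange along each axis. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int size, rank;
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int dims[3] = {0, 0, 0}, periods[3] = {0, 0, 0};
    MPI_Dims_create(size, 3, dims);            /* factor `size` into a 3-D grid */

    MPI_Comm cart;
    MPI_Cart_create(MPI_COMM_WORLD, 3, dims, periods, 1 /* allow reordering */, &cart);
    MPI_Comm_rank(cart, &rank);

    int coords[3];
    MPI_Cart_coords(cart, rank, 3, coords);

    /* Neighbors along x, y, z for exchanging boundary planes of the E/H fields. */
    int lo[3], hi[3];
    for (int axis = 0; axis < 3; axis++)
        MPI_Cart_shift(cart, axis, 1, &lo[axis], &hi[axis]);

    printf("rank %d at (%d,%d,%d): x-neighbors %d/%d\n",
           rank, coords[0], coords[1], coords[2], lo[0], hi[0]);

    MPI_Comm_free(&cart);
    MPI_Finalize();
    return 0;
}
```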


Author(s):  
Thomas J. Plower ◽  
Kevin Manalo ◽  
Mireille A. Rowe

Current 3-D reactor burnup simulation codes typically utilize either transport-corrected diffusion theory or Monte Carlo methods to perform the flux calculations necessary for fuel depletion. Monte Carlo codes, particularly the Monte Carlo N-Particle Transport Code (MCNP) from Los Alamos, have become increasingly popular with the growth of parallel computing. While achieving a criticality eigenvalue is relatively straightforward, run times for large models requiring converged fission sources for proper burnup computation quickly become very long. Additionally, past analyses have shown difficulties in source convergence for lattice problems using Monte Carlo [1]. To provide an alternative means of computing core burnup and decrease computation time for large models, a deterministic tool such as the PENTRAN/PENBURN suite is necessary. PENTRAN is a multi-group, anisotropic Sn code for 3-D Cartesian geometries; it has been specifically designed for distributed-memory, scalable parallel computer architectures using the MPI (Message Passing Interface) library. Automatic domain decomposition among the angular, energy, and spatial variables, together with an adaptive differencing algorithm and other numerical enhancements, makes PENTRAN an extremely robust solver with a 0.975 parallel code fraction (based on Amdahl's law). PENBURN (Parallel Environment BURNup), a recently developed fuel depletion solver, works in conjunction with PENTRAN and performs 3-D zone-based fuel burnup using the direct Bateman chain solution method. The aim of this paper is to demonstrate the capabilities and unique features of the PENTRAN/PENBURN suite through a fuel burnup study on a 3 wt% enriched UO2 fuel pin and a 17×17 Westinghouse OFA assembly.
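For context on the quoted 0.975 parallel code fraction, Amdahl's law gives the ideal speedup on N processors as a function of the parallel fraction p; the N = 64 evaluation below is an illustrative calculation, not a result reported in the paper.

```latex
% Amdahl's law with parallel fraction p = 0.975 (serial fraction 1 - p = 0.025)
S(N) = \frac{1}{(1 - p) + \dfrac{p}{N}}, \qquad
S(64) = \frac{1}{0.025 + \dfrac{0.975}{64}} \approx 24.9, \qquad
\lim_{N \to \infty} S(N) = \frac{1}{0.025} = 40
```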


I3+ ◽  
2015 ◽  
Vol 2 (1) ◽  
pp. 96
Author(s):  
Mauricio Ochoa Echeverría ◽  
Daniel Alejandro Soto Beltrán

High-performance computing (HPC) refers to the solution of complex problems by means of a group of servers called a cluster. The cluster as a whole is used to solve a single problem or a group of related problems. Initially, HPC solutions were limited to scientific research, but thanks to falling costs and new business needs, HPC can now be applied to data centers, software simulations, transaction processing, and the solution of any complex business problem. In this context, the Universidad de Boyacá carried out the research project entitled "Interacción de los componentes del clúster Microsoft HPC (High Performance Computing) Server 2008 con aplicaciones MPI." This article describes how the components of the Microsoft HPC (High Performance Computing) Server 2008 information-processing cluster relate to one another in order to solve a highly complex problem with applications developed in MPI (Message Passing Interface). For the project, a high-performance cluster was built with Microsoft HPC Server 2008 using virtual machines, in order to observe its operation and examine the performance reports these systems offer to users; tests with applications developed in MPI were used for this purpose. The article covers the HPC Server cluster and its underlying concepts (clusters, high-performance computing, and MPI), the infrastructure requirements for the project, the process of building the cluster from node virtualization through domain creation to the deployment of the MPI programs, and the analysis of the results obtained.
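The article's own MPI test applications are not listed in the abstract, but a hedged sketch of the kind of small MPI program typically used to exercise such a cluster (a numerically integrated estimate of pi, combined across ranks with MPI_Reduce) is shown below; it is illustrative and not taken from the project.

```c
/* Hedged sketch: a minimal MPI test application (pi by midpoint integration).
 * Each rank integrates a strided subset of intervals; MPI_Reduce combines the
 * partial sums on rank 0. Illustrative of an MPI cluster test program only. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const long n = 10000000;            /* number of intervals (illustrative) */
    const double h = 1.0 / (double)n;
    double local_sum = 0.0, pi = 0.0;

    /* Midpoint rule on 4/(1+x^2) over [0,1]; each rank takes every size-th interval. */
    for (long i = rank; i < n; i += size) {
        double x = h * ((double)i + 0.5);
        local_sum += 4.0 / (1.0 + x * x);
    }
    local_sum *= h;

    MPI_Reduce(&local_sum, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("pi ~= %.12f with %d ranks\n", pi, size);

    MPI_Finalize();
    return 0;
}
```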


