A highly portable parallel implementation of AMBER4 using the message passing interface standard

1995 ◽  
Vol 16 (11) ◽  
pp. 1420-1427 ◽  
Author(s):  
James J. Vincent ◽  
Kenneth M. Merz


Author(s):
Amanda Bienz ◽  
William D Gropp ◽  
Luke N Olson

Algebraic multigrid (AMG) is often viewed as a scalable O(n) solver for sparse linear systems. Yet, AMG lacks parallel scalability due to the increasingly large costs associated with communication, both in the initial construction of a multigrid hierarchy and in the iterative solve phase. This work introduces a parallel implementation of AMG that reduces the cost of communication, yielding improved parallel scalability. It is common in Message Passing Interface (MPI) programs, particularly under the MPI-everywhere approach, to arrange inter-process communication so that messages are transported in the same way regardless of the locations of the sending and receiving processes. Performance tests show notable differences in the cost of intra- and inter-node communication, motivating a restructuring of communication. In this case, the communication schedule takes advantage of the less costly intra-node communication, reducing both the number and the size of inter-node messages. Node-centric communication extends to a range of components in both the setup and solve phases of AMG, yielding an improvement in the weak and strong scaling of the entire method.
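
As an illustration of the node-aware idea, the following C/MPI fragment is a hedged sketch (not the authors' implementation): ranks on a node first aggregate their contributions to a node leader over a cheap shared-memory sub-communicator, and only the leaders take part in the costlier inter-node exchange. The payload and the reduction are placeholders.

```c
/* Hedged sketch of node-aware communication (illustrative only).
 * Ranks on a node gather their data to a local leader over a
 * shared-memory sub-communicator; only leaders exchange between nodes. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int world_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    /* Per-node communicator based on shared-memory locality (MPI-3). */
    MPI_Comm node_comm;
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                        MPI_INFO_NULL, &node_comm);
    int node_rank, node_size;
    MPI_Comm_rank(node_comm, &node_rank);
    MPI_Comm_size(node_comm, &node_size);

    /* Step 1: cheap intra-node gather onto the node leader (local rank 0). */
    double my_value = (double)world_rank;   /* placeholder payload */
    double *agg = (node_rank == 0)
                ? malloc((size_t)node_size * sizeof(double)) : NULL;
    MPI_Gather(&my_value, 1, MPI_DOUBLE, agg, 1, MPI_DOUBLE, 0, node_comm);

    /* Step 2: only leaders join the costlier inter-node phase, so each
     * node sends one aggregated message instead of one per rank. */
    MPI_Comm leader_comm;
    MPI_Comm_split(MPI_COMM_WORLD, node_rank == 0 ? 0 : MPI_UNDEFINED,
                   world_rank, &leader_comm);
    if (node_rank == 0) {
        double node_sum = 0.0, global_sum = 0.0;
        for (int i = 0; i < node_size; ++i) node_sum += agg[i];
        MPI_Allreduce(&node_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM,
                      leader_comm);
        printf("leader %d: global sum = %g\n", world_rank, global_sum);
        free(agg);
        MPI_Comm_free(&leader_comm);
    }

    MPI_Comm_free(&node_comm);
    MPI_Finalize();
    return 0;
}
```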


Author(s):  
Carlos Teijeiro ◽  
Thomas Hammerschmidt ◽  
Ralf Drautz ◽  
Godehard Sutmann

Analytic bond-order potentials (BOPs) make it possible to obtain a highly accurate description of interatomic interactions at a reasonable computational cost. However, simulations of very large systems have high memory demands that require a parallel implementation, which at the same time should optimize the use of computational resources. The calculations of analytic BOPs are performed for a restricted volume around every atom and have therefore been shown to be well suited to a message passing interface (MPI) parallelization based on a domain decomposition scheme, in which one process manages one large domain using the entire memory of a compute node. Building on this approach, the present work focuses on analyzing and enhancing its performance on shared memory by using OpenMP threads on each MPI process, in order to use the many cores per node to speed up computations and minimize memory bottlenecks. Different algorithms are described and their corresponding performance results are presented, showing significant performance gains for highly parallel systems in hybrid MPI/OpenMP simulations with up to several thousand threads.
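
The hybrid scheme can be sketched as follows; this is an illustrative example under assumed names (N_LOCAL, the per-atom kernel), not the actual BOP code. One MPI process owns a domain, and OpenMP threads share that domain's memory and split the per-atom loop.

```c
/* Illustrative hybrid MPI/OpenMP sketch: one MPI process per domain,
 * OpenMP threads over that domain's atoms. The per-atom energy is a
 * placeholder, not the analytic bond-order potential. */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

#define N_LOCAL 100000   /* atoms owned by this domain (assumed) */

int main(int argc, char **argv)
{
    int provided;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double local_energy = 0.0;

    /* Threads share the domain's memory; dynamic scheduling helps when
     * per-atom neighborhoods (and thus costs) vary. */
    #pragma omp parallel for reduction(+:local_energy) schedule(dynamic, 64)
    for (int i = 0; i < N_LOCAL; ++i) {
        /* placeholder per-atom contribution */
        local_energy += 1.0 / (1.0 + (double)(i % 17));
    }

    /* Cross-domain reduction handled by MPI, one value per process. */
    double total_energy = 0.0;
    MPI_Reduce(&local_energy, &total_energy, 1, MPI_DOUBLE, MPI_SUM, 0,
               MPI_COMM_WORLD);
    if (rank == 0)
        printf("total energy = %f\n", total_energy);

    MPI_Finalize();
    return 0;
}
```

MPI_THREAD_FUNNELED is sufficient here because only the thread that called MPI_Init_thread makes MPI calls.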


2021 ◽  
Author(s):  
Oluvaseun Owojaiye

Advancement in technology has brought considerable improvement to processor design, and manufacturers now place multiple processors on a single chip. Supercomputers today consist of clusters of interconnected nodes that collaborate to solve complex and advanced computational problems. Message Passing Interface (MPI) and Open Multiprocessing (OpenMP) are the most popular programming models used to optimize sequential codes by parallelizing them on the different multiprocessor architectures that exist today. In this thesis, we parallelize the non-slicing floorplan algorithm based on Multilevel Floorplanning/placement of large-scale modules using B*-trees (MB*-tree) with MPI and OpenMP on distributed and shared memory architectures, respectively. In VLSI (Very Large Scale Integration) design automation, floorplanning is an initial and vital task performed in the early design stage. Experimental results using MCNC benchmark circuits show that our parallel algorithm produces better results than the corresponding sequential algorithm; we were able to speed up the algorithm by up to 4 times, reducing computation time while maintaining floorplan solution quality. Comparing the two parallel versions, the OpenMP results were slightly better than the corresponding MPI results.
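
One plausible pattern for such a search-based parallelization, sketched below under stated assumptions (the MB*-tree perturbation and cost evaluation are replaced by placeholders), is to let each MPI rank explore candidate floorplans independently and select the globally best cost with a MINLOC reduction.

```c
/* Hedged sketch of a rank-parallel floorplan search pattern: each rank
 * explores perturbations independently; MPI_MINLOC finds the best rank.
 * Cost evaluation is a placeholder, not the MB*-tree objective. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    srand(1234u + (unsigned)rank);  /* independent search per rank */

    double best_cost = 1e30;
    for (int iter = 0; iter < 1000; ++iter) {
        /* placeholder: a real code would perturb the floorplan tree and
         * evaluate area plus wirelength */
        double cost = (double)(rand() % 100000);
        if (cost < best_cost) best_cost = cost;
    }

    /* Locate the rank holding the globally best solution. */
    struct { double cost; int rank; } local = { best_cost, rank }, global;
    MPI_Allreduce(&local, &global, 1, MPI_DOUBLE_INT, MPI_MINLOC,
                  MPI_COMM_WORLD);

    if (rank == global.rank)
        printf("rank %d holds the best cost %g\n", rank, global.cost);

    MPI_Finalize();
    return 0;
}
```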


2019 ◽  
Author(s):  
Alex M. Ascension ◽  
Marcos J. Araúzo-Bravo

Big Data analysis is a discipline with a growing number of areas in which huge amounts of data are extracted and analyzed. Parallelization in Python integrates the Message Passing Interface via the mpi4py module. Since mpi4py does not support parallelization of objects greater than 2^31 bytes, we developed BigMPI4py, a Python module that wraps mpi4py and supports object sizes beyond this boundary. BigMPI4py automatically determines the optimal object distribution strategy and also uses vectorized methods, achieving higher parallelization efficiency. BigMPI4py facilitates the implementation of Python for Big Data applications on multicore workstations and HPC systems. We validated BigMPI4py on whole genome bisulfite sequencing (WGBS) DNA methylation ENCODE data of 59 samples from 27 human tissues. We categorized them into the three germ layers and developed a parallel implementation of the Kruskal-Wallis test to find CpGs with differential methylation across germ layers. We observed a differentiation of the germ layers, with one set of hypermethylated genes in ectoderm- and mesoderm-related tissues and another set in endoderm-related tissues. The parallel evaluation of the significance of 55 million CpGs achieved a 22x speedup with 25 cores. BigMPI4py is available at https://gitlab.com/alexmascension/bigmpi4py and the Jupyter Notebook with the WGBS analysis at https://gitlab.com/alexmascension/wgbs-analysis
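
The 2^31-byte boundary stems from MPI's use of a C int for message counts. Below is a minimal C sketch of the chunking workaround that BigMPI4py automates from Python; the helper names send_large and recv_large are hypothetical.

```c
/* Minimal sketch of chunked transfers around MPI's int count limit.
 * Helper names send_large/recv_large are hypothetical. */
#include <mpi.h>
#include <limits.h>
#include <stddef.h>

/* Send n bytes in chunks of at most INT_MAX elements of MPI_BYTE. */
static void send_large(const char *buf, size_t n, int dest, int tag,
                       MPI_Comm comm)
{
    size_t offset = 0;
    while (offset < n) {
        size_t chunk = n - offset;
        if (chunk > (size_t)INT_MAX) chunk = (size_t)INT_MAX;
        MPI_Send(buf + offset, (int)chunk, MPI_BYTE, dest, tag, comm);
        offset += chunk;
    }
}

/* The receiver mirrors the sender's chunking exactly. */
static void recv_large(char *buf, size_t n, int src, int tag, MPI_Comm comm)
{
    size_t offset = 0;
    while (offset < n) {
        size_t chunk = n - offset;
        if (chunk > (size_t)INT_MAX) chunk = (size_t)INT_MAX;
        MPI_Recv(buf + offset, (int)chunk, MPI_BYTE, src, tag, comm,
                 MPI_STATUS_IGNORE);
        offset += chunk;
    }
}
```

Both helpers compute chunk sizes identically, which is what keeps the matched sends and receives aligned.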


2017 ◽  
Vol 2017 ◽  
pp. 1-12 ◽  
Author(s):  
Anuj Sharma ◽  
Irene Moulitsas

High-resolution numerical methods and unstructured meshes are required in many applications of Computational Fluid Dynamics (CFD). These methods are computationally expensive and hence benefit from parallelization. The Message Passing Interface (MPI) has traditionally been used as the parallelization strategy. However, the inherent complexity of MPI adds to the existing complexity of CFD scientific codes. The Partitioned Global Address Space (PGAS) parallelization paradigm was introduced in an attempt to improve the clarity of parallel implementations. We present our experiences of converting an unstructured high-resolution compressible Navier-Stokes CFD solver from MPI to PGAS Coarray Fortran. We describe the challenges, methodology, and performance measurements of our approach using Coarray Fortran. With the Cray compiler, we find Coarray Fortran to be a viable alternative to MPI. We are hopeful that the Intel and open-source implementations can be used in the future.


2021 ◽  
Author(s):  
Giorgio Micaletto ◽  
Ivano Barletta ◽  
Silvia Mocavero ◽  
Ivan Federico ◽  
Italo Epicoco ◽  
...  

Abstract. This paper presents the MPI-based parallelization of the three-dimensional hydrodynamic model SHYFEM (System of HydrodYnamic Finite Element Modules). The original sequential version of the code was parallelized in order to reduce the execution time of high-resolution configurations using state-of-the-art HPC systems. A distributed memory approach was used, based on the message passing interface (MPI). Optimized numerical libraries were used to partition the unstructured grid (with a focus on load balancing) and to solve the sparse linear system of equations in parallel in the case of semi-to-fully implicit time stepping. The parallel implementation of the model was validated by comparing the outputs with those obtained from the sequential version. The performance assessment demonstrates a good level of scalability with a realistic configuration used as benchmark.
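
As an illustration of the partitioning step, here is a sketch assuming METIS, a common choice of optimized partitioning library (the abstract does not name the library used); it splits a tiny 4-vertex ring graph into two balanced parts.

```c
/* Sketch of the grid-partitioning step, assuming METIS (a common
 * choice; the abstract does not name the library). Splits a tiny
 * 4-vertex ring graph into two balanced parts. */
#include <metis.h>
#include <stdio.h>

int main(void)
{
    idx_t nvtxs = 4, ncon = 1, nparts = 2, objval;
    /* CSR adjacency of the cycle 0-1-2-3-0 */
    idx_t xadj[]   = {0, 2, 4, 6, 8};
    idx_t adjncy[] = {1, 3, 0, 2, 1, 3, 0, 2};
    idx_t part[4];

    if (METIS_PartGraphKway(&nvtxs, &ncon, xadj, adjncy,
                            NULL, NULL, NULL,      /* vwgt, vsize, adjwgt */
                            &nparts, NULL, NULL,   /* tpwgts, ubvec */
                            NULL,                  /* default options */
                            &objval, part) != METIS_OK)
        return 1;

    for (idx_t v = 0; v < nvtxs; ++v)
        printf("vertex %d -> part %d\n", (int)v, (int)part[v]);
    return 0;
}
```

In a real configuration the CSR arrays come from the unstructured mesh connectivity, and vertex weights can be supplied to improve load balancing.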


2014 ◽  
Vol 11 (2) ◽  
pp. 292-302
Author(s):  
Baghdad Science Journal

The expanding use of multi-processor supercomputers has had a significant impact on the speed and size of many problems. The adoption of the standard Message Passing Interface (MPI) protocol has enabled programmers to write portable and efficient codes across a wide variety of parallel architectures. Sorting is one of the most common operations performed by a computer. Because sorted data are easier to manipulate than randomly ordered data, many algorithms require sorted data. Sorting is of additional importance to parallel computing because of its close relation to the task of routing data among processes, which is an essential part of many parallel algorithms. In this paper, sequential sorting algorithms, the parallel implementation of several sorting methods in a variety of ways using the MPICH.NT.1.2.3 library under the C++ programming language, and comparisons between the parallel and sequential implementations are presented. These methods are then applied in the field of image processing: a median filter was built based on the presented algorithms. As a parallel platform was unavailable, the time is computed in terms of the number of computation steps and communication steps.
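
As a minimal illustration of one parallel sorting pattern (a sketch, not the paper's specific algorithms), each process sorts its block locally, which counts as computation steps, and the root gathers and merges the sorted blocks, which counts as communication steps.

```c
/* Sketch of one parallel sorting pattern: local sort per process,
 * then a gather and p-way merge at the root. Not the paper's exact
 * algorithms; block size is assumed equal on all processes. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define N_LOCAL 8   /* elements per process (assumed) */

static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int local[N_LOCAL];
    srand((unsigned)rank + 1u);
    for (int i = 0; i < N_LOCAL; ++i) local[i] = rand() % 1000;

    qsort(local, N_LOCAL, sizeof(int), cmp_int);     /* computation step */

    int *all = (rank == 0)
             ? malloc((size_t)size * N_LOCAL * sizeof(int)) : NULL;
    MPI_Gather(local, N_LOCAL, MPI_INT, all, N_LOCAL, MPI_INT, 0,
               MPI_COMM_WORLD);                      /* communication step */

    if (rank == 0) {
        /* p-way merge of the sorted blocks */
        int *idx = calloc((size_t)size, sizeof(int));
        for (int out = 0; out < size * N_LOCAL; ++out) {
            int best = -1;
            for (int p = 0; p < size; ++p)
                if (idx[p] < N_LOCAL &&
                    (best < 0 || all[p * N_LOCAL + idx[p]] <
                                 all[best * N_LOCAL + idx[best]]))
                    best = p;
            printf("%d ", all[best * N_LOCAL + idx[best]]);
            idx[best]++;
        }
        printf("\n");
        free(idx);
        free(all);
    }

    MPI_Finalize();
    return 0;
}
```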


Author(s):  
Elsayed Badr ◽  
Khalid Aloufi

Consider a robot that is navigating in a space modeled by a graph and that wants to know its current location. It can send a signal to determine how far it is from each landmark in a set of fixed landmarks. We study the problem of computing the minimum required number of landmarks, and where they should be placed, so that the robot can always determine its location. Since this problem is NP-complete, the robot's responses to the actions are slow; to accelerate the response, a parallel version of the problem can be used. In this work, we introduce a new parallel implementation for determining the metric dimension of a given graph. We run the proposed algorithm on a symmetric multiprocessing (SMP) cluster using the C programming language and the Message Passing Interface (MPI) library. Finally, we run our implementation on four categories of graphs (the tracks on which the robot moves): a cycle graph Cn, a path graph Pn, a triangular snake graph, and a ladder graph Ln. Preliminary computational results reflect the computational hardness of the metric dimension problem and demonstrate the ability of the proposed algorithm to achieve a speedup of 6 on 8 processors.
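
To make the parallel strategy concrete, the following self-contained C/MPI sketch (not the authors' code) strides the 2^N candidate landmark sets across ranks; it is specialized to a path graph P_N, where the distance between vertices i and j is simply |i - j|, so that it stays runnable.

```c
/* Self-contained sketch (not the authors' code): brute-force search for
 * the metric dimension, with the 2^N candidate landmark sets strided
 * across MPI ranks. Specialized to a path graph P_N. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define N 10   /* number of vertices (kept small for the sketch) */

/* Does the landmark set encoded in mask resolve every vertex pair? */
static int resolves(unsigned mask)
{
    for (int u = 0; u < N; ++u)
        for (int v = u + 1; v < N; ++v) {
            int separated = 0;
            for (int s = 0; s < N && !separated; ++s)
                if ((mask >> s) & 1u)
                    separated = (abs(u - s) != abs(v - s));
            if (!separated) return 0;
        }
    return 1;
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Each rank tests every size-th nonempty subset. */
    int local_best = N + 1;
    for (unsigned mask = (unsigned)rank + 1u; mask < (1u << N);
         mask += (unsigned)size)
        if (resolves(mask)) {
            int k = 0;                      /* popcount(mask) */
            for (unsigned m = mask; m; m &= m - 1u) ++k;
            if (k < local_best) local_best = k;
        }

    int global_best;
    MPI_Reduce(&local_best, &global_best, 1, MPI_INT, MPI_MIN, 0,
               MPI_COMM_WORLD);
    if (rank == 0)
        printf("metric dimension of P_%d = %d\n", N, global_best);

    MPI_Finalize();
    return 0;
}
```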

