Solution of finite element problems using hybrid parallelization with MPI and OpenMP

The Finite Element Method (FEM) is used to solve problems such as solid deformation and heat diffusion in domains with complex geometries. Such geometries require discretization with millions of elements, which leads to systems of equations with sparse matrices and tens or hundreds of millions of variables. The aim is to use computer clusters to solve these systems. The solution method used is Schur substructuring, which divides a large system of equations into many smaller ones that can be solved more efficiently. The method allows parallelization: MPI (Message Passing Interface) distributes the systems of equations so that each one is solved on a different computer of the cluster, and each system is solved with a solver that uses OpenMP for local parallelization.
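The abstract names Schur substructuring but gives no implementation details. What follows is a minimal pure-Python sketch, not the authors' code. It assumes a small dense symmetric system split into two subdomains that share a single interface node. The sketch condenses each subdomain's interior unknowns onto the interface using S = K_GG - sum_i K_GI K_II^-1 K_IG. It then solves the Schur complement system and recovers the interior unknowns. The names `solve`, `block`, `matmul` and `schur_solve` are hypothetical. In the setting described above, each pass of the subdomain loop would run on its own MPI process with an OpenMP-parallel local solver. Here everything runs sequentially.

```python
# Hypothetical sketch of Schur complement substructuring (dense, pure Python).
# Assumption: the sequential subdomain loop stands in for one MPI process per subdomain.

def solve(A, B):
    """Solve A X = B (one column of B per right-hand side) by Gaussian
    elimination with partial pivoting; A and B are lists of lists."""
    n, m = len(A), len(B[0])
    M = [list(A[i]) + list(B[i]) for i in range(n)]  # augmented copy
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(M[r][k]))  # pivot row
        M[k], M[p] = M[p], M[k]
        for r in range(k + 1, n):
            f = M[r][k] / M[k][k]
            for c in range(k, n + m):
                M[r][c] -= f * M[k][c]
    X = [[0.0] * m for _ in range(n)]
    for j in range(m):  # back substitution, one right-hand side at a time
        for i in range(n - 1, -1, -1):
            s = M[i][n + j] - sum(M[i][c] * X[c][j] for c in range(i + 1, n))
            X[i][j] = s / M[i][i]
    return X


def block(K, rows, cols):
    return [[K[r][c] for c in cols] for r in rows]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def schur_solve(K, f, subdomains, interface):
    """Solve K u = f by eliminating each subdomain's interior unknowns first.
    subdomains: list of interior index lists; interface: shared index list."""
    G = interface
    S = block(K, G, G)      # Schur complement, starts as K_GG
    g = [f[i] for i in G]   # condensed right-hand side, starts as f_G
    saved = []
    for I in subdomains:    # independent local work, one subdomain per process
        K_II, K_IG, K_GI = block(K, I, I), block(K, I, G), block(K, G, I)
        # Local solve with |G|+1 right-hand sides: K_II^-1 [K_IG | f_I]
        Y = solve(K_II, [K_IG[r] + [f[i]] for r, i in enumerate(I)])
        Z = matmul(K_GI, Y)
        for a in range(len(G)):
            for b in range(len(G)):
                S[a][b] -= Z[a][b]    # S -= K_GI K_II^-1 K_IG
            g[a] -= Z[a][len(G)]      # g -= K_GI K_II^-1 f_I
        saved.append((I, K_II, K_IG))
    u_G = [row[0] for row in solve(S, [[v] for v in g])]  # interface problem
    u = [0.0] * len(f)
    for i, v in zip(G, u_G):
        u[i] = v
    for I, K_II, K_IG in saved:  # recover interiors: K_II u_I = f_I - K_IG u_G
        rhs = [[f[i] - sum(K_IG[r][c] * u_G[c] for c in range(len(G)))]
               for r, i in enumerate(I)]
        for i, row in zip(I, solve(K_II, rhs)):
            u[i] = row[0]
    return u


# Usage: 1D Poisson-like tridiagonal system, two subdomains sharing node 2
n = 5
K = [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0 for j in range(n)]
     for i in range(n)]
f = [1.0] * n
u = schur_solve(K, f, subdomains=[[0, 1], [3, 4]], interface=[2])
```

The result agrees with a direct solve of the full system. At scale, the benefit comes from the local solves being small and independent of each other.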

Based on Numerical Simulation of High-Performance Parallel Machine Muffler Experimental Calibration

Advanced Materials Research, 2013, Vol 718-720, pp. 1645-1650. DOI: 10.4028/www.scientific.net/amr.718-720.1645

Author(s): Gen Yin Cheng, Sheng Chen Yu, Zhi Yong Wei, Shao Jie Chen, You Cheng

Keyword(s): Numerical Simulation, Finite Element, Boundary Element, Message Passing, High Performance, Message Passing Interface, Parallel Machine, Simulation Software, Experimental Calibration, The Cost

Commonly used commercial simulation software such as SYSNOISE and ANSYS runs on a single machine (it cannot run directly on a parallel machine) when the finite element and boundary element methods are used to simulate muffler performance. Because of the large amount of numerical computation, obtaining an exact solution takes more than ten days, sometimes even twenty. To reduce the cost of numerical simulation, a high-performance parallel machine was built from 32 commercial computers. The finite element and boundary element simulation software was transformed into a program that runs in the MPI (Message Passing Interface) parallel environment. The data obtained from the simulation experiments show that the numerical simulation performs well. The high-performance parallel machine computes 25 to 30 times faster than a single microcomputer.
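The reported speed-up can be converted into a parallel efficiency with the usual definitions. This assumes, although the abstract does not say so, that the 25 to 30 times figure is measured against one of the same commercial computers that make up the 32-node machine. Under that assumption, a 10 to 20 day serial run shrinks to roughly 8 to 19 hours:

```latex
S = \frac{T_1}{T_{32}} \in [25,\,30], \qquad
E = \frac{S}{p} = \frac{S}{32} \in [0.78,\,0.94]

T_{32} = \frac{T_1}{S}: \quad
\frac{10 \times 24\,\mathrm{h}}{30} = 8\,\mathrm{h}, \qquad
\frac{20 \times 24\,\mathrm{h}}{25} = 19.2\,\mathrm{h}
```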

Model Order Reduction of Large-Scale Finite Element Systems in an MPI Parallelized Environment for Usage in Multibody Simulation

Archive of Mechanical Engineering, 2016, Vol 63 (4), pp. 475-494. DOI: 10.1515/meceng-2016-0027. Cited by ~1

Author(s): Thomas Volzer, Peter Eberhard

Keyword(s): Finite Element, Model Reduction, Message Passing, Large Scale, Message Passing Interface, Block Size, Reduction Process, Element Model, Multibody Simulation, Elastic Bodies

The use of elastic bodies in multibody simulation has become increasingly important in recent years. To include elastic bodies, described by finite element models, in multibody simulations, the dimension of the system of ordinary differential equations must be reduced by projection. For this purpose, this work uses the modal reduction method, a method based on component mode synthesis (CMS), and a moment-matching method. Because the non-reduced systems keep growing in size, computing the projection matrix demands large computational resources. It cannot be done on ordinary serial computers with the memory available. This paper presents the model reduction software Morembs++, which uses a parallelization concept based on the Message Passing Interface to meet the memory requirements and reduce the runtime of the model reduction process. Additionally, the paper analyses the behaviour of the Block-Krylov-Schur eigensolver, implemented in the Anasazi package of the Trilinos project, with regard to the size of the Krylov basis, the block size and the number of blocks. In addition, an iterative solver is considered within the CMS-based method.
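The projection step mentioned in the abstract can be illustrated on a toy problem. In this hedged sketch, the full coordinates are approximated as x ≈ V q. The equations of motion M x'' + K x = f are then projected to (V^T M V) q'' + (V^T K V) q = V^T f. The sketch uses pure Python, and `galerkin_reduce` is a hypothetical name, not part of Morembs++. The three methods in the abstract differ only in how the basis V is computed. Here V is simply the lowest eigenvector of a 2-DOF system, which corresponds to modal reduction.

```python
import math

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def galerkin_reduce(M, K, f, V):
    """Hypothetical helper: project M x'' + K x = f onto span(V).
    With x ≈ V q the reduced system is (V^T M V) q'' + (V^T K V) q = V^T f."""
    Vt = transpose(V)
    M_r = matmul(Vt, matmul(M, V))
    K_r = matmul(Vt, matmul(K, V))
    f_r = [row[0] for row in matmul(Vt, [[v] for v in f])]
    return M_r, K_r, f_r

# Toy 2-DOF system; V holds its lowest eigenvector (modal reduction to 1 DOF)
M = [[1.0, 0.0], [0.0, 1.0]]
K = [[2.0, -1.0], [-1.0, 2.0]]
f = [1.0, 0.0]
s = 1.0 / math.sqrt(2.0)
V = [[s], [s]]
M_r, K_r, f_r = galerkin_reduce(M, K, f, V)
```

Because V contains an exact mode, the reduced eigenvalue K_r/M_r (about 1) equals the lowest eigenvalue of the full system. The hard part at scale, which the paper parallelizes with MPI, is computing V itself.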

A Hybrid MPI-OpenMP Implementation of an Implicit Finite-Element Code on Parallel Architectures

The International Journal of High Performance Computing Applications, 2002, Vol 16 (4), pp. 371-393. DOI: 10.1177/109434200201600402. Cited by ~31

Author(s): G. Mahinthakumar, F. Saied

Keyword(s): Finite Element, Hybrid Model, Message Passing, Message Passing Interface, Hybrid Approach, Parallel Architectures, Finite Element Code, Multiple Threads, SMP Clusters, Performance Results

The hybrid MPI-OpenMP model is a natural parallel programming paradigm for emerging parallel architectures based on symmetric multiprocessor (SMP) clusters. This paper presents a hybrid implementation adapted for an implicit finite-element code developed for groundwater transport simulations. The original code was parallelized for distributed-memory architectures with MPI (Message Passing Interface), using a domain decomposition strategy. OpenMP directives were then added to the code (a straightforward loop-level implementation) to use multiple threads within each MPI process. To improve OpenMP performance, several loop modifications were adopted. Parallel performance results are compared on four modern parallel architectures. The results show that, for most of the cases tested, the pure MPI approach outperforms the hybrid model. The exceptions were mainly due to a limitation of the MPI library implementation on one of the architectures. A general conclusion is that the hybrid model is a promising approach for SMP cluster architectures. At the time of writing, however, the payoff may not justify converting all existing MPI codes to hybrid codes. Improvements in OpenMP compilers, combined with potential MPI limitations within SMP nodes, may make the hybrid approach more attractive for a broader set of applications in the future.
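The two-level structure described in this summary has domain decomposition across MPI processes and loop-level threading inside each process. It can be sketched for a sparse matrix-vector product, a kernel typical of implicit finite-element solvers. This is an illustrative assumption, not the paper's code. MPI ranks are only simulated by a sequential loop, and `concurrent.futures.ThreadPoolExecutor` stands in for an OpenMP parallel loop. The halo exchange of `x` that a real distributed code would need is omitted because `x` is shared here. The helper names are hypothetical. Under CPython's global interpreter lock the threads give no real speed-up; the sketch only shows how the rows are divided.

```python
from concurrent.futures import ThreadPoolExecutor

def split(n, parts):
    """Contiguous block partition of range(n) into `parts` nearly equal ranges."""
    q, r = divmod(n, parts)
    out, start = [], 0
    for p in range(parts):
        stop = start + q + (1 if p < r else 0)
        out.append(range(start, stop))
        start = stop
    return out

def rows_matvec(indptr, indices, data, x, rows, y):
    """y[i] = sum_k A[i, k] * x[k] for the given rows of a CSR matrix."""
    for i in rows:
        y[i] = sum(data[k] * x[indices[k]] for k in range(indptr[i], indptr[i + 1]))

def hybrid_matvec(indptr, indices, data, x, n_ranks, n_threads):
    n = len(indptr) - 1
    y = [0.0] * n
    # Level 1 (MPI in a real code): each simulated rank owns a block of rows.
    for owned in split(n, n_ranks):
        # Level 2 (OpenMP in a real code): the rank's row loop is split among threads.
        chunks = [range(owned.start + c.start, owned.start + c.stop)
                  for c in split(len(owned), n_threads)]
        with ThreadPoolExecutor(max_workers=n_threads) as pool:
            # Threads write disjoint entries of y, so no locking is needed.
            list(pool.map(lambda rows: rows_matvec(indptr, indices, data, x, rows, y),
                          chunks))
    return y

# Usage: CSR storage of the 1D stiffness-like matrix tridiag(-1, 2, -1)
n = 10
indptr, indices, data = [0], [], []
for i in range(n):
    for j in (i - 1, i, i + 1):
        if 0 <= j < n:
            indices.append(j)
            data.append(2.0 if j == i else -1.0)
    indptr.append(len(indices))
x = [float(i) for i in range(n)]
y = hybrid_matvec(indptr, indices, data, x, n_ranks=3, n_threads=2)
```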

Coupled Peridynamics Least Square Minimization with Finite Element Method in 3D and Implicit Solutions by Message Passing Interface

Journal of Peridynamics and Nonlocal Modeling, 2021. DOI: 10.1007/s42102-021-00060-3

Author(s): Qibang Liu, X. J. Xin, Jeff Ma

Keyword(s): Finite Element Method, Finite Element, Message Passing, Message Passing Interface, Least Square, Element Method

Parallelized Simulation of a Finite Element Method in Many Integrated Core Architecture

Journal of Engineering Materials and Technology, 2017, Vol 139 (2). DOI: 10.1115/1.4035326. Cited by ~1

Author(s): Moonho Tak, Taehyo Park

Keyword(s): Finite Element Method, Finite Element, Linear Algebra, Message Passing, Message Passing Interface, Domain Decomposition Method, Xeon Phi, Parallel Libraries, Many Integrated Core, Element Method

We investigate a domain decomposition method (DDM) for the finite element method (FEM) on Intel's many integrated core (MIC) architecture in order to determine the most effective way to use MIC. To this end, a recently introduced, highly scalable parallel DDM is first described with a detailed procedure. Then, Intel's Xeon Phi MIC architecture is presented to explain how the parallel algorithm can be applied to a multicore architecture. Parallel simulation on the Xeon Phi MIC has an advantage: traditional parallel libraries such as the Message Passing Interface (MPI) and Open Multi-Processing (OpenMP) can be used without any additional libraries. We demonstrate the DDM using popular linear algebra libraries such as the Linear Algebra Package (LAPACK) and the Basic Linear Algebra Subprograms (BLAS). Moreover, both MPI and OpenMP are used to parallelize the DDM. Finally, the parallel efficiency is validated with a two-dimensional numerical example.

A comparison of MPI and co-array FORTRAN for large finite element variably saturated flow simulations

Scalable Computing Practice and Experience, 2018, Vol 19 (4), pp. 423-432. DOI: 10.12694/scpe.v19i4.1468

Author(s): Fred Thomas Tracy, Thomas C. Oppe, Maureen K. Corcoran

Keyword(s): Finite Element, Message Passing, Message Passing Interface, Unstructured Mesh, Source Code, Groundwater Modelling, Large Problem, Saturated Flow, Flow Simulations

The purpose of this research is to determine how well co-array FORTRAN (CAF) performs relative to Message Passing Interface (MPI) on unstructured mesh finite element groundwater modelling applications with large problem sizes and core counts. This research used almost 150 million nodes and 300 million 3-D prism elements. Results for both the Cray XE6 and Cray XC30 are given. A comparison of the ghost-node update algorithms with source code provided for both MPI and CAF is also presented.
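The ghost-node update mentioned in the abstract can be sketched under strong simplifying assumptions. The sketch uses a 1D mesh block-distributed over a few ranks, with one ghost node on each side of every rank. The paper works with unstructured meshes and real MPI or CAF communication. Here, ranks are plain Python lists, and `distribute` and `ghost_update` are hypothetical names. Each copy statement marks where an MPI code would exchange a message with a neighbor, for example with `MPI_Sendrecv`. A CAF code could instead read the neighboring image's coarray directly at that point.

```python
def distribute(values, n_ranks):
    """Hypothetical helper: block-distribute 1D node values. Each rank stores
    [left ghost] + owned nodes + [right ghost]. Assumes len(values) >= n_ranks."""
    q, r = divmod(len(values), n_ranks)
    local, start = [], 0
    for p in range(n_ranks):
        stop = start + q + (1 if p < r else 0)
        local.append([None] + list(values[start:stop]) + [None])
        start = stop
    return local

def ghost_update(local):
    """One-layer halo exchange: copy each neighbor's boundary owned node into
    the matching ghost slot. Ghosts at the physical boundary stay None."""
    for p in range(len(local)):
        if p > 0:
            local[p][0] = local[p - 1][-2]   # last owned node of the left neighbor
        if p < len(local) - 1:
            local[p][-1] = local[p + 1][1]   # first owned node of the right neighbor

# Usage: 10 nodes over 3 ranks (rank 0 owns 4 nodes, ranks 1 and 2 own 3 each)
local = distribute([float(i) for i in range(10)], n_ranks=3)
ghost_update(local)
```

In a real code, this update runs after every step that changes owned values. That keeps each rank's stencil or element computations near the partition boundary consistent with its neighbors.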

MagIC v5.10: a two-dimensional message-passing interface (MPI) distribution for pseudo-spectral magnetohydrodynamics simulations in spherical geometry

Geoscientific Model Development, 2021, Vol 14 (12), pp. 7477-7495. DOI: 10.5194/gmd-14-7477-2021

Author(s): Rafael Lago, Thomas Gastine, Tilman Dannert, Markus Rampp, Johannes Wicht

Keyword(s): Message Passing, Message Passing Interface, Distribution Data, Two Dimensional, Data Layout, Time Step, One Dimensional, Hybrid Parallelization, Dimensional Distribution, Pseudo Spectral

We discuss two parallelization schemes for MagIC, an open-source, high-performance, pseudo-spectral code for the numerical solution of the magnetohydrodynamics equations in a rotating spherical shell. MagIC calculates the non-linear terms on a numerical grid in spherical coordinates, while the time step updates are performed on radial grid points with a spherical harmonic representation of the lateral directions. Several transforms are required to switch between the different representations. The established hybrid parallelization of MagIC uses message-passing interface (MPI) distribution in radius and relies on existing fast spherical transforms using OpenMP. Our new two-dimensional MPI decomposition implementation also distributes the latitudes or the azimuthal wavenumbers across the available MPI tasks and compute cores. We discuss several non-trivial algorithmic optimizations and the different data distribution layouts employed by our scheme. In particular, the two-dimensional distribution data layout yields a code that strongly scales well beyond the limit of the current one-dimensional distribution. We also show that the two-dimensional distribution implementation, although not yet fully optimized, can already be faster than the existing finely optimized hybrid parallelization when using many thousands of CPU cores. Our analysis indicates that the two-dimensional distribution variant can be further optimized to also surpass the performance of the one-dimensional distribution for a few thousand cores.
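The strong-scaling limit of a one-dimensional distribution can be made concrete with a small ownership map. Under a radius-only layout, at most n_r ranks can own data, however many are launched. A two-dimensional process grid raises that ceiling to n_r times the number of parts in the second direction. In the sketch, the second index stands for either a latitude (grid space) or an azimuthal wavenumber (spectral space). The simple block layout is an assumption made for illustration and is not MagIC's actual data layout. The helper names and the 4 x 6 grid are hypothetical.

```python
def block_owner(i, n, p):
    """Part that owns index i when n indices are block-distributed over p parts
    (the first n % p parts get one extra index). Assumes 1 <= p <= n."""
    q, r = divmod(n, p)
    if i < r * (q + 1):
        return i // (q + 1)
    return r + (i - r * (q + 1)) // q

def owner_1d(ir, im, n_r, p):
    """1D layout: everything at radial level ir lives on one rank."""
    return block_owner(ir, n_r, p)

def owner_2d(ir, im, n_r, n_m, p_r, p_m):
    """2D layout: a p_r x p_m process grid splits radius and the second index."""
    return block_owner(ir, n_r, p_r) * p_m + block_owner(im, n_m, p_m)

# Hypothetical grid: 4 radial levels, 6 values of the second index
n_r, n_m = 4, 6
cells = [(ir, im) for ir in range(n_r) for im in range(n_m)]
busy_1d = {owner_1d(ir, im, n_r, n_r) for ir, im in cells}        # at most n_r ranks
busy_2d = {owner_2d(ir, im, n_r, n_m, 4, 3) for ir, im in cells}  # 4 x 3 process grid
```

With these sizes, the 1D layout keeps only 4 ranks busy, while the 2D layout keeps all 12 busy. The price of the 2D layout, which the abstract alludes to with "non-trivial algorithmic optimizations", is extra communication when switching between representations.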

Some Aspects of Parallel Implementation of the Finite Element Method on Message Passing Architectures

1988. DOI: 10.21236/ada198731. Cited by ~2

Author(s): I. Babuska, H. C. Elman

Keyword(s): Finite Element Method, Finite Element, Message Passing, Parallel Implementation, The Finite Element Method, Element Method

Multi-level Parallelization of Genotype Imputation on Supercomputers

Current Bioinformatics, 2020, Vol 15. DOI: 10.2174/1574893615999200420071307

Author(s): Weiwen Zhang, Long Wang, Theint Theint Aye, Juniarto Samsudin, Yongqing Zhu

Keyword(s): Association Study, Message Passing, High Performance, Message Passing Interface, Genome Wide Association Study, Job Scheduling, Genotype Imputation, Job Level, Multi Level, High Performance Requirement

Background: Genotype imputation as a service has been developed to enable researchers to estimate genotypes from haplotyped data without performing whole-genome sequencing. However, genotype imputation is computationally intensive, so meeting the high-performance requirements of genome-wide association studies (GWAS) remains a challenge.

Objective: In this paper, we propose a high-performance computing solution for genotype imputation on supercomputers to improve its execution performance.

Method: We design and implement a multi-level parallelization that includes job-level, process-level and thread-level parallelization, enabled by job scheduling management, the Message Passing Interface (MPI) and OpenMP, respectively. It involves job distribution, chunk partitioning and execution, parallelized iteration for imputation, and data concatenation. Thanks to the multi-level design, we can exploit multi-machine and multi-core architectures to improve the performance of genotype imputation.

Results: Experimental results show that our proposed method can outperform the Hadoop-based implementation of genotype imputation. Moreover, we conduct experiments on supercomputers to evaluate the performance of the proposed method. The evaluation shows that it can significantly shorten the execution time, thus improving the performance of genotype imputation.

Conclusion: The proposed multi-level parallelization, when deployed as imputation as a service, will help bioinformatics researchers in Singapore conduct genotype imputation and enhance association studies.

Families of quasigroup operations satisfying the generalized distributive law

Discrete Mathematics and Applications, 2020, Vol 30 (3), pp. 187-202. DOI: 10.1515/dma-2020-0018

Author(s): Sergey V. Polin

Keyword(s): System Of Equations, Systems Of Equations, The Family, Distributive Law, Elimination Of Variables

The previous paper was concerned with systems of equations over a certain family 𝓢 of quasigroups. That work suggested a method for eliminating an outermost variable from the system of equations. It also showed that further elimination of variables requires the family 𝓢 of quasigroups to satisfy the generalized distributive law (GDL). In this paper we describe families 𝓢 that satisfy GDL. The results are applied to construct classes of easily solvable systems of equations.
