A Parallelized Generalized Method of Cells Framework for Multiscale Studies of Composite Materials

Author(s):  
Ashwin Rai ◽  
Travis Skinner ◽  
Aditi Chattopadhyay

Abstract This paper presents a parallelized framework for a multiscale material analysis method, the generalized method of cells (GMC), which can be used to efficiently homogenize or localize material properties across two different length scales. Parallelization is utilized at two stages: (a) the solution of the governing linear equations, and (b) the local analysis of each subcell. The governing linear system is solved in parallel using a parallel form of the Gaussian substitution method, and the subsequent local subcell analysis is performed in parallel using a domain decomposition method in which the lower-length-scale subcells are divided equally among the available processors. The parallelization algorithm takes advantage of a single program multiple data (SPMD) distributed memory architecture using the Message Passing Interface (MPI) standard, which permits scaling the analysis algorithm to any number of processors on a computing cluster. Results show a significant decrease in solution time for the parallelized algorithm compared to serial algorithms, especially for denser microscale meshes. The resulting speed-up permits the analysis of complex length-scale-dependent phenomena, nonlinear analysis, and uncertainty studies with multiscale effects that would otherwise be prohibitively expensive.
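The subcell distribution described in this abstract can be sketched in a few lines: the microscale subcells are divided as evenly as possible into contiguous blocks, one per MPI rank. This is a minimal serial illustration of the decomposition step only; the function names are assumptions, not the paper's code.

```python
def partition_subcells(n_subcells: int, n_ranks: int) -> list[range]:
    """Split subcell indices 0..n_subcells-1 into n_ranks balanced ranges."""
    base, extra = divmod(n_subcells, n_ranks)
    blocks, start = [], 0
    for rank in range(n_ranks):
        # the first `extra` ranks each take one additional subcell
        size = base + (1 if rank < extra else 0)
        blocks.append(range(start, start + size))
        start += size
    return blocks

# Each rank would then run the local subcell analysis on its own block only.
blocks = partition_subcells(1000, 7)
```

Every subcell is assigned exactly once, and no two blocks differ in size by more than one, which keeps the per-rank workload balanced.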

2017 ◽  
Vol 139 (2) ◽  
Author(s):  
Taehyo Park ◽  
Shengjie Li ◽  
Mina Lee ◽  
Moonho Tak

Numerical methods have become a very important approach for solving complex problems in engineering and science. Grid-based methods such as the finite difference method (FDM) and the finite element method (FEM) have been widely applied to various areas; however, they still suffer from inherent difficulties that limit their application to many problems. Consequently, strong interest has recently focused on meshfree methods such as smoothed particle hydrodynamics (SPH) for simulating fluid flow, owing to their advantages in dealing with complicated problems. In the SPH method, a large number of particles is required because the whole domain is represented by a set of arbitrarily distributed particles. To improve numerical efficiency, parallelization using the Message Passing Interface (MPI) is applied to problems with large computational domains. In parallel computing, the whole domain is decomposed in a way that preserves continuity across subdomain boundaries, under the single instruction multiple data (SIMD) model and following the procedure of the SPH computations. In this work, a new parallel computing scheme is employed in the SPH method to analyze particle-based fluid flow. The whole domain is decomposed into subdomains under the SIMD process, and boundary conditions are imposed on the interface particles, which improves the detection of neighbor particles near the boundary. With this method of parallel computing, the SPH method becomes more flexible and performs better.
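The interface-particle idea in this abstract can be illustrated in one dimension: the particle set is split at a cut position, and each subdomain also receives copies of particles from the other side that lie within the smoothing length h, so neighbor detection near the subdomain boundary stays complete. This is a serial sketch of the decomposition logic only (real SPH codes work in 2-D/3-D with cell lists); all names are illustrative.

```python
def decompose(positions, cut, h):
    """Split 1-D particles at `cut`, adding interface (ghost) copies within h."""
    left = [x for x in positions if x < cut]
    right = [x for x in positions if x >= cut]
    # interface particles: within h of the cut, copied across the boundary
    left_ghosts = [x for x in right if x - cut <= h]
    right_ghosts = [x for x in left if cut - x <= h]
    return left + left_ghosts, right + right_ghosts

def neighbors(x, particles, h):
    """All particles (other than x itself) within the smoothing length h."""
    return sorted(p for p in particles if p != x and abs(p - x) <= h)

positions = [0.05 * i for i in range(41)]        # particles on [0, 2]
left_dom, right_dom = decompose(positions, 1.0, h=0.12)
```

For a particle owned by the left subdomain but close to the cut, the neighbor search inside the augmented subdomain returns the same result as a search over the full global particle set.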


Author(s):  
Ning Yang ◽  
Shiaaulir Wang ◽  
Paul Schonfeld

A Parallel Genetic Algorithm (PGA) is used for a simulation-based optimization of waterway project schedules. This PGA is designed to distribute a Genetic Algorithm application over multiple processors in order to speed up the solution search procedure for a very large combinatorial problem. The proposed PGA is based on a global parallel model, also called a master-slave model. The Message Passing Interface (MPI) is used in developing the parallel computing program. A case study is presented, whose results show how the adaptation of a simulation-based optimization algorithm to parallel computing can greatly reduce computation time. Additional techniques which are found to further improve the PGA performance include: (1) choosing an appropriate task distribution method, (2) distributing simulation replications instead of different solutions, (3) avoiding the simulation of duplicate solutions, (4) avoiding running multiple simulations simultaneously in shared-memory processors, and (5) avoiding using multiple processors which belong to different clusters (physical sub-networks).
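Two of the techniques listed above, distributing simulation replications and avoiding duplicate solutions, can be sketched with a master that fans (solution, replication) tasks out to workers and averages the replications per solution. Threads stand in here for MPI slave ranks, and the toy simulation function is an assumption, not the paper's model.

```python
from concurrent.futures import ThreadPoolExecutor

def simulate(solution: int, replication: int) -> float:
    """Toy stand-in for one stochastic simulation replication."""
    return solution * solution + 0.1 * replication

def evaluate(solutions, n_reps, n_workers=4):
    unique = sorted(set(solutions))          # avoid simulating duplicate solutions
    # distribute replications, not whole solutions, across the workers
    tasks = [(s, r) for s in unique for r in range(n_reps)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = list(pool.map(lambda t: (t[0], simulate(*t)), tasks))
    # the master averages the replications of each solution
    fitness = {s: 0.0 for s in unique}
    for s, value in results:
        fitness[s] += value / n_reps
    return fitness

fitness = evaluate([3, 1, 3, 2], n_reps=4)   # note the duplicated solution 3
```

Because replications of the same solution are independent tasks, they can be spread over idle workers even when the population is small, which is the load-balancing point the abstract makes.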


2012 ◽  
Vol 263-266 ◽  
pp. 1315-1318
Author(s):  
Kun Ming Yu ◽  
Ming Gong Lee

This paper discusses how Python can be used to design a cluster parallel computation environment for the numerical solution of ordinary differential equations by a block predictor-corrector method. In the parallel process, MPI-2 (Message Passing Interface), as implemented by MPICH2, is used to communicate between CPUs. Data sending and receiving are controlled by mpi4py, which is based on Python. A block predictor-corrector numerical method is implemented with one and two CPUs, respectively, to test performance on an initial value problem. Only minor speed-up is obtained, owing to the small problem sizes and the few CPUs used in the scheme; nevertheless, establishing this scheme in Python is valuable because very little research has been carried out on this kind of parallel structure in Python.
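The serial core of a predictor-corrector method can be sketched as follows: an explicit Euler predictor followed by a trapezoidal corrector (a PECE step). This is one common pairing, not necessarily the exact block scheme of the paper, whose block variant computes several points per step and distributes them over CPUs via mpi4py; names here are illustrative.

```python
import math

def pece_step(f, t, y, h):
    """One predict-evaluate-correct-evaluate step for y' = f(t, y)."""
    y_pred = y + h * f(t, y)                            # predict (explicit Euler)
    return y + 0.5 * h * (f(t, y) + f(t + h, y_pred))   # correct (trapezoid)

def solve(f, y0, t0, t1, n):
    """March the PECE step over n uniform substeps of [t0, t1]."""
    h, y, t = (t1 - t0) / n, y0, t0
    for _ in range(n):
        y = pece_step(f, t, y, h)
        t += h
    return y

# test problem: y' = -y, y(0) = 1, whose exact solution is exp(-t)
y_num = solve(lambda t, y: -y, 1.0, 0.0, 1.0, 100)
```

The corrector raises the order from one to two, so with h = 0.01 the error at t = 1 against exp(-1) is well below 1e-3.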


2014 ◽  
Vol 19 (5) ◽  
pp. 627-646 ◽  
Author(s):  
Mindaugas Radziunas ◽  
Raimondas Čiegis

A 2 + 1 dimensional PDE traveling wave model describing the spatial-lateral dynamics of edge-emitting broad-area semiconductor devices is considered. A numerical scheme based on a split-step Fourier method is presented. The domain decomposition method is used to parallelize the sequential algorithm. The parallel algorithm is implemented using the Message Passing Interface system; results of computational experiments are presented, and the scalability of the algorithm is analyzed. Simulations of the model equations are used to optimize existing devices with respect to the emitted beam quality, as well as to create and test novel device design concepts.
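The split-step Fourier idea behind such a scheme can be sketched on a much simpler linear problem, u_t = D u_xx + r u on a periodic grid: the diffusion substep is applied exactly in Fourier space and the reaction substep in physical space. A hand-rolled DFT keeps the sketch dependency-free; a production code would use an FFT and, as in the paper, split the work across MPI processes. The equation and all names are illustrative, not the paper's traveling wave model.

```python
import cmath, math

def dft(u):
    n = len(u)
    return [sum(u[j] * cmath.exp(-2j * math.pi * k * j / n) for j in range(n))
            for k in range(n)]

def idft(c):
    n = len(c)
    return [sum(c[k] * cmath.exp(2j * math.pi * k * j / n) for k in range(n)).real / n
            for j in range(n)]

def split_step(u, D, r, L, dt, steps):
    n = len(u)
    # integer wavenumbers, negative frequencies in the upper half of the spectrum
    ks = [k if k <= n // 2 else k - n for k in range(n)]
    for _ in range(steps):
        c = dft(u)
        c = [c[i] * cmath.exp(-D * (2 * math.pi * ks[i] / L) ** 2 * dt)
             for i in range(n)]                       # exact diffusion substep
        u = [v * math.exp(r * dt) for v in idft(c)]   # exact reaction substep
    return u

n, L, D, r, dt, steps = 32, 1.0, 0.01, 0.5, 0.01, 50
u0 = [math.sin(2 * math.pi * j / n) for j in range(n)]
u = split_step(u0, D, r, L, dt, steps)
```

For this linear problem the two substeps commute, so the split solution agrees with the exact solution exp((r - 4*pi^2*D) t) sin(2*pi*x) to round-off.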


2015 ◽  
Vol 2015 ◽  
pp. 1-10 ◽  
Author(s):  
Daniel S. Abdi ◽  
Girma T. Bitsuamlak

A Navier-Stokes equations solver is parallelized to run on a cluster of computers using the domain decomposition method. Two approaches to communication and computation are investigated, namely, synchronous and asynchronous methods. Asynchronous communication between subdomains is not commonly used in CFD codes; however, it has the potential to alleviate scaling bottlenecks incurred when processors have to wait for each other at designated synchronization points. A common way to avoid this idle time is to overlap asynchronous communication with computation. For this to work, however, there must be something useful and independent a processor can do while waiting for messages to arrive. We investigate an alternative approach: conducting asynchronous iterations to improve the local subdomain solution while communication is in progress. An in-house CFD code is parallelized using the Message Passing Interface (MPI), and scalability tests are conducted which suggest that asynchronous iterations are a viable way of parallelizing CFD code.
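The asynchronous-iteration idea can be sketched serially on the 1-D Laplace problem u'' = 0, u(0) = 0, u(1) = 1, split into two subdomains: each subdomain performs several local Jacobi sweeps against a stale halo value, and halos are exchanged only between outer rounds, standing in for messages still in flight. The toy problem and all names are illustrative, not the paper's solver.

```python
def jacobi_sweeps(u, left_halo, right_halo, sweeps):
    """Jacobi relaxation of one subdomain with fixed (possibly stale) halos."""
    for _ in range(sweeps):
        ext = [left_halo] + u + [right_halo]
        u = [0.5 * (ext[i - 1] + ext[i + 1]) for i in range(1, len(ext) - 1)]
    return u

n = 16                                  # interior points per subdomain
a = [0.0] * n                           # left subdomain unknowns
b = [0.0] * n                           # right subdomain unknowns
halo_a, halo_b = 0.0, 0.0               # each side's view of the other's edge
for _ in range(800):                    # outer "communication" rounds
    # inner sweeps run on stale halos, overlapping with communication
    a = jacobi_sweeps(a, 0.0, halo_a, sweeps=4)
    b = jacobi_sweeps(b, halo_b, 1.0, sweeps=4)
    halo_a, halo_b = b[0], a[-1]        # the exchange "arrives" between rounds

exact = [(i + 1) / (2 * n + 1) for i in range(2 * n)]
err = max(abs(v - e) for v, e in zip(a + b, exact))
```

Despite the stale boundary data, the iteration still converges to the linear exact profile; staleness costs extra iterations, not correctness, which is the trade the abstract describes.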


F1000Research ◽  
2020 ◽  
Vol 9 ◽  
pp. 240
Author(s):  
Frédéric Jarlier ◽  
Nicolas Joly ◽  
Nicolas Fedy ◽  
Thomas Magalhaes ◽  
Leonor Sirotti ◽  
...  

Life science has entered the so-called 'big data era', in which biologists, clinicians and bioinformaticians are overwhelmed with high-throughput sequencing data. While these data offer new insights into genome structure, their ever-growing size raises major challenges for their use in daily clinical practice, care and diagnosis. We therefore implemented software to reduce the time to delivery for the alignment and sorting of high-throughput sequencing data. Our solution is implemented using the Message Passing Interface and is intended for high-performance computing architectures. The software scales linearly with the size of the data and ensures full reproducibility with respect to the traditional tools. For example, a 300X whole genome can be aligned and sorted in less than 9 hours with 128 cores. The software offers significant speed-up through multi-core and multi-node parallelization.
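The sort phase of such a pipeline can be sketched as a scatter / local-sort / merge pattern: reads (represented here only by integer genome coordinates) are scattered across "ranks", each rank sorts its chunk locally, and the sorted chunks are merged, as an MPI gather-and-merge would do. Names and data are illustrative, not the tool's actual implementation.

```python
import heapq, random

def scatter(reads, n_ranks):
    """Round-robin distribution of reads over ranks."""
    chunks = [[] for _ in range(n_ranks)]
    for i, read in enumerate(reads):
        chunks[i % n_ranks].append(read)
    return chunks

def parallel_sort(reads, n_ranks=4):
    chunks = [sorted(c) for c in scatter(reads, n_ranks)]  # local sorts
    return list(heapq.merge(*chunks))                      # k-way merge

random.seed(0)
coords = [random.randrange(3_000_000_000) for _ in range(10_000)]
result = parallel_sort(coords)
```

The local sorts are embarrassingly parallel, and the k-way merge touches each element once, which is what allows the wall-clock time to scale with the number of ranks.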


Author(s):  
Vladimir Mironov ◽  
Alexander Moskovsky ◽  
Michael D’Mello ◽  
Yuri Alexeev

The Hartree–Fock method in the General Atomic and Molecular Electronic Structure System (GAMESS) quantum chemistry package represents one of the most irregular algorithms in computation today. Major steps in the calculation are the irregular computation of electron repulsion integrals and the building of the Fock matrix. These are the central components of the main self-consistent field (SCF) loop, the key hot spot in electronic structure codes. By threading the Message Passing Interface (MPI) ranks in the official release of the GAMESS code, we not only speed up the main SCF loop (4× to 6× for large systems) but also achieve a significant ([Formula: see text]×) reduction in the overall memory footprint. These improvements are a direct consequence of memory access optimizations within the MPI ranks. We benchmark our implementation against the official release of the GAMESS code on the Intel® Xeon Phi™ supercomputer. Scaling numbers are reported on up to 7680 cores on Intel Xeon Phi coprocessors.
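The hybrid structure described above can be sketched as follows: instead of one process per core, each rank spawns threads that share a single read-only copy of the density data and accumulate partial Fock-like contributions into private buffers that are then reduced, which is how sharing cuts the per-core memory footprint. Threads stand in for the threaded MPI ranks; the toy integral and all names are illustrative, not GAMESS code.

```python
from concurrent.futures import ThreadPoolExecutor

N = 8                                          # basis size (toy)
# shared, read-only data: one copy serves all threads of a rank
density = [[(i + j) % 5 * 0.1 for j in range(N)] for i in range(N)]

def integral(i, j):
    """Toy stand-in for an electron-repulsion-integral batch."""
    return 1.0 / (1.0 + abs(i - j))

def partial_fock(rows):
    """One thread builds its private partial Fock matrix for a row block."""
    F = [[0.0] * N for _ in range(N)]
    for i in rows:
        for j in range(N):
            F[i][j] = sum(integral(i, k) * density[k][j] for k in range(N))
    return F

def build_fock(n_threads=4):
    blocks = [range(t, N, n_threads) for t in range(n_threads)]
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        partials = list(pool.map(partial_fock, blocks))
    # reduction: sum the private buffers into the final matrix
    return [[sum(p[i][j] for p in partials) for j in range(N)] for i in range(N)]

F = build_fock()
```

With T threads per rank, the large read-only arrays exist once per rank rather than once per core, which is the source of the memory-footprint reduction the abstract reports.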


Author(s):  
Peng Wen ◽  
Wei Qiu

A constrained interpolation profile (CIP) method has been developed to solve 2-D water entry problems. This paper presents the further development of the numerical method using staggered grids and a parallel computing algorithm. In this work, the multi-phase slamming problems, governed by the Navier-Stokes (N-S) equations, are solved by a CIP-based finite difference method. The interfaces between different phases (solid, water and air) are captured using density functions. A parallel computing algorithm based on the Message Passing Interface (MPI) method and a domain decomposition scheme was implemented to speed up the computations. The effects of the decomposition scheme on the solution and the speed-up were studied. Validation studies were carried out for the water entry of various 2-D wedges and a ship section. The predicted slamming force, pressure distribution and free surface elevation are compared with experimental and other numerical results.
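The CIP idea at the heart of such a solver can be sketched in one dimension: both the value u and its spatial derivative g are advected, and a cubic Hermite polynomial built from the upwind cell reconstructs the profile. A constant advection speed and a periodic grid are assumed, and all names are illustrative; this is the basic CIP advection step, not the paper's multi-phase scheme.

```python
import math

def cip_step(u, g, c, dx, dt):
    """One CIP advection step of u_t + c u_x = 0 for c > 0, periodic BCs."""
    n, D, xi = len(u), -dx, -c * dt
    un, gn = [0.0] * n, [0.0] * n
    for i in range(n):
        up = (i - 1) % n                      # upwind neighbour for c > 0
        # cubic Hermite coefficients matching u and g at nodes i and i-1
        a = (g[i] + g[up]) / D**2 + 2.0 * (u[i] - u[up]) / D**3
        b = 3.0 * (u[up] - u[i]) / D**2 - (2.0 * g[i] + g[up]) / D
        un[i] = ((a * xi + b) * xi + g[i]) * xi + u[i]
        gn[i] = (3.0 * a * xi + 2.0 * b) * xi + g[i]
    return un, gn

n, dx, c = 20, 0.05, 1.0
u = [math.sin(2 * math.pi * i * dx) for i in range(n)]
g = [2 * math.pi * math.cos(2 * math.pi * i * dx) for i in range(n)]
for _ in range(n):                            # advect once around the period
    u, g = cip_step(u, g, c, dx, dt=dx / c)   # CFL = 1: exact one-cell shift
```

At a Courant number of exactly one, the cubic evaluates at the upwind node, so a full trip around the periodic domain returns the initial profile to round-off, which is a convenient correctness check.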


2020 ◽  
Vol 4 (1) ◽  
pp. 19-27 ◽  
Author(s):  
Fazal Noor ◽  
Abdulghani Ibrahim ◽  
Mohammed M. AlKhattab

Optimization algorithms are often used to obtain optimal solutions to complex nonlinear problems and appear in many areas, such as control, communication, and computation. The bat algorithm is a heuristic optimization algorithm that is efficient in obtaining approximate best solutions to nonlinear problems. In many situations, complex problems involve large amounts of computation, and simulations may need to run for days, weeks, or even years before an algorithm converges to a solution. In this research, a Parallel Distributed Bat Algorithm (PDBA) is formulated using the Message Passing Interface (MPI) in C for a PC cluster. The time complexity of PDBA is determined and presented. Performance in terms of speed-up, efficiency, elapsed time, and the number of times the fitness function is executed is also presented.
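A compact serial sketch of the bat algorithm, minimizing a sphere function, shows the core loop that a PDBA distributes: in the parallel version each MPI process would run such a sub-population and periodically share its best solution. Parameter values and names here are illustrative defaults, not the paper's settings.

```python
import random

def bat_algorithm(objective, dim=2, n_bats=20, iters=100,
                  fmin=0.0, fmax=2.0, loudness=0.5, pulse_rate=0.5, seed=1):
    rng = random.Random(seed)
    xs = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(n_bats)]
    vs = [[0.0] * dim for _ in range(n_bats)]
    fits = [objective(x) for x in xs]
    best = min(xs, key=objective)
    best_fit = objective(best)
    for _ in range(iters):
        for i in range(n_bats):
            f = fmin + (fmax - fmin) * rng.random()        # random frequency
            vs[i] = [v + (x - b) * f for v, x, b in zip(vs[i], xs[i], best)]
            cand = [x + v for x, v in zip(xs[i], vs[i])]
            if rng.random() > pulse_rate:                  # local walk near best
                cand = [b + 0.01 * rng.uniform(-1, 1) for b in best]
            fc = objective(cand)
            if fc < fits[i] and rng.random() < loudness:   # accept if better
                xs[i], fits[i] = cand, fc
            if fc < best_fit:                              # track global best
                best, best_fit = cand, fc
    return best, best_fit

sphere = lambda x: sum(v * v for v in x)
best, best_fit = bat_algorithm(sphere)
```

The global best is updated monotonically, so the returned fitness can never be worse than the best of the initial random population; this monotone guarantee holds regardless of the random seed.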

