scholarly journals Communication Optimization for Multiphase Flow Solver in the Library of OpenFOAM

Water ◽  
2018 ◽  
Vol 10 (10) ◽  
pp. 1461 ◽  
Author(s):  
Zhipeng Lin ◽  
Wenjing Yang ◽  
Houcun Zhou ◽  
Xinhai Xu ◽  
Liaoyuan Sun ◽  
...  

Multiphase flow solvers are widely-used applications in OpenFOAM, whose scalability suffers from the costly communication overhead. Therefore, we establish communication-optimized multiphase flow solvers in OpenFOAM. In this paper, we first deliver a scalability bottleneck test on the typical multiphase flow case damBreak and reveal that the Message Passing Interface (MPI) communication in a Multidimensional Universal Limiter for Explicit Solution (MULES) and a Preconditioned Conjugate Gradient (PCG) algorithm is the short slab of multiphase flow solvers. Furthermore, an analysis of the communication behavior is carried out. We find that the redundant communication in MULES and the global synchronization in PCG are the performance limiting factors. Based on the analysis, we propose our communication optimization algorithm. For MULES, we remove the redundant communication and obtain optMULES. For PCG, we import several intermediate variables and rearrange PCG to reduce the global communication. We also overlap the computation of matrix-vector multiply and vector update with the non-blocking computation. The resulting algorithms are respectively referred to as OFPiPePCG and OFRePiPePCG. Extensive experiments show that our proposed method could dramatically increase the parallel scalability and solving speed of multiphase flow solvers in OpenFOAM approximately without the loss of accuracy.

2018 ◽  
Vol 13 ◽  
pp. 174830181879706 ◽  
Author(s):  
Wenpeng Ma ◽  
Xiaodong Hu ◽  
Xiazhen Liu

In this paper we investigate parallel implementations of multibody separation simulation using a hybrid of message passing interface and OpenMP. We propose a mesh block-based overset communication optimization algorithm. After presenting details of local data structures, we present our strategy for parallelizing both the overset mesh assembler and the flow solver by employing message passing interface and OpenMP. Experimental results show that the mesh block-based overset communication optimization algorithm has an advantage in real elapsed time when compared to a process-based implementation. The hybrid version shows that it is suitable for improving the load balance if a large number of CPU cores are used. We report results for a standard multibody separation case.


2005 ◽  
Vol 13 (2) ◽  
pp. 79-91 ◽  
Author(s):  
George A. Gravvanis ◽  
Konstantinos M. Giannoutakis

A new class of normalized explicit approximate inverse matrix techniques, based on normalized approximate factorization procedures, for solving sparse linear systems resulting from the finite difference discretization of partial differential equations in three space variables are introduced. A new parallel normalized explicit preconditioned conjugate gradient square method in conjunction with normalized approximate inverse matrix techniques for solving efficiently sparse linear systems on distributed memory systems, using Message Passing Interface (MPI) communication library, is also presented along with theoretical estimates on speedups and efficiency. The implementation and performance on a distributed memory MIMD machine, using Message Passing Interface (MPI) is also investigated. Applications on characteristic initial/boundary value problems in three dimensions are discussed and numerical results are given.


2014 ◽  
Vol 2014 ◽  
pp. 1-11
Author(s):  
Yanyan Sheng ◽  
William S. Welling ◽  
Michelle M. Zhu

Item response theory (IRT) is a popular approach used for addressing large-scale statistical problems in psychometrics as well as in other fields. The fully Bayesian approach for estimating IRT models is usually memory and computationally expensive due to the large number of iterations. This limits the use of the procedure in many applications. In an effort to overcome such restraint, previous studies focused on utilizing the message passing interface (MPI) in a distributed memory-based Linux cluster to achieve certain speedups. However, given the high data dependencies in a single Markov chain for IRT models, the communication overhead rapidly grows as the number of cluster nodes increases. This makes it difficult to further improve the performance under such a parallel framework. This study aims to tackle the problem using massive core-based graphic processing units (GPU), which is practical, cost-effective, and convenient in actual applications. The performance comparisons among serial CPU, MPI, and compute unified device architecture (CUDA) programs demonstrate that the CUDA GPU approach has many advantages over the CPU-based approach and therefore is preferred.


2015 ◽  
Vol 8 (7) ◽  
pp. 5983-6019
Author(s):  
D. A. Belikov ◽  
S. Maksyutov ◽  
A. Yaremchuk ◽  
A. Ganshin ◽  
T. Kaminski ◽  
...  

Abstract. We present the development of the Adjoint of the Global Eulerian–Lagrangian Coupled Atmospheric (A-GELCA) model that consists of the National Institute for Environmental Studies (NIES) model as an Eulerian three-dimensional transport model (TM), and FLEXPART (FLEXible PARTicle dispersion model) as the Lagrangian plume diffusion model (LPDM). The tangent and adjoint components of the Eulerian model were constructed directly from the original NIES TM code using an automatic differentiation tool known as TAF (Transformation of Algorithms in Fortran; http://www.FastOpt.com), with additional manual pre- and post-processing aimed at improving the performance of the computing, including MPI (Message Passing Interface). As results, the adjoint of Eulerian model is discrete. Construction of the adjoint of the Lagrangian component did not require any code modification, as LPDMs are able to track a significant number of particles back in time and thereby calculate the sensitivity of observations to the neighboring emissions areas. Eulerian and Lagrangian adjoint components were coupled at the time boundary in the global domain.The results are verified using a series of test experiments. The forward simulation shown the coupled model is effective in reproducing the seasonal cycle and short-term variability of CO2 even in the case of multiple limiting factors, such as high uncertainty of fluxes and the low resolution of the Eulerian model. The adjoint model demonstrates the high accuracy compared to direct forward sensitivity calculations and fast performance. The developed adjoint of the coupled model combines the flux conservation and stability of an Eulerian discrete adjoint formulation with the flexibility, accuracy, and high resolution of a Lagrangian backward trajectory formulation.


2020 ◽  
Vol 15 ◽  
Author(s):  
Weiwen Zhang ◽  
Long Wang ◽  
Theint Theint Aye ◽  
Juniarto Samsudin ◽  
Yongqing Zhu

Background: Genotype imputation as a service is developed to enable researchers to estimate genotypes on haplotyped data without performing whole genome sequencing. However, genotype imputation is computation intensive and thus it remains a challenge to satisfy the high performance requirement of genome wide association study (GWAS). Objective: In this paper, we propose a high performance computing solution for genotype imputation on supercomputers to enhance its execution performance. Method: We design and implement a multi-level parallelization that includes job level, process level and thread level parallelization, enabled by job scheduling management, message passing interface (MPI) and OpenMP, respectively. It involves job distribution, chunk partition and execution, parallelized iteration for imputation and data concatenation. Due to the design of multi-level parallelization, we can exploit the multi-machine/multi-core architecture to improve the performance of genotype imputation. Results: Experiment results show that our proposed method can outperform the Hadoop-based implementation of genotype imputation. Moreover, we conduct the experiments on supercomputers to evaluate the performance of the proposed method. The evaluation shows that it can significantly shorten the execution time, thus improving the performance for genotype imputation. Conclusion: The proposed multi-level parallelization, when deployed as an imputation as a service, will facilitate bioinformatics researchers in Singapore to conduct genotype imputation and enhance the association study.


Energies ◽  
2021 ◽  
Vol 14 (8) ◽  
pp. 2284
Author(s):  
Krzysztof Przystupa ◽  
Mykola Beshley ◽  
Olena Hordiichuk-Bublivska ◽  
Marian Kyryk ◽  
Halyna Beshley ◽  
...  

The problem of analyzing a big amount of user data to determine their preferences and, based on these data, to provide recommendations on new products is important. Depending on the correctness and timeliness of the recommendations, significant profits or losses can be obtained. The task of analyzing data on users of services of companies is carried out in special recommendation systems. However, with a large number of users, the data for processing become very big, which causes complexity in the work of recommendation systems. For efficient data analysis in commercial systems, the Singular Value Decomposition (SVD) method can perform intelligent analysis of information. With a large amount of processed information we proposed to use distributed systems. This approach allows reducing time of data processing and recommendations to users. For the experimental study, we implemented the distributed SVD method using Message Passing Interface, Hadoop and Spark technologies and obtained the results of reducing the time of data processing when using distributed systems compared to non-distributed ones.


1996 ◽  
Vol 22 (6) ◽  
pp. 789-828 ◽  
Author(s):  
William Gropp ◽  
Ewing Lusk ◽  
Nathan Doss ◽  
Anthony Skjellum

1995 ◽  
Vol 05 (04) ◽  
pp. 575-586
Author(s):  
BEN LEE ◽  
ALI R. HURSON

The issue of scalability is key to the success of massively parallel processing. Due to their distributed nature, message-passing multicomputers are appropriate for achieving scalar performance. However, the message-passing model lacks programmability due to difficulties encountered by the programmers to partition and schedule the computation over the processors and to establish efficient inter-processor communication in the user code. Therefore, this paper presents a compile-time scheduling heuristic, called BLS, that maps programs onto the processors of a message-passing multicomputer. In contrast to other methods proposed, BLS takes a more global approach in attempt to balance the tradeoff between exploiting parallelism and reducing communication overhead. To evaluate the effectiveness of BLS, simulation studies of scheduling SISAL programs are presented.


2013 ◽  
Vol 718-720 ◽  
pp. 1645-1650
Author(s):  
Gen Yin Cheng ◽  
Sheng Chen Yu ◽  
Zhi Yong Wei ◽  
Shao Jie Chen ◽  
You Cheng

Commonly used commercial simulation software SYSNOISE and ANSYS is run on a single machine (can not directly run on parallel machine) when use the finite element and boundary element to simulate muffler effect, and it will take more than ten days, sometimes even twenty days to work out an exact solution as the large amount of numerical simulation. Use a high performance parallel machine which was built by 32 commercial computers and transform the finite element and boundary element simulation software into a program that can running under the MPI (message passing interface) parallel environment in order to reduce the cost of numerical simulation. The relevant data worked out from the simulation experiment demonstrate that the result effect of the numerical simulation is well. And the computing speed of the high performance parallel machine is 25 ~ 30 times a microcomputer.


Sign in / Sign up

Export Citation Format

Share Document