Optimization of lattice Boltzmann simulations on heterogeneous computers

Author(s):  
E Calore ◽  
A Gabbana ◽  
SF Schifano ◽  
R Tripiccione

High-performance computing systems are more and more often based on accelerators. Computing applications targeting those systems often follow a host-driven approach, in which hosts offload almost all compute-intensive sections of the code onto accelerators; this approach only marginally exploits the computational resources available on the host CPUs, limiting overall performance. The obvious step forward is to run compute-intensive kernels in a concurrent and balanced way on both hosts and accelerators. In this paper, we consider exactly this problem for a class of applications based on lattice Boltzmann methods, widely used in computational fluid dynamics. Our goal is to develop just one program, portable and able to run efficiently on several different combinations of hosts and accelerators. To reach this goal, we define common data layouts enabling the code to exploit the different parallel and vector options of the various accelerators efficiently, and matching the possibly different requirements of the compute-bound and memory-bound kernels of the application. We also define models and metrics that predict the best partitioning of workloads among host and accelerator, and the optimally achievable overall performance level. We test the performance of our codes and their scaling properties using, as testbeds, HPC clusters incorporating different accelerators: Intel Xeon Phi many-core processors, NVIDIA GPUs, and AMD GPUs.
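The workload-partitioning idea the abstract alludes to can be sketched in a few lines. This is our own illustration, not the authors' code: the function names and the simple assumption that each device sustains a fixed throughput on its share of the lattice are ours.

```python
def optimal_split(t_host, t_acc):
    """Fraction of the lattice assigned to the accelerator so that host
    and accelerator finish each kernel at the same time, assuming each
    device sustains a fixed throughput (sites/s) on its share."""
    return t_acc / (t_host + t_acc)

def predicted_throughput(t_host, t_acc):
    """Under perfect overlap the combined throughput is simply additive."""
    return t_host + t_acc
```

For example, a host sustaining 100 Msites/s paired with an accelerator sustaining 300 Msites/s would, under this model, receive 25% and 75% of the lattice respectively, for a predicted combined rate of 400 Msites/s.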

2007 ◽  
Vol 18 (04) ◽  
pp. 455-462 ◽  
Author(s):  
MARTIN GEIER ◽  
ANDREAS GREINER ◽  
JAN G. KORVINK

The theory of the lattice Boltzmann automaton is based on a moment transform which is not Galilean invariant. It is explained how the central moments transform, used in the cascaded lattice Boltzmann method, overcomes this problem by choosing the center of mass coordinate system as the frame of reference. Galilean invariance is restored and the form of the kinetic theory is unaffected. Conservation laws are not compromised by the high order polynomials in the equilibrium distribution arising from the central moment transform. Two sources of instabilities in lattice Boltzmann simulations are discussed: negative numerical viscosity due to insufficient Galilean invariance and aliasing. The cascaded lattice Boltzmann automaton overcomes both problems. It is discussed why aliasing is unavoidable in lattice Boltzmann methods that rely on a single relaxation time. An appendix lists the complete scattering operator of the D2Q9 cascaded lattice Boltzmann automaton.
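The central-moment transform can be stated compactly. The sketch below (our notation, not the paper's) computes the central moment kappa_{mn} on the D2Q9 lattice and makes the Galilean-invariance point concrete: because moments are taken relative to the local fluid velocity, the first-order central moments vanish identically in the co-moving frame, whatever the advection velocity.

```python
# D2Q9 discrete velocity set (rest, axis, diagonal directions)
C = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1),
     (1, 1), (-1, 1), (-1, -1), (1, -1)]

def central_moment(f, m, n):
    """kappa_{mn} = sum_i f_i * (c_ix - ux)^m * (c_iy - uy)^n,
    with (ux, uy) the local fluid velocity computed from f itself."""
    rho = sum(f)
    ux = sum(fi * c[0] for fi, c in zip(f, C)) / rho
    uy = sum(fi * c[1] for fi, c in zip(f, C)) / rho
    return sum(fi * (c[0] - ux) ** m * (c[1] - uy) ** n
               for fi, c in zip(f, C))
```

By construction kappa_{00} equals the density and kappa_{10} = kappa_{01} = 0 for any distribution, which is exactly the frame-independence the cascaded method exploits.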


Author(s):  
Radhika S. Saksena ◽  
Marco D. Mazzeo ◽  
Stefan J. Zasada ◽  
Peter V. Coveney

We present very large-scale rheological studies of self-assembled cubic gyroid liquid crystalline phases in ternary mixtures of oil, water and amphiphilic species performed on petascale supercomputers using the lattice-Boltzmann method. These nanomaterials have found diverse applications in materials science and biotechnology, for example, in photovoltaic devices and protein crystallization. They are increasingly gaining importance as delivery vehicles for active agents in pharmaceuticals, personal care products and food technology. In many of these applications, the self-assembled structures are subject to flows of varying strengths and we endeavour to understand their rheological response with the objective of eventually predicting it under given flow conditions. Computationally, our lattice-Boltzmann simulations of ternary fluids are inherently memory- and data-intensive. Furthermore, our interest in dynamical processes necessitates remote visualization and analysis as well as the associated transfer and storage of terabytes of time-dependent data. These simulations are distributed on a high-performance grid infrastructure using the Application Hosting Environment; we employ a novel parallel in situ visualization approach which is particularly suited for such computations on petascale resources. We present computational and I/O performance benchmarks of our application on three different petascale systems.


2021 ◽  
Author(s):  
Mariza Ferro ◽  
Vinicius P. Klôh ◽  
Matheus Gritz ◽  
Vitor de Sá ◽  
Bruno Schulze

Understanding the computational impact of scientific applications on different architectures, through their runtime behaviour, should guide the use of computational resources in high-performance computing systems. In this work, we propose an analysis of Machine Learning (ML) algorithms to gather knowledge about the performance of these applications through hardware events and derived performance metrics. Nine NAS benchmarks were executed and the hardware events were collected. These experimental results were used to train a Neural Network, a Decision Tree Regressor and a Linear Regression, focusing on predicting the runtime of scientific applications according to the performance metrics.
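The linear-regression baseline in such a study amounts to a least-squares fit of runtime against a hardware-derived metric. The sketch below is our own toy illustration, not the paper's code; the single "instructions retired"-style predictor and the synthetic values are assumptions for demonstration only.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    var = sum((x - mx) ** 2 for x in xs)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = cov / var
    return slope, my - slope * mx

# Synthetic example: runtime grows linearly with one hypothetical
# hardware metric; a real study would use measured counter values.
metric = [1.0, 2.0, 3.0, 4.0, 5.0]
runtime = [3.0, 5.0, 7.0, 9.0, 11.0]
slope, intercept = fit_line(metric, runtime)
```

A fitted model like this can then predict the runtime of an unseen run from its collected counters; the paper's neural-network and decision-tree regressors play the same role with more expressive function classes.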


PLoS ONE ◽  
2021 ◽  
Vol 16 (4) ◽  
pp. e0250306
Author(s):  
Jonas Latt ◽  
Christophe Coreixas ◽  
Joël Beny

We present a novel, hardware-agnostic implementation strategy for lattice Boltzmann (LB) simulations, which yields massive performance on homogeneous and heterogeneous many-core platforms. Based solely on C++17 Parallel Algorithms, our approach does not rely on any language extensions, external libraries, vendor-specific code annotations, or pre-compilation steps. Thanks in particular to a recently proposed GPU back-end to C++17 Parallel Algorithms, it is shown that a single code can compile and reach state-of-the-art performance on both many-core CPU and GPU environments for the solution of a given non-trivial fluid dynamics problem. The proposed strategy is tested with six different, commonly used implementation schemes to assess the performance impact of memory access patterns on different platforms. Nine different LB collision models are included in the tests and exhibit good performance, demonstrating the versatility of our parallel approach. This work shows that it is less necessary than ever to draw a distinction between research and production software, as a concise and generic LB implementation yields performance comparable to that achievable in a hardware-specific programming language. The results also highlight the performance gains achieved by modern many-core CPUs and their apparent capability to narrow the gap with the traditionally massively faster GPU platforms. All code is made available to the community in the form of the open-source project stlbm, which serves both as stand-alone simulation software and as a collection of reusable patterns for the acceleration of pre-existing LB codes.


2005 ◽  
Vol 19 (28n29) ◽  
pp. 1515-1518 ◽  
Author(s):  
JINKU WANG ◽  
MORAN WANG ◽  
ZHIXIN LI

Lattice Boltzmann methods are used to study mixing enhancement by electro-osmotic flow in microchannels. Three sets of lattice evolution methods are employed: for the fluid flow, for the electrical potential distribution, and for the concentration propagation. The simulation results show that the electro-osmotic flow induces a y-directional velocity which enhances mixing in microchannels. The mixing enhancement is related to the surface zeta-potential arrangement and the external electric field strength.


Computation ◽  
2020 ◽  
Vol 8 (2) ◽  
pp. 44
Author(s):  
Ivan Girotto ◽  
Sebastiano Fabio Schifano ◽  
Enrico Calore ◽  
Gianluca Di Staso ◽  
Federico Toschi

This paper presents a performance analysis, covering both computing performance and energy efficiency, of a Lattice Boltzmann Method (LBM) based application used to simulate three-dimensional multicomponent turbulent systems on massively parallel architectures for high-performance computing. Extending results reported in previous works, the analysis is meant to demonstrate the impact of using optimized data layouts designed for LBM based applications on high-end computer platforms. A particular focus is given to the Intel Skylake processor and to comparing the target architecture with other models of the Intel processor family. We introduce the main motivations of the presented work as well as the relevance of its scientific application. We analyse the measured performance of the implemented data layouts on the Skylake processor while scaling the number of threads per socket. We compare the results obtained on several CPU generations of the Intel processor family and analyse the energy efficiency of the Skylake processor compared with the Intel Xeon Phi processor, closing with our interpretation of the presented results.
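The data-layout question at the heart of such analyses boils down to index arithmetic. A minimal sketch (our naming, not the authors' code) contrasting the two classic choices for a D2Q9-style population array:

```python
Q = 9  # populations per lattice site (D2Q9)

def idx_aos(x, y, k, nx, ny):
    """Array of Structures: all Q populations of one site are contiguous."""
    return (y * nx + x) * Q + k

def idx_soa(x, y, k, nx, ny):
    """Structure of Arrays: population k forms one contiguous plane."""
    return k * nx * ny + y * nx + x
```

Under SoA, stepping to the next site in x for a fixed population moves one element (unit stride), which is what vector units and wide memory buses favour; under AoS the same step jumps Q elements, but a site's populations stay together, which can help cache locality in compute-bound kernels. Optimized LBM layouts are essentially choices along this trade-off.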


2019 ◽  
Vol 31 (20) ◽  
Author(s):  
Ruo‐Fan Qiu ◽  
Hai‐Ning Wang ◽  
Jian‐Feng Zhu ◽  
Rong‐Qian Chen ◽  
Cheng‐Xiang Zhu ◽  
...  

Author(s):  
Yaser Jararweh ◽  
Moath Jarrah ◽  
Abdelkader Bousselham

Current state-of-the-art GPU-based systems offer unprecedented performance advantages by accelerating the most compute-intensive portions of applications by an order of magnitude. GPU computing presents a viable solution for the ever-increasing complexity of applications and the growing demand for immense computational resources. In this paper, the authors investigate different platforms of GPU-based systems, from Personal Supercomputing (PSC) to cloud-based GPU systems. They explore and evaluate these GPU-based platforms and present a comparative discussion against conventional high-performance cluster-based computing systems. Their evaluation shows potential advantages of using GPU-based systems for high-performance computing applications at different scaling granularities.

