Optimization of lattice Boltzmann simulations on heterogeneous computers

Author(s):  
E Calore ◽  
A Gabbana ◽  
SF Schifano ◽  
R Tripiccione

High-performance computing systems are more and more often based on accelerators. Computing applications targeting those systems often follow a host-driven approach, in which hosts offload almost all compute-intensive sections of the code onto accelerators; this approach only marginally exploits the computational resources available on the host CPUs, limiting overall performance. The obvious step forward is to run compute-intensive kernels in a concurrent and balanced way on both hosts and accelerators. In this paper, we consider exactly this problem for a class of applications based on lattice Boltzmann methods, widely used in computational fluid dynamics. Our goal is to develop just one program, portable and able to run efficiently on several different combinations of hosts and accelerators. To reach this goal, we define common data layouts enabling the code to exploit the different parallel and vector options of the various accelerators efficiently, and matching the possibly different requirements of the compute-bound and memory-bound kernels of the application. We also define models and metrics that predict the best partitioning of workloads among host and accelerator, and the optimally achievable overall performance level. We test the performance of our codes and their scaling properties using, as testbeds, HPC clusters incorporating different accelerators: Intel Xeon Phi many-core processors, NVIDIA GPUs, and AMD GPUs.
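The workload-partitioning idea the abstract alludes to can be sketched in a few lines. This is our own illustration, not the authors' code: the function names and the simple assumption that each device sustains a fixed throughput on its share of the lattice are ours.

```python
def optimal_split(t_host, t_acc):
    """Fraction of the lattice assigned to the accelerator so that host
    and accelerator finish each kernel at the same time, assuming each
    device sustains a fixed throughput (sites/s) on its share."""
    return t_acc / (t_host + t_acc)

def predicted_throughput(t_host, t_acc):
    """Under perfect overlap the combined throughput is simply additive."""
    return t_host + t_acc
```

For example, a host sustaining 100 Msites/s paired with an accelerator sustaining 300 Msites/s would, under this model, receive 25% and 75% of the lattice respectively, for a predicted combined rate of 400 Msites/s.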

2007 ◽  
Vol 18 (04) ◽  
pp. 455-462 ◽  
Author(s):  
MARTIN GEIER ◽  
ANDREAS GREINER ◽  
JAN G. KORVINK

The theory of the lattice Boltzmann automaton is based on a moment transform which is not Galilean invariant. It is explained how the central moments transform, used in the cascaded lattice Boltzmann method, overcomes this problem by choosing the center of mass coordinate system as the frame of reference. Galilean invariance is restored and the form of the kinetic theory is unaffected. Conservation laws are not compromised by the high order polynomials in the equilibrium distribution arising from the central moment transform. Two sources of instabilities in lattice Boltzmann simulations are discussed: negative numerical viscosity due to insufficient Galilean invariance and aliasing. The cascaded lattice Boltzmann automaton overcomes both problems. It is discussed why aliasing is unavoidable in lattice Boltzmann methods that rely on a single relaxation time. An appendix lists the complete scattering operator of the D2Q9 cascaded lattice Boltzmann automaton.
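The central-moment transform can be stated compactly. The sketch below (our notation, not the paper's) computes the central moment kappa_{mn} on the D2Q9 lattice and makes the Galilean-invariance point concrete: because moments are taken relative to the local fluid velocity, the first-order central moments vanish identically in the co-moving frame, whatever the advection velocity.

```python
# D2Q9 discrete velocity set (rest, axis, diagonal directions)
C = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1),
     (1, 1), (-1, 1), (-1, -1), (1, -1)]

def central_moment(f, m, n):
    """kappa_{mn} = sum_i f_i * (c_ix - ux)^m * (c_iy - uy)^n,
    with (ux, uy) the local fluid velocity computed from f itself."""
    rho = sum(f)
    ux = sum(fi * c[0] for fi, c in zip(f, C)) / rho
    uy = sum(fi * c[1] for fi, c in zip(f, C)) / rho
    return sum(fi * (c[0] - ux) ** m * (c[1] - uy) ** n
               for fi, c in zip(f, C))
```

By construction kappa_{00} equals the density and kappa_{10} = kappa_{01} = 0 for any distribution, which is exactly the frame-independence the cascaded method exploits.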


Author(s):  
Radhika S. Saksena ◽  
Marco D. Mazzeo ◽  
Stefan J. Zasada ◽  
Peter V. Coveney

We present very large-scale rheological studies of self-assembled cubic gyroid liquid crystalline phases in ternary mixtures of oil, water and amphiphilic species performed on petascale supercomputers using the lattice-Boltzmann method. These nanomaterials have found diverse applications in materials science and biotechnology, for example, in photovoltaic devices and protein crystallization. They are increasingly gaining importance as delivery vehicles for active agents in pharmaceuticals, personal care products and food technology. In many of these applications, the self-assembled structures are subject to flows of varying strengths and we endeavour to understand their rheological response with the objective of eventually predicting it under given flow conditions. Computationally, our lattice-Boltzmann simulations of ternary fluids are inherently memory- and data-intensive. Furthermore, our interest in dynamical processes necessitates remote visualization and analysis as well as the associated transfer and storage of terabytes of time-dependent data. These simulations are distributed on a high-performance grid infrastructure using the Application Hosting Environment; we employ a novel parallel in situ visualization approach which is particularly suited for such computations on petascale resources. We present computational and I/O performance benchmarks of our application on three different petascale systems.


2021 ◽  
Author(s):  
Mariza Ferro ◽  
Vinicius P. Klôh ◽  
Matheus Gritz ◽  
Vitor de Sá ◽  
Bruno Schulze

Understanding the computational impact of scientific applications on different architectures, through their runtime behaviour, should guide the use of computational resources in high-performance computing systems. In this work, we propose an analysis of Machine Learning (ML) algorithms to gather knowledge about the performance of these applications through hardware events and derived performance metrics. Nine NAS benchmarks were executed and the hardware events were collected. These experimental results were used to train a Neural Network, a Decision Tree Regressor and a Linear Regression, focusing on predicting the runtime of scientific applications according to the performance metrics.
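The linear-regression baseline in such a study amounts to a least-squares fit of runtime against a hardware-derived metric. The sketch below is our own toy illustration, not the paper's code; the single "instructions retired"-style predictor and the synthetic values are assumptions for demonstration only.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    var = sum((x - mx) ** 2 for x in xs)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = cov / var
    return slope, my - slope * mx

# Synthetic example: runtime grows linearly with one hypothetical
# hardware metric; a real study would use measured counter values.
metric = [1.0, 2.0, 3.0, 4.0, 5.0]
runtime = [3.0, 5.0, 7.0, 9.0, 11.0]
slope, intercept = fit_line(metric, runtime)
```

A fitted model like this can then predict the runtime of an unseen run from its collected counters; the paper's neural-network and decision-tree regressors play the same role with more expressive function classes.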


PLoS ONE ◽  
2021 ◽  
Vol 16 (4) ◽  
pp. e0250306
Author(s):  
Jonas Latt ◽  
Christophe Coreixas ◽  
Joël Beny

We present a novel, hardware-agnostic implementation strategy for lattice Boltzmann (LB) simulations, which yields massive performance on homogeneous and heterogeneous many-core platforms. Based solely on C++17 Parallel Algorithms, our approach does not rely on any language extensions, external libraries, vendor-specific code annotations, or pre-compilation steps. Thanks in particular to a recently proposed GPU back-end to C++17 Parallel Algorithms, it is shown that a single code can compile and reach state-of-the-art performance on both many-core CPU and GPU environments for the solution of a given non-trivial fluid dynamics problem. The proposed strategy is tested with six different, commonly used implementation schemes to assess the performance impact of memory access patterns on different platforms. Nine different LB collision models are included in the tests and exhibit good performance, demonstrating the versatility of our parallel approach. This work shows that it is less necessary than ever to draw a distinction between research and production software, as a concise and generic LB implementation yields performance comparable to that achievable in a hardware-specific programming language. The results also highlight the performance gains achieved by modern many-core CPUs and their apparent capability to narrow the gap with the traditionally massively faster GPU platforms. All code is made available to the community in the form of the open-source project stlbm, which serves both as stand-alone simulation software and as a collection of reusable patterns for the acceleration of pre-existing LB codes.


2005 ◽  
Vol 19 (28n29) ◽  
pp. 1515-1518 ◽  
Author(s):  
JINKU WANG ◽  
MORAN WANG ◽  
ZHIXIN LI

Lattice Boltzmann methods are used to study mixing enhancement by electro-osmotic flow in microchannels. Three sets of lattice evolution methods are employed: for the fluid flow, for the electrical potential distribution, and for the concentration propagation. The simulation results show that the electro-osmotic flow induces a y-directional velocity which enhances mixing in microchannels. The mixing enhancement is related to the surface zeta-potential arrangement and the external electric field strength.


Computation ◽  
2020 ◽  
Vol 8 (2) ◽  
pp. 44
Author(s):  
Ivan Girotto ◽  
Sebastiano Fabio Schifano ◽  
Enrico Calore ◽  
Gianluca Di Staso ◽  
Federico Toschi

This paper presents a performance analysis, covering both computing performance and energy efficiency, of a Lattice Boltzmann Method (LBM) based application used to simulate three-dimensional multicomponent turbulent systems on massively parallel architectures for high-performance computing. Extending results reported in previous works, the analysis is meant to demonstrate the impact of using optimized data layouts designed for LBM based applications on high-end computer platforms. A particular focus is given to the Intel Skylake processor and to comparing the target architecture with other models of the Intel processor family. We introduce the main motivations of the presented work as well as the relevance of its scientific application. We analyse the measured performance of the implemented data layouts on the Skylake processor while scaling the number of threads per socket. We compare the results obtained on several CPU generations of the Intel processor family and analyse the energy efficiency of the Skylake processor compared with the Intel Xeon Phi processor, closing with our interpretation of the presented results.
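The data-layout question at the heart of such analyses boils down to index arithmetic. A minimal sketch (our naming, not the authors' code) contrasting the two classic choices for a D2Q9-style population array:

```python
Q = 9  # populations per lattice site (D2Q9)

def idx_aos(x, y, k, nx, ny):
    """Array of Structures: all Q populations of one site are contiguous."""
    return (y * nx + x) * Q + k

def idx_soa(x, y, k, nx, ny):
    """Structure of Arrays: population k forms one contiguous plane."""
    return k * nx * ny + y * nx + x
```

Under SoA, stepping to the next site in x for a fixed population moves one element (unit stride), which is what vector units and wide memory buses favour; under AoS the same step jumps Q elements, but a site's populations stay together, which can help cache locality in compute-bound kernels. Optimized LBM layouts are essentially choices along this trade-off.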


2019 ◽  
Vol 31 (20) ◽  
Author(s):  
Ruo‐Fan Qiu ◽  
Hai‐Ning Wang ◽  
Jian‐Feng Zhu ◽  
Rong‐Qian Chen ◽  
Cheng‐Xiang Zhu ◽  
...  

Author(s):  
Yaser Jararweh ◽  
Moath Jarrah ◽  
Abdelkader Bousselham

Current state-of-the-art GPU-based systems offer unprecedented performance advantages by accelerating the most compute-intensive portions of applications by an order of magnitude. GPU computing presents a viable solution for the ever-increasing complexity of applications and the growing demand for immense computational resources. In this paper, the authors investigate different platforms of GPU-based systems, from Personal Supercomputing (PSC) to cloud-based GPU systems. They explore and evaluate these GPU-based platforms and present a comparative discussion against conventional high-performance cluster-based computing systems. Their evaluation shows potential advantages of using GPU-based systems for high-performance computing applications at different scaling granularities.

