Data-Oriented Language Implementation of the Lattice–Boltzmann Method for Dense and Sparse Geometries

The performance of lattice–Boltzmann solver implementations usually depends mainly on memory access patterns. Achieving high performance then requires complex code that handles careful data placement and ordering of memory transactions. In this work, we analyse the performance of an implementation based on a new approach called the data-oriented language, which allows complex memory access patterns to be combined with simple source code. As a use case, we present and provide the source code of a solver for the D2Q9 lattice and show its performance on a GTX Titan Xp GPU for dense and sparse geometries of up to 4096² nodes. The obtained results are promising: around 1000 lines of code allowed us to achieve performance in the range of 0.6 to 0.7 of the maximum theoretical memory bandwidth (over 2.5 and 5.0 GLUPS for double and single precision, respectively) for meshes of sizes above 1024² nodes, which is close to the current state of the art. However, we also observed relatively high and sometimes difficult-to-predict overheads, especially for sparse data structures. Additional issues were the rather long compilation, which extended the time of short simulations, and a lack of access to low-level optimisation mechanisms.
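
The bandwidth figures quoted in this abstract can be sanity-checked with a back-of-the-envelope estimate. The sketch below is only illustrative and rests on two assumptions that are not taken from the paper: that one D2Q9 node update reads and writes each of the nine distribution values exactly once, and that the peak memory bandwidth of the GPU is roughly 548 GB/s. The names `max_glups` and `bandwidth_fraction` are hypothetical helpers.

```python
# Rough estimate for a memory-bound D2Q9 solver.
# ASSUMPTIONS (not from the paper): a nominal peak bandwidth of ~548 GB/s,
# and one read plus one write of each of the 9 distributions per node update.

Q = 9                    # D2Q9: nine distribution functions per node
PEAK_BANDWIDTH = 548e9   # bytes per second, assumed nominal value


def max_glups(bytes_per_value: int, peak_bandwidth: float = PEAK_BANDWIDTH) -> float:
    """Upper bound on giga lattice updates per second at full bandwidth."""
    bytes_per_update = 2 * Q * bytes_per_value  # one read + one write per value
    return peak_bandwidth / bytes_per_update / 1e9


def bandwidth_fraction(glups: float, bytes_per_value: int) -> float:
    """Fraction of the assumed peak bandwidth implied by a given GLUPS figure."""
    return glups / max_glups(bytes_per_value)


if __name__ == "__main__":
    # GLUPS values are the lower bounds quoted in the abstract.
    for name, size, glups in (("double", 8, 2.5), ("single", 4, 5.0)):
        print(f"{name}: bound {max_glups(size):.2f} GLUPS, "
              f"fraction {bandwidth_fraction(glups, size):.2f}")
```

Under these assumptions, 2.5 GLUPS in double precision and 5.0 GLUPS in single precision both correspond to about 0.66 of the peak, which falls inside the 0.6 to 0.7 range stated in the abstract.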

A new approach in modeling of constant temperature boundary condition in thermal lattice-Boltzmann method

International Journal for Numerical Methods in Fluids ◽

10.1002/fld.2748 ◽

2012 ◽

Vol 70 (11) ◽

pp. 1367-1377 ◽

Cited By ~ 3

Author(s):

Mehdi Seddiq ◽

Mehdi Maerefat ◽

Masaud Mirzaei

Keyword(s):

Boundary Condition ◽

Lattice Boltzmann Method ◽

Constant Temperature ◽

Lattice Boltzmann ◽

New Approach ◽

Temperature Boundary ◽

Boltzmann Method ◽

Thermal Lattice Boltzmann ◽

Temperature Boundary Condition

A new approach using lattice Boltzmann method to simulate fluid structure interaction

Energy Procedia ◽

10.1016/j.egypro.2017.11.241 ◽

2017 ◽

Vol 139 ◽

pp. 481-486 ◽

Cited By ~ 2

Author(s):

M. Benamour ◽

E. Liberge ◽

C. Beghein

Keyword(s):

Lattice Boltzmann Method ◽

Lattice Boltzmann ◽

Fluid Structure Interaction ◽

New Approach ◽

Fluid Structure ◽

Structure Interaction ◽

Boltzmann Method

High Performance Computation by Multi-Node GPU Cluster-Tsubame2.0 on the Air Flow in an Urban City Using Lattice Boltzmann Method

International Journal of Aerospace and Lightweight Structures (IJALS) ◽

10.3850/s2010428612000232 ◽

2012 ◽

Vol 02 (01) ◽

pp. 77 ◽

Cited By ~ 5

Author(s):

Xian Wang ◽

Takayuki Aoki

Keyword(s):

Lattice Boltzmann Method ◽

Lattice Boltzmann ◽

High Performance ◽

Air Flow ◽

Gpu Cluster ◽

Urban City ◽

Boltzmann Method ◽

High Performance Computation

Unsupervised Action Proposals Using Support Vector Classifiers for Online Video Processing

Sensors ◽

10.3390/s20102953 ◽

2020 ◽

Vol 20 (10) ◽

pp. 2953

Author(s):

Marcos Baptista Ríos ◽

Roberto Javier López-Sastre ◽

Francisco Javier Acevedo-Rodríguez ◽

Pilar Martín-Martín ◽

Saturnino Maldonado-Bascón

Keyword(s):

Video Processing ◽

State Of The Art ◽

Learning To Rank ◽

Support Vector ◽

New Approach ◽

Support Vector Classifier ◽

Current State ◽

Video Frames ◽

Unsupervised Approach ◽

Video Sensor

In this work, we introduce an intelligent video sensor for the problem of Action Proposals (AP). AP consists of localizing temporal segments in untrimmed videos that are likely to contain actions. Solving this problem can accelerate several video action understanding tasks, such as detection, retrieval, or indexing. All previous AP approaches are supervised and offline, i.e., they need both the temporal annotations of the datasets during training and access to the whole video to effectively cast the proposals. We propose here a new approach which, unlike the rest of the state-of-the-art models, is unsupervised. This implies that we do not allow it to see any labeled data during learning, nor to work with any feature pre-trained on the dataset used. Moreover, our approach also operates in an online manner, which can be beneficial for many real-world applications where the video has to be processed as soon as it arrives at the sensor, e.g., robotics or video monitoring. The core of our method is a Support Vector Classifier (SVC) module which produces candidate segments for AP by distinguishing between sets of contiguous video frames. We further propose a mechanism to refine and filter those candidate segments. This filter optimizes a learning-to-rank formulation over the dynamics of the segments. An extensive experimental evaluation is conducted on the Thumos’14 and ActivityNet datasets, and, to the best of our knowledge, this work is the first unsupervised approach on these main AP benchmarks. Finally, we also provide a thorough comparison to the current state-of-the-art supervised AP approaches. We achieve 41% and 59% of the performance of the best supervised model on ActivityNet and Thumos’14, respectively, confirming our unsupervised solution as a viable option to tackle the AP problem. The code to reproduce all our results will be publicly released upon acceptance of the paper.

HOPS: high-performance library for (non-)uniform sampling of convex-constrained models

Bioinformatics ◽

10.1093/bioinformatics/btaa872 ◽

2020 ◽

Author(s):

Johann F Jadebeck ◽

Axel Theorell ◽

Samuel Leweke ◽

Katharina Nöh

Keyword(s):

High Performance ◽

State Of The Art ◽

Source Code ◽

Third Party ◽

Supplementary Information ◽

Scalable Algorithms ◽

Uniform Sampling ◽

Non Uniform Sampling ◽

Constrained Models ◽

Performance Gains

Summary: The C++ library Highly Optimized Polytope Sampling (HOPS) provides implementations of efficient and scalable algorithms for sampling convex-constrained models that are equipped with arbitrary target functions. For uniform sampling, substantial performance gains were achieved compared to the state-of-the-art. The ease of integration and utility of non-uniform sampling is showcased in a Bayesian inference setting, demonstrating how HOPS interoperates with third-party software.

Availability and implementation: Source code is available at https://github.com/modsim/hops/, tested on Linux and MS Windows, includes unit tests, detailed documentation, example applications and a Dockerfile.

Contact: [email protected]

Supplementary information: Supplementary data are available at Bioinformatics online.

High-performance computing systems: Status and outlook

Acta Numerica ◽

10.1017/s0962492912000050 ◽

2012 ◽

Vol 21 ◽

pp. 379-474 ◽

Cited By ~ 36

Author(s):

J. J. Dongarra ◽

A. J. van der Steen

Keyword(s):

High Performance Computing ◽

High Performance ◽

State Of The Art ◽

Computing Systems ◽

Future Developments ◽

Steady Growth ◽

Current State ◽

Near Future ◽

Performance Computing ◽

Shed Light

This article describes the current state of the art of high-performance computing systems, and attempts to shed light on near-future developments that might prolong the steady growth in speed of such systems, which has been one of their most remarkable characteristics. We review the different ways devised to speed them up, both with regard to components and their architecture. In addition, we discuss the requirements for software that can take advantage of existing and future architectures.

ON VECTORIZATION FOR LATTICE BASED SIMULATIONS

International Journal of Modern Physics C ◽

10.1142/s0129183113400111 ◽

2013 ◽

Vol 24 (12) ◽

pp. 1340011 ◽

Cited By ~ 13

Author(s):

ANIRUDDHA G. SHET ◽

K. SIDDHARTH ◽

SHAHAJHAN H. SORATHIYA ◽

ANAND M. DESHPANDE ◽

SUNIL D. SHERLEKAR ◽

...

Keyword(s):

Data Structure ◽

Lattice Boltzmann Method ◽

Lattice Boltzmann ◽

Performance Optimization ◽

Memory Bandwidth ◽

Bandwidth Utilization ◽

Computing Environment ◽

Optimization Framework ◽

Boltzmann Method ◽

Type Lattice

We present a vector-friendly blocked computing strategy for the lattice Boltzmann method (LBM). This strategy, along with a recently developed data structure, Structure of Arrays of Structures (SoAoS), is implemented for the multi-relaxation type lattice Boltzmann (LB) method. The proposed methodology enables optimal memory bandwidth utilization in the advection step and high compute efficiency in the collision step of the LB implementation. In a dense computing environment, the current performance optimization framework for LBM is able to achieve high single-core efficiency.
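
To make the role of the data layout concrete, the following minimal sketch compares three ways of flattening the distribution values f[node][q] into one linear array. The array-of-structures and structure-of-arrays mappings are standard. The blocked hybrid is a generic illustration in the same spirit as a mixed layout and is an assumption here; it is not claimed to match the SoAoS definition used by the authors. All function names are hypothetical.

```python
# Linear index of distribution q at lattice node `node` for three layouts.
# ASSUMPTION: blocked_index is a generic hybrid layout shown for illustration
# only; it is not claimed to be the SoAoS structure from the paper.


def aos_index(node: int, q: int, n_nodes: int, n_q: int) -> int:
    """Array of structures: all directions of one node are adjacent."""
    return node * n_q + q


def soa_index(node: int, q: int, n_nodes: int, n_q: int) -> int:
    """Structure of arrays: one contiguous array per direction."""
    return q * n_nodes + node


def blocked_index(node: int, q: int, n_nodes: int, n_q: int, block: int = 4) -> int:
    """Hybrid: nodes are grouped into blocks and each block is stored as SoA.

    Requires n_nodes to be a multiple of block.
    """
    block_id, lane = divmod(node, block)
    return (block_id * n_q + q) * block + lane


if __name__ == "__main__":
    n_nodes, n_q = 8, 9
    for layout in (aos_index, soa_index, blocked_index):
        row = [layout(node, 0, n_nodes, n_q) for node in range(n_nodes)]
        print(layout.__name__, row)
```

In the structure-of-arrays and blocked layouts, neighbouring nodes with the same direction sit at consecutive addresses, which is the property that vector loads rely on; the blocked variant additionally keeps all directions of a small group of nodes close together.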

Parallelization of Lattice Boltzmann method using MPI domain decomposition technology for a drop impact on a wetted solid wall

International Journal of Modeling Simulation and Scientific Computing ◽

10.1142/s1793962313500244 ◽

2014 ◽

Vol 05 (02) ◽

pp. 1350024 ◽

Cited By ~ 3

Author(s):

Zhi Shang ◽

Ming Cheng ◽

Jing Lou

Keyword(s):

Domain Decomposition ◽

Lattice Boltzmann Method ◽

Lattice Boltzmann ◽

High Performance ◽

Message Passing Interface ◽

Solid Wall ◽

Data Communication ◽

Drop Impact ◽

Two Phase ◽

Boltzmann Method

The lattice Boltzmann method (LBM) is an attractive new computational approach for simulating isothermal multi-phase flows in computational fluid dynamics (CFD). It is based on kinetic theory and is easy to parallelize. This study aims to analyze the performance of a parallel LBM code for incompressible two-phase flows at high density and viscosity ratios. For this purpose, the impact of a liquid drop on a wetted wall with a pre-existing thin film of the same liquid is simulated using the parallel LBM code. During the simulations, the domain decomposition, data communication and parallelization of the LBM code using the message passing interface (MPI) library were investigated. The computational results show that the parallel LBM code achieves good parallel speed-up in a high-performance computing (HPC) setting.
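
The idea of domain decomposition can be illustrated without any MPI calls. The sketch below assumes a one-dimensional slab decomposition along the rows of the lattice with a one-row halo on each interior boundary; the abstract does not state which decomposition the authors used, so this choice and the helper names `split_rows` and `with_halo` are assumptions made for illustration.

```python
# 1D slab decomposition of a lattice with n_rows rows over n_ranks processes.
# ASSUMPTION: slab decomposition and a one-row halo are illustrative choices;
# the paper's actual decomposition is not described in the abstract.


def split_rows(n_rows: int, n_ranks: int) -> list[tuple[int, int]]:
    """Return balanced half-open row ranges (start, stop), one per rank."""
    base, extra = divmod(n_rows, n_ranks)
    ranges = []
    start = 0
    for rank in range(n_ranks):
        size = base + (1 if rank < extra else 0)  # first ranks take the remainder
        ranges.append((start, start + size))
        start += size
    return ranges


def with_halo(start: int, stop: int, n_rows: int, width: int = 1) -> tuple[int, int]:
    """Extend a row range by halo rows, clipped at the domain boundary."""
    return max(0, start - width), min(n_rows, stop + width)


if __name__ == "__main__":
    n_rows = 10
    for rank, (start, stop) in enumerate(split_rows(n_rows, 3)):
        print(rank, (start, stop), with_halo(start, stop, n_rows))
```

In a real MPI code, each rank would update only its own rows and exchange the halo rows with its neighbours after every streaming step; that communication is the part whose cost limits the parallel speed-up.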

Analysis of Portfolio-Style Parallel SAT Solving on Current Multi-Core Architectures

10.29007/73n4 ◽

2018 ◽

Author(s):

Martin Aigner ◽

Armin Biere ◽

Christoph Kirsch ◽

Aina Niemetz ◽

Mathias Preiner

Keyword(s):

Experimental Evaluation ◽

State Of The Art ◽

Memory System ◽

Memory Access ◽

Sat Solving ◽

Running Time ◽

Current State ◽

The Individual

Effectively parallelizing SAT solving is an open and important issue. The current state of the art is based on parallel portfolios. This technique relies on running multiple solvers on the same instance in parallel. As soon as one instance finishes, the entire run stops. Several successful systems even use a plain parallel portfolio (PPP), where the individual solvers do not exchange any information. This paper contains a thorough experimental evaluation which shows that PPP can improve wall-clock running time because memory access is still local, or rather because the memory system can hide the latency of memory access. In particular, there does not seem to be as much cache congestion as one might imagine. We also present some limits on the scalability of PPP. Thus this paper gives one argument why PPP solvers are a good fit for today's multi-core architectures.
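
The plain parallel portfolio scheme can be sketched in a few lines. The example below is a toy under stated assumptions: the "solvers" are hypothetical brute-force enumerators that differ only in their variable order, and Python threads are used for brevity even though they do not run CPU-bound code truly in parallel. It shows only the control flow (start several independent solvers on the same formula, take the first answer, stop the rest) and is not the system evaluated in the paper.

```python
import itertools
import threading
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

# A CNF formula is a list of clauses; a clause is a list of non-zero integers
# (DIMACS-style literals: 3 means x3, -3 means NOT x3).


def satisfies(cnf, assignment):
    """True if every clause contains at least one satisfied literal."""
    return all(any(assignment[abs(lit)] == (lit > 0) for lit in clause)
               for clause in cnf)


def brute_force_solver(cnf, var_order, stop):
    """Toy solver (hypothetical): enumerate assignments in a fixed order."""
    for bits in itertools.product((False, True), repeat=len(var_order)):
        if stop.is_set():          # another solver already finished
            return None
        assignment = dict(zip(var_order, bits))
        if satisfies(cnf, assignment):
            return assignment
    return None                    # search exhausted: unsatisfiable


def plain_parallel_portfolio(cnf, orders):
    """Run one solver per variable order; no information is exchanged."""
    stop = threading.Event()
    with ThreadPoolExecutor(max_workers=len(orders)) as pool:
        futures = [pool.submit(brute_force_solver, cnf, order, stop)
                   for order in orders]
        done, _ = wait(futures, return_when=FIRST_COMPLETED)
        stop.set()                 # tell the remaining solvers to give up
        return next(iter(done)).result()


if __name__ == "__main__":
    cnf = [[1, 2], [-1, 3], [-2, -3]]
    model = plain_parallel_portfolio(cnf, [[1, 2, 3], [3, 2, 1]])
    print(model is not None and satisfies(cnf, model))
```

The solvers share nothing except the stop flag, which mirrors the property highlighted in the abstract: each solver works on its own data, so memory access stays local.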

Challenges on Porting Lattice Boltzmann Method on Accelerators

Advances in Computer and Electrical Engineering - Analysis and Applications of Lattice Boltzmann Simulations ◽

10.4018/978-1-5225-4760-0.ch002 ◽

2018 ◽

pp. 30-53 ◽

Cited By ~ 1

Author(s):

Claudio Schepke ◽

João V. F. Lima ◽

Matheus S. Serpa

Keyword(s):

Lattice Boltzmann Method ◽

Lattice Boltzmann ◽

High Performance ◽

Three Dimensional ◽

Fluid Flows ◽

Xeon Phi ◽

Performance Impact ◽

Operation Rules ◽

Dimensional Version ◽

Boltzmann Method

Currently, NVIDIA GPUs and Intel Xeon Phi accelerators are alternative computational architectures for providing high performance. This chapter investigates the performance impact of these architectures on the lattice Boltzmann method. This method is an alternative way to simulate fluid flows iteratively using discrete representations. It can be adopted for a large number of flow simulations using simple operation rules. The experiments considered a three-dimensional version of the method with 19 discrete directions of propagation (D3Q19). The performance evaluation compares three modern GPUs (K20M, K80, and Titan X) and two Xeon Phi architectures (Knights Corner (KNC) and Knights Landing (KNL)). Titan X provides the fastest execution time of all hardware considered. The results show that GPUs offer better processing time for the application. A KNL cache implementation presents the best results for the Xeon Phi architectures, and the new Xeon Phi (KNL) is two times faster than the previous model (KNC).
