Data-Oriented Language Implementation of the Lattice–Boltzmann Method for Dense and Sparse Geometries

2021 ◽  
Vol 11 (20) ◽  
pp. 9495
Author(s):  
Tadeusz Tomczak

The performance of lattice–Boltzmann solver implementations usually depends mainly on memory access patterns. Achieving high performance then requires complex code that handles careful data placement and ordering of memory transactions. In this work, we analyse the performance of an implementation based on a new approach called the data-oriented language, which allows the combination of complex memory access patterns with simple source code. As a use case, we present and provide the source code of a solver for the D2Q9 lattice and show its performance on a GTX Titan Xp GPU for dense and sparse geometries of up to 4096² nodes. The obtained results are promising: around 1000 lines of code allowed us to achieve performance in the range of 0.6 to 0.7 of the maximum theoretical memory bandwidth (over 2.5 and 5.0 GLUPS for double and single precision, respectively) for meshes larger than 1024² nodes, which is close to the current state of the art. However, we also observed relatively high and sometimes difficult-to-predict overheads, especially for sparse data structures. Additional issues were the rather long compilation, which extended the time of short simulations, and the lack of access to low-level optimisation mechanisms.
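The link between memory bandwidth and GLUPS quoted above can be checked with a short calculation. The sketch below is illustrative only and is not taken from the paper: it assumes that each node update reads and writes all nine D2Q9 distribution values exactly once and ignores any other traffic (node types, indices for sparse geometries). The peak bandwidth of 547.7 GB/s is an assumed figure for the Titan Xp that the abstract does not state, and `bandwidth_bound_lups` is a hypothetical helper name.

```python
def bandwidth_bound_lups(bandwidth_bytes_per_s, q, bytes_per_value, efficiency=1.0):
    """Lattice updates per second for a purely bandwidth-limited solver.

    Assumption: one read and one write of each of the q distribution
    values per node update, and no other memory traffic.
    """
    bytes_per_node = 2 * q * bytes_per_value
    return efficiency * bandwidth_bytes_per_s / bytes_per_node


# Assumed peak bandwidth in bytes per second (not given in the abstract).
ASSUMED_PEAK_BANDWIDTH = 547.7e9

for label, size in (("double", 8), ("single", 4)):
    for eff in (0.6, 0.7):
        lups = bandwidth_bound_lups(ASSUMED_PEAK_BANDWIDTH, q=9,
                                    bytes_per_value=size, efficiency=eff)
        print(f"{label} precision, {eff:.1f} of peak: {lups / 1e9:.2f} GLUPS")
```

Under these assumptions, halving the value size from double to single precision doubles the bound, which matches the factor of two between the double and single precision figures in the abstract.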

Sensors ◽  
2020 ◽  
Vol 20 (10) ◽  
pp. 2953
Author(s):  
Marcos Baptista Ríos ◽  
Roberto Javier López-Sastre ◽  
Francisco Javier Acevedo-Rodríguez ◽  
Pilar Martín-Martín ◽  
Saturnino Maldonado-Bascón

In this work, we introduce an intelligent video sensor for the problem of Action Proposals (AP). AP consists of localizing temporal segments in untrimmed videos that are likely to contain actions. Solving this problem can accelerate several video action understanding tasks, such as detection, retrieval, or indexing. All previous AP approaches are supervised and offline, i.e., they need both the temporal annotations of the datasets during training and access to the whole video to effectively cast the proposals. We propose here a new approach which, unlike the rest of the state-of-the-art models, is unsupervised. This implies that we do not allow it to see any labeled data during learning or to work with any feature pre-trained on the dataset used. Moreover, our approach also operates in an online manner, which can be beneficial for many real-world applications where the video has to be processed as soon as it arrives at the sensor, e.g., robotics or video monitoring. The core of our method is a Support Vector Classifier (SVC) module which produces candidate segments for AP by distinguishing between sets of contiguous video frames. We further propose a mechanism to refine and filter those candidate segments. This filter optimizes a learning-to-rank formulation over the dynamics of the segments. An extensive experimental evaluation is conducted on the Thumos’14 and ActivityNet datasets, and, to the best of our knowledge, this work is the first unsupervised approach on these main AP benchmarks. Finally, we also provide a thorough comparison to the current state-of-the-art supervised AP approaches. We achieve 41% and 59% of the performance of the best supervised model on ActivityNet and Thumos’14, respectively, confirming our unsupervised solution as a viable option to tackle the AP problem. The code to reproduce all our results will be publicly released upon acceptance of the paper.


Author(s):  
Johann F Jadebeck ◽  
Axel Theorell ◽  
Samuel Leweke ◽  
Katharina Nöh

Summary: The C++ library Highly Optimized Polytope Sampling (HOPS) provides implementations of efficient and scalable algorithms for sampling convex-constrained models that are equipped with arbitrary target functions. For uniform sampling, substantial performance gains were achieved compared to the state of the art. The ease of integration and utility of non-uniform sampling is showcased in a Bayesian inference setting, demonstrating how HOPS interoperates with third-party software.
Availability and implementation: Source code is available at https://github.com/modsim/hops/, tested on Linux and MS Windows, includes unit tests, detailed documentation, example applications and a Dockerfile.
Contact: [email protected]
Supplementary information: Supplementary data are available at Bioinformatics online.


Acta Numerica ◽  
2012 ◽  
Vol 21 ◽  
pp. 379-474 ◽  
Author(s):  
J. J. Dongarra ◽  
A. J. van der Steen

This article describes the current state of the art of high-performance computing systems, and attempts to shed light on near-future developments that might prolong the steady growth in speed of such systems, which has been one of their most remarkable characteristics. We review the different ways devised to speed them up, both with regard to components and their architecture. In addition, we discuss the requirements for software that can take advantage of existing and future architectures.


2013 ◽  
Vol 24 (12) ◽  
pp. 1340011 ◽  
Author(s):  
ANIRUDDHA G. SHET ◽  
K. SIDDHARTH ◽  
SHAHAJHAN H. SORATHIYA ◽  
ANAND M. DESHPANDE ◽  
SUNIL D. SHERLEKAR ◽  
...  

We present a vector-friendly blocked computing strategy for the lattice Boltzmann method (LBM). This strategy, along with a recently developed data structure, Structure of Arrays of Structures (SoAoS), is implemented for multi-relaxation-type lattice Boltzmann (LB). The proposed methodology enables optimal memory bandwidth utilization in the advection step and high compute efficiency in the collision step of the LB implementation. In a dense computing environment, the current performance optimization framework for LBM is able to achieve high single-core efficiency.


Author(s):  
Zhi Shang ◽  
Ming Cheng ◽  
Jing Lou

The lattice Boltzmann method (LBM) is an attractive new computational approach for simulating isothermal multi-phase flows in computational fluid dynamics (CFD). It is based on kinetic theory and is easy to parallelize. This study aims to analyze the performance of parallel LBM programming for incompressible two-phase flows at high density and viscosity ratios. For this purpose, the impact of a liquid drop on a wetted wall with a pre-existing thin film of the same liquid is simulated using the parallel LBM code. During the simulations, the domain decomposition, data communication and parallelization of the LBM code using the message passing interface (MPI) library have been investigated. The computational results show that the parallel LBM code exhibits good parallel speed-up in high performance computing (HPC).
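The abstract does not say which kind of domain decomposition was used. As a rough illustration only, the sketch below assumes the simplest case, a one-dimensional slab decomposition with periodic neighbours, and computes which rows of the lattice each MPI rank would own. It is plain Python with no MPI calls; `slab_bounds` and `halo_neighbours` are hypothetical helper names.

```python
def slab_bounds(n_rows, n_ranks, rank):
    """Half-open row range [start, stop) owned by `rank`.

    Rows are split as evenly as possible; the first `n_rows % n_ranks`
    ranks receive one extra row.
    """
    base, extra = divmod(n_rows, n_ranks)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


def halo_neighbours(n_ranks, rank):
    """Ranks to exchange boundary rows with (assumes a periodic domain)."""
    return (rank - 1) % n_ranks, (rank + 1) % n_ranks


# Example: 10 lattice rows distributed over 4 ranks.
for r in range(4):
    print(r, slab_bounds(10, 4, r), halo_neighbours(4, r))
```

In a real code each rank would exchange its boundary rows with the two neighbouring ranks after every streaming step, which is the data communication whose cost limits the parallel speed-up.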


10.29007/73n4 ◽  
2018 ◽  
Author(s):  
Martin Aigner ◽  
Armin Biere ◽  
Christoph Kirsch ◽  
Aina Niemetz ◽  
Mathias Preiner

Effectively parallelizing SAT solving is an open and important issue. The current state of the art is based on parallel portfolios. This technique relies on running multiple solvers on the same instance in parallel. As soon as one instance finishes, the entire run stops. Several successful systems even use a plain parallel portfolio (PPP), where the individual solvers do not exchange any information. This paper contains a thorough experimental evaluation which shows that PPP can improve wall-clock running time because memory access is still local or, alternatively, because the memory system can hide the latency of memory access. In particular, there does not seem to be as much cache congestion as one might imagine. We also present some limits on the scalability of PPP. Thus this paper gives one argument why PPP solvers are a good fit for today's multi-core architectures.
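The portfolio idea described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not the setup evaluated in the paper: the "solvers" are brute-force searches that differ only in a random seed, an instance is a hypothetical pair of DIMACS-style clauses and a variable count, and Python threads are used for brevity even though they do not give true CPU parallelism in CPython. The only shared state is a stop flag, so no information is exchanged between solvers.

```python
import itertools
import random
import threading
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def make_solver(seed):
    """Toy solver: tries all assignments in a seed-dependent order."""
    def solve(instance, stop):
        clauses, n_vars = instance
        order = list(itertools.product([False, True], repeat=n_vars))
        random.Random(seed).shuffle(order)
        for bits in order:
            if stop.is_set():
                return None  # another solver already finished
            # Literal l is satisfied if variable |l| has the sign of l.
            if all(any(bits[abs(l) - 1] == (l > 0) for l in c) for c in clauses):
                return bits
        return None  # search space exhausted: unsatisfiable
    return solve


def run_portfolio(solvers, instance):
    """Run all solvers on the same instance; return the first result."""
    stop = threading.Event()
    with ThreadPoolExecutor(max_workers=len(solvers)) as pool:
        futures = [pool.submit(s, instance, stop) for s in solvers]
        done, _ = wait(futures, return_when=FIRST_COMPLETED)
        stop.set()  # tell the remaining solvers to give up
        return next(iter(done)).result()


# (x1 or x2) and (not x1 or x2) and (not x2 or x3)
instance = ([[1, 2], [-1, 2], [-2, 3]], 3)
model = run_portfolio([make_solver(s) for s in range(4)], instance)
print(model)
```

A production portfolio would run independent native solver processes or threads instead, which is the setting in which the memory locality argument of the abstract applies.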


Author(s):  
Claudio Schepke ◽  
João V. F. Lima ◽  
Matheus S. Serpa

Currently, NVIDIA GPUs and Intel Xeon Phi accelerators are alternative computational architectures for providing high performance. This chapter investigates the performance impact of these architectures on the lattice Boltzmann method. This method is an alternative for simulating fluid flows iteratively using discrete representations. It can be adopted for a large number of flow simulations using simple operation rules. In the experiments, a three-dimensional version of the method with 19 discrete directions of propagation (D3Q19) was considered. The performance evaluation compares three modern GPUs (K20M, K80, and Titan X) and two Xeon Phi architectures (Knights Corner (KNC) and Knights Landing (KNL)). Titan X provides the fastest execution time of all hardware considered. The results show that GPUs offer better processing time for the application. A KNL cache implementation presents the best results for Xeon Phi architectures, and the new Xeon Phi (KNL) is two times faster than the previous model (KNC).
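The 19 propagation directions of D3Q19 can be enumerated directly: they are the rest vector, the 6 axis-aligned neighbours and the 12 face-diagonal neighbours of a cubic cell. The short sketch below generates them, and the 9 directions of the two-dimensional D2Q9 lattice, by filtering integer vectors on their squared length. `velocity_set` is a hypothetical helper, and the resulting ordering is arbitrary and need not match the ordering used in any particular solver.

```python
from itertools import product


def velocity_set(dim, max_sq_norm):
    """Integer vectors with components in {-1, 0, 1} and squared length <= max_sq_norm."""
    return [c for c in product((-1, 0, 1), repeat=dim)
            if sum(x * x for x in c) <= max_sq_norm]


d3q19 = velocity_set(3, 2)  # drops the 8 cube corners, which have squared length 3
d2q9 = velocity_set(2, 2)   # keeps all 9 vectors of the 3 x 3 neighbourhood

print(len(d3q19), len(d2q9))
```

Each direction corresponds to one distribution value stored per node, so D3Q19 moves roughly twice as much data per node update as D2Q9.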

