Performance Evaluation of an OpenCL Implementation of the Lattice Boltzmann Method on the Intel Xeon Phi

A portable OpenCL implementation of the lattice Boltzmann method targeting emerging many-core architectures is described. The main purpose of this work is to evaluate and compare the performance of this code on three mainstream hardware architectures available today, namely an Intel CPU, an Nvidia GPU, and the Intel Xeon Phi. Because of the similarities between OpenCL and CUDA, we chose to follow some of the strategies devised to implement efficient lattice Boltzmann solvers on Nvidia GPUs, while remaining as generic as possible. Since the program is fairly configurable, it can be used to determine the best options for each hardware platform. The achieved performance is quite satisfactory for both the CPU and the GPU. For the Xeon Phi, however, the results are below expectations. Nevertheless, comparison with data from the literature suggests that on this architecture the code is memory-bound.
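The memory-bound conclusion can be made concrete with a back-of-the-envelope estimate: if every node update reads and writes each of its Q distribution values exactly once, the update rate cannot exceed the sustained memory bandwidth divided by the bytes moved per node. The sketch below assumes a D3Q19 lattice and single-precision storage, and ignores extra traffic such as geometry flags or write-allocate cache behaviour; the abstract states none of these details, and the 150 GB/s figure in the usage line is hypothetical, not a measurement from the paper.

```python
def bytes_per_node_update(q: int = 19, bytes_per_value: int = 4) -> int:
    """Minimum DRAM traffic for one node update: Q values read plus Q values written."""
    return 2 * q * bytes_per_value


def mlups_upper_bound(bandwidth_gb_per_s: float, q: int = 19,
                      bytes_per_value: int = 4) -> float:
    """Bandwidth-limited ceiling in million lattice updates per second (MLUPS)."""
    bytes_per_s = bandwidth_gb_per_s * 1e9
    return bytes_per_s / bytes_per_node_update(q, bytes_per_value) / 1e6


if __name__ == "__main__":
    # Hypothetical sustained bandwidth of 150 GB/s, D3Q19, single precision.
    print(f"{mlups_upper_bound(150.0):.1f} MLUPS")
```

If measured throughput approaches this ceiling, optimising the arithmetic cannot help much; only reducing memory traffic (for example a more compact data layout or fewer passes over memory) can raise performance further.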

MILC Code Performance on High End CPU and GPU Supercomputer Clusters

EPJ Web of Conferences, Vol 175, pp. 02009, 2018. DOI: 10.1051/epjconf/201817502009

Author(s): Carleton DeTar, Steven Gottlieb, Ruizi Li, Doug Toussaint

Keyword(s): Conjugate Gradient, Memory Hierarchy, Xeon Phi, Intel Xeon Phi, Code Performance, Recent Developments, Knights Landing, Many Core, Intel Xeon

With recent developments in parallel supercomputing architecture, many-core, multi-core, and GPU processors are now commonplace, resulting in more levels of parallelism and of memory hierarchy, and in greater programming complexity. It has been necessary to adapt the MILC code to these new processors, starting with NVIDIA GPUs and, more recently, the Intel Xeon Phi processors. We report on our efforts to port and optimize our code for the Intel Knights Landing architecture. We consider the performance of the MILC code with MPI and OpenMP, and optimizations with QOPQDP and QPhiX. For the latter approach, we concentrate on the staggered conjugate gradient and the gauge force. We also consider performance on recent NVIDIA GPUs using the QUDA library.
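The conjugate gradient (CG) solver mentioned here is, at its core, the standard Krylov iteration for a symmetric positive-definite system A x = b. The sketch below is a generic pure-Python version for illustration only: it takes the operator as a function `apply_a` (a hypothetical name) so that, as in lattice codes, the matrix never has to be stored. It is not MILC's staggered-fermion solver, which applies a sparse lattice operator over distributed fields and is tuned with libraries such as QPhiX or QUDA.

```python
from typing import Callable, List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def conjugate_gradient(apply_a: Callable[[Vector], Vector], b: Vector,
                       tol: float = 1e-10, max_iter: int = 1000) -> Vector:
    """Solve A x = b for symmetric positive-definite A, given only v -> A v."""
    x = [0.0] * len(b)
    r = list(b)              # residual r = b - A x, with initial guess x = 0
    p = list(r)              # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        ap = apply_a(p)
        alpha = rs_old / dot(p, ap)                  # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        beta = rs_new / rs_old                       # keeps directions A-conjugate
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


if __name__ == "__main__":
    A = [[4.0, 1.0], [1.0, 3.0]]
    # Exact solution is [1/11, 7/11].
    print(conjugate_gradient(lambda v: [dot(row, v) for row in A], [1.0, 2.0]))
```

Each iteration costs one operator application plus a few dot products and vector updates, which is why, on many-core hardware, CG performance is largely governed by memory bandwidth and by the efficiency of the operator kernel.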

Improving 3D lattice Boltzmann method stencil with asynchronous transfers on many-core processors

2017 IEEE 36th International Performance Computing and Communications Conference (IPCCC), 2017. DOI: 10.1109/pccc.2017.8280472

Cited By ~ 2

Author(s): Minh Quan Ho, Christian Obrecht, Bernard Tourancheau, Benoit Dupont de Dinechin, Julien Hascoet

Keyword(s): Lattice Boltzmann Method, Lattice Boltzmann, Many Core, Boltzmann Method

Accurate Computation of Airfoil Flow Based on the Lattice Boltzmann Method

Applied Sciences, Vol 9 (10), pp. 2000, 2019. DOI: 10.3390/app9102000

Author(s): Liangjun Wang, Xiaoxiao Zhang, Wenhao Zhu, Kangle Xu, Weiguo Wu, ...

Keyword(s): Lattice Boltzmann Method, Lattice Boltzmann, Lift Coefficient, Pressure Coefficient, NACA0012 Airfoil, Parallel Performance, Sunway TaihuLight, Many Core, Boltzmann Method, Study Designs

The lattice Boltzmann method (LBM) is an important numerical algorithm for computational fluid dynamics. This study designs a two-layer parallel model that implements and optimizes LBM algorithms on the SW26010 many-core processor of the Sunway TaihuLight supercomputer. Numerical experiments with different problem sizes show that the proposed model achieves better parallel performance and scalability than earlier approaches. We also performed numerical simulations of the flow around the two-dimensional (2D) NACA0012 airfoil at a series of angles of attack. The computed pressure and lift coefficients are in good agreement with those reported in the literature.
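For readers unfamiliar with the method, one LBM time step is a purely local collision at every node followed by streaming of the post-collision values to neighbouring nodes. The sketch below uses the common D2Q9 lattice, the single-relaxation-time (BGK) collision operator, and periodic boundaries; these are assumptions chosen for illustration, and the code says nothing about the study's two-layer parallel model, its airfoil boundary treatment, or how work is divided among the SW26010 cores. The names `equilibrium` and `step` are hypothetical.

```python
from typing import List

# D2Q9 lattice: discrete velocities (cx, cy) and their weights (sum to 1).
C = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1), (1, 1), (-1, 1), (-1, -1), (1, -1)]
W = [4 / 9] + [1 / 9] * 4 + [1 / 36] * 4

Grid = List[List[List[float]]]  # f[x][y][i]: distribution i at node (x, y)


def equilibrium(rho: float, ux: float, uy: float) -> List[float]:
    """Second-order BGK equilibrium in lattice units (c_s^2 = 1/3)."""
    usq = ux * ux + uy * uy
    feq = []
    for (cx, cy), w in zip(C, W):
        cu = cx * ux + cy * uy
        feq.append(w * rho * (1 + 3 * cu + 4.5 * cu * cu - 1.5 * usq))
    return feq


def step(f: Grid, tau: float) -> Grid:
    """One collide-then-stream update on a periodic nx-by-ny grid."""
    nx, ny = len(f), len(f[0])
    out = [[[0.0] * 9 for _ in range(ny)] for _ in range(nx)]
    for x in range(nx):
        for y in range(ny):
            fi = f[x][y]
            rho = sum(fi)                                   # density moment
            ux = sum(v * c[0] for v, c in zip(fi, C)) / rho  # velocity moments
            uy = sum(v * c[1] for v, c in zip(fi, C)) / rho
            feq = equilibrium(rho, ux, uy)
            for i, (cx, cy) in enumerate(C):
                post = fi[i] - (fi[i] - feq[i]) / tau        # BGK relaxation
                # Push streaming into a separate array avoids overwriting inputs.
                out[(x + cx) % nx][(y + cy) % ny][i] = post
    return out
```

Because each node's collision depends only on its own nine values, the loop over (x, y) is embarrassingly parallel; this locality is what makes the LBM attractive on many-core processors, while the streaming step is where most of the memory traffic occurs.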

Training Large Scale Deep Neural Networks on the Intel Xeon Phi Many-Core Coprocessor

2014 IEEE International Parallel & Distributed Processing Symposium Workshops, 2014. DOI: 10.1109/ipdpsw.2014.194

Cited By ~ 13

Author(s): Lei Jin, Zhaokang Wang, Rong Gu, Chunfeng Yuan, Yihua Huang

Keyword(s): Neural Networks, Large Scale, Deep Neural Networks, Xeon Phi, Intel Xeon Phi, Many Core, Intel Xeon

Parallelization of Molecular-Dynamics Simulations Using Tasks

MRS Proceedings, Vol 1753, 2015. DOI: 10.1557/opl.2015.113

Cited By ~ 2

Author(s): Ralf Meyer, Chris M. Mangiardi

Keyword(s): Molecular Dynamics, Molecular Dynamics Simulations, Shared Memory, MD Simulations, Xeon Phi, Intel Xeon Phi, Novel Algorithms, Dynamics Simulations, Many Core, Intel Xeon

This article discusses novel algorithms for molecular-dynamics (MD) simulations with short-ranged forces on modern multi- and many-core processors such as the Intel Xeon Phi. A task-based approach to the parallelization of MD on shared-memory computers and a tiling scheme that facilitates the SIMD vectorization of the force calculations are described. The algorithms have been tested with three different potentials, and the resulting speed-ups on Intel Xeon Phi coprocessors are shown.
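Short-ranged forces mean that each particle interacts only with neighbours inside a cutoff radius, which is what allows MD codes to avoid an O(N²) scan over all pairs. A standard textbook way to find those pairs is a linked-cell (cell-list) decomposition, sketched below in 2D with a periodic square box. This is shown only as background: it is not the task-based algorithm or the tiling scheme described in the article, and the function name `neighbor_pairs` and its parameters are hypothetical.

```python
from collections import defaultdict
from itertools import product
from typing import Dict, List, Set, Tuple

Point = Tuple[float, float]


def neighbor_pairs(pos: List[Point], box: float, rc: float) -> Set[Tuple[int, int]]:
    """Return all index pairs (i < j) closer than rc in a periodic square box.

    Assumes rc < box / 2 so the minimum-image convention is unambiguous.
    """
    n_cells = max(1, int(box // rc))          # cells are at least rc wide
    size = box / n_cells
    cells: Dict[Tuple[int, int], List[int]] = defaultdict(list)
    for idx, (x, y) in enumerate(pos):
        cells[(int(x // size) % n_cells, int(y // size) % n_cells)].append(idx)

    pairs: Set[Tuple[int, int]] = set()
    for (cx, cy), members in list(cells.items()):
        # Only the cell itself and its 8 neighbours (with periodic wrap) can
        # contain partners within rc; the set removes duplicates when the
        # grid is so small that neighbour offsets wrap onto the same cell.
        for dx, dy in product((-1, 0, 1), repeat=2):
            other = cells.get(((cx + dx) % n_cells, (cy + dy) % n_cells), [])
            for i in members:
                for j in other:
                    if i >= j:
                        continue
                    ddx = pos[i][0] - pos[j][0]
                    ddy = pos[i][1] - pos[j][1]
                    ddx -= box * round(ddx / box)   # minimum-image convention
                    ddy -= box * round(ddy / box)
                    if ddx * ddx + ddy * ddy < rc * rc:
                        pairs.add((i, j))
    return pairs
```

The per-cell loops are natural units of work for a shared-memory parallelization, and the inner distance computations over contiguous particle lists are the part that SIMD vectorization targets.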

Using Data Compression for Increasing Efficiency of Data Transfer Between Main Memory and Intel Xeon Phi Coprocessor or NVidia GPU in Parallel DBMS

Procedia Computer Science, Vol 66, pp. 635-641, 2015. DOI: 10.1016/j.procs.2015.11.072

Cited By ~ 1

Author(s): Konstantin Y. Besedin, Pavel S. Kostenetskiy, Stepan O. Prikazchikov

Keyword(s): Data Compression, Data Transfer, Main Memory, Xeon Phi, Intel Xeon Phi, Using Data, NVIDIA GPU, Parallel DBMS, Intel Xeon

Many-core needs fine-grained scheduling: A case study of query processing on Intel Xeon Phi processors

Journal of Parallel and Distributed Computing, Vol 120, pp. 395-404, 2018. DOI: 10.1016/j.jpdc.2017.09.005

Cited By ~ 3

Author(s): Xuntao Cheng, Bingsheng He, Mian Lu, Chiew Tong Lau

Keyword(s): Query Processing, Xeon Phi, Intel Xeon Phi, Fine Grained, Many Core, Intel Xeon

A parallel algorithm of Euclidean distance matrix computation for the Intel Xeon Phi Knights Landing many-core processor

Bulletin of the South Ural State University Series Computational Mathematics and Software Engineering, Vol 7 (3), 2018. DOI: 10.14529/cmse180305

Keyword(s): Parallel Algorithm, Euclidean Distance, Distance Matrix, Xeon Phi, Intel Xeon Phi, Euclidean Distance Matrix, Matrix Computation, Knights Landing, Many Core, Intel Xeon

Challenges on Porting Lattice Boltzmann Method on Accelerators

Advances in Computer and Electrical Engineering - Analysis and Applications of Lattice Boltzmann Simulations, pp. 30-53, 2018. DOI: 10.4018/978-1-5225-4760-0.ch002

Cited By ~ 1

Author(s): Claudio Schepke, João V. F. Lima, Matheus S. Serpa

Keyword(s): Lattice Boltzmann Method, Lattice Boltzmann, High Performance, Three Dimensional, Fluid Flows, Xeon Phi, Performance Impact, Operation Rules, Dimensional Version, Boltzmann Method

NVIDIA GPUs and Intel Xeon Phi accelerators are currently two alternative architectures for achieving high performance. This chapter investigates the performance impact of these architectures on the lattice Boltzmann method, which simulates fluid flows iteratively using discrete representations and, because it relies on simple operation rules, can be applied to a wide range of flow simulations. The experiments use the three-dimensional version of the method with 19 discrete propagation directions (D3Q19). The performance evaluation compares three modern GPUs (K20M, K80, and Titan X) and two Xeon Phi architectures, Knights Corner (KNC) and Knights Landing (KNL). The Titan X achieves the shortest execution time of all the hardware considered, and overall the GPUs deliver better processing times for this application. Among the Xeon Phi architectures, a KNL cache implementation gives the best results, and the newer KNL is twice as fast as its predecessor, the KNC.
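The D3Q19 label means three dimensions and 19 discrete velocities: the rest velocity, the 6 nearest (face) neighbours, and the 12 next-nearest (edge) neighbours of a cubic lattice cell, while the 8 corner directions are omitted. The sketch below builds that set with the standard weights 1/3, 1/18, and 1/36 and checks the moment conditions those weights are chosen to satisfy (a squared lattice speed of sound of 1/3). The velocity ordering produced here is arbitrary; the codes evaluated in the chapter may use a different ordering and memory layout, and the function names are hypothetical.

```python
from itertools import product
from typing import List, Tuple

Velocity = Tuple[int, int, int]

# Weight by squared speed: rest (0), face neighbours (1), edge neighbours (2).
WEIGHT_BY_NORM2 = {0: 1 / 3, 1: 1 / 18, 2: 1 / 36}


def d3q19() -> Tuple[List[Velocity], List[float]]:
    """Build the D3Q19 velocities and weights from the 27 neighbours of a cube."""
    vels: List[Velocity] = []
    weights: List[float] = []
    for c in product((-1, 0, 1), repeat=3):
        norm2 = sum(k * k for k in c)
        if norm2 in WEIGHT_BY_NORM2:  # the 8 corners (norm2 == 3) are excluded
            vels.append(c)
            weights.append(WEIGHT_BY_NORM2[norm2])
    return vels, weights


def moments_ok(vels: List[Velocity], weights: List[float], tol: float = 1e-12) -> bool:
    """Check sum w = 1, sum w c_a = 0, and sum w c_a c_b = delta_ab / 3."""
    if abs(sum(weights) - 1.0) > tol:
        return False
    for a in range(3):
        if abs(sum(w * c[a] for c, w in zip(vels, weights))) > tol:
            return False
        for b in range(3):
            m2 = sum(w * c[a] * c[b] for c, w in zip(vels, weights))
            if abs(m2 - (1 / 3 if a == b else 0.0)) > tol:
                return False
    return True
```

Every lattice node therefore stores 19 values per time level, which sets the memory footprint and the per-node memory traffic on both the GPUs and the Xeon Phi processors compared in the chapter.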

Evaluating the Support of MTC Applications on Intel Xeon Phi Many-Core Accelerators

2015 IEEE International Conference on Cluster Computing, 2015. DOI: 10.1109/cluster.2015.87

Author(s): Poornima Nookala, Serapheim Dimitropoulos, Karl Stough, Ioan Raicu

Keyword(s): Xeon Phi, Intel Xeon Phi, Many Core, Intel Xeon
