Co-design of a Particle-in-Cell Plasma Simulation Code for Intel Xeon Phi: A First Look at Knights Landing

This paper considers the efficient use of computational systems equipped with Intel Xeon Phi coprocessors for laser-plasma simulation. The features of the Xeon Phi architecture that influence the performance of Particle-in-Cell plasma simulation are analyzed. The PICADOR parallel plasma simulation code, previously optimized for Xeon Phi, is described. Its performance on Xeon Phi is compared with that on CPU for three computationally intensive plasma simulation problems. The ratio of computational time on Xeon Phi to that on CPU is discussed for the main stages of the Particle-in-Cell method. It is shown that, depending on the features of the physical problem, Xeon Phi can either outperform or lag behind the CPU.

MILC Code Performance on High End CPU and GPU Supercomputer Clusters

EPJ Web of Conferences ◽

10.1051/epjconf/201817502009 ◽

2018 ◽

Vol 175 ◽

pp. 02009

Author(s):

Carleton DeTar ◽

Steven Gottlieb ◽

Ruizi Li ◽

Doug Toussaint

Keyword(s):

Conjugate Gradient ◽

Memory Hierarchy ◽

Xeon Phi ◽

Intel Xeon Phi ◽

Code Performance ◽

Recent Developments ◽

Knights Landing ◽

Many Core ◽

Intel Xeon

With recent developments in parallel supercomputing architecture, many-core, multi-core, and GPU processors are now commonplace, resulting in more levels of parallelism, deeper memory hierarchies, and greater programming complexity. It has been necessary to adapt the MILC code to these new processors, starting with NVIDIA GPUs and, more recently, the Intel Xeon Phi processors. We report on our efforts to port and optimize our code for the Intel Knights Landing architecture. We consider the performance of the MILC code with MPI and OpenMP, and optimizations with QOPQDP and QPhiX. For the latter approach, we concentrate on the staggered conjugate gradient and gauge force. We also consider performance on recent NVIDIA GPUs using the QUDA library.

Performance Evaluation of Scientific Applications on Intel Xeon Phi Knights Landing Clusters

2018 International Conference on High Performance Computing & Simulation (HPCS) ◽

10.1109/hpcs.2018.00063 ◽

2018 ◽

Cited By ~ 4

Author(s):

Ji-Hoon Kang ◽

Oh-Kyoung Kwon ◽

Hoon Ryu ◽

Jinwoo Jeong ◽

Kyunghun Lim

Keyword(s):

Performance Evaluation ◽

Xeon Phi ◽

Intel Xeon Phi ◽

Scientific Applications ◽

Knights Landing ◽

Intel Xeon

Simulating Multiphase Flows in Porous Media Using OpenFOAM on Intel Xeon Phi Knights Landing Processors

Proceedings of the Practice and Experience in Advanced Research Computing 2017 on Sustainability, Success and Impact - PEARC17 ◽

10.1145/3093338.3093350 ◽

2017 ◽

Cited By ~ 1

Author(s):

Zhi Shang ◽

Honggao Liu

Keyword(s):

Porous Media ◽

Multiphase Flows ◽

Xeon Phi ◽

Intel Xeon Phi ◽

Flows In Porous Media ◽

Knights Landing ◽

Intel Xeon

Long-time simulations with complex code using multiple nodes of Intel Xeon Phi Knights Landing

Journal of Computational and Applied Mathematics ◽

10.1016/j.cam.2017.12.050 ◽

2018 ◽

Vol 337 ◽

pp. 18-36 ◽

Cited By ~ 1

Author(s):

Jonathan S. Graf ◽

Matthias K. Gobbert ◽

Samuel Khuvis

Keyword(s):

Xeon Phi ◽

Intel Xeon Phi ◽

Long Time ◽

Knights Landing ◽

Intel Xeon

Practical Implementation of Lattice QCD Simulation on Intel Xeon Phi Knights Landing

2017 Fifth International Symposium on Computing and Networking (CANDAR) ◽

10.1109/candar.2017.66 ◽

2017 ◽

Cited By ~ 1

Author(s):

Issaku Kanamori ◽

Hideo Matsufuru

Keyword(s):

Lattice QCD ◽

Practical Implementation ◽

Xeon Phi ◽

Intel Xeon Phi ◽

Knights Landing ◽

Intel Xeon

Accelerating Seismic Simulations Using the Intel Xeon Phi Knights Landing Processor

Lecture Notes in Computer Science - High Performance Computing ◽

10.1007/978-3-319-58667-0_8 ◽

2017 ◽

pp. 139-157 ◽

Cited By ~ 5

Author(s):

Josh Tobin ◽

Alexander Breuer ◽

Alexander Heinecke ◽

Charles Yount ◽

Yifeng Cui

Keyword(s):

Xeon Phi ◽

Intel Xeon Phi ◽

Knights Landing ◽

Intel Xeon

Performance Comparison of Intel Xeon Phi Knights Landing

SIAM Undergraduate Research Online ◽

10.1137/17s015896 ◽

2017 ◽

Vol 10 ◽

Cited By ~ 2

Author(s):

Ishmail Jabbie

Keyword(s):

Performance Comparison ◽

Xeon Phi ◽

Intel Xeon Phi ◽

Knights Landing ◽

Intel Xeon

A parallel algorithm of Euclidean distance matrix computation for the Intel Xeon Phi Knights Landing many-core processor

Bulletin of the South Ural State University Series Computational Mathematics and Software Engineering ◽

10.14529/cmse180305 ◽

2018 ◽

Vol 7 (3) ◽

Keyword(s):

Parallel Algorithm ◽

Euclidean Distance ◽

Distance Matrix ◽

Xeon Phi ◽

Intel Xeon Phi ◽

Euclidean Distance Matrix ◽

Matrix Computation ◽

Knights Landing ◽

Many Core ◽

Intel Xeon

DD-αAMG on QPACE 3

EPJ Web of Conferences ◽

10.1051/epjconf/201817502007 ◽

2018 ◽

Vol 175 ◽

pp. 02007 ◽

Cited By ~ 4

Author(s):

Peter Georg ◽

Daniel Richtmann ◽

Tilo Wettig

Keyword(s):

First Generation ◽

Xeon Phi ◽

Intel Xeon Phi ◽

Knights Landing ◽

Speedup Factor ◽

Single Processor ◽

Intel Xeon

We describe our experience porting the Regensburg implementation of the DD-αAMG solver from QPACE 2 to QPACE 3. We first review how the code was ported from the first-generation Intel Xeon Phi processor (Knights Corner) to its successor (Knights Landing). We then describe the modifications to the communication library necessitated by the switch from InfiniBand to Omni-Path. Finally, we present the performance of the code on a single processor as well as its scaling on many nodes; in both cases the speedup factor is close to theoretical expectations.
