The Case for Message Passing on Many-Core Chips

Abstract. To improve scalability, several many-core architectures use message passing instead of shared-memory accesses for communication. Unfortunately, message passing is usually emulated with Direct Memory Access (DMA) transfers in a shared address space, which adds considerable overhead and undermines its advantages. Recently proposed register-level alternatives use special instructions to send the contents of a single register to another core. The reduced communication overhead and architectural simplicity lead to good many-core scalability. After investigating several other approaches in terms of hardware complexity and throughput overhead, we recommend a small instruction set extension that enables register-level message passing at minimal hardware cost and describe its integration into a classical five-stage RISC-V pipeline.
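
As a rough illustration of the register-level scheme the abstract describes, the Python sketch below models a send instruction that moves one register word into a receive buffer at the destination core. The abstract does not specify any of the details used here: the class name `Chip`, the `send`/`recv` method names, the per-sender FIFO organization, the FIFO depth, and the 32-bit word width are all assumptions for illustration only.

```python
from collections import deque

WORD_MASK = 0xFFFFFFFF  # assumption: one register holds a 32-bit word


class Chip:
    """Hypothetical software model of a many-core chip with register-level messaging."""

    def __init__(self, num_cores, fifo_depth=4):
        self.fifo_depth = fifo_depth
        # rx[dst][src] is the receive FIFO at core `dst` for words sent by core `src`.
        self.rx = [[deque() for _ in range(num_cores)] for _ in range(num_cores)]

    def send(self, src, dst, reg_value):
        """Model of a hypothetical 'send' instruction: one register word to one core.

        Returns False when the destination FIFO is full, which in hardware
        would correspond to stalling the sender.
        """
        fifo = self.rx[dst][src]
        if len(fifo) >= self.fifo_depth:
            return False
        fifo.append(reg_value & WORD_MASK)
        return True

    def recv(self, dst, src):
        """Model of a hypothetical 'recv' instruction: pop the oldest word, or None if empty."""
        fifo = self.rx[dst][src]
        return fifo.popleft() if fifo else None


if __name__ == "__main__":
    chip = Chip(num_cores=4, fifo_depth=2)
    chip.send(src=0, dst=1, reg_value=0x1234)
    print(hex(chip.recv(dst=1, src=0)))
```

The point of the sketch is that no shared address space is involved: the sender names a destination core and a register value, and the only shared state is the small FIFO at the receiver.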

Techniques for Enabling Highly Efficient Message Passing on Many-Core Architectures

2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing ◽

10.1109/ccgrid.2015.68 ◽

2015 ◽

Author(s):

Min Si ◽

Pavan Balaji ◽

Yutaka Ishikawa

Keyword(s):

Message Passing ◽

Highly Efficient ◽

Many Core

Reduced Complexity Many-Core: Timing Predictability Due to Message-Passing

Architecture of Computing Systems - ARCS 2017 - Lecture Notes in Computer Science ◽

10.1007/978-3-319-54999-6_11 ◽

2017 ◽

pp. 139-151 ◽

Cited By ~ 7

Author(s):

Jörg Mische ◽

Martin Frieb ◽

Alexander Stegmeier ◽

Theo Ungerer

Keyword(s):

Message Passing ◽

Reduced Complexity ◽

Many Core

Low Overhead Message Passing for High Performance Many-Core Processors

2013 First International Symposium on Computing and Networking ◽

10.1109/candar.2013.62 ◽

2013 ◽

Cited By ~ 4

Author(s):

Sumeet S. Kumar ◽

Mitzi Tjin A. Djie ◽

Rene Van Leuken

Keyword(s):

Message Passing ◽

High Performance ◽

Many Core

HOMMEXX 1.0: a performance-portable atmospheric dynamical core for the Energy Exascale Earth System Model

Geoscientific Model Development ◽

10.5194/gmd-12-1423-2019 ◽

2019 ◽

Vol 12 (4) ◽

pp. 1423-1441 ◽

Cited By ~ 5

Author(s):

Luca Bertagna ◽

Michael Deakin ◽

Oksana Guba ◽

Daniel Sunderland ◽

Andrew M. Bradley ◽

...

Keyword(s):

Message Passing ◽

Message Passing Interface ◽

Earth System Model ◽

Parallel Execution ◽

System Model ◽

Earth System ◽

Dynamical Core ◽

Fortran Implementation ◽

Many Core ◽

Intel Xeon

Abstract. We present an architecture-portable and performant implementation of the atmospheric dynamical core (High-Order Methods Modeling Environment, HOMME) of the Energy Exascale Earth System Model (E3SM). The original Fortran implementation is highly performant and scalable on conventional architectures using the Message Passing Interface (MPI) and Open Multi-Processing (OpenMP) programming models. We rewrite the model in C++ and use the Kokkos library to express on-node parallelism in a largely architecture-independent implementation. Kokkos provides an abstraction of a compute node or device, layout-polymorphic multidimensional arrays, and parallel execution constructs. The new implementation achieves the same or better performance on conventional multicore computers and is portable to GPUs. We present performance data for the original and new implementations on multiple platforms, on up to 5400 compute nodes, and study several aspects of the single- and multi-node performance characteristics of the new implementation on conventional CPU (e.g., Intel Xeon), many-core CPU (e.g., Intel Xeon Phi Knights Landing), and Nvidia V100 GPU.
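
The idea of a layout-polymorphic array mentioned in the abstract can be sketched in a few lines of Python. This is not Kokkos code and not the HOMMEXX implementation; the class `View2D`, the function `scale`, and the layout names `"right"` (row-major) and `"left"` (column-major) are hypothetical stand-ins chosen for illustration. The kernel indexes the array as `view[i, j]` and never sees how the elements are laid out in memory.

```python
class View2D:
    """Hypothetical 2D array whose memory layout is chosen at construction time."""

    def __init__(self, rows, cols, layout="right"):
        if layout not in ("left", "right"):
            raise ValueError("layout must be 'left' or 'right'")
        self.rows, self.cols, self.layout = rows, cols, layout
        self.data = [0.0] * (rows * cols)  # flat backing storage

    def _offset(self, i, j):
        # "right": row-major, last index is contiguous.
        # "left": column-major, first index is contiguous.
        if self.layout == "right":
            return i * self.cols + j
        return j * self.rows + i

    def __getitem__(self, ij):
        i, j = ij
        return self.data[self._offset(i, j)]

    def __setitem__(self, ij, value):
        i, j = ij
        self.data[self._offset(i, j)] = value


def scale(view, factor):
    """Layout-agnostic kernel: the same code runs on either layout."""
    for i in range(view.rows):
        for j in range(view.cols):
            view[i, j] = view[i, j] * factor


if __name__ == "__main__":
    for layout in ("right", "left"):
        v = View2D(2, 3, layout)
        for i in range(2):
            for j in range(3):
                v[i, j] = 10 * i + j
        scale(v, 2)
        print(layout, v.data)
```

Keeping the layout out of the kernel is what allows one source to use the memory order that suits each target, for example one order for CPU caches and another for coalesced GPU accesses.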
