Update propagation in distributed memory hierarchy

Author(s): M. Bellew, M. Hsu, V.-O. Tam
2015, Vol. 21 (6), pp. 714-729
Author(s): Duy-Quoc Lai, Behzad Sajadi, Shan Jiang, Gopi Meenakshisundaram, Aditi Majumder
2018, Vol. 175, pp. 02009
Author(s): Carleton DeTar, Steven Gottlieb, Ruizi Li, Doug Toussaint

With recent developments in parallel supercomputing architecture, many-core, multi-core, and GPU processors are now commonplace, bringing more levels of parallelism, deeper memory hierarchies, and greater programming complexity. It has been necessary to adapt the MILC code to these new processors, starting with NVIDIA GPUs and, more recently, the Intel Xeon Phi processors. We report on our efforts to port and optimize our code for the Intel Knights Landing architecture. We consider performance of the MILC code with MPI and OpenMP, and optimizations with QOPQDP and QPhiX. For the latter approach, we concentrate on the staggered conjugate gradient and the gauge force. We also consider performance on recent NVIDIA GPUs using the QUDA library.
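The staggered conjugate gradient this abstract singles out is, algorithmically, the standard conjugate gradient iteration applied to a Hermitian positive-definite operator that is only ever accessed through matrix-vector products (the Dirac stencil). A minimal sketch of that iteration, with the operator passed in as a generic `matvec` callback — the function name and the small dense test operator are illustrative assumptions, not code from MILC or QPhiX:

```python
def conjugate_gradient(matvec, b, tol=1e-10, max_iter=1000):
    """Plain conjugate gradient for A x = b, with A symmetric positive
    definite and given only as a matrix-vector product `matvec` -- the
    same access pattern a lattice solver uses, where A is applied
    stencil-wise rather than stored as a matrix."""
    n = len(b)
    x = [0.0] * n
    r = list(b)            # residual r = b - A x (x starts at 0)
    p = list(r)            # initial search direction
    rr = sum(v * v for v in r)
    for _ in range(max_iter):
        Ap = matvec(p)
        alpha = rr / sum(pi * ai for pi, ai in zip(p, Ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * ai for ri, ai in zip(r, Ap)]
        rr_new = sum(v * v for v in r)
        if rr_new < tol * tol:
            break
        beta = rr_new / rr
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x

# Illustrative 2x2 SPD operator; exact solution is [1/11, 7/11].
def example_matvec(v):
    A = [[4.0, 1.0], [1.0, 3.0]]
    return [sum(a * x for a, x in zip(row, v)) for row in A]

solution = conjugate_gradient(example_matvec, [1.0, 2.0])
```

The production solvers the abstract refers to vectorize and parallelize exactly these vector updates and the stencil application; the iteration structure itself is unchanged.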


1991, Vol. 2 (2), pp. 45-49
Author(s): Michele Di Santo, Giulio Iannello

1995, Vol. 23 (3), pp. 28
Author(s): Daniel Tabak

2021, Vol. 26, pp. 1-67
Author(s): Patrick Dinklage, Jonas Ellert, Johannes Fischer, Florian Kurpicz, Marvin Löbel

We present new sequential and parallel algorithms for wavelet tree construction based on a new bottom-up technique. This technique uses the structure of wavelet trees (refining the characters represented in a node of the tree with increasing depth) in the opposite direction: it first computes the leaves (the most refined level) and then propagates this information upwards to the root of the tree. We first describe new sequential algorithms, both in RAM and in external memory. Based on these results, we adapt the algorithms to parallel computers, addressing both shared-memory and distributed-memory settings. In practice, all our algorithms outperform previous ones in both time and memory efficiency, because all auxiliary information can be computed solely from the information obtained while computing the leaves. Most of our algorithms are also adapted to the wavelet matrix, a variant that is particularly suited for large alphabets.
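The bottom-up idea described above can be sketched for a levelwise wavelet tree: a single character histogram (the leaves, the most refined information) yields, by merging counts and prefix-summing, the node borders of every level, so each level's bit vector is filled in one stable scan of the text with no recursive partitioning. A small sketch under that reading of the technique — the function and variable names are my own, and this is the plain levelwise layout, not the authors' optimized sequential or parallel variants:

```python
def wavelet_tree_levels(text, sigma_bits):
    """Levelwise wavelet tree of `text` (a list of integer characters
    using `sigma_bits` bits each), built bottom-up: the leaf histogram
    is computed once, and every level's node borders are derived from
    it instead of partitioning the text recursively."""
    n = len(text)
    # Leaves first: one histogram over the full alphabet.
    hist = [0] * (1 << sigma_bits)
    for c in text:
        hist[c] += 1

    levels = []
    for lv in range(sigma_bits):
        shift = sigma_bits - lv
        node_count = 1 << lv
        # Merge leaf counts upwards: a node on this level covers all
        # characters sharing the same top `lv` bits.
        sizes = [0] * node_count
        for c, h in enumerate(hist):
            sizes[c >> shift] += h
        # Borders: starting offset of each node in this level's bits.
        borders = [0] * node_count
        for i in range(1, node_count):
            borders[i] = borders[i - 1] + sizes[i - 1]
        # One stable scan of the text fills the level's bit vector.
        bits = [0] * n
        for c in text:
            node = c >> shift
            bits[borders[node]] = (c >> (shift - 1)) & 1
            borders[node] += 1
        levels.append(bits)
    return levels
```

For `text = [1, 0, 3, 2, 1]` with a 2-bit alphabet this produces the root bit vector `[0, 0, 1, 1, 0]` (the top bits in text order) and the second level `[1, 0, 1, 1, 0]` (low bits, grouped stably by top bit), matching what top-down recursive construction would yield.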

