Update propagation in distributed memory hierarchy

Author(s): M. Bellew, M. Hsu, V.-O. Tam
2015, Vol. 21 (6), pp. 714-729
Author(s): Duy-Quoc Lai, Behzad Sajadi, Shan Jiang, Gopi Meenakshisundaram, Aditi Majumder
2018, Vol. 175, pp. 02009
Author(s): Carleton DeTar, Steven Gottlieb, Ruizi Li, Doug Toussaint

With recent developments in parallel supercomputing architecture, many-core, multi-core, and GPU processors are now commonplace, bringing more levels of parallelism, deeper memory hierarchies, and greater programming complexity. It has been necessary to adapt the MILC code to these new processors, starting with NVIDIA GPUs and, more recently, the Intel Xeon Phi processors. We report on our efforts to port and optimize our code for the Intel Knights Landing architecture. We consider performance of the MILC code with MPI and OpenMP, and optimizations with QOPQDP and QPhiX. For the latter approach, we concentrate on the staggered conjugate gradient and the gauge force. We also consider performance on recent NVIDIA GPUs using the QUDA library.
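The staggered conjugate gradient this abstract singles out is, algorithmically, the standard conjugate gradient iteration applied to a Hermitian positive-definite operator that is only ever accessed through matrix-vector products (the Dirac stencil). A minimal sketch of that iteration, with the operator passed in as a generic `matvec` callback — the function name and the small dense test operator are illustrative assumptions, not code from MILC or QPhiX:

```python
def conjugate_gradient(matvec, b, tol=1e-10, max_iter=1000):
    """Plain conjugate gradient for A x = b, with A symmetric positive
    definite and given only as a matrix-vector product `matvec` -- the
    same access pattern a lattice solver uses, where A is applied
    stencil-wise rather than stored as a matrix."""
    n = len(b)
    x = [0.0] * n
    r = list(b)            # residual r = b - A x (x starts at 0)
    p = list(r)            # initial search direction
    rr = sum(v * v for v in r)
    for _ in range(max_iter):
        Ap = matvec(p)
        alpha = rr / sum(pi * ai for pi, ai in zip(p, Ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * ai for ri, ai in zip(r, Ap)]
        rr_new = sum(v * v for v in r)
        if rr_new < tol * tol:
            break
        beta = rr_new / rr
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x

# Illustrative 2x2 SPD operator; exact solution is [1/11, 7/11].
def example_matvec(v):
    A = [[4.0, 1.0], [1.0, 3.0]]
    return [sum(a * x for a, x in zip(row, v)) for row in A]

solution = conjugate_gradient(example_matvec, [1.0, 2.0])
```

The production solvers the abstract refers to vectorize and parallelize exactly these vector updates and the stencil application; the iteration structure itself is unchanged.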


1991, Vol. 2 (2), pp. 45-49
Author(s): Michele Di Santo, Giulio Iannello

1995, Vol. 23 (3), pp. 28
Author(s): Daniel Tabak

2021, Vol. 26, pp. 1-67
Author(s): Patrick Dinklage, Jonas Ellert, Johannes Fischer, Florian Kurpicz, Marvin Löbel

We present new sequential and parallel algorithms for wavelet tree construction based on a new bottom-up technique. This technique uses the structure of wavelet trees (refining the characters represented in a node of the tree with increasing depth) in the opposite direction: it first computes the leaves (the most refined level) and then propagates this information upwards to the root of the tree. We first describe new sequential algorithms, both in RAM and in external memory. Based on these results, we adapt the algorithms to parallel computers, addressing both shared-memory and distributed-memory settings. In practice, all our algorithms outperform previous ones in both time and memory efficiency, because all auxiliary information can be computed solely from the information obtained while computing the leaves. Most of our algorithms are also adapted to the wavelet matrix, a variant that is particularly suited for large alphabets.
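The bottom-up idea described above can be sketched for a levelwise wavelet tree: a single character histogram (the leaves, the most refined information) yields, by merging counts and prefix-summing, the node borders of every level, so each level's bit vector is filled in one stable scan of the text with no recursive partitioning. A small sketch under that reading of the technique — the function and variable names are my own, and this is the plain levelwise layout, not the authors' optimized sequential or parallel variants:

```python
def wavelet_tree_levels(text, sigma_bits):
    """Levelwise wavelet tree of `text` (a list of integer characters
    using `sigma_bits` bits each), built bottom-up: the leaf histogram
    is computed once, and every level's node borders are derived from
    it instead of partitioning the text recursively."""
    n = len(text)
    # Leaves first: one histogram over the full alphabet.
    hist = [0] * (1 << sigma_bits)
    for c in text:
        hist[c] += 1

    levels = []
    for lv in range(sigma_bits):
        shift = sigma_bits - lv
        node_count = 1 << lv
        # Merge leaf counts upwards: a node on this level covers all
        # characters sharing the same top `lv` bits.
        sizes = [0] * node_count
        for c, h in enumerate(hist):
            sizes[c >> shift] += h
        # Borders: starting offset of each node in this level's bits.
        borders = [0] * node_count
        for i in range(1, node_count):
            borders[i] = borders[i - 1] + sizes[i - 1]
        # One stable scan of the text fills the level's bit vector.
        bits = [0] * n
        for c in text:
            node = c >> shift
            bits[borders[node]] = (c >> (shift - 1)) & 1
            borders[node] += 1
        levels.append(bits)
    return levels
```

For `text = [1, 0, 3, 2, 1]` with a 2-bit alphabet this produces the root bit vector `[0, 0, 1, 1, 0]` (the top bits in text order) and the second level `[1, 0, 1, 1, 0]` (low bits, grouped stably by top bit), matching what top-down recursive construction would yield.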

