Torus network
Recently Published Documents


Total documents: 89 (five years: 14)
H-index: 8 (five years: 1)

2021, Vol 2021, pp. 1-6
Author(s): Antoine Bossard

Modern supercomputers are massively parallel systems: they comprise thousands of compute nodes, and sometimes several million. The torus topology has proven very popular for interconnecting these high-performance systems. Notably, this network topology is employed by the supercomputer ranked number one in the world as of November 2020, the supercomputer Fugaku. Given the high number of compute nodes in such systems, efficient parallel processing is critical to maximise computing performance. It is well known that cycles harm the parallel processing capacity of systems: for instance, deadlock and starvation, two notorious issues of parallel computing, are directly linked to the presence of cycles. Hence, network decycling (removing a set of nodes so that the remaining network contains no cycle) is an important problem, and it has been extensively discussed in the literature. We describe in this paper a decycling algorithm for the 3-dimensional k-ary torus topology and compare it with established results, both theoretically and experimentally. (This paper is a revised version of Antoine Bossard (2020)).
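To make the decycling problem concrete, the following is a minimal sketch, not the algorithm from the paper: it builds a 3-dimensional k-ary torus with wraparound links and applies a generic greedy baseline (repeatedly remove a node of highest remaining degree until the rest is acyclic). The function names torus_neighbors, torus_edges, is_acyclic and greedy_decycling_set are hypothetical, and k >= 3 is assumed so that every node has six distinct neighbors.

```python
from itertools import product

# Hypothetical helpers: a generic greedy baseline, NOT the decycling
# algorithm described in the paper.

def torus_neighbors(node, k):
    """Wraparound neighbors of `node` = (x, y, z) in a 3-dimensional k-ary torus."""
    result = set()
    for dim in range(3):
        for step in (-1, 1):
            nb = list(node)
            nb[dim] = (nb[dim] + step) % k
            result.add(tuple(nb))
    result.discard(node)  # only matters for the degenerate case k == 1
    return result

def torus_edges(k):
    """Undirected edge set; for k >= 3 every node has 6 distinct neighbors."""
    edges = set()
    for node in product(range(k), repeat=3):
        for nb in torus_neighbors(node, k):
            edges.add(frozenset((node, nb)))
    return edges

def is_acyclic(vertices, edges):
    """Union-find check that the subgraph induced by `vertices` is a forest."""
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for e in edges:
        u, v = tuple(e)
        if u not in parent or v not in parent:
            continue  # edge touches a removed vertex
        ru, rv = find(u), find(v)
        if ru == rv:
            return False  # u and v already connected: this edge closes a cycle
        parent[ru] = rv
    return True

def greedy_decycling_set(k):
    """Remove highest-degree vertices until the remaining torus is acyclic."""
    vertices = set(product(range(k), repeat=3))
    edges = torus_edges(k)
    removed = []
    while not is_acyclic(vertices, edges):
        degree = {v: 0 for v in vertices}
        for e in edges:
            u, v = tuple(e)
            if u in vertices and v in vertices:
                degree[u] += 1
                degree[v] += 1
        # Ties are broken by coordinate order so the result is deterministic.
        victim = max(sorted(vertices), key=lambda v: degree[v])
        vertices.remove(victim)
        removed.append(victim)
    return removed

fvs = greedy_decycling_set(4)
print(len(fvs), "of", 4 ** 3, "nodes removed")
```

A quick way to judge how far such a baseline is from optimal: any decycling set S of a d-regular graph with n nodes and m edges satisfies |S| >= (m - n + 1)/(d - 1), because the remaining forest has at most n - |S| - 1 edges and each removed node deletes at most d edges. For the 3-dimensional k-ary torus (d = 6, m = 3n) this gives |S| >= (2n + 1)/5.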


2021, Vol 179, pp. 590-597
Author(s): Maryam Manaa Al-Shammari, Asrar Haque, M.M. Hafizur Rahman

2020, Vol 20 (6), pp. 94-104
Author(s): Ivan Lirkov

Practical realizations of 3D forward/inverse separable discrete transforms, such as the Fourier and cosine/sine transforms, are frequently the principal factor preventing many practical applications from scaling to a large number of processors. Existing approaches, which are based primarily on 1D or 2D data decompositions, prevent 3D transforms from scaling effectively to the maximum possible (available) number of compute nodes. A highly scalable approach to realizing forward/inverse 3D transforms has been proposed. It is based on a 3D decomposition of data and geared towards a torus network of compute nodes. The proposed algorithms require compute-and-roll time-steps, where each step consists of executing multiple GEMM operations while concurrently moving cubical data blocks between nearest neighbors. The aim of this paper is to present an experimental performance study of an implementation on a high-performance computer architecture.
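To make the compute-and-roll idea concrete, here is a minimal sketch under simplifying assumptions that are not from the paper: a single axis of a separable transform (the DFT, written as a dense matrix), P processors arranged in a 1D ring rather than a 3D torus, and plain Python loops standing in for GEMM calls. Processor q owns row block X_q and must produce Y_q = sum over s of F_{q,s} X_s; at each of the P steps it multiplies the block it currently holds by the matching block of F (compute) and then passes that block to its ring neighbor (roll). The names dft_matrix, matmul_acc and ring_compute_and_roll are hypothetical.

```python
import cmath

# Hypothetical 1D-ring sketch of the compute-and-roll pattern; the paper's
# algorithm uses a 3D decomposition on a 3D torus instead.

def dft_matrix(n):
    """Dense n x n forward DFT matrix, F[r][c] = exp(-2*pi*i*r*c/n)."""
    return [[cmath.exp(-2j * cmath.pi * r * c / n) for c in range(n)]
            for r in range(n)]

def matmul_acc(acc, a, b):
    """acc += a @ b on list-of-lists matrices (stand-in for a GEMM call)."""
    for i in range(len(a)):
        for t in range(len(b)):
            a_it = a[i][t]
            for j in range(len(b[0])):
                acc[i][j] += a_it * b[t][j]

def ring_compute_and_roll(x, p):
    """DFT along axis 0 of x (n rows, m columns) on a simulated ring of p processors."""
    n, m = len(x), len(x[0])
    assert n % p == 0, "rows must split evenly across processors"
    b = n // p
    f = dft_matrix(n)

    def f_block(r, c):
        # b x b block (r, c) of the DFT matrix
        return [row[c * b:(c + 1) * b] for row in f[r * b:(r + 1) * b]]

    # Processor q starts with its own row block X_q and accumulates Y_q.
    held = [[row[:] for row in x[q * b:(q + 1) * b]] for q in range(p)]
    y = [[[0j] * m for _ in range(b)] for _ in range(p)]
    for step in range(p):
        for q in range(p):
            src = (q + step) % p  # index of the block processor q holds now
            matmul_acc(y[q], f_block(q, src), held[q])  # compute
        held = held[1:] + held[:1]  # roll: q receives the block from q + 1
    return [row for block in y for row in block]

# A unit impulse transforms to all ones.
print(ring_compute_and_roll([[1], [0], [0], [0]], 2))
```

Because a separable 3D transform is a 1D transform applied along each axis in turn, the same pattern can in principle be repeated per axis. The only communication is the roll between ring neighbors, which is why this style of algorithm maps naturally onto the nearest-neighbor links of a torus.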

