Polarized routing for large interconnection networks

IEEE Micro ◽  
2022 ◽  
pp. 1-1
Cristobal Camarero ◽  
Carmen Martinez ◽  
Ramon Beivide
A. Ferrerón Labari ◽  
D. Suárez Gracia ◽  
V. Viñals Yúfera

In recent years, embedded systems have evolved to offer capabilities previously found only in high-performance systems. Portable devices already include on-chip multiprocessors (such as the PowerPC 476FP or ARM Cortex A9 MP), usually multi-threaded, together with a powerful on-chip multi-level cache hierarchy. Because most of these systems are battery-powered, power consumption is a critical issue. Achieving high performance with low power consumption is a complex challenge for which some proposals have already been made. Suarez et al. proposed a new on-chip cache hierarchy, LP-NUCA (Low Power NUCA), which reduces access latency by taking advantage of the properties of NUCA (Non-Uniform Cache Architectures). Its key points are decoupling the functionality and using three specialized on-chip networks. This structure has proved efficient for data hierarchies, achieving good performance while reducing energy consumption. Instruction caches, however, have different requirements and characteristics from data caches, and these conflict with the requirements of low-power embedded systems, especially in SMT (simultaneous multi-threading) environments. We want to study the benefits of using small tiled caches for the instruction hierarchy, so we propose a new design, ID-LP-NUCAs. This requires a complete re-evaluation of our previous design in terms of structure design, interconnection networks (including topologies, flow control and routing), content management (with special interest in hardware/software content allocation policies), and structure sharing. In CMP (chip multiprocessor) environments with parallel workloads, coherence plays an important role and must be taken into consideration.

1983 ◽  
Vol 11 (3) ◽  
pp. 309-315 ◽  
W. Kent Fuchs ◽  
Jacob A. Abraham ◽  
Kuang-Hua Huang

1988 ◽  
Vol 16 (1) ◽  
pp. 114-123 ◽  
N. M. Patel ◽  
P. G. Harrison

Ryota Yasudo ◽  
Koji Nakano ◽  
Michihiro Koibuchi ◽  
Hiroki Matsutani ◽  
Hideharu Amano

1997 ◽  
Vol 08 (03) ◽  
pp. 289-304 ◽  
Marc Baumslag ◽  
Bojana Obrenić

Index-shuffle graphs are introduced as candidate interconnection networks for parallel computers. The comparative advantages of index-shuffle graphs over the standard bounded-degree "approximations" of the hypercube, namely butterfly-like and shuffle-like graphs, are demonstrated in the theoretical framework of graph embedding and network emulations. An N-node index-shuffle graph emulates:
• an N-node shuffle-exchange graph with no slowdown, while the currently best emulations of shuffle-like graphs by hypercubes and butterflies incur a slowdown of Ω(log N);
• its like-sized butterfly graph with a slowdown of O(log log log N), while the currently best emulations of butterfly-like graphs by shuffle-like graphs incur a slowdown of Ω(log log N);
• an N-node hypercube that executes an on-line leveled algorithm with a slowdown of O(log log N), while the slowdown of the currently best such emulations of the hypercube by its bounded-degree shuffle-like and butterfly-like derivatives remains Ω(log N).
Our emulation is based on an embedding of an N-node hypercube into an N-node index-shuffle graph with dilation O(log log N), while the currently best embeddings of the hypercube into its bounded-degree shuffle-like and butterfly-like derivatives incur a dilation of Ω(log N).
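For orientation, the shuffle-exchange graph named in the abstract can be sketched in a few lines. This is a minimal illustration using the common textbook definition (nodes are n-bit labels, an exchange edge flips the lowest bit, a shuffle edge is a cyclic left shift of the label). It is not the index-shuffle construction of the paper, which the abstract does not define. The function name and parameters are hypothetical, chosen only for this example.

```python
from __future__ import annotations


def shuffle_exchange_neighbors(node: int, n_bits: int) -> set[int]:
    """Neighbors of `node` in an undirected shuffle-exchange graph on 2**n_bits nodes.

    Assumes the textbook definition, not the paper's index-shuffle graph.
    """
    mask = (1 << n_bits) - 1
    # Exchange edge: flip the least significant bit.
    exchange = node ^ 1
    # Shuffle edge: cyclic left shift of the n-bit label.
    shuffle = ((node << 1) | (node >> (n_bits - 1))) & mask
    # Reverse direction of the shuffle edge: cyclic right shift.
    unshuffle = (node >> 1) | ((node & 1) << (n_bits - 1))
    # Drop self-loops (all-zeros and all-ones labels shuffle onto themselves).
    return {exchange, shuffle, unshuffle} - {node}


if __name__ == "__main__":
    n_bits = 3
    for v in range(1 << n_bits):
        nbrs = sorted(shuffle_exchange_neighbors(v, n_bits))
        print(f"{v:0{n_bits}b} -> {[format(u, f'0{n_bits}b') for u in nbrs]}")
```

Every node has at most three neighbors regardless of N, which is the bounded-degree property the abstract contrasts with the hypercube, whose degree grows as log N.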
