Communication Performance Evaluation of the Locally Twisted Cube

2020 ◽  
Vol 31 (02) ◽  
pp. 233-252
Author(s):  
Yuejuan Han ◽  
Lantao You ◽  
Cheng-Kuan Lin ◽  
Jianxi Fan

The topological properties of multiprocessor interconnection networks are important to the performance of high-performance computers. The hypercube network [Formula: see text] has proved to be one of the most popular interconnection networks. The [Formula: see text]-dimensional locally twisted cube [Formula: see text] is an important variant of [Formula: see text]. Fault diameter and wide diameter are two parameters for evaluating the communication performance of a network. Let [Formula: see text], [Formula: see text] and [Formula: see text] denote the diameter, the [Formula: see text] fault diameter and the wide diameter of [Formula: see text], respectively. In this paper, we prove that [Formula: see text] if [Formula: see text] is an odd integer with [Formula: see text], and [Formula: see text] if [Formula: see text] is an even integer with [Formula: see text].
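To make the notions of diameter and fault diameter concrete, here is a minimal Python sketch. It is not taken from the paper: the abstract does not state the construction, so the adjacency rule below assumes the commonly cited recursive definition of the locally twisted cube (flip one of the last two bits, or flip bit x_k and add x_n to x_{k+1} modulo 2). The function names `ltq_neighbors`, `diameter` and `fault_diameter` are hypothetical, and the brute-force search is only practical for very small dimensions. The wide diameter is not computed here.

```python
from collections import deque
from itertools import combinations


def ltq_neighbors(v, n):
    """Neighbors of vertex v in LTQ_n (n >= 2); v is an n-bit int, x_1 is the top bit.

    Assumed adjacency rule (not given in the abstract above).
    """
    last = v & 1                      # x_n, the lowest bit
    nbrs = [v ^ 1, v ^ 2]             # flip x_n or x_{n-1}
    for pos in range(2, n):           # bit position of x_k for k = n-2 down to 1
        w = v ^ (1 << pos)            # flip x_k
        if last:                      # the "twist": x_{k+1} += x_n (mod 2)
            w ^= 1 << (pos - 1)
        nbrs.append(w)
    return nbrs


def diameter(n, faulty=frozenset()):
    """Largest shortest-path distance among non-faulty vertices (BFS from each)."""
    alive = [v for v in range(1 << n) if v not in faulty]
    worst = 0
    for src in alive:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for w in ltq_neighbors(u, n):
                if w not in faulty and w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        if len(dist) != len(alive):
            return float("inf")       # the faults disconnected the graph
        worst = max(worst, max(dist.values()))
    return worst


def fault_diameter(n, f):
    """Brute force: worst diameter over all sets of at most f faulty vertices."""
    return max(diameter(n, frozenset(s))
               for k in range(f + 1)
               for s in combinations(range(1 << n), k))


if __name__ == "__main__":
    for n in (3, 4):
        print(n, diameter(n), fault_diameter(n, n - 1))
```

Under this assumed rule, the fault-free diameter for n = 3 is 2, one less than the diameter of the ordinary 3-dimensional hypercube, which illustrates why twisted variants are of interest.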

2021 ◽  
Author(s):  
Karthik K ◽  
Sudarson Jena ◽  
Venu Gopal T

A multiprocessor is a system with at least two processing units sharing access to memory. The principal goal of using a multiprocessor is to process tasks simultaneously and boost the system’s performance. An interconnection network connects the processing units and greatly affects the performance of the whole system. Interconnection networks, also known as multi-stage interconnection networks, are node-to-node links in which each node may be a single processor or a group of processors. These links transfer information from one processor to the next or from a processor to memory, allowing a task to be divided and balanced evenly. Hypercube networks are a kind of network topology used to interconnect processors with memory modules and route information precisely. A hypercube network consists of 2^n nodes. Any hypercube can be thought of as a graph with nodes and edges, where a node represents a processing unit and an edge represents a connection between processors over which they communicate. Degree, speed, node coverage, connectivity, diameter, reliability, packet loss, network cost, and so on are some of the measures that can be used to assess the performance of interconnection networks. Variants of hypercube interconnection networks include the Hypercube Network, Folded Hypercube Network, Multiple Reduced Hypercube Network, Multiply Twisted Cube, Recursive Circulant, Exchanged Crossed Cube Network, Half Hypercube Network, and so forth. This work assesses the performance of different variants of hypercube interconnection networks. A set of properties is identified and a weight metric is constructed from these properties to assess performance. Using this weight metric, the performance of the considered variants is evaluated and summarized to identify the most effective variant.
This paper presents a compact survey of some hypercube network variants, their topologies, performance metrics, and performance evaluation. Degree and diameter are used to determine the network cost. If network cost is taken as the metric for evaluating performance, the Multiple Reduced Hypercube (MRH) is the best choice because of its lower cost. However, if other properties or metrics are used to evaluate performance, a variant other than MRH may perform considerably better. The properties considered here may not be sufficient to evaluate the performance of hypercube variants in all respects. If a reasonably large number of properties are used in the evaluation, an efficient variant of the hypercube interconnection network can be identified for a wide range of applications. This is the motivation for this research work.
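As an illustration of how degree and diameter combine into a network cost, the following Python sketch builds the ordinary hypercube and the folded hypercube and measures both quantities by breadth-first search. It is not the authors' weight metric: the abstract does not give a formula, so the sketch assumes the common convention that network cost is degree multiplied by diameter, and the names `neighbors` and `metrics` are hypothetical.

```python
from collections import deque


def neighbors(v, n, folded=False):
    """Neighbors of node v in the n-dimensional hypercube (n >= 2).

    With folded=True, one extra link to the bitwise complement is added,
    which gives the folded hypercube.
    """
    nbrs = [v ^ (1 << i) for i in range(n)]   # flip one bit per dimension
    if folded:
        nbrs.append(v ^ ((1 << n) - 1))       # complement link
    return nbrs


def metrics(n, folded=False):
    """Node count, degree, diameter and (assumed) cost = degree * diameter."""
    # Both graphs look the same from every node, so one BFS from node 0
    # is enough to find the diameter.
    dist = {0: 0}
    queue = deque([0])
    while queue:
        u = queue.popleft()
        for w in neighbors(u, n, folded):
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    degree = len(neighbors(0, n, folded))
    diam = max(dist.values())
    return {"nodes": len(dist), "degree": degree,
            "diameter": diam, "network_cost": degree * diam}


if __name__ == "__main__":
    for n in range(2, 7):
        print(n, metrics(n), metrics(n, folded=True))
```

For n = 4 this gives a cost of 16 for the hypercube (degree 4, diameter 4) and 10 for the folded hypercube (degree 5, diameter 2), showing how one extra link per node can lower the cost under this convention.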


2014 ◽  
Vol 936 ◽  
pp. 2307-2312
Author(s):  
He Li

Because they combine the positive features of both hypercubes and tori, optical multi-mesh hypercube (OMMH) networks are regarded as a promising class of optical interconnection networks for high-performance computers. This paper first derives that the diagnosability of OMMH under the pessimistic strategy is (2n+6)/(2n+6), which shows that OMMH has strong self-diagnosing capability. Building on the improved cycle decomposition method by Yang in J. Parall. Distrib. Comput. [10], a fast diagnosis algorithm tailored for OMMH is also proposed; it identifies all faulty nodes and runs in O(Nlog2N) time, where N is the number of processors of an OMMH.


2011 ◽  
Vol 291-294 ◽  
pp. 3044-3049
Author(s):  
Hong Bo Liang ◽  
Yi Ping Yao ◽  
Xiao Dong Mu

High performance simulation has great prospects for application in the fields of Materials Science and Engineering. In high performance simulation, high performance computers are used to improve simulation performance. As one of the simulation standards, HLA simulation has been widely applied in computer simulation. In the HLA simulation domain, many RTIs have been designed to support simulation in LAN/WAN environments. Because of their general TCP/UDP communication mechanism, these RTIs cannot achieve high simulation performance on high performance computers. To improve simulation performance, a customized RTI for hybrid environments of high performance computers and PCs is designed. By using a partially hierarchical design on a functionally distributed architecture, large-scale simulation can be supported. An adaptive communication mechanism is proposed that automatically adapts communication between RTI components to shared memory, InfiniBand, or Ethernet, and can thus greatly improve communication performance. In addition, this paper explains the related design of this customized RTI.


Author(s):  
Yan Li ◽  
Jidong Zhai ◽  
Keqin Li

With the development of high performance computers, communication performance has become a key factor affecting the performance of HPC applications. Communication patterns can be obtained by analyzing communication traces. However, existing approaches to generating communication traces need to execute entire parallel applications on full-scale systems, which is time-consuming and expensive. Furthermore, designers of large-scale parallel computers strongly desire to predict the performance of a parallel application at the design phase. Despite previous efforts, it remains an open problem to estimate the sequential computation time of each process accurately and efficiently for large-scale parallel applications on target machines that do not yet exist. In this chapter, we introduce a novel technique for fast communication trace collection for large-scale parallel applications and an automatic performance prediction framework with a trace-driven network simulator.


Author(s):  
A. Ferrerón Labari ◽  
D. Suárez Gracia ◽  
V. Viñals Yúfera

In recent years, embedded systems have evolved to offer capabilities previously found only in high performance systems. Portable devices already have multiprocessors on-chip (such as PowerPC 476FP or ARM Cortex A9 MP), usually multi-threaded, and a powerful multi-level cache memory hierarchy on-chip. As most of these systems are battery-powered, power consumption becomes a critical issue. Achieving high performance and low power consumption is a highly complex challenge for which some proposals have already been made. Suarez et al. proposed a new on-chip cache hierarchy, the LP-NUCA (Low Power NUCA), which reduces access latency by taking advantage of NUCA (Non-Uniform Cache Architectures) properties. The key points are decoupling the functionality and using three specialized networks on-chip. This structure has proved efficient for data hierarchies, achieving good performance and reducing energy consumption. On the other hand, instruction caches have different requirements and characteristics from data caches, and these conflict with the requirements of low-power embedded systems, especially in SMT (simultaneous multi-threading) environments. We want to study the benefits of using small tiled caches for the instruction hierarchy, so we propose a new design, ID-LP-NUCAs. Thus, we need to completely re-evaluate our previous design in terms of structure design, interconnection networks (including topologies, flow control and routing), content management (with special interest in hardware/software content allocation policies), and structure sharing. In CMP (chip multiprocessor) environments with parallel workloads, coherence plays an important role and must be taken into consideration.


Author(s):  
Jack Dongarra ◽  
Laura Grigori ◽  
Nicholas J. Higham

A number of features of today’s high-performance computers make it challenging to exploit these machines fully for computational science. These include increasing core counts but stagnant clock frequencies; the high cost of data movement; use of accelerators (GPUs, FPGAs, coprocessors), making architectures increasingly heterogeneous; and multiple precisions of floating-point arithmetic, including half-precision. Moreover, as well as maximizing speed and accuracy, minimizing energy consumption is an important criterion. New generations of algorithms are needed to tackle these challenges. We discuss some approaches that we can take to develop numerical algorithms for high-performance computational science, with a view to exploiting the next generation of supercomputers. This article is part of a discussion meeting issue ‘Numerical algorithms for high-performance computational science’.

