High-performance algebraic multigrid solver optimized for multi-core based distributed parallel systems

Author(s):  
Jongsoo Park ◽  
Mikhail Smelyanskiy ◽  
Ulrike Meier Yang ◽  
Dheevatsa Mudigere ◽  
Pradeep Dubey
Computation ◽  
2020 ◽  
Vol 8 (1) ◽  
pp. 20 ◽  
Author(s):  
Enrico Calore ◽  
Alessandro Gabbana ◽  
Sebastiano Fabio Schifano ◽  
Raffaele Tripiccione

In recent years, the energy efficiency of HPC systems has become increasingly important for environmental, technical, and economic reasons. Several projects have investigated the use of different processors and accelerators in the quest to build systems that achieve high energy efficiency for data centers and HPC installations. In this context, the Arm CPU architecture has received a lot of attention given its wide use in low-power and energy-limited applications, but server-grade processors have appeared on the market only recently. In this study, we targeted the Marvell ThunderX2, one of the latest Arm-based processors developed to fit the requirements of high-performance computing applications. Our main interest is in assessing it in the context of large HPC installations, and thus we evaluated both computing performance and energy efficiency, using the ERT benchmark and two production-ready HPC applications. Finally, we compared the results with other processors commonly used in large parallel systems and highlighted the characteristics of applications that could benefit from the ThunderX2 architecture, in terms of both computing performance and energy efficiency. To this end, we also describe how ERT has been modified and optimized for ThunderX2, and how to monitor power drain while running applications on this processor.


2004 ◽  
Author(s):  
Hongyu Chen ◽  
Chung-Kuan Cheng ◽  
Nan-chi Chou ◽  
A.B. Kahng

VLSI Design ◽  
1995 ◽  
Vol 2 (4) ◽  
pp. 305-314 ◽  
Author(s):  
Peter W. Thompson ◽  
Julian D. Lewis

High-performance parallel systems demand a high-performance interconnect so that their component parts can exchange data and synchronise efficiently. The interconnect must be cheap, and must also scale well in both performance and cost relative to the system size. In this paper we describe the rationale, architecture and operation of the STC104, the first commercially available, general-purpose interconnect chip. The serial protocols used by the device are described, followed by an overview of the microarchitecture. The operation of the fundamental block is outlined, including the response to error conditions. Chip-wide design issues and design methodology are discussed, and finally various aspects of performance are calculated.

