Thermal and Mechanical Design of the Fastest Supercomputer of the World in Cognitive Systems: IBM POWER AC 922

Author(s):  
Anil Yuksel ◽  
Vic Mahaney ◽  
Chris Marroquin ◽  
Shurong Tian ◽  
Mark Hoffmeyer ◽  
...  

Abstract High performance computing (HPC), artificial intelligence (AI), and cognitive systems have initiated a new era of computing. Efficient thermal management of these systems has become vital because of the increasing power density of electronic components. In 2018, IBM delivered Summit, the fastest supercomputer in the world, with 200 petaflops of computing performance on LINPACK benchmarks. The system is both air and water cooled; water cools the high-power components, namely the IBM POWER9 processors and NVIDIA GPUs. In this paper, we give an overview of the thermal and mechanical design strategies applied to these systems. For the air-cooled systems, we discuss the fan and heat sink designs, as well as the preheating effect on the PCI section. The liquid-cooled system has a unique cold plate design that cools the processors and the GPUs with water. We examine the water flow path design for the processor and the GPUs and report the thermal performance of the cold plate. An overview of the cooling assemblies in the servers, such as thermal interface materials (TIMs) and air baffles, is also given. Moreover, unit and rack manifolds are investigated, and flow and pressure distribution at the node and rack level are provided.

2020 ◽  
Vol 143 (1) ◽  
Author(s):  
Anil Yuksel ◽  
Vic Mahaney ◽  
Chris Marroquin ◽  
Shurong Tian ◽  
Mark Hoffmeyer ◽  
...  

Abstract A new era of computing has begun with the development of high-performance computing (HPC), artificial intelligence (AI), machine learning (ML), and cognitive systems. Dramatic increases in the power density of electronic components have driven the design of efficient thermal management technologies for these systems. In 2018, IBM designed and delivered the most powerful and fastest supercomputers in the world, Summit and Sierra, with 200 petaflops of peak computing performance on LINPACK benchmarks. These systems, called IBM POWER AC922, are both air and liquid cooled; in the liquid-cooled systems, water cools the high-power electronic components, including IBM POWER9 processors and NVIDIA graphics processing units (GPUs). In this paper, we give an overview of the thermal and mechanical design strategies applied to these systems. Testing and experimental analysis are compared with computational modeling. Thermal control strategies are investigated to optimize overall system efficiency. For the air-cooled systems, we discuss the fan and heat sink designs, as well as the preheating effect on the PCIe section. For the liquid-cooled systems, which have a unique cold plate design that cools the processors and the GPUs with water, we examine the water flow path design for the central processing units (CPUs) and the GPUs, and the thermal performance of the cold plate. An overview of the cooling assemblies in these systems, such as thermal interface materials (TIMs) and air baffles, is given. Unit and rack manifolds and the rear door heat exchanger (RDHx) are investigated. Water flow and pressure distribution at the node and rack level are provided.


2012 ◽  
Vol 4 (4) ◽  
pp. 68-88
Author(s):  
Chao-Tung Yang ◽  
Wen-Feng Hsieh

This paper implements and evaluates a high-performance computing environment built by clustering idle campus PCs (personal computers) as diskless slave nodes, so as to make the most of the available computing power. Two cluster platforms, BCCD and DRBL, are compared on computing performance, and DRBL is shown to perform better than BCCD in this experiment. DRBL was originally created to support instruction on a free software teaching platform. To that end, DRBL is deployed in a computer classroom with 32 PCs so that the PCs can be switched manually or automatically among different operating systems (OS). The bioinformatics program mpiBLAST also runs smoothly on the cluster architecture. For management, the state of each computation node in the clusters is monitored with Ganglia, an existing open-source tool. The authors gather CPU, memory, and network load information for each computation node in every network segment. By comparing several aspects of performance, including swap performance and different network environments, they attempt to find the best cluster environment for a school computer classroom. Finally, HPL from HPCC is used to demonstrate cluster performance.


Computation ◽  
2020 ◽  
Vol 8 (1) ◽  
pp. 20 ◽  
Author(s):  
Enrico Calore ◽  
Alessandro Gabbana ◽  
Sebastiano Fabio Schifano ◽  
Raffaele Tripiccione

In recent years, the energy efficiency of HPC systems has become of paramount importance for environmental, technical, and economic reasons. Several projects have investigated the use of different processors and accelerators in the quest to build systems able to achieve high energy efficiency in data centers and HPC installations. In this context, the Arm CPU architecture has received a lot of attention given its wide use in low-power and energy-limited applications, but server-grade processors have appeared on the market only recently. In this study, we target the Marvell ThunderX2, one of the latest Arm-based processors developed to fit the requirements of high performance computing applications. Our interest is mainly in its assessment in the context of large HPC installations, and thus we evaluate both computing performance and energy efficiency, using the ERT benchmark and two production-ready HPC applications. Finally, we compare the results with other processors commonly used in large parallel systems and highlight the characteristics of applications that could benefit from the ThunderX2 architecture, in terms of both computing performance and energy efficiency. To this end, we also describe how ERT has been modified and optimized for ThunderX2, and how to monitor power drain while running applications on this processor.


2014 ◽  
Vol 22 (4) ◽  
pp. 259-260 ◽  
Author(s):  
Siegfried Benkner ◽  
Franz Franchetti ◽  
Hans Michael Gerndt ◽  
Jeffrey K. Hollingsworth

High Performance Computing architectures have become incredibly complex and exploiting their full potential is becoming more and more challenging. As a consequence, automatic performance tuning (autotuning) of HPC applications is of growing interest and many research groups around the world are currently involved. Autotuning is still a rapidly evolving research field with many different approaches being taken. This special issue features selected papers presented at the Dagstuhl seminar on “Automatic Application Tuning for HPC Architectures” in October 2013, which brought together researchers from the areas of autotuning and performance analysis in order to exchange ideas and steer future collaborations.


2020 ◽  
Vol 71 (3) ◽  
pp. 263-267
Author(s):  
М. Serik ◽  
G. Zh. Yerlanova ◽  

At present, along with the dynamic development of computer technology in the world, the most effective ways of solving problems of practical importance are being considered. High performance computing takes the lead in this. The development of modern society is therefore closely tied to the training of experienced, modern specialists in the field of information technology. This, in turn, depends on the inclusion of new courses in the curriculum and full coverage of these issues in the content of the taught courses. This article analyzes courses on high performance computing taught at experimental bases and abroad; on that basis, the topics of the special course and the content recommended for implementation in the educational process are determined. During the training, the competencies of students in high performance computing were identified.


Author(s):  
Peter V Coveney

We introduce a definition of Grid computing which is adhered to throughout this Theme Issue. We compare the evolution of the World Wide Web with current aspirations for Grid computing and indicate areas that need further research and development before a generally usable Grid infrastructure becomes available. We discuss work that has been done in order to make scientific Grid computing a viable proposition, including the building of Grids, middleware developments, computational steering and visualization. We review science that has been enabled by contemporary computational Grids, and associated progress made through the widening availability of high performance computing.

