Increasing the energy efficiency of a data center based on machine learning

Author(s):  
Zhen Yang ◽  
Jinhong Du ◽  
Yiting Lin ◽  
Zhen Du ◽  
Li Xia ◽  
...  
Energies ◽  
2020 ◽  
Vol 13 (17) ◽  
pp. 4378
Author(s):  
Anastasiia Grishina ◽  
Marta Chinnici ◽  
Ah-Lian Kor ◽  
Eric Rondeau ◽  
Jean-Philippe Georges

The energy efficiency of Data Center (DC) operations heavily relies on the DC ambient temperature as well as the performance of its IT and cooling systems. A reliable and efficient cooling system is necessary to produce a persistent flow of cold air to cool servers that are subjected to constantly increasing computational load due to the advent of smart cloud-based applications. Consequently, the increased demand for computing power will inevitably increase the waste heat generated by servers in data centers. To improve a DC thermal profile, which directly influences the energy efficiency and reliability of IT equipment, it is imperative to analyze the thermal characteristics of the IT room. This work employs an unsupervised machine learning technique to uncover weaknesses of a DC cooling system based on real DC thermal monitoring data. The findings of the analysis identify areas for thermal management and cooling improvement that further feed into DC recommendations. To identify overheated zones in the DC IT room and the corresponding servers, we analyzed the thermal characteristics of the IT room. The experimental dataset includes measurements of the ambient air temperature in the hot aisle of the IT room at the ENEA Portici research center, which hosts the CRESCO6 computing cluster. We use machine learning clustering techniques to identify overheated locations and categorize computing nodes based on surrounding air temperature ranges abstracted from the data. The principles and approaches employed in this work are replicable for the analysis of the thermal characteristics of any DC, thereby fostering transferability. This paper demonstrates how best practices and guidelines can be applied for the thermal analysis and profiling of a commercial DC based on real thermal monitoring data.
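
A minimal sketch of the kind of temperature clustering described above, assuming a table of per-node hot-aisle readings; the file name, column names, and the choice of k-means with three clusters are illustrative assumptions, not the paper's exact setup:

```python
# Minimal sketch: cluster compute nodes by hot-aisle ambient temperature.
# Assumes a CSV with one row per sensor reading: node_id, timestamp,
# temperature_c (all names are hypothetical).
import pandas as pd
from sklearn.cluster import KMeans

readings = pd.read_csv("hot_aisle_temperatures.csv")

# Summarize each node by its mean and peak surrounding air temperature.
per_node = readings.groupby("node_id")["temperature_c"].agg(["mean", "max"])

# Cluster nodes into temperature ranges; 3 clusters roughly map to
# cool / warm / overheated zones.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
per_node["cluster"] = kmeans.fit_predict(per_node[["mean", "max"]])

# The cluster whose centroid has the highest mean temperature marks
# the candidate overheated locations.
hot_cluster = kmeans.cluster_centers_[:, 0].argmax()
print(per_node[per_node["cluster"] == hot_cluster].sort_values("max", ascending=False))
```

Nodes that fall into the hottest cluster would be the first candidates for airflow, workload, or cooling adjustments.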


Author(s):  
Marta Chinnici ◽  
Anastasiia Grishina ◽  
Ah-Lian Kor ◽  
Eric Rondeau ◽  
Jean-Philippe Georges

The energy efficiency of Data Center (DC) operations heavily relies on the performance of the IT and cooling systems. A reliable and efficient cooling system is necessary to produce a persistent flow of cold air to cool servers that are subjected to constantly increasing computational load due to the advent of IoT-enabled smart systems. Consequently, the increased demand for computing power will bring about increased waste heat dissipation in data centers. In order to improve DC energy efficiency, it is imperative to analyze the thermal characteristics of the IT room affected by this waste heat. This work employs an unsupervised machine learning modelling technique to uncover weaknesses of the DC cooling system based on real DC thermal monitoring data. The findings of the analysis identify areas for energy efficiency improvement that will feed into DC recommendations. The methodology employed for this research includes statistical analysis of IT room thermal characteristics and the identification of individual servers that frequently occur in the hotspot zones. A critical analysis has been conducted on an available big dataset of ambient air temperature measurements from the hot aisle of the ENEA Portici CRESCO6 computing cluster. Clustering techniques have been used for hotspot localization as well as the categorization of nodes based on surrounding air temperature ranges. The principles and approaches covered in this work are replicable for the energy efficiency evaluation of any DC and thus foster transferability. This work showcases the applicability of best practices and guidelines in the context of a real commercial DC and transcends the set of existing metrics for DC energy efficiency assessment.
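
A complementary sketch of the hotspot-frequency idea mentioned above (identifying servers that frequently occur in hotspot zones), assuming the same hypothetical reading format; the threshold value is purely illustrative and not taken from the paper:

```python
# Hypothetical sketch: flag servers that frequently appear in hotspot readings.
# Assumes a CSV with node_id, timestamp, temperature_c columns (names hypothetical).
import pandas as pd

readings = pd.read_csv("hot_aisle_temperatures.csv")
HOTSPOT_THRESHOLD_C = 32.0  # illustrative site-specific threshold

# Fraction of samples in which each node's surrounding air exceeds the threshold.
hot_fraction = (
    readings.assign(is_hot=readings["temperature_c"] > HOTSPOT_THRESHOLD_C)
    .groupby("node_id")["is_hot"]
    .mean()
    .sort_values(ascending=False)
)

# Nodes hot in more than half of their samples are candidates for cooling review.
print(hot_fraction[hot_fraction > 0.5])
```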


2018 ◽  
Author(s):  
Tao Wang ◽  
Yuhua Li ◽  
Huan Liu ◽  
Lei Zhang ◽  
Yuyan Jiang ◽  
...  

Author(s):  
Mark Endrei ◽  
Chao Jin ◽  
Minh Ngoc Dinh ◽  
David Abramson ◽  
Heidi Poxon ◽  
...  

Rising power costs and constraints are driving a growing focus on the energy efficiency of high performance computing systems. The unique characteristics of a particular system and workload, and their effect on performance and energy efficiency, are typically difficult for application users to assess and to control. Settings for optimum performance and energy efficiency can also diverge, so we need to identify trade-off options that guide a suitable balance between energy use and performance. We present statistical and machine learning models that only require a small number of runs to make accurate Pareto-optimal trade-off predictions using parameters that users can control. We study model training and validation using several parallel kernels and more complex workloads, including Algebraic Multigrid (AMG), the Large-scale Atomic/Molecular Massively Parallel Simulator (LAMMPS), and Livermore Unstructured Lagrangian Explicit Shock Hydrodynamics (LULESH). We demonstrate that we can train the models using as few as 12 runs, with prediction error of less than 10%. Our AMG results identify trade-off options that provide up to 45% improvement in energy efficiency for around 10% performance loss. We reduce the sample measurement time required for AMG by 90%, from 13 h to 74 min.
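
A minimal sketch of the Pareto-optimal trade-off selection the abstract refers to; the configuration names and runtime/energy figures below are made up for illustration, and the paper's actual models predict such points from a few runs rather than measuring every configuration:

```python
# Minimal sketch: keep only energy/performance configurations that are
# Pareto-optimal (not dominated in both runtime and energy).
from typing import List, Tuple

def pareto_front(runs: List[Tuple[str, float, float]]) -> List[Tuple[str, float, float]]:
    """Return configurations not dominated in both runtime (s) and energy (J)."""
    front = []
    for name, time_s, energy_j in runs:
        dominated = any(
            t <= time_s and e <= energy_j and (t < time_s or e < energy_j)
            for _, t, e in runs
        )
        if not dominated:
            front.append((name, time_s, energy_j))
    return front

# Illustrative measurements (hypothetical): (configuration, runtime s, energy J).
runs = [
    ("32 threads, 2.4 GHz", 70.0, 5200.0),   # fastest, most energy
    ("32 threads, 1.8 GHz", 90.0, 4300.0),   # balanced
    ("16 threads, 1.8 GHz", 120.0, 3900.0),  # slowest, least energy
    ("16 threads, 2.4 GHz", 130.0, 4800.0),  # dominated: slower and costlier
]
for cfg in pareto_front(runs):
    print(cfg)
```

The surviving points trace the trade-off curve from which a user can pick, for example, a modest performance loss in exchange for a large energy saving.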


2018 ◽  
Vol 8 (4) ◽  
pp. 34 ◽  
Author(s):  
Vishal Saxena ◽  
Xinyu Wu ◽  
Ira Srivastava ◽  
Kehan Zhu

The ongoing revolution in Deep Learning is redefining the nature of computing, driven by the increasing number of pattern classification and cognitive tasks. Specialized digital hardware for deep learning still holds its predominance due to the flexibility offered by software implementations and the maturity of algorithms. However, it is increasingly desirable for cognitive computing to occur at the edge, i.e., on hand-held devices that are energy constrained, which is energy prohibitive when employing digital von Neumann architectures. Recent explorations in digital neuromorphic hardware have shown promise, but offer a neurosynaptic density too low for scaling to applications such as intelligent cognitive assistants (ICAs). Large-scale integration of nanoscale emerging memory devices with Complementary Metal Oxide Semiconductor (CMOS) mixed-signal integrated circuits can herald a new generation of neuromorphic computers that will transcend the von Neumann bottleneck for cognitive computing tasks. Such hybrid Neuromorphic System-on-a-Chip (NeuSoC) architectures promise machine learning capability at chip-scale form factor, and several orders of magnitude improvement in energy efficiency. Practical demonstration of such architectures has been limited because the performance of emerging memory devices falls short of the behavior expected from idealized memristor-based analog synapses, or weights, and novel machine learning algorithms are needed to take advantage of the actual device behavior. In this article, we review the challenges involved and present a pathway to realize large-scale mixed-signal NeuSoCs, from device arrays and circuits to spike-based deep learning algorithms with ‘brain-like’ energy-efficiency.
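
As a rough software illustration of the spike-based computation such NeuSoCs target, a minimal leaky integrate-and-fire (LIF) neuron is sketched below; all constants are illustrative placeholders, not parameters of any device described in the article:

```python
# Minimal sketch of a leaky integrate-and-fire (LIF) neuron, the basic unit of
# the spiking networks mapped onto neuromorphic hardware.
import numpy as np

def simulate_lif(input_current, dt=1e-3, tau_m=20e-3,
                 v_rest=0.0, v_thresh=1.0, v_reset=0.0):
    """Return the membrane trace and spike times for a given current drive."""
    v = v_rest
    trace, spikes = [], []
    for step, i_in in enumerate(input_current):
        # Leaky integration of the input current toward the resting potential.
        v += dt / tau_m * (v_rest - v + i_in)
        if v >= v_thresh:          # threshold crossing emits a spike ...
            spikes.append(step * dt)
            v = v_reset            # ... and resets the membrane potential
        trace.append(v)
    return np.array(trace), spikes

# A constant supra-threshold drive produces a regular spike train.
trace, spikes = simulate_lif(np.full(200, 1.5))
print(f"{len(spikes)} spikes in {len(trace)} ms")
```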


2020 ◽  
Vol 48 (2) ◽  
pp. 21-23
Author(s):  
Boudewijn R. Haverkort ◽  
Felix Finkbeiner ◽  
Pieter-Tjerk de Boer
