Autonomous Detection of Thermal Anomalies in Data Centers

Author(s):  
Manish Marwah ◽  
Ratnesh K. Sharma ◽  
Wilfredo Lugo

In recent years, there has been significant growth in the number, size, and power density of data centers. A significant part of data center power consumption is attributed to the cooling infrastructure, consisting of computer room air conditioning (CRAC) units, chillers, and cooling towers. For energy-efficient operation and management of the cooling resources, data centers are beginning to be extensively instrumented with temperature sensors. While this allows cooling actuators, such as CRAC set-point temperature, to be dynamically controlled and data centers to be operated at higher temperatures to save energy, it also increases the chances of thermal anomalies. Furthermore, considering that large data centers can contain thousands to tens of thousands of such sensors, it is virtually impossible to manually inspect and analyze the large volumes of dynamic data they generate, necessitating autonomous mechanisms for thermal anomaly detection. In addition to threshold-based detection, other anomaly detection mechanisms are also necessary. In this paper, we describe the thermal anomalies that commonly occur in a data center and, with examples from a production data center, techniques to detect them autonomously. In particular, we show the usefulness of a principal component analysis (PCA) based methodology applied to a large temperature sensor network. Specifically, we examine thermal anomalies such as those related to misconfiguration of equipment, blocked vent tiles, faulty sensors, and CRAC-related faults; several of these anomalies normally go undetected since no temperature thresholds are violated. We present examples of these thermal anomalies and their detection from a real data center.
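
As a rough illustration of the PCA-based idea (not the authors' implementation; the data layout, component count, and threshold below are assumptions), a residual-based detector can flag sensor snapshots that the principal subspace of normal operation fails to reconstruct:

```python
import numpy as np
from sklearn.decomposition import PCA

# Rows: time snapshots, columns: temperature sensors (hypothetical layout).
baseline = np.random.normal(24.0, 0.5, size=(1000, 200))   # normal operation
current = np.random.normal(24.0, 0.5, size=(50, 200))
current[10, 17] += 6.0                                      # injected hot spot

# Fit PCA on baseline behaviour; the number of components is an assumption.
pca = PCA(n_components=10).fit(baseline)

# Reconstruction (residual) error per snapshot, i.e. the SPE / Q-statistic.
reconstructed = pca.inverse_transform(pca.transform(current))
spe = np.sum((current - reconstructed) ** 2, axis=1)

# Flag snapshots whose residual exceeds an empirical baseline percentile.
baseline_spe = np.sum(
    (baseline - pca.inverse_transform(pca.transform(baseline))) ** 2, axis=1)
threshold = np.percentile(baseline_spe, 99.5)
print(np.where(spe > threshold)[0])   # indices of anomalous snapshots
```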

2017 ◽  
Author(s):  
Zhong-Hu Jiao ◽  
Jing Zhao ◽  
Xinjian Shan

Abstract. Detecting thermal anomalies prior to strong earthquakes is key to understanding and forecasting earthquake activity because it recognizes thermal radiation-related phenomena during the seismic preparation phase. Satellite observations serve as a powerful tool for monitoring earthquake preparation areas at a global scale and in near real time. Over the past several decades, many different data sources have been utilized in this field, and increasingly advanced anomaly detection approaches have been developed. This paper reviews the progress and development of pre-seismic thermal anomaly detection technology over the past decade. First, precursor parameters, including parameters from the top of the atmosphere, in the atmosphere, and on the Earth’s surface, are discussed. Second, anomaly detection methods used to extract thermal anomalous signals that may indicate future seismic events are presented. Finally, critical problems with the current research are highlighted, and new trends and perspectives for future work are discussed. The development of Earth observation satellites and anomaly detection algorithms can enrich available information sources, provide advanced tools for multilevel earthquake monitoring, and improve short- and medium-term forecasting, all of which should play a large and growing role in pre-seismic thermal anomaly research.


2020 ◽  
Author(s):  
Arash Karimi Zarchi ◽  
Mohammad Reza Saradjian Maralan

Abstract. Recent scientific studies of earthquake precursors reveal processes connected to seismic activity, including thermal anomalies before earthquakes, which can support better decision-making about this disastrous phenomenon and help minimize casualties. This paper presents a method for grouping the proper input data for different thermal anomaly detection methods, using the mean land surface temperature (LST) at multiple distances from the corresponding fault over a 40-day investigation window (30 days before and 10 days after each earthquake). Six strong earthquakes with Ms > 6 that occurred in Iran were investigated. We used two approaches for detecting thermal anomalies: the mean-standard deviation method (also known as the standard method) and the interquartile method, which is similar but uses different parameters as input. Most previous studies have considered thermal anomalies around known epicentre locations, so the investigation could only be performed after the earthquake. This study instead uses a fault-distance-based approach, treating the areas around known faults as the potential earthquake zones, which is an important step towards actual prediction of an earthquake's time and intensity. Results show that the proposed input data produce fewer false alarms in each of the thermal anomaly detection methods than the ordinary input data, making the approach more accurate and stable, especially given the easy accessibility of thermal data and the relative simplicity of the processing algorithms. In the final step, the detected anomalies are used to estimate earthquake intensity with an artificial neural network (ANN); the estimated intensities of most earthquakes are very close to the actual intensities. Since the locations of active faults are known a priori, the fault-distance-based approach may be regarded as a superior method for predicting impending earthquakes on vulnerable faults. Unlike previous investigations, which were only possible after the fact, it can be used as a tool for predicting future, as yet unknown, earthquakes. However, it is recommended that thermal anomaly detection be used as an initial screening step, jointly with other precursors, to reduce the number of investigations that require more complicated algorithms and data processing.
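
The abstract does not give the exact thresholds; the following sketch shows the two detection methods in their textbook form, with illustrative parameters, applied to a hypothetical 40-day LST series:

```python
import numpy as np

def std_method(lst_series, k=2.0):
    """Mean-standard deviation ("standard") method: flag days whose LST departs
    from the series mean by more than k standard deviations (k is an assumption)."""
    mu, sigma = np.mean(lst_series), np.std(lst_series)
    return np.abs(lst_series - mu) > k * sigma

def interquartile_method(lst_series, k=1.5):
    """Interquartile method: flag days outside the whiskers defined by the first
    and third quartiles; the multiplier k is an illustrative assumption."""
    q1, q3 = np.percentile(lst_series, [25, 75])
    iqr = q3 - q1
    return (lst_series < q1 - k * iqr) | (lst_series > q3 + k * iqr)

# Hypothetical 40-day mean LST series for one fault-distance band (deg C).
lst = np.random.normal(18.0, 1.2, 40)
lst[25] += 5.0   # injected pre-seismic warming
print(np.where(std_method(lst))[0], np.where(interquartile_method(lst))[0])
```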


Author(s):  
Yang Yuan ◽  
Eun Kyung Lee ◽  
Dario Pompili ◽  
Junbi Liao

The high density of servers in datacenters generates a large amount of heat, resulting in a high probability of thermally anomalous events, e.g., computer room air conditioner (CRAC) fan failure, server fan failure, and workload misconfiguration. As such anomalous events increase the cost of maintaining computing and cooling components, they need to be detected, localized, and classified so that appropriate remedial actions can be taken. In this article, a hierarchical neural network framework is proposed to detect small-scale (server-level) and large-scale (datacenter-level) thermal anomalies. This framework, organized into two tiers, analyzes data sensed by heterogeneous sensors, such as sensors built into the servers and external (TelosB) sensors. The proposed solution employs a neural network to learn (a) the relationship among sensing values (i.e., internal temperature, external temperature, and fan speed) and (b) the relationship between the sensing values and workload information. The bottom tier of the framework detects thermal anomalies, whereas the top tier localizes and classifies them. Experimental results show that our solution outperforms other anomaly detection methods based on regression models, support vector machines, and self-organizing maps.
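
A minimal sketch of the two-tier idea under assumed data shapes (this is not the authors' framework; the features, tolerance, and event labels are illustrative): a bottom-tier network learns the normal relationship between sensing values and workload and flags large residuals, and a top-tier classifier labels the datacenter-level event from the per-server residual pattern.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor, MLPClassifier

# Bottom tier: per-server model of normal thermal behaviour.
# Hypothetical features: [external temperature, fan speed, CPU utilization].
X_normal = np.random.rand(500, 3)
t_internal = 20 + 10 * X_normal[:, 0] - 3 * X_normal[:, 1] + 8 * X_normal[:, 2]
bottom = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000).fit(X_normal, t_internal)

def is_anomalous(x, t_measured, tol=2.0):
    """Flag a server if measured internal temperature deviates from the
    prediction by more than tol degrees (tol is an assumption)."""
    return abs(bottom.predict(x.reshape(1, -1))[0] - t_measured) > tol

# Top tier: classify the datacenter-level event from per-server residuals.
# Placeholder training data; real labels would come from logged incidents.
residuals = np.random.rand(200, 40)                       # 40 servers
events = np.random.choice(["crac_fan", "server_fan", "workload"], 200)
top = MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000).fit(residuals, events)
```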


2019 ◽  
Vol 9 (16) ◽  
pp. 3223
Author(s):  
Jargalsaikhan Narantuya ◽  
Taejin Ha ◽  
Jaewon Bae ◽  
Hyuk Lim

In data centers, cloud-based services are usually deployed across multiple virtual machines (VMs), and these VMs have data traffic dependencies on each other. However, traffic dependency between VMs has not been fully considered when the services running in the data center are expanded by creating additional VMs. If highly dependent VMs are placed in different physical machines (PMs), data traffic increases in the underlying physical network of the data center. To reduce the amount of data traffic in the underlying network and improve service performance, we propose a traffic-dependency-based strategy for VM placement in a software-defined data center (SDDC). The traffic dependencies between the VMs are analyzed by principal component analysis, and highly dependent VMs are grouped by gravity-based clustering. Each group of highly dependent VMs is then placed on an appropriate PM using the Hungarian matching method. This dependency-based VM placement reduces the data traffic volume of the data center, since highly dependent VMs are placed within the same PM. The results of a performance evaluation on an SDDC testbed indicate that the proposed VM placement method efficiently reduces the amount of data traffic in the underlying network and improves data center performance.
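
A small sketch of the final assignment step, assuming the dependency analysis and clustering have already produced VM groups and a hypothetical placement cost matrix; SciPy's linear_sum_assignment implements the Hungarian method:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Hypothetical cost matrix: cost[i, j] is the penalty (e.g. expected cross-PM
# traffic or resource mismatch) of placing VM group i on physical machine j.
cost = np.array([
    [4.0, 1.0, 3.0],
    [2.0, 0.5, 5.0],
    [3.0, 2.0, 2.5],
])

# Hungarian method: one-to-one assignment of groups to PMs with minimum total cost.
groups, pms = linear_sum_assignment(cost)
for g, p in zip(groups, pms):
    print(f"VM group {g} -> PM {p} (cost {cost[g, p]})")
```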


Author(s):  
Amip J. Shah ◽  
Van P. Carey ◽  
Cullen E. Bash ◽  
Chandrakant D. Patel

As heat dissipation in data centers rises by orders of magnitude, inefficiencies such as recirculation will have an increasingly significant impact on the thermal manageability and energy efficiency of the cooling infrastructure. For example, prior work has shown that for simple data centers with a single Computer Room Air-Conditioning (CRAC) unit, an operating strategy that fails to account for inefficiencies in the air space can result in suboptimal performance. To enable system-wide optimality, an exergy-based approach to CRAC control has previously been proposed. However, application of such a strategy in a real data center environment is limited by the assumptions inherent to the single-CRAC derivation. This paper addresses these assumptions by modifying the exergy-based approach to account for the additional interactions encountered in a multi-component environment. It is shown that the modified formulation provides the framework necessary to evaluate performance of multi-component data center thermal management systems under widely different operating circumstances.
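
The abstract does not reproduce the formulation; for background, the flow exergy on which such an analysis typically builds, for air treated as an ideal gas with constant specific heat and a reference (dead) state $T_0, p_0$, is

$$\psi = (h - h_0) - T_0\,(s - s_0) \;\approx\; c_p\!\left(T - T_0 - T_0 \ln\frac{T}{T_0}\right) + R\,T_0 \ln\frac{p}{p_0},$$

and a system-wide optimum then corresponds to minimizing the total exergy destroyed by mixing and recirculation across all CRAC units and racks.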


Author(s):  
Cullen Bash ◽  
George Forman

Data center costs for computer power and cooling have been steadily increasing over the past decade. Much work has been done in recent years on understanding how to improve the delivery of cooling resources to IT equipment in data centers, but little attention has been paid to optimizing heat production by considering the placement of application workload. Because certain physical locations inside the data center are more efficient to cool than others, allocating heavy computational workloads onto servers in more efficient locations might bring substantial savings. This paper explores this issue by introducing a workload placement metric that considers the cooling efficiency of the environment. Additionally, results from a set of experiments that utilize this metric in a thermally isolated portion of a real data center are described. The results show that the potential savings are substantial and that further work in this area is needed to exploit the savings opportunity.
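
The abstract does not define the placement metric itself; the toy sketch below only illustrates the general idea of preferring efficiently cooled locations, using a hypothetical per-server cooling cost factor and greedy assignment.

```python
# Hypothetical per-server cooling cost factors (lower = cheaper to cool).
servers = {"rack1-u10": 1.00, "rack3-u02": 1.35, "rack7-u20": 1.80}

def place(jobs, servers):
    """Greedily assign the heaviest jobs to the most efficiently cooled servers."""
    order = sorted(servers, key=servers.get)              # most efficient first
    heaviest_first = sorted(jobs, key=jobs.get, reverse=True)
    return dict(zip(heaviest_first, order))

print(place({"jobA": 400, "jobB": 250, "jobC": 120}, servers))  # job power in W, hypothetical
```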


2019 ◽  
Vol 9 (18) ◽  
pp. 3850 ◽  
Author(s):  
Diogo Macedo ◽  
Radu Godina ◽  
Pedro Dinis Gaspar ◽  
Pedro da Silva ◽  
Miguel Trigueiros Covas

In recent years, reducing energy consumption has been relentlessly pursued by researchers and policy makers with the purpose of achieving a more sustainable future. The demand for data storage in data centers has been steadily increasing, leading to an increase in their size and, consequently, in their energy consumption. Reducing the energy consumption of data center rooms is therefore required, and it is from this perspective that this paper is proposed. Computational Fluid Dynamics (CFD) makes it possible to build a three-dimensional model of the heat transfer and airflow in data centers, which allows forecasting of air speed and temperature under diverse operating conditions. In this paper, a CFD study of the thermal performance and airflow in a real data center processing room with 208 racks under different thermal loads and airflow velocities is presented. The physical-mathematical model relies on the equations of mass, momentum, and energy conservation. The fluid in this study is air, modeled as an ideal gas with constant properties, and turbulence is modeled with the standard k–ε model. The results indicate that it is possible to reduce the thermal load of the server racks by improving the thermal performance and airflow of the data center room, without affecting the correct operation of the server racks located in the sensitive regions of the room.
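
For reference, the governing equations referred to in the abstract take the usual steady Reynolds-averaged form (the paper's exact source terms and boundary conditions are not reproduced here):

$$\nabla\cdot(\rho\,\mathbf{u}) = 0,\qquad
\nabla\cdot(\rho\,\mathbf{u}\,\mathbf{u}) = -\nabla p + \nabla\cdot\!\left[(\mu+\mu_t)\left(\nabla\mathbf{u}+\nabla\mathbf{u}^{\mathsf T}\right)\right] + \rho\,\mathbf{g},$$

$$\nabla\cdot(\rho\,c_p\,\mathbf{u}\,T) = \nabla\cdot\!\left[(\lambda+\lambda_t)\,\nabla T\right] + S_T,\qquad
\mu_t = \rho\,C_\mu\,\frac{k^2}{\varepsilon},$$

where $k$ and $\varepsilon$ are obtained from the two transport equations of the standard k–ε model.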


Energies ◽  
2020 ◽  
Vol 13 (5) ◽  
pp. 1085
Author(s):  
Syed Naeem Haider ◽  
Qianchuan Zhao ◽  
Xueliang Li

Prediction of battery health in data centers plays a significant role in Battery Management Systems (BMS). Data centers use thousands of batteries, whose lifespan inevitably decreases over time. Predicting a battery's degradation status, ideally before the first failure is encountered during a discharge cycle, is critical, yet it is a very difficult task in practice. Therefore, a framework is proposed to improve the accuracy of the Auto-Regressive Integrated Moving Average (ARIMA) model for forecasting battery health using clustered predictors. Clustering approaches, such as Dynamic Time Warping (DTW)-based or k-shape-based clustering, are useful for finding patterns in data sets containing multiple time series. The large number of batteries in a data center is exploited to cluster their voltage patterns, which are then used to improve the accuracy of the ARIMA model. Our results show that the forecasting accuracy of the ARIMA model is significantly improved by applying the clustered predictors to batteries in a real data center. This paper uses the actual historical data of 40 batteries of a large-scale data center over one whole year to validate the effectiveness of the proposed methodology.
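
A simplified sketch of the pipeline (not the authors' exact framework): cluster battery voltage series with a DTW metric, then fit an ARIMA model per cluster; here the model is fit on the cluster mean, and the ARIMA order and forecast horizon are illustrative assumptions.

```python
import numpy as np
from tslearn.clustering import TimeSeriesKMeans      # DTW-based clustering
from statsmodels.tsa.arima.model import ARIMA

# Hypothetical daily voltage series for 40 batteries over one year.
voltages = 12.6 - 0.001 * np.arange(365) + 0.05 * np.random.randn(40, 365)

# Group batteries with similar voltage patterns (DTW distance).
labels = TimeSeriesKMeans(n_clusters=3, metric="dtw",
                          random_state=0).fit_predict(voltages)

# Fit one ARIMA per cluster and forecast the next week of cluster-mean voltage.
for c in range(3):
    cluster_mean = voltages[labels == c].mean(axis=0)
    model = ARIMA(cluster_mean, order=(2, 1, 1)).fit()
    print(f"cluster {c}: next-week forecast {model.forecast(steps=7)}")
```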


Author(s):  
A. Sledz ◽  
C. Heipke

Abstract. Thermal anomaly detection plays an important role in remote sensing, and one of the most widely used instruments for this task is a thermal infrared (TIR) camera. In this work, thermal anomaly detection is formulated as salient region detection, motivated by the assumption that a hot region often attracts the attention of the human eye in thermal infrared images. Using TIR and optical images together, our working hypothesis is defined as follows: a hot region that appears as a salient region only in the TIR image and not in the optical image is a thermal anomaly. This work presents a two-step classification method for thermal anomaly detection based on information fusion of saliency maps derived from both TIR and optical images. In the first phase, information fusion based on Dempster-Shafer evidence theory is used to find the locations of regions suspected to be thermal anomalies; this classification problem is formulated as a multi-class problem and is carried out in an unsupervised manner at the pixel level. In the second phase, classification is formulated as a binary region-based problem in order to differentiate between normal temperature variations and thermal anomalies, with Random Forest (RF) as the classifier; the classification results from the first phase are used as features, along with temperature information and height details obtained from a Digital Surface Model (DSM). We tested the approach on a dataset collected from a UAV carrying TIR and optical cameras for monitoring District Heating Systems (DHS). Despite some limitations outlined in the paper, the presented method achieves up to 98.7 percent overall accuracy in identifying thermal anomalies.
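
For reference (the abstract does not spell it out), Dempster-Shafer fusion of two bodies of evidence with mass functions $m_1$ and $m_2$, such as the TIR-derived and optical-derived saliency cues, uses the standard combination rule

$$m_{1,2}(A) = \frac{1}{1-K}\sum_{B \,\cap\, C = A} m_1(B)\, m_2(C), \qquad
K = \sum_{B \,\cap\, C = \emptyset} m_1(B)\, m_2(C), \qquad A \neq \emptyset,$$

where $K$ measures the conflict between the two sources of evidence.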


Sensors ◽  
2018 ◽  
Vol 18 (9) ◽  
pp. 3137 ◽  
Author(s):  
Fei Li ◽  
Lei Zhang ◽  
Xiuwei Zhang ◽  
Yanjia Chen ◽  
Dongmei Jiang ◽  
...  

Background modeling has been proven to be a promising approach to hyperspectral anomaly detection. However, due to cluttered imaging scenes, modeling the background of a hyperspectral image (HSI) is often challenging. To mitigate this problem, we propose a novel structured background modeling-based hyperspectral anomaly detection method, which improves detection accuracy by exploiting the block-diagonal structure of the background. Specifically, to conveniently model the multi-mode characteristics of the background, we divide the full-band patches in an HSI into different background clusters according to their spatial-spectral features. A spatial-spectral background dictionary is then learned for each cluster with a principal component analysis (PCA) learning scheme. When represented on those dictionaries, the background tends to exhibit a block-diagonal structure, while the anomalous target shows a sparse structure. In light of this observation, we develop a low-rank representation based anomaly detection framework that separates the sparse anomaly from the block-diagonal background. To optimize this framework effectively, we adopt the standard alternating direction method of multipliers (ADMM) algorithm. In extensive experiments on both synthetic and real-world datasets, the proposed method achieves a clear improvement in detection accuracy compared with several state-of-the-art hyperspectral anomaly detection methods.
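
The abstract does not state the objective function; a generic low-rank representation model of the kind referred to, with learned dictionary $D$, representation $Z$, and sparse anomaly part $E$, reads

$$\min_{Z,\,E}\ \|Z\|_* + \lambda \|E\|_{2,1} \quad \text{s.t.} \quad X = D Z + E,$$

where the nuclear norm promotes a low-rank (block-diagonal, given clustered dictionaries) background representation, the $\ell_{2,1}$ norm promotes column-wise sparse anomalies, and ADMM alternately updates $Z$, $E$, and the Lagrange multipliers.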

