Research advances on AI-powered thermal management for data centers

2022 ◽  
Vol 27 (2) ◽  
pp. 303-314
Author(s):  
Hui Liu ◽  
AbdusSalam Aljbri ◽  
Jie Song ◽  
Jinqing Jiang ◽  
Chun Hua

Author(s):  
Tianyi Gao ◽  
James Geer ◽  
Bahgat G. Sammakia ◽  
Russell Tipton ◽  
Mark Seymour

Cooling accounts for a large portion of the total electrical power consumption in data centers: approximately 25% to 40% of the electricity used within a production data center is consumed by the cooling system. Improving cooling energy efficiency has therefore attracted a great deal of research attention, and many strategies have been proposed for cutting data center energy costs. One effective strategy for increasing cooling efficiency is dynamic thermal management. Another is placing cooling devices (heat exchangers) closer to the source of heat, which is the basic design principle of many hybrid and liquid cooling systems for data centers. Dynamic thermal management of data centers is a major challenge because data centers operate under complex dynamic conditions, even during normal operation. In addition, hybrid cooling systems introduce additional localized cooling devices, such as in-row cooling units and overhead coolers, which significantly increase the complexity of dynamic thermal management. It is therefore important to characterize the dynamic responses of data centers to variations in the different cooling units, such as variations in cooling air flow rate. In this study, a detailed computational analysis of an in-row-cooler-based hybrid-cooled data center is conducted using a commercially available computational fluid dynamics (CFD) code. A representative CFD model of a raised-floor data center with a cold aisle-hot aisle arrangement is developed. The hybrid cooling system is designed using perimeter CRAH units and localized in-row cooling (IRC) units. The CRAH unit supplies centralized cooling air to the under-floor plenum, and the cooling air enters the cold aisle through perforated tiles. The in-row cooling unit is located on the raised floor between the server racks. It supplies cooling air directly to the cold aisle and draws in hot air from the back of the racks (hot aisle). The cold aisle is therefore supplied by two different cooling air sources that are delivered in different ways. Several modeling cases are designed to study the transient effects of variations in the flow rates of the two cooling air sources. Scenarios combining server power and cooling air flow variations are also modeled and studied. The detailed impacts of each modeling case on the rack inlet air temperature and cold aisle air flow distribution are studied. The results presented in this work provide an understanding of the effects of air flow variations on the thermal performance of data centers. The results and corresponding analysis are used to improve the running efficiency of this type of raised-floor hybrid data center using CRAH and IRC units.


Author(s):  
Amip J. Shah ◽  
Van P. Carey ◽  
Cullen E. Bash ◽  
Chandrakant D. Patel

Data centers today contain more computing and networking equipment than ever before. As a result, a higher amount of cooling is required to maintain facilities within operable temperature ranges. Increasing amounts of resources are spent to achieve thermal control, and tremendous potential benefit lies in the optimization of the cooling process. This paper describes a study performed on data center thermal management systems using the thermodynamic concept of exergy. Specifically, an exergy analysis has been performed on sample data centers in an attempt to identify local and overall inefficiencies within thermal management systems. The development of a model using finite volume analysis has been described, and potential applications to real-world systems have been illustrated. Preliminary results suggest that such an exergy-based analysis can be a useful tool in the design and enhancement of thermal management systems.
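For orientation, the exergy of an air stream is usually quantified with the standard textbook expression for specific flow exergy, sketched below. This is the general thermodynamic definition and not necessarily the exact formulation used by the authors. The symbols (h, s, T, c_p, and the reference or "dead" state marked with subscript 0) are introduced here for illustration, and the second line assumes ideal-gas air with constant specific heat and negligible pressure, kinetic, and potential energy changes.

```latex
% Specific flow exergy relative to a reference state (T_0, h_0, s_0),
% neglecting kinetic and potential energy terms
\psi = (h - h_0) - T_0\,(s - s_0)

% Ideal gas, constant c_p, negligible pressure difference from the reference state
\psi \approx c_p \left[ (T - T_0) - T_0 \ln\frac{T}{T_0} \right]
```

Under these assumptions, exergy is destroyed whenever hot and cold air streams mix, which is the kind of inefficiency the abstract describes as the target of an exergy analysis.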


Author(s):  
Bahgat Sammakia ◽  
Yogendra Joshi ◽  
Dereje Agonafer ◽  
Emad Samadiani ◽  
Avram Bar-Cohen

Author(s):  
Ratnesh Sharma ◽  
Rocky Shih ◽  
Chandrakant Patel ◽  
John Sontag

Data centers are the computational hubs of the next generation. Rising demand for computing has driven the emergence of high-density data centers. With the advent of high-density, mission-critical data centers, demand for electrical power for compute and cooling has grown. Deployment of a large number of high-powered computer systems in very dense rack configurations within data centers will result in very high power densities at room level. Hosting business and mission-critical applications also demands a high degree of reliability and flexibility. Managing such high power levels in the data center with cost-effective, reliable cooling solutions is essential to the feasibility of pervasive compute infrastructure. Energy consumption of data centers can also be severely increased by over-designed air handling systems and rack layouts that allow the hot and cold air streams to mix. The absence of rack-level temperature monitoring has contributed to a lack of knowledge of air flow patterns and thermal management issues in conventional data centers. In this paper, we present results from exploratory data analysis (EDA) of rack-level temperature data collected over a period of several months from a conventional production data center. Typical data centers experience surges in power consumption due to the rise and fall in compute demand. These surges can be long term, short term or periodic, leading to associated thermal management challenges. Some variations may also be machine-dependent and vary across the data center. Yet other thermal perturbations may be localized and momentary. Random variations due to sensor response and calibration, if not identified, may lead to erroneous conclusions and expensive faults. Among other indicators, EDA techniques also reveal relationships among sensors and deployed hardware in space and time. Identification of such patterns can provide significant insight into data center dynamics for future forecasting purposes. Knowledge of such metrics enables energy-efficient thermal management by helping to create strategies for normal operation and disaster recovery for use with techniques like dynamic smart cooling.


Author(s):  
Jimil M. Shah ◽  
Ravya Dandamudi ◽  
Chinmay Bhatt ◽  
Pranavi Rachamreddy ◽  
Pratik Bansode ◽  
...  

In today’s networked world, utilization of servers and data centers has been increasing significantly. Increasing demand for processing and storage of data causes a corresponding increase in the power density of servers. Data center energy efficiency largely depends on the thermal management of servers. Currently, air cooling is the most widely used thermal management technology in data centers. However, air cooling has started to reach its limits due to high-powered processors. To overcome these limitations, liquid immersion cooling using different dielectric fluids can be a viable option. Thermal shadowing is an effect in which the temperature of a cooling medium rises as it carries heat away from one source, which reduces its heat-carrying capacity at the next heat sink because the temperature difference between that heat sink's maximum junction temperature and the incoming fluid is smaller. Thermal shadowing is a challenge for both air cooling and low-velocity oil flow cooling. In this study, the impact of thermal shadowing in a third-generation open compute server is compared for different dielectric fluids. The heat sink is a critical part of cooling effectiveness at the server level. This work also provides an efficient range of heat sinks through computational modelling of a third-generation open compute server. Optimizing the heat sink can allow high-power-density servers to be cooled effectively in single-phase immersion cooling applications. A parametric study is conducted, and significant savings in heat sink volume are reported.


Author(s):  
Long Phan ◽  
Beichao Hu ◽  
Cheng-Xian Lin

Due to the rapid growth in IT demands over the past few decades, the market for data centers has also grown dramatically. However, thermal management remains a major issue in the design of large-scale data centers. Although best practices such as perforated tiles combined with a hot and cold aisle configuration are deployed to improve thermal management, thermal hotspots are inevitable in IT racks, and they cause equipment failures and signal interruptions. Thermal hotspots in air-cooled data centers are due to many factors, such as insufficient cold air supply from the raised-floor plenum, air recirculation from the hot aisle into the cold aisle, and airflow non-uniformity at the perforated tiles. One way to mitigate such issues is to distribute the cold air uniformly by properly controlling the airflow rate through the perforated tiles. In this study, a validation study for a tile-to-rack airflow rate ratio of 20% is carried out using an adopted tile model. In addition, several turbulence models are thoroughly investigated, and recommendations are provided for the most accurate and least time-consuming turbulence model when applied to a single-rack model.


Author(s):  
Amip J. Shah ◽  
Van P. Carey ◽  
Cullen E. Bash ◽  
Chandrakant D. Patel

As heat dissipation in data centers rises by orders of magnitude, inefficiencies such as recirculation will have an increasingly significant impact on the thermal manageability and energy efficiency of the cooling infrastructure. For example, prior work has shown that for simple data centers with a single Computer Room Air-Conditioning (CRAC) unit, an operating strategy that fails to account for inefficiencies in the air space can result in suboptimal performance. To enable system-wide optimality, an exergy-based approach to CRAC control has previously been proposed. However, application of such a strategy in a real data center environment is limited by the assumptions inherent to the single-CRAC derivation. This paper addresses these assumptions by modifying the exergy-based approach to account for the additional interactions encountered in a multi-component environment. It is shown that the modified formulation provides the framework necessary to evaluate performance of multi-component data center thermal management systems under widely different operating circumstances.


Author(s):  
Chandrakant Patel ◽  
Ratnesh Sharma ◽  
Cullen Bash ◽  
Sven Graupner

Computing will be pervasive, and enablers of pervasive computing will be data centers housing computing, networking and storage hardware. The data center of tomorrow is envisaged as one containing thousands of single board computing systems deployed in racks. A data center, with 1000 racks, over 30,000 square feet, would require 10 MW of power to power the computing infrastructure. At this power dissipation, an additional 5 MW would be needed by the cooling resources to remove the dissipated heat. At $100/MWh, the cooling alone would cost $4 million per annum for such a data center. The concept of Computing Grid, based on coordinated resource sharing and problem solving in dynamic, multi-institutional virtual organizations, is emerging as the new paradigm in distributed and pervasive computing for scientific as well as commercial applications. We envision a global network of data centers housing an aggregation of computing, networking and storage hardware. The increased compaction of such devices in data centers has created thermal and energy management issues that inhibit sustainability of such a global infrastructure. In this paper, we propose the framework of Energy Aware Grid that will provide a global utility infrastructure explicitly incorporating energy efficiency and thermal management among data centers. Designed around an energy-aware co-allocator, workload placement decisions will be made across the Grid, based on data center energy efficiency coefficients. The coefficient, evaluated by the data center’s resource allocation manager, is a complex function of the data center thermal management infrastructure and the seasonal and diurnal variations. A detailed procedure for implementation of a test case is provided with an estimate of energy savings to justify the economics. An example workload deployment shown in the paper aspires to seek the most energy efficient data center in the global network of data centers. 
The locality-based energy efficiency in a data center is shown to arise from the use of ground-coupled loops in cold climates to lower the ambient temperature for heat rejection, e.g., computing and rejecting heat from a data center at a nighttime ambient of 20°C in New Delhi, India, while Phoenix, USA, is at 45°C. The efficiency of the cooling system in the New Delhi data center is derived from the lower lift from evaporator to condenser. Besides the obvious advantage due to the external ambient, the paper also incorporates techniques that rate the efficiency arising from the internal thermo-fluids behavior of a data center in workload placement decisions.
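The cooling cost quoted in the abstract above (5 MW of cooling at $100/MWh, roughly $4 million per annum) can be checked with a few lines of arithmetic. The sketch below assumes a constant load running all year and an 8,760-hour year; the function name `annual_energy_cost` is a hypothetical helper introduced for illustration and does not come from the paper.

```python
# Assumption: constant load, 365-day year (8,760 hours), flat electricity price.
HOURS_PER_YEAR = 24 * 365


def annual_energy_cost(power_mw: float, price_per_mwh: float) -> float:
    """Return the yearly electricity cost in dollars for a constant load."""
    energy_mwh = power_mw * HOURS_PER_YEAR
    return energy_mwh * price_per_mwh


# Figures taken from the abstract: 10 MW compute, 5 MW cooling, $100/MWh.
cooling_cost = annual_energy_cost(5.0, 100.0)
compute_cost = annual_energy_cost(10.0, 100.0)

print(f"cooling: ${cooling_cost:,.0f} per year")
print(f"compute: ${compute_cost:,.0f} per year")
```

Under these assumptions the cooling bill comes to about $4.4 million per year, consistent with the rounded figure of $4 million given in the abstract.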

