Data Center Equipment Reliability Concerns—Contamination Issues, Standards Actions, and Case Studies

Author(s):  
Chris Muller ◽  
Chuck Arent ◽  
Henry Yu

Abstract Lead-free manufacturing regulations, reductions in circuit board feature sizes, and the miniaturization of components to improve hardware performance have combined to make data center IT equipment more prone to attack by corrosive contaminants. Manufacturers are under pressure to control contamination in the data center environment, and maintaining acceptable limits is now critical to the continued reliable operation of datacom and IT equipment. This paper will discuss ongoing reliability issues with electronic equipment in data centers and will present updates on contamination concerns, standards activities, and case studies from several different locations illustrating the successful application of contamination assessment, control, and monitoring programs to eliminate electronic equipment failures.

Author(s):  
Prabjit Singh ◽  
Levente Klein ◽  
Dereje Agonafer ◽  
Jimil M. Shah ◽  
Kanan D. Pujara

The energy used by information technology (IT) equipment and the supporting data center equipment keeps rising as data center proliferation continues unabated. To contain rising computing costs, data center administrators are resorting to cost-cutting measures such as not tightly controlling temperature and humidity levels and, in many cases, installing airside economizers, with the associated risk of introducing particulate and gaseous contamination into their data centers. The ASHRAE TC9.9 subcommittee on Mission Critical Facilities, Data Centers, Technology Spaces, and Electronic Equipment has accommodated data center administrators by allowing short-period excursions outside the recommended temperature-humidity range, into allowable classes A1-A3. Under worst-case conditions, the ASHRAE A3 envelope allows electronic equipment to operate at temperature and humidity as high as 24°C and 85% relative humidity for short, but undefined, periods of time. This paper addresses the IT equipment reliability issues arising from operation in high-humidity and high-temperature conditions, with particular attention paid to the question of whether it is possible to determine all-encompassing x-factors that can capture the effects of temperature and relative humidity on equipment reliability. The role of particulate and gaseous contamination and the aggravating effects of high temperature and high relative humidity will be presented and discussed. A method to determine the temperature and humidity x-factors, based on testing in experimental data centers located in polluted geographies, will be proposed.
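
The abstract does not specify the functional form of the x-factors; in the reliability literature, Peck's temperature-humidity model is a common starting point for this kind of acceleration analysis. The sketch below is a minimal illustration of computing a relative failure-rate multiplier under that model; the activation energy and humidity exponent are illustrative assumptions, not values from the paper.

```python
import math

K_B = 8.617e-5  # Boltzmann constant, eV/K

def peck_x_factor(t_c, rh, t_ref_c=18.0, rh_ref=50.0, ea=0.7, n=2.7):
    """Relative failure-rate multiplier (x-factor) of (t_c, rh) versus a
    reference condition, using Peck's temperature-humidity model:
        AF = (RH / RH_ref)^n * exp(Ea/k * (1/T_ref - 1/T))
    ea (activation energy, eV) and n (humidity exponent) are
    illustrative placeholders, not fitted values from the paper.
    """
    t = t_c + 273.15
    t_ref = t_ref_c + 273.15
    return (rh / rh_ref) ** n * math.exp(ea / K_B * (1.0 / t_ref - 1.0 / t))

# Worst-case ASHRAE A3 excursion cited in the abstract: 24 deg C, 85% RH
print(f"x-factor at 24C/85%RH: {peck_x_factor(24.0, 85.0):.2f}")
```

In the spirit of the proposed method, both parameters would be fitted to failure data from the experimental data centers in polluted geographies rather than assumed.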


2020 ◽  
Vol 142 (2) ◽  
Author(s):  
Oluwaseun Awe ◽  
Jimil M. Shah ◽  
Dereje Agonafer ◽  
Prabjit Singh ◽  
Naveen Kannan ◽  
...  

Abstract Airside economizers lower the operating cost of data centers by reducing or eliminating mechanical cooling. They do, however, increase the risk of reliability degradation of information technology (IT) equipment due to contaminants. IT equipment manufacturers have tested equipment performance and guarantee the reliability of their equipment in environments within ISA 71.04-2013 severity level G1 and the ASHRAE recommended temperature-relative humidity (RH) envelope. IT equipment manufacturers require data center operators to meet all the specified conditions consistently before fulfilling a warranty on equipment failure. To determine the reliability of electronic hardware in higher-severity conditions, field data obtained from real data centers are required. In this study, a corrosion classification coupon experiment per ISA 71.04-2013 was performed to determine the severity level of a research data center (RDC) located in an industrial area of hot and humid Dallas. The temperature-RH excursions were analyzed based on time series and weather data bin analysis using trend data for the duration of operation. After some period, a failure was recorded on two power distribution units (PDUs) located in the hot aisle. The damaged hardware and other hardware were evaluated, and a cumulative corrosion damage study was carried out. A hypothetical estimation of the end of life of components is provided to determine free air-cooling hours for the site. The fact that not a single server operated with fresh air cooling failed suggests that using evaporative/free air cooling is not detrimental to IT equipment reliability. This study, however, must be repeated in other geographical locations to determine whether the contamination effect is location dependent.
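
For readers unfamiliar with the coupon method: ISA-71.04-2013 grades a site by the corrosion film thickness grown on copper and silver coupons over a 30-day exposure, with the overall severity taken as the worse of the two. The sketch below is a minimal version of that classification, using the commonly cited copper bands and the silver coupon added in the 2013 revision; treat the exact thresholds as assumptions to verify against the standard.

```python
def _severity_band(angstroms, limits):
    """Map a 30-day film thickness (angstroms) to a band index 0..3."""
    for i, lim in enumerate(limits):
        if angstroms < lim:
            return i
    return 3

def isa_severity(cu_a_30d, ag_a_30d):
    """Overall ISA-71.04-2013 severity: the worse of the copper and
    silver coupon classifications. Band limits here are the commonly
    cited values (assumed; verify against the standard itself)."""
    levels = ["G1 (mild)", "G2 (moderate)", "G3 (harsh)", "GX (severe)"]
    cu = _severity_band(cu_a_30d, (300, 1000, 2000))
    ag = _severity_band(ag_a_30d, (200, 1000, 2000))
    return levels[max(cu, ag)]

print(isa_severity(250, 150))   # G1: inside typical warranty conditions
print(isa_severity(1500, 400))  # G3: free cooling likely needs filtration
```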


Author(s):  
Roger Schmidt ◽  
Madhusudan Iyengar

The patented [1] rear door heat exchanger, mounted on the rear of IT equipment racks, was announced by IBM in April 2005 and has been shown to improve data center energy efficiency and reduce hot spots. It also allows data center operators to more easily implement some of the features of the newly approved ASHRAE recommended data center environmental guidelines [2]. This paper will describe several case studies of implementing the rear door heat exchanger in various data center layouts. The implementation of the water-cooled rear door in these data centers will show the effects of various failure modes and how the new ASHRAE environmental temperature guidelines are still met under the failure modes examined.
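
The paper does not give the door's performance equations; a standard heat exchanger effectiveness model is one simple way to reason about the failure modes it studies (for instance, loss of chilled water drives the effectiveness toward zero and returns the full rack exhaust temperature to the room). A minimal sketch, with illustrative numbers rather than the paper's data:

```python
def rack_exhaust_after_rdhx(t_exhaust_c, t_water_in_c, effectiveness):
    """Air temperature leaving a rear door heat exchanger, using the
    standard air-side effectiveness definition:
        T_out = T_exhaust - eff * (T_exhaust - T_water_in)
    The effectiveness value is an assumed design number, not from
    the paper.
    """
    return t_exhaust_c - effectiveness * (t_exhaust_c - t_water_in_c)

# Normal operation vs. a chilled-water failure (effectiveness -> 0)
print(rack_exhaust_after_rdhx(40.0, 18.0, 0.6))  # ~26.8 C returned to room
print(rack_exhaust_after_rdhx(40.0, 18.0, 0.0))  # 40.0 C: full rack exhaust
```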


Author(s):  
Roger Schmidt ◽  
Madhusudan Iyengar

The heat dissipated by large servers and switching equipment is reaching levels that make it very difficult to cool these systems in data centers or telecommunications rooms. Some of the highest-powered systems dissipate upwards of 4000 W/ft² (43,000 W/m²) based on the equipment footprint. When systems dissipate this amount of heat and are then clustered together within a data center, significant cooling challenges can result. This paper describes the thermal profiles of three data center layouts (two are of the same data center at different points in time, with different layouts). Detailed measurements of all three were taken: electronic equipment power usage; perforated floor tile airflow; cable cutout airflow; computer room air conditioning (CRAC) airflow, temperatures, and power usage; and electronic equipment inlet air temperatures. Although detailed measurements were recorded, this paper focuses on the macro-level results of the data center to see whether patterns emerge that might be helpful for future guidelines on data center layout for optimized cooling. Specifically, areas of the data center where racks have similar inlet air temperatures are examined relative to the rack and CRAC unit layout.
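
The footprint-based heat flux in the abstract is simply power divided by equipment footprint; the helper below makes the figure and its unit conversion explicit (the frame dimensions used in the example are hypothetical):

```python
FT2_PER_M2 = 10.7639  # square feet per square metre

def footprint_heat_flux(power_w, footprint_ft2):
    """Heat flux based on equipment footprint, in W/ft2 and W/m2."""
    w_per_ft2 = power_w / footprint_ft2
    return w_per_ft2, w_per_ft2 * FT2_PER_M2

# A hypothetical frame with a 13.2 ft2 footprint dissipating 52.8 kW
# reaches the ~4000 W/ft2 (~43,000 W/m2) level cited in the abstract.
print(footprint_heat_flux(52800, 13.2))
```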


Author(s):  
Roger Schmidt ◽  
Madhusudan Iyengar ◽  
Joe Caricari

With the ever-increasing heat dissipated by IT equipment housed in data centers, it is becoming more important to project the changes that can occur as newer, higher-powered hardware is installed. The computational fluid dynamics (CFD) software that is available has improved over the years, and some CFD software specific to data center thermal analysis has been developed. This has improved the timeliness of quick analyses of the effects of introducing new hardware into the data center. It is critically important, however, that this software accurately report to the user the effects of adding this new hardware. The purpose of this paper is to examine a large cluster installation and compare the CFD analysis with environmental measurements obtained from the same site. This paper shows measurements and CFD analysis of racks as high as 27 kW clustered such that heat fluxes in some regions of the data center exceeded 700 W/ft² (7535 W/m²). This paper describes the thermal profile of a high-performance computing cluster located in an IBM data center and a comparison of that cluster modeled with CFD software. The high-performance Advanced Simulation and Computing (ASC) cluster, developed and manufactured by IBM, is code-named ASC Purple. It is the world's third-fastest supercomputer [1], operating at a peak performance of 77.8 TFlop/s. ASC Purple, which employs IBM pSeries p575, Model 9118, contains more than 12,000 processors, 50 terabytes of memory, and 2 petabytes of globally accessible disk space. The cluster was first tested in the IBM development lab in Poughkeepsie, NY and then shipped to Lawrence Livermore National Labs in Livermore, California, where it was installed to support our national security mission. Detailed measurements were taken in both data centers of electronic equipment power usage, perforated floor tile airflow, cable cutout airflow, computer room air conditioning (CRAC) airflow, and electronic equipment inlet air temperatures and were reported in Schmidt [2], but only the IBM Poughkeepsie results will be reported here, along with a comparison to CFD modeling results. Some areas of the Poughkeepsie data center exceeded the equipment inlet air temperature specifications by a significant amount. These areas will be highlighted, and reasons will be given for why they failed to meet the criteria. The modeling results by region showed trends that compared somewhat favorably with measurements, but some rack thermal profiles deviated quite significantly.
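
The abstract's model-versus-measurement comparison implies a per-rack error summary; the sketch below is one simple form such a report could take, using hypothetical temperatures rather than data from the paper:

```python
import statistics

def inlet_temp_errors(measured_c, modeled_c):
    """Per-rack CFD error summary: mean absolute error, mean bias, and
    worst deviation between modeled and measured inlet air temperatures."""
    diffs = [m - x for m, x in zip(modeled_c, measured_c)]
    return {
        "mae_c": statistics.mean(abs(d) for d in diffs),
        "bias_c": statistics.mean(diffs),
        "worst_c": max(diffs, key=abs),
    }

# Hypothetical values for four racks (deg C), not data from the paper
measured = [22.1, 24.8, 31.5, 27.0]
modeled = [23.0, 24.1, 27.9, 27.5]
print(inlet_temp_errors(measured, modeled))
```

A large negative worst-case deviation in a report like this would flag exactly the situation the paper describes: a rack whose measured inlet temperature exceeds what the CFD model predicted.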


Author(s):  
Hongfei Li ◽  
Hendrik F. Hamann

Although in most buildings the spatial allocation of cooling resources can be managed using multiple air handling units and an air ducting system, it can be challenging for an operator to leverage this capability, partially because of the complex interdependencies between the different control options. This is particularly important for data centers, where cooling is a major cost and the sufficient allocation of cooling resources must ensure the reliable operation of mission-critical information processing equipment. It has been shown that thermal zones can provide valuable decision support for optimizing cooling. Such thermal zones are generally defined as the region of influence of a particular cooling unit or cooling "source" (such as an air conditioning unit (ACU)). In this paper we show results using a statistical approach, where we leverage real-time sensor data to obtain thermal zones in real time. Specifically, we model the correlations between temperatures observed from sensors located at the discharge of an ACU and the other sensors located in the room. Outputs from the statistical solution can be used to optimize the placement of equipment in a data center, investigate failure scenarios, and make sure that a proper cooling solution has been achieved.
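
The abstract does not spell out the statistical machinery; assigning each room sensor to the ACU whose discharge temperature it correlates with most strongly is a minimal version of the idea. A sketch on synthetic data (the function and variable names are ours, not the paper's):

```python
import numpy as np

def thermal_zones(acu_discharge, room_sensors):
    """Assign each room sensor to the ACU whose discharge temperature
    it correlates with most strongly.
    acu_discharge: (n_samples, n_acus) temperature time series
    room_sensors:  (n_samples, n_sensors) temperature time series
    Returns an array of ACU indices, one per room sensor."""
    n_acus = acu_discharge.shape[1]
    n_sensors = room_sensors.shape[1]
    corr = np.empty((n_sensors, n_acus))
    for j in range(n_acus):
        for i in range(n_sensors):
            corr[i, j] = np.corrcoef(room_sensors[:, i],
                                     acu_discharge[:, j])[0, 1]
    return corr.argmax(axis=1)  # zone = most correlated ACU

# Synthetic example: sensor 0 tracks ACU 0, sensor 1 tracks ACU 1
rng = np.random.default_rng(0)
acus = rng.normal(15, 1, size=(500, 2))
rooms = np.column_stack([acus[:, 0] + rng.normal(0, 0.2, 500) + 7,
                         acus[:, 1] + rng.normal(0, 0.2, 500) + 9])
print(thermal_zones(acus, rooms))  # -> [0 1]
```

The argmax assignment gives hard zone boundaries; the correlation matrix itself preserves the overlapping regions of influence that matter for failure-scenario analysis.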


Author(s):  
Norman J. Armendariz ◽  
Prawin Paulraj

Abstract The European Union is banning the use of Pb in electronic products starting July 1, 2006. Printed circuit board assemblies, or "motherboards," require that planned CPU sockets and BGA chipsets use lead-free solder ball compositions at the second-level interconnections (SLI) to attach to a printed circuit board (PCB) and survive various assembly and reliability test conditions for end-use deployment. Intel is proactively preparing for this anticipated Pb ban by evaluating a new lead-free (LF) solder alloy in the ternary tin-silver-copper (Sn4.0Ag0.5Cu) system and developing higher-temperature board assembly processes. This will be pursued with a focus on achieving the lowest process temperature required to avoid deleterious higher-temperature effects and still achieve a metallurgically compatible solder joint. One primary factor is the elevated peak reflow temperature required for surface mount technology (SMT) LF assembly, which is approximately 250 °C, compared to present eutectic tin/lead (Sn37Pb) reflow temperatures of around 220 °C. In addition, extended SMT time-above-liquidus (TAL) and subsequent cooling rates are also a concern, not only for the critical BGA chipsets and CPU BGA sockets but also for other components similarly attached to the same PCB substrate. The PCBs used were conventional FR-4 substrates with organic solder preservative on the copper pads and mechanically daisy-chained FCBGA components with direct immersion gold surface finish on their copper pads. A materials analysis method and approach is also required to characterize and evaluate the effect of low peak temperature LF SMT processing on the PBA SLI, to identify the absolute limits or "cliffs," and to determine whether the minimum processing temperature and TAL could be further lowered. The SLI system is characterized using various microanalytical techniques, such as conventional optical microscopy, scanning electron microscopy, energy dispersive spectroscopy, and microhardness testing. In addition, the SLI is further characterized using macroanalytical techniques such as dye penetrant testing (DPT) with controlled tensile testing for mechanical strength, in addition to disbond and crack area mapping, to complete the analysis.
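
The process window described in the abstract (roughly a 250 °C peak for the SAC alloy versus 220 °C for Sn37Pb, with TAL as a constraint) can be expressed as a simple profile check. The sketch below assumes a SnAgCu liquidus near 217 °C and uses illustrative limits, not Intel's qualified window:

```python
SAC_LIQUIDUS_C = 217.0  # approximate liquidus of SnAgCu alloys

def check_reflow_profile(samples, peak_max_c=250.0, tal_limit_s=90.0):
    """Validate a reflow profile against a peak-temperature ceiling and a
    time-above-liquidus (TAL) budget. samples is a list of
    (time_s, temp_c) pairs at a uniform interval; the limits are
    illustrative assumptions, not a qualified process window."""
    times, temps = zip(*samples)
    dt = times[1] - times[0]
    tal = sum(dt for t in temps if t > SAC_LIQUIDUS_C)
    peak = max(temps)
    return {"peak_c": peak, "tal_s": tal,
            "ok": peak <= peak_max_c and tal <= tal_limit_s}

# Coarse 10 s samples around the reflow spike (hypothetical profile)
profile = [(i * 10, t) for i, t in enumerate(
    [150, 180, 200, 215, 230, 245, 248, 240, 225, 210, 180])]
print(check_reflow_profile(profile))
```

Lowering the peak or shortening TAL in a check like this is exactly the trade-off the paper probes: cooler, shorter profiles spare the components but risk a metallurgically incomplete joint.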


2021 ◽  
Vol 11 (6) ◽  
pp. 2808
Author(s):  
Leandro H. de S. Silva ◽  
Agostinho A. F. Júnior ◽  
George O. A. Azevedo ◽  
Sergio C. Oliveira ◽  
Bruno J. T. Fernandes

The technological growth of the last decades has brought many improvements to daily life, but also concerns about how to deal with electronic waste. Electrical and electronic equipment waste is the fastest-growing waste stream in the industrialized world. One of the elements of electronic equipment is the printed circuit board (PCB), and almost every piece of electronic equipment has a PCB inside it. While waste PCB (WPCB) recycling may result in the recovery of potentially precious materials and the reuse of some components, it is a challenging task because compositional diversity requires a cautious pre-processing stage to achieve optimal recycling outcomes. Our research focused on proposing a method to evaluate the economic feasibility of recycling integrated circuits (ICs) from WPCBs. The proposed method can help decide whether to dismantle a WPCB separately before the physical or mechanical recycling process and consists of estimating the IC area of a WPCB, calculating the ICs' weight using surface density, and estimating how much metal can be recovered by recycling those ICs. To estimate the IC area in a WPCB, we used a state-of-the-art object detection deep learning model (YOLO) and the PCB DSLR image dataset to detect the WPCBs' ICs. Regarding IC detection, the best result was obtained with partitioned analysis of each image through a sliding window, thus creating new images of smaller dimensions, reaching 86.77% mAP. As a final result, we estimate that the Deep PCB Dataset contains a total of 1079.18 g of ICs, from which it would be possible to recover at least 909.94 g of metals and silicon elements from all WPCBs' ICs. Since there is high variability in the compositions of WPCBs, it is possible to calculate the gross income for each WPCB and use it as a decision criterion for the type of pre-processing.
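
The economic step of the method (detected IC area, to weight via surface density, to recoverable metal mass) reduces to two multiplications. In the sketch below the surface density and recovery fraction are placeholders, not the paper's figures, though the abstract's totals imply a recovery ratio of roughly 0.84 (909.94 g / 1079.18 g):

```python
def recoverable_metal_g(ic_area_cm2, surface_density_g_cm2=0.15,
                        recovery_fraction=0.84):
    """Estimate recoverable metal/silicon mass from detected IC area:
        weight = area * surface density; recovered = weight * fraction.
    Both default parameters are illustrative placeholders; the 0.84
    fraction is only back-inferred from the abstract's totals."""
    ic_weight_g = ic_area_cm2 * surface_density_g_cm2
    return ic_weight_g * recovery_fraction

# e.g., 120 cm2 of ICs detected by the YOLO stage on one WPCB
print(f"{recoverable_metal_g(120.0):.1f} g recoverable")
```

Comparing this estimate against the cost of manual dismantling gives the per-board gross-income decision criterion the abstract describes.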

