A dynamic hardware redundancy mechanism for the in-field fault detection in cores of GPGPUs

Author(s):  
Josie E. Rodriguez Condia ◽  
Pierpaolo Narducci ◽  
M. Sonza Reorda ◽  
L. Sterpone
2012 ◽  
Vol 15 (3) ◽  
Author(s):  
Diego Montezanti ◽  
Fernando Emmanuel Frati ◽  
Dolores Rexachs ◽  
Emilio Luque ◽  
Marcelo Naiouf ◽  
...  

Processor performance keeps improving through higher integration scales, but this brings a growing vulnerability to transient faults, whose impact increases on multicore clusters running large scientific parallel applications. The need to enhance the reliability of these systems, coupled with the high cost of rerunning an application from the beginning, motivates specific software strategies for the target systems. This paper introduces SMCV, a fully distributed technique that provides fault detection for message-passing parallel applications by validating the contents of messages before they are sent, which prevents errors from propagating to other processes and leverages the intrinsic hardware redundancy of the multicore. SMCV achieves wide robustness against transient faults with reduced overhead, and strikes a trade-off between moderate detection latency and low additional workload.
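The idea of validating a message before it leaves the process can be sketched as follows. This is a minimal illustration under stated assumptions, not the SMCV implementation: `validated_send`, `MessageMismatch`, and the `send` callback are hypothetical names, and Python threads merely stand in for a replica that would run on a separate core.

```python
from concurrent.futures import ThreadPoolExecutor


class MessageMismatch(Exception):
    """Raised when the two redundant copies of an outgoing message differ."""


def validated_send(compute, args, send):
    """Compute a message twice and send it only if both copies agree.

    `compute`, `args`, and `send` are hypothetical placeholders for the
    application's computation and its message-passing send call.
    """
    with ThreadPoolExecutor(max_workers=2) as pool:
        primary = pool.submit(compute, *args)
        replica = pool.submit(compute, *args)
        a, b = primary.result(), replica.result()
    # A disagreement suggests a fault corrupted one copy; block the message
    # so the error does not reach other processes.
    if a != b:
        raise MessageMismatch("redundant copies disagree; message not sent")
    send(a)
    return a
```

In this sketch the comparison happens only at send time, so a fault is noticed when the next message is produced rather than immediately, which is one way to read the latency versus workload trade-off the abstract mentions.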


Author(s):  
Weihai Sun ◽  
Lemei Han

Machine fault detection has great practical significance. Unlike detection methods that require external sensors, detecting machine faults from sound signals can be done without damaging the machine's structure. Current popular audio-based fault detection methods often need large amounts of training data, a complex learning process, and the support of a database of known faults. The audio-based fault detection method proposed in this paper only requires that the machine work normally during the first second. Subsequently collected sound signals are then analyzed in the time-frequency domain using correlation coefficient calculation, energy analysis, empirical mode decomposition (EMD), and other methods to detect whether the machine has a fault.
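A rough sketch of the reference-based comparison might look like the following. It covers only the correlation and energy checks (EMD is omitted), and it assumes that frames have equal length and are time-aligned with the reference. The function names and the thresholds `min_corr` and `max_energy_ratio` are illustrative assumptions, not values from the paper.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant signal carries no correlation information
    return cov / math.sqrt(vx * vy)


def energy(x):
    """Signal energy as the sum of squared samples."""
    return sum(a * a for a in x)


def is_faulty(reference, frame, min_corr=0.8, max_energy_ratio=2.0):
    """Flag a frame that deviates from the known-good reference frame.

    Thresholds are hypothetical; a real system would calibrate them.
    """
    r = pearson(reference, frame)
    ratio = energy(frame) / energy(reference)
    return r < min_corr or ratio > max_energy_ratio or ratio < 1 / max_energy_ratio
```

Here `reference` would be a frame taken from the first second of operation, when the machine is assumed to be healthy, and `frame` any later frame of the same length.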


TAPPI Journal ◽  
2014 ◽  
Vol 13 (1) ◽  
pp. 33-41
Author(s):  
YVON THARRAULT ◽  
MOULOUD AMAZOUZ

Recovery boilers play a key role in chemical pulp mills. Early detection of defects, such as water leaks, in a recovery boiler is critical to the prevention of explosions, which can occur when water reaches the molten smelt bed of the boiler. Early detection is difficult to achieve because of the complexity and the multitude of recovery boiler operating parameters. Multiple faults can occur in multiple components of the boiler simultaneously, and an efficient and robust fault isolation method is needed. In this paper, we present a new fault detection and isolation scheme for multiple faults. The proposed approach is based on principal component analysis (PCA), a popular fault detection technique. For fault detection, the Mahalanobis distance is used together with an exponentially weighted moving average (EWMA) filter that reduces the false alarm rate. The filter tunes the sensitivity of the fault detection scheme against the false alarm rate. For fault isolation, the reconstruction-based contribution is used. To avoid a combinatorial explosion of faulty scenarios related to multiple faults, an iterative approach is used. This new method was validated using real data from a pulp and paper mill in Canada. The results demonstrate that the proposed method can effectively detect sensor faults and water leakage.
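The role of the exponentially weighted moving average filter can be illustrated with a small sketch. It assumes the detection statistic (for example, a Mahalanobis distance per sample) has already been computed; the PCA step and fault isolation are not shown. The function name `ewma_alarms`, the smoothing factor `lam`, and the `threshold` are hypothetical choices for illustration.

```python
def ewma_alarms(statistic, threshold, lam=0.2):
    """Return one alarm flag per sample after EWMA smoothing.

    `statistic` is a non-empty sequence of precomputed detection values.
    A smaller `lam` smooths more: fewer false alarms, slower detection.
    """
    z = statistic[0]  # start the filter at the first observation
    alarms = []
    for d in statistic:
        z = lam * d + (1 - lam) * z  # exponentially weighted moving average
        alarms.append(z > threshold)
    return alarms


# An isolated spike is smoothed away, while a sustained shift raises an alarm.
spike = ewma_alarms([1, 1, 10, 1, 1, 1], threshold=5)
shift = ewma_alarms([1, 1, 10, 10, 10, 10, 10, 10], threshold=5)
```

With these inputs the single spike never crosses the threshold, whereas the sustained shift triggers an alarm a couple of samples after it begins, which shows how the smoothing factor trades sensitivity against false alarms.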


2019 ◽  
Vol 139 (10) ◽  
pp. 1191-1200 ◽  
Author(s):  
Adamo Santana ◽  
Yu Kawamura ◽  
Kenya Murakami ◽  
Tatsuya Iizaka ◽  
Tetsuro Matsui ◽  
...  
