Creation of Publicly Available Data Sets for Prognostics and Diagnostics Addressing Data Scenarios Relevant to Industrial Applications

Author(s):  
Fabian Mauthe ◽  
Simon Hagmeyer ◽  
Peter Zeiler

For a successful realization of prognostics and health management (PHM), the availability of sufficient run-to-failure data sets is a crucial factor. The sheer number of data points matters less than full coverage of the potential state space. However, full coverage is a major challenge in most industrial applications. Among other things, high investment and operating costs as well as the long service life of many technical systems make it difficult to acquire complete run-to-failure data sets. Consequently, data sets with specific deficiencies are frequently encountered in industrial applications. The development of appropriate methods to address such data scenarios is a fundamental research issue. The purpose of this paper is therefore to facilitate this research. Accordingly, the paper starts by specifying the value and availability of data in PHM. Subsequently, criteria for characterizing data sets are defined independently of the actual PHM application. The criteria are used to identify typical data scenarios with specific deficiencies that are highly relevant for industrial applications. Thereafter, the most comprehensive overview of currently publicly accessible data sets suitable for PHM is provided. This overview shows that not all previously identified data scenarios with their specific deficiencies are addressed by at least one data set. A program is established to facilitate further research. One objective of the program is to create data sets reflecting these data scenarios using a test bench. First, possible applications and their degradation processes to be studied on the test bench are briefly characterized, and the final decision to select filtration as the test bench application is justified. Subsequently, the test bench is introduced, including a description of the functional concept, pneumatic layout and components involved, as well as the filter media and test dusts employed. Typical run-to-failure trajectories are illustrated. Thereafter, the data set published under the name Preventive to Predictive Maintenance is presented. Additionally, a schedule for future releases of data sets on further industry-relevant data scenarios is sketched.

2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Jiawei Lian ◽  
Junhong He ◽  
Yun Niu ◽  
Tianze Wang

Purpose: Currently popular image processing technologies based on convolutional neural networks are characterized by heavy computation, high storage cost and low accuracy for tiny defect detection, which conflicts with the real-time performance, high accuracy, and limited computing and storage resources required by industrial applications. Therefore, an improved YOLOv4 named YOLOv4-Defect is proposed to solve these problems. Design/methodology/approach: On the one hand, this study performs multi-dimensional compression on the feature extraction network of YOLOv4 to simplify the model, and improves the feature extraction ability of the model through knowledge distillation. On the other hand, a prediction scale with a more detailed receptive field is added to optimize the model structure, which can improve the detection performance for tiny defects. Findings: The effectiveness of the method is verified on the public data sets NEU-CLS and DAGM 2007 and on a steel ingot data set collected in an actual industrial setting. The experimental results demonstrate that the proposed YOLOv4-Defect method can greatly improve recognition efficiency and accuracy while reducing the size and computational cost of the model. Originality/value: This paper proposes an improved YOLOv4 named YOLOv4-Defect for the detection of surface defects, which is suited to industrial scenarios with limited storage and computing resources and meets the requirements of real-time performance and high precision.


2019 ◽  
Vol 19 (4) ◽  
pp. 2497-2526 ◽  
Author(s):  
Charlotta Högberg ◽  
Stefan Lossow ◽  
Farahnaz Khosrawi ◽  
Ralf Bauer ◽  
Kaley A. Walker ◽  
...  

Abstract. Within the framework of the second SPARC (Stratosphere-troposphere Processes And their Role in Climate) water vapour assessment (WAVAS-II), we evaluated five data sets of δD(H2O) obtained from observations by Odin/SMR (Sub-Millimetre Radiometer), Envisat/MIPAS (Environmental Satellite/Michelson Interferometer for Passive Atmospheric Sounding), and SCISAT/ACE-FTS (Science Satellite/Atmospheric Chemistry Experiment – Fourier Transform Spectrometer) using profile-to-profile and climatological comparisons. These comparisons aimed to provide a comprehensive overview of typical uncertainties in the observational database that could be considered in the future in observational and modelling studies. Our primary focus is on stratospheric altitudes, but results for the upper troposphere and lower mesosphere are also shown. There are clear quantitative differences in the measurements of the isotopic ratio, mainly with regard to comparisons between the SMR data set and both the MIPAS and ACE-FTS data sets. In the lower stratosphere, the SMR data set shows a higher depletion in δD than the MIPAS and ACE-FTS data sets. The differences maximise close to 50 hPa and exceed 200 ‰. With increasing altitude, the biases decrease. Above 4 hPa, the SMR data set shows a lower δD depletion than the MIPAS data sets, occasionally exceeding 100 ‰. Overall, the δD biases of the SMR data set are driven by HDO biases in the lower stratosphere and by H2O biases in the upper stratosphere and lower mesosphere. In between, in the middle stratosphere, the biases in δD are the result of deviations in both HDO and H2O. These biases are attributed to issues with the calibration, in particular in terms of the sideband filtering, and uncertainties in spectroscopic parameters. The MIPAS and ACE-FTS data sets agree rather well between about 100 and 10 hPa. The MIPAS data sets show less depletion below approximately 15 hPa (up to about 30 ‰), due to differences in both HDO and H2O. 
Higher up this behaviour is reversed, and towards the upper stratosphere the biases increase. This is driven by increasing biases in H2O, and on occasion the differences in δD exceed 80 ‰. Below 100 hPa, the differences between the MIPAS and ACE-FTS data sets are even larger. In the climatological comparisons, the MIPAS data sets continue to show less depletion in δD than the ACE-FTS data sets below 15 hPa during all seasons, with some variations in magnitude. The differences between the MIPAS and ACE-FTS data have multiple causes, such as differences in the temporal and spatial sampling (except for the profile-to-profile comparisons), cloud influence, vertical resolution, and the microwindows and spectroscopic database chosen. Differences between data sets from the same instrument are typically small in the stratosphere. Overall, if the data sets are considered together, the differences in δD among them in key areas of scientific interest (e.g. tropical and polar lower stratosphere, lower mesosphere, and upper troposphere) are too large to draw robust conclusions on atmospheric processes affecting the water vapour budget and distribution, e.g. the relative importance of different mechanisms transporting water vapour into the stratosphere.
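The abstract above attributes δD biases to biases in HDO, in H2O, or in both. A minimal sketch of why both species matter, assuming the conventional definition of δD relative to Vienna Standard Mean Ocean Water (the abstract itself does not state the formula); the function name and the mixing ratios are hypothetical illustration values, not taken from any of the data sets:

```python
# D/H ratio of Vienna Standard Mean Ocean Water (VSMOW), the usual reference.
R_VSMOW = 155.76e-6


def delta_d(hdo_vmr: float, h2o_vmr: float) -> float:
    """Return delta-D in per mil from HDO and H2O volume mixing ratios.

    Assumes the conventional definition: the D/H ratio is approximated by
    [HDO] / (2 * [H2O]) and compared with the VSMOW reference ratio.
    """
    ratio = hdo_vmr / (2.0 * h2o_vmr)
    return (ratio / R_VSMOW - 1.0) * 1000.0


# Hypothetical stratospheric-like values: 4 ppmv H2O and an HDO amount
# chosen so that delta-D is -600 per mil.
h2o = 4.0e-6
hdo = 2.0 * R_VSMOW * h2o * 0.4

baseline = delta_d(hdo, h2o)
hdo_high = delta_d(hdo * 1.10, h2o)  # +10 % bias in HDO only
h2o_high = delta_d(hdo, h2o * 1.10)  # +10 % bias in H2O only

print(f"baseline: {baseline:.1f} per mil")
print(f"HDO +10%: {hdo_high:.1f} per mil (less depleted)")
print(f"H2O +10%: {h2o_high:.1f} per mil (more depleted)")
```

Under this definition, a high bias in HDO makes δD appear less depleted, while a high bias in H2O makes it appear more depleted, so biases in either species propagate directly into δD.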


2020 ◽  
Vol 7 (1) ◽  
pp. 163-180
Author(s):  
Saagar S Kulkarni ◽  
Kathryn E Lorenz

This paper examines two CDC data sets to provide a comprehensive overview of COVID-19-related deaths within the United States over the first eight months of 2020, together with their social implications. By analyzing the first data set over this eight-month period with the variables of age, race, and state, we found correlations between COVID-19 deaths and these three variables. Overall, our multivariable regression model was found to be statistically significant. When analyzing the second CDC data set, we used the same variables with one exception: gender was used in place of race. From this analysis, trends in age and state were found to be significant. Because gender was not found to be significant in predicting deaths, we concluded that gender does not play a significant role in the prognosis of COVID-19-induced deaths, whereas the age of an individual and their state of residence potentially play a significant role in determining life or death. Socio-economic analysis of the US population confirms the Qualitative socio-economic Logic based Cascade Hypotheses (QLCH) of education, occupation, and income affecting race/ethnicity differently. For a given race/ethnicity, education drives occupation, then income, where a person lives, and in turn their access to healthcare coverage. Considering the QLCH framework based on socio-economic data, we conclude that different races are poised for differing effects of COVID-19 and that Asians and Whites are in a stronger position to combat COVID-19 than Hispanics and Blacks.


Blood ◽  
2006 ◽  
Vol 108 (11) ◽  
pp. 5415-5415 ◽  
Author(s):  
Alexander H. Schmidt ◽  
Andrea Stahr ◽  
Daniel Baier ◽  
Gerhard Ehninger ◽  
Claudia Rutt

Abstract In strategic stem cell donor registry planning, it is of special importance to decide how to type newly registered donors. This question refers to both the selection of HLA loci and the resolution (low, intermediate, or high) of HLA typings. In principle, high-resolution typings of all transplant-relevant loci are preferable. However, cost considerations generally lead to incomplete typings (only selected HLA loci with low or intermediate typing resolution) in practice. Here, we present results of a project in which newly recruited donors are typed for the HLA-A, -B, -C, and -DRB1 loci with high resolution by sequencing. Efficiency of these typings is measured by subsequent requests for confirmatory typings (CTs) and stem cell donations. Results for donors who were included in the project (Donor Group A) are compared to requests for donors with other, less complete typing levels: HLA-A and HLA-B at intermediate resolution, HLA-DRB1 at high resolution (Group B); HLA-A, -B, -C, and -DRB1 at intermediate resolution (Group C); HLA-A, -B, and -DRB1 at intermediate resolution (Group D). All data are taken from the donor file of DKMS German Bone Marrow Donor Center. Since the four groups differ considerably regarding their age and sex distributions, calculations are also carried out for restricted data sets that include only male donors up to age 25. Results are shown in Table 1. Donors of Groups A and B have similar CT request frequencies of 5.90 and 5.92 requests per 100 donors per year in the restricted data sets, respectively. These frequencies significantly exceed the corresponding frequencies of the other groups with less complete typing levels. For donation requests, the frequency is significantly higher for Group A than for Group B (restricted data sets): 1.45 vs 1.02 requests per 100 donors per year (p<0.05). Obviously, the additional HLA information for Group A donors leads to a higher ratio between donations and CT requests. 
Again, figures are much lower for Groups C and D. These results are based on a high number of requests even for the restricted data sets, namely between 44 and 90 donation requests and between 227 and 619 CT requests per group. Our results show that full (HLA-A, -B, -C, and -DRB1) high-resolution typings at donor recruitment lead to significantly higher probabilities for donation requests. Donor centers and registries should carefully take these higher probabilities into account when they consider full high-resolution typings for newly recruited donors. However, the final decision regarding the typing strategy at recruitment must also depend on the individual cost structure of a donor center or registry. The presented results are based on a donor file that consists mainly (≈99%) of Caucasian donors. Whether these results also apply to other, more heterogeneous donor pools should be subject to further analysis.

Table 1: Requests per 100 donors per year by donor group

Donor Group | CT requests, full data set | CT requests, only male donors ≤ 25 | Donation requests, full data set | Donation requests, only male donors ≤ 25
A | 5.14 | 5.90 | 1.45 | 1.45
B | 4.60 | 5.92 | 0.84 | 1.02
C | 2.50 | 3.03 | 0.58 | 0.67
D | 2.36 | 2.80 | 0.38 | 0.48
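The abstract states that Group A shows a higher ratio between donation requests and CT requests than Group B. As a small worked check, the sketch below recomputes that ratio from the restricted-data-set columns of Table 1 (only male donors ≤ 25). The variable names are illustrative, and the ratio is simply one rate divided by the other; the abstract does not say how the authors computed or tested it.

```python
# Requests per 100 donors per year, restricted data sets (Table 1).
table_1_restricted = {
    "A": {"ct": 5.90, "donation": 1.45},
    "B": {"ct": 5.92, "donation": 1.02},
    "C": {"ct": 3.03, "donation": 0.67},
    "D": {"ct": 2.80, "donation": 0.48},
}

# Donation requests per CT request for each donor group.
ratios = {
    group: rates["donation"] / rates["ct"]
    for group, rates in table_1_restricted.items()
}

for group, ratio in sorted(ratios.items()):
    print(f"Group {group}: {ratio:.3f} donation requests per CT request")
```

With these figures, Group A has the highest ratio of the four groups, which is consistent with the statement in the abstract.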


Mathematics ◽  
2019 ◽  
Vol 7 (12) ◽  
pp. 1215 ◽  
Author(s):  
Hoang Pham

Selecting the best model from a set of candidates for a given data set is not an easy task. In this paper, we propose a new criterion that, in addition to minimizing the sum of squared errors, imposes a larger penalty when too many coefficients (or estimated parameters) are added to the model from too small a sample in the presence of too much noise. We discuss several real applications that illustrate the proposed criterion and compare its results with some existing criteria, based on a simulated data set and several real data sets, including advertising budget data, newly collected heart blood pressure health data and software failure data.
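The abstract does not give the formula of the proposed criterion, so the sketch below does not attempt to reproduce it. To illustrate the general idea of trading goodness of fit against the number of estimated parameters, it uses two well-known existing criteria, AIC and BIC, in their least-squares forms. The candidate models, their sums of squared errors, and the sample size are hypothetical.

```python
import math


def aic_ls(sse: float, n: int, k: int) -> float:
    """Akaike information criterion for a least-squares fit (up to a constant)."""
    return n * math.log(sse / n) + 2 * k


def bic_ls(sse: float, n: int, k: int) -> float:
    """Bayesian information criterion for a least-squares fit (up to a constant)."""
    return n * math.log(sse / n) + k * math.log(n)


# Hypothetical candidates: (name, sum of squared errors, number of parameters).
n_samples = 30
candidates = [
    ("linear", 120.0, 2),
    ("quadratic", 100.0, 3),
    ("degree-8 polynomial", 96.0, 9),
]

# Lower is better for both criteria.
best_aic = min(candidates, key=lambda c: aic_ls(c[1], n_samples, c[2]))
best_bic = min(candidates, key=lambda c: bic_ls(c[1], n_samples, c[2]))

print("AIC selects:", best_aic[0])
print("BIC selects:", best_bic[0])
```

In this made-up example the degree-8 polynomial has the smallest error but is rejected by both criteria, because the small reduction in error does not offset the penalty for six extra parameters in a sample of 30.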


Sensors ◽  
2021 ◽  
Vol 21 (17) ◽  
pp. 5809
Author(s):  
Loris Nanni ◽  
Giovanni Minchio ◽  
Sheryl Brahnam ◽  
Davide Sarraggiotto ◽  
Alessandra Lumini

In this paper, we examine two strategies for boosting the performance of ensembles of Siamese networks (SNNs) for image classification using two loss functions (Triplet and Binary Cross Entropy) and two methods for building the dissimilarity spaces (FULLY and DEEPER). With FULLY, the distance between a pattern and a prototype is calculated by comparing two images using the fully connected layer of the Siamese network. With DEEPER, each pattern is described using a deeper layer combined with dimensionality reduction. The basic design of the SNNs takes advantage of supervised k-means clustering for building the dissimilarity spaces that train a set of support vector machines, which are then combined by sum rule for a final decision. The robustness and versatility of this approach are demonstrated on several cross-domain image data sets, including a portrait data set, two bioimage and two animal vocalization data sets. Results show that the strategies employed in this work to increase the performance of dissimilarity image classification using SNN are closing the gap with standalone CNNs. Moreover, when our best system is combined with an ensemble of CNNs, the resulting performance is superior to an ensemble of CNNs, demonstrating that our new strategy is extracting additional information.


Sensors ◽  
2019 ◽  
Vol 20 (1) ◽  
pp. 176 ◽  
Author(s):  
David Verstraete ◽  
Enrique Droguett ◽  
Mohammad Modarres

Multi-sensor systems are proliferating in the asset management industry. Industry 4.0, combined with the Internet of Things (IoT), has created a need for prognostics and health management systems that predict system reliability and support maintenance decisions. State-of-the-art systems now generate big machinery data and require multi-sensor fusion for integrated remaining useful life prognostic capabilities. When dealing with these data sets, traditional prediction methods are not equipped to handle the multiple sensor signals in unison. To address this challenge, this paper proposes a new, deep, adversarial approach to remaining useful life prediction in which a novel, non-Markovian, variational, inference-based model, incorporating an adversarial methodology, is derived. To evaluate the proposed approach, two public multi-sensor data sets are used for the remaining useful life prediction applications: (1) the CMAPSS turbofan engine data set, and (2) the FEMTO Pronostia rolling element bearing data set. The proposed approach obtains favorable results when compared against similar deep learning models.


Author(s):  
Ali Ashasi-Sorkhabi ◽  
Stanley Fong ◽  
Guru Prakash ◽  
Sriram Narasimhan

Data-driven condition-based maintenance (CBM) can be an effective predictive maintenance strategy for components within complex systems with unknown dynamics, nonstationary vibration signatures or a lack of historical failure data. CBM strategies allow operators to maintain components based on their condition in lieu of traditional alternatives such as preventive or corrective strategies. In this paper, the authors present an outline of the CBM program and a field pilot study being conducted on the gearbox, a critical component in an automated cable-driven people mover (APM) system at Toronto’s Pearson airport. This CBM program utilizes a paired server-client “two-tier” configuration for fault detection and prognosis. At the first level, fault detection is performed in real-time using vibration data collected from accelerometers mounted on the APM gearbox. Time-domain condition indicators are extracted from the signals to establish the baseline condition of the system to detect faults in real-time. All tier one tasks are handled autonomously using a controller located on-site. In the second level pertaining to prognostics, these condition indicators are utilized for degradation modeling and subsequent remaining useful life (RUL) estimation using random coefficient and stochastic degradation models. Parameter estimation is undertaken using a hierarchical Bayesian approach. Degradation parameters and the RUL model are updated in a feedback loop using the collected degradation data. While the case study presented will primarily focus on a cable-driven APM gearbox, the underlying theory and the tools developed to undertake diagnostics and prognostics tasks are broadly applicable to a wide range of other civil and industrial applications.


2018 ◽  
Vol 154 (2) ◽  
pp. 149-155
Author(s):  
Michael Archer

1. Yearly records of worker Vespula germanica (Fabricius) taken in suction traps at Silwood Park (28 years) and at Rothamsted Research (39 years) are examined. 2. Using the autocorrelation function (ACF), a significant negative 1-year lag followed by a lesser non-significant positive 2-year lag was found in all, or parts of, each data set, indicating an underlying population dynamic of a 2-year cycle with a damped waveform. 3. The minimum number of years of records needed before the 2-year cycle with damped waveform became apparent varied between 17 and 26; in some data sets the cycle was not found. 4. Ecological factors delaying or preventing the occurrence of the 2-year cycle are considered.
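Point 2 above relies on the sample autocorrelation function. A minimal sketch of how such a lag pattern can be computed, using only the Python standard library: the yearly counts are synthetic numbers invented to show a damped alternation, not the Silwood Park or Rothamsted records, and the ±1.96/√n band is the common large-sample approximation rather than necessarily the test used in the paper.

```python
import math


def acf(series, max_lag):
    """Sample autocorrelation of `series` for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
        for k in range(max_lag + 1)
    ]


# Synthetic yearly counts with a damped high/low alternation (hypothetical).
counts = [120, 35, 98, 52, 85, 60, 78, 64, 74, 67]

r = acf(counts, max_lag=2)
# Approximate 95 % significance band for a white-noise series.
band = 1.96 / math.sqrt(len(counts))

for lag in (1, 2):
    flag = "significant" if abs(r[lag]) > band else "not significant"
    print(f"lag {lag}: r = {r[lag]:+.2f} ({flag})")
```

For this synthetic series the lag-1 autocorrelation is negative and falls outside the band, while the lag-2 value is positive, smaller in magnitude, and inside the band, which is the qualitative signature of a damped 2-year cycle described in the abstract.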

