Probabilistic Condition Monitoring of Azimuth Thrusters Based on Acceleration Measurements

Machines ◽  
2021 ◽  
Vol 9 (2) ◽  
pp. 39
Author(s):  
Riku-Pekka Nikula ◽  
Mika Ruusunen ◽  
Joni Keski-Rahkonen ◽  
Lars Saarinen ◽  
Fredrik Fagerholm

Drill ships and offshore rigs use azimuth thrusters for propulsion, maneuvering and steering, attitude control and dynamic positioning. The versatile operating modes and the challenging marine environment create demand for flexible and practical condition monitoring solutions onboard. This study introduces a condition monitoring algorithm that uses acceleration and shaft speed data to detect anomalies indicating defects in the driveline components of the thrusters. Statistical features of vibration are predicted with linear regression models, and the residuals are then monitored relative to multivariate normal distributions. The method includes an automated shaft speed selection approach that identifies the normally distributed operational areas from the training data based on the residuals. During monitoring, the squared Mahalanobis distance to the identified distributions is calculated in the defined shaft speed ranges, providing information on the thruster condition. The performance of the method was validated on data from two operating thrusters and compared with reference classifiers. The results suggest that the method could detect changes in the condition of the thrusters during online monitoring. Moreover, it had high accuracy in binary classification tests related to bearing condition. In conclusion, the algorithm has practical properties that make it suitable for online application.
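A minimal sketch of the regression-residual monitoring idea described above, assuming healthy-condition training data; the synthetic shaft speeds, the two vibration features, and all dimensions are illustrative assumptions, not the paper's setup.

```python
# Sketch: predict vibration features from shaft speed, fit a multivariate
# normal to the healthy residuals, then monitor the squared Mahalanobis
# distance of new residuals to that distribution.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
shaft_speed = rng.uniform(100.0, 200.0, size=(500, 1))            # rpm (synthetic)
features = 0.02 * shaft_speed + rng.normal(0.0, 0.1, (500, 2))    # e.g. RMS, kurtosis

# 1) Linear regression from shaft speed to the vibration features; keep residuals.
model = LinearRegression().fit(shaft_speed, features)
residuals = features - model.predict(shaft_speed)

# 2) Multivariate normal fitted to the healthy residuals.
mu = residuals.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(residuals, rowvar=False))

def squared_mahalanobis(r):
    """Squared Mahalanobis distance of a residual vector to the healthy distribution."""
    d = r - mu
    return float(d @ cov_inv @ d)

# 3) During monitoring, large distances within a shaft speed range flag an anomaly.
new_residual = features[0] - model.predict(shaft_speed[:1])[0]
print(squared_mahalanobis(new_residual))
```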

Author(s):  
Yu Zhang ◽  
Miguel Martínez-García ◽  
Mike Garlick ◽  
Anthony Latimer ◽  
Samuel Cruz-Manzo

In this paper, a scheme for an ‘early warning’ system is developed for the combustion system of Industrial Gas Turbines (IGTs). It attains a low computational workload and simple programming requirements and is therefore employable at an industrial level. The methodology includes trend analysis, which examines whether a measurement shows a different trend from the other measurements in its sensor group, and noise analysis, which examines whether a measurement displays a higher level of noise than the other sensors. The proposed approach also overcomes difficulties encountered by other data-driven methods due to temperature varying with the load conditions of the IGTs. Furthermore, it brings other advantages: no historical training data are needed, and there is no requirement to set thresholds for each sensor in the system. The efficacy and effectiveness of the proposed approach have been demonstrated through experimental trials on previous pre-chamber burnout cases. The resulting outcomes of the scheme will be of interest to IGT companies, especially in condition monitoring of the combustion system. Future work and possible improvements are also discussed at the end of the paper.
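One way the two group-relative checks could be realized is sketched below; the abstract does not spell out its statistics, so the robust z-score, the slope and noise measures, the window length, and the cut-off are all assumptions.

```python
# Sketch: flag a sensor whose trend (slope) or noise level deviates from
# its group, with no per-sensor thresholds and no historical training data.
import numpy as np

def group_outliers(stats, z_cut=3.0):
    """Indices of sensors whose statistic deviates strongly from the group."""
    med = np.median(stats)
    mad = np.median(np.abs(stats - med)) + 1e-12
    z = 0.6745 * (stats - med) / mad              # robust z-score
    return np.where(np.abs(z) > z_cut)[0]

def trend_alarm(window):
    """Trend analysis: per-sensor slope over the window vs. the group."""
    t = np.arange(window.shape[0])
    slopes = np.polyfit(t, window, deg=1)[0]      # one slope per sensor column
    return group_outliers(slopes)

def noise_alarm(window):
    """Noise analysis: spread of the high-frequency residual vs. the group."""
    noise = np.std(np.diff(window, axis=0), axis=0)
    return group_outliers(noise)

window = np.random.default_rng(1).normal(size=(200, 8))  # 200 samples, 8 sensors
window[:, 3] += np.linspace(0.0, 5.0, 200)               # inject a diverging trend
print(trend_alarm(window), noise_alarm(window))          # sensor 3 trips the trend check
```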


2021 ◽  
Author(s):  
Jason Meil

The data preparation process generally consumes up to 80% of a data scientist's time, with 60% of that attributed to cleaning and labeling data.[1] Our solution is to use automated pipelines to prepare, annotate, and catalog data. The first step upon ingestion, especially in the case of real-world, unstructured and unlabeled datasets, is to leverage Snorkel, a tool specifically designed around a paradigm to rapidly create, manage, and model training data. Configured properly, Snorkel can be leveraged to temper this labeling bottleneck through a process called weak supervision. Weak supervision uses programmatic labeling functions (heuristics, distant supervision, subject-matter expertise, or knowledge bases) scripted in Python to generate “noisy labels”. Each function traverses the entirety of the dataset and feeds the labeled data into a generative (conditionally probabilistic) model. The function of this model is to output the distribution of each response variable and predict the conditional probability based on a joint probability distribution algorithm. This is done by comparing the various labeling functions and the degree to which their outputs are congruent with each other. A labeling function that has a high degree of congruence with other labeling functions will have a high learned accuracy, that is, the fraction of predictions that the model got right. Conversely, labeling functions that have a low degree of congruence with other functions will have low learned accuracy. The predictions are then combined by estimated weighted accuracy, whereby the predictions of the more accurate functions are counted multiple times. The result is a transformation from a binary classification of 0 or 1 to a fuzzy label between 0 and 1: there is probability “x” that, based on heuristic “n”, the response variable is “y”. As data are added, this generative model makes multi-class inference over the response variables positive, negative, or abstain, assigning probabilistic labels to potentially millions of data points. Thus, we have generated a discriminative ground truth for all further labeling efforts and have improved the scalability of our models. Labeling functions can be applied to unlabeled data to further machine learning efforts, as sketched below.

Once our datasets are labeled and a ground truth is established, we need to persist the data into our delta lake, since it combines the most performant aspects of a warehouse with the low-cost storage of data lakes. In addition, the lake can accept unstructured, semi-structured, or structured data sources, and those sources can be further aggregated into raw-ingestion, cleaned, and feature-engineered data layers. By sectioning off the data sources into these “layers”, the data engineering portion is abstracted away from the data scientist, who can access model-ready data at any time. Data can be ingested via batch or stream.

The design of the entire ecosystem is to eliminate as much technical debt in machine learning paradigms as possible, in terms of configuration, data collection, verification, governance, extraction, analytics, process management, resource management, infrastructure, monitoring, and post-verification.
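A minimal sketch of the labeling-function step described above, using Snorkel's labeling-function API (snorkel 0.9.x-style imports); the toy heuristics, label names, and DataFrame are illustrative assumptions.

```python
# Sketch: two programmatic labeling functions vote on unlabeled rows; the
# generative LabelModel weighs their agreement and emits fuzzy labels in [0, 1].
import pandas as pd
from snorkel.labeling import labeling_function, PandasLFApplier
from snorkel.labeling.model import LabelModel

ABSTAIN, NEGATIVE, POSITIVE = -1, 0, 1

@labeling_function()
def lf_keyword_good(x):
    # Toy heuristic: "excellent" suggests a positive label, else abstain.
    return POSITIVE if "excellent" in x.text.lower() else ABSTAIN

@labeling_function()
def lf_keyword_bad(x):
    return NEGATIVE if "faulty" in x.text.lower() else ABSTAIN

df_train = pd.DataFrame({"text": ["Excellent batch", "Faulty sensor", "No comment"]})

# Apply every labeling function to every row -> a matrix of noisy votes.
L_train = PandasLFApplier(lfs=[lf_keyword_good, lf_keyword_bad]).apply(df_train)

# Learn each function's accuracy from (dis)agreements; output probabilistic labels.
label_model = LabelModel(cardinality=2, verbose=False)
label_model.fit(L_train=L_train, n_epochs=200, seed=42)
print(label_model.predict_proba(L=L_train))
```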


2021 ◽  
Vol 14 (1) ◽  
pp. 40
Author(s):  
Eftychia Koukouraki ◽  
Leonardo Vanneschi ◽  
Marco Painho

Among natural disasters, earthquakes have had the highest recorded rates of human loss over the past 20 years. Their unexpected nature has severe consequences for both human lives and material infrastructure, demanding urgent action. For effective emergency relief, it is necessary to gain awareness of the level of damage in the affected areas. The use of remotely sensed imagery is popular in damage assessment applications; however, it requires a considerable amount of labeled data, which is not always easy to obtain. Taking into consideration the recent developments in the fields of Machine Learning and Computer Vision, this study investigates and employs several Few-Shot Learning (FSL) strategies to address data insufficiency and imbalance in post-earthquake urban damage classification. While small datasets have been tested on binary classification problems, which usually divide urban structures into collapsed and non-collapsed, the potential of limited training data for multi-class classification has not been fully explored. To address this gap, four models were created, following different data balancing methods, namely cost-sensitive learning, oversampling, undersampling and Prototypical Networks. After a quantitative comparison among them, the best performing model was found to be the one based on Prototypical Networks, and it was used for the creation of damage assessment maps. The contribution of this work is twofold: we show that oversampling is the most suitable data balancing method for training Deep Convolutional Neural Networks (CNN) when compared to cost-sensitive learning and undersampling, and we demonstrate the appropriateness of Prototypical Networks in the damage classification context.
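A minimal sketch of the Prototypical Network classification step in this few-shot setting; the embedding size, the four damage classes, and the random tensors standing in for CNN embeddings are illustrative assumptions.

```python
# Sketch: class prototypes are mean support embeddings; queries are scored
# by negative squared Euclidean distance to each prototype.
import torch
import torch.nn.functional as F

def prototypical_logits(support, support_y, query, n_classes):
    # One prototype per class: the mean embedding of its support examples.
    prototypes = torch.stack(
        [support[support_y == c].mean(dim=0) for c in range(n_classes)]
    )
    return -torch.cdist(query, prototypes) ** 2    # (n_query, n_classes)

embed_dim, n_classes = 64, 4                       # e.g. four damage grades
support = torch.randn(20, embed_dim)               # stand-in for CNN embeddings
support_y = torch.arange(n_classes).repeat(5)      # 5 shots per class
query = torch.randn(8, embed_dim)

logits = prototypical_logits(support, support_y, query, n_classes)
probs = F.softmax(logits, dim=1)                   # per-class damage probabilities
print(probs.argmax(dim=1))
```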


Author(s):  
Farid Jauhari ◽  
Ahmad Afif Supianto

Student performance is the most important measure of an educational institute's competitiveness. To improve it, institutes need to predict student performance so that they can give special treatment to students predicted to be low performers. In this paper, we apply three boosting algorithms (C5.0, AdaBoost.M1, and AdaBoost.SAMME) to build classifiers for predicting student performance. This research used the UCI student performance datasets. There were three evaluation scenarios. The first employed 10-fold cross-validation to compare the performance of the boosting algorithms; its results showed that AdaBoost.SAMME and AdaBoost.M1 outperform the baseline method in binary classification. The second scenario evaluated the boosting algorithms under different amounts of training data; here, AdaBoost.M1 outperformed the other boosting algorithms and the baseline method in binary classification. In the third scenario, we built models from one subject's dataset and tested them on another subject's dataset; the results indicate that a prediction model built on one subject can predict performance in another.
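A hedged sketch of the first evaluation scenario; scikit-learn offers an AdaBoost implementation but not C5.0 or AdaBoost.M1, so a tree-based AdaBoost and a plain decision tree baseline stand in here, and synthetic data replaces the UCI dataset.

```python
# Sketch: compare a boosted classifier against a baseline with 10-fold CV.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=30, random_state=0)  # stand-in data

models = {
    "baseline_tree": DecisionTreeClassifier(max_depth=3, random_state=0),
    "adaboost": AdaBoostClassifier(n_estimators=100, random_state=0),
}

for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=10)   # 10-fold cross-validation
    print(f"{name}: accuracy {scores.mean():.3f} +/- {scores.std():.3f}")
```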


2020 ◽  
Vol 20 (S14) ◽  
Author(s):  
Ming Liang ◽  
ZhiXing Zhang ◽  
JiaYing Zhang ◽  
Tong Ruan ◽  
Qi Ye ◽  
...  

Abstract

Background Laboratory indicator test results in electronic health records have been applied in many clinical big data analyses. However, it is quite common that the same laboratory examination item (i.e., lab indicator) is presented under different Chinese names due to translation differences and the varying naming conventions of hospitals, which distorts analysis results.

Methods A framework with a recall model and a binary classification model is proposed, which could reduce the alignment scale and improve the accuracy of lab indicator normalization. To reduce the alignment scale, tf-idf is used for candidate selection. To assure the accuracy of the output, we utilize an enhanced sequential inference model for binary classification. Active learning is applied with a selection strategy proposed to reduce annotation cost.

Results Since our indicator standardization method mainly focuses on Chinese indicator inconsistency, we perform our experiment on data from the Shanghai Hospital Development Center (SHDC), selecting clinical data from 8 hospitals. The method achieves an F1-score of 92.08% in the final binary classification. As for active learning, the proposed strategy performs better than the random baseline and outperforms the model trained on the full data while using only 43% of the training data. A case study on heart failure clinic analysis, conducted on a sub-dataset collected from SHDC, shows that the proposed method is practical in application with good performance.

Conclusion This work demonstrates that the proposed structure can be effectively applied to lab indicator normalization, and that active learning is suitable for reducing the annotation cost of this task. Such a method is also valuable in data cleaning, data mining, text extraction and entity alignment.
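A minimal sketch of the tf-idf recall step; toy English indicator names stand in for the paper's Chinese ones, and character n-grams plus a top-2 cut are assumptions, not the authors' exact configuration.

```python
# Sketch: shortlist candidate standard names for each raw lab indicator name,
# so the downstream binary classifier only scores a few pairs per indicator.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

standard_names = ["hemoglobin", "serum creatinine", "blood urea nitrogen"]
raw_names = ["haemoglobin (HGB)", "creatinine, serum"]

# Character n-grams tolerate spelling and word-order variation across hospitals.
vec = TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 3))
standard_vecs = vec.fit_transform(standard_names)

for raw in raw_names:
    sims = cosine_similarity(vec.transform([raw]), standard_vecs)[0]
    top = np.argsort(sims)[::-1][:2]               # top-2 candidates per raw name
    print(raw, "->", [standard_names[i] for i in top])
```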


Author(s):  
Daniel B. Rubin

AdaBoost is a popular and successful data mining technique for binary classification. However, there is no universally agreed-upon extension of the method to problems with more than two classes. Most multiclass generalizations simply reduce the problem to a series of binary classification problems. The statistical interpretation of AdaBoost is that it operates through loss-based estimation: by using an exponential loss function as a surrogate for misclassification loss, it sequentially minimizes empirical risk by fitting a base classifier to iteratively reweighted training data. While there are several extensions using loss-based estimation with multiclass base classifiers, these use multiclass versions of the exponential loss that are not classification calibrated: unless restrictions are placed on conditional class probabilities, it becomes possible to have optimal surrogate risk but poor misclassification risk. In this work, we introduce a new AdaBoost extension, called AdaBoost.SL, that does not reduce the problem into binary subproblems and that uses a classification-calibrated multiclass exponential loss function. Numerical experiments show the algorithm performs well on benchmark datasets.
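For context, a minimal sketch of the classical two-class AdaBoost that this loss-based interpretation describes (exponential loss minimized by refitting a base classifier to reweighted data); it is not the paper's AdaBoost.SL, whose multiclass construction is given in the paper itself.

```python
# Sketch: binary AdaBoost with decision stumps; labels must be in {-1, +1}.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, n_rounds=50):
    w = np.full(len(y), 1.0 / len(y))              # uniform initial weights
    stumps, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        err = np.clip(w[pred != y].sum(), 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)      # minimizes exponential loss
        w *= np.exp(-alpha * y * pred)             # upweight misclassified points
        w /= w.sum()
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas

def adaboost_predict(stumps, alphas, X):
    # Weighted vote of the base classifiers, thresholded at zero.
    score = sum(a * s.predict(X) for s, a in zip(stumps, alphas))
    return np.sign(score)
```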

