On failure prediction and failure identification modeling in a gas turbine system: a survey of classification approaches in a three-class problem

Catherine Cheung; Calista Biondic; Zouhair Hamaimou; Julio Valdes

doi:10.36001/phmconf.2021.v13i1.3052

Annual Conference of the PHM Society ◽

10.36001/phmconf.2021.v13i1.3052 ◽

2021 ◽

Vol 13 (1) ◽

Author(s):

Catherine Cheung ◽

Calista Biondic ◽

Zouhair Hamaimou ◽

Julio Valdes

Keyword(s):

Machine Learning ◽

Gas Turbine ◽

Health Monitoring ◽

Data Selection ◽

Sensor Data ◽

Sensor Technology ◽

Selection Strategy ◽

Support Vector ◽

Data Set ◽

Unseen Data

Rapid developments in sensor technology, data processing tools and data storage capability have helped fuel an increased appetite for equipment health monitoring in mechanical systems. As a result, the number of sensors and the amount of data collected for health monitoring have grown tremendously. It is hoped that by collecting large quantities of operational data, predictive tools can be developed that will provide operational, maintenance and safety benefits. Data mining and machine learning techniques are important tools in addressing the ensuing challenge of extracting useful results from the data collected. In this work, the sensor data from a gas turbine system were analyzed with the objective of failure modeling and prediction. Previous efforts had used a two-class approach for this problem, to distinguish healthy and failed states of the system. In this work, a third class, labelled as deteriorated, is assigned to the data immediately preceding each failure event to explore the ability of machine learning models to provide early warning of upcoming incidents. Several maintenance incidents were recorded by the sensor system in two separate vehicles. Three approaches to selecting training data were used. The first followed a traditional method of randomly selecting data points from all data according to a desired percentage of failed data to include in training, target ratios between failed and healthy data in each data set, as well as target ratios between training and testing data. The second data selection strategy was to consider data related to failure incidents as a whole and select certain incidents to include in training, leaving the remaining ones unseen for testing. The third approach was cross-validation, which is typically used to evaluate how a classifier will perform on unseen data while still using the entirety of the data to train the final classifier.
In addition to investigating training and data selection strategies, the effect of hyperparameter optimization was explored, as well as the effect of varying the time period of the deteriorated class. Using the gas turbine data, which included 7 failure incidents and 76 predictor variables, a variety of classifier models of the system were developed in a three-class problem to differentiate healthy, deteriorated and failed system states. The classifier methods included support vector machines, Gaussian Naïve Bayes, random forest, AdaBoost, multilayer perceptron, k-nearest neighbors, and XGBoost. Ensemble models were also created to leverage all the individual classifier models that were developed. This paper describes the comprehensive results that were obtained using the various approaches and combinations, highlighting their respective benefits and limitations.
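
The three-class labelling and the incident-based data selection described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the integer class codes, the representation of a failure as a `(start, end)` time interval, and the `window` parameter (the length of the deteriorated period before each failure) are all assumptions made for illustration.

```python
from typing import Iterable, List, Sequence, Tuple

# Hypothetical integer codes for the three system states.
HEALTHY, DETERIORATED, FAILED = 0, 1, 2


def label_three_class(
    timestamps: Sequence[float],
    failures: Sequence[Tuple[float, float]],
    window: float,
) -> List[int]:
    """Label each timestamp as healthy, deteriorated, or failed.

    A sample is FAILED if it lies inside any (start, end) failure interval,
    DETERIORATED if it lies within `window` time units before a failure
    starts, and HEALTHY otherwise. FAILED takes priority over DETERIORATED.
    """
    labels = []
    for t in timestamps:
        label = HEALTHY
        for start, end in failures:
            if start <= t <= end:
                label = FAILED
                break
            if start - window <= t < start:
                label = DETERIORATED
        labels.append(label)
    return labels


def split_by_incident(
    incident_ids: Sequence[int], held_out: Iterable[int]
) -> Tuple[List[int], List[int]]:
    """Return (train, test) sample indices, keeping whole incidents together.

    Every sample whose incident id is in `held_out` goes to the test set,
    so the held-out incidents are never seen during training.
    """
    held = set(held_out)
    train = [i for i, inc in enumerate(incident_ids) if inc not in held]
    test = [i for i, inc in enumerate(incident_ids) if inc in held]
    return train, test


if __name__ == "__main__":
    # Ten samples, one failure spanning t = 6..7, deteriorated window of 2.
    print(label_three_class(list(range(10)), [(6, 7)], window=2))
    # Six samples from three incidents; incident 1 is held out for testing.
    print(split_by_incident([0, 0, 1, 1, 2, 2], held_out=[1]))
```

Varying `window` in such a sketch corresponds to varying the time period of the deteriorated class, and holding out whole incidents avoids mixing samples from the same failure event across training and testing.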

Application of Various Machine Learning Techniques in Predicting Water Saturation in Tight Gas Sandstone Formation

Journal of Energy Resources Technology ◽

10.1115/1.4053248 ◽

2021 ◽

pp. 1-14

Author(s):

Ahmed Farid Ibrahim ◽

Salaheldin Elkatatny ◽

Yasmin Abdelraouf ◽

Mustafa Al Ramadan

Keyword(s):

Machine Learning ◽

Water Saturation ◽

Well Logs ◽

Percentage Error ◽

Support Vector ◽

Tight Gas ◽

Data Set ◽

Unseen Data ◽

Sandstone Formation ◽

Tight Gas Sandstone

Water saturation (Sw) is a vital factor in hydrocarbon in-place calculations. Sw is usually calculated using different equations; however, the resulting values have been inconsistent with experimental results because the underlying assumptions are often incorrect. Moreover, these approaches remain hindered by their strong reliance on experimental analysis, which is expensive and time-consuming. This study introduces the application of different machine learning (ML) methods to predict Sw from conventional well logs. Function networks (FN), support vector machine (SVM), and random forests (RF) were implemented to calculate Sw using the gamma-ray (GR) log, neutron porosity (NPHI) log, and resistivity (Rt) log. A dataset of 782 points from two wells (Well-1 and Well-2) in a tight gas sandstone formation was used to build and then validate the different ML models. The data set from Well-1 was used for training and testing the ML models, and the unseen data from Well-2 was then used to validate the developed models. The results from the FN, SVM and RF models showed their capability of accurately predicting Sw from conventional well logging data. The correlation coefficient (R) values between actual and estimated Sw from the FN model were found to be 0.85 and 0.83, compared to 0.98 and 0.95 from the RF model, for the training and testing sets, respectively. The SVM model shows R values of 0.95 and 0.85 in the different datasets. The average absolute percentage error (AAPE) was less than 8% in the three ML models. The ML models outperform the empirical correlations, which have an AAPE greater than 19%. This study provides ML applications to accurately forecast water saturation using readily available conventional well logs, without additional core analysis or well site interventions.
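
The two metrics quoted in this abstract, the correlation coefficient R and the average absolute percentage error (AAPE), can be computed as in the sketch below. The abstract does not give the formulas, so the standard definitions are assumed here (Pearson correlation, and the mean of |actual − predicted| / |actual| expressed in percent); the function names are hypothetical.

```python
import math
from typing import Sequence


def aape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Average absolute percentage error, in percent.

    Assumes no actual value is zero, since each error is divided by it.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences.

    Assumes neither sequence is constant (non-zero variance).
    """
    if len(x) != len(y) or not x:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


if __name__ == "__main__":
    # Made-up saturation values, purely for illustration.
    measured = [0.20, 0.35, 0.50, 0.65]
    estimated = [0.22, 0.33, 0.52, 0.60]
    print(f"AAPE = {aape(measured, estimated):.2f}%")
    print(f"R    = {pearson_r(measured, estimated):.3f}")
```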

The Feasibility of Using Machine Learning to Classify Calls to South African Emergency Dispatch Centres According to Prehospital Diagnosis, by Utilising Caller Descriptions of the Incident

Healthcare ◽

10.3390/healthcare9091107 ◽

2021 ◽

Vol 9 (9) ◽

pp. 1107

Author(s):

Tayla Anthony ◽

Amit Kumar Mishra ◽

Willem Stassen ◽

Jarryd Son

Keyword(s):

Machine Learning ◽

Data Augmentation ◽

Parameter Tuning ◽

Critical Conditions ◽

Support Vector ◽

K Nearest Neighbor ◽

Data Set ◽

Unseen Data ◽

The Right ◽

Time Critical

This paper presents the application of machine learning for classifying time-critical conditions, namely sepsis, myocardial infarction and cardiac arrest, based on transcriptions of emergency calls from emergency services dispatch centers in South Africa. In this study we present results from the application of four multi-class classification algorithms: Support Vector Machine (SVM), Logistic Regression, Random Forest and K-Nearest Neighbor (kNN). The application of machine learning for classifying time-critical diseases may allow for earlier identification, adequate telephonic triage, and quicker response times of the appropriate cadre of emergency care personnel. The data set consisted of 93 original examples, which was further expanded through the use of data augmentation. Two feature extraction techniques were investigated, namely TF-IDF and handcrafted features. The results were further improved using hyper-parameter tuning and feature selection. In our work, within the limitations of a small data set, classification yielded an accuracy of up to 100% when training with 10-fold cross-validation, and 95% accuracy when predicting on unseen data. The results are encouraging and show that automated diagnosis based on emergency dispatch centre transcriptions is feasible. When implemented in real time, this can have multiple uses, e.g. enabling the call-takers to take the right action with the right priority.
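
The 10-fold cross-validation mentioned in this abstract partitions the data into ten folds and uses each fold once as the held-out set. The following is a generic sketch of that index bookkeeping, not the authors' code: the function name and the fixed `seed` are assumptions, and it does not stratify by class, which a small three-class data set may well need in practice.

```python
import random
from typing import Iterator, List, Tuple


def k_fold_indices(
    n_samples: int, k: int, seed: int = 0
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for each of k folds.

    Indices are shuffled once with a fixed seed, then dealt into k folds
    whose sizes differ by at most one. Each sample is tested exactly once.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


if __name__ == "__main__":
    # 93 samples, matching the size of the original data set in the abstract.
    for fold_no, (train, test) in enumerate(k_fold_indices(93, 10), start=1):
        print(f"fold {fold_no}: {len(train)} train, {len(test)} test")
```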

Scrutiny of Mental Depression through Smartphone Sensors Using Machine Learning Approaches

International Journal of Innovative Computing ◽

10.11113/ijic.v10n1.259 ◽

2020 ◽

Vol 10 (1) ◽

Author(s):

Abdul Kadar Muhammad Masum ◽

Erfanul Hoque Bahadur ◽

Forhada Akther Ruhi

Keyword(s):

Machine Learning ◽

Random Forest ◽

Learning Strategies ◽

Sensor Data ◽

Support Vector ◽

Learning Approaches ◽

Validation Data ◽

Data Set ◽

Mental Depression ◽

Depressed Patients

Equipped with a variety of sensors, smartphones now open up many opportunities for applying data mining and machine learning to Human Activity Recognition (HAR). HAR monitoring, which is used in the follow-up to the treatment of diseases, can also be used to recognize mental depression, an application that has so far been overlooked. In this study, smartphone sensor data were collected at a frequency of 1 Hz from 20 subjects of different ages. We performed HAR using basic machine learning methods, namely Support Vector Machine, Random Forest, K-Nearest Neighbors, and Artificial Neural Network, to recognize physical activities that are associated with mental depression. Random Forest performed best at recognizing daily patterns of activities, with 99.80% accuracy on the validation data set. In addition, sensor data covering the activities performed continuously over the most recent 14 days were collected from the target subjects' smartphones. These data were fed to the optimized Random Forest model, which quantified the duration of each activity symptomatic of mental depression. An attempt was then made to estimate a risk factor for the probability that an individual is experiencing mental depression. To this end, a questionnaire was used to collect data from 50 patients suffering from mental depression; it asked about the duration of activities related to mental depression. The similarity between the target subjects' activity patterns and those of the 50 depressed patients was then measured. Finally, a similarity approach was applied to the data collected from the target subjects to infer the relation between the target subjects and the depressed patients. An average similarity of 90.94% for depressed subjects and 34.99% for typical subjects indicates that the system performed well in measuring risk factors.

Binary Spectrum Feature for Improved Classifier Performance

10.36227/techrxiv.12993122 ◽

2020 ◽

Author(s):

Nalika Ulapane ◽

Karthick Thiyagarajan ◽

Sarath Kodagoda

Keyword(s):

Machine Learning ◽

Classification Performance ◽

Feature Reduction ◽

Sensor Data ◽

Machine Learning Techniques ◽

Support Vector ◽

SVM Classifier ◽

Monitoring Task ◽

Classifier Performance ◽

Spectrum Feature

Classification has become a vital task in modern machine learning and Artificial Intelligence applications, including smart sensing. Numerous machine learning techniques are available to perform classification. Similarly, numerous practices, such as feature selection (i.e., selection of a subset of descriptor variables that optimally describe the output), are available to improve classifier performance. In this paper, we consider the case of a given supervised learning classification task that has to be performed making use of continuous-valued features. It is assumed that an optimal subset of features has already been selected. Therefore, no further feature reduction, or feature addition, is to be carried out. Then, we attempt to improve the classification performance by passing the given feature set through a transformation that produces a new feature set which we have named the “Binary Spectrum”. Via a case study example done on some Pulsed Eddy Current sensor data captured from an infrastructure monitoring task, we demonstrate how the classification accuracy of a Support Vector Machine (SVM) classifier increases through the use of this Binary Spectrum feature, indicating the feature transformation’s potential for broader usage.

In silico Prediction of Inhibitory Constant of Thrombin Inhibitors Using Machine Learning

Combinatorial Chemistry & High Throughput Screening ◽

10.2174/1386207322666181220130232 ◽

2019 ◽

Vol 21 (9) ◽

pp. 662-669 ◽

Cited By ~ 1

Author(s):

Junnan Zhao ◽

Lu Zhu ◽

Weineng Zhou ◽

Lingfeng Yin ◽

Yuchen Wang ◽

...

Keyword(s):

Machine Learning ◽

Prediction Models ◽

Regression Tree ◽

Large Data ◽

Thrombin Inhibitors ◽

Coagulation Cascade ◽

Gradient Boosting ◽

Support Vector ◽

Data Set ◽

Descriptor Selection

Background: Thrombin is the central protease of the vertebrate blood coagulation cascade, which is closely related to cardiovascular diseases. The inhibitory constant Ki is the most significant property of thrombin inhibitors. Method: This study was carried out to predict Ki values of thrombin inhibitors based on a large data set by using machine learning methods. Because it can find non-intuitive regularities in high-dimensional datasets, machine learning can be used to build effective predictive models. A total of 6554 descriptors for each compound were collected and an efficient descriptor selection method was chosen to find the appropriate descriptors. Four different methods, namely multiple linear regression (MLR), K Nearest Neighbors (KNN), Gradient Boosting Regression Tree (GBRT) and Support Vector Machine (SVM), were implemented to build prediction models with these selected descriptors. Results: The SVM model was the best among these methods, with R² = 0.84, MSE = 0.55 for the training set and R² = 0.83, MSE = 0.56 for the test set. Several validation methods, such as the Y-randomization test and applicability domain evaluation, were adopted to assess the robustness and generalization ability of the model. The final model shows excellent stability and predictive ability and can be employed for rapid estimation of the inhibitory constant, which is helpful for designing novel thrombin inhibitors.

Automatic Identification of Upper Extremity Rehabilitation Exercise Type and Dose Using Body-Worn Sensors and Machine Learning: A Pilot Study

Digital Biomarkers ◽

10.1159/000516619 ◽

2021 ◽

pp. 158-166

Author(s):

Noah Balestra ◽

Gaurav Sharma ◽

Linda M. Riek ◽

Ania Busza

Keyword(s):

Machine Learning ◽

Upper Extremity ◽

Sensor Data ◽

Inpatient Setting ◽

Accelerometer Data ◽

Data Set ◽

Machine Learning Classification ◽

Exercise Type ◽

Exercise Dose ◽

Rehabilitation Exercises

Background: Prior studies suggest that participation in rehabilitation exercises improves motor function poststroke; however, studies on optimal exercise dose and timing have been limited by the technical challenge of quantifying exercise activities over multiple days. Objectives: The objectives of this study were to assess the feasibility of using body-worn sensors to track rehabilitation exercises in the inpatient setting and investigate which recording parameters and data analysis strategies are sufficient for accurately identifying and counting exercise repetitions. Methods: MC10 BioStampRC® sensors were used to measure accelerometer and gyroscope data from upper extremities of healthy controls (n = 13) and individuals with upper extremity weakness due to recent stroke (n = 13) while the subjects performed 3 preselected arm exercises. Sensor data were then labeled by exercise type and this labeled data set was used to train a machine learning classification algorithm for identifying exercise type. The machine learning algorithm and a peak-finding algorithm were used to count exercise repetitions in non-labeled data sets. Results: We achieved a repetition counting accuracy of 95.6% overall, and 95.0% in patients with upper extremity weakness due to stroke when using both accelerometer and gyroscope data. Accuracy was decreased when using fewer sensors or using accelerometer data alone. Conclusions: Our exploratory study suggests that body-worn sensor systems are technically feasible, well tolerated in subjects with recent stroke, and may ultimately be useful for developing a system to measure total exercise “dose” in poststroke patients during clinical rehabilitation or clinical trials.
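
The abstract mentions a peak-finding algorithm for counting exercise repetitions but does not specify it. The sketch below shows one generic way such counting can work on a one-dimensional signal: it counts local maxima that reach a threshold and are separated by a minimum number of samples. The function name and the `threshold` and `min_gap` parameters are assumptions for illustration, not the method used in the study.

```python
from typing import Optional, Sequence


def count_repetitions(
    signal: Sequence[float], threshold: float, min_gap: int = 1
) -> int:
    """Count peaks in `signal` as a proxy for exercise repetitions.

    A peak is a sample that is higher than its left neighbour, at least as
    high as its right neighbour, and at or above `threshold`. Peaks closer
    than `min_gap` samples to the previously counted peak are ignored, which
    suppresses double counting caused by sensor noise.
    """
    count = 0
    last_peak: Optional[int] = None
    for i in range(1, len(signal) - 1):
        is_peak = signal[i] > signal[i - 1] and signal[i] >= signal[i + 1]
        if is_peak and signal[i] >= threshold:
            if last_peak is None or i - last_peak >= min_gap:
                count += 1
                last_peak = i
    return count


if __name__ == "__main__":
    # Made-up acceleration magnitudes with three clear peaks.
    trace = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0]
    print(count_repetitions(trace, threshold=0.5))
```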

Structural Health Monitoring Using Machine Learning and Cumulative Absolute Velocity Features

Applied Sciences ◽

10.3390/app11125727 ◽

2021 ◽

Vol 11 (12) ◽

pp. 5727

Author(s):

Sifat Muin ◽

Khalid M. Mosalam

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Structural Health Monitoring ◽

Health Monitoring ◽

Degree Of Freedom ◽

Absolute Velocity ◽

Support Vector ◽

Damage State ◽

Structural Health ◽

Cumulative Absolute Velocity

Machine learning (ML)-aided structural health monitoring (SHM) can rapidly evaluate the safety and integrity of the aging infrastructure following an earthquake. The conventional damage features used in ML-based SHM methodologies face the curse of dimensionality. This paper introduces low dimensional, namely, cumulative absolute velocity (CAV)-based features, to enable the use of ML for rapid damage assessment. A computer experiment is performed to identify the appropriate features and the ML algorithm using data from a simulated single-degree-of-freedom system. A comparative analysis of five ML models (logistic regression (LR), ordinal logistic regression (OLR), artificial neural networks with 10 and 100 neurons (ANN10 and ANN100), and support vector machines (SVM)) is performed. Two test sets were used where Set-1 originated from the same distribution as the training set and Set-2 came from a different distribution. The results showed that the combination of the CAV and the relative CAV with respect to the linear response, i.e., RCAV, performed the best among the different feature combinations. Among the ML models, OLR showed good generalization capabilities when compared to SVM and ANN models. Subsequently, OLR is successfully applied to assess the damage of two numerical multi-degree of freedom (MDOF) models and an instrumented building with CAV and RCAV as features. For the MDOF models, the damage state was identified with accuracy ranging from 84% to 97% and the damage location was identified with accuracy ranging from 93% to 97.5%. The features and the OLR models successfully captured the damage information for the instrumented structure as well. The proposed methodology is capable of ensuring rapid decision-making and improving community resiliency.
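
Cumulative absolute velocity is commonly defined as the time integral of the absolute acceleration, CAV = ∫ |a(t)| dt. The sketch below assumes that definition, a uniformly sampled acceleration record, and trapezoidal integration; the function name is hypothetical. The relative feature RCAV is not defined in the abstract, so it is not reproduced here.

```python
from typing import Sequence


def cumulative_absolute_velocity(accel: Sequence[float], dt: float) -> float:
    """Approximate CAV = integral of |a(t)| dt with the trapezoidal rule.

    `accel` is a uniformly sampled acceleration record and `dt` is the
    sampling interval. The result has units of acceleration times time.
    """
    if dt <= 0:
        raise ValueError("dt must be positive")
    magnitudes = [abs(a) for a in accel]
    total = 0.0
    for left, right in zip(magnitudes, magnitudes[1:]):
        total += 0.5 * (left + right) * dt
    return total


if __name__ == "__main__":
    # Made-up record sampled every 0.01 s, purely for illustration.
    record = [0.0, 0.8, -1.2, 0.5, -0.3, 0.0]
    print(cumulative_absolute_velocity(record, dt=0.01))
```

Because the feature is a single scalar per record, it keeps the input dimension low, which is the property the abstract highlights.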

Machine Learning and Sensor Fusion for Estimating Continuous Energy Expenditure

AI Magazine ◽

10.1609/aimag.v33i2.2408 ◽

2012 ◽

Vol 33 (2) ◽

pp. 55 ◽

Cited By ~ 17

Author(s):

Nisarg Vyas ◽

Jonathan Farringdon ◽

David Andre ◽

John Ivo Stivoric

Keyword(s):

Machine Learning ◽

Energy Expenditure ◽

Sensor Data ◽

Machine Learning Techniques ◽

Sensor Technology ◽

Health Goals ◽

System A ◽

Multi Sensor Data Fusion ◽

Learning Techniques ◽

Insight Into

In this article we provide insight into the BodyMedia FIT armband system — a wearable multi-sensor technology that continuously monitors physiological events related to energy expenditure for weight management using machine learning and data modeling methods. Since becoming commercially available in 2001, more than half a million users have used the system to track their physiological parameters and to achieve their individual health goals including weight-loss. We describe several challenges that arise in applying machine learning techniques to the health care domain and present various solutions utilized in the armband system. We demonstrate how machine learning and multi-sensor data fusion techniques are critical to the system’s success.

A sentiment analysis system for social media using machine learning techniques: Social enablement

Digital Scholarship in the Humanities ◽

10.1093/llc/fqy037 ◽

2018 ◽

Vol 34 (3) ◽

pp. 569-581 ◽

Cited By ~ 1

Author(s):

Sujata Rani ◽

Parteek Kumar

Keyword(s):

Machine Learning ◽

Social Media ◽

Sentiment Analysis ◽

Media Analysis ◽

Training Data ◽

Machine Learning Techniques ◽

Support Vector ◽

Analysis Tool ◽

Data Set ◽

Learning Techniques

In this article, an innovative approach to sentiment analysis (SA) is presented. The proposed system handles the issues of Romanized or abbreviated text and spelling variations in the text when performing sentiment analysis. The training data set of 3,000 movie reviews and tweets has been manually labeled by native speakers of Hindi in three classes, i.e. positive, negative, and neutral. The system uses the WEKA (Waikato Environment for Knowledge Analysis) tool to convert these string data into numerical matrices and applies three machine learning techniques, i.e. Naive Bayes (NB), J48, and support vector machine (SVM). The proposed system has been tested on 100 movie reviews and tweets, and it has been observed that SVM performed best in comparison to the other classifiers, with an accuracy of 68% for movie reviews and 82% for tweets. The results of the proposed system are very promising and can be used in emerging applications like SA of product reviews and social media analysis. Additionally, the proposed system can be used for other cultural/social benefits like predicting/fighting human riots.

Applying Machine Learning for Improving Performance Classification on Driving Behavior

IJITEE (International Journal of Information Technology and Electrical Engineering) ◽

10.22146/ijitee.56919 ◽

2021 ◽

Vol 4 (1) ◽

pp. 8

Author(s):

Ahmad Iwan Fadli ◽

Selo Sulistyo ◽

Sigit Wibowo

Keyword(s):

Machine Learning ◽

Traffic Accident ◽

Large Scale ◽

Detection System ◽

Difficult Problem ◽

Sensor Data ◽

Driving Safety ◽

Support Vector ◽

Classification Methods ◽

Machine Learning Classification

Traffic accidents are a very difficult problem to handle on a large scale in a country. Indonesia is one of the most populous developing countries, and vehicles are the main means of transportation for daily activities. It is also the country with the largest number of car users in Southeast Asia, so driving safety needs to be considered. Using a machine learning classification method to determine whether a driver is driving safely can help reduce the risk of driving accidents. We created a detection system that classifies whether the driver is driving safely or unsafely using trip sensor data, which include gyroscope, acceleration, and GPS readings. The classification methods used in this study are Random Forest (RF), Support Vector Machine (SVM), and Multilayer Perceptron (MLP), with data preprocessing improved through feature extraction and oversampling. This study shows that RF has the best performance, with 98% accuracy, 98% precision, and 97% sensitivity using the proposed preprocessing stages, compared to SVM and MLP.
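
The accuracy, precision, and sensitivity figures reported in this abstract are standard binary classification metrics. As a minimal sketch, assuming the unsafe-driving class is encoded as the positive label `1` (an assumption, as the abstract does not state the encoding), they can be derived from the confusion counts as follows; the function name is hypothetical.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute accuracy, precision, and sensitivity (recall) for labels 0/1.

    Assumes at least one predicted positive and one actual positive, so
    that precision and sensitivity are well defined.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp),
        "sensitivity": tp / (tp + fn),
    }


if __name__ == "__main__":
    # Made-up labels for eight trips, purely for illustration.
    truth = [1, 1, 1, 0, 0, 0, 0, 1]
    preds = [1, 1, 0, 0, 0, 0, 1, 1]
    print(binary_metrics(truth, preds))
```

Reporting precision and sensitivity alongside accuracy matters when oversampling is used, because accuracy alone can hide poor performance on the minority class.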
