Application of Various Machine Learning Techniques in Predicting Water Saturation in Tight Gas Sandstone Formation

Abstract Water saturation (Sw) is a vital factor in hydrocarbon-in-place calculations. Sw is usually calculated using different equations; however, the calculated values have been inconsistent with experimental results because the underlying assumptions of these equations are often incorrect. Moreover, these approaches rely heavily on experimental analysis, which is expensive and time-consuming. This study introduces the application of different machine learning (ML) methods to predict Sw from conventional well logs. Functional networks (FN), support vector machine (SVM), and random forests (RF) were implemented to calculate Sw using gamma-ray (GR), neutron porosity (NPHI), and resistivity (Rt) logs. A dataset of 782 points from two wells (Well-1 and Well-2) in a tight gas sandstone formation was used to build and then validate the ML models. The dataset from Well-1 was used for training and testing the ML models; the unseen data from Well-2 were then used to validate the developed models. The results from the FN, SVM, and RF models showed that they can accurately predict Sw from conventional well logging data. The correlation coefficient (R) values between actual and estimated Sw were 0.85 and 0.83 for the FN model, compared with 0.98 and 0.95 for the RF model, for the training and testing sets, respectively. The SVM model showed R values of 0.95 and 0.85 for the different datasets. The average absolute percentage error (AAPE) was less than 8% for the three ML models. The ML models outperform the empirical correlations, which have AAPE greater than 19%. This study provides ML applications to accurately forecast water saturation using readily available conventional well logs without additional core analysis or well site interventions.
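The two figures of merit quoted in this abstract, the correlation coefficient (R) and the average absolute percentage error (AAPE), can be sketched as follows. This is a minimal illustration using the usual textbook definitions (Pearson correlation and mean absolute relative error in percent); the function names and the sample values are assumptions made for the example and are not taken from the study.

```python
import math


def correlation_coefficient(actual, predicted):
    """Pearson correlation coefficient R between two equal-length sequences."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)


def aape(actual, predicted):
    """Average absolute percentage error in percent (actual values must be non-zero)."""
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Made-up water saturation fractions, for illustration only.
sw_actual = [0.20, 0.40, 0.50, 0.80]
sw_predicted = [0.22, 0.36, 0.50, 0.80]
print(f"R    = {correlation_coefficient(sw_actual, sw_predicted):.3f}")
print(f"AAPE = {aape(sw_actual, sw_predicted):.1f}%")
```

Because AAPE divides by the actual value, it penalizes errors at low saturations more heavily than at high ones, which is worth keeping in mind when comparing it with R.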


Application of Various Machine Learning Techniques in Predicting Total Organic Carbon from Well Logs

Computational Intelligence and Neuroscience ◽

10.1155/2021/7390055 ◽

2021 ◽

Vol 2021 ◽

pp. 1-9

Author(s):

Osama Siddig ◽

Ahmed Farid Ibrahim ◽

Salaheldin Elkatatny

Keyword(s):

Machine Learning ◽

Organic Carbon ◽

Total Organic Carbon ◽

The Other ◽

Well Logs ◽

Machine Learning Techniques ◽

Percentage Error ◽

Average Error ◽

Support Vector ◽

Empirical Correlations

Unconventional resources have recently gained a lot of attention, and as a consequence, research interest in predicting total organic carbon (TOC) as a crucial quality indicator has increased. TOC is commonly measured experimentally; however, due to sampling restrictions, obtaining continuous TOC data is difficult. Therefore, different empirical correlations for TOC have been presented. However, there are concerns about the generalization and accuracy of these correlations. In this paper, different machine learning (ML) techniques were utilized to develop models that predict TOC from well logs, including formation resistivity (FR), spontaneous potential (SP), sonic transit time (Δt), bulk density (RHOB), neutron porosity (CNP), gamma ray (GR), and spectrum logs of thorium (Th), uranium (Ur), and potassium (K). Over 1250 data points from the Devonian Duvernay shale were utilized to create and validate the models. These datasets were obtained from three wells; the first was used to train the models, while the datasets from the other two wells were utilized to test and validate them. Support vector machine (SVM), random forest (RF), and decision tree (DT) were the ML approaches tested, and their predictions were contrasted with three empirical correlations. Parameters of the various AI methods were tested to ensure the best possible accuracy in terms of correlation coefficient (R) and average absolute percentage error (AAPE) between the actual and predicted TOC. The three ML methods yielded good matches; however, the RF-based model had the best performance. The RF model was able to predict the TOC for the different datasets with R values ranging between 0.93 and 0.99 and AAPE values less than 14%. In terms of average error, the ML-based models outperformed the three empirical correlations.
This study shows the capability and robustness of ML models to predict the total organic carbon from readily available logging data without the need for core analysis or additional well interventions.


Artificial neural networks as a tool for pattern recognition and electrofacies analysis in Polish palaeozoic shale gas formations

Acta Geophysica ◽

10.1007/s11600-019-00359-2 ◽

2019 ◽

Vol 67 (6) ◽

pp. 1991-2003 ◽

Cited By ~ 3

Author(s):

Edyta Puskarczyk

Keyword(s):

Machine Learning ◽

Pattern Recognition ◽

Support Vector Machine ◽

P Wave ◽

Well Logs ◽

Support Vector ◽

Western Slope ◽

Data Set ◽

Artificial Neural ◽

Gas Bearing

Abstract Unconventional oil and gas reservoirs from the lower Palaeozoic basin on the western slope of the East European Craton were considered in this study. The aim was to supplement and improve standard well log interpretation using machine learning methods, especially artificial neural networks (ANNs). ANNs were applied to standard well logging data, e.g. P-wave velocity, density, resistivity, neutron porosity, radioactivity and photoelectric factor. Information about lithology or stratigraphy was not taken into account during the calculations. We applied different classification methods: cluster analysis, support vector machine (SVM) and an artificial neural network, the Kohonen algorithm. We compared the results and analysed the obtained electrofacies. SVM was used for classification, and for the same data set its results were compared with those of the Kohonen algorithm; the two were in very good agreement. The Kohonen algorithm (ANN) was used for pattern recognition and identification of electrofacies, and also for geological interpretation of well log data. Applying the Kohonen algorithm revealed groups corresponding to the gas-bearing intervals. The analysis showed differentiation between gas-bearing formations and surrounding beds, as well as internal differentiation within gas-saturated beds. It is concluded that ANNs are a useful and quick tool for preliminary classification of members and identification of gas-saturated intervals.


Prediction of Water Saturation in Tight Gas Sandstone Formation Using Artificial Intelligence

ACS Omega ◽

10.1021/acsomega.1c04416 ◽

2022 ◽

Author(s):

Ahmed Farid Ibrahim ◽

Salaheldin Elkatatny ◽

Mustafa Al Ramadan

Keyword(s):

Artificial Intelligence ◽

Water Saturation ◽

Tight Gas ◽

Sandstone Formation ◽

Tight Gas Sandstone


On failure prediction and failure identification modeling in a gas turbine system: a survey of classification approaches in a three-class problem

Annual Conference of the PHM Society ◽

10.36001/phmconf.2021.v13i1.3052 ◽

2021 ◽

Vol 13 (1) ◽

Author(s):

Catherine Cheung ◽

Calista Biondic ◽

Zouhair Hamaimou ◽

Julio Valdes

Keyword(s):

Machine Learning ◽

Gas Turbine ◽

Health Monitoring ◽

Data Selection ◽

Sensor Data ◽

Sensor Technology ◽

Selection Strategy ◽

Support Vector ◽

Data Set ◽

Unseen Data

Rapid developments in sensor technology, data processing tools and data storage capability have helped fuel an increased appetite for equipment health monitoring in mechanical systems. As a result, the number of sensors and amount of data collected for health monitoring has grown tremendously. It is hoped that by collecting large quantities of operational data, predictive tools can be developed that will provide operational, maintenance and safety benefits. Data mining and machine learning techniques are important tools in addressing the ensuing challenge of extracting useful results from the data collected. In this work, the sensor data from a gas turbine system was analyzed with the objective of failure modeling and prediction. Previous efforts had used a two-class approach for this problem, to distinguish healthy and failed states of the system. In this work, a third class labelled as deteriorated data is added prior to each failure event to explore the ability of machine learning models to provide early warning of upcoming incidents. Several maintenance incidents were recorded by the sensor system in two separate vehicles. Three approaches to selecting training data were used. The first followed a traditional method of randomly selecting data points from all data according to a desired percentage of failed data to include in training, target ratios between failed and healthy data in each data set, as well as target ratios between training and testing data. The second data selection strategy was to consider data related to failure incidents as a whole and select certain incidents to include in training, and the remaining ones to be unseen in testing. The third approach was cross-validation which is typically used as a technique to evaluate how a classifier will perform on unseen data while still using the entirety of the data to train the final classifier. 
In addition to investigating training and data selection strategies, the effect of hyperparameter optimization was explored, as well as the effect of varying the time period of the deteriorated class. Using the gas turbine data, which included 7 failure incidents and 76 predictor variables, a variety of classifier models of the system were developed in a three-class problem to differentiate healthy, deteriorated and failed system states. The classifier methods included support vector machines, Gaussian Naïve Bayes, random forest, AdaBoost, multilayer perceptron, k-nearest neighbor, and XGBoost. Ensemble models were also created to leverage all the individual classifier models that were developed. This paper will describe the comprehensive results that were obtained using the various approaches and combinations, highlighting the respective benefits and limitations.
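The three-class labelling described above, in which a "deteriorated" period is inserted before each failure event and its length is varied, might be sketched as below. This is only an illustration under stated assumptions: the function `label_states`, the representation of failures as `(start, end)` intervals, and the parameter `deteriorated_window` are hypothetical and are not taken from the paper.

```python
def label_states(timestamps, failure_intervals, deteriorated_window):
    """Label each timestamp as 'healthy', 'deteriorated' or 'failed'.

    failure_intervals: list of (start, end) times during which the system is
        considered failed (assumed representation).
    deteriorated_window: length of the period before each failure start that
        is labelled 'deteriorated' (hypothetical parameter, same time units).
    """
    labels = []
    for t in timestamps:
        label = "healthy"
        for start, end in failure_intervals:
            if start <= t <= end:
                label = "failed"
                break  # a failed state takes precedence over any other label
            if start - deteriorated_window <= t < start:
                label = "deteriorated"
        labels.append(label)
    return labels


# Ten made-up time steps with one failure lasting from t=6 to t=7.
for window in (2, 3):
    print(window, label_states(list(range(10)), [(6, 7)], window))
```

Widening the window moves samples from the healthy class to the deteriorated class, which changes the class balance as well as how early a warning the classifier is asked to give.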


The Feasibility of Using Machine Learning to Classify Calls to South African Emergency Dispatch Centres According to Prehospital Diagnosis, by Utilising Caller Descriptions of the Incident

Healthcare ◽

10.3390/healthcare9091107 ◽

2021 ◽

Vol 9 (9) ◽

pp. 1107

Author(s):

Tayla Anthony ◽

Amit Kumar Mishra ◽

Willem Stassen ◽

Jarryd Son

Keyword(s):

Machine Learning ◽

Data Augmentation ◽

Parameter Tuning ◽

Critical Conditions ◽

Support Vector ◽

K Nearest Neighbor ◽

Data Set ◽

Unseen Data ◽

The Right ◽

Time Critical

This paper presents the application of machine learning for classifying time-critical conditions, namely sepsis, myocardial infarction and cardiac arrest, based on transcriptions of emergency calls from emergency services dispatch centers in South Africa. In this study we present results from the application of four multi-class classification algorithms: Support Vector Machine (SVM), Logistic Regression, Random Forest and K-Nearest Neighbor (kNN). The application of machine learning for classifying time-critical diseases may allow for earlier identification, adequate telephonic triage, and quicker response times of the appropriate cadre of emergency care personnel. The original data set of 93 examples was further expanded through data augmentation. Two feature extraction techniques were investigated, namely TF-IDF and handcrafted features. The results were further improved using hyper-parameter tuning and feature selection. Within the constraints of a limited data set, classification yielded an accuracy of up to 100% when training with 10-fold cross validation, and 95% when predicting on unseen data. The results are encouraging and show that automated diagnosis based on emergency dispatch centre transcriptions is feasible. When implemented in real time, this can have multiple uses, e.g. enabling call-takers to take the right action with the right priority.
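To make the TF-IDF feature extraction mentioned above concrete, here is a minimal sketch of one common textbook variant (term frequency normalised by document length, inverse document frequency as a natural logarithm). The exact weighting used in the study is not stated in the abstract, so the formula, the function name `tf_idf`, and the sample call snippets are all assumptions; libraries typically apply additional smoothing and normalisation.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document.

    tf  = term count / number of tokens in the document
    idf = ln(number of documents / number of documents containing the term)
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears at least once.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Invented call snippets, not taken from the study's data set.
calls = [
    "patient has chest pain",
    "patient not breathing no pulse",
    "patient has fever and confusion",
]
for vector in tf_idf(calls):
    print(vector)
```

A word that occurs in every transcription (here "patient") receives a weight of zero, so the classifier sees only the terms that help distinguish one call from another.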


In silico Prediction of Inhibitory Constant of Thrombin Inhibitors Using Machine Learning

Combinatorial Chemistry & High Throughput Screening ◽

10.2174/1386207322666181220130232 ◽

2019 ◽

Vol 21 (9) ◽

pp. 662-669 ◽

Cited By ~ 1

Author(s):

Junnan Zhao ◽

Lu Zhu ◽

Weineng Zhou ◽

Lingfeng Yin ◽

Yuchen Wang ◽

...

Keyword(s):

Machine Learning ◽

Prediction Models ◽

Regression Tree ◽

Large Data ◽

Thrombin Inhibitors ◽

Coagulation Cascade ◽

Gradient Boosting ◽

Support Vector ◽

Data Set ◽

Descriptor Selection

Background: Thrombin is the central protease of the vertebrate blood coagulation cascade, which is closely related to cardiovascular diseases. The inhibitory constant Ki is the most significant property of thrombin inhibitors. Method: This study was carried out to predict Ki values of thrombin inhibitors based on a large data set by using machine learning methods. Because machine learning can find non-intuitive regularities in high-dimensional datasets, it can be used to build effective predictive models. A total of 6554 descriptors were collected for each compound, and an efficient descriptor selection method was used to find the appropriate descriptors. Four methods, multiple linear regression (MLR), K Nearest Neighbors (KNN), Gradient Boosting Regression Tree (GBRT) and Support Vector Machine (SVM), were implemented to build prediction models with the selected descriptors. Results: The SVM model was the best among these methods, with R²=0.84, MSE=0.55 for the training set and R²=0.83, MSE=0.56 for the test set. Several validation methods, such as the y-randomization test and applicability domain evaluation, were adopted to assess the robustness and generalization ability of the model. The final model shows excellent stability and predictive ability and can be employed for rapid estimation of the inhibitory constant, which is helpful for designing novel thrombin inhibitors.
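The idea behind the y-randomization test named above is to refit the model after shuffling the response values: if the shuffled fits score almost as well as the real one, the original result is probably a chance correlation. The sketch below illustrates this with a single-descriptor least-squares line as a stand-in for the study's SVM model; the functions, the synthetic data, the number of trials and the seed are all assumptions made for illustration.

```python
import random


def r_squared_linear(x, y):
    """R² of an ordinary least-squares line y ≈ slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    syy = sum((yi - mean_y) ** 2 for yi in y)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    return 1.0 - ss_res / syy


def y_randomization(x, y, n_trials=50, seed=0):
    """Refit on shuffled responses and return the R² of each trial."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_trials):
        shuffled = list(y)
        rng.shuffle(shuffled)  # break the link between descriptor and response
        scores.append(r_squared_linear(x, shuffled))
    return scores


# Synthetic descriptor and response values: a line plus small alternating noise.
x = [float(i) for i in range(20)]
y = [2.0 * xi + 1.0 + (0.5 if i % 2 == 0 else -0.5) for i, xi in enumerate(x)]

real_r2 = r_squared_linear(x, y)
shuffled_r2 = y_randomization(x, y)
print(f"real R2 = {real_r2:.3f}, best shuffled R2 = {max(shuffled_r2):.3f}")
```

A model passes the check when its real score sits well above the whole distribution of shuffled scores, not merely above their average.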


Machine Learning for Sensorless Temperature Estimation of a BLDC Motor

Sensors ◽

10.3390/s21144655 ◽

2021 ◽

Vol 21 (14) ◽

pp. 4655

Author(s):

Dariusz Czerwinski ◽

Jakub Gęca ◽

Krzysztof Kolano

Keyword(s):

Machine Learning ◽

Temperature Measurement ◽

Stochastic Gradient Descent ◽

Estimation Accuracy ◽

Coefficient Of Determination ◽

Percentage Error ◽

Support Vector ◽

Bldc Motor ◽

Temperature Estimation ◽

Motor Operation

In this article, the authors propose two models for BLDC motor winding temperature estimation using machine learning methods. For the purposes of the research, measurements were made over more than 160 h of motor operation and then preprocessed. Linear regression, ElasticNet, stochastic gradient descent regressor, support vector machines, decision trees, and AdaBoost were used for predictive modeling. The ability of the models to generalize was achieved by hyperparameter tuning with the use of cross-validation. The research led to promising winding temperature estimation accuracy. In the case of sensorless temperature prediction (model 1), the mean absolute percentage error (MAPE) was below 4.5% and the coefficient of determination R² was above 0.909. In addition, extending the model with a temperature measurement on the casing (model 2) reduced the error to about 1% and increased R² to 0.990. The results obtained for the first proposed model show that the overheating protection of the motor can be ensured without direct temperature measurement. In addition, the introduction of a simple casing temperature measurement system allows for an estimation with accuracy suitable for compensating the motor output torque changes related to temperature.


A sentiment analysis system for social media using machine learning techniques: Social enablement

Digital Scholarship in the Humanities ◽

10.1093/llc/fqy037 ◽

2018 ◽

Vol 34 (3) ◽

pp. 569-581 ◽

Cited By ~ 1

Author(s):

Sujata Rani ◽

Parteek Kumar

Keyword(s):

Machine Learning ◽

Social Media ◽

Sentiment Analysis ◽

Media Analysis ◽

Training Data ◽

Machine Learning Techniques ◽

Support Vector ◽

Analysis Tool ◽

Data Set ◽

Learning Techniques

Abstract In this article, an innovative approach to sentiment analysis (SA) is presented. The proposed system handles the issues of Romanized or abbreviated text and spelling variations in the text to perform the sentiment analysis. The training data set of 3,000 movie reviews and tweets was manually labeled by native speakers of Hindi in three classes, i.e. positive, negative, and neutral. The system uses the WEKA (Waikato Environment for Knowledge Analysis) tool to convert these string data into numerical matrices and applies three machine learning techniques, i.e. Naive Bayes (NB), J48, and support vector machine (SVM). The proposed system was tested on 100 movie reviews and tweets, and SVM performed best among the classifiers, with an accuracy of 68% for movie reviews and 82% for tweets. The results of the proposed system are very promising and can be used in emerging applications like SA of product reviews and social media analysis. Additionally, the proposed system can be used for other cultural/social benefits such as predicting/fighting human riots.


Distribution Grids Fault Location employing ST based Optimized Machine Learning Approach

Energies ◽

10.3390/en11092328 ◽

2018 ◽

Vol 11 (9) ◽

pp. 2328 ◽

Cited By ~ 12

Author(s):

Md Shafiullah ◽

M. Abido ◽

Taher Abdel-Fattah

Keyword(s):

Machine Learning ◽

Fault Location ◽

Percentage Error ◽

Support Vector ◽

Learning Approach ◽

Efficiency Coefficient ◽

Learning Tools ◽

Performance Indices ◽

Machine Learning Approach ◽

Distribution Grids

Precise information on fault location plays a vital role in expediting the restoration process after a fault of any kind in power distribution grids. This paper proposed a Stockwell transform (ST) based optimized machine learning approach to locate faults and identify the faulty sections in distribution grids. The research employed the ST to extract useful features from the recorded three-phase current signals and fed them as inputs to different machine learning tools (MLT), including multilayer perceptron neural networks (MLP-NN), support vector machines (SVM), and extreme learning machines (ELM). The proposed approach employed the constriction-factor particle swarm optimization (CF-PSO) technique to optimize the parameters of the SVM and ELM for better generalization performance. It then compared the results obtained on the test datasets in terms of selected statistical performance indices, including the root mean squared error (RMSE), mean absolute percentage error (MAPE), percent bias (PBIAS), RMSE-observations to standard deviation ratio (RSR), coefficient of determination (R²), Willmott’s index of agreement (WIA), and Nash–Sutcliffe model efficiency coefficient (NSEC), to confirm the effectiveness of the developed fault location scheme. The satisfactory values of the statistical performance indices indicated the superiority of the optimized machine learning tools over the non-optimized tools in locating faults. In addition, this research confirmed the efficacy of the faulty section identification scheme based on overall accuracy. Furthermore, the presented results validated the robustness of the developed approach against measurement noise and uncertainties associated with pre-fault loading condition, fault resistance, and inception angle.
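Several of the performance indices listed above are less familiar than RMSE, so a short sketch of their commonly used definitions may help. The abstract does not give the formulas, so the definitions below are the widely cited ones and should be read as assumptions: in particular, PBIAS is written here as 100 · Σ(observed − predicted) / Σ(observed), and some authors use the opposite sign. The function name and the sample values are illustrative only.

```python
import math


def performance_indices(observed, predicted):
    """Compute RMSE, PBIAS, RSR, NSEC and WIA for paired sequences."""
    n = len(observed)
    mean_obs = sum(observed) / n
    pairs = list(zip(observed, predicted))
    sq_err = sum((o - p) ** 2 for o, p in pairs)        # sum of squared errors
    sq_dev = sum((o - mean_obs) ** 2 for o in observed)  # spread of observations
    wia_den = sum((abs(p - mean_obs) + abs(o - mean_obs)) ** 2 for o, p in pairs)
    return {
        "RMSE": math.sqrt(sq_err / n),
        "PBIAS": 100.0 * sum(o - p for o, p in pairs) / sum(observed),
        "RSR": math.sqrt(sq_err) / math.sqrt(sq_dev),
        "NSEC": 1.0 - sq_err / sq_dev,
        "WIA": 1.0 - sq_err / wia_den,
    }


# Made-up fault distances (km) and model estimates, for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.9]
for name, value in performance_indices(observed, predicted).items():
    print(f"{name}: {value:.4f}")
```

With these definitions a perfect model gives RMSE = PBIAS = RSR = 0 and NSEC = WIA = 1, while a model that always predicts the mean of the observations gives NSEC = 0 and RSR = 1.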
