Research Risk Factors in Monitoring Well Drilling—A Case Study Using Machine Learning Methods

This article presents an approach to building a machine learning model for the oil and gas industry, a task that touches on current issues in machine learning and artificial intelligence. One of the goals of this research was to build a model to predict the risks that may arise while drilling wells. Drilling of wells for oil and gas production is a highly complex and expensive part of reservoir development. Thus, together with injury prevention, there is a goal to reduce the cost of downtime and repair of drilling equipment. Nowadays, companies have begun to look for ways to improve the efficiency of drilling and minimize non-productive time with the help of new technologies. To support decisions in a narrow time frame, it is valuable to have an early warning system. Such a decision support system will help an engineer to intervene in the drilling process and prevent the high costs of unproductive time and equipment repair caused by a problem. This work describes a comparison of machine learning algorithms for anomaly detection during well drilling. In particular, machine learning algorithms can support decisions about the geometry of the well grid, that is, the relative position of production and injection wells at the production facility. Development systems are most often subdivided into placement of wells along a symmetric grid and placement of wells along a non-symmetric grid (mainly in rows). The tested models classify drilling problems based on historical data from previously drilled wells. To validate the anomaly detection algorithms, we used historical logs of drilling problems for 67 wells at a large brownfield in Siberia, Russia. Wells with problems were selected and analyzed. It should be noted that out of the 67 wells, 20 wells were drilled without expenses for unproductive time. The experimental results illustrate that a model based on gradient boosting can classify the complications in the drilling process better than the other tested models.

Risk Factors Evaluation for Monitoring of Well Drilling

10.20944/preprints202105.0657.v1 ◽

2021 ◽

Author(s):

Shamil Islamov ◽

Alexey Grigoriev ◽

Ilya Beloglazov ◽

Sergey Savchenkov ◽

Ove Tobias Gudmestad

Keyword(s):

Anomaly Detection ◽

Oil And Gas ◽

New Technologies ◽

Warning System ◽

Time Frame ◽

Gas Production ◽

Machine Learning Algorithms ◽

Gradient Boosting ◽

Drilling Process ◽

Well Drilling

Drilling of wells for oil and gas production is a highly complex and expensive part of reservoir development. Thus, together with injury prevention, there is a goal to reduce the cost of downtime and repair of drilling equipment. Nowadays, companies have begun to look for ways to improve the efficiency of drilling and minimize non-productive time with the help of new technologies. To support decisions in a narrow time frame, it is valuable to have an early warning system. Such a decision support system will help an engineer to intervene in the drilling process and prevent the high costs of unproductive time and equipment repair caused by a problem. This work describes a comparison of machine learning algorithms for anomaly detection during well drilling. The tested models classify drilling problems based on historical data from previously drilled wells. To validate the anomaly detection algorithms, we use historical logs of drilling problems for 67 wells at a large brownfield in Siberia, Russia. Wells with problems were selected and analyzed. It should be noted that out of the 67 wells, 20 wells were drilled without expenses for unproductive time. The experimental results illustrate that a model based on gradient boosting classifies the complications in the drilling process best among the tested models.

Forecasting US movies box office performances in Turkey using machine learning algorithms

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-189120 ◽

2020 ◽

Vol 39 (5) ◽

pp. 6579-6590

Author(s):

Sandy Çağlıyor ◽

Başar Öztayşi ◽

Selime Sezgin

Keyword(s):

Machine Learning ◽

Global Economy ◽

Learning Algorithms ◽

Forecast Model ◽

Machine Learning Algorithms ◽

Gradient Boosting ◽

High Stakes ◽

Box Office ◽

Industry Forecast ◽

The Impact

The motion picture industry is one of the largest industries worldwide and has significant importance in the global economy. Considering the high stakes and high risks in the industry, forecast models and decision support systems are gaining importance. Several attempts have been made to estimate the theatrical performance of a movie before or at the early stages of its release. Nevertheless, these models are mostly used for predicting domestic performance, and the industry still struggles to predict box office performance in overseas markets. In this study, the aim is to design a forecast model using different machine learning algorithms to estimate the theatrical success of US movies in Turkey. From various sources, a dataset of 1559 movies is constructed. First, the independent variables are grouped as pre-release, distributor type, and international distribution based on their characteristics. The number of attendances is discretized into three classes. Four popular machine learning algorithms (artificial neural networks, decision tree regression, gradient boosting trees, and random forest) are employed, and the impact of each variable group is observed by comparing the performance of the models. Then the number of target classes is increased to five and eight, and the results are compared with the previously developed models in the literature.
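The discretization step described above can be sketched as follows. The abstract does not say how the class boundaries were chosen, so this minimal example assumes equal-frequency (quantile) binning, one common option. The function names and the attendance figures are hypothetical and chosen only for illustration.

```python
import bisect


def quantile_edges(values, n_classes):
    """Cut points that split the sorted values into n_classes near-equal groups."""
    ordered = sorted(values)
    return [ordered[(len(ordered) * i) // n_classes] for i in range(1, n_classes)]


def discretize(value, edges):
    """Class index in 0..len(edges): the number of cut points at or below value."""
    return bisect.bisect_right(edges, value)


# Made-up attendance counts, not data from the study.
attendance = [1200, 450, 98000, 30500, 7600, 15000, 220000, 5400, 61000]

edges = quantile_edges(attendance, 3)          # two cut points for three classes
classes = [discretize(a, edges) for a in attendance]
print(edges, classes)
```

Raising the number of target classes to five or eight, as in the abstract, only changes the `n_classes` argument; the classifier then predicts the class index instead of the raw attendance figure.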

Anomaly Detection in Market Data Structures Via Machine Learning Algorithms

SSRN Electronic Journal ◽

10.2139/ssrn.3516028 ◽

2020 ◽

Author(s):

Dirk Röder ◽

Henning Mueller

Keyword(s):

Machine Learning ◽

Anomaly Detection ◽

Data Structures ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Market Data

Anomaly Detection Technique for Intrusion Detection in SDN Environment using Continuous Data Stream Machine Learning Algorithms

2021 IEEE International Systems Conference (SysCon) ◽

10.1109/syscon48628.2021.9447092 ◽

2021 ◽

Author(s):

Admilson de Ribamar Lima Ribeiro ◽

Reneilson Yves Carvalho Santos ◽

Anderson Clayton Alves Nascimento

Keyword(s):

Machine Learning ◽

Intrusion Detection ◽

Anomaly Detection ◽

Data Stream ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Detection Technique ◽

Continuous Data

Feasibility of Machine Learning Algorithms for Predicting the Deformation of Anodic Titanium Films by Modulating Anodization Processes

Materials ◽

10.3390/ma14051089 ◽

2021 ◽

Vol 14 (5) ◽

pp. 1089

Author(s):

Sung-Hee Kim ◽

Chanyoung Jeong

Keyword(s):

Machine Learning ◽

Learning Algorithms ◽

Multiclass Classification ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Smart Manufacturing ◽

Gradient Boosting ◽

Experimental Conditions ◽

Learning Techniques ◽

TiO2 Nanostructures

This study aims to demonstrate the feasibility of applying eight machine learning algorithms to predict the classification of the surface characteristics of titanium oxide (TiO2) nanostructures with different anodization processes. We produced a total of 100 samples, and we assessed changes in the thickness of the TiO2 nanostructures produced by anodization. We successfully grew TiO2 films with different thicknesses by one-step anodization in ethylene glycol containing NH4F and H2O at applied voltage differences ranging from 10 V to 100 V at various anodization durations. We found that the thicknesses of TiO2 nanostructures are dependent on anodization voltages under time differences. Therefore, we tested the feasibility of applying machine learning algorithms to predict the deformation of TiO2. As the characteristics of TiO2 changed based on the different experimental conditions, we classified its surface pore structure into two categories and four groups. For the classification based on granularity, we assessed layer creation, roughness, pore creation, and pore height. We applied eight machine learning techniques to binary and multiclass classification. For binary classification, the random forest and gradient boosting algorithms had relatively high performance. However, all eight algorithms had scores higher than 0.93, which signifies high predictive performance in estimating the presence of pores. In contrast, the decision tree and three ensemble methods had relatively higher performance for multiclass classification, with an accuracy rate greater than 0.79. The weakest algorithm was k-nearest neighbors for both binary and multiclass classification. We believe that these results show that machine learning techniques can be applied to predict surface quality improvement, leading to smart manufacturing technology that better controls color appearance, super-hydrophobicity, super-hydrophilicity, or battery efficiency.

Detecting TCP Flood DDoS Attack by Anomaly Detection based on Machine Learning Algorithms

10.1109/ubmk52708.2021.9558989 ◽

2021 ◽

Author(s):

Berkay Ozcam ◽

H. Hakan Kilinc ◽

Abdul Halim Zaim

Keyword(s):

Machine Learning ◽

Anomaly Detection ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

DDoS Attack

Application of machine learning algorithms to predict permeability in tight sandstone formations

Nafta-Gaz ◽

10.18668/ng.2021.05.01 ◽

2021 ◽

Vol 77 (5) ◽

pp. 283-292

Author(s):

Tomasz Topór ◽

Keyword(s):

Machine Learning ◽

Oil And Gas ◽

Learning Algorithms ◽

Confining Pressure ◽

Machine Learning Algorithms ◽

Core Material ◽

Confining Stress ◽

Tight Sandstone ◽

Oil And Gas Exploration ◽

Better Than

The application of machine learning algorithms in petroleum geology has opened a new chapter in oil and gas exploration. Machine learning algorithms have been successfully used to predict crucial petrophysical properties when characterizing reservoirs. This study uses machine learning to predict permeability under confining stress conditions for samples from tight sandstone formations. The models were constructed using two machine learning algorithms of varying complexity (multiple linear regression [MLR] and random forests [RF]) and trained on a dataset that combined basic well information, basic petrophysical data, and rock type from a visual inspection of the core material. The RF algorithm underwent feature engineering to increase the number of predictors in the models. To check the robustness of the trained models, 10-fold cross-validation was performed. The MLR and RF applications demonstrated that both algorithms can accurately predict permeability under constant confining pressure (R² = 0.800 vs. 0.834). The RF accuracy was about 3% better than that of the MLR and about 6% better than the linear reference regression (LR) that used only porosity. Porosity was the most influential feature in the models' performance. In the case of RF, depth was also significant in the permeability predictions, which could be evidence of hidden interactions between porosity and depth. The local interpretation revealed common features among the outliers: in both the training and testing sets, they had moderate-to-low porosity (3–10%) and a lack of fractures. In the test set, calcite or quartz cementation also led to poor permeability predictions. The workflow, which uses the tidymodels concept, will be further applied in more complex examples to predict spatial petrophysical features from seismic attributes using various machine learning algorithms.
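The 10-fold cross-validation mentioned above can be sketched in a few lines. This is a minimal illustration and not the authors' tidymodels workflow: it assumes a single-predictor linear regression (similar in spirit to the porosity-only reference regression), synthetic data, and hypothetical function names. R² is computed over the out-of-fold predictions.

```python
import random


def kfold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and deal the indices into k near-equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def cross_validated_r2(xs, ys, k=10, seed=0):
    """Fit on k-1 folds, predict the held-out fold, score all predictions."""
    preds = [0.0] * len(xs)
    for fold in kfold_indices(len(xs), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(xs)) if i not in held_out]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        for i in fold:
            preds[i] = a + b * xs[i]
    return r_squared(ys, preds)


# Synthetic stand-in data (assumption): porosity in percent and a noisy
# linear response playing the role of log-permeability.
rng = random.Random(42)
porosity = [rng.uniform(3.0, 10.0) for _ in range(100)]
log_perm = [-3.0 + 0.4 * p + rng.gauss(0.0, 0.3) for p in porosity]

cv_r2 = cross_validated_r2(porosity, log_perm, k=10)
print(f"10-fold cross-validated R^2: {cv_r2:.3f}")
```

Because every sample is predicted by a model that never saw it during fitting, the cross-validated R² is a more honest estimate of out-of-sample performance than R² computed on the training data.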

Predicting hospitalization following psychiatric crisis care using machine learning

BMC Medical Informatics and Decision Making ◽

10.1186/s12911-020-01361-1 ◽

2020 ◽

Vol 20 (1) ◽

Author(s):

Matthijs Blankers ◽

Louk F. M. van der Post ◽

Jack J. M. Dekker

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Prediction Models ◽

Learning Algorithms ◽

Nearest Neighbors ◽

Machine Learning Algorithms ◽

Gradient Boosting ◽

Ensemble Model ◽

K Nearest Neighbors ◽

Crisis Care

Background: Accurate prediction models for whether patients on the verge of a psychiatric crisis need hospitalization are lacking, and machine learning methods may help improve the accuracy of psychiatric hospitalization prediction models. In this paper we evaluate the accuracy of ten machine learning algorithms, including the generalized linear model (GLM/logistic regression), to predict psychiatric hospitalization in the first 12 months after a psychiatric crisis care contact. We also evaluate an ensemble model to optimize the accuracy, and we explore individual predictors of hospitalization. Methods: Data from 2084 patients included in the longitudinal Amsterdam Study of Acute Psychiatry with at least one reported psychiatric crisis care contact were used. The target variable for the prediction models was whether the patient was hospitalized in the 12 months following inclusion. The predictive power of 39 variables related to patients' socio-demographics, clinical characteristics and previous mental health care contacts was evaluated. The accuracy and area under the receiver operating characteristic curve (AUC) of the machine learning algorithms were compared, and we also estimated the relative importance of each predictor variable. The best and least performing algorithms were compared with GLM/logistic regression using net reclassification improvement analysis, and the five best performing algorithms were combined in an ensemble model using stacking. Results: All models performed above chance level. We found Gradient Boosting to be the best performing algorithm (AUC = 0.774) and K-Nearest Neighbors to be the least performing (AUC = 0.702). The performance of GLM/logistic regression (AUC = 0.76) was slightly above average among the tested algorithms. In a Net Reclassification Improvement analysis, Gradient Boosting outperformed GLM/logistic regression by 2.9% and K-Nearest Neighbors by 11.3%. GLM/logistic regression outperformed K-Nearest Neighbors by 8.7%. Nine of the top-10 most important predictor variables were related to previous mental health care use. Conclusions: Gradient Boosting led to the highest predictive accuracy and AUC, while GLM/logistic regression showed average performance among the tested algorithms. Although statistically significant, the magnitude of the differences between the machine learning algorithms was in most cases modest. The results show that a predictive accuracy similar to that of the best performing model can be achieved when combining multiple algorithms in an ensemble model.
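The AUC values reported above have a simple interpretation: the probability that a randomly chosen hospitalized patient receives a higher predicted score than a randomly chosen non-hospitalized patient. A minimal sketch of that pairwise definition follows; the labels and scores are made up for illustration and are not data from the study, and the function name is hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Illustrative values only: 1 = hospitalized, 0 = not hospitalized.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]

auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking. The pairwise loop is quadratic in the number of cases, which is fine for a sketch; a rank-based formula gives the same value more efficiently on large datasets.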

A fault sensitivity analysis for anomaly detection in water distribution systems using Machine Learning algorithms

2018 IEEE 14th International Conference on Intelligent Computer Communication and Processing (ICCP) ◽

10.1109/iccp.2018.8516643 ◽

2018 ◽

Author(s):

Alexandru Predescu ◽

Mariana Mocanu ◽

Ciprian Lupu

Keyword(s):

Machine Learning ◽

Sensitivity Analysis ◽

Anomaly Detection ◽

Distribution Systems ◽

Water Distribution ◽

Learning Algorithms ◽

Water Distribution Systems ◽

Machine Learning Algorithms ◽

Fault Sensitivity

Predicting Hospitalization following Psychiatric Crisis Care using Machine Learning

10.21203/rs.2.12338/v1 ◽

2019 ◽

Author(s):

Matthijs Blankers ◽

Louk F. M. van der Post ◽

Jack J. M. Dekker

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Learning Algorithms ◽

Nearest Neighbors ◽

Machine Learning Algorithms ◽

Predictor Variables ◽

Gradient Boosting ◽

K Nearest Neighbors ◽

Psychiatric Crisis ◽

Crisis Care

Background: It is difficult to accurately predict whether a patient on the verge of a potential psychiatric crisis will need to be hospitalized. Machine learning may help improve the accuracy of psychiatric hospitalization prediction models. In this paper we evaluate and compare the accuracy of ten machine learning algorithms, including the commonly used generalized linear model (GLM/logistic regression), to predict psychiatric hospitalization in the first 12 months after a psychiatric crisis care contact, and explore the most important predictor variables of hospitalization. Methods: Data from 2,084 patients with at least one reported psychiatric crisis care contact included in the longitudinal Amsterdam Study of Acute Psychiatry were used. The accuracy and area under the receiver operating characteristic curve (AUC) of the machine learning algorithms were compared. We also estimated the relative importance of each predictor variable. The best and least performing algorithms were compared with GLM/logistic regression using net reclassification improvement analysis. The target variable for the prediction models was whether or not the patient was hospitalized in the 12 months following inclusion in the study. The 39 predictor variables were related to patients' socio-demographics, clinical characteristics and previous mental health care contacts. Results: We found Gradient Boosting to perform the best (AUC = 0.774) and K-Nearest Neighbors to perform the worst (AUC = 0.702). The performance of GLM/logistic regression (AUC = 0.76) was above average among the tested algorithms. Gradient Boosting outperformed GLM/logistic regression and K-Nearest Neighbors, and GLM outperformed K-Nearest Neighbors in a Net Reclassification Improvement analysis, although the differences between Gradient Boosting and GLM/logistic regression were small. Nine of the top-10 most important predictor variables were related to previous mental health care use. Conclusions: Gradient Boosting led to the highest predictive accuracy and AUC, while GLM/logistic regression showed average performance among the tested algorithms. Although statistically significant, the magnitude of the differences between the machine learning algorithms was modest. Future studies may consider combining multiple algorithms in an ensemble model for optimal performance and to mitigate the risk of choosing suboptimally performing algorithms.
