Petrofacies classification using machine learning algorithms

Geophysics ◽  
2020 ◽  
Vol 85 (4) ◽  
pp. WA101-WA113 ◽  
Author(s):  
Adrielle A. Silva ◽  
Mônica W. Tavares ◽  
Abel Carrasquilla ◽  
Roseane Misságia ◽  
Marco Ceia

Carbonate reservoirs represent a large portion of the world’s oil and gas reserves, exhibiting specific characteristics that pose complex challenges to the reservoirs’ characterization, production, and management. Evaluating the relationships among key parameters such as porosity, permeability, water saturation, and pore-size distribution is therefore a complex task when only well-log data are considered, owing to geologic heterogeneity. Hence, petrophysical parameters are key to assessing the original composition and postsedimentological aspects of carbonate reservoirs. The concept of reservoir petrofacies was proposed as a tool for characterizing and predicting reservoir quality, as it combines primary textural analysis with laboratory measurements of porosity, permeability, and capillary pressure, photomicrograph descriptions, and other techniques, which contributes to understanding postdiagenetic events. We adopted a workflow for petrofacies classification of a carbonate reservoir from the Campos Basin in southeastern Brazil, using the following machine learning methods: decision tree, random forest, gradient boosting, K-nearest neighbors, and naïve Bayes. The data set comprised 1477 wireline samples from two wells (A3 and A10) with petrofacies classes already assigned based on core descriptions. It was divided into two subsets, one for training and one for testing the capability of the trained models to assign petrofacies. The supervised-learning models used labeled training data to learn the relationships between the input measurements and the petrofacies to be assigned. We then compared the models’ performance on the testing set according to the accuracy, precision, recall, and F1-score evaluation metrics. Our approach proved to be a valuable ally in petrofacies classification, especially for analyzing a well-logging database with no prior petrophysical information.
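
The abstract does not include code; the following is a minimal sketch of the described model comparison using scikit-learn. The log-curve column names (GR, NPHI, RHOB, DT), the label column, and the input file are hypothetical placeholders for the paper's wireline data and core-based petrofacies labels.

```python
# Minimal sketch of the petrofacies-classification comparison described above.
# Column names and the CSV file are assumptions, not the authors' actual schema.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

df = pd.read_csv("wells_A3_A10.csv")  # hypothetical file with 1477 labeled samples
X, y = df[["GR", "NPHI", "RHOB", "DT"]], df["PETROFACIES"]
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

models = {
    "decision tree": DecisionTreeClassifier(random_state=0),
    "random forest": RandomForestClassifier(random_state=0),
    "gradient boosting": GradientBoostingClassifier(random_state=0),
    "k-nearest neighbors": KNeighborsClassifier(),
    "naive Bayes": GaussianNB(),
}
for name, model in models.items():
    y_pred = model.fit(X_train, y_train).predict(X_test)
    prec, rec, f1, _ = precision_recall_fscore_support(y_test, y_pred, average="weighted")
    print(f"{name}: acc={accuracy_score(y_test, y_pred):.3f} "
          f"prec={prec:.3f} rec={rec:.3f} f1={f1:.3f}")
```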

2021 ◽  
Vol 9 (6) ◽  
pp. 666
Author(s):  
Fahimeh Hadavimoghaddam ◽  
Mehdi Ostadhassan ◽  
Mohammad Ali Sadri ◽  
Tatiana Bondarenko ◽  
Igor Chebyshev ◽  
...  

Intelligent predictive methods have the power to reliably estimate water saturation (Sw) compared with the conventional experimental methods commonly performed by petrophysicists. However, due to nonlinearity and uncertainty in the data set, the prediction might not be accurate. Newer machine learning (ML) algorithms, such as gradient boosting techniques, have shown significant success in other disciplines yet have not been examined for predicting Sw or other reservoir and rock properties in the petroleum industry. To bridge this literature gap, in this study, for the first time, five ML models (a Super Learner and the boosting algorithms XGBoost, LightGBM, CatBoost, and AdaBoost) were developed to predict water saturation without relying on resistivity log data. This is important because conventional methods of water saturation prediction that rely on the resistivity log can become problematic in particular formations such as shale or tight carbonates. To do so, two datasets were constructed by collecting several types of well logs (gamma ray, density, neutron, sonic, and PEF; one dataset with PEF and one without) to evaluate the robustness and accuracy of the models by comparing the results with laboratory-measured data. It was found that the Super Learner and XGBoost produced the most accurate output (R2: 0.999 and 0.993, respectively), and, by a considerable margin, CatBoost and LightGBM ranked third and fourth, respectively. Ultimately, both XGBoost and the Super Learner produced negligible errors, but the latter is considered the best among them all.
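
An illustrative sketch of the resistivity-free Sw regression described above, using one of the named libraries (XGBoost). The feature names, target column, and data file are hypothetical stand-ins for the gamma, density, neutron, sonic, and PEF curves paired with laboratory-measured saturation.

```python
# Sketch: predicting water saturation (Sw) from non-resistivity logs with XGBoost.
# Column names and the CSV file are assumptions, not the study's actual data.
import pandas as pd
from xgboost import XGBRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

df = pd.read_csv("core_calibrated_logs.csv")  # hypothetical lab-calibrated dataset
features = ["GR", "RHOB", "NPHI", "DT", "PEF"]  # drop "PEF" for the second dataset
X_train, X_test, y_train, y_test = train_test_split(
    df[features], df["SW"], test_size=0.2, random_state=42)

model = XGBRegressor(n_estimators=500, learning_rate=0.05, max_depth=6)
model.fit(X_train, y_train)
print("R2 on held-out core data:", r2_score(y_test, model.predict(X_test)))
```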


2020 ◽  
pp. 865-874
Author(s):  
Enrico Santus ◽  
Tal Schuster ◽  
Amir M. Tahmasebi ◽  
Clara Li ◽  
Adam Yala ◽  
...  

PURPOSE Literature on clinical note mining has highlighted the superiority of machine learning (ML) over hand-crafted rules. Nevertheless, most studies assume the availability of large training sets, which is rarely the case. For this reason, in the clinical setting, rules are still common. We suggest 2 methods to leverage the knowledge encoded in pre-existing rules to inform ML decisions and obtain high performance, even with scarce annotations. METHODS We collected 501 prostate pathology reports from 6 American hospitals. Reports were split into 2,711 core segments, annotated with 20 attributes describing the histology, grade, extension, and location of tumors. The data set was split by institutions to generate a cross-institutional evaluation setting. We assessed 4 systems, namely a rule-based approach, an ML model, and 2 hybrid systems integrating the previous methods: a Rule as Feature model and a Classifier Confidence model. Several ML algorithms were tested, including logistic regression (LR), support vector machine (SVM), and eXtreme gradient boosting (XGB). RESULTS When training on data from a single institution, LR lags behind the rules by 3.5% (F1 score: 92.2% v 95.7%). Hybrid models, instead, obtain competitive results, with Classifier Confidence outperforming the rules by +0.5% (96.2%). When a larger amount of data from multiple institutions is used, LR improves by +1.5% over the rules (97.2%), whereas hybrid systems obtain +2.2% for Rule as Feature (97.7%) and +2.6% for Classifier Confidence (98.3%). Replacing LR with SVM or XGB yielded similar performance gains. CONCLUSION We developed methods to use pre-existing handcrafted rules to inform ML algorithms. These hybrid systems obtain better performance than either rules or ML models alone, even when training data are limited.
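
A toy sketch of the "Rule as Feature" idea: the output of a pre-existing hand-crafted rule is appended to the text features before the classifier is trained. The rule function, example segments, and labels below are hypothetical illustrations, not the paper's actual rules or data.

```python
# Sketch of "Rule as Feature": a rule's binary output becomes one extra feature
# column alongside TF-IDF text features. Rule and data are hypothetical.
import numpy as np
from scipy.sparse import hstack, csr_matrix
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def rule_prediction(segment: str) -> int:
    """Hypothetical hand-crafted rule, e.g. flagging a Gleason grade mention."""
    return int("gleason" in segment.lower())

segments = ["Gleason score 3+4=7", "no tumor seen", "gleason 4+4", "benign tissue"]
labels = [1, 0, 1, 0]

tfidf = TfidfVectorizer()
X_text = tfidf.fit_transform(segments)
X_rule = csr_matrix(np.array([[rule_prediction(s)] for s in segments]))
X = hstack([X_text, X_rule])  # rule output appended as a feature

clf = LogisticRegression().fit(X, labels)
print(clf.predict(X))
```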


Entropy ◽  
2020 ◽  
Vol 22 (10) ◽  
pp. 1075
Author(s):  
Nan Chen

Predicting complex nonlinear turbulent dynamical systems is an important and practical topic. However, due to the lack of a complete understanding of nature, the ubiquitous model error may greatly affect the prediction performance. Machine learning algorithms can overcome the model error, but they are often impeded by inadequate and partial observations in predicting nature. In this article, an efficient and dynamically consistent conditional sampling algorithm is developed, which incorporates the conditional path-wise temporal dependence into a two-step forward-backward data assimilation procedure to sample multiple distinct nonlinear time series conditioned on short and partial observations using an imperfect model. The resulting sampled trajectories succeed in reducing the model error and greatly enrich the training data set for machine learning forecasts. For a rich class of nonlinear and non-Gaussian systems, the conditional sampling is carried out by solving a simple stochastic differential equation, which is computationally efficient and accurate. The sampling algorithm is applied to create massive training data of multiscale compressible shallow water flows from highly nonlinear and indirect observations. The resulting machine learning prediction significantly outperforms the imperfect model forecast. The sampling algorithm also facilitates the machine learning forecast of a highly non-Gaussian climate phenomenon using extremely short observations.
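
The following is not the paper's forward-backward conditional sampler; it is only a minimal Euler-Maruyama illustration of the kind of cheap SDE simulation the abstract says the sampling reduces to, drawing many distinct trajectories of a linear (Ornstein-Uhlenbeck) process that could serve as synthetic training series. All parameters are arbitrary.

```python
# Minimal Euler-Maruyama integrator for dx = (-a x + f) dt + sigma dW,
# generating an ensemble of trajectories. Illustrative only; the paper's
# algorithm additionally conditions the samples on partial observations.
import numpy as np

def euler_maruyama(a, f, sigma, x0, dt, n_steps, n_paths, rng):
    """Simulate n_paths sample paths of the linear SDE."""
    x = np.full(n_paths, x0, dtype=float)
    out = np.empty((n_steps + 1, n_paths))
    out[0] = x
    for k in range(n_steps):
        dW = rng.normal(0.0, np.sqrt(dt), size=n_paths)
        x = x + (-a * x + f) * dt + sigma * dW
        out[k + 1] = x
    return out

rng = np.random.default_rng(0)
paths = euler_maruyama(a=1.0, f=0.5, sigma=0.3, x0=0.0,
                       dt=0.01, n_steps=1000, n_paths=100, rng=rng)
print(paths.shape)  # (1001, 100): an enriched ensemble of training series
```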


Diagnostics ◽  
2019 ◽  
Vol 9 (3) ◽  
pp. 104 ◽  
Author(s):  
Ahmed ◽  
Yigit ◽  
Isik ◽  
Alpkocak

Leukemia is a fatal cancer with two main types, acute and chronic, each with two subtypes, lymphoid and myeloid; hence, in total, there are four subtypes of leukemia. This study proposes a new approach for diagnosing all subtypes of leukemia from microscopic blood cell images using convolutional neural networks (CNN), which require a large training data set. Therefore, we also investigated the effects of data augmentation, synthetically increasing the number of training samples. We used two publicly available leukemia data sources: ALL-IDB and the ASH Image Bank. We applied seven different image transformation techniques as data augmentation and designed a CNN architecture capable of recognizing all subtypes of leukemia. In addition, we explored other well-known machine learning algorithms: naive Bayes, support vector machine, k-nearest neighbor, and decision tree. To evaluate our approach, we set up a series of experiments using 5-fold cross-validation. The experimental results showed that our CNN model achieves 88.25% accuracy for leukemia versus healthy classification and 81.74% for multiclass classification of all subtypes. Finally, we showed that the CNN model performs better than the other well-known machine learning algorithms.
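
An illustrative Keras sketch (not the authors' architecture) of a small CNN for the four-class subtype task, with a few of the kinds of image transformations used as data augmentation. Input size, layer widths, and the specific augmentations are assumptions.

```python
# Sketch: data augmentation plus a small CNN for 4-class leukemia-subtype images.
# Architecture details are assumptions, not the paper's published network.
import tensorflow as tf
from tensorflow.keras import layers, models

augment = models.Sequential([
    layers.RandomFlip("horizontal_and_vertical"),
    layers.RandomRotation(0.1),
    layers.RandomZoom(0.1),
])

model = models.Sequential([
    layers.Input(shape=(128, 128, 3)),       # assumed input image size
    augment,                                 # augmentation applied during training
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(4, activation="softmax"),   # ALL, AML, CLL, CML subtypes
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```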


2021 ◽  
Vol 8 (1) ◽  
pp. 28
Author(s):  
S. L. Ávila ◽  
H. M. Schaberle ◽  
S. Youssef ◽  
F. S. Pacheco ◽  
C. A. Penz

The health of a rotating electric machine can be evaluated by monitoring electrical and mechanical parameters; the more information is available, the easier the diagnosis of the machine’s operational condition becomes. We built a laboratory test bench to study rotor unbalance issues according to ISO standards. Using harmonic analysis of the electric stator current, this paper presents a comparison among Support Vector Machines, Decision Tree classifiers, and the One-vs-One strategy for identifying the kind and severity of rotor unbalance, a nonlinear multiclass task. Moreover, we propose a methodology to update the classifier to deal better with changes produced by environmental variations and natural machinery usage. The adaptive update refreshes the training data set with an amount of recent data while retaining the entire original historical data, which is relevant for maintenance engineering. Our results show that current signature analysis is appropriate for identifying the type and severity of the rotor unbalance problem, and that machine learning techniques can be effective for this industrial application.
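
A sketch of the adaptive-update idea with a One-vs-One SVM: the classifier is periodically retrained on the historical data plus a batch of recent samples. The feature dimensions, class labels, and synthetic data below are placeholders for the bench's current-harmonic features.

```python
# Sketch: One-vs-One SVM for the multiclass unbalance task, with an adaptive
# training-set update. Data is synthetic; feature/class meanings are assumed.
import numpy as np
from sklearn.multiclass import OneVsOneClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X_hist = rng.normal(size=(300, 8))          # historical current-harmonic features
y_hist = rng.integers(0, 4, size=300)       # unbalance kind/severity classes

clf = OneVsOneClassifier(SVC(kernel="rbf"))
clf.fit(X_hist, y_hist)

def adaptive_update(clf, X_hist, y_hist, X_new, y_new):
    """Retrain on history plus recent data (environmental drift, machine wear)."""
    X = np.vstack([X_hist, X_new])
    y = np.concatenate([y_hist, y_new])
    return clf.fit(X, y), X, y

X_new = rng.normal(loc=0.2, size=(50, 8))   # recent, slightly drifted data
y_new = rng.integers(0, 4, size=50)
clf, X_hist, y_hist = adaptive_update(clf, X_hist, y_hist, X_new, y_new)
print(clf.predict(X_new[:5]))
```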


Symmetry ◽  
2021 ◽  
Vol 13 (7) ◽  
pp. 1293
Author(s):  
Shamil Islamov ◽  
Alexey Grigoriev ◽  
Ilia Beloglazov ◽  
Sergey Savchenkov ◽  
Ove Tobias Gudmestad

This article takes an approach to creating a machine learning model for the oil and gas industry, addressing up-to-date issues in machine learning and artificial intelligence. One goal of this research was to build a model to predict the risks arising in the process of drilling wells. Drilling wells for oil and gas production is a highly complex and expensive part of reservoir development; thus, together with injury prevention, there is a goal to reduce expenditures on downtime and repair of drilling equipment. Companies have begun to look for ways to improve drilling efficiency and minimize non-productive time with the help of new technologies. To support decisions within a narrow time frame, it is valuable to have an early warning system. Such a decision support system helps an engineer intervene in the drilling process and prevent the high expenses of unproductive time and equipment repair caused by a problem. This work describes a comparison of machine learning algorithms for anomaly detection during well drilling. In particular, machine learning algorithms make it possible to support decisions when determining the geometry of the grid of wells, that is, the relative positions of production and injection wells at the production facility. Development systems are most often subdivided into placement of wells along a symmetric grid and placement of wells along a non-symmetric grid (mainly in rows). The tested models classify drilling problems based on historical data from previously drilled wells. To validate the anomaly detection algorithms, we used historical logs of drilling problems for 67 wells at a large brownfield in Siberia, Russia. Wells with problems were selected and analyzed; it should be noted that, of the 67 wells, 20 were drilled without unproductive-time expenses. The experimental results illustrate that a model based on gradient boosting can classify complications in the drilling process better than the other models.
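
A hedged sketch of the classification task: a gradient-boosting model trained on historical drilling logs to label problem classes. The feature names, label column, and data file are hypothetical; the paper does not list its exact inputs here.

```python
# Sketch: gradient boosting for classifying drilling problems from historical
# well logs. Column names and the CSV file are assumptions.
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

df = pd.read_csv("drilling_history_67_wells.csv")  # hypothetical historical logs
features = ["hook_load", "standpipe_pressure", "torque", "rop", "rpm"]
X_train, X_test, y_train, y_test = train_test_split(
    df[features], df["problem_class"], stratify=df["problem_class"], random_state=0)

model = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test)))
```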


2018 ◽  
Vol 210 ◽  
pp. 04019 ◽  
Author(s):  
Hyontai SUG

Recent Go matches between humans and the artificial intelligence AlphaGo showed the great advances in machine learning technologies. While AlphaGo was trained using real-world data, AlphaGo Zero was trained using massive random data, and the fact that AlphaGo Zero beat AlphaGo decisively revealed that the diversity and size of training data are important for the performance of machine learning algorithms, especially deep learning algorithms for neural networks. On the other hand, artificial neural networks and decision trees are widely accepted machine learning algorithms because of their robustness to errors and their comprehensibility, respectively. In this paper, to show empirically that the diversity and size of data are important factors for the performance of machine learning algorithms, these two representative algorithms are used in an experiment. A real-world data set called breast tissue was chosen because it consists of real numbers, a very good property for generating artificial random data. The experimental results confirmed that the diversity and size of data are very important factors for better performance.
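
A rough sketch of the experiment's idea: grow the training set with random samples drawn within the real-valued feature ranges and observe how decision-tree and neural-network accuracy change with training-set size. The labeling of synthetic points by a reference model is an assumption, and scikit-learn's bundled breast cancer data stands in for the UCI breast tissue set, which is not bundled.

```python
# Sketch: augmenting training data with synthetic random samples and comparing
# a decision tree and a neural network as the set grows. The reference-model
# labeling scheme and the stand-in dataset are assumptions, not the paper's setup.
import numpy as np
from sklearn.datasets import load_breast_cancer  # stand-in for "breast tissue"
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

rng = np.random.default_rng(0)
ref = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

for n_extra in (0, 500, 2000):
    # synthetic rows sampled uniformly within each feature's observed range
    X_syn = rng.uniform(X_train.min(0), X_train.max(0), size=(n_extra, X.shape[1]))
    X_big = np.vstack([X_train, X_syn]) if n_extra else X_train
    y_big = np.concatenate([y_train, ref.predict(X_syn)]) if n_extra else y_train
    for model in (DecisionTreeClassifier(random_state=0),
                  MLPClassifier(max_iter=2000, random_state=0)):
        acc = model.fit(X_big, y_big).score(X_test, y_test)
        print(f"{type(model).__name__} with {n_extra} synthetic rows: acc={acc:.3f}")
```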


2019 ◽  
Vol 8 (2) ◽  
pp. 5073-5081

Prediction of student performance is a significant part of processing educational data, and machine learning algorithms play the leading role in this process. Deep learning is one of the important branches of machine learning. In this paper, we applied deep learning to predict the academic excellence of students using R programming; the Keras and TensorFlow libraries were used to build a neural network model on a Kaggle dataset. The data were separated into training and testing sets. The network was plotted using the neuralnet method, and the deep learning model was created with two hidden layers using the ReLU activation function and one output layer using the softmax activation function. After fine-tuning until the results stabilized, the model produced an accuracy of 85%.
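
The paper works in R; a Python Keras equivalent of the stated architecture (two ReLU hidden layers and a softmax output) is sketched below. The hidden-layer widths, input dimension, and number of performance classes are assumptions, since the abstract does not state them.

```python
# Python/Keras equivalent of the described architecture (the paper used R):
# two ReLU hidden layers, one softmax output layer. Sizes are assumptions.
import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(10,)),               # assumed number of input features
    layers.Dense(64, activation="relu"),     # hidden layer 1
    layers.Dense(32, activation="relu"),     # hidden layer 2
    layers.Dense(3, activation="softmax"),   # assumed number of performance classes
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```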


2020 ◽  
Vol 8 (6) ◽  
pp. 4684-4688

According to statistics from the BBC, the toll varies for every earthquake recorded to date: up to thousands dead, about 50,000 injured, around 1-3 million displaced, and a significant number missing or homeless, with structural damage approaching 100%. Economic losses range from 10 to 16 million dollars. Magnitudes of 5 and above are classified as the deadliest. The most life-threatening earthquake recorded to date took place in Indonesia, where about 3 million died, 1-2 million were injured, and structural damage reached 100%. Hence, the consequences of an earthquake are devastating and are not limited to loss of life and damage to property; they also cause significant changes to surroundings, lifestyles, and economies. Every such parameter motivates earthquake forecasting. With a couple of minutes’ notice, individuals can act to shield themselves from injury and death, harm and monetary losses can be reduced, and property and natural assets can be secured. In this work, an accurate forecaster is designed and developed: a system that forecasts the catastrophe by detecting early signs of an earthquake using machine learning algorithms. The system follows the basic steps of developing learning systems along with the data science life cycle. Data sets for the Indian subcontinent and the rest of the world were collected from government sources. Pre-processing of the data is followed by the construction of a stacking model that combines Random Forest and Support Vector Machine algorithms. The algorithms build this mathematical model from a training data set; the model looks for patterns that lead to catastrophe and adapts to them, making decisions and forecasts without being explicitly programmed to perform the task. After a forecast, the message is broadcast to government officials and across various platforms. The key information to obtain is represented by three factors: time, locality, and magnitude.
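
A minimal sketch of the stacking step named above: Random Forest and SVM base learners combined by a meta-learner via scikit-learn's StackingClassifier. The generated data stands in for the pre-processed seismic features, whose actual columns the abstract does not specify.

```python
# Sketch: stacking Random Forest and SVM with a logistic-regression meta-learner.
# Synthetic data replaces the (unspecified) government seismic datasets.
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

stack = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(random_state=0)),
                ("svm", SVC(probability=True, random_state=0))],
    final_estimator=LogisticRegression(),
)
stack.fit(X_train, y_train)
print("held-out accuracy:", stack.score(X_test, y_test))
```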


2021 ◽  
Author(s):  
Freddy J. Marquez

Abstract Machine learning is an artificial intelligence subprocess applied to automatically and quickly perform mathematical calculations on data in order to build models used to make predictions. Technical papers on applications of machine learning algorithms have been published increasingly in many oil and gas disciplines over the last five years, revolutionizing the way engineers approach their work and sharing innovative solutions that contribute to increased efficiency. In this paper, machine learning models are built to predict inverse rate of penetration (ROPI) and surface torque for a well located in the shallow waters of the Gulf of Mexico. Three types of analysis were performed: pre-drill analysis, predicting the parameters without any data from the target well in the database; drilling analysis, running the model every sixty meters, updating the database with information from the target well, and predicting the parameters ahead of the bit; and sensitivity parameter-optimization analysis, iterating weight-on-bit and rotary-speed values as model inputs to identify the optimum combination delivering the best drilling performance under the given conditions. The Extreme Gradient Boosting (XGBoost) library in the Python programming environment was used to build the models. Model performance was satisfactory, overcoming the challenge of drilling parameters being input manually by drilling bit engineers. The database was built with data from different fields and wells. Two databases were created to build the models; one of them did not include logging-while-drilling (LWD) data, in order to determine its importance for the predictions. The pre-drill surface torque prediction performed better than that of ROPI, while predictions ahead of the bit were good for both torque and ROPI. The sensitivity parameter optimization showed better resolution with the database that includes LWD data.
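
A hedged sketch of the sensitivity analysis: an XGBoost regressor trained on offset-well data is queried over a grid of weight-on-bit (WOB) and rotary-speed (RPM) values to find the combination minimizing predicted ROPI. The feature names, grid ranges, and data file are hypothetical.

```python
# Sketch: XGBoost ROPI model plus a WOB/RPM grid search for the optimum
# predicted drilling performance. Columns and the CSV file are assumptions.
import itertools
import pandas as pd
from xgboost import XGBRegressor

df = pd.read_csv("offset_wells.csv")  # hypothetical multi-field drilling database
features = ["depth", "wob", "rpm", "flow_rate", "mud_weight"]
model = XGBRegressor(n_estimators=300).fit(df[features], df["ropi"])

# fixed context for the current depth interval (assumed values)
base = {"depth": 2500.0, "flow_rate": 2800.0, "mud_weight": 9.5}
best = min(
    itertools.product(range(5, 31, 5), range(60, 181, 20)),  # WOB (t), RPM grid
    key=lambda p: float(model.predict(
        pd.DataFrame([{**base, "wob": p[0], "rpm": p[1]}])[features])[0]),
)
print("optimum (WOB, RPM) minimizing predicted ROPI:", best)
```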

