Predicting the Rock Sonic Logs While Drilling by Random Forest and Decision Tree-Based Algorithms

Abstract Sonic data provide significant rock properties that are commonly used to design operational programs for drilling, rock fracturing, and development operations. Conventional methods for acquiring rock sonic data in terms of compressional and shear slowness (ΔTc and ΔTs) are costly and time-consuming. This paper proposes machine learning models for predicting sonic logs from drilling data in real time. Decision tree (DT) and random forest (RF), both tree-based algorithms, were employed to build sonic prediction models for drilling complex lithology that includes limestone, sandstone, shale, and carbonate formations. The models take surface drilling parameters as inputs to predict the shear and compressional slowness. A data set of 2,888 data points was used to build and test the models, while a separately collected data set of 2,863 points was used for further validation. Sensitivity analyses were performed for the DT and RF models to ensure optimal accuracy. Model accuracy was checked with the correlation coefficient (R) and the average absolute percentage error (AAPE) between actual values and model outputs, in addition to the sonic log profiles. The results indicated that the developed models predict sonic data from drilling data with high accuracy: the DT model recorded R higher than 0.967 and AAPE less than 2.76% for the ΔTc and ΔTs models, while RF showed R higher than 0.991 with AAPE less than 1.07%. The further validation yielded strong results, and RF outperformed DT: RF showed R higher than 0.986 with AAPE less than 1.12%, while DT recorded R greater than 0.93 with AAPE less than 1.95%.
Predicting sonic data with the developed models will save the cost and time of acquiring sonic data by conventional methods and will provide real-time estimates from the drilling parameters.
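The abstract scores the models with the correlation coefficient (R) and the average absolute percentage error (AAPE). A minimal Python sketch of both metrics follows, assuming R is the Pearson correlation between measured and predicted slowness and AAPE is the mean of |actual - predicted| / |actual| expressed in percent. The function names and the slowness values are illustrative and are not taken from the study.

```python
import math
from typing import Sequence


def correlation_coefficient(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Pearson correlation coefficient R between measured and predicted values."""
    n = len(actual)
    if n != len(predicted) or n < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)


def aape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Average absolute percentage error in percent; actual values must be non-zero."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty, equal-length sequences")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Hypothetical compressional slowness values (us/ft), not data from the paper.
dtc_actual = [60.0, 65.0, 70.0, 75.0]
dtc_predicted = [61.0, 64.0, 71.0, 74.0]
print(f"R = {correlation_coefficient(dtc_actual, dtc_predicted):.4f}")
print(f"AAPE = {aape(dtc_actual, dtc_predicted):.2f}%")
```

R rewards predictions that track the shape of the log, while AAPE measures how far they sit from it in relative terms, which is why the abstract reports both.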

Rock Strength Prediction in Real-Time while Drilling Employing Random Forest and Functional Network Techniques

Journal of Energy Resources Technology ◽ 10.1115/1.4050843 ◽ 2021 ◽ pp. 1-21

Author(s): Hany Gamal ◽ Ahmed Alsaihati ◽ Salaheldin Elkatatny ◽ Saleh Haidary ◽ Abdulazeez Abdulraheem

Keyword(s): Random Forest ◽ Real Time ◽ Rock Strength ◽ Prediction Models ◽ Functional Network ◽ Percentage Error ◽ Data Set ◽ Unseen Data ◽ Drilling Data ◽ Data Points

Abstract The rock unconfined compressive strength (UCS) is one of the key parameters for geomechanical and reservoir modeling in the petroleum industry. Obtaining the UCS by conventional methods, such as experimental work or empirical correlations from logging data, is time-consuming and costly. To overcome these drawbacks, this paper used two artificial intelligence (AI) tools to predict the rock strength from the drilling parameters in real time. Random forest (RF) based on principal component analysis (PCA) and functional network (FN) techniques were employed to build two UCS prediction models from drilling data such as weight on bit (WOB), drill string rotating speed (RS), drilling torque (T), stand-pipe pressure (SPP), mud pumping rate (Q), and rate of penetration (ROP). The models were built using 2,333 data points from Well (A) with a 70:30 training-to-testing ratio. They were validated using an unseen data set (1,300 data points) from Well (B), which is located in the same field and was drilled across the same complex lithology. The PCA-based RF model outperformed the FN model in terms of the correlation coefficient (R) and average absolute percentage error (AAPE). The overall accuracy was an R of 0.99 and AAPE of 4.3% for PCA-based RF, and an R of 0.97 and AAPE of 8.5% for FN. In validation, R was 0.99 for RF and 0.96 for FN, while AAPE was 4% and 7.9% for the RF and FN models, respectively. The developed PCA-based RF and FN models provide accurate real-time UCS estimates from drilling data, saving time and cost and enhancing well stability by generating a UCS log from rig drilling data.
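The abstract states that Well (A) data were divided 70:30 between training and testing, with Well (B) held out entirely for validation. A minimal sketch of such a split is shown below, assuming a seeded random shuffle of row indices. The paper does not say whether its split was random or sequential in depth, and the function name is hypothetical.

```python
import random
from typing import List, Tuple


def train_test_split_indices(
    n_points: int, train_fraction: float = 0.7, seed: int = 42
) -> Tuple[List[int], List[int]]:
    """Shuffle row indices with a fixed seed and cut them at train_fraction."""
    if n_points < 2 or not 0.0 < train_fraction < 1.0:
        raise ValueError("need at least two points and 0 < train_fraction < 1")
    indices = list(range(n_points))
    random.Random(seed).shuffle(indices)  # reproducible shuffle
    n_train = round(n_points * train_fraction)
    return sorted(indices[:n_train]), sorted(indices[n_train:])


# Well (A) has 2,333 points in the abstract; Well (B) is not split at all,
# because it is kept as unseen validation data.
train_idx, test_idx = train_test_split_indices(2333, train_fraction=0.7)
print(len(train_idx), len(test_idx))
```

Because neighboring depth samples tend to be correlated, a random split can make test accuracy look better than a split by contiguous depth intervals would; the separate Well (B) validation guards against that.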

Application of Machine Learning for Lithology-on-Bit Prediction using Drilling Data in Real-Time

10.2118/206622-ms ◽ 2021

Author(s): Temirlan Zhekenov ◽ Artem Nechaev ◽ Kamilla Chettykbayeva ◽ Alexey Zinovyev ◽ German Sardarov ◽ ...

Keyword(s): Machine Learning ◽ Data Quality ◽ Real Time ◽ Hybrid Modeling ◽ Data Set ◽ Drilling Parameters ◽ Drilling Data ◽ Stable Solutions ◽ Mud Logging

SUMMARY Researchers have based their analyses on basic drilling parameters obtained during mud logging and have demonstrated impressive results. However, because of the data-quality limitations often present during drilling, these solutions tend to lose their stability and high predictive power. In this work, the concept of hybrid modeling was introduced, which integrates analytical correlations with machine learning algorithms to obtain stable solutions that remain consistent from one data set to another.
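The summary does not say how the analytical correlations and the machine learning algorithms are combined. One common hybrid scheme, shown here only as an assumption, is residual correction: an analytical correlation gives a baseline prediction and a data-driven model learns the remaining error. In this sketch a one-variable least-squares line stands in for the machine learning component, and `analytical_correlation` is a hypothetical placeholder, not a published drilling correlation.

```python
from typing import Callable, Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("need two non-empty, equal-length sequences")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0.0:
        return 0.0, mean_y  # constant input: predict the mean residual
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def make_hybrid_model(
    analytical: Callable[[float], float],
    x_train: Sequence[float],
    y_train: Sequence[float],
) -> Callable[[float], float]:
    """Baseline from the analytical correlation plus a learned residual correction."""
    residuals = [y - analytical(x) for x, y in zip(x_train, y_train)]
    slope, intercept = fit_line(x_train, residuals)
    return lambda x: analytical(x) + slope * x + intercept


def analytical_correlation(x: float) -> float:
    """Hypothetical physics-style baseline; replace with a real correlation."""
    return 2.0 * x


# Synthetic observations that the baseline alone under-predicts.
x_obs = [0.0, 1.0, 2.0, 3.0, 4.0]
y_obs = [2.5 * x + 1.0 for x in x_obs]
hybrid = make_hybrid_model(analytical_correlation, x_obs, y_obs)
print(hybrid(10.0))
```

One appeal of this arrangement is that the analytical term still constrains predictions when the learned correction is small, which is one way to read the summary's goal of solutions that stay stable across data sets.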

Predicting employee attrition using tree-based models

International Journal of Organizational Analysis ◽ 10.1108/ijoa-10-2019-1903 ◽ 2020 ◽ Vol 28 (6) ◽ pp. 1273-1291

Author(s): Nesreen El-Rayes ◽ Ming Fang ◽ Michael Smith ◽ Stephen M. Taylor

Keyword(s): Random Forest ◽ Decision Tree ◽ Prediction Models ◽ Binary Classification ◽ Primary Role ◽ Classification Models ◽ Job Transition ◽ Data Set ◽ Content Type ◽ Employee Attrition

Purpose: The purpose of this study is to develop tree-based binary classification models that predict the likelihood of employee attrition from firm cultural and management attributes.

Design/methodology/approach: A data set of resumes anonymously submitted through Glassdoor's online portal is used together with public company review information to fit decision tree, random forest and gradient boosted tree models that predict the probability of an employee leaving a firm during a job transition.

Findings: Random forest and decision tree methods are found to be the strongest attrition prediction models. In addition, compensation, company culture and senior management performance play a primary role in an employee's decision to leave a firm.

Practical implications: Human resources staff may use this study to better understand the factors that influence employee attrition. In addition, the techniques developed in this study may be applied to company-specific data sets to construct customized attrition models.

Originality/value: This study makes several novel contributions, including exploratory analyses such as industry job transition percentages; distributional comparisons, between employees who left and those who stayed with the firm, of the factors that contribute most strongly to attrition; and the first comprehensive search over binary classification models to identify which provides the strongest predictive performance for employee attrition.
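At the core of a CART-style decision tree is choosing the split that most reduces class impurity. The sketch below shows that step for a single numeric attribute using Gini impurity. The attribute (a hypothetical 1-5 compensation rating) and the labels (1 = left the firm) are invented for illustration, and the paper does not state which impurity measure its trees used.

```python
from typing import Optional, Sequence, Tuple


def gini(labels: Sequence[int]) -> float:
    """Gini impurity of a set of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 1.0 - p * p - (1.0 - p) * (1.0 - p)


def best_split(feature: Sequence[float], labels: Sequence[int]) -> Tuple[Optional[float], float]:
    """Return (threshold, weighted Gini) for the best rule 'feature <= threshold'.

    The threshold is None when no split beats leaving the node unsplit.
    """
    pairs = sorted(zip(feature, labels))
    n = len(pairs)
    best_threshold: Optional[float] = None
    best_score = gini(labels)
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot separate identical feature values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best_score = score
    return best_threshold, best_score


compensation_rating = [1, 2, 2, 3, 4, 5]  # hypothetical attribute
left_firm = [1, 1, 1, 0, 0, 0]            # 1 = employee left
print(best_split(compensation_rating, left_firm))
```

A full tree applies this search recursively to each child node and across all attributes; a random forest repeats it on bootstrap samples with random attribute subsets.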

Intelligent Prediction of The Equivalent Circulating Density From Surface Data Sensors During Drilling By Employing Machine Learning Techniques

10.21203/rs.3.rs-154257/v1 ◽ 2021

Author(s): Hany Gamal ◽ Ahmed Abdelaal ◽ Salaheldin Elkatatny

Keyword(s): Machine Learning ◽ Prediction Models ◽ Machine Learning Techniques ◽ Sensitivity Analyses ◽ Percentage Error ◽ Model Parameters ◽ Data Set ◽ Inference System ◽ Mathematical Correlation ◽ Drilling Parameters

Abstract Precise control of the equivalent circulating density (ECD) helps avoid well control issues such as loss of circulation, formation fracturing, underground blowout, and surface blowout. Predicting the ECD from drilling parameters is a new horizon in drilling engineering practice, motivated by the high cost of downhole ECD tools and the low accuracy of mathematical models. Machine learning methods can offer higher prediction accuracy than traditional and statistical models because of their advanced computing capacity. Hence, the objective of this paper is to use artificial neural network (ANN) and adaptive neuro-fuzzy inference system (ANFIS) techniques to develop ECD prediction models. The novel contribution of this study is predicting the downhole ECD without any downhole measurements, using only the available surface drilling parameters. The data covered a horizontal section, with 3,570 readings for each input after preprocessing: mud rate, rate of penetration, drill string speed, standpipe pressure, weight on bit, and drilling torque. The data were used to build the models with a 77:23 training-to-testing ratio, and another data set (1,150 data points) from the same field was used for model validation. Many sensitivity analyses were performed to optimize the ANN and ANFIS model parameters. The developed models achieved high accuracy, with a correlation coefficient (R) of 0.99 for the training and testing data sets and an average absolute percentage error (AAPE) of less than 0.24%. The validation results showed R of 0.98 and 0.96 and AAPE of 0.30% and 0.69% for the ANN and ANFIS models, respectively. In addition, a mathematical correlation was developed for estimating ECD from the inputs as a white-box model.
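The abstract mentions a mathematical correlation extracted as a white-box model but does not give its form. For a single-hidden-layer network with a tanh (tansig) hidden layer and a linear output, the trained weights can be written out as an explicit equation, ECD = b2 + Σ_j w2_j · tanh(b1_j + Σ_i w1_ji · x_i). The sketch below evaluates that equation. The architecture is an assumption, every weight shown is a placeholder rather than a published coefficient, and the inputs are assumed to be normalized the same way as during training.

```python
import math
from typing import Sequence


def ann_white_box(
    x: Sequence[float],
    w1: Sequence[Sequence[float]],
    b1: Sequence[float],
    w2: Sequence[float],
    b2: float,
) -> float:
    """Evaluate y = b2 + sum_j w2[j] * tanh(b1[j] + sum_i w1[j][i] * x[i])."""
    if not (len(w1) == len(b1) == len(w2)):
        raise ValueError("w1, b1 and w2 must have one entry per hidden neuron")
    y = b2
    for weights, bias, out_weight in zip(w1, b1, w2):
        if len(weights) != len(x):
            raise ValueError("each hidden neuron needs one weight per input")
        z = bias + sum(w * xi for w, xi in zip(weights, x))
        y += out_weight * math.tanh(z)  # tansig hidden layer, linear output
    return y


# Six normalized inputs in the order: Q, ROP, RS, SPP, WOB, T (placeholder values).
inputs = [0.1, -0.3, 0.5, 0.2, -0.1, 0.4]
# Two hidden neurons with made-up weights; a trained model supplies the real ones.
w1 = [[0.2, -0.1, 0.4, 0.3, 0.0, 0.1], [-0.5, 0.2, 0.1, 0.0, 0.3, -0.2]]
b1 = [0.05, -0.1]
w2 = [0.8, -0.4]
b2 = 0.1
print(ann_white_box(inputs, w1, b1, w2, b2))
```

The result is in normalized units, so it would still need to be mapped back to ECD units with the scaling used for the training targets.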

Machine Learning in Aging: An Example of Developing Prediction Models for Serious Fall Injury in Older Adults

Innovation in Aging ◽ 10.1093/geroni/igaa057.859 ◽ 2020 ◽ Vol 4 (Supplement_1) ◽ pp. 268-269

Author(s): Jaime Speiser ◽ Kathryn Callahan ◽ Jason Fanning ◽ Thomas Gill ◽ Anne Newman ◽ ...

Keyword(s): Machine Learning ◽ Older Adults ◽ Random Forest ◽ Decision Tree ◽ Prediction Models ◽ Receiver Operating Curve ◽ Learning Methods ◽ Life Study ◽ Fall Injury ◽ Machine Learning Methods

Abstract Advances in computational algorithms and the availability of large datasets with clinically relevant characteristics provide an opportunity to develop machine learning prediction models to aid in diagnosis, prognosis, and treatment of older adults. Some studies have employed machine learning methods for prediction modeling, but skepticism of these methods remains due to lack of reproducibility and difficulty understanding the complex algorithms behind models. We aim to provide an overview of two common machine learning methods: decision tree and random forest. We focus on these methods because they provide a high degree of interpretability. We discuss the underlying algorithms of decision tree and random forest methods and present a tutorial for developing prediction models for serious fall injury using data from the Lifestyle Interventions and Independence for Elders (LIFE) study. Decision tree is a machine learning method that produces a model resembling a flow chart. Random forest consists of a collection of many decision trees whose results are aggregated. In the tutorial example, we discuss evaluation metrics and interpretation for these models. Illustrated in data from the LIFE study, prediction models for serious fall injury were moderate at best (area under the receiver operating characteristic curve of 0.54 for decision tree and 0.66 for random forest). Machine learning methods may offer improved performance compared to traditional models for modeling outcomes in aging, but their use should be justified and their output should be carefully described. Models should be assessed by clinical experts to ensure compatibility with clinical practice.
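The abstract reports discrimination as the area under the receiver operating characteristic curve (AUC). The sketch below computes AUC through its rank interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. Random scoring gives 0.5, which is why 0.54 for the decision tree is close to chance. The labels and scores are invented, and the quadratic pairwise loop is chosen for clarity rather than speed.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(score of a random positive > score of a random negative); ties count 1/2."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical outcomes (1 = serious fall injury) and model risk scores.
injury = [1, 0, 1, 0]
risk_score = [0.9, 0.2, 0.4, 0.6]
print(roc_auc(injury, risk_score))
```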

Generation of a Complete Profile for Porosity Log While Drilling Complex Lithology by Employing the Artificial Intelligence

10.2118/208642-ms ◽ 2021

Author(s): Ahmed Al-Sabaa ◽ Hany Gamal ◽ Salaheldin Elkatatny

Keyword(s): Artificial Intelligence ◽ Prediction Model ◽ Real Time ◽ Storage Capacity ◽ Data Set ◽ Drilling Parameters ◽ Unseen Data ◽ Rock Porosity ◽ Data Points ◽ Logging Tool

Abstract The formation porosity of drilled rock is an important parameter that determines the formation storage capacity. The common industrial technique for acquiring rock porosity is the downhole logging tool. Logging while drilling or wireline porosity logging usually provides a complete porosity log for the section of interest; however, operational constraints on the logging tool, in addition to the job cost, might preclude the logging job. The objective of this study is to provide an intelligent model that predicts porosity from the drilling parameters. An artificial neural network (ANN), a tool of artificial intelligence (AI), was employed to build the porosity prediction model from the drilling parameters: weight on bit (WOB), drill string rotating speed (RS), drilling torque (T), stand-pipe pressure (SPP), and mud pumping rate (Q). The novel contribution of this study is a rock porosity model for complex lithology formations that uses drilling parameters in real time. The model was built using 2,700 data points from Well (A) with a 74:26 training-to-testing ratio. Many sensitivity analyses were performed to optimize the ANN model. The model was validated using an unseen data set (1,000 data points) from Well (B), which is located in the same field and was drilled across the same complex lithology. The results showed high model performance for both training/testing and validation. The overall accuracy was determined in terms of the correlation coefficient (R) and average absolute percentage error (AAPE): R was higher than 0.91 and AAPE was less than 6.1% for both model building and validation. Predicting rock porosity while drilling in real time will save logging costs and provide guidance on formation storage capacity and interpretation analysis.
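Neural network inputs such as WOB, RS, T, SPP, and Q differ by orders of magnitude, so they are usually rescaled before training. The abstract does not describe preprocessing; the sketch below assumes min-max scaling to [-1, 1], a range that suits tanh hidden units. The bounds are fitted on the training rows only and then reused for the testing rows and for the unseen Well (B) data, so that no information from the evaluation sets leaks into the scaling. The sample values are hypothetical.

```python
from typing import Dict, List, Tuple

FEATURES = ["WOB", "RS", "T", "SPP", "Q"]  # drilling inputs named in the abstract


def fit_bounds(train_rows: List[Dict[str, float]]) -> Dict[str, Tuple[float, float]]:
    """Record the (min, max) of each input over the training rows only."""
    return {
        name: (min(row[name] for row in train_rows), max(row[name] for row in train_rows))
        for name in FEATURES
    }


def normalize(row: Dict[str, float], bounds: Dict[str, Tuple[float, float]]) -> Dict[str, float]:
    """Map each input linearly from [min, max] to [-1, 1] using the training bounds."""
    scaled = {}
    for name, (lo, hi) in bounds.items():
        scaled[name] = 0.0 if hi == lo else 2.0 * (row[name] - lo) / (hi - lo) - 1.0
    return scaled


# Hypothetical readings; units are left unspecified on purpose.
train = [
    {"WOB": 10.0, "RS": 100.0, "T": 5.0, "SPP": 2000.0, "Q": 500.0},
    {"WOB": 30.0, "RS": 140.0, "T": 9.0, "SPP": 3000.0, "Q": 700.0},
]
bounds = fit_bounds(train)
well_b_row = {"WOB": 20.0, "RS": 120.0, "T": 7.0, "SPP": 2500.0, "Q": 600.0}
print(normalize(well_b_row, bounds))
```

Values from Well (B) that fall outside the training range map outside [-1, 1], which is a useful flag that the model is extrapolating.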

MACHINE LEARNING ON CONGESTION ANALYSIS BASED REAL-TIME NAVIGATION SYSTEM

International Journal of Artificial Intelligence Tools ◽ 10.1142/s0218213011000346 ◽ 2011 ◽ Vol 20 (04) ◽ pp. 753-781

Author(s): KAI CHEN ◽ KIA MAKKI ◽ NIKI PISSINOU

Keyword(s): Machine Learning ◽ Travel Time ◽ Real Time ◽ Navigation System ◽ Prediction Models ◽ Urban Traffic ◽ Route Guidance ◽ Highway Traffic ◽ Data Set ◽ Guidance Systems

In the metropolitan region, most congestion and traffic jams are caused by the uneven distribution of traffic flow, which creates bottleneck points where the traffic volume exceeds the road capacity. Unexpected incidents are the next most probable cause of these bottlenecks. Moreover, most drivers rely on their own experience without awareness of real-time traffic conditions, and this unintelligent traffic behavior can make congestion worse. Prediction-based route guidance systems show great improvements over inefficient diversion strategies by estimating future travel time when accurate travel time is difficult to calculate. However, the performance of machine learning prediction models built on historical data sets degrades sharply during congestion. This paper develops a new navigation system that reduces the travel time of individual drivers and distributes urban traffic flow efficiently to reduce the occurrence of congestion. Compared with previous route guidance systems, the results reveal that our system, which applies the advanced multi-lane prediction-based real-time fastest path (AMPRFP) algorithm, can significantly reduce travel time, especially when drivers travel in a complex route environment and face frequent congestion. Unlike the previous system [1], it can be applied to either single-lane or multi-lane urban traffic networks where the causes of congestion are significantly complex. We also demonstrate the advantages of this system and verify the results using real highway traffic data and a synthetic experiment.
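The AMPRFP algorithm itself is not described in the abstract. As a baseline only, the sketch below runs standard Dijkstra's algorithm over a road graph whose edge weights are predicted travel times, the basic operation that a prediction-based fastest-path router builds on. The graph, node names, and minute values are hypothetical; in a real-time system the predicted times would be refreshed as traffic changes, and multi-lane effects are not modeled here.

```python
import heapq
from typing import Dict, List, Tuple

# node -> list of (neighbor, predicted travel time in minutes)
Graph = Dict[str, List[Tuple[str, float]]]


def fastest_path(graph: Graph, source: str, target: str) -> Tuple[float, List[str]]:
    """Dijkstra's algorithm; returns (total time, node path) or (inf, []) if unreachable."""
    best = {source: 0.0}
    previous: Dict[str, str] = {}
    heap = [(0.0, source)]
    visited = set()
    while heap:
        time_so_far, node = heapq.heappop(heap)
        if node in visited:
            continue  # stale heap entry
        visited.add(node)
        if node == target:
            break
        for neighbor, travel_time in graph.get(node, []):
            candidate = time_so_far + travel_time
            if candidate < best.get(neighbor, float("inf")):
                best[neighbor] = candidate
                previous[neighbor] = node
                heapq.heappush(heap, (candidate, neighbor))
    if target not in best:
        return float("inf"), []
    path = [target]
    while path[-1] != source:
        path.append(previous[path[-1]])
    return best[target], path[::-1]


road_graph: Graph = {
    "A": [("B", 4.0), ("C", 2.0)],
    "C": [("B", 1.0), ("D", 7.0)],
    "B": [("D", 5.0)],
}
print(fastest_path(road_graph, "A", "D"))
```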

A Pattern Recognition Diagnostic Model to Restore and Emulate Knee Mobility

10.1101/2021.12.23.21267314 ◽ 2021

Author(s): Bayu Sukmanto ◽ Sadaira Packer ◽ Muhammad Gulfam ◽ David Hollinger

Keyword(s): Pattern Recognition ◽ Random Forest ◽ Decision Tree ◽ Least Squares ◽ Real Time ◽ Partial Least Squares ◽ Partial Least Squares Regression ◽ Secondary Outcome ◽ Least Squares Regression ◽ Tree Classifier

The electromyography (EMG) signal is an electrical voltage potential linked to muscle contraction, which results in human joint motion such as knee flexion. Knee injuries, such as knee osteoarthritis (KOA), disrupt the functional mobility of the knee joint and subsequently cause atrophy of the muscles controlling knee movement during activities of daily living (ADL). Consequently, weakened muscles exhibiting deteriorated EMG signal fidelity are hypothesized to have signal patterns discernible from a healthy individual's EMG signals. Pattern recognition algorithms are useful for mapping a set of complex inputs (EMG signals and knee angles) to a knee health status (injured vs. healthy). A secondary outcome is to predict future knee angles from previous input signals, so that a robotic knee exoskeleton can apply real-time torque assistance to a patient during ADL. A Decision Tree Classifier, Random Forest, Naive Bayes, and a fully connected Feed-Forward Neural Network were used for binary classification (healthy vs. injured). Partial Least Squares Regression, Decision Tree Regressor, and XGBoost were used for the regression task (knee angle prediction). The Random Forest Classifier had the best overall classification performance. XGBoost and Decision Tree Regression performed best among the regression algorithms for predicting real-time angles during walking, while Partial Least Squares Regression performed best during the standing tasks. In summary, our machine learning methods are useful for assisting clinicians and patients during physical rehabilitation by providing quantitative insight into the patient's neuromuscular control of the knee.
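Predicting future knee angles from previous input signals is typically framed as supervised learning on sliding windows: each training example pairs a run of past samples with the value some steps ahead. The sketch below builds such pairs from one channel. The window length (`history`) and prediction horizon (`horizon`) are hypothetical parameters, since the abstract gives neither, and EMG channels could be appended to each window in the same way.

```python
from typing import List, Sequence, Tuple


def make_windows(
    signal: Sequence[float], history: int, horizon: int
) -> Tuple[List[List[float]], List[float]]:
    """Pair each block of `history` past samples with the sample `horizon` steps later."""
    if history < 1 or horizon < 1:
        raise ValueError("history and horizon must be positive")
    features, targets = [], []
    # `end` is the index just past the window, so the target sits at end + horizon - 1.
    for end in range(history, len(signal) - horizon + 1):
        features.append(list(signal[end - history:end]))
        targets.append(signal[end + horizon - 1])
    return features, targets


knee_angle_deg = [5.0, 12.0, 25.0, 40.0, 52.0, 58.0]  # hypothetical gait samples
X, y = make_windows(knee_angle_deg, history=3, horizon=1)
for window, target in zip(X, y):
    print(window, "->", target)
```

Any regressor named in the abstract (PLS, decision tree, XGBoost) can then be trained on `X` and `y`; keeping the windows in time order when splitting avoids testing on samples that overlap the training windows.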

Booking Prediction Models for Peer-to-peer Accommodation Listings using Logistics Regression, Decision Tree, K-Nearest Neighbor, and Random Forest Classifiers

Journal of Information Systems Engineering and Business Intelligence ◽ 10.20473/jisebi.6.2.123-132 ◽ 2020 ◽ Vol 6 (2) ◽ pp. 123

Author(s): Mochammad Agus Afrianto ◽ Meditya Wasesa

Keyword(s): Random Forest ◽ Decision Tree ◽ Revenue Management ◽ Nearest Neighbor ◽ Prediction Models ◽ Model Development ◽ Peer To Peer ◽ K Nearest Neighbor ◽ Logistics Regression ◽ Roc Score

Background: The literature on peer-to-peer accommodation has focused substantially on the price determinants of accommodation listings. Developing prediction models for the demand for accommodation listings is vital in revenue management, because accurate price and demand forecasts help determine the best revenue management responses.

Objective: This study aims to develop prediction models to determine the booking likelihood of accommodation listings.

Methods: Using an Airbnb dataset, we developed four machine learning models: Logistic Regression, Decision Tree, K-Nearest Neighbor (KNN), and Random Forest Classifiers. We assessed the models by AUC-ROC score and model development time using the ten-fold three-way split and the ten-fold cross-validation procedures.

Results: In terms of average AUC-ROC score, the Random Forest Classifiers outperformed the other evaluated models. In the three-way split procedure, Random Forest had a 15.03% higher AUC-ROC score than Decision Tree, 2.93% higher than KNN, and 2.38% higher than Logistic Regression. In the cross-validation procedure, it had a 26.99% higher AUC-ROC score than Decision Tree, 4.41% higher than KNN, and 3.31% higher than Logistic Regression. The Decision Tree model had the lowest AUC-ROC score but the shortest model development time.

Conclusion: Random forest models performed best at predicting the booking likelihood of accommodation listings. Peer-to-peer accommodation owners can use the model to improve their revenue management responses.
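Ten-fold cross-validation partitions the data into ten folds and evaluates each model ten times, each time holding out a different fold and training on the other nine; the reported score is the average over folds. The sketch below generates those index partitions. The seeded shuffle and the function name are assumptions, and it does not reproduce the paper's separate ten-fold three-way split procedure, whose details the abstract does not give.

```python
import random
from typing import List, Tuple


def k_fold_indices(n: int, k: int = 10, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Return k (train, test) index lists; every index is tested exactly once."""
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    # Spread the remainder so fold sizes differ by at most one.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds = []
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        folds.append((sorted(train), sorted(test)))
        start += size
    return folds


folds = k_fold_indices(25, k=10)
print([len(test) for _, test in folds])
```

For an imbalanced target such as booked versus not booked, a stratified variant that preserves class proportions in each fold is often preferred.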

A Comparative Analysis of Machine/Deep Learning Models for Parking Space Availability Prediction

Sensors ◽ 10.3390/s20010322 ◽ 2020 ◽ Vol 20 (1) ◽ pp. 322 ◽ Cited By ~ 9

Author(s): Faraz Malik Awan ◽ Yasir Saleem ◽ Roberto Minerva ◽ Noel Crespi

Keyword(s): Deep Learning ◽ Comparative Analysis ◽ Random Forest ◽ Decision Tree ◽ Multilayer Perceptron ◽ Large Data ◽ Data Sets ◽ Application Domain ◽ Parking Space ◽ Data Set

Machine/Deep Learning (ML/DL) techniques have been applied to large data sets to extract relevant information and make predictions. The performance and outcomes of different ML/DL algorithms may vary depending on the data sets used, as well as on the suitability of the algorithms to the data and the application domain under consideration. Hence, determining which ML/DL algorithm is most suitable for a specific application domain and its related data sets would be a key advantage. To respond to this need, a comparative analysis of well-known ML/DL techniques, including Multilayer Perceptron, K-Nearest Neighbors, Decision Tree, Random Forest, and Voting Classifier (the Ensemble Learning Approach), was conducted for predicting parking space availability. The comparison used Santander's parking data set, initiated during work on the H2020 WISE-IoT project, to evaluate the considered algorithms and determine which offers the best prediction. The results show that, regardless of data set size, less complex algorithms such as Decision Tree, Random Forest, and KNN achieve higher prediction accuracy than complex algorithms such as Multilayer Perceptron, while providing comparable information for predicting parking space availability. In addition, this paper provides Top-K parking space recommendations based on the distance between a vehicle's current position and free parking spots.
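The abstract's Top-K recommendation ranks free parking spots by distance from the vehicle's current position. A minimal sketch follows, assuming planar coordinates in metres (for latitude/longitude a great-circle distance would replace `math.dist`) and hypothetical spot identifiers. It uses `heapq.nsmallest`, which avoids sorting the full list when K is small.

```python
import heapq
import math
from typing import List, Sequence, Tuple

Spot = Tuple[str, Tuple[float, float]]  # (spot id, (x, y) position in metres)


def top_k_parking(vehicle: Sequence[float], free_spots: List[Spot], k: int) -> List[Spot]:
    """Return the k free spots closest to the vehicle by straight-line distance."""
    return heapq.nsmallest(k, free_spots, key=lambda spot: math.dist(vehicle, spot[1]))


free = [("P1", (3.0, 4.0)), ("P2", (1.0, 1.0)), ("P3", (10.0, 0.0)), ("P4", (0.0, 2.0))]
print([spot_id for spot_id, _ in top_k_parking((0.0, 0.0), free, k=2)])
```

In the setting the abstract describes, the list of free spots would come from the availability predictions, with the distance ranking applied afterwards.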
