Boosting Algorithm Choice in Predictive Machine Learning Models for Fracturing Applications

Mapping Intimacies ◽

10.2118/205642-ms ◽

2021 ◽

Author(s):

Abdul Muqtadir Khan

Keyword(s):

Machine Learning ◽

Data Science ◽

Oil And Gas ◽

Oil And Gas Industry ◽

Injection Rate ◽

Model Construction ◽

Gradient Boosting ◽

Light Gradient ◽

Fracture Damage ◽

Boosting Technique

Abstract With the advancement in machine learning (ML) applications, some recent research has been conducted to optimize fracturing treatments. A variety of models are available, using various objective functions for optimization and different mathematical techniques. There is a need to extend ML techniques to optimize the choice of algorithm. For fracturing treatment design, the literature on comparative algorithm performance is sparse. The research predominantly shows that, compared to the most commonly used regressors and classifiers, some form of boosting technique consistently outperforms on model testing and prediction accuracy. A database was constructed for a heterogeneous reservoir. Four widely used boosting algorithms were applied to the database to predict the design from the output of a short injection/falloff test alone. Feature importance analysis was done on eight output parameters from the falloff analysis, and six were finalized for the model construction. The outputs selected for prediction were fracturing fluid efficiency, proppant mass, maximum proppant concentration, and injection rate. Extreme gradient boost (XGBoost), categorical boost (CatBoost), adaptive boost (AdaBoost), and light gradient boosting machine (LGBM) were the algorithms finalized for the comparative study. A sensitivity analysis was run for different numbers of classes (four, five, and six) to establish a balance between accuracy and prediction granularity. The results showed that the best algorithm choice was between XGBoost and CatBoost for the predicted parameters under certain model construction conditions. The accuracy for all outputs on the holdout sets varied between 80 and 92%, supporting wider use of these models. Data science has contributed to various oil and gas industry domains and has tremendous applications in the stimulation domain.
The research and review conducted in this paper add a valuable resource for the user to build digital databases and use the appropriate algorithm without much trial and error. Implementing this model reduced the complexity of the proppant fracturing treatment redesign process, enhanced operational efficiency, and reduced fracture damage by eliminating minifrac steps with crosslinked gel.
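
The abstract does not say how the continuous design outputs were turned into classes. As a minimal sketch of the accuracy-versus-granularity trade-off it describes, the snippet below assumes equal-frequency (quantile) binning of a continuous output into four, five, or six classes. The injection-rate values and the function names `quantile_edges` and `to_class` are hypothetical and not taken from the paper.

```python
def quantile_edges(values, n_classes):
    """Return n_classes - 1 cut points that split the sorted values
    into groups of roughly equal size (an assumed binning scheme)."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(k * n) // n_classes] for k in range(1, n_classes)]


def to_class(value, edges):
    """Map a continuous value to a class index: the number of cut
    points that are less than or equal to the value."""
    return sum(value >= e for e in edges)


# Hypothetical injection rates; illustrative numbers only.
rates = [18, 20, 22, 24, 25, 27, 30, 32, 35, 38, 40, 45]

labels_by_k = {}
for k in (4, 5, 6):
    edges = quantile_edges(rates, k)
    labels_by_k[k] = [to_class(r, edges) for r in rates]
    print(k, "classes, cut points:", edges)
```

More classes give a finer design recommendation per class, but each class then holds fewer training samples, which is one plausible reason accuracy would drop as the class count grows.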


Machine Learning and Data Science in the Oil and Gas Industry

10.1016/c2019-0-02033-x ◽

2021 ◽

Keyword(s):

Machine Learning ◽

Data Science ◽

Oil And Gas ◽

Oil And Gas Industry ◽

Gas Industry


Case Study ROP Modeling Using Random Forest Regression and Gradient Boosting in the Hanover Region in Germany

Volume 11: Petroleum Technology ◽

10.1115/omae2020-18677 ◽

2020 ◽

Author(s):

Patrick Höhn ◽

Felix Odebrett ◽

Carlos Paz ◽

Joachim Oppelt

Keyword(s):

Random Forest ◽

Data Science ◽

Oil And Gas ◽

Large Data ◽

Oil And Gas Industry ◽

Study Data ◽

Gradient Boosting ◽

Data Sets ◽

Lower Saxony ◽

Drilling Performance

Abstract Reduction of drilling costs in the oil and gas industry and the geothermal energy sector is the main driver for major investments in drilling optimization research. The best way to reduce drilling costs is to minimize the overall time needed for drilling a well. This can be accomplished by optimizing the non-productive time during an operation, and through increasing the rate of penetration (ROP) while actively drilling. ROP has already been modeled in the past using empirical correlations. However, nowadays, methods from data science can be applied to the large data sets obtained during drilling operations, both for real-time prediction of drilling performance and for analysis of historical data sets during the evaluation of previous drilling activities. In the current study, data from a geothermal well in the Hanover region in Lower Saxony (Germany) were used to train machine learning models using Random Forest™ regression and Gradient Boosting. Both techniques showed promising results for modeling ROP.
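
The abstract does not give the model settings. As a rough illustration of the gradient boosting idea it names, the pure-Python sketch below fits a sequence of one-split regression trees (stumps) to the residuals of a squared-error loss on a single feature. The weight-on-bit and ROP values are invented for illustration and are not from the study; all function names are hypothetical.

```python
from statistics import mean


def fit_stump(x, residuals):
    """Find the single threshold on x that minimizes squared error
    when each side predicts the mean residual of its samples."""
    best = None
    xs = sorted(set(x))
    for lo, hi in zip(xs, xs[1:]):
        t = (lo + hi) / 2
        left = [r for xi, r in zip(x, residuals) if xi <= t]
        right = [r for xi, r in zip(x, residuals) if xi > t]
        lm, rm = mean(left), mean(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    return best[1], best[2], best[3]


def fit_gradient_boosting(x, y, n_rounds=50, learning_rate=0.1):
    """Start from the mean of y, then repeatedly fit a stump to the
    current residuals and add a shrunken copy of it to the prediction."""
    base = mean(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        t, lm, rm = fit_stump(x, residuals)
        stumps.append((t, lm, rm))
        pred = [p + learning_rate * (lm if xi <= t else rm)
                for xi, p in zip(x, pred)]
    return base, stumps, learning_rate


def predict(model, xi):
    base, stumps, lr = model
    return base + lr * sum(lm if xi <= t else rm for t, lm, rm in stumps)


# Hypothetical weight-on-bit (x) and ROP (y) values; not from the paper.
wob = [5, 8, 10, 12, 15, 18, 20, 22]
rop = [4.0, 5.5, 7.0, 8.0, 11.0, 12.5, 13.0, 15.0]

model = fit_gradient_boosting(wob, rop)
mse_baseline = mean((y - mean(rop)) ** 2 for y in rop)
mse_boosted = mean((y - predict(model, x)) ** 2 for x, y in zip(wob, rop))
print(mse_baseline, mse_boosted)
```

A random forest differs in that its trees are grown independently on resampled data and averaged, rather than fitted one after another to the remaining error.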


Review of machine learning algorithms' application in pharmaceutical technology

Arhiv za farmaciju ◽

10.5937/arhfarm71-32499 ◽

2021 ◽

Vol 71 (4) ◽

pp. 302-317

Author(s):

Jelena Đuriš ◽

Ivana Kurćubić ◽

Svetlana Ibrić

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Data Science ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Gradient Boosting ◽

Formulation Development ◽

Light Gradient ◽

Pharmaceutical Technology ◽

Wide Range

Machine learning algorithms, and artificial intelligence in general, have a wide range of applications in the field of pharmaceutical technology. From formulation development through to their great potential for integration within the Quality by Design framework, these data science tools provide a better understanding of pharmaceutical formulations and their processing. Machine learning algorithms can be especially helpful in analyzing the large volume of data generated by process analytical technologies. This paper provides a brief explanation of artificial neural networks, one of the most frequently used classes of machine learning algorithms. The process of network training and testing is described and accompanied by illustrative examples of machine learning tools applied in the context of pharmaceutical formulation development and related technologies, as well as an overview of future trends. Recently published studies on more sophisticated methods, such as deep neural networks and the light gradient boosting machine algorithm, are described. The interested reader is also referred to several official documents (guidelines) that pave the way for a more structured representation of machine learning models in prospective submissions to regulatory bodies.


Sarcasm detection of tweets without #sarcasm: data science approach

Indonesian Journal of Electrical Engineering and Computer Science ◽

10.11591/ijeecs.v23.i2.pp993-1001 ◽

2021 ◽

Vol 23 (2) ◽

pp. 993

Author(s):

Rupali Amit Bagate ◽

R. Suguna

Keyword(s):

Machine Learning ◽

Language Processing ◽

Data Science ◽

Short Term Memory ◽

Confusion Matrix ◽

Research Work ◽

Gradient Boosting ◽

Specific Context ◽

Machine Learning Classification ◽

Light Gradient

Identifying sarcasm in text can be challenging. In sarcasm, a negative word can flip the polarity of a positive sentence. Sentences can be classified as sarcastic or non-sarcastic. It is easier to identify sarcasm from facial expression or tone than to detect it from plain text. Thus, sarcasm detection using natural language processing is a major challenge when no specific context or clue, such as #sarcasm, is present in a tweet. Research therefore tries to solve this classification problem using various optimized models. The proposed model analyzes whether a given tweet is sarcastic or not without the presence of the sarcasm hashtag or any other specific context in the text. To achieve better results, we used different machine learning classification methods along with deep learning embedding techniques. Our optimized model uses a stacking technique that combines the results of logistic regression and a long short-term memory (LSTM) recurrent neural network and feeds them to a light gradient boosting technique, which generates better results than existing machine learning and neural network algorithms. The key difference of our research work is that sarcasm detection is done without #sarcasm, which has not been much explored by earlier researchers. The metrics used for evaluation are F1-score and confusion matrix.


A systematic review of data science and machine learning applications to the oil and gas industry

Journal of Petroleum Exploration and Production Technology ◽

10.1007/s13202-021-01302-2 ◽

2021 ◽

Author(s):

Zeeshan Tariq ◽

Murtada Saleh Aljawad ◽

Amjed Hasan ◽

Mobeen Murtaza ◽

Emad Mohammed ◽

...

Keyword(s):

Machine Learning ◽

Analytical Solutions ◽

Data Science ◽

Oil And Gas ◽

Oil And Gas Industry ◽

Petroleum Exploration ◽

Petroleum Engineering ◽

Oil Well ◽

Data Generation ◽

Gas Industry

Abstract This study offers a detailed review of the roles of data science and machine learning (ML) in different petroleum engineering and geosciences segments such as petroleum exploration, reservoir characterization, oil well drilling, production, and well stimulation, emphasizing the newly emerging field of unconventional reservoirs. The future of data science and ML in the oil and gas industry, highlighting what is required from ML for better prediction, is also discussed. This study also provides a comprehensive comparison of different ML techniques used in the oil and gas industry. With the arrival of powerful computers, advanced ML algorithms, and extensive data generation from different industry tools, we see a bright future in developing solutions to the complex problems in the oil and gas industry that were previously beyond the grip of analytical solutions or numerical simulation. ML tools can incorporate every detail in the log data and every piece of information connected to the target data. Despite their limitations, they are not constrained by the limiting assumptions of analytical solutions or by the particular data and/or processing power requirements of numerical simulators. This detailed and comprehensive study can serve as an exclusive reference for ML applications in the industry. Based on the review conducted, it was found that ML techniques offer great potential for solving problems in almost all areas of the oil and gas industry involving prediction, classification, and clustering. With the generation of huge amounts of data in everyday oil and gas industry activities, machine learning and big data handling techniques are becoming a necessity for a more efficient industry.


An Online Microcredential Certification Program to Upskill Petrotechnical Professionals in Data Analytics and Machine Learning with an Upstream Oil and Gas Industry Focus

10.2118/205921-ms ◽

2021 ◽

Author(s):

Kalyanaraman Venugopal ◽

Dvijesh Shastri ◽

Suryanarayanan Radhakrishnan ◽

Ramanan Krishnamoorti

Keyword(s):

Machine Learning ◽

Data Analytics ◽

Data Science ◽

Oil And Gas ◽

Opportunity To Learn ◽

Oil And Gas Industry ◽

Visual Programming ◽

Digital Transformation ◽

Pilot Program ◽

Certification Program

Abstract The upstream oil and gas industry's digital transformation over the last few years has accelerated because of the COVID-19 pandemic. Data analytics and machine learning are key components of this digital transformation and have become essential skills for experienced petrotechnical professionals (PTPs) and aspiring entrants into the field. The objective of our work was to design and deliver a practical, engaging, and online microcredential certification program in upstream energy data analytics for PTPs. The program was conceived as a collaboration between academia (University of Houston's UH Energy) and industry (NExT, a Schlumberger company). It was designed as three belt levels (Bronze, Silver, and Gold), each containing three stackable badges of 12 to 15 hours duration per badge. Key design points included: identifying an online platform for administration; delivering convenient, interactive, live online sessions; delivering hybrid classes blending lectures and hands-on laboratories; designing laboratories using upstream datasets across various stages of oilfield expertise; and administering tests and quizzes, Kaggle competitions, and team projects. The program contents were designed incorporating appropriate instructional design practices for effective online class delivery. The design and delivery of the laboratories using a code-free approach by leveraging visual programming offers PTPs and new entrants a unique opportunity to learn data analytics concepts without the traditional concern of learning to code. Additionally, the collaboration between academia and industry enables delivering a program that combines academic rigor with application of the skills and knowledge to solve problems facing the industry using real-world datasets. As a pilot program, all three badges of the Bronze belt were scheduled and successfully delivered during July and August 2020, as six 2-hour sessions per badge.
From a total of 26 students registered in badge 1, 24 completed it, resulting in a completion rate of 92%. Of these students, 19 registered for and completed badge 2 and badge 3, resulting in completion rates of 100% for both. Based on the success of the pilot program, a second delivery of the Bronze belt with 18 participants was offered from October 2020 through January 2021. All 18 participants completed all three badges. Feedback from participants attests to the success of the pilot program as seen in the following excerpts: "A very good course and instructors. I have already recommended the course to a friend and I will continue to be an advocate for the course." "Teachers are very receptive to questions and it is a joy to hear their lectures." "I found the University of Houston course to be both highly engaging and incredibly informative. The course teaches basic principles of data science without being bogged down by the specific coding language."


Simulation of proppant transport and fracture plugging in the framework of a radial hydraulic fracturing model

Russian Journal of Numerical Analysis and Mathematical Modelling ◽

10.1515/rnam-2020-0027 ◽

2020 ◽

Vol 35 (6) ◽

pp. 325-339

Author(s):

Vasily N. Lapin ◽

Denis V. Esipov

Keyword(s):

Numerical Model ◽

Hydraulic Fracturing ◽

Oil And Gas ◽

Oil And Gas Industry ◽

Injection Rate ◽

Fluid Injection ◽

Subtraction Technique ◽

Gas Industry ◽

Proppant Transport ◽

Hydraulic Fracture Propagation

Abstract Hydraulic fracturing technology is widely used in the oil and gas industry. Part of the technology consists of injecting a mixture of proppant and fluid into the fracture. Proppant significantly increases the viscosity of the injected mixture and can cause plugging of the fracture. In this paper we propose a numerical model of hydraulic fracture propagation within the framework of the radial geometry, taking into account proppant transport and possible plugging. The finite difference method and the singularity subtraction technique near the fracture tip are used in the numerical model. Based on the simulation results, it was found that, depending on the parameters of the rock, the fluid, and the fluid injection rate, plugging can have two causes. A parameter was introduced to separate these two cases. If this parameter is large enough, plugging occurs because the proppant reaches its maximum possible concentration far from the fracture tip. If its value is small, plugging is caused by the proppant reaching a narrow part of the fracture near its tip. The numerical experiments give an estimate of the radius of the proppant-filled part of the fracture for various injection rates and leakages into the rock.


Development and validation of a difficult laryngoscopy prediction model using machine learning of neck circumference and thyromental height

BMC Anesthesiology ◽

10.1186/s12871-021-01343-4 ◽

2021 ◽

Vol 21 (1) ◽

Author(s):

Jong Ho Kim ◽

Haewon Kim ◽

Ji Su Jang ◽

Sung Mi Hwang ◽

So Young Lim ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Confidence Interval ◽

Neck Circumference ◽

Difficult Laryngoscopy ◽

Gradient Boosting ◽

Test Set ◽

Equal Distribution ◽

Light Gradient ◽

Extreme Gradient Boosting

Abstract Background: Predicting a difficult airway is challenging in patients with limited airway evaluation. The aim of this study is to develop and validate a model that predicts difficult laryngoscopy by machine learning of neck circumference and thyromental height as predictors that can be used even for patients with limited airway evaluation. Methods: Variables for prediction of difficult laryngoscopy included age, sex, height, weight, body mass index, neck circumference, and thyromental distance. Difficult laryngoscopy was defined as Grade 3 and 4 by the Cormack-Lehane classification. The preanesthesia and anesthesia data of 1677 patients who had undergone general anesthesia at a single center were collected. The data set was randomly stratified into a training set (80%) and a test set (20%), with equal distribution of difficult laryngoscopy. The training data sets were trained with five algorithms (logistic regression, multilayer perceptron, random forest, extreme gradient boosting, and light gradient boosting machine). The prediction models were validated through a test set. Results: The model’s performance using random forest was best (area under receiver operating characteristic curve = 0.79 [95% confidence interval: 0.72–0.86], area under precision-recall curve = 0.32 [95% confidence interval: 0.27–0.37]). Conclusions: Machine learning can predict difficult laryngoscopy through a combination of several predictors including neck circumference and thyromental height. The performance of the model can be improved with more data, a new variable, and a combination of models.
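
The stratified 80/20 split described in the Methods can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the label list is made up (20 positive cases out of 100), and `stratified_split` is a hypothetical helper that shuffles each class separately so that the positive rate is the same in both subsets.

```python
import random


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split sample indices into train and test sets so that each
    class contributes the same fraction of its members to the test set."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Hypothetical labels: 1 = difficult laryngoscopy, 0 = not difficult.
labels = [1] * 20 + [0] * 80
train_idx, test_idx = stratified_split(labels)

train_rate = sum(labels[i] for i in train_idx) / len(train_idx)
test_rate = sum(labels[i] for i in test_idx) / len(test_idx)
print(len(train_idx), len(test_idx), train_rate, test_rate)
```

Keeping the class ratio equal matters most when the positive class is rare, because a purely random split could leave the test set with too few positive cases to estimate the precision-recall curve reliably.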


Data Science Applied to Pedagogical Methodologies Focused on Changing the Negative Perception of the Oil and Gas Industry in Colombia

10.2118/201582-ms ◽

2020 ◽

Author(s):

Israel Guevara ◽

David Ardila ◽

Kevin Daza ◽

Oscar Ovalle ◽

Paola Pastor ◽

...

Keyword(s):

Data Science ◽

Oil And Gas ◽

Oil And Gas Industry ◽

Gas Industry ◽

Negative Perception


RegioML: Predicting the regioselectivity of electrophilic aromatic substitution reactions using machine learning

10.33774/chemrxiv-2021-l2fvl ◽

2021 ◽

Author(s):

Nicolai Ree ◽

Andreas H. Göller ◽

Jan H. Jensen

Keyword(s):

Machine Learning ◽

Tight Binding ◽

Reaction Centers ◽

Gradient Boosting ◽

Electrophilic Aromatic Substitution ◽

Aromatic Substitution ◽

Substitution Reactions ◽

Test Set ◽

Light Gradient ◽

Out Of Sample

We present RegioML, an atom-based machine learning model for predicting the regioselectivities of electrophilic aromatic substitution reactions. The model relies on CM5 atomic charges computed using semiempirical tight binding (GFN1-xTB) combined with the ensemble decision tree variant light gradient boosting machine (LightGBM). The model is trained and tested on 21,201 bromination reactions with 101K reaction centers, which are split into training, test, and out-of-sample datasets with 58K, 15K, and 27K reaction centers, respectively. The accuracy is 93% for the test set and 90% for the out-of-sample set, while the precision (the percentage of positive predictions that are correct) is 88% and 80%, respectively. The test-set performance is very similar to the graph-based WLN method developed by Struble et al. (React. Chem. Eng. 2020, 5, 896), though the comparison is complicated by the possibility that some of the test and out-of-sample molecules were used to train WLN. RegioML outperforms our physics-based RegioSQM20 method (J. Cheminform. 2021, 13:10), for which the precision is only 75%. Even for the out-of-sample dataset, RegioML slightly outperforms RegioSQM20. The good performance of RegioML and WLN is in large part due to the large datasets available for this type of reaction. However, for reactions where there is little experimental data, physics-based approaches like RegioSQM20 can be used to generate synthetic data for model training. We demonstrate this by showing that the performance of RegioSQM20 can be reproduced by an ML model trained on RegioSQM20-generated data.
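
The abstract reports accuracy and precision separately, and the two can diverge when most reaction centers are unreactive. The short sketch below computes both from binary labels using the definition of precision given in the abstract. The label lists are toy values chosen for illustration, not RegioML data, and `accuracy_and_precision` is a hypothetical helper name.

```python
def accuracy_and_precision(y_true, y_pred):
    """Return (accuracy, precision) for binary labels, where precision
    is the share of positive predictions that are correct."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    accuracy = correct / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return accuracy, precision


# Toy labels: 1 = reactive center, 0 = unreactive center.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

accuracy, precision = accuracy_and_precision(y_true, y_pred)
print(accuracy, precision)
```

In this toy case eight of ten labels are correct, but only two of the three positive predictions are, so accuracy is higher than precision, the same direction as the gap reported in the abstract.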
