Well Completion Optimization in Unconventional Reservoirs Using Machine Learning Methods

2021 ◽  
Author(s):  
Sohrat Baki ◽  
Cenk Temizel ◽  
Serkan Dursun

Abstract Unconventional reservoirs, mainly shale oil and natural gas, will continue to significantly help meet the ever-growing energy demands of global markets. Being complex in nature and having ultra-tight producing zones, unconventionals depend on effective well completion and stimulation treatments to be successful and economical. Within the last decade, thousands of unconventional wells have been drilled, completed, and produced in North America. The scope of this work is to explore the primary impact of completion parameters such as lateral length, frac type, number of stages, and proppant and fluid volume on the production performance of wells in unconventional fields. The key completion, stimulation, and production attributes for the wells were considered in a machine learning workflow for building predictive models. Predictive models based on neural networks, support vector machines, or decision-tree-based ensembles serve as mapping functions from completion parameters to production for each well in the field. The completion parameters were analyzed in the workflow with respect to feature engineering and interpretation, yielding key performance indicators for the region. The optimum values for the best-performing completions were then identified for each well. The predictive models in the workflow were compared for accuracy, and the best model was used to understand the impact of completion parameters on production rates. This study outlines an overall machine learning workflow, from feature engineering to interpretation of the machine learning models, to quantify the effects of completion parameters on the production rate of wells in unconventional fields.
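
As a rough sketch of how such a mapping function might be set up, the following scikit-learn snippet trains a gradient-boosting model on hypothetical completion features and ranks them by feature importance; the column names and the file wells.csv are illustrative assumptions, not the authors' data or workflow:

```python
# Illustrative sketch only: hypothetical column names and a generic
# gradient-boosting model stand in for the paper's workflow.
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

# Hypothetical well dataset: numeric completion parameters -> cumulative production
df = pd.read_csv("wells.csv")  # assumed file
features = ["lateral_length", "n_stages", "proppant_volume", "fluid_volume"]
X, y = df[features], df["cum_production_12m"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
model = GradientBoostingRegressor(random_state=0).fit(X_train, y_train)

print("R^2 on held-out wells:", model.score(X_test, y_test))
# Rank completion parameters by their influence on predicted production
for name, imp in sorted(zip(features, model.feature_importances_), key=lambda t: -t[1]):
    print(f"{name}: {imp:.3f}")
```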

SPE Journal ◽  
2018 ◽  
Vol 23 (04) ◽  
pp. 1075-1089 ◽  
Author(s):  
Jared Schuetter ◽  
Srikanta Mishra ◽  
Ming Zhong ◽  
Randy LaFollette (ret.)

Summary Considerable amounts of data are being generated during the development and operation of unconventional reservoirs. Statistical methods that can provide data-driven insights into production performance are gaining in popularity. Unfortunately, the application of advanced statistical algorithms remains somewhat of a mystery to petroleum engineers and geoscientists. The objective of this paper is to provide some clarity to this issue, focusing on how to build robust predictive models and how to develop decision rules that help identify factors separating good wells from poor performers. The data for this study come from wells completed in the Wolfcamp Shale Formation in the Permian Basin. Data categories used in the study included well location and assorted metrics capturing various aspects of well architecture, well completion, stimulation, and production. Predictive models for the production metric of interest are built using simple regression as well as advanced methods such as random forests (RFs), support-vector regression (SVR), gradient-boosting machine (GBM), and multidimensional kriging. The data-fitting process involves splitting the data into a training set and a test set, building a regression model on the training set, and validating it with the test set. Repeated application of a “cross-validation” procedure yields valuable information regarding the robustness of each regression-modeling approach. Furthermore, decision rules that can identify extreme behavior in production wells (i.e., the top x% of the wells vs. the bottom x%, as ranked by the production metric) are generated using the classification-and-regression-tree algorithm. The resulting decision tree (DT) provides useful insights regarding what variables (or combinations of variables) can drive production performance into such extreme categories. The main contributions of this paper are to provide guidelines on how to build robust predictive models and to demonstrate the utility of DTs for identifying the factors responsible for good vs. poor wells.
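
A minimal sketch of the described data-fitting process, assuming a hypothetical wolfcamp_wells.csv with numeric predictors and a cum_oil_12m production metric; the repeated cross-validation and the top-vs-bottom decision-tree step mirror the paper's outline, but the column names and settings are illustrative:

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.svm import SVR
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import RepeatedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier, export_text

df = pd.read_csv("wolfcamp_wells.csv")          # hypothetical dataset
X = df.drop(columns=["cum_oil_12m"])            # completion/location predictors (numeric)
y = df["cum_oil_12m"]                           # production metric of interest

# Repeated cross-validation gauges the robustness of each regression approach
cv = RepeatedKFold(n_splits=5, n_repeats=10, random_state=0)
models = {
    "RF": RandomForestRegressor(random_state=0),
    "SVR": make_pipeline(StandardScaler(), SVR()),
    "GBM": GradientBoostingRegressor(random_state=0),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=cv, scoring="r2")
    print(f"{name}: mean R^2 = {scores.mean():.3f} +/- {scores.std():.3f}")

# CART-style decision rules separating top-20% wells from bottom-20%
q_lo, q_hi = y.quantile([0.2, 0.8])
mask = (y <= q_lo) | (y >= q_hi)
labels = (y[mask] >= q_hi).astype(int)          # 1 = top wells, 0 = bottom wells
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X[mask], labels)
print(export_text(tree, feature_names=list(X.columns)))
```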


Electronics ◽  
2020 ◽  
Vol 9 (2) ◽  
pp. 374 ◽  
Author(s):  
Sudhanshu Kumar ◽  
Monika Gahalawat ◽  
Partha Pratim Roy ◽  
Debi Prosad Dogra ◽  
Byung-Gyu Kim

Sentiment analysis is a rapidly growing field of research due to the explosive growth in digital information. In the modern world of artificial intelligence, sentiment analysis is one of the essential tools for extracting emotion information from massive data. Sentiment analysis is applied to a variety of user data, from customer reviews to social network posts. To the best of our knowledge, there is little work on sentiment analysis based on categorizing users by demographics. Demographics play an important role in deciding the marketing strategies for different products. In this study, we explore the impact of age and gender on sentiment analysis, as this can help e-commerce retailers market their products to specific demographics. The dataset was created by collecting book reviews from Facebook users through a questionnaire about their preferences in books, along with their age group and gender. Next, the paper analyzes the segmented data for sentiments within each age group and gender. Finally, sentiment analysis is performed using different Machine Learning (ML) approaches, including maximum entropy, support vector machine, convolutional neural network, and long short-term memory, to study the impact of age and gender on user reviews. Experiments have been conducted to identify new insights into the effect of age and gender on sentiment analysis.
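
The per-demographic analysis could look roughly like the sketch below, which fits one sentiment classifier per age/gender segment; the file book_reviews.csv and its columns are assumptions for illustration, and the paper's maximum-entropy and deep learning variants are omitted:

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

# Hypothetical columns: review (text), sentiment (label), age_group, gender
df = pd.read_csv("book_reviews.csv")
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())

# Fit and score the same classifier within each demographic segment
for (age, gender), seg in df.groupby(["age_group", "gender"]):
    acc = cross_val_score(clf, seg["review"], seg["sentiment"], cv=5).mean()
    print(f"age={age}, gender={gender}: accuracy={acc:.3f} (n={len(seg)})")
```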


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Arturo Moncada-Torres ◽  
Marissa C. van Maaren ◽  
Mathijs P. Hendriks ◽  
Sabine Siesling ◽  
Gijs Geleijnse

Abstract Cox Proportional Hazards (CPH) analysis is the standard for survival analysis in oncology. Recently, several machine learning (ML) techniques have been adapted for this task. Although they have been shown to yield results at least as good as classical methods, they are often disregarded because of their lack of transparency and little to no explainability, which are key for their adoption in clinical settings. In this paper, we used data from the Netherlands Cancer Registry of 36,658 non-metastatic breast cancer patients to compare the performance of CPH with ML techniques (Random Survival Forests, Survival Support Vector Machines, and Extreme Gradient Boosting [XGB]) in predicting survival using the c-index. We demonstrated that in our dataset, ML-based models can perform at least as well as the classical CPH regression (c-index ∼0.63), and in the case of XGB even better (c-index ∼0.73). Furthermore, we used Shapley Additive Explanation (SHAP) values to explain the models’ predictions. We concluded that the difference in performance can be attributed to XGB’s ability to model nonlinearities and complex interactions. We also investigated the impact of specific features on the models’ predictions, as well as the corresponding insights. Lastly, we showed that explainable ML can generate explicit knowledge of how models make their predictions, which is crucial for increasing the trust in, and adoption of, innovative ML techniques in oncology and healthcare overall.
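
A hedged sketch of such a c-index comparison, using lifelines for the CPH baseline and XGBoost's Cox objective for the ML model; the dataset file and the time/event column names are assumptions, and this is not the authors' pipeline:

```python
import numpy as np
import pandas as pd
import xgboost as xgb
import shap
from lifelines import CoxPHFitter
from lifelines.utils import concordance_index
from sklearn.model_selection import train_test_split

# Hypothetical registry extract: numeric features plus time (follow-up) and event (1 = death)
df = pd.read_csv("breast_cancer_registry.csv")
train, test = train_test_split(df, test_size=0.3, random_state=0)

# Classical baseline: Cox Proportional Hazards
cph = CoxPHFitter().fit(train, duration_col="time", event_col="event")
risk = cph.predict_partial_hazard(test)
# concordance_index expects higher scores for longer survival, hence the negation
print("CPH c-index:", concordance_index(test["time"], -risk, test["event"]))

# XGBoost with a Cox objective; censored rows get negative time labels
feats = [c for c in df.columns if c not in ("time", "event")]
y_train = np.where(train["event"] == 1, train["time"], -train["time"])
dtrain = xgb.DMatrix(train[feats], label=y_train)
bst = xgb.train({"objective": "survival:cox"}, dtrain, num_boost_round=200)
xgb_risk = bst.predict(xgb.DMatrix(test[feats]))
print("XGB c-index:", concordance_index(test["time"], -xgb_risk, test["event"]))

# SHAP values explain which features drive each predicted risk
shap_values = shap.TreeExplainer(bst).shap_values(test[feats])
```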


Materials ◽  
2021 ◽  
Vol 14 (21) ◽  
pp. 6713
Author(s):  
Omid Khalaj ◽  
Moslem Ghobadi ◽  
Ehsan Saebnoori ◽  
Alireza Zarezadeh ◽  
Mohammadreza Shishesaz ◽  
...  

Oxide Precipitation-Hardened (OPH) alloys are a new generation of Oxide Dispersion-Strengthened (ODS) alloys recently developed by the authors. The mechanical properties of this group of alloys are significantly influenced by the chemical composition and appropriate heat treatment (HT). The main steps in producing OPH alloys consist of mechanical alloying (MA) and consolidation, followed by hot rolling. Toughness was obtained from standard tensile test results for different variants of OPH alloy to understand their mechanical properties. Three machine learning techniques were developed using experimental data to simulate different outcomes, and the effect of each parameter on the toughness of OPH alloys is discussed. Using the authors’ experimental results, the composition of the OPH alloys (Al, Mo, Fe, Cr, Ta, Y, and O), the HT conditions, and the mechanical alloying (MA) parameters were used as inputs to train the models, with toughness set as the output. The results demonstrated that all three models are suitable for predicting the toughness of OPH alloys and fulfilled all the desired requirements. However, several criteria validated that the adaptive neuro-fuzzy inference system (ANFIS) model performs better and has a greater ability to simulate the data. The mean square error (MSE) for the artificial neural network (ANN), ANFIS, and support vector regression (SVR) models was 459.22, 0.0418, and 651.68, respectively. After performing a sensitivity analysis (SA), an optimized ANFIS model was achieved with an MSE of 0.003, and the analysis demonstrated that the HT temperature is the most significant of these parameters and plays a critical role in training the data sets.
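
The ANN and SVR baselines could be compared on MSE roughly as below (ANFIS typically requires a dedicated package and is omitted here); the file oph_alloys.csv and its columns are illustrative assumptions:

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.svm import SVR
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error

# Hypothetical columns: composition (Al, Mo, Fe, ...), HT temperature/time, MA time, toughness
df = pd.read_csv("oph_alloys.csv")
X, y = df.drop(columns=["toughness"]), df["toughness"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

models = {
    "ANN": make_pipeline(StandardScaler(),
                         MLPRegressor(hidden_layer_sizes=(32, 16), max_iter=5000, random_state=0)),
    "SVR": make_pipeline(StandardScaler(), SVR()),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    print(f"{name} MSE: {mean_squared_error(y_te, model.predict(X_te)):.3f}")
```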


2018 ◽  
Author(s):  
Khader Shameer ◽  
Kipp W. Johnson ◽  
Benjamin S. Glicksberg ◽  
Rachel Hodos ◽  
Ben Readhead ◽  
...  

ABSTRACT Drug repositioning, i.e. identifying new uses for existing drugs and research compounds, is a cost-effective drug discovery strategy that is continuing to grow in popularity. Prioritizing and identifying drugs capable of being repositioned may improve the productivity and success rate of the drug discovery cycle, especially if the drug has already proven to be safe in humans. In previous work, we have shown that drugs that have been successfully repositioned have different chemical properties than those that have not. Hence, there is an opportunity to use machine learning to prioritize drug-like molecules as candidates for future repositioning studies. We have developed a feature engineering and machine learning workflow that leverages data from publicly available drug discovery resources: RepurposeDB and DrugBank. ChemVec is the chemoinformatics-based feature engineering strategy designed to compile molecular features representing the chemical space of all drug molecules in the study. Models were trained on ChemVec features using a variety of supervised classification algorithms (Naïve Bayes, Random Forest, Support Vector Machines, and an ensemble model combining the three). Models were created using various combinations of datasets: a Connectivity Map-based model, a DrugBank approved-compounds model, and a model using the full set of DrugBank compounds; of these, the Random Forest trained on the Connectivity Map-based data performed best (AUC = 0.674). Briefly, our study represents a novel approach to evaluating a small molecule for drug repositioning opportunities and may further improve the discovery of pleiotropic drugs, or those that treat multiple indications.
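
A soft-voting ensemble of the three named classifiers might be assembled as in this sketch; the feature file chemvec_features.csv and the repositioned label column are hypothetical stand-ins for the ChemVec features:

```python
import pandas as pd
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

# Hypothetical molecular descriptors plus a binary "repositioned" label
df = pd.read_csv("chemvec_features.csv")
X, y = df.drop(columns=["repositioned"]), df["repositioned"]

ensemble = VotingClassifier(
    estimators=[("nb", GaussianNB()),
                ("rf", RandomForestClassifier(random_state=0)),
                ("svm", SVC(probability=True, random_state=0))],
    voting="soft",   # average predicted probabilities across the three learners
)
auc = cross_val_score(ensemble, X, y, cv=5, scoring="roc_auc").mean()
print(f"Ensemble AUC: {auc:.3f}")
```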


Author(s):  
Noor Asyikin Sulaiman ◽  
Md Pauzi Abdullah ◽  
Hayati Abdullah ◽  
Muhammad Noorazlan Shah Zainudin ◽  
Azdiana Md Yusop

The air conditioning system is a complex system and consumes the most energy in a building. Any fault in system operation, such as a faulty cooling tower fan, compressor failure, or a stuck damper, could lead to energy wastage and a reduction in the system’s coefficient of performance (COP). Due to the complexity of the air conditioning system, detecting these faults is hard, as it requires exhaustive inspections. This paper consists of two parts: (i) investigating the impact of different faults related to the air conditioning system on the COP and (ii) analysing the performance of machine learning algorithms in classifying those faults. Three supervised learning classifier models were developed: deep learning, support vector machine (SVM), and multi-layer perceptron (MLP). The performance of each classifier was investigated across six different classes of faults. Results showed that different faults have different negative impacts on the COP. Also, all three supervised learning classifier models were able to classify the faults with more than 94% accuracy, with MLP producing the highest accuracy and precision of all.
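
A minimal sketch of the fault-classification comparison, assuming a hypothetical ac_sensor_logs.csv with numeric sensor readings and a six-class fault_class label; the deep learning model is omitted and the SVM/MLP settings are illustrative:

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import classification_report

df = pd.read_csv("ac_sensor_logs.csv")      # hypothetical sensor readings + fault_class
X, y = df.drop(columns=["fault_class"]), df["fault_class"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

for name, clf in [("SVM", make_pipeline(StandardScaler(), SVC())),
                  ("MLP", make_pipeline(StandardScaler(),
                                        MLPClassifier(max_iter=2000, random_state=0)))]:
    clf.fit(X_tr, y_tr)
    print(name)
    print(classification_report(y_te, clf.predict(X_te)))  # per-class precision/recall
```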


2021 ◽  
Vol 2021 ◽  
pp. 1-10
Author(s):  
Hengrui Chen ◽  
Hong Chen ◽  
Ruiyu Zhou ◽  
Zhizhen Liu ◽  
Xiaoke Sun

The safety issue has become a critical obstacle that cannot be ignored in the marketization of autonomous vehicles (AVs). The objective of this study is to explore the mechanism of AV-involved crashes and analyze the impact of each feature on crash severity. We use the Apriori algorithm to explore causal relationships among multiple factors and thereby the mechanism of crashes. We use various machine learning models, including support vector machine (SVM), classification and regression tree (CART), and eXtreme Gradient Boosting (XGBoost), to analyze crash severity. In addition, we apply Shapley Additive Explanations (SHAP) to interpret the importance of each factor. The results indicate that XGBoost obtains the best result (recall = 75%; G-mean = 67.82%). Both the XGBoost and Apriori algorithms provided meaningful insights into the characteristics of AV-involved crashes and their relationships. Among all features, vehicle damage, weather conditions, accident location, and driving mode are the most critical. We found that most rear-end crashes involve conventional vehicles striking the rear of AVs. Drivers should be extremely cautious when driving in fog, snow, and insufficient light, and should be careful when driving near intersections, especially in autonomous driving mode.
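
One way to combine the two analyses, sketched with the mlxtend implementation of Apriori and SHAP on an XGBoost severity model; the file av_crashes.csv, its one-hot attribute columns, and the numeric severity label are assumptions:

```python
import pandas as pd
import xgboost as xgb
import shap
from mlxtend.frequent_patterns import apriori, association_rules

# Hypothetical one-hot crash attributes (weather, location, driving mode, ...) + severity
crashes = pd.read_csv("av_crashes.csv")
X = crashes.drop(columns=["severity"])

# Association rules among crash factors (Apriori)
freq = apriori(X.astype(bool), min_support=0.05, use_colnames=True)
rules = association_rules(freq, metric="confidence", min_threshold=0.6)
print(rules[["antecedents", "consequents", "support", "confidence"]].head())

# Severity model + SHAP feature attribution
model = xgb.XGBClassifier(random_state=0).fit(X, crashes["severity"])
shap_values = shap.TreeExplainer(model).shap_values(X)
```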


Author(s):  
David Opeoluwa Oyewola ◽  
Emmanuel Gbenga Dada ◽  
Juliana Ngozi Ndunagu ◽  
Terrang Abubakar Umar ◽  
Akinwunmi S.A

Since the declaration of COVID-19 as a global pandemic, it has been transmitted to more than 200 nations of the world. The harmful impact of the pandemic on the economies of nations is far greater than anything suffered in almost a century. The main objective of this paper is to apply Structural Equation Modeling (SEM) and Machine Learning (ML) to determine the relationships among COVID-19 risk factors, epidemiology factors, and economic factors. Structural equation modeling is a statistical technique for estimating and evaluating the relationships among manifest and latent variables. It explores the causal relationships between variables while taking measurement error into account. Bagging (BAG), Boosting (BST), Support Vector Machine (SVM), Decision Tree (DT), and Random Forest (RF) machine learning techniques were applied to predict the impact of the COVID-19 risk factors. Data on patients who came into contact with the coronavirus disease were collected from the Kaggle database between 23 January 2020 and 24 June 2020. The results indicate that the COVID-19 risk factors have negative effects on the epidemiology factors; they also have negative effects on the economic factors.
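
The ML side of the study might be reproduced along these lines (the SEM step would need a dedicated package such as semopy and is not shown); covid_factors.csv and the economic_impact target are hypothetical:

```python
import pandas as pd
from sklearn.ensemble import BaggingRegressor, AdaBoostRegressor, RandomForestRegressor
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import cross_val_score

# Hypothetical numeric risk-factor features plus an economic impact index
df = pd.read_csv("covid_factors.csv")
X, y = df.drop(columns=["economic_impact"]), df["economic_impact"]

models = {
    "BAG": BaggingRegressor(random_state=0),
    "BST": AdaBoostRegressor(random_state=0),
    "SVM": SVR(),
    "DT": DecisionTreeRegressor(random_state=0),
    "RF": RandomForestRegressor(random_state=0),
}
for name, m in models.items():
    r2 = cross_val_score(m, X, y, cv=5, scoring="r2").mean()
    print(f"{name}: mean R^2 = {r2:.3f}")
```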


Flood is one of the most devastating natural calamities, affecting parts of the state of Kerala over the past few years. The recurring calamity necessitates an efficient early warning system, since anticipation and preparedness play a key role in mitigating the impact. Though heavy and erratic rainfall has been marked as one of the main causes of flooding in several places, the floods witnessed by various regions of Kerala were the result of the sudden opening of reservoirs, indicating poor dam management. The unforeseen flow of water often left little time for evacuation. Prediction thus plays a key role in avoiding the loss of life and property that follows such calamities, and the vast benefits and potential offered by machine learning make it the most promising approach. The developed system is modeled on the Malampuzha Dam as a reference. A Support Vector Machine (SVM), programmed in Python, is used as the machine learning method for prediction. The idea is to create an early flood prediction and warning system by monitoring different weather parameters and dam-related data. The feature vectors include current live storage, current reservoir level, rainfall, and relative humidity for the period 2016-2019. Based on the analysis of these parameters, the opening or closure of the dam’s shutters is predicted. The release of the shutters has varied impacts on the nearby regions, which are assessed by a subsequent prediction that maps regions according to the level of warning to be issued. Warnings are issued through a Flask-based server by identifying vulnerable areas based on flood hazard references for the regions. The dam status prediction model delivered a highest prediction accuracy of 99.14%, and the associated warning levels were generated in the development server, thus helping prevent losses from unexpected releases.
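
A compressed sketch of such a pipeline, pairing an SVM shutter-status model with a Flask endpoint; the file malampuzha_2016_2019.csv, the feature names, and the warning mapping are illustrative assumptions, not the deployed system:

```python
import pandas as pd
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from flask import Flask, jsonify, request

# Hypothetical columns: live_storage, reservoir_level, rainfall, humidity, shutter_open
df = pd.read_csv("malampuzha_2016_2019.csv")
features = ["live_storage", "reservoir_level", "rainfall", "humidity"]
model = make_pipeline(StandardScaler(), SVC()).fit(df[features], df["shutter_open"])

app = Flask(__name__)

@app.route("/predict", methods=["POST"])
def predict():
    x = [[request.json[f] for f in features]]
    open_expected = bool(model.predict(x)[0])
    # Map the prediction to a warning level for downstream regions (simplified)
    return jsonify({"shutter_open": open_expected,
                    "warning": "evacuate low-lying areas" if open_expected else "normal"})

if __name__ == "__main__":
    app.run()
```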


2021 ◽  
Author(s):  
Sebastian Johannes Fritsch ◽  
Konstantin Sharafutdinov ◽  
Moein Einollahzadeh Samadi ◽  
Gernot Marx ◽  
Andreas Schuppert ◽  
...  

BACKGROUND During the course of the COVID-19 pandemic, a variety of machine learning models were developed to predict different aspects of the disease, such as the long-term course, organ dysfunction, or ICU mortality. The number of training datasets used has increased significantly over time. However, these data now come from different waves of the pandemic, which did not always involve the same therapeutic approaches over time and showed changing outcomes between waves. The impact of these changes on model development has not yet been studied. OBJECTIVE The aim of this investigation was to examine the predictive performance of several models trained with data from one wave when predicting the second wave’s data, and the impact of pooling these datasets. Finally, a method for comparing different datasets for heterogeneity is introduced. METHODS We used two datasets from waves one and two to develop several predictive models for patient mortality. Four classification algorithms were used: logistic regression (LR), support vector machine (SVM), random forest classifier (RF), and AdaBoost classifier (ADA). We also performed mutual prediction on the data of the wave that was not used for training. Then, we compared the performance of the models when a pooled dataset from the two waves was used. The populations from the different waves were checked for heterogeneity using a convex hull analysis. RESULTS 63 patients from wave one (03-06/2020) and 54 from wave two (08/2020-01/2021) were evaluated. For each wave separately, we found models reaching sufficient accuracies, up to 0.79 AUROC (95% CI 0.76-0.81) for SVM on the first wave and up to 0.88 AUROC (95% CI 0.86-0.89) for RF on the second wave. After pooling the data, the AUROC decreased markedly. In the mutual prediction, models trained on the second wave’s data showed, when applied to the first wave’s data, good prediction for non-survivors but insufficient classification of survivors. The opposite setup (training: first wave, test: second wave) revealed the inverse behaviour, with models correctly classifying survivors and incorrectly predicting non-survivors. The convex hull analysis of the first- and second-wave populations showed a more inhomogeneous distribution of the underlying data when compared with randomly selected sets of patients of the same size. CONCLUSIONS Our work demonstrates that a larger dataset is not a universal solution to all machine learning problems in clinical settings. Rather, it shows that inhomogeneous data used to develop models can lead to serious problems. With the convex hull analysis, we offer a solution to this problem. The outcome of such an analysis can raise concerns if pooling different datasets would create inhomogeneous patterns that prevent a better predictive performance.
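
One possible reading of the convex hull heterogeneity check, sketched with scipy: compare the hull volume spanned by each wave's patients (after projecting to a few principal components) against randomly drawn subsets of the same size; the .npy feature files and the projection step are hypothetical assumptions, not the authors' exact method:

```python
import numpy as np
from scipy.spatial import ConvexHull
from sklearn.decomposition import PCA

# Hypothetical feature matrices for the two pandemic waves
X1 = np.load("wave1_features.npy")
X2 = np.load("wave2_features.npy")
X = np.vstack([X1, X2])

# Project to 3 dimensions so hull volumes are computable and comparable
Z = PCA(n_components=3).fit_transform(X)
Z1, Z2 = Z[:len(X1)], Z[len(X1):]

def hull_volume(pts):
    return ConvexHull(pts).volume

# Reference distribution: hull volumes of random same-size patient subsets
rng = np.random.default_rng(0)
rand_vols = [hull_volume(Z[rng.choice(len(Z), size=len(X1), replace=False)])
             for _ in range(100)]
print("wave-1 hull volume:", hull_volume(Z1))
print("wave-2 hull volume:", hull_volume(Z2))
print("random subsets:", np.mean(rand_vols), "+/-", np.std(rand_vols))
```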

