Machine Learning Strategies for Accurate Log Prediction in Reservoir Characterization: Self-Calibrating Versus Domain-Knowledge

2021 ◽  
Author(s):  
Ahmed Reda Ali ◽  
Makky Sandra Jaya ◽  
Ernest A. Jones

Abstract Petrophysical evaluation is a crucial task for reservoir characterization, but it is often complicated, time-consuming, and subject to uncertainty. Moreover, the job is subjective, with results that depend on the petrophysicist's experience. Leveraging Artificial Intelligence (AI) and Machine Learning (ML) is a way to automate the process with minimal human intervention, improving the consistency and efficiency of well-log prediction and interpretation. The current debate is whether AI-ML should be based on a statistically self-calibrating or a knowledge-based prediction framework. In this study, we develop a petrophysically knowledge-based AI-ML workflow that upscales sparsely sampled core porosity and permeability into continuous curves along the entire well interval. AI-ML makes predictions by learning from data and identifying patterns, and the accuracy of self-calibrating statistical models is heavily dependent on the volume of training data. The proposed AI-ML workflow uses raw well logs (gamma-ray, neutron, and density) to predict porosity and permeability over the well interval from sparse core data. The challenge in building the AI-ML model is that the training data are imbalanced in the relative sampling of plugs, i.e., the core data used as the target variable make up less than 10% of the points. Ensemble learning and stacking ML approaches are used to obtain the maximum predictive performance of the self-calibrating learning strategy. Alternatively, a new petrophysical workflow is established to distill domain experience into the feature selection, which is used as an importance weight in the regression problem. This helps the ML model learn more accurately by discovering hidden relationships between the independent and target variables. This workflow is the inference engine of the AI-ML model, extracting relevant domain knowledge within the system and leading to more accurate predictions.
The proposed knowledge-driven ML strategy achieved a prediction accuracy of R2 = 87% (Correlation Coefficient (CC) of 96%), a significant improvement over the best-performing self-calibrating ML models (R2 = 57%, CC = 62%). The predicted properties are upscaled automatically to populate uncored intervals, improving data coverage and property population in reservoir models and thereby the robustness of those models. The high prediction accuracy demonstrates the potential of the knowledge-driven AI-ML strategy to predict rock properties under data sparsity and limitations while saving significant cost and time. This paper describes an AI-ML workflow that predicts high-resolution continuous porosity and permeability logs from imbalanced and sparse core plug data. The method successfully incorporates a new type of petrophysical facies weight as a feature-augmentation engine for the domain-knowledge ML framework. The workflow consists of petrophysical treatment of the raw data, including log quality control, preconditioning, processing, feature augmentation and labelling, followed by feature selection to emulate domain experience.
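The self-calibrating baseline the abstract describes, stacking several learners to predict core porosity from gamma-ray, neutron, and density logs, can be sketched with scikit-learn. The paper does not name its libraries or models; the estimators and synthetic logs below are illustrative only.

```python
import numpy as np
from sklearn.ensemble import StackingRegressor, RandomForestRegressor, GradientBoostingRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 400
# Synthetic stand-ins for gamma-ray, neutron, and density log readings.
X = rng.normal(size=(n, 3))
# Porosity as a noisy function of the logs (invented relationship, for illustration).
y = 0.25 - 0.05 * X[:, 0] + 0.08 * X[:, 1] - 0.06 * X[:, 2] + rng.normal(scale=0.01, size=n)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Stacking: base learners feed their predictions to a Ridge meta-learner.
stack = StackingRegressor(
    estimators=[
        ("rf", RandomForestRegressor(n_estimators=50, random_state=0)),
        ("gb", GradientBoostingRegressor(random_state=0)),
    ],
    final_estimator=Ridge(),
)
stack.fit(X_tr, y_tr)
r2 = stack.score(X_te, y_te)  # R2 on held-out depth samples
print(round(r2, 2))
```

The knowledge-driven variant the paper advocates would additionally feed a facies-derived importance weight into the learners, e.g. via a `sample_weight` argument to `fit`.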

2015 ◽  
Vol 7 (4) ◽  
pp. 20-35 ◽  
Author(s):  
Chun-Kit Ngan ◽  
Lin Li

The authors propose a Hypoglycemic Expert Query Parametric Estimation (H-EQPE) model and a Linear Checkpoint (L-Checkpoint) algorithm to detect hypoglycemia in diabetes patients. The proposed approach combines the strengths of domain-knowledge-based and machine-learning-based approaches to learn, over a time series, the optimal decision parameter for monitoring the symptoms, in which the objective function (i.e., the maximal number of detections of hypoglycemia) depends on the optimal time point from which the parameter is learned. To evaluate the approach, the authors conducted an experiment on a dataset from the Diabetes Research in Children Network group. The L-Checkpoint algorithm learned the optimal monitoring decision parameter, 99 mg/dL, and achieved the maximal number of detections of hypoglycemic symptoms. The experiment shows that the proposed approach produces results superior to those of the purely domain-knowledge-based and machine-learning-based approaches, achieving 99.2% accuracy, 100% sensitivity, and 98.8% specificity.
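The core idea of learning a single decision parameter from a glucose time series can be illustrated with a toy threshold sweep. This is not the authors' H-EQPE/L-Checkpoint implementation; the readings, labels, and scoring rule below are invented for illustration.

```python
# Synthetic glucose readings (mg/dL) and invented ground-truth hypoglycemia flags.
readings = [140, 120, 95, 88, 130, 70, 65, 150, 92, 60]
truth    = [0,   0,   1,  1,  0,   1,  1,  0,   1,  1]   # 1 = hypoglycemic episode

def score(threshold, readings, truth):
    """Detections minus false alarms for a 'glucose < threshold' decision rule."""
    detections   = sum(1 for r, t in zip(readings, truth) if r < threshold and t == 1)
    false_alarms = sum(1 for r, t in zip(readings, truth) if r < threshold and t == 0)
    return detections - false_alarms

# Linear scan over candidate decision parameters, keeping the best-scoring one.
best = max(range(60, 140), key=lambda th: score(th, readings, truth))
print(best)
```

On this toy data the sweep settles on 96 mg/dL, the smallest threshold that catches every labelled episode without raising a false alarm; the paper's learned value of 99 mg/dL comes from the real DirecNet dataset.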


Geophysics ◽  
2021 ◽  
pp. 1-67
Author(s):  
Luanxiao Zhao ◽  
Caifeng Zou ◽  
Yuanyuan Chen ◽  
Wenlong Shen ◽  
Yirong Wang ◽  
...  

Seismic prediction of fluid and lithofacies distributions is of great interest for reservoir characterization, geological model building, and flow-unit delineation. Inferring fluids and lithofacies from seismic data within a machine learning framework is commonly subject to issues of limited features, imbalanced data sets, and spatial constraints. Consequently, an XGBoost-based workflow that accounts for feature engineering, data balancing, and spatial constraints is proposed to predict the fluid and lithofacies distribution by integrating well-log and seismic data. The feature set constructed from simple mathematical operations and domain knowledge outperforms the benchmark group consisting of the conventional elastic attributes of P-impedance and Vp/Vs ratio. A radial basis function that weights training samples according to the distances from the available wells to the target region is developed to impose spatial constraints on the model training process, significantly improving the prediction accuracy and reliability for gas sandstone. The strategy combining the synthetic minority oversampling technique (SMOTE) with spatial constraints further increases the F1 score for gas sandstone and also benefits the overall prediction performance across all facies. Applying the combined strategy to prestack seismic inversion results generates a more geologically reasonable spatial distribution of fluids, verifying the robustness and effectiveness of the proposed workflow.
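The radial-basis spatial weighting can be sketched in a few lines of NumPy. The well coordinates and the bandwidth `sigma` below are illustrative, not values from the paper: each training sample's weight decays with its well's distance from the target region.

```python
import numpy as np

def rbf_weights(well_xy, target_xy, sigma):
    """Gaussian RBF weight for each well, decaying with distance to the target region."""
    d = np.linalg.norm(well_xy - target_xy, axis=1)
    return np.exp(-d**2 / (2.0 * sigma**2))

# Three hypothetical well locations and a target region at the origin.
wells  = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]])
target = np.array([0.0, 0.0])
w = rbf_weights(wells, target, sigma=2.0)
print(w.round(3))
```

The resulting per-sample weights would then be supplied to the learner during training, e.g. via the `sample_weight` argument that XGBoost and scikit-learn estimators accept in `fit`, so that wells near the target region dominate the fit.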


2020 ◽  
Author(s):  
Yulan Liang ◽  
Amin Gharipour ◽  
Erik Kelemen ◽  
Arpad Kelemen

Abstract Background: The identification of important proteins is critical for medical diagnosis and prognosis in common diseases. Diverse sets of computational tools have been developed for omics data reduction and protein selection. However, standard statistical models with single feature selection suffer from a multiple-testing burden and low power given the limited available samples. Furthermore, high correlations among proteins, combined with high redundancy and moderate effects, often lead to unstable selections and cause reproducibility issues. Ensemble feature selection in machine learning may identify a stable set of disease biomarkers that improves the prediction performance of subsequent classification models and thereby simplifies their interpretation. In this study, we developed a three-stage homogeneous ensemble feature selection approach for both identifying proteins and improving prediction accuracy. The approach was implemented and applied to ovarian cancer proteogenomics data sets with two outcome types: 1) a binary outcome, putative homologous recombination deficiency positive or negative; and 2) multiple mRNA classes (differentiated, proliferative, immunoreactive, mesenchymal, and unknown). We conducted and compared various machine learning approaches with homogeneous ensemble feature selection, including random forest, support vector machine, and neural network, for predicting both binary and multi-class outcomes. Performance criteria including sensitivity, specificity, and kappa statistics were used to assess prediction consistency and accuracy. Results: With the proposed three-stage homogeneous ensemble feature selection approach, prediction accuracy can be improved even with limited samples by continuously reducing errors and redundancy; e.g., Treebag provided 83% prediction accuracy (85% sensitivity and 81% specificity) for the binary ovarian outcome. For multi-class mRNA classification, our approach provided even better accuracy with increased sample size.
Conclusions: Despite the different prediction accuracies of the various models, the proposed homogeneous ensemble feature selection identified consistent sets of top-ranked important markers, out of 9,606 proteins, linked to the binary disease and multiple mRNA class outcomes.
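One hedged reading of "homogeneous ensemble feature selection" (the abstract does not spell out its three stages) is to repeat the same ranker over bootstrap resamples and keep only features that rank highly in most rounds. A minimal sketch with scikit-learn, on synthetic data whose informative features are known:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# shuffle=False places the 5 informative features in the first columns.
X, y = make_classification(n_samples=200, n_features=30, n_informative=5,
                           shuffle=False, random_state=0)
rng = np.random.default_rng(0)
k, n_rounds = 8, 10
votes = np.zeros(X.shape[1])

for r in range(n_rounds):
    idx = rng.choice(len(X), size=len(X), replace=True)   # bootstrap resample
    rf = RandomForestClassifier(n_estimators=100, random_state=r)
    rf.fit(X[idx], y[idx])
    top = np.argsort(rf.feature_importances_)[::-1][:k]   # top-k features this round
    votes[top] += 1

# Keep features ranked in the top k in at least 80% of the rounds.
stable = np.flatnonzero(votes >= 0.8 * n_rounds)
print(stable)
```

Restricting to consistently top-ranked features is what gives the stability the Conclusions emphasize: a feature that wins only on one resample is discarded.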


PLoS ONE ◽  
2021 ◽  
Vol 16 (12) ◽  
pp. e0261571
Author(s):  
Sebastian Sager ◽  
Felix Bernhardt ◽  
Florian Kehrle ◽  
Maximilian Merkert ◽  
Andreas Potschka ◽  
...  

We propose a new method for the classification task of distinguishing atrial fibrillation (AFib) from regular atrial tachycardias including atrial flutter (AFlu) based on a surface electrocardiogram (ECG). Many approaches for automatic classification of cardiac arrhythmia have recently been proposed, and to our knowledge none of them can distinguish between these two. We discuss reasons why deep learning may not yield satisfactory results for this task. We generate new and clinically interpretable features using mathematical optimization for subsequent use within a machine learning (ML) model. These features are generated from the same input data by solving an additional regression problem with complicated combinatorial substructures. The result can be seen as a novel machine learning model that incorporates expert knowledge on the pathophysiology of atrial flutter. Our approach achieves an unprecedented accuracy of 82.84% and an area under the receiver operating characteristic (ROC) curve of 0.9, which classifies as "excellent" according to the classification indicator of diagnostic tests. An additional advantage of our approach is the inherent interpretability of the classification results. Our features give insight into a possibly occurring multilevel atrioventricular blocking mechanism, which may improve treatment decisions beyond the classification itself. Our research ideally complements existing textbook cardiac arrhythmia classification methods, which cannot provide a classification for the important AFib↔AFlu case. The main contribution is the successful use of a novel mathematical model for multilevel atrioventricular block and optimization-driven inverse simulation to enhance machine learning for classification of arguably the most difficult cases in cardiac arrhythmia. A tailored branch-and-bound algorithm was implemented for the domain-knowledge part, while standard algorithms such as Adam could be used for training.
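The flavor of the optimization-driven inverse simulation can be conveyed with a toy version, far simpler than the paper's branch-and-bound model and with all numbers invented: in multilevel atrioventricular block, ventricular RR intervals are near-integer multiples of the atrial cycle length, so the cycle length can be recovered by minimizing the residual to the nearest multiple, and the fit quality becomes an interpretable feature for the AFlu vs. AFib decision.

```python
# Toy RR intervals (ms) that are exact integer multiples of a hidden atrial cycle length.
rr = [500, 750, 500, 1000]

def residual(cycle, rr):
    """Sum of squared distances of each RR interval to its nearest multiple of cycle."""
    return sum((x - round(x / cycle) * cycle) ** 2 for x in rr)

# Grid search over a plausible flutter cycle-length range (ms); the lower
# bound excludes trivial sub-harmonic solutions such as half the true cycle.
best_cycle = min(range(200, 401), key=lambda c: residual(c, rr))
print(best_cycle, residual(best_cycle, rr))
```

A low residual (here zero, at 250 ms) suggests the regular blocking pattern of flutter; noisy, irregular AFib intervals would leave a large residual at every candidate cycle length.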


2021 ◽  
Author(s):  
Mohammed A. Abbas ◽  
Watheq J. Al-Mudhafar

Abstract Estimating rock facies from petrophysical logs in non-cored wells in complex carbonates is a crucial task for improving reservoir characterization and field development. It is therefore essential to identify the lithofacies that discriminate the reservoir intervals based on their flow and storage capacity. In this paper, an innovative procedure is adopted for lithofacies classification using data-driven machine learning in a well from the Mishrif carbonate reservoir in the giant Majnoon oil field, Southern Iraq. The Random Forest method was adopted for lithofacies classification using well logging data in a cored well, in order to predict the lithofacies distribution in other non-cored wells. Furthermore, three advanced statistical algorithms, namely Logistic Boosting Regression, Bagging Multivariate Adaptive Regression Splines, and Generalized Boosting Modeling, were implemented and compared with the Random Forest approach to attain the most realistic lithofacies prediction. The dataset includes the measured discrete lithofacies distribution and the original log curves of caliper, gamma ray, neutron porosity, bulk density, sonic, and deep and shallow resistivity, all available over the entire reservoir interval. Prior to applying the four classification algorithms, random subsampling cross-validation was conducted on the dataset to produce training and testing subsets for modeling and prediction, respectively. After predicting the discrete lithofacies distribution, the confusion table and the Correct Classification Rate Index (CCI) were employed as further criteria to analyze and compare the effectiveness of the four classification algorithms. The results of this study revealed that Random Forest was more accurate in lithofacies classification than the other techniques. It led to excellent matching between the observed and predicted discrete lithofacies, attaining a CCI of 100% on the training subset and 96.67% on the validation subset.
Further validation of the resulting facies model was conducted by comparing each of the predicted discrete lithofacies with the available ranges of porosity and permeability obtained from the NMR log. We observed that the rudist-dominated lithofacies correlates with rock of higher porosity and permeability, whereas the argillaceous lithofacies correlates with rock of lower porosity and permeability. Additionally, these high and low ranges of permeability were later compared with the oil rate obtained from the PLT log data, and the high and low permeability ranges were found to correlate well with the high and low oil-rate logs, respectively. In conclusion, high-quality estimation of lithofacies in non-cored intervals and wells is a crucial reservoir characterization task for obtaining meaningful permeability-porosity relationships and capturing realistic reservoir heterogeneity. The application of machine learning techniques drives down costs, provides time savings, and allows for uncertainty mitigation in lithofacies classification and prediction. The entire workflow was implemented in R, an open-source statistical computing language, and can easily be applied to other reservoirs to attain a similarly improved overall reservoir characterization.
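The evaluation step, a confusion table plus the Correct Classification Rate Index (the fraction of samples on the table's diagonal), is straightforward to reproduce. The paper's workflow is in R; the sketch below uses Python with synthetic stand-ins for the seven log curves and three facies classes, purely to show the metric.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 7 "log curves", 3 discrete lithofacies classes.
X, y = make_classification(n_samples=300, n_features=7, n_informative=5,
                           n_classes=3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
cm = confusion_matrix(y_te, rf.predict(X_te))  # the "confusion table"

# CCI: correctly classified samples (diagonal) over all samples.
cci = np.trace(cm) / cm.sum()
print(cm)
print(round(cci, 2))
```

The same random-subsampling split and CCI comparison would be repeated for each of the four classifiers to rank them as the paper does.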


2013 ◽  
Vol 2013 ◽  
pp. 1-9 ◽  
Author(s):  
Md Rahat Hossain ◽  
Amanullah Maung Than Oo ◽  
A. B. M. Shawkat Ali

This paper empirically shows that applying selected feature subsets to machine learning techniques significantly improves the accuracy of solar power prediction. Experiments are performed using five well-known wrapper feature selection methods to obtain the solar power prediction accuracy of machine learning techniques with selected feature subsets. For all the experiments, the machine learning techniques used are least median square (LMS), multilayer perceptron (MLP), and support vector machine (SVM). These results are then compared with the solar power prediction accuracy of the same machine learning techniques (i.e., LMS, MLP, and SVM) without applying feature selection (WAFS). Experiments are carried out using reliable, real-life historical meteorological data. The comparison clearly shows that LMS, MLP, and SVM provide better prediction accuracy (i.e., reduced MAE and MASE) with selected feature subsets than without them. The experimental results support a concrete verdict: devoting more attention and effort to feature subset selection can significantly improve the accuracy of solar power prediction.
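What makes a selection method a "wrapper" is that candidate subsets are scored by the downstream model's own cross-validated error rather than by a filter statistic. A greedy forward-selection sketch (illustrative only; the paper's five specific wrapper methods are not reproduced here) on synthetic data where only two features drive the target:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
# Only features 0 and 3 actually drive the synthetic target.
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(scale=0.1, size=200)

selected, remaining = [], list(range(X.shape[1]))
for _ in range(3):  # greedily grow the subset up to 3 features
    def cv_mae(f):
        cols = selected + [f]
        # Wrapper criterion: the predictor's own cross-validated MAE.
        return -cross_val_score(LinearRegression(), X[:, cols], y,
                                scoring="neg_mean_absolute_error", cv=5).mean()
    best = min(remaining, key=cv_mae)
    selected.append(best)
    remaining.remove(best)

print(selected)
```

The first two picks recover the truly informative features (0 and 3); in the paper's setting the same loop would run around LMS, MLP, or SVM instead of linear regression, with MAE/MASE as the criterion.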


2021 ◽  
Vol 11 (10) ◽  
pp. 4499
Author(s):  
Mei-Ling Huang ◽  
Yun-Zhi Li

Major League Baseball (MLB) is the highest level of professional baseball in the world and accounts for some of the most popular international sporting events. Many scholars have conducted research on predicting the outcomes of MLB matches, but the prediction accuracy has remained low. Therefore, deep learning and machine learning methods were used to build models for predicting the outcomes (win/loss) of MLB matches and to investigate the differences between the models in terms of performance. Match data of the 30 teams during the 2019 MLB season, with either only the starting pitcher or all pitchers included in the pitcher category, were collected to compare prediction accuracy. A one-dimensional convolutional neural network (1DCNN), a traditional machine learning artificial neural network (ANN), and a support vector machine (SVM) were used to predict match outcomes, with fivefold cross-validation to evaluate model performance. The highest prediction accuracies were 93.4%, 93.91%, and 93.90% with the 1DCNN, ANN, and SVM models, respectively, before feature selection; after feature selection, the highest accuracies were 94.18% and 94.16% with the ANN and SVM models, respectively. The prediction results of the three models were similar, and the prediction accuracies were much higher than those obtained in related studies. Moreover, a 1DCNN was used for the first time to predict the outcome of MLB matches, achieving a prediction accuracy similar to that of the machine learning methods.
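The evaluation protocol shared by all three models, fivefold cross-validation of a win/loss classifier, can be sketched with the SVM case (random stand-in features; the actual MLB match statistics and the 1DCNN architecture are not reproduced here):

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 10))           # stand-in for per-match statistics
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # stand-in win/loss label

# Feature scaling matters for SVMs; cv=5 mirrors the fivefold protocol.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
scores = cross_val_score(model, X, y, cv=5)
print(len(scores), round(scores.mean(), 2))
```

Each of the five accuracy scores comes from training on four folds and testing on the held-out fifth, and their mean is the figure reported per model.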

