Rock Type Classification Models Interpretability Using Shapley Values

2021 ◽  
Author(s):  
Anton Georgievich Voskresenskiy ◽  
Nikita Vladimirovich Bukhanov ◽  
Maria Alexandrovna Kuntsevich ◽  
Oksana Anatolievna Popova ◽  
Alexey Sergeevich Goncharov

Abstract We propose a methodology to improve rock type classification using machine learning (ML) techniques and to reveal causal inferences between reservoir quality and well log measurements. Rock type classification is an essential step in accurate reservoir modeling and forecasting. Machine learning approaches make it possible to automate rock type classification based on different well logs and core data. In order to choose the best model, one that does not propagate uncertainty further into the workflow, it is important to interpret machine learning results; feature importance and feature selection methods are usually employed for this purpose. We propose an extension to existing approaches: a model-agnostic sensitivity algorithm based on Shapley values. The paper describes a full workflow for rock type prediction using well log data, from data preparation, model building, and feature selection to causal inference analysis. We built ML models that classify rock types using well logs (sonic, gamma, density, photoelectric, and resistivity) from 21 wells as predictors and conducted a causal inference analysis between reservoir quality and well log responses using Shapley values (a concept from game theory). As a result of feature selection, we obtained predictors that are statistically significant and at the same time relevant in a causal-relation context. The macro F1-scores of the best models for the two cases are 0.79 and 0.85, respectively. We found that the ML models can infer domain knowledge, which allows us to confirm the adequacy of the built ML models for rock type prediction. Our insight was to recognize the need to properly account for the underlying causal structure between the features and rock types in order to derive meaningful and relevant predictors that carry a significant amount of information contributing to the final outcome.
We also demonstrate the robustness of the revealed patterns by applying the Shapley values methodology to a number of ML models and show consistency in the ordering of the most important predictors. Our analysis shows that machine learning classifiers achieving high accuracy tend to mimic the physical principles behind different logging tools; in particular, the longer the travel time of an acoustic wave, the higher the probability that the medium is reservoir rock, and vice versa. Likewise, lower values of natural radioactivity and rock density indicate the presence of a reservoir. The article presents causal inference analysis of ML classification models using Shapley values on two real-world reservoirs. The rock class labels from core data are used to train a supervised machine learning algorithm to predict classes from well log responses. The aim of supervised learning here is to label a small portion of a dataset and allow the algorithm to automate the rest. Such data-driven analysis may optimize well logging, coring, and core analysis programs, and the algorithm can be extended to any other reservoir to improve rock type prediction. The novelty of the paper is that such analysis reveals the nature of the decisions made by the ML model and allows the application of truly robust, reliable, petrophysics-consistent ML models for rock type classification.
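The Shapley attribution idea behind the abstract above can be sketched in a few lines. This is not the authors' code: the toy linear "model", the log names, and the sample/baseline values are hypothetical, and the value function uses simple mean-imputation, whereas real implementations average over the data distribution. It does, however, compute exact Shapley values over all feature coalitions.

```python
from itertools import combinations
from math import factorial

# Toy illustration of Shapley-value attribution (not the authors' code).
# The model and the log values below are hypothetical.
FEATURES = ["sonic", "gamma", "density"]

def model(sonic, gamma, density):
    # Hypothetical reservoir score: high sonic travel time, low gamma,
    # and low density push the prediction toward "reservoir".
    return 0.5 * sonic - 0.3 * gamma - 0.2 * density

BASELINE = {"sonic": 0.0, "gamma": 0.0, "density": 0.0}  # dataset means (assumed)
SAMPLE = {"sonic": 1.0, "gamma": 0.4, "density": 0.6}    # one depth sample (assumed)

def value(coalition):
    """Model output when only features in `coalition` take the sample's
    values; the rest are held at the baseline (mean-imputation)."""
    x = {f: (SAMPLE[f] if f in coalition else BASELINE[f]) for f in FEATURES}
    return model(**x)

def shapley(feature):
    """Exact Shapley value: weighted marginal contribution over coalitions."""
    n = len(FEATURES)
    others = [f for f in FEATURES if f != feature]
    phi = 0.0
    for k in range(n):
        for S in combinations(others, k):
            w = factorial(k) * factorial(n - k - 1) / factorial(n)
            phi += w * (value(set(S) | {feature}) - value(set(S)))
    return phi

phis = {f: shapley(f) for f in FEATURES}
# Efficiency property: attributions sum to prediction minus baseline output.
assert abs(sum(phis.values()) - (model(**SAMPLE) - model(**BASELINE))) < 1e-9
```

For a linear model each Shapley value reduces to the coefficient times the feature's deviation from baseline, which is why the positive sonic attribution and negative gamma/density attributions mirror the physical pattern the paper reports.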

SPE Journal ◽  
2020 ◽  
Vol 25 (05) ◽  
pp. 2778-2800 ◽  
Author(s):  
Harpreet Singh ◽  
Yongkoo Seol ◽  
Evgeniy M. Myshakin

Summary The application of specialized machine learning (ML) in petroleum engineering and geoscience is gaining increasing attention in the development of rapid and efficient methods to substitute for existing ones. Existing ML-based studies that use well logs have two inherent limitations. The first is that they start with one predefined combination of well logs, which by default assumes that the chosen combination is poised to give the best prediction, although the variation in accuracy obtained through different combinations of well logs can be substantial. The second is that most studies apply unsupervised learning (UL) to classification problems, yet it underperforms by a substantial margin compared with nearly all supervised learning (SL) algorithms. In this context, this study investigates a variety of UL and SL algorithms applied to multiple well-log combinations (WLCs) to automate the traditional workflow of well-log processing and classification, including an optimization step to achieve the best output. The workflow begins by processing the measured well logs, which includes developing different combinations of measured well logs and their physics-motivated augmentations, followed by removal of potential outliers from the input WLCs. Reservoir lithology with four different rock types is investigated using eight UL and seven SL algorithms in two different case studies. The results from the two case studies are used to identify the optimal set of well logs and the ML algorithm that gives the best match between predicted reservoir lithology and its ground truth. The workflow is demonstrated using two wells from two different reservoirs on the Alaska North Slope to distinguish four rock types along the well (brine-dominated sand, hydrate-dominated sand, shale, and others/mixed compositions).
The results show that the automated workflow investigated in this study can discover the ground truth for the lithology with up to 80% accuracy with UL and up to 90% accuracy with SL, using six routine well logs [vp, vs, ρb, ϕneut, Rt, gamma ray (GR)], which is a significant improvement compared with the accuracy reported in the current state of the art, which is less than 70%.
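The combinatorial search over well-log combinations described above can be sketched as follows. This is a minimal stand-in, not the study's workflow: the four logs, the tiny labelled dataset, and the leave-one-out nearest-centroid classifier are all hypothetical simplifications of the eight UL / seven SL algorithms actually compared.

```python
from itertools import combinations

# Sketch of a well-log-combination (WLC) search (not the authors' code).
# Logs, labels, and the tiny dataset are hypothetical; a real workflow
# would use measured logs and cross-validated classifiers.
LOGS = ["vp", "vs", "rhob", "gr"]
DATA = [  # (log values keyed by name, rock-type label)
    ({"vp": 3.2, "vs": 1.9, "rhob": 2.1, "gr": 30.0}, "sand"),
    ({"vp": 3.4, "vs": 2.0, "rhob": 2.2, "gr": 35.0}, "sand"),
    ({"vp": 2.4, "vs": 1.1, "rhob": 2.6, "gr": 95.0}, "shale"),
    ({"vp": 2.5, "vs": 1.2, "rhob": 2.5, "gr": 90.0}, "shale"),
]

def nearest_centroid_accuracy(feature_set):
    """Leave-one-out accuracy of a nearest-centroid classifier on DATA,
    using only the logs in `feature_set`."""
    correct = 0
    for i, (x, label) in enumerate(DATA):
        train = [DATA[j] for j in range(len(DATA)) if j != i]
        centroids = {}
        for cls in {lab for _, lab in train}:
            rows = [r for r, lab in train if lab == cls]
            centroids[cls] = {f: sum(r[f] for r in rows) / len(rows)
                              for f in feature_set}
        pred = min(centroids, key=lambda c: sum(
            (x[f] - centroids[c][f]) ** 2 for f in feature_set))
        correct += pred == label
    return correct / len(DATA)

# Score every non-empty combination of logs and keep the best-performing one.
best_wlc, best_acc = max(
    ((wlc, nearest_centroid_accuracy(wlc))
     for k in range(1, len(LOGS) + 1)
     for wlc in combinations(LOGS, k)),
    key=lambda t: t[1])
```

Enumerating all 2^n − 1 subsets is feasible only for a handful of logs; with larger log suites the optimization step would need a greedy or heuristic search instead.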


2018 ◽  
Vol 6 (3) ◽  
pp. T555-T567
Author(s):  
Zhuoying Fan ◽  
Jiagen Hou ◽  
Chengyan Lin ◽  
Xinmin Ge

Classification and well-logging evaluation of carbonate reservoir rock is very difficult. On the one hand, many types of reservoir pore space develop in carbonate reservoirs, including large karst caves, dissolved pores, fractures, intergranular dissolved pores, intragranular dissolved pores, and micropores. On the other hand, the conventional well-logging response characteristics of the various pore systems can be similar, making it difficult to identify the type of pore system. We have developed a new reservoir rock-type characterization workflow. First, outcrop observations, cores, well logs, and multiscale data were used to clarify the carbonate reservoir types in the Ordovician carbonates of the Tahe Oilfield. Three reservoir rock types were defined based on outcrop and core observation and thin section analysis. Microscopic and macroscopic characteristics of the various rock types and their corresponding well-log responses were evaluated. Second, conventional well-log data were decomposed into multiple band sets of intrinsic mode functions using the empirical mode decomposition (EMD) method, and the energy entropy of each log curve was then investigated. Based on the decomposition results, the characteristics of each reservoir type were summarized. Finally, by using the Fisher discriminant, the rock types of the carbonate reservoirs could be identified reliably. Compared with conventional rock-type identification methods based on conventional well-log responses only, the new workflow can effectively cluster data within each rock type and increase the accuracy of reservoir-type-based hydrocarbon production prediction. The workflow was applied to 213 reservoir intervals from 146 wells in the Tahe Oilfield. The results can improve the accuracy of oil-production interval prediction using well logs over conventional methods.
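The final Fisher-discriminant step in the workflow above can be sketched for the two-class, two-feature case. This is not the paper's implementation: the two "energy entropy" features per sample and the class labels are hypothetical, and the real study discriminates three rock types from multiple decomposed log curves.

```python
# Sketch of a two-class Fisher linear discriminant (not the authors' code).
# Each sample has two hypothetical energy-entropy features.
CLASS_A = [(0.2, 1.1), (0.3, 1.0), (0.25, 1.2)]  # e.g. cave-type reservoirs
CLASS_B = [(0.8, 0.4), (0.9, 0.5), (0.85, 0.3)]  # e.g. fracture-type

def mean(rows):
    n = len(rows)
    return tuple(sum(r[i] for r in rows) / n for i in range(2))

def scatter(rows, m):
    """Within-class scatter matrix of one class."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = (r[0] - m[0], r[1] - m[1])
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s

ma, mb = mean(CLASS_A), mean(CLASS_B)
sa, sb = scatter(CLASS_A, ma), scatter(CLASS_B, mb)
sw = [[sa[i][j] + sb[i][j] for j in range(2)] for i in range(2)]

# Fisher direction w = Sw^{-1} (ma - mb); the 2x2 inverse written explicitly.
det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
dm = (ma[0] - mb[0], ma[1] - mb[1])
w = ((sw[1][1] * dm[0] - sw[0][1] * dm[1]) / det,
     (-sw[1][0] * dm[0] + sw[0][0] * dm[1]) / det)

def project(x):
    return w[0] * x[0] + w[1] * x[1]

# Classify by which side of the midpoint between class-mean projections
# a sample's projection falls on.
threshold = (project(ma) + project(mb)) / 2

def classify(x):
    return "A" if (project(x) - threshold) * (project(ma) - threshold) > 0 else "B"
```

Extending this to three rock types means solving the generalized eigenproblem for the between-class versus within-class scatter, which is what library LDA implementations do.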


2021 ◽  
Author(s):  
Mohamed Masoud ◽  
W. Scott Meddaugh ◽  
Masoud Eljaroshi ◽  
Khaled Elghanduri

Abstract The Harash Formation, previously known as the Ruaga A, is considered one of the most productive reservoirs in the Zelten field in terms of reservoir quality, areal extent, and hydrocarbon quantity. To date, nearly 70 wells have been drilled targeting the Harash reservoir. A few wells initially produced naturally, but most had to be stimulated, which shaped the field drilling and development plan. Identification of Harash reservoir rock types was essential to understanding the reservoir geology, implementing the reservoir development drilling program, constructing representative reservoir models, calculating hydrocarbon volumes, and matching historical pressure-production data in the flow modelling process. The objectives of this study are to predict permeability at un-cored wells and unsampled locations, to classify the reservoir rocks into main rock types, to build robust reservoir property models in which static petrophysical properties and fluid properties are assigned to each identified rock type, and to assess the vertical and lateral heterogeneity within the Palaeocene Harash carbonate reservoir. Initially, an objective-based workflow was developed by generating a training dataset from open-hole logs and core samples, subjected to conventional and special core analysis, from six wells. The dataset was used to predict permeability at cored wells through a K-mod model that applies Neural Network Analysis (NNA) and Declustering (DC) algorithms to generate representative permeability and electro-facies. Equal statistical weights were given to log responses without analytical supervision, taking into account the significant log response variations.
The core data were grouped on a petrophysical basis to compute pore throat size, aiming to extend the interpretation process from the core to the log domain using an Indexation and Probabilities of Self-Organized Maps (IPSOM) classification model to develop a reliable rock type classification at the well scale. Permeability and rock typing derived from the open-hole logs and core sample analysis are the main outputs of the K-mod and IPSOM classification models. The results were propagated to more than 70 un-cored wells. Rock typing techniques were also conducted to classify the Harash reservoir rocks in a consistent manner. Depositional rock typing using a stratigraphic modified Lorenz plot and electro-facies suggests three different rock types that are probably linked to three flow zones. The defined rock types are dominated by specific reservoir parameters. Electro-facies enable subdivision of the formation into petrophysical groups, to which properties were assigned, characterized by dynamic behavior and rock-fluid interaction. Capillary pressure and relative permeability data confirmed the complexity of rock capillarity; consequently, connate water saturation (Swc) is strongly rock-type dependent. The use of a consistent, representative petrophysical rock type classification led to a significant improvement of the geological and flow models.
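The stratigraphic modified Lorenz plot used for depositional rock typing above can be sketched as a cumulative flow-capacity versus storage-capacity curve. The three core intervals below (thickness, porosity, permeability) are hypothetical, not Harash data; the point is only how slope breaks on the curve delineate flow zones.

```python
# Sketch of a stratigraphic modified Lorenz plot (SMLP) construction
# (not the authors' code). Intervals are kept in stratigraphic order:
# (thickness in m, porosity fraction, permeability in mD) - hypothetical.
INTERVALS = [
    (2.0, 0.22, 500.0),   # high-quality flow zone
    (3.0, 0.15, 50.0),
    (4.0, 0.08, 1.0),     # baffle
]

total_kh = sum(h * k for h, phi, k in INTERVALS)   # total flow capacity
total_ph = sum(h * phi for h, phi, k in INTERVALS) # total storage capacity

# Cumulative flow capacity (k*h) vs storage capacity (phi*h), in depth order.
curve, ckh, cph = [(0.0, 0.0)], 0.0, 0.0
for h, phi, k in INTERVALS:
    ckh += h * k / total_kh
    cph += h * phi / total_ph
    curve.append((cph, ckh))

# Segment slopes: >> 1 marks a speed zone, << 1 a baffle or barrier;
# breaks in slope are candidate flow-unit boundaries.
slopes = [(y2 - y1) / (x2 - x1)
          for (x1, y1), (x2, y2) in zip(curve, curve[1:])]
```

Both axes end at 1.0 by construction; a homogeneous reservoir plots as the 45-degree line, and departures from it quantify the vertical heterogeneity the abstract refers to.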


2021 ◽  
Author(s):  
Ryan Banas ◽  
Andrew McDonald ◽  
Tegwyn Perkins ◽  
...  

Subsurface analysis-driven field development requires quality data as input into analysis, modelling, and planning. In the case of many conventional reservoirs, pay intervals are often well consolidated and maintain integrity under drilling and geological stresses, providing an ideal logging environment. Consequently, editing well logs is often overlooked or dismissed entirely. Petrophysical analysis, however, is not always constrained to conventional pay intervals. When developing an unconventional reservoir, pay sections may comprise shales. Edited and quality-checked logs then become crucial to accurately assess storage volumes in place. Edited curves can also serve as inputs to engineering studies, geological and geophysical models, reservoir evaluation, and many machine learning models employed today. As an example, hydraulic fracturing model inputs may span adjacent shale beds around a target reservoir, which are frequently washed out. These washed-out sections may seriously impact logging measurements of interest, such as bulk density and acoustic compressional slowness, which are used to generate elastic properties and compute geomechanical curves. Two classes of machine learning algorithms for identifying outliers and poor-quality data due to bad hole conditions are discussed: supervised and unsupervised learning. The first allows the expert to train a model from existing, categorized data, whereas unsupervised learning algorithms learn from a collection of unlabeled data. Each class has distinct advantages and disadvantages. Identifying outliers and conditioning well logs prior to a petrophysical analysis or machine learning model can be a time-consuming and laborious process, especially when large multi-well datasets are considered.
In this study, a new supervised learning algorithm is presented that utilizes multiple linear regression analysis to repair well log data in an iterative and automated routine. This technique allows outliers to be identified and repaired whilst improving the efficiency of the log data editing process without compromising accuracy. The algorithm uses sophisticated logic and curve predictions derived via multiple linear regression to systematically repair various well logs. A clear improvement in efficiency is observed when the algorithm is compared to other currently used methods, including manual processing by a petrophysicist and unsupervised outlier detection methods. The algorithm can also be leveraged over multiple wells to produce more generalized predictions. Through a platform created to quickly identify and repair invalid log data, the results are controlled through input and supervision by the user. This methodology is not a direct replacement of an expert interpreter, but complementary: it allows the petrophysicist to leverage computing power, improve consistency, reduce errors, and improve turnaround time.
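The predict-and-repair idea described above can be sketched as follows. This is not the vendor's algorithm: the log values, the bad-hole flag, and the choice of sonic and gamma as predictors for density are hypothetical, and a real routine would iterate over many curves with outlier logic.

```python
# Sketch of regression-based log repair (not the paper's implementation).
# Each row: (sonic, gamma, density, bad_hole_flag) - hypothetical values.
ROWS = [
    (60.0, 30.0, 2.60, False),
    (70.0, 55.0, 2.50, False),
    (80.0, 65.0, 2.42, False),
    (90.0, 90.0, 2.30, False),
    (85.0, 80.0, 1.90, True),   # washed-out interval: density reads too low
]

def solve(a, b):
    """Solve a small linear system by Gauss-Jordan elimination with pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

# Fit density = b0 + b1*sonic + b2*gamma on good-hole rows (normal equations).
good = [(1.0, s, g, d) for s, g, d, bad in ROWS if not bad]
ata = [[sum(r[i] * r[j] for r in good) for j in range(3)] for i in range(3)]
atb = [sum(r[i] * r[3] for r in good) for i in range(3)]
b0, b1, b2 = solve(ata, atb)

# Repair: replace density in flagged intervals with the regression prediction.
repaired = [d if not bad else b0 + b1 * s + b2 * g
            for s, g, d, bad in ROWS]
```

The flagged sample's unrealistically low density is replaced by a value consistent with the trend of the good-hole intervals, which is the behavior the iterative routine automates across whole multi-well datasets.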


2021 ◽  
Author(s):  
Zulkuf Azizoglu ◽  
Zoya Heidari ◽  
Leonardo Goncalves ◽  
Lucas Abreu Blanes De Oliveira ◽  
Moacyr Silva Do Nascimento Neto

Abstract Broadband dielectric dispersion measurements are attractive options for assessment of water-filled pore volume, especially when quantifying salt concentration is challenging. However, conventional models for interpretation of dielectric measurements such as Complex Refractive Index Model (CRIM) and Maxwell Garnett (MG) model require oversimplifying assumptions about pore structure and distribution of constituting fluids/minerals. Therefore, dielectric-based estimates of water saturation are often not reliable in the presence of complex pore structure, rock composition, and rock fabric (i.e., spatial distribution of solid/fluid components). The objectives of this paper are (a) to propose a simple workflow for interpretation of dielectric permittivity measurements in log-scale domain, which takes the impacts of complex pore geometry and distribution of minerals into account, (b) to experimentally verify the reliability of the introduced workflow in the core-scale domain, and (c) to apply the introduced workflow for well-log-based assessment of water saturation. The dielectric permittivity model includes tortuosity-dependent parameters to honor the complexity of the pore structure and rock fabric for interpretation of broadband dielectric dispersion measurements. We estimate tortuosity-dependent parameters for each rock type from dielectric permittivity measurements conducted on core samples. To verify the reliability of dielectric-based water saturation model, we conduct experimental measurements on core plugs taken from a carbonate formation with complex pore structures. We also introduce a workflow for applying the introduced model to dielectric dispersion well logs for depth-by-depth assessment of water saturation. The tortuosity-dependent parameters in log-scale domain can be estimated either via experimental core-scale calibration, well logs in fully water-saturated zones, or pore-scale evaluation in each rock type. 
The first approach is adopted in this paper. We successfully applied the introduced model to core samples and well logs from a pre-salt formation in the Santos Basin. In the core-scale domain, water saturation estimated with the introduced model had an average relative error of less than 11% (compared to gravimetric measurements), and the introduced workflow improved water saturation estimates by 91% compared to CRIM. These results confirm the reliability of the new dielectric model. In application to well logs, we observed significant improvements in water saturation estimates compared to cases where a conventional effective medium model (i.e., CRIM) was used. The documented results from both core-scale and well-log-scale applications of the introduced method emphasize the importance of honoring pore structure in the interpretation of dielectric measurements.
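The conventional CRIM mixing law that the paper benchmarks against can be sketched as follows. The constituent permittivities below are hypothetical round numbers, not the study's calibration; the sketch only shows the volumetric square-root mixing and its inversion for water saturation.

```python
from math import sqrt

# Minimal sketch of the Complex Refractive Index Model (CRIM) baseline
# referenced above (not the authors' improved model).
# CRIM: sqrt(eps_eff) = (1-phi)*sqrt(eps_ma) + phi*Sw*sqrt(eps_w)
#                       + phi*(1-Sw)*sqrt(eps_hc)
EPS_MA = 7.5    # matrix (e.g. calcite) relative permittivity - assumed
EPS_W = 78.0    # brine permittivity at formation conditions - assumed
EPS_HC = 2.2    # hydrocarbon permittivity - assumed

def crim_forward(phi, sw):
    """Effective permittivity of the rock from porosity and water saturation."""
    root = ((1 - phi) * sqrt(EPS_MA)
            + phi * sw * sqrt(EPS_W)
            + phi * (1 - sw) * sqrt(EPS_HC))
    return root ** 2

def crim_invert(phi, eps_measured):
    """Water saturation from a measured permittivity by inverting CRIM."""
    num = sqrt(eps_measured) - (1 - phi) * sqrt(EPS_MA) - phi * sqrt(EPS_HC)
    return num / (phi * (sqrt(EPS_W) - sqrt(EPS_HC)))

# Round trip: a rock with phi = 0.20 and Sw = 0.6 inverts back to 0.6.
eps = crim_forward(0.20, 0.6)
assert abs(crim_invert(0.20, eps) - 0.6) < 1e-12
```

Because CRIM carries no pore-geometry information, two rocks with the same phi and Sw but different fabrics yield the same permittivity here; the tortuosity-dependent parameters in the paper's model are what relax exactly that assumption.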


2014 ◽  
Vol 54 (1) ◽  
pp. 241
Author(s):  
Hanieh Jafary Dargahi ◽  
Reza Rezaee

The recognition of distinct rock types through log responses, referred to as electrofacies, plays a fundamental role in mapping stratigraphic units that lack a specific geological description. Lateral variability within adjoining intervals is differentiated by studying lithological characteristics, such as petrography and mineralogy, acquired from visual core description. In non-cored wells, therefore, electrofacies analysis is the most reliable way to determine reservoir zonation. The accuracy of electrofacies is critically important in defining potentially desirable rock types for shale gas reservoirs in non-cored intervals, which can be obtained through an analogy of well-log responses in identical lithofacies within cored wells. Given the complexity of making a final prediction when different well logs are not available across the whole area, only the gamma-ray log is used to determine electrofacies patterns within the studied shale gas intervals. The electrofacies patterns within the identified lithofacies have been studied for the Kockatea Shale, which presented analogous patterns for identical lithological facies. This similarity has allowed the correlation of lithofacies in cored and non-cored wells, and the evaluation of lithofacies variability and development across wells. The correlation of the defined electrofacies indicates facies changes across the basin in association with thickening of some lithofacies. The electrofacies are thickest in the Dandaragan Trough and the Beagle Ridge. Some electrofacies, however, disappear in parts of these areas, such as lithofacies E in the Beagle Ridge, which is partially replaced by electrofacies C.


Author(s):  
Mohan Allam ◽  
M. Nandhini ◽  
M. Thangadarshini

Autism spectrum disorder (ASD) is a syndrome characterized by impaired social interaction and repetitive behavior. ASD is diagnosed by health experts using specialized practices that can be prolonged and costly. Researchers have developed several ASD detection techniques using machine learning (ML) tools. ML provides advanced algorithms that build automatic classification models, but disease prediction remains a challenge for ML models because the majority of medical datasets include irrelevant features. Feature selection is a critical step in predictive modeling: selecting a subset of significant features from the dataset. Recent feature selection techniques use optimization algorithms to improve the prediction rate of classification models. Most optimization algorithms rely on several controlling parameters that have to be tuned for good performance. In this chapter, a novel feature selection technique is proposed using the binary teaching-learning-based optimization (TLBO) algorithm, which requires only the common controlling parameters, to acquire optimum features from ASD data.
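A compact sketch of binary TLBO for feature selection follows. This is not the chapter's implementation: the per-feature "relevance" scores and the parsimony-penalized fitness are hypothetical stand-ins for a classifier's validation accuracy, and the sigmoid transfer used to binarize the continuous TLBO update is one common choice among several.

```python
import random
from math import exp

# Sketch of binary teaching-learning-based optimization (TLBO) for feature
# selection (not the authors' code). Relevance scores are hypothetical.
random.seed(0)
RELEVANCE = [0.9, 0.8, 0.7, 0.05, 0.02, 0.01]  # first three features matter
N_FEATURES = len(RELEVANCE)

def fitness(mask):
    # Reward relevant features, penalize subset size (parsimony pressure).
    return sum(r for r, m in zip(RELEVANCE, mask) if m) - 0.1 * sum(mask)

def binarize(xs):
    # Sigmoid transfer: continuous TLBO update -> 0/1 feature mask.
    return [1 if random.random() < 1 / (1 + exp(-x)) else 0 for x in xs]

pop = [[random.randint(0, 1) for _ in range(N_FEATURES)] for _ in range(10)]
initial_best = max(map(fitness, pop))

for _ in range(30):
    teacher = max(pop, key=fitness)
    mean = [sum(col) / len(pop) for col in zip(*pop)]
    for i in range(len(pop)):
        # Teacher phase: move toward the teacher, away from the class mean.
        tf = random.choice([1, 2])  # teaching factor - a common, not tuned, parameter
        new = binarize([l + random.random() * (t - tf * m)
                        for l, t, m in zip(pop[i], teacher, mean)])
        if fitness(new) > fitness(pop[i]):
            pop[i] = new
        # Learner phase: move toward (or away from) a random peer.
        peer = pop[random.randrange(len(pop))]
        step = 1.0 if fitness(peer) > fitness(pop[i]) else -1.0
        new = binarize([l + step * random.random() * (p - l)
                        for l, p in zip(pop[i], peer)])
        if fitness(new) > fitness(pop[i]):
            pop[i] = new

best = max(pop, key=fitness)
```

Note that both phases use only population size and iteration count plus the fixed teaching factor, which is TLBO's selling point relative to optimizers with algorithm-specific parameters.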


2018 ◽  
Vol 2018 ◽  
pp. 1-8 ◽  
Author(s):  
Guangzhou An ◽  
Kazuko Omodaka ◽  
Satoru Tsuda ◽  
Yukihiro Shiga ◽  
Naoko Takada ◽  
...  

This study develops an objective machine-learning classification model for classifying glaucomatous optic discs and reveals the classificatory criteria to assist in clinical glaucoma management. In this study, 163 glaucoma eyes were labelled with four optic disc types by three glaucoma specialists and then randomly separated into training and test data. All the images of these eyes were captured using optical coherence tomography and laser speckle flowgraphy to quantify the ocular structure and blood-flow-related parameters. A total of 91 parameters were extracted from each eye along with the patients’ background information. Machine-learning classifiers, including the neural network (NN), naïve Bayes (NB), support vector machine (SVM), and gradient boosted decision trees (GBDT), were trained to build the classification models, and a hybrid feature selection method that combines minimum redundancy maximum relevance and genetic-algorithm-based feature selection was applied to find the most valid and relevant features for NN, NB, and SVM. A comparison of the performance of the three machine-learning classification models showed that the NN had the best classification performance with a validated accuracy of 87.8% using only nine ocular parameters. These selected quantified parameters enabled the trained NN to classify glaucomatous optic discs with relatively high performance without requiring color fundus images.
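The minimum-redundancy-maximum-relevance half of the hybrid feature selection above can be sketched greedily. This is not the study's code: the four candidate "ocular parameters" and labels are hypothetical, and Pearson correlation stands in for the mutual-information criteria often used in mRMR.

```python
# Greedy mRMR sketch (not the study's implementation). The tiny dataset is
# hypothetical: four candidate features and a class label per eye.
DATA = [  # (f0, f1, f2, f3, label)
    (1.0, 1.1, 0.2, 5.0, 1),
    (0.9, 1.0, 0.3, 3.0, 1),
    (0.2, 0.3, 0.8, 4.0, 0),
    (0.1, 0.2, 0.9, 6.0, 0),
]

def corr(xs, ys):
    """Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (vx * vy) if vx and vy else 0.0

cols = list(zip(*DATA))
features, label = cols[:-1], [float(l) for l in cols[-1]]

selected, remaining = [], list(range(len(features)))
for _ in range(2):  # greedily pick two features
    def score(j):
        # Relevance to the label minus mean redundancy with already-picked features.
        relevance = abs(corr(features[j], label))
        redundancy = (sum(abs(corr(features[j], features[s])) for s in selected)
                      / len(selected)) if selected else 0.0
        return relevance - redundancy
    best = max(remaining, key=score)
    selected.append(best)
    remaining.remove(best)
```

With this data the second pick is the weakly relevant but non-redundant f3 rather than f1, which duplicates the first pick: exactly the trade-off that lets the study's classifier reach high accuracy with only nine of 91 parameters.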


2015 ◽  
Vol 14s5 ◽  
pp. CIN.S30795 ◽  
Author(s):  
S. Sakira Hassan ◽  
Pekka Ruusuvuori ◽  
Leena Latonen ◽  
Heikki Huttunen

In this paper, we study the problem of feature selection in cancer-related machine learning tasks. In particular, we study the accuracy and stability of different feature selection approaches within simplistic machine learning pipelines. Earlier studies have shown that for certain cases, the accuracy of detection can easily reach 100% given enough training data. Here, however, we concentrate on simplifying the classification models and seek feature selection approaches that are reliable even with extremely small sample sizes. We show that as much as 50% of the features can be discarded without compromising the prediction accuracy. Moreover, we study the model selection problem along the ℓ1 regularization path of logistic regression classifiers. To this end, we compare a more traditional cross-validation approach with a recently proposed Bayesian error estimator.


Author(s):  
Anditya Sapta Rahesthi ◽  
Ratnayu Sitaresmi ◽  
Sigit Rahmawan

<em>Rock permeability is an important rock characteristic because it helps determine the rate of fluid production. Permeability can only be measured directly on core samples in the laboratory. Although coring gives good results, it is time-consuming and costly, so coring cannot be performed at all intervals; well logs are therefore required to predict permeability indirectly. However, permeability predictions calculated from well log data carry high uncertainty, so rock typing is required to make the permeability prediction more detailed. This research was conducted to determine the Hydraulic Flow Units (HFU) of the reservoir in wells that have core data using the Flow Zone Indicator (FZI) parameter, and to propagate FZI values to wells without core data, so that the rock type and permeability value are obtained for every well interval. From the results of the study, the reservoirs in the ASR field can be grouped into six rock types, each with permeability expressed as a function of porosity and validated by applying it at all intervals. After FZI is calculated from log data and validated with core data, the method produces a fairly good correlation (R<sup>2</sup> = 0.92). Furthermore, using the permeability equations for each rock type, the predicted permeability results are also quite good (R<sup>2</sup> = 0.81).</em>
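The FZI computation at the core of the abstract above follows standard definitions: RQI = 0.0314·sqrt(k/phi) with k in mD and phi as a fraction, normalized porosity phi_z = phi/(1 − phi), and FZI = RQI/phi_z. The core values in this sketch are hypothetical, not ASR field data.

```python
from math import sqrt

# Sketch of FZI-based rock typing (not the authors' code), using the
# standard RQI / phi_z / FZI definitions. Core values are hypothetical.
CORE = [  # (porosity fraction, permeability in mD)
    (0.10, 1.0),
    (0.15, 20.0),
    (0.20, 150.0),
    (0.25, 900.0),
]

def fzi(phi, k):
    rqi = 0.0314 * sqrt(k / phi)   # Reservoir Quality Index, in microns
    phi_z = phi / (1.0 - phi)      # normalized (pore-to-grain) porosity
    return rqi / phi_z

# Samples with similar FZI belong to the same hydraulic flow unit (HFU);
# in practice FZI values are clustered (e.g. on a probability plot) and the
# resulting rock types are propagated to un-cored wells via logs.
values = [fzi(phi, k) for phi, k in CORE]

# Within one HFU, permeability is predicted back from log-derived porosity:
# k = (FZI * phi_z / 0.0314)**2 * phi
def predict_k(phi, fzi_value):
    phi_z = phi / (1.0 - phi)
    return (fzi_value * phi_z / 0.0314) ** 2 * phi

# Round trip: each core sample's permeability is recovered from its own FZI.
assert all(abs(predict_k(phi, fzi(phi, k)) - k) < 1e-9 for phi, k in CORE)
```

The per-rock-type permeability equations mentioned in the abstract are exactly this inversion evaluated with each HFU's representative FZI.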

