Value of Geologically Derived Features in Machine Learning Facies Classification

2019 ◽  
Vol 52 (1) ◽  
pp. 5-29 ◽  
Author(s):  
Julie Halotel ◽  
Vasily Demyanov ◽  
Andy Gardiner

Abstract The aim of this work is to demonstrate how geologically interpretative features can improve machine learning facies classification with uncertainty assessment. Manual interpretation of lithofacies from wireline log data is traditionally performed by an expert, can be subject to biases, and is substantially laborious and time consuming for very large datasets. Characterizing the interpretational uncertainty in facies classification is quite difficult, but it can be very important for reservoir development decisions. Thus, automation of the facies classification process using machine learning is a potentially intuitive and efficient way to facilitate facies interpretation based on large-volume data. It can also enable more adequate quantification of the uncertainty in facies classification by ensuring that possible alternative lithological scenarios are not overlooked. An improvement of the performance of purely data-driven classifiers by integrating geological features and expert knowledge as additional inputs is proposed herein, with the aim of equipping the classifier with more geological insight and gaining interpretability by making it more explanatory. Support vector machine and random forest classifiers are compared to demonstrate the superiority of the latter. This study contrasts facies classification using only conventional wireline log inputs and using additional geological features. In the first experiment, geological rule-based constraints were implemented as an additional derived and constructed input. These account for key geological features that a petrophysics or geological expert would attribute to typical and identifiable wireline log responses. In the second experiment, geological interpretative features (i.e., grain size, pore size, and argillaceous content) were used as additional independent inputs to enhance the prediction accuracy and geological consistency of the classification outcomes.
Input and output noise injection experiments demonstrated the robustness of the results to systematic and random noise in the data. This study aspires to establish geological characteristics and expert knowledge as decisive inputs in machine learning facies classification.
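As a rough illustration of the experimental setup this abstract describes, the sketch below compares an SVM and a random forest with and without extra geologically derived inputs. All feature names, distributions, and data are invented stand-ins, not the paper's dataset.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n = 300
# Conventional wireline-log inputs (stand-ins for e.g. GR, RHOB, NPHI, DT)
logs = rng.normal(size=(n, 4))
facies = (logs[:, 0] + 0.5 * logs[:, 1] > 0).astype(int)
# Geologically interpretative features (stand-ins for e.g. grain size,
# argillaceous content), simulated here as noisy functions of the label
geo = facies[:, None] + rng.normal(scale=0.5, size=(n, 2))

results = {}
for name, X in [("logs only", logs), ("logs + geo", np.hstack([logs, geo]))]:
    for clf in (SVC(), RandomForestClassifier(random_state=0)):
        score = cross_val_score(clf, X, facies, cv=5).mean()
        results[(name, type(clf).__name__)] = score
        print(f"{name:11s} {type(clf).__name__:22s} {score:.2f}")
```

The comparison structure (same labels, two input sets, two classifiers, cross-validated accuracy) mirrors the study's design; on real wireline data the geological inputs are where the reported gains come from.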

2021 ◽  
Author(s):  
Jerome Asedegbega ◽  
Oladayo Ayinde ◽  
Alexander Nwakanma

Abstract Several computer-aided techniques have been developed in the recent past to improve the interpretational accuracy of subsurface geology. This paradigm shift has brought tremendous success in a variety of machine learning application domains and supports better feasibility studies in reservoir evaluation using multiple classification techniques. Facies classification is an essential subsurface exploration task, as sedimentary facies reflect the associated physical, chemical, and biological conditions that a formation unit experienced during sedimentation. This study employed formation samples for facies classification using machine learning (ML) techniques and classified different facies from well logs in seven (7) wells of the PORT Field, Offshore Niger Delta. Six wells were concatenated during data preparation and trained using supervised ML algorithms before validating the models by blind testing on one well log to predict discrete facies groups. The analysis started with data preparation and examination, where various features of the available well data were conditioned. For model building and performance evaluation, support vector machine, random forest, decision tree, extra tree, neural network (multilayer perceptron), k-nearest neighbor, and logistic regression models were built after dividing the data sets into training, test, and blind-test well data. Metric scores for the blind-test well, estimated for the various models using the Jaccard index and F1-score, were 0.73 and 0.82 for support vector machine, 0.38 and 0.54 for random forest, 0.78 and 0.83 for extra tree, 0.91 and 0.95 for k-nearest neighbor, 0.41 and 0.56 for decision tree, 0.63 and 0.74 for logistic regression, and 0.55 and 0.68 for neural network, respectively.
The efficiency of ML techniques in enhancing prediction accuracy and decreasing procedure time makes them highly desirable for subsurface facies classification analysis.
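The two metrics reported for the blind-test well can be computed with scikit-learn; the sketch below uses a toy facies vector whose values are illustrative only.

```python
# Jaccard index and F1-score on a small multiclass facies vector,
# weighted by class support as is common for imbalanced facies data.
from sklearn.metrics import f1_score, jaccard_score

y_true = [0, 0, 1, 1, 2, 2, 2, 1]  # interpreted facies (toy labels)
y_pred = [0, 1, 1, 1, 2, 2, 0, 1]  # model-predicted facies

jac = jaccard_score(y_true, y_pred, average="weighted")
f1 = f1_score(y_true, y_pred, average="weighted")
print(f"Jaccard: {jac:.2f}  F1: {f1:.2f}")
```

Note that for the same predictions the Jaccard index is never larger than the F1-score, consistent with every model pair reported above.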


2019 ◽  
Vol 11 (2) ◽  
pp. 196 ◽  
Author(s):  
Omid Ghorbanzadeh ◽  
Thomas Blaschke ◽  
Khalil Gholamnia ◽  
Sansar Meena ◽  
Dirk Tiede ◽  
...  

There is a growing demand for detailed and accurate landslide maps and inventories around the globe, but particularly in hazard-prone regions such as the Himalayas. Most standard mapping methods require expert knowledge, supervision and fieldwork. In this study, we use optical data from the RapidEye satellite and topographic factors to analyze the potential of machine learning methods, i.e., artificial neural network (ANN), support vector machines (SVM) and random forest (RF), and different deep-learning convolutional neural networks (CNNs) for landslide detection. We use two training zones and one test zone to independently evaluate the performance of different methods in the highly landslide-prone Rasuwa district in Nepal. Twenty different maps are created using ANN, SVM and RF and different CNN instantiations and are compared against the results of extensive fieldwork through a mean intersection-over-union (mIOU) and other common metrics. This accuracy assessment yields the best result of 78.26% mIOU for a small window size CNN, which uses spectral information only. The additional information from a 5 m digital elevation model helps to discriminate between human settlements and landslides but does not improve the overall classification accuracy. CNNs do not automatically outperform ANN, SVM and RF, although this is sometimes claimed. Rather, the performance of CNNs strongly depends on their design, i.e., layer depth, input window sizes and training strategies. Here, we conclude that the CNN method is still in its infancy, as most researchers will either use predefined parameters in solutions like Google TensorFlow or will apply different settings in a trial-and-error manner. Nevertheless, deep learning can improve landslide mapping in the future if the effects of the different designs and of augmentation strategies for artificially increasing the number of training samples are better understood, and if enough training samples exist.
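The mIOU used to score the landslide maps can be sketched as follows; the two small masks are toy stand-ins, not the Rasuwa data.

```python
# Mean intersection-over-union over the class set (here landslide vs.
# background), averaging the per-class IoU of predicted and reference masks.
import numpy as np

def mean_iou(y_true, y_pred, classes=(0, 1)):
    ious = []
    for c in classes:
        inter = np.logical_and(y_true == c, y_pred == c).sum()
        union = np.logical_or(y_true == c, y_pred == c).sum()
        ious.append(inter / union if union else 1.0)
    return float(np.mean(ious))

truth = np.array([[0, 0, 1, 1],   # 1 = landslide pixel (fieldwork)
                  [0, 1, 1, 1]])
mapped = np.array([[0, 0, 1, 0],  # 1 = landslide pixel (model output)
                   [0, 1, 1, 1]])
print(f"mIOU = {mean_iou(truth, mapped):.3f}")
```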


2017 ◽  
Vol 36 (3) ◽  
pp. 267-269 ◽  
Author(s):  
Matt Hall ◽  
Brendon Hall

The Geophysical Tutorial in the October issue of The Leading Edge was the first we've done on the topic of machine learning. Brendon Hall's article ( Hall, 2016 ) showed readers how to take a small data set — wireline logs and geologic facies data from nine wells in the Hugoton natural gas and helium field of southwest Kansas ( Dubois et al., 2007 ) — and predict the facies in two wells for which the facies data were not available. The article demonstrated with 25 lines of code how to explore the data set, then create, train and test a machine learning model for facies classification, and finally visualize the results. The workflow took a deliberately naive approach using a support vector machine model. It achieved a sort of baseline accuracy rate — a first-order prediction, if you will — of 0.42. That might sound low, but it's not untypical for a naive approach to this kind of problem. For comparison, random draws from the facies distribution score 0.16, which is therefore the true baseline.
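The "true baseline" from random draws can be made concrete: sampling predictions from the class distribution gives an expected accuracy equal to the sum of squared class proportions. The nine proportions below are invented for illustration, not the actual Hugoton facies distribution.

```python
# Expected accuracy of random draws from the facies distribution:
# P(correct) = sum_i p_i * p_i, since prediction and truth are
# independent draws from the same distribution.
import numpy as np

p = np.array([0.10, 0.18, 0.06, 0.09, 0.07, 0.12, 0.15, 0.14, 0.09])
assert np.isclose(p.sum(), 1.0)  # proportions of the nine facies classes
baseline = float((p ** 2).sum())
print(f"expected accuracy of random draws: {baseline:.2f}")
```

With the real nine-facies proportions of the Hugoton data set this quantity works out to the 0.16 quoted above.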


2021 ◽  
Author(s):  
Adrian Groza ◽  
Liana Toderean ◽  
George Muntean ◽  
Simona Delia Nicoara

Abstract Purpose: Expertise for auditing AI systems in the medical domain is only now being accumulated. Conformity assessment procedures will require AI systems: i) to be transparent, ii) not to rely solely on algorithmic decisions, and iii) to include safety assurance cases in the documentation to facilitate technical audit. We are interested here in obtaining transparency in the case of machine learning (ML) applied to the classification of retina conditions. Achieving high performance metrics with ML has become common practice. However, in the medical domain, algorithmic decisions need to be sustained by explanations. We aim at building a support tool for ophthalmologists able to: i) explain algorithmic decisions to the human agent by automatically extracting rules from the learned ML models; ii) include the ophthalmologist in the loop by formalising expert rules and including the expert knowledge in the argumentation machinery; iii) build safety cases by creating assurance argument patterns for each diagnosis. Methods: For the learning task, we used a dataset consisting of 699 OCT images: 126 of the Normal class, 210 with Diabetic Retinopathy (DR) and 363 with Age-Related Macular Degeneration (AMD). The dataset contains patients from the Ophthalmology Department of the County Emergency Hospital of Cluj-Napoca. All ethical norms and procedures, including anonymisation, have been followed. We applied three machine learning algorithms: decision tree (DT), support vector machine (SVM) and artificial neural network (ANN). For each algorithm we automatically extract diagnosis rules. For formalising expert knowledge, we relied on the normative dataset [13]. For arguing between agents, we used the Jason multi-agent platform. We assume different knowledge bases and reasoning capabilities for each agent. The agents have their own Optical Coherence Tomography (OCT) images on which they apply a distinct machine learning algorithm. The learned model is used to extract diagnosis rules.
With distinct learned rules, the agents engage in an argumentative process. The resolution of the debate outputs a diagnosis that is then explained to the ophthalmologist by means of assurance cases. Results: For diagnosing the retina condition, our AI solution deals with the following three issues: First, the learned models are automatically translated into rules. These rules are then used to build an explanation by tracing the reasoning chain supporting the diagnosis. Hence, the proposed AI solution complies with the requirement that "algorithmic decisions should be explained to the human agent". Second, the decision is not based solely on ML algorithms. The proposed architecture includes expert knowledge. The diagnosis is reached by exchanging arguments between ML-based algorithms and expert knowledge. The conflict resolution among arguments is verbalised, so that the ophthalmologist can supervise the diagnosis. Third, assurance cases are generated to facilitate technical audit. The assurance cases structure the evidence along various safety goals such as machine learning methodology, transparency, or data quality. For each dimension, the auditor can check the provided evidence against current best practices or safety standards. Conclusion: We developed a multi-agent system for retina conditions in which algorithmic decisions are sustained by explanations. The proposed tool goes beyond most software in the medical domain, which focuses only on performance metrics. Our approach helps the technical auditor to approve software in the medical domain. Interleaving knowledge extracted from ML models with expert knowledge is a step towards balancing the benefits of ML with explainability, aiming at engineering reliable medical applications.
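One ingredient of this pipeline, extracting human-readable rules from a trained decision tree, can be sketched with scikit-learn. The Iris data and generic feature names below stand in for OCT-derived measurements; this substitution is purely an illustrative assumption.

```python
# Train a shallow decision tree and dump its structure as text:
# each root-to-leaf path is a candidate IF-THEN diagnosis rule.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
dt = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
rules = export_text(dt, feature_names=["f0", "f1", "f2", "f3"])
print(rules)
```

In the paper's architecture such extracted rules become the arguments each agent brings to the debate.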


2020 ◽  
Author(s):  
Yutao Lu ◽  
Juan Wang ◽  
Miao Liu ◽  
Kaixuan Zhang ◽  
Guan Gui ◽  
...  

The ever-increasing amount of data in cellular networks poses challenges for network operators monitoring the quality of experience (QoE). Traditional hard-decision methods based on key quality indicators (KQIs) struggle with the task of QoE anomaly detection in the case of big data. To solve this problem, in this paper we propose a KQI-based QoE anomaly detection framework using a semi-supervised machine learning algorithm, the iterative positive sample aided one-class support vector machine (IPS-OCSVM). There are four steps to realizing the proposed method, of which the key step is combining machine learning with the network operator's expert knowledge using OCSVM. Our proposed IPS-OCSVM framework realizes QoE anomaly detection through soft decisions and can easily fine-tune the anomaly detection ability on demand. Moreover, we prove that the fluctuation of KQI thresholds based on expert knowledge has a limited impact on the result of anomaly detection. Finally, experimental results confirm the proposed IPS-OCSVM framework for QoE anomaly detection in cellular networks.
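A minimal sketch of the core idea (not the authors' exact four-step algorithm): fit a one-class SVM on KQI samples assumed normal, then iteratively refit after adding back confidently normal samples selected by the soft decision score. All data, dimensions and thresholds below are invented.

```python
# Iterative positive-sample-aided one-class SVM, sketched with sklearn.
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(1)
normal = rng.normal(0, 1, size=(200, 2))    # KQI vectors within thresholds
anomalies = rng.normal(6, 1, size=(10, 2))  # KQI vectors far out of range
pool = np.vstack([rng.normal(0, 1, size=(50, 2)), anomalies])  # unlabeled

train = normal
for _ in range(3):  # a few positive-sample refinement iterations
    ocsvm = OneClassSVM(nu=0.05, gamma="scale").fit(train)
    scores = ocsvm.decision_function(pool)        # soft decision values
    confident = pool[scores > np.quantile(scores, 0.5)]
    train = np.vstack([normal, confident])        # add confident positives

pred = ocsvm.predict(anomalies)  # -1 flags an anomaly
print(f"anomalies flagged: {(pred == -1).sum()} / {len(anomalies)}")
```

The quantile cutoff is the knob that corresponds to "fine-tuning the anomaly detection ability on demand" in the abstract.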



2018 ◽  
Vol 10 (7) ◽  
pp. 168781401878442 ◽  
Author(s):  
Baojun Li ◽  
Ying Dong ◽  
Zhijie Wen ◽  
Mingzeng Liu ◽  
Lei Yang ◽  
...  

To avoid the requirement of expert knowledge in conventional methods for car styling analysis, this article proposes a machine learning–based method which requires no expert-engineered features for car frontal styling analysis. In this article, we aim to identify group behaviors in car styling, such as the degree of brand styling consistency among different automakers and car styling patterns. Brand styling consistency is considered a group behavior in this article and is formulated as a brand classification problem. This classification problem is then solved by a machine learning method based on PCANet for automatic feature encoding and the support vector machine for feature-based classification. Brand styling consistency can thus be measured based on the classification accuracy. To perform the analysis, a car frontal styling database with 23 brands is first built. To present the brand styling patterns discovered in classification, a decoding method is proposed to map the salient features used for brand classification back to the original images, revealing salient styling regions. To provide a direct perception of brand styling characteristics, frontal styling representatives of several brands are presented as well. This study contributes to the efficient identification of brand styling consistency and the visualization of brand styling patterns without relying on expert experience.
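A hedged sketch of the two-stage pipeline, with plain PCA standing in for PCANet's cascaded filter banks and random vectors standing in for frontal images; none of the data or dimensions below come from the paper.

```python
# Encode "images" with PCA, classify brands with a linear SVM, and use
# classification accuracy as the brand-consistency measure.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
n_per_brand, n_brands, n_pixels = 40, 3, 64
# Each invented "brand" gets its own mean styling pattern plus noise
means = rng.normal(size=(n_brands, n_pixels))
X = np.vstack([means[b] + 0.3 * rng.normal(size=(n_per_brand, n_pixels))
               for b in range(n_brands)])
y = np.repeat(np.arange(n_brands), n_per_brand)

model = make_pipeline(PCA(n_components=10), LinearSVC())
model.fit(X, y)
acc = model.score(X, y)  # higher accuracy = more consistent brand styling
print(f"brand classification accuracy: {acc:.2f}")
```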


2020 ◽  
Vol 9 (1) ◽  
pp. 92-111
Author(s):  
Hanane Zermane ◽  
Rached Kasmi

Manufacturing automation is a double-edged sword: on the one hand, it increases the productivity of the production system, reduces costs, improves reliability, etc.; on the other hand, it increases the complexity of the system. This has led to the need for efficient solutions such as artificial intelligence techniques. Data and experience are extracted from experts, who usually rely on common sense when they solve problems and use vague and ambiguous terms. A knowledge engineer, however, would have difficulty providing a computer with the same level of understanding. To resolve this situation, this article proposes fuzzy logic as a first step to represent expert knowledge expressed in fuzzy terms for supervising complex industrial processes. As a second step, adopting one of the most powerful machine learning techniques, the Support Vector Machine (SVM), the authors classify data to determine the state of the supervision system and learn how to supervise the process while preserving the habitual linguistic terms used by operators.
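The first step, representing vague operator terms with fuzzy membership functions, can be sketched as follows; the triangular shapes, breakpoints, and the temperature example are illustrative assumptions, not the article's actual rule base.

```python
# Fuzzify a crisp process value into degrees of membership in the
# linguistic terms an operator would use ("low", "normal", "high").
def triangular(x, a, b, c):
    """Triangular membership: rises from a to a peak at b, falls to c."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

def fuzzify(temp):
    return {
        "low":    triangular(temp, -10.0, 20.0, 50.0),
        "normal": triangular(temp, 30.0, 60.0, 90.0),
        "high":   triangular(temp, 70.0, 100.0, 130.0),
    }

degrees = fuzzify(40.0)
print(degrees)  # 40 belongs partly to "low" and partly to "normal"
```

Such membership degrees can then serve as inputs to the SVM classifier of the supervision state, preserving the operators' linguistic terms end to end.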


2018 ◽  
Vol 4 ◽  
pp. e150 ◽  
Author(s):  
Lieven Billiet ◽  
Sabine Van Huffel ◽  
Vanya Van Belle

Over the last decades, clinical decision support systems have been gaining importance. They help clinicians to make effective use of the overload of available information to obtain correct diagnoses and appropriate treatments. However, their power often comes at the cost of a black box model which cannot be interpreted easily. This interpretability is of paramount importance in a medical setting with regard to trust and (legal) responsibility. In contrast, existing medical scoring systems are easy to understand and use, but they are often a simplified rule-of-thumb summary of previous medical experience rather than a well-founded system based on available data. Interval Coded Scoring (ICS) connects these two approaches, exploiting the power of sparse optimization to derive scoring systems from training data. The presented toolbox interface makes this theory easily applicable to both small and large datasets. It contains two possible problem formulations based on linear programming or elastic net. Both allow the construction of a model for a binary classification problem and establish risk profiles that can be used for future diagnosis. All of this requires only a few lines of code. ICS differs from standard machine learning through its model consisting of interpretable main effects and interactions. Furthermore, insertion of expert knowledge is possible because the training can be semi-automatic. This allows end users to make a trade-off between complexity and performance based on cross-validation results and expert knowledge. Additionally, the toolbox offers an accessible way to assess classification performance via accuracy and the ROC curve, whereas the calibration of the risk profile can be evaluated via a calibration curve. Finally, the colour-coded model visualization has particular appeal if one wants to apply ICS manually on new observations, as well as for validation by experts in the specific application domains.
The validity and applicability of the toolbox is demonstrated by comparing it to standard Machine Learning approaches such as Naive Bayes and Support Vector Machines for several real-life datasets. These case studies on medical problems show its applicability as a decision support system. ICS performs similarly in terms of classification and calibration. Its slightly lower performance is countered by its model simplicity which makes it the method of choice if interpretability is a key issue.
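The elastic-net idea behind one of the two problem formulations can be sketched with scikit-learn; the dataset, penalty strength, and mixing ratio below are illustrative assumptions, not the toolbox's actual implementation.

```python
# Sparse elastic-net logistic regression: the few surviving
# coefficients can be read off as a simple scoring system.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
model = make_pipeline(
    StandardScaler(),
    LogisticRegression(penalty="elasticnet", solver="saga",
                       l1_ratio=0.8, C=0.1, max_iter=5000),
)
model.fit(X, y)
coefs = model.named_steps["logisticregression"].coef_[0]
kept = int((abs(coefs) > 1e-6).sum())  # features surviving the L1 part
print(f"{kept} of {len(coefs)} features kept; "
      f"training accuracy {model.score(X, y):.2f}")
```

The trade-off ICS exposes is visible here: a stronger penalty keeps fewer features (a simpler score card) at some cost in raw performance.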


2022 ◽  
pp. 1-14
Author(s):  
Salem Al-Gharbi ◽  
Abdulaziz Al-Majed ◽  
Salaheldin Elkatatny ◽  
Abdulazeez Abdulraheem

Abstract Due to the high demand for energy, oil and gas companies have started to drill wells in remote environments and conduct unconventional operations. In order to maintain safe, fast and more cost-effective operations, utilizing machine learning (ML) technologies has become a must. The harsh environments of drilling sites and the transmission setups negatively affect the drilling data, leading to less-than-acceptable ML results. For that reason, a large portion of ML development projects is actually spent on improving the data by data-quality experts. The objective of this paper is to evaluate the effectiveness of ML in improving real-time drilling-data quality and compare it to human expert knowledge. To achieve that, two large real-time drilling datasets were used; one dataset was used to train three different ML techniques: artificial neural network (ANN), support vector machine (SVM) and decision tree (DT); the second dataset was used to evaluate them. The ML results were compared with the results of a real-time drilling-data-quality expert. Despite the complexity of the ANN and its generally good results, it achieved a relative root mean square error (RRMSE) of 2.83%, which was outperformed by the DT and SVM technologies, which achieved RRMSEs of 0.35% and 0.48%, respectively. The uniqueness of this work is in developing ML that simulates the improvement of drilling-data quality by an expert. This research provides a guide for improving the quality of real-time drilling data.
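Since the results above are reported as RRMSE percentages, a minimal sketch of one common convention (RMSE divided by the mean of the reference values) is given below; the paper does not restate its exact formula here, so this definition and the toy numbers are assumptions.

```python
# Relative root-mean-square error in percent, comparing cleaned data
# against the expert-quality reference values.
import numpy as np

def rrmse(y_true, y_pred):
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    return 100.0 * rmse / np.mean(y_true)

expert_cleaned = [100.0, 102.0, 98.0, 101.0]  # toy reference series
model_cleaned = [100.5, 101.0, 98.5, 100.5]   # toy ML output
print(f"RRMSE = {rrmse(expert_cleaned, model_cleaned):.2f}%")
```

Under this metric lower is better, which is why the DT (0.35%) and SVM (0.48%) outperform the ANN (2.83%) above.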

