Machine learning for geophysical characterization of brittleness: Tuscaloosa Marine Shale case study

2020 ◽  
Vol 8 (3) ◽  
pp. T589-T597 ◽  
Author(s):  
Mark Mlella ◽  
Ming Ma ◽  
Rui Zhang ◽  
Mehdi Mokhtari

Brittleness is one of the most important reservoir properties for unconventional reservoir exploration and production. Better knowledge of the brittleness distribution can help optimize hydraulic fracturing operations and lower costs. However, there are very few reliable and effective physical models to predict the spatial distribution of brittleness. We have developed a machine-learning-based method to predict subsurface brittleness from multidisciplinary data sets, such as seismic attributes, rock physics, and petrophysics information, which allows us to implement the prediction without a physical model. The method is applied to a data set from the Tuscaloosa Marine Shale, and the predicted rock physics template is close to that calculated from conventionally inverted elastic parameters. Therefore, the proposed method helps determine areas of the reservoir that have optimal geomechanical properties for successful hydraulic fracturing.
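A minimal sketch of such a model-free prediction workflow, assuming a scikit-learn gradient-boosting regressor and synthetic placeholder features; the abstract does not name a specific algorithm, so this is illustrative only, not the authors' implementation:

# Hedged sketch: predicting a brittleness index from multidisciplinary
# features without a physical model. The synthetic arrays and the choice
# of gradient boosting are illustrative assumptions.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
# Placeholder stand-ins for seismic attributes, rock physics, and
# petrophysics features at each sample location.
X = rng.random((500, 5))
y = X @ rng.random(5) + 0.1 * rng.standard_normal(500)  # brittleness proxy

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
model = GradientBoostingRegressor(random_state=0).fit(X_tr, y_tr)
print("R^2 on held-out samples:", r2_score(y_te, model.predict(X_te)))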

2020 ◽  
Vol 6 ◽  
Author(s):  
Jaime de Miguel Rodríguez ◽  
Maria Eugenia Villafañe ◽  
Luka Piškorec ◽  
Fernando Sancho Caparrini

Abstract This work presents a methodology for the generation of novel 3D objects resembling wireframes of building types. These result from the reconstruction of interpolated locations within the learnt distribution of variational autoencoders (VAEs), deep generative machine-learning models based on neural networks. The data set used features a scheme for geometry representation based on a ‘connectivity map’ that is especially suited to express the wireframe objects that compose it. Additionally, the input samples are generated through ‘parametric augmentation’, a strategy proposed in this study that creates coherent variations among data by enabling a set of parameters to alter representative features of a given building type. In the experiments described in this paper, more than 150,000 input samples belonging to two building types were processed during the training of a VAE model. The main contribution of this paper is to explore parametric augmentation for the generation of large data sets of 3D geometries, showcasing its problems and limitations in the context of neural networks and VAEs. Results show that the generation of interpolated hybrid geometries is a challenging task. Despite the difficulty of the endeavour, promising advances are presented.
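A minimal sketch of the latent-space interpolation on which the hybrid generation relies, assuming a flat input vector and small fully connected networks; the paper's connectivity-map encoding and actual architecture are not reproduced:

# Minimal VAE sketch with latent interpolation between two samples.
# The flat input dimension (1024) and layer sizes are assumptions.
import torch
import torch.nn as nn

class VAE(nn.Module):
    def __init__(self, dim=1024, latent=32):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(dim, 256), nn.ReLU())
        self.mu = nn.Linear(256, latent)
        self.logvar = nn.Linear(256, latent)
        self.dec = nn.Sequential(nn.Linear(latent, 256), nn.ReLU(),
                                 nn.Linear(256, dim), nn.Sigmoid())

    def encode(self, x):
        h = self.enc(x)
        return self.mu(h), self.logvar(h)

    def forward(self, x):
        mu, logvar = self.encode(x)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterize
        return self.dec(z), mu, logvar

vae = VAE()
a, b = torch.rand(1, 1024), torch.rand(1, 1024)  # two encoded building samples
za, zb = vae.encode(a)[0], vae.encode(b)[0]
# Reconstruct geometries at interpolated latent locations (the "hybrids").
hybrids = [vae.dec((1 - t) * za + t * zb) for t in torch.linspace(0, 1, 5)]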


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Tressy Thomas ◽  
Enayat Rajabi

Purpose: The primary aim of this study is to review studies from different dimensions, including the type of methods, experimentation setup and evaluation metrics used in the novel approaches proposed for data imputation, particularly in the machine learning (ML) area. This ultimately provides an understanding of how well the proposed frameworks are evaluated and what types and ratios of missingness are addressed in the proposals. The review questions in this study are: (1) What ML-based imputation methods were studied and proposed during 2010–2020? (2) How are the experimentation setup, data set characteristics and missingness employed in these studies? (3) What metrics were used for the evaluation of imputation methods?

Design/methodology/approach: The review process went through the standard identification, screening and selection process. The initial search on electronic databases for missing value imputation (MVI) based on ML algorithms returned a large number of papers, totaling 2,883. Most of the papers at this stage did not propose an MVI technique relevant to this study. The papers were first screened by title for relevance, and 306 were identified as appropriate. Upon reviewing the abstracts, 151 papers not eligible for this study were dropped. This resulted in 155 research papers suitable for full-text review, of which 117 were used to assess the review questions.

Findings: This study shows that clustering- and instance-based algorithms are the most proposed MVI methods. Percentage of correct prediction (PCP) and root mean square error (RMSE) are the most used evaluation metrics in these studies. For experimentation, the majority of the studies sourced their data sets from publicly available repositories. A common approach is to use the complete data set as a baseline and evaluate the effectiveness of imputation on test data sets with artificially induced missingness. The data set size and missingness ratio varied across the experimentations, while the missing data type and mechanism pertain to the capability of the imputation method. Computational expense is a concern, and experimentation using large data sets appears to be a challenge.

Originality/value: It is understood from the review that there is no single universal solution to the missing data problem. Variants of ML approaches work well with particular kinds of missingness, depending on the characteristics of the data set. Most of the methods reviewed lack generalization with regard to applicability. Another concern related to applicability is the complexity of the formulation and implementation of the algorithm. Imputation based on k-nearest neighbors (kNN) and clustering algorithms, being simple and easy to implement, is popular across various domains.
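A minimal sketch of the common evaluation protocol described above (complete data set as baseline, artificially induced missingness, RMSE scoring), assuming scikit-learn's KNNImputer and a public toy data set:

# Hedged sketch: induce missingness in a complete data set, impute with
# kNN, and score against the known ground truth with RMSE.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.impute import KNNImputer

rng = np.random.default_rng(0)
X_true = load_iris().data                  # complete baseline data set
X_miss = X_true.copy()
mask = rng.random(X_miss.shape) < 0.10     # 10% missingness, MCAR assumption
X_miss[mask] = np.nan

X_imp = KNNImputer(n_neighbors=5).fit_transform(X_miss)
rmse = np.sqrt(np.mean((X_imp[mask] - X_true[mask]) ** 2))
print(f"RMSE on imputed entries: {rmse:.3f}")

The same protocol generalizes to other imputers by swapping the estimator, which is how the reviewed studies typically compare methods.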


2021 ◽  
Vol 8 ◽  
pp. 55-79
Author(s):  
E. Bakhshi ◽  
A. Shahrabadi ◽  
N. Golsanami ◽  
Sh. Seyedsajadi ◽  
X. Liu ◽  
...  

More comprehensive information on reservoir properties helps to better plan drilling and design production. Herein, diagenetic processes and geomechanical properties are notable parameters that determine reservoir quality. Recognizing the geomechanical properties of the reservoir and building a mechanical earth model play a strong role in the hydrocarbon reservoir life cycle and are key factors in analyzing wellbore instability, optimizing drilling operations, and designing hydraulic fracturing operations. Therefore, the present study focuses on selecting the candidate zone for hydraulic fracturing through a novel approach that simultaneously considers diagenetic, petrophysical, and geomechanical properties. The diagenetic processes were analyzed to determine the porosity types in the reservoir. After that, based on laboratory test results for estimating reservoir petrophysical parameters, the zones with suitable reservoir properties were selected. Moreover, based on the reservoir geomechanical parameters and the constructed mechanical earth model, the best zones were selected for the hydraulic fracturing operation in one of the Iranian fractured carbonate reservoirs. Finally, a new empirical equation for estimating pore pressure in nine zones of the studied well was developed. This equation provides a more precise estimation of stress profiles and thus leads to more accurate decision-making for candidate zone selection. Based on the results, vuggy porosity was the best porosity type, and zones C2, E2 and G2, having suitable values of porosity, permeability, and water saturation, showed good reservoir properties. Zones E2 and G2 were then chosen as candidates for hydraulic fracturing simulation based on their E (Young's modulus) and ν (Poisson's ratio) values. Based on the mechanical earth model and changes in the acoustic data versus depth, a new equation is introduced for calculating the pore pressure in the studied reservoir. According to the new equation, the dominant stress regime in the whole well, especially in the candidate zones, is SigHmax > SigV > Sighmin, whereas according to the pore pressure equation presented in the literature, the dominant stress regime in the studied well turns out to be SigHmax > Sighmin > SigV.
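For reference, the dynamic elastic parameters used for zone selection follow from standard rock physics relations; the sketch below computes ν and E from acoustic velocities and bulk density (the input values are illustrative, not the studied well's data):

# Hedged sketch: dynamic Poisson's ratio (nu) and Young's modulus (E)
# from P/S-wave velocities and bulk density, via standard relations.
# Input values are placeholders for illustration only.
import numpy as np

vp = np.array([4500.0, 4800.0])    # P-wave velocity, m/s
vs = np.array([2500.0, 2700.0])    # S-wave velocity, m/s
rho = np.array([2650.0, 2700.0])   # bulk density, kg/m^3

nu = (vp**2 - 2 * vs**2) / (2 * (vp**2 - vs**2))   # Poisson's ratio
G = rho * vs**2                                    # shear modulus, Pa
E = 2 * G * (1 + nu)                               # Young's modulus, Pa
print(nu, E / 1e9)                                 # E reported in GPa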


Author(s):  
Yanxiang Yu ◽  

Chicheng Xu ◽  
Siddharth Misra ◽  
Weichang Li ◽  
...  

Compressional and shear sonic traveltime logs (DTC and DTS, respectively) are crucial for subsurface characterization and seismic-well tie. However, these two logs are often missing or incomplete in many oil and gas wells. Therefore, many petrophysical and geophysical workflows include sonic log synthesis or pseudolog generation based on multivariate regression or rock physics relations. The SPWLA PDDA SIG hosted a contest, which started on March 1, 2020, and concluded on May 7, 2020, aiming to predict the DTC and DTS logs from seven “easy-to-acquire” conventional logs using machine-learning methods (GitHub, 2020). In the contest, a total of 20,525 data points with half-foot resolution from three wells were collected to train regression models using machine-learning techniques. Each data point had seven features, consisting of the conventional “easy-to-acquire” logs: caliper, neutron porosity, gamma ray (GR), deep resistivity, medium resistivity, photoelectric factor, and bulk density, as well as the two sonic logs (DTC and DTS) as the target. A separate data set of 11,089 samples from a fourth well was then used as the blind test data set. The prediction performance of the model was evaluated using root mean square error (RMSE) as the metric, shown in the equation below:

RMSE = \sqrt{ \frac{1}{2m} \sum_{i=1}^{m} \left[ \left( \mathrm{DTC}_{\mathrm{pred}}^{i} - \mathrm{DTC}_{\mathrm{true}}^{i} \right)^{2} + \left( \mathrm{DTS}_{\mathrm{pred}}^{i} - \mathrm{DTS}_{\mathrm{true}}^{i} \right)^{2} \right] }

In the benchmark model (Yu et al., 2020), we used a Random Forest regressor and applied minimal preprocessing to the training data set; an RMSE score of 17.93 was achieved on the test data set. The top five models from the contest, on average, beat our benchmark model by 27% in RMSE score. In this paper, we review these five solutions, including their preprocessing techniques and machine-learning models such as neural networks, long short-term memory (LSTM), and ensemble trees. We found that data cleaning and clustering were critical for improving performance in all models.
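A minimal sketch of such a benchmark, assuming scikit-learn's RandomForestRegressor and random placeholder arrays in place of the contest wells; hyperparameters are illustrative, not the contest code:

# Hedged sketch: Random Forest mapping seven conventional logs to DTC and
# DTS, scored with the contest RMSE. Arrays are placeholders.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def contest_rmse(y_true, y_pred):
    # y columns: [DTC, DTS]; averaging squared errors over both targets
    # and all m samples reproduces the formula above.
    return np.sqrt(np.mean((y_pred - y_true) ** 2))

rng = np.random.default_rng(0)
X_train, y_train = rng.random((20525, 7)), rng.random((20525, 2))  # 3 wells
X_test, y_test = rng.random((11089, 7)), rng.random((11089, 2))    # blind well

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)    # multi-output regression: DTC and DTS jointly
print("RMSE:", contest_rmse(y_test, model.predict(X_test)))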


Author(s):  
Aska E. Mehyadin ◽  
Adnan Mohsin Abdulazeez ◽  
Dathar Abas Hasan ◽  
Jwan N. Saeed

The bird classifier is a system equipped with machine-learning technology that stores and classifies bird calls. A bird species can be identified by recording only its sound, which makes the data easier for the system to manage. The system also provides species classification resources to allow automated species detection from observations, teaching a machine to recognize and classify the species. Undesirable noise is filtered out and the recordings are sorted into data sets: each sound is run through a noise suppression filter and a separate classification procedure so that the most useful data set can be easily processed. Mel-frequency cepstral coefficients (MFCCs) are used and tested with different algorithms, namely Naïve Bayes, J4.8 and multilayer perceptron (MLP), to classify bird species. J4.8 performed best, with the highest accuracy (78.40%) and an elapsed time of 39.4 seconds.
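A minimal sketch of the MFCC feature-extraction step, assuming the librosa library and a synthetic tone standing in for a recorded call; the paper's noise-suppression filter and Weka classifiers are not reproduced:

# Hedged sketch: MFCC features from an audio clip, then a simple MLP.
# The synthetic signal and tiny training set are placeholders.
import numpy as np
import librosa
from sklearn.neural_network import MLPClassifier

sr = 22050
# One-second synthetic tone; in practice: y, sr = librosa.load("clip.wav")
t = np.linspace(0, 1, sr, endpoint=False)
y = np.sin(2 * np.pi * 3000 * t).astype(np.float32)

mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)   # shape (13, frames)
feature = mfcc.mean(axis=1)                          # one vector per clip

# Placeholder training set: two clips, two species labels.
X = np.stack([feature, feature * 0.9])
labels = np.array([0, 1])
clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000).fit(X, labels)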


Electronics ◽  
2022 ◽  
Vol 11 (2) ◽  
pp. 245
Author(s):  
Konstantinos G. Liakos ◽  
Georgios K. Georgakilas ◽  
Fotis C. Plessas ◽  
Paris Kitsos

A significant problem in the field of hardware security is hardware trojan (HT) viruses. HTs can be inserted into a circuit at any phase of the production chain. They degrade the infected circuit, destroy it or leak encrypted data. Nowadays, efforts are being made to address HTs through machine learning (ML) techniques, mainly at the gate-level netlist (GLN) phase, but there are some restrictions. Specifically, the number and variety of normal and infected circuits available through free public libraries, such as Trust-HUB, are limited to the few benchmark samples that have been created from large circuits. Thus, it is difficult, based on these data, to develop robust ML-based models against HTs. In this paper, we propose a new deep learning (DL) tool named Generative Artificial Intelligence Netlists SynthesIS (GAINESIS). GAINESIS is based on the Wasserstein Conditional Generative Adversarial Network (WCGAN) algorithm and area–power analysis features from the GLN phase, and it synthesizes new normal and infected circuit samples for this phase. Based on our GAINESIS tool, we synthesized new data sets of different sizes and developed and compared seven ML classifiers. The results demonstrate that our newly generated data sets significantly enhance the performance of ML classifiers compared with the initial Trust-HUB data set.
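A minimal sketch of one conditional Wasserstein GAN training step with weight clipping; the dimensions, architectures, and placeholder data are assumptions, and this is not the GAINESIS implementation:

# Hedged sketch: conditional WGAN step, conditioning on a normal/infected
# label via one-hot concatenation. FEAT stands in for area/power features.
import torch
import torch.nn as nn

FEAT, LATENT, N_CLASSES = 16, 8, 2   # assumed sizes

G = nn.Sequential(nn.Linear(LATENT + N_CLASSES, 64), nn.ReLU(),
                  nn.Linear(64, FEAT))
C = nn.Sequential(nn.Linear(FEAT + N_CLASSES, 64), nn.LeakyReLU(0.2),
                  nn.Linear(64, 1))   # critic: unbounded real-valued score

opt_g = torch.optim.RMSprop(G.parameters(), lr=5e-5)
opt_c = torch.optim.RMSprop(C.parameters(), lr=5e-5)

real = torch.randn(32, FEAT)         # placeholder "real" circuit features
y = torch.nn.functional.one_hot(
    torch.randint(0, N_CLASSES, (32,)), N_CLASSES).float()

# Critic step: maximize E[C(real)] - E[C(fake)].
fake = G(torch.cat([torch.randn(32, LATENT), y], dim=1)).detach()
loss_c = C(torch.cat([fake, y], 1)).mean() - C(torch.cat([real, y], 1)).mean()
opt_c.zero_grad()
loss_c.backward()
opt_c.step()
for p in C.parameters():
    p.data.clamp_(-0.01, 0.01)       # weight clipping enforces Lipschitz bound

# Generator step: maximize E[C(fake)].
fake = G(torch.cat([torch.randn(32, LATENT), y], dim=1))
loss_g = -C(torch.cat([fake, y], 1)).mean()
opt_g.zero_grad()
loss_g.backward()
opt_g.step()

Published WCGAN variants often replace weight clipping with a gradient penalty; the clipping form is used here only for brevity.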


Geophysics ◽  
2018 ◽  
Vol 83 (4) ◽  
pp. M41-M48 ◽  
Author(s):  
Hongwei Liu ◽  
Mustafa Naser Al-Ali

The ideal approach for continuous reservoir monitoring allows generation of fast and accurate images to cope with the massive data sets acquired for such a task. Conventionally, rigorous depth-oriented velocity-estimation methods are performed to produce sufficiently accurate velocity models. Unlike the traditional way, target-oriented imaging technology based on the common-focus point (CFP) theory can be an alternative for continuous reservoir monitoring. The solution is based on a robust data-driven iterative operator-updating strategy that does not require a detailed velocity model. The same focusing operator is applied to successive 3D seismic data sets for the first time to generate efficient and accurate 4D target-oriented seismic stacked images from time-lapse field seismic data sets acquired in a CO2 injection project in Saudi Arabia. Using the focusing operator, target-oriented prestack angle-domain common-image gathers (ADCIGs) could be derived to perform amplitude-versus-angle analysis. To preserve the amplitude information in the ADCIGs, an amplitude-balancing factor is applied by embedding a synthetic data set using the real acquisition geometry to remove the geometry imprint artifact. Applying the CFP-based target-oriented imaging to the time-lapse data sets revealed changes at the reservoir level in the poststack and prestack time-lapse signals, which are consistent with the CO2 injection history and rock physics.


Author(s):  
Brendan Juba ◽  
Hai S. Le

Practitioners of data mining and machine learning have long observed that the imbalance of classes in a data set negatively impacts the quality of classifiers trained on that data. Numerous techniques for coping with such imbalances have been proposed, but nearly all lack any theoretical grounding. By contrast, the standard theoretical analysis of machine learning admits no dependence on the imbalance of classes at all. The basic theorems of statistical learning establish the number of examples needed to estimate the accuracy of a classifier as a function of its complexity (VC dimension) and the confidence desired; the class imbalance does not enter these formulas anywhere. In this work, we consider measures of classifier performance in terms of precision and recall, measures that are widely suggested as more appropriate for the classification of imbalanced data. We observe that whenever the precision is moderately large, the worse of the precision and recall is within a small constant factor of the accuracy weighted by the class imbalance. A corollary of this observation is that a larger number of examples is necessary and sufficient to address class imbalance, a finding we also illustrate empirically.
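A toy confusion matrix illustrates why unweighted accuracy is misleading under imbalance and why precision and recall are the more informative measures; the numbers are chosen for illustration only:

# Hedged numeric illustration: with a 1:99 class imbalance (10 positives
# out of 1,000 samples), a classifier can score high accuracy while
# precision and recall on the rare class remain low.
TP, FN, FP, TN = 5, 5, 10, 980

precision = TP / (TP + FP)                   # 0.333
recall = TP / (TP + FN)                      # 0.500
accuracy = (TP + TN) / (TP + FN + FP + TN)   # 0.985
print(precision, recall, accuracy)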


2021 ◽  
Vol 40 (1) ◽  
pp. 68-71
Author(s):  
Haibin Di ◽  
Anisha Kaul ◽  
Leigh Truelove ◽  
Weichang Li ◽  
Wenyi Hu ◽  
...  

We present a data challenge as part of the hackathon planned for the August 2021 SEG Research Workshop on Data Analytics and Machine Learning for Exploration and Production. The hackathon aims to provide hands-on machine learning experience for beginners and advanced practitioners, using a relatively well-defined problem and a carefully curated data set. The seismic data are from New Zealand's Taranaki Basin. The labels for a subset of the data have been generated by an experienced geologist. The objective of the challenge is to develop innovative machine learning solutions to identify key horizons.

