Development of Machine Learning Based Propped Fracture Conductivity Correlations in Shale Formations

2021
Author(s):
Mahmoud Desouky
Zeeshan Tariq
Murtada Al jawad
Hamed Alhoori
Mohamed Mahmoud
...  

Abstract Propped hydraulic fracturing is a stimulation technique used in tight formations to create conductive fractures. To predict fractured well productivity, the conductivity of those propped fractures should be estimated. It is common to measure the conductivity of propped fractures in the laboratory under controlled conditions. Laboratory measurement, however, is costly and time-consuming, which has encouraged the development of many empirical and analytical propped fracture conductivity models. Previous empirical models, though, were based on limited datasets and produced questionable correlations. We propose herein new empirical models based on an extensive data set, utilizing machine learning (ML) methods. In this study, an artificial neural network (ANN) was utilized. A dataset comprising 351 data points from propped hydraulic fracture experiments on different shale types, with different mineralogy, under various confining stresses was collected and studied. Several statistical and data science approaches, such as box and whisker plots, correlation crossplots, and the Z-score technique, were used to remove outliers and extreme data points. The performance of the developed model was evaluated using metrics such as the correlation coefficient and root mean squared error. After several executions and function evaluations, an ANN was found to be the best technique to predict propped fracture conductivity for different mineralogies. The proposed ANN models resulted in less than 7% error between actual and predicted values. In addition to the development of an optimized ANN model, explicit empirical correlations are extracted from the weights and biases of the fine-tuned model. The proposed propped fracture conductivity model was then compared with the commonly available correlations. The results revealed that the proposed mineralogy-based propped fracture conductivity models made predictions with a high correlation coefficient of 94%. This work clearly shows the potential of computer-based ML techniques in the determination of mineralogy-based propped fracture conductivity. The proposed empirical correlation can be implemented without requiring any ML-based software.
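As a minimal sketch of the workflow this abstract describes, the snippet below applies Z-score outlier removal and then converts a small trained one-hidden-layer network into an explicit correlation from its weights and biases. The feature names, synthetic data, and network size are illustrative assumptions, not the authors' actual setup.

```python
# Hedged sketch: Z-score outlier removal, then an explicit correlation
# extracted from a one-hidden-layer tanh network's weights and biases.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
# Hypothetical inputs: confining stress, proppant concentration, quartz fraction.
X = rng.uniform([1000, 0.5, 0.2], [8000, 4.0, 0.8], size=(351, 3))
y = 50 + 0.01 * X[:, 0] + 30 * X[:, 1] + 100 * X[:, 2] + rng.normal(0, 5, 351)

# Z-score outlier removal: keep rows whose standardized features lie within 3 sigma.
z = np.abs((X - X.mean(axis=0)) / X.std(axis=0))
mask = (z < 3).all(axis=1)
X, y = X[mask], y[mask]

# A single tanh hidden layer keeps the fitted model easy to write out explicitly.
net = MLPRegressor(hidden_layer_sizes=(5,), activation="tanh",
                   max_iter=5000, random_state=0).fit(X, y)

# Explicit correlation from the trained weights and biases:
#   y_hat = W2^T . tanh(W1^T x + b1) + b2
W1, W2 = net.coefs_
b1, b2 = net.intercepts_
x = X[0]
y_explicit = W2.T @ np.tanh(W1.T @ x + b1) + b2
assert np.allclose(y_explicit, net.predict(x.reshape(1, -1)))
```

Written out this way, the correlation can be evaluated in a spreadsheet or hand calculation, which is what allows the proposed model to be used without ML-based software.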

2014
Vol 7 (4)
pp. 132-143
Author(s):
ABBAS M. ABD
SAAD SH. SAMMEN

The prediction of hydrological phenomena (or systems) plays an increasing role in the management of water resources. Engineers are required to predict the components of natural reservoir inflow for numerous purposes, and the appropriate prediction technique varies with the intended purpose, the system characteristics, and the documented data. Because most hydrological parameters are subject to uncertainty, identifying the best prediction method is of great interest to experts. An Artificial Neural Network (ANN) approach has been adopted in this paper to predict Hemren reservoir inflow. The available data, including the monthly discharge supplied from the DerbendiKhan reservoir and the rainfall intensity over the intermediate catchment area between the Hemren and DerbendiKhan dams, were used. The Levenberg-Marquardt backpropagation (LMBP) algorithm was utilized to construct the ANN models. For the developed ANN model, networks with different numbers of neurons and layers were evaluated. A total of 24 years of historical data, covering the interval from 1980 to 2004, were used to train and test the networks. The optimum network, with 3 inputs, 40 neurons in each of two hidden layers, and one output, was selected. The Mean Squared Error (MSE) and the Correlation Coefficient (CC) were employed to evaluate the accuracy of the proposed model. Trained with an early-stopping approach, the network converged at MSE = 0.027 on the training data and forecast the testing data set with an accuracy of MSE = 0.031. The training and testing processes showed correlation coefficients of 0.97 and 0.77, respectively, indicating the high precision of this prediction technique.
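A minimal sketch of the stated topology (3 inputs, two hidden layers of 40 neurons, one output, early stopping) is given below. scikit-learn does not implement Levenberg-Marquardt, so Adam stands in for LMBP here, and synthetic placeholders replace the discharge and rainfall records.

```python
# Sketch of the paper's network shape under stated assumptions; the data
# are random placeholders, not the Hemren-DerbendiKhan records.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)
# Hypothetical monthly records: DerbendiKhan discharge, rainfall intensity, month index.
X = rng.random((288, 3))            # 24 years x 12 months
y = 0.6 * X[:, 0] + 0.3 * X[:, 1] + rng.normal(0, 0.05, 288)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
model = MLPRegressor(hidden_layer_sizes=(40, 40), early_stopping=True,
                     validation_fraction=0.15, max_iter=2000, random_state=1)
model.fit(X_train, y_train)

# Evaluate with the same metrics the paper reports: MSE and CC.
pred = model.predict(X_test)
mse = np.mean((pred - y_test) ** 2)
cc = np.corrcoef(pred, y_test)[0, 1]
print(f"test MSE={mse:.3f}, CC={cc:.2f}")
```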


Author(s):  
Ritu Khandelwal
Hemlata Goyal
Rajveer Singh Shekhawat

Introduction: Machine learning is an intelligent technology that works as a bridge between business and data science. With the involvement of data science, the business goal focuses on findings that yield valuable insights from available data. A large part of Indian cinema is Bollywood, a multi-million dollar industry. This paper attempts to predict whether an upcoming Bollywood movie will be a Blockbuster, Superhit, Hit, Average, or Flop, applying machine learning techniques for classification and prediction. To build a classifier or prediction model, the first step is the learning stage, in which a training data set is used to train the model with some technique or algorithm; the rules generated in this stage form the model used to predict future trends in different types of organizations. Methods: Classification and prediction techniques including Support Vector Machine (SVM), Random Forest, Decision Tree, Naïve Bayes, Logistic Regression, AdaBoost, and KNN are applied to find the most efficient and effective results. All of these functionalities can be applied through GUI-based workflows organized into categories such as Data, Visualize, Model, and Evaluate. Result: The trained models generate rules from the training data set which form the prediction models for forecasting future trends in different types of organizations. Conclusion: This paper focuses on a comparative analysis performed with different parameters, such as accuracy and the confusion matrix, to identify the best possible model for predicting a movie's success. Using advertisement propaganda, production houses can plan the best time to release a movie according to its predicted success rate and gain higher benefits. Discussion: Data mining is the process of discovering patterns in large data sets, and from these patterns various relationships are discovered that help solve business problems and predict forthcoming trends. Such predictions can help production houses with advertisement propaganda and cost planning, and by accounting for these factors they can make a movie more profitable.
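A sketch of such a comparative analysis follows, scoring the listed classifiers on accuracy and confusion matrices. The synthetic data are placeholders; real movie features (budget, cast, genre, and so on) would replace them.

```python
# Illustrative comparison of the classifiers named above on placeholder data.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Five classes standing in for Blockbuster/Superhit/Hit/Average/Flop.
X, y = make_classification(n_samples=600, n_features=12, n_informative=8,
                           n_classes=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

models = {
    "SVM": SVC(), "RandomForest": RandomForestClassifier(),
    "DecisionTree": DecisionTreeClassifier(), "NaiveBayes": GaussianNB(),
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "AdaBoost": AdaBoostClassifier(), "KNN": KNeighborsClassifier(),
}
for name, clf in models.items():
    y_pred = clf.fit(X_tr, y_tr).predict(X_te)
    print(name, accuracy_score(y_te, y_pred))
    print(confusion_matrix(y_te, y_pred))
```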


2021
Vol 21 (1)
Author(s):
Ann-Marie Mallon
Dieter A. Häring
Frank Dahlke
Piet Aarden
Soroosh Afyouni
...  

Abstract Background Novartis and the University of Oxford’s Big Data Institute (BDI) have established a research alliance with the aim of improving health care and drug development by making them more efficient and targeted. Using a combination of the latest statistical machine learning technology and an innovative IT platform developed to manage large volumes of anonymised data from numerous data sources and types, we plan to identify novel, clinically relevant patterns which cannot be detected by humans alone, in order to identify phenotypes and early predictors of patient disease activity and progression. Method The collaboration focuses on highly complex autoimmune diseases and is developing a computational framework to assemble a research-ready dataset across numerous modalities. For the Multiple Sclerosis (MS) project, the collaboration has anonymised and integrated phase II to phase IV clinical and imaging trial data from ≈35,000 patients across all clinical phenotypes, collected in more than 2200 centres worldwide. For the “IL-17” project, the collaboration has anonymised and integrated clinical and imaging data from over 30 phase II and III Cosentyx clinical trials, including more than 15,000 patients suffering from four autoimmune disorders (psoriasis, axial spondyloarthritis, psoriatic arthritis (PsA), and rheumatoid arthritis (RA)). Results A fundamental component of successful data analysis, and of the collaborative development of novel machine learning methods on these rich data sets, has been the construction of a research informatics framework that captures the data at regular intervals, anonymises images and integrates them with the de-identified clinical data, applies quality control, and compiles everything into a research-ready relational database available to multi-disciplinary analysts. The collaborative development by a group of software developers, data wranglers, statisticians, clinicians, and domain scientists across both organisations has been key. This framework is innovative in that it facilitates collaborative data management and makes a complicated clinical trial data set from a pharmaceutical company available to academic researchers who become associated with the project. Conclusions An informatics framework has been developed to capture clinical trial data into a pipeline of anonymisation, quality control, data exploration, and subsequent integration into a database. Establishing this framework has been integral to the development of analytical tools.
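As a loose illustration of the pipeline shape described (de-identification, quality control, loading into a relational database), the sketch below uses SQLite and hash-based pseudonymisation. All field names and QC rules are assumptions; the actual framework is far more elaborate.

```python
# Highly simplified pipeline sketch: de-identify, quality-control, load.
import hashlib
import sqlite3

raw_records = [
    {"patient_id": "NV-001", "age": 42, "score": 3.5},
    {"patient_id": "NV-002", "age": -1, "score": 2.0},   # fails QC below
]

def pseudonymise(rec):
    # Replace the direct identifier with a one-way hash (illustrative only;
    # real anonymisation involves far more than hashing one field).
    rec = dict(rec)
    rec["patient_id"] = hashlib.sha256(rec["patient_id"].encode()).hexdigest()[:12]
    return rec

def passes_qc(rec):
    # Example quality-control rule: ages must be plausible.
    return 0 < rec["age"] < 120

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clinical (pid TEXT, age INTEGER, score REAL)")
for rec in map(pseudonymise, raw_records):
    if passes_qc(rec):
        conn.execute("INSERT INTO clinical VALUES (?, ?, ?)",
                     (rec["patient_id"], rec["age"], rec["score"]))
conn.commit()
print(conn.execute("SELECT COUNT(*) FROM clinical").fetchone())  # (1,)
```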


2021
Author(s):  
Hangsik Shin

BACKGROUND Arterial stiffness due to vascular aging is a major indicator in evaluating cardiovascular risk. OBJECTIVE In this study, we propose a method of estimating age by applying machine learning to the photoplethysmogram (PPG) for non-invasive vascular age assessment. METHODS The machine learning based age estimation model, which consists of three convolutional layers and two fully connected layers, was developed using pulse-segmented photoplethysmograms from a total of 752 adults aged 19–87 years. The performance of the developed model was quantitatively evaluated using the mean absolute error, root mean squared error, Pearson’s correlation coefficient, and coefficient of determination. Grad-CAM was used to explain the contribution of photoplethysmogram waveform characteristics to vascular age estimation. RESULTS Through 10-fold cross validation, the model achieved a mean absolute error of 8.03, a root mean squared error of 9.96, a correlation coefficient of 0.62, and a coefficient of determination of 0.38. Grad-CAM, used to determine how strongly each part of the input signal contributes to the result, confirmed that the contribution of the photoplethysmogram segment to age estimation was highest around the systolic peak. CONCLUSIONS The machine learning based vascular aging analysis method using the PPG waveform showed comparable or superior performance to previous studies, without requiring complex feature detection. CLINICALTRIAL 2015-0104
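A sketch of a model matching the stated architecture, three 1-D convolutional layers followed by two fully connected layers, is given below in PyTorch. Kernel sizes, channel counts, and the 200-sample segment length are illustrative assumptions, not the study's actual configuration.

```python
# Sketch of a 3-conv + 2-FC regression network for single-pulse PPG segments.
import torch
import torch.nn as nn

class PPGAgeNet(nn.Module):
    def __init__(self, segment_len=200):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=7, padding=3), nn.ReLU(), nn.MaxPool1d(2),
            nn.Conv1d(16, 32, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool1d(2),
            nn.Conv1d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool1d(2),
        )
        self.regressor = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * (segment_len // 8), 64), nn.ReLU(),
            nn.Linear(64, 1),  # predicted vascular age
        )

    def forward(self, x):          # x: (batch, 1, segment_len)
        return self.regressor(self.features(x))

model = PPGAgeNet()
dummy = torch.randn(4, 1, 200)     # four placeholder pulse segments
print(model(dummy).shape)          # torch.Size([4, 1])
```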


2021
Vol 20
pp. 415-430
Author(s):
Juthaphorn Sinsomboonthong
Saichon Sinsomboonthong

A proposed estimator, namely the weighted maximum likelihood (WML) correlation coefficient, for measuring the relationship between two variables in the presence of missing values and outliers in the dataset is presented. The estimator is derived by applying the conditional probability function to handle missing values and to give greater weight to values near the center, while outliers in the dataset are assigned only slight weight. These techniques make the proposed method robust when the preliminary assumptions of the data analysis are not met. To inspect the quality of the proposed estimator, six methods (WML, Pearson, median, percentage bend, biweight mid, and composite correlation coefficients) are compared on two criteria, namely bias and mean squared error, via a simulation study. The results on generated data illustrate that the WML estimator has the best performance in withstanding missing values and outliers in the dataset, especially for tiny sample sizes and large percentages of outliers, regardless of missing data levels. For massive sample sizes, however, the median correlation coefficient appears to be a good estimator when the linear relationship between the two variables is approximately over 0.4, irrespective of outlier and missing data levels.
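The snippet below is not the authors' WML estimator, but a simple weighted-correlation sketch of the idea the abstract describes: observations far from the centre receive small weights, so outliers barely influence the estimate. The weighting rule is an illustrative assumption.

```python
# Weighted Pearson correlation with an illustrative downweighting rule.
import numpy as np

def weighted_corr(x, y, w):
    # Weighted Pearson correlation with nonnegative weights w.
    mx, my = np.average(x, weights=w), np.average(y, weights=w)
    cov = np.average((x - mx) * (y - my), weights=w)
    return cov / np.sqrt(np.average((x - mx) ** 2, weights=w) *
                         np.average((y - my) ** 2, weights=w))

rng = np.random.default_rng(2)
x = rng.normal(size=200)
y = 0.8 * x + rng.normal(scale=0.6, size=200)
x[:10] += 8; y[:10] -= 8                     # inject 5% outliers

# Downweight points with large robust z-scores (one possible weighting rule).
z = np.hypot((x - np.median(x)) / np.std(x), (y - np.median(y)) / np.std(y))
w = 1.0 / (1.0 + z ** 2)

print("plain Pearson :", np.corrcoef(x, y)[0, 1])   # pulled down by outliers
print("weighted      :", weighted_corr(x, y, w))    # closer to the true 0.8
```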


Author(s):  
Yang-Hui He

Calabi-Yau spaces, or Kähler spaces admitting zero Ricci curvature, have played a pivotal role in theoretical physics and pure mathematics for the last half century. In physics, they constituted the first and natural solution to the compactification of superstring theory to our 4-dimensional universe, primarily because one of their equivalent definitions is the admittance of covariantly constant spinors. Since the mid-1980s, physicists and mathematicians have joined forces in creating explicit examples of Calabi-Yau spaces, compiling databases of formidable size, including the complete intersection (CICY) data set, the weighted hypersurfaces data set, the elliptic-fibration data set, the Kreuzer-Skarke toric hypersurface data set, generalized CICYs, etc., totaling at least on the order of 10^10 manifolds. These all contribute to the vast string landscape, the multitude of possible vacuum solutions to string compactification. More recently, this collaboration has been enriched by computer science and data science, the former in benchmarking the complexity of the algorithms for computing geometric quantities, and the latter in applying techniques such as machine learning to extract unexpected information. These endeavours, inspired by the physics of the string landscape, have rendered the investigation of Calabi-Yau spaces one of the most exciting and interdisciplinary fields.


2021
Vol 11 (24)
pp. 11710
Author(s):
Matteo Miani
Matteo Dunnhofer
Fabio Rondinella
Evangelos Manthos
Jan Valentin
...  

This study introduces a machine learning approach based on Artificial Neural Networks (ANNs) for the prediction of Marshall test results, stiffness modulus, and air voids data of different bituminous mixtures for road pavements. A novel approach for an objective and semi-automatic identification of the optimal ANN structure, defined by the so-called hyperparameters, is introduced and discussed. Mechanical and volumetric data were obtained by conducting laboratory tests on 320 Marshall specimens, and the results were used to train the neural network. The k-fold cross validation method was used to partition the available data set and obtain an unbiased evaluation of the model's predictive error. The ANN hyperparameters were optimized using Bayesian optimization, which efficiently replaced the more costly trial-and-error procedure and automated the hyperparameter tuning. The proposed ANN model is characterized by a Pearson coefficient value of 0.868.
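A hedged sketch of this tuning loop is shown below, with Optuna's default TPE sampler standing in for the study's Bayesian optimization and 5-fold cross validation estimating predictive error. The search space and synthetic data are illustrative assumptions.

```python
# Bayesian-style hyperparameter tuning of a small ANN with k-fold CV.
import numpy as np
import optuna
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(3)
X = rng.random((320, 6))                           # placeholder mix-design features
y = X @ rng.random(6) + rng.normal(0, 0.05, 320)   # placeholder mechanical target

def objective(trial):
    model = MLPRegressor(
        hidden_layer_sizes=(trial.suggest_int("units", 8, 64),),
        alpha=trial.suggest_float("alpha", 1e-5, 1e-1, log=True),
        learning_rate_init=trial.suggest_float("lr", 1e-4, 1e-1, log=True),
        max_iter=2000, random_state=3)
    # 5-fold CV gives an unbiased estimate of predictive performance.
    return cross_val_score(model, X, y, cv=5, scoring="r2").mean()

study = optuna.create_study(direction="maximize")  # TPE sampler by default
study.optimize(objective, n_trials=25)
print(study.best_params, study.best_value)
```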


2021
Vol 2070 (1)
pp. 012145
Author(s):
R Shiva Shankar
CH Raminaidu
VV Sivarama Raju
J Rajanikanth

Abstract Epilepsy is a chronic neurological illness that affects around 50 million people worldwide. It is estimated that if epilepsy were correctly diagnosed and treated, up to 70% of people with the condition would be seizure-free. There is therefore a need to detect epilepsy at its initial stages so that symptoms can be reduced through medication and other strategies. We used the Epileptic Seizure Recognition dataset provided by the UCI Machine Learning Repository to train the model; this dataset contains 179 attributes and 11,500 unique values. MLP, PCA with RF, QDA, LDA, and PCA with ANN were applied; among them, PCA with ANN provided the best metrics: 97.55% accuracy, 94.24% precision, 91.48% recall, 83.38% hinge loss, and 2.32% mean squared error.
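A sketch of the best-performing combination, PCA feeding an ANN, follows as a scikit-learn pipeline. The component count, network size, and the assumption of 178 feature columns plus a label (matching the dataset's 179 attributes) are illustrative, not the paper's configuration.

```python
# PCA -> ANN classification pipeline on placeholder EEG-like data.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(4)
X = rng.normal(size=(1000, 178))          # placeholder EEG feature rows
y = rng.integers(0, 2, size=1000)         # 1 = seizure, 0 = non-seizure

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=4)
clf = make_pipeline(StandardScaler(),
                    PCA(n_components=30),          # dimensionality reduction
                    MLPClassifier(hidden_layer_sizes=(64,), max_iter=500,
                                  random_state=4))  # the "ANN" stage
clf.fit(X_tr, y_tr)
print("accuracy:", clf.score(X_te, y_te))
```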


2021
Vol 2021
pp. 1-13
Author(s):
Andrei Bratu
Gabriela Czibula

Data augmentation is a commonly used technique in data science for improving the robustness and performance of machine learning models. The purpose of this paper is to study the feasibility of generating synthetic data points of a temporal nature toward this end. A general approach named DAuGAN (Data Augmentation using Generative Adversarial Networks) is presented for identifying poorly represented sections of a time series, synthesizing and integrating new data points, and measuring the resulting performance improvement on a benchmark machine learning model. The problem is studied and applied in the domain of algorithmic trading, whose constraints are presented and taken into consideration. The experimental results highlight a performance improvement for a benchmark reinforcement learning agent trained on a DAuGAN-enhanced dataset to trade a financial instrument.
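A minimal GAN sketch in the spirit of DAuGAN is shown below: a generator learns to emit short synthetic time-series windows while a discriminator learns to distinguish them from real ones. The window length, network sizes, and sine-wave "real" data are placeholders, not the paper's setup.

```python
# Toy GAN generating short time-series windows for augmentation.
import torch
import torch.nn as nn

WINDOW, LATENT = 24, 8

G = nn.Sequential(nn.Linear(LATENT, 64), nn.ReLU(), nn.Linear(64, WINDOW))
D = nn.Sequential(nn.Linear(WINDOW, 64), nn.LeakyReLU(0.2), nn.Linear(64, 1))

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

def real_batch(n=32):
    # Placeholder "real" series: noisy sine segments with random phase.
    t = torch.linspace(0, 3.14, WINDOW)
    phase = torch.rand(n, 1) * 6.28
    return torch.sin(t + phase) + 0.05 * torch.randn(n, WINDOW)

for step in range(500):
    real = real_batch()
    fake = G(torch.randn(32, LATENT))
    # Discriminator step: push real toward 1, fake toward 0.
    loss_d = bce(D(real), torch.ones(32, 1)) + bce(D(fake.detach()), torch.zeros(32, 1))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()
    # Generator step: try to fool the discriminator.
    loss_g = bce(D(fake), torch.ones(32, 1))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()

synthetic = G(torch.randn(5, LATENT)).detach()   # augmentation candidates
print(synthetic.shape)                           # torch.Size([5, 24])
```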


Author(s):  
Jonathan M. Gumley
Hayden Marcollo
Stuart Wales
Andrew E. Potts
Christopher J. Carra

Abstract There is growing importance in the offshore floating production sector to develop reliable and robust means of continuously monitoring the integrity of mooring systems for FPSOs and FPUs, particularly in light of the upcoming introduction of API-RP-2MIM. Here, the limitations of the current range of monitoring techniques, including well-established technologies such as load cells, sonar, and visual inspection, are discussed within the context of the growing mainstream acceptance of data science and machine learning. Given the large fleet of floating production platforms currently in service, there is a need for a readily deployable solution that can be retrofitted to existing platforms to passively monitor the performance of floating assets on their moorings, for which machine learning based systems have particular advantages. An earlier investigation, conducted in 2016 on a shallow-water, single-point-moored FPSO, employed host facility data from in-service field measurements before and after a single mooring line failure event. This paper presents how the same machine learning techniques were applied to a deep-water, semi-taut, spread-moored system for which no host facility data were available, requiring a calibrated hydrodynamic numerical model to be used as the basis for the training data set. The machine learning techniques applied to both real and synthetically generated data were successful in replicating the response of the original system, even with the latter subjected to different variations of artificial noise. Furthermore, utilizing a probability-based approach, it was demonstrated that replicating the response of the underlying system is a powerful technique for predicting changes in the mooring system.
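A hedged sketch of this idea follows: a classifier trained on synthetically generated platform responses (standing in for the calibrated hydrodynamic model) with artificial noise added, whose averaged class probabilities flag a possible line failure. All signals and features are illustrative placeholders.

```python
# Probability-based change detection trained on synthetic response data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(5)

def simulate_response(failed, n=500):
    # Placeholder "numerical model": mean offset of the platform response
    # shifts slightly when one mooring line is lost.
    base = rng.normal(0.4 if failed else 0.0, 1.0, size=(n, 4))
    return base + rng.normal(0, 0.2, size=(n, 4))   # artificial noise

X = np.vstack([simulate_response(False), simulate_response(True)])
y = np.array([0] * 500 + [1] * 500)                 # 1 = line failure

clf = RandomForestClassifier(random_state=5).fit(X, y)

# Probability-based monitoring: average the failure probability over a window
# of recent measurements rather than reacting to single noisy samples.
window = simulate_response(True, n=50)
p_fail = clf.predict_proba(window)[:, 1].mean()
print(f"estimated failure probability: {p_fail:.2f}")
```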

