Application of Machine Learning to Interpret Steady State Drainage Relative Permeability Experiments

2021 ◽  
Author(s):  
Eric Sonny Mathew ◽  
Moussa Tembely ◽  
Waleed AlAmeri ◽  
Emad W. Al-Shalabi ◽  
Abdul Ravoof Shaik

Abstract A meticulous interpretation of steady-state or unsteady-state relative permeability (Kr) experimental data is required to determine a complete set of Kr curves. In this work, three different machine learning models were developed to assist in a faster estimation of these curves from steady-state drainage coreflooding experimental runs. The three models tested and compared were extreme gradient boosting (XGB), deep neural network (DNN), and recurrent neural network (RNN) algorithms. Based on existing mathematical models, a leading-edge framework was developed in which a large database of Kr and Pc curves was generated. This database was used to perform thousands of coreflood simulation runs representing oil-water drainage steady-state experiments. The results obtained from these simulation runs, mainly pressure drop along with other conventional core analysis data, were utilized to estimate Kr curves based on Darcy's law. These analytically estimated Kr curves, along with the previously generated Pc curves, were fed as features into the machine learning models. The entire data set was split into 80% for training and 20% for testing. A k-fold cross-validation technique was applied to increase model accuracy by splitting the training portion into 10 folds; in each of the 10 experiments, 9 folds were used for training and the remaining fold was used for model validation. Once trained and validated, the model was subjected to blind testing on the remaining 20% of the data set. The machine learning model learns to capture fluid flow behavior inside the core from the training dataset. The trained/tested model was thereby employed to estimate Kr curves based on available experimental results. The performance of the developed model was assessed using the coefficient of determination (R2) along with the loss calculated during training/validation of the model. The respective cross plots, along with comparisons of ground-truth versus AI-predicted curves, indicate that the model is capable of making accurate predictions, with error percentages between 0.2% and 0.6% on history matching experimental data for all three tested ML techniques (XGB, DNN, and RNN). This implies that the AI-based model exhibits better efficiency and reliability in determining Kr curves when compared to conventional methods. The results also include a comparison between classical machine learning approaches and shallow and deep neural networks in terms of accuracy in predicting the final Kr curves. The models discussed in this research work currently focus on the prediction of Kr curves for drainage steady-state experiments; however, the work can be extended to capture the imbibition cycle as well.
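A minimal sketch of the 80/20 split and 10-fold cross-validation workflow described above, assuming the simulated coreflood outputs and Pc curves have already been flattened into a tabular feature matrix and that a single Kr-curve parameter is the regression target; the scikit-learn/xgboost calls and placeholder arrays are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from sklearn.model_selection import train_test_split, KFold
from sklearn.metrics import r2_score
from xgboost import XGBRegressor

# Placeholder data: features from simulated coreflood runs (pressure drop, Pc samples,
# conventional core analysis data); target: one Kr-curve parameter (illustrative choice)
X, y = np.random.rand(1000, 20), np.random.rand(1000)

# 80% training / 20% blind testing, as in the described workflow
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# 10-fold cross validation on the training portion
kf = KFold(n_splits=10, shuffle=True, random_state=0)
for fold, (tr, va) in enumerate(kf.split(X_train)):
    model = XGBRegressor(n_estimators=300, max_depth=6, learning_rate=0.05)
    model.fit(X_train[tr], y_train[tr])
    print(f"fold {fold}: validation R2 =", r2_score(y_train[va], model.predict(X_train[va])))

# Blind test on the held-out 20%
final = XGBRegressor(n_estimators=300, max_depth=6, learning_rate=0.05).fit(X_train, y_train)
print("blind-test R2:", r2_score(y_test, final.predict(X_test)))
```

The same loop could be repeated with a DNN or RNN regressor in place of XGBRegressor to reproduce the three-model comparison.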

2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Chalachew Muluken Liyew ◽  
Haileyesus Amsaya Melese

Abstract Predicting the amount of daily rainfall improves agricultural productivity and secures food and water supply to keep citizens healthy. To predict rainfall, several studies have been conducted using data mining and machine learning techniques on environmental datasets from different countries. An erratic rainfall distribution in the country affects the agriculture on which the country's economy depends. Wise use of rainwater should be planned and practiced to minimize the problems of drought and flooding that occur in the country. The main objective of this study is to identify the relevant atmospheric features that cause rainfall and to predict the intensity of daily rainfall using machine learning techniques. The Pearson correlation technique was used to select relevant environmental variables, which were used as input for the machine learning models. The dataset was collected from the local meteorological office at Bahir Dar City, Ethiopia, to measure the performance of three machine learning techniques (Multivariate Linear Regression, Random Forest, and Extreme Gradient Boosting). Root mean squared error and mean absolute error were used to measure the performance of the machine learning models. The results of the study revealed that the Extreme Gradient Boosting algorithm performed better than the others.
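A hedged sketch of the described pipeline: Pearson correlation for feature selection followed by a comparison of the three regressors using RMSE and MAE. The column names, the 0.1 correlation threshold, and the synthetic data are assumptions for illustration only, not the study's dataset or settings.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, mean_absolute_error
from xgboost import XGBRegressor

# Synthetic stand-in for daily meteorological records (column names are hypothetical)
rng = np.random.default_rng(0)
data = rng.random((500, 4))
rainfall = 2.0 * data[:, 0] + 1.5 * data[:, 1] + rng.normal(0, 0.1, 500)
df = pd.DataFrame(np.column_stack([data, rainfall]),
                  columns=["temperature", "humidity", "wind_speed", "pressure", "rainfall"])

# Pearson correlation against daily rainfall to keep only relevant predictors
corr = df.corr(method="pearson")["rainfall"].drop("rainfall")
selected = corr[corr.abs() > 0.1].index.tolist()

X_train, X_test, y_train, y_test = train_test_split(
    df[selected], df["rainfall"], test_size=0.2, random_state=0)

for name, model in [("MLR", LinearRegression()),
                    ("Random Forest", RandomForestRegressor(random_state=0)),
                    ("XGBoost", XGBRegressor(random_state=0))]:
    pred = model.fit(X_train, y_train).predict(X_test)
    rmse = mean_squared_error(y_test, pred) ** 0.5
    mae = mean_absolute_error(y_test, pred)
    print(f"{name}: RMSE={rmse:.3f}, MAE={mae:.3f}")
```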


Author(s):  
Dr. Kalaivazhi Vijayaragavan ◽  
S. Prakathi ◽  
S. Rajalakshmi ◽  
M Sandhiya

Machine learning is a subfield of artificial intelligence in which learning algorithms make decisions based on data and try to behave like a human being. Classification is one of the most fundamental concepts in machine learning: a process of recognizing, understanding, and grouping ideas and objects into pre-set categories or sub-populations. Using pre-categorized training datasets, machine learning employs a variety of algorithms to classify future data into categories. Classification algorithms use input training data to predict whether subsequent data fall into one of the predetermined categories. To improve classification accuracy, the design of the neural network is regarded as an effective way to obtain better accuracy. The network design usually consists of a scaling layer, perceptron layers, and a probabilistic layer. In this paper, an enhanced model selection is evaluated with a training and testing strategy, and the classification accuracy is predicted. Finally, the predicted classification accuracy is compared across two popular machine learning frameworks: PyTorch and TensorFlow. Results demonstrate that the proposed method can predict with greater accuracy. After deployment of the machine learning model, its performance was evaluated with the help of the Iris dataset.
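As a rough illustration of one of the two frameworks mentioned (PyTorch), the sketch below trains a small network on the Iris dataset with a scaling step, perceptron layers, and a probabilistic (softmax) output; the layer sizes and training settings are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Load and scale the Iris data (the "scaling layer" is applied here as preprocessing)
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# Perceptron layers followed by a probabilistic (softmax) output
model = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 3))
opt = torch.optim.Adam(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()  # applies log-softmax internally

Xt = torch.tensor(X_train, dtype=torch.float32)
yt = torch.tensor(y_train, dtype=torch.long)
for epoch in range(200):
    opt.zero_grad()
    loss = loss_fn(model(Xt), yt)
    loss.backward()
    opt.step()

with torch.no_grad():
    pred = model(torch.tensor(X_test, dtype=torch.float32)).argmax(dim=1)
    print("test accuracy:", (pred.numpy() == y_test).mean())
```

An equivalent network could be defined in TensorFlow/Keras to carry out the framework comparison the abstract describes.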


2020 ◽  
Vol 9 (3) ◽  
pp. 658 ◽  
Author(s):  
Jun-Cheng Weng ◽  
Tung-Yeh Lin ◽  
Yuan-Hsiung Tsai ◽  
Man Teng Cheok ◽  
Yi-Peng Eve Chang ◽  
...  

It is estimated that at least one million people die by suicide every year, showing the importance of suicide prevention and detection. In this study, an autoencoder and machine learning models were employed to predict people with suicidal ideation based on their structural brain imaging. The subjects in our generalized q-sampling imaging (GQI) dataset consisted of three groups: 41 depressive patients with suicidal ideation (SI), 54 depressive patients without suicidal thoughts (NS), and 58 healthy controls (HC). In the GQI dataset, indices of generalized fractional anisotropy (GFA), isotropic values of the orientation distribution function (ISO), and normalized quantitative anisotropy (NQA) were separately trained in different machine learning models. A convolutional neural network (CNN)-based autoencoder model, the supervised machine learning algorithm extreme gradient boosting (XGB), and logistic regression (LR) were used to discriminate SI subjects from NS and HC subjects. After five-fold cross-validation, separate data were tested to obtain the accuracy, sensitivity, specificity, and area under the curve of each result. Our results showed that the best pattern of structure across multiple brain locations can classify suicidal ideators from NS and HC subjects with a prediction accuracy of 85%, a specificity of 100%, and a sensitivity of 75%. The algorithms developed here might provide an objective tool to help identify suicidal ideation risk among depressed patients alongside clinical assessment.
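A simplified sketch of how one of the supervised baselines (XGB) could be evaluated with five-fold cross validation on a single GQI index, reporting accuracy, sensitivity, specificity, and AUC; the feature dimensionality and placeholder labels are assumptions for illustration, not the study's imaging data.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import roc_auc_score, confusion_matrix
from xgboost import XGBClassifier

# X: per-subject GQI index features (e.g., GFA, ISO, or NQA values), flattened
# y: 1 = suicidal ideation (SI), 0 = NS or HC -- placeholder data for illustration
X = np.random.rand(153, 500)
y = np.random.randint(0, 2, 153)

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for tr, te in skf.split(X, y):
    clf = XGBClassifier(eval_metric="logloss").fit(X[tr], y[tr])
    prob = clf.predict_proba(X[te])[:, 1]
    pred = (prob >= 0.5).astype(int)
    tn, fp, fn, tp = confusion_matrix(y[te], pred).ravel()
    print("acc={:.2f} sens={:.2f} spec={:.2f} auc={:.2f}".format(
        (tp + tn) / len(te), tp / (tp + fn), tn / (tn + fp),
        roc_auc_score(y[te], prob)))
```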


2021 ◽  
Vol 11 (11) ◽  
pp. 1055
Author(s):  
Pei-Chen Lin ◽  
Kuo-Tai Chen ◽  
Huan-Chieh Chen ◽  
Md. Mohaimenul Islam ◽  
Ming-Chin Lin

Accurate stratification of sepsis can effectively guide the triage of patient care and shared decision making in the emergency department (ED). However, previous research on sepsis identification models focused mainly on ICU patients, and discrepancies in model performance between development and external validation datasets are rarely evaluated. The aim of our study was to develop and externally validate a machine learning model to stratify sepsis patients in the ED. We retrospectively collected clinical data from two geographically separate institutes that provided different levels of care at different time periods. The Sepsis-3 criteria were used as the reference standard in both datasets for identifying true sepsis cases. An eXtreme Gradient Boosting (XGBoost) algorithm was developed to stratify sepsis patients, and the performance of the model was compared with traditional clinical sepsis tools: the quick Sequential Organ Failure Assessment (qSOFA) and the Systemic Inflammatory Response Syndrome (SIRS) criteria. There were 8296 patients (1752 (21%) septic) in the development dataset and 1744 patients (506 (29%) septic) in the external validation dataset. The mortality of septic patients in the development and validation datasets was 13.5% and 17%, respectively. In the internal validation, XGBoost achieved an area under the receiver operating characteristic curve (AUROC) of 0.86, exceeding SIRS (0.68) and qSOFA (0.56). The performance of XGBoost deteriorated in the external validation (the AUROC of XGBoost, SIRS, and qSOFA was 0.75, 0.57, and 0.66, respectively). Heterogeneity in patient characteristics, such as sepsis prevalence, severity, age, comorbidity, and infection focus, could reduce model performance. Our model showed good discriminative capability for identifying sepsis patients and outperformed existing sepsis identification tools. Implementation of the ML model in the ED can facilitate timely sepsis identification and treatment. However, dataset discrepancies should be carefully evaluated before implementing the ML approach in clinical practice. This finding reinforces the necessity for future studies to perform external validation to ensure the generalisability of any developed ML approaches.
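A minimal sketch of the development/external-validation setup with XGBoost, assuming the two cohorts are available as feature matrices with Sepsis-3 labels; the cohort sizes mirror the abstract, but the features, prevalence draws, and hyper-parameters are placeholders rather than the study's data.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score
from xgboost import XGBClassifier

# dev_X / dev_y: ED visits from the development site; ext_X / ext_y: the external site.
# Feature columns (vitals, labs, demographics) are placeholders for illustration.
dev_X, dev_y = np.random.rand(8296, 30), np.random.binomial(1, 0.21, 8296)
ext_X, ext_y = np.random.rand(1744, 30), np.random.binomial(1, 0.29, 1744)

X_tr, X_val, y_tr, y_val = train_test_split(dev_X, dev_y, test_size=0.2,
                                            stratify=dev_y, random_state=0)
model = XGBClassifier(n_estimators=400, max_depth=4, learning_rate=0.05,
                      eval_metric="logloss").fit(X_tr, y_tr)

# Internal validation on held-out development data, then external validation
print("internal AUROC:", roc_auc_score(y_val, model.predict_proba(X_val)[:, 1]))
print("external AUROC:", roc_auc_score(ext_y, model.predict_proba(ext_X)[:, 1]))
```

Comparing both AUROC values side by side is what exposes the performance drop the abstract reports between internal and external validation.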


Author(s):  
Dhilsath Fathima.M ◽  
S. Justin Samuel ◽  
R. Hari Haran

Aim: This proposed work aims to develop an improved and robust machine learning model for predicting Myocardial Infarction (MI), which could have substantial clinical impact. Objectives: This paper explains how to build a machine learning-based computer-aided analysis system for early and accurate prediction of Myocardial Infarction (MI), using the Framingham Heart Study dataset for validation and evaluation. This proposed computer-aided analysis model will support medical professionals in predicting myocardial infarction proficiently. Methods: The proposed model utilizes mean imputation to remove missing values from the dataset, then applies principal component analysis (PCA) to extract the optimal features and enhance classifier performance. After PCA, the reduced features are partitioned into training and testing datasets; 70% of the data are given as input to four well-known classifiers (support vector machine, k-nearest neighbor, logistic regression, and decision tree) to train them, and the remaining 30% are used to evaluate the output of the machine learning model using performance metrics such as the confusion matrix, classifier accuracy, precision, sensitivity, F1-score, and the AUC-ROC curve. Results: The outputs of the classifiers were evaluated using these performance measures, and we observed that logistic regression provides higher accuracy than the k-NN, SVM, and decision tree classifiers, and that PCA performs well as a feature extraction method to enhance the performance of the proposed model. From these analyses, we conclude that logistic regression has good mean accuracy and standard deviation of accuracy compared with the other three algorithms. The AUC-ROC curves of the proposed classifiers (Figures 4 and 5) show that logistic regression exhibits a good AUC-ROC score, i.e., around 70%, compared with the k-NN and decision tree algorithms. Conclusion: From the result analysis, we infer that this proposed machine learning model can act as an optimal decision-making system to predict acute myocardial infarction at an earlier stage than existing machine learning-based prediction models, and that it is capable of predicting the presence of acute myocardial infarction from heart disease risk factors, in order to decide when to start lifestyle modification and medical treatment to prevent heart disease.
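A hedged sketch of the described pipeline (mean imputation, PCA, a 70/30 split, and the four classifiers), built with scikit-learn on synthetic stand-in data; the number of principal components and other settings are assumptions, not the paper's exact configuration.

```python
from sklearn.datasets import make_classification
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, roc_auc_score

# Synthetic data standing in for a Framingham-style risk-factor table
X, y = make_classification(n_samples=1000, n_features=15, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

classifiers = {"SVM": SVC(probability=True),
               "k-NN": KNeighborsClassifier(),
               "Logistic Regression": LogisticRegression(max_iter=1000),
               "Decision Tree": DecisionTreeClassifier(random_state=0)}

for name, clf in classifiers.items():
    # Mean imputation for missing values, then PCA for feature extraction
    pipe = Pipeline([("impute", SimpleImputer(strategy="mean")),
                     ("pca", PCA(n_components=8)), ("clf", clf)])
    pipe.fit(X_train, y_train)
    prob = pipe.predict_proba(X_test)[:, 1]
    print(f"{name}: accuracy={accuracy_score(y_test, pipe.predict(X_test)):.3f}, "
          f"AUC={roc_auc_score(y_test, prob):.3f}")
```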


2021 ◽  
Author(s):  
Junjie Shi ◽  
Jiang Bian ◽  
Jakob Richter ◽  
Kuan-Hsun Chen ◽  
Jörg Rahnenführer ◽  
...  

Abstract The predictive performance of a machine learning model highly depends on the corresponding hyper-parameter setting. Hence, hyper-parameter tuning is often indispensable. Normally, such tuning requires the dedicated machine learning model to be trained and evaluated on centralized data to obtain a performance estimate. However, in a distributed machine learning scenario, it is not always possible to collect all the data from all nodes due to privacy concerns or storage limitations. Moreover, if data have to be transferred through low-bandwidth connections, the time available for tuning is reduced. Model-Based Optimization (MBO) is one state-of-the-art method for tuning hyper-parameters, but its application to distributed machine learning models or federated learning has received little research attention. This work proposes a framework, MODES, that allows MBO to be deployed on resource-constrained distributed embedded systems. Each node trains an individual model based on its local data. The goal is to optimize the combined prediction accuracy. The presented framework offers two optimization modes: (1) MODES-B considers the whole ensemble as a single black box and optimizes the hyper-parameters of each individual model jointly, and (2) MODES-I considers all models as clones of the same black box, which allows the optimization to be efficiently parallelized in a distributed setting. We evaluate MODES by conducting experiments on the optimization of the hyper-parameters of a random forest and a multi-layer perceptron. The experimental results demonstrate that, with an improvement in terms of mean accuracy (MODES-B), run-time efficiency (MODES-I), and statistical stability for both modes, MODES outperforms the baseline, i.e., carrying out tuning with MBO on each node individually with its local sub-data set.
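The MODES framework itself is not reproduced here, but the sketch below illustrates the core MBO idea on a single node: fit a Gaussian-process surrogate to observed hyper-parameter/error pairs, propose the next configuration with a simple acquisition rule, and evaluate it. The search space, lower-confidence-bound acquisition, and synthetic data are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.model_selection import cross_val_score

# Local data on one node (placeholder)
X, y = make_classification(n_samples=600, n_features=20, random_state=0)

def objective(params):
    """Cross-validated error of a random forest for a given hyper-parameter setting."""
    n_estimators, max_depth = int(params[0]), int(params[1])
    rf = RandomForestClassifier(n_estimators=n_estimators, max_depth=max_depth, random_state=0)
    return 1.0 - cross_val_score(rf, X, y, cv=3).mean()

rng = np.random.default_rng(0)
bounds = np.array([[10, 300], [2, 20]])                    # search space: n_estimators, max_depth
H = rng.uniform(bounds[:, 0], bounds[:, 1], size=(5, 2))   # initial design
E = np.array([objective(h) for h in H])

for _ in range(10):                                        # MBO loop: fit surrogate, propose, evaluate
    gp = GaussianProcessRegressor(normalize_y=True).fit(H, E)
    cand = rng.uniform(bounds[:, 0], bounds[:, 1], size=(200, 2))
    mu, sigma = gp.predict(cand, return_std=True)
    nxt = cand[np.argmin(mu - sigma)]                      # simple lower-confidence-bound acquisition
    H = np.vstack([H, nxt])
    E = np.append(E, objective(nxt))

best = H[np.argmin(E)]
print("best hyper-parameters (n_estimators, max_depth):", best.astype(int), "error:", E.min())
```

In a MODES-style setting, the objective would instead reflect the combined prediction accuracy of the ensemble (MODES-B) or treat the per-node models as clones of one black box (MODES-I).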


2020 ◽  
Vol 6 ◽  
Author(s):  
Jaime de Miguel Rodríguez ◽  
Maria Eugenia Villafañe ◽  
Luka Piškorec ◽  
Fernando Sancho Caparrini

Abstract This work presents a methodology for the generation of novel 3D objects resembling wireframes of building types. These result from the reconstruction of interpolated locations within the learnt distribution of variational autoencoders (VAEs), a deep generative machine learning model based on neural networks. The data set used features a scheme for geometry representation based on a ‘connectivity map’ that is especially suited to express the wireframe objects that compose it. Additionally, the input samples are generated through ‘parametric augmentation’, a strategy proposed in this study that creates coherent variations among data by enabling a set of parameters to alter representative features on a given building type. In the experiments that are described in this paper, more than 150,000 input samples belonging to two building types have been processed during the training of a VAE model. The main contribution of this paper has been to explore parametric augmentation for the generation of large data sets of 3D geometries, showcasing its problems and limitations in the context of neural networks and VAEs. Results show that the generation of interpolated hybrid geometries is a challenging task. Despite the difficulty of the endeavour, promising advances are presented.
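A minimal VAE sketch in PyTorch over flattened connectivity-map vectors, showing the reparameterization trick, the reconstruction-plus-KL loss, and how a hybrid geometry could be decoded from an interpolated latent code; the dimensions and architecture are assumptions and not the authors' model.

```python
import torch
import torch.nn as nn

class WireframeVAE(nn.Module):
    """Minimal VAE over flattened 'connectivity map' vectors (dimensions are illustrative)."""
    def __init__(self, in_dim=1024, latent_dim=32):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU())
        self.mu, self.logvar = nn.Linear(256, latent_dim), nn.Linear(256, latent_dim)
        self.dec = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                                 nn.Linear(256, in_dim), nn.Sigmoid())

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization trick
        return self.dec(z), mu, logvar

def vae_loss(recon, x, mu, logvar):
    # Reconstruction term plus KL divergence to the unit Gaussian prior
    bce = nn.functional.binary_cross_entropy(recon, x, reduction="sum")
    kld = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return bce + kld

# Interpolating between two encoded building samples to obtain a hybrid geometry
model = WireframeVAE()
x_a, x_b = torch.rand(1, 1024), torch.rand(1, 1024)  # placeholder connectivity maps
with torch.no_grad():
    z_a = model.mu(model.enc(x_a))
    z_b = model.mu(model.enc(x_b))
    hybrid = model.dec(0.5 * (z_a + z_b))             # decode the latent midpoint
```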


10.2196/18142 ◽  
2020 ◽  
Vol 8 (9) ◽  
pp. e18142
Author(s):  
Ramin Mohammadi ◽  
Mursal Atif ◽  
Amanda Jayne Centi ◽  
Stephen Agboola ◽  
Kamal Jethwani ◽  
...  

Background It is well established that lack of physical activity is detrimental to the overall health of an individual. Modern-day activity trackers enable individuals to monitor their daily activities to meet and maintain targets. This is expected to promote activity-encouraging behavior, but the benefits of activity trackers attenuate over time due to waning adherence. One of the key approaches to improving adherence to goals is to motivate individuals to improve on their historic performance metrics. Objective The aim of this work was to build a machine learning model to predict an achievable weekly activity target by considering (1) patterns in the user’s activity tracker data in the previous week and (2) behavior and environment characteristics. By setting realistic goals, ones that are neither too easy nor too difficult to achieve, activity tracker users can be encouraged to continue to meet these goals and, at the same time, to find utility in their activity tracker. Methods We built a neural network model that prescribes a weekly activity target for an individual that can be realistically achieved. The inputs to the model were user-specific personal, social, and environmental factors, daily step count from the previous 7 days, and an entropy measure that characterized the pattern of daily step count. Data for training and evaluating the machine learning model were collected over a duration of 9 weeks. Results Of 30 individuals who were enrolled, data from 20 participants were used. The model predicted the target daily step count with a mean absolute error of 1545 (95% CI 1383-1706) steps for an 8-week period. Conclusions Artificial intelligence applied to physical activity data combined with behavioral data can be used to set personalized goals in accordance with the individual’s level of activity and thereby improve adherence to a fitness tracker; this could be used to increase engagement with activity trackers. A follow-up prospective study is ongoing to determine the performance of the engagement algorithm.
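A hedged sketch of the prescription model's inputs and evaluation: the previous seven daily step counts, an entropy measure of that pattern, and placeholder personal/environmental covariates feed a small neural network whose error is reported as MAE in steps. All data, network settings, and variable names here are illustrative rather than the study's implementation.

```python
import numpy as np
from scipy.stats import entropy
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

# Each row: 7 previous daily step counts, an entropy measure of that pattern,
# plus placeholder personal/social/environmental covariates
steps = np.random.randint(2000, 15000, size=(200, 7)).astype(float)
pattern_entropy = np.array([entropy(row / row.sum()) for row in steps]).reshape(-1, 1)
covariates = np.random.rand(200, 5)
X = np.hstack([steps, pattern_entropy, covariates])
y = np.random.randint(3000, 14000, size=200).astype(float)   # achievable target step count

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
model = MLPRegressor(hidden_layer_sizes=(32, 16), max_iter=2000, random_state=0).fit(X_tr, y_tr)
print("MAE (steps):", mean_absolute_error(y_te, model.predict(X_te)))
```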


Diagnostics ◽  
2021 ◽  
Vol 11 (11) ◽  
pp. 2102
Author(s):  
Eyal Klang ◽  
Robert Freeman ◽  
Matthew A. Levin ◽  
Shelly Soffer ◽  
Yiftach Barash ◽  
...  

Background & Aims: We aimed at identifying specific emergency department (ED) risk factors for developing complicated acute diverticulitis (AD) and evaluate a machine learning model (ML) for predicting complicated AD. Methods: We analyzed data retrieved from unselected consecutive large bowel AD patients from five hospitals from the Mount Sinai health system, NY. The study time frame was from January 2011 through March 2021. Data were used to train and evaluate a gradient-boosting machine learning model to identify patients with complicated diverticulitis, defined as a need for invasive intervention or in-hospital mortality. The model was trained and evaluated on data from four hospitals and externally validated on held-out data from the fifth hospital. Results: The final cohort included 4997 AD visits. Of them, 129 (2.9%) visits had complicated diverticulitis. Patients with complicated diverticulitis were more likely to be men, black, and arrive by ambulance. Regarding laboratory values, patients with complicated diverticulitis had higher levels of absolute neutrophils (AUC 0.73), higher white blood cells (AUC 0.70), platelet count (AUC 0.68) and lactate (AUC 0.61), and lower levels of albumin (AUC 0.69), chloride (AUC 0.64), and sodium (AUC 0.61). In the external validation cohort, the ML model showed AUC 0.85 (95% CI 0.78–0.91) for predicting complicated diverticulitis. For Youden’s index, the model showed a sensitivity of 88% with a false positive rate of 1:3.6. Conclusions: A ML model trained on clinical measures provides a proof of concept performance in predicting complications in patients presenting to the ED with AD. Clinically, it implies that a ML model may classify low-risk patients to be discharged from the ED for further treatment under an ambulatory setting.


2020 ◽  
Vol 58 (6) ◽  
pp. 413-422
Author(s):  
Jinyeong Yu ◽  
Myoungjae Lee ◽  
Young Hoon Moon ◽  
Yoojeong Noh ◽  
Taekyung Lee

Electropulse-induced heating has attracted attention due to its high energy efficiency. However, the process gives rise to a nonlinear temperature variation, which is difficult to predict using a traditional physics model. As an alternative, this study employed machine-learning technology to predict such temperature variation for the first time. Mg alloy was exposed to a single electropulse with a variety of pulse magnitudes and durations for this purpose. Nine machine-learning models were established using artificial neural network (ANN), deep neural network (DNN), and extreme gradient boosting (XGBoost) algorithms. The ANN models showed insufficient predicting capability with respect to the region of peak temperature, where temperature varied most significantly. The DNN models were built by increasing model complexity, enhancing architectures, and tuning hyperparameters. They exhibited a remarkable improvement in predicting capability at the heating-cooling boundary as well as in overall estimation. As a result, the DNN-2 model in this group showed the best prediction of nonlinear temperature variation among the machine-learning models built in this study. The XGBoost model exhibited poor predicting performance when default hyperparameters were applied. However, hyperparameter tuning of learning rates and maximum depths resulted in a decent predicting capability with this algorithm. Furthermore, the XGBoost models exhibited an extreme reduction in learning time compared with the ANN and DNN models. This advantage is expected to be useful for predicting more complicated cases including various materials, multi-step electropulses, and electrically-assisted forming.
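A rough sketch of the XGBoost branch of the study: a regressor mapping pulse magnitude, pulse duration, and elapsed time to temperature, with a grid search over the two hyper-parameters the abstract highlights (learning rate and maximum depth); the synthetic data, feature choice, and search grid are assumptions for illustration.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from xgboost import XGBRegressor

# Placeholder inputs: pulse magnitude, pulse duration, and elapsed time;
# target: measured specimen temperature during heating and cooling
X = np.random.rand(2000, 3)
y = np.random.rand(2000) * 300 + 25

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# Tune the two hyper-parameters highlighted in the study: learning rate and maximum depth
grid = GridSearchCV(XGBRegressor(n_estimators=500, random_state=0),
                    {"learning_rate": [0.01, 0.05, 0.1, 0.3],
                     "max_depth": [3, 5, 7, 9]},
                    scoring="neg_root_mean_squared_error", cv=5)
grid.fit(X_tr, y_tr)
print("best params:", grid.best_params_)
print("test RMSE:", -grid.score(X_te, y_te))
```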

