Techniques for Fast Screening of 3D Heterogeneous Shale Barrier Configurations and Their Impacts on SAGD Chamber Development

SPE Journal ◽

10.2118/199906-pa ◽

2021 ◽

pp. 1-25

Author(s):

Chang Gao ◽

Juliana Y. Leung

Keyword(s):

Distance Measure ◽

Flow Simulation ◽

Training Data ◽

Distance Measures ◽

Data Driven ◽

Data Set ◽

Flow Simulations ◽

Steam Chamber ◽

Reservoir Models ◽

Tracking Model

Summary: The steam-assisted gravity drainage (SAGD) recovery process is strongly impacted by the spatial distributions of heterogeneous shale barriers. Though detailed compositional flow simulators are available for SAGD recovery performance evaluation, the simulation process is usually computationally demanding, which makes it impractical to run them over a large number of reservoir models to assess the impacts of heterogeneity (uncertainties). In recent years, data-driven proxies have been widely proposed to reduce the computational effort; nevertheless, the proxy must be trained using a large data set consisting of many flow simulation cases that ideally span the model parameter space. The question remains: is there a more efficient way to screen a large number of heterogeneous SAGD models? Such techniques could help to construct a training data set with less redundancy; they can also be used to quickly identify a subset of heterogeneous models for detailed flow simulation. In this work, we formulated two distance measures, flow-based and static-based, to quantify the similarity among a set of 3D heterogeneous SAGD models. First, to formulate the flow-based distance measure, a physics-based particle-tracking model is used: Darcy’s law and energy balance are integrated to mimic the steam chamber expansion process; steam particles located at the edge of the chamber release their energy to the surrounding cold bitumen, while detailed fluid displacements are not explicitly simulated. The steam chamber evolution is modeled, and a flow-based distance between two given reservoir models is defined as the difference in their chamber sizes over time. Second, to formulate the static-based distance, the Hausdorff distance (Hausdorff 1914) is used: it is often used in image processing to compare two images according to the spatial arrangement and shapes of various objects.
A suite of 3D models is constructed using representative petrophysical properties and operating constraints extracted from several pads in Suncor Energy’s Firebag project. The computed distance measures are used to partition the models into different groups. To establish a baseline for comparison, flow simulations are performed on these models to predict the actual chamber evolution and production profiles. The grouping results according to the proposed flow- and static-based distance measures match reasonably well with those obtained from detailed flow simulations. Significant improvement in computational efficiency is achieved with the proposed techniques. They can be used to efficiently screen a large number of reservoir models and facilitate the clustering of these models into groups with distinct shale heterogeneity characteristics. The approach has significant potential for integration with other data-driven approaches to reduce the computational load typically associated with detailed flow simulations involving multiple heterogeneous reservoir realizations.
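
The two distance measures described in this summary can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state which norm is applied to the chamber-size difference, so the sum of absolute differences in `flow_distance` is an assumption, and the shale-cell coordinates and chamber sizes are hypothetical values chosen only to make the example runnable.

```python
from math import dist


def flow_distance(chamber_sizes_a, chamber_sizes_b):
    """Assumed norm: sum of absolute chamber-size differences per time step."""
    if len(chamber_sizes_a) != len(chamber_sizes_b):
        raise ValueError("both models need the same time steps")
    return sum(abs(a - b) for a, b in zip(chamber_sizes_a, chamber_sizes_b))


def directed_hausdorff(cells_a, cells_b):
    """Largest distance from any cell in A to its nearest cell in B."""
    return max(min(dist(a, b) for b in cells_b) for a in cells_a)


def hausdorff(cells_a, cells_b):
    """Symmetric Hausdorff distance between two non-empty sets of cells."""
    return max(directed_hausdorff(cells_a, cells_b),
               directed_hausdorff(cells_b, cells_a))


# Hypothetical (i, j, k) indices of shale cells in two small models.
shale_a = [(0, 0, 0), (1, 0, 0)]
shale_b = [(0, 0, 0), (4, 0, 0)]
print(hausdorff(shale_a, shale_b))              # 3.0
print(flow_distance([0, 10, 25], [0, 12, 20]))  # 7
```

A pairwise matrix of either distance could then be passed to any standard clustering routine to group the models, which is the screening step the summary describes.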

A Data-Driven Surrogate Approach for the Temporal Stability Forecasting of Vegetation Covered Dikes

Water ◽

10.3390/w13010107 ◽

2021 ◽

Vol 13 (1) ◽

pp. 107

Author(s):

Elahe Jamalinia ◽

Faraz S. Tehrani ◽

Susan C. Steele-Dunne ◽

Philip J. Vardon

Keyword(s):

Numerical Simulation ◽

Water Flux ◽

Temporal Stability ◽

Synthetic Data ◽

Climatic Conditions ◽

Training Data ◽

Data Driven ◽

Data Set ◽

Surface Cracking ◽

Real Time Analysis

Climatic conditions and vegetation cover influence water flux in a dike, and potentially the dike stability. A comprehensive numerical simulation is computationally too expensive to be used for the near real-time analysis of a dike network. Therefore, this study investigates a random forest (RF) regressor to build a data-driven surrogate for a numerical model to forecast the temporal macro-stability of dikes. To that end, daily inputs and outputs of a ten-year coupled numerical simulation of an idealised dike (2009–2019) are used to create a synthetic data set, comprising features that can be observed from a dike surface, with the calculated factor of safety (FoS) as the target variable. The data set before 2018 is split into training and testing sets to build and train the RF. The predicted FoS is strongly correlated with the numerical FoS for data that belong to the test set (before 2018). However, the trained model shows lower performance for data in the evaluation set (after 2018) if further surface cracking occurs. This proof-of-concept shows that a data-driven surrogate can be used to determine dike stability for conditions similar to the training data, which could be used to identify vulnerable locations in a dike network for further examination.
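
The data partition described here (train and test sets drawn from before 2018, a separate evaluation set afterwards) can be sketched as follows. The record layout, the `test_fraction` value, and the random shuffling are assumptions made for illustration; the abstract does not state how the pre-2018 data were divided.

```python
import random
from datetime import date, timedelta


def split_by_cutoff(records, cutoff, test_fraction=0.2, seed=0):
    """Return (train, test, evaluation): train/test before cutoff, evaluation from cutoff on."""
    before = [r for r in records if r["day"] < cutoff]
    evaluation = [r for r in records if r["day"] >= cutoff]
    random.Random(seed).shuffle(before)       # assumed: random train/test split
    n_test = int(len(before) * test_fraction)
    return before[n_test:], before[:n_test], evaluation


# Hypothetical daily records: one surface-observable feature and the FoS target.
start = date(2017, 12, 27)
records = [{"day": start + timedelta(days=i), "feature": 0.1 * i, "fos": 1.5}
           for i in range(10)]
train, test, evaluation = split_by_cutoff(records, cutoff=date(2018, 1, 1))
print(len(train), len(test), len(evaluation))  # 4 1 5
```

Keeping the evaluation period strictly after the training period is what exposes the drop in performance under new conditions, such as the additional surface cracking mentioned above.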

The Influence of Geographic and Psychic Distance on Online Hotel Ratings

Journal of Travel Research ◽

10.1177/0047287519858400 ◽

2019 ◽

Vol 59 (4) ◽

pp. 722-741 ◽

Cited By ~ 2

Author(s):

Paul Phillips ◽

Nuno Antonio ◽

Ana de Almeida ◽

Luís Nunes

Keyword(s):

Text Mining ◽

Distance Measure ◽

Country Of Origin ◽

Geographic Distance ◽

Distance Measures ◽

Review Author ◽

Psychic Distance ◽

Rating Score ◽

Data Set ◽

The Relationship

This study examines the relationship between distance measures and online hotel ratings, using a Portuguese data set of 34,622 online hotel reviews extracted from Booking.com and TripAdvisor and written in Portuguese, Spanish, and English. Based on the country of origin of each review author, a geographic and a psychic distance measure are calculated relative to Portugal. Data and text mining analysis provides additional insights into online hotel ratings. The authors confirm that online travelers’ evaluations are multifaceted constructs displaying varying patterns of rating behavior among the traveler base. Investigating the contemporary relevance of geographic and psychic distance, the study finds that travelers with less psychic and geographic distance give a lower rating score than travelers with greater distance. The inclusion of psychic and geographic distance is advocated as a salient aspect for future researchers and for those practitioners who wish to enhance hotel product and service features.

Calibration Approach Product Type Estimators of Population Mean in Stratified Sampling with Single Constraint: A Comparison of Three Distance Measures

Asian Journal of Probability and Statistics ◽

10.9734/ajpas/2021/v15i230350 ◽

2021 ◽

pp. 41-58

Author(s):

Enang, Ekaette Inyang ◽

Ojua, Doris Nkan ◽

T. T. Ojewale

Keyword(s):

Distance Measure ◽

Real Life ◽

High Gain ◽

Product Type ◽

Distance Measures ◽

Minimum Entropy ◽

Chi Square ◽

Data Set ◽

Population Mean ◽

Single Constraint

This study employed the method of calibration on the product type estimator to propose calibration product type estimators using three distance measures for a single constraint, namely the chi-square distance measure, the minimum entropy distance measure and the modified chi-square distance measure. The estimators of variances of the proposed estimators were also obtained. An empirical study to ascertain the performance of these estimators was carried out using real life and simulated data sets. The result with the real life data showed that the proposed calibration product type estimator produced better estimates of the population mean compared to and . Results from the simulation study showed that the proposed calibration product type estimators had a high gain in efficiency as compared to the product type estimator. The simulation result also showed that the proposed estimators were more consistent and reliable under the Gamma and Exponential distributions with the exponential distribution taking the lead. The conventional product type estimator however was found to be better if the underlying distributional assumption is normal in nature.
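
For the chi-square distance measure, calibrated stratum weights under a single linear constraint have a closed form, sketched below. This is the generic textbook calibration step, not the product type estimator proposed in the paper: the abstract does not state the constraint, so calibrating on a known auxiliary population mean is an assumption, and all names and numbers are hypothetical.

```python
def chi_square_calibrated_weights(stratum_weights, aux_means, aux_population_mean, q=None):
    """Minimise sum((w_h - W_h)**2 / (W_h * q_h)) subject to sum(w_h * xbar_h) = Xbar.

    The Lagrangian solution is w_h = W_h + lam * W_h * q_h * xbar_h, with
    lam = (Xbar - sum(W_h * xbar_h)) / sum(W_h * q_h * xbar_h**2).
    """
    if q is None:
        q = [1.0] * len(stratum_weights)      # assumed: unit tuning constants
    current = sum(W * x for W, x in zip(stratum_weights, aux_means))
    denom = sum(W * qh * x * x for W, qh, x in zip(stratum_weights, q, aux_means))
    lam = (aux_population_mean - current) / denom
    return [W + lam * W * qh * x for W, qh, x in zip(stratum_weights, q, aux_means)]


# Two hypothetical strata with sample auxiliary means 10 and 20, known Xbar = 16.
weights = chi_square_calibrated_weights([0.5, 0.5], [10.0, 20.0], 16.0)
print(weights)  # approximately [0.52, 0.54]
```

The calibrated weights reproduce the known auxiliary mean exactly (0.52 × 10 + 0.54 × 20 = 16), which is the defining property of the calibration constraint.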

Semi-Supervised Learning With Co-Training for Data-Driven Prognostics

Volume 2: 31st Computers and Information in Engineering Conference, Parts A and B ◽

10.1115/detc2011-48302 ◽

2011 ◽

Author(s):

Chao Hu ◽

Byeng D. Youn ◽

Taejin Kim

Keyword(s):

Remaining Useful Life ◽

Training Data ◽

Data Driven ◽

Individual Data ◽

Data Set ◽

Failure Data ◽

Rich Information ◽

Useful Life ◽

Engineered Systems ◽

Systems Failure

Traditional data-driven prognostics often requires a large amount of failure data for offline training in order to achieve good accuracy in online prediction. However, in many engineered systems, failure data are expensive and time-consuming to obtain, while suspension data are readily available. In such cases, it becomes critical to utilize suspension data, which may carry rich information regarding the degradation trend and help achieve more accurate remaining useful life (RUL) prediction. To this end, this paper proposes a co-training-based data-driven prognostic algorithm, denoted by Coprog, which uses two individual data-driven algorithms, each predicting RULs of suspension units for the other. The confidence of an individual data-driven algorithm in predicting the RUL of a suspension unit is quantified by the extent to which the inclusion of that unit in the training data set reduces the sum of squared errors (SSE) in RUL prediction on the failure units. After a suspension unit is chosen and its RUL is predicted by an individual algorithm, it becomes a virtual failure unit that is added to the training data set. Results obtained from two case studies suggest that Coprog gives more accurate RUL predictions compared to any individual algorithm without the consideration of suspension data and that Coprog can effectively exploit suspension data to improve the accuracy in data-driven prognostics.
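
The confidence measure described here, the SSE reduction on the failure units after a suspension unit is added with its predicted RUL, can be sketched with a toy learner. Everything concrete below is an assumption for illustration: the k-nearest-neighbour regressor, the single scalar feature, the leave-one-out evaluation, and the data values are not taken from the paper, and the full two-algorithm co-training loop is omitted.

```python
def knn_predict(train, x, k=2):
    """Average the RULs of the k training units whose feature is nearest to x."""
    nearest = sorted(train, key=lambda unit: abs(unit[0] - x))[:k]
    return sum(rul for _, rul in nearest) / len(nearest)


def loo_sse(train, failures, k=2):
    """Sum of squared errors on failure units, each predicted without itself."""
    total = 0.0
    for unit in failures:
        others = [u for u in train if u is not unit]
        total += (knn_predict(others, unit[0], k) - unit[1]) ** 2
    return total


def sse_reduction(failures, suspension_x, k=2):
    """Return (confidence, predicted RUL) for one candidate suspension unit."""
    predicted_rul = knn_predict(failures, suspension_x, k)
    before = loo_sse(failures, failures, k)
    after = loo_sse(failures + [(suspension_x, predicted_rul)], failures, k)
    return before - after, predicted_rul


# Hypothetical failure units: (degradation feature, true RUL).
failures = [(1.0, 90.0), (2.0, 80.0), (4.0, 60.0), (5.0, 50.0)]
print(sse_reduction(failures, 3.0))  # (400.0, 70.0)
```

In a co-training loop of the kind the abstract describes, each algorithm would score the candidate suspension units this way, and the most confident one would be handed to the other algorithm as a virtual failure unit.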

Car-Following Described by Blending Data-Driven and Analytical Models: A Gaussian Process Regression Approach

Transportation Research Record Journal of the Transportation Research Board ◽

10.1177/03611981211032648 ◽

2021 ◽

pp. 036119812110326

Author(s):

Ignasi Echaniz Soldevila ◽

Victor L. Knoop ◽

Serge Hoogendoorn

Keyword(s):

Gaussian Process Regression ◽

Large Data ◽

Driving Behavior ◽

Large Data Sets ◽

Training Data ◽

Data Driven ◽

Data Sets ◽

Data Set ◽

Car Following ◽

New Variables

Traffic engineers rely on microscopic traffic models to design, plan, and operate a wide range of traffic applications. Recently, large data sets, albeit incomplete and covering small spatial regions, have become available thanks to technology improvements and governmental efforts. With this study we aim to gain new empirical insights into longitudinal driving behavior and to formulate a model which can benefit from these new, challenging data sources. This paper proposes an application of an existing formulation, Gaussian process regression (GPR), to describe the longitudinal driving behavior of individual drivers. The method integrates a parametric and a non-parametric mathematical formulation. The model predicts an individual driver’s acceleration given a set of variables. It uses the GPR to make predictions when there is correlation between a new input and the training data set. The data-driven model benefits from a large training data set to capture all driver longitudinal behavior, which would be difficult to fit in fixed parametric equation(s). The methodology allows us to train models with new variables without the need to alter the model formulation. Importantly, the model also uses existing traditional parametric car-following models to predict acceleration when no similar situations are found in the training data set. A case study using radar data in an urban environment shows that a hybrid model performs better than a parametric model alone and suggests that traffic light status over time influences drivers’ acceleration. This methodology can help engineers to use large data sets and to find new variables to describe traffic behavior.
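
One way to obtain the blending behavior described here is a Gaussian process whose prior mean is the parametric model: near training data the kernel term dominates, and far from it the prediction falls back to the parametric model. The sketch below assumes this construction; the abstract does not give the exact formulation. The squared-exponential kernel, the single input variable (gap to the leader), the toy `parametric_accel` rule, and the data are all hypothetical.

```python
from math import exp


def rbf(a, b, length=1.0):
    """Squared-exponential kernel between two scalar inputs."""
    return exp(-0.5 * ((a - b) / length) ** 2)


def solve(matrix, rhs):
    """Solve a small dense linear system by Gaussian elimination with pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def gpr_predict(x_train, y_train, x_new, prior_mean, noise=1e-6, length=1.0):
    """Posterior mean: prior_mean(x*) + k*^T (K + noise*I)^-1 (y - prior_mean(X))."""
    n = len(x_train)
    K = [[rbf(x_train[i], x_train[j], length) + (noise if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    residuals = [y - prior_mean(x) for x, y in zip(x_train, y_train)]
    alpha = solve(K, residuals)
    k_star = [rbf(x_new, x, length) for x in x_train]
    return prior_mean(x_new) + sum(k * a for k, a in zip(k_star, alpha))


def parametric_accel(gap):
    """Hypothetical parametric car-following rule (acceleration from gap)."""
    return 0.05 * (gap - 20.0)


gaps, accels = [10.0, 11.0, 12.0], [-0.8, -0.3, -0.5]
print(gpr_predict(gaps, accels, 10.0, parametric_accel))   # close to the observed -0.8
print(gpr_predict(gaps, accels, 100.0, parametric_accel))  # falls back to the parametric model
```

Adding a new explanatory variable in this setting only changes the kernel inputs, which is consistent with the claim that new variables can be used without altering the model formulation.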

Modeling highway runoff pollutant levels using a data driven model

Water Science & Technology ◽

10.2166/wst.2009.289 ◽

2009 ◽

Vol 60 (1) ◽

pp. 19-28 ◽

Cited By ~ 8

Author(s):

T. Opher ◽

A. Ostfeld ◽

E. Friedler

Keyword(s):

Management Strategies ◽

Training Data ◽

Data Driven ◽

Environmental Research ◽

Good Prediction ◽

Runoff Water ◽

Highway Runoff ◽

Data Set ◽

Prediction Ability ◽

Road Pavement

Pollutants accumulated on road pavement during dry periods are washed off the surface with runoff water during rainfall events, presenting a potentially hazardous non-point source of pollution. Estimation of pollutant loads in these runoff waters is required for developing mitigation and management strategies, yet the numerous factors involved and their complex interconnected influences make straightforward assessment almost impossible. Data-driven models (DDMs) have lately been used in water and environmental research and have shown very good prediction ability. The proposed methodology of a coupled MT-GA model provides an effective, accurate and easily calibrated predictive model for the event mean concentration (EMC) of highway runoff pollutants. The models were trained and verified using a comprehensive data set of runoff events monitored on various highways in California, USA. EMCs of Cr, Pb, Zn, TOC and TSS were modeled, using different combinations of explanatory variables. The models' prediction ability, in terms of correlation between predicted and actual values for both training and verification data, was mostly higher than previously reported values. Total Pb was modeled with an R² of 0.95 on training data and 0.43 on verification data. The developed model for TOC achieved R² values of 0.91 and 0.49 on training and verification data, respectively.

An innovative picture fuzzy distance measure and novel multi-attribute decision-making method

Complex & Intelligent Systems ◽

10.1007/s40747-020-00235-3 ◽

2021 ◽

Author(s):

Abdul Haseeb Ganie ◽

Surender Singh

Keyword(s):

Decision Making ◽

Fuzzy Sets ◽

Distance Measure ◽

Real Data ◽

Distance Measures ◽

Classification Problems ◽

Data Set ◽

The Real ◽

Picture Fuzzy Set ◽

Multi Attribute Decision Making

Abstract: Picture fuzzy set (PFS) is a direct generalization of the fuzzy sets (FSs) and intuitionistic fuzzy sets (IFSs). The concept of PFS is suitable to model situations that involve more answers of the type yes, no, abstain, and refuse. In this study, we introduce a novel picture fuzzy (PF) distance measure on the basis of direct operation on the functions of membership, non-membership, neutrality, refusal, and the upper bound of the function of membership of two PFSs. We contrast the proposed PF distance measure with the existing PF distance measures and discuss its advantages in pattern classification problems. The application of fuzzy and non-standard fuzzy models to real data is very challenging, as real data is always found in crisp form. Here, we also derive some conversion formulae to apply the proposed method to real data sets. Moreover, we introduce a new multi-attribute decision-making (MADM) method using the proposed PF distance measure. In addition, we justify the necessity of the newly proposed MADM method using appropriate counterintuitive examples. Finally, we contrast the performance of the proposed MADM method with the classical MADM methods in the PF environment.

Data-Driven Nonlinear Constitutive Relations for Rarefied Flow Computations

10.21203/rs.3.rs-735668/v1 ◽

2021 ◽

Author(s):

Wenwen Zhao ◽

Lijian Jiang ◽

Shaobo Yao ◽

Weifang Chen

Keyword(s):

Heat Flux ◽

Rarefied Gas ◽

Constitutive Relations ◽

Kinetic Scheme ◽

Training Model ◽

Training Data ◽

Data Driven ◽

Data Set ◽

Rarefied Flow ◽

Nonlinear Constitutive Relations

Abstract: To overcome the defects of traditional rarefied numerical methods such as the Direct Simulation Monte Carlo (DSMC) method and unified Boltzmann equation schemes, and to extend the range of applicability of macroscopic equations in high Knudsen number flows, data-driven nonlinear constitutive relations (DNCR) are first proposed through a machine learning method. Based on training data from both a Navier-Stokes (NS) solver and a unified gas kinetic scheme (UGKS) solver, the map between the discrepancies of the stress tensor and heat flux and the feature vectors is established in the training phase. With the resulting offline-trained model, new test cases excluded from the training data set can be predicted rapidly and accurately by solving conventional equations with modified stress tensor and heat flux. Finally, conventional one-dimensional shock wave cases and two-dimensional hypersonic flows around a blunt circular cylinder are presented to assess the capability of the developed method through various comparisons between DNCR, NS, UGKS, DSMC and experimental results. The improvement in the predictive capability of the coarse-graining model could make the DNCR method an effective tool for the rarefied gas community, especially for hypersonic engineering applications.

An Ensemble Approach for Robust Data-Driven Prognostics

Volume 3: 38th Design Automation Conference, Parts A and B ◽

10.1115/detc2012-70529 ◽

2012 ◽

Author(s):

Chao Hu ◽

Byeng D. Youn ◽

Pingfeng Wang ◽

Joung Taek Yoon

Keyword(s):

Case Studies ◽

Nuclear Power ◽

Remaining Useful Life ◽

Training Data ◽

Data Driven ◽

Error Estimator ◽

Data Set ◽

Weighting Schemes ◽

Ensemble Approach ◽

Testing Data

Prognostics aims at determining whether a failure of an engineered system (e.g., a nuclear power plant) is impending and estimating the remaining useful life (RUL) before the failure occurs. The traditional data-driven prognostic approach involves the following three steps: (Step 1) construct multiple candidate algorithms using a training data set; (Step 2) evaluate their respective performance using a testing data set; and (Step 3) select the one with the best performance while discarding all the others. There are three main challenges in the traditional data-driven prognostic approach: (i) lack of robustness in the selected standalone algorithm; (ii) waste of the resources for constructing the algorithms that are discarded; and (iii) demand for the testing data in addition to the training data. To address these challenges, this paper proposes an ensemble approach for data-driven prognostics. This approach combines multiple member algorithms with a weighted-sum formulation where the weights are estimated by using one of the three weighting schemes, namely the accuracy-based weighting, diversity-based weighting and optimization-based weighting. In order to estimate the prediction error required by the accuracy- and optimization-based weighting schemes, we propose the use of the k-fold cross validation (CV) as a robust error estimator. The performance of the proposed ensemble approach is verified with three engineering case studies. It can be seen from all the case studies that the ensemble approach achieves better accuracy in RUL predictions compared to any sole algorithm when the member algorithms with good diversity show comparable prediction accuracy.
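
The weighted-sum formulation with accuracy-based weighting can be sketched as below. The abstract names the weighting schemes without giving formulas, so the inverse-error weights used here are an assumption, as are the helper names and the numbers; the k-fold splitter only shows how validation indices could be generated for the cross-validation error estimate.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, validation_idx) for k contiguous, near-equal folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        validation = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, validation
        start += size


def accuracy_based_weights(cv_errors):
    """Assumed scheme: weights proportional to the inverse of each CV error."""
    inverse = [1.0 / e for e in cv_errors]
    total = sum(inverse)
    return [v / total for v in inverse]


def ensemble_predict(member_predictions, weights):
    """Weighted sum of the member algorithms' RUL predictions."""
    return sum(w * p for w, p in zip(weights, member_predictions))


# Hypothetical CV errors of three member algorithms and their RUL predictions.
weights = accuracy_based_weights([1.0, 2.0, 4.0])
print(weights)                                           # 4/7, 2/7, 1/7
print(ensemble_predict([100.0, 110.0, 130.0], weights))  # about 107.14
print(list(k_fold_indices(5, 2)))
```

Because the errors come from cross validation on the training data, no separate testing data set is needed to set the weights, which addresses challenge (iii) listed above.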

A Data-Driven Parameter Adaptive Clustering Algorithm Based on Density Peak

Complexity ◽

10.1155/2018/5232543 ◽

2018 ◽

Vol 2018 ◽

pp. 1-14

Author(s):

Tao Du ◽

Shouning Qu ◽

Qin Wang

Keyword(s):

Time Complexity ◽

Clustering Algorithm ◽

Clustering Algorithms ◽

Training Data ◽

Data Driven ◽

Density Peak ◽

Data Set ◽

Adaptive Clustering ◽

Key Factor ◽

Parameter Adaptive

Clustering is an important unsupervised machine learning method which can efficiently partition points without a training data set. However, most existing clustering algorithms require parameters to be set manually, and the clustering results are strongly influenced by these parameters, so optimizing clustering parameters is a key factor in improving clustering performance. In this paper, we propose a parameter adaptive clustering algorithm, DDPA-DP, which is based on the density-peak algorithm. In DDPA-DP, all parameters can be adaptively adjusted in a data-driven manner, so that the accuracy of clustering is greatly improved while the time complexity is not noticeably increased. To evaluate the performance of DDPA-DP, a series of experiments is designed with some artificial data sets and a real application data set, and the clustering results of DDPA-DP are compared with those of some typical algorithms. Based on these results, DDPA-DP is clearly the most accurate of the compared algorithms, and its time complexity is close to that of classical DP-Clust.
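
For context, the two quantities at the core of the baseline density-peak algorithm can be computed as in the sketch below: a local density `rho` (number of neighbours within a cutoff) and `delta` (distance to the nearest point of higher density). The fixed `cutoff` is exactly the kind of manually set parameter the abstract refers to; the adaptive rule used by DDPA-DP is not described in the abstract and is not reproduced here. The points are hypothetical.

```python
from math import dist


def density_peak_scores(points, cutoff):
    """Return (rho, delta) lists for the baseline density-peak method."""
    n = len(points)
    # rho: number of other points closer than the cutoff distance.
    rho = [sum(1 for j in range(n) if j != i and dist(points[i], points[j]) < cutoff)
           for i in range(n)]
    delta = []
    for i in range(n):
        higher = [dist(points[i], points[j]) for j in range(n) if rho[j] > rho[i]]
        if higher:
            delta.append(min(higher))   # nearest point with higher density
        else:
            # No denser point exists: use the largest distance to any point.
            delta.append(max(dist(points[i], points[j]) for j in range(n)))
    return rho, delta


# Hypothetical data: a five-point cross around the origin and a distant pair.
points = [(0, 0), (0, 1), (1, 0), (0, -1), (-1, 0), (10, 10), (10, 11)]
rho, delta = density_peak_scores(points, cutoff=1.5)
print(rho)    # [4, 3, 3, 3, 3, 1, 1]
print(delta)
```

Points with both large `rho` and large `delta` are candidate cluster centers; here the origin scores highest on the product of the two.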
