Data-driven nonlinear constitutive relations for rarefied flow computations

2021 ◽  
Vol 3 (1) ◽  
Author(s):  
Wenwen Zhao ◽  
Lijian Jiang ◽  
Shaobo Yao ◽  
Weifang Chen

Abstract: To overcome the defects of traditional rarefied numerical methods, such as the Direct Simulation Monte Carlo (DSMC) method and unified Boltzmann equation schemes, and to extend the range of validity of macroscopic equations to high Knudsen number flows, data-driven nonlinear constitutive relations (DNCR) are proposed for the first time through a machine learning method. Based on training data from both a Navier-Stokes (NS) solver and a unified gas kinetic scheme (UGKS) solver, a map between feature vectors and the discrepancies in stress tensor and heat flux is established in the training phase. With the resulting off-line trained model, new test cases excluded from the training data set can be predicted rapidly and accurately by solving the conventional equations with the modified stress tensor and heat flux. Finally, conventional one-dimensional shock wave cases and two-dimensional hypersonic flows around a blunt circular cylinder are presented to assess the capability of the developed method through comparisons between DNCR, NS, UGKS, DSMC and experimental results. The improved predictive capability of the coarse-graining model could make the DNCR method an effective tool in the rarefied gas community, especially for hypersonic engineering applications.
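The core DNCR idea, learning a map from flow features to a constitutive correction and then adding it to the Navier-Stokes flux, can be sketched with a generic regressor. This is a minimal illustrative sketch only: the feature vector (local Knudsen number, scaled temperature gradient), the k-nearest-neighbour regressor, and all numbers are placeholder assumptions, not the paper's actual features, model, or data.

```python
# Hypothetical sketch of the DNCR prediction step: learn the map from
# flow features to the discrepancy between kinetic (UGKS) and
# Navier-Stokes heat flux, then correct the NS flux at prediction time.
# Features and training data below are synthetic placeholders.

def knn_regress(train_X, train_y, x, k=3):
    """Predict a scalar discrepancy as the mean over the k nearest features."""
    dist = lambda a, b: sum((ai - bi) ** 2 for ai, bi in zip(a, b)) ** 0.5
    nearest = sorted(zip(train_X, train_y), key=lambda p: dist(p[0], x))[:k]
    return sum(y for _, y in nearest) / k

# Synthetic training set: feature = (local Knudsen number, |dT/dx| scaled),
# target = q_UGKS - q_NS at that point (the constitutive discrepancy).
train_X = [(0.01, 0.1), (0.05, 0.4), (0.10, 0.9), (0.20, 1.5), (0.40, 2.2)]
train_y = [0.00, 0.02, 0.08, 0.20, 0.45]

def corrected_heat_flux(q_ns, features):
    """DNCR-style prediction: NS flux plus the learned discrepancy."""
    return q_ns + knn_regress(train_X, train_y, features)

q = corrected_heat_flux(1.0, (0.12, 1.0))
```

In the paper the regressor is trained off-line on UGKS/NS data and evaluated inside a conventional macroscopic solver; the nearest-neighbour map here only stands in for that learned model.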


Water ◽  
2021 ◽  
Vol 13 (1) ◽  
pp. 107
Author(s):  
Elahe Jamalinia ◽  
Faraz S. Tehrani ◽  
Susan C. Steele-Dunne ◽  
Philip J. Vardon

Climatic conditions and vegetation cover influence water flux in a dike, and potentially the dike stability. A comprehensive numerical simulation is computationally too expensive to be used for the near real-time analysis of a dike network. Therefore, this study investigates a random forest (RF) regressor to build a data-driven surrogate for a numerical model to forecast the temporal macro-stability of dikes. To that end, daily inputs and outputs of a ten-year coupled numerical simulation of an idealised dike (2009–2019) are used to create a synthetic data set, comprising features that can be observed from a dike surface, with the calculated factor of safety (FoS) as the target variable. The data set before 2018 is split into training and testing sets to build and train the RF. The predicted FoS is strongly correlated with the numerical FoS for data that belong to the test set (before 2018). However, the trained model shows lower performance for data in the evaluation set (after 2018) if further surface cracking occurs. This proof-of-concept shows that a data-driven surrogate can be used to determine dike stability for conditions similar to the training data, which could be used to identify vulnerable locations in a dike network for further examination.
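The surrogate idea, mapping daily surface observables to a simulated factor of safety (FoS), can be sketched with a small hand-rolled forest of bagged regression stumps. In practice one would use a library random forest (e.g., scikit-learn's RandomForestRegressor); the dependency-free version below, and its two synthetic features (cumulative rainfall, leaf area index), are illustrative assumptions, not the study's feature set or model.

```python
import random

# Sketch of a random-forest-style surrogate for a numerical dike model:
# map daily surface observables to a factor of safety (FoS).
# Data and feature choices are synthetic placeholders.

def fit_stump(X, y):
    """Fit a depth-1 regression tree (best single feature/threshold split)."""
    best = None  # (sse, feature, threshold, left_mean, right_mean)
    for j in range(len(X[0])):
        for t in sorted({x[j] for x in X}):
            left = [yi for x, yi in zip(X, y) if x[j] <= t]
            right = [yi for x, yi in zip(X, y) if x[j] > t]
            if not left or not right:
                continue
            ml, mr = sum(left) / len(left), sum(right) / len(right)
            sse = sum((yi - ml) ** 2 for yi in left) \
                + sum((yi - mr) ** 2 for yi in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, ml, mr)
    _, j, t, ml, mr = best
    return lambda x: ml if x[j] <= t else mr

def fit_forest(X, y, n_trees=25, seed=0):
    """Bag stumps on bootstrap resamples; predict by averaging."""
    rng = random.Random(seed)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]
        trees.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return lambda x: sum(t(x) for t in trees) / len(trees)

# Synthetic data: feature = (cumulative rainfall, leaf area index);
# the target FoS drops as rainfall rises and vegetation cover falls.
X = [(r, v) for r in (0, 10, 20, 30, 40) for v in (0.5, 1.5, 2.5)]
y = [1.8 - 0.02 * r + 0.1 * v for r, v in X]
fos = fit_forest(X, y)((35, 1.0))
```

As in the study, such a surrogate is only trustworthy for inputs resembling its training data, which is exactly the limitation the abstract reports for post-2018 cracking conditions.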


SPE Journal ◽  
2021 ◽  
pp. 1-25
Author(s):  
Chang Gao ◽  
Juliana Y. Leung

Summary The steam-assisted gravity drainage (SAGD) recovery process is strongly impacted by the spatial distribution of heterogeneous shale barriers. Though detailed compositional flow simulators are available for SAGD recovery performance evaluation, the simulation process is usually quite computationally demanding, rendering their use over a large number of reservoir models for assessing the impacts of heterogeneity (uncertainties) impractical. In recent years, data-driven proxies have been widely proposed to reduce the computational effort; nevertheless, a proxy must be trained on a large data set consisting of many flow simulation cases that ideally span the model parameter spaces. The question remains: is there a more efficient way to screen a large number of heterogeneous SAGD models? Such techniques could help construct a training data set with less redundancy; they can also be used to quickly identify a subset of heterogeneous models for detailed flow simulation. In this work, we formulate two distance measures, flow-based and static-based, to quantify the similarity among a set of 3D heterogeneous SAGD models. First, to formulate the flow-based distance measure, a physics-based particle-tracking model is used: Darcy's law and energy balance are integrated to mimic the steam chamber expansion process; steam particles located at the edge of the chamber release their energy to the surrounding cold bitumen, while detailed fluid displacements are not explicitly simulated. The steam chamber evolution is modeled, and the flow-based distance between two given reservoir models is defined as the difference in their chamber sizes over time. Second, to formulate the static-based distance, the Hausdorff distance (Hausdorff 1914) is used: it is often employed in image processing to compare two images according to the spatial arrangement and shapes of their objects.
A suite of 3D models is constructed using representative petrophysical properties and operating constraints extracted from several pads in Suncor Energy's Firebag project. The computed distance measures are used to partition the models into different groups. To establish a baseline for comparison, flow simulations are performed on these models to predict the actual chamber evolution and production profiles. The grouping results according to the proposed flow- and static-based distance measures match reasonably well with those obtained from detailed flow simulations, and significant improvement in computational efficiency is achieved with the proposed techniques. They can be used to efficiently screen a large number of reservoir models and facilitate the clustering of these models into groups with distinct shale heterogeneity characteristics. The approach also presents significant potential to be integrated with other data-driven approaches for reducing the computational load typically associated with detailed flow simulations over multiple heterogeneous reservoir realizations.
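The static-based distance is the classical symmetric Hausdorff metric between two point sets, which a few lines of code make concrete. The 2D "shale maps" below are illustrative placeholders for the 3D shale-cell coordinates of two reservoir models.

```python
# Minimal sketch of the symmetric Hausdorff distance between two point
# sets, as used here to compare the spatial arrangement of shale
# barriers in two reservoir models. Point sets are illustrative.

def hausdorff(A, B):
    dist = lambda a, b: sum((ai - bi) ** 2 for ai, bi in zip(a, b)) ** 0.5
    directed = lambda P, Q: max(min(dist(p, q) for q in Q) for p in P)
    return max(directed(A, B), directed(B, A))

# Two "shale maps" as cell-centre coordinates of shale cells:
model_a = [(0, 0), (1, 0), (2, 0)]
model_b = [(0, 0), (1, 0), (2, 3)]   # one barrier displaced by 3 cells
d = hausdorff(model_a, model_b)
```

Because the directed distance takes a max over min distances, a single displaced barrier dominates the measure, which is what makes it sensitive to the shapes and positions of objects rather than to average properties.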


2013 ◽  
Vol 2013 ◽  
pp. 1-9 ◽  
Author(s):  
R. Manjula Devi ◽  
S. Kuppuswami ◽  
R. C. Suganthe

Artificial neural networks have been extensively used as trained models for solving pattern recognition tasks. However, training a complex neural network on a very large data set requires excessively long training times. In this correspondence, a new fast Linear Adaptive Skipping Training (LAST) algorithm for training an artificial neural network (ANN) is introduced. The core idea of this paper is to improve the training speed of an ANN by presenting only the input samples that were not classified correctly in the previous epoch, thereby dynamically reducing the number of input samples presented to the network at each epoch without affecting the network's accuracy. Decreasing the size of the training set in this way reduces the training time and hence improves the training speed. The LAST algorithm also determines how many epochs a particular input sample should skip, depending on the successful classification of that input sample, and it can be incorporated into any supervised training algorithm. Experimental results show that the training speed attained by the LAST algorithm is considerably higher than that of other conventional training algorithms.
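The skipping mechanism is easy to see on a toy learner. In this sketch a perceptron stands in for the ANN, and a sample's skip interval grows linearly with its run of consecutive correct classifications; the skip schedule, model, and data are illustrative assumptions, not the paper's exact formulation.

```python
# Sketch of the LAST idea on a simple perceptron: a correctly classified
# sample is skipped for a number of epochs that grows with its streak of
# consecutive successes, shrinking the effective training set per epoch.

def train_last(samples, epochs=20, lr=0.1):
    w = [0.0, 0.0, 0.0]                 # weights + bias for 2-D inputs
    skip = [0] * len(samples)           # epochs left to skip per sample
    streak = [0] * len(samples)         # consecutive correct classifications
    presented = 0
    for _ in range(epochs):
        for i, (x, t) in enumerate(samples):
            if skip[i] > 0:             # adaptive skipping: don't present
                skip[i] -= 1
                continue
            presented += 1
            s = w[0] * x[0] + w[1] * x[1] + w[2]
            y = 1 if s >= 0 else -1
            if y == t:
                streak[i] += 1
                skip[i] = streak[i]     # linearly growing skip interval
            else:
                streak[i] = 0
                w[0] += lr * t * x[0]
                w[1] += lr * t * x[1]
                w[2] += lr * t
    return w, presented

# Linearly separable toy data: class given by the sign of x0 - x1.
data = [((2, 0), 1), ((3, 1), 1), ((0, 2), -1), ((1, 3), -1)]
w, presented = train_last(data)
errors = sum(1 for (x, t) in data
             if (1 if w[0]*x[0] + w[1]*x[1] + w[2] >= 0 else -1) != t)
```

The point of the sketch: the network still converges (`errors` ends at zero), while `presented` stays well below the `epochs * len(data)` presentations a conventional loop would make.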


Author(s):  
Chao Hu ◽  
Byeng D. Youn ◽  
Taejin Kim

Traditional data-driven prognostics often requires a large amount of failure data for offline training in order to achieve good accuracy in the online prediction. However, in many engineered systems, failure data are fairly expensive and time-consuming to obtain, while suspension data are readily available. In such cases, it becomes critical to utilize suspension data, which may carry rich information regarding the degradation trend and help achieve more accurate remaining useful life (RUL) prediction. To this end, this paper proposes a co-training-based data-driven prognostic algorithm, denoted Coprog, which uses two individual data-driven algorithms, each predicting the RULs of suspension units for the other. The confidence of an individual data-driven algorithm in predicting the RUL of a suspension unit is quantified by the extent to which the inclusion of that unit in the training data set reduces the sum of squared errors (SSE) in RUL prediction on the failure units. After a suspension unit is chosen and its RUL is predicted by an individual algorithm, it becomes a virtual failure unit that is added to the training data set. Results obtained from two case studies suggest that Coprog gives more accurate RUL predictions than any individual algorithm without the consideration of suspension data, and that Coprog can effectively exploit suspension data to improve the accuracy of data-driven prognostics.
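The SSE-based confidence measure can be sketched in isolation: retrain with the pseudo-labelled suspension unit included and compare the SSE on the known failure units before and after. The linear health-indicator-to-RUL model and all numbers below are illustrative stand-ins for the paper's member algorithms, not its actual models or data.

```python
# Sketch of Coprog's confidence measure: how much does adding a
# pseudo-labelled suspension unit to the training set change the sum of
# squared errors (SSE) on the known failure units?

def fit_linear(pairs):
    """Least-squares fit RUL = a*h + b from (health indicator, RUL) pairs."""
    n = len(pairs)
    sh = sum(h for h, _ in pairs); sr = sum(r for _, r in pairs)
    shh = sum(h * h for h, _ in pairs); shr = sum(h * r for h, r in pairs)
    a = (n * shr - sh * sr) / (n * shh - sh * sh)
    b = (sr - a * sh) / n
    return lambda h: a * h + b

def sse(model, pairs):
    return sum((model(h) - r) ** 2 for h, r in pairs)

# Failure units: (health indicator at inspection, true RUL), near-linear.
failures = [(0.1, 95), (0.3, 68), (0.5, 52), (0.7, 31), (0.9, 8)]

def confidence(suspension_h, pseudo_rul):
    """SSE change on failure units when the pseudo-labelled suspension
    unit joins the training set (larger = more trusted)."""
    base = sse(fit_linear(failures), failures)
    grown = sse(fit_linear(failures + [(suspension_h, pseudo_rul)]), failures)
    return base - grown

# A pseudo-label consistent with the trend scores higher than an outlier:
good = confidence(0.6, 42)     # near the underlying degradation trend
bad = confidence(0.6, 90)      # inconsistent pseudo-label
```

In the full algorithm this score decides which suspension unit each co-trained algorithm promotes to a virtual failure unit; here it simply ranks a consistent pseudo-label above an outlying one.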


Author(s):  
Ignasi Echaniz Soldevila ◽  
Victor L. Knoop ◽  
Serge Hoogendoorn

Traffic engineers rely on microscopic traffic models to design, plan, and operate a wide range of traffic applications. Recently, large data sets, albeit incomplete and covering only small spatial regions, have become available thanks to technological improvements and governmental efforts. With this study we aim to gain new empirical insights into longitudinal driving behavior and to formulate a model that can benefit from these new, challenging data sources. This paper proposes an application of an existing formulation, Gaussian process regression (GPR), to describe the individual longitudinal driving behavior of drivers. The method integrates a parametric and a non-parametric mathematical formulation. The model predicts an individual driver's acceleration given a set of variables, using GPR to make predictions when correlation exists between the new input and the training data set. The data-driven model benefits from a large training data set to capture all longitudinal driver behavior, which would be difficult to fit into fixed parametric equations. The methodology allows us to train models with new variables without altering the model formulation. Importantly, the model also uses existing traditional parametric car-following models to predict acceleration when no similar situations are found in the training data set. A case study using radar data in an urban environment shows that the hybrid model performs better than a parametric model alone, and suggests that traffic light status over time influences drivers' acceleration. This methodology can help engineers use large data sets and find new variables to describe traffic behavior.
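A bare-bones GPR predictor makes the non-parametric half of the model concrete: the prediction at a new input is a kernel-weighted combination of the training targets. The one-dimensional input (gap to the leader), the RBF kernel with its length scale, and the toy data are all illustrative assumptions; the paper's model uses a richer feature set and falls back to a parametric car-following model far from the training data.

```python
import math

# Minimal GPR sketch for acceleration prediction: given observed
# (speed gap, acceleration) pairs, predict acceleration at a new gap.
# Kernel choice, noise level, and data are illustrative.

def rbf(a, b, ell=2.0):
    return math.exp(-((a - b) ** 2) / (2 * ell ** 2))

def solve(A, b):
    """Gaussian elimination with partial pivoting (small systems only)."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def gpr_predict(train_x, train_y, x_star, noise=1e-6):
    # Posterior mean: k_*^T (K + noise*I)^{-1} y
    K = [[rbf(xi, xj) + (noise if i == j else 0.0)
          for j, xj in enumerate(train_x)] for i, xi in enumerate(train_x)]
    alpha = solve(K, train_y)
    return sum(a * rbf(xi, x_star) for a, xi in zip(alpha, train_x))

# Toy data: acceleration shrinks as the gap to the leader closes.
gaps = [2.0, 5.0, 10.0, 20.0]
accels = [-1.0, 0.0, 0.5, 0.8]
a_hat = gpr_predict(gaps, accels, 7.0)
```

With a small noise term the GP nearly interpolates its training points, and predictions between observed gaps blend the neighbouring accelerations smoothly; far from the data the kernel weights vanish, which is precisely why the paper's hybrid model switches to a parametric formulation there.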


2009 ◽  
Vol 60 (1) ◽  
pp. 19-28 ◽  
Author(s):  
T. Opher ◽  
A. Ostfeld ◽  
E. Friedler

Pollutants accumulated on road pavement during dry periods are washed off the surface with runoff water during rainfall events, presenting a potentially hazardous non-point source of pollution. Estimation of pollutant loads in these runoff waters is required for developing mitigation and management strategies, yet the numerous factors involved and their complex interconnected influences make straightforward assessment almost impossible. Data-driven models (DDMs) have lately been used in water and environmental research and have shown very good prediction ability. The proposed methodology of a coupled MT-GA model provides an effective, accurate and easily calibrated predictive model for the event mean concentration (EMC) of highway runoff pollutants. The models were trained and verified using a comprehensive data set of runoff events monitored on various highways in California, USA. EMCs of Cr, Pb, Zn, TOC and TSS were modeled using different combinations of explanatory variables. The models' prediction ability, in terms of correlation between predicted and actual values for both training and verification data, was mostly higher than previously reported values. Total Pb was modeled with an R2 of 0.95 on training data and 0.43 on verification data. The developed model for TOC achieved R2 values of 0.91 and 0.49 on training and verification data, respectively.
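The target variable itself is worth pinning down: an event mean concentration is the flow-weighted average concentration over a runoff event, i.e. total pollutant mass divided by total runoff volume. The paired samples below are illustrative numbers, not data from the study.

```python
# EMC (event mean concentration): total pollutant mass over a runoff
# event divided by total runoff volume, from paired samples taken at a
# fixed time step. Sample values are illustrative.

def emc(concentrations, flows, dt=1.0):
    """Flow-weighted mean concentration from paired samples."""
    mass = sum(c * q * dt for c, q in zip(concentrations, flows))
    volume = sum(q * dt for q in flows)
    return mass / volume

# First-flush pattern: high concentration early, high flow later.
c = [10.0, 6.0, 2.0, 1.0]      # mg/L
q = [0.5, 1.0, 2.0, 1.5]       # m3/s
value = emc(c, q)
```

Note that the EMC is lower than the arithmetic mean of the concentrations here, because the high-concentration first flush coincides with low flow.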


Author(s):  
Chao Hu ◽  
Byeng D. Youn ◽  
Pingfeng Wang ◽  
Joung Taek Yoon

Prognostics aims at determining whether a failure of an engineered system (e.g., a nuclear power plant) is impending and estimating the remaining useful life (RUL) before the failure occurs. The traditional data-driven prognostic approach involves the following three steps: (Step 1) construct multiple candidate algorithms using a training data set; (Step 2) evaluate their respective performance using a testing data set; and (Step 3) select the one with the best performance while discarding all the others. There are three main challenges in the traditional data-driven prognostic approach: (i) lack of robustness in the selected standalone algorithm; (ii) waste of the resources for constructing the algorithms that are discarded; and (iii) demand for the testing data in addition to the training data. To address these challenges, this paper proposes an ensemble approach for data-driven prognostics. This approach combines multiple member algorithms with a weighted-sum formulation where the weights are estimated by using one of the three weighting schemes, namely the accuracy-based weighting, diversity-based weighting and optimization-based weighting. In order to estimate the prediction error required by the accuracy- and optimization-based weighting schemes, we propose the use of the k-fold cross validation (CV) as a robust error estimator. The performance of the proposed ensemble approach is verified with three engineering case studies. It can be seen from all the case studies that the ensemble approach achieves better accuracy in RUL predictions compared to any sole algorithm when the member algorithms with good diversity show comparable prediction accuracy.
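The accuracy-based weighting scheme with k-fold CV can be sketched end to end: score each member by its cross-validation error on the training set, weight members inversely to that error, and combine predictions as a weighted sum. The two toy member models (a constant-mean predictor and a linear fit) and the synthetic degradation data are illustrative assumptions, not the paper's prognostic algorithms.

```python
# Sketch of accuracy-based ensemble weighting with k-fold cross
# validation: members are scored by CV error, weighted inversely to it,
# and combined as a weighted sum. Member models and data are toys.

def kfold_error(fit, X, y, k=5):
    """Mean squared CV error of a model factory `fit(X, y) -> predict`."""
    n, err = len(X), 0.0
    for i in range(k):
        test = list(range(i, n, k))          # every k-th sample held out
        train = [j for j in range(n) if j not in test]
        model = fit([X[j] for j in train], [y[j] for j in train])
        err += sum((model(X[j]) - y[j]) ** 2 for j in test)
    return err / n

def fit_mean(X, y):
    m = sum(y) / len(y)
    return lambda x: m

def fit_linear(X, y):
    n = len(X)
    sx, sy = sum(X), sum(y)
    sxx, sxy = sum(x * x for x in X), sum(x * yi for x, yi in zip(X, y))
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return lambda x: a * x + b

X = list(range(10))
y = [2.0 * x + 1.0 for x in X]               # exactly linear degradation

members = [fit_mean, fit_linear]
errors = [kfold_error(f, X, y) + 1e-12 for f in members]
weights = [(1 / e) / sum(1 / e2 for e2 in errors) for e in errors]
models = [f(X, y) for f in members]
ensemble = lambda x: sum(w * m(x) for w, m in zip(weights, models))
```

On this data the linear member's CV error is essentially zero, so it receives nearly all the weight and the ensemble extrapolates with it; with comparable but diverse members, the weighted sum is where the robustness gain claimed in the abstract comes from.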


Author(s):  
Duncan A. Lockerby ◽  
Jason M. Reese ◽  
Henning Struchtrup

Switching criteria for hybrid hydrodynamic/molecular gas flow solvers are developed, and are demonstrated to be more appropriate than conventional criteria for identifying thermodynamic non-equilibrium. For switching from a molecular/kinetic solver to a hydrodynamic (continuum-fluid) solver, the criterion is based on the difference between the hydrodynamic near-equilibrium fluxes (i.e. the Navier–Stokes stress and Fourier heat flux) and the actual values of stress and heat flux as computed from the molecular solver. For switching from hydrodynamics to molecular/kinetic, a similar criterion is used but the values of stress and heat flux are approximated through higher order constitutive relations; in this case, we use the R13 equations. The efficacy of our proposed switching criteria is tested within an illustrative hybrid kinetic/Navier–Stokes solver. For the test cases investigated, the results from the hybrid procedure compare very well with the full kinetic solution, and are obtained at a fraction of the computational cost.
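The kinetic-to-hydrodynamic direction of the criterion reduces to a simple comparison: hand a cell back to the continuum solver only when the Navier-Stokes stress and Fourier heat flux agree with the kinetic solver's actual values to within a tolerance. The scalar fluxes, numbers, and the 5% tolerance below are illustrative; the paper works with the full tensorial quantities and, for the reverse switch, R13-based estimates.

```python
# Sketch of the switching test: a cell is handed to the hydrodynamic
# solver only when the Navier-Stokes (near-equilibrium) stress and heat
# flux agree with the kinetic solver's actual values within a tolerance.
# Scalars, numbers, and the tolerance are illustrative.

def near_equilibrium(stress_ns, stress_kin, q_ns, q_kin, tol=0.05):
    """Is the relative deviation of NS fluxes from kinetic values below tol?"""
    rel = lambda a, b: abs(a - b) / max(abs(b), 1e-30)
    return rel(stress_ns, stress_kin) < tol and rel(q_ns, q_kin) < tol

# Deep inside the continuum region the two solvers agree:
ok = near_equilibrium(1.00, 1.01, 0.50, 0.49)
# In a shock layer the kinetic stress departs strongly from NS:
bad = near_equilibrium(1.00, 1.60, 0.50, 0.80)
```

The attraction of this flux-based test over conventional breakdown parameters (e.g., a local Knudsen number) is that it measures the actual constitutive error the continuum model would commit.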


Complexity ◽  
2018 ◽  
Vol 2018 ◽  
pp. 1-14
Author(s):  
Tao Du ◽  
Shouning Qu ◽  
Qin Wang

Clustering is an important unsupervised machine learning method which can efficiently partition points without a training data set. However, most existing clustering algorithms require parameters to be set manually, and the clustering results are strongly influenced by these parameters, so optimizing the clustering parameters is a key factor in improving clustering performance. In this paper, we propose a parameter-adaptive clustering algorithm, DDPA-DP, based on the density-peak algorithm. In DDPA-DP, all parameters are adaptively adjusted in a data-driven fashion, so the accuracy of clustering is greatly improved while the time complexity does not increase noticeably. To demonstrate the performance of DDPA-DP, a series of experiments is designed with several artificial data sets and a real application data set, and the clustering results of DDPA-DP are compared with those of some typical algorithms. These results show that DDPA-DP has a clear accuracy advantage over all of them, while its time complexity remains close to that of classical DP-Clust.
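The density-peak backbone, and one example of a data-driven parameter choice in the spirit of DDPA-DP, can be sketched compactly: the cutoff distance dc is taken from a percentile of the pairwise distances instead of being set by hand, local density rho counts neighbours within dc, delta is the distance to the nearest higher-density point, and points with large rho*delta become cluster centres. The centre-selection rule and the percentile value below are simplifications, not the paper's exact adaptive scheme.

```python
# Sketch of density-peak clustering with one data-driven parameter
# choice: the cutoff distance dc comes from a percentile of the
# pairwise distances rather than manual tuning.

def density_peaks(points, dc_percentile=0.2, n_clusters=2):
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    n = len(points)
    pair = sorted(dist(points[i], points[j])
                  for i in range(n) for j in range(i + 1, n))
    dc = pair[int(dc_percentile * len(pair))]        # data-driven cutoff
    rho = [sum(1 for j in range(n)
               if j != i and dist(points[i], points[j]) < dc)
           for i in range(n)]
    # delta: distance to the nearest point of strictly higher density
    delta = []
    for i in range(n):
        higher = [dist(points[i], points[j]) for j in range(n)
                  if rho[j] > rho[i]]
        delta.append(min(higher) if higher else max(pair))
    # centres: largest gamma = rho * delta; members join nearest centre
    gamma = sorted(range(n), key=lambda i: rho[i] * delta[i], reverse=True)
    centers = gamma[:n_clusters]
    return [min(centers, key=lambda c: dist(points[i], points[c]))
            for i in range(n)]

# Two well-separated blobs, each with a dense central point:
pts = [(0, 0), (2, 0), (0, 2), (2, 2), (1, 1),
       (10, 10), (12, 10), (10, 12), (12, 12), (11, 11)]
labels = density_peaks(pts)
```

The two blob centres have both high density and large delta, so they are selected as centres and each blob is labelled consistently; the full DDPA-DP additionally adapts the remaining decisions from the data.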

