An out-of-distribution-aware autoencoder model for reduced chemical kinetics

While detailed chemical kinetic models have been successful in representing rates of chemical reactions in continuum-scale computational fluid dynamics (CFD) simulations, applying the models in simulations for engineering device conditions is computationally prohibitive. To reduce the cost, data-driven methods, e.g., autoencoders, have been used to construct reduced chemical kinetic models for CFD simulations. Despite their success, data-driven methods rely heavily on training data sets and can be unreliable when used in out-of-distribution (OOD) regions (i.e., when extrapolating outside of the training set). In this paper, we present an enhanced autoencoder model for combustion chemical kinetics with uncertainty quantification to enable the detection of model usage in OOD regions, thereby creating an OOD-aware autoencoder model that contributes to more robust CFD simulations of reacting flows. We first demonstrate the effectiveness of the method in OOD detection on two well-known datasets, MNIST and Fashion-MNIST, in comparison with the deep ensemble method, and then present the OOD-aware autoencoder for a reduced chemistry model of syngas combustion.
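The idea of flagging OOD inputs through model uncertainty can be illustrated with a toy ensemble. The sketch below is not the paper's autoencoder and not a deep ensemble: it assumes a bootstrap ensemble of one-dimensional linear fits as a stand-in, and all names (`fit_line`, `train_ensemble`, `disagreement`, `is_ood`) and the synthetic data are hypothetical. It only shows that member disagreement grows away from the training range and can be thresholded.

```python
import random
import statistics


def fit_line(xs, ys):
    """Closed-form ordinary least squares for y = a + b * x."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def train_ensemble(xs, ys, n_members=20, seed=0):
    """Fit each member on a bootstrap resample of the training set."""
    rng = random.Random(seed)
    n = len(xs)
    members = []
    for _ in range(n_members):
        idx = [rng.randrange(n) for _ in range(n)]
        members.append(fit_line([xs[i] for i in idx], [ys[i] for i in idx]))
    return members


def disagreement(members, x):
    """Standard deviation of the member predictions at x."""
    return statistics.pstdev([a + b * x for a, b in members])


# Synthetic training data on [0, 1] (an assumption for illustration only).
rng = random.Random(1)
train_x = [rng.uniform(0.0, 1.0) for _ in range(50)]
train_y = [2.0 * x + rng.gauss(0.0, 0.1) for x in train_x]

members = train_ensemble(train_x, train_y)

# Calibrate the threshold as the largest disagreement seen on training inputs.
threshold = max(disagreement(members, x) for x in train_x)


def is_ood(x):
    """Flag x when the ensemble disagrees more than on any training input."""
    return disagreement(members, x) > threshold
```

With this setup, `is_ood(0.5)` stays false because 0.5 lies inside the training range, while `is_ood(10.0)` is true because the fitted lines fan out far from the data. A real application would replace the linear members with the model of interest and calibrate the threshold on held-out data.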


Car-Following Described by Blending Data-Driven and Analytical Models: A Gaussian Process Regression Approach

Transportation Research Record: Journal of the Transportation Research Board ◽

10.1177/03611981211032648 ◽

2021 ◽

pp. 036119812110326

Author(s):

Ignasi Echaniz Soldevila ◽

Victor L. Knoop ◽

Serge Hoogendoorn

Keyword(s):

Gaussian Process Regression ◽

Large Data ◽

Driving Behavior ◽

Large Data Sets ◽

Training Data ◽

Data Driven ◽

Data Sets ◽

Data Set ◽

Car Following ◽

New Variables

Traffic engineers rely on microscopic traffic models to design, plan, and operate a wide range of traffic applications. Recently, large data sets, albeit incomplete and covering only small spatial regions, have become available thanks to technological improvements and governmental efforts. With this study we aim to gain new empirical insights into longitudinal driving behavior and to formulate a model that can benefit from these new, challenging data sources. This paper proposes an application of an existing formulation, Gaussian process regression (GPR), to describe the longitudinal driving behavior of individual drivers. The method integrates a parametric and a non-parametric mathematical formulation. The model predicts an individual driver’s acceleration given a set of variables. It uses GPR to make predictions when a new input is correlated with the training data set. The data-driven model benefits from a large training data set to capture all driver longitudinal behavior, which would be difficult to fit in fixed parametric equation(s). The methodology allows us to train models with new variables without altering the model formulation. Importantly, the model also uses existing traditional parametric car-following models to predict acceleration when no similar situations are found in the training data set. A case study using radar data in an urban environment shows that a hybrid model performs better than a parametric model alone and suggests that traffic light status over time influences drivers’ acceleration. This methodology can help engineers to use large data sets and to find new variables to describe traffic behavior.
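One common way to obtain this fallback behavior is to use the parametric model as the prior mean of the Gaussian process, so that the GP only learns a correction. The sketch below assumes that construction. It is not taken from the paper, and `parametric_accel` is a hypothetical linear rule rather than any named car-following model. The kernel settings, the single input variable (gap), and the toy data are likewise assumptions.

```python
import math


def parametric_accel(gap, desired_gap=2.0, sensitivity=0.5):
    """Hypothetical parametric rule: accelerate when the gap exceeds the desired gap."""
    return sensitivity * (gap - desired_gap)


def rbf(x1, x2, length=1.0, variance=1.0):
    """Squared-exponential kernel; decays to zero for dissimilar inputs."""
    return variance * math.exp(-0.5 * ((x1 - x2) / length) ** 2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


class HybridGP:
    """GP regression on the residuals of the parametric model."""

    def __init__(self, xs, ys, noise=1e-2):
        self.xs = list(xs)
        residuals = [y - parametric_accel(x) for x, y in zip(self.xs, ys)]
        K = [[rbf(a, b) + (noise if i == j else 0.0)
              for j, b in enumerate(self.xs)]
             for i, a in enumerate(self.xs)]
        self.alpha = solve(K, residuals)

    def predict(self, x):
        # The correction vanishes when x is uncorrelated with all training
        # inputs, leaving the parametric prediction on its own.
        correction = sum(rbf(x, xi) * ai for xi, ai in zip(self.xs, self.alpha))
        return parametric_accel(x) + correction


# Toy data: observed drivers accelerate 0.3 more than the parametric rule says.
train_gaps = [1.0, 1.5, 2.0, 2.5, 3.0]
train_accels = [parametric_accel(g) + 0.3 for g in train_gaps]
model = HybridGP(train_gaps, train_accels)
```

Near the training gaps, `model.predict(2.0)` is close to the parametric value plus the learned offset of 0.3. At a gap of 50.0, far from any training input, the kernel terms underflow to zero and the prediction coincides with `parametric_accel(50.0)`.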


Data Driven Prognostics With Lack of Training Data Sets

Volume 2A: 41st Design Automation Conference ◽

10.1115/detc2015-46932 ◽

2015 ◽

Cited By ~ 2

Author(s):

Zhimin Xi ◽

Xiangxue Zhao

Keyword(s):

Lithium Ion ◽

Remaining Useful Life ◽

Training Data ◽

Data Driven ◽

Data Sets ◽

The Neural Network ◽

Typical Data ◽

Capacity Degradation ◽

Battery Capacity ◽

Network Similarity

Data-driven prognostics typically requires sufficient offline training data sets for accurate remaining useful life (RUL) prediction of engineering products. This paper investigates the performance of typical data-driven methodologies when the number of training data sets is insufficient. The purpose is to better understand these methodologies, especially when offline training data sets are insufficient. The neural network, similarity-based approach, and copula-based sampling approach were investigated when only three run-to-failure training units were available. The example of lithium-ion (Li-ion) battery capacity degradation was employed for the demonstration.


Egalitarian Kinetic Models: Concepts and Results

Energies ◽

10.3390/en14217230 ◽

2021 ◽

Vol 14 (21) ◽

pp. 7230

Author(s):

Denis Constales ◽

Gregory Yablonsky ◽

Yiming Xi ◽

Guy Marin

Keyword(s):

Chemical Kinetics ◽

Kinetic Models ◽

Chemical Kinetic ◽

Complex Chemical ◽

New Class ◽

Kinetic Coefficients ◽

Reversible Reactions ◽

Analytic Expressions ◽

Complex Chemical Reactions ◽

Main Ideas

In this paper, two main ideas of chemical kinetics are distinguished, i.e., a hierarchy and commensuration. A new class of chemical kinetic models is proposed and defined, i.e., egalitarian kinetic models (EKM). Contrary to hierarchical kinetic models (HKM), for the models of the EKM class, all kinetic coefficients are equal. Analysis of EKM models for some complex chemical reactions is performed for sequences of irreversible reactions. Analytic expressions for acyclic and cyclic mechanisms of egalitarian kinetics are obtained. Perspectives on the application of egalitarian models for reversible reactions are discussed. All analytical results are illustrated by examples.
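As a worked illustration of what equal kinetic coefficients imply, consider the simplest acyclic case. The block below assumes a linear chain of irreversible first-order steps with a common rate constant k, starting from pure A_1 at concentration c_0. These assumptions and the notation are ours and may differ from the mechanisms and expressions treated in the paper.

```latex
% Assumed mechanism: A_1 -> A_2 -> ... -> A_n, every step first order with the same k
\begin{align}
\frac{dc_1}{dt} &= -k\,c_1, \\
\frac{dc_j}{dt} &= k\,c_{j-1} - k\,c_j, \qquad 1 < j < n, \\
\frac{dc_n}{dt} &= k\,c_{n-1}.
\end{align}
% With c_1(0) = c_0 and c_j(0) = 0 for j > 1:
\begin{align}
c_j(t) &= c_0\,\frac{(kt)^{j-1}}{(j-1)!}\,e^{-kt}, \qquad 1 \le j < n, \\
c_n(t) &= c_0 - \sum_{j=1}^{n-1} c_j(t).
\end{align}
```

Differentiating the expression for c_j(t) reproduces k c_{j-1} - k c_j term by term, and each intermediate with 1 < j < n reaches its maximum at t = (j-1)/k.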


Data-Driven approaches to optimize chemical kinetic models

10.2514/6.2022-0225 ◽

2022 ◽

Author(s):

Keunsoo Kim ◽

Paxton W. Wiersema ◽

Je Ir Ryu ◽

Eric Mayhew ◽

Jacob Temme ◽

...

Keyword(s):

Kinetic Models ◽

Chemical Kinetic ◽

Data Driven


COMPARATIVE STUDY OF DETAILED CHEMICAL KINETIC MODELS OF SOOT PRECURSORS FOR ETHYLENE/AIR COMBUSTION

10.26678/abcm.encit2016.cit2016-0073 ◽

2016 ◽

Author(s):

Luís Fernando Figueira da Silva ◽

Thiago Fabricius Konopka ◽

Cesar Celis

Keyword(s):

Comparative Study ◽

Kinetic Models ◽

Chemical Kinetic ◽

Soot Precursors ◽

Detailed Chemical


A Data-Driven Surrogate Approach for the Temporal Stability Forecasting of Vegetation Covered Dikes

Water ◽

10.3390/w13010107 ◽

2021 ◽

Vol 13 (1) ◽

pp. 107

Author(s):

Elahe Jamalinia ◽

Faraz S. Tehrani ◽

Susan C. Steele-Dunne ◽

Philip J. Vardon

Keyword(s):

Numerical Simulation ◽

Water Flux ◽

Temporal Stability ◽

Synthetic Data ◽

Climatic Conditions ◽

Training Data ◽

Data Driven ◽

Data Set ◽

Surface Cracking ◽

Real Time Analysis

Climatic conditions and vegetation cover influence water flux in a dike, and potentially the dike stability. A comprehensive numerical simulation is computationally too expensive to be used for the near real-time analysis of a dike network. Therefore, this study investigates a random forest (RF) regressor to build a data-driven surrogate for a numerical model to forecast the temporal macro-stability of dikes. To that end, daily inputs and outputs of a ten-year coupled numerical simulation of an idealised dike (2009–2019) are used to create a synthetic data set, comprising features that can be observed from a dike surface, with the calculated factor of safety (FoS) as the target variable. The data set before 2018 is split into training and testing sets to build and train the RF. The predicted FoS is strongly correlated with the numerical FoS for data that belong to the test set (before 2018). However, the trained model shows lower performance for data in the evaluation set (after 2018) if further surface cracking occurs. This proof-of-concept shows that a data-driven surrogate can be used to determine dike stability for conditions similar to the training data, which could be used to identify vulnerable locations in a dike network for further examination.
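The data protocol described here (a random train/test split before a cutoff date, plus a later evaluation period) can be sketched as follows. This is only an illustration of the split, not the study's code: the function name `split_by_date`, the 20% test fraction, the exact start and end dates, and the choice to place 1 January 2018 in the evaluation set are all assumptions.

```python
import random
from datetime import date, timedelta


def split_by_date(days, cutoff, test_fraction=0.2, seed=0):
    """Randomly split days before `cutoff` into train/test; keep the rest for evaluation."""
    before = [d for d in days if d < cutoff]
    evaluation = [d for d in days if d >= cutoff]
    rng = random.Random(seed)
    rng.shuffle(before)
    n_test = int(len(before) * test_fraction)
    test, train = before[:n_test], before[n_test:]
    return sorted(train), sorted(test), evaluation


# Daily records over the simulated period (assumed calendar bounds).
start, end = date(2009, 1, 1), date(2019, 12, 31)
days = [start + timedelta(days=i) for i in range((end - start).days + 1)]

cutoff = date(2018, 1, 1)
train, test, evaluation = split_by_date(days, cutoff)
```

The test set is drawn from the same period as the training set, so it measures in-distribution accuracy. The evaluation set lies entirely after the cutoff, so it probes how the surrogate behaves when conditions drift away from those seen in training.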


Accelerating In-Transit Co-Processing for Scientific Simulations Using Region-Based Data-Driven Analysis

Algorithms ◽

10.3390/a14050154 ◽

2021 ◽

Vol 14 (5) ◽

pp. 154

Author(s):

Marcus Walldén ◽

Masao Okita ◽

Fumihiko Ino ◽

Dimitris Drikakis ◽

Ioannis Kokkinakis

Keyword(s):

Large Scale ◽

Data Driven ◽

Data Sets ◽

Output Constraints ◽

Data Driven Approach ◽

Scientific Simulations ◽

Multiple Metrics ◽

In Transit ◽

Multiple Compression ◽

Large Scale Simulations

Increasing processing capabilities and input/output constraints of supercomputers have increased the use of co-processing approaches, i.e., visualizing and analyzing data sets of simulations on the fly. We present a method that evaluates the importance of different regions of simulation data and a data-driven approach that uses the proposed method to accelerate in-transit co-processing of large-scale simulations. We use the importance metrics to simultaneously employ multiple compression methods on different data regions to accelerate the in-transit co-processing. Our approach strives to adaptively compress data on the fly and uses load balancing to counteract memory imbalances. We demonstrate the method’s efficiency through a fluid mechanics application, a Richtmyer–Meshkov instability simulation, showing how to accelerate the in-transit co-processing of simulations. The results show that the proposed method can expeditiously identify regions of interest, even when using multiple metrics. Our approach achieved a speedup of 1.29× in a lossless scenario. The data decompression time was sped up by 2× compared to using a single compression method uniformly.


MULFE: Multi-Label Learning via Label-Specific Feature Space Ensemble

ACM Transactions on Knowledge Discovery from Data ◽

10.1145/3451392 ◽

2021 ◽

Vol 16 (1) ◽

pp. 1-24

Author(s):

Yaojin Lin ◽

Qinghua Hu ◽

Jinghua Liu ◽

Xingquan Zhu ◽

Xindong Wu

Keyword(s):

Empirical Studies ◽

Feature Space ◽

Training Data ◽

Data Sets ◽

Learning Framework ◽

Feature Spaces ◽

Public Data ◽

Margin Distribution ◽

Label Correlations ◽

Label Correlation

In multi-label learning, label correlations commonly exist in the data. Such correlation not only provides useful information, but also imposes significant challenges for multi-label learning. Recently, label-specific feature embedding has been proposed to explore label-specific features from the training data, and uses features highly customized to the multi-label set for learning. While such feature embedding methods have demonstrated good performance, the creation of the feature embedding space is only based on a single label, without considering label correlations in the data. In this article, we propose to combine multiple label-specific feature spaces, using label correlation, for multi-label learning. The proposed algorithm, multi-label-specific feature space ensemble (MULFE), takes into consideration label-specific features, label correlation, and the weighted ensemble principle to form a learning framework. By conducting clustering analysis on each label’s negative and positive instances, MULFE first creates features customized to each label. After that, MULFE utilizes the label correlation to optimize the margin distribution of the base classifiers which are induced by the related label-specific feature spaces. By combining multiple label-specific features, label correlation based weighting, and ensemble learning, MULFE achieves the maximum margin multi-label classification goal through the underlying optimization framework. Empirical studies on 10 public data sets demonstrate the effectiveness of MULFE.


Data-driven deep density estimation

Neural Computing and Applications ◽

10.1007/s00521-021-06281-3 ◽

2021 ◽

Author(s):

Patrik Puchert ◽

Pedro Hermosilla ◽

Tobias Ritschel ◽

Timo Ropinski

Keyword(s):

Data Analysis ◽

Density Estimation ◽

Population Data ◽

Training Data ◽

Data Driven ◽

Discrete Observations ◽

Efficient Manner ◽

Continuous Models ◽

3D Scans ◽

Spatial Locations

Density estimation plays a crucial role in many data analysis tasks, as it infers a continuous probability density function (PDF) from discrete samples. Thus, it is used in tasks as diverse as analyzing population data, spatial locations in 2D sensor readings, or reconstructing scenes from 3D scans. In this paper, we introduce a learned, data-driven deep density estimation (DDE) to infer PDFs in an accurate and efficient manner, while being independent of domain dimensionality or sample size. Furthermore, we do not require access to the original PDF during estimation, neither in parametric form, nor as priors, nor in the form of many samples. This is enabled by training an unstructured convolutional neural network on an infinite stream of synthetic PDFs, as unbound amounts of synthetic training data generalize better across a deck of natural PDFs than any natural finite training data will do. Thus, we hope that our publicly available DDE method will be beneficial in many areas of data analysis, where continuous models are to be estimated from discrete observations.


Chemical Kinetic Models of Early Diagenesis

Journal of Geological Education ◽

10.5408/0022-1368-20.5.267 ◽

1972 ◽

Vol 20 (5) ◽

pp. 267-272 ◽

Cited By ~ 3

Author(s):

Robert A. Berner

Keyword(s):

Kinetic Models ◽

Chemical Kinetic ◽

Early Diagenesis
