Parameter Estimation with Data-Driven Nonparametric Likelihood Functions

Entropy ◽  
2019 ◽  
Vol 21 (6) ◽  
pp. 559 ◽  
Author(s):  
Shixiao W. Jiang ◽  
John Harlim

In this paper, we consider a surrogate modeling approach using a data-driven nonparametric likelihood function constructed on a manifold on which the data lie (or to which they are close). The proposed method represents the likelihood function using a spectral expansion formulation known as the kernel embedding of the conditional distribution. To respect the geometry of the data, we employ this spectral expansion using a set of data-driven basis functions obtained from the diffusion maps algorithm. The theoretical error estimate suggests that the error bound of the approximate data-driven likelihood function is independent of the variance of the basis functions, which allows us to determine the amount of training data needed for accurate likelihood function estimation. Numerical results demonstrating the robustness of the data-driven likelihood functions for parameter estimation are given for instructive examples involving stochastic and deterministic differential equations. When the dimension of the data manifold is strictly less than the dimension of the ambient space, we found that the proposed approach (which does not require knowledge of the data manifold) is superior to likelihood functions constructed using standard parametric basis functions defined on the ambient coordinates. In an example where the data manifold is unknown and not smooth, the proposed method is more robust than non-intrusive spectral projection, an existing polynomial chaos surrogate model that assumes a parametric likelihood. In fact, whenever direct MCMC can be performed, the estimation accuracy is comparable to that of direct MCMC estimates while requiring only eight likelihood function evaluations, which can be done offline, as opposed to 4000 sequential function evaluations. Robust and accurate estimates are also obtained using a likelihood function trained on statistical averages of the chaotic 40-dimensional Lorenz-96 model on a wide parameter domain.

2019 ◽  
Vol 63 (4) ◽  
pp. 283-293 ◽  
Author(s):  
Jack Weatheritt ◽  
Richard David Sandberg

A novel data-driven turbulence modeling framework is presented and applied to the problem of junction body flow. In particular, a symbolic regression approach is used to find nonlinear analytical expressions of the turbulent stress‐strain coupling that are ready for implementation in computational fluid dynamics (CFD) solvers using Reynolds-averaged Navier‐Stokes (RANS) closures. Results from baseline linear RANS closure calculations of a finite square-mounted cylinder with a Reynolds number of [inline graphic omitted], based on diameter and freestream velocity, are shown to considerably overpredict the separated flow region downstream of the square cylinder, mainly because of the failure of the model to accurately represent the complex vortex structure generated by the junction flow. In the present study, a symbolic regression tool built on a gene expression programming technique is used to find a nonlinear constitutive stress‐strain relationship. In short, the algorithm finds the most appropriate linear combination of basis functions and spatially varying coefficients that approximate the turbulent stress tensor from high-fidelity data. Here, the high-fidelity data, or the so-called training data, were obtained from a hybrid RANS/Large Eddy Simulation (LES) calculation also developed with symbolic regression that showed excellent agreement with direct numerical simulation data. The present study, therefore, also demonstrates that training data required for RANS closure development can be obtained using computationally more affordable approaches, such as hybrid RANS/LES. A procedure is presented to evaluate which of the individual basis functions that are available for model development are most likely to produce a successful nonlinear closure. A new model is built using those basis functions only. This new model is then tested, i.e., an actual CFD calculation is performed, on the well-known periodic hills case and produces significantly better results than the linear baseline model, despite this test case being fundamentally different from the training case. Finally, the new model is shown to also improve predictive accuracy for a surface-mounted cube placed in a channel at a cube height Reynolds number of [inline graphic omitted] over traditional linear RANS closures.


2019 ◽  
Vol 9 (20) ◽  
pp. 4291 ◽  
Author(s):  
Mahammad Humayoo ◽  
Xueqi Cheng

Regularization is a popular technique in machine learning for model estimation and for avoiding overfitting. Prior studies have found that modern ordered regularization can be more effective in handling highly correlated, high-dimensional data than traditional regularization. The reason stems from the fact that the ordered regularization can reject irrelevant variables and yield an accurate estimation of the parameters. How to scale up the ordered regularization problems when facing large-scale training data remains an unanswered question. This paper explores the problem of parameter estimation with the ordered ℓ2-regularization via the Alternating Direction Method of Multipliers (ADMM), called ADMM-Oℓ2. The advantages of ADMM-Oℓ2 include (i) scaling up the ordered ℓ2 to a large-scale dataset, (ii) predicting parameters correctly by excluding irrelevant variables automatically, and (iii) having a fast convergence rate. Experimental results on both synthetic data and real data indicate that ADMM-Oℓ2 can perform better than or comparable to several state-of-the-art baselines.
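
As a rough illustration of the ADMM splitting idea mentioned in this abstract, the sketch below applies ADMM to plain (unordered) ℓ2-regularized least squares with two features. This is a deliberately simpler stand-in, not the ordered ℓ2 method of the paper; the function names, the penalty parameter `rho`, and the toy data are all assumptions made for the example.

```python
def solve_2x2(m, v):
    """Solve the 2x2 linear system m @ x = v by Cramer's rule."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [(m[1][1] * v[0] - m[0][1] * v[1]) / det,
            (m[0][0] * v[1] - m[1][0] * v[0]) / det]


def gram_and_moment(rows, targets):
    """Return A^T A and A^T b for a design matrix given as a list of 2-element rows."""
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(2)] for i in range(2)]
    atb = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(2)]
    return ata, atb


def admm_ridge(rows, targets, lam=0.5, rho=1.0, iters=200):
    """Minimize 0.5*||Ax - b||^2 + (lam/2)*||z||^2 subject to x = z (scaled-form ADMM)."""
    ata, atb = gram_and_moment(rows, targets)
    # The x-update always solves a system with the same matrix A^T A + rho*I.
    lhs = [[ata[i][j] + (rho if i == j else 0.0) for j in range(2)] for i in range(2)]
    z = [0.0, 0.0]
    u = [0.0, 0.0]  # scaled dual variable
    for _ in range(iters):
        rhs = [atb[i] + rho * (z[i] - u[i]) for i in range(2)]
        x = solve_2x2(lhs, rhs)                                    # data-fit step
        z = [rho * (x[i] + u[i]) / (lam + rho) for i in range(2)]  # regularizer step
        u = [u[i] + x[i] - z[i] for i in range(2)]                 # dual update
    return z


def ridge_closed_form(rows, targets, lam=0.5):
    """Reference solution (A^T A + lam*I)^{-1} A^T b for comparison."""
    ata, atb = gram_and_moment(rows, targets)
    lhs = [[ata[i][j] + (lam if i == j else 0.0) for j in range(2)] for i in range(2)]
    return solve_2x2(lhs, atb)


# Toy data, made up for illustration only.
rows = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
targets = [1.0, 2.0, 3.1, 3.9]
admm_solution = admm_ridge(rows, targets)
reference = ridge_closed_form(rows, targets)
```

The appeal of the splitting is that the data-fit step and the regularizer step are solved separately; a different regularizer would, under this scheme, only change the `z` update.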


2020 ◽  
Vol 4 (3) ◽  
pp. 92
Author(s):  
André Hürkamp ◽  
Sebastian Gellrich ◽  
Tim Ossowski ◽  
Jan Beuscher ◽  
Sebastian Thiede ◽  
...  

The design and development of composite structures requires precise and robust manufacturing processes. Composite materials such as fiber reinforced thermoplastics (FRTP) provide a good balance between manufacturing time, mechanical performance and weight. In this contribution, we investigate the process combination of thermoforming FRTP sheets (organo sheets) and injection overmolding of short FRTP for automotive structures. The limiting factor in those structures is the bond strength between the organo sheet and the overmolded thermoplastic. Within this process chain, even small deviations of the process settings (e.g., temperature) can lead to significant defects in the structure. A framework for a digital twin, based on a cyber-physical production system and combining simulation and machine learning, is presented. Based on parametric Finite-Element-Method (FEM) studies, training data for machine learning methods are generated and a FEM surrogate is developed. A comparison of different data-driven methods yields information on the estimation accuracy of task-specific data-driven methods. Finally, in accordance with experimental cross tension tests, the investigated FEM surrogate model is able to predict the interface bond strength quality as a function of the process settings. The visualization into different quality domains qualifies the presented approach as decision support.


Author(s):  
Weicai Huang ◽  
Kaiming Yang ◽  
Yu Zhu ◽  
Xin Li ◽  
Haihua Mu ◽  
...  

Rational basis functions are introduced into iterative learning control to enhance the flexibility towards nonrepeating tasks. At present, the application of rational basis functions either suffers from a nonconvex optimization problem or requires the predefinition of poles, which restricts the achievable performance. In this article, a new data-driven rational feedforward tuning approach is developed, in which convex optimization is realized without predefining the poles. Specifically, the optimal parameter which eliminates the reference-induced error is directly solved using the least squares method. No parametric model is involved in the parameter tuning process and the optimal parameter is estimated using the measured data. In the noisy condition, it is proved that the estimated optimal parameter is unbiased and the estimation accuracy in terms of variance is analysed. The performance of the proposed approach is tested on an ultraprecision wafer stage. The experimental results confirm that high performance is achieved using the proposed approach.


2019 ◽  
Vol 11 (3) ◽  
pp. 284 ◽  
Author(s):  
Linglin Zeng ◽  
Shun Hu ◽  
Daxiang Xiang ◽  
Xiang Zhang ◽  
Deren Li ◽  
...  

Soil moisture mapping at a regional scale is commonplace since these data are required in many applications, such as hydrological and agricultural analyses. The use of remotely sensed data for the estimation of deep soil moisture at a regional scale has received far less emphasis. The objective of this study was to map the 500-m, 8-day average and daily soil moisture at different soil depths in Oklahoma from remotely sensed and ground-measured data using the random forest (RF) method, which is one of the machine-learning approaches. In order to investigate the estimation accuracy of the RF method at both a spatial and a temporal scale, two independent soil moisture estimation experiments were conducted using data from 2010 to 2014: a year-to-year experiment (with a root mean square error (RMSE) ranging from 0.038 to 0.050 m³/m³) and a station-to-station experiment (with an RMSE ranging from 0.044 to 0.057 m³/m³). Then, the data requirements, importance factors, and spatial and temporal variations in estimation accuracy were discussed based on the results using the training data selected by iterated random sampling. The highly accurate estimations of both the surface and the deep soil moisture for the study area reveal the potential of RF methods when mapping soil moisture at a regional scale, especially when considering the high heterogeneity of land-cover types and topography in the study area.
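
For readers unfamiliar with the accuracy metric quoted in this abstract, the root mean square error between predicted and observed values can be computed as in the minimal sketch below. The function name and the sample volumetric soil moisture values are invented for illustration and are not taken from the study.

```python
import math


def rmse(predicted, observed):
    """Root mean square error between two equally long sequences of numbers."""
    if len(predicted) != len(observed) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical volumetric soil moisture values in m^3/m^3.
predicted = [0.20, 0.25, 0.30]
observed = [0.22, 0.25, 0.27]
error = rmse(predicted, observed)
```

The result carries the same unit as the inputs, which is why the abstract reports RMSE in volumetric moisture units.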


Energies ◽  
2021 ◽  
Vol 14 (2) ◽  
pp. 324
Author(s):  
Haobin Jiang ◽  
Xijia Chen ◽  
Yifu Liu ◽  
Qian Zhao ◽  
Huanhuan Li ◽  
...  

Accurately estimating the online state-of-charge (SOC) of the battery is one of the crucial issues of the battery management system. In this paper, the gas–liquid dynamics (GLD) battery model with direct temperature input is selected to model a Li(NiMnCo)O2 battery. The extended Kalman filter (EKF) algorithm is elaborated to couple the offline model and online model to achieve the goal of quickly eliminating initial errors in the online SOC estimation. A hybrid pulse power characterization test is performed to identify the offline parameters and determine the open-circuit voltage vs. SOC curve. Apart from the standard cycles, including the Constant Current cycle, Federal Urban Driving Schedule cycle, Urban Dynamometer Driving Schedule cycle and Dynamic Stress Test cycle, a combined cycle is constructed for experimental validation. Furthermore, the effect of sampling time on estimation accuracy is studied and a robustness analysis of the initial value is carried out. The results demonstrate that the proposed method realizes accurate estimation of SOC, with a maximum mean absolute error of 0.50% in five working conditions, and shows strong robustness against sparse sampling and input error.


Water ◽  
2021 ◽  
Vol 13 (1) ◽  
pp. 107
Author(s):  
Elahe Jamalinia ◽  
Faraz S. Tehrani ◽  
Susan C. Steele-Dunne ◽  
Philip J. Vardon

Climatic conditions and vegetation cover influence water flux in a dike, and potentially the dike stability. A comprehensive numerical simulation is computationally too expensive to be used for the near real-time analysis of a dike network. Therefore, this study investigates a random forest (RF) regressor to build a data-driven surrogate for a numerical model to forecast the temporal macro-stability of dikes. To that end, daily inputs and outputs of a ten-year coupled numerical simulation of an idealised dike (2009–2019) are used to create a synthetic data set, comprising features that can be observed from a dike surface, with the calculated factor of safety (FoS) as the target variable. The data set before 2018 is split into training and testing sets to build and train the RF. The predicted FoS is strongly correlated with the numerical FoS for data that belong to the test set (before 2018). However, the trained model shows lower performance for data in the evaluation set (after 2018) if further surface cracking occurs. This proof-of-concept shows that a data-driven surrogate can be used to determine dike stability for conditions similar to the training data, which could be used to identify vulnerable locations in a dike network for further examination.


2021 ◽  
Vol 13 (7) ◽  
pp. 168781402110277
Author(s):  
Yankai Hou ◽  
Zhaosheng Zhang ◽  
Peng Liu ◽  
Chunbao Song ◽  
Zhenpo Wang

Accurate estimation of the degree of battery aging is essential to ensure safe operation of electric vehicles. In this paper, using real-world vehicles and their operational data, a battery aging estimation method is proposed based on a dual-polarization equivalent circuit (DPEC) model and multiple data-driven models. The DPEC model and the forgetting factor recursive least-squares method are used to determine the battery system’s ohmic internal resistance, with outliers being filtered using boxplots. Furthermore, eight common data-driven models are used to describe the relationship between battery degradation and the factors influencing this degradation, and these models are analyzed and compared in terms of both estimation accuracy and computational requirements. The results show that the gradient descent tree regression, XGBoost regression, and LightGBM regression models are more accurate than the other methods, with root mean square errors of less than 6.9 mΩ. The AdaBoost and random forest regression models are regarded as alternative groups because of their relative instability. The linear regression, support vector machine regression, and k-nearest neighbor regression models are not recommended because of poor accuracy or excessively high computational requirements. This work can serve as a reference for subsequent battery degradation studies based on real-time operational data.
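
The forgetting factor recursive least-squares method named in this abstract can be sketched as follows. The example assumes a much simpler ohmic model than the paper's DPEC model, namely terminal voltage = OCV − R0 · current, and estimates the two unknowns (OCV, R0) from a stream of samples. The function name, the forgetting factor `lam`, the initial covariance scale `p0`, and the synthetic noise-free data are all hypothetical choices for illustration.

```python
def rls_forgetting(samples, lam=0.98, p0=1000.0):
    """Recursive least squares with forgetting factor for y = phi[0]*theta[0] + phi[1]*theta[1].

    samples: iterable of ((phi0, phi1), y) pairs.
    lam:     forgetting factor in (0, 1]; smaller values discount old data faster.
    p0:      initial covariance scale (large means weak trust in the initial guess).
    """
    theta = [0.0, 0.0]
    P = [[p0, 0.0], [0.0, p0]]
    for phi, y in samples:
        # P @ phi (P stays symmetric, so this also equals (phi^T P)^T).
        p_phi = [P[0][0] * phi[0] + P[0][1] * phi[1],
                 P[1][0] * phi[0] + P[1][1] * phi[1]]
        denom = lam + phi[0] * p_phi[0] + phi[1] * p_phi[1]
        gain = [p_phi[0] / denom, p_phi[1] / denom]
        # Prediction error with the current parameter estimate.
        err = y - (theta[0] * phi[0] + theta[1] * phi[1])
        theta = [theta[0] + gain[0] * err, theta[1] + gain[1] * err]
        # Covariance update: P <- (P - gain * phi^T P) / lam.
        P = [[(P[i][j] - gain[i] * p_phi[j]) / lam for j in range(2)]
             for i in range(2)]
    return theta


# Synthetic, noise-free data (hypothetical values): OCV = 3.7 V, R0 = 0.05 ohm.
currents = [1.0 + (k % 5) for k in range(200)]
samples = [((1.0, -i), 3.7 - 0.05 * i) for i in currents]
ocv_est, r0_est = rls_forgetting(samples)
```

With a forgetting factor below one, older samples are discounted exponentially, which lets the estimate follow a slowly drifting resistance at the cost of more sensitivity to measurement noise.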


Author(s):  
Patrik Puchert ◽  
Pedro Hermosilla ◽  
Tobias Ritschel ◽  
Timo Ropinski

Density estimation plays a crucial role in many data analysis tasks, as it infers a continuous probability density function (PDF) from discrete samples. Thus, it is used in tasks as diverse as analyzing population data, spatial locations in 2D sensor readings, or reconstructing scenes from 3D scans. In this paper, we introduce a learned, data-driven deep density estimation (DDE) to infer PDFs in an accurate and efficient manner, while being independent of domain dimensionality or sample size. Furthermore, we do not require access to the original PDF during estimation, neither in parametric form, nor as priors, nor in the form of many samples. This is enabled by training an unstructured convolutional neural network on an infinite stream of synthetic PDFs, as unbounded amounts of synthetic training data generalize better across a deck of natural PDFs than any natural finite training data will do. Thus, we hope that our publicly available DDE method will be beneficial in many areas of data analysis, where continuous models are to be estimated from discrete observations.

