Surrogate-Enhanced Parameter Inference for Function-Valued Models

2021, Vol 3 (1), pp. 11
Author(s): Christopher G. Albert, Ulrich Callies, Udo von Toussaint

We present an approach to enhance the performance and flexibility of Bayesian inference of model parameters based on observed data. Going beyond the usual surrogate-enhanced Monte-Carlo or optimization methods that focus on a scalar loss, we place emphasis on a function-valued model output of formally infinite dimension. For this purpose, the surrogate models are built on a combination of linear dimensionality reduction in an adaptive basis of principal components and Gaussian process regression for the map between reduced feature spaces. Since the decoded surrogate provides the full model output rather than only the loss, it is re-usable for multiple calibration measurements as well as different loss metrics and, consequently, allows for flexible marginalization over such quantities and applications to Bayesian hierarchical models. We evaluate the method’s performance based on a case study of a toy model and a simple riverine diatom model for the Elbe river. As input data, the diatom model uses six tunable scalar parameters as well as silica concentrations in the upper reach of the river, together with continuous time series of temperature, radiation, and river discharge over a specific year. The output consists of continuous time-series data that are calibrated against corresponding measurements from the Geesthacht Weir station on the Elbe river. For this study, only two scalar inputs were considered together with the function-valued output, and the results were compared to an existing model calibration based on direct simulation runs without a surrogate.
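A minimal sketch of the surrogate construction described above, assuming scikit-learn and synthetic stand-in data; the parameter ranges, kernel, and five-component basis are illustrative assumptions, not the authors' settings:

```python
# Surrogate for a function-valued output: PCA encodes the time series into a
# low-dimensional basis, and one Gaussian process per reduced coordinate maps
# the scalar input parameters to the PCA coefficients.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

rng = np.random.default_rng(0)
theta_train = rng.uniform(0.0, 1.0, size=(200, 2))            # (runs, params), synthetic
t = np.linspace(0.0, 1.0, 365)
y_train = np.sin(2 * np.pi * np.outer(1 + theta_train[:, 0], t)) * theta_train[:, 1:]  # (runs, times)

pca = PCA(n_components=5).fit(y_train)                          # adaptive linear basis
z_train = pca.transform(y_train)                                # reduced features

kernel = ConstantKernel() * RBF(length_scale=[0.2, 0.2])
gps = [GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(theta_train, z_train[:, j])
       for j in range(z_train.shape[1])]

def surrogate(theta):
    """Decoded surrogate: full time-series prediction for new parameter values."""
    theta = np.atleast_2d(theta)
    z = np.column_stack([gp.predict(theta) for gp in gps])
    return pca.inverse_transform(z)                             # (n_new, n_times)

y_new = surrogate([[0.3, 0.7]])   # full output, reusable for any loss metric
```

Because the decoded output is the full time series, the same trained surrogate can be scored against different measurements or loss functions without re-running the simulator.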

2007, Vol 9 (1), pp. 30-41
Author(s): Nikhil S. Padhye, Sandra K. Hanneman

The application of cosinor models to long time series requires special attention. With increasing length of the time series, the presence of noise and drifts in rhythm parameters from cycle to cycle leads to rapid deterioration of cosinor models. The sensitivity of amplitude and model fit to the data length is demonstrated for body temperature data from ambulatory menstrual-cycling and menopausal women and from ambulatory male swine. It follows that amplitude comparisons between studies cannot be made without consideration of the data length. Cosinor analysis may be carried out on serial sections of the series for improved model fit and for tracking changes in rhythm parameters. Noise and drift reduction can also be achieved by folding the series onto a single cycle, which leads to substantial gains in the model fit but lowers the amplitude. Central values of the model parameters are negligibly changed by accounting for the autoregressive nature of the residuals.
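For concreteness, a minimal sketch of a single-component cosinor fit by linear least squares, together with folding onto a single cycle as discussed above; the period, bin count, and helper names are illustrative assumptions:

```python
import numpy as np

def cosinor_fit(t, y, period):
    """Fit y(t) ~ M + A*cos(2*pi*t/period + phi); return (mesor, amplitude, acrophase)."""
    omega = 2 * np.pi / period
    X = np.column_stack([np.ones_like(t), np.cos(omega * t), np.sin(omega * t)])
    mesor, beta, gamma = np.linalg.lstsq(X, y, rcond=None)[0]
    amplitude = np.hypot(beta, gamma)
    acrophase = np.arctan2(-gamma, beta)      # phase at which the cosine peaks
    return mesor, amplitude, acrophase

def fold(t, y, period, n_bins=24):
    """Fold a long series onto one cycle by averaging within phase bins
    (assumes every phase bin contains observations)."""
    phase = np.mod(t, period)
    bins = np.linspace(0, period, n_bins + 1)
    idx = np.digitize(phase, bins) - 1
    t_fold = 0.5 * (bins[:-1] + bins[1:])
    y_fold = np.array([y[idx == k].mean() for k in range(n_bins)])
    return t_fold, y_fold
```

Folding averages out cycle-to-cycle noise and drift before the fit, which is why it improves model fit while shrinking the estimated amplitude.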


2016, Vol 73 (4), pp. 589-597
Author(s): Michael A. Spence, Paul G. Blackwell, Julia L. Blanchard

Dynamic size spectrum models have been recognized as an effective way of describing how size-based interactions can give rise to the size structure of aquatic communities. They are intermediate-complexity ecological models: solutions to partial differential equations driven by the size-dependent processes of predation, growth, mortality, and reproduction in a community of interacting species and sizes. To be useful for quantitative fisheries management, these models need to be developed further in a formal statistical framework. Previous work has used time-averaged data to “calibrate” such models with optimization methods, at the cost of losing detailed time-series information. Using a published multispecies size spectrum model parameterized for the North Sea, comprising 12 interacting fish species and a background resource, we fit the model to time-series data in a Bayesian framework for the first time. We cover the 1967–2010 period, using annual estimates of fishing mortality rates as model input and time series of fisheries landings data to fit the model output. We estimate 38 key parameters representing the carrying capacity of each species and of the background resource, as well as initial inputs of the dynamical system and errors on the model output. We then project the model forward to evaluate how uncertainty propagates through to population- and community-level indicators under alternative management strategies.
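As a generic illustration of Bayesian calibration of a dynamic model against a time series (not the multispecies size spectrum model itself), a random-walk Metropolis sketch on a toy logistic population model; the model, priors, and likelihood are made-up stand-ins:

```python
import numpy as np

def simulate(log_K, log_r, n_years, x0=1.0):
    """Toy dynamic model standing in for the forward simulation."""
    K, r = np.exp(log_K), np.exp(log_r)
    x = np.empty(n_years)
    x[0] = x0
    for t in range(1, n_years):
        x[t] = x[t - 1] + r * x[t - 1] * (1 - x[t - 1] / K)
    return x

def log_posterior(theta, y_obs, sigma=0.2):
    log_K, log_r = theta
    y_sim = simulate(log_K, log_r, len(y_obs))
    if np.any(y_sim <= 0):
        return -np.inf
    log_lik = -0.5 * np.sum(((np.log(y_obs) - np.log(y_sim)) / sigma) ** 2)
    log_prior = -0.5 * (log_K ** 2 + log_r ** 2) / 10.0     # weak Gaussian priors
    return log_lik + log_prior

def metropolis(y_obs, n_iter=5000, step=0.05, seed=0):
    rng = np.random.default_rng(seed)
    theta = np.array([1.0, -1.0])
    lp = log_posterior(theta, y_obs)
    samples = np.empty((n_iter, 2))
    for i in range(n_iter):
        prop = theta + step * rng.standard_normal(2)
        lp_prop = log_posterior(prop, y_obs)
        if np.log(rng.uniform()) < lp_prop - lp:             # accept/reject
            theta, lp = prop, lp_prop
        samples[i] = theta
    return samples

y_obs = simulate(1.2, -0.8, 40) * np.exp(0.1 * np.random.default_rng(1).standard_normal(40))
posterior_samples = metropolis(y_obs)
```

Posterior samples of the parameters can then be propagated through forward runs to quantify uncertainty in derived indicators, mirroring the forecasting step described above.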


2008, Vol 5 (25), pp. 885-897
Author(s): Simon Cauchemez, Neil M. Ferguson

We present a new statistical approach to analyse epidemic time-series data. A major difficulty for inference is that (i) the latent transmission process is only partially observed and (ii) observed quantities are further aggregated temporally. We develop a data augmentation strategy to tackle these problems and introduce a diffusion process that mimics the susceptible–infectious–removed (SIR) epidemic process but is more tractable analytically. While methods based on discrete-time models require the epidemic and data collection processes to have similar time scales, our approach, based on a continuous-time model, is free of such a constraint. Using simulated data, we found that all parameters of the SIR model, including the generation time, were estimated accurately if the observation interval was less than 2.5 times the generation time of the disease. Previous discrete-time TSIR models have been unable to estimate generation times, since they assume the generation time is equal to the observation interval. However, we were unable to estimate the generation time of measles accurately from historical data. This indicates that simple models assuming homogeneous mixing (even with age structure), of the type standard in mathematical epidemiology, miss key features of epidemics in large populations.
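To make the diffusion idea concrete, a minimal Euler–Maruyama sketch of a diffusion approximation that mimics the stochastic SIR process; the noise structure and parameter values are illustrative assumptions, not the authors' specific formulation or data-augmentation scheme:

```python
import numpy as np

def sir_diffusion(beta, gamma, S0, I0, N, T, dt=0.01, seed=0):
    """Euler-Maruyama path of a diffusion approximation to the SIR process."""
    rng = np.random.default_rng(seed)
    steps = int(T / dt)
    S, I = float(S0), float(I0)
    path = np.empty((steps, 2))
    for k in range(steps):
        inf_rate = beta * S * I / N                  # mean infection rate
        rec_rate = gamma * I                         # mean recovery rate
        dW_inf = np.sqrt(dt) * rng.standard_normal()
        dW_rec = np.sqrt(dt) * rng.standard_normal()
        # drift plus demographic noise for each transition (chemical-Langevin style)
        S = max(S - inf_rate * dt - np.sqrt(inf_rate) * dW_inf, 0.0)
        I = max(I + (inf_rate - rec_rate) * dt + np.sqrt(inf_rate) * dW_inf
                  - np.sqrt(rec_rate) * dW_rec, 0.0)
        path[k] = S, I
    return path

path = sir_diffusion(beta=1.5, gamma=0.5, S0=990, I0=10, N=1000, T=30)
```

Because the diffusion runs in continuous time, it can be evaluated between reporting dates, which is what frees the inference from the requirement that the epidemic and observation time scales match.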


2018, Vol 2 (2), pp. 49-57
Author(s): Dwi Yulianti, I Made Sumertajaya, Itasia Dina Sulvianti

The generalized space-time autoregressive integrated moving average (GSTARIMA) model is a time series model for multiple variables with spatial and temporal (space-time) linkages. It extends the space-time autoregressive integrated moving average (STARIMA) model by allowing each location its own model parameters, which makes GSTARIMA more flexible than STARIMA. The purposes of this research are to determine the best model and to predict the time series of rice prices in all provincial capitals of Sumatra island using a GSTARIMA model. This research used weekly rice-price data for all provincial capitals of Sumatra island from January 2010 to December 2017. The spatial weights used in this research are inverse distance and queen contiguity. The modeling results show that the best model is GSTARIMA(1,1,0) with the queen-contiguity weight matrix, which has the smallest MAPE value of 1.17817%.
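A hedged sketch of the GSTAR(1;1) idea with a row-normalized spatial weight matrix and the MAPE criterion; the least-squares fit, weight matrix, and differencing step are illustrative stand-ins, not the paper's rice-price modelling:

```python
import numpy as np

def fit_gstar1(Z, W):
    """Z: (T, n_loc) series (already differenced if needed);
    W: (n_loc, n_loc) row-normalized spatial weight matrix.
    Returns per-location coefficients (phi_own, phi_neighbour)."""
    lag = Z[:-1]                       # z_{t-1} at each location
    spatial_lag = lag @ W.T            # weighted neighbour value at t-1
    n = Z.shape[1]
    coefs = np.empty((n, 2))
    for i in range(n):                 # location-specific parameters (the "G" in GSTAR)
        X = np.column_stack([lag[:, i], spatial_lag[:, i]])
        coefs[i] = np.linalg.lstsq(X, Z[1:, i], rcond=None)[0]
    return coefs

def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    return 100.0 * np.mean(np.abs((actual - forecast) / actual))
```

The queen-contiguity weights in the study would simply replace W with a row-normalized adjacency matrix of neighbouring provinces.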


2021, Vol 6 (1), pp. 1-4
Author(s): Bo Yuan Chang, Mohamed A. Naiel, Steven Wardell, Stan Kleinikkink, John S. Zelek

Over the past years, researchers have proposed various methods to discover causal relationships among time-series data, as well as algorithms to fill in missing entries in time-series data. Little to no work has been done on combining the two strategies for the purpose of learning causal relationships from unevenly sampled multivariate time-series data. In this paper, we examine how the causal parameters learnt from unevenly sampled data (with missing entries) deviate from the parameters learnt from evenly sampled data (without missing entries). Obtaining the causal relationships from a given time series requires evenly sampled data, which suggests filling in the missing values before estimating the causal parameters. The proposed method therefore applies a Gaussian Process Regression (GPR) model for missing data recovery, followed by several pairwise Granger causality equations in Vector Autoregressive form to fit the recovered data and obtain the causal parameters. Experimental results show that the causal parameters obtained with GPR data filling have a much lower RMSE than those from a dummy model (fill with the last seen entry) at all missing-value percentages, suggesting that GPR data filling better preserves the causal relationships than dummy data filling and should therefore be considered when learning causality from unevenly sampled time series.
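A minimal sketch of the two-stage procedure, assuming scikit-learn for the GPR step; the kernel, lag order, and helper names (gpr_fill, granger_coeffs) are illustrative, not the authors' implementation:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def gpr_fill(t, x):
    """Impute NaNs in series x observed at times t using GP regression."""
    obs = ~np.isnan(x)
    gp = GaussianProcessRegressor(RBF(length_scale=5.0) + WhiteKernel(1e-2), normalize_y=True)
    gp.fit(t[obs, None], x[obs])
    filled = x.copy()
    filled[~obs] = gp.predict(t[~obs, None])
    return filled

def granger_coeffs(cause, effect, p=2):
    """Least-squares fit of effect_t on its own lags and the cause's lags;
    returns the coefficients on the cause's lags (the 'causal parameters')."""
    T = len(effect)
    rows = [np.concatenate([effect[t - p:t][::-1], cause[t - p:t][::-1], [1.0]])
            for t in range(p, T)]
    X, y = np.array(rows), effect[p:]
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    return beta[p:2 * p]               # coefficients on the cause's lags
```

Comparing these coefficients (or their RMSE against those from fully observed data) is the kind of evaluation the paper performs across missing-value percentages.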


2020, Vol 15 (3), pp. 225-237
Author(s): Saurabh Kumar, Jitendra Kumar, Vikas Kumar Sharma, Varun Agiwal

This paper deals with the problem of modelling time series data with structural breaks occurring at multiple time points, which may result in a varying order of the model at every structural break. A flexible and generalized class of autoregressive (AR) models with multiple structural breaks is proposed for modelling in such situations. Estimation of the model parameters is discussed in both classical and Bayesian frameworks. Since the joint posterior of the parameters is not analytically tractable, we employ a Markov chain Monte Carlo method, Gibbs sampling, to simulate from the posterior. To verify the order change, a hypothesis test is constructed using posterior probabilities and compared with the model without breaks. The proposed methodologies are illustrated by means of a simulation study and a real data analysis.
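A simplified, non-Bayesian sketch of the segment-wise idea: separate AR models whose orders may differ across segments defined by (here assumed known) break points. The AIC order selection and conditional least-squares fit are illustrative stand-ins for the paper's Gibbs sampler, and the segments are assumed long enough to fit:

```python
import numpy as np

def fit_ar(y, p):
    """Conditional least-squares AR(p) fit; returns coefficients and residual variance."""
    X = np.column_stack([y[p - k - 1:len(y) - k - 1] for k in range(p)] + [np.ones(len(y) - p)])
    beta, *_ = np.linalg.lstsq(X, y[p:], rcond=None)
    resid = y[p:] - X @ beta
    return beta, resid.var()

def segmented_ar(y, breaks, max_p=5):
    """Fit one AR model per segment, with a (possibly different) order per segment
    chosen by a simple AIC criterion."""
    fits = []
    for seg in np.split(y, breaks):
        best_p = min(range(1, max_p + 1),
                     key=lambda p: len(seg) * np.log(fit_ar(seg, p)[1]) + 2 * (p + 1))
        fits.append((best_p, fit_ar(seg, best_p)[0]))
    return fits
```

In the paper, the break locations, segment orders, and coefficients are sampled jointly via Gibbs sampling rather than fixed in advance as here.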


Author(s): Pasan Karunaratne, Masud Moshtaghi, Shanika Karunasekera, Aaron Harwood, Trevor Cohn

In time-series forecasting, regression is a popular method, with Gaussian Process Regression widely held to be the state of the art. The versatility of Gaussian Processes has led to their use in many varied application domains. However, although many real-world applications involve data that follow a working-week structure, in which weekends exhibit substantially different behavior from weekdays, methods for explicitly modelling working-week effects in Gaussian Process Regression models have not been proposed. Not modelling the working week explicitly leaves out a significant source of information that can be invaluable in forecasting scenarios. In this work we provide novel kernel-combination methods to explicitly model working-week effects in time-series data for more accurate predictions with Gaussian Process Regression. Further, we demonstrate that prediction accuracy can be improved by constraining the non-convex optimization process of finding optimal hyperparameter values. We validate the effectiveness of our methods by performing multi-step prediction on two real-world publicly available time-series datasets: one of electricity smart-meter data from the University of Melbourne, and the other of pedestrian counts in the City of Melbourne.
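A hedged sketch of combining kernels in scikit-learn to capture weekly periodic structure, with the weekly period fixed to constrain the hyperparameter search; the specific kernels and values are illustrative, not the authors' proposed working-week kernels:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ExpSineSquared, WhiteKernel, ConstantKernel

# Weekly periodic component (period fixed at 7 days, which constrains the
# non-convex hyperparameter optimization) modulated by a slow RBF, plus a
# smooth trend and observation noise.
weekly = (ConstantKernel(1.0)
          * ExpSineSquared(length_scale=1.0, periodicity=7.0, periodicity_bounds="fixed")
          * RBF(length_scale=30.0))
trend = ConstantKernel(1.0) * RBF(length_scale=60.0)
kernel = weekly + trend + WhiteKernel(noise_level=0.1)

gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)

# Synthetic stand-in data: time in days with a weekly cycle and a slow drift.
t_train = np.arange(0.0, 200.0)[:, None]
y_train = np.sin(2 * np.pi * t_train[:, 0] / 7.0) + 0.01 * t_train[:, 0]
gp.fit(t_train, y_train)

t_test = np.arange(200.0, 214.0)[:, None]
y_pred, y_std = gp.predict(t_test, return_std=True)   # multi-step-ahead forecast
```

Distinguishing weekday from weekend behaviour would require additional structure (for example an indicator feature or separate kernels over weekday and weekend inputs), which is the kind of explicit working-week modelling the paper proposes.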


Author(s): Su Jiang, Mun-Hong Hui, Louis J. Durlofsky

Data-space inversion (DSI) is a data assimilation procedure that directly generates posterior flow predictions, for time series of interest, without calibrating model parameters. No forward flow simulation is performed in the data assimilation process. DSI instead uses the prior data generated by performing O(1000) simulations on prior geomodel realizations. Data parameterization is useful in the DSI framework as it enables representation of the correlated time-series data quantities in terms of low-dimensional latent-space variables. In this work, a recently developed parameterization based on a recurrent autoencoder (RAE) is applied with DSI for a real naturally fractured reservoir. The parameterization, involving the use of a recurrent neural network and an autoencoder, is able to capture important correlations in the time-series data. RAE training is accomplished using flow simulation results for 1,350 prior model realizations. An ensemble smoother with multiple data assimilation (ESMDA) is applied to provide posterior DSI data samples. The modeling in this work is much more complex than that considered in previous DSI studies as it includes multiple 3D discrete fracture realizations, three-phase flow, tracer injection and production, and complicated field-management logic leading to frequent well shut-in and reopening. Results for the reconstruction of new simulation data (not seen in training), using both the RAE-based parameterization and a simpler approach based on principal component analysis (PCA) with histogram transformation, are presented. The RAE-based procedure is shown to provide better accuracy for these data reconstructions. Detailed posterior DSI results are then presented for a particular “true” model (which is outside the prior ensemble), and summary results are provided for five additional “true” models that are consistent with the prior ensemble. These results again demonstrate the advantages of DSI with RAE-based parameterization for this challenging fractured reservoir case.
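For reference, a minimal sketch of one ES-MDA update step applied to latent (data-space) variables, as used within DSI; the ensemble shapes, inflation factor, and variable names are illustrative assumptions, not the study's implementation:

```python
import numpy as np

def esmda_update(Z, D, d_obs, C_e, alpha, seed=0):
    """One ES-MDA assimilation step.
    Z: (n_ens, n_latent) latent data-space variables;
    D: (n_ens, n_data) predicted observable data for each ensemble member;
    d_obs: (n_data,) observed data; C_e: (n_data, n_data) observation-error covariance;
    alpha: inflation factor for this step (the alphas sum to 1 in reciprocal over all steps)."""
    rng = np.random.default_rng(seed)
    n_ens = Z.shape[0]
    Zc = Z - Z.mean(axis=0)
    Dc = D - D.mean(axis=0)
    C_zd = Zc.T @ Dc / (n_ens - 1)                    # cross-covariance (latent, data)
    C_dd = Dc.T @ Dc / (n_ens - 1)                    # predicted-data covariance
    # Perturb observations with inflated noise, then apply the Kalman-like gain.
    d_pert = d_obs + rng.multivariate_normal(np.zeros(len(d_obs)), alpha * C_e, size=n_ens)
    K = C_zd @ np.linalg.inv(C_dd + alpha * C_e)
    return Z + (d_pert - D) @ K.T
```

In the DSI workflow, the updated latent variables are then decoded (here, by the RAE or PCA-based parameterization) back into posterior time-series predictions without any further flow simulation.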


Author(s): Michael Hauser, Yiwei Fu, Shashi Phoha, Asok Ray

This paper makes use of long short-term memory (LSTM) neural networks for forecasting probability distributions of time series in terms of discrete symbols that are quantized from real-valued data. The developed framework formulates the forecasting problem into a probabilistic paradigm as $h_\Theta: \mathcal{X} \times \mathcal{Y} \to [0, 1]$ such that $\sum_{y \in \mathcal{Y}} h_\Theta(x, y) = 1$, where $\mathcal{X}$ is the finite-dimensional state space, $\mathcal{Y}$ is the symbol alphabet, and $\Theta$ is the set of model parameters. The proposed method is different from standard formulations (e.g., autoregressive moving average (ARMA)) of time series modeling. The main advantage of formulating the problem in the symbolic setting is that density predictions are obtained without any significantly restrictive assumptions (e.g., second-order statistics). The efficacy of the proposed method has been demonstrated by forecasting probability distributions on chaotic time series data collected from a laboratory-scale experimental apparatus. Three neural architectures are compared, each with 100 different combinations of symbol-alphabet size and forecast length, resulting in a comprehensive evaluation of their relative performances.
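A minimal sketch, assuming PyTorch, of an LSTM forecaster whose output is a probability distribution over a discrete symbol alphabet, so the predicted probabilities sum to one as in the $h_\Theta$ formulation above; the architecture sizes are illustrative:

```python
import torch
import torch.nn as nn

class SymbolForecaster(nn.Module):
    """LSTM over past symbols; softmax head gives a distribution over the alphabet."""
    def __init__(self, alphabet_size, embed_dim=16, hidden_dim=64):
        super().__init__()
        self.embed = nn.Embedding(alphabet_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, alphabet_size)

    def forward(self, symbols):                   # symbols: (batch, seq_len) int64
        h, _ = self.lstm(self.embed(symbols))
        logits = self.head(h[:, -1])              # predict the next symbol
        return torch.softmax(logits, dim=-1)      # probabilities sum to 1 over the alphabet

model = SymbolForecaster(alphabet_size=8)
x = torch.randint(0, 8, (4, 20))                  # quantized (symbolized) time series
probs = model(x)                                  # (4, 8) forecast distributions
```

Training such a model with a cross-entropy loss on the next observed symbol yields density forecasts without the distributional assumptions of ARMA-style models.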

