Variational Inference over Nonstationary Data Streams for Exponential Family Models

Mathematics ◽  
2020 ◽  
Vol 8 (11) ◽  
pp. 1942
Author(s):  
Andrés R. Masegosa ◽  
Darío Ramos-López ◽  
Antonio Salmerón ◽  
Helge Langseth ◽  
Thomas D. Nielsen

In many modern data analysis problems, the available data is not static but instead comes in a streaming fashion. Performing Bayesian inference on a data stream is challenging for several reasons. First, it requires continuous model updating and the ability to handle a posterior distribution conditioned on an unbounded data set. Second, the underlying data distribution may drift from one time step to another, so the classic i.i.d. (independent and identically distributed) or data-exchangeability assumption no longer holds. In this paper, we present an approximate Bayesian inference approach using variational methods that addresses these issues for conjugate exponential family models with latent variables. Our proposal makes use of a novel scheme based on hierarchical priors to explicitly model temporal changes of the model parameters. We show how this approach induces an exponential forgetting mechanism with adaptive forgetting rates. The method is able to capture the smoothness of the concept drift, ranging from no drift to abrupt drift. The proposed variational inference scheme maintains the computational efficiency of variational methods over conjugate models, which is critical in streaming settings. The approach is validated on four different domains (energy, finance, geolocation, and text) using four real-world data sets.
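
To make the forgetting idea concrete, here is a minimal sketch of exponential forgetting in a conjugate streaming update, using a Beta-Bernoulli model with a fixed forgetting rate rho; the paper's hierarchical-prior scheme, which adapts the rate to the drift, is not reproduced, and the function and parameter names are illustrative only.

```python
import numpy as np

def forgetting_update(post, prior0, batch, rho):
    """One streaming update with a fixed exponential forgetting rate rho.

    post, prior0: (alpha, beta) parameters of the Beta posterior / initial prior.
    batch: array of 0/1 observations from the current time step.
    """
    # Discount the previous posterior toward the initial prior ...
    alpha = rho * post[0] + (1.0 - rho) * prior0[0]
    beta = rho * post[1] + (1.0 - rho) * prior0[1]
    # ... then absorb the new batch (conjugate Beta-Bernoulli update).
    alpha += batch.sum()
    beta += len(batch) - batch.sum()
    return (alpha, beta)

# Simulated drifting Bernoulli stream: the success probability jumps at t = 50.
rng = np.random.default_rng(0)
prior0 = (1.0, 1.0)
post = prior0
for t in range(100):
    p = 0.2 if t < 50 else 0.8
    batch = rng.binomial(1, p, size=20)
    post = forgetting_update(post, prior0, batch, rho=0.9)
print("posterior mean after drift:", post[0] / (post[0] + post[1]))
```

With rho close to 1 the update approaches plain Bayesian accumulation; with smaller rho the posterior leans more heavily on the initial prior and forgets old batches faster.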

2020 ◽  
Vol 70 (1) ◽  
pp. 145-161 ◽  
Author(s):  
Marnus Stoltz ◽  
Boris Baeumer ◽  
Remco Bouckaert ◽  
Colin Fox ◽  
Gordon Hiscott ◽  
...  

Abstract We describe a new and computationally efficient Bayesian methodology for inferring species trees and demographics from unlinked binary markers. Likelihood calculations are carried out using diffusion models of allele frequency dynamics combined with novel numerical algorithms. The diffusion approach allows for the analysis of data sets containing hundreds or thousands of individuals. The method, which we call Snapper, has been implemented as part of the BEAST2 package. We conducted simulation experiments to assess numerical error, computational requirements, and accuracy in recovering known model parameters. A reanalysis of soybean SNP data demonstrates that the models implemented in Snapp and Snapper can be difficult to distinguish in practice, a characteristic which we tested with further simulations. We demonstrate the scale of analysis possible using a SNP data set sampled from 399 freshwater turtles in 41 populations. [Bayesian inference; diffusion models; multi-species coalescent; SNP data; species trees; spectral methods.]
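
As a rough illustration of the kind of allele-frequency dynamics such diffusion likelihoods are built on (not Snapper's actual spectral likelihood computations), the sketch below simulates a neutral Wright-Fisher diffusion by Euler-Maruyama; the time step, population size and boundary handling are illustrative assumptions.

```python
import numpy as np

# Euler-Maruyama simulation of a neutral Wright-Fisher diffusion for a single
# allele frequency with effective population size N (drift only, no selection).
rng = np.random.default_rng(0)
N, dt, steps = 1000, 0.01, 5000
x = 0.3
path = [x]
for _ in range(steps):
    x += np.sqrt(max(x * (1.0 - x), 0.0) / (2.0 * N)) * rng.normal(0.0, np.sqrt(dt))
    x = min(max(x, 0.0), 1.0)          # keep the frequency inside [0, 1]
    path.append(x)
print("final allele frequency:", path[-1])
```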


2020 ◽  
pp. 1-22
Author(s):  
Luis E. Nieto-Barajas ◽  
Rodrigo S. Targino

ABSTRACT We propose a stochastic model for claims reserving that captures dependence along development years within a single triangle. This dependence is based on a gamma process with a moving average form of order $p \ge 0$, which is achieved through the use of Poisson latent variables. We carry out Bayesian inference on model parameters and borrow strength across several triangles, coming from different lines of business or companies, through the use of hierarchical priors. We carry out a simulation study as well as a real data analysis. Results show that, for the real data set studied, reserve estimates are more accurate with our gamma dependence model than with the benchmark over-dispersed Poisson model, which assumes independence.
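
For intuition about how Poisson latent variables can induce dependence between gamma-distributed quantities while preserving their marginals, here is a sketch of the classic Markov-type gamma-Poisson construction; it is a simplified stand-in, not the moving-average form of order $p$ used in the paper, and the parameter names are illustrative.

```python
import numpy as np

def simulate_gamma_poisson_chain(T, a, b, c, rng):
    """Simulate a stationary gamma chain linked by Poisson latent counts.

    theta_1 ~ Ga(a, b); u_t | theta_t ~ Po(c * theta_t);
    theta_{t+1} | u_t ~ Ga(a + u_t, b + c), which keeps the Ga(a, b) marginal.
    """
    theta = np.empty(T)
    theta[0] = rng.gamma(a, 1.0 / b)
    for t in range(T - 1):
        u = rng.poisson(c * theta[t])            # latent count links steps t and t+1
        theta[t + 1] = rng.gamma(a + u, 1.0 / (b + c))
    return theta

rng = np.random.default_rng(1)
path = simulate_gamma_poisson_chain(200, a=2.0, b=1.0, c=5.0, rng=rng)
print("empirical mean (should be near a/b = 2):", path.mean())
```

Larger values of c produce stronger dependence between consecutive steps while the marginal Ga(a, b) distribution is unchanged.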


Entropy ◽  
2021 ◽  
Vol 23 (4) ◽  
pp. 399
Author(s):  
Anna Pajor

Formal Bayesian comparison of two competing models, based on the posterior odds ratio, amounts to estimating the Bayes factor, which equals the ratio of the two models' marginal data density values. In models with a large number of parameters and/or latent variables, these marginal densities are expressed by high-dimensional integrals that are often computationally infeasible, so other methods of evaluating the Bayes factor are needed. In this paper, a new method of estimating the Bayes factor is proposed. Simulation examples confirm the good performance of the proposed estimators. Finally, these new estimators are used to formally compare different hybrid Multivariate Stochastic Volatility–Multivariate Generalized Autoregressive Conditional Heteroskedasticity (MSV-MGARCH) models, which have a large number of latent variables. The empirical results show, among other things, that the validity of reducing the hybrid MSV-MGARCH model to the MGARCH specification depends on the analyzed data set as well as on prior assumptions about model parameters.
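
As a baseline for what is being estimated, the sketch below computes a log Bayes factor for two toy Gaussian models, approximating one marginal data density by naive Monte Carlo averaging of the likelihood over prior draws; the paper's new estimators are not reproduced here, and the models and sample sizes are illustrative.

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import norm

def log_marginal_naive(y, prior_draws, log_lik):
    """Naive Monte Carlo estimate of log p(y) = log E_prior[ p(y | theta) ]."""
    log_liks = np.array([log_lik(y, th) for th in prior_draws])
    return logsumexp(log_liks) - np.log(len(prior_draws))

rng = np.random.default_rng(0)
y = rng.normal(0.3, 1.0, size=50)

# Model 1: y ~ N(theta, 1) with theta ~ N(0, 1).  Model 2: y ~ N(0, 1), no free parameter.
draws_m1 = rng.normal(0.0, 1.0, size=5_000)
log_lik = lambda data, th: norm.logpdf(data, loc=th, scale=1.0).sum()
log_m1 = log_marginal_naive(y, draws_m1, log_lik)
log_m2 = norm.logpdf(y, loc=0.0, scale=1.0).sum()

print("log Bayes factor (M1 vs M2):", log_m1 - log_m2)
```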


2019 ◽  
Author(s):  
Nadheesh Jihan ◽  
Malith Jayasinghe ◽  
Srinath Perera

Online learning is an essential tool for predictive analysis based on continuous, endless data streams. Adopting Bayesian inference for online settings allows hierarchical modeling while representing the uncertainty of model parameters. Existing online inference techniques are motivated by either traditional Bayesian updating or stochastic optimization. However, traditional Bayesian updating suffers from overconfident posteriors, where the posterior variance becomes too small for the posterior to adapt to new changes. On the other hand, stochastic optimization of the variational objective demands exhaustive additional analysis to optimize a hyperparameter that controls the posterior variance. In this paper, we present "Streaming Stochastic Variational Bayes" (SSVB), a novel online approximate inference framework for streaming data that addresses the aforementioned shortcomings of the current state of the art. SSVB adjusts its posterior variance appropriately without any user-specified hyperparameters while efficiently accommodating drifting patterns in the posteriors. Moreover, SSVB can be easily adopted by practitioners for a wide range of models (from simple regression models to complex hierarchical models) with little additional analysis. We appraised the performance of SSVB against Population Variational Inference (PVI), Stochastic Variational Inference (SVI) and Black-box Streaming Variational Bayes (BB-SVB) using two non-conjugate probabilistic models: multinomial logistic regression and a linear mixed-effects model. Furthermore, we also discuss the significant accuracy gain of SSVB-based inference over conventional online learning models for each task.
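
The overconfidence issue with traditional streaming updates can be seen in a toy conjugate Gaussian example: chaining posteriors as priors shrinks the variance monotonically, so the posterior cannot track a drifting mean. The sketch below illustrates only that baseline behaviour, not SSVB itself; all names and settings are illustrative.

```python
import numpy as np

def gaussian_stream_update(mu, tau2, batch, sigma2=1.0):
    """Conjugate update of a N(mu, tau2) prior on the mean of N(., sigma2) data.

    In classic streaming Bayes the returned posterior becomes the next prior,
    so tau2 shrinks monotonically even if the data-generating mean drifts.
    """
    n = len(batch)
    post_tau2 = 1.0 / (1.0 / tau2 + n / sigma2)
    post_mu = post_tau2 * (mu / tau2 + batch.sum() / sigma2)
    return post_mu, post_tau2

rng = np.random.default_rng(0)
mu, tau2 = 0.0, 10.0
for t in range(100):
    true_mean = 0.0 if t < 50 else 3.0          # abrupt drift halfway through the stream
    mu, tau2 = gaussian_stream_update(mu, tau2, rng.normal(true_mean, 1.0, 25))
# The tiny variance shows the posterior is too confident to follow the drift.
print(f"final posterior mean {mu:.2f}, variance {tau2:.2e}")
```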


2019 ◽  
Vol XVI (2) ◽  
pp. 1-11
Author(s):  
Farrukh Jamal ◽  
Hesham Mohammed Reyad ◽  
Soha Othman Ahmed ◽  
Muhammad Akbar Ali Shah ◽  
Emrah Altun

A new three-parameter continuous model called the exponentiated half-logistic Lomax distribution is introduced in this paper. Basic mathematical properties of the proposed model are investigated, including raw and incomplete moments, skewness, kurtosis, generating functions, Rényi entropy, Lorenz, Bonferroni and Zenga curves, probability weighted moments, the stress-strength model, order statistics, and record statistics. The model parameters are estimated using the maximum likelihood criterion, and the behaviour of these estimates is examined through a simulation study. The applicability of the new model is illustrated by applying it to a real data set.
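
For readers who want to reproduce the estimation workflow, the sketch below fits a plain two-parameter Lomax distribution by numerical maximum likelihood with SciPy; the paper's three-parameter exponentiated half-logistic Lomax density is not reproduced here, so this is only an illustrative stand-in.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import lomax

# Simulate data from a Lomax(shape c, scale s) and recover (c, s) by ML.
rng = np.random.default_rng(0)
data = lomax.rvs(c=2.5, scale=1.5, size=500, random_state=rng)

def neg_log_lik(log_params):
    c, s = np.exp(log_params)                 # log-parameterisation keeps c, s > 0
    return -lomax.logpdf(data, c=c, scale=s).sum()

res = minimize(neg_log_lik, x0=np.log([1.0, 1.0]), method="Nelder-Mead")
print("MLE (shape, scale):", np.exp(res.x))
```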


2017 ◽  
Vol 14 (134) ◽  
pp. 20170340 ◽  
Author(s):  
Aidan C. Daly ◽  
Jonathan Cooper ◽  
David J. Gavaghan ◽  
Chris Holmes

Bayesian methods are advantageous for biological modelling studies due to their ability to quantify and characterize posterior variability in model parameters. When Bayesian methods cannot be applied, due either to non-determinism in the model or limitations on system observability, approximate Bayesian computation (ABC) methods can be used to similar effect, despite producing inflated estimates of the true posterior variance. Owing to generally differing application domains, there are few studies comparing Bayesian and ABC methods, and thus there is little understanding of the properties and magnitude of this uncertainty inflation. To address this problem, we present two popular strategies for ABC sampling that we have adapted to perform exact Bayesian inference, and compare them on several model problems. We find that one sampler was impractical for exact inference due to its sensitivity to a key normalizing constant, and additionally highlight sensitivities of both samplers to various algorithmic parameters and model conditions. We conclude with a study of the O'Hara–Rudy cardiac action potential model to quantify the uncertainty amplification resulting from employing ABC using a set of clinically relevant biomarkers. We hope that this work serves to guide the implementation and comparative assessment of Bayesian and ABC sampling techniques in biological models.
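
For reference, the textbook ABC rejection sampler that this kind of comparison starts from looks roughly as follows; the adapted samplers studied in the paper are not reproduced, and the toy Gaussian model, summary statistic and tolerance are illustrative assumptions.

```python
import numpy as np

def abc_rejection(observed, prior_sampler, simulator, distance, eps, n_draws, rng):
    """Plain ABC rejection sampler: keep prior draws whose simulated data
    fall within eps of the observed data under the chosen distance."""
    accepted = []
    for _ in range(n_draws):
        theta = prior_sampler(rng)
        if distance(simulator(theta, rng), observed) < eps:
            accepted.append(theta)
    return np.array(accepted)

# Toy example: infer the mean of a Gaussian with known unit variance.
rng = np.random.default_rng(0)
observed = rng.normal(1.0, 1.0, size=100)
post = abc_rejection(
    observed,
    prior_sampler=lambda r: r.normal(0.0, 3.0),
    simulator=lambda th, r: r.normal(th, 1.0, size=100),
    distance=lambda a, b: abs(a.mean() - b.mean()),   # summary-statistic distance
    eps=0.05, n_draws=50_000, rng=rng,
)
print("ABC posterior mean/sd:", post.mean(), post.std())
```

Loosening the tolerance eps inflates the accepted-sample variance relative to the exact posterior, which is the uncertainty amplification the abstract refers to.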


Entropy ◽  
2021 ◽  
Vol 23 (3) ◽  
pp. 380
Author(s):  
Emanuele Cavenaghi ◽  
Gabriele Sottocornola ◽  
Fabio Stella ◽  
Markus Zanker

The Multi-Armed Bandit (MAB) problem has been extensively studied in order to address real-world challenges related to sequential decision making. In this setting, an agent selects the best action to be performed at time step t, based on the past rewards received from the environment. This formulation implicitly assumes that the expected payoff for each action is kept stationary by the environment through time. Nevertheless, in many real-world applications this assumption does not hold and the agent has to face a non-stationary environment, that is, one with a changing reward distribution. Thus, we present a new MAB algorithm, named f-Discounted-Sliding-Window Thompson Sampling (f-dsw TS), for non-stationary environments, that is, when the data stream is affected by concept drift. The f-dsw TS algorithm is based on Thompson Sampling (TS) and exploits a discount factor on the reward history and an arm-related sliding window to counteract concept drift in non-stationary environments. We investigate how to combine these two sources of information, namely the discount factor and the sliding window, by means of an aggregation function f(.). In particular, we propose a pessimistic (f=min), an optimistic (f=max), as well as an averaged (f=mean) version of the f-dsw TS algorithm. A rich set of numerical experiments is performed to evaluate the f-dsw TS algorithm against both stationary and non-stationary state-of-the-art TS baselines. We exploited synthetic environments (both randomly generated and controlled) to test the MAB algorithms under different types of drift, that is, sudden/abrupt, incremental, gradual and increasing/decreasing drift. Furthermore, we adapt four real-world active learning tasks to our framework: a prediction task on crimes in the city of Baltimore, a classification task on insect species, a recommendation task on local web news, and a time-series analysis on microbial organisms in the tropical air ecosystem. The f-dsw TS approach emerges as the best-performing MAB algorithm: at least one of the versions of f-dsw TS performs better than the baselines in synthetic environments, demonstrating the robustness of f-dsw TS under different concept drift types. Moreover, the pessimistic version (f=min) proves the most effective in all real-world tasks.
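
A rough sketch of the idea, assuming Bernoulli rewards and Beta posteriors, is given below: one posterior is built from globally discounted counts, one from a per-arm sliding window, and an aggregation function f combines a sample from each. The exact bookkeeping of the published f-dsw TS algorithm may differ; class and parameter names here are illustrative.

```python
import numpy as np
from collections import deque

class FDSWThompson:
    """Sketch of f-dsw Thompson Sampling for Bernoulli arms: one Beta posterior
    from discounted counts, one from a per-arm sliding window, combined via f."""

    def __init__(self, n_arms, gamma=0.95, window=50, f=min, rng=None):
        self.gamma, self.f = gamma, f
        self.s = np.zeros(n_arms)                     # discounted successes
        self.n = np.zeros(n_arms)                     # discounted pull counts
        self.windows = [deque(maxlen=window) for _ in range(n_arms)]
        self.rng = rng or np.random.default_rng()

    def select(self):
        scores = []
        for a, w in enumerate(self.windows):
            disc = self.rng.beta(1 + self.s[a], 1 + self.n[a] - self.s[a])
            win = self.rng.beta(1 + sum(w), 1 + len(w) - sum(w))
            scores.append(self.f(disc, win))          # aggregate the two samples
        return int(np.argmax(scores))

    def update(self, arm, reward):
        self.s *= self.gamma                          # discount the whole history
        self.n *= self.gamma
        self.s[arm] += reward
        self.n[arm] += 1
        self.windows[arm].append(reward)

rng = np.random.default_rng(0)
bandit = FDSWThompson(n_arms=2, f=min, rng=rng)
for t in range(2000):
    probs = [0.7, 0.3] if t < 1000 else [0.3, 0.7]    # abrupt drift at t = 1000
    arm = bandit.select()
    bandit.update(arm, rng.binomial(1, probs[arm]))
```

Using f=min takes the more cautious of the two estimates for each arm, which matches the pessimistic variant described above.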


2017 ◽  
Vol 37 (1) ◽  
pp. 1-12 ◽  
Author(s):  
Haluk Ay ◽  
Anthony Luscher ◽  
Carolyn Sommerich

Purpose
The purpose of this study is to design and develop a testing device to simulate the interaction between human hand–arm dynamics, right-angle (RA) computer-controlled power torque tools and joint-tightening task-related variables.

Design/methodology/approach
The testing rig can simulate a variety of tools, tasks and operator conditions. The device includes custom data-acquisition electronics and graphical user interface-based software. The simulation of the human hand–arm dynamics is based on the rig's four-bar-mechanism design and mechanical components that provide adjustable stiffness (via a pneumatic cylinder) and mass (via plates) and non-adjustable damping. The stiffness and mass values used are based on an experimentally validated hand–arm model that includes a database of model parameters covering gender and working posture, corresponding to experienced tool operators from a prior study.

Findings
The rig measures tool handle force and displacement responses simultaneously. Peak force and displacement coefficients of determination (R²) between rig estimations and human testing measurements were 0.98 and 0.85, respectively, for the same set of tools, tasks and operator conditions. The rig also provides predicted tool operator acceptability ratings, using a data set from a prior study of discomfort in experienced operators during torque tool use.

Research limitations/implications
Deviations from linearity may influence handle force and displacement measurements. Stiction (Coulomb friction) in the overall rig, as well as in the air cylinder piston, is neglected. The rig's mechanical damping is not adjustable, even though human hand–arm damping varies with gender and working posture. Deviations from these assumptions may affect the correlation of the handle force and displacement measurements with those of human testing for the same tool, task and operator conditions.

Practical implications
This test rig will allow the rapid assessment of the ergonomic performance of DC torque tools, saving considerable time in lineside applications and reducing the risk of worker injury. DC torque tools are an extremely effective way of increasing production rate and improving torque accuracy. Being a complex dynamic system, however, the performance of DC torque tools varies in each application. Changes in worker mass, damping and stiffness, as well as joint stiffness and tool program, make each application unique. This test rig models all of these factors and allows quick assessment.

Social implications
The use of this tool test rig will help to identify and understand risk factors that contribute to musculoskeletal disorders (MSDs) associated with the use of torque tools. Tool operators are subjected to large impulsive handle reaction forces as joint torque builds up while tightening a fastener. Repeated exposure to such forces is associated with muscle soreness, fatigue and physical stress, which are also risk factors for upper extremity injuries (MSDs; e.g. tendinosis, myofascial pain). Eccentric exercise exertions are known to cause damage to muscle tissue in untrained individuals and affect subsequent performance.

Originality/value
The rig provides a novel means for quantitative, repeatable dynamic evaluation of RA powered torque tools and objective selection of tightening programs. Compared to current static tool assessment methods, dynamic testing provides a more realistic tool assessment relative to the tool operator's experience. This may lead to improvements in tool or controller design and a reduction in associated musculoskeletal discomfort in operators.


2013 ◽  
Vol 2013 ◽  
pp. 1-13 ◽  
Author(s):  
Helena Mouriño ◽  
Maria Isabel Barão

Missing-data problems are extremely common in practice. To achieve reliable inferential results, we need to take this feature of the data into account. Suppose that the univariate data set under analysis has missing observations. This paper examines the impact of selecting an auxiliary complete data set—whose underlying stochastic process is to some extent interdependent with the former—to improve the efficiency of the estimators of the relevant model parameters. The Vector AutoRegressive (VAR) model has proven to be an extremely useful tool for capturing the dynamics of bivariate time series. We propose maximum likelihood estimators for the parameters of the VAR(1) model based on a monotone missing-data pattern, and we also derive the estimators' precision. Afterwards, we compare the bivariate modelling scheme with its univariate counterpart. More precisely, the univariate data set with missing observations is modelled by an AutoRegressive Moving Average (ARMA(2,1)) model. We also analyse the behaviour of the AutoRegressive model of order one, AR(1), due to its practical importance. We focus on the mean value of the main stochastic process. Through simulation studies, we conclude that the estimator based on the VAR(1) model is preferable to those derived from the univariate context.
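
As a point of reference for the model being estimated, the sketch below simulates a bivariate VAR(1) and recovers its intercept and coefficient matrix by per-equation least squares on complete data (the Gaussian maximum likelihood solution in that case); the paper's estimators under a monotone missing-data pattern are not reproduced, and all numbers are illustrative.

```python
import numpy as np

# Simulate a bivariate VAR(1): x_t = c + A x_{t-1} + eps_t, then recover (c, A)
# by least squares on the complete series.
rng = np.random.default_rng(0)
A = np.array([[0.5, 0.2],
              [0.1, 0.4]])
c = np.array([1.0, -0.5])
T = 2000
x = np.zeros((T, 2))
for t in range(1, T):
    x[t] = c + A @ x[t - 1] + rng.normal(0.0, 0.3, size=2)

X = np.column_stack([np.ones(T - 1), x[:-1]])         # regressors: intercept + lagged values
coef, *_ = np.linalg.lstsq(X, x[1:], rcond=None)      # shape (3, 2): first row = c, rest = A.T
print("estimated intercept:", coef[0])
print("estimated A:\n", coef[1:].T)
```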


Geophysics ◽  
2016 ◽  
Vol 81 (4) ◽  
pp. U25-U38 ◽  
Author(s):  
Nuno V. da Silva ◽  
Andrew Ratcliffe ◽  
Vetle Vinje ◽  
Graham Conroy

Parameterization lies at the center of anisotropic full-waveform inversion (FWI) with multiparameter updates, because FWI aims to update both the long and short wavelengths of the perturbations, and the parameterization must accommodate this. Recently, there has been an intensive effort to determine the optimal parameterization, centering the discussion mainly on the analysis of radiation patterns for each of these parameterizations and aiming to determine which is best suited for multiparameter inversion. We have developed a new parameterization in the scope of FWI, based on the concept of kinematically equivalent media, as originally proposed in other areas of seismic data analysis. Our analysis is also based on radiation patterns, as well as on the relation between perturbations of this set of parameters and perturbations in traveltime. The radiation pattern reveals that this parameterization combines some of the characteristics of parameterizations with one velocity and two Thomsen parameters and of parameterizations using two velocities and one Thomsen parameter. The study of traveltime perturbation with respect to model-parameter perturbation shows that the new parameterization is less ambiguous in relating these quantities than other, more commonly used parameterizations. We conclude that the new parameterization is well suited for inverting diving waves, which are of paramount importance for carrying out practical FWI successfully. We demonstrate that the new parameterization produces good inversion results with synthetic and real data examples. In the real data example, from the Central North Sea, the inverted models show good agreement with the geologic structures, leading to an improvement of the seismic image and flatness of the common image gathers.

