Streaming stochastic variational Bayes; An improved approach for Bayesian inference with data streams

Author(s):  
Nadheesh Jihan ◽  
Malith Jayasinghe ◽  
Srinath Perera

Online learning is an essential tool for predictive analysis based on continuous, endless data streams. Adopting Bayesian inference for online settings allows hierarchical modeling while representing the uncertainty of model parameters. Existing online inference techniques are motivated by either traditional Bayesian updating or stochastic optimization. However, traditional Bayesian updating suffers from overconfident posteriors, where the posterior variance becomes too small to adapt to new changes in the posterior. On the other hand, stochastic optimization of the variational objective demands exhausting additional analysis to optimize a hyperparameter that controls the posterior variance. In this paper, we present "Streaming Stochastic Variational Bayes" (SSVB), a novel online approximate inference framework for data streams that addresses the aforementioned shortcomings of the current state of the art. SSVB adjusts its posterior variance without any user-specified hyperparameters while efficiently accommodating drifting patterns in the posteriors. Moreover, SSVB can be easily adopted by practitioners for a wide range of models (from simple regression models to complex hierarchical models) with little additional analysis. We appraised the performance of SSVB against Population Variational Inference (PVI), Stochastic Variational Inference (SVI) and Black-box Streaming Variational Bayes (BB-SVB) using two non-conjugate probabilistic models: multinomial logistic regression and a linear mixed-effects model. Furthermore, we also discuss the significant accuracy gain of SSVB-based inference over conventional online learning models for each task.
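The overconfidence problem described in this abstract can be illustrated with a minimal sketch. It is not SSVB and is not taken from the paper: it assumes a simple conjugate Normal model for an unknown mean with known noise variance, and the stream, the prior, and the function names (`bayes_update`, `run_stream`) are hypothetical choices made for illustration.

```python
def bayes_update(mean, var, x, noise_var=1.0):
    """One conjugate Normal-Normal update of the posterior over an unknown mean."""
    precision = 1.0 / var + 1.0 / noise_var        # precisions add up
    new_var = 1.0 / precision
    new_mean = new_var * (mean / var + x / noise_var)
    return new_mean, new_var


def run_stream(stream, prior_mean=0.0, prior_var=10.0, noise_var=1.0):
    """Traditional Bayesian updating: yesterday's posterior is today's prior."""
    mean, var = prior_mean, prior_var
    history = []
    for x in stream:
        mean, var = bayes_update(mean, var, x, noise_var)
        history.append((mean, var))
    return history


# Hypothetical noise-free stream: the true mean jumps from 0 to 5 halfway through.
stream = [0.0] * 500 + [5.0] * 500
history = run_stream(stream)
mean_before, var_before = history[499]   # just before the jump
mean_after, var_after = history[-1]      # after 500 post-jump observations

print(f"before drift: mean={mean_before:.3f}, var={var_before:.5f}")
print(f"after drift:  mean={mean_after:.3f}, var={var_after:.5f}")
```

In this toy setting the posterior variance only ever shrinks, roughly as 1/n, so after the jump the posterior mean sits near the average of the old and new regimes while the posterior still reports very high certainty.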

2019 ◽  
Author(s):  
Nadheesh Jihan ◽  
Malith Jayasinghe ◽  
Srinath Perera

Online learning is an essential tool for predictive analysis based on continuous, endless data streams. Adopting Bayesian inference for online settings allows hierarchical modeling while representing the uncertainty of model parameters. Existing online inference techniques are motivated by either traditional Bayesian updating or stochastic optimization. However, traditional Bayesian updating suffers from overconfident posteriors, where the posterior variance becomes too small to adapt to new changes in the posterior when the data stream exhibits concept drift. On the other hand, stochastic optimization of the variational objective demands exhausting additional analysis to optimize a hyperparameter that controls the posterior variance. In this paper, we present "Streaming Stochastic Variational Bayes" (SSVB), a novel online approximate inference framework for data streams that addresses the aforementioned shortcomings of the current state of the art. SSVB adjusts its posterior variance without any user-specified hyperparameters while efficiently accommodating drifting patterns in the posteriors. Moreover, SSVB can be easily adopted by practitioners for a wide range of models (from simple regression models to complex hierarchical models) with little additional analysis. We demonstrate the superior performance of SSVB against Population Variational Inference (PVI), Stochastic Variational Inference (SVI) and Black-box Streaming Variational Bayes (BB-SVB) using two non-conjugate probabilistic models: multinomial logistic regression and a linear mixed-effects model. Furthermore, we also emphasize the significant accuracy gain of SSVB-based inference over conventional online learning models for each task.


Mathematics ◽  
2020 ◽  
Vol 8 (11) ◽  
pp. 1942
Author(s):  
Andrés R. Masegosa ◽  
Darío Ramos-López ◽  
Antonio Salmerón ◽  
Helge Langseth ◽  
Thomas D. Nielsen

In many modern data analysis problems, the available data is not static but, instead, arrives in a streaming fashion. Performing Bayesian inference on a data stream is challenging for several reasons. First, it requires continuous model updating and the ability to handle a posterior distribution conditioned on an unbounded data set. Second, the underlying data distribution may drift from one time step to another, so the classic i.i.d. (independent and identically distributed) or data exchangeability assumption no longer holds. In this paper, we present an approximate Bayesian inference approach using variational methods that addresses these issues for conjugate exponential family models with latent variables. Our proposal makes use of a novel scheme based on hierarchical priors to explicitly model temporal changes of the model parameters. We show how this approach induces an exponential forgetting mechanism with adaptive forgetting rates. The method is able to capture the smoothness of the concept drift, ranging from no drift to abrupt drift. The proposed variational inference scheme maintains the computational efficiency of variational methods over conjugate models, which is critical in streaming settings. The approach is validated on four different domains (energy, finance, geolocation, and text) using four real-world data sets.
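To make the idea of exponential forgetting concrete, here is a minimal sketch under strong simplifying assumptions. It uses a fixed, user-chosen forgetting rate `rho` on a conjugate Normal model for an unknown mean; the paper's hierarchical priors and adaptive forgetting rates are not reproduced, and the function name, prior, and stream are hypothetical.

```python
def forgetful_update(mean, var, x, rho, prior_mean=0.0, prior_var=10.0, noise_var=1.0):
    """Discount the previous posterior toward the prior, then do a conjugate update.

    rho = 1.0 recovers standard Bayesian updating; rho < 1.0 forgets old data
    exponentially fast. A fixed rho is an assumption of this sketch.
    """
    # Natural parameters of a Normal distribution: (mean / var, 1 / var).
    eta1, eta2 = mean / var, 1.0 / var
    prior_eta1, prior_eta2 = prior_mean / prior_var, 1.0 / prior_var

    # Forgetting step: blend the old posterior with the prior.
    eta1 = rho * eta1 + (1.0 - rho) * prior_eta1
    eta2 = rho * eta2 + (1.0 - rho) * prior_eta2

    # Conjugate update with one observation of known noise variance.
    eta1 += x / noise_var
    eta2 += 1.0 / noise_var
    return eta1 / eta2, 1.0 / eta2


def run(stream, rho):
    mean, var = 0.0, 10.0  # start from the prior
    for x in stream:
        mean, var = forgetful_update(mean, var, x, rho)
    return mean, var


# Hypothetical noise-free stream whose true mean jumps from 0 to 5 halfway through.
stream = [0.0] * 500 + [5.0] * 500
mean_forget, var_forget = run(stream, rho=0.9)
mean_plain, var_plain = run(stream, rho=1.0)

print(f"rho=0.9: mean={mean_forget:.3f}, var={var_forget:.4f}")
print(f"rho=1.0: mean={mean_plain:.3f}, var={var_plain:.5f}")
```

With `rho` below 1 the posterior precision is bounded, so the variance cannot collapse and the mean can follow the jump; the price is that `rho` must be chosen by hand, which is the kind of tuning an adaptive forgetting rate is meant to avoid.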


Author(s):  
SHUO WANG ◽  
LEANDRO L. MINKU ◽  
XIN YAO

Although class imbalance learning and online learning have been extensively studied in the literature separately, online class imbalance learning that considers the challenges of both fields has not drawn much attention. It deals with data streams having very skewed class distributions, such as fault diagnosis of real-time control monitoring systems and intrusion detection in computer networks. To fill in this research gap and contribute to a wide range of real-world applications, this paper first formulates online class imbalance learning problems. Based on the problem formulation, a new online learning algorithm, sampling-based online bagging (SOB), is proposed to tackle class imbalance adaptively. Then, we study how SOB and other state-of-the-art methods can benefit a class of fault detection data under various scenarios and analyze their performance in depth. Through extensive experiments, we find that SOB can balance the performance between classes very well across different data domains and produce stable G-mean when learning constantly imbalanced data streams, but it is sensitive to sudden changes in class imbalance, in which case SOB's predecessor undersampling-based online bagging (UOB) is more robust.
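The G-mean mentioned in this abstract is the geometric mean of the per-class recalls, which is why it is preferred over plain accuracy on skewed streams. The sketch below computes it for a batch of labels; the function name and the example labels are hypothetical, and the SOB and UOB algorithms themselves are not reproduced here.

```python
import math


def g_mean(y_true, y_pred):
    """Geometric mean of the recall obtained on each class present in y_true."""
    recalls = []
    for c in sorted(set(y_true)):
        total = sum(1 for t in y_true if t == c)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        recalls.append(hits / total)
    return math.prod(recalls) ** (1.0 / len(recalls))


# Hypothetical imbalanced batch: 8 majority examples (0), 2 minority examples (1).
y_true = [0] * 8 + [1] * 2

# A classifier that always predicts the majority class looks accurate...
always_majority = [0] * 10
accuracy = sum(t == p for t, p in zip(y_true, always_majority)) / len(y_true)

# ...but its G-mean is zero because the minority recall is zero.
g_majority = g_mean(y_true, always_majority)

# A classifier that trades some majority recall for minority recall.
balanced_pred = [0, 0, 0, 0, 0, 0, 1, 1, 1, 0]
g_balanced = g_mean(y_true, balanced_pred)

print(accuracy, g_majority, round(g_balanced, 4))
```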


2018 ◽  
Author(s):  
Peter C. St. John ◽  
Jonathan Strutz ◽  
Linda J. Broadbelt ◽  
Keith E.J. Tyo ◽  
Yannick J. Bomble

Summary: Modern biological tools generate a wealth of data on metabolite and protein concentrations that can be used to help inform new strain designs. However, integrating these data sources to generate predictions of steady-state metabolism typically requires a kinetic description of the enzymatic reactions that occur within a cell. Parameterizing these kinetic models from biological data can be computationally difficult, especially as the amount of data increases. Robust methods must also be able to quantify the uncertainty in model parameters as a function of the available data, which can be particularly computationally intensive. The field of Bayesian inference offers a wide range of methods for estimating distributions in parameter uncertainty. However, these techniques are poorly suited to kinetic metabolic modeling due to the complex kinetic rate laws typically employed and the resulting dynamic system that must be solved. In this paper, we employ linear-logarithmic kinetics to simplify the calculation of steady-state flux distributions and enable efficient sampling and variational inference methods. We demonstrate that detailed information on the posterior distribution of kinetic model parameters can be obtained efficiently at a variety of different problem scales, including large-scale kinetic models trained on multiomics datasets. These results allow modern Bayesian machine learning tools to be leveraged in understanding biological data and developing new, efficient strain designs.


2018 ◽  
Author(s):  
Joëlle Barido-Sottani ◽  
Timothy G. Vaughan ◽  
Tanja Stadler

Abstract: Heterogeneous populations can lead to important differences in birth and death rates across a phylogeny. Taking this heterogeneity into account is thus critical to obtain accurate estimates of the underlying population dynamics. We present a new multi-state birth-death model (MSBD) that can estimate lineage-specific birth and death rates. For species phylogenies, this corresponds to estimating lineage-dependent speciation and extinction rates. Contrary to existing models, we do not require a prior hypothesis on a trait driving the rate differences and we allow the same rates to be present in different parts of the phylogeny. Using simulated datasets, we show that the MSBD model can reliably infer the presence of multiple evolutionary regimes, their positions in the tree, and the birth and death rates associated with each. We also present a re-analysis of two empirical datasets and compare the results obtained by MSBD and by the existing software BAMM. The MSBD model is implemented as a package in the Bayesian inference software BEAST2, which allows joint inference of the phylogeny and the model parameters.
Significance statement: Phylogenetic trees can inform about the underlying speciation and extinction processes within a species clade. Many different factors, for instance environmental changes or morphological changes, can lead to differences in macroevolutionary dynamics within a clade. We present here a new multi-state birth-death (MSBD) model that can detect these differences and estimate both the position of changes in the tree and the associated macroevolutionary parameters. The MSBD model does not require a prior hypothesis on which trait is driving the changes in dynamics and is thus applicable to a wide range of datasets. It is implemented as an extension to the existing framework BEAST2.


Author(s):  
Andrew W. Nelson ◽  
Arif S. Malik ◽  
John C. Wendel ◽  
Mark E. Zipf

A primary factor in manufacturing high-quality cold-rolled sheet is the ability to accurately predict the required rolling force. Rolling force directly influences roll-stack deflections, which correlate to strip thickness profile and flatness. Accurate rolling force predictions enable assignment of efficient pass schedules and appropriate flatness actuator set-points, thereby reducing rolling time, improving quality, and reducing scrap. Traditionally, force predictions in cold rolling have employed deterministic, two-dimensional analytical models such as those proposed by Roberts and by Bland and Ford. These simplified methods are prone to inaccuracy, however, because of several uncertain, yet influential, model parameters that cannot be established deterministically under diverse cold rolling conditions. Typical uncertain model parameters include the material's strength coefficient, strain-hardening exponent, strain-rate dependency, and the roll-bite friction characteristics at low and high mill speeds. Conventionally, such parameters are evaluated deterministically by comparing force predictions to force measurements and employing a best-fit regression approach. In this work, Bayesian inference is applied to identify posterior probability distributions of the uncertain parameters in rolling force models. The aim is to incorporate Bayesian inference into rolling force prediction for cold rolling mills to create a probabilistic modeling approach that learns as new data are added. The rolling data are based on stainless steel types 301 and 304, rolled on a 10-in. wide, 4-high production cold mill. Force data were collected by observing load-cell measurements at steady rolling speeds for four coils. Several studies are performed in this work to investigate the probabilistic learning capability of the Bayesian inference approach. These include studies to examine learning from repeated rolling passes, from passes of diverse coils, and by assuming uniform prior probabilities when changing materials. It is concluded that the Bayesian updating approach is useful for improving rolling force probability estimates as evidence is introduced in the form of additional rolling data. Evaluation of learning behavior implies that data from sequential groups of coils having similar gauge and material are important for practical implementation of Bayesian updating in cold rolling.


2017 ◽  
Vol 14 (134) ◽  
pp. 20170340 ◽  
Author(s):  
Aidan C. Daly ◽  
Jonathan Cooper ◽  
David J. Gavaghan ◽  
Chris Holmes

Bayesian methods are advantageous for biological modelling studies due to their ability to quantify and characterize posterior variability in model parameters. When Bayesian methods cannot be applied, due either to non-determinism in the model or limitations on system observability, approximate Bayesian computation (ABC) methods can be used to similar effect, despite producing inflated estimates of the true posterior variance. Owing to generally differing application domains, there are few studies comparing Bayesian and ABC methods, and thus there is little understanding of the properties and magnitude of this uncertainty inflation. To address this problem, we present two popular strategies for ABC sampling that we have adapted to perform exact Bayesian inference, and compare them on several model problems. We find that one sampler was impractical for exact inference due to its sensitivity to a key normalizing constant, and additionally highlight sensitivities of both samplers to various algorithmic parameters and model conditions. We conclude with a study of the O'Hara–Rudy cardiac action potential model to quantify the uncertainty amplification resulting from employing ABC using a set of clinically relevant biomarkers. We hope that this work serves to guide the implementation and comparative assessment of Bayesian and ABC sampling techniques in biological models.


Genetics ◽  
2000 ◽  
Vol 156 (1) ◽  
pp. 457-467 ◽  
Author(s):  
Z W Luo ◽  
S H Tao ◽  
Z-B Zeng

Three approaches are proposed in this study for detecting or estimating linkage disequilibrium between a polymorphic marker locus and a locus affecting quantitative genetic variation, using a sample from random-mating populations. It is shown that the disequilibrium may be detected over a wide range of circumstances with a power of 80% by using phenotypic records and marker genotypes of a few hundred individuals. Comparison of the ANOVA and regression methods in this article to the transmission disequilibrium test (TDT) shows that, given the genetic variance explained by the trait locus, the power of TDT depends on the trait allele frequency, whereas the power of ANOVA and regression analyses is relatively independent of the allele frequency. The TDT method is more powerful when the trait allele frequency is low, but much less powerful when it is high. The likelihood analysis provides reliable estimation of the model parameters when the QTL variance is at least 10% of the phenotypic variance and a sample size of a few hundred is used. Potential use of these estimates in mapping the trait locus is also discussed.


2021 ◽  
Vol 9 (4) ◽  
pp. 839
Author(s):  
Muhammad Rafiullah Khan ◽  
Vanee Chonhenchob ◽  
Chongxing Huang ◽  
Panitee Suwanamornlert

Microorganisms causing anthracnose diseases have a medium to high level of resistance to the existing fungicides. This study aimed to investigate neem plant extract (propyl disulfide, PD) as an alternative to the current fungicides against mango anthracnose. Microorganisms were isolated from decayed mango and identified as Colletotrichum gloeosporioides and Colletotrichum acutatum. Next, a pathogenicity test was conducted and, after fulfilling Koch's postulates, fungi were reisolated from the symptomatic fruits to obtain pure cultures. Then, different concentrations of PD were used against these fungi in vapor and agar diffusion assays. Ethanol and distilled water served as control treatments. PD inhibited the mycelial growth of these fungi significantly more (p ≤ 0.05) than both controls. The antifungal activity of PD increased with increasing concentrations. The vapor diffusion assay was more effective in inhibiting the mycelial growth of these fungi than the agar diffusion assay. A good fit (R² = 0.950) of the experimental data to the Gompertz growth model and a significant difference in the model parameters, i.e., lag phase (λ), stationary phase (A) and mycelial growth rate, further showed the antifungal efficacy of PD. Therefore, PD could be the best antimicrobial compound against a wide range of microorganisms.
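The three Gompertz parameters named in this abstract can be related to a growth curve with a short sketch. The abstract does not state which parameterization was fitted, so the code below assumes the widely used modified Gompertz form, y(t) = A · exp(−exp(μ·e/A · (λ − t) + 1)), with A the stationary-phase level, μ the maximum growth rate and λ the lag time; the parameter values are hypothetical.

```python
import math


def gompertz(t, A, mu, lam):
    """Modified Gompertz growth curve (assumed parameterization).

    A   : asymptotic (stationary-phase) level
    mu  : maximum growth rate, i.e. the slope at the inflection point
    lam : lag time, where the tangent at the inflection point crosses zero
    """
    return A * math.exp(-math.exp(mu * math.e / A * (lam - t) + 1.0))


# Hypothetical parameters, e.g. colony diameter in mm and time in days.
A, mu, lam = 80.0, 12.0, 1.5

# The inflection point sits at lam + A / (mu * e), where the curve equals A / e.
t_inflect = lam + A / (mu * math.e)
y_inflect = gompertz(t_inflect, A, mu, lam)

# Central-difference estimate of the slope at the inflection point.
h = 1e-5
slope = (gompertz(t_inflect + h, A, mu, lam) - gompertz(t_inflect - h, A, mu, lam)) / (2 * h)

# Far beyond the lag phase the curve approaches the stationary level A.
y_late = gompertz(100.0, A, mu, lam)

print(round(y_inflect, 3), round(slope, 3), round(y_late, 3))
```

Under this form, a treatment that lengthens λ or lowers μ or A relative to the control shifts the fitted curve in the direction the abstract describes as inhibited growth.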

