Bayesian optimization of generalized data

2018 · Vol 4 · pp. 30 · Author(s): Goran Arbanas, Jinghua Feng, Zia J. Clifton, Andrew M. Holcomb, Marco T. Pigni, ...

Direct application of Bayes' theorem to generalized data yields a posterior probability distribution function (PDF) that is a product of a prior PDF of generalized data and a likelihood function, where generalized data consists of model parameters, measured data, and model defect data. The prior PDF of generalized data is defined by prior expectation values and a prior covariance matrix of generalized data that naturally includes covariance between any two components of generalized data. A set of constraints imposed on the posterior expectation values and covariances of generalized data via a given model is formally solved by the method of Lagrange multipliers. Posterior expectation values of the constraints and their covariance matrix are conventionally set to zero, leading to a likelihood function that is a Dirac delta function of the constraining equation. It is shown that setting constraints to values other than zero is analogous to introducing a model defect. Since posterior expectation values of any function of generalized data are integrals of that function over all generalized data weighted by the posterior PDF, all elements of generalized data may be viewed as nuisance parameters marginalized by this integration. One simple form of posterior PDF is obtained when the prior PDF and the likelihood function are normal PDFs. For linear models without a defect, this PDF becomes equivalent to the constrained least squares (CLS) method, that is, the χ2 minimization method.
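For the linear, defect-free case described above, the posterior expectation values and covariances take the familiar generalized-least-squares form. The sketch below is illustrative only: the variable names are ours, and the prior covariance between parameters and measured data is assumed to vanish.

    import numpy as np

    # Illustrative CLS / chi-square update for a linear model d = T p with no
    # model defect. p0, d0 are prior expectation values; Cp, Cd are the prior
    # covariance blocks of the generalized data (cross-covariance assumed zero).
    def cls_update(p0, d0, Cp, Cd, T):
        r = d0 - T @ p0                      # prior residual of the constraint
        S = T @ Cp @ T.T + Cd                # covariance of that residual
        K = Cp @ T.T @ np.linalg.inv(S)      # gain matrix
        p_post = p0 + K @ r                  # posterior expectation of parameters
        Cp_post = Cp - K @ T @ Cp            # posterior parameter covariance
        return p_post, Cp_post

    # toy usage: two parameters, three measurements
    rng = np.random.default_rng(0)
    T = rng.normal(size=(3, 2))
    d0 = T @ np.array([1.0, -0.5]) + 0.01 * rng.normal(size=3)
    p_post, Cp_post = cls_update(np.zeros(2), d0, np.eye(2), 1e-4 * np.eye(3), T)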

2011 · Vol 23 (11) · pp. 2833-2867 · Author(s): Yi Dong, Stefan Mihalas, Alexander Russell, Ralph Etienne-Cummings, Ernst Niebur

When a neuronal spike train is observed, what can we deduce from it about the properties of the neuron that generated it? A natural way to answer this question is to make an assumption about the type of neuron, select an appropriate model for this type, and then choose the model parameters as those that are most likely to generate the observed spike train. This is the maximum likelihood method. If the neuron obeys simple integrate-and-fire dynamics, Paninski, Pillow, and Simoncelli (2004) showed that its negative log-likelihood function is convex and that, at least in principle, its unique global minimum can thus be found by gradient descent techniques. Many biological neurons are, however, known to generate a richer repertoire of spiking behaviors than can be explained in a simple integrate-and-fire model. For instance, such a model retains only an implicit (through spike-induced currents), not an explicit, memory of its input; an example of a physiological situation that cannot be explained is the absence of firing if the input current is increased very slowly. Therefore, we use an expanded model (Mihalas & Niebur, 2009), which is capable of generating a large number of complex firing patterns while still being linear. Linearity is important because it maintains the distribution of the random variables and still allows maximum likelihood methods to be used. In this study, we show that although convexity of the negative log-likelihood function is not guaranteed for this model, the minimum of this function yields a good estimate for the model parameters, in particular if the noise level is treated as a free parameter. Furthermore, we show that a nonlinear function minimization method (r-algorithm with space dilation) usually reaches the global minimum.
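As a rough illustration of the general recipe (not the authors' neuron model or their r-algorithm implementation), the sketch below fits a toy spike-train statistic by minimizing a negative log-likelihood in which the noise level is a free parameter; the data, model, and optimizer are placeholders.

    import numpy as np
    from scipy.optimize import minimize

    # Toy example: inter-spike intervals assumed log-normal; the log-mean and the
    # noise level (log-sigma) are estimated jointly by maximum likelihood.
    rng = np.random.default_rng(1)
    isi = rng.lognormal(mean=np.log(0.05), sigma=0.2, size=200)  # synthetic intervals (s)

    def neg_log_likelihood(theta, isi):
        log_mu, log_sigma = theta            # log-parameterization keeps sigma > 0
        sigma = np.exp(log_sigma)
        z = (np.log(isi) - log_mu) / sigma
        return np.sum(np.log(sigma) + np.log(isi) + 0.5 * z**2)  # up to a constant

    res = minimize(neg_log_likelihood, x0=[np.log(0.1), 0.0], args=(isi,),
                   method="Nelder-Mead")     # generic stand-in for the r-algorithm
    print(res.x)                             # estimated log-mean and log-noise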


2017 · Vol 14 (18) · pp. 4295-4314 · Author(s): Dan Lu, Daniel Ricciuto, Anthony Walker, Cosmin Safta, William Munger

Abstract. Calibration of terrestrial ecosystem models is important but challenging. Bayesian inference implemented by Markov chain Monte Carlo (MCMC) sampling provides a comprehensive framework to estimate model parameters and associated uncertainties using their posterior distributions. The effectiveness and efficiency of the method strongly depend on the MCMC algorithm used. In this work, a differential evolution adaptive Metropolis (DREAM) algorithm is used to estimate posterior distributions of 21 parameters for the data assimilation linked ecosystem carbon (DALEC) model using 14 years of daily net ecosystem exchange data collected at the Harvard Forest Environmental Measurement Site eddy-flux tower. Calibration with DREAM results in a better model fit and predictive performance compared to the popular adaptive Metropolis (AM) scheme. Moreover, DREAM indicates that two parameters controlling autumn phenology have multiple modes in their posterior distributions, while AM identifies only one mode. The application suggests that DREAM is well suited to calibrating complex terrestrial ecosystem models, where the number of uncertain parameters is usually large and the existence of local optima is always a concern. In addition, residual analysis is used to justify the assumptions of the error model used in the Bayesian calibration. The results indicate that a heteroscedastic, correlated, Gaussian error model is appropriate for the problem, and the likelihood function constructed from it can alleviate the underestimation of parameter uncertainty that is usually caused by using uncorrelated error models.
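A minimal sketch of such an error model (illustrative only, not the DALEC/DREAM code) is a heteroscedastic, lag-1 autocorrelated Gaussian likelihood of the NEE residuals; the standard deviation grows with the predicted magnitude and phi is the residual autocorrelation.

    import numpy as np

    def log_likelihood(obs, sim, sigma0, sigma1, phi):
        # heteroscedastic standard deviation and standardized residuals
        sigma = sigma0 + sigma1 * np.abs(sim)
        eta = (obs - sim) / sigma
        # exact Gaussian AR(1) log-likelihood of the standardized residuals
        s2 = 1.0 - phi**2
        ll = -0.5 * (np.log(2 * np.pi) + eta[0]**2)
        innov = eta[1:] - phi * eta[:-1]
        ll += -0.5 * np.sum(np.log(2 * np.pi * s2) + innov**2 / s2)
        ll -= np.sum(np.log(sigma))          # Jacobian of the standardization
        return ll

    # toy usage with a placeholder daily NEE series
    t = np.arange(365)
    sim = np.sin(2 * np.pi * t / 365)
    obs = sim + 0.1 * np.random.default_rng(2).normal(size=t.size)
    print(log_likelihood(obs, sim, sigma0=0.05, sigma1=0.1, phi=0.5))

In an MCMC calibration, sigma0, sigma1, and phi would be sampled alongside the ecosystem-model parameters.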


2017 · Vol 146 · pp. 13010 · Author(s): Juan Pablo Scotta, Gilles Noguere, José Ignacio Marquez Damian

2016 · Author(s): Kassian Kobert, Alexandros Stamatakis, Tomáš Flouri

The phylogenetic likelihood function is the major computational bottleneck in several applications of evolutionary biology such as phylogenetic inference, species delimitation, model selection, and divergence time estimation. Given the alignment, a tree, and the evolutionary model parameters, the likelihood function computes the conditional likelihood vectors for every node of the tree. Vector entries for which all input data are identical result in redundant likelihood operations which, in turn, yield identical conditional values. Such operations can be omitted to improve run-time and, using appropriate data structures, to reduce memory usage. We present a fast, novel method for identifying and omitting such redundant operations in phylogenetic likelihood calculations, and assess the performance improvement and memory savings attained by our method. Using empirical and simulated data sets, we show that a prototype implementation of our method yields up to 10-fold speedups and uses up to 78% less memory than one of the fastest and most highly tuned implementations of the phylogenetic likelihood function currently available. Our method is generic and can seamlessly be integrated into any phylogenetic likelihood implementation.
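The core idea can be sketched as follows (illustrative only, not the authors' implementation): at each internal node, sites whose left and right child columns carry identical identifiers must yield identical conditional likelihood vectors, so each unique pair is computed once and reused.

    # compute_clv(l, r) stands in for one conditional-likelihood-vector computation
    def node_clv(left_ids, right_ids, compute_clv):
        cache = {}      # maps a (left, right) identifier pair to a vector index
        site_ids = []   # identifier of this node's column for every site
        clvs = []       # unique conditional likelihood vectors, computed once each
        for l, r in zip(left_ids, right_ids):
            key = (l, r)
            if key not in cache:
                cache[key] = len(clvs)
                clvs.append(compute_clv(l, r))
            site_ids.append(cache[key])
        return site_ids, clvs

    # toy usage: 5 sites, only 4 distinct (left, right) patterns
    ids, unique = node_clv([0, 1, 0, 0, 2], [3, 3, 3, 1, 3], lambda l, r: (l, r))
    # ids == [0, 1, 0, 2, 3]; one redundant computation is skipped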


2021 · Author(s): Kevin J. Wischnewski, Simon B. Eickhoff, Viktor K. Jirsa, Oleksandr V. Popovych

Abstract Simulating the resting-state brain dynamics via mathematical whole-brain models requires an optimal selection of parameters, which determine the model’s capability to replicate empirical data. Since the parameter optimization via a grid search (GS) becomes unfeasible for high-dimensional models, we evaluate several alternative approaches to maximize the correspondence between simulated and empirical functional connectivity. A dense GS serves as a benchmark to assess the performance of four optimization schemes: Nelder-Mead Algorithm (NMA), Particle Swarm Optimization (PSO), Covariance Matrix Adaptation Evolution Strategy (CMAES) and Bayesian Optimization (BO). To compare them, we employ an ensemble of coupled phase oscillators built upon individual empirical structural connectivity of 105 healthy subjects. We determine optimal model parameters from two- and three-dimensional parameter spaces and show that the overall fitting quality of the tested methods can compete with the GS. There are, however, marked differences in the required computational resources and stability properties, which we also investigate before proposing CMAES and BO as efficient alternatives to a high-dimensional GS. For the three-dimensional case, these methods generated similar results as the GS, but within less than 6% of the computation time. Our results contribute to an efficient validation of models for personalized simulations of brain dynamics.
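A stripped-down version of such a fitting loop (our own illustrative sketch with placeholder connectivity data, here using the Nelder-Mead scheme as the optimizer) maximizes the Pearson correlation between simulated and empirical functional connectivity of a coupled phase-oscillator model.

    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(3)
    N = 10
    SC = np.abs(rng.normal(size=(N, N)))            # placeholder structural connectivity
    SC = (SC + SC.T) / 2
    np.fill_diagonal(SC, 0)
    FC_emp = np.corrcoef(rng.normal(size=(N, 200))) # placeholder empirical FC

    def simulate_fc(coupling, noise, steps=1000, dt=0.01, seed=0):
        local = np.random.default_rng(seed)         # fixed seed: deterministic objective
        theta = local.uniform(0, 2 * np.pi, N)
        omega = local.normal(1.0, 0.1, N)           # natural frequencies
        series = np.empty((steps, N))
        for t in range(steps):
            diff = np.sin(theta[None, :] - theta[:, None])   # sin(theta_j - theta_i)
            theta = theta + dt * (omega + coupling * (SC * diff).sum(axis=1)) \
                    + np.sqrt(dt) * noise * local.normal(size=N)
            series[t] = np.sin(theta)
        return np.corrcoef(series.T)                # simulated functional connectivity

    def negative_fit(params):
        fc_sim = simulate_fc(*np.abs(params))
        iu = np.triu_indices(N, 1)
        return -np.corrcoef(fc_sim[iu], FC_emp[iu])[0, 1]    # minus Pearson correlation

    res = minimize(negative_fit, x0=[0.5, 0.1], method="Nelder-Mead")

CMAES or BO would replace the call to minimize; the goodness-of-fit function itself stays the same.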


Genes · 2019 · Vol 10 (8) · pp. 608 · Author(s): Yan Li, Junyi Li, Naizheng Bian

Identifying associations between lncRNAs and diseases can help understand disease-related lncRNAs and facilitate disease diagnosis and treatment. The dual-network integrated logistic matrix factorization (DNILMF) model has been used for drug–target interaction prediction, where good results have been achieved. We first applied DNILMF to lncRNA–disease association prediction (DNILMF-LDA). We combined different similarity kernel matrices of lncRNAs and diseases by nonlinear fusion to extract the most important information in the fused matrices. Then, lncRNA–disease association networks and similarity networks were built simultaneously. Finally, the Gaussian process mutual information (GP-MI) algorithm of Bayesian optimization was adopted to optimize the model parameters. The 10-fold cross-validation results showed that the area under the receiver operating characteristic (ROC) curve (AUC) of DNILMF-LDA was 0.9202, and the area under the precision–recall (PR) curve (AUPR) was 0.5610. Compared with LRLSLDA, SIMCLDA, BiwalkLDA, and TPGLDA, the AUC value of our method increased by 38.81%, 13.07%, 8.35%, and 6.75%, respectively, and the AUPR value increased by 52.66%, 40.05%, 37.01%, and 44.25%, respectively. These results indicate that DNILMF-LDA is an effective method for predicting the associations between lncRNAs and diseases.
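The final prediction step of a logistic matrix factorization can be sketched as follows (illustrative only; the latent factors and labels are random placeholders rather than trained DNILMF-LDA outputs): the association probability of lncRNA i and disease j is the logistic function of the inner product of their latent vectors, and AUC/AUPR summarize the resulting ranking.

    import numpy as np
    from sklearn.metrics import roc_auc_score, average_precision_score

    rng = np.random.default_rng(4)
    n_lnc, n_dis, k = 50, 30, 8
    U = rng.normal(scale=0.1, size=(n_lnc, k))   # lncRNA latent factors (placeholder)
    V = rng.normal(scale=0.1, size=(n_dis, k))   # disease latent factors (placeholder)

    def predict(U, V):
        # association probabilities: logistic function of latent inner products
        return 1.0 / (1.0 + np.exp(-(U @ V.T)))

    P = predict(U, V)                            # n_lnc x n_dis score matrix
    labels = rng.integers(0, 2, size=P.size)     # placeholder known associations
    print(roc_auc_score(labels, P.ravel()),      # AUC of the ranking
          average_precision_score(labels, P.ravel()))  # AUPR of the ranking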


2017 · Vol 6 (3) · pp. 75 · Author(s): Tiago V. F. Santana, Edwin M. M. Ortega, Gauss M. Cordeiro, Adriano K. Suzuki

A new regression model based on the exponentiated Weibull distribution and the structure of the generalized linear model, called the generalized exponentiated Weibull linear model (GEWLM), is proposed. The GEWLM is composed of three important structural parts: the random component, characterized by the distribution of the response variable; the systematic component, which includes the explanatory variables in the model by means of a linear structure; and a link function, which connects the systematic and random parts of the model. Explicit expressions for the logarithm of the likelihood function, the score vector, and the observed and expected information matrices are presented. The method of maximum likelihood and a Bayesian procedure are adopted for estimating the model parameters. To detect influential observations in the new model, we use diagnostic measures based on local influence and Bayesian case influence diagnostics. We also show that the estimates of the GEWLM are robust to the presence of outliers in the data. Additionally, to check whether the model supports its assumptions, to detect atypical observations, and to verify the goodness-of-fit of the regression model, we define residuals based on the quantile function and perform a Monte Carlo simulation study to construct confidence bands from the generated envelopes. We apply the new model to a dataset from the insurance area.
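For the maximum-likelihood part, a minimal sketch (our own parameterization and link choice, not necessarily the authors') writes the exponentiated Weibull log-density and links the scale parameter to a covariate through a log link.

    import numpy as np
    from scipy.optimize import minimize

    def ew_logpdf(y, alpha, gamma, sigma):
        # exponentiated Weibull log-density with shape parameters alpha, gamma and
        # scale sigma: f(y) = a*g/s * (y/s)^(g-1) * exp(-z) * (1 - exp(-z))^(a-1)
        z = (y / sigma) ** gamma
        return (np.log(alpha) + np.log(gamma) - np.log(sigma)
                + (gamma - 1) * np.log(y / sigma) - z
                + (alpha - 1) * np.log1p(-np.exp(-z)))

    def neg_log_lik(theta, y, x):
        la, lg, b0, b1 = theta                   # log-shapes keep alpha, gamma > 0
        sigma = np.exp(b0 + b1 * x)              # log link for the scale parameter
        return -np.sum(ew_logpdf(y, np.exp(la), np.exp(lg), sigma))

    # toy data: Weibull responses (the alpha = 1 special case) with a covariate effect
    rng = np.random.default_rng(5)
    x = rng.uniform(size=300)
    y = rng.weibull(1.5, size=300) * np.exp(0.2 + 0.5 * x)
    fit = minimize(neg_log_lik, x0=[0.0, 0.0, 0.0, 0.0], args=(y, x), method="Nelder-Mead")
    print(fit.x)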


Author(s): Byeng D. Youn, Byung C. Jung, Zhimin Xi, Sang Bum Kim

As the role of predictive models has increased, the fidelity of computational results has been of great concern to engineering decision makers. Often our limited understanding of complex systems leads to building inappropriate predictive models. To address a growing concern about the fidelity of predictive models, this paper proposes a hierarchical model validation procedure with two validation activities: (1) validation planning (top-down) and (2) validation execution (bottom-up). In validation planning, engineers define either the physics-of-failure (PoF) mechanisms or the system performances of interest. The engineering system is then decomposed into subsystems or components whose computer models are partially valid in terms of the PoF mechanisms or system performances of interest. Validation planning identifies vital tests and predictive models along with both known and unknown model parameters. Validation execution takes a bottom-up approach, improving the fidelity of the computer model at any hierarchical level using a statistical calibration technique. This technique compares the observed test results with the predicted results from the computer model, using a likelihood function as the comparison metric. In the statistical calibration, an optimization technique is employed to maximize the likelihood function while determining the unknown model parameters. As the predictive model at a lower hierarchy level becomes valid, the valid model is fused into the model at the next higher hierarchy level, and validation execution continues at that level. A cellular phone is used to demonstrate the hierarchical validation of predictive models presented in this paper.
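A toy version of the statistical calibration step (illustrative only; the model, the test data, and the Gaussian noise assumption are placeholders) determines one unknown model parameter by maximizing the likelihood of the observed test results.

    import numpy as np
    from scipy.optimize import minimize_scalar

    def model(x, theta):
        return theta * x**2                      # stand-in computer model

    x_test = np.linspace(0.1, 1.0, 10)           # test conditions
    y_obs = 2.0 * x_test**2 + 0.05 * np.random.default_rng(6).normal(size=x_test.size)

    def neg_log_likelihood(theta, noise_std=0.05):
        # Gaussian likelihood comparing observed tests with model predictions
        resid = y_obs - model(x_test, theta)
        return 0.5 * np.sum((resid / noise_std) ** 2) + resid.size * np.log(noise_std)

    theta_hat = minimize_scalar(neg_log_likelihood, bounds=(0.0, 10.0),
                                method="bounded").x   # calibrated parameter, near 2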

