Variational Probabilistic Inference and the QMR-DT Network

1999 ◽  
Vol 10 ◽  
pp. 291-322 ◽  
Author(s):  
T. S. Jaakkola ◽  
M. I. Jordan

We describe a variational approximation method for efficient inference in large-scale probabilistic models. Variational methods are deterministic procedures that provide approximations to marginal and conditional probabilities of interest. They provide alternatives to approximate inference methods based on stochastic sampling or search. We describe a variational approach to the problem of diagnostic inference in the `Quick Medical Reference' (QMR) network. The QMR network is a large-scale probabilistic graphical model built on statistical and expert knowledge. Exact probabilistic inference is infeasible in this model for all but a small set of cases. We evaluate our variational inference algorithm on a large set of diagnostic test cases, comparing the algorithm to a state-of-the-art stochastic sampling method.
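To make the inference problem concrete, the following minimal sketch (synthetic parameters, not the QMR-DT knowledge base) builds a small noisy-OR disease–finding network of the kind described above and computes exact posterior disease marginals by enumerating all disease configurations. The exponential cost of this enumeration is what makes exact inference infeasible at QMR scale and motivates the variational bounds; the variational algorithm itself is not reproduced here.

```python
import itertools
import numpy as np

# Toy noisy-OR disease-finding network (synthetic parameters, QMR-style structure):
#   P(d_i = 1) = prior[i]
#   P(f_j = 1 | d) = 1 - exp(-theta0[j] - sum_i theta[j, i] * d_i)
rng = np.random.default_rng(0)
n_diseases, n_findings = 10, 4
prior = rng.uniform(0.01, 0.10, n_diseases)
theta0 = np.full(n_findings, 0.01)                       # "leak" terms
theta = rng.uniform(0.0, 2.0, (n_findings, n_diseases)) * (rng.random((n_findings, n_diseases)) < 0.3)
findings = np.array([1, 1, 0, 1])                        # observed findings (1 = positive)

# Exact posterior marginals by enumerating all 2^n disease configurations.
# This is exactly what becomes infeasible for the real QMR-DT network
# (hundreds of diseases), motivating the variational approximation.
post, Z = np.zeros(n_diseases), 0.0
for d in itertools.product([0, 1], repeat=n_diseases):
    d = np.array(d)
    p_d = np.prod(np.where(d == 1, prior, 1 - prior))    # prior of this configuration
    p_f1 = 1 - np.exp(-theta0 - theta @ d)               # P(f_j = 1 | d) for each finding
    lik = np.prod(np.where(findings == 1, p_f1, 1 - p_f1))
    Z += p_d * lik
    post += d * p_d * lik
print("posterior disease marginals:", np.round(post / Z, 3))
```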

Entropy ◽  
2021 ◽  
Vol 23 (11) ◽  
pp. 1475 ◽ 
Author(s):  
Marton Havasi ◽  
Jasper Snoek ◽  
Dustin Tran ◽  
Jonathan Gordon ◽  
José Miguel Hernández-Lobato

Variational inference is an optimization-based method for approximating the posterior distribution of the parameters in Bayesian probabilistic models. A key challenge of variational inference is to approximate the posterior with a distribution that is computationally tractable yet sufficiently expressive. We propose a novel method for generating samples from a highly flexible variational approximation. The method starts with a coarse initial approximation and generates samples by refining it in selected, local regions. This allows the samples to capture dependencies and multi-modality in the posterior, even when these are absent from the initial approximation. We demonstrate theoretically that our method always improves the quality of the approximation (as measured by the evidence lower bound). In experiments, our method consistently outperforms recent variational inference methods in terms of log-likelihood and ELBO across three example tasks: the Eight-Schools example (an inference task in a hierarchical model), training a ResNet-20 (Bayesian inference in a large neural network), and the Mushroom task (posterior sampling in a contextual bandit problem).
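The quantity the refinement procedure is guaranteed to improve is the evidence lower bound (ELBO). As a point of reference, the sketch below (a toy Bayesian linear regression with a mean-field Gaussian approximation, not the authors' refinement method) shows how a Monte Carlo ELBO estimate is formed with the reparameterization trick.

```python
import numpy as np

# Monte Carlo ELBO estimate for a mean-field Gaussian approximation
# q(w) = N(mu, diag(sigma^2)) of a toy Bayesian linear regression posterior.
rng = np.random.default_rng(1)

x = rng.normal(size=(50, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = x @ true_w + 0.1 * rng.normal(size=50)

def log_joint(w):
    """log p(w) + log p(y | x, w) for a batch of weight samples w of shape (S, 3)."""
    log_prior = -0.5 * np.sum(w**2, axis=-1)              # standard normal prior (up to a constant)
    resid = y[:, None] - x @ w.T                          # residuals, shape (n_data, S)
    log_lik = -0.5 * np.sum(resid**2, axis=0)             # unit-variance Gaussian likelihood (up to a constant)
    return log_prior + log_lik

mu, sigma = np.zeros(3), np.ones(3)                       # variational parameters (untrained here)
eps = rng.normal(size=(1000, 3))
w = mu + sigma * eps                                      # reparameterized samples from q
log_q = np.sum(-0.5 * ((w - mu) / sigma) ** 2 - np.log(sigma) - 0.5 * np.log(2 * np.pi), axis=-1)
elbo = np.mean(log_joint(w) - log_q)                      # E_q[log p(w, D) - log q(w)]
print("ELBO estimate:", round(elbo, 2))
```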


2019 ◽  
Author(s):  
Ryther Anderson ◽  
Achay Biong ◽  
Diego Gómez-Gualdrón

Tailoring the structure and chemistry of metal-organic frameworks (MOFs) enables the manipulation of their adsorption properties to suit specific energy and environmental applications. As there are millions of possible MOFs (with tens of thousands already synthesized), molecular simulation, such as grand canonical Monte Carlo (GCMC), has frequently been used to rapidly evaluate the adsorption performance of a large set of MOFs. This allows subsequent experiments to focus only on a small subset of the most promising MOFs. In many instances, however, even molecular simulation becomes prohibitively time consuming, underscoring the need for alternative screening methods, such as machine learning, to precede molecular simulation efforts. In this study, as a proof of concept, we trained a neural network as the first example of a machine learning model capable of predicting full adsorption isotherms of different molecules not included in the training of the model. To achieve this, we trained our neural network only on alchemical species, represented only by their geometry and force field parameters, and used this neural network to predict the loadings of real adsorbates. We focused on predicting room-temperature adsorption of small (one- and two-atom) molecules relevant to chemical separations, namely argon, krypton, xenon, methane, ethane, and nitrogen. However, we also observed surprisingly promising predictions for more complex molecules, whose properties are outside the range spanned by the alchemical adsorbates. Prediction accuracies suitable for large-scale screening were achieved using simple MOF descriptors (e.g., geometric properties and chemical moieties) and adsorbate descriptors (e.g., force field parameters and geometry). Our results illustrate a new philosophy of training that opens the path towards the development of machine learning models that can predict the adsorption loading of any new adsorbate at any new operating conditions in any new MOF.
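The training philosophy described above can be illustrated with a hedged sketch: a small neural network maps concatenated MOF descriptors, adsorbate descriptors and pressure to a predicted loading. The descriptors, targets and architecture below are synthetic placeholders, not the descriptors or model used in the study.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Synthetic stand-in for the descriptor-based training setup: MOF descriptors
# (e.g. pore diameter, void fraction, surface area, density), adsorbate descriptors
# (e.g. Lennard-Jones epsilon/sigma, number of atoms) and log-pressure are
# concatenated into one feature vector; the target stands in for GCMC-simulated uptake.
rng = np.random.default_rng(0)
n = 2000
mof_feats = rng.uniform(size=(n, 4))
ads_feats = rng.uniform(size=(n, 3))
log_pressure = rng.uniform(-2, 2, size=(n, 1))
X = np.hstack([mof_feats, ads_feats, log_pressure])
y = mof_feats[:, 1] * ads_feats[:, 0] / (1 + np.exp(-log_pressure[:, 0])) + 0.05 * rng.normal(size=n)

model = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000, random_state=0)
model.fit(X[:1500], y[:1500])                      # train on part of the data
print("held-out R^2:", round(model.score(X[1500:], y[1500:]), 3))
```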


2021 ◽  
Author(s):  
Béla Kovács ◽  
Márton Pál ◽  
Fanni Vörös

The use of aerial photography in topography began in the first decades of the 20th century. Remote-sensed data have become indispensable for cartographers and GIS staff when doing large-scale mapping: especially topographic, orienteering and thematic maps. The use of UAVs (unmanned aerial vehicles) for this purpose has also become widespread in recent years. Various drones and sensors (RGB, multispectral and hyperspectral) with many specifications are used to capture and process the physical properties of an examined area. In parallel with the development of the hardware, new software solutions are emerging to visualize and analyse photogrammetric material: a large set of algorithms with different approaches are available for image processing.

Our study focuses on the large-scale topographic mapping of vegetation and land cover. Most traditional analogue and digital maps use these layers either for background or for highlighted thematic purposes. We propose to use the theory of OBIA (Object-Based Image Analysis) to differentiate cover types. This method groups pixels into larger polygon units based on either spectral or other variables (e.g. elevation, aspect, curvature in the case of DEMs). The neighbours of initial seed points are examined to decide whether they should be added to the region according to the similarity of their attributes (a minimal sketch of this region-growing step follows the abstract). Using OBIA, different land cover types (trees, grass, soils, bare rock surfaces) can be distinguished with either supervised or unsupervised classification, depending on the purposes of the analyst. Our base data were high-resolution RGB and multispectral images (with 5 bands).

Following this methodology, not only elevation data (e.g. shaded relief or vector contour lines) can be derived from UAV imagery, but vector land cover data also become available for cartographers and GIS analysts. As the number of distinct land cover groups is free to choose, even quite complex thematic layers can be produced. These layers can serve as subjects of further analyses or for cartographic visualization.

BK is supported by the "Application Domain Specific Highly Reliable IT Solutions" project, which has been implemented with support provided by the National Research, Development and Innovation Fund of Hungary, financed under the Thematic Excellence Programme TKP2020-NKA-06 (National Challenges Subprogramme) funding scheme.

MP and FV are supported by EFOP-3.6.3-VEKOP-16-2017-00001: Talent Management in Autonomous Vehicle Control Technologies. The project is financed by the Hungarian Government and co-financed by the European Social Fund.
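As a companion to the OBIA description above, here is a minimal, hedged sketch of seeded region growing: starting from a seed pixel, 4-connected neighbours are merged into the region while their spectral values stay within a tolerance of the running region mean. The toy two-band image and the tolerance value are illustrative, and this is not the full OBIA workflow used in the study.

```python
import numpy as np
from collections import deque

def region_grow(image, seed, tol=10.0):
    """Grow a region from `seed` over 4-connected neighbours whose spectral
    vectors stay within `tol` of the running region mean."""
    h, w = image.shape[:2]
    visited = np.zeros((h, w), dtype=bool)
    visited[seed] = True
    region = [seed]
    mean = image[seed].astype(float)
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < h and 0 <= nc < w and not visited[nr, nc]:
                if np.linalg.norm(image[nr, nc].astype(float) - mean) < tol:
                    visited[nr, nc] = True
                    region.append((nr, nc))
                    mean += (image[nr, nc] - mean) / len(region)   # running mean update
                    queue.append((nr, nc))
    return region

# Toy 2-band "multispectral" image with a bright square on a dark background.
img = np.zeros((50, 50, 2), dtype=float)
img[10:30, 10:30] = 100.0
print("pixels in grown region:", len(region_grow(img, (15, 15), tol=25.0)))
```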


Author(s):  
Kanix Wang ◽  
Walid Hussain ◽  
John R. Birge ◽  
Michael D. Schreiber ◽  
Daniel Adelman

Having an interpretable, dynamic length-of-stay model can help hospital administrators and clinicians make better decisions and improve the quality of care. The widespread implementation of electronic medical record (EMR) systems has enabled hospitals to collect massive amounts of health data. However, how to integrate this deluge of data into healthcare operations remains unclear. We propose a framework grounded in established clinical knowledge to model patients’ lengths of stay. In particular, we impose expert knowledge when grouping raw clinical data into medically meaningful variables that summarize patients’ health trajectories. We use dynamic, predictive models to output patients’ remaining lengths of stay, future discharges, and census probability distributions based on their health trajectories up to the current stay. Evaluated with large-scale EMR data, the dynamic model significantly improves predictive power over models in the previous literature while remaining medically interpretable. Summary of Contribution: The widespread implementation of electronic health systems has created opportunities and challenges for best utilizing mounting clinical data for healthcare operations. In this study, we propose a new approach that integrates clinical analysis into the generation of variables and the implementation of computational methods. This approach allows our model to remain interpretable to medical professionals while being accurate. We believe our study has broader relevance to researchers and practitioners of healthcare operations.
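One way to see how a dynamic model can output remaining lengths of stay and discharge probabilities is through the standard discrete-time survival identity: if a model supplies daily discharge hazards, the remaining-LOS distribution follows directly. The sketch below uses placeholder hazards rather than predictions from the paper's trajectory-based variables.

```python
import numpy as np

# If h_k = P(discharge on day t+k | still in hospital at day t+k), then
# P(remaining LOS = k) = h_k * prod_{j<k} (1 - h_j).
def remaining_los_distribution(hazards):
    hazards = np.asarray(hazards, dtype=float)
    surv = np.cumprod(np.concatenate([[1.0], 1 - hazards[:-1]]))  # P(still present before day k)
    return hazards * surv

hazards = [0.10, 0.20, 0.30, 0.35, 0.40, 0.50, 0.60]   # illustrative daily discharge hazards
pmf = remaining_los_distribution(hazards)              # mass beyond the horizon is omitted
print("P(remaining LOS = k):", np.round(pmf, 3))
print("expected remaining days (within horizon):", round(np.sum(np.arange(1, len(pmf) + 1) * pmf), 2))
```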


2015 ◽  
Vol 112 (19) ◽  
pp. 6236-6241 ◽  
Author(s):  
Thomas M. Neeson ◽  
Michael C. Ferris ◽  
Matthew W. Diebel ◽  
Patrick J. Doran ◽  
Jesse R. O’Hanley ◽  
...  

In many large ecosystems, conservation projects are selected by a diverse set of actors operating independently at spatial scales ranging from local to international. Although small-scale decision making can leverage local expert knowledge, it also may be an inefficient means of achieving large-scale objectives if piecemeal efforts are poorly coordinated. Here, we assess the value of coordinating efforts in both space and time to maximize the restoration of aquatic ecosystem connectivity. Habitat fragmentation is a leading driver of declining biodiversity and ecosystem services in rivers worldwide, and we simultaneously evaluate optimal barrier removal strategies for 661 tributary rivers of the Laurentian Great Lakes, which are fragmented by at least 6,692 dams and 232,068 road crossings. We find that coordinating barrier removals across the entire basin is nine times more efficient at reconnecting fish to headwater breeding grounds than optimizing independently for each watershed. Similarly, a one-time pulse of restoration investment is up to 10 times more efficient than annual allocations totaling the same amount. Despite widespread emphasis on dams as key barriers in river networks, improving road culvert passability is also essential for efficiently restoring connectivity to the Great Lakes. Our results highlight the dramatic economic and ecological advantages of coordinating efforts in both space and time during restoration of large ecosystems.
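The value of coordination can be illustrated with a hedged toy version of the underlying selection problem (not the authors' optimization model): each habitat patch becomes accessible only if every barrier downstream of it is removed, removals must fit a budget, and the best removal set is found by brute force on a tiny example.

```python
import itertools

# Barrier removal costs and habitat patches (all numbers are illustrative).
barriers = {"A": 3.0, "B": 1.0, "C": 2.0}
# Each patch: (habitat area, set of barriers that must ALL be removed to reach it).
patches = [(5.0, {"A"}), (4.0, {"A", "B"}), (6.0, {"C"})]
budget = 4.0

best = (0.0, set())
for k in range(len(barriers) + 1):
    for combo in itertools.combinations(barriers, k):
        removed = set(combo)
        cost = sum(barriers[b] for b in removed)
        if cost <= budget:
            habitat = sum(area for area, required in patches if required <= removed)
            best = max(best, (habitat, removed), key=lambda t: t[0])

print("best accessible habitat:", best[0], "by removing", sorted(best[1]))
```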


Author(s):  
Martin Schreiber ◽  
Pedro S Peixoto ◽  
Terry Haut ◽  
Beth Wingate

This paper presents, discusses and analyses a massively parallel-in-time solver for linear oscillatory partial differential equations, which is a key numerical component for evolving weather, ocean, climate and seismic models. The time parallelization in this solver allows us to significantly exceed the computing resources used by parallelization-in-space methods, resulting in a correspondingly significant reduction in wall-clock time. One of the major difficulties in achieving Exascale performance for weather prediction is that the strong scaling limit – the parallel performance for a fixed problem size with an increasing number of processors – saturates. A main avenue to circumvent this problem is to introduce new numerical techniques that take advantage of time parallelism. In this paper, we use a time-parallel approximation that retains the frequency information of oscillatory problems. This approximation is based on (a) reformulating the original problem into a large set of independent terms and (b) solving each of these terms independently of the others, which can now be accomplished on a large number of high-performance computing resources. Our experiments are conducted on up to 3,586 cores for problem sizes whose parallelization-in-space scalability is already limited on a single node. We gain significant reductions in time-to-solution of 118.3× for spectral methods and 1503.0× for finite-difference methods with the parallelization-in-time approach. A developed and calibrated performance model gives the scalability limitations of this new approach a priori and allows us to extrapolate the performance of the method towards large-scale systems. This work has the potential to contribute as a basic building block of parallelization-in-time approaches, with possible major implications in applied areas modelling oscillatory-dominated problems.
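A hedged sketch of the "large set of independent terms" idea follows, using a contour-integral representation of the matrix exponential rather than the paper's specific rational approximation: the trapezoidal rule on a circle enclosing the spectrum turns exp(A)u into N shifted linear solves that are mutually independent and could therefore be distributed across compute nodes. The matrix, contour radius and node count below are illustrative.

```python
import numpy as np
from scipy.linalg import expm

# exp(A) u = (1 / 2*pi*i) * contour integral of e^z (zI - A)^{-1} u dz.
# Discretizing the contour (a circle of radius r around the spectrum) with the
# trapezoidal rule gives exp(A) u ~ (1/N) * sum_k e^{z_k} z_k (z_k I - A)^{-1} u,
# where every term is an independent shifted linear solve.
rng = np.random.default_rng(0)
n = 40
B = rng.normal(size=(n, n))
A = (B - B.T) / np.sqrt(n)                    # skew-symmetric => purely oscillatory modes
u0 = rng.normal(size=n)

radius = 1.5 * np.max(np.abs(np.linalg.eigvals(A)))   # contour must enclose the spectrum
N = 128
theta = 2 * np.pi * np.arange(N) / N
z = radius * np.exp(1j * theta)               # quadrature nodes on the circle

# Each term is independent; in the parallel-in-time setting these solves would be
# distributed across compute nodes rather than run in a serial loop.
terms = [np.exp(zk) * zk * np.linalg.solve(zk * np.eye(n) - A, u0) for zk in z]
u_contour = np.real(sum(terms) / N)

u_exact = expm(A) @ u0
print("relative error:", np.linalg.norm(u_contour - u_exact) / np.linalg.norm(u_exact))
```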


1997 ◽  
Vol 50 (3) ◽  
pp. 528-559 ◽  
Author(s):  
Catriona M. Morrison ◽  
Tameron D. Chappell ◽  
Andrew W. Ellis

Studies of lexical processing have relied heavily on adult ratings of word learning age or age of acquisition, which have been shown to be strongly predictive of processing speed. This study reports a set of objective norms derived in a large-scale study of British children's naming of 297 pictured objects (including 232 from the Snodgrass & Vanderwart, 1980, set). In addition, data were obtained on measures of rated age of acquisition, rated frequency, imageability, object familiarity, picture-name agreement, and name agreement. We discuss the relationship between the objective measure and adult ratings of word learning age. Objective measures should be used when available, but where not, our data suggest that adult ratings provide a reliable and valid measure of real word learning age.


Author(s):  
Scott C. Chase

The combination of the paradigms of shape algebras and predicate logic representations, used in a new method for describing designs, is presented. First-order predicate logic provides a natural, intuitive way of representing shapes and spatial relations in the development of complete computer systems for reasoning about designs. Shape algebraic formalisms have advantages over more traditional representations of geometric objects. Here we illustrate the definition of a large set of high-level design relations from a small set of simple structures and spatial relations, with examples from the domains of geographic information systems and architecture.
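To illustrate the idea of building high-level design relations from a few primitives, here is a hedged sketch using axis-aligned rectangles as stand-in shapes and plain Boolean functions as predicates; the paper itself works with shape algebras and first-order logic, not this encoding.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rect:
    x0: float
    y0: float
    x1: float
    y1: float

# Primitive spatial relations.
def contains(a: Rect, b: Rect) -> bool:
    return a.x0 <= b.x0 and a.y0 <= b.y0 and b.x1 <= a.x1 and b.y1 <= a.y1

def overlaps(a: Rect, b: Rect) -> bool:
    return a.x0 < b.x1 and b.x0 < a.x1 and a.y0 < b.y1 and b.y0 < a.y1

# Higher-level design relations defined purely from the primitives.
def adjacent_rooms(a: Rect, b: Rect, site: Rect) -> bool:
    """Both rooms lie inside the site and share an edge without overlapping interiors."""
    touches = a.x1 == b.x0 or b.x1 == a.x0 or a.y1 == b.y0 or b.y1 == a.y0
    return contains(site, a) and contains(site, b) and touches and not overlaps(a, b)

site = Rect(0, 0, 10, 10)
room1, room2 = Rect(0, 0, 5, 10), Rect(5, 0, 10, 10)
print(adjacent_rooms(room1, room2, site))   # True
```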


2016 ◽  
Author(s):  
Timothy N. Rubin ◽  
Oluwasanmi Koyejo ◽  
Krzysztof J. Gorgolewski ◽  
Michael N. Jones ◽  
Russell A. Poldrack ◽  
...  

A central goal of cognitive neuroscience is to decode human brain activity, i.e., to infer mental processes from observed patterns of whole-brain activation. Previous decoding efforts have focused on classifying brain activity into a small set of discrete cognitive states. To attain maximal utility, a decoding framework must be open-ended, systematic, and context-sensitive, i.e., capable of interpreting numerous brain states, presented in arbitrary combinations, in light of prior information. Here we take steps towards this objective by introducing a Bayesian decoding framework based on a novel topic model, Generalized Correspondence Latent Dirichlet Allocation, that learns latent topics from a database of over 11,000 published fMRI studies. The model produces highly interpretable, spatially circumscribed topics that enable flexible decoding of whole-brain images. Importantly, the Bayesian nature of the model allows one to “seed” decoder priors with arbitrary images and text, enabling researchers, for the first time, to generate quantitative, context-sensitive interpretations of whole-brain patterns of brain activity.
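The decoding step can be illustrated with a hedged sketch that is much simpler than GC-LDA: given per-topic activation probabilities over voxels and a prior over topics (which could be seeded with prior studies or text), an observed activation map is scored against each topic and Bayes' rule yields a posterior over topics. All maps and numbers below are synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)
n_topics, n_voxels = 5, 200
topic_maps = rng.uniform(0.01, 0.2, size=(n_topics, n_voxels))   # P(voxel active | topic)
topic_maps[2, 50:80] = 0.9                                       # topic 2 "owns" one spatial region
prior = np.full(n_topics, 1.0 / n_topics)                        # could be seeded with prior information

# Observed binary activation map, generated here by topic 2.
activation = (rng.uniform(size=n_voxels) < topic_maps[2]).astype(float)

# log P(image | topic) under independent-Bernoulli voxels, then Bayes' rule.
log_lik = activation @ np.log(topic_maps).T + (1 - activation) @ np.log(1 - topic_maps).T
log_post = np.log(prior) + log_lik
post = np.exp(log_post - log_post.max())
post /= post.sum()
print("posterior over topics:", np.round(post, 3))
```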


2016 ◽  
Author(s):  
Dominik Paprotny ◽  
Oswaldo Morales Nápoles

Abstract. Large-scale hydrological modelling of flood hazard requires adequate extreme discharge data. Models based on physics are applied alongside those utilizing only statistical analysis. The former require enormous computational power, while the latter are often limited in accuracy and spatial coverage. In this paper we introduce an alternative statistical approach based on Bayesian Networks (BN), a graphical model for dependent random variables. We use a non-parametric BN to describe the joint distribution of extreme discharges in European rivers and variables describing the geographical characteristics of their catchments. Data on annual maxima of daily discharges from more than 1800 river gauge stations were collected, together with information on the terrain, land use and climate of the catchments that drain to those locations. The (conditional) correlations between the variables are modelled through copulas, with the dependency structure defined in the network. The results show that using this method, mean annual maxima and return periods of discharges can be estimated with an accuracy similar to existing studies using physical models for Europe, and better than a comparable global statistical method. The performance of the model varies slightly between regions of Europe, but is consistent between different time periods and is not affected by a split-sample validation. The BN was applied to a large domain covering rivers of all sizes across the continent, for both present and future climate, showing large variation in the influence of climate change on river discharges, as well as large differences between emission scenarios. The method could be used to provide quick estimates of extreme discharges at any location for the purpose of obtaining input information for hydraulic modelling.
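A hedged sketch of the copula-based conditioning at the heart of such a model: a single bivariate Gaussian copula (standing in for the paper's non-parametric Bayesian Network) links one catchment descriptor to the annual-maximum discharge, and a conditional median discharge for a new catchment is read back through the empirical margin. The data, variable choice and parameter values are synthetic.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 500
area = rng.lognormal(mean=6.0, sigma=1.0, size=n)                  # catchment area (km^2), synthetic
discharge = 0.05 * area**0.8 * rng.lognormal(sigma=0.4, size=n)    # annual max discharge (m^3/s), synthetic

def to_normal(x):
    """Transform a sample to standard-normal scores via its empirical ranks."""
    return stats.norm.ppf((stats.rankdata(x) - 0.5) / len(x))

z_area, z_q = to_normal(area), to_normal(discharge)
rho = np.corrcoef(z_area, z_q)[0, 1]                               # Gaussian-copula correlation

def conditional_median_discharge(new_area):
    # Position of the new area in the empirical margin, mapped to normal space;
    # the conditional median in copula space is rho * z, mapped back through the
    # empirical discharge margin.
    u = (np.sum(area <= new_area) + 0.5) / (len(area) + 1)
    z_cond = rho * stats.norm.ppf(u)
    return np.quantile(discharge, stats.norm.cdf(z_cond))

print("conditional median discharge for a 1000 km^2 catchment:",
      round(conditional_median_discharge(1000.0), 1), "m^3/s")
```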

