multivariate gaussian distribution
Recently Published Documents


TOTAL DOCUMENTS

61
(FIVE YEARS 27)

H-INDEX

8
(FIVE YEARS 1)

2021 ◽  
Vol 3 (4) ◽  
pp. 417-434
Author(s):  
Kfir Eliaz ◽  
Ran Spiegler ◽  
Yair Weiss

Beliefs and decisions are often based on confronting models with data. What is the largest “fake” correlation that a misspecified model can generate, even when it passes an elementary misspecification test? We study an “analyst” who fits a model, represented by a directed acyclic graph, to an objective (multivariate) Gaussian distribution. We characterize the maximal estimated pairwise correlation for generic Gaussian objective distributions, subject to the constraint that the estimated model preserves the marginal distribution of any individual variable. As the number of model variables grows, the estimated correlation can become arbitrarily close to one regardless of the objective correlation. (JEL D83, C13, C46, C51)


2021 ◽  
Author(s):  
Aayush Gupta ◽  
Souvik Dey ◽  
Huan-Xiang Zhou

Artificial intelligence recently achieved the breakthrough of predicting the three-dimensional structures of proteins. The next frontier is presented by intrinsically disordered proteins (IDPs), which, representing 30% to 50% of proteomes, readily access vast conformational space. Molecular dynamics (MD) simulations are promising in sampling IDP conformations, but only at extremely high computational cost. Here, we developed generative autoencoders that learn from short MD simulations and generate full conformational ensembles. An encoder represents IDP conformations as vectors in a reduced-dimensional latent space. The mean vector and covariance matrix of the training dataset are calculated to define a multivariate Gaussian distribution, from which vectors are sampled and fed to a decoder to generate new conformations. The ensembles of generated conformations cover those sampled by long MD simulations and are validated by small-angle X-ray scattering profile and NMR chemical shifts. This work illustrates the vast potential of artificial intelligence in conformational mining of IDPs.
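The sampling step described in this abstract (compute the mean vector and covariance matrix of the latent vectors, then draw new vectors from the resulting multivariate Gaussian) can be sketched as follows. This is only an illustration under stated assumptions, not the authors' code: the encoder and decoder are omitted, the latent vectors are synthetic stand-ins, and the names `fit_gaussian` and `sample_latents` are hypothetical.

```python
import numpy as np


def fit_gaussian(latents):
    """Return the sample mean and covariance of latent vectors (rows = samples)."""
    mean = latents.mean(axis=0)
    cov = np.cov(latents, rowvar=False)
    return mean, cov


def sample_latents(mean, cov, n_samples, seed=0):
    """Draw new latent vectors from N(mean, cov)."""
    rng = np.random.default_rng(seed)
    return rng.multivariate_normal(mean, cov, size=n_samples)


# Synthetic stand-in for encoder output: 500 correlated 3-dimensional vectors.
# In the setting of the abstract these would come from a trained encoder.
rng = np.random.default_rng(42)
mixing = np.array([[1.0, 0.5, 0.0],
                   [0.0, 1.0, 0.3],
                   [0.0, 0.0, 1.0]])
train_latents = rng.normal(size=(500, 3)) @ mixing

mean, cov = fit_gaussian(train_latents)
new_latents = sample_latents(mean, cov, n_samples=1000)
# Each row of new_latents would then be passed to a decoder (not shown).
```

Using the full covariance matrix rather than only per-dimension variances keeps the correlations between latent dimensions in the generated samples.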


Entropy ◽  
2021 ◽  
Vol 23 (8) ◽  
pp. 1071
Author(s):  
Jiancheng Sun ◽  
Zhinan Wu ◽  
Si Chen ◽  
Huimin Niu ◽  
Zongqing Tu

Time series analysis has been an important branch of information processing, and the conversion of time series into complex networks provides a new means to understand and analyze time series. In this work, using a Variational Auto-Encoder (VAE), we explored the construction of latent networks for univariate time series. We first trained the VAE to obtain the space of latent probability distributions of the time series and then decomposed the multivariate Gaussian distribution into multiple univariate Gaussian distributions. By measuring the distance between univariate Gaussian distributions on a statistical manifold, the latent network construction was finally achieved. The experimental results show that the latent network can effectively retain the original information of the time series and provide a new data structure for the downstream tasks.
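One way to picture the decomposition and distance step is sketched below. Two assumptions are made that the abstract does not state: the latent Gaussian has a diagonal covariance (so it factors into independent univariate Gaussians), and the distance on the statistical manifold is the Fisher-Rao distance, which has a closed form for univariate Gaussians. The function names are hypothetical.

```python
import math


def fisher_rao_distance(mu1, sigma1, mu2, sigma2):
    """Fisher-Rao distance between N(mu1, sigma1^2) and N(mu2, sigma2^2)."""
    if sigma1 <= 0 or sigma2 <= 0:
        raise ValueError("standard deviations must be positive")
    arg = 1.0 + ((mu1 - mu2) ** 2 + 2.0 * (sigma1 - sigma2) ** 2) / (4.0 * sigma1 * sigma2)
    return math.sqrt(2.0) * math.acosh(arg)


def split_diagonal_gaussian(means, sigmas):
    """Treat a diagonal multivariate Gaussian as a list of univariate (mu, sigma) pairs."""
    return list(zip(means, sigmas))


def distance_matrix(components):
    """Pairwise distances between univariate components (candidate edge weights)."""
    n = len(components)
    return [[fisher_rao_distance(*components[i], *components[j]) for j in range(n)]
            for i in range(n)]


# Illustrative latent parameters (assumed values, not taken from the paper).
components = split_diagonal_gaussian([0.0, 0.5, 2.0], [1.0, 1.2, 0.8])
D = distance_matrix(components)
```

A network could then be built by connecting components whose distance falls below a chosen threshold; that thresholding rule is likewise an assumption here.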


2021 ◽  
Vol 11 (9) ◽  
pp. 3923
Author(s):  
Kwangsub Song ◽  
Tae-Jun Park ◽  
Joon-Hyuk Chang

In this paper, we propose a novel data augmentation technique employing a multivariate Gaussian distribution (DA-MGD) for neural network (NN)-based blood pressure (BP) estimation. The technique incorporates the relationships between the features in a multi-dimensional feature vector so that the correlated real-valued random variables are described faithfully. To verify the proposed algorithm against the conventional algorithm, we compare the results in terms of mean error (ME) with standard deviation and Pearson correlation, using a database (DB) contributed by 110 subjects that includes systolic BP (SBP), diastolic BP (DBP), photoplethysmography (PPG) signals, and electrocardiography (ECG) signals. For each subject, three (or six) measurements were taken, in each of which the PPG and ECG signals were recorded for 20 s. To compare BP estimation (BPE) performance across the data augmentation algorithms, we train the BPE model using a two-stage system called the stacked NN. Because the proposed algorithm expresses the correlation between the features more faithfully than the conventional algorithm, its errors are lower, which indicates the advantage of our approach.


Author(s):  
Catherine E. Finkenbiner ◽  
Stephen P. Good ◽  
Scott T. Allen ◽  
Richard P. Fiorella ◽  
Gabriel J. Bowen

Sampling intervals of precipitation geochemistry measurements are often coarser than those required by fine-scale hydrometeorological models. This study presents a statistical method to temporally downscale geochemical tracer signals in precipitation so that they can be used in high-resolution, tracer-enabled applications. In this method, we separated the deterministic component of the time series and the remaining daily stochastic component, which was approximated by a conditional multivariate Gaussian distribution. Specifically, statistics of the stochastic component could be explained from coarser data using a newly identified power-law decay function, which relates data aggregation intervals to changes in tracer concentration variance and correlations with precipitation amounts. These statistics were used within a copula framework to generate synthetic tracer values from the deterministic and stochastic time series components based on daily precipitation amounts. The method was evaluated at 27 sites located worldwide using daily precipitation isotope ratios, which were aggregated in time to provide low resolution testing datasets with known daily values. At each site, the downscaling method was applied on weekly, biweekly and monthly aggregated series to yield an ensemble of daily tracer realizations. Daily tracer concentrations downscaled from a biweekly series had average (± standard deviation) absolute errors of 1.69‰ (1.61‰) for δ²H and 0.23‰ (0.24‰) for δ¹⁸O relative to observations. The results suggest coarsely sampled precipitation tracers can be accurately downscaled to daily values. This method may be extended to other geochemical tracers in order to generate downscaled datasets needed to drive complex, fine-scale models of hydrometeorological processes.
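For reference, the standard conditional form of a multivariate Gaussian is given below. It is a textbook identity, not the paper's specific parameterization. The partition into $\mathbf{x}_a$ (for example, the tracer values to be generated) and $\mathbf{x}_b$ (for example, the conditioning precipitation amounts) is an assumption made here for illustration.

```latex
\begin{aligned}
\begin{pmatrix} \mathbf{x}_a \\ \mathbf{x}_b \end{pmatrix}
&\sim \mathcal{N}\!\left(
\begin{pmatrix} \boldsymbol{\mu}_a \\ \boldsymbol{\mu}_b \end{pmatrix},
\begin{pmatrix} \Sigma_{aa} & \Sigma_{ab} \\ \Sigma_{ba} & \Sigma_{bb} \end{pmatrix}
\right), \\
\mathbf{x}_a \mid \mathbf{x}_b
&\sim \mathcal{N}\!\left(
\boldsymbol{\mu}_a + \Sigma_{ab}\Sigma_{bb}^{-1}(\mathbf{x}_b - \boldsymbol{\mu}_b),\;
\Sigma_{aa} - \Sigma_{ab}\Sigma_{bb}^{-1}\Sigma_{ba}
\right).
\end{aligned}
```

The conditional mean shifts with the observed $\mathbf{x}_b$, while the conditional covariance does not depend on the observed value and is never larger than $\Sigma_{aa}$.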


Entropy ◽  
2021 ◽  
Vol 23 (4) ◽  
pp. 420
Author(s):  
Enrique G. Rodrigo ◽  
Juan C. Alfaro ◽  
Juan A. Aledo ◽  
José A. Gámez

The goal of the Label Ranking (LR) problem is to learn preference models that predict the preferred ranking of class labels for a given unlabeled instance. Different well-known machine learning algorithms have been adapted to deal with the LR problem. In particular, fine-tuned instance-based algorithms (e.g., k-nearest neighbors) and model-based algorithms (e.g., decision trees) have performed remarkably well in tackling the LR problem. Probabilistic Graphical Models (PGMs, e.g., Bayesian networks) have not been considered to deal with this problem because of the difficulty of modeling permutations in that framework. In this paper, we propose a Hidden Naive Bayes classifier (HNB) to cope with the LR problem. By introducing a hidden variable, we can design a hybrid Bayesian network in which several types of distributions can be combined: multinomial for discrete variables, Gaussian for numerical variables, and Mallows for permutations. We consider two kinds of probabilistic models: one based on a Naive Bayes graphical structure (where only univariate probability distributions are estimated for each state of the hidden variable) and another where we allow interactions among the predictive attributes (using a multivariate Gaussian distribution for the parameter estimation). The experimental evaluation shows that our proposals are competitive with the state-of-the-art algorithms in both accuracy and CPU time requirements.


2021 ◽  
Author(s):  
J. Emmanuel Johnson ◽  
Maria Piles ◽  
Valero Laparra ◽  
Gustau Camps-Valls

Long-standing questions in multivariate statistics, information theory and machine learning reduce to estimating multivariate densities. However, this is still an unresolved problem and one of the biggest challenges in general, and for Earth system data analysis in particular, due to the high dimensionality (spatial, temporal and/or spectral) of the data streams. Gaussianization is a class of generative models (normalizing flows) that is effective in computing density estimates by using a sequence of composite invertible transformations which transform data from its original domain to a multivariate Gaussian distribution. The methodology in turn allows us to estimate information theory measures (ITMs), which are relevant for the analysis and characterization of Earth system data superseding the mean, variance and correlation, as higher order measures, thereby capturing more complexity and providing more insight into various problems. We show that our Rotation-Based Iterative Gaussianization (RBIG) method allows us to compute ITMs from multivariate (spatio-spectral-temporal) Earth data efficiently in both computation and memory terms, directly from the Gaussianizing transformation, while being robust to data dimensionality. We demonstrate how Gaussianization is useful in various Earth observation data analysis problems, from hyperspectral image analysis to drought detection in data cubes.


Sensors ◽  
2021 ◽  
Vol 21 (4) ◽  
pp. 1460
Author(s):  
Sijie Lin ◽  
Ke Xu ◽  
Hui Feng ◽  
Bo Hu

Graph signal sampling has been widely studied in recent years, but the accurate signal models required by most of the existing sampling methods are usually unavailable prior to any observations made in a practical environment. In this paper, a sequential sampling and estimation algorithm is proposed for approximately bandlimited graph signals, in the absence of prior knowledge concerning signal properties. We approach the problem from a Bayesian perspective in which we formulate the signal prior by a multivariate Gaussian distribution with unknown hyperparameters. To overcome the interconnected problems associated with the parameter estimation, in the proposed algorithm, hyperparameter estimation and sample selection are performed in an alternating way. At each step, the unknown hyperparameters are updated by an expectation maximization procedure based on historical observations, and then the next node in the sampling operation is chosen by uncertainty sampling with the latest hyperparameters. We prove that under some specific conditions, signal estimation in the proposed algorithm is consistent. Subsequent validation of the approach through simulations shows that the proposed procedure yields performances which are significantly better than existing state-of-the-art approaches notwithstanding the additional attribute of robustness in the presence of a broad range of signal attributes.
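The uncertainty-sampling step can be illustrated with a small sketch: given a multivariate Gaussian prior over node values and some noisy observations, compute the posterior covariance and pick the unobserved node with the largest posterior variance. This is not the authors' algorithm. The hyperparameters are held fixed here (the expectation maximization update is omitted), and the prior covariance, noise variance, and function names are assumptions chosen for illustration.

```python
import numpy as np


def gaussian_posterior(prior_mean, prior_cov, observed_idx, observed_values, noise_var):
    """Posterior mean and covariance of all nodes given noisy observations of some nodes."""
    idx = np.asarray(observed_idx, dtype=int)
    if idx.size == 0:
        return prior_mean.copy(), prior_cov.copy()
    k_oo = prior_cov[np.ix_(idx, idx)] + noise_var * np.eye(idx.size)
    k_ao = prior_cov[:, idx]
    # gain = K_ao K_oo^{-1}, computed with a linear solve instead of an explicit inverse
    gain = np.linalg.solve(k_oo, k_ao.T).T
    residual = np.asarray(observed_values, dtype=float) - prior_mean[idx]
    post_mean = prior_mean + gain @ residual
    post_cov = prior_cov - gain @ k_ao.T
    return post_mean, post_cov


def next_node(post_cov, observed_idx):
    """Uncertainty sampling: choose the unobserved node with the largest posterior variance."""
    variances = np.diag(post_cov).copy()
    variances[np.asarray(observed_idx, dtype=int)] = -np.inf
    return int(np.argmax(variances))


# Assumed prior on a 4-node path graph: covariance decays with hop distance.
hops = np.abs(np.subtract.outer(np.arange(4), np.arange(4)))
prior_cov = np.exp(-hops.astype(float))
prior_mean = np.zeros(4)

# Observe node 0, then ask which node to sample next.
post_mean, post_cov = gaussian_posterior(prior_mean, prior_cov, [0], [1.0], noise_var=0.01)
chosen = next_node(post_cov, [0])
```

In this toy setup the node farthest from the observation is selected, because the observation reduces its variance the least.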


2021 ◽  
pp. 1-24
Author(s):  
Zihan Wang ◽  
Hongyi Xu

The complex topological characteristics of network-like structural systems, such as lattice structures, cellular metamaterials, and mass transport networks, pose a great challenge for uncertainty quantification (UQ). Existing UQ approaches are only applicable to parametric uncertainties or high dimensional random quantities distributed in a simply connected space (e.g., line section, rectangular area, etc.). Those methods do not consider the topological characteristics of the spatial domain. To resolve this issue, a network distance-based Gaussian random process UQ approach is proposed. By representing the topological input space as a node-edge network, the network distance is employed to replace the Euclidean distance in characterizing the spatial correlations. Furthermore, a conditional simulation-based sampling approach is proposed for generating realizations from the UQ model. Network node values are modeled by a multivariate Gaussian distribution, and the network edge values are simulated conditionally on the node values and the known network edge values. The effectiveness of the proposed approach is demonstrated on two engineering case studies: thermal conduction analysis of 3D lattice structures with stochastic properties, and characterization of the distortion patterns of additively manufactured cellular structures.

