Scalable Bayesian Non-linear Matrix Completion

Author(s):  
Xiangju Qin ◽  
Paul Blomstedt ◽  
Samuel Kaski

Matrix completion aims to predict missing elements in a partially observed data matrix, which in typical applications such as collaborative filtering is large and extremely sparsely observed. A standard solution is matrix factorization, which predicts unobserved entries as linear combinations of latent variables. We generalize to non-linear combinations in massive-scale matrices. Bayesian approaches have proven beneficial in linear matrix completion but have not been applied in the more general non-linear case because of limited scalability. We introduce a Bayesian non-linear matrix completion algorithm based on a recent Bayesian formulation of Gaussian process latent variable models. To address the challenges of scalability and computation, we propose a data-parallel distributed computational approach with a restricted communication scheme. We evaluate our method on challenging out-of-matrix prediction tasks using both simulated and real-world data.
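For readers unfamiliar with the linear baseline that the abstract generalizes, the following is a minimal sketch of plain matrix factorization fitted by stochastic gradient descent. It is not the authors' Bayesian non-linear method. The function names, the toy data, and every hyperparameter (`rank`, `lr`, `reg`, `epochs`) are assumptions chosen for illustration.

```python
import math
import random

def factorize(observed, n_rows, n_cols, rank=2, lr=0.02, reg=0.01,
              epochs=1000, seed=0):
    """Fit row factors U and column factors V to the observed entries only."""
    rng = random.Random(seed)
    U = [[rng.gauss(0.0, 0.1) for _ in range(rank)] for _ in range(n_rows)]
    V = [[rng.gauss(0.0, 0.1) for _ in range(rank)] for _ in range(n_cols)]
    for _ in range(epochs):
        for (i, j), y in observed.items():
            err = y - sum(U[i][k] * V[j][k] for k in range(rank))
            for k in range(rank):
                u, v = U[i][k], V[j][k]
                # Gradient step on squared error with L2 regularization.
                U[i][k] += lr * (err * v - reg * u)
                V[j][k] += lr * (err * u - reg * v)
    return U, V

def predict(U, V, i, j):
    """Predicted entry: a linear combination of latent variables."""
    return sum(u * v for u, v in zip(U[i], V[j]))

# Toy rank-1 matrix (hypothetical data); two entries are treated as missing.
row_effect = [1.0, 2.0, 1.5, 0.5]
col_effect = [2.0, 1.0, 1.5]
missing = {(0, 2), (3, 0)}
observed = {(i, j): row_effect[i] * col_effect[j]
            for i in range(4) for j in range(3) if (i, j) not in missing}

U, V = factorize(observed, n_rows=4, n_cols=3)
train_rmse = math.sqrt(sum((y - predict(U, V, i, j)) ** 2
                           for (i, j), y in observed.items()) / len(observed))
held_out = predict(U, V, 0, 2)  # estimate for an entry never seen in training
```

A non-linear model would replace the inner product in `predict` with a non-linear function of the latent variables; the sketch covers only the linear case.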

Author(s):  
Antonino Staiano ◽  
Lara De Vinco ◽  
Giuseppe Longo ◽  
Roberto Tagliaferri

Probabilistic Principal Surfaces (PPS) is a non-linear latent variable model with powerful visualization and classification capabilities that appear to overcome most of the shortcomings of other neural tools. PPS builds a probability density function of a given set of patterns lying in a high-dimensional space, expressed in terms of a fixed number of latent variables lying in a Q-dimensional latent space. Usually, the Q-space is either two- or three-dimensional, so the density function can be used to visualize the data within it. The case Q = 3 allows the patterns to be projected onto a spherical manifold, which turns out to be optimal when dealing with sparse data. PPS may also be arranged in ensembles to tackle complex classification tasks. As template cases, we discuss the application of PPS to two real-world data sets from astronomy and genetics.


2018 ◽  
Vol 77 ◽  
pp. 378-394 ◽  
Author(s):  
Jicong Fan ◽  
Tommy W.S. Chow

2019 ◽  
Author(s):  
Henrik Kenneth Andersen

This article provides an in-depth look at the method of fixed-effects regression in the structural equation modeling (SEM) framework. It is meant for those who are less familiar with SEM but interested in panel data analysis, as well as those familiar with SEM but new to fixed-effects regression. It demonstrates the decomposition of observed variables into within- and between-unit variance components using latent variables and gives an intuitive least squares-based explanation of latent variable estimation. The estimation of the substantive effect coefficients is shown analytically. The procedure is demonstrated on simulated as well as real-world data using the German Family Panel Survey (pairfam). The example analyses show that the SEM results are identical to those of the conventional method of pooled ordinary least squares on demeaned data. The supplementary materials provide the model code for use in replication and further study.
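The conventional benchmark named in the abstract, pooled ordinary least squares on demeaned data, can be sketched in a few lines. This illustrates only that benchmark, not the SEM specification. The simulated panel, the true slope of 0.5, and all function names are assumptions made for the example.

```python
def ols_slope(x, y):
    """Slope of a simple least-squares regression of y on x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx

def demean_by_unit(values, units):
    """Subtract each unit's own mean (the within transformation)."""
    totals, counts = {}, {}
    for v, u in zip(values, units):
        totals[u] = totals.get(u, 0.0) + v
        counts[u] = counts.get(u, 0) + 1
    return [v - totals[u] / counts[u] for v, u in zip(values, units)]

# Hypothetical noise-free panel: 3 units observed over 4 periods.
# The unit effect (5 * unit) is correlated with x, which biases naive pooled OLS.
true_beta = 0.5
units, x, y = [], [], []
for unit in range(3):
    for t in range(4):
        x_it = 2.0 * unit + t
        units.append(unit)
        x.append(x_it)
        y.append(5.0 * unit + true_beta * x_it)

naive_slope = ols_slope(x, y)  # confounded by the unit effects
within_slope = ols_slope(demean_by_unit(x, units), demean_by_unit(y, units))
```

Demeaning removes everything that is constant within a unit, so the within slope recovers the true coefficient while the naive pooled slope absorbs the between-unit differences.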


2008 ◽  
pp. 2067-2087
Author(s):  
Antonino Staiano ◽  
Lara De Vinco ◽  
Giuseppe Longo ◽  
Roberto Tagliaferri


IEEE Access ◽  
2017 ◽  
Vol 5 ◽  
pp. 6688-6696 ◽  
Author(s):  
Xing Xu ◽  
Li He ◽  
Huimin Lu ◽  
Atsushi Shimada ◽  
Rin-Ichiro Taniguchi

Methodology ◽  
2011 ◽  
Vol 7 (4) ◽  
pp. 157-164
Author(s):  
Karl Schweizer

Probability-based and measurement-related hypotheses for confirmatory factor analysis of repeated-measures data are investigated. Such hypotheses comprise precise assumptions concerning the relationships among the true components associated with the levels of the design or the items of the measure. Measurement-related hypotheses concentrate on the assumed processes, such as transformation and memory processes, and represent treatment-dependent differences in processing. In contrast, probability-based hypotheses provide the opportunity to consider probabilities as outcome predictions that summarize the effects of various influences. The prediction of performance guided by inexact cues serves as an example. In the empirical part of this paper, probability-based and measurement-related hypotheses are applied to working-memory data. Latent variables according to both hypotheses contribute to a good model fit. The best model fit is achieved for the model including latent variables that represent serial cognitive processing and performance according to inexact cues, in combination with a latent variable for subsidiary processes.


2019 ◽  
Author(s):  
Kevin Constante ◽  
Edward Huntley ◽  
Emma Schillinger ◽  
Christine Wagner ◽  
Daniel Keating

Background: Although family behaviors are known to be important for buffering youth against substance use, research in this area often evaluates a particular type of family interaction and how it shapes adolescents’ behaviors, when it is likely that youth experience the co-occurrence of multiple types of family behaviors that may be protective. Methods: The current study (N = 1716, 10th and 12th graders, 55% female) examined associations between protective family context, a latent variable composed of five different measures of family behaviors, and past-12-month substance use: alcohol, cigarettes, marijuana, and e-cigarettes. Results: A multi-group measurement invariance assessment supported protective family context as a coherent latent construct with partial (metric) measurement invariance among Black, Latinx, and White youth. A multi-group path model indicated that protective family context was significantly associated with less substance use for all youth, though with varying magnitudes across ethnic-racial groups. Conclusion: These results emphasize the importance of evaluating psychometric properties of family-relevant latent variables on the basis of group membership in order to draw appropriate inferences on how such family variables relate to substance use among diverse samples.


2021 ◽  
Vol 20 (1) ◽  
pp. 1-15
Author(s):  
Qi Zhang ◽  
Zheng Xu ◽  
Yutong Lai

Hi-C experiments have become very popular for studying the 3D genome structure in recent years. Identification of long-range chromosomal interactions, i.e., peak detection, is crucial for Hi-C data analysis. However, it remains a challenging task due to the inherent high dimensionality, sparsity, and over-dispersion of the Hi-C count data matrix. We propose EBHiC, an empirical Bayes approach for peak detection from Hi-C data. The proposed framework provides flexible over-dispersion modeling by explicitly including the “true” interaction intensities as latent variables. To implement the proposed peak identification method (via the empirical Bayes test), we estimate the overall distributions of the observed counts semiparametrically using a Smoothed Expectation Maximization algorithm, and the empirical null based on the zero assumption. We conducted extensive simulations to validate and evaluate the performance of our proposed approach and applied it to real datasets. Our results suggest that EBHiC can identify better peaks in terms of accuracy, biological interpretability, and consistency across biological replicates. The source code is available on GitHub (https://github.com/QiZhangStat/EBHiC).


2021 ◽  
Vol 13 (2) ◽  
pp. 51
Author(s):  
Lili Sun ◽  
Xueyan Liu ◽  
Min Zhao ◽  
Bo Yang

The variational graph autoencoder, which can encode structural and attribute information in a graph into low-dimensional representations, has become a powerful method for studying graph-structured data. However, most existing methods based on the variational (graph) autoencoder assume that the prior of the latent variables is the standard normal distribution, which encourages all nodes to gather around 0. This prevents the latent space from being fully utilized. Choosing a suitable prior without incorporating additional expert knowledge therefore becomes a challenge. Given this, we propose a novel noninformative prior-based interpretable variational graph autoencoder (NPIVGAE). Specifically, we use a noninformative prior as the prior distribution of the latent variables. This prior enables the posterior distribution parameters to be learned almost entirely from the sample data. Furthermore, we regard each dimension of a latent variable as the probability that the node belongs to each block, thereby improving the interpretability of the model. The correlation within and between blocks is described by a block–block correlation matrix. We compare our model with state-of-the-art methods on three real datasets, verifying its effectiveness and superiority.
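The pull toward 0 that the abstract attributes to the standard normal prior can be seen in the usual closed-form KL penalty of a variational autoencoder. The sketch below assumes a diagonal Gaussian posterior per node; it shows the conventional penalty that the abstract criticizes, not the NPIVGAE prior, and the function name is hypothetical.

```python
import math

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ) for one latent vector."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0
                     for m, lv in zip(mu, log_var))

# The penalty is zero only when the posterior equals the prior, and it grows
# quadratically as the posterior mean moves away from the origin.
at_origin = kl_to_standard_normal([0.0, 0.0], [0.0, 0.0])
near = kl_to_standard_normal([1.0, 0.0], [0.0, 0.0])
far = kl_to_standard_normal([2.0, 0.0], [0.0, 0.0])
```

Because the term is minimized at a zero mean and unit variance, adding it to the training objective draws every node embedding toward the origin.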

