Model-based ordination with constrained latent variables

2021 ◽  
Author(s):  
Bert van der Veen ◽  
Francis K.C. Hui ◽  
Knut A. Hovstad ◽  
Robert B. O’Hara

In community ecology, unconstrained ordination can be used to predict, from a multivariate dataset, the latent variables assumed to have generated the observed species composition. Latent variables can be understood as ecological gradients; in constrained ordination they are represented as a function of measured predictors, so that ecologists can better relate species composition to the environment while reducing the dimensionality of both the predictors and the response data. However, existing constrained ordination methods do not explicitly account for information provided by the species responses, so they have the potential to misrepresent community structure if not all relevant predictors are measured. We propose a new method for model-based ordination with constrained latent variables in the Generalized Linear Latent Variable Model (GLLVM) framework, which incorporates both measured predictors and residual covariation to optimally represent ecological gradients. Simulations of unconstrained and constrained ordination show that the proposed method outperforms canonical correspondence analysis (CCA) and redundancy analysis (RDA).
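As a concrete illustration of the model structure described above, here is a minimal simulation sketch of a Poisson GLLVM whose latent variables combine measured predictors with residual variation. All names, dimensions, and distributional choices are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, k, d = 100, 30, 3, 2  # sites, species, predictors, latent dimensions

X = rng.normal(size=(n, k))           # measured environmental predictors
B = rng.normal(size=(k, d))           # predictor effects on the gradients
eps = 0.3 * rng.normal(size=(n, d))   # residual (unmeasured) variation
Z = X @ B + eps                       # constrained latent variables (gradients)

beta0 = rng.normal(size=p)            # species-specific intercepts
Gamma = rng.normal(size=(d, p))       # species loadings on the gradients
mu = np.exp(beta0 + Z @ Gamma)        # Poisson mean via the log link
Y = rng.poisson(mu)                   # simulated site-by-species count matrix
```

Setting eps to zero recovers a purely constrained ordination, while dropping the X @ B term recovers an unconstrained one.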

1989 ◽  
Vol 14 (4) ◽  
pp. 335-350 ◽  
Author(s):  
Robert J. Mislevy ◽  
Kathleen M. Sheehan

The Fisher, or expected, information matrix for the parameters in a latent-variable model is bounded from above by the information that would be obtained if the values of the latent variables could also be observed. The difference between this upper bound and the information in the observed data is the “missing information.” This paper explicates the structure of the expected information matrix and related information matrices, and characterizes the degree to which missing information can be recovered by exploiting collateral variables for respondents. The results are illustrated in the context of item response theory models, and practical implications are discussed.
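In symbols (ours, not the authors'), the decomposition the abstract describes reads

$$\mathcal{I}_{\mathrm{obs}}(\theta) \;=\; \mathcal{I}_{\mathrm{complete}}(\theta) \;-\; \mathcal{I}_{\mathrm{missing}}(\theta),$$

where $\mathcal{I}_{\mathrm{complete}}$ is the expected information that would be available if the latent variables were also observed, and $\mathcal{I}_{\mathrm{missing}}$ quantifies what is lost because they are not.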


2019 ◽  
Vol 5 (1) ◽  
Author(s):  
Victoria Savalei ◽  
Steven P. Reise

McNeish (2018) advocates that researchers abandon coefficient alpha in favor of alternative reliability measures, such as the 1-factor reliability (coefficient omega), a total reliability coefficient based on an exploratory bifactor solution (“Revelle’s omega total”), and the glb (“greatest lower bound”). McNeish supports this argument by demonstrating that these coefficients produce higher sample values in several examples. We express three main disagreements with this article. First, we show that McNeish exaggerates the extent to which alpha is different from omega when unidimensionality holds. Second, we argue that, when unidimensionality is violated, most alternative reliability coefficients are model-based, and it is critical to carefully select the underlying latent variable model rather than relying on software defaults. Third, we point out that higher sample reliability values do not necessarily capture population reliability better: many alternative reliability coefficients are upwardly biased except in very large samples. We conclude with a set of alternative recommendations for researchers.
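For reference, the two coefficients at the centre of the exchange can be computed in a few lines. The loadings below are invented purely for illustration, a near-unidimensional case in which alpha and omega nearly coincide, as the first disagreement above argues.

```python
import numpy as np

def coefficient_alpha(cov):
    """Cronbach's alpha from an item covariance matrix."""
    k = cov.shape[0]
    return k / (k - 1) * (1 - np.trace(cov) / cov.sum())

def coefficient_omega(loadings, uniquenesses):
    """One-factor (McDonald's) omega from standardized loadings."""
    common = loadings.sum() ** 2
    return common / (common + uniquenesses.sum())

lam = np.array([0.70, 0.70, 0.65, 0.60, 0.70])  # hypothetical loadings
psi = 1 - lam**2                                # implied uniquenesses
cov = np.outer(lam, lam) + np.diag(psi)         # model-implied covariance
print(coefficient_alpha(cov), coefficient_omega(lam, psi))  # both ~0.80
```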


2020 ◽  
pp. 1471082X1989668
Author(s):  
Zhihua Ma ◽  
Guanghui Chen

Motivated by the China Health and Nutrition Survey (CHNS) data, a semiparametric latent variable model with a Dirichlet process (DP) mixtures prior on the latent variable is proposed to jointly analyse mixed binary and continuous responses. Non-ignorable missing covariates are considered through a selection model framework where a missing covariate model and a missing data mechanism model are included. The logarithm of the pseudo-marginal likelihood (LPML) is applied for selecting the priors, and the deviance information criterion measure focusing on the missing data mechanism model only is used for selecting different missing data mechanisms. A Bayesian index of local sensitivity to non-ignorability (ISNI) is extended to explore the local sensitivity of the parameters in our model. A simulation study is carried out to examine the empirical performance of the proposed methodology. Finally, the proposed model and the ISNI index are applied to analyse the CHNS data in the motivating example.
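Schematically, the selection-model framework referred to above factorizes the joint distribution of responses y, covariates x, and missingness indicators r as (notation ours)

$$p(\mathbf{y}, \mathbf{x}, \mathbf{r}) \;=\; p(\mathbf{y} \mid \mathbf{x})\, p(\mathbf{x})\, p(\mathbf{r} \mid \mathbf{y}, \mathbf{x}),$$

where the last two factors are the missing-covariate model and the missing-data mechanism model; allowing r to depend on unobserved values is what makes the missingness non-ignorable.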


2020 ◽  
Author(s):  
Aditya Arie Nugraha ◽  
Kouhei Sekiguchi ◽  
Kazuyoshi Yoshii

This paper describes a deep latent variable model of speech power spectrograms and its application to semi-supervised speech enhancement with a deep speech prior. By integrating two major deep generative models, a variational autoencoder (VAE) and a normalizing flow (NF), in a mutually beneficial manner, we formulate a flexible latent variable model called the NF-VAE that can extract low-dimensional latent representations from high-dimensional observations, akin to the VAE, and does not need to explicitly represent the distribution of the observations, akin to the NF. In this paper, we consider a variant of NF called the generative flow (GF, a.k.a. Glow) and formulate a latent variable model called the GF-VAE. We experimentally show that the proposed GF-VAE is better than the standard VAE at capturing fine-structured harmonics of speech spectrograms, especially in the high-frequency range. A similar finding is also obtained when the GF-VAE and the VAE are used to generate speech spectrograms from latent variables randomly sampled from the standard Gaussian distribution. Lastly, when these models are used as speech priors for statistical multichannel speech enhancement, the GF-VAE outperforms the VAE and the GF.
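As a rough sketch of the flow component, the following is a single affine coupling layer of the kind Glow-style generative flows stack. It is a generic toy on vectors, not the paper's spectrogram architecture, and all names are illustrative.

```python
import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    """One affine coupling layer: an invertible map with a cheap Jacobian."""
    def __init__(self, dim, hidden=64):
        super().__init__()
        self.half = dim // 2
        self.net = nn.Sequential(
            nn.Linear(self.half, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * (dim - self.half)),
        )

    def forward(self, x):
        x1, x2 = x[:, :self.half], x[:, self.half:]  # split the input
        log_s, t = self.net(x1).chunk(2, dim=1)      # scale and shift from x1
        log_s = torch.tanh(log_s)                    # keep scales numerically tame
        y2 = x2 * torch.exp(log_s) + t               # transform the second half
        return torch.cat([x1, y2], dim=1), log_s.sum(dim=1)  # output, log|det J|
```

Because x1 passes through unchanged, the layer is exactly invertible and its log-determinant is just the sum of log-scales, which is what lets a flow evaluate exact likelihoods.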


2019 ◽  
Author(s):  
Kathleen Gates ◽  
Kenneth Bollen ◽  
Zachary F. Fisher

Researchers across many domains of psychology increasingly wish to arrive at personalized and generalizable dynamic models of individuals’ processes. This is seen in psychophysiological, behavioral, and emotional research paradigms, across a range of data types. Errors of measurement are inherent in most data. For this reason, researchers typically gather multiple indicators of the same latent construct and use methods, such as factor analysis, to arrive at scores from these indicators. In addition to accurately measuring individuals, researchers also need to find the model that best describes the relations among the latent constructs. Most currently available data-driven searches do not include latent variables. We present an approach that builds from the strong foundations of Group Iterative Multiple Model Estimation (GIMME), the idiographic filter, and model-implied instrumental variables with two-stage least squares estimation (MIIV-2SLS) to provide researchers with the option to include latent variables in their data-driven model searches. The resulting approach is called Latent Variable GIMME (LV-GIMME). GIMME is utilized for the data-driven search for relations that exist among latent variables. Unlike other approaches such as the idiographic filter, LV-GIMME does not require the latent variable model to be constant across individuals. This requirement is loosened by utilizing MIIV-2SLS for estimation. Simulated data studies demonstrate that the method can reliably detect relations among latent constructs, and that latent constructs provide more power to detect effects than using observed variables directly. We illustrate the approach with empirical examples drawn from functional MRI and daily self-report data.
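The estimation backbone, two-stage least squares, is simple to state; below is a generic 2SLS sketch. The substantive part of MIIV-2SLS, deriving the instruments Z from the model itself, is omitted here, and the function name is our own.

```python
import numpy as np

def tsls(y, X, Z):
    """Generic two-stage least squares: regress y on X with instruments Z."""
    # Stage 1: project the (possibly error-contaminated) regressors onto Z
    X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
    # Stage 2: ordinary least squares of y on the projected regressors
    beta, *_ = np.linalg.lstsq(X_hat, y, rcond=None)
    return beta
```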


2021 ◽  
Author(s):  
Gordana C. Popovic ◽  
Francis K.C. Hui ◽  
David I. Warton

Visualising data is a vital part of analysis, allowing researchers to find patterns, and to assess and communicate the results of statistical modelling. In ecology, visualisation is often challenging when there are many variables (often for different species or other taxonomic groups) and they are not normally distributed (often counts or presence-absence data). Ordination is a common and powerful way to overcome this hurdle by reducing data from many response variables to just two or three, which can easily be plotted. Ordination is traditionally done using dissimilarity-based methods, most commonly non-metric multidimensional scaling (nMDS). In the last decade, however, model-based methods for unconstrained ordination have gained popularity. These are primarily based on latent variable models, with latent variables estimating the underlying, unobserved ecological gradients. Despite these benefits, a major drawback of model-based ordination methods is their speed: they typically take much longer to return a result than dissimilarity-based methods, especially for large sample sizes. We introduce copula ordination, a new, scalable model-based approach to unconstrained ordination. This method has all the desirable properties of model-based ordination methods, with the added advantage of being computationally far more efficient. In particular, simulations show copula ordination is an order of magnitude faster than current model-based methods, and can even be faster than nMDS for large sample sizes, while producing ordination plots and trends similar to those of existing methods.
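In outline, a copula approach fits a marginal count model per species, maps the observations to Gaussian copula scores, and runs a latent factor model on those scores. The toy sketch below assumes Poisson margins and uses randomized quantile residuals for the discrete-to-continuous step; it is our simplification for illustration, not the authors' code.

```python
import numpy as np
from scipy import stats
from sklearn.decomposition import FactorAnalysis

def copula_scores(Y, seed=1):
    """Map a site-by-species count matrix to Gaussian copula scores."""
    rng = np.random.default_rng(seed)
    Z = np.empty(Y.shape)
    for j in range(Y.shape[1]):
        lam = Y[:, j].mean()                      # crude marginal Poisson fit
        lo = stats.poisson.cdf(Y[:, j] - 1, lam)  # randomized quantile residual:
        hi = stats.poisson.cdf(Y[:, j], lam)      # jitter within the CDF step
        u = rng.uniform(lo, hi)
        Z[:, j] = stats.norm.ppf(np.clip(u, 1e-9, 1 - 1e-9))
    return Z

Y = np.random.default_rng(0).poisson(3.0, size=(200, 25))  # placeholder data
axes = FactorAnalysis(n_components=2).fit_transform(copula_scores(Y))
```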


2018 ◽  
Author(s):  
Simon Bang Kristensen ◽  
Kristian Sandberg ◽  
Bo Martin Bibby

Metacognition is an important component of basic science and clinical psychology, often studied through complex cognitive experiments. While Signal Detection Theory (SDT) provides a popular and pervasive framework for modelling responses from such experiments, a shortfall remains: it cannot in a straightforward manner account for the often complex designs. Additionally, SDT does not provide direct estimates of metacognitive ability. This latter shortcoming has recently been addressed by the introduction of a measure of metacognitive sensitivity dubbed meta-d’. The need for a flexible regression-model framework remains, however, and it should also incorporate the new sensitivity measure. In the present paper, we argue that a straightforward extension of SDT is obtained by identifying the model with the proportional odds model, a widely implemented ordinal regression technique. We go on to develop a formal statistical framework for metacognitive sensitivity by defining a model that combines standard SDT with meta-d’ in a latent variable model. We show how this agrees with the literature on meta-d’ and constitutes a practical framework for extending the model. We supply several theoretical considerations on the model, including closed-form approximate estimates of meta-d’ and optimal weighting of response-specific meta-sensitivities. We discuss regression analysis as an application of the obtained model and illustrate our points through simulations. Lastly, we present R software that implements the model. Our methods and their implementation extend the computational possibilities of SDT and meta-d’ and are useful for theoretical and practical researchers of metacognition.
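The identification with ordinal regression can be stated compactly (our notation): with ordered response categories R ∈ {1, …, K} spanning decision and confidence,

$$P(R \le k \mid \text{stimulus } s) = F(c_k - d'\, s), \qquad s \in \{0, 1\},$$

with ordered thresholds c_1 < ⋯ < c_{K−1}. Taking F to be the logistic distribution function gives exactly the proportional odds model, while classical equal-variance Gaussian SDT corresponds to F = Φ.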


2005 ◽  
Vol 2 (2) ◽  
Author(s):  
Cinzia Viroli

Independent Factor Analysis (IFA) has recently been proposed in the signal processing literature as a way to model a set of observed variables through linear combinations of hidden independent variables plus a noise term. Despite the peculiarity of its origin, the method can be framed within the latent variable model domain, and some parallels with ordinary factor analysis can be drawn. If no prior information on the latent structure is available, a relevant issue concerns the correct specification of the model. In this work some methods to detect the number of significant latent variables are investigated. Moreover, since the method defines the probability density function of the latent variables by mixtures of Gaussians, the correct number of mixture components must also be determined. This issue is treated according to two main approaches. The first amounts to carrying out a likelihood ratio test. The other is based on a penalized form of the likelihood, which leads to the so-called information criteria. Finally, some simulations and empirical results on real datasets are presented.
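In symbols (ours), the IFA model described above is

$$\mathbf{x} = \Lambda \mathbf{z} + \mathbf{u}, \qquad \mathbf{u} \sim \mathcal{N}(\mathbf{0}, \Psi), \qquad p(z_j) = \sum_{k=1}^{K_j} \pi_{jk}\, \mathcal{N}\!\left(z_j \mid \mu_{jk}, \sigma_{jk}^2\right),$$

so specification involves both the number of latent variables (the columns of Λ) and the number of mixture components K_j for each source, which is exactly the two-part selection problem treated here.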


2020 ◽  
Vol 117 (27) ◽  
pp. 15403-15408
Author(s):  
Lawrence K. Saul

We propose a latent variable model to discover faithful low-dimensional representations of high-dimensional data. The model computes a low-dimensional embedding that aims to preserve neighborhood relationships encoded by a sparse graph. The model both leverages and extends current leading approaches to this problem. Like t-distributed Stochastic Neighborhood Embedding, the model can produce two- and three-dimensional embeddings for visualization, but it can also learn higher-dimensional embeddings for other uses. Like LargeVis and Uniform Manifold Approximation and Projection, the model produces embeddings by balancing two goals—pulling nearby examples closer together and pushing distant examples further apart. Unlike these approaches, however, the latent variables in our model provide additional structure that can be exploited for learning. We derive an Expectation–Maximization procedure with closed-form updates that monotonically improve the model’s likelihood: In this procedure, embeddings are iteratively adapted by solving sparse, diagonally dominant systems of linear equations that arise from a discrete graph Laplacian. For large problems, we also develop an approximate coarse-graining procedure that avoids the need for negative sampling of nonadjacent nodes in the graph. We demonstrate the model’s effectiveness on datasets of images and text.
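The linear-algebra core is generic enough to sketch: build a sparse neighbourhood graph, form its Laplacian, and solve the resulting diagonally dominant system iteratively. The snippet below is a standalone illustration of that step on toy data, not the paper's EM updates.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import cg
from scipy.sparse.csgraph import laplacian
from sklearn.neighbors import kneighbors_graph

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))                  # toy high-dimensional data
W = kneighbors_graph(X, n_neighbors=10, mode='connectivity')
W = 0.5 * (W + W.T)                             # symmetrize the kNN graph
L = laplacian(W)                                # discrete graph Laplacian
A = L + 1e-3 * sparse.identity(500)             # sparse, diagonally dominant
b = rng.normal(size=500)
x, info = cg(A, b)                              # conjugate gradients suit such systems
```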


Author(s):  
Antonino Staiano ◽  
Lara De Vinco ◽  
Giuseppe Longo ◽  
Roberto Tagliaferri

Probabilistic Principal Surfaces (PPS) is a nonlinear latent variable model with powerful visualization and classification capabilities, which seem able to overcome most of the shortcomings of other neural tools. PPS builds a probability density function of a given set of patterns lying in a high-dimensional space, expressed in terms of a fixed number of latent variables lying in a latent Q-dimensional space. Usually the Q-space is either two- or three-dimensional, so the density function can be used to visualize the data within it. The case Q = 3 allows the patterns to be projected onto a spherical manifold, which turns out to be optimal when dealing with sparse data. PPS may also be arranged in ensembles to tackle complex classification tasks. As template cases we discuss the application of PPS to two real-world datasets from astronomy and genetics.
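Schematically, the density PPS constructs has the form of a constrained mixture indexed by M nodes on the latent manifold (our notation, suppressing PPS's oriented covariance structure):

$$p(\mathbf{x}) = \frac{1}{M} \sum_{m=1}^{M} \mathcal{N}\!\big(\mathbf{x} \mid f(\mathbf{z}_m; \mathbf{W}), \Sigma_m\big),$$

where the nodes z_m lie in the Q-dimensional latent space (on the unit sphere when Q = 3) and f maps them into the data space.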

