Multiblock variable influence on orthogonal projections (MB-VIOP) for enhanced interpretation of total, global, local and unique variations in OnPLS models

2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Beatriz Galindo-Prieto ◽  
Paul Geladi ◽  
Johan Trygg

Abstract Background: For multivariate data analysis involving only two input matrices (e.g., X and Y), the previously published methods for variable influence on projection (e.g., VIPOPLS or VIPO2PLS) are widely used for variable selection purposes, including (i) variable importance assessment, (ii) dimensionality reduction of big data and (iii) interpretation enhancement of PLS, OPLS and O2PLS models. For multiblock analysis, OnPLS models find relationships among multiple data matrices (more than two blocks) by calculating latent variables; however, no method has so far been available for improving the interpretation of these latent variables (model components) by assessing the importance of the input variables. Results: A method for variable selection in multiblock analysis, called multiblock variable influence on orthogonal projections (MB-VIOP), is explained in this paper. MB-VIOP is a model-based variable selection method that uses the data matrices, the scores and the normalized loadings of an OnPLS model to sort the input variables of more than two data matrices according to their importance for both simplification and interpretation of the total multiblock model, and also of the unique, local and global model components separately. MB-VIOP has been tested using three datasets: a synthetic four-block dataset, a real three-block omics dataset related to plant sciences, and a real six-block dataset related to the food industry. Conclusions: We provide evidence for the usefulness and reliability of MB-VIOP by means of three examples (one synthetic and two real-world cases). MB-VIOP assesses the importance of both isolated variables and ranges of variables, in any type of data, reliably and efficiently. MB-VIOP connects the input variables of different data matrices according to their relevance for the interpretation of each latent variable, yielding enhanced interpretability for each OnPLS model component. In addition, MB-VIOP can deal with strong overlapping of types of variation, as well as with many data blocks of very different dimensionality. The ability of MB-VIOP to generate dimensionality-reduced models with high interpretability makes this method ideal for big data mining, multi-omics data integration and any study that requires exploration and interpretation of large streams of data.
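
As background for the "variable influence on projection" idea, the sketch below computes the classical VIP score for an ordinary single-response PLS model. It is not MB-VIOP, whose formula is not given in the abstract and which works from OnPLS scores and normalized loadings. The function name, the toy weights and the explained sums of squares are all illustrative assumptions.

```python
import math

def vip_scores(weights, explained_ss):
    """Classical PLS VIP (not MB-VIOP).

    weights[a][j]   : weight of variable j in component a (hypothetical values)
    explained_ss[a] : response sum of squares explained by component a
    """
    n_vars = len(weights[0])
    total_ss = sum(explained_ss)
    scores = []
    for j in range(n_vars):
        acc = 0.0
        for w_a, ss_a in zip(weights, explained_ss):
            norm_sq = sum(w * w for w in w_a)        # normalise each weight vector
            acc += ss_a * (w_a[j] ** 2) / norm_sq    # weight share, scaled by explained SS
        scores.append(math.sqrt(n_vars * acc / total_ss))
    return scores

# Toy model: 3 variables, 2 components (made-up numbers).
weights = [[0.8, 0.6, 0.0], [0.0, 0.6, 0.8]]
explained_ss = [3.0, 1.0]
vip = vip_scores(weights, explained_ss)
```

By construction the squared VIP scores sum to the number of variables, which is why 1 is commonly used as the "average importance" threshold when ranking variables.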

2017 ◽  
Vol 14 (4) ◽  
pp. 539
Author(s):  
Yuris Danilwan

This study examines how effectively the School Operational Assistance (BOS) disbursement policy, implemented to make tuition free, has been carried out. The research involves several parties: the PIU Office Level II, personnel at the bank dealer, school committees, principals and students in districts of North Sumatra province. It also considers a number of factors thought to determine how effectively BOS is implemented in the field. The statistical method used is structural equation modeling (SEM). The research was carried out in elementary and junior high schools in Medan, with 554 respondents. The results showed that: (a) All factors are valid, that is, each has a significant influence on the formation of its latent variable (input, process, output or outcome). (b) The direct influence of the input variables on the process variables is 0.83, a contribution of 68.89%. (c) The input variables have a direct influence of 0.21 on the output variables, a direct contribution of 4.41%. (d) The input variables have no direct influence on the outcome variables; their indirect effect on the outcome variables through the output variables is 0.162, a contribution of 2.61%. (e) The input variables also have an indirect effect on the outcome variables through the process variables and then the output variables; this effect is 0.50, a contribution of 25.49%. (f) The factors forming the input, process and output latent variables influence the improvement of educational value and of National Final Test (UAN) results.
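
The abstract does not say how the "contribution" percentages are derived from the effect sizes. The figures for the two direct effects are consistent with squaring the coefficient and expressing it as a percentage, which is a common SEM convention. The sketch below checks that reading; the function name is hypothetical and the squaring rule is an assumption, not something the source states.

```python
def contribution_percent(path_coefficient):
    """Assumed rule: contribution (%) = squared path coefficient x 100."""
    return round(path_coefficient ** 2 * 100, 2)

# Effect sizes quoted in the abstract.
direct_input_to_process = contribution_percent(0.83)   # reported: 68.89%
direct_input_to_output = contribution_percent(0.21)    # reported: 4.41%
indirect_via_output = contribution_percent(0.162)      # reported: 2.61%
indirect_via_process = contribution_percent(0.50)      # reported: 25.49%
```

Under this assumption the two direct effects reproduce exactly (68.89 and 4.41). The two indirect effects come out at 2.62 and 25.0 rather than the reported 2.61 and 25.49, which would be expected if the reported percentages were computed from unrounded coefficients.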


2018 ◽  
Vol 14 (4) ◽  
pp. 539-554
Author(s):  
Yuris Danilwan



Methodology ◽  
2011 ◽  
Vol 7 (4) ◽  
pp. 157-164
Author(s):  
Karl Schweizer

Probability-based and measurement-related hypotheses for confirmatory factor analysis of repeated-measures data are investigated. Such hypotheses comprise precise assumptions concerning the relationships among the true components associated with the levels of the design or the items of the measure. Measurement-related hypotheses concentrate on the assumed processes, as, for example, transformation and memory processes, and represent treatment-dependent differences in processing. In contrast, probability-based hypotheses provide the opportunity to consider probabilities as outcome predictions that summarize the effects of various influences. The prediction of performance guided by inexact cues serves as an example. In the empirical part of this paper probability-based and measurement-related hypotheses are applied to working-memory data. Latent variables according to both hypotheses contribute to a good model fit. The best model fit is achieved for the model including latent variables that represented serial cognitive processing and performance according to inexact cues in combination with a latent variable for subsidiary processes.


2019 ◽  
Author(s):  
Kevin Constante ◽  
Edward Huntley ◽  
Emma Schillinger ◽  
Christine Wagner ◽  
Daniel Keating

Background: Although family behaviors are known to be important for buffering youth against substance use, research in this area often evaluates a particular type of family interaction and how it shapes adolescents’ behaviors, when it is likely that youth experience the co-occurrence of multiple types of family behaviors that may be protective. Methods: The current study (N = 1716, 10th and 12th graders, 55% female) examined associations between protective family context, a latent variable comprised of five different measures of family behaviors, and past 12 months substance use: alcohol, cigarettes, marijuana, and e-cigarettes. Results: A multi-group measurement invariance assessment supported protective family context as a coherent latent construct with partial (metric) measurement invariance among Black, Latinx, and White youth. A multi-group path model indicated that protective family context was significantly associated with less substance use for all youth, but of varying magnitudes across ethnic-racial groups. Conclusion: These results emphasize the importance of evaluating psychometric properties of family-relevant latent variables on the basis of group membership in order to draw appropriate inferences on how such family variables relate to substance use among diverse samples.


2021 ◽  
Vol 13 (2) ◽  
pp. 51
Author(s):  
Lili Sun ◽  
Xueyan Liu ◽  
Min Zhao ◽  
Bo Yang

Variational graph autoencoders, which encode the structural and attribute information of a graph into low-dimensional representations, have become a powerful method for studying graph-structured data. However, most existing methods based on the variational (graph) autoencoder assume that the prior of the latent variables is the standard normal distribution, which encourages all nodes to gather around 0 and so prevents the latent space from being fully utilized. Choosing a suitable prior without incorporating additional expert knowledge is therefore a challenge. Given this, we propose a novel noninformative prior-based interpretable variational graph autoencoder (NPIVGAE). Specifically, we use a noninformative prior as the prior distribution of the latent variables. This prior allows the posterior distribution parameters to be learned almost entirely from the sample data. Furthermore, we regard each dimension of a latent variable as the probability that the node belongs to the corresponding block, thereby improving the interpretability of the model. The correlation within and between blocks is described by a block–block correlation matrix. We compare our model with state-of-the-art methods on three real datasets, verifying its effectiveness and superiority.
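
To see why a standard normal prior pulls node embeddings toward 0, consider the usual variational autoencoder regulariser: the KL divergence from a Gaussian posterior N(mu, sigma^2) to N(0, 1). The sketch below evaluates that textbook closed form for one latent dimension. It illustrates the baseline the abstract criticises, not the NPIVGAE prior, and the function name is an assumption.

```python
import math

def kl_to_standard_normal(mu, sigma):
    """KL( N(mu, sigma^2) || N(0, 1) ) for a single latent dimension."""
    return 0.5 * (mu ** 2 + sigma ** 2 - 1.0 - math.log(sigma ** 2))

penalty_at_origin = kl_to_standard_normal(0.0, 1.0)   # posterior equals the prior
penalty_far_away = kl_to_standard_normal(2.0, 1.0)    # posterior mean shifted to 2
```

The penalty is zero only when the posterior matches the prior and grows with the square of the posterior mean, so every node is pushed toward the same region around 0.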


Energies ◽  
2021 ◽  
Vol 14 (11) ◽  
pp. 3137
Author(s):  
Amine Tadjer ◽  
Reider B. Bratvold ◽  
Remus G. Hanea

Production forecasting is the basis for decision making in the oil and gas industry, and can be quite challenging, especially in terms of complex geological modeling of the subsurface. To help solve this problem, assisted history matching built on ensemble-based analysis, such as the ensemble smoother and the ensemble Kalman filter, is useful in estimating models that preserve geological realism and have predictive capabilities. These methods tend, however, to be computationally demanding, as they require a large ensemble size for stable convergence. In this paper, we propose a novel method of uncertainty quantification and reservoir model calibration with much-reduced computation time. The approach is based on a sequential combination of a nonlinear dimensionality reduction technique (t-distributed stochastic neighbor embedding or the Gaussian process latent variable model), K-means clustering, and the data assimilation method ensemble smoother with multiple data assimilation. Cluster analysis with t-distributed stochastic neighbor embedding and the Gaussian process latent variable model is used to reduce the number of initial geostatistical realizations and select a set of optimal reservoir models that have similar production performance to the reference model. We then apply the ensemble smoother with multiple data assimilation to provide reliable assimilation results. Experimental results based on the Brugge field case data verify the efficiency of the proposed approach.
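
The ensemble-reduction step can be pictured with a small sketch. It assumes the realizations have already been embedded in a low-dimensional space (the t-SNE or GPLVM step is not shown) and uses plain K-means, keeping the member nearest each centroid as that cluster's representative. This nearest-to-centroid rule, the deterministic initialisation and all names are illustrative assumptions; the paper's actual criterion, similarity in production performance to the reference model, is not detailed in the abstract.

```python
import math

def kmeans(points, k, iters=50):
    """Plain Lloyd's K-means on a list of equal-length tuples."""
    centroids = [points[i] for i in range(k)]   # assumption: seed with the first k points
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                updated.append(centroids[c])    # keep an empty cluster's centroid in place
        if updated == centroids:
            break
        centroids = updated
    return centroids, labels

def pick_representatives(points, k):
    """Return the index of the member closest to each cluster centroid."""
    centroids, labels = kmeans(points, k)
    reps = []
    for c, centre in enumerate(centroids):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if members:
            reps.append(min(members, key=lambda i: math.dist(points[i], centre)))
    return reps

# Six hypothetical realizations, already embedded in 2-D.
embedded = [(0.0, 0.0), (10.0, 10.0), (1.0, 0.0), (11.0, 10.0), (2.0, 0.0), (12.0, 10.0)]
representatives = pick_representatives(embedded, k=2)
```

Here the six realizations collapse to two representatives (indices 2 and 3), so a subsequent data assimilation run would start from a much smaller ensemble.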


Omega ◽  
2021 ◽  
pp. 102479
Author(s):  
Zhongbao Zhou ◽  
Meng Gao ◽  
Helu Xiao ◽  
Rui Wang ◽  
Wenbin Liu

1989 ◽  
Vol 14 (4) ◽  
pp. 335-350 ◽  
Author(s):  
Robert J. Mislevy ◽  
Kathleen M. Sheehan

The Fisher, or expected, information matrix for the parameters in a latent-variable model is bounded from above by the information that would be obtained if the values of the latent variables could also be observed. The difference between this upper bound and the information in the observed data is the “missing information.” This paper explicates the structure of the expected information matrix and related information matrices, and characterizes the degree to which missing information can be recovered by exploiting collateral variables for respondents. The results are illustrated in the context of item response theory models, and practical implications are discussed.
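
A one-parameter toy case makes the bound concrete. Assume a latent variable theta ~ N(mu, tau^2) and an observation x = theta + e with e ~ N(0, sigma^2), and take mu as the parameter of interest. If theta were observed, the per-respondent information about mu would be 1/tau^2; from x alone it is 1/(tau^2 + sigma^2). This normal model and the function name are illustrative assumptions, not the item response theory setting used in the paper.

```python
def information_about_mean(tau2, sigma2):
    """Per-respondent Fisher information about mu in the toy normal model."""
    complete = 1.0 / tau2              # latent theta observed directly
    observed = 1.0 / (tau2 + sigma2)   # only the noisy x is observed
    missing = complete - observed      # the "missing information"
    return complete, observed, missing

complete, observed, missing = information_about_mean(tau2=1.0, sigma2=1.0)
fraction_missing = missing / complete
```

With equal latent and error variances, half of the complete-data information is missing, and the missing fraction shrinks toward zero as the measurement error variance does.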

