Detecting latent mean differences between non-invariant groups using ordered categorical variables

2019 · Author(s): Ti Zhang

Though more and more applied researchers have begun to treat response options as ordered-categorical variables when conducting measurement invariance (MI) testing, little is known about the role of ordered-categorical variables when comparing latent means between groups. This study therefore simulated ordered-categorical data to examine the detection of latent mean differences between non-invariant groups across a variety of conditions, including the number of items and the size of the population latent mean difference. The purpose of this study was to investigate the relative parameter bias, power rates, and Type I error rates that may arise when various types of non-invariance are ignored in both the configural invariance and metric invariance models.

The most important contributors to relative bias in the estimated latent mean difference were a) the number of items and the size of the factor loadings in the configural invariance model, b) the size of the factor loading and threshold differences in the metric invariance model that ignored group parameter differences, and c) the number of items in the metric invariance model that addressed the group parameter differences. To reduce bias in estimating the true latent mean difference between groups, practitioners should therefore identify and address the non-invariance and use a test instrument with more items.

The dominant effect on the power to detect a nonzero latent mean difference, in both the configural invariance model and the metric invariance model that ignored true group differences, was the population latent mean difference. In the metric invariance model that addressed the group differences, the most important effects were a) the population latent mean difference and b) the loading and threshold differences. When the latent mean difference was at least moderate, or when a large threshold difference was ignored, the power rate was inflated above .90. Applied researchers should know that relatively large latent mean differences will be easier to detect when both the loading and threshold differences are free to differ between groups.

The dominant effect on the Type I error rate in the configural invariance model was the number of items. In the metric invariance model that ignored the group parameter differences, the most important effects were a) the size of the threshold differences, b) the loading and threshold differences, and c) the number of items. In the metric invariance model that addressed the group parameter differences, the most important effect on Type I error was the number of non-invariant items, which also interacted significantly with the number of items.

Applied researchers often assume their groups are equal and may not concern themselves with detecting true latent mean differences. Because true population differences cannot be known, it is recommended that researchers still conduct an MI analysis. It is especially important to note that in the metric invariance model that addressed group parameter differences, the Type I error rate was below .05. This result suggests that conducting MI testing will help applied researchers detect the true latent mean difference regardless of the magnitude of that difference (i.e., 0, .2, and .5 in this study).
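
The data-generating setup described above can be illustrated in a few lines. The following Python sketch is illustrative only (the dissertation's actual analyses would be run in SEM software such as Mplus or lavaan): it simulates ordered-categorical responses from a one-factor probit model for a reference and a focal group, builds in a true latent mean difference of .5 plus a threshold difference on one item, and shows the kind of naive observed-score comparison that ignoring the non-invariance amounts to. All function and variable names are hypothetical.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2019)

def simulate_group(n, latent_mean, loadings, thresholds):
    """Generate ordered-categorical item responses from a one-factor
    probit (graded-response-type) model."""
    theta = rng.normal(latent_mean, 1.0, size=n)          # latent trait
    n_items = len(loadings)
    data = np.empty((n, n_items), dtype=int)
    for j in range(n_items):
        # continuous latent response underlying item j
        y_star = loadings[j] * theta + rng.normal(0, 1, size=n)
        # cut into ordered categories at the item's thresholds
        data[:, j] = np.digitize(y_star, thresholds[j])
    return data

n_items = 6
loadings_ref = np.full(n_items, 0.7)
thresholds_ref = [np.array([-1.0, 0.0, 1.0])] * n_items   # 4 categories

# Focal group: true latent mean difference of 0.5, but one item has
# shifted thresholds (threshold non-invariance).
loadings_foc = loadings_ref.copy()
thresholds_foc = [t.copy() for t in thresholds_ref]
thresholds_foc[0] = thresholds_foc[0] + 0.8

ref = simulate_group(500, 0.0, loadings_ref, thresholds_ref)
foc = simulate_group(500, 0.5, loadings_foc, thresholds_foc)

# Naive observed-score comparison that ignores the non-invariance;
# a full analysis would fit configural and metric invariance models
# in a SEM package, as the dissertation does.
t, p = stats.ttest_ind(foc.sum(axis=1), ref.sum(axis=1))
print(f"sum-score t = {t:.2f}, p = {p:.4f}")
```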

2020 · Vol 36 (10), pp. 3099-3106 · Author(s): Burim Ramosaj, Lubna Amro, Markus Pauly

Motivation: Imputation procedures have become standard statistical practice in biomedical fields, since subsequent analyses can be conducted as if no values had been missing. In particular, non-parametric imputation schemes such as the random forest have shown favorable imputation performance compared with the more traditionally used MICE procedure. However, their effect on valid statistical inference has not been analyzed so far. This article closes this gap by investigating their validity for inferring mean differences in incompletely observed pairs, contrasting them with a recent approach that works only with the observations at hand.

Results: Our findings indicate that machine-learning schemes for (multiply) imputing missing values may inflate the Type I error or yield comparably low power in small-to-moderate matched-pairs settings, even after modifying the test statistics using Rubin’s multiple imputation rule. In addition to an extensive simulation study, an illustrative data example from a breast cancer gene study is considered.

Availability and implementation: The corresponding R code can be accessed through the authors, and the gene expression data can be downloaded at www.gdac.broadinstitute.org.

Supplementary information: Supplementary data are available at Bioinformatics online.
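
As a rough illustration of the procedure being evaluated, the sketch below multiply imputes incomplete matched pairs with a random-forest-based imputer and pools the paired mean difference with Rubin's rules. It is a simplified stand-in, not the authors' R code: it uses scikit-learn's IterativeImputer with a RandomForestRegressor, and the data, sample size, and missingness mechanism are invented for illustration.

```python
import numpy as np
from scipy import stats
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(7)

# Hypothetical matched pairs (e.g., tumor vs. normal expression), with
# some values missing completely at random.
n = 30
pairs = rng.multivariate_normal([0.0, 0.3], [[1, 0.6], [0.6, 1]], size=n)
incomplete = pairs.copy()
incomplete[rng.random(pairs.shape) < 0.2] = np.nan

M = 10                                             # number of imputations
estimates, variances = [], []
for m in range(M):
    imputer = IterativeImputer(
        estimator=RandomForestRegressor(n_estimators=100, random_state=m),
        random_state=m, max_iter=10)
    completed = imputer.fit_transform(incomplete)
    d = completed[:, 0] - completed[:, 1]          # paired differences
    estimates.append(d.mean())
    variances.append(d.var(ddof=1) / n)            # variance of the mean

# Rubin's rules: pool point estimates and variances across imputations.
q_bar = np.mean(estimates)
w_bar = np.mean(variances)                         # within-imputation variance
b = np.var(estimates, ddof=1)                      # between-imputation variance
total_var = w_bar + (1 + 1 / M) * b
df = (M - 1) * (1 + w_bar / ((1 + 1 / M) * b)) ** 2
t_stat = q_bar / np.sqrt(total_var)
p_val = 2 * stats.t.sf(abs(t_stat), df)
print(f"pooled difference = {q_bar:.3f}, t = {t_stat:.2f}, p = {p_val:.3f}")
```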


2016 · Vol 77 (4), pp. 545-569 · Author(s): Soo Lee, Okan Bulut, Youngsuk Suh

A number of studies have found multiple indicators multiple causes (MIMIC) models to be an effective tool for detecting uniform differential item functioning (DIF) for individual items and item bundles. A recently developed MIMIC-interaction model is capable of detecting both uniform and nonuniform DIF in the unidimensional item response theory (IRT) framework. The goal of the current study is to extend the MIMIC-interaction model to detecting DIF in the context of multidimensional IRT modeling and to examine the performance of the multidimensional MIMIC-interaction model under various simulation conditions with respect to Type I error and power rates. Simulation conditions included DIF pattern and magnitude, test length, correlation between latent traits, sample size, and latent mean differences between focal and reference groups. The results indicate that power rates of the multidimensional MIMIC-interaction model under uniform DIF conditions were higher than those under nonuniform DIF conditions. Power for detecting DIF increased with the number of anchor items and with sample size, and the equal latent mean condition tended to produce higher power rates than the different mean condition. Although the multidimensional MIMIC-interaction model was found to be a reasonably useful tool for identifying uniform DIF, its performance in detecting nonuniform DIF appeared to be questionable.
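
The MIMIC-interaction model itself is typically estimated in SEM software (e.g., Mplus or lavaan), which is not shown here. As a loose analogue of testing uniform versus nonuniform DIF with a main effect and an interaction term, the sketch below uses logistic-regression DIF on one simulated item: the group coefficient plays the role of uniform DIF and the trait-by-group interaction that of nonuniform DIF. All parameter values are invented for illustration.

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import chi2

rng = np.random.default_rng(42)

# Simulate one studied item with both uniform and nonuniform DIF.
n = 1000
group = rng.integers(0, 2, n)                 # 0 = reference, 1 = focal
theta = rng.normal(0.0 - 0.25 * group, 1.0)   # latent mean difference
a, b = 1.2, 0.0                               # discrimination, difficulty
a_dif, b_dif = 0.5, 0.4                       # DIF magnitudes for focal group
logit = (a + a_dif * group) * (theta - (b + b_dif * group))
item = rng.binomial(1, 1 / (1 + np.exp(-logit)))

# Observed proxy for the trait (in practice the rest score or a latent
# estimate); here theta itself is used for illustration.
X_full = sm.add_constant(np.column_stack([theta, group, theta * group]))
X_nodif = sm.add_constant(theta)

full = sm.Logit(item, X_full).fit(disp=0)
reduced = sm.Logit(item, X_nodif).fit(disp=0)

# Likelihood-ratio test: group effect = uniform DIF,
# trait-by-group interaction = nonuniform DIF.
lr = 2 * (full.llf - reduced.llf)
print(f"LR chi-square (2 df) = {lr:.2f}, p = {chi2.sf(lr, 2):.4f}")
```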


2020 · Vol 10 (18), pp. 6247 · Author(s): Hanan M. Hammouri, Roy T. Sabo, Rasha Alsaadawi, Khalid A. Kheirallah

Scientists in biomedical and psychosocial research must deal with skewed data all the time. When comparing means from two groups, the log transformation is commonly used as a traditional technique to normalize skewed data before applying the two-group t-test. An alternative method that does not assume normality is the generalized linear model (GLM) combined with an appropriate link function. In this work, the two techniques are compared using Monte Carlo simulations, each consisting of many iterations that simulate two groups of skewed data from three different sampling distributions: gamma, exponential, and beta. The methods are then compared with respect to Type I error rates, power rates, and the estimates of the mean differences. We conclude that the t-test with log transformation performed better than the GLM method for non-normal data following beta or gamma distributions, whereas for exponentially distributed data the GLM method performed better than the t-test with log transformation.
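
A minimal sketch of the two competing analyses, assuming gamma-distributed outcomes and invented parameter values: a two-sample t-test on log-transformed data versus a gamma GLM with a log link and a group indicator (here via statsmodels).

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(1)

# Two groups of gamma-distributed (right-skewed) outcomes whose means differ.
n = 50
g1 = rng.gamma(shape=2.0, scale=1.0, size=n)    # mean 2.0
g2 = rng.gamma(shape=2.0, scale=1.5, size=n)    # mean 3.0

# Approach 1: two-sample t-test on log-transformed data.
t_stat, t_p = stats.ttest_ind(np.log(g1), np.log(g2))

# Approach 2: gamma GLM with a log link and a group indicator.
y = np.concatenate([g1, g2])
group = np.repeat([0, 1], n)
X = sm.add_constant(group)
glm = sm.GLM(y, X, family=sm.families.Gamma(link=sm.families.links.Log()))
res = glm.fit()

print(f"log-t-test: t = {t_stat:.2f}, p = {t_p:.4f}")
print(f"gamma GLM:  group coef = {res.params[1]:.3f}, p = {res.pvalues[1]:.4f}")
```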


2000 · Vol 14 (1), pp. 1-10 · Author(s): Joni Kettunen, Niklas Ravaja, Liisa Keltikangas-Järvinen

We examined the use of smoothing to enhance the detection of response coupling from the activity of different response systems. Three different types of moving average smoothers were applied to both simulated interbeat interval (IBI) and electrodermal activity (EDA) time series and to empirical IBI, EDA, and facial electromyography time series. The results indicated that progressive smoothing increased the efficiency of the detection of response coupling but did not increase the probability of Type I error. The power of the smoothing methods depended on the response characteristics. The benefits and use of the smoothing methods to extract information from psychophysiological time series are discussed.
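
A toy version of the idea: progressively wider moving-average smoothers applied to two noisy series that share a slow common component, with the correlation between the smoothed series standing in for the coupling measure. Series lengths, noise levels, and window widths are arbitrary choices for illustration, not the article's settings.

```python
import numpy as np

rng = np.random.default_rng(0)

def moving_average(x, width):
    """Simple (unweighted) moving-average smoother."""
    kernel = np.ones(width) / width
    return np.convolve(x, kernel, mode="same")

# Two noisy series sharing a slow common component, standing in for
# IBI and EDA responses that are coupled to the same events.
t = np.arange(600)
common = np.sin(2 * np.pi * t / 120)
ibi = common + rng.normal(0, 1.5, t.size)
eda = common + rng.normal(0, 1.5, t.size)

# Correlation between the smoothed series rises with window width,
# mirroring the article's point that smoothing aids coupling detection.
for width in (1, 5, 15, 31):
    r = np.corrcoef(moving_average(ibi, width), moving_average(eda, width))[0, 1]
    print(f"window = {width:2d}: r = {r:.2f}")
```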


Methodology · 2012 · Vol 8 (1), pp. 23-38 · Author(s): Manuel C. Voelkle, Patrick E. McKnight

The use of latent curve models (LCMs) has increased almost exponentially during the last decade. Oftentimes, researchers regard the LCM as a “new” method for analyzing change, with little attention paid to the fact that the technique was originally introduced as an “alternative to standard repeated measures ANOVA and first-order auto-regressive methods” (Meredith & Tisak, 1990, p. 107). In the first part of the paper, this close relationship is reviewed, and it is demonstrated how “traditional” methods, such as repeated measures ANOVA and MANOVA, can be formulated as LCMs. Given that latent curve modeling is essentially a large-sample technique, compared to “traditional” finite-sample approaches, the second part of the paper uses a Monte Carlo simulation to examine the degree to which the more flexible LCMs can actually replace some of the older tests. In addition, a structural equation modeling alternative to Mauchly’s (1940) test of sphericity is explored. Although “traditional” methods may be expressed as special cases of more general LCMs, we found that the equivalence holds only asymptotically. For practical purposes, however, no approach always outperformed the other alternatives in terms of power and Type I error, so the best method depends on the situation. We provide detailed recommendations on when to use which method.
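
The “traditional” side of this comparison is easy to sketch; fitting the equivalent LCM would require a SEM package and is omitted here. The snippet below simulates balanced longitudinal data and runs a repeated measures ANOVA with statsmodels' AnovaRM; the sample size, number of waves, and growth parameters are invented.

```python
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

rng = np.random.default_rng(3)

# Simulate a small longitudinal data set: 40 subjects, 4 occasions,
# random intercepts plus a linear growth trend.
n, waves = 40, 4
subject = np.repeat(np.arange(n), waves)
time = np.tile(np.arange(waves), n)
intercept = rng.normal(10, 2, n)[subject]
y = intercept + 0.5 * time + rng.normal(0, 1, n * waves)
long = pd.DataFrame({"subject": subject, "time": time, "y": y})

# "Traditional" analysis: repeated measures ANOVA on the occasion factor.
res = AnovaRM(long, depvar="y", subject="subject", within=["time"]).fit()
print(res.anova_table)
```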


Methodology · 2015 · Vol 11 (1), pp. 3-12 · Author(s): Jochen Ranger, Jörg-Tobias Kuhn

In this manuscript, a new approach to the analysis of person fit is presented that is based on the information matrix test of White (1982). This test can be interpreted as a test of trait stability during the measurement situation. The test statistic approximately follows a χ2-distribution; in small samples, the approximation can be improved by a higher-order expansion. The performance of the test is explored in a simulation study, which suggests that the test adheres well to the nominal Type I error rate, although it tends to be conservative for very short scales. The power of the test is compared with that of four alternative tests of person fit, and this comparison corroborates that the power of the information matrix test is similar to the power of the alternative tests. Advantages and areas of application of the information matrix test are discussed.
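
For a scalar person parameter under a 2PL model, the two information estimates that such a test compares are simple sums over items. The sketch below computes a person's ML estimate and the resulting outer-product-of-gradients and negative-Hessian information at that estimate; the formal test statistic (and the higher-order small-sample correction discussed in the article) would additionally require the variance of this discrepancy, which is not shown. Item parameters and responses are simulated for illustration.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(11)

# Known 2PL item parameters for a short scale.
a = rng.uniform(0.8, 1.6, 20)        # discriminations
b = rng.normal(0.0, 1.0, 20)         # difficulties

def p(theta):
    """2PL response probabilities for all items."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

# One examinee's responses (here simulated from theta = 0.5).
x = rng.binomial(1, p(0.5))

# Maximum likelihood estimate of the person parameter.
negloglik = lambda th: -np.sum(x * np.log(p(th)) + (1 - x) * np.log(1 - p(th)))
theta_hat = minimize_scalar(negloglik, bounds=(-4, 4), method="bounded").x

# Item-level score and Hessian contributions at the MLE.
p_hat = p(theta_hat)
opg = np.sum((a * (x - p_hat)) ** 2)               # outer product of gradients
hess_info = np.sum(a ** 2 * p_hat * (1 - p_hat))   # minus the Hessian

# Under trait stability, both information estimates agree in expectation;
# a large discrepancy flags potential person misfit.
print(f"theta_hat = {theta_hat:.2f}, OPG = {opg:.2f}, -Hessian = {hess_info:.2f}")
print(f"discrepancy = {opg - hess_info:.2f}")
```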


Methodology · 2013 · Vol 9 (1), pp. 1-12 · Author(s): Holger Steinmetz

Although the use of structural equation modeling has increased during the last decades, the typical procedure for investigating mean differences across groups is still to create an observed composite score from several indicators and to compare the composite’s mean across the groups. Whereas the structural equation modeling literature has emphasized that a comparison of latent means presupposes equal factor loadings and indicator intercepts for most of the indicators (i.e., partial invariance), it is still unknown whether partial invariance is sufficient when relying on observed composites. This Monte Carlo study investigated whether one or two unequal factor loadings and indicator intercepts in a composite can lead to wrong conclusions regarding latent mean differences. Results show that unequal indicator intercepts substantially affect the composite mean difference and the probability of a significant composite difference, whereas unequal factor loadings have only small effects. It is concluded that analyses of composite differences are warranted only under full measurement invariance, and the author recommends analyzing latent mean differences with structural equation modeling instead.
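
The core result is easy to reproduce in miniature: with equal latent means but one non-invariant indicator intercept, the observed composite difference is systematically nonzero. The sketch below uses a four-indicator, one-factor model with invented loadings, intercepts, and sample sizes.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)

def simulate(n, latent_mean, loadings, intercepts):
    """One-factor model with continuous indicators."""
    eta = rng.normal(latent_mean, 1.0, n)
    eps = rng.normal(0, 0.6, (n, len(loadings)))
    return intercepts + np.outer(eta, loadings) + eps

loadings = np.array([0.7, 0.7, 0.7, 0.7])
intercepts_ref = np.zeros(4)
intercepts_foc = np.array([0.4, 0.0, 0.0, 0.0])   # one unequal intercept

# Equal latent means: any composite difference is purely an artifact
# of the non-invariant intercept.
ref = simulate(300, 0.0, loadings, intercepts_ref)
foc = simulate(300, 0.0, loadings, intercepts_foc)

comp_ref = ref.mean(axis=1)
comp_foc = foc.mean(axis=1)
t, p_val = stats.ttest_ind(comp_foc, comp_ref)
print(f"composite difference = {comp_foc.mean() - comp_ref.mean():.3f}, "
      f"t = {t:.2f}, p = {p_val:.4f}")
```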


2019 · Vol 227 (4), pp. 261-279 · Author(s): Frank Renkewitz, Melanie Keiner

Publication biases and questionable research practices are assumed to be two of the main causes of low replication rates. Both of these problems lead to severely inflated effect size estimates in meta-analyses. Methodologists have proposed a number of statistical tools to detect such bias in meta-analytic results. We present an evaluation of the performance of six of these tools. To assess the Type I error rate and the statistical power of these methods, we simulated a large variety of literatures that differed with regard to true effect size, heterogeneity, number of available primary studies, and sample sizes of these primary studies; furthermore, simulated studies were subjected to different degrees of publication bias. Our results show that across all simulated conditions, no method consistently outperformed the others. Additionally, all methods performed poorly when true effect sizes were heterogeneous or primary studies had a small chance of being published, irrespective of their results. This suggests that in many actual meta-analyses in psychology, bias will remain undiscovered no matter which detection method is used.
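
To make the setting concrete, the sketch below simulates a selectively published literature and applies one widely used asymmetry test of this general family (an Egger-type regression of the standardized effect on precision); it is not necessarily one of the six tools evaluated in the article, and all simulation settings are invented.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(9)

# Simulate a biased literature: true effect d = 0.2, and studies are
# published mainly when they reach significance.
true_d, attempts = 0.2, 200
published_d, published_se = [], []
while len(published_d) < 40 and attempts > 0:
    attempts -= 1
    n = rng.integers(20, 100)                      # per-group sample size
    se = np.sqrt(2 / n)                            # approx. SE of Cohen's d
    d_obs = rng.normal(true_d, se)
    if abs(d_obs / se) > 1.96 or rng.random() < 0.1:  # selective publication
        published_d.append(d_obs)
        published_se.append(se)

d = np.array(published_d)
se = np.array(published_se)

# Egger-type regression test for funnel-plot asymmetry: regress the
# standardized effect on precision; a nonzero intercept indicates
# small-study effects consistent with publication bias.
X = sm.add_constant(1 / se)
res = sm.OLS(d / se, X).fit()
print(f"Egger intercept = {res.params[0]:.2f}, p = {res.pvalues[0]:.4f}")
```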

