Identifying Problematic Item Characteristics With Small Samples Using Mokken Scale Analysis

2021 ◽  
pp. 001316442110453
Author(s):  
Stefanie A. Wind

Researchers frequently use Mokken scale analysis (MSA), a nonparametric approach to item response theory, when they have relatively small samples of examinees. Researchers have provided some guidance regarding the minimum sample size for applications of MSA under various conditions. However, these studies have not focused on item-level measurement problems, such as violations of monotonicity or invariant item ordering (IIO). Moreover, these studies have focused on problems that occur across the complete sample of examinees. The current study uses a simulation to examine the sensitivity of MSA item-analysis procedures to problematic item characteristics that occur within limited ranges of the latent variable. Results generally support the use of MSA with small samples (N around 100 examinees) as long as multiple indicators of item quality are considered.
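
The item-level monotonicity check that MSA relies on can be sketched with rest-score groups (a minimal, hypothetical illustration of the idea, not the study's simulation code; item and group counts are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(7)
n, k = 100, 6  # small sample, as in the N-around-100 condition

# Simulate dichotomous responses from a Rasch-like model, then corrupt
# one item so its response curve dips in the upper range of theta.
theta = rng.normal(size=n)
b = np.linspace(-1.5, 1.5, k)                   # item difficulties
p = 1 / (1 + np.exp(-(theta[:, None] - b)))
p[:, 0] = np.where(theta > 1.0, 0.3, p[:, 0])   # non-monotone item 0
x = (rng.random((n, k)) < p).astype(int)

def monotonicity_violations(x, item, n_groups=4):
    """Count decreases in item proportion-correct across rest-score groups."""
    rest = x.sum(axis=1) - x[:, item]
    order = np.argsort(rest, kind="stable")      # sort examinees by rest score
    groups = np.array_split(order, n_groups)     # roughly equal-sized groups
    props = [x[g, item].mean() for g in groups]
    return sum(1 for lo, hi in zip(props, props[1:]) if hi < lo), props

for j in range(k):
    viol, props = monotonicity_violations(x, j)
    print(j, viol, np.round(props, 2))
```

Because the corruption affects only examinees with theta above 1.0, its visibility depends on how many such examinees the sample happens to contain, which is the sensitivity question the study investigates.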

2016 ◽  
Vol 78 (2) ◽  
pp. 319-342 ◽  
Author(s):  
Stefanie A. Wind ◽  
Yogendra J. Patil

Recent research has explored the use of models adapted from Mokken scale analysis as a nonparametric approach to evaluating rating quality in educational performance assessments. A potential limiting factor to the widespread use of these techniques is the requirement for complete data, as practical constraints in operational assessment systems often limit the use of complete rating designs. In order to address this challenge, this study explores the use of missing data imputation techniques and their impact on Mokken-based rating quality indicators related to rater monotonicity, rater scalability, and invariant rater ordering. Simulated data and real data from a rater-mediated writing assessment were modified to reflect varying levels of missingness, and four imputation techniques were used to impute missing ratings. Overall, the results indicated that simple imputation techniques based on rater and student means result in generally accurate recovery of rater monotonicity indices and rater scalability coefficients. However, discrepancies between violations of invariant rater ordering in the original and imputed data are somewhat unpredictable across imputation methods. Implications for research and practice are discussed.
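
The simple mean-based imputation strategies that performed well can be sketched as follows (a hypothetical illustration on a toy ratings matrix, not the study's code):

```python
import numpy as np

# Toy ratings matrix: rows = students, columns = raters; NaN marks
# ratings that were never assigned under an incomplete rating design.
ratings = np.array([
    [3.0, 4.0, np.nan],
    [2.0, np.nan, 3.0],
    [np.nan, 5.0, 4.0],
    [4.0, 4.0, 5.0],
])

def impute_rater_mean(r):
    """Fill each missing cell with that rater's (column) mean."""
    out = r.copy()
    col_means = np.nanmean(r, axis=0)
    idx = np.where(np.isnan(out))
    out[idx] = np.take(col_means, idx[1])
    return out

def impute_student_mean(r):
    """Fill each missing cell with that student's (row) mean."""
    out = r.copy()
    row_means = np.nanmean(r, axis=1)
    idx = np.where(np.isnan(out))
    out[idx] = np.take(row_means, idx[0])
    return out

print(impute_rater_mean(ratings))
print(impute_student_mean(ratings))
```

Either completed matrix could then be passed to Mokken-based rating quality checks; per the abstract, monotonicity and scalability indices recover well under such simple schemes, while invariant rater ordering is less predictable.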


Author(s):  
Daniela R. Crișan ◽  
Jorge N. Tendeiro ◽  
Rob R. Meijer

Purpose: In Mokken scaling, the Crit index was proposed, and is sometimes used, as evidence of violations (or the lack thereof) of common model assumptions. The goal of our study was twofold: to make the formulation of the Crit index explicit and accessible, and to investigate its distribution under various measurement conditions. Methods: We conducted two simulation studies in the context of dichotomously scored item responses. We manipulated the type of assumption violation, the proportion of violating items, sample size, and quality. False positive rates and power to detect assumption violations were our main outcome variables. Furthermore, we applied the Crit coefficient in a Mokken scale analysis of responses to the General Health Questionnaire (GHQ-12), a self-administered questionnaire for assessing current mental health. Results: We found that the false positive rates of Crit were close to the nominal rate in most conditions and that its power to detect misfit depended on sample size, type of violation, and the number of assumption-violating items. Overall, in small samples Crit lacked the power to detect misfit, and in larger samples power differed considerably depending on the type of violation and the proportion of misfitting items. In our empirical example, the Crit index failed to detect assumption violations even in a large sample. Discussion: Even in large samples, the Crit coefficient showed limited usefulness for detecting moderate and severe violations of monotonicity. Our findings are relevant to researchers and practitioners who use Mokken scaling for scale and questionnaire construction and revision.


2019 ◽  
Author(s):  
Colin Vize ◽  
Katherine Collison ◽  
Donald Lynam ◽  
Josh Miller

Objective: Partialing procedures are frequently used in psychological research. The present study sought to further explore the consequences of partialing, focusing on the replicability of partialing-based results. Method: We used popular measures of the Dark Triad (DT; Machiavellianism, narcissism, and psychopathy) to explore the replicability of partialing procedures. We examined whether the residual content of popular DT scales is similar to the residual content of DT scales derived from separate samples, based on relations with individual items from the IPIP-NEO-120, allowing for a fine-grained analysis of residual variable content. Results: Profiles were compared using three sample sizes (small: N = 156-157; moderate: N = 313-314; large: N = 627-628) randomly drawn from a large MTurk sample (N = 1,255). There was low convergence among original/residual DT scales within samples. Additionally, the content of residual Dirty Dozen scales was not similar across samples. Similar results were found for Short Dark Triad-Machiavellianism, but only in the moderate and small samples. Conclusion: The results indicate that important issues arise when using partialing procedures, including replicability issues surrounding residual variables. Reasons for the observed results are discussed, and further research examining the replicability of residual-based results is recommended.
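
The partialing step itself amounts to ordinary least-squares residualization, which can be sketched as follows (a hypothetical illustration with simulated scale scores, not the study's data or analysis code):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 300
# Simulated, deliberately correlated Dark Triad scale scores.
shared = rng.normal(size=n)
mach = 0.7 * shared + rng.normal(scale=0.7, size=n)
narc = 0.5 * shared + rng.normal(scale=0.9, size=n)
psyc = 0.7 * shared + rng.normal(scale=0.7, size=n)

def residualize(y, covariates):
    """Return the part of y not linearly predicted by the covariates."""
    X = np.column_stack([np.ones_like(y)] + covariates)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

# "Residual Machiavellianism": Mach with narcissism and psychopathy removed.
mach_resid = residualize(mach, [narc, psyc])

print(np.corrcoef(mach_resid, narc)[0, 1])  # ~0 by construction
print(np.corrcoef(mach_resid, mach)[0, 1])  # < 1: the content has changed
```

The residual is orthogonal to the covariates by construction, but what variance remains in it, and hence what it correlates with (e.g., IPIP-NEO-120 items), depends on the particular sample, which is the replicability concern the study examines.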


2016 ◽  
Vol 85 ◽  
pp. 65 ◽  
Author(s):  
K.E. Freedland ◽  
M. Lemos ◽  
F. Doyle ◽  
B.C. Steinmeyer ◽  
I. Csik ◽  
...  

2021 ◽  
Vol 8 (3) ◽  
pp. 672-695
Author(s):  
Thomas DeVaney

This article presents a discussion and illustration of Mokken scale analysis (MSA), a nonparametric form of item response theory (IRT), in relation to common scaling approaches such as Rasch and Guttman scaling. The procedure can be used for the dichotomous and ordinal polytomous data commonly collected with questionnaires. The assumptions of MSA are discussed, as are the characteristics that differentiate a Mokken scale from a Guttman scale. MSA is illustrated using the mokken package in R (via RStudio) and a data set of more than 3,340 responses to a modified version of the Statistical Anxiety Rating Scale. Issues addressed in the illustration include monotonicity, scalability, and invariant ordering. The R script for the illustration is included.
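
The scalability idea at the heart of MSA can be sketched by computing Loevinger's H from scratch (a minimal Python illustration for dichotomous items; the R mokken package used in the article computes this, with standard errors, far more completely):

```python
import numpy as np

def loevinger_H(x):
    """Scalability coefficient H for a dichotomous response matrix
    (rows = respondents, columns = items): H = 1 - F/E, where F counts
    observed Guttman errors over all item pairs and E is their
    expected count under marginal independence."""
    n, k = x.shape
    p = x.mean(axis=0)  # item popularities
    F = E = 0.0
    for i in range(k):
        for j in range(k):
            if i == j:
                continue
            # Visit each unordered pair once, with i the harder item.
            if p[i] < p[j] or (p[i] == p[j] and i < j):
                # Guttman error: passing the harder item, failing the easier.
                F += np.sum((x[:, i] == 1) & (x[:, j] == 0))
                E += n * p[i] * (1 - p[j])
    return 1 - F / E

# A perfect Guttman pattern contains no errors, so H = 1.
guttman = np.array([[1, 1, 1],
                    [1, 1, 0],
                    [1, 0, 0],
                    [0, 0, 0]])
print(loevinger_H(guttman))  # 1.0
```

In Mokken's framework, H = 1 corresponds to a deterministic Guttman scale, while a Mokken scale only requires H to exceed a threshold (conventionally 0.3), which is what makes MSA a probabilistic relaxation of Guttman scaling.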

