item responses
Recently Published Documents


TOTAL DOCUMENTS: 209 (five years: 58)
H-INDEX: 26 (five years: 3)

2021 · Vol 11 (4) · pp. 1653-1687
Author(s): Alexander Robitzsch

Missing item responses are prevalent in educational large-scale assessment studies such as the Programme for International Student Assessment (PISA). The current operational practice scores missing item responses as wrong, but several psychometricians have advocated for a model-based treatment based on the latent ignorability assumption. In this approach, item responses and response indicators are jointly modeled conditional on a latent ability and a latent response propensity variable. Alternatively, imputation-based approaches can be used. The latent ignorability assumption is weakened in the Mislevy-Wu model, which characterizes a nonignorable missingness mechanism and allows the missingness of an item to depend on the item response itself. The scoring of missing item responses as wrong and the latent ignorable model are submodels of the Mislevy-Wu model. In an illustrative simulation study, it is shown that the Mislevy-Wu model provides unbiased estimates of model parameters. Moreover, the simulation replicates the finding from various simulation studies in the literature that scoring missing item responses as wrong yields biased estimates if the latent ignorability assumption holds in the data-generating model. However, if missing item responses can arise only from incorrect item responses, applying an item response model that relies on latent ignorability results in biased estimates. The Mislevy-Wu model guarantees unbiased parameter estimates whenever this more general model holds in the data-generating process. In addition, this article uses the PISA 2018 mathematics dataset as a case study to investigate the consequences of different missing data treatments on country means and country standard deviations. Obtained country means and country standard deviations can differ substantially across the scaling models. In contrast to previous statements in the literature, scoring missing item responses as incorrect provided a better model fit than a latent ignorable model for most countries. Furthermore, the dependence of the missingness of an item on the item response itself, after conditioning on the latent response propensity, was much more pronounced for constructed-response items than for multiple-choice items. As a consequence, scaling models that presuppose latent ignorability should be rejected from two perspectives. First, the Mislevy-Wu model is preferred over the latent ignorable model for reasons of model fit. Second, in the discussion section, we argue that model fit should play only a minor role in choosing psychometric models in large-scale assessment studies because validity aspects are most relevant. Missing data treatments that countries (and, hence, their students) can simply manipulate result in unfair country comparisons.
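
To make the model hierarchy explicit, here is a minimal sketch of the response-indicator part of the Mislevy-Wu model in a commonly used parameterization; the notation below is an assumption for illustration, not a quotation from the article.

```latex
% Minimal sketch of the Mislevy-Wu response-indicator model (assumed notation).
% R_{pi}: response indicator (1 = observed) for person p and item i;
% X_{pi}: the (possibly unobserved) item response; \xi_p: latent response
% propensity; \Psi: the logistic function.
P(R_{pi} = 1 \mid X_{pi} = x, \xi_p) = \Psi\!\left(\xi_p - \beta_i + x\,\delta_i\right)
% Submodels:
%   \delta_i = 0:        latent ignorability (missingness independent of X_{pi}
%                        given \xi_p);
%   \delta_i \to \infty: correct responses are almost surely observed, so a
%                        missing response implies an incorrect one, which
%                        corresponds to scoring missing responses as wrong.
```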


Author(s): W. Kyle Ingle, Stephen M. Leach, Amy S. Lingo

We examined the characteristics of 77 high school participants from four school districts who took part in the Teaching and Learning Career Pathway (TLCP) at the University of Louisville during the 2018–2019 school year. The program seeks to support the recruitment of a diverse and effective educator workforce by recruiting high school students as potential teachers for dual-credit courses that explore the teaching profession. Utilizing descriptive and inferential analysis (χ² tests) of closed-ended item responses, as well as qualitative analysis of program documents, websites, and students' open-ended item responses, we compared the characteristics of the participants with those of their home school districts and examined their perceptions of the program. When considering gender and race/ethnicity, our analysis revealed that the program was unsuccessful in its first year, reaching predominantly white female high school students who were already interested in teaching. Respondents reported learning about the TLCP from school personnel, specifically guidance counselors (39%), non-TLCP teachers (25%), or TLCP teachers (20%). We found that the TLCP has not defined diversity in a measurable way and that the lack of an explicit program theory hinders its evaluation and improvement. Program recruitment and outcomes are the result of luck or idiosyncratic personnel recommendations rather than intentional processes. We identified a need for qualitative exploration of in-school recruitment processes and for statewide longitudinal studies that track participant outcomes in college and in the teacher labor market.
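
As a hedged illustration of the kind of χ² comparison described above, the sketch below tests whether participant demographics match home-district proportions; the counts, category labels, and proportions are hypothetical, and only standard scipy functionality is assumed.

```python
# Hypothetical sketch of a chi-square goodness-of-fit test comparing TLCP
# participant demographics against home-district proportions (made-up numbers).
from scipy.stats import chisquare

# Observed participant counts by race/ethnicity (hypothetical).
observed = [58, 10, 5, 4]  # e.g., White, Black, Hispanic, Other

# District-wide proportions (hypothetical), scaled to the participant total.
district_props = [0.55, 0.25, 0.12, 0.08]
n = sum(observed)
expected = [p * n for p in district_props]

stat, pvalue = chisquare(f_obs=observed, f_exp=expected)
print(f"chi2 = {stat:.2f}, p = {pvalue:.4f}")  # small p => participants differ from district
```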


Assessment · 2021 · pp. 107319112110429
Author(s): Allison J. Ames, Brian C. Leventhal

Traditional psychometric models focus on observed categorical item responses, but they often oversimplify the respondent's cognitive response process by assuming responses are driven by a single substantive trait. A further weakness is that the analysis of ordinal responses has been primarily limited to a single substantive trait at one time point. This study significantly expands the modeling framework to account for complex response processes across multiple waves of data collection using the item response tree (IRTree) framework. We apply a novel model, the longitudinal IRTree, to response processes in longitudinal studies and investigate whether changes in response style are proportional to changes in the substantive trait of interest. To do so, we present an empirical example using a six-item sexual knowledge scale from the National Longitudinal Study of Adolescent to Adult Health across two waves of data collection. Results show an increase in sexual knowledge from the first wave to the second and a decrease in midpoint and extreme response styles. Model validation revealed that failure to account for response style can bias the estimation of substantive trait growth. The longitudinal IRTree model captures midpoint and extreme response styles, as well as the trait of interest, at both waves.
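
To make the IRTree idea concrete, here is a minimal sketch of how a 5-point Likert response can be decomposed into binary pseudo-items for midpoint, direction, and extremity nodes. This three-node tree is a common textbook example and an assumption; it is not necessarily the exact tree used in the article.

```python
# Minimal IRTree sketch: decompose a 5-point Likert response (1..5) into
# three binary pseudo-items (a common tree; the article's tree may differ).
#   node 1 (midpoint):  1 if response == 3, else 0
#   node 2 (direction): 1 if response > 3, 0 if response < 3 (undefined at midpoint)
#   node 3 (extreme):   1 if response in {1, 5}, 0 if in {2, 4} (undefined at midpoint)
# Undefined nodes are coded None and treated as structurally missing when the
# pseudo-items are fit with a standard binary IRT model.

def irtree_pseudo_items(response: int) -> dict:
    midpoint = 1 if response == 3 else 0
    direction = None if response == 3 else int(response > 3)
    extreme = None if response == 3 else int(response in (1, 5))
    return {"midpoint": midpoint, "direction": direction, "extreme": extreme}

for r in range(1, 6):
    print(r, irtree_pseudo_items(r))
# 1 -> midpoint 0, direction 0, extreme 1
# 3 -> midpoint 1, direction None, extreme None
# 5 -> midpoint 0, direction 1, extreme 1
```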


Author(s): Cai Xu, Mark V. Schaverien, Joani M. Christensen, Chris J. Sidey-Gibbons

Purpose: This study aimed to evaluate and improve the accuracy and efficiency of the QuickDASH for assessing limb function in patients with upper extremity lymphedema using modern psychometric techniques. Method: We conducted confirmatory factor analysis (CFA) and Mokken analysis to examine the assumption of unidimensionality for an item response theory (IRT) model using data from 285 patients who completed the QuickDASH. We then fit the data to Samejima's graded response model (GRM), assessed the assumption of local independence of items, and calibrated the item responses for a computerized adaptive testing (CAT) simulation. Results: Initial CFA and Mokken analyses demonstrated good scalability of items and unidimensionality. However, the assumption of local independence of items was violated between items 9 (severity of pain) and 11 (sleeping difficulty due to pain) (Yen's Q3 = 0.46), and disordered thresholds were evident for item 5 (cutting food). After addressing these breaches of assumptions, the re-analyzed GRM with the remaining 10 items achieved an improved fit. Simulation of CAT administration demonstrated a high correlation between scores on the CAT and the QuickDASH (r = 0.98). Items 2 (doing heavy chores) and 8 (limiting work or daily activities) were the most frequently used. The correlation between factor scores derived from the 11-item QuickDASH and the Ultra-QuickDASH comprising items 2 and 8 was as high as 0.91. Conclusion: By administering just these two best-performing QuickDASH items, we can obtain estimates very similar to those from the full-length QuickDASH without the need for CAT technology.
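
For reference, here is a minimal sketch of Samejima's GRM category probabilities, the model named above; the discrimination and threshold values are hypothetical, not the calibrated QuickDASH parameters.

```python
# Minimal sketch of Samejima's graded response model (GRM): the probability of
# scoring in category k is the difference of adjacent cumulative probabilities.
import math

def grm_category_probs(theta: float, a: float, b: list) -> list:
    """Category probabilities for one polytomous item.

    theta : latent trait value
    a     : item discrimination
    b     : ordered category thresholds (K-1 values for K categories)
    """
    # Cumulative probabilities P(X >= k | theta), with P(X >= 0) = 1.
    cum = [1.0] + [1.0 / (1.0 + math.exp(-a * (theta - bk))) for bk in b] + [0.0]
    return [cum[k] - cum[k + 1] for k in range(len(b) + 1)]

# Hypothetical item: discrimination 1.5, thresholds -1, 0, 1 (four categories).
print(grm_category_probs(theta=0.5, a=1.5, b=[-1.0, 0.0, 1.0]))  # sums to 1
```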


2021 · Vol 9 (1)
Author(s): Shinichiro Tomitaka, Toshiaki A. Furukawa

Background: Recent studies have shown that, among the general population, responses to depression-rating scales follow a common mathematical pattern. However, the mathematical pattern among responses to the items of the Generalized Anxiety Disorder-7 (GAD-7) is currently unknown. The present study investigated whether item responses to the GAD-7, when administered to the general population, follow the same mathematical distribution as those of depression-rating scales. Methods: We used data from the 2019 National Health Interview Survey (31,997 individuals), a nationwide survey of adults conducted annually in the United States. The patterns of item responses to the GAD-7 and the Patient Health Questionnaire-8 (PHQ-8) were analyzed inductively. Results: For all GAD-7 items, the frequency distribution across the response options ("not at all," "several days," "more than half the days," and "nearly every day") was positively skewed. Line charts representing the responses to each GAD-7 item all crossed at a single point between "not at all" and "several days" and, on a logarithmic scale, ran parallel from "several days" to "nearly every day." This mathematical pattern was identical to that of the PHQ-8. The characteristic pattern arose because the ratio of "more than half the days" to "several days" responses was similar across all items, as was the ratio of "nearly every day" to "more than half the days" responses. Conclusions: Our results suggest that the symptom criteria of generalized anxiety disorder and major depression have a common distribution pattern in the general population.
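
The "parallel on a logarithmic scale" finding is equivalent to saying that adjacent-category frequency ratios are roughly constant across items. The sketch below computes those ratios from a frequency table; the frequencies are hypothetical, not the NHIS 2019 values.

```python
# Sketch: if the ratios between adjacent response options are similar across
# items, the log-frequency lines from "several days" onward are near-parallel.
# Options per item: not at all, several days, >half the days, nearly every day.
items = {
    "item 1": [24000, 5200, 1600, 1200],  # hypothetical counts
    "item 2": [26000, 4100, 1250, 950],
}
for name, freq in items.items():
    # Ratios: (>half the days / several days) and (nearly every day / >half the days).
    ratios = [freq[k + 1] / freq[k] for k in range(1, len(freq) - 1)]
    print(name, [round(r, 2) for r in ratios])
# Similar ratio lists across items reproduce the parallel log-scale pattern.
```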


2021 · pp. 001316442110453
Author(s): Gabriel Nagy, Esther Ulitzsch

Disengaged item responses pose a threat to the validity of the results provided by large-scale assessments. Several procedures for identifying disengaged responses on the basis of observed response times have been suggested, and item response theory (IRT) models for response engagement have been proposed. We show that response time-based procedures for classifying response engagement and IRT models for response engagement rest on common ideas, and we propose a distinction between independent and dependent latent class IRT models. In all IRT models considered, response engagement is represented by an item-level latent class variable, but the models assume that response times either reflect or predict engagement. We summarize existing IRT models belonging to each group and extend them to increase their flexibility. Furthermore, we propose a flexible multilevel mixture IRT framework in which all of these IRT models can be estimated by means of marginal maximum likelihood. The framework is based on the widely used Mplus software, thereby making the procedure accessible to a broad audience. The procedures are illustrated on publicly available large-scale data. Our results show that the different IRT models for response engagement provided slightly different adjustments of item parameters and of individuals' proficiency estimates relative to a conventional IRT model.
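
One simple observed-response-time procedure of the kind referenced above is a threshold rule: flag a response as disengaged when it is faster than some fraction of the item's typical response time. The sketch below is a hedged illustration; the 10%-of-median fraction is a hypothetical tuning choice, not the rule used in the article.

```python
# Sketch of a response-time threshold rule for flagging disengaged responses:
# a response is flagged when it is faster than a fraction of the item's
# median response time (the fraction is a hypothetical tuning choice).
from statistics import median

def flag_disengaged(times_by_item: dict, fraction: float = 0.10) -> dict:
    flags = {}
    for item, times in times_by_item.items():
        threshold = fraction * median(times)
        flags[item] = [t < threshold for t in times]
    return flags

# Hypothetical response times in seconds for two items.
rt = {"item_1": [1.2, 14.5, 22.0, 18.3, 0.9], "item_2": [30.1, 2.0, 25.4, 28.8, 27.5]}
print(flag_disengaged(rt))  # True marks a likely disengaged (rapid) response
```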


2021 · Vol 2
Author(s): Louise Moeldrup Nielsen, Lisa Gregersen Oestergaard, Hans Kirkegaard, Thomas Maribo

Introduction: The World Health Organization Disability Assessment Schedule 2.0 (WHODAS 2.0) is designed to measure functioning and disability in six domains and is included in the International Classification of Diseases 11th Revision (ICD-11). The objective of this study was to examine the construct validity of WHODAS 2.0 and to describe its clinical utility for assessing functioning and disability among older patients discharged from emergency departments (EDs). Material and Methods: This cross-sectional study is based on data from 129 older patients. Patients completed the 36-item version of WHODAS 2.0 together with the Barthel-20, the Assessment of Motor and Process Skills (AMPS), the Timed Up and Go (TUG) test, and the 30-Second Chair Stand Test (30s-CST). Construct validity was examined through hypothesis testing by correlating the WHODAS 2.0 with the other instruments, and specifically the WHODAS 2.0 mobility domain with the TUG and 30s-CST. Clinical utility was explored through floor/ceiling effects and missing item responses. Results: WHODAS 2.0 showed fair correlations with the Barthel-20 (r = −0.49), AMPS process skills (r = −0.26), and TUG (r = 0.30), and moderate correlations with AMPS motor skills (r = −0.58) and the 30s-CST (r = −0.52). The WHODAS 2.0 mobility domain showed a fair correlation with TUG (r = 0.33) and a moderate correlation with the 30s-CST (r = −0.60). Four domains demonstrated a floor effect: D1 "Cognition," D3 "Self-care," D4 "Getting along," and D5 "Household." No ceiling effect was identified. The highest proportions of missing item responses occurred for Item 3.4 (staying by yourself for a few days), Item 4.4 (making new friends), and Item 4.5 (sexual activities). Conclusion: WHODAS 2.0 showed fair-to-moderate correlations with the Barthel-20, AMPS, TUG, and 30s-CST and captures additional aspects of disability compared with commonly used instruments. However, its clinical utility for older patients discharged from EDs poses some challenges due to floor effects and missing item responses. Accordingly, patient and health professional perspectives need further investigation.
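
A hedged sketch of the clinical-utility checks reported above: floor/ceiling effects as the share of respondents at the minimum or maximum domain score, and per-item missingness. The scores below are hypothetical, not the study data.

```python
# Sketch of floor/ceiling-effect and missingness checks for a scale domain.
def floor_ceiling(scores: list, lo: float, hi: float) -> tuple:
    """Fractions of respondents at the scale minimum (floor) and maximum (ceiling)."""
    n = len(scores)
    return (sum(s == lo for s in scores) / n, sum(s == hi for s in scores) / n)

domain_scores = [0, 0, 5, 10, 0, 20, 0, 15]        # hypothetical 0-100 domain scores
print(floor_ceiling(domain_scores, lo=0, hi=100))  # (0.5, 0.0) -> clear floor effect

item_responses = [3, None, 2, None, 1, 4, None, 2]  # None = missing item response
missing_rate = sum(r is None for r in item_responses) / len(item_responses)
print(f"missing rate: {missing_rate:.0%}")
```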


Author(s): Shiwei Tong, Qi Liu, Runlong Yu, Wei Huang, Zhenya Huang, ...

Cognitive diagnosis, a fundamental task in education, aims to reveal students' proficiency levels on knowledge concepts. Monotonicity is one of the basic conditions in cognitive diagnosis theory: it assumes that a student's proficiency is monotonically related to the probability of giving the right response to a test item. However, few previous methods consider monotonicity during optimization. To this end, we propose the Item Response Ranking (IRR) framework, which introduces pairwise learning into cognitive diagnosis to properly model the monotonicity between item responses. Specifically, we first use an item-specific sampling method to sample item responses and construct response pairs based on their partial order, proposing two-branch sampling methods to handle unobserved responses. After that, we use a pairwise objective function to exploit the monotonicity in the pair formulation. IRR is a general framework that can be applied to most contemporary cognitive diagnosis models. Extensive experiments demonstrate the effectiveness and interpretability of our method.
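
A minimal sketch of the pairwise idea: for two students answering the same item, the one observed to respond correctly should receive a higher predicted score, enforced with a pairwise logistic loss. The function names and plain-numpy setup are assumptions for illustration, not the authors' implementation.

```python
# Sketch of a pairwise ranking objective in the spirit of IRR: for a pair
# (u, v) on the same item where u answered correctly and v did not, penalize
# the model unless score(u, item) > score(v, item).
import numpy as np

def pairwise_loss(score_pos: np.ndarray, score_neg: np.ndarray) -> float:
    """Mean pairwise logistic loss: -log sigmoid(s_pos - s_neg)."""
    diff = score_pos - score_neg
    return float(np.mean(np.log1p(np.exp(-diff))))

# Hypothetical model scores for three response pairs on the same item.
s_correct = np.array([1.2, 0.4, 2.0])     # students who answered correctly
s_incorrect = np.array([0.3, 0.8, -1.0])  # students who answered incorrectly
print(pairwise_loss(s_correct, s_incorrect))  # lower = better-ordered pairs
```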


Psychometrika · 2021
Author(s): Udo Boehm, Maarten Marsman, Han L. J. van der Maas, Gunter Maris

The emergence of computer-based assessments has made response times, in addition to response accuracies, available as a source of information about test takers' latent abilities. The development of substantively meaningful accounts of the cognitive processes underlying item responses is critical to establishing the validity of psychometric tests. However, existing substantive theories such as the diffusion model have been slow to gain traction due to their unwieldy functional form and regular violations of model assumptions in psychometric contexts. In the present work, we develop an attention-based diffusion model built on process assumptions that are appropriate for psychometric applications. This model is straightforward to analyze using Gibbs sampling and can be readily extended. We demonstrate our model's good computational and statistical properties in a comparison with two well-established psychometric models.
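
For context, the standard two-boundary diffusion model referenced above yields the probability of absorption at the upper boundary in closed form; this is the classic result for a Wiener process with drift, shown here as background, not the attention-based variant developed in the article.

```latex
% First-passage probability for a Wiener diffusion with drift \nu, diffusion
% coefficient \sigma^2, boundaries 0 and a, and starting point z (0 < z < a):
P(\text{upper boundary first}) =
  \frac{1 - e^{-2\nu z / \sigma^2}}{1 - e^{-2\nu a / \sigma^2}}, \qquad \nu \neq 0,
% reducing to z/a when \nu = 0. In diffusion-based accounts of item responses,
% absorption at the upper boundary is typically mapped to a correct response.
```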

