On Longitudinal Item Response Theory Models: A Didactic

2019 ◽  
Vol 45 (3) ◽  
pp. 339-368 ◽  
Author(s):  
Chun Wang ◽  
Steven W. Nydick

Recent work on measuring growth with categorical outcome variables has combined the item response theory (IRT) measurement model with the latent growth curve model and extended the assessment of growth to multidimensional IRT models and higher order IRT models. However, there is a lack of synthesizing studies that clearly evaluate the strengths and limitations of the different multilevel IRT models for measuring growth. This study introduces the various longitudinal IRT models, including the longitudinal unidimensional IRT model, the longitudinal multidimensional IRT model, and the longitudinal higher order IRT model, which cover a broad range of applications in education and social science. Following a comparison of the parameterizations, identification constraints, strengths, and weaknesses of the different models, a real data example illustrates how the different longitudinal IRT models can be applied to model students’ growth trajectories on multiple latent abilities.
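To make the setup concrete: the simplest member of this family, the longitudinal unidimensional IRT model, can be fit as a constrained two-dimensional MIRT model in which the two measurement occasions act as two correlated dimensions, item parameters are held equal over time, and growth appears as the freed time-2 latent mean. The sketch below does this in the mirt R package on simulated data; the package choice, the data, and the constraint bookkeeping are illustrative assumptions, not the authors' code.

```r
# Illustrative sketch only: a longitudinal unidimensional 2PL fit as a
# two-dimensional MIRT model, with item parameters constrained equal
# across the two occasions; all data are simulated.
library(mirt)

set.seed(1)
n_items <- 10; N <- 1000
a <- matrix(rlnorm(n_items, 0, 0.3))        # time-invariant slopes
d <- rnorm(n_items)                         # time-invariant intercepts
theta1 <- rnorm(N)                          # ability at time 1
theta2 <- theta1 + 0.5 + rnorm(N, 0, 0.5)   # mean growth of 0.5
resp <- cbind(simdata(a, d, N, itemtype = 'dich', Theta = matrix(theta1)),
              simdata(a, d, N, itemtype = 'dich', Theta = matrix(theta2)))
colnames(resp) <- paste0('I', seq_len(2 * n_items))

model <- mirt.model(sprintf(
  'T1 = 1-%d
   T2 = %d-%d
   COV = T1*T2, T2*T2
   MEAN = T2', n_items, n_items + 1, 2 * n_items))

# Look up parameter numbers and force each item's slope and intercept to
# be equal across the two occasions (this identifies the time-2 scale).
sv <- mirt(resp, model, itemtype = '2PL', pars = 'values')
constr <- unlist(lapply(seq_len(n_items), function(j) {
  i1 <- paste0('I', j); i2 <- paste0('I', j + n_items)
  list(c(sv$parnum[sv$item == i1 & sv$name == 'a1'],
         sv$parnum[sv$item == i2 & sv$name == 'a2']),
       c(sv$parnum[sv$item == i1 & sv$name == 'd'],
         sv$parnum[sv$item == i2 & sv$name == 'd']))
}), recursive = FALSE)

fit <- mirt(resp, model, itemtype = '2PL', constrain = constr)
coef(fit, simplify = TRUE)$means            # freed time-2 mean, near 0.5
```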

2021 ◽  
pp. 43-48
Author(s):  
Rosa Fabbricatore ◽  
Francesco Palumbo

Evaluating learners' competencies is a crucial concern in education, and structured tests, at home and in the classroom, are an effective assessment tool. Structured tests consist of sets of items that can refer to several abilities or to more than one topic. Several statistical approaches allow students to be evaluated while treating the items multidimensionally, accounting for their structure. Depending on the final aim of the evaluation, the assessment process either assigns a final grade to each student or clusters students into homogeneous groups according to their level of mastery and ability. The latter is a helpful basis for developing tailored recommendations and remediations for each group, and latent class models are the natural reference for this purpose. In the item response theory (IRT) paradigm, multidimensional latent class IRT models, which relax both the traditional unidimensionality constraint and the assumption of a continuous latent trait, make it possible to detect sub-populations of homogeneous students according to their proficiency level while also accounting for the multidimensional nature of their ability. Moreover, the semi-parametric formulation offers several practical advantages: it avoids normality assumptions that may not hold and reduces the computational burden. This study compares the results of multidimensional latent class IRT models with those obtained by a two-step procedure, which consists of first fitting a multidimensional IRT model to estimate students' ability and then applying a clustering algorithm to classify students accordingly. For the clustering step, both parametric and non-parametric approaches were considered. The data come from the admission test for the degree course in psychology administered in 2014 at the University of Naples Federico II. The study involved N = 944 students, and their ability dimensions were defined according to the domains assessed by the entrance exam, namely Humanities, Reading and Comprehension, Mathematics, Science, and English. In particular, a multidimensional two-parameter logistic IRT model for dichotomously scored items was used to estimate students' ability.
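For orientation, the two-step alternative compared in the study can be sketched in R with the mirt package: fit a multidimensional 2PL, extract EAP ability estimates, and cluster them. Everything below (two domains instead of the exam's five, simulated responses, k-means as the clustering step) is an illustrative assumption, not the authors' analysis.

```r
# Step 1: multidimensional 2PL ability estimation; Step 2: clustering of
# the estimated abilities. Simulated stand-in data, not the admission test.
library(mirt)

set.seed(2)
a <- rbind(cbind(rlnorm(10, 0.2, 0.2), 0),   # items 1-10 load on domain 1
           cbind(0, rlnorm(10, 0.2, 0.2)))   # items 11-20 on domain 2
resp <- simdata(a, rnorm(20), 800, itemtype = 'dich')

model <- mirt.model('
  D1 = 1-10
  D2 = 11-20
  COV = D1*D2')
fit    <- mirt(resp, model, itemtype = '2PL')
scores <- fscores(fit, method = 'EAP')       # person-by-domain estimates

# Step 2: classify students on the score estimates (k-means here; the
# study also considered parametric, model-based alternatives).
cl <- kmeans(scale(scores), centers = 3, nstart = 25)
table(cl$cluster)
```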


2018 ◽  
Vol 29 (1) ◽  
pp. 35-44
Author(s):  
Nell Sedransk

This article is about FMCSA data and its analysis. The article responds to the two-part question: How does an Item Response Theory (IRT) model work differently . . . or better than any other model? The response to the first part is a careful, completely non-technical exposition of the fundamentals of IRT models. It differentiates IRT models from other models by providing the rationale underlying IRT modeling and by using graphs to illustrate two key properties of data items. The response to the second part of the question, about the superiority of an IRT model, is, “it depends.” For FMCSA data, serious challenges arise from the complexity of the data and from the heterogeneity of the carrier industry. Questions are posed that will need to be addressed to determine the success of the actual model developed and of the scoring system.


2018 ◽  
Vol 42 (8) ◽  
pp. 644-659
Author(s):  
Xue Zhang ◽  
Chun Wang ◽  
Jian Tao

Testing item-level fit is important in scale development to guide item revision or deletion. Many item-level fit indices have been proposed in the literature, yet none of them is directly applicable to an important family of models, namely, higher order item response theory (HO-IRT) models. In this study, chi-square-based fit indices (i.e., Yen’s Q1, McKinley and Mills’ G2, and Orlando and Thissen’s S-X2 and S-G2) were extended to HO-IRT models. Their performance is evaluated via simulation studies in terms of false positive rates and correct detection rates. The manipulated factors include test structure (i.e., test length and number of dimensions), sample size, level of correlations among dimensions, and the proportion of misfitting items. For misfitting items, the source of misfit, including misfitting item response functions and misspecified factor structures, was also manipulated. The simulation results demonstrate that S-G2 is promising for detecting misfit in higher order items.
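As a point of reference for the statistics being extended, Orlando and Thissen's S-X2 is already available for ordinary (non-higher-order) IRT models in the mirt R package. The short sketch below runs it on simulated unidimensional 2PL data; it shows the baseline index, not the paper's HO-IRT extension.

```r
# S-X2 item fit for a plain unidimensional 2PL on simulated data; the
# paper's contribution is extending such indices to HO-IRT models.
library(mirt)

set.seed(3)
a <- matrix(rlnorm(20, 0.2, 0.3)); d <- rnorm(20)
resp <- simdata(a, d, 1000, itemtype = 'dich')

fit <- mirt(resp, 1, itemtype = '2PL', verbose = FALSE)
itemfit(fit, fit_stats = 'S_X2')   # one S-X2, df, and p-value per item
```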


2020 ◽  
Vol 80 (4) ◽  
pp. 665-694
Author(s):  
Ken A. Fujimoto ◽  
Sabina R. Neugebauer

Although item response theory (IRT) models such as the bifactor, two-tier, and between-item-dimensionality IRT models have been devised to confirm complex dimensional structures in educational and psychological data, they can be challenging to use in practice. The reason is that these models are multidimensional IRT (MIRT) models and thus highly parameterized, making them suitable only for data provided by large samples. Unfortunately, many educational and psychological studies are conducted on a small scale, leaving researchers without the MIRT models they need to confirm the hypothesized structures in their data. To address this lack of modeling options, we present a general Bayesian MIRT model based on adaptive informative priors. Simulations demonstrated that our MIRT model could be used to confirm a two-tier structure (with two general and six specific dimensions), a bifactor structure (with one general and six specific dimensions), and a between-item six-dimensional structure in rating scale data with sample sizes as small as 100. Although our goal was to provide a general MIRT model suitable for smaller samples, the simulations further revealed that the model is also applicable to larger samples. We also analyzed real data from 121 individuals to illustrate that the findings of our simulations are relevant to real situations.
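The adaptive informative priors are specific to the authors' model, but the dimensional structures being confirmed are standard. As a frequentist point of comparison only, a bifactor structure (one general plus specific factors; two specifics here rather than the paper's six, and dichotomous rather than rating scale data) can be fit with mirt's bfactor function. The sketch below is not the authors' Bayesian model.

```r
# Frequentist bifactor fit as a structural point of comparison; the
# paper's Bayesian model with adaptive informative priors is not shown.
library(mirt)

set.seed(4)
n_items <- 12
a_gen  <- rlnorm(n_items, 0.2, 0.2)            # general-factor slopes
a_spec <- rlnorm(n_items, 0.0, 0.2)            # specific-factor slopes
a <- cbind(a_gen,
           ifelse(seq_len(n_items) <= 6, a_spec, 0),
           ifelse(seq_len(n_items) >  6, a_spec, 0))
resp <- simdata(a, rnorm(n_items), 500, itemtype = 'dich')

specific <- rep(1:2, each = 6)   # items 1-6 -> specific 1, 7-12 -> specific 2
fit <- bfactor(resp, specific)
summary(fit)                     # general and specific loadings
```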


2017 ◽  
Vol 78 (3) ◽  
pp. 384-408 ◽  
Author(s):  
Yong Luo ◽  
Hong Jiao

Stan is a new Bayesian statistical software program that implements the powerful and efficient Hamiltonian Monte Carlo (HMC) algorithm. To date, no source systematically provides Stan code for various item response theory (IRT) models. This article provides Stan code for three representative IRT models: the three-parameter logistic IRT model, the graded response model, and the nominal response model. We demonstrate how IRT model comparison can be conducted with Stan and how the provided Stan code for simple IRT models can be easily extended to their multidimensional and multilevel cases.
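The article's own listings are not reproduced here, but the workflow is easy to convey: below is a minimal two-parameter logistic model written in Stan and run from R via rstan, a simpler cousin of the three models the article covers. The simulated data and the priors are illustrative assumptions.

```r
# A minimal 2PL in Stan, compiled and sampled through rstan; simpler than
# the article's 3PL/GRM/NRM listings but the same workflow.
library(rstan)

stan_2pl <- "
data {
  int<lower=1> N;                         // persons
  int<lower=1> I;                         // items
  int<lower=0, upper=1> y[N, I];          // scored responses
}
parameters {
  vector[N] theta;                        // abilities
  vector<lower=0>[I] a;                   // discriminations
  vector[I] b;                            // difficulties
}
model {
  theta ~ normal(0, 1);
  a ~ lognormal(0, 0.5);                  // illustrative priors
  b ~ normal(0, 2);
  for (n in 1:N)
    y[n] ~ bernoulli_logit(a .* (theta[n] - b));
}
"

set.seed(5)
N <- 200; I <- 15
a_true <- rlnorm(I, 0, 0.3); b_true <- rnorm(I); th <- rnorm(N)
y <- t(vapply(th, function(t0) rbinom(I, 1, plogis(a_true * (t0 - b_true))),
              integer(I)))

fit <- stan(model_code = stan_2pl, data = list(N = N, I = I, y = y),
            chains = 2, iter = 1000)
print(fit, pars = c('a', 'b'))
```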


2016 ◽  
Vol 59 (2) ◽  
pp. 281-289 ◽  
Author(s):  
Guido Makransky ◽  
Philip S. Dale ◽  
Philip Havmose ◽  
Dorthe Bleses

Purpose: This study investigated the feasibility and potential validity of an item response theory (IRT)–based computerized adaptive testing (CAT) version of the MacArthur–Bates Communicative Development Inventory: Words & Sentences (CDI:WS; Fenson et al., 2007) vocabulary checklist, with the objective of reducing length while maintaining measurement precision.
Method: Parent-reported vocabulary for the American CDI:WS norming sample, consisting of 1,461 children between the ages of 16 and 30 months, was used to investigate the fit of the items to the 2-parameter logistic IRT model and to simulate CDI-CAT versions with 400, 200, 100, 50, 25, 10, and 5 items.
Results: All but 14 items fit the 2-parameter logistic IRT model, and real-data simulations of CDI-CATs with at least 50 items recovered full CDI scores with correlations over .95. Furthermore, the CDI-CATs with at least 50 items had correlations with age and socioeconomic status similar to those of the full CDI:WS.
Conclusion: These results provide strong evidence that a CAT version of the CDI:WS has the potential to reduce length while maintaining the accuracy and precision of the full instrument.
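The adaptive logic behind such simulations is compact enough to sketch directly. The toy R code below selects each next item by maximum Fisher information under a 2PL and updates an EAP ability estimate, stopping at 50 items as in the best-performing CDI-CAT condition; the item bank is simulated, not the CDI:WS calibration.

```r
# Toy CAT under a 2PL: maximum-information item selection with an EAP
# ability update after each response; all item parameters are simulated.
p2   <- function(th, a, b) plogis(a * (th - b))
info <- function(th, a, b) { p <- p2(th, a, b); a^2 * p * (1 - p) }

set.seed(6)
a <- rlnorm(400, 0, 0.3); b <- rnorm(400)        # 400-item toy bank
th_true <- 0.7; th_hat <- 0
admin <- integer(0); resp <- integer(0)
grid <- seq(-4, 4, length.out = 81)              # quadrature for EAP

for (step in 1:50) {                             # fixed 50-item CAT
  avail <- setdiff(seq_along(a), admin)
  j <- avail[which.max(info(th_hat, a[avail], b[avail]))]
  admin <- c(admin, j)
  resp  <- c(resp, rbinom(1, 1, p2(th_true, a[j], b[j])))
  lik <- vapply(grid, function(g)
    prod(p2(g, a[admin], b[admin])^resp *
         (1 - p2(g, a[admin], b[admin]))^(1 - resp)), numeric(1))
  post <- lik * dnorm(grid)
  th_hat <- sum(grid * post) / sum(post)         # EAP update
}
th_hat                                           # estimate after 50 items
```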


2019 ◽  
Vol 29 (4) ◽  
pp. 962-986
Author(s):  
R Gorter ◽  
J-P Fox ◽  
G Ter Riet ◽  
MW Heymans ◽  
JWR Twisk

Latent growth models are often used to measure individual trajectories representing change over time. The characteristics of the individual trajectories depend on the variability in the longitudinal outcomes. In many medical and epidemiological studies, the individual health outcomes cannot be observed directly and are observed indirectly through indicators (i.e., items of a questionnaire). An item response theory or a classical test theory measurement model is required, and the choice can influence the latent growth estimates. In this study, under various conditions, this influence is directly assessed by estimating latent growth parameters on a common scale for item response theory and classical test theory using a novel plausible value method in combination with Markov chain Monte Carlo. The latent outcomes are considered missing data, and plausible values are generated from the corresponding posterior distribution, separately for item response theory and classical test theory. These plausible values are linearly transformed to a common scale. A Markov chain Monte Carlo method was developed to simultaneously estimate the latent growth and measurement model parameters using this plausible value technique. It is shown that individual trajectories estimated with item response theory, compared with classical test theory, provide a more detailed description of individual change over time, since item response patterns (item response theory) are more informative about the health measurements than sum scores (classical test theory).
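The core of the plausible value idea is that the latent outcome is treated as missing data and sampled from its posterior given the item responses, rather than being replaced by a single point estimate. A deliberately stripped-down, one-person R sketch under a fixed, simulated 2PL (a stand-in for the paper's joint MCMC estimation, not its method):

```r
# Draw plausible values for one person's latent outcome under a fixed 2PL,
# using a quadrature grid as the posterior support; illustrative only.
pv_draw <- function(resp, a, b, n_draws = 5,
                    grid = seq(-4, 4, length.out = 201)) {
  P <- plogis(sweep(outer(grid, b, "-"), 2, a, "*"))   # grid x items
  R <- matrix(resp, nrow = length(grid), ncol = length(a), byrow = TRUE)
  lik  <- apply(P^R * (1 - P)^(1 - R), 1, prod)        # likelihood on grid
  post <- lik * dnorm(grid)                            # N(0, 1) prior
  sample(grid, n_draws, replace = TRUE, prob = post / sum(post))
}

set.seed(7)
a <- rlnorm(15, 0, 0.3); b <- rnorm(15)
resp <- rbinom(15, 1, plogis(a * (0.4 - b)))           # true theta = 0.4
pv_draw(resp, a, b)    # five plausible values, not one point estimate
```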


2021 ◽  
Vol 117 ◽  
pp. 106849
Author(s):  
Danilo Carrozzino ◽  
Kaj Sparle Christensen ◽  
Giovanni Mansueto ◽  
Fiammetta Cosci


2021 ◽  
Vol 8 (3) ◽  
pp. 672-695
Author(s):  
Thomas DeVaney

This article presents a discussion and illustration of Mokken scale analysis (MSA), a nonparametric form of item response theory (IRT), in relation to Rasch models and Guttman scaling. The procedure can be used for the dichotomous and ordinal polytomous data commonly collected with questionnaires. The assumptions of MSA are discussed, as well as the characteristics that differentiate a Mokken scale from a Guttman scale. MSA is illustrated using the mokken package with RStudio and a data set that included over 3,340 responses to a modified version of the Statistical Anxiety Rating Scale. Issues addressed in the illustration include monotonicity, scalability, and invariant ordering. The R script for the illustration is included.
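The workflow the article illustrates maps onto a handful of mokken package calls. The sketch below runs them on the package's bundled acl data rather than the article's Statistical Anxiety Rating Scale responses, which are not reproduced here; the 10-item subset is an arbitrary choice for brevity.

```r
# Core MSA checks from the 'mokken' package on its bundled acl data;
# the article applies the same steps to modified STARS responses.
library(mokken)

data(acl)                         # polytomous adjective checklist data
X <- acl[, 1:10]                  # arbitrary 10-item subset for brevity

coefH(X)                          # Loevinger's H: scalability
summary(check.monotonicity(X))    # monotonicity assumption
summary(check.iio(X))             # invariant item ordering
aisp(X)                           # automated item selection into scales
```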

