On Longitudinal Item Response Theory Models: A Didactic

2019 ◽  
Vol 45 (3) ◽  
pp. 339-368 ◽  
Author(s):  
Chun Wang ◽  
Steven W. Nydick

Recent work on measuring growth with categorical outcome variables has combined the item response theory (IRT) measurement model with the latent growth curve model and extended the assessment of growth to multidimensional IRT models and higher order IRT models. However, there is a lack of synthesizing studies that clearly evaluate the strengths and limitations of the different multilevel IRT models for measuring growth. This study introduces the various longitudinal IRT models, including the longitudinal unidimensional IRT model, the longitudinal multidimensional IRT model, and the longitudinal higher order IRT model, which cover a broad range of applications in education and social science. Following a comparison of the parameterizations, identification constraints, strengths, and weaknesses of the different models, a real data example illustrates how the different longitudinal IRT models can be applied to model students’ growth trajectories on multiple latent abilities.
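To make the setup concrete: the simplest member of this family, the longitudinal unidimensional IRT model, can be fit as a constrained two-dimensional MIRT model in which the two measurement occasions act as two correlated dimensions, item parameters are held equal over time, and growth appears as the freed time-2 latent mean. The sketch below does this in the mirt R package on simulated data; the package choice, the data, and the constraint bookkeeping are illustrative assumptions, not the authors' code.

```r
# Illustrative sketch only: a longitudinal unidimensional 2PL fit as a
# two-dimensional MIRT model, with item parameters constrained equal
# across the two occasions; all data are simulated.
library(mirt)

set.seed(1)
n_items <- 10; N <- 1000
a <- matrix(rlnorm(n_items, 0, 0.3))        # time-invariant slopes
d <- rnorm(n_items)                         # time-invariant intercepts
theta1 <- rnorm(N)                          # ability at time 1
theta2 <- theta1 + 0.5 + rnorm(N, 0, 0.5)   # mean growth of 0.5
resp <- cbind(simdata(a, d, N, itemtype = 'dich', Theta = matrix(theta1)),
              simdata(a, d, N, itemtype = 'dich', Theta = matrix(theta2)))
colnames(resp) <- paste0('I', seq_len(2 * n_items))

model <- mirt.model(sprintf(
  'T1 = 1-%d
   T2 = %d-%d
   COV = T1*T2, T2*T2
   MEAN = T2', n_items, n_items + 1, 2 * n_items))

# Look up parameter numbers and force each item's slope and intercept to
# be equal across the two occasions (this identifies the time-2 scale).
sv <- mirt(resp, model, itemtype = '2PL', pars = 'values')
constr <- unlist(lapply(seq_len(n_items), function(j) {
  i1 <- paste0('I', j); i2 <- paste0('I', j + n_items)
  list(c(sv$parnum[sv$item == i1 & sv$name == 'a1'],
         sv$parnum[sv$item == i2 & sv$name == 'a2']),
       c(sv$parnum[sv$item == i1 & sv$name == 'd'],
         sv$parnum[sv$item == i2 & sv$name == 'd']))
}), recursive = FALSE)

fit <- mirt(resp, model, itemtype = '2PL', constrain = constr)
coef(fit, simplify = TRUE)$means            # freed time-2 mean, near 0.5
```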

2021 ◽  
pp. 43-48
Author(s):  
Rosa Fabbricatore ◽  
Francesco Palumbo

Evaluating learners' competencies is a crucial concern in education, and structured tests, at home and in the classroom, are an effective assessment tool. Structured tests consist of sets of items that can refer to several abilities or to more than one topic. Several statistical approaches allow students to be evaluated while treating the items multidimensionally, accounting for their structure. Depending on the final aim of the evaluation, the assessment process either assigns a final grade to each student or clusters students into homogeneous groups according to their level of mastery and ability. The latter is a helpful basis for developing tailored recommendations and remediations for each group, and latent class models are the natural reference for this purpose. In the item response theory (IRT) paradigm, multidimensional latent class IRT models, which relax both the traditional unidimensionality constraint and the assumption of a continuous latent trait, make it possible to detect sub-populations of homogeneous students according to their proficiency level while also accounting for the multidimensional nature of their ability. Moreover, the semi-parametric formulation offers several practical advantages: it avoids normality assumptions that may not hold and reduces the computational burden. This study compares the results of multidimensional latent class IRT models with those obtained by a two-step procedure, which consists of first fitting a multidimensional IRT model to estimate students' ability and then applying a clustering algorithm to classify students accordingly. For the clustering step, both parametric and non-parametric approaches were considered. The data come from the admission test for the degree course in psychology administered in 2014 at the University of Naples Federico II. The study involved N = 944 students, and their ability dimensions were defined according to the domains assessed by the entrance exam, namely Humanities, Reading and Comprehension, Mathematics, Science, and English. In particular, a multidimensional two-parameter logistic IRT model for dichotomously scored items was used to estimate students' ability.
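For orientation, the two-step alternative compared in the study can be sketched in R with the mirt package: fit a multidimensional 2PL, extract EAP ability estimates, and cluster them. Everything below (two domains instead of the exam's five, simulated responses, k-means as the clustering step) is an illustrative assumption, not the authors' analysis.

```r
# Step 1: multidimensional 2PL ability estimation; Step 2: clustering of
# the estimated abilities. Simulated stand-in data, not the admission test.
library(mirt)

set.seed(2)
a <- rbind(cbind(rlnorm(10, 0.2, 0.2), 0),   # items 1-10 load on domain 1
           cbind(0, rlnorm(10, 0.2, 0.2)))   # items 11-20 on domain 2
resp <- simdata(a, rnorm(20), 800, itemtype = 'dich')

model <- mirt.model('
  D1 = 1-10
  D2 = 11-20
  COV = D1*D2')
fit    <- mirt(resp, model, itemtype = '2PL')
scores <- fscores(fit, method = 'EAP')       # person-by-domain estimates

# Step 2: classify students on the score estimates (k-means here; the
# study also considered parametric, model-based alternatives).
cl <- kmeans(scale(scores), centers = 3, nstart = 25)
table(cl$cluster)
```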


2018 ◽  
Vol 29 (1) ◽  
pp. 35-44
Author(s):  
Nell Sedransk

This article is about FMCSA data and its analysis. The article responds to the two-part question: How does an Item Response Theory (IRT) model work differently . . . or better than any other model? The response to the first part is a careful, completely non-technical exposition of the fundamentals of IRT models. It differentiates IRT models from other models by providing the rationale underlying IRT modeling and by using graphs to illustrate two key properties of data items. The response to the second part of the question, about the superiority of an IRT model, is, “it depends.” For FMCSA data, serious challenges arise from the complexity of the data and from the heterogeneity of the carrier industry. Questions are posed that will need to be addressed to determine the success of the actual model developed and of the scoring system.


2018 ◽  
Vol 42 (8) ◽  
pp. 644-659
Author(s):  
Xue Zhang ◽  
Chun Wang ◽  
Jian Tao

Testing item-level fit is important in scale development to guide item revision or deletion. Many item-level fit indices have been proposed in the literature, yet none of them is directly applicable to an important family of models, namely, higher order item response theory (HO-IRT) models. In this study, chi-square-based fit indices (i.e., Yen’s Q1, McKinley and Mills’ G2, and Orlando and Thissen’s S-X2 and S-G2) were extended to HO-IRT models. Their performance is evaluated via simulation studies in terms of false positive rates and correct detection rates. The manipulated factors include test structure (i.e., test length and number of dimensions), sample size, level of correlations among dimensions, and the proportion of misfitting items. For misfitting items, the source of misfit, including misfitting item response functions and misspecified factor structures, was also manipulated. The simulation results demonstrate that S-G2 is promising for detecting misfit in higher order items.
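As a point of reference for the statistics being extended, Orlando and Thissen's S-X2 is already available for ordinary (non-higher-order) IRT models in the mirt R package. The short sketch below runs it on simulated unidimensional 2PL data; it shows the baseline index, not the paper's HO-IRT extension.

```r
# S-X2 item fit for a plain unidimensional 2PL on simulated data; the
# paper's contribution is extending such indices to HO-IRT models.
library(mirt)

set.seed(3)
a <- matrix(rlnorm(20, 0.2, 0.3)); d <- rnorm(20)
resp <- simdata(a, d, 1000, itemtype = 'dich')

fit <- mirt(resp, 1, itemtype = '2PL', verbose = FALSE)
itemfit(fit, fit_stats = 'S_X2')   # one S-X2, df, and p-value per item
```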


2020 ◽  
Vol 80 (4) ◽  
pp. 665-694
Author(s):  
Ken A. Fujimoto ◽  
Sabina R. Neugebauer

Although item response theory (IRT) models such as the bifactor, two-tier, and between-item-dimensionality IRT models have been devised to confirm complex dimensional structures in educational and psychological data, they can be challenging to use in practice. The reason is that these models are multidimensional IRT (MIRT) models and thus highly parameterized, making them suitable only for data provided by large samples. Unfortunately, many educational and psychological studies are conducted on a small scale, leaving researchers without the MIRT models they need to confirm the hypothesized structures in their data. To address this lack of modeling options, we present a general Bayesian MIRT model based on adaptive informative priors. Simulations demonstrated that our MIRT model could be used to confirm a two-tier structure (with two general and six specific dimensions), a bifactor structure (with one general and six specific dimensions), and a between-item six-dimensional structure in rating scale data with sample sizes as small as 100. Although our goal was to provide a general MIRT model suitable for smaller samples, the simulations further revealed that the model is also applicable to larger samples. We also analyzed real data from 121 individuals to illustrate that the findings of our simulations are relevant to real situations.
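The adaptive informative priors are specific to the authors' model, but the dimensional structures being confirmed are standard. As a frequentist point of comparison only, a bifactor structure (one general plus specific factors; two specifics here rather than the paper's six, and dichotomous rather than rating scale data) can be fit with mirt's bfactor function. The sketch below is not the authors' Bayesian model.

```r
# Frequentist bifactor fit as a structural point of comparison; the
# paper's Bayesian model with adaptive informative priors is not shown.
library(mirt)

set.seed(4)
n_items <- 12
a_gen  <- rlnorm(n_items, 0.2, 0.2)            # general-factor slopes
a_spec <- rlnorm(n_items, 0.0, 0.2)            # specific-factor slopes
a <- cbind(a_gen,
           ifelse(seq_len(n_items) <= 6, a_spec, 0),
           ifelse(seq_len(n_items) >  6, a_spec, 0))
resp <- simdata(a, rnorm(n_items), 500, itemtype = 'dich')

specific <- rep(1:2, each = 6)   # items 1-6 -> specific 1, 7-12 -> specific 2
fit <- bfactor(resp, specific)
summary(fit)                     # general and specific loadings
```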


2017 ◽  
Vol 78 (3) ◽  
pp. 384-408 ◽  
Author(s):  
Yong Luo ◽  
Hong Jiao

Stan is a new Bayesian statistical software program that implements the powerful and efficient Hamiltonian Monte Carlo (HMC) algorithm. To date, no source systematically provides Stan code for various item response theory (IRT) models. This article provides Stan code for three representative IRT models: the three-parameter logistic IRT model, the graded response model, and the nominal response model. We demonstrate how IRT model comparison can be conducted with Stan and how the provided Stan code for simple IRT models can be easily extended to their multidimensional and multilevel cases.
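The article's own listings are not reproduced here, but the workflow is easy to convey: below is a minimal two-parameter logistic model written in Stan and run from R via rstan, a simpler cousin of the three models the article covers. The simulated data and the priors are illustrative assumptions.

```r
# A minimal 2PL in Stan, compiled and sampled through rstan; simpler than
# the article's 3PL/GRM/NRM listings but the same workflow.
library(rstan)

stan_2pl <- "
data {
  int<lower=1> N;                         // persons
  int<lower=1> I;                         // items
  int<lower=0, upper=1> y[N, I];          // scored responses
}
parameters {
  vector[N] theta;                        // abilities
  vector<lower=0>[I] a;                   // discriminations
  vector[I] b;                            // difficulties
}
model {
  theta ~ normal(0, 1);
  a ~ lognormal(0, 0.5);                  // illustrative priors
  b ~ normal(0, 2);
  for (n in 1:N)
    y[n] ~ bernoulli_logit(a .* (theta[n] - b));
}
"

set.seed(5)
N <- 200; I <- 15
a_true <- rlnorm(I, 0, 0.3); b_true <- rnorm(I); th <- rnorm(N)
y <- t(vapply(th, function(t0) rbinom(I, 1, plogis(a_true * (t0 - b_true))),
              integer(I)))

fit <- stan(model_code = stan_2pl, data = list(N = N, I = I, y = y),
            chains = 2, iter = 1000)
print(fit, pars = c('a', 'b'))
```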


2016 ◽  
Vol 59 (2) ◽  
pp. 281-289 ◽  
Author(s):  
Guido Makransky ◽  
Philip S. Dale ◽  
Philip Havmose ◽  
Dorthe Bleses

Purpose: This study investigated the feasibility and potential validity of an item response theory (IRT)–based computerized adaptive testing (CAT) version of the MacArthur–Bates Communicative Development Inventory: Words & Sentences (CDI:WS; Fenson et al., 2007) vocabulary checklist, with the objective of reducing length while maintaining measurement precision.
Method: Parent-reported vocabulary for the American CDI:WS norming sample, consisting of 1,461 children between the ages of 16 and 30 months, was used to investigate the fit of the items to the 2-parameter logistic IRT model and to simulate CDI-CAT versions with 400, 200, 100, 50, 25, 10, and 5 items.
Results: All but 14 items fit the 2-parameter logistic IRT model, and real-data simulations of CDI-CATs with at least 50 items recovered full CDI scores with correlations over .95. Furthermore, the CDI-CATs with at least 50 items had correlations with age and socioeconomic status similar to those of the full CDI:WS.
Conclusion: These results provide strong evidence that a CAT version of the CDI:WS has the potential to reduce length while maintaining the accuracy and precision of the full instrument.
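The adaptive logic behind such simulations is compact enough to sketch directly. The toy R code below selects each next item by maximum Fisher information under a 2PL and updates an EAP ability estimate, stopping at 50 items as in the best-performing CDI-CAT condition; the item bank is simulated, not the CDI:WS calibration.

```r
# Toy CAT under a 2PL: maximum-information item selection with an EAP
# ability update after each response; all item parameters are simulated.
p2   <- function(th, a, b) plogis(a * (th - b))
info <- function(th, a, b) { p <- p2(th, a, b); a^2 * p * (1 - p) }

set.seed(6)
a <- rlnorm(400, 0, 0.3); b <- rnorm(400)        # 400-item toy bank
th_true <- 0.7; th_hat <- 0
admin <- integer(0); resp <- integer(0)
grid <- seq(-4, 4, length.out = 81)              # quadrature for EAP

for (step in 1:50) {                             # fixed 50-item CAT
  avail <- setdiff(seq_along(a), admin)
  j <- avail[which.max(info(th_hat, a[avail], b[avail]))]
  admin <- c(admin, j)
  resp  <- c(resp, rbinom(1, 1, p2(th_true, a[j], b[j])))
  lik <- vapply(grid, function(g)
    prod(p2(g, a[admin], b[admin])^resp *
         (1 - p2(g, a[admin], b[admin]))^(1 - resp)), numeric(1))
  post <- lik * dnorm(grid)
  th_hat <- sum(grid * post) / sum(post)         # EAP update
}
th_hat                                           # estimate after 50 items
```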


2019 ◽  
Vol 29 (4) ◽  
pp. 962-986
Author(s):  
R Gorter ◽  
J-P Fox ◽  
G Ter Riet ◽  
MW Heymans ◽  
JWR Twisk

Latent growth models are often used to measure individual trajectories representing change over time. The characteristics of the individual trajectories depend on the variability in the longitudinal outcomes. In many medical and epidemiological studies, the individual health outcomes cannot be observed directly and are observed indirectly through indicators (i.e., items of a questionnaire). An item response theory or a classical test theory measurement model is required, and the choice can influence the latent growth estimates. In this study, under various conditions, this influence is directly assessed by estimating latent growth parameters on a common scale for item response theory and classical test theory using a novel plausible value method in combination with Markov chain Monte Carlo. The latent outcomes are considered missing data, and plausible values are generated from the corresponding posterior distribution, separately for item response theory and classical test theory. These plausible values are linearly transformed to a common scale. A Markov chain Monte Carlo method was developed to simultaneously estimate the latent growth and measurement model parameters using this plausible value technique. It is shown that individual trajectories estimated with item response theory, compared with classical test theory, provide a more detailed description of individual change over time, since item response patterns (item response theory) are more informative about the health measurements than sum scores (classical test theory).
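The core of the plausible value idea is that the latent outcome is treated as missing data and sampled from its posterior given the item responses, rather than being replaced by a single point estimate. A deliberately stripped-down, one-person R sketch under a fixed, simulated 2PL (a stand-in for the paper's joint MCMC estimation, not its method):

```r
# Draw plausible values for one person's latent outcome under a fixed 2PL,
# using a quadrature grid as the posterior support; illustrative only.
pv_draw <- function(resp, a, b, n_draws = 5,
                    grid = seq(-4, 4, length.out = 201)) {
  P <- plogis(sweep(outer(grid, b, "-"), 2, a, "*"))   # grid x items
  R <- matrix(resp, nrow = length(grid), ncol = length(a), byrow = TRUE)
  lik  <- apply(P^R * (1 - P)^(1 - R), 1, prod)        # likelihood on grid
  post <- lik * dnorm(grid)                            # N(0, 1) prior
  sample(grid, n_draws, replace = TRUE, prob = post / sum(post))
}

set.seed(7)
a <- rlnorm(15, 0, 0.3); b <- rnorm(15)
resp <- rbinom(15, 1, plogis(a * (0.4 - b)))           # true theta = 0.4
pv_draw(resp, a, b)    # five plausible values, not one point estimate
```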


2021 ◽  
Vol 117 ◽  
pp. 106849
Author(s):  
Danilo Carrozzino ◽  
Kaj Sparle Christensen ◽  
Giovanni Mansueto ◽  
Fiammetta Cosci


2021 ◽  
Vol 8 (3) ◽  
pp. 672-695
Author(s):  
Thomas DeVaney

This article presents a discussion and illustration of Mokken scale analysis (MSA), a nonparametric form of item response theory (IRT), in relation to Rasch models and Guttman scaling. The procedure can be used for the dichotomous and ordinal polytomous data commonly collected with questionnaires. The assumptions of MSA are discussed, as well as the characteristics that differentiate a Mokken scale from a Guttman scale. MSA is illustrated using the mokken package with RStudio and a data set that included over 3,340 responses to a modified version of the Statistical Anxiety Rating Scale. Issues addressed in the illustration include monotonicity, scalability, and invariant ordering. The R script for the illustration is included.
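The workflow the article illustrates maps onto a handful of mokken package calls. The sketch below runs them on the package's bundled acl data rather than the article's Statistical Anxiety Rating Scale responses, which are not reproduced here; the 10-item subset is an arbitrary choice for brevity.

```r
# Core MSA checks from the 'mokken' package on its bundled acl data;
# the article applies the same steps to modified STARS responses.
library(mokken)

data(acl)                         # polytomous adjective checklist data
X <- acl[, 1:10]                  # arbitrary 10-item subset for brevity

coefH(X)                          # Loevinger's H: scalability
summary(check.monotonicity(X))    # monotonicity assumption
summary(check.iio(X))             # invariant item ordering
aisp(X)                           # automated item selection into scales
```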

