Do Items Order? The Psychology of IRT Models

2018 ◽  
Author(s):  
Julia M. Haaf ◽  
Edgar C. Merkle ◽  
Jeffrey N. Rouder

Invariant item ordering refers to the statement that if one item is harder than another for one person, then it is harder for all people. Whether item ordering holds is a psychological statement because it describes how people may qualitatively vary. Yet modern item response theory (IRT) makes an a priori commitment to item ordering. The Rasch model, for example, posits that items must order. Conversely, the 2PL model posits that items never order. Needed is an IRT model where item ordering or its violation is a function of the data rather than an a priori commitment. We develop two-parameter shift-scale models for this purpose, and find that the two-parameter uniform model offers many advantages. We show how item ordering may be assessed using Bayes factor model comparison, and discuss computational issues with shift-scale IRT models.
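The two a priori commitments are easy to see in item characteristic curves. Below is a minimal R sketch (ours, not the authors' code): under the Rasch model the curves for two items are horizontal shifts of one another and never cross, so the harder item is harder for everyone; under the 2PL, unequal discriminations make the curves cross, so which item is harder depends on the person.

```r
theta <- seq(-4, 4, length.out = 200)

rasch <- function(theta, b)    plogis(theta - b)        # difficulty b only
twopl <- function(theta, a, b) plogis(a * (theta - b))  # adds discrimination a

# Rasch: the item with b = 1 is harder than the item with b = -1 at every theta
plot(theta, rasch(theta, b = -1), type = "l", ylab = "P(correct)")
lines(theta, rasch(theta, b = 1), lty = 2)

# 2PL: the curves cross, so invariant item ordering fails by construction
plot(theta, twopl(theta, a = 2.0, b = 0.0), type = "l", ylab = "P(correct)")
lines(theta, twopl(theta, a = 0.5, b = 0.5), lty = 2)
```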

2020 ◽  
Author(s):  
Martin Schnuerch ◽  
Lena Nadarevic ◽  
Jeffrey Rouder

The repetition-induced truth effect refers to a phenomenon where people rate repeated statements as more likely true than novel statements. In this paper, we document qualitative individual differences in the effect. While the overwhelming majority of participants display the usual positive truth effect, a minority show the opposite: they reliably discount the validity of repeated statements, which we refer to as a negative truth effect. We examine 8 truth-effect data sets where individual-level data are curated. These sets are composed of 1,105 individuals performing 38,904 judgments. Through Bayes factor model comparison, we show that reliable negative truth effects occur in 5 of the 8 data sets. The negative truth effect is informative because it seems unreasonable that the mechanisms mediating the positive truth effect are the same as those that lead to a discounting of repeated statements' validity. Moreover, the presence of qualitative differences motivates a different type of analysis of individual differences based on ordinal (i.e., which sign does the effect have?) rather than metric measures. To our knowledge, this paper reports the first such reliable qualitative differences in a cognitive task.
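To make the ordinal question concrete, here is a minimal R sketch (our illustration, not the authors' analysis) that tallies the sign of each participant's observed truth effect; it assumes a hypothetical long-format data frame `d` with columns id, repeated (TRUE/FALSE), and rating. Raw sign counts conflate true negative effects with sampling noise, which is why the paper relies on Bayes factor model comparison rather than counts like these.

```r
truth_effect_signs <- function(d) {
  # per-participant mean rating for repeated and for novel statements
  m_rep <- tapply(d$rating[d$repeated],  d$id[d$repeated],  mean)
  m_new <- tapply(d$rating[!d$repeated], d$id[!d$repeated], mean)
  effect <- m_rep - m_new[names(m_rep)]  # align participants by id
  table(sign(effect))                    # counts of negative, zero, positive
}
```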


Author(s):  
Martin Schnuerch ◽  
Lena Nadarevic ◽  
Jeffrey N. Rouder

Abstract The repetition-induced truth effect refers to a phenomenon where people rate repeated statements as more likely true than novel statements. In this paper, we document qualitative individual differences in the effect. While the overwhelming majority of participants display the usual positive truth effect, a minority show the opposite: they reliably discount the validity of repeated statements, which we refer to as a negative truth effect. We examine eight truth-effect data sets where individual-level data are curated. These sets are composed of 1105 individuals performing 38,904 judgments. Through Bayes factor model comparison, we show that reliable negative truth effects occur in five of the eight data sets. The negative truth effect is informative because it seems unreasonable that the mechanisms mediating the positive truth effect are the same as those that lead to a discounting of repeated statements’ validity. Moreover, the presence of qualitative differences motivates a different type of analysis of individual differences based on ordinal (i.e., which sign does the effect have?) rather than metric measures. To our knowledge, this paper reports the first such reliable qualitative differences in a cognitive task.


2017 ◽  
Author(s):  
Julia M. Haaf ◽  
Jeffrey Rouder

Model comparison in Bayesian mixed models is becoming popular in psychological science. Here we develop a set of nested models that account for order restrictions across individuals in psychological tasks. An order-restricted model addresses the question 'Does everybody?', as in, 'Does everybody show the usual Stroop effect?' or 'Does everybody respond more quickly to intense noises than to subtle ones?' The crux of the modeling is the instantiation of tens or hundreds of order restrictions simultaneously, one for each participant. To our knowledge, the problem is intractable in frequentist contexts but relatively straightforward in Bayesian ones. We develop a Bayes factor model-comparison strategy using Zellner and colleagues' default g-priors appropriate for assessing whether effects obey equality and order restrictions. We apply the methodology to seven data sets from Stroop, Simon, and Eriksen interference tasks. Not too surprisingly, we find that everybody Stroops; that is, for all people, congruent colors are truly named more quickly than incongruent ones. But, perhaps surprisingly, we find these order constraints are violated for some people in the Simon task; that is, for these people, spatially incongruent responses occur truly more quickly than congruent ones! Implications of the modeling and conjectures about the task-related differences are discussed. This paper was written in R-Markdown with code for data analysis integrated into the text. The Markdown script is open and freely available at https://github.com/PerceptionAndCognitionLab/ctx-indiff. The data are also open and freely available at https://github.com/PerceptionCognitionLab/data0/tree/master/contexteffects.
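One standard way to compute such order-restriction Bayes factors is the encompassing-prior approach: the evidence for 'every participant's true effect is positive' against the unconstrained model is the posterior probability of the joint constraint divided by its prior probability. Here is a minimal R sketch of that core computation (assumptions ours, not the paper's code), where `post` and `prior` are hypothetical MCMC sample matrices with one column per participant's individual effect.

```r
bf_everybody_positive <- function(post, prior) {
  # proportion of posterior draws in which all individual effects are positive,
  # divided by the corresponding proportion of prior draws
  mean(apply(post  > 0, 1, all)) /
    mean(apply(prior > 0, 1, all))
}
```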


2017 ◽  
Author(s):  
Jeffrey Rouder ◽  
Julia M. Haaf ◽  
Clintin Davis-Stober ◽ 
Joseph Hilgard

Most meta-analyses focus on meta-analytic means, testing whether they are significantly different from zero and how they depend on covariates. This mean is difficult to defend as a construct because the underlying distribution of studies reflects many factors, such as how we choose to run experiments. We argue that the fundamental questions of meta-analysis should not be about the aggregated mean; instead, one should ask which relations are stable across all the studies. In a typical meta-analysis, there is a preferred or hypothesized direction (e.g., that violent video games increase, rather than decrease, aggressive behavior). We ask whether all studies in a meta-analysis have true effects in a common direction. If so, this is an example of a stable relation across all the studies. We propose four models: (i) all studies are truly null; (ii) all studies share a single true nonzero effect; (iii) studies differ, but all true effects are in the same direction; and (iv) some study effects are truly positive while others are truly negative. We develop Bayes factor model comparison for these models and apply them to four extant meta-analyses to show their usefulness.
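For concreteness, here is a minimal R sketch (illustrative numbers ours) of what the four models imply for the vector of true study effects in a meta-analysis of n studies:

```r
set.seed(1)
n <- 20
theta_null     <- rep(0, n)                # (i)   every study truly null
theta_common   <- rep(0.3, n)              # (ii)  one shared nonzero effect
theta_positive <- abs(rnorm(n, 0.3, 0.2))  # (iii) effects vary, one direction
theta_mixed    <- rnorm(n, 0.1, 0.3)       # (iv)  some positive, some negative

# observed study effects add sampling error around the true effects, e.g.:
y <- theta_positive + rnorm(n, 0, 0.15)
```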


2021 ◽  
Author(s):  
Maximilian Linde ◽  
Don van Ravenzwaaij

Nested data structures, in which conditions include multiple trials, are often analyzed using repeated-measures analysis of variance or mixed effects models. Typically, researchers are interested in determining whether there is an effect of the experimental manipulation. Unfortunately, these kinds of analyses have different appropriate specifications for the null and alternative models, and a discussion on which is to be preferred and when is sorely lacking. van Doorn et al. (2021) performed three types of Bayes factor model comparisons on a simulated data set in order to examine which model comparison is most suitable for quantifying evidence for or against the presence of an effect of the experimental manipulation. Here we extend their results by simulating multiple data sets for various scenarios and by using different prior specifications. We demonstrate how three different Bayes factor model comparison types behave under changes in different parameters, and we make concrete recommendations on which model comparison is most appropriate for different scenarios.
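As a flavor of one such comparison, here is a minimal R sketch (hypothetical data layout and effect size ours, not the paper's simulations) using the BayesFactor package: a full mixed model against a null model that removes only the fixed effect of condition. Whether the null should also drop the random slope term is exactly the kind of specification question at issue.

```r
library(BayesFactor)

# hypothetical trial-level data: 30 subjects, 2 conditions, 50 trials per cell
set.seed(1)
d <- expand.grid(subject = factor(1:30), condition = factor(c("a", "b")),
                 trial = 1:50)
d$y <- rnorm(nrow(d), mean = ifelse(d$condition == "b", 0.2, 0), sd = 1)

full <- lmBF(y ~ condition + subject + condition:subject, data = d,
             whichRandom = c("subject", "condition:subject"))
null <- lmBF(y ~ subject + condition:subject, data = d,
             whichRandom = c("subject", "condition:subject"))

full / null  # Bayes factor for the fixed effect of condition
```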


2017 ◽  
Vol 78 (3) ◽  
pp. 384-408 ◽  
Author(s):  
Yong Luo ◽  
Hong Jiao

Stan is a new Bayesian statistical software program that implements the powerful and efficient Hamiltonian Monte Carlo (HMC) algorithm. To date, there is no source that systematically provides Stan code for various item response theory (IRT) models. This article provides Stan code for three representative IRT models: the three-parameter logistic IRT model, the graded response model, and the nominal response model. We demonstrate how IRT model comparison can be conducted with Stan and how the provided Stan code for simple IRT models can be easily extended to their multidimensional and multilevel cases.
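As a flavor of the pattern the article describes, here is a minimal rstan sketch (ours, deliberately simpler than the article's 3PL, graded, and nominal examples) fitting a Rasch model to simulated data. The declarations use the older Stan array syntax of that era; newer Stan releases write them as `array[N] int` instead.

```r
library(rstan)

rasch_code <- "
data {
  int<lower=1> N;                    // number of responses
  int<lower=1> J;                    // number of persons
  int<lower=1> K;                    // number of items
  int<lower=1, upper=J> person[N];
  int<lower=1, upper=K> item[N];
  int<lower=0, upper=1> y[N];
}
parameters {
  vector[J] theta;                   // person abilities
  vector[K] b;                       // item difficulties
}
model {
  theta ~ normal(0, 1);
  b     ~ normal(0, 2);
  y     ~ bernoulli_logit(theta[person] - b[item]);
}
"

# simulate a small person-by-item data set and fit
J <- 50; K <- 10
person <- rep(1:J, each = K); item <- rep(1:K, times = J)
y <- rbinom(J * K, 1, plogis(rnorm(J)[person] - rnorm(K)[item]))
fit <- stan(model_code = rasch_code, chains = 2, iter = 1000,
            data = list(N = J * K, J = J, K = K,
                        person = person, item = item, y = y))
```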


2008 ◽  
Vol 216 (2) ◽  
pp. 89-101 ◽  
Author(s):  
Johannes Hartig ◽  
Jana Höhler

Multidimensional item response theory (MIRT) holds considerable promise for the development of psychometric models of competence. It provides an ideal foundation for modeling performance in complex domains, simultaneously taking into account multiple basic abilities. The aim of this paper is to illustrate the relations between a two-dimensional IRT model with between-item multidimensionality and a nested-factor model with within-item multidimensionality, and the different substantive meanings of the ability dimensions in the two models. Both models are applied to empirical data from a large-scale assessment of reading and listening comprehension in a foreign language. In the between-item model, performance in the reading and listening items is modeled by two separate dimensions. In the within-item model, one dimension represents the abilities common to both tests, and a second dimension represents abilities specific to listening comprehension. Distinct relations of external variables, such as gender and cognitive abilities, with ability scores demonstrate that the alternative models have substantively different implications.
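In model syntax, the two structures might look as follows; this is a minimal sketch using the mirt R package with a hypothetical item layout (columns 1-10 reading, columns 11-20 listening in a response matrix `resp`), not the authors' software or data.

```r
library(mirt)

# between-item multidimensionality: each item loads on exactly one of two
# correlated dimensions
between <- mirt.model("
  reading   = 1-10
  listening = 11-20
  COV = reading*listening")

# within-item multidimensionality (nested-factor): all items load on a general
# dimension; listening items additionally load on a specific dimension
within <- mirt.model("
  general   = 1-20
  listening = 11-20")

# fit_between <- mirt(resp, between)
# fit_within  <- mirt(resp, within)
```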


Author(s):  
Maximilian Linde ◽  
Don van Ravenzwaaij

Abstract Nested data structures, in which conditions include multiple trials and are fully crossed with participants, are often analyzed using repeated-measures analysis of variance or mixed-effects models. Typically, researchers are interested in determining whether there is an effect of the experimental manipulation. These kinds of analyses have different appropriate specifications for the null and alternative models, and a discussion on which is to be preferred and when is sorely lacking. van Doorn et al. (2021) performed three types of Bayes factor model comparisons on a simulated data set in order to examine which model comparison is most suitable for quantifying evidence for or against the presence of an effect of the experimental manipulation. Here, we extend their results by simulating multiple data sets for various scenarios and by using different prior specifications. We demonstrate how three different Bayes factor model comparison types behave under changes in different parameters, and we make concrete recommendations on which model comparison is most appropriate for different scenarios.


2020 ◽  
Vol 98 (Supplement_4) ◽  
pp. 79-80 ◽ 
Author(s):  
Chinyere Ekine ◽  
Raphael Mrode ◽  
Edwin Oyieng ◽  
Daniel Komwihangilo ◽  
Gilbert Msuta ◽  
...  

Abstract Modelling the growth curve of animals provides information on growth characteristics and is important for optimizing management in different livestock systems. This study evaluated the growth curves of crossbred calves from birth to 30 months of age on smallholder dairy farms in Tanzania using one two-parameter function (exponential), four three-parameter functions (logistic, von Bertalanffy, Brody, Gompertz), and three polynomial functions. Predicted weights, based on heart-girth measurements of 623 male and 846 female calves born between 2016 and 2019, came from the African Dairy Genetic Gains (ADGG) project in selected milk sheds in Tanzania, namely Tanga, Kilimanjaro, Arusha, Iringa, Njombe, and Mbeya. Each function was fitted separately to the weight measurements of males and females, adjusted for the effects of ward and season of birth, using the nonlinear least squares (nls) function in R. Akaike's information criterion (AIC) and the Bayesian information criterion (BIC) were used for model comparison. By these criteria, the three polynomial and four three-parameter functions all performed better than the two-parameter exponential model and did not differ substantially from one another for either males or females. Predicted weight varied among the models and differed between males and females. The highest estimated weight was observed for the Brody model for both males (278.09 kg) and females (264.10 kg); the lowest was observed for the exponential model. Estimated growth rate also varied among models, ranging from 0.04 kg (Brody model) to 0.08 kg (logistic model) for males, and from 0.05 kg to 0.09 kg for females. Predictive ability across all fitted curves was low, ranging from 25% to approximately 29%. This may reflect the wide range of breed compositions among the evaluated crossbred calves, which characterizes smallholder dairy farms in this system, as well as differing levels of farm management.
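As a flavor of the fitting procedure, here is a minimal nls() sketch (illustrative values ours, not the study's data) for one candidate curve, the Gompertz function, with A the asymptotic weight, b an integration constant, and k the growth-rate parameter:

```r
# hypothetical calf weights (kg) at ages (months) from birth to 30 months
age    <- c(0, 3, 6, 9, 12, 18, 24, 30)
weight <- c(25, 55, 90, 120, 145, 185, 210, 230)

fit <- nls(weight ~ A * exp(-b * exp(-k * age)),
           start = list(A = 260, b = 2, k = 0.1))

AIC(fit); BIC(fit)  # the study's model-comparison criteria
```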

