Exploring the correspondence between traditional score resolution methods and person fit indices in rater-mediated writing assessments

2019 ◽  
Vol 39 ◽  
pp. 25-38 ◽  
Author(s):  
Stefanie A. Wind ◽  
A. Adrienne Walker
2019 ◽  
Vol 35 (1) ◽  
pp. 126-136 ◽  
Author(s):  
Tour Liu ◽  
Tian Lan ◽  
Tao Xin

Abstract. Random responding is a common aberrant response behavior in personality tests and may undermine the reliability, validity, and other analytical aspects of psychological assessment. Typically, researchers use a single person-fit index to identify random responses. This study recommends a three-step person-fit analysis procedure. Unlike typical single-index methods, the three-step procedure uses different person-fit indices to identify both globally misfitting and locally misfitting individuals. The procedure identified more locally misfitting individuals than the single-index method, and a graphical method was used to visualize the particular items on which random responding appeared. The method may be useful to researchers because it provides more information about response behaviors, allowing better evaluation of scale administration and the development of more plausible explanations. Real rather than simulated data were used in this study: to create genuine random responses, an experimental test administration was designed, and four different random-response samples were produced with this experimental system.


Psychometrika ◽  
1990 ◽  
Vol 55 (1) ◽  
pp. 75-106 ◽  
Author(s):  
Ivo W. Molenaar ◽  
Herbert Hoijtink

Much research has shown that test length may influence the rates at which person fit indices detect aberrant responses. The purpose of this study was to examine the effect of test length on the BW aberrance indices. Three factors were considered: test length (K = 25, 50, 100, or 200 items), ability ratio (T/K, the total person score divided by test length K), and error ratio (E/K, the number of errors within ability level divided by test length). Four data matrices of 100 persons by a varying number of items (100×25, 100×50, 100×100, and 100×200) were randomly generated, and each matrix was permuted 500 times, with 20 repetitions. After partialling out E/K and T/K, the effect of test length on the association between the two indices was very slight. In nonlinear regression analyses, E/K and T/K explained more than 76% of the variance of the B index and more than 73% of the variance of the W index, whereas test length contributed very little to either. Structural equation modeling (SEM) analyses, which yielded very good model fit, likewise showed that the effect of test length on the B and W indices was very small. Taken together, this evidence indicates that the B and W indices are robust to test length.


2020 ◽  
Vol 10 (11) ◽  
pp. 324
Author(s):  
Amin Mousavi ◽  
Ying Cui

Important decisions about accountability and the placement of students in performance categories are often based on test scores. It is therefore important to evaluate the validity of the inferences derived from test results. One threat to the validity of such inferences is aberrant responding. Several person fit indices have been developed to detect aberrant responding on educational and psychological tests, and most of the person fit literature has focused on creating and evaluating new indices. The aim of this simulation study was to assess the effect of aberrant responding on the accuracy of estimated item parameters and to examine whether person fit statistics can be used to refine these estimates. The results showed that the presence of aberrant response patterns biased both the b and a parameters at the item level. It also affected the classification of students into performance categories, particularly high-performing students, regardless of whether the aberrant response patterns were left in the data or removed. The results differed by test length and by the percentage of students with aberrant response patterns. Practical and theoretical implications are discussed.
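The direction of the item-level bias described above can be illustrated with a small simulation. The sketch below is not the authors' design and rests on several assumptions. Responses follow the Rasch model. A hypothetical 10% of examinees guess blindly, with a constant success rate of 0.25. The classical proportion correct stands in for item difficulty, in place of estimated IRT b and a parameters.

```python
import math
import random


def rasch_prob(theta: float, b: float) -> float:
    """Probability of a correct response under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def simulate(n_persons, difficulties, aberrant_rate, guess_prob, seed=1):
    """Return (clean, contaminated) 0/1 response matrices (rows = persons).

    Assumption: aberrant examinees answer each item correctly with the constant
    probability guess_prob, regardless of ability. The first rows of the clean
    data are overwritten, so both matrices share every other response.
    """
    rng = random.Random(seed)
    clean = []
    for _ in range(n_persons):
        theta = rng.gauss(0.0, 1.0)  # abilities drawn from N(0, 1)
        clean.append([int(rng.random() < rasch_prob(theta, b)) for b in difficulties])
    contaminated = [row[:] for row in clean]
    for i in range(int(aberrant_rate * n_persons)):
        contaminated[i] = [int(rng.random() < guess_prob) for _ in difficulties]
    return clean, contaminated


def p_values(data):
    """Classical item difficulty: proportion of persons answering each item correctly."""
    return [sum(col) / len(data) for col in zip(*data)]


difficulties = [-2.0, 0.0, 2.0]
clean, contaminated = simulate(20000, difficulties, aberrant_rate=0.10, guess_prob=0.25)
for b, p0, p1 in zip(difficulties, p_values(clean), p_values(contaminated)):
    print(f"b = {b:+.1f}: clean p = {p0:.3f}, contaminated p = {p1:.3f}")
```

In this setup the hard item tends to look easier and the easy item harder, so apparent difficulties are pulled toward the guessing rate. The study itself examined how such distortions carry over to IRT parameter estimates and to classification.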


2020 ◽  
Vol 45 (6) ◽  
pp. 719-749
Author(s):  
Eduardo Doval ◽  
Pedro Delicado

We propose new methods for identifying and classifying aberrant response patterns (ARPs) by means of functional data analysis. These methods take an individual's person response function (PRF) and compare it with the pattern that the item-person response surface implies for a generic individual of the same ability. ARPs correspond to atypical difference functions. ARPs are classified by applying functional data clustering to the PRFs identified as ARPs. We apply these methods to two sets of simulated data and a real data set. The first simulated set illustrates the ARP identification methods, and the second demonstrates classification of the response patterns flagged as ARPs. The real data set is a Grade 12 science assessment test (SAT) with 32 items answered by 600 examinees. For comparison, ARPs are also identified with three nonparametric person-fit indices (Ht, the Modified Caution Index, and ZU3). Our results indicate that the ARP detection ability of one of the proposed methods is comparable to that of the person-fit indices. Moreover, the proposed classification methods make it possible to distinguish ARPs associated with spuriously low scores from those associated with spuriously high scores.
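Two of the comparison indices, the Modified Caution Index and U3, have simple closed forms and can be sketched in a few lines. This illustration follows the commonly cited definitions and is not the code used in the study. Items are ordered by their proportion correct p, which is assumed to come from a calibration sample. Both indices place the observed pattern between two reference patterns: the Guttman pattern (the r easiest items correct) and the reversed Guttman pattern (the r hardest items correct). Ht is omitted because it is computed from covariances with every other examinee in the sample.

```python
import math


def _normalized_misfit(x, weights):
    """Shared form of MCI and U3: (best - observed) / (best - worst).

    best/worst: weight sums if the r = sum(x) correct answers fell on the
    r items with the largest/smallest weights (Guttman / reversed Guttman).
    """
    k, r = len(x), sum(x)
    if r in (0, k):
        return float("nan")  # undefined for all-wrong or all-right patterns
    ordered = sorted(weights, reverse=True)  # easiest item first
    best = sum(ordered[:r])
    worst = sum(ordered[k - r:])
    if best == worst:
        return float("nan")  # no difficulty ordering to violate
    observed = sum(w for w, xi in zip(weights, x) if xi == 1)
    return (best - observed) / (best - worst)


def modified_caution_index(x, p):
    """Harnisch and Linn's MCI, with p the item proportions correct."""
    return _normalized_misfit(x, p)


def u3(x, p):
    """van der Flier's U3, which weights items by log(p / (1 - p)); needs 0 < p < 1."""
    return _normalized_misfit(x, [math.log(pj / (1 - pj)) for pj in p])


p = [0.9, 0.8, 0.6, 0.4, 0.2]
print(modified_caution_index([1, 1, 1, 0, 0], p), u3([1, 1, 1, 0, 0], p))
print(modified_caution_index([0, 0, 1, 1, 1], p), u3([0, 0, 1, 1, 1], p))
```

Both indices score the Guttman pattern 0 and the reversed pattern 1, and larger values indicate more aberrant patterns.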


Author(s):  
Yaqoub Z. Al Shaqsy ◽  
Yousef A. Abu Shindi ◽  
Rashid S. Almehrizi

This study aimed to examine the effectiveness of three person fit indices (Wright’s weighted index, Drasgow’s index, and Almehrizi’s weighted index) in item response models with different degrees of item local dependence (0.0, 0.3, 0.6, and 0.9), using simulated item parameters. Item responses for 40 samples of 10,000 subjects each (400,000 subjects in total) were simulated on a 60-item test. Item discrimination parameters ranged from 0.19 to 1.79 and item difficulty parameters from -2 to +2. At each level of local dependence, 20% of the test items were manipulated to show local dependence. Student ability was generated from a standard normal distribution. The IRT assumptions were examined in all data sets. Unidimensionality was checked with exploratory factor analysis and residual analysis in NOHARM, and local independence with the Q3 index. For all three indices, the percentage of nonconforming persons increased as the degree of item local dependence increased. The percentage of nonconforming persons was also larger with Wright’s weighted index than with Drasgow’s index or Almehrizi’s weighted index. The three indices showed relatively consistent distributional properties, and Drasgow’s index and Almehrizi’s weighted index had very similar distributions. The agreement index was also larger between Wright’s weighted index and Drasgow’s index.
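For reference, the sketch below computes two of the indices named above for one examinee. It assumes known 2PL item parameters and a known ability theta. When theta is estimated, lz departs from the standard normal distribution. Wright's weighted index is taken here to be the infit mean square and the unweighted index the outfit mean square, both reported before any standardization. Almehrizi's weighted index is not reproduced.

```python
import math


def prob_2pl(theta, a, b):
    """2PL probability of a correct response; a = 1 gives the Rasch model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def lz(x, theta, a, b):
    """Drasgow, Levine and Williams' standardized log-likelihood (theta treated as known)."""
    l0 = expected = variance = 0.0
    for xi, ai, bi in zip(x, a, b):
        p = prob_2pl(theta, ai, bi)
        l0 += xi * math.log(p) + (1 - xi) * math.log(1 - p)
        expected += p * math.log(p) + (1 - p) * math.log(1 - p)
        variance += p * (1 - p) * math.log(p / (1 - p)) ** 2
    if variance == 0.0:
        return float("nan")  # every item has p = 0.5, so lz is undefined
    return (l0 - expected) / math.sqrt(variance)


def wright_mean_squares(x, theta, a, b):
    """Return (outfit, infit): Wright's unweighted and weighted mean squares."""
    sum_z2 = sum_raw2 = sum_var = 0.0
    for xi, ai, bi in zip(x, a, b):
        p = prob_2pl(theta, ai, bi)
        var = p * (1 - p)              # binomial variance of the item response
        sum_z2 += (xi - p) ** 2 / var  # squared standardized residual
        sum_raw2 += (xi - p) ** 2      # squared raw residual
        sum_var += var
    return sum_z2 / len(x), sum_raw2 / sum_var


a = [1.0] * 5
b = [-2.0, -1.0, 0.0, 1.0, 2.0]
for pattern in ([1, 1, 1, 0, 0], [0, 0, 1, 1, 1]):
    print(pattern, round(lz(pattern, 0.0, a, b), 2), wright_mean_squares(pattern, 0.0, a, b))
```

Large negative lz values and mean squares well above 1 both point to misfit. The reversed pattern in the usage lines shows both.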


2017 ◽  
Vol 42 (5) ◽  
pp. 343-358
Author(s):  
Yan Xia ◽  
Yi Zheng

Snijders developed a family of person fit indices that asymptotically follow the standard normal distribution when the ability parameter is estimated. So far, [Formula: see text], U*, W*, [Formula: see text], and [Formula: see text] from this family have been proposed in previous literature. One common property shared by [Formula: see text], U*, and W* (and by [Formula: see text] and [Formula: see text] under some specific conditions) is that they employ symmetric weight functions. They therefore treat spurious scores on easy items and on difficult items in the same manner. However, when the purpose is to detect only spuriously high scores on difficult items, as with cheating, guessing, or item preknowledge, symmetric weight functions may reduce the detection rates of the target aberrant response patterns. By specifying two types of asymmetric weight functions, this study proposes SHa(λ)* (λ = 1/2 or 1) and SHb(β)* (β = 2 or 3), based on Snijders’s framework, specifically to detect spuriously high scores on difficult items. Two simulation studies were carried out to investigate the Type I error rates and empirical power of SHa(λ)* and SHb(β)*, compared with [Formula: see text], U*, W*, [Formula: see text], and [Formula: see text]. The empirical results demonstrated satisfactory performance of the proposed indices. Recommendations were also made on the choice of person fit indices for specific purposes.
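The correction at the heart of Snijders's framework can be sketched for the 2PL with a maximum likelihood ability estimate. The statistic is W = Σ (x_i − P_i) w_i. Each weight is replaced by a corrected weight w̃_i = w_i − c·r_i, where r_i = P_i′ / (P_i(1 − P_i)), which equals a_i under the 2PL, and c = Σ P_i′ w_i / Σ P_i′ r_i. The variance Σ w̃_i² P_i(1 − P_i) then accounts for estimating theta. With the symmetric logit weight this gives lz*. The `weight` argument is a hypothetical hook for other weight functions. The asymmetric SHa(λ)* and SHb(β)* weights of Xia and Zheng are not reproduced, because the abstract does not give their form. Treat this as an unvalidated sketch.

```python
import math


def prob_2pl(theta, a, b):
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def mle_theta(x, a, b, iters=100):
    """Damped Newton-Raphson ML estimate of theta under the 2PL (mixed patterns only)."""
    theta = 0.0
    for _ in range(iters):
        ps = [prob_2pl(theta, ai, bi) for ai, bi in zip(a, b)]
        grad = sum(ai * (xi - p) for xi, ai, p in zip(x, a, ps))
        info = sum(ai ** 2 * p * (1 - p) for ai, p in zip(a, ps))
        step = max(-1.0, min(1.0, grad / info))  # cap the step for stability
        theta += step
        if abs(step) < 1e-10:
            break
    return theta


def logit_weight(p):
    """Symmetric weight log(p / (1 - p)); with it the statistic below is lz*."""
    return math.log(p / (1 - p))


def snijders_star(x, a, b, weight=logit_weight):
    """Snijders-corrected standardized W = sum (x_i - P_i) w_i at the MLE of theta."""
    if sum(x) in (0, len(x)):
        return float("nan")  # the MLE does not exist for all-0 / all-1 patterns
    theta = mle_theta(x, a, b)
    ps = [prob_2pl(theta, ai, bi) for ai, bi in zip(a, b)]
    w = [weight(p) for p in ps]
    dp = [ai * p * (1 - p) for ai, p in zip(a, ps)]  # P_i'(theta) under the 2PL
    # For the MLE, r_i = P_i' / (P_i (1 - P_i)) = a_i and r_0 = 0.
    c = sum(d * wi for d, wi in zip(dp, w)) / sum(d * ai for d, ai in zip(dp, a))
    w_tilde = [wi - c * ai for wi, ai in zip(w, a)]  # corrected weights
    numerator = sum((xi - p) * wi for xi, p, wi in zip(x, ps, w))
    variance = sum(wt ** 2 * p * (1 - p) for wt, p in zip(w_tilde, ps))
    if variance == 0.0:
        return float("nan")
    return numerator / math.sqrt(variance)


a = [1.0, 1.2, 0.8, 1.5, 1.0, 0.9]
b = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
print(snijders_star([1, 1, 1, 0, 0, 0], a, b), snijders_star([0, 0, 0, 1, 1, 1], a, b))
```

A symmetric weight penalizes misses on easy items and hits on hard items alike. An asymmetric weight would be passed through the same `weight` argument.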


Author(s):  
Rashid Al-Mehrzi

Wright's residual-based person fit indices were the first person fit indices developed for dichotomous IRT models, and they are commonly used in Rasch model software. A number of studies have suggested modifications to improve the statistical properties of Wright's indices, but the indices still lack good statistical properties. This study presents a new person fit index and shows how it can be interpreted and applied to detect person misfit. Using simulated data, the study also investigated the statistical properties and power of the new index and compared them with those of Wright's indices. The results showed that the new index had superior statistical properties under different test conditions and outperformed Wright's indices.


2012 ◽  
Vol 71 (2) ◽  
pp. 101-106 ◽  
Author(s):  
Raffaele Cioffi† ◽  
Anna Coluccia ◽  
Fabio Ferretti ◽  
Francesca Lorini ◽  
Aristide Saggino ◽  
...  

The present paper reexamines the psychometric properties of the Quality Perception Questionnaire (QPQ), an Italian survey instrument measuring patients’ perceptions of the quality of a recent hospital admission experience. The sample comprised 4,400 patients (mean age = 56.42 years, SD = 19.71 years; 48.8% female). The 14-item survey measures four factors: satisfaction with medical doctors, nursing staff, auxiliary staff, and hospital structures. First, we used confirmatory factor analysis (structural equation modeling, SEM) to test two models: a four-factor orthogonal model and a four-factor oblique model. The SEM fit indices and the χ² difference test favored the second (oblique) model. Second, we ran a bootstrap simulation with 1,000 replications, which confirmed the four-factor oblique solution. Third, we tested for significant differences with respect to age or sex. A multivariate general linear model showed no significant differences in the factors with respect to either.
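The χ² difference test mentioned above compares nested models. The orthogonal model is the oblique model with its C(4, 2) = 6 factor covariances fixed to zero, so the test has 6 degrees of freedom. The sketch below assumes ordinary (unscaled) maximum likelihood χ² values. Scaled statistics such as Satorra-Bentler require a scaled difference test instead. The closed-form tail probability used here is valid only for even degrees of freedom. The χ² and df values in the usage lines are hypothetical, not the study's results.

```python
import math


def chi2_sf_even_df(x, df):
    """P(X > x) for a chi-square variable; this closed form holds only for even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term = total = 1.0
    for i in range(1, df // 2):
        term *= half / i  # (x/2)^i / i!
        total += term
    return math.exp(-half) * total


def chi2_difference_test(chi2_restricted, df_restricted, chi2_full, df_full):
    """Likelihood-ratio test of a restricted model nested in a fuller model."""
    d_chi2 = chi2_restricted - chi2_full
    d_df = df_restricted - df_full
    return d_chi2, d_df, chi2_sf_even_df(d_chi2, d_df)


# Hypothetical values: orthogonal (restricted) vs oblique (full) four-factor model.
d_chi2, d_df, p = chi2_difference_test(chi2_restricted=350.0, df_restricted=77,
                                       chi2_full=300.0, df_full=71)
print(d_chi2, d_df, p)
```

A small p-value means that fixing the factor covariances to zero significantly worsens fit, which favors the oblique model.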
