The Effect of Person Misfit on Item Parameter Estimation and Classification Accuracy: A Simulation Study

2020 ◽  
Vol 10 (11) ◽  
pp. 324
Author(s):  
Amin Mousavi ◽  
Ying Cui

Important decisions about accountability and the placement of students into performance categories are often made on the basis of test scores, so it is important to evaluate the validity of the inferences drawn from test results. One threat to the validity of such inferences is aberrant responding. Several person fit indices have been developed to detect aberrant responding on educational and psychological tests, and the majority of the person fit literature has focused on creating and evaluating new indices. The aim of this simulation study was to assess the effect of aberrant responding on the accuracy of estimated item parameters and to refine estimation by using person fit statistics. Our results showed that the presence of aberrant response patterns biased both the b and a parameters at the item level and affected the classification of students, particularly high-performing students, into performance categories, regardless of whether the aberrant response patterns were retained in the data or removed. The results differed by test length and by the percentage of students with aberrant response patterns. Practical and theoretical implications are discussed.
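A minimal sketch of the effect the abstract describes, not the authors' code: responses are simulated under a 2PL model, a fraction of examinees is replaced with purely random responders, and item parameters are recovered by per-item logistic fits with ability treated as known (a deliberate simplification that avoids a full IRT calibration). All sample sizes and parameter ranges below are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_persons, n_items, pct_aberrant = 2000, 25, 0.15

theta = rng.normal(0, 1, n_persons)          # abilities (treated as known here)
a = rng.uniform(0.8, 2.0, n_items)           # discriminations
b = rng.normal(0, 1, n_items)                # difficulties

p = 1 / (1 + np.exp(-a * (theta[:, None] - b)))   # 2PL probabilities
y = (rng.random((n_persons, n_items)) < p).astype(int)

# Inject aberrance: a subset of examinees answers purely at random.
aberrant = rng.random(n_persons) < pct_aberrant
y[aberrant] = (rng.random((aberrant.sum(), n_items)) < 0.5).astype(int)

def recover(yy, tt):
    """Per-item logistic fit of response on theta: slope ~ a, -icpt/slope ~ b."""
    a_hat, b_hat = np.empty(n_items), np.empty(n_items)
    for j in range(n_items):
        m = LogisticRegression(C=1e6).fit(tt.reshape(-1, 1), yy[:, j])
        a_hat[j] = m.coef_[0, 0]
        b_hat[j] = -m.intercept_[0] / m.coef_[0, 0]
    return a_hat, b_hat

a_all, b_all = recover(y, theta)                       # contaminated sample
a_cln, b_cln = recover(y[~aberrant], theta[~aberrant]) # cleansed sample
print("mean bias in a, full sample :", np.mean(a_all - a))
print("mean bias in a, cleansed    :", np.mean(a_cln - a))
print("mean bias in b, full sample :", np.mean(b_all - b))
print("mean bias in b, cleansed    :", np.mean(b_cln - b))
```

Running this typically shows discrimination estimates pulled toward zero in the contaminated sample, the direction of bias the study examines.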

2020 ◽  
Vol 45 (6) ◽  
pp. 719-749
Author(s):  
Eduardo Doval ◽  
Pedro Delicado

We propose new methods for identifying and classifying aberrant response patterns (ARPs) by means of functional data analysis. These methods take an individual's person response function (PRF) and compare it with the pattern that would correspond to a generic individual of the same ability according to the item-person response surface; ARPs correspond to atypical difference functions. Classification is done by applying functional data clustering to the PRFs flagged as ARPs. We apply these methods to two sets of simulated data (the first illustrates the ARP identification methods; the second demonstrates classification of the response patterns flagged as ARPs) and to a real data set (a Grade 12 science assessment test, SAT, with 32 items answered by 600 examinees). For comparative purposes, ARPs are also identified with three nonparametric person-fit indices (Ht, the Modified Caution Index, and ZU3). Our results indicate that the ARP detection ability of one of our proposed methods is comparable to that of the person-fit indices. Moreover, the proposed classification methods make it possible to distinguish ARPs associated with spuriously low scores from ARPs associated with spuriously high scores.
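A hedged sketch of the core idea, not the authors' implementation: one examinee's 0/1 responses are kernel-smoothed over item difficulty to form a nonparametric PRF, which is then compared with the model-implied curve for a person of the same ability (a Rasch surface is assumed here for simplicity; bandwidth, abilities, and the injected aberrance are all illustrative).

```python
import numpy as np

rng = np.random.default_rng(1)
n_items = 40
b = np.sort(rng.normal(0, 1, n_items))       # item difficulties, easy -> hard
theta = 0.5                                   # examinee ability (assumed known)

p_model = 1 / (1 + np.exp(-(theta - b)))      # Rasch item-person surface
y = (rng.random(n_items) < p_model).astype(int)
y[-5:] = 1 - y[-5:]                           # corrupt the hardest items (aberrance)

def prf(y, b, grid, h=0.4):
    """Nadaraya-Watson smooth of responses over difficulty (Gaussian kernel)."""
    w = np.exp(-0.5 * ((grid[:, None] - b[None, :]) / h) ** 2)
    return (w * y).sum(axis=1) / w.sum(axis=1)

grid = np.linspace(b.min(), b.max(), 100)
observed = prf(y, b, grid)                    # empirical PRF
expected = 1 / (1 + np.exp(-(theta - grid)))  # PRF of a generic same-ability person

diff = observed - expected                    # ARP <-> atypical difference curve
print("RMS of difference function:", np.sqrt(np.mean(diff**2)))
```

In the paper's terms, the difference curves of many examinees would then be treated as functional data and clustered; here only the per-person difference function is computed.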


Author(s):  
Rashid Almehrizi

Most tests are composed of multiple sections (each a group of items), defined for example by item format, content category, competency, difficulty level, test dimension, testlet, or interpretive exercise. Students can show unexpected and unacceptable responses across these sections, and studying person fit at the item level cannot detect aberrant responding over test sections. This study proposes a residual-based person fit statistic over test sections under a dichotomous IRT model. The paper demonstrates the new section-level person fit statistic and investigates its distributional properties and its power to detect aberrance in person responses, in comparison with Wright's between person fit statistic. The proposed section-level person fit statistic shows superior distributional properties with both true and estimated ability and item parameters. Its performance is also examined with real data.
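A minimal sketch of a residual-based, section-level fit statistic under the Rasch model, illustrative rather than the paper's exact formula: for each section, the gap between the observed and expected section score is standardized by its binomial-type variance, and the squared standardized residuals are summed across sections.

```python
import numpy as np

rng = np.random.default_rng(2)
n_items, n_sections = 40, 4
sections = np.repeat(np.arange(n_sections), n_items // n_sections)

b = rng.normal(0, 1, n_items)                 # item difficulties (illustrative)
theta = 0.0
p = 1 / (1 + np.exp(-(theta - b)))            # Rasch response probabilities
y = (rng.random(n_items) < p).astype(int)

def section_fit(y, p, sections):
    """Sum over sections of squared standardized score residuals."""
    stat = 0.0
    for s in np.unique(sections):
        m = sections == s
        resid = y[m].sum() - p[m].sum()       # observed - expected section score
        var = (p[m] * (1 - p[m])).sum()       # variance of the section score
        stat += resid**2 / var
    return stat                               # roughly chi-square-like under fit

print("section-level fit statistic:", section_fit(y, p, sections))
```

A large value signals that a person's section scores deviate from model expectation in a way item-level statistics can miss, which is the motivation the abstract gives.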


2019 ◽  
Vol 44 (3) ◽  
pp. 309-341 ◽  
Author(s):  
Jeffrey M. Patton ◽  
Ying Cheng ◽  
Maxwell Hong ◽  
Qi Diao

In psychological and survey research, the prevalence and serious consequences of careless responses from unmotivated participants are well known. In this study, we propose to iteratively detect careless responders and cleanse the data by removing their responses. The careless responders are detected using person-fit statistics. In two simulation studies, the iterative procedure leads to nearly perfect power in detecting extremely careless responders and much higher power than the noniterative procedure in detecting moderately careless responders. Meanwhile, the false-positive error rate is close to the nominal level. In addition, item parameter estimation is much improved by iteratively cleansing the calibration sample. The bias in item discrimination and location parameter estimates is substantially reduced. The standard error estimates, which are spuriously small in the presence of careless responses, are corrected by the iterative cleansing procedure. An empirical example is also presented to illustrate the proposed procedure. These results suggest that the proposed procedure is a promising way to improve item parameter estimation for tests of 20 items or longer when data are contaminated by careless responses.
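A hedged sketch of the iterative cleanse-and-recalibrate loop, not the authors' implementation: the paper uses proper IRT calibration, whereas here a crude logit-of-proportion heuristic stands in for calibration so the loop runs end to end, and the classic lz statistic with a one-sided cutoff stands in for the person-fit flagging rule. Sample sizes, the 10% careless rate, and the -1.645 cutoff are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
n_persons, n_items = 1500, 30
theta = rng.normal(0, 1, n_persons)
b = rng.normal(0, 1, n_items)
p = 1 / (1 + np.exp(-(theta[:, None] - b)))
y = (rng.random((n_persons, n_items)) < p).astype(int)
careless = rng.random(n_persons) < 0.10
y[careless] = (rng.random((careless.sum(), n_items)) < 0.5).astype(int)

def lz(y, theta_hat, b_hat):
    """Standardized log-likelihood person-fit statistic (Rasch version)."""
    p = 1 / (1 + np.exp(-(theta_hat[:, None] - b_hat)))
    l0 = (y * np.log(p) + (1 - y) * np.log(1 - p)).sum(1)
    e = (p * np.log(p) + (1 - p) * np.log(1 - p)).sum(1)
    v = (p * (1 - p) * np.log(p / (1 - p)) ** 2).sum(1)
    return (l0 - e) / np.sqrt(v)

keep = np.ones(n_persons, bool)
for it in range(10):
    # crude "calibration" on the retained sample (stand-in for MML software)
    pj = y[keep].mean(0).clip(0.02, 0.98)
    b_hat = -np.log(pj / (1 - pj))
    pi = y.mean(1).clip(0.02, 0.98)
    theta_hat = np.log(pi / (1 - pi))
    flags = lz(y, theta_hat, b_hat) < -1.645      # one-sided flagging rule
    new_keep = ~flags
    if (new_keep == keep).all():                  # stop when the flag set is stable
        break
    keep = new_keep

print("flagged:", (~keep).sum(), "of", n_persons,
      "| hit rate on careless:", (~keep & careless).sum() / careless.sum())
```

The key design point the abstract makes is visible in the loop: each recalibration on the cleansed sample sharpens the person-fit statistic, which in turn flags responders the first pass missed.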


2020 ◽  
Author(s):  
Ode Zulaeha ◽  
Wardani Rahayu ◽  
Yuliatri Sastrawijaya

The purpose of this study is to measure the accuracy of item parameter and ability estimates under the multidimensional three-parameter logistic (M3PL) model, which applies to tests that measure more than one dimension of ability (θ). Item parameter and ability estimation for the M3PL model are examined for a sample size of 1,000 and test lengths of 15, 25, and 40 items. Response data are generated with the WinGen software and converted for calibration in BILOG. The results show that the estimates obtained with a test length of 15 display a median correlation of 0.787 (high). When item difficulty is higher, that is, when the questions given to respondents are more difficult, more respondents guess the answers. The estimated item and ability parameters indicate that scoring based on sample size strongly affects stability across test lengths. Under the M3PL model, the pseudo-guessing (c), difficulty (b), and discrimination (a) parameters can all be estimated, and MIRT is able to explain interactions between the items on the test and the answers of the participants. The estimated item parameters and the ability parameters of the participants also proved to be accurate and efficient.
Keywords: Multidimensional Three-Parameter Logistic (M3PL), distribution parameter, test length
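For concreteness, a minimal sketch of the M3PL item response function, P(y = 1 | θ) = c + (1 - c) / (1 + exp(-(a · θ + d))), evaluated for a two-dimensional ability vector; all parameter values below are illustrative, not taken from the study.

```python
import numpy as np

def m3pl(theta, a, d, c):
    """Multidimensional 3PL probability of a correct response."""
    return c + (1 - c) / (1 + np.exp(-(theta @ a + d)))

theta = np.array([0.5, -0.2])    # examinee ability on two dimensions
a = np.array([1.2, 0.7])         # discrimination vector (one per dimension)
d = -0.3                         # intercept (multidimensional "difficulty")
c = 0.2                          # pseudo-guessing lower asymptote

print("P(correct) =", m3pl(theta, a, d, c))
```

The lower asymptote c is what encodes guessing: even an examinee with very low ability on both dimensions answers correctly with probability at least c, which is why harder items invite more guessing, as the abstract notes.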


2017 ◽  
Vol 42 (5) ◽  
pp. 343-358
Author(s):  
Yan Xia ◽  
Yi Zheng

Snijders developed a family of person fit indices that asymptotically follow the standard normal distribution when the ability parameter is estimated. So far, lz*, U*, W*, ECI2z*, and ECI4z* from this family have been proposed in the literature. One property shared by lz*, U*, and W* (and, under some specific conditions, also ECI2z* and ECI4z*) is that they employ symmetric weight functions and thus flag spurious scores on easy and difficult items in the same manner. However, when the purpose is to detect only spuriously high scores on difficult items, arising for example from cheating, guessing, or item preknowledge, symmetric weight functions may reduce the detection rates for the target aberrant response patterns. By specifying two types of asymmetric weight functions, this study proposes SHa(λ)* (λ = 1/2 or 1) and SHb(β)* (β = 2 or 3) within Snijders's framework to specifically detect spuriously high scores on difficult items. Two simulation studies investigated the Type I error rates and empirical power of SHa(λ)* and SHb(β)*, compared with lz*, U*, W*, ECI2z*, and ECI4z*. The empirical results demonstrated satisfactory performance of the proposed indices, and recommendations are made on the choice of person fit index for specific purposes.
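A hedged sketch of the general idea, not the exact SHa*/SHb* definitions: a standardized weighted-residual statistic z = Σ w_j (y_j - p_j) / sqrt(Σ w_j² p_j (1 - p_j)), where the weight function w(p) decides which aberrance is emphasized. A symmetric weight such as w = log(p / (1 - p)) treats easy and hard items alike, while an asymmetric weight that grows as p falls targets spuriously high scores on difficult items. The asymmetric weight (1 - p)² below is an illustrative stand-in.

```python
import numpy as np

rng = np.random.default_rng(4)
n_items = 40
b = rng.normal(0, 1, n_items)
theta = -0.5
p = 1 / (1 + np.exp(-(theta - b)))            # Rasch probabilities
y = (rng.random(n_items) < p).astype(int)
hard = np.argsort(p)[:8]
y[hard] = 1                                    # fake "preknowledge" on hard items

def zstat(y, p, w):
    """Standardized weighted residual statistic for one examinee."""
    t = (w * (y - p)).sum()
    return t / np.sqrt((w**2 * p * (1 - p)).sum())

w_sym = np.log(p / (1 - p))                    # symmetric (lz-style) weight
w_asym = (1 - p) ** 2                          # illustrative asymmetric weight

print("symmetric-weight z :", zstat(y, p, w_sym))
print("asymmetric-weight z:", zstat(y, p, w_asym))
```

With this pattern the asymmetric weight produces a strongly positive z, a clear flag, whereas the symmetric weight dilutes the signal across easy and hard items, which is the motivation the abstract gives for the proposed indices.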


2017 ◽  
Vol 32 (3) ◽  
pp. 128-147
Author(s):  
Farida Agus Setiawati ◽  
Yulia Ayriza ◽  
Endah Retnowati ◽  
Rizki Nor Amelia

This study aims to identify the patterns of responses, the item parameters, and the possibility of gender bias in a career interest instrument developed by the authors on the basis of Holland's theory. The sample consisted of 576 elementary students in Daerah Istimewa Yogyakarta, recruited using the cluster random sampling method. Response patterns were analyzed under the two-parameter model using the BILOG program. The results were: (1) three items had response patterns that did not fit the model; (2) all items of the career interest instrument met good item parameter criteria; and (3) ten items were identified as containing Differential Item Functioning (DIF) related to gender bias, as shown by the Item Characteristic Curve (ICC). The implications are that the instrument can be used in assessing students' career interest and that information about biased items should be considered in career selection for male and female students, including in scoring and interpretation.
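An illustrative sketch of one simple DIF screen, not the authors' procedure (which inspected model-based ICCs): for each item, compare empirical characteristic curves for the two groups by conditioning on total score; a persistent gap between the group curves suggests DIF. The group coding, injected effect size, and score bins are all assumptions for the demo.

```python
import numpy as np

rng = np.random.default_rng(5)
n, n_items = 576, 30
group = rng.integers(0, 2, n)                 # 0 / 1 group labels (illustrative)
theta = rng.normal(0, 1, n)
b = rng.normal(0, 1, n_items)
p = 1 / (1 + np.exp(-(theta[:, None] - b)))
p[:, 0] = np.clip(p[:, 0] + 0.15 * group, 0, 1)   # inject DIF into item 0 only
y = (rng.random((n, n_items)) < p).astype(int)

def empirical_icc(y_item, total, bins):
    """Proportion correct within each total-score bin."""
    idx = np.digitize(total, bins)
    return np.array([y_item[idx == k].mean() if (idx == k).any() else np.nan
                     for k in range(len(bins) + 1)])

total = y.sum(1)
bins = np.quantile(total, [0.2, 0.4, 0.6, 0.8])
for j in (0, 1):                              # the DIF item vs. a clean item
    icc0 = empirical_icc(y[group == 0, j], total[group == 0], bins)
    icc1 = empirical_icc(y[group == 1, j], total[group == 1], bins)
    gap = np.nanmean(np.abs(icc0 - icc1))
    print(f"item {j}: mean |ICC gap| between groups = {gap:.3f}")
```

Conditioning on total score is the point of the design: it separates genuine group differences in ability from item-level bias, which is what a DIF flag is meant to capture.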

