The Effect of Person Misfit on Item Parameter Estimation and Classification Accuracy: A Simulation Study

2020 ◽  
Vol 10 (11) ◽  
pp. 324
Author(s):  
Amin Mousavi ◽  
Ying Cui

Important decisions about accountability and the placement of students into performance categories are often made on the basis of test scores, so it is important to evaluate the validity of the inferences drawn from test results. One threat to the validity of such inferences is aberrant responding. Several person fit indices have been developed to detect aberrant responding on educational and psychological tests, and the majority of the person fit literature has focused on creating and evaluating new indices. The aim of this simulation study was to assess the effect of aberrant responding on the accuracy of estimated item parameters and to refine estimation by using person fit statistics. Our results showed that the presence of aberrant response patterns biased both the b and a parameters at the item level and affected the classification of students, particularly high-performing students, into performance categories, regardless of whether the aberrant response patterns were retained in the data or removed. The results differed by test length and by the percentage of students with aberrant response patterns. Practical and theoretical implications are discussed.
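A minimal sketch of the effect the abstract describes, not the authors' code: responses are simulated under a 2PL model, a fraction of examinees is replaced with purely random responders, and item parameters are recovered by per-item logistic fits with ability treated as known (a deliberate simplification that avoids a full IRT calibration). All sample sizes and parameter ranges below are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_persons, n_items, pct_aberrant = 2000, 25, 0.15

theta = rng.normal(0, 1, n_persons)          # abilities (treated as known here)
a = rng.uniform(0.8, 2.0, n_items)           # discriminations
b = rng.normal(0, 1, n_items)                # difficulties

p = 1 / (1 + np.exp(-a * (theta[:, None] - b)))   # 2PL probabilities
y = (rng.random((n_persons, n_items)) < p).astype(int)

# Inject aberrance: a subset of examinees answers purely at random.
aberrant = rng.random(n_persons) < pct_aberrant
y[aberrant] = (rng.random((aberrant.sum(), n_items)) < 0.5).astype(int)

def recover(yy, tt):
    """Per-item logistic fit of response on theta: slope ~ a, -icpt/slope ~ b."""
    a_hat, b_hat = np.empty(n_items), np.empty(n_items)
    for j in range(n_items):
        m = LogisticRegression(C=1e6).fit(tt.reshape(-1, 1), yy[:, j])
        a_hat[j] = m.coef_[0, 0]
        b_hat[j] = -m.intercept_[0] / m.coef_[0, 0]
    return a_hat, b_hat

a_all, b_all = recover(y, theta)                       # contaminated sample
a_cln, b_cln = recover(y[~aberrant], theta[~aberrant]) # cleansed sample
print("mean bias in a, full sample :", np.mean(a_all - a))
print("mean bias in a, cleansed    :", np.mean(a_cln - a))
print("mean bias in b, full sample :", np.mean(b_all - b))
print("mean bias in b, cleansed    :", np.mean(b_cln - b))
```

Running this typically shows discrimination estimates pulled toward zero in the contaminated sample, the direction of bias the study examines.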

2020 ◽  
Vol 45 (6) ◽  
pp. 719-749
Author(s):  
Eduardo Doval ◽  
Pedro Delicado

We propose new methods for identifying and classifying aberrant response patterns (ARPs) by means of functional data analysis. These methods take an individual's person response function (PRF) and compare it with the pattern that would correspond to a generic individual of the same ability according to the item-person response surface; ARPs correspond to atypical difference functions. Classification is done by applying functional data clustering to the PRFs flagged as ARPs. We apply these methods to two sets of simulated data (the first illustrates the ARP identification methods; the second demonstrates classification of the response patterns flagged as ARPs) and to a real data set (a Grade 12 science assessment test, SAT, with 32 items answered by 600 examinees). For comparative purposes, ARPs are also identified with three nonparametric person-fit indices (Ht, the Modified Caution Index, and ZU3). Our results indicate that the ARP detection ability of one of our proposed methods is comparable to that of the person-fit indices. Moreover, the proposed classification methods make it possible to distinguish ARPs associated with spuriously low scores from ARPs associated with spuriously high scores.
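A hedged sketch of the core idea, not the authors' implementation: one examinee's 0/1 responses are kernel-smoothed over item difficulty to form a nonparametric PRF, which is then compared with the model-implied curve for a person of the same ability (a Rasch surface is assumed here for simplicity; bandwidth, abilities, and the injected aberrance are all illustrative).

```python
import numpy as np

rng = np.random.default_rng(1)
n_items = 40
b = np.sort(rng.normal(0, 1, n_items))       # item difficulties, easy -> hard
theta = 0.5                                   # examinee ability (assumed known)

p_model = 1 / (1 + np.exp(-(theta - b)))      # Rasch item-person surface
y = (rng.random(n_items) < p_model).astype(int)
y[-5:] = 1 - y[-5:]                           # corrupt the hardest items (aberrance)

def prf(y, b, grid, h=0.4):
    """Nadaraya-Watson smooth of responses over difficulty (Gaussian kernel)."""
    w = np.exp(-0.5 * ((grid[:, None] - b[None, :]) / h) ** 2)
    return (w * y).sum(axis=1) / w.sum(axis=1)

grid = np.linspace(b.min(), b.max(), 100)
observed = prf(y, b, grid)                    # empirical PRF
expected = 1 / (1 + np.exp(-(theta - grid)))  # PRF of a generic same-ability person

diff = observed - expected                    # ARP <-> atypical difference curve
print("RMS of difference function:", np.sqrt(np.mean(diff**2)))
```

In the paper's terms, the difference curves of many examinees would then be treated as functional data and clustered; here only the per-person difference function is computed.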


Author(s):  
Rashid Almehrizi

Most tests are composed of multiple sections (each a group of items), defined for example by item format, content category, competency, difficulty level, test dimension, testlet, or interpretive exercise. Students can show unexpected and unacceptable responses across these sections, and studying person fit at the item level cannot detect aberrant responding over test sections. This study proposes a residual-based person fit statistic over test sections under a dichotomous IRT model. The paper demonstrates the new section-level person fit statistic and investigates its distributional properties and its power to detect aberrance in person responses, in comparison with Wright's between person fit statistic. The proposed section-level person fit statistic shows superior distributional properties with both true and estimated ability and item parameters. Its performance is also examined with real data.
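A minimal sketch of a residual-based, section-level fit statistic under the Rasch model, illustrative rather than the paper's exact formula: for each section, the gap between the observed and expected section score is standardized by its binomial-type variance, and the squared standardized residuals are summed across sections.

```python
import numpy as np

rng = np.random.default_rng(2)
n_items, n_sections = 40, 4
sections = np.repeat(np.arange(n_sections), n_items // n_sections)

b = rng.normal(0, 1, n_items)                 # item difficulties (illustrative)
theta = 0.0
p = 1 / (1 + np.exp(-(theta - b)))            # Rasch response probabilities
y = (rng.random(n_items) < p).astype(int)

def section_fit(y, p, sections):
    """Sum over sections of squared standardized score residuals."""
    stat = 0.0
    for s in np.unique(sections):
        m = sections == s
        resid = y[m].sum() - p[m].sum()       # observed - expected section score
        var = (p[m] * (1 - p[m])).sum()       # variance of the section score
        stat += resid**2 / var
    return stat                               # roughly chi-square-like under fit

print("section-level fit statistic:", section_fit(y, p, sections))
```

A large value signals that a person's section scores deviate from model expectation in a way item-level statistics can miss, which is the motivation the abstract gives.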


2019 ◽  
Vol 44 (3) ◽  
pp. 309-341 ◽  
Author(s):  
Jeffrey M. Patton ◽  
Ying Cheng ◽  
Maxwell Hong ◽  
Qi Diao

In psychological and survey research, the prevalence and serious consequences of careless responses from unmotivated participants are well known. In this study, we propose to iteratively detect careless responders and cleanse the data by removing their responses. The careless responders are detected using person-fit statistics. In two simulation studies, the iterative procedure leads to nearly perfect power in detecting extremely careless responders and much higher power than the noniterative procedure in detecting moderately careless responders. Meanwhile, the false-positive error rate is close to the nominal level. In addition, item parameter estimation is much improved by iteratively cleansing the calibration sample. The bias in item discrimination and location parameter estimates is substantially reduced. The standard error estimates, which are spuriously small in the presence of careless responses, are corrected by the iterative cleansing procedure. An empirical example is also presented to illustrate the proposed procedure. These results suggest that the proposed procedure is a promising way to improve item parameter estimation for tests of 20 items or longer when data are contaminated by careless responses.
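A hedged sketch of the iterative cleanse-and-recalibrate loop, not the authors' implementation: the paper uses proper IRT calibration, whereas here a crude logit-of-proportion heuristic stands in for calibration so the loop runs end to end, and the classic lz statistic with a one-sided cutoff stands in for the person-fit flagging rule. Sample sizes, the 10% careless rate, and the -1.645 cutoff are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
n_persons, n_items = 1500, 30
theta = rng.normal(0, 1, n_persons)
b = rng.normal(0, 1, n_items)
p = 1 / (1 + np.exp(-(theta[:, None] - b)))
y = (rng.random((n_persons, n_items)) < p).astype(int)
careless = rng.random(n_persons) < 0.10
y[careless] = (rng.random((careless.sum(), n_items)) < 0.5).astype(int)

def lz(y, theta_hat, b_hat):
    """Standardized log-likelihood person-fit statistic (Rasch version)."""
    p = 1 / (1 + np.exp(-(theta_hat[:, None] - b_hat)))
    l0 = (y * np.log(p) + (1 - y) * np.log(1 - p)).sum(1)
    e = (p * np.log(p) + (1 - p) * np.log(1 - p)).sum(1)
    v = (p * (1 - p) * np.log(p / (1 - p)) ** 2).sum(1)
    return (l0 - e) / np.sqrt(v)

keep = np.ones(n_persons, bool)
for it in range(10):
    # crude "calibration" on the retained sample (stand-in for MML software)
    pj = y[keep].mean(0).clip(0.02, 0.98)
    b_hat = -np.log(pj / (1 - pj))
    pi = y.mean(1).clip(0.02, 0.98)
    theta_hat = np.log(pi / (1 - pi))
    flags = lz(y, theta_hat, b_hat) < -1.645      # one-sided flagging rule
    new_keep = ~flags
    if (new_keep == keep).all():                  # stop when the flag set is stable
        break
    keep = new_keep

print("flagged:", (~keep).sum(), "of", n_persons,
      "| hit rate on careless:", (~keep & careless).sum() / careless.sum())
```

The key design point the abstract makes is visible in the loop: each recalibration on the cleansed sample sharpens the person-fit statistic, which in turn flags responders the first pass missed.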


2020 ◽  
Author(s):  
Ode Zulaeha ◽  
Wardani Rahayu ◽  
Yuliatri Sastrawijaya

The purpose of this study is to measure the accuracy of item parameter and ability estimates under the multidimensional three-parameter logistic (M3PL) model, which applies to tests that measure more than one dimension of ability (θ). Item parameter and ability estimation for the M3PL model are examined for a sample size of 1,000 and test lengths of 15, 25, and 40 items. Response data are generated with the WinGen software and converted for calibration in BILOG. The results show that the estimates obtained with a test length of 15 display a median correlation of 0.787 (high). When item difficulty is higher, that is, when the questions given to respondents are more difficult, more respondents guess the answers. The estimated item and ability parameters indicate that scoring based on sample size strongly affects stability across test lengths. Under the M3PL model, the pseudo-guessing (c), difficulty (b), and discrimination (a) parameters can all be estimated, and MIRT is able to explain interactions between the items on the test and the answers of the participants. The estimated item parameters and the ability parameters of the participants also proved to be accurate and efficient.
Keywords: Multidimensional Three-Parameter Logistic (M3PL), distribution parameter, test length
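For concreteness, a minimal sketch of the M3PL item response function, P(y = 1 | θ) = c + (1 - c) / (1 + exp(-(a · θ + d))), evaluated for a two-dimensional ability vector; all parameter values below are illustrative, not taken from the study.

```python
import numpy as np

def m3pl(theta, a, d, c):
    """Multidimensional 3PL probability of a correct response."""
    return c + (1 - c) / (1 + np.exp(-(theta @ a + d)))

theta = np.array([0.5, -0.2])    # examinee ability on two dimensions
a = np.array([1.2, 0.7])         # discrimination vector (one per dimension)
d = -0.3                         # intercept (multidimensional "difficulty")
c = 0.2                          # pseudo-guessing lower asymptote

print("P(correct) =", m3pl(theta, a, d, c))
```

The lower asymptote c is what encodes guessing: even an examinee with very low ability on both dimensions answers correctly with probability at least c, which is why harder items invite more guessing, as the abstract notes.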


2017 ◽  
Vol 42 (5) ◽  
pp. 343-358
Author(s):  
Yan Xia ◽  
Yi Zheng

Snijders developed a family of person fit indices that asymptotically follow the standard normal distribution when the ability parameter is estimated. So far, lz*, U*, W*, ECI2z*, and ECI4z* from this family have been proposed in the literature. One property shared by lz*, U*, and W* (and, under some specific conditions, also ECI2z* and ECI4z*) is that they employ symmetric weight functions and thus flag spurious scores on easy and difficult items in the same manner. However, when the purpose is to detect only spuriously high scores on difficult items, arising for example from cheating, guessing, or item preknowledge, symmetric weight functions may reduce the detection rates for the target aberrant response patterns. By specifying two types of asymmetric weight functions, this study proposes SHa(λ)* (λ = 1/2 or 1) and SHb(β)* (β = 2 or 3) within Snijders's framework to specifically detect spuriously high scores on difficult items. Two simulation studies investigated the Type I error rates and empirical power of SHa(λ)* and SHb(β)*, compared with lz*, U*, W*, ECI2z*, and ECI4z*. The empirical results demonstrated satisfactory performance of the proposed indices, and recommendations are made on the choice of person fit index for specific purposes.
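A hedged sketch of the general idea, not the exact SHa*/SHb* definitions: a standardized weighted-residual statistic z = Σ w_j (y_j - p_j) / sqrt(Σ w_j² p_j (1 - p_j)), where the weight function w(p) decides which aberrance is emphasized. A symmetric weight such as w = log(p / (1 - p)) treats easy and hard items alike, while an asymmetric weight that grows as p falls targets spuriously high scores on difficult items. The asymmetric weight (1 - p)² below is an illustrative stand-in.

```python
import numpy as np

rng = np.random.default_rng(4)
n_items = 40
b = rng.normal(0, 1, n_items)
theta = -0.5
p = 1 / (1 + np.exp(-(theta - b)))            # Rasch probabilities
y = (rng.random(n_items) < p).astype(int)
hard = np.argsort(p)[:8]
y[hard] = 1                                    # fake "preknowledge" on hard items

def zstat(y, p, w):
    """Standardized weighted residual statistic for one examinee."""
    t = (w * (y - p)).sum()
    return t / np.sqrt((w**2 * p * (1 - p)).sum())

w_sym = np.log(p / (1 - p))                    # symmetric (lz-style) weight
w_asym = (1 - p) ** 2                          # illustrative asymmetric weight

print("symmetric-weight z :", zstat(y, p, w_sym))
print("asymmetric-weight z:", zstat(y, p, w_asym))
```

With this pattern the asymmetric weight produces a strongly positive z, a clear flag, whereas the symmetric weight dilutes the signal across easy and hard items, which is the motivation the abstract gives for the proposed indices.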


2017 ◽  
Vol 32 (3) ◽  
pp. 128-147
Author(s):  
Farida Agus Setiawati ◽  
Yulia Ayriza ◽  
Endah Retnowati ◽  
Rizki Nor Amelia

This study aims to identify the patterns of responses, the item parameters, and the possibility of gender bias in a career interest instrument developed by the authors on the basis of Holland's theory. The sample consisted of 576 elementary students in Daerah Istimewa Yogyakarta, recruited using the cluster random sampling method. Response patterns were analyzed under the two-parameter model using the BILOG program. The results were: (1) three items had response patterns that did not fit the model; (2) all items of the career interest instrument met good item parameter criteria; and (3) ten items were identified as containing Differential Item Functioning (DIF) related to gender bias, as shown by the Item Characteristic Curve (ICC). The implications are that the instrument can be used in assessing students' career interest and that information about biased items should be considered in career selection for male and female students, including in scoring and interpretation.
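An illustrative sketch of one simple DIF screen, not the authors' procedure (which inspected model-based ICCs): for each item, compare empirical characteristic curves for the two groups by conditioning on total score; a persistent gap between the group curves suggests DIF. The group coding, injected effect size, and score bins are all assumptions for the demo.

```python
import numpy as np

rng = np.random.default_rng(5)
n, n_items = 576, 30
group = rng.integers(0, 2, n)                 # 0 / 1 group labels (illustrative)
theta = rng.normal(0, 1, n)
b = rng.normal(0, 1, n_items)
p = 1 / (1 + np.exp(-(theta[:, None] - b)))
p[:, 0] = np.clip(p[:, 0] + 0.15 * group, 0, 1)   # inject DIF into item 0 only
y = (rng.random((n, n_items)) < p).astype(int)

def empirical_icc(y_item, total, bins):
    """Proportion correct within each total-score bin."""
    idx = np.digitize(total, bins)
    return np.array([y_item[idx == k].mean() if (idx == k).any() else np.nan
                     for k in range(len(bins) + 1)])

total = y.sum(1)
bins = np.quantile(total, [0.2, 0.4, 0.6, 0.8])
for j in (0, 1):                              # the DIF item vs. a clean item
    icc0 = empirical_icc(y[group == 0, j], total[group == 0], bins)
    icc1 = empirical_icc(y[group == 1, j], total[group == 1], bins)
    gap = np.nanmean(np.abs(icc0 - icc1))
    print(f"item {j}: mean |ICC gap| between groups = {gap:.3f}")
```

Conditioning on total score is the point of the design: it separates genuine group differences in ability from item-level bias, which is what a DIF flag is meant to capture.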

