Quantitatively ranking incorrect responses to multiple-choice questions using item response theory

Author(s): Trevor I. Smith, Kyle J. Louis, Bartholomew J. Ricci, Nasrine Bendjilali
Author(s): Adam J. Berinsky, Michele F. Margolis, Michael W. Sances, Christopher Warshaw

Abstract: Inattentive respondents introduce noise into data sets, weakening correlations between items and increasing the likelihood of null findings. “Screeners” have been proposed as a way to identify inattentive respondents, but questions remain regarding their implementation. First, what is the optimal number of Screeners for identifying inattentive respondents? Second, what types of Screener questions best capture inattention? In this paper, we address both of these questions. Using item response theory to aggregate individual Screeners, we find that four Screeners are sufficient to identify inattentive respondents. Moreover, two grid and two multiple-choice questions work well. Our findings are relevant to applied survey research in political science and other disciplines. Most importantly, our recommendations enable the standardization of Screeners on future surveys.
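As an illustration of the aggregation step described above, the sketch below fits respondent-level ability (here read as "attentiveness") from binary Screener passes under a two-parameter logistic (2PL) IRT model and flags low scorers. This is a minimal sketch, not the authors' estimation code: the item parameters, the response matrix, and the flagging cutoff of −1.0 are hypothetical, and the per-respondent maximum-likelihood step is a simplification of the marginal estimation typically used in practice.

```python
# Minimal sketch: estimate respondent "attentiveness" from binary Screener
# passes with a 2PL IRT model, then flag low scorers as likely inattentive.
# Item parameters (a, b) and the cutoff are illustrative, not from the paper.
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import expit  # logistic function

# Hypothetical calibrated item parameters for four Screeners:
a = np.array([1.2, 0.9, 1.5, 1.1])    # discrimination
b = np.array([-0.5, 0.0, 0.3, -0.2])  # difficulty

def neg_log_lik(theta, y):
    """Negative 2PL log-likelihood for one respondent's pass/fail vector y."""
    p = expit(a * (theta - b))          # P(pass item j | theta)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

def estimate_theta(y):
    """Maximum-likelihood attentiveness estimate on a bounded interval."""
    return minimize_scalar(neg_log_lik, args=(y,), bounds=(-4, 4),
                           method="bounded").x

responses = np.array([[1, 1, 1, 1],   # passes all Screeners
                      [0, 1, 0, 0],   # likely inattentive
                      [1, 0, 1, 1]])
thetas = np.array([estimate_theta(y) for y in responses])
flagged = thetas < -1.0               # illustrative cutoff for inattention
print(np.round(thetas, 2), flagged)
```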


2020, Vol 78 (4), pp. 576-594
Author(s): Bing Jia, Dan He, Zhemin Zhu

The quality of multiple-choice questions (MCQs), as well as students' response behavior on MCQs, is an educational concern. MCQs cover a wide range of educational content and can be scored immediately and accurately. However, many studies have found flawed items in this exam type, which can yield misleading insights into students' performance and affect important decisions. This research sought to determine the characteristics of MCQs and the factors that may affect their quality by using item response theory (IRT) to evaluate exam data. Four samples of different sizes, drawn from secondary and higher education in the US and China, were chosen. Item difficulty and discrimination were estimated using IRT item analysis models. Results were as follows. First, guessing played only a minor role in the MCQ exams, because all data sets fit the two-parameter logistic model better than the three-parameter logistic model. Second, the quality of MCQs depended more on the degree of training of the examiners than on whether the level was secondary or higher education. Lastly, MCQs must be evaluated to ensure that only high-quality items are used as bases of inference in secondary and higher education.

Keywords: higher education, item evaluation, item response theory, multiple-choice test, secondary education
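For reference, the model comparison above rests on the standard two- and three-parameter logistic item response functions; the 3PL adds a pseudo-guessing parameter $c_j$, whose contribution the better fit of the 2PL suggests is negligible for these data:

$$P_{\text{2PL}}(X_{ij}=1 \mid \theta_i) = \frac{1}{1 + e^{-a_j(\theta_i - b_j)}}, \qquad P_{\text{3PL}}(X_{ij}=1 \mid \theta_i) = c_j + (1 - c_j)\,\frac{1}{1 + e^{-a_j(\theta_i - b_j)}},$$

where $\theta_i$ is the examinee's ability, $a_j$ the item's discrimination, $b_j$ its difficulty, and $c_j$ its lower asymptote (pseudo-guessing) parameter.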


2021, pp. 097226292110019
Author(s): Isha Bajaj, Mandeep Kaur

Knowledge about financial products and services is essential for making rational financial decisions. Financial knowledge is a broad construct and is therefore difficult to measure, and previous studies have used various methods and instruments to do so; a comprehensive, validated instrument is still needed. In this study, an attempt has been made to measure financial knowledge using a scale of multiple-choice questions covering basic and specific financial knowledge related to banking products and services. Each correct answer is scored ‘1’ and each wrong answer ‘0’. This dichotomous scale has been validated using item response theory, which assesses the appropriateness of the questions (items) included in the scale with respect to difficulty and discrimination. The results reveal that the overall instrument fulfils both criteria: the twenty-two-item test is reliable and valid, with discrimination indices that are all positive, ranging from 0.23 to 1.96, and difficulty indices ranging from –5.66 to 0.90. The purpose of this article is to encourage the use of validated scales for measuring financial knowledge and to reduce the confusion between financial knowledge and financial literacy.
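The sketch below illustrates the two mechanical steps described above, dichotomous scoring against an answer key and screening fitted 2PL item parameters against simple acceptance rules (positive discrimination, difficulty within a plausible range). It is an assumption-laden illustration: the answer key, responses, parameter values, and screening thresholds are hypothetical, not the study's data or estimates.

```python
# Minimal sketch: score MCQ responses dichotomously, then screen items by
# their (hypothetical) 2PL difficulty and discrimination estimates.
import numpy as np

answer_key = np.array([2, 0, 3, 1])               # hypothetical correct options
raw_choices = np.array([[2, 0, 3, 1],
                        [2, 1, 3, 0],
                        [0, 0, 2, 1]])
scores = (raw_choices == answer_key).astype(int)  # 1 = correct, 0 = wrong
print("item proportions correct:", scores.mean(axis=0))

# Hypothetical 2PL estimates (discrimination a, difficulty b) per item,
# chosen to echo the ranges reported in the abstract:
a = np.array([0.23, 1.10, 1.96, 0.85])
b = np.array([-5.66, -1.20, 0.10, 0.90])

acceptable = (a > 0) & (b > -6.0) & (b < 3.0)     # illustrative screening rules
for j, ok in enumerate(acceptable):
    print(f"item {j}: a={a[j]:.2f}, b={b[j]:.2f}, keep={bool(ok)}")
```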


Author(s): Andre F. De Champlain, Andre-Philippe Boulais, Andrew Dallas

Purpose: The aim of this research was to compare different methods of calibrating the multiple-choice question (MCQ) and clinical decision-making (CDM) components of the Medical Council of Canada’s Qualifying Examination Part I (MCCQEI) based on item response theory (IRT). Methods: Our data consisted of test results from 8,213 first-time applicants to the MCCQEI in the spring and fall 2010 and 2011 test administrations. The data set contained several thousand multiple-choice items and several hundred CDM cases. Four dichotomous calibrations were run using BILOG-MG 3.0, and all three mixed-item-format calibrations (dichotomous MCQ responses and polytomous CDM case scores) were conducted using PARSCALE 4. Results: The 2-PL model had identical numbers of items with chi-square values at or below a Type I error rate of 0.01 (83/3,499, or 0.02). In all three polytomous models, whether the MCQs were anchored or concurrently run with the CDM cases, the results suggest very poor fit. All IRT abilities estimated from the dichotomous calibration designs correlated very highly with each other. IRT-based pass-fail rates were extremely similar, not only across calibration designs and methods but also with regard to the decisions actually reported to candidates. The largest difference in pass rates was 4.78%, which occurred between the mixed-format concurrent 2-PL graded response model (pass rate = 80.43%) and the dichotomous anchored 1-PL calibration (pass rate = 85.21%). Conclusion: Simpler calibration designs with dichotomized items should be implemented, as the dichotomous calibrations provided a better fit to the item response matrix than the more complex polytomous calibrations.
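The comparison of pass rates across calibration designs can be illustrated with the short sketch below: given ability estimates from two designs and a common cut score, it computes each design's pass rate and the rate of agreement on individual pass/fail decisions. This is a sketch under stated assumptions, with simulated ability estimates and an arbitrary cut score rather than MCCQEI data.

```python
# Minimal sketch: compare pass rates and decision agreement between two
# sets of IRT ability estimates under a common cut score. Simulated data.
import numpy as np

rng = np.random.default_rng(0)
theta_dichotomous = rng.normal(0.0, 1.0, size=8213)              # design A estimates
theta_mixed = theta_dichotomous + rng.normal(0.0, 0.2, size=8213)  # design B estimates
cut_score = -0.9                                                  # illustrative cut

pass_a = np.mean(theta_dichotomous >= cut_score)
pass_b = np.mean(theta_mixed >= cut_score)
agreement = np.mean((theta_dichotomous >= cut_score) == (theta_mixed >= cut_score))
print(f"pass rate A: {pass_a:.2%}, pass rate B: {pass_b:.2%}, "
      f"decision agreement: {agreement:.2%}")
```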

