Obtaining Classical Reliability Terms from Item Response Theory in Multiple Choice Tests

Author(s):  
Halil YURDUGÜL
2018
Vol 22 (2)
pp. 219-230
Author(s):  
Khoirul Bashooir ◽  
Supahar Supahar

Validity and Reliability of a Scientific Literacy Performance Assessment Instrument in STEM-Based Physics Teaching

Abstract: This research is part of the development of a STEM-based scientific literacy performance assessment for physics teaching. Its aim is to establish the content validity, empirical validity, and reliability of the previously developed assessment instruments: an observation sheet and a multiple choice test. The content validity of the observation sheet was assessed with Aiken's V coefficient, while the content validity of the multiple choice test was assessed with the Content Validity Index (CVI) proposed by Lawshe. The empirical validity and reliability of the multiple choice test were estimated with Item Response Theory (IRT) analysis, and the reliability of the observation sheet was estimated with the intraclass correlation coefficient (ICC). The results show that (1) the observation sheet, consisting of a scoring rubric and a self-assessment, is valid, with Aiken's V values at or above the 0.75 standard, and reliable, with alpha reliability > 0.8 and an "excellent" ICC; and (2) the multiple choice test has CVI = 1 and INFIT MNSQ values that fit the Rasch model, and, based on the test information curve (TIC) and SEM graphs, is reliable for students of moderate to high ability (-0.7 to 6.7). The STEM-based scientific literacy performance assessment on heat material is therefore appropriate for use.

Keywords: content validity, empirical validity, performance assessment, scientific literacy, STEM
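The two content-validity indices named in the abstract are simple closed-form statistics. A minimal sketch of both, with made-up rater data (the panel sizes and ratings below are illustrative, not from the study):

```python
# Sketch of Aiken's V and Lawshe's CVR/CVI; rater data is hypothetical.

def aikens_v(ratings, lo, hi):
    """Aiken's V: agreement that an element is relevant.
    ratings: one score per rater, on the scale [lo, hi]."""
    n = len(ratings)
    c = hi - lo + 1                      # number of rating categories
    s = sum(r - lo for r in ratings)     # summed distance from the lowest category
    return s / (n * (c - 1))

def lawshe_cvr(n_essential, n_raters):
    """Lawshe's content validity ratio for one item:
    (n_e - N/2) / (N/2), where n_e raters judged the item 'essential'."""
    half = n_raters / 2
    return (n_essential - half) / half

def cvi(cvr_values):
    """Content Validity Index: mean CVR over the retained items."""
    return sum(cvr_values) / len(cvr_values)

# Hypothetical panel: 5 experts rate one rubric element on a 1-5 scale
print(aikens_v([4, 5, 4, 5, 4], lo=1, hi=5))   # 0.85

# Hypothetical test: 5 experts all judge each of 3 items "essential",
# which is the only way to reach the CVI = 1 reported in the abstract
print(cvi([lawshe_cvr(5, 5)] * 3))             # 1.0
```

A CVI of 1, as reported for the written test, requires every expert to rate every retained item as essential.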


Author(s):  
Andre F. De Champlain ◽  
Andre-Philippe Boulais ◽  
Andrew Dallas

Purpose: The aim of this research was to compare different item response theory-based methods of calibrating the multiple choice question (MCQ) and clinical decision making (CDM) components of the Medical Council of Canada's Qualifying Examination Part I (MCCQEI). Methods: Our data consisted of test results from 8,213 first-time applicants to the MCCQEI in the spring and fall 2010 and 2011 test administrations. The data set contained several thousand multiple choice items and several hundred CDM cases. Four dichotomous calibrations were run using BILOG-MG 3.0, and all three mixed-format calibrations (dichotomous MCQ responses and polytomous CDM case scores) were conducted using PARSCALE 4. Results: The 2-PL model had identical numbers of items with chi-square values at or below a Type I error rate of 0.01 (83/3,499, or 0.02). In all three polytomous models, whether the MCQs were anchored or run concurrently with the CDM cases, the results suggest very poor fit. All IRT ability estimates from the dichotomous calibration designs correlated very highly with each other. IRT-based pass-fail rates were extremely similar, not only across calibration designs and methods but also with respect to the decisions actually reported to candidates. The largest difference in pass rates was 4.78%, between the mixed-format concurrent 2-PL graded response model (pass rate = 80.43%) and the dichotomous anchored 1-PL calibration (pass rate = 85.21%). Conclusion: Simpler calibration designs with dichotomized items should be implemented, as the dichotomous calibrations provided better fit to the item response matrix than the more complex polytomous calibrations.

