Item Selection in Adaptive Test Design Based on Efficiency Balanced Information (PEMILIHAN BUTIR SOAL PADA RANCANGAN TES ADAPTIF BERDASARKAN EFFICIENCY BALANCED INFORMATION)

2014 ◽  
Vol 15 (1) ◽  
pp. 31-41
Author(s):  
Agus Santoso

One of the most popular item selection methods in adaptive test design is the Maximum Information method. Under this method, the item with maximum information at a given ability level is selected and administered to the examinee. Its weaknesses are that it is less accurate in estimating examinee ability at the beginning of the test and that it tends to select items with high discrimination parameters over items with low discrimination parameters, which creates problems for item bank maintenance. An alternative method is therefore needed. This study aims to determine the estimation performance of the Efficiency Balanced Information (EBI) method in an adaptive test design. The research was carried out as a simulation study set in the administration of final semester examinations at Universitas Terbuka (UT, the Open University). The item bank for the simulation was generated under the three-parameter Item Response Theory model; a total of 900 items were generated according to ideal item parameter specifications. Two item selection criteria were simulated, Maximum Information and Maximum EBI, both designed to satisfy content balancing. This ensures that the algorithm fits the modular learning applied at UT, meaning that items from each module are proportionally represented and conform to the blueprint. The test stops when the standard error of estimation (SEE) reaches 0.3. The study concluded that the adaptive test algorithm using the EBI criterion estimated examinee ability more accurately than the Maximum Information criterion, as indicated by its smaller bias and smaller standard deviation of measurement. Another advantage of the Maximum EBI criterion is more optimal use of the item bank, because items with low discrimination are also selected, particularly at the beginning of the test. The Maximum Information criterion, although more efficient in terms of test length, makes less optimal use of the item bank.
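The Maximum Information criterion and the SEE stopping rule described in this abstract can be sketched in a few lines. This is only an illustration under stated assumptions: it uses the standard 3PL item information function on the logistic metric (scaling constant D = 1), the item bank is a hypothetical four-item list rather than the study's 900 generated items, the function names are made up for this sketch, and neither content balancing nor the EBI criterion itself is implemented.

```python
import math

def p_3pl(theta, a, b, c):
    """Probability of a correct response under the 3PL model (assumes D = 1)."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b, c):
    """Fisher information of one 3PL item at ability theta."""
    p = p_3pl(theta, a, b, c)
    return a ** 2 * ((1.0 - p) / p) * ((p - c) / (1.0 - c)) ** 2

def select_max_info(theta_hat, bank, used):
    """Index of the unused item with the largest information at theta_hat."""
    candidates = [i for i in range(len(bank)) if i not in used]
    return max(candidates, key=lambda i: item_information(theta_hat, *bank[i]))

def see(theta_hat, bank, used):
    """Standard error of estimation from the items administered so far."""
    total = sum(item_information(theta_hat, *bank[i]) for i in used)
    return float("inf") if total == 0 else 1.0 / math.sqrt(total)

# Hypothetical bank of (a, b, c) triples, for illustration only.
bank = [(0.6, 0.0, 0.2), (1.8, 0.1, 0.2), (1.2, -1.5, 0.2), (2.0, 1.0, 0.2)]

# With a starting ability estimate of 0, the criterion picks item 1, a
# high-discrimination item located near the current estimate.
first = select_max_info(0.0, bank, used=set())
```

In a full adaptive test loop, the ability estimate would be updated after each response and items would keep being selected until `see(...)` drops to 0.3 or below. The low-discrimination item at index 0 is never preferred here while better-targeted, more discriminating items remain, which is the item bank utilization problem the abstract attributes to this criterion.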

2011 ◽  
Vol 35 (8) ◽  
pp. 604-622 ◽  
Author(s):  
Hirotaka Fukuhara ◽  
Akihito Kamata

A differential item functioning (DIF) detection method for testlet-based data was proposed and evaluated in this study. The proposed DIF model is an extension of a bifactor multidimensional item response theory (MIRT) model for testlets. Unlike traditional item response theory (IRT) DIF models, the proposed model takes testlet effects into account, thus estimating DIF magnitude appropriately when a test is composed of testlets. A fully Bayesian estimation method was adopted for parameter estimation. The recovery of parameters was evaluated for the proposed DIF model. Simulation results revealed that the proposed bifactor MIRT DIF model produced better estimates of DIF magnitude and higher DIF detection rates than the traditional IRT DIF model for all simulation conditions. A real data analysis was also conducted by applying the proposed DIF model to a statewide reading assessment data set.


2017 ◽  
Vol 8 (1) ◽  
pp. 14
Author(s):  
Rana Th. Momani

Item Response Theory has become one of the most popular methods for instrument development and evaluation. This baseline study evaluates a 40-item self-directed learning readiness (SDLR) scale at the item and scale levels using the graded response model (GRM), with data from 648 randomly selected female undergraduate psychology students at Qassim University in Saudi Arabia. The results provide more detailed diagnostic information for revising the scale. The GRM analysis detected two locally dependent items, one item with a low discrimination parameter, and 15 items showing model misfit. The scale tends to measure low and moderate levels of SDLR. Further psychometric evaluation is needed, and the SDLR scale should be reviewed on the basis of quantitative and qualitative analysis.


2016 ◽  
Vol 59 (2) ◽  
pp. 373-383 ◽  
Author(s):  
J. Mirjam Boeschen Hospers ◽  
Niels Smits ◽  
Cas Smits ◽  
Mariska Stam ◽  
Caroline B. Terwee ◽  
...  

Purpose: We reevaluated the psychometric properties of the Amsterdam Inventory for Auditory Disability and Handicap (AIADH; Kramer, Kapteyn, Festen, & Tobi, 1995) using item response theory. Item response theory describes item functioning along an ability continuum. Method: Cross-sectional data from 2,352 adults with and without hearing impairment, ages 18–70 years, were analyzed. They completed the AIADH in the web-based prospective cohort study “Netherlands Longitudinal Study on Hearing.” A graded response model was fitted to the AIADH data. Category response curves, item information curves, and the standard error as a function of self-reported hearing ability were plotted. Results: The graded response model showed a good fit. Item information curves were most reliable for adults who reported having hearing disability and less reliable for adults with normal hearing. The standard error plot showed that self-reported hearing ability is most reliably measured for adults reporting mild up to moderate hearing disability. Conclusions: This is one of the few item response theory studies on audiological self-reports. All AIADH items could be hierarchically placed on the self-reported hearing ability continuum, meaning they measure the same construct. This provides a promising basis for developing a clinically useful computerized adaptive test, where item selection adapts to the hearing ability of individuals, resulting in efficient assessment of hearing disability.
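The category response curves mentioned in this abstract come from the graded response model, in which each category probability is the difference between two adjacent cumulative logistic curves. The sketch below shows that computation for a single item; the discrimination and threshold values are hypothetical and are not AIADH estimates, and the function name is invented for this illustration.

```python
import math

def grm_category_probs(theta, a, thresholds):
    """Category probabilities for one graded response model item.

    thresholds must be in increasing order; K thresholds give K + 1 categories.
    """
    # Cumulative probabilities P(X >= k), padded with 1 (lowest category)
    # and 0 (beyond the highest category).
    cum = [1.0]
    cum += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cum += [0.0]
    return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]

# Hypothetical item: discrimination 1.5, three thresholds, four categories.
probs = grm_category_probs(0.0, 1.5, [-1.0, 0.0, 1.0])
```

Evaluating `grm_category_probs` over a grid of `theta` values and plotting each category's probability would trace out the category response curves for that item.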


2010 ◽  
Vol 11 (11) ◽  
pp. 1109-1119 ◽  
Author(s):  
James W. Varni ◽  
Brian D. Stucky ◽  
David Thissen ◽  
Esi Morgan DeWitt ◽  
Debra E. Irwin ◽  
...  

2021 ◽  
Vol 21 (2) ◽  
pp. 133-140
Author(s):  
Mohamad Masykurin Mafauzy ◽  
Tuan Hairulnizam Tuan Kamauzaman ◽  
Wan Nor Arifin ◽  
Hadi Fadhil Mat Said ◽  
Fatimah Ismail ◽  
...  

Flooding is the most common natural disaster in Malaysia, with a huge impact on healthcare services. The FloodDMQ-BM© questionnaire was developed as a tool to assess the knowledge, attitude, and practice of healthcare providers regarding patient management during a flood disaster. We aimed to further validate the FloodDMQ-BM© questionnaire using Confirmatory Factor Analysis (CFA) and Item Response Theory (IRT). This cross-sectional study involved doctors, nurses, and paramedics working in the Emergency Departments of Hospital Universiti Sains Malaysia, Hospital Raja Perempuan Zainab II, and Hospital Kuala Krai. Respondents were required to complete the FloodDMQ-BM© questionnaire. The responses were analysed using CFA and IRT to establish its validity and reliability. A total of 215 respondents participated in this study. CFA of the attitude and practice components, with Maximum Likelihood Robust as the estimation method, resulted in good factor loadings (>0.5) for nearly all items and excellent model fit indices (CFI = 0.96-0.98, TLI = 0.95-0.96, SRMR = 0.04-0.05, RMSEA = 0.07). Meanwhile, IRT analysis of the knowledge section showed a good two-way marginal fit based on S-X² and a good model fit with an RMSEA of 0.08. In the 2PL IRT assessment of the knowledge section, one item (K3) was removed (chi-squared residual >4), resulting in improved model fit. The included items had well-standardized loadings (>0.3) and a marginal reliability of 0.651. Our results confirmed that the FloodDMQ-BM© questionnaire displayed valid and reliable psychometric properties.


2018 ◽  
Vol 22 (2) ◽  
pp. 130-142
Author(s):  
Thomas Mbenu Nulangi ◽  
Djemari Mardapi

This study aimed to describe (1) the characteristics of items based on Item Response Theory, (2) the level of cheating in the national examination based on Angoff’s B-Index method, the Pair 1 method, the Pair 2 method, the Modified Error Similarity Analysis (MESA) method, and the G2 method, and (3) the most accurate method for detecting cheating in the mathematics national examination at the senior secondary school level in the 2015/2016 academic year in East Nusa Tenggara Province. The item response theory analysis showed that 17 (42.5%) items of the mathematics national examination fit the 3-PL model, with a maximum information function of 58.0128 at an ability level of 1.6 and a measurement error of 0.1313. The number of pairs detected as cheating was 63 by Angoff’s B-Index method, 52 by the Pair 1 method, 141 by the Pair 2 method, 67 by the MESA method, and 183 by the G2 method. Ranked by the number of cheating pairs detected, from most to fewest, the methods were the G2 method, the Pair 2 method, the MESA method, Angoff’s B-Index method, and the Pair 1 method. Ranked by accuracy of detection based on the computed standard error, the methods were Angoff’s B-Index method, the G2 method, the MESA method, the Pair 1 method, and the Pair 2 method.
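The reported measurement error is consistent with the usual IRT relationship between information and standard error, SE = 1 / sqrt(I). The short check below reproduces the figure from the reported maximum information; it also infers the total number of items from the reported count and percentage, which is an inference from the abstract's numbers and not a figure the abstract states.

```python
import math

max_information = 58.0128           # reported maximum of the information function
sem = 1.0 / math.sqrt(max_information)

# 17 items are reported as 42.5% of the test, which implies the test length.
implied_test_length = 17 / 0.425

print(round(sem, 4), round(implied_test_length))  # prints: 0.1313 40
```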

