EFFICIENCY AND ACCURACY OF COMPUTERIZED ADAPTIVE TESTING IN THE END-OF-SEMESTER EXAMINATION SYSTEM OF UNIVERSITAS TERBUKA

2010 ◽  
Vol 11 (1) ◽  
pp. 1-9
Author(s):  
Agus Santoso

Universitas Terbuka (UT) uses an online examination system (sistem ujian online, SUO) for its end-of-semester examination (ujian akhir semester, UAS), alongside the paper-and-pencil test (P & P test). To improve efficiency, adaptive testing should be evaluated as an alternative to the present UAS system. The aim of the research was to compare the efficiency and accuracy of a computerized adaptive testing (CAT) design with those of the conventional test delivered by both P & P test and SUO. The research was conducted by simulation. The item bank for the simulation consisted of 404 test items calibrated with an item response theory model. CAT and P & P test algorithms were developed for the research. To measure efficiency, the number of items required by the CAT design was analyzed; to measure the accuracy of estimation, the bias and standard error of measurement of the two designs were compared. The simulation results showed that (1) the CAT design was more efficient, since it required only half the number of items used in the P & P test to estimate examinee ability, and (2) the CAT design was more accurate in estimating examinee ability than the P & P test design, since it yielded lower bias and a lower standard error of measurement. Therefore, the CAT design could be applied in UT's UAS system, provided that the balance of content across modules is taken into account.
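The simulation procedure described above can be illustrated with a minimal sketch. The abstract does not name the IRT model, the ability estimator, the item selection rule, or the stopping rule, so everything below is an assumption: a Rasch (1PL) model, EAP estimation on a grid with a standard normal prior, selection of the unused item whose difficulty is closest to the current estimate, and a stopping threshold `se_target`. The 404 difficulties are randomly generated placeholders, not the calibrated UT item bank.

```python
import math
import random

def p_correct(theta, b):
    """Rasch probability of a correct response (assumed model)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# Ability grid from -4.0 to 4.0 used for EAP estimation.
GRID = [g / 10.0 for g in range(-40, 41)]

def eap(responses):
    """Posterior mean and SD of ability under a N(0, 1) prior.

    responses is a list of (difficulty, answered_correctly) pairs.
    """
    weights = []
    for t in GRID:
        w = math.exp(-0.5 * t * t)  # unnormalized prior density
        for b, u in responses:
            p = p_correct(t, b)
            w *= p if u else 1.0 - p
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(GRID, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(GRID, weights)) / total
    return mean, math.sqrt(var)

def simulate_cat(true_theta, bank, rng, se_target=0.4, max_items=50):
    """Administer items adaptively until the posterior SD reaches se_target."""
    remaining = list(bank)
    responses = []
    theta_hat, se = 0.0, 1.0  # start at the prior mean and SD
    while remaining and len(responses) < max_items and se > se_target:
        # Under the Rasch model the most informative item is the one
        # whose difficulty is closest to the current ability estimate.
        b = min(remaining, key=lambda d: abs(d - theta_hat))
        remaining.remove(b)
        u = rng.random() < p_correct(true_theta, b)  # simulated response
        responses.append((b, u))
        theta_hat, se = eap(responses)
    return theta_hat, se, len(responses)

rng = random.Random(0)
bank = [rng.gauss(0.0, 1.0) for _ in range(404)]  # hypothetical difficulties
theta_hat, se, n_used = simulate_cat(0.5, bank, rng)
print(n_used, round(theta_hat, 2), round(se, 2))
```

Repeating `simulate_cat` over many simulated examinees and averaging `theta_hat - true_theta` would give a bias estimate comparable in spirit to the one the abstract reports, while `n_used` corresponds to the efficiency measure.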

2021 ◽  
Vol 10 (6) ◽  
pp. 22
Author(s):  
Habis Saad Al-zboon ◽  
Amjad Farhan Alrekebat ◽  
Mahmoud Sulaiman Bani Abdelrahman

This study aims to identify the effect of the difficulty of multiple-choice test items on the reliability coefficient and the standard error of measurement under item response theory (IRT). To achieve the objectives of the study, WinGen3 software was used to generate the IRT parameters (difficulty, discrimination, guessing) for four forms of the test. Each form consisted of 30 items, with different mean difficulty coefficients (-0.24, 0.24, 0.42, 0.93). The resulting item parameters were used to generate the abilities and responses of 3000 examinees based on the three-parameter model. These data were converted into a readable file using SPSS and BILOG-MG3. The reliability coefficients for the four test forms, the item parameters, and the item information functions were then calculated, and the information function values were used to calculate the standard error of measurement for each item. The results showed statistically significant differences (α ≤ 0.05) between the mean standard errors of measurement attributable to item difficulty, in favor of the test with the higher difficulty coefficient. The results also showed apparent differences between the test reliability coefficients attributable to test difficulty under the three-parameter model, in favor of the form with average difficulty.
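The link between the information function and the standard error of measurement can be sketched as follows. This is an illustration under stated assumptions, not the study's procedure: it uses the standard three-parameter logistic information formula in the logistic metric (no 1.7 scaling constant), and the function names and item parameters are hypothetical.

```python
import math

def p_3pl(theta, a, b, c):
    """Three-parameter logistic probability of a correct response."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b, c):
    """Item information under the 3PL model at ability theta."""
    p = p_3pl(theta, a, b, c)
    q = 1.0 - p
    return a ** 2 * (q / p) * ((p - c) / (1.0 - c)) ** 2

def sem_from_information(theta, items):
    """SEM at theta: reciprocal square root of the summed item information."""
    info = sum(item_information(theta, a, b, c) for a, b, c in items)
    return 1.0 / math.sqrt(info)

# Hypothetical 30-item form: discrimination 1.0, difficulty 0.24, guessing 0.2.
form = [(1.0, 0.24, 0.2)] * 30
print(round(sem_from_information(0.24, form), 3))
```

Because information peaks near an item's difficulty, a form whose difficulties sit far from an examinee's ability contributes little information there and the SEM at that ability grows.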


1966 ◽  
Vol 19 (2) ◽  
pp. 611-617 ◽  
Author(s):  
Donald W. Zimmerman ◽  
Richard H. Williams

It is shown that for the case of non-independence of true scores and error scores interpretation of the standard error of measurement is modified in two ways. First, the standard deviation of the distribution of error scores is given by a modified equation. Second, the confidence interval for true score varies with the individual's observed score. It is shown that the equation, $s_o = \sqrt{\left[(N - O)/a\right] + \left[s_o^2\,(r_{o\bar{o}} - r_{oo})/r_{o\bar{o}}\right]}$, where $N$ is the number of items, $O$ is the individual's observed score, $a$ is the number of choices per item, $s_o^2$ is observed variance, $r_{oo}$ is test reliability as empirically determined, and $r_{o\bar{o}}$ is reliability for the case where only non-independent error is present, provides a more accurate interpretation of the test score of an individual.


1993 ◽  
Vol 2 (2) ◽  
pp. 97-103 ◽  
Author(s):  
Kelly R. Holcomb ◽  
Cheryl A. Skaggs ◽  
Teddy W. Worrell ◽  
Mark DeCarlo ◽  
K. Donald Shelbourne

A paucity of information exists concerning reliability of the KT-1000 knee arthrometer (MEDmetric Corp., San Diego, CA) when used by different clinicians to assess the same anterior cruciate ligament-deficient patient. The purpose of this study was to determine the reliability and standard error of measurement of four clinicians who routinely report KT-1000 arthrometer values to referring orthopedic surgeons. Two physical therapists and two athletic trainers performed anterior laxity tests using the KT-1000 on 19 subjects. Intraclass correlation coefficients (ICC) and standard error of measurement (SEM) were used to determine reliability. Intratester ICC ranged from .98 to 1.0 and intratester SEM ranged from 0.0 to .28 mm. Intertester ICC and SEM for all four testers were .53 and 1.2 mm, respectively. A 95% confidence interval (M ± 1.96 × SEM) of the intertester variability ranged from −0.18 to 4.52 mm. Therefore, large intertester variation existed in KT-1000 values. Each facility should standardize testing procedures and establish intratester and intertester reliability for all clinicians reporting KT-1000 values.
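The interval reported above follows directly from the M ± 1.96 × SEM formula. In the sketch below the mean of 2.17 mm is not stated in the abstract; it is inferred as the midpoint of the reported interval, and the function name is hypothetical.

```python
def sem_confidence_interval(mean, sem, z=1.96):
    """Return the (lower, upper) bounds of mean ± z × SEM."""
    half_width = z * sem
    return mean - half_width, mean + half_width

# Mean of 2.17 mm is an assumption (midpoint of the reported interval);
# the SEM of 1.2 mm is the intertester value from the abstract.
low, high = sem_confidence_interval(2.17, 1.2)
print(f"{low:.2f} to {high:.2f} mm")  # -0.18 to 4.52 mm
```

The half-width of about 2.35 mm is large relative to the mean, which is what the abstract summarizes as large intertester variation.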


Author(s):  
Víctor Rodríguez-Rielves ◽  
Alejandro Martínez-Cava ◽  
Ángel Buendía-Romero ◽  
José Ramón Lillo-Beviá ◽  
Javier Courel-Ibáñez ◽  
...  

Purpose: To examine the reproducibility (intradevice and interdevice agreement) of the Rotor 2INpower device under a wide range of cycling conditions. Methods: Twelve highly trained male cyclists and triathletes completed 5 cycling tests, including graded exercise tests at different cadences (70–100 rpm), workloads (100–650 W), pedaling positions (seated and standing), and vibration conditions (20–40 Hz) and an 8-second maximal sprint (>1000 W). An intradevice analysis included a comparison between the power output registered by 3 units of Rotor 2INpower, whereas the power output provided by each one of these units and the gold-standard SRM crankset were compared for the interdevice analysis. Among others, statistical calculations included the standard error of measurement, expressed in absolute (in watts) and relative terms as the coefficient of variation (CV). Results: Except for the graded exercise test seated at 100 rpm/100 W (CV = 10.2%), the intradevice analysis showed an acceptable magnitude of error (CV ≤ 6.9%, standard error of measurement ≤ 12.3 W) between the 3 Rotor 2INpower. Similarly, these 3 units showed an acceptable agreement with the gold standard in all graded exercise test situations (CV ≤ 4.0%, standard error of measurement ≤ 13.1 W). On the other hand, both the intradevice and interdevice agreements proved to be slightly reduced under high cadences (intradevice: CV ≤ 10.2%; interdevice: CV ≤ 4.0%) and vibration (intradevice: CV ≤ 4.0%; interdevice: CV ≤ 3.6%), as well as during standing pedaling (intradevice: CV ≤ 4.1%; interdevice: CV ≤ 2.5%). Although within the limits of an acceptable agreement, measurement errors increased during the sprint tests (CV ≤ 7.4%). Conclusions: Based on these results, the Rotor 2INpower could be considered a reproducible tool to monitor power output in most cycling situations.


Author(s):  
Andrea Berger

Inhibition of Return (IOR) is a mechanism whereby the attentional system favors novel locations by inhibiting already scanned ones. In spatial attention tasks, it commonly occurs when the interval between cue onset and target onset is longer than 300 ms. The positive difference between reactions in the valid condition and those in the invalid one shows that responses to target stimuli are slower following a valid cue than responses to target stimuli following an invalid cue. IOR is a very robust phenomenon at the group mean level; however, this study demonstrates that its standard error of measurement is extremely high, which seriously challenges any attempt to interpret an individual score as representing the characteristics of a subject's attention system. Furthermore, this reliability problem might diminish the likelihood of finding differences between groups and conditions. The study shows that these problems may be partially corrected by employing the back-to-center paradigm.


Hand Therapy ◽  
2020 ◽  
Vol 25 (2) ◽  
pp. 56-62 ◽  
Author(s):  
Erfan Shafiee ◽  
Maryam Farzad ◽  
Joy Macdermid ◽  
Amirreza Smaeel Beygi ◽  
Atefeh Vafaei ◽  
...  

Introduction: The Patient-Rated Tennis Elbow Evaluation (PRTEE) questionnaire is a tool designed for self-assessment of forearm pain and disability in patients with tennis elbow. The aims of this study were to translate and cross-culturally adapt the PRTEE questionnaire into Persian and to evaluate its reliability and construct validity. Methods: The PRTEE questionnaire was translated into and cross-culturally adapted to Persian in 90 consecutive patients with tennis elbow, according to well-established guidelines. Reliability was tested by means of test–retest and internal consistency. Measurement error was quantified by calculating the standard error of measurement, from which the minimum detectable change was calculated. To evaluate construct and convergent validity, correlation of the PRTEE with the Disabilities of the Arm, Shoulder and Hand questionnaire and the Visual Analogue Scale was used. Results: In the process of cross-cultural adaptation, two items (6 and 8) were modified. In item 6, the term “door knob” was changed to “turn a key”, and in item 8, “cup of coffee” was changed to “cup of milk”. Item-total correlations were greater than 0.55 (range 0.55 to 0.76), internal consistency was high (Cronbach’s alpha, 0.94), and a high intraclass correlation coefficient (0.98) indicated excellent reliability of the P-PRTEE. The standard error of measurement and minimum detectable change were 5.40 and 14.24, respectively. The Persian version of the PRTEE questionnaire (P-PRTEE) shows strong construct and convergent validity (r values = 0.85, p < 0.05). Conclusions: The P-PRTEE is valid and reliable in assessing disability and pain in Persian patients with tennis elbow. The excellent psychometric properties of the P-PRTEE endorse the use of this questionnaire in clinical settings.
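The abstract does not state which formulas or confidence level were used to obtain the standard error of measurement and the minimum detectable change. A widely used convention, assumed here purely for illustration, is SEM = SD × √(1 − ICC) and MDC = z × √2 × SEM; the function names are hypothetical. With the reported SEM of 5.40 and z = 1.96 this convention gives about 14.97 rather than the reported 14.24, so the sketch should not be read as reproducing the authors' calculation.

```python
import math

def sem_from_reliability(sd, icc):
    """SEM from the score standard deviation and a reliability coefficient."""
    return sd * math.sqrt(1.0 - icc)

def minimal_detectable_change(sem, z=1.96):
    """MDC at the confidence level implied by z (assumed convention)."""
    return z * math.sqrt(2.0) * sem

mdc95 = minimal_detectable_change(5.40)
print(round(mdc95, 2))  # 14.97
```

The √2 factor reflects that a change score combines the measurement error of two administrations.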

