Interrater Reliability: Item Analysis to Develop Valid Questions for Case-Study Scenarios

2020 ◽  
Vol 41 (S1) ◽  
pp. s303-s303
Author(s):  
Kelly Holmes ◽  
Mishga Moinuddin ◽  
Sandi Steinfeld

Background: Development of an interrater reliability (IRR) process for healthcare-associated infection surveillance is a valuable learning tool for infection preventionists (IPs) and increases accuracy and consistency in applying National Healthcare Safety Network (NHSN) definitions (1-3). Case studies from numerous resources were distributed to infection preventionists of varying experience levels (4-6). Item analysis, including the item difficulty index and the item discrimination index, was applied to individual test questions to determine how validly the case scenarios measured individual mastery of the NHSN surveillance definitions (7-8).

Methods: Beginning in 2016, a mandatory internal IRR program was developed and distributed to IPs of varying experience levels. Each year through 2019, a test containing 30–34 case studies with multiple-choice questions was developed. Case studies were analyzed using 2 statistical methods to determine item difficulty and the validity of the written scenarios. P values for each test question were calculated using the item difficulty index formula, with harder questions yielding values closer to 0.0. Point-biserial correlation was applied to each question to identify highly discriminating questions, measured on a range from −1.0 to 1.0.

Results: Between 2016 and 2019, 124 questions were developed and 145 respondents participated in the mandatory IRR program. The overall test difficulty was 0.70 (range, 0.64–0.74). Of the questions, 17 (14%) showed "excellent" discrimination, 41 (33%) "good" discrimination, and 57 (46%) "poor" discrimination, while 9 (7%) had negative discrimination values.

Conclusions: IRR testing identifies educational opportunities for IPs responsible for the correct application of NHSN surveillance definitions. Valid test scenarios are foundational components of IRR tests. Case scenarios determined to have a high discrimination index should be used to develop future test questions to better assess mastery of applying surveillance definitions to clinical cases.

Funding: None
Disclosures: None
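
The two item statistics named in this abstract can be computed directly from a 0/1-scored response matrix. A minimal sketch in Python; the data are simulated for illustration and the function names are ours, not from the study:

```python
import numpy as np

def item_difficulty(item_scores):
    """Difficulty index (p value): the proportion of respondents who
    answered the item correctly; harder items score closer to 0.0."""
    return item_scores.mean()

def point_biserial(item_scores, total_scores):
    """Point-biserial correlation between a 0/1 item and the rest-score
    (total minus the item, to avoid item-total overlap); range -1.0 to 1.0."""
    rest = total_scores - item_scores
    p = item_scores.mean()
    q = 1.0 - p
    mean_correct = rest[item_scores == 1].mean()
    return (mean_correct - rest.mean()) / rest.std() * np.sqrt(p / q)

# Simulated 0/1 responses: 145 respondents x 30 items (illustrative only)
rng = np.random.default_rng(0)
responses = (rng.random((145, 30)) < 0.7).astype(int)
totals = responses.sum(axis=1)
for j in range(responses.shape[1]):
    item = responses[:, j]
    print(f"item {j:2d}: p = {item_difficulty(item):.2f}, "
          f"r_pb = {point_biserial(item, totals):.2f}")
```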

2019 ◽  
pp. 1-6
Author(s):  
Bai Koyu ◽  
Rajkumar Josmee Singh ◽  
L. Devarani ◽  
Ram Singh ◽  
L. Hemochandra

The knowledge test was developed to measure the knowledge of large cardamom growers. All 32 items were initially constructed to promote reasoning rather than rote memorization and to discriminate the well-informed large cardamom growers from the less-informed ones. The scores from the selected respondents were subjected to item analysis, consisting of the item difficulty index and the item discrimination index. In the final selection, the scale consisted of 17 items with difficulty index ranging from 30 to 80 and discrimination index ranging from 0.30 to 0.55. The reliability of the knowledge test was checked using the split-half method and was found to be 0.704.
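
For reference, split-half reliability of the kind reported here is typically computed by correlating scores on two half-tests and stepping up with the Spearman-Brown formula. A sketch under that standard procedure (an odd/even split is assumed; the abstract does not state which split was used, and the data are simulated):

```python
import numpy as np

def split_half_reliability(responses):
    """Correlate odd-item and even-item half scores, then step up with
    the Spearman-Brown prophecy formula: r = 2 * r_half / (1 + r_half)."""
    odd_half = responses[:, 0::2].sum(axis=1)
    even_half = responses[:, 1::2].sum(axis=1)
    r_half = np.corrcoef(odd_half, even_half)[0, 1]
    return 2 * r_half / (1 + r_half)

# Simulated correlated responses: 60 growers x 17 dichotomous items
rng = np.random.default_rng(1)
ability = rng.normal(size=(60, 1))
responses = (rng.random((60, 17)) < 1 / (1 + np.exp(-ability))).astype(int)
print(f"split-half reliability: {split_half_reliability(responses):.3f}")
```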


Author(s):  
Bai Koyu ◽  
Rajkumar Josmee Singh ◽  
L. Devarani ◽  
Ram Singh ◽  
L. Hemochandra

The knowledge test was developed to measure the knowledge level of kiwi growers. In all, 36 items were initially constructed to promote reasoning rather than rote memorization and to discriminate the well-informed kiwi growers from the less-informed ones. The scores obtained from the sample respondents were subjected to item analysis, comprising the item difficulty index and the item discrimination index. In the final selection, the scale consisted of 15 items with difficulty index ranging from 30 to 80 and discrimination index ranging from 0.30 to 0.55. The split-half method was employed to check the reliability of the knowledge test, which was found to be 0.711.


Author(s):  
Leni Amelia Suek

Although assessing students accounts for almost half of teachers' activities, teachers are often not well prepared through assessment-literacy training. Hence, they are unable to produce good tests to measure students' levels of knowledge and skill. This study analyzes the item difficulty and item discrimination of a test made by an English teacher at a junior high school in Kupang. It was descriptive qualitative research, and the instruments were the test items, answer keys, and students' answer sheets. For the difficulty index, more than half of the test items were easy, while only 2% were difficult. In terms of the discrimination index, only 10% of the test items were excellent and the largest share (46%) were poor. These findings indicate that the English test had a poor item difficulty index and a low item discrimination index; it therefore did not fulfill the criteria of a good test and could not measure students' true ability. It is highly recommended that teachers improve the test items and that the government provide assessment training so that teachers can produce good tests.
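
The easy/difficult and excellent/poor labels used in analyses like this rest on banding conventions for the two indices. A sketch using commonly cited cut-offs (the study's exact thresholds are not stated, so the values below are assumptions):

```python
def difficulty_band(p):
    """Band by difficulty index p (proportion correct). Assumed
    cut-offs: easy if p > 0.70, difficult if p < 0.30, else moderate."""
    if p > 0.70:
        return "easy"
    if p < 0.30:
        return "difficult"
    return "moderate"

def discrimination_band(d):
    """Band by discrimination index d, using Ebel-style cut-offs
    (assumed): >= 0.40 excellent, >= 0.30 good, >= 0.20 marginal,
    otherwise poor."""
    if d >= 0.40:
        return "excellent"
    if d >= 0.30:
        return "good"
    if d >= 0.20:
        return "marginal"
    return "poor"

# Illustrative (p, d) pairs for four hypothetical items
for p, d in [(0.82, 0.15), (0.55, 0.42), (0.28, 0.33), (0.75, 0.22)]:
    print(f"p={p:.2f} -> {difficulty_band(p):9s} | "
          f"d={d:.2f} -> {discrimination_band(d)}")
```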


2022 ◽  
Vol 22 (1) ◽  
pp. 142-145
Author(s):  
Subhransu Mohan Nanda

In the present study, to test the knowledge level of veterinary students on ICT, one hundred and seventy-one items were initially constructed on the basis of promoting thinking rather than rote memorization. The test was designed to differentiate the well-informed veterinary students from the less-informed ones. The scores of the respondents were subjected to item analysis to find the item difficulty index and item discrimination index. In the final selection, a total of 34 items with difficulty index between 30 and 80 and discrimination index ranging from 0.30 to 0.55 were selected. The reliability of the knowledge test was assessed using the split-half technique. The coefficient of correlation in the split-half test was 0.89, which was significant at the 1 per cent level of significance. The developed knowledge test scale for veterinary students on ICT was therefore highly stable and can be used for measurement.
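
The selection rule used in this and the related studies (difficulty index between 30 and 80, discrimination index between 0.30 and 0.55) amounts to a simple filter over per-item statistics. An illustrative sketch; the variable names and data are hypothetical:

```python
def select_items(difficulty_percent, discrimination_index):
    """Keep items whose difficulty index (in percent) lies in [30, 80]
    and whose discrimination index lies in [0.30, 0.55]."""
    return [i for i, (p, d) in enumerate(zip(difficulty_percent,
                                             discrimination_index))
            if 30 <= p <= 80 and 0.30 <= d <= 0.55]

# Illustrative per-item statistics for five hypothetical items
difficulty_percent = [25, 45, 62, 78, 90]
discrimination_index = [0.20, 0.35, 0.41, 0.52, 0.10]
print(select_items(difficulty_percent, discrimination_index))  # -> [1, 2, 3]
```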


Author(s):  
Bai Koyu ◽  
Rajkumar Josmee Singh ◽  
L. Devarani ◽  
Ram Singh ◽  
L. Hemochandra

The knowledge test was developed to measure the knowledge level of apple growers. In all, 32 items were initially constructed to promote reasoning rather than rote memorization and to distinguish the well-informed apple growers from the less-informed ones. The scores obtained from the sample respondents were subjected to item analysis, comprising the item difficulty index and the item discrimination index. In the final selection, the scale consisted of 22 items with difficulty index ranging from 30 to 80 and discrimination index ranging from 0.30 to 0.55. The split-half method was employed to check the reliability of the knowledge test, which was found to be 0.701.


Author(s):  
Anupama Jena ◽  
Mahesh Chander ◽  
Sushil K. Sinha

In the present study, a test was developed to measure the knowledge level of dairy farmers about scientific dairy farming. A preliminary set of 87 knowledge items was administered to 60 randomly selected dairy farmers for item analysis. The difficulty index and discrimination index were calculated, and items with difficulty index ranging from 30 to 80 and discrimination index ranging from 0.30 to 0.55 were included in the final format of the knowledge test. A total of 48 items fulfilled both criteria and were selected. Reliability of the test by the split-half method was found to be 0.386, and the coefficient of correlation by the test-retest method was 0.452, which was significant at the 1% level of significance. Hence, the knowledge test constructed was stable, reliable and valid for measuring what it intends to measure.
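
The test-retest coefficient reported here is a Pearson correlation between scores from two administrations of the same test. A sketch using SciPy; the paired scores below are invented for illustration:

```python
from scipy import stats

# Hypothetical paired totals from two administrations of the same test
scores_first = [34, 28, 41, 30, 25, 38, 33, 27, 36, 31]
scores_retest = [32, 30, 39, 29, 27, 40, 31, 26, 37, 33]

r, p_value = stats.pearsonr(scores_first, scores_retest)
print(f"test-retest r = {r:.3f}, p = {p_value:.4f}")
```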


2019 ◽  
Vol 120 (5/6) ◽  
pp. 383-406 ◽  
Author(s):  
Ömer Demir ◽  
Süleyman Sadi Seferoğlu

Purpose: The lack of a reliable and valid measurement tool for coding achievement emerges as a major problem in Turkey. Therefore, the purpose of this study is to develop a Scratch-based coding achievement test.

Design/methodology/approach: Initially, an item pool with 31 items was created. The item pool was classified within the framework of Bayman and Mayer's (1988) types of coding knowledge to support the content validity of the test. The item pool was then administered to 186 volunteer undergraduates at Hacettepe University during the spring semester of the 2017-2018 academic year. Subsequently, item analysis was conducted for the construct validity of the test.

Findings: In all, 13 items were discarded from the test, leaving a total of 18 items. Of the 18-item version of the coding achievement test, 4, 5 and 9 items measured syntactic, conceptual and strategic knowledge, respectively, among the types of coding knowledge. Furthermore, the average item discrimination index (0.531), average item difficulty index (0.541) and Cronbach's alpha reliability coefficient (0.801) of the test were calculated.

Practical implications: Scratch users, especially those taking introductory courses at Turkish universities, could benefit from the reliable and valid coding achievement test developed in this study.

Originality/value: This paper has theoretical and practical value, as it provides the detailed developmental stages of a reliable and valid Scratch-based coding achievement test.
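
Cronbach's alpha, as reported in the findings above, can be computed from the item variances and total-score variance. A sketch under the standard formula; the simulated response matrix is illustrative only:

```python
import numpy as np

def cronbach_alpha(responses):
    """Cronbach's alpha: (k / (k - 1)) * (1 - sum(item variances) /
    variance(total scores)), over a respondents x items score matrix."""
    k = responses.shape[1]
    sum_item_var = responses.var(axis=0, ddof=1).sum()
    total_var = responses.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - sum_item_var / total_var)

# Simulated correlated responses: 186 students x 18 dichotomous items
rng = np.random.default_rng(2)
ability = rng.normal(size=(186, 1))
responses = (rng.random((186, 18)) < 1 / (1 + np.exp(-ability))).astype(int)
print(f"alpha = {cronbach_alpha(responses):.3f}")
```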


Author(s):  
Ajeet Kumar Khilnani ◽  
Rekha Thaddanee ◽  
Gurudas Khilnani

Background: Multiple choice questions (MCQs) are routinely used for formative and summative assessment in medical education. Item analysis is a process of post-validation of MCQ tests, whereby items are analyzed for difficulty index, discrimination index and distractor efficiency, to obtain a range of items of varying difficulty and discrimination indices. This study was done to understand the process of item analysis and to analyze an MCQ test so that a valid and reliable MCQ bank in otorhinolaryngology could be developed.

Methods: 158 students of the 7th semester were given an 8-item MCQ test. Based on the marks achieved, the high achievers (top 33%, 52 students) and low achievers (bottom 33%, 52 students) were included in the study. The responses were tabulated in a Microsoft Excel sheet and analyzed for difficulty index, discrimination index and distractor efficiency.

Results: The mean (SD) difficulty index (Diff-I) of the 8-item test was 61.41% (11.81%). Five items had a very good difficulty index (41% to 60%), while 3 items were easy (Diff-I >60%). There was no difficult item (Diff-I <30%) in this test. The mean (SD) discrimination index (DI) of the test was 0.48 (0.15), and all items had very good discrimination indices of more than 0.25. Out of 24 distractors, 6 (25%) were non-functional distractors (NFDs). The mean (SD) distractor efficiency (DE) of the test was 74.62% (23.79%).

Conclusions: Item analysis should be an integral and regular activity in each department so that a valid and reliable MCQ question bank is developed.
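
Distractor efficiency depends on flagging non-functional distractors; a common convention, assumed here since the abstract does not state its threshold, is that a distractor chosen by fewer than 5% of examinees is non-functional. An illustrative sketch:

```python
from collections import Counter

def distractor_stats(choices, key, options="ABCD", threshold=0.05):
    """Count non-functional distractors (chosen by < threshold of
    examinees) and compute distractor efficiency as the percentage
    of distractors that are functional."""
    n = len(choices)
    counts = Counter(choices)
    distractors = [o for o in options if o != key]
    nfd = sum(1 for o in distractors if counts.get(o, 0) / n < threshold)
    de = 100 * (len(distractors) - nfd) / len(distractors)
    return nfd, de

# Illustrative responses of 104 students to one 4-option item keyed "B"
choices = ["B"] * 70 + ["A"] * 20 + ["C"] * 12 + ["D"] * 2
print(distractor_stats(choices, "B"))  # "D" drew ~1.9% -> (1, 66.66...)
```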


Author(s):  
Manju K. Nair ◽  
Dawnji S. R.

Background: Carefully constructed, high-quality multiple choice questions can serve as effective tools to improve the standard of teaching. This item analysis was performed to find the difficulty index, discrimination index and number of non-functional distractors in single best response type questions.

Methods: 40 single best response type questions with four options, each carrying one mark for the correct response, were taken for item analysis. There was no negative marking, and the maximum mark was 40. Based on the scores, the evaluated answer scripts were arranged from the highest score to the lowest, and only the upper third and lower third were included. The response to each item was entered in Microsoft Excel 2010. The difficulty index, discrimination index and number of non-functional distractors per item were calculated.

Results: 40 multiple choice questions and 120 distractors were analysed in this study. 72.5% of items were good, with a difficulty index between 30% and 70%; 25% of items were difficult and 2.5% were easy. 27.5% of items showed excellent discrimination between high-scoring and low-scoring students. One item had a negative discrimination index (−0.1). There were 9 items with non-functional distractors.

Conclusions: This study emphasises the need to improve the quality of multiple choice questions. Repeated evaluation by item analysis and modification of non-functional distractors may be performed to enhance the standard of teaching in Pharmacology.
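
The upper-third/lower-third procedure described above reduces to two classical formulas: Diff-I = (H + L) / (2N) × 100 and DI = (H − L) / N, where H and L are the correct-response counts in the high- and low-scoring groups of equal size N. A sketch under those standard definitions; the numbers are illustrative:

```python
def item_indices(high_correct, low_correct, group_size):
    """Upper/lower-group item statistics:
    difficulty index (%)  = (H + L) / (2 * N) * 100
    discrimination index  = (H - L) / N
    where H and L are correct counts in groups of size N each."""
    diff_i = (high_correct + low_correct) / (2 * group_size) * 100
    di = (high_correct - low_correct) / group_size
    return diff_i, di

# Illustrative: 40 examinees per third, 32 vs 18 correct on one item
print(item_indices(32, 18, 40))  # -> (62.5, 0.35)
```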


Author(s):  
Amit P. Date ◽  
Archana S. Borkar ◽  
Rupesh T. Badwaik ◽  
Riaz A. Siddiqui ◽  
Tanaji R. Shende ◽  
...  

Background: Multiple choice questions (MCQs) are a common method for formative and summative assessment of medical students. Item analysis enables identifying good MCQs based on difficulty index (DIF I), discrimination index (DI) and distractor efficiency (DE). The objective of this study was to assess the quality of MCQs currently in use in pharmacology by item analysis and to develop an MCQ bank with quality items.

Methods: This cross-sectional study was conducted in 148 second-year MBBS students at NKP Salve Institute of Medical Sciences from January 2018 to August 2018. Forty MCQs, twenty from each of the two term examinations of pharmacology, were taken for item analysis. A correct response to an item was awarded one mark and each incorrect response zero. Each item was analyzed in a Microsoft Excel sheet for three parameters: DIF I, DI and DE.

Results: In the present study, the mean±SD difficulty index (%), discrimination index and distractor efficiency (%) were 64.54±19.63, 0.26±0.16 and 66.54±34.59, respectively. Out of 40 items, a large proportion had an acceptable level of difficulty (70%) and were good at discriminating between higher- and lower-ability students (77.5%). Distractor efficiency corresponding to the presence of zero or one non-functional distractor (NFD) was 80%.

Conclusions: The study showed that item analysis is a valid tool to identify quality items which, when regularly incorporated, can help to develop a very useful, valid and reliable question bank.

