Developing a Scratch-based coding achievement test

2019 ◽  
Vol 120 (5/6) ◽  
pp. 383-406 ◽  
Author(s):  
Ömer Demir ◽  
Süleyman Sadi Seferoğlu

Purpose: The lack of a reliable and valid measurement tool for coding achievement is a major problem in Turkey. The purpose of this study is therefore to develop a Scratch-based coding achievement test.

Design/methodology/approach: Initially, an item pool of 31 items was created and classified within the framework of Bayman and Mayer's (1988) types of coding knowledge to support the content validity of the test. The item pool was then administered to 186 volunteer undergraduates at Hacettepe University during the spring semester of the 2017-2018 academic year, and item analysis was conducted to establish the construct validity of the test.

Findings: In all, 13 items were discarded, leaving a total of 18 items. Of the 18-item version of the coding achievement test, 4, 5 and 9 items measured syntactic, conceptual and strategic knowledge, respectively, among the types of coding knowledge. The test's average item discrimination index (0.531), average item difficulty index (0.541) and Cronbach's alpha reliability coefficient (0.801) were calculated.

Practical implications: Scratch users, especially those taking introductory courses at Turkish universities, could benefit from the reliable and valid coding achievement test developed in this study.

Originality/value: This paper has theoretical and practical value, as it provides the detailed developmental stages of a reliable and valid Scratch-based coding achievement test.
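
The indices reported here are straightforward to compute from a scored response matrix. Below is a minimal sketch in Python/NumPy with random illustrative data rather than the study's; the corrected item-total correlation is one common definition of the discrimination index, though the abstract does not specify which definition was used.

```python
import numpy as np

# Scored response matrix: rows = examinees, columns = items,
# 1 = correct, 0 = incorrect (random data for illustration only).
rng = np.random.default_rng(0)
responses = (rng.random((186, 18)) > 0.46).astype(int)

# Item difficulty index: proportion of examinees answering each item correctly.
difficulty = responses.mean(axis=0)

# Discrimination as corrected item-total correlation: each item against
# the total score with that item removed.
totals = responses.sum(axis=1)
discrimination = np.array([
    np.corrcoef(responses[:, i], totals - responses[:, i])[0, 1]
    for i in range(responses.shape[1])
])

# Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of totals).
k = responses.shape[1]
alpha = (k / (k - 1)) * (1 - responses.var(axis=0, ddof=1).sum() / totals.var(ddof=1))

print(difficulty.mean(), discrimination.mean(), alpha)
```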

2020 ◽  
Vol 2 (1) ◽  
pp. 34-46
Author(s):  
Siti Fatimah ◽  
Achmad Bernhardo Elzamzami ◽  
Joko Slamet

This research was conducted to address questions about the validity and reliability of the test scores and to perform item analysis covering discrimination power and difficulty index, in order to provide detailed information for improving the construction of test items. The quality of each item was analyzed in terms of item difficulty, item discrimination and distractor analysis. The reliability of the test was computed with the Kuder-Richardson formula (KR-20), and the analysis of the 50 test items was carried out in Microsoft Office Excel. A descriptive method was applied to describe and examine the data. The findings showed that the test fulfilled the criterion of content validity, although at a low level. The reliability of the test scores was 0.52 (0.521010831), which is categorized as low and indicates that the test needs revision. Of the 50 items examined, 21 items needing improvement were classified as "easy" on the difficulty index, and 26 items in total (52%) fell into the "poor" category for discriminability. This means more than 50% of the test items need to be revised, as they do not meet the criteria. To measure students' performance effectively, items with a "poor" discrimination index should be reviewed.
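
For reference, KR-20 follows directly from item proportions and total-score variance; a minimal sketch in Python (the study itself used Excel), with the data and function name illustrative only:

```python
import numpy as np

def kr20(responses: np.ndarray) -> float:
    """KR-20 reliability for a dichotomously scored matrix
    (rows = examinees, columns = items; 1 = correct, 0 = wrong)."""
    k = responses.shape[1]
    p = responses.mean(axis=0)                      # proportion correct per item
    q = 1 - p                                       # proportion wrong per item
    total_var = responses.sum(axis=1).var(ddof=1)   # variance of total scores
    # Note: sources differ on sample vs. population variance here.
    return (k / (k - 1)) * (1 - (p * q).sum() / total_var)

# Illustrative 50-item, 60-examinee example; not the study's data.
rng = np.random.default_rng(1)
print(kr20((rng.random((60, 50)) > 0.5).astype(int)))
```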


2015 ◽  
Vol 5 (4) ◽  
pp. 375-396
Author(s):  
Fatma Özüdoğru ◽  
Oktay Cem Adıgüzel

This research aimed to develop an achievement test of listening and speaking skills in order to reveal the extent to which the objectives of the primary school 2nd grade English teaching curriculum were realized. For validity purposes, three foreign language field experts and three 2nd grade English teachers were consulted on the draft achievement test, the number of items was determined, and item discrimination and item difficulty analyses were then carried out. In the spring semester of the 2013-2014 academic year, the listening part of the test was administered to 202 2nd grade students and the speaking part to 125. For the listening test, 18 items were excluded after the analyses and 2 easy items were removed for ease of grading, leaving 20 items in the final version; for the speaking test, only 4 items were left out after the item difficulty analysis and 1 was excluded for ease of grading, so 20 items were likewise included in the final version. Internal consistency was examined with KR-20, yielding adequate reliability values of .70 for the listening test and .81 for the speaking test.
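
Item discrimination and difficulty in test-development studies like this are often computed with the upper-lower groups method; the sketch below assumes that method and the common 27% split, neither of which is stated in the abstract.

```python
import numpy as np

def upper_lower_indices(responses: np.ndarray, fraction: float = 0.27):
    """Difficulty and discrimination via the upper-lower groups method.
    responses: rows = examinees, columns = items, scored 0/1."""
    n = responses.shape[0]
    g = max(1, int(round(n * fraction)))          # group size (27% by convention)
    order = responses.sum(axis=1).argsort()       # rank examinees by total score
    lower, upper = responses[order[:g]], responses[order[-g:]]
    p_upper, p_lower = upper.mean(axis=0), lower.mean(axis=0)
    difficulty = (p_upper + p_lower) / 2          # one common difficulty variant
    discrimination = p_upper - p_lower            # D index, from -1.0 to 1.0
    return difficulty, discrimination

rng = np.random.default_rng(2)                    # illustrative data only
diff, disc = upper_lower_indices((rng.random((202, 20)) > 0.4).astype(int))
```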


2013 ◽  
Vol 66 (1) ◽  
Author(s):  
Ong Eng Tek ◽  
Mohd Al-Junaidi Mohamad

This study aims to develop a valid and reliable multiple-choice test, referred to as the Test of Basic and Integrated Process Skills (T-BIPS), for secondary schools to measure the acquisition of the full range of 12 science process skills (SPS), namely 7 basic SPS and 5 integrated SPS. The study involves two phases. Phase one entails the generation of test items according to a set of item objectives, and the qualitative establishment of content validity, face validity and response objectivity through the use of a panel of experts. Phase two involves validating the psychometric properties of the instrument using field-testing data from 104 Form 4 students from top, average and bottom sets in urban and rural schools. The final set of T-BIPS consists of 60 items: 28 items for basic SPS (KR-20 reliability of 0.86) and 32 items for integrated SPS (KR-20 reliability of 0.89). The mean item difficulty index is 0.60, ranging between 0.37 and 0.75, while the mean item discrimination index is 0.52, ranging between 0.20 and 0.77. The results of the item analysis indicate that T-BIPS has appropriate psychometric characteristics and is an acceptable, valid and reliable test for measuring the acquisition of science process skills.
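
The summary statistics reported for T-BIPS (means and ranges of the indices) and a simple acceptance screen reduce to a few lines given per-item indices like those above; a hypothetical sketch, with cut-offs that are common conventions rather than the study's own criteria:

```python
import numpy as np

def summarize_items(difficulty, discrimination):
    """Report the mean and range of item indices and flag items falling
    outside illustrative acceptance bands (difficulty 0.30-0.80,
    discrimination >= 0.20); these cut-offs are conventions, not T-BIPS's."""
    d, r = np.asarray(difficulty), np.asarray(discrimination)
    print(f"difficulty: mean={d.mean():.2f}, range {d.min():.2f}-{d.max():.2f}")
    print(f"discrimination: mean={r.mean():.2f}, range {r.min():.2f}-{r.max():.2f}")
    return np.where((d < 0.30) | (d > 0.80) | (r < 0.20))[0]  # items to review
```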


2020 ◽  
Vol 41 (S1) ◽  
pp. s303-s303
Author(s):  
Kelly Holmes ◽  
Mishga Moinuddin ◽  
Sandi Steinfeld

Background: Development of an interrater reliability (IRR) process for healthcare-associated infection surveillance is a valuable learning tool for infection preventionists (IPs) and increases accuracy and consistency in applying National Healthcare Safety Network (NHSN) definitions (1-3). Case studies from numerous resources were distributed to IPs of varying experience levels (4-6). Item analysis, including the item difficulty index and item discrimination index, was applied to individual test questions to determine the validity of the case scenarios for measuring individual mastery of the NHSN surveillance definitions (7-8).

Methods: Beginning in 2016, a mandatory internal IRR program was developed and distributed to IPs of varying experience levels. Each year through 2019, a test containing 30–34 case studies with multiple-choice questions was developed. Case studies were analyzed using 2 statistical methods to determine item difficulty and the validity of the written scenarios. Difficulty (p) values were calculated for each test question using the item difficulty index formula, with harder questions yielding values closer to 0.0. Point-biserial correlation, measured on a range from −1.0 to 1.0, was applied to each question to identify highly discriminating questions.

Results: Between 2016 and 2019, 124 questions were developed and 145 respondents participated in the mandatory IRR program. The overall test difficulty was 0.70 (range, 0.64–0.74). Moreover, 17 questions (14%) were determined to have "excellent" discrimination, 41 questions (33%) "good" discrimination, and 57 questions (46%) "poor" discrimination, while 9 questions (7%) had negative discrimination values.

Conclusions: IRR testing identifies educational opportunities for IPs responsible for the correct application of NHSN surveillance definitions. Valid test scenarios are foundational components of IRR tests. Case scenarios with a high discrimination index should be used to develop future test questions to better assess mastery in applying surveillance definitions to clinical cases.

Funding: None. Disclosures: None.
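
The point-biserial correlation named here is simply the Pearson correlation between a dichotomous item score and the total score; a minimal sketch (variable names are illustrative, and the corrected variant that excludes the item from the total is an optional refinement not mentioned in the abstract):

```python
import numpy as np

def point_biserial(item: np.ndarray, total: np.ndarray) -> float:
    """Point-biserial correlation between a 0/1 item and total scores;
    equivalent to the Pearson correlation and bounded by -1.0 and 1.0."""
    return float(np.corrcoef(item, total)[0, 1])

def corrected_point_biserial(responses: np.ndarray, i: int) -> float:
    """Variant that removes item i from the total to avoid inflating r."""
    total_minus_item = responses.sum(axis=1) - responses[:, i]
    return point_biserial(responses[:, i], total_minus_item)
```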


2020 ◽  
Vol 5 (2) ◽  
pp. 491
Author(s):  
Amalia Vidya Maharani ◽  
Nur Hidayanto Pancoro Setyo Putro

Numerous studies have been conducted on item analysis of English tests. However, investigation of the characteristics of a good English final semester test is still rare in several districts of East Java. This research sought to examine the quality of the English final semester test for the 2018/2019 academic year in Ponorogo. A total of 151 samples, in the form of students' answers to the test, were analysed for item difficulty, item discrimination, and distractor effectiveness using the Quest program. This descriptive quantitative research revealed that the test does not have a good proportion of easy, medium, and difficult items. For item discrimination, the test had 39 excellent items (97.5%), which means the test could discriminate between high and low achievers. In addition, the distractors could distract students, since 32 items (80%) had effective distractors. The findings of this research provide the insight that item analysis is an important process in constructing a test: it reveals the quality of the test, which directly affects the accuracy of students' scores.
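
Distractor effectiveness is commonly judged by whether each distractor attracts at least 5% of examinees; the sketch below assumes that rule of thumb, which the abstract does not state (the Quest program is Rasch-based, which this classical tally does not attempt to reproduce).

```python
import numpy as np

def items_with_effective_distractors(choices, keys, options=("A", "B", "C", "D"),
                                     threshold=0.05):
    """Return indices of items whose every distractor was selected by at
    least `threshold` of examinees. choices: (examinees x items) array of
    selected options; keys: correct option per item. Names illustrative."""
    choices = np.asarray(choices)
    effective = []
    for i, key in enumerate(keys):
        col = choices[:, i]
        shares = [(col == opt).mean() for opt in options if opt != key]
        if min(shares) >= threshold:
            effective.append(i)
    return effective
```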


2018 ◽  
Vol 18 (1) ◽  
pp. 68 ◽  
Author(s):  
Deena Kheyami ◽  
Ahmed Jaradat ◽  
Tareq Al-Shibani ◽  
Fuad A. Ali

Objectives: The current study aimed to carry out a post-validation item analysis of multiple choice questions (MCQs) in medical examinations in order to evaluate correlations between item difficulty, item discrimination and distractor effectiveness, so as to determine whether questions should be included, modified or discarded. In addition, the optimal number of options per MCQ was analysed.

Methods: This cross-sectional study was performed in the Department of Paediatrics, Arabian Gulf University, Manama, Bahrain. A total of 800 MCQs and 4,000 distractors were analysed between November 2013 and June 2016.

Results: The mean difficulty index ranged from 36.70–73.14%. The mean discrimination index ranged from 0.20–0.34. The mean distractor efficiency ranged from 66.50–90.00%. Of the items, 48.4%, 35.3%, 11.4%, 3.9% and 1.1% had zero, one, two, three and four nonfunctional distractors (NFDs), respectively. Using three or four rather than five options in each MCQ resulted in 95% or 83.6% of items having zero NFDs, respectively. The distractor efficiency was 91.87%, 85.83% and 64.13% for difficult, acceptable and easy items, respectively (P <0.005). Distractor efficiency was 83.33%, 83.24% and 77.56% for items with excellent, acceptable and poor discrimination, respectively (P <0.005). The average Kuder-Richardson formula 20 reliability coefficient was 0.76.

Conclusion: A considerable number of the MCQ items were within acceptable ranges. However, some items needed to be discarded or revised. Using three or four rather than five options in MCQs is recommended to reduce the number of NFDs and improve the overall quality of the examination.
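
A nonfunctional distractor (NFD) is conventionally one chosen by fewer than 5% of examinees, and distractor efficiency (DE) follows from the share of distractors that do function; a sketch under those common definitions (the paper's exact cut-offs are not quoted in the abstract):

```python
import numpy as np

def nfd_and_de(choices, key, options="ABCDE", threshold=0.05):
    """For one item, count nonfunctional distractors (chosen by fewer than
    `threshold` of examinees) and compute distractor efficiency as the
    percentage of distractors that do function."""
    col = np.asarray(choices)
    distractors = [o for o in options if o != key]
    nfd = sum((col == o).mean() < threshold for o in distractors)
    de = 100.0 * (len(distractors) - nfd) / len(distractors)
    return nfd, de

# A toy 10-response item with key "A": four distractors, so each NFD
# costs 25 percentage points of distractor efficiency.
print(nfd_and_de(list("ABBACCADBE"), key="A"))
```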


2019 ◽  
pp. 1-6
Author(s):  
Bai Koyu ◽  
Rajkumar Josmee Singh ◽  
L. Devarani ◽  
Ram Singh ◽  
L. Hemochandra

The knowledge test was developed to measure the knowledge of large cardamom growers. All 32 items were constructed primarily to reward rational thinking rather than rote memorization and to discriminate the well-informed large cardamom growers from the poorly informed ones. The scores from the selected respondents were subjected to item analysis, consisting of the item difficulty index and the item discrimination index. In the final selection, the scale consisted of 17 items, with a difficulty index ranging from 30 to 80 and a discrimination index ranging from 0.30 to 0.55. The reliability of the knowledge test was checked using the split-half method and found to be 0.704.
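
The split-half method correlates scores on two halves of the test (often odd- versus even-numbered items) and steps the correlation up with the Spearman-Brown formula; a minimal sketch under the odd-even assumption:

```python
import numpy as np

def split_half_reliability(responses: np.ndarray) -> float:
    """Odd-even split-half reliability with the Spearman-Brown step-up.
    responses: rows = examinees, columns = items, scored 0/1."""
    odd = responses[:, 0::2].sum(axis=1)    # score on odd-numbered items
    even = responses[:, 1::2].sum(axis=1)   # score on even-numbered items
    r_half = np.corrcoef(odd, even)[0, 1]   # correlation between half scores
    return 2 * r_half / (1 + r_half)        # Spearman-Brown prophecy formula
```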


2020 ◽  
Vol 12 (2-2) ◽  
Author(s):  
Nor Aisyah Saat

Item analysis is the process of examining student responses to individual test items in order to get a clear picture of the quality of each item and of the overall test. Teachers are encouraged to perform item analysis for each administered test in order to determine which items should be retained, modified, or discarded. This study aims to analyse the items in 2 summative examination question papers using classical test theory (CTT). The instruments were the 2019 SPM Mathematics Trial Examination Paper 1, taken by 50 Form 5 students and containing 40 objective questions, and the 2019 SPM Mathematics Trial Examination Paper 2, taken by 20 students and containing 25 subjective questions. The data obtained were analysed using Microsoft Excel software, based on the formulas for the item difficulty index and the discrimination index. This analysis can help teachers better understand the difficulty level of the items used. Finally, based on the item analysis results, the items were classified as good, good but needing improvement, marginal, or weak.
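
The final classification step can be expressed as a simple rule table; the sketch below uses illustrative cut-offs in the spirit of Ebel's widely cited discrimination guidelines, since the abstract does not give the exact bands:

```python
def classify_item(difficulty: float, discrimination: float) -> str:
    """Classify an item from its difficulty (p) and discrimination (D)
    using illustrative CTT bands; actual cut-offs vary by source."""
    if discrimination >= 0.40 and 0.30 <= difficulty <= 0.80:
        return "good"
    if discrimination >= 0.30:
        return "good but needs improvement"
    if discrimination >= 0.20:
        return "marginal"
    return "weak"
```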


Author(s):  
Mitayani Purwoko ◽  
Trisnawati Mundijo

Background: Students' cognitive ability can be assessed using MCQs. The aim of this study was to evaluate the quality of MCQs as an assessment method in the Medical Faculty of Muhammadiyah University Palembang.

Method: This was a cross-sectional descriptive observational study. The sample comprised the MCQ assessments of the Genetics and Molecular Biology Module from the 2013/2014 to 2015/2016 academic years, a total of 299 questions. Item analysis was done manually.

Results: The item analysis showed that 61.2% of the questions were recall-type questions, indicating that question construction was poor and tested only the lower cognitive domain. There were 45.2% ideal questions, with a difficulty index of 30-70%, and 23.1% of the questions had a distractor efficiency of 100%. More than half of the questions (56.2%) needed revision, and these were distributed equally across the easy, ideal, and hard levels of difficulty. Revision-needed questions had a lower mean distractor efficiency than good questions.

Conclusion: The MCQs as an assessment method have not yet reached the target quality, as many questions need revision. The faculty should strengthen lecturers' development in writing good MCQs.
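
The 30-70% "ideal" band used above maps directly onto a difficulty classification; a minimal sketch with band edges taken from the abstract:

```python
def difficulty_band(p: float) -> str:
    """Band an item by difficulty index; 30-70% is treated as ideal,
    matching the band used in the abstract. p = proportion correct."""
    if p < 0.30:
        return "hard"
    if p <= 0.70:
        return "ideal"
    return "easy"
```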


Author(s):  
Bai Koyu ◽  
Rajkumar Josmee Singh ◽  
L. Devarani ◽  
Ram Singh ◽  
L. Hemochandra

The knowledge test was developed to measure the knowledge level of kiwi growers. In all, 36 items were constructed predominantly to reward rational thinking rather than rote memorization and to discriminate the well-informed kiwi growers from the poorly informed ones. The scores obtained from the sample respondents were subjected to item analysis, comprising the item difficulty index and the item discrimination index. In the final selection, the scale consisted of 15 items, with a difficulty index ranging from 30 to 80 and a discrimination index ranging from 0.30 to 0.55. The split-half method was employed to check the reliability of the knowledge test, which was found to be 0.711.

