Item Analysis for a Better Quality Test

2019 · Vol 2 (1) · pp. 59
Author(s): Neti Hartati, Hendro Pratama Supra Yogi

This study is a small-scale item analysis of a teacher-made summative test. It examines the quality of multiple-choice items in terms of difficulty level, discriminating power, and distractor effectiveness. The study employed a qualitative approach supplemented by simple quantitative analysis, examining the teacher's English summative test and the students' answer sheets as documents. The results show that the test contains more easy items than difficult ones, with a ratio of 19:25:6 for easy, medium, and difficult items, whereas the recommended ratio is 1:2:1. In terms of discriminating power, 3, 13, and 16 items fall at the excellent, good, and satisfactory levels, while 17 and 2 items fall at the poor and bad levels. Of all distractors, 43 (21.5%) are dysfunctional, which in turn makes the items too easy and prevents them from discriminating the upper-group students from the lower-group ones. Therefore, the 43 dysfunctional distractors should be revised to adjust the difficulty level and improve the discriminating power. This research is expected to serve as a reflective means for teachers to examine their own tests and ensure the quality of their test items.
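
The difficulty index behind this easy/medium/difficult split is simply the proportion of students answering each item correctly. The Python sketch below classifies items with commonly used cut-offs (p > 0.70 easy, p < 0.30 difficult); the paper does not state its exact thresholds, so those cut-offs are assumptions.

```python
import numpy as np

def difficulty_counts(responses, key):
    """Count easy/medium/difficult items from raw multiple-choice answers.

    responses: (n_students, n_items) array of chosen option letters
    key:       length-n_items array of correct option letters
    Assumed cut-offs: p > 0.70 easy, 0.30 <= p <= 0.70 medium, p < 0.30 difficult.
    """
    scored = (np.asarray(responses) == np.asarray(key)).astype(float)
    p = scored.mean(axis=0)  # proportion correct per item (difficulty index)
    easy = int((p > 0.70).sum())
    difficult = int((p < 0.30).sum())
    medium = len(p) - easy - difficult
    return easy, medium, difficult  # compare against the ideal 1:2:1 spread
```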

2021 · Vol 6 (2) · pp. 256
Author(s): Sayit Abdul Karim, Suryo Sudiro, Syarifah Sakinah

Apart from teaching, English language teachers need to assess their students by giving tests to gauge their achievement. In general, teachers rarely conduct item analysis on their tests; as a result, they have no idea of the quality of the tests they distribute to students. The present study attempts to determine the level of difficulty (LD) and the discriminating power (DP) of multiple-choice (MC) reading comprehension test items constructed by an English teacher, using test item analysis. The study employs a qualitative approach. A 50-item MC reading comprehension test was obtained from the students' test results. Thirty-five students of grade eight, 15 male and 20 female, at junior high school 2 Kempo in West Nusa Tenggara Province took part in the MC test try-out. The findings revealed that 16 of the 50 test items were rejected due to poor- or worst-quality difficulty and discrimination indices. Meanwhile, 12 items need to be reviewed due to their mediocre quality, and 11 items are of good quality. In addition, 11 of the 50 items were considered excellent, with DP scores ranging from 0.44 to 0.78. The implications of the present study shed light on the quality of teacher-made test items, especially MC tests.
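
DP figures like these are typically computed with the classic upper/lower-group method: rank students by total score, take the two extreme groups, and subtract the lower group's proportion correct from the upper group's. A minimal sketch follows, assuming a 27% group split and Ebel-style category bands; the study's exact split and cut-offs are not given in the abstract.

```python
import numpy as np

def discrimination_index(scored, frac=0.27):
    """Upper/lower-group discrimination index D for 0/1-scored items.

    scored: (n_students, n_items) binary matrix
    frac:   fraction of students in each extreme group (0.27 assumed here)
    """
    scored = np.asarray(scored, dtype=float)
    order = np.argsort(scored.sum(axis=1))      # rank students by total score
    g = max(1, int(round(frac * scored.shape[0])))
    lower, upper = scored[order[:g]], scored[order[-g:]]
    return upper.mean(axis=0) - lower.mean(axis=0)

def dp_label(d):
    """Ebel-style bands (assumed; the study's exact cut-offs are not given)."""
    if d >= 0.40: return "excellent"
    if d >= 0.30: return "good"
    if d >= 0.20: return "satisfactory"
    if d >= 0.00: return "poor"
    return "worst"
```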


2020 · Vol 9 (3) · pp. 1
Author(s): Anggun Septiani Queenta, Yuliasma Yuliasma

This study aims to determine the quality of the question items of the Odd Mid-term Examination for the Art and Culture subject in grade 7 of SMPN 5 Padang in the 2019/2020 academic year in terms of validity, reliability, difficulty level, discriminating power, and distractor effectiveness. This is a quantitative descriptive study. The subjects were all 254 students in grade 7; the objects were the mid-term exam items, the answer key, and the participants' answers. Data were collected through documentation and interviews and analyzed using the ANATES program version 4.0.9. The results indicate that: (1) based on validity, 45 questions (90%) are valid and 5 (10%) are invalid; (2) based on reliability, the items have high reliability at 0.78; (3) based on difficulty level, there are 6 difficult items (12%), 31 medium items (62%), and 13 easy items (26%); (4) based on discriminating power, 1 question is categorized as bad (2%), 10 as fairly good (20%), 23 as good (46%), and 16 as very good (32%); (5) based on distractor effectiveness, 28 items function very well (36%), 17 function well (34%), 4 function quite well (8%), and 1 functions poorly (2%); (6) based on overall quality, 27 items are good (54%), 15 are less good (30%), and 8 are not good (16%). Keywords: Item Analysis, Examination, ANATES version 4.0.9
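
ANATES automates all five analyses, but the distractor-effectiveness check it reports is easy to state: a wrong option is usually counted as functioning when at least 5% of examinees choose it. A rough Python equivalent (not ANATES itself), under that assumed 5% rule:

```python
import numpy as np

def functional_distractor_counts(responses, key, options=("A", "B", "C", "D")):
    """Count functioning distractors per item under the usual >= 5% rule.

    responses: (n_students, n_items) array of chosen option letters
    key:       sequence of correct option letters, one per item
    Returns, per item, how many wrong options attracted at least 5% of answers.
    """
    responses = np.asarray(responses)
    counts = []
    for i, correct in enumerate(key):
        functional = sum(
            1 for opt in options
            if opt != correct and (responses[:, i] == opt).mean() >= 0.05
        )
        counts.append(functional)  # e.g. all 3 functioning -> "very well"
    return counts
```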


Al-Ma'rifah · 2021 · Vol 18 (2) · pp. 127-138
Author(s): Bagusradityo Aryobimo

Practice questions are a core part of every textbook, especially for evaluating and measuring students' abilities and skills. One of the books often used as a textbook in various madrasas and Islamic boarding schools for teaching naḥwu to beginners is al-Naḥw al-Wāḍīḥ. Unfortunately, the validity of this book's practice questions has never been established: it was written by a native Arabic speaker, so its intended audience was not non-native speakers of Arabic. This study uses a mixed qualitative and quantitative method to give a fuller picture of the item analysis. Its purpose is to determine the quality of the practice questions in al-Naḥw al-Wāḍīḥ in terms of validity, reliability, difficulty level, and discriminating power. The results are: (1) 93% of the exercise sample meets content validity; (2) based on an ANOVA test of two exercise samples, the test is fairly reliable; (3) the difficulty level of the sampled naḥwu practice questions is fairly easy; (4) the discriminating power of the naḥwu sample is sufficient. Thus, the practice questions in al-Naḥw al-Wāḍīḥ are adequate for measuring students' abilities.
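
The ANOVA-based reliability check mentioned here is plausibly Hoyt's method, which estimates reliability from a persons-by-items ANOVA and is algebraically equivalent to Cronbach's alpha; the abstract does not spell out the procedure, so treat the following Python sketch as one reasonable reading:

```python
import numpy as np

def hoyt_reliability(scores):
    """Hoyt's ANOVA-based reliability for a persons x items score matrix.

    Computes a two-way persons-by-items ANOVA without replication and
    returns 1 - MS_residual / MS_persons (equivalent to Cronbach's alpha).
    """
    X = np.asarray(scores, dtype=float)
    n, k = X.shape
    grand = X.mean()
    ss_persons = k * ((X.mean(axis=1) - grand) ** 2).sum()
    ss_items = n * ((X.mean(axis=0) - grand) ** 2).sum()
    ss_resid = ((X - grand) ** 2).sum() - ss_persons - ss_items
    ms_persons = ss_persons / (n - 1)
    ms_resid = ss_resid / ((n - 1) * (k - 1))
    return 1.0 - ms_resid / ms_persons
```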


2019 · Vol 20 (2) · pp. 72-87
Author(s): Ujang Suparman
The objective of this research is to critically analyze the quality of test items used in SMP and SMA (mid-semester, final-semester, and National Examination practice tests) in terms of overall reliability, level of difficulty, discriminating power, and the quality of answer keys and distractors. The test items were analyzed with the item analysis program ITEMAN, using two kinds of descriptive statistics: one for the test items and another for the answer options. The findings are far from what is commonly believed: the majority of the test items, as well as their answer keys and distractors, are of unsatisfactory quality. Based on the results of the analysis, conclusions are drawn and recommendations are put forward.
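
ITEMAN's option-level output, the second kind of descriptive statistics mentioned, is essentially a table giving, for each answer option, the proportion of examinees endorsing it and the point-biserial correlation between endorsing it and the total score; a sound key should carry the highest, positive point-biserial. A simplified re-implementation (not ITEMAN itself) might look like this:

```python
import numpy as np

def option_statistics(responses, key, options=("A", "B", "C", "D")):
    """Per-option endorsement rates and option-total point-biserials.

    responses: (n_students, n_items) array of chosen option letters
    key:       sequence of correct option letters
    A distractor with a positive point-biserial is a red flag: stronger
    students are being drawn to a wrong answer.
    """
    responses = np.asarray(responses)
    total = (responses == np.asarray(key)).astype(float).sum(axis=1)
    table = []
    for i in range(responses.shape[1]):
        row = {}
        for opt in options:
            chose = (responses[:, i] == opt).astype(float)
            p = chose.mean()
            # point-biserial = Pearson r between the 0/1 choice and total score
            r = np.corrcoef(chose, total)[0, 1] if 0 < p < 1 else float("nan")
            row[opt] = (round(float(p), 3), round(float(r), 3))
        table.append(row)
    return table
```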


2021 · Vol 25 (1) · pp. 108-117
Author(s): Budi Manfaat, Ayu Nurazizah, Muhamad Ali Misri

This study aims to determine the quality of high school mathematics test items in terms of validity, reliability, discriminating power, difficulty level, and distractor effectiveness. It is an evaluation study with a quantitative approach. The subjects were 44 class XII students of SMKN 3 Kuningan and 39 class XII students of SMAN 1 Jalaksana. The results show that the majority (96.67%) of the items were declared valid in content by the experts. The test has very high reliability (0.90). The items have an ideal difficulty distribution: most questions (70%) are of medium difficulty, a few (6.67%) are very easy, some (20%) are difficult, and a few (3.3%) are very difficult. Most of the items (83.33%) have good discriminating power, and only a few (16.67%) discriminate poorly. Most (90%) of the questions have well-functioning answer choices, and only a few (10%) have answer choices that do not function properly. Overall, it can be concluded that the mathematics test questions at SMKN 3 Kuningan are of good quality.
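
For dichotomously scored items like these, a reliability coefficient such as the 0.90 reported here is typically KR-20 (the abstract does not name the formula, so this is an assumption). A minimal sketch, assuming 0/1-scored input:

```python
import numpy as np

def kr20(scored):
    """Kuder-Richardson formula 20 for dichotomous (0/1) item scores."""
    X = np.asarray(scored, dtype=float)
    k = X.shape[1]
    p = X.mean(axis=0)                     # proportion correct per item
    var_total = X.sum(axis=1).var(ddof=1)  # variance of total scores
    return (k / (k - 1)) * (1.0 - (p * (1.0 - p)).sum() / var_total)
```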


2020 · Vol 5 (1) · pp. 610-615
Author(s): Anisa Fitriani Lailamsyah, Fitri Apriyanti

This research analyzes the English summative test for the tenth-grade students of SMA Muhammadiyah 3 Jakarta in the 2013/2014 academic year. Both quantitative and qualitative methods were used to analyze the items of the English summative test: the quantitative method was used to compute the facility value and discriminating power of each item, and the qualitative method was used to describe and analyze the quality of the test items. Keywords: Item Analysis, English Summative Test, Tenth Grade Students


Author(s): Horhon Lumbantoruan, Sri Minda Murni, Isli Iriani Indiah Pane

The objective of this study is to find out the quality of the English final test designed for the second semester of third-grade students of SMAN 1 Pagaran in the 2016/2017 academic year. It describes whether or not the test items have good characteristics in terms of validity, reliability, difficulty level, and discriminating power. The test consists of 35 multiple-choice items. The research design used in this study was descriptive qualitative research. To find the discriminating power of the test, the writer chose the top 31% of scorers as the upper group and the bottom 31% as the lower group. The results show that 18 items (51%) are acceptable, meeting the criteria of validity, while 17 items (49%) are invalid. The test is reliable, with a reliability coefficient of 0.676. The test has an unacceptable index of difficulty, since 15 items (43%) are too difficult and only 5 items (14%) are easy. For the discriminating power index, the writer found 7 items (20%) with negative indices that should be discarded, 6 poor items (17%), 8 satisfactory items (22%), 13 good items (38%), and 1 excellent item (3%). In conclusion, the English final test designed for the second semester of third-grade students of SMAN 1 Pagaran in the 2016/2017 academic year does not meet the criteria of an effective and acceptable test. Keywords: Validity, Reliability, Level of Difficulty, Discriminating Power
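
The per-item valid/invalid decision is commonly operationalized as a significantly positive corrected item-total correlation; the abstract does not state this study's exact criterion, so the sketch below is a hypothetical operationalization, not the authors' procedure:

```python
import numpy as np
from scipy.stats import t

def item_validity(scored, alpha=0.05):
    """Flag items as valid when the corrected item-total correlation is
    significantly positive (one common operationalization; assumed here).

    scored: (n_students, n_items) 0/1 matrix
    """
    X = np.asarray(scored, dtype=float)
    n = X.shape[0]
    total = X.sum(axis=1)
    crit = t.ppf(1 - alpha, n - 2)          # one-sided critical value
    flags = []
    for i in range(X.shape[1]):
        item = X[:, i]
        if item.std() == 0:
            flags.append(False)             # constant item cannot discriminate
            continue
        r = np.corrcoef(item, total - item)[0, 1]  # corrected item-total r
        flags.append(r * np.sqrt((n - 2) / (1 - r * r)) > crit)
    return flags
```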


Author(s): David DiBattista, Laura Kurzawa

Because multiple-choice testing is so widespread in higher education, we assessed the quality of items used on classroom tests by carrying out a statistical item analysis. We examined undergraduates’ responses to 1198 multiple-choice items on sixteen classroom tests in various disciplines. The mean item discrimination coefficient was +0.25, with more than 30% of items having unsatisfactory coefficients less than +0.20. Of the 3819 distractors, 45% were flawed either because less than 5% of examinees selected them or because their selection was positively rather than negatively correlated with test scores. In three tests, more than 40% of the items had an unsatisfactory discrimination coefficient, and in six tests, more than half of the distractors were flawed. Discriminatory power suffered dramatically when the selection of one or more distractors was positively correlated with test scores, but it was only minimally affected by the presence of distractors that were selected by less than 5% of examinees. Our findings indicate that there is considerable room for improvement in the quality of many multiple-choice tests. We suggest that instructors consider improving the quality of their multiple-choice tests by conducting an item analysis and by modifying distractors that impair the discriminatory power of items.
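
The two flaw criteria described in this abstract translate directly into code: a distractor is flawed if fewer than 5% of examinees select it, or if selecting it is positively rather than negatively correlated with the total test score. A Python sketch of that screen (a reconstruction, not the authors' software):

```python
import numpy as np

def flawed_distractors(responses, key, options=("A", "B", "C", "D")):
    """List (item, option) pairs failing either flaw criterion from the study:
    chosen by < 5% of examinees, or positively correlated with total score.

    responses: (n_students, n_items) array of chosen option letters
    key:       sequence of correct option letters
    """
    responses = np.asarray(responses)
    total = (responses == np.asarray(key)).astype(float).sum(axis=1)
    flawed = []
    for i, correct in enumerate(key):
        for opt in options:
            if opt == correct:
                continue
            chose = (responses[:, i] == opt).astype(float)
            p = chose.mean()
            r = np.corrcoef(chose, total)[0, 1] if 0 < p < 1 else 0.0
            if p < 0.05 or r > 0:
                flawed.append((i, opt))
    return flawed
```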


1981 · Vol 51 (3) · pp. 379-402
Author(s): Wim J. van der Linden

Since Cox and Vargas (1966) introduced their pretest-posttest validity index for criterion-referenced test items, a great number of additions and modifications have followed. All are based on the idea of gain scoring; that is, they are computed from the differences between the proportions of correct item responses on the pretest and the posttest. Although the method is simple and generally considered the prototype of criterion-referenced item analysis, it has many serious disadvantages. Some of these stem from the fact that it requires a dual test administration and relies on population-dependent item p values. Others have to do with the merely global information about discriminating power that these indices provide, the implicit weighting they presuppose, and the meaningless maximization of posttest scores to which they lead. Analyzing the pretest-posttest method from a latent trait point of view, it is proposed to replace indices like Cox and Vargas' Dpp by an evaluation of the item information function at the mastery score. An empirical study was conducted to compare the differences in item selection between the two methods.
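
Under a two-parameter logistic (2PL) model, the item information function the author proposes evaluating is I(θ) = a²·P(θ)·(1 − P(θ)), assessed at the mastery score θ_c; items are then selected for maximal information at the cut-off rather than by gain-score indices. A small illustration with hypothetical parameters (the 2PL form is an assumption; the paper's framework is general latent trait theory):

```python
import numpy as np

def item_information_2pl(theta, a, b):
    """Fisher information of a 2PL item at ability level theta:
    I(theta) = a**2 * P(theta) * (1 - P(theta))."""
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    return a ** 2 * p * (1.0 - p)

# Rank items by information at a hypothetical mastery cut-off theta_c = 0.5
a = np.array([1.2, 0.8, 1.5])   # hypothetical discrimination parameters
b = np.array([0.0, 0.4, 0.9])   # hypothetical difficulty parameters
info = item_information_2pl(0.5, a, b)
print(np.argsort(info)[::-1])   # most informative items at the cut-off first
```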

