The Role of Classical Test Theory to Determine the Quality of Classroom Teaching Test Items

The purpose of this study is to describe the use of Classical Test Theory (CTT) to investigate the quality of test items in measuring students' English competence. The study adopts a mixed-methods approach. The results show that most items are within the acceptable range of both indices, with the exception of the items on synonyms. Items that focus on vocabulary are more challenging. Surprisingly, the short-answer items have an excellent item difficulty level and item discrimination index. The overall results of the item analysis also support the hypothesis that items with an ideal item difficulty value between 0.4 and 0.6 will have an equally ideal item discrimination value. This paper reports part of a larger study on the quality of individual test items and overall tests.

Summative Test Items Analysis Using Classical Test Theory (CTT)/ Analisis Item Kertas Peperiksaan Sumatif Menggunakan Teori Ujian Klasik (TUK)

Sains Humanika ◽

10.11113/sh.v12n2-2.1788 ◽

2020 ◽

Vol 12 (2-2) ◽

Author(s):

Nor Aisyah Saat

Keyword(s):

Item Difficulty ◽

Item Analysis ◽

Classical Test Theory ◽

Test Theory ◽

Difficulty Level ◽

Test Items ◽

Classical Test ◽

Difficulty Index ◽

The Given ◽

Examination Question

Item analysis is the process of examining student responses to individual test items in order to get a clear picture of the quality of each item and of the overall test. Teachers are encouraged to perform item analysis for each administered test in order to determine which items should be retained, modified, or discarded. This study aims to analyse the items in 2 summative examination question papers using classical test theory (CTT). The instruments used were the SPM Mathematics Trial Examination Questions 1 2019, which involved 50 Form 5 students, and the SPM Mathematics Trial Examination Question 1 2019, which involved 20 students. The SPM Mathematics Trial Examination Question paper 1 contains 40 objective questions, while the SPM Mathematics Trial Examination paper 1 contains 25 subjective questions. The data obtained were analysed using Microsoft Excel, based on the formulas for the item difficulty index and the discrimination index. This analysis can help teachers better understand the difficulty level of the items used. Finally, based on the results of the analysis, the items were classified as good, good but needing improvement, marginal, or weak.
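The two indices named in this abstract can be sketched in a few lines of Python. This is a minimal illustration and not the author's Excel workbook: it assumes dichotomously scored (0/1) items, defines difficulty as the proportion of correct answers, and defines discrimination as the difference in proportion correct between an upper and a lower scoring group. The function name `item_indices` and the default 27% group size are assumptions introduced here (27% is a common convention; the abstract does not state which grouping was used).

```python
from typing import List, Tuple


def item_indices(
    responses: List[List[int]], group_fraction: float = 0.27
) -> List[Tuple[float, float]]:
    """Return (difficulty, discrimination) for each item.

    responses[s][i] is 1 if student s answered item i correctly, else 0.
    group_fraction is the (assumed) share of students in each extreme group.
    """
    n_students = len(responses)
    n_items = len(responses[0])

    # Rank students by total score, highest first.
    ranked = sorted(responses, key=sum, reverse=True)

    # Size of the upper and lower groups (at least one student each).
    k = max(1, round(n_students * group_fraction))
    upper, lower = ranked[:k], ranked[-k:]

    results = []
    for i in range(n_items):
        # Difficulty index: proportion of all students answering correctly.
        p = sum(row[i] for row in responses) / n_students
        # Discrimination index: upper-group minus lower-group proportion correct.
        d = (sum(row[i] for row in upper) - sum(row[i] for row in lower)) / k
        results.append((p, d))
    return results


if __name__ == "__main__":
    # Hypothetical data: 4 students, 3 items.
    scores = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
    for item, (p, d) in enumerate(item_indices(scores, 0.5), start=1):
        print(f"item {item}: difficulty={p:.2f}, discrimination={d:.2f}")
```

Subjective (partial-credit) items, such as the 25 subjective questions mentioned above, would need a variant that divides by the maximum mark per item; that case is not covered by this sketch.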

ITEM ANALYSIS OF READING COMPREHENSION TEST FOR POST-GRADUATE STUDENTS

English Review Journal of English Education ◽

10.25134/erjee.v7i1.1493 ◽

2018 ◽

Vol 7 (1) ◽

pp. 29

Author(s):

Ari Arifin Danuwijaya

Keyword(s):

Reading Comprehension ◽

Item Difficulty ◽

Item Analysis ◽

Classical Test Theory ◽

Test Development ◽

Test Theory ◽

Difficulty Level ◽

Classical Test ◽

Comprehension Test

Developing a test is a complex and iterative process that is subject to revision even if the items were developed by skilful item writers. Many commercial test publishers need to conduct test analysis rather than trusting the item writers' judgement and skills, because the quality of items needs to be proven statistically after a try-out has been performed. This study is part of a test development process and aims to analyse reading comprehension test items. One hundred multiple-choice questions were pilot tested with 50 postgraduate students at one university. The pilot testing aimed to investigate item quality so that the items could be further improved. The responses were then analysed using Classical Test Theory with the psychometric software Lertap. The results showed that the item difficulty level was mostly average. In terms of item discrimination, more than half of the items were categorized as marginal and required further modification. This study suggests some recommendations that can be useful for improving the quality of the developed items. Keywords: reading comprehension; item analysis; classical test theory; item difficulty; test development.

Analisis Butir Soal IPA Try Out USBN Tahun Ajaran 2018/2019 dalam Kaitannya dengan Level Kognitif

MADRASAH ◽

10.18860/mad.v12i1.7686 ◽

2020 ◽

Vol 12 (1) ◽

pp. 29-39

Author(s):

Nuril Huda ◽

Tutik Sri Wahyuni

Keyword(s):

Classical Test Theory ◽

Answer Sheet ◽

Test Theory ◽

Difficulty Level ◽

Cognitive Level ◽

Descriptive Research ◽

Classical Test ◽

Discriminating Power ◽

Academic Year

This research aims to: 1) find out the characteristics of the science items in the National Standard School Examination (USBN) try-out for the academic year 2018/2019 based on Classical Test Theory (CTT); 2) find out the distribution of those science items across cognitive levels. This is descriptive research with a quantitative approach. The data were the computer answer sheets of 5022 students who took the Elementary School USBN try-out on February 21, 2019 in Tulungagung Regency. The results showed that: 1) the characteristics of the science items in the USBN try-out for the academic year 2018/2019 based on CTT are as follows: a) validity: 35 items are valid; b) reliability: the value of 0.818 is very high; c) difficulty level: 4 items (11.43%) are difficult, 9 items (25.71%) are moderate, 16 items (45.71%) are easy, and 6 items (17.14%) are very easy; d) discriminating power: 3 items (8.57%) are bad, 12 items (34.29%) are good enough, 15 items (42.86%) are moderate, and 5 items (14.29%) are good; e) quality of options: 17 items (48.57%) need no revision, 9 items (25.71%) need one option revised, 5 items (14.29%) need 2 options revised, and 4 items (11.43%) need 3 options revised; f) 13 items (37.14%) have quite good or good characteristics, so they can be included in the question bank; 2) in relation to cognitive level, 11 items (31.43%) fall in category L1 (knowledge), 10 items (28.57%) in category L1 (understanding), 4 items (11.43%) in category L2 (application), and 10 items (28.57%) in category L3 (reasoning). The 13 items entered in the question bank are dominated by cognitive level L1 (knowledge and understanding).
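As an illustration of how a CTT reliability value like the one reported above might be obtained, the sketch below computes the Kuder-Richardson 20 (KR-20) coefficient for dichotomously scored items. The abstract does not say which reliability coefficient the authors used, so KR-20 is an assumption here, as are the function name `kr20` and the choice of population variance for the total scores.

```python
from statistics import pvariance
from typing import List


def kr20(responses: List[List[int]]) -> float:
    """KR-20 reliability for 0/1 item scores.

    responses[s][i] is 1 if student s answered item i correctly, else 0.
    Assumes at least two items and non-zero variance of the total scores.
    """
    n_students = len(responses)
    n_items = len(responses[0])

    # Variance of students' total scores (population variance, by assumption).
    totals = [sum(row) for row in responses]
    total_var = pvariance(totals)

    # Sum over items of p * q, where p is the proportion correct and q = 1 - p.
    pq_sum = 0.0
    for i in range(n_items):
        p = sum(row[i] for row in responses) / n_students
        pq_sum += p * (1 - p)

    return (n_items / (n_items - 1)) * (1 - pq_sum / total_var)


if __name__ == "__main__":
    # Hypothetical data: 4 students, 3 items.
    scores = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
    print(f"KR-20 = {kr20(scores):.3f}")
```

With real answer-sheet data, each row would hold one student's scored responses to all 35 items; the toy matrix here only demonstrates the calculation.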

EXAMINING THE QUALITY OF ENGLISH TEST ITEMS USING PSYCHOMETRIC AND LINGUISTIC CHARACTERISTICS AMONG GRADE SIX PUPILS

Malaysian Journal of Learning and Instruction ◽

10.32890/mjli2020.17.2.3 ◽

2020 ◽

Vol 17 (Number 2) ◽

pp. 63-101

Author(s):

S. Kanageswari Suppiah Shanmugam ◽

Vincent Wong ◽

Murugan Rajoo

Keyword(s):

Item Difficulty ◽

Classroom Teacher ◽

Item Analysis ◽

Classical Test Theory ◽

Test Theory ◽

Psychometric Analysis ◽

Cognitive Interviews ◽

Test Items ◽

Item Quality

Purpose - This study examined the quality of English test items using psychometric and linguistic characteristics among Grade Six pupils. Method - Contrary to the conventional approach of relying only on statistics when investigating item quality, this study adopted a mixed-method approach by employing psychometric analysis and cognitive interviews. The former was conducted on 30 Grade Six pupils, with each item representing a different construct commonly found in English test papers. Qualitative input was obtained through cognitive interviews with five Grade Six pupils and expert judgements from three teachers. Findings - None of the items were found to be too easy or too difficult, and all items had positive discrimination indices. The item on idioms was the most ideal in terms of difficulty and discrimination. Difficult items were found to be vocabulary-based. Surprisingly, the higher-order-thinking subjective items proved to be excellent in difficulty, although their ability to discriminate could be improved. The qualitative expert judgements agreed with the quantitative psychometric analysis. Certain results from the item analysis, however, contradicted past findings that items with an ideal item difficulty value between 0.4 and 0.6 would have an equally ideal item discrimination index. Significance - The findings of the study can serve as a reminder of the significance of using Classical Test Theory, a non-complex psychometric approach, in assisting classroom teachers during the meticulous process of test design and in ensuring test item quality.

Lessons Efficiency Paradox: The Unexpected Empirical Findings about the Amount of Homework and the Teacher’s Strictness

Pedagogika ◽

10.15823/p.2017.42 ◽

2017 ◽

Vol 127 (3) ◽

pp. 104-118

Author(s):

Gediminas Merkys ◽

Daiva Bubelienė

Keyword(s):

Social Relations ◽

Classical Test Theory ◽

Educational Process ◽

Test Theory ◽

Dimensional Structure ◽

Classical Test ◽

Integrated Index ◽

Different Types ◽

Metrological Quality

This article introduces a newly created questionnaire for older schoolchildren, "evaluate the teacher and his lessons". It describes the theoretical and practical context of the instrument, which is based on 87 primary questions, and presents the dimensional structure and metrological quality of the resulting integrated scales and sub-scales. The scales and sub-scales were formed following classical test theory, combining logical and factorial validation. Secondary factorization of the sub-scales indicated that it is expedient to distinguish between two integrated lesson dimensions (scales). The first integrated scale reflects the quality of social relations and teacher-centered orientation. The second scale reflects the management and didactics of the educational process. The high correlation between the two integrated scales (r = 0.86) indicates that a generalized integrated index for evaluating the teacher and his lesson can be derived by aggregating as many as 81 primary variables that define the most varied aspects of the lesson. The article describes the current basis for the statistical norming of the questionnaire: N(schoolchildren) = 4024 and N(teachers) = 200, covering schools of different types from various regions of the country. The wide content coverage of the questionnaire and the quite good quality of its scales open good opportunities for its application both in the practice of school evaluation and in research. First, the methodological purpose of the article is to introduce a new standardized survey instrument. Second, the article asks why indicators such as "abundance of homework" and "level of the requirements set by the teacher" practically do not correlate with any of the remaining scales, although the latter intercorrelate very significantly. The paper elaborates the question (and hypotheses) of whether these variables can in fact affect the didactic quality of the lesson counterproductively.

Karakteristik Butir Soal Penilaian Akhir Semester Mata Pelajaran Sejarah Kelas XI SMA Negeri 1 Pangkalpinang

Fajar Historia: Jurnal Ilmu Sejarah dan Pendidikan ◽

10.29408/fhs.v5i2.4609 ◽

2021 ◽

Vol 5 (2) ◽

pp. 210-221

Author(s):

Anis Faridah

Keyword(s):

Social Sciences ◽

Classical Test Theory ◽

Reliability Coefficient ◽

Test Theory ◽

Difficulty Level ◽

Theory Approach ◽

Final Exam ◽

Classical Test ◽

The Subject ◽

Quantitative Descriptive

The problem motivating this research is that the end-of-semester assessment items for the History subject were developed without an item-analysis stage, so the quality of the items was unknown. This research is a quantitative descriptive study. Its purpose is to describe the characteristics of the final semester exam items for grade XI in the History subject at SMA Negeri 1 Pangkalpinang using the classical test theory approach. The research subjects were 138 grade XI students in the Social Sciences major. The results show that the final exam questions for the grade XI History subject at SMA Negeri 1 Pangkalpinang are suitable for use. This is shown by the validity of the items: 39 items (97.5%) are empirically valid, with a reliability coefficient of 0.818. In addition, 27 items (67.5%) fulfil the criteria for difficulty level, discriminating power, and distractor function, so they can be used directly to measure students' ability without correction. A further 12 items (30%) need to be revised, and 1 item (2.5%) is declared invalid and cannot be used to measure students' ability in the History subject.

Conditional Reasoning: An Integrated Approach to Item Analysis

Organizational Research Methods ◽

10.1177/1094428119879756 ◽

2019 ◽

Vol 23 (1) ◽

pp. 124-153 ◽

Cited By ~ 1

Author(s):

Daniel R. Smith ◽

Michael E. Hoffman ◽

James M. LeBreton

Keyword(s):

Item Analysis ◽

Classical Test Theory ◽

Integrated Approach ◽

Conditional Reasoning ◽

Test Theory ◽

Analytic Framework ◽

Test Items ◽

Classical Test ◽

Reasoning Test ◽

Unique Nature

This article provides a review of the approach that James used when conducting item analyses on his conditional reasoning test items. That approach was anchored in classical test theory. Our article extends this work in two important ways. First, we offer a set of test development protocols that are tailored to the unique nature of conditional reasoning tests. Second, we further extend James’s approach by integrating his early test validation protocols (based on classical test theory) with more recent protocols (based on item response theory). We then apply our integrated item analytic framework to data collected on James’s first test, the conditional reasoning test for relative motive strength. We illustrate how this integrated approach furnishes additional diagnostic information that may allow researchers to make more informed and targeted revisions to an initial set of items.

The Validation Study of Both the Modified Barthel and Barthel Index, and Their Comparison Based on Rasch Analysis in the Hospitalized Acute Stroke Elderly

The International Journal of Aging and Human Development ◽

10.1177/0091415020981775 ◽

2020 ◽

pp. 009141502098177

Author(s):

Reyhaneh Aminalroaya ◽

Fatemeh Sadat Mirzadeh ◽

Kazem Heidari ◽

Mahtab Alizadeh-Khoei ◽

Farshad Sharifi ◽

...

Keyword(s):

Acute Stroke ◽

Validation Study ◽

Barthel Index ◽

Rasch Analysis ◽

Item Difficulty ◽

Classical Test Theory ◽

Test Theory ◽

Stair Climbing ◽

Theory Approach ◽

Classical Test

This validation study examined the Iranian Modified Barthel Index (MBI) in hospitalized elderly patients with acute stroke using a classical test theory approach, applied Rasch analysis to the Iranian versions of both the MBI and the Barthel Index (BI), and compared their hierarchical item difficulty. Face-to-face interviews were conducted with 100 geriatric stroke inpatients aged 60 and over, or their caregivers, in a cross-sectional study. First, the construct validity of the MBI was analyzed using classical test theory; then Rasch analysis was performed for the BI and the MBI. The reliability of the Iranian MBI was 0.955. One factor was extracted, explaining 83.2% of the variance. In the Rasch analysis of the MBI, the most difficult item was stair climbing, whereas the easiest items were bowel and bladder control. In the BI, the most difficult items were toilet use and ambulation. The Iranian MBI is very accurate and reliable; therefore, the MBI is recommended over the BI for measuring outcomes in elderly stroke inpatients.

ANALISIS METODE CHEATING PADA TES BERSKALA BESAR

Molluca Journal of Chemistry Education (MJoCE) ◽

10.30598/mjocevol9iss2pp133-146 ◽

2019 ◽

Vol 9 (2) ◽

pp. 133-146

Author(s):

Yance Manoppo ◽

Djemari Mardapi

Keyword(s):

Item Response Theory ◽

Item Response ◽

Item Difficulty ◽

Classical Test Theory ◽

Test Theory ◽

Theory Approach ◽

Response Theory ◽

Index Method ◽

National Examination ◽

Classical Test

This study aimed to reveal: (1) the characteristics of the items of the Chemistry Test in the National Examination using classical test theory and item response theory; (2) the amount of cheating that occurred, as detected by Angoff's B-index Method, the Pair 1 Method, the Pair 2 Method, the Modified Error Similarity Analysis (MESA) Method, and the G2 Method; (3) the methods that detect the most cheating in the Chemistry Test of the National Examination for high schools in the year 2011/2012 in Maluku Province. The results of the analysis with the classical test theory approach show that 77.5% of the items have item difficulty functioning well, 55% items have discrimination yet qualified, and 70% of the items have distractors that work well, with a test reliability index of 0.772. The analysis using the item response theory approach shows that 14 (35%) items fit the model, the maximum of the information function is 11.4069 at θ = -1.6, and the magnitude of the error of measurement is 2.296. The number of pairs suspected of cheating is as follows: 13 pairs according to Angoff's B-index Method, 212 pairs according to the Pair 1 Method, 444 pairs according to the Pair 2 Method, 7 pairs according to the MESA Method, and 102 pairs according to the G2 Method. In descending order of the number of pairs detected, the methods are Pair 2, Pair 1, G2, Angoff's B-index, and MESA.
