EXAMINING THE QUALITY OF ENGLISH TEST ITEMS USING PSYCHOMETRIC AND LINGUISTIC CHARACTERISTICS AMONG GRADE SIX PUPILS

2020 ◽  
Vol 17 (Number 2) ◽  
pp. 63-101
Author(s):  
S. Kanageswari Suppiah Shanmugam ◽  
Vincent Wong ◽  
Murugan Rajoo

Purpose - This study examined the quality of English test items among Grade Six pupils using psychometric and linguistic characteristics. Method - Contrary to the conventional approach of relying only on statistics when investigating item quality, this study adopted a mixed-method design combining psychometric analysis and cognitive interviews. The psychometric analysis was conducted on 30 Grade Six pupils, with each item representing a different construct commonly found in English test papers. Qualitative input was obtained through cognitive interviews with five Grade Six pupils and expert judgements from three teachers. Findings - None of the items was too easy or too difficult, and all items had positive discrimination indices. The item on idioms was the most ideal in terms of difficulty and discrimination. Difficult items were found to be vocabulary-based. Surprisingly, the higher-order-thinking subjective items proved excellent in difficulty, although their ability to discriminate could be improved. The qualitative expert judgements agreed with the quantitative psychometric analysis. Certain results from the item analysis, however, contradicted past findings that items with an ideal item difficulty value between 0.4 and 0.6 would have an equally ideal item discrimination index. Significance - The findings of the study can serve as a reminder of the value of Classical Test Theory, a non-complex psychometric approach, in assisting classroom teacher practitioners during the meticulous process of test design and in ensuring test item quality.

2020 ◽  
Vol 12 (2-2) ◽  
Author(s):  
Nor Aisyah Saat

Item analysis is the process of examining student responses to individual test items in order to get a clear picture of the quality of each item and of the overall test. Teachers are encouraged to perform item analysis for each administered test in order to determine which items should be retained, modified, or discarded. This study analyses the items in two summative examination question papers using classical test theory (CTT). The instruments were the SPM Mathematics Trial Examination Paper 1 2019, which involved 50 Form 5 students, and the SPM Mathematics Trial Examination Paper 2 2019, which involved 20 students. Paper 1 contains 40 objective questions, while Paper 2 contains 25 subjective questions. The data were analysed using Microsoft Excel, based on the formulas for the item difficulty index and the discrimination index. This analysis can help teachers better understand the difficulty level of the items used. Finally, based on the analysis, the items were classified as good, good but needing improvement, marginal, or weak.
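The two statistics this abstract computes in Excel are standard CTT quantities: the difficulty index is the proportion of examinees answering an item correctly, and the upper-lower discrimination index is the difference in that proportion between the top- and bottom-scoring groups. A minimal sketch (function names and data are illustrative, not the study's):

```python
def item_difficulty(responses):
    """Proportion of examinees answering the item correctly (0 to 1)."""
    return sum(responses) / len(responses)

def item_discrimination(scores, item_index, group_fraction=0.27):
    """Upper-minus-lower discrimination index for one item.

    scores: list of per-student lists of 0/1 item scores.
    group_fraction: share of examinees in each extreme group
    (27% is a common convention).
    """
    totals = [sum(row) for row in scores]
    order = sorted(range(len(scores)), key=lambda i: totals[i])
    n = max(1, round(group_fraction * len(scores)))
    lower = [scores[i][item_index] for i in order[:n]]   # weakest examinees
    upper = [scores[i][item_index] for i in order[-n:]]  # strongest examinees
    return item_difficulty(upper) - item_difficulty(lower)

# Illustrative data: 6 students x 3 items (0 = wrong, 1 = correct)
scores = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 1],
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
]
p = item_difficulty([row[0] for row in scores])  # 4/6, a moderately easy item
d = item_discrimination(scores, 0)               # positive: item discriminates
```

An item with p between roughly 0.2 and 0.8 and a clearly positive d would fall in the "good" band this abstract describes.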


Assessment of learning involves deciding whether or not the content and objectives of education have been mastered, by administering quality tests. This study assesses the quality of a Chemistry achievement test and compares the item statistics generated using CTT and IRT methods. A descriptive survey was adopted involving a sample of N=530 students. The specialised XCALIBRE 4 and ITEMAN 4 software packages were used to conduct the item analysis. Results indicate that the two methods commonly identified 13 (32.5%) items as "problematic" and 27 (67.5%) as "good". Similarly, a significantly high correlation exists between item statistics derived from the CTT and IRT models [r=-0.985 and r=0.801, p<0.05, for item difficulty and discrimination respectively]. The study concludes that the Chemistry achievement test used did not pass through the processes of standardisation. Secondly, the CTT and IRT frameworks appeared to be effective and reliable in assessing test items, as the two frameworks provide similar and comparable results. The study recommends that teacher-made Chemistry tests used in measuring students' achievement should pass through all the processes of standardisation. Meanwhile, CTT and IRT approaches to item analysis ought to be integrated into item development and analysis because of their strength in investigating reliability and minimising measurement errors.


2018 ◽  
Vol 7 (1) ◽  
pp. 29
Author(s):  
Ari Arifin Danuwijaya

Developing a test is a complex and reiterative process, subject to revision even if the items were developed by skilful item writers. Many commercial test publishers conduct test analysis rather than trusting the item writers' judgement and skills alone: the quality of the items needs to be proven statistically after a tryout is performed. This study is part of a test development process and aims to analyse reading comprehension test items. One hundred multiple-choice questions were pilot tested on 50 postgraduate students in one university. The pilot testing was aimed at investigating item quality so that the items could be further improved. The responses were then analysed under Classical Test Theory using psychometric software called Lertap. The results showed that the item difficulty level was mostly average. In terms of item discrimination, more than half of the items were categorized as marginal, requiring further modification. The study offers recommendations that can be useful for improving the quality of the developed items. Keywords: reading comprehension; item analysis; classical test theory; item difficulty; test development.


2020 ◽  
Vol 9 (1) ◽  
pp. 5-34
Author(s):  
Wong Vincent ◽  
S.Kanageswari Suppiah Shanmugam

The purpose of this study is to describe the use of Classical Test Theory (CTT) to investigate the quality of test items in measuring students' English competence. The study adopts a mixed-methods approach. The results show that most items are within the acceptable range of both indices, with the exception of the items on synonyms. Items that focus on vocabulary are more challenging. Surprisingly, the short-answer items have an excellent item difficulty level and item discrimination index. General results from the item analysis also support the hypothesis that items with an ideal item difficulty value between 0.4 and 0.6 will have an equally ideal item discrimination value. This paper reports part of a larger study on the quality of individual test items and overall tests.


2020 ◽  
Vol 34 (1) ◽  
pp. 52-67 ◽  
Author(s):  
Igor Himelfarb ◽  
Margaret A. Seron ◽  
John K. Hyland ◽  
Andrew R. Gow ◽  
Nai-En Tang ◽  
...  

Objective: This article introduces changes made to the diagnostic imaging (DIM) domain of Part IV of the National Board of Chiropractic Examiners examination and evaluates the effects of these changes in terms of item functioning and examinee performance. Methods: To evaluate item function, classical test theory and item response theory (IRT) methods were employed. Classical statistics were used for the assessment of item difficulty and its relation to the total test score. Item difficulties along with item discrimination were calculated using IRT. We also studied the decision accuracy of the redesigned DIM domain. Results: The diagnostic item analysis revealed similarity in item function across test forms and across administrations. The IRT models found a reasonable fit to the data. The averages of the IRT parameters were similar across test forms and across administrations. The classification of test takers into ability (theta) categories was consistent across groups (both norming and all examinees), across all test forms, and across administrations. Conclusion: This research signifies a first step in the evaluation of the transition to digital DIM high-stakes assessments. We hope that this study will spur further research into evaluations of the ability to interpret radiographic images. In addition, we hope that the results prove to be useful for chiropractic faculty, chiropractic students, and the users of Part IV scores.
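For readers unfamiliar with the IRT side of such comparisons, the item difficulty and discrimination parameters mentioned above are typically estimated under a logistic model such as the two-parameter logistic (2PL). A minimal sketch of the 2PL response function (illustrative only, not the examination board's implementation):

```python
import math

def irt_2pl(theta, a, b):
    """P(correct response) under the 2PL IRT model.

    theta: examinee ability
    a: item discrimination (slope)
    b: item difficulty (location)
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# An examinee whose ability equals the item's difficulty answers
# correctly with probability 0.5, regardless of discrimination.
p_mid = irt_2pl(theta=0.0, a=1.2, b=0.0)   # 0.5
# Higher ability raises the probability for a fixed item.
p_high = irt_2pl(theta=2.0, a=1.2, b=0.0)
```

Classifying examinees into theta categories, as the study describes, amounts to locating each estimated ability on this scale.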


2019 ◽  
Vol 23 (1) ◽  
pp. 124-153 ◽  
Author(s):  
Daniel R. Smith ◽  
Michael E. Hoffman ◽  
James M. LeBreton

This article provides a review of the approach that James used when conducting item analyses on his conditional reasoning test items. That approach was anchored in classical test theory. Our article extends this work in two important ways. First, we offer a set of test development protocols that are tailored to the unique nature of conditional reasoning tests. Second, we further extend James’s approach by integrating his early test validation protocols (based on classical test theory) with more recent protocols (based on item response theory). We then apply our integrated item analytic framework to data collected on James’s first test, the conditional reasoning test for relative motive strength. We illustrate how this integrated approach furnishes additional diagnostic information that may allow researchers to make more informed and targeted revisions to an initial set of items.


2019 ◽  
Vol 20 (2) ◽  
pp. 72-87
Author(s):  
Ujang Suparman

The objectives of this research are to critically analyze the quality of test items used in SMP and SMA (mid-semester, final-semester, and National Examination practice tests) in terms of overall reliability, level of difficulty, discriminating power, and the quality of answer keys and distractors. The methods used are item analysis (ITEMAN) and two types of descriptive statistics: one for analyzing the test items and another for analyzing the options. The findings are very far from what is commonly believed: the quality of the majority of test items, as well as of the answer keys and distractors, is unsatisfactory. Based on the results of the analysis, conclusions are drawn and recommendations are put forward.


FENOMENA ◽  
2018 ◽  
Vol 10 (2) ◽  
pp. 117-134
Author(s):  
Sari Agung Sucahyo ◽  
Widya Noviana Noor

Like any test, an achievement test has to be of good quality. A good test gives accurate information about teaching. If the achievement test is of poor quality, the information on students' success in achieving the instructional objectives will also be poor. The test therefore has to meet the characteristics of a good test. In fact, no effort had yet been made to identify the quality of the achievement test used in the Intensive English program, so no information on its quality was available. The researchers were therefore interested in analyzing the quality of the achievement test for students in the Intensive English program of IAIN Samarinda. The design of this research is content analysis. The subject is the English achievement tests, and 28 to 30 students were involved in the tryout. Data were collected in three steps and analyzed for validity, reliability, and item quality. The findings reveal that 60% of the tests have good construct validity justified by related theories, and 55% have good content validity. The reliability coefficient of the first test format is 0.65 and that of the second is 0.52. Calculation of item difficulty shows that 68% of the test items fell between 0.20 and 0.80. The estimation of item discrimination shows that 73% of the test items fell between 0.20 and 0.50, while the calculation of distracter efficiency shows that 65% of the distracters were effective in distracting the test takers.
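Distracter efficiency, the last figure this abstract reports, is commonly computed by flagging options that attract too few examinees; a frequent rule of thumb treats a distracter as functional only if at least 5% of test takers choose it. A minimal sketch under that rule (names and data are illustrative, not the study's):

```python
from collections import Counter

def distractor_efficiency(choices, key, threshold=0.05):
    """Classify each incorrect option as functional or non-functional.

    choices: list of selected options, e.g. ['A', 'C', 'B', ...]
    key: the correct option.
    Returns {option: (proportion_choosing, classification)}.
    """
    n = len(choices)
    counts = Counter(choices)
    report = {}
    for opt in sorted(set(choices) | {key}):
        if opt == key:
            continue  # only distracters are classified
        p = counts.get(opt, 0) / n
        report[opt] = (p, "functional" if p >= threshold else "non-functional")
    return report

# Illustrative item: 25 examinees, key 'A'; 'D' draws only 1 of 25 (4%)
choices = ['A'] * 15 + ['B'] * 7 + ['C'] * 2 + ['D'] * 1
report = distractor_efficiency(choices, key='A')
```

A distracter flagged as non-functional is usually rewritten or replaced, since it contributes nothing to the item's ability to separate examinees.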


2020 ◽  
Vol 3 (2) ◽  
pp. 133
Author(s):  
Thresia Trivict Semiun ◽  
Fransiska Densiana Luruk

This study aimed at examining the quality of an English summative test for grade VII in a public school located in Kupang. In particular, the study examined content validity and reliability, and conducted item analysis covering item validity, item difficulty, item discrimination, and distracter effectiveness. This was descriptive evaluative research, with documentation used to collect data. The data were analyzed quantitatively, except for content validity, which was analyzed qualitatively by matching the test items with the materials stated in the curriculum. The findings revealed that the English summative test had high content validity. Reliability was estimated by applying the Kuder-Richardson formula (K-R20); the result showed that the test was reliable and very good for a classroom test. The item analysis, conducted using ITEMAN 3.0, revealed that the test consisted mostly of easy items, that most items could discriminate between students, that most distracters performed well, and that most items were valid.
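The K-R20 coefficient used above applies to dichotomously scored (0/1) items: it compares the summed item variances, p(1-p) per item, with the variance of the total scores. A minimal sketch (illustrative data, not the study's responses):

```python
def kr20(scores):
    """Kuder-Richardson 20 reliability for dichotomous (0/1) item scores.

    scores: list of per-student lists of 0/1 item scores.
    """
    k = len(scores[0])                       # number of items
    n = len(scores)                          # number of students
    totals = [sum(row) for row in scores]
    mean_t = sum(totals) / n
    var_t = sum((t - mean_t) ** 2 for t in totals) / n  # population variance
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in scores) / n           # item difficulty
        sum_pq += p * (1 - p)                           # item variance
    return (k / (k - 1)) * (1 - sum_pq / var_t)

# Illustrative data: 4 students x 3 items
data = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
reliability = kr20(data)   # 0.75
```

Values in roughly the 0.6 to 0.8 range are often described, as in this abstract, as adequate for classroom tests.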


2021 ◽  
Vol 11 (13) ◽  
pp. 6048
Author(s):  
Jaroslav Melesko ◽  
Simona Ramanauskaite

Feedback is a crucial component of effective, personalized learning, and is usually provided through formative assessment. Introducing formative assessment into a classroom can be challenging because of the complexity of test creation and the need to set aside time for assessment. The newly proposed formative assessment algorithm uses multivariate Elo rating and multi-armed bandit approaches to address these challenges. In a case study involving 106 students of a Cloud Computing course, the algorithm shows twice the learning-path recommendation precision of assessment methods based on classical test theory. The algorithm approaches the benchmark precision of item response theory with a greatly reduced quiz length and without the need for item difficulty calibration.
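Elo-based assessment avoids difficulty calibration by updating a student's ability and an item's difficulty in opposite directions after every response. A single-dimension sketch of the basic update (not the authors' algorithm, which extends this to multiple concepts and adds bandit-based item selection):

```python
import math

def elo_update(ability, difficulty, correct, k=0.4):
    """One Elo step for a student-item encounter.

    Expected correctness comes from the ability-difficulty gap via a
    logistic function; both ratings shift by k times the surprise
    (observed minus expected outcome).
    """
    expected = 1.0 / (1.0 + math.exp(difficulty - ability))
    surprise = correct - expected
    return ability + k * surprise, difficulty - k * surprise

# A correct answer on an evenly matched item (expected = 0.5)
# raises ability and lowers difficulty by k * 0.5 = 0.2 each.
a1, d1 = elo_update(ability=0.0, difficulty=0.0, correct=1)
```

Because items are re-rated on the fly, no pretest calibration sample is needed, which is the practical advantage the abstract highlights over IRT.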

