EMPIRICAL ANALYSIS OF ITEM DIFFICULTY AND DISCRIMINATION INDICES OF NATIONAL BUSINESS AND TECHNICAL EXAMINATION BOARD (NABTEB) ECONOMICS ESSAY TEST ITEMS FROM 2013-2015

2021 ◽  
Vol 04 (02) ◽  
pp. 23-28
Author(s):  
Romy. O. Okoye ◽  
Joy Nneka Ndubuzor
Author(s):  
HANNAH JUDITH OSARUMWENSE ◽  
CHISOM PERPETUAL DURU

This study assessed model fit for the 2016 and 2017 Biology multiple-choice test items of the National Business and Technical Examination Board (NABTEB). It aimed to investigate empirically the fit of the 1-, 2-, and 3-Parameter Logistic Models (PLM) to the examinations using Item Response Theory. Three research questions were raised, and two hypotheses were formulated and tested. An ex post facto research design was adopted. The population comprised 5,115 and 4,600 candidates in public and private schools in the south-south geopolitical zone of Nigeria for 2016 and 2017, respectively. A total of 2,000 students were sampled using simple random sampling. The instruments for data collection were the NABTEB 2016 and 2017 Biology multiple-choice question papers. The instruments were taken to be valid and reliable because they were developed by a standard examination body. The responses to the instruments were used for data analysis. The results revealed that the 1, 2, and 3 PLM fit the 2016 and 2017 NABTEB May/June Biology multiple-choice test items. However, the 1PLM provided a better fit to the data than the other models. Based on the findings, it was recommended, among other things, that examining bodies ensure that a model fits the data well before it is used to make inferences regarding the data.
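For readers unfamiliar with the three models compared above, the following is a minimal sketch of the standard logistic item response function. It is not the authors' code; the function name `irt_probability` and all parameter values are illustrative assumptions, and the logistic metric is used without the 1.7 scaling constant.

```python
import math

def irt_probability(theta, b, a=1.0, c=0.0):
    """Probability of a correct response under a logistic IRT model.

    theta: examinee ability; b: item difficulty;
    a: item discrimination (held at 1.0 for the 1PL);
    c: pseudo-guessing lower asymptote (0.0 for the 1PL and 2PL).
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# Hypothetical parameter values, chosen only to show the three model forms.
p_1pl = irt_probability(0.5, b=0.0)                # difficulty only
p_2pl = irt_probability(0.5, b=0.0, a=1.5)         # adds discrimination
p_3pl = irt_probability(0.5, b=0.0, a=1.5, c=0.2)  # adds pseudo-guessing
```

The 1PL is the special case of the 3PL with `a` fixed and `c = 0`, which is why the three models can be compared for fit on the same response data.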


2020 ◽  
Vol 3 (1) ◽  
pp. 102-113
Author(s):  
Sutami

This research aims to produce a valid and reliable Indonesian language assessment instrument in the form of HOTS test items and to describe the quality of those items for measuring the HOTS of tenth-grade SMA and SMK students. This was a research and development study adapted from Borg & Gall’s development model, comprising the following steps: research and information collection, planning, early product development, limited tryout, revision of the early product, field tryout, and revision of the final product. The results show that the HOTS assessment instrument consists of 40 multiple-choice items and 5 essay items. Based on the judgment of its materials, construction, and language, the instrument was valid and appropriate for use. The reliability coefficients were 0.88 for the multiple-choice items and 0.79 for the essay items. The multiple-choice items have an average difficulty of 0.57 (average) and an average item discrimination of 0.44 (good), and the distractors function well. The essay items have an average item difficulty of 0.60 (average) and an average item discrimination of 0.45 (good).
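The difficulty and discrimination indices reported above can be illustrated with a short sketch for dichotomously scored (0/1) items. The abstract does not state which formulas were used, so the upper-lower group method, the 27% group fraction, and the function names are assumptions made here for illustration only.

```python
def item_difficulty(item_scores):
    """Proportion of examinees answering the item correctly (0/1 scores)."""
    return sum(item_scores) / len(item_scores)

def discrimination_index(item_scores, total_scores, fraction=0.27):
    """Upper-lower discrimination index: p(upper group) - p(lower group).

    Groups are the top and bottom `fraction` of examinees by total score.
    The 27% fraction is a common convention, assumed here.
    """
    n = len(total_scores)
    k = max(1, round(n * fraction))
    order = sorted(range(n), key=lambda i: total_scores[i])
    lower, upper = order[:k], order[-k:]
    p_upper = sum(item_scores[i] for i in upper) / k
    p_lower = sum(item_scores[i] for i in lower) / k
    return p_upper - p_lower

# Hypothetical data: one item's scores and total test scores for 10 examinees.
item = [1, 1, 1, 0, 0, 1, 0, 0, 1, 0]
totals = [9, 8, 7, 3, 2, 8, 1, 4, 6, 5]
difficulty = item_difficulty(item)
discrimination = discrimination_index(item, totals)
```

For essay items, which carry partial credit, the difficulty index is usually computed as the mean score divided by the maximum possible score rather than as a proportion correct.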


2020 ◽  
Vol 3 (1) ◽  
pp. 19
Author(s):  
I Gede Wahyu Suwela Antara ◽  
I Komang Sudarma ◽  
I.Ketut Dibia

This study aims to (1) develop a mathematics assessment instrument based on Higher Order Thinking Skills (HOTS) and (2) describe the quality of the instrument. This was a research and development study adapting the 4D model from Thiagarajan. The model includes the following steps: (1) define, (2) design, (3) develop, and (4) disseminate. Due to limited time, this research was carried out only up to the develop step. The results show that the instrument, which consists of 18 essay test items, is valid and appropriate for use. The instrument’s reliability coefficient is 0.659 (High). The instrument has an average item discrimination of 0.44 (Very Good) and an average item difficulty of 0.584 (Medium). In conclusion, the instrument is feasible as an assessment instrument for measuring higher order thinking skills on the topic of two-dimensional geometry.


2020 ◽  
Vol 2 (1) ◽  
pp. 34-46
Author(s):  
Siti Fatimah ◽  
Achmad Bernhardo Elzamzami ◽  
Joko Slamet

This research focused on the validity and reliability of test scores and on item analysis, covering discrimination power and difficulty index, in order to provide detailed information for improving test item construction. The quality of each item was analyzed in terms of item difficulty, item discrimination, and distractor analysis. The reliability of the test was computed with the Kuder-Richardson Formula 20 (KR-20). The analysis of 50 test items was carried out in Microsoft Office Excel. A descriptive method was applied to describe and examine the data. The findings showed that the test met the criteria for content validity, although its validity was categorized as low. The reliability of the test scores was 0.521010831 (0.52), categorized as low reliability, indicating that the test needs revision. Of the 50 items examined, 21 were classified as “easy” on the difficulty index and were in need of improvement, and a total of 26 items (52%) fell into the “poor” category for discrimination. This means that more than 50% of the test items need to be revised because they do not meet the criteria. It is suggested that, to measure students’ performance effectively, the test be substantially improved and items with a “poor” discrimination index be reviewed.
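As an illustration of the reliability computation named above, here is a minimal sketch of the Kuder-Richardson Formula 20 for a matrix of 0/1 item scores. The study used Microsoft Office Excel, so this is not the authors' procedure; the function name `kr20`, the use of the population variance, and the small data set are assumptions.

```python
from statistics import pvariance

def kr20(responses):
    """KR-20 reliability for dichotomous items.

    responses: list of examinee rows, each a list of 0/1 item scores.
    Uses the population variance of total scores (an assumption; some
    texts use the sample variance instead).
    """
    n_people = len(responses)
    n_items = len(responses[0])
    totals = [sum(row) for row in responses]
    var_total = pvariance(totals)
    # Sum of p * q over items, where p is the proportion correct.
    pq_sum = 0.0
    for j in range(n_items):
        p = sum(row[j] for row in responses) / n_people
        pq_sum += p * (1.0 - p)
    return (n_items / (n_items - 1)) * (1.0 - pq_sum / var_total)

# Hypothetical responses: 4 examinees by 3 items.
data = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
reliability = kr20(data)
```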


SAGE Open ◽  
2021 ◽  
Vol 11 (3) ◽  
pp. 215824402110469
Author(s):  
Ahmet Bildiren ◽  
Özge Bıkmaz Bilgen ◽  
Mediha Korkmaz

The aim of the present study is to develop a national non-verbal cognitive ability test in Turkey. Test items were developed during the first stage and administered in a pilot study to 3,073 children aged 4 to 13. The test was given its final form based on the values of item difficulty, item discrimination, and item-total score correlation. The norm study was carried out in 12 provinces with a total of 9,129 children: 4,464 females (49%) and 4,665 males (51%). Test-retest, split-half, KR-20, and KR-21 methods were applied for the reliability analyses. The standard error, standard deviation, and reliability coefficient were calculated for the measurement. Content, construct, and criterion-related validity analyses were used to examine validity. The KR-20 reliability coefficient obtained from the complete sample was estimated as 0.92. The test-retest reliability coefficient was 0.80. A correlation of .71 was found between the Naglieri Cognitive Ability test and the BNV test, a correlation of .67 between the Toni-3 test and the BNV test, and a correlation of .86 between the BNV test and the Colored Progressive Matrices Test.
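KR-21, listed among the reliability methods above, is a simplification of KR-20 that needs only the number of items and the mean and variance of total scores, on the assumption that all items are equally difficult. A minimal sketch follows; the function name `kr21`, the use of the population variance, and the data are illustrative assumptions rather than anything taken from the study.

```python
from statistics import mean, pvariance

def kr21(total_scores, n_items):
    """KR-21 reliability from total scores on a test of n_items 0/1 items."""
    m = mean(total_scores)
    var_total = pvariance(total_scores)
    return (n_items / (n_items - 1)) * (
        1.0 - (m * (n_items - m)) / (n_items * var_total)
    )

# Hypothetical total scores for 4 examinees on a 3-item test.
estimate = kr21([3, 2, 1, 0], n_items=3)
```

When item difficulties vary, KR-21 is never larger than KR-20 on the same data, so it is often reported as a conservative lower estimate.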


2018 ◽  
Vol 122 (2) ◽  
pp. 748-772
Author(s):  
Wen-Ta Tseng ◽  
Tzi-Ying Su ◽  
John-Michael L. Nix

This study applied the many-facet Rasch model to assess learners’ translation ability in an English as a foreign language context. Few attempts have been made in extant research to detect and calibrate rater severity in the domain of translation testing. To fill this gap, the study documented the process of validating a test of Chinese-to-English sentence translation and modeled raters’ scoring propensity, defined by harshness or leniency, expert/novice effects on severity, and concomitant effects on item difficulty. Two hundred twenty-five third-year Taiwanese senior high school students and six educators from tertiary and secondary educational institutions served as participants. The students’ mean age was 17.80 years (SD = 1.20, range 17–19). The exam consisted of 10 translation items adapted from two entrance exam tests. The results showed that this subjectively scored performance assessment exhibited robust unidimensionality, thus reliably measuring translation ability free from unmodeled disturbances. Furthermore, discrepancies in ratings between novice and expert raters were identified and modeled by the many-facet Rasch model. The implications of applying the many-facet Rasch model to translation tests at the tertiary level are discussed.


Author(s):  
Tesalonika Br Karo ◽  
Viator Lumbanraja ◽  
Novalina Sembiring

The purpose of this research is to describe the ability of the eleventh-grade students of SMA Deli Murni Bandar Baru to use countable and uncountable nouns. The population was the eleventh-grade students, with 58 students taken as the sample. The instrument for collecting data was a test on countable and uncountable nouns. A tryout test was conducted to determine the validity, reliability, and item difficulty of the test items. The results showed that 5 students (15%) belonged to the high category, 24 students (73%) to the moderate category, and 4 students (12%) to the low category. The mean score was 61.39. Only 24% of the students did well on the test, with 12 students scoring above 75, which means that the eleventh-grade students of SMA Deli Murni Bandar Baru are not yet able to use countable and uncountable nouns. The total number of incorrect answers made by the students in using countable and uncountable nouns was 502. The percentage of students’ mistakes was 33% in the uncountable multiple-choice items (indefinite and quantifier uncountable), 34% in the countable multiple-choice items (singular, regular, and irregular countable), and 33% in the countable essay items (regular and irregular countable). Based on the findings and conclusions, some suggestions are offered to English teachers, English students, and other researchers. In particular, English teachers in schools are advised to improve students' ability to use countable and uncountable nouns.


2014 ◽  
Vol 18 (2) ◽  
pp. 188-201
Author(s):  
Dina Huriaty ◽  
Djemari Mardapi

THE ACCURACY OF THE FIXED PARAMETER CALIBRATION METHOD: STUDY OF MATHEMATICS NATIONAL EXAMINATION TEST

This study aimed to (1) identify the characteristics of the test items on the junior high school (SMP) mathematics national examination for the 2009/2010 school year, as calibrated with fixed parameter calibration methods, and (2) determine the most accurate fixed parameter calibration method among NWU-OEM (no prior weights updating and one expectation-maximization cycle), NWU-MEM (no prior weights updating and multiple expectation-maximization cycles), OWU-OEM (one prior weights updating and one expectation-maximization cycle), OWU-MEM (one prior weights updating and multiple expectation-maximization cycles), and MWU-MEM (multiple weights updating and multiple expectation-maximization cycles). This study used a descriptive quantitative approach. The subject of the study was the examinees’ response data from the 2009/2010 junior high school mathematics national examination in the province of DI Yogyakarta. The criteria for method accuracy were the test information function (TIF) and the standard error of measurement (SEM). The results are as follows. (1) The item parameter statistics for the 2009/2010 mathematics national examination showed a mean item discrimination in the interval [1.07, 1.14], a mean item difficulty in the interval [-0.35, -0.20], and a mean pseudo-guessing parameter of c < 0.25. At the theta (ability) values where the item information function is maximal, the function graphs of the five fixed parameter calibration methods almost coincide. (2) The OWU-OEM method is the most accurate in estimating the item parameters of the 2009/2010 mathematics national examination.
Keywords: Accuracy, Calibration, Fixed Parameter, Algorithm, Expectation-Maximization
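The two accuracy criteria named in this abstract, the test information function and the standard error of measurement, can be sketched for the three-parameter logistic model as follows. This is not the authors' calibration code: the function names, the logistic metric without the 1.7 scaling constant, and the item parameters are all assumptions chosen for illustration.

```python
import math

def p_3pl(theta, a, b, c):
    """3PL probability of a correct response."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b, c=0.0):
    """Fisher information of one 3PL item at ability theta."""
    p = p_3pl(theta, a, b, c)
    q = 1.0 - p
    return a ** 2 * (q / p) * ((p - c) / (1.0 - c)) ** 2

def total_information(theta, items):
    """Test information: the sum of item informations at theta."""
    return sum(item_information(theta, a, b, c) for a, b, c in items)

def standard_error(theta, items):
    """Standard error of measurement: 1 / sqrt(test information)."""
    return 1.0 / math.sqrt(total_information(theta, items))

# Hypothetical (a, b, c) triples, not estimates from the study.
hypothetical_items = [(1.1, -0.3, 0.2), (1.0, 0.0, 0.2), (1.2, 0.4, 0.2)]
sem_at_zero = standard_error(0.0, hypothetical_items)
```

Because the standard error is the inverse square root of the test information, a calibration method that yields higher information at a given ability level also yields a smaller measurement error there, so the two criteria move together.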

