Automatic Difficulty Level Estimation of Multimedia Math Test Items

2010 ◽  
pp. 125-138
Author(s):  
R. Shen ◽  
I. Cheng ◽  
A. Basu
2008 ◽  
Vol 78 (2) ◽  
pp. 333-368 ◽  
Author(s):  
MARIA MARTINIELLO

In this article, Maria Martiniello reports the findings of a study of the linguistic complexity of math word problems that were found to exhibit differential item functioning (DIF) for English-language learners (ELLs) and non-ELLs taking the Massachusetts Comprehensive Assessment System (MCAS) fourth-grade math test. The study builds on prior research showing that greater linguistic complexity increases the difficulty of English-language math items for ELLs compared with non-ELLs of equivalent math proficiency. Through textual analyses, Martiniello describes the linguistic features of some of the 2003 MCAS math word problems that posed disproportionate difficulty for ELLs. Martiniello also uses excerpts from children's think-aloud transcripts to illustrate the reading comprehension challenges these features pose to Spanish-speaking ELLs. Through both DIF statistics and the voices of children, the article scrutinizes the appropriateness of inferences about ELLs' math knowledge based on linguistically complex test items.
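The abstract relies on DIF statistics without naming the procedure. As an illustration only, the Mantel-Haenszel procedure, one widely used DIF method, compares groups matched on total score: within each score stratum it tabulates right/wrong counts for a reference group (here, non-ELLs) and a focal group (ELLs), pools them into a common odds ratio, and converts that ratio to the ETS delta scale. The `Stratum` class and any counts fed to it are hypothetical; this is not presented as the study's own computation.

```python
import math
from dataclasses import dataclass


@dataclass
class Stratum:
    """2x2 counts for one total-score level (examinees matched on proficiency).

    Hypothetical container: reference = e.g. non-ELLs, focal = e.g. ELLs.
    """
    ref_right: int
    ref_wrong: int
    foc_right: int
    foc_wrong: int


def mantel_haenszel(strata):
    """Return (alpha_MH, MH D-DIF) for one item across matched score strata.

    Assumes at least one stratum has nonzero ref_wrong * foc_right.
    """
    num = den = 0.0
    for s in strata:
        n = s.ref_right + s.ref_wrong + s.foc_right + s.foc_wrong
        if n == 0:
            continue  # an empty stratum contributes nothing
        num += s.ref_right * s.foc_wrong / n
        den += s.ref_wrong * s.foc_right / n
    alpha = num / den  # common odds ratio, reference vs. focal
    # ETS delta metric: negative values mean the item is harder for the
    # focal group than for reference examinees of the same total score.
    return alpha, -2.35 * math.log(alpha)
```

For example, `mantel_haenszel([Stratum(40, 10, 30, 20), Stratum(60, 5, 50, 15)])` yields an odds ratio above 1 and a negative D-DIF, i.e. within each score band the focal group answers correctly less often.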


2013 ◽  
Vol 30 (4) ◽  
pp. 479-486
Author(s):  
Odoisa Antunes de Queiroz ◽  
Ricardo Primi ◽  
Lucas de Francisco Carvalho ◽  
Sônia Regina Fiorim Enumo

Dynamic testing, which includes an intermediate assistance phase, measures change between pretest and post-test on the assumption that both share a common metric. To test this assumption, we applied Item Response Theory (IRT) to the responses of 69 children to an adapted version of the Children's Analogical Thinking Modifiability Test, a 12-item dynamic cognitive test (828 responses in total), to verify whether the original scale yields the same results as the IRT-equated scale in quantifying change. We followed these steps: 1) anchoring the pretest and post-test items through a cognitive analysis, which identified 3 common items; 2) estimating and comparing the items' difficulty parameters; 3) equating the items and estimating the thetas (ability parameters); 4) comparing the scales. The metric of the Children's Analogical Thinking Modifiability Test was similar to that estimated by IRT, but the difficulty of the pretest and post-test items must be differentiated and adjusted for samples with high and low performance.
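Steps 1 to 3 above can be sketched under simplifying assumptions. The abstract does not say which calibration or equating method was used; this sketch substitutes a crude logit approximation to Rasch difficulties, b = ln((1 - p) / p) centred within each form, and mean-mean linking through the anchor items. Item labels and proportions are hypothetical.

```python
import math
from statistics import mean


def logit_difficulties(p_correct):
    """Crude Rasch-style difficulties from proportions correct (0 < p < 1).

    b = ln((1 - p) / p), then centred so each form's difficulties average 0.
    A stand-in for a proper Rasch calibration, not the authors' procedure.
    """
    raw = {item: math.log((1 - p) / p) for item, p in p_correct.items()}
    centre = mean(raw.values())
    return {item: b - centre for item, b in raw.items()}


def mean_mean_link(b_pre, b_post, anchors):
    """Place post-test difficulties on the pretest scale via shared anchor items.

    Under the Rasch model the two calibrations differ by a constant shift,
    estimated here as the difference of the anchor items' mean difficulties.
    """
    shift = mean(b_pre[a] for a in anchors) - mean(b_post[a] for a in anchors)
    return shift, {item: b + shift for item, b in b_post.items()}
```

The same `shift` would be added to post-test ability estimates, so pretest and post-test thetas can be compared on one scale.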


Author(s):  
Hardi Tambunan

Quality mapping of educational unit programs is an important issue in Indonesian education today, as part of the effort to improve educational quality. The objective of this study is to build a mathematical model for mapping students' capability in mathematics. The use of the model is demonstrated with the results of a math test given to 147 grade XII students in the science program of a state senior high school in the 2015-2016 academic year. The resulting capability map shows that, of the 48 test items derived from 16 subtopics across three cognitive domains, only 19 test items were achieved. The achieved items lie in 8 subtopics for the knowledge domain, 6 for the comprehension domain, and 5 for the application domain. Subtopics and cognitive domains that were not achieved can then be targeted with corrective action to obtain the best results. This paper demonstrates how operational research techniques can be applied to problem solving in education.
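The abstract does not state the criterion for an item counting as "achieved". A minimal sketch of the mapping step, assuming hypothetically that an item is achieved when the proportion of students answering it correctly reaches a threshold (0.65 here, an arbitrary choice) and that each item is tagged with its subtopic and cognitive domain. This is not the author's operational-research model; subtopic names are placeholders.

```python
from collections import defaultdict


def achievement_map(items, threshold=0.65):
    """Group items into achieved subtopics per domain and cells needing remediation.

    items: iterable of (subtopic, domain, proportion_correct) tuples.
    threshold: hypothetical cut-off for calling an item achieved.
    """
    achieved = defaultdict(list)
    remedial = []
    for subtopic, domain, p in items:
        if p >= threshold:
            achieved[domain].append(subtopic)
        else:
            # subtopic/domain cell flagged for corrective action
            remedial.append((subtopic, domain))
    return dict(achieved), remedial
```

Counting the entries of `achieved` per domain gives the kind of summary the abstract reports (achieved subtopics per cognitive domain), while `remedial` lists the cells for corrective action.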


2020 ◽  
Vol 1 (191) ◽  
pp. 197-203
Author(s):  
Liudmyla Yaremenko ◽  
Antonina Kendyuhova ◽  
Yurii Yaremenko ◽  
...  

The article analyzes the test tasks designed to assess the general pedagogical competence of teachers in postgraduate pedagogical education, together with their main characteristics, by means of modern test theory. The tests designed for the initial and final assessment are an effective, objective, and reliable tool for assessing teachers' general pedagogical competence in postgraduate pedagogical education. During piloting, a substantial body of statistical data was accumulated; its mathematical and statistical processing with modern item response theory (IRT) made it possible to estimate the latent parameters of the test takers and the parameters of the test tasks by applying classical measurement models. Owing to the invariance property of the IRT mathematical apparatus, the calculations ensure an objective assessment of each student's level of training that does not depend on the difficulty of the test tasks. This made it possible to correctly compare the results of teachers who completed test tasks of different difficulty. The item difficulty estimates obtained by the algorithm are likewise invariant with respect to the level of pedagogical training of the learners in the tested group. The characteristic curves of the test tasks' difficulty and of the test participants' level of training were constructed using the Rasch model. Analyzing their mutual arrangement made it possible to identify ways to further improve the test, create parallel tests, and form a system of tasks that are most effective for assessing the pedagogical training of each learner.
When designing the test, bear in mind that the tasks should differ in content, form, and complexity; the test will then carry more information about the test takers and be suitable for assessing the general pedagogical competence of teachers in postgraduate pedagogical education. Testing teachers of different specialties made it possible to check the quality of the developed test tasks and to establish the level of general pedagogical competence of the tested teachers in postgraduate pedagogical education. The calibrated test items were entered into the pedagogy task bank and used in the educational process.
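The abstract reports person and item parameters estimated with the Rasch model and stresses that ability estimates do not depend on which tasks were taken. A minimal sketch of that idea, assuming item difficulties are already calibrated (any values passed in are hypothetical, and this is not the authors' software): the Rasch item characteristic curve, plus a Newton-Raphson maximum-likelihood ability estimate for one test taker.

```python
import math


def rasch_p(theta, b):
    """Rasch item characteristic curve: P(correct | ability theta, difficulty b)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def estimate_theta(responses, difficulties, tol=1e-8, max_iter=100):
    """Maximum-likelihood ability for 0/1 responses to items of known difficulty.

    Newton-Raphson on the Rasch log-likelihood: the gradient is
    sum(x - P) and the test information is sum(P * (1 - P)).
    """
    score = sum(responses)
    if score == 0 or score == len(responses):
        raise ValueError("MLE is infinite for all-wrong or all-correct patterns")
    theta = 0.0
    for _ in range(max_iter):
        ps = [rasch_p(theta, b) for b in difficulties]
        gradient = sum(x - p for x, p in zip(responses, ps))
        information = sum(p * (1 - p) for p in ps)
        step = gradient / information
        theta += step
        if abs(step) < tol:
            break
    return theta
```

Because the raw score is a sufficient statistic under the Rasch model, two response patterns with the same total on the same items yield the same ability estimate; zero and perfect scores have no finite estimate, so the function rejects them.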


2017 ◽  
Vol 6 (1) ◽  
pp. 52
Author(s):  
Wahyu Arta S ◽  
Abdul Asib ◽  
Dewi Sri Wahyuni

The objective of this study is to determine the quality of the test items used as the second-semester final test for eleventh-grade students at SMA N in Magetan. The research used a descriptive method. The data sources were documents: the English final test items, the syllabus, and the students' answer sheets. The data were analyzed using the formulas given by Ahmann and Glock. The results show that 57.5% of the items have a good discrimination index, 45% of the items meet the satisfactory criterion for difficulty level, and 11 items have effective distracters. In terms of item indicators, 92.5% of the items are compatible with the learning indicators stated in the syllabus; in the construction aspect, 75% of the items have a good stem and 82.5% fulfill all the criteria for good alternatives. In short, the items used in the final test are of good quality in construction and in their compatibility with the syllabus. However, some items are less effective in terms of difficulty level and distracter effectiveness.
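The difficulty and discrimination indices mentioned above are commonly computed from upper and lower scoring groups. A minimal sketch using that common formulation (the default 27% group size and the exact formulas are assumptions; Ahmann and Glock's formulas may differ in detail): difficulty is the proportion correct across both groups, and discrimination is the upper-group minus lower-group proportion correct.

```python
def upper_lower_analysis(scores, item_correct, fraction=0.27):
    """Difficulty and discrimination indices for one item.

    scores: total test score per student.
    item_correct: 1/0 result on this item, aligned with scores.
    fraction: share of students in each extreme group (assumed 27%).
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n = max(1, round(fraction * len(scores)))
    upper, lower = order[:n], order[-n:]
    ru = sum(item_correct[i] for i in upper)  # correct answers, upper group
    rl = sum(item_correct[i] for i in lower)  # correct answers, lower group
    difficulty = (ru + rl) / (2 * n)          # proportion correct in both groups
    discrimination = (ru - rl) / n            # ranges from -1 to 1
    return difficulty, discrimination
```

An item answered correctly by every upper-group student and no lower-group student gets the maximum discrimination of 1.0.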


Author(s):  
Amardeep Kaur

The present study was conducted to construct and standardize an achievement test in English for standard IX students. Test items were selected from the grade VIII syllabus prescribed by the Punjab School Education Board, Mohali. Because the achievement test was intended for standard IX, the grade VIII English textbook was used to construct it. The entire syllabus was thoroughly scrutinized, and items were then selected from the P.S.E.B. class VIII books. In all, 130 items covering 14 aspects of the class VIII material were taken. After seeking expert opinion, the items were reduced to 120, each carrying one mark. A further 20 items were rejected on the basis of difficulty level and discriminating value, leaving 100 items whose difficulty values lie between .40 and .60. Content validity was established with the help of expert opinion from English teachers of different schools. Reliability was established by the split-half method, with a calculated reliability of 0.86.
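The two quantitative steps above, keeping items in a difficulty window and estimating split-half reliability, can be sketched as follows. Assumptions not stated in the abstract: the difficulty value is the proportion of students answering correctly, the halves are formed from odd and even items, and the half-test correlation is stepped up with the Spearman-Brown formula r_full = 2r / (1 + r). Item labels are hypothetical.

```python
from statistics import mean


def select_items(difficulty, low=0.40, high=0.60):
    """Keep items whose difficulty value (assumed: proportion correct) is in [low, high]."""
    return [item for item, p in difficulty.items() if low <= p <= high]


def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def split_half_reliability(matrix):
    """Odd-even split-half reliability with the Spearman-Brown correction.

    matrix[s][i] is the 1/0 score of student s on item i.
    """
    odd = [sum(row[0::2]) for row in matrix]   # items 1, 3, 5, ...
    even = [sum(row[1::2]) for row in matrix]  # items 2, 4, 6, ...
    r_half = pearson(odd, even)
    return 2 * r_half / (1 + r_half)           # full-length test estimate
```

The Spearman-Brown step is needed because each half is only half as long as the full test, and shorter tests are less reliable.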


2021 ◽  
Vol 17 (2) ◽  
pp. 187-197
Author(s):  
Eun-Yeong Shin

Purpose: The purpose of the present study was to develop lists of phoneme perception tests for school-aged children.
Methods: The 127 initial-consonant and 94 final-consonant test items were modified for difficulty level by reducing the number of multiple-choice options and controlling the familiarity of the target and foil words. The validity of the results was evaluated with normal-hearing children, and the target word list was revised through discussions among experts in various fields. Words with a low percentage of correct answers (<90%), vowel-consonant (VC) words in the initial-consonant items, and consonant-vowel (CV) words in the final-consonant items were eliminated, producing the final revised consonant perception test items for school-aged children.
Results: Each consonant test item consists of three multiple-choice words of consonant-vowel-consonant (CVC) or CV type. The 50 initial-consonant and 25 final-consonant perception test items have a high degree of familiarity and phoneme frequencies corresponding to those of children's everyday speech sounds.
Conclusion: The results of this study are useful for analyzing school-aged children's phoneme perception ability by listening and for evaluating phoneme errors in children with congenital high-frequency hearing loss.
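The three elimination rules in the Methods can be written as a single filter. A minimal sketch: the `Item` record, its field names, and the placeholder words are hypothetical; only the 90% cut-off and the VC/CV rules come from the abstract.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """Hypothetical record for one candidate target word."""
    word: str
    structure: str          # syllable shape, e.g. "CVC", "CV", "VC"
    test: str               # "initial" or "final" consonant test
    percent_correct: float  # normal-hearing children's accuracy, in %


def keep(item, min_percent=90.0):
    """Return True if the word survives the elimination rules."""
    if item.percent_correct < min_percent:
        return False  # too many errors even for normal-hearing children
    if item.test == "initial" and item.structure == "VC":
        return False  # a VC word has no initial consonant to perceive
    if item.test == "final" and item.structure == "CV":
        return False  # a CV word has no final consonant to perceive
    return True
```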


2019 ◽  
Vol 9 (1) ◽  
pp. 91-100
Author(s):  
Nurhayati Nurhayati ◽  
Wahyudi Wahyudi ◽  
Syarif Lukman Hakim ◽  
...  

This study aims to 1) produce a higher-order thinking skills (HOTS) assessment instrument; 2) determine the quality of the test instrument in terms of construction, material, and language feasibility according to experts; and 3) determine the quality of the test items in terms of validity, reliability, difficulty level, and discriminating power based on the test results. Research and Development was used as the research method, with the 4D development model consisting of four stages: the define, design, development, and dissemination stages. A questionnaire was used for expert judgment validation. The measured characteristics of the HOTS instrument items included validity, reliability, difficulty level, and discriminating power. The HOTS assessment instrument took the form of multiple-choice questions with reasons, based on the HOTS aspects of analyzing, evaluating, and creating. Expert validation rated the items, on average, as very good in content, construct, and language. The validated and revised instrument was tested on students who had studied vibration and wave material, and the results showed that 77% of the questions were of good quality: valid, with good discriminating power, moderate or easy difficulty, and very strong reliability. The instrument is therefore feasible and ready to be used to measure students' higher-order thinking skills in vibrations and waves material.
Keywords: HOTS, Instruments test, Vibrations and waves
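The abstract does not name its validity or reliability statistics. For dichotomously scored items, two common choices are the point-biserial item-total correlation (as an item validity index) and KR-20 reliability, KR-20 = k/(k-1) * (1 - sum(p*q) / var(total)). A minimal sketch under those assumptions; note that this point-biserial includes the item itself in the total (uncorrected), and both functions assume the totals vary and the item is neither always nor never answered correctly.

```python
from statistics import mean, pstdev, pvariance


def kr20(matrix):
    """Kuder-Richardson 20 reliability; matrix[s][i] is 1/0 for student s, item i."""
    k = len(matrix[0])
    totals = [sum(row) for row in matrix]
    var_total = pvariance(totals)  # population variance of total scores
    pq = 0.0
    for i in range(k):
        p = mean(row[i] for row in matrix)  # item difficulty (proportion correct)
        pq += p * (1 - p)
    return k / (k - 1) * (1 - pq / var_total)


def point_biserial(matrix, i):
    """Uncorrected point-biserial correlation between item i and the total score."""
    totals = [sum(row) for row in matrix]
    item = [row[i] for row in matrix]
    p = mean(item)
    m_correct = mean(t for t, x in zip(totals, item) if x == 1)
    return (m_correct - mean(totals)) / pstdev(totals) * (p / (1 - p)) ** 0.5
```

The point-biserial formula is algebraically the Pearson correlation between the 0/1 item column and the total scores, written in terms of the mean total of students who got the item right.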

