Exploring task features that predict psychometric quality of test items: the case for the Dutch driving theory exam

Author(s):  
Erik C. Roelofs ◽  
Wilco H. M. Emons ◽  
Angela J. Verschoor
2016 ◽  
Vol 35 (1) ◽  
pp. 17
Author(s):  
Armel Brizuela ◽  
Karol Jiménez-Alfaro ◽  
Nelson Pérez-Rojas ◽  
Guaner Rojas-Rojas

Current standards for assessing the psychometric quality of psychological and educational tests stipulate that one type of evidence required to justify the inferences drawn from a test concerns the strategies examinees use to answer its items. This article presents the results of a study based on semi-structured interviews with 15 first-year university students, whose oral reports were analyzed to substantiate a previously identified set of strategies for answering the verbal items of the Prueba de Aptitud Académica (Academic Aptitude Test) of the Universidad de Costa Rica. The results indicate that participants did in fact use the proposed strategies, which constitutes important evidence about the reasoning skills measured by the verbal items of the Prueba de Aptitud Académica. The article concludes with a discussion of the results, the usefulness of verbal self-reports for gathering validity evidence for a test, and future research along these lines.


Author(s):  
Yannik Terhorst ◽  
Paula Philippi ◽  
Lasse Sander ◽  
Dana Schultchen ◽  
Sarah Paganini ◽  
...  

BACKGROUND Mobile health apps (MHA) have the potential to improve health care. The commercial MHA market is growing rapidly, but the content and quality of available MHA are unknown. Consequently, psychometrically sound instruments for assessing the quality and content of MHA are urgently needed. The Mobile Application Rating Scale (MARS) is one of the most widely used tools for evaluating the quality of MHA across health domains. Only a few validation studies, based on selected samples of MHA, have investigated its psychometric quality, and no study has evaluated the construct validity of the MARS or its concurrent validity with other instruments. OBJECTIVE This study evaluates the construct validity, concurrent validity, reliability, and objectivity of the MARS. METHODS MARS scoring data were pooled from 15 international app quality reviews to evaluate the psychometric properties of the MARS. The MARS measures app quality on four dimensions: engagement, functionality, aesthetics, and information quality; app quality is determined for each dimension and overall. Construct validity was evaluated by comparing competing measurement models using confirmatory factor analysis (CFA). A combination of non-centrality (RMSEA), incremental (CFI, TLI), and residual (SRMR) fit indices was used to evaluate goodness of fit. As a measure of concurrent validity, the correlations between the MARS and (1) another quality assessment tool, ENLIGHT, and (2) user star ratings extracted from app stores were investigated. Reliability was determined using omega, and objectivity was assessed in terms of the intraclass correlation (ICC). RESULTS In total, MARS ratings for 1,299 MHA covering 15 health domains were pooled for the analysis. CFA supported a bifactor model with a general quality factor and an additional factor for each subdimension (RMSEA = 0.074, TLI = 0.922, CFI = 0.940, SRMR = 0.059). Reliability was good to excellent (omega = 0.79 to 0.93), and objectivity was high (ICC = 0.82). The overall MARS rating was positively associated with ENLIGHT (r = 0.91, P < .01) and with user ratings (r = 0.14, P < .01). CONCLUSIONS The psychometric evaluation of the MARS demonstrated its suitability for the quality assessment of MHA. As such, the MARS could be used to make the quality of MHA transparent to health care stakeholders and patients. Future studies could extend the present findings by investigating the retest reliability and predictive validity of the MARS.
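As a side note, the omega reliability reported above has a simple closed form once a factor model is fitted. The following is a minimal sketch, assuming standardized loadings from a single-factor solution; the loadings below are illustrative values, not the study's estimates.

```python
import numpy as np

def mcdonalds_omega(loadings, uniquenesses):
    """McDonald's omega: (sum of loadings)^2 / ((sum of loadings)^2 + sum of uniquenesses)."""
    l = np.asarray(loadings, dtype=float)
    u = np.asarray(uniquenesses, dtype=float)
    common = l.sum() ** 2
    return common / (common + u.sum())

# Illustrative standardized loadings for a hypothetical five-item subscale
loadings = [0.72, 0.68, 0.75, 0.70, 0.66]
uniquenesses = [1 - l**2 for l in loadings]  # assumes standardized items
print(round(mcdonalds_omega(loadings, uniquenesses), 2))  # ~0.83
```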


Author(s):  
Gomolemo Mahakwe ◽  
Ensa Johnson ◽  
Katarina Karlsson ◽  
Stefan Nilsson

Anxiety has been identified as one of the most severe and long-lasting symptoms experienced by hospitalized children with cancer. Self-reports are especially important for documenting emotional and abstract concepts, such as anxiety. Children may not always be able to communicate their symptoms due to language difficulties, a lack of developmental language skills, or the severity of their illness. Instruments with sufficient psychometric quality and pictorial support may address this communication challenge. The purpose of this review was to systematically search the published literature and identify validated and reliable self-report instruments that children aged 5–18 years can use to assess their anxiety, so that they receive appropriate anxiety-relief interventions in hospital. Two questions guided the review: What validated self-report instruments can children with cancer use to self-report anxiety in the hospital setting? Which of these instruments offer pictorial support? Eight instruments were identified, but most lacked pictorial support. The Visual Analogue Scale (VAS) and the Pediatric Quality of Life (PedsQL™) 3.0 Brain Tumor Module and Cancer Module proved useful for hospitalized children with cancer, as they provide pictorial support. It is recommended that faces or symbols be used along with the VAS, as pictures are easily understood by younger children. Future studies could include the adaptation of existing instruments into digital e-health tools.


Assessment ◽  
2020 ◽  
pp. 107319112096456
Author(s):  
Jessica L. Harrison ◽  
Charlotte L. Brownlow ◽  
Michael J. Ireland ◽  
Adina M. Piovesana

Empathy is essential for social functioning and is relevant to a host of clinical conditions. This COSMIN review evaluated the empirical support for empathy self-report measures used with autistic and nonautistic adults. Given that autism is characterized by social differences, it is the subject of a substantial proportion of empathy research; this review therefore uses autism as a lens through which to scrutinize the psychometric quality of empathy measures. Of the 19 measures identified, five demonstrated "High-Quality" evidence for "Insufficient" properties and cannot be recommended. The remaining 14 had noteworthy gaps in evidence and require further evaluation before use with either group. Without tests of measurement invariance or differential item functioning, the extent to which observed group differences represent actual trait differences remains unknown. Using autism as a test case highlights an alarming tendency for empathy measures to be used to characterize, and potentially malign, vulnerable populations before sufficient validation.
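The review's point about differential item functioning (DIF) can be made concrete with the widely used logistic-regression DIF test: regress an item response on a trait proxy, group membership, and their interaction, and compare nested models. This is a minimal sketch on simulated data, illustrating one standard approach rather than any procedure used by the review's primary studies.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 400
group = rng.integers(0, 2, n)            # two comparison groups (labels illustrative)
theta = rng.normal(size=n)               # latent trait proxy (total score in practice)
# Simulate one item with uniform DIF: the same theta yields different endorsement rates
p = 1 / (1 + np.exp(-(1.0 * theta - 0.5 * group)))
item = rng.binomial(1, p)

# Nested models: trait only vs. trait + group + interaction (nonuniform DIF)
X0 = sm.add_constant(theta)
X1 = sm.add_constant(np.column_stack([theta, group, theta * group]))
m0 = sm.Logit(item, X0).fit(disp=0)
m1 = sm.Logit(item, X1).fit(disp=0)
lr = 2 * (m1.llf - m0.llf)               # likelihood-ratio statistic, df = 2
print(f"LR = {lr:.2f}")                  # large values flag potential DIF
```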


2021 ◽  
Vol 6 (2) ◽  
pp. 256
Author(s):  
Sayit Abdul Karim ◽  
Suryo Sudiro ◽  
Syarifah Sakinah

Apart from teaching, English language teachers need to assess their students by giving tests to gauge their achievement. In general, however, teachers rarely conduct item analysis on their tests; as a result, they have little idea of the quality of the tests they distribute to students. The present study uses test item analysis to determine the level of difficulty (LD) and the discriminating power (DP) of multiple-choice (MC) items constructed by an English teacher for a reading comprehension test. The study employs a qualitative approach. A 50-item MC reading comprehension test was analyzed from the students' test results. Thirty-five grade-eight students, 15 male and 20 female, of junior high school 2 Kempo in West Nusa Tenggara Province took part in the MC test try-out. The findings revealed that 16 of the 50 test items were rejected because of poor difficulty and discrimination indices, 12 items needed revision due to their mediocre quality, and 11 items were of good quality. A further 11 items were considered excellent, with DP scores ranging from 0.44 to 0.78. The implications of the present study shed light on the quality of teacher-made test items, especially MC tests.
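The two statistics this study reports are straightforward to compute from a scored response matrix. Below is a minimal sketch of classical item analysis using the common upper/lower 27% grouping; the toy data are invented for illustration, not the study's responses.

```python
import numpy as np

def item_analysis(responses, group_frac=0.27):
    """Classical item analysis for a 0/1 scored response matrix
    (rows = examinees, columns = items).

    Returns per-item difficulty (proportion correct) and the discrimination
    index D = p_upper - p_lower, using the top/bottom `group_frac` of
    examinees ranked by total score.
    """
    R = np.asarray(responses)
    totals = R.sum(axis=1)
    order = np.argsort(totals)
    k = max(1, int(round(group_frac * len(R))))
    lower, upper = R[order[:k]], R[order[-k:]]
    difficulty = R.mean(axis=0)
    discrimination = upper.mean(axis=0) - lower.mean(axis=0)
    return difficulty, discrimination

# Toy data: 10 examinees x 4 items
rng = np.random.default_rng(1)
data = rng.integers(0, 2, size=(10, 4))
p, d = item_analysis(data)
print(np.round(p, 2), np.round(d, 2))
```

On this convention, D values of roughly 0.40 and above are usually read as excellent discrimination, which matches the 0.44–0.78 range the study labels excellent.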


2018 ◽  
Author(s):  
Boris Forthmann ◽  
Paul-Christian Bürkner ◽  
Mathias Benedek ◽  
Carsten Szardenings ◽  
Heinz Holling

This work introduces a shift of perspective on the dimensionality of divergent thinking tasks, moving from the question of multidimensionality across divergent thinking scores to the question of multidimensionality across the scale of divergent thinking scores. We apply IRTree models to test whether the same latent trait can be assumed across the whole scale in snapshot scoring of divergent thinking tests, and whether this holds for different task instructions and varying levels of fluency. In this way, multidimensionality can be explored across the points of a Likert-type rating scale, and multidimensionality due to differences in the number of responses in ideational pools can also be assessed. Evidence for unidimensionality across scale points was stronger under be-creative instructions than under be-fluent instructions, suggesting better psychometric quality of ratings when be-creative instructions are used. In addition, latent variables pertaining to low-fluency and high-fluency ideational pools shared around 50% of their variance, which suggests both strong overlap and evidence for differentiation. The presented approach makes it possible to further examine the psychometric quality of subjective ratings and to address new questions concerning within-item multidimensionality in divergent thinking.
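The core move in any IRTree analysis is recoding each rating into binary pseudo-items, one per node of a response tree. The sketch below shows the generic linear (sequential) tree mapping for a 5-point scale; it illustrates the recoding idea only and is not necessarily the tree structure the authors specify.

```python
import numpy as np

def sequential_pseudo_items(rating, n_categories=5):
    """Expand one rating on a 1..n_categories scale into the binary node
    outcomes of a linear (sequential) response tree: node k asks "did the
    response move past category k?". Nodes after the first 0 are unreached
    and stay NaN, so they drop out of the likelihood.
    """
    nodes = np.full(n_categories - 1, np.nan)
    for k in range(n_categories - 1):
        nodes[k] = 1.0 if rating > k + 1 else 0.0
        if nodes[k] == 0:
            break  # remaining nodes stay unreached (NaN)
    return nodes

for r in [1, 3, 5]:
    print(r, sequential_pseudo_items(r))
# 1 -> [0, nan, nan, nan]; 3 -> [1, 1, 0, nan]; 5 -> [1, 1, 1, 1]
```

Fitting a separate latent trait to subsets of these pseudo-items (e.g., per scale point, or per low- vs. high-fluency pool) is what lets the model test unidimensionality across the scale.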


2021 ◽  
Vol 23 (1) ◽  
pp. 18-23
Author(s):  
Belalov R.M.

At the present stage, pedagogical tests are among the most accessible and well-developed methods for assessing students' knowledge. A review of international research on testing shows that test developers pay particular attention to item format and standardization, the processing of test results, the interpretation of the data obtained, and methods of automatic test assembly. The goal of this study is to examine the possibility of using testing to assess the formation of students' competencies. Materials and methods: a theoretical review of the psychological and pedagogical literature on the research questions, using analysis, synthesis, generalization, and systematization. Results: Testing is characterized by a number of properties, chief among them the objectivity of the assessment of results and the identification of gaps in knowledge. Testing yields objective information about the quality of knowledge and skills and identifies the sections learned worst by students, all of which makes it possible to adjust the course of instruction. Test control can be external, in which test-takers are offered a dichotomous range containing both completed and uncompleted tasks across a wide range of topics. During testing, students also have the opportunity to identify gaps in their own knowledge independently. The study confirms that testing can be used to assess the formation of students' competencies and ultimately raises the level of knowledge. Conclusion: At the present stage, the introduction of tests into educational practice is an inevitable process; efforts should therefore be directed at developing a theoretical platform for the testing system, which will in turn increase the efficiency of tests as a form of control.


2010 ◽  
Vol 35 (1) ◽  
pp. 12-16 ◽  
Author(s):  
Sandra L. Clifton ◽  
Cheryl L. Schriner

Author(s):  
Hardi Tambunan

The quality mapping of educational unit programs is an important issue in Indonesian education today as part of efforts to improve educational quality. The objective of this study is to build a mathematical model that maps students' capability in mathematics. A mathematical model was constructed for this purpose and demonstrated on the results of a mathematics test given to 147 grade XII students in the science program of a state senior high school in the 2015-2016 academic year. The resulting map shows that of the 48 test items, derived from 16 subtopics across three cognitive domains, only 19 were achieved. The achieved items lie in 8 subtopics for the knowledge domain, 6 subtopics for the comprehension domain, and 5 subtopics for the application domain. Corrective action can then target the subtopics and cognitive domains that were not achieved, in order to obtain maximum results. This paper demonstrates how operational research techniques can be applied to problem solving in education.

