The Influence of Changes in Assessment Design on the Psychometric Quality of Scores

2001 ◽  
Vol 14 (1) ◽  
pp. 91-107 ◽  
Author(s):  
Edward W. Wolfe ◽  
Drew H. Gitomer
Author(s):  
Yannik Terhorst ◽  
Paula Philippi ◽  
Lasse Sander ◽  
Dana Schultchen ◽  
Sarah Paganini ◽  
...  

BACKGROUND Mobile health apps (MHA) have the potential to improve health care. The commercial MHA market is rapidly growing, but the content and quality of available MHA are unknown. Consequently, instruments of high psychometric quality for assessing the quality and content of MHA are urgently needed. The Mobile Application Rating Scale (MARS) is one of the most widely used tools for evaluating the quality of MHA in various health domains. Only a few validation studies investigating its psychometric quality exist, and those rely on selected samples of MHA. No study has evaluated the construct validity of the MARS or its concurrent validity with other instruments. OBJECTIVE This study evaluates the construct validity, concurrent validity, reliability, and objectivity of the MARS. METHODS MARS scoring data were pooled from 15 international app quality reviews to evaluate the psychometric properties of the MARS. The MARS measures app quality across four dimensions: engagement, functionality, aesthetics, and information quality. App quality is determined for each dimension and overall. Construct validity was evaluated by comparing competing measurement models using confirmatory factor analysis (CFA). A combination of non-centrality (RMSEA), incremental (CFI, TLI), and residual (SRMR) fit indices was used to evaluate goodness of fit. As a measure of concurrent validity, the correlations between the MARS and (1) another quality assessment tool, ENLIGHT, and (2) user star ratings extracted from app stores were investigated. Reliability was determined using omega. Objectivity was assessed in terms of intra-class correlation. RESULTS In total, MARS ratings from 1,299 MHA covering 15 different health domains were pooled for the analysis. CFA supported a bifactor model with a general quality factor and an additional factor for each subdimension (RMSEA=0.074, TLI=0.922, CFI=0.940, SRMR=0.059). Reliability was good to excellent (omega 0.79 to 0.93). Objectivity was high (ICC=0.82). The overall MARS rating was positively associated with ENLIGHT (r=0.91, P<0.01) and user ratings (r=0.14, P<0.01). CONCLUSIONS The psychometric evaluation of the MARS demonstrated its suitability for the quality assessment of MHA. As such, the MARS could be used to make the quality of MHA transparent to health care stakeholders and patients. Future studies could extend the present findings by investigating the re-test reliability and predictive validity of the MARS.
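The abstract reports objectivity as an intra-class correlation (ICC). Purely as an illustration of how such a coefficient is obtained (this is not the authors' code, and the ratings below are invented), a consistency ICC for a targets-by-raters matrix can be computed from two-way ANOVA mean squares:

```python
# Illustrative ICC(C,1): consistency intra-class correlation for an
# n-targets x k-raters matrix, computed from two-way ANOVA mean squares.
# The ratings below are invented example data, not MARS data.

def icc_consistency(ratings):
    n = len(ratings)          # number of rated targets (e.g., apps)
    k = len(ratings[0])       # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(ratings[i][j] for i in range(n)) / n for j in range(k)]

    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between targets
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between raters
    ss_err = ss_total - ss_rows - ss_cols                    # residual

    ms_rows = ss_rows / (n - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err)

two_raters = [[4, 5], [2, 3], [5, 5], [1, 2], [3, 4]]
print(round(icc_consistency(two_raters), 3))  # 0.952
```

A value near 1 indicates that raters order the targets almost identically, which is what the reported ICC=0.82 conveys about the pooled MARS ratings.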


Author(s):  
Gomolemo Mahakwe ◽  
Ensa Johnson ◽  
Katarina Karlsson ◽  
Stefan Nilsson

Anxiety has been identified as one of the most severe and long-lasting symptoms experienced by hospitalized children with cancer. Self-reports are especially important for documenting emotional and abstract concepts, such as anxiety. Children may not always be able to communicate their symptoms due to language difficulties, a lack of developmental language skills, or the severity of their illness. Instruments with sufficient psychometric quality and pictorial support may address this communication challenge. The purpose of this review was to systematically search the published literature and identify validated and reliable self-report instruments that children aged 5–18 years can use to assess their anxiety, so that they receive appropriate anxiety-relief interventions in hospital. Two questions guided the review: What validated self-report instruments can children with cancer use to self-report anxiety in the hospital setting? Which of these instruments offer pictorial support? Eight instruments were identified, but most lacked pictorial support. The Visual Analogue Scale (VAS) and the Pediatric Quality of Life (PedsQL™) 3.0 Brain Tumor Module and Cancer Module proved useful in hospitalized children with cancer, as they provide pictorial support. It is recommended that faces or symbols be used along with the VAS, as pictures are easily understood by younger children. Future studies could include the adaptation of existing instruments in digital e-health tools.


Assessment ◽  
2020 ◽  
pp. 107319112096456
Author(s):  
Jessica L. Harrison ◽  
Charlotte L. Brownlow ◽  
Michael J. Ireland ◽  
Adina M. Piovesana

Empathy is essential for social functioning and is relevant to a host of clinical conditions. This COSMIN review evaluated the empirical support for empathy self-report measures used with autistic and nonautistic adults. Given that autism is characterized by social differences, it is the subject of a substantial proportion of empathy research. This review therefore uses autism as a lens through which to scrutinize the psychometric quality of empathy measures. Of the 19 measures identified, five demonstrated “High-Quality” evidence for “Insufficient” properties and cannot be recommended. The remaining 14 had noteworthy gaps in evidence and require further evaluation before use with either group. Without tests of measurement invariance or differential item functioning, the extent to which observed group differences represent actual trait differences remains unknown. Using autism as a test case highlights an alarming tendency for empathy measures to be used to characterize, and potentially malign, vulnerable populations before sufficient validation.


2018 ◽  
Author(s):  
Boris Forthmann ◽  
Paul-Christian Bürkner ◽  
Mathias Benedek ◽  
Carsten Szardenings ◽  
Heinz Holling

In the presented work, a shift of perspective with respect to the dimensionality of divergent thinking tasks is introduced, moving from the question of multidimensionality across divergent thinking scores to the question of multidimensionality across the scale of divergent thinking scores. We apply IRTree models to test whether the same latent trait can be assumed across the whole scale in snapshot scoring of divergent thinking tests, and whether this holds for different task instructions and varying levels of fluency. This way, multidimensionality can be explored across the points of a Likert-type rating scale, and multidimensionality due to differences in the number of responses in ideational pools can also be assessed. Evidence for unidimensionality across scale points was stronger with be-creative instructions than with be-fluent instructions, which suggests better psychometric quality of ratings when be-creative instructions are used. In addition, latent variables pertaining to low-fluency and high-fluency ideational pools shared around 50% of their variance, which suggests both strong overlap and evidence for differentiation. The presented approach makes it possible to further examine the psychometric quality of subjective ratings and to address new questions with respect to within-item multidimensionality in divergent thinking.
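The core mechanic of an IRTree analysis is recoding each ordinal rating into binary pseudo-items, one per node of a response tree, with unreached nodes treated as missing. The sketch below shows this recoding for a linear (sequential) tree over a 5-point scale; it illustrates the general idea only, not the authors' specific model or tree structure:

```python
# Recode a 1..m ordinal rating into pseudo-items for a linear IRTree:
# node j asks "is the response above category j?" and is only reached
# if all earlier nodes were passed; unreached nodes become None (missing).

def linear_irtree(response, m):
    pseudo = []
    for j in range(1, m):
        if response > j:
            pseudo.append(1)       # passed this node, move to the next
        elif response == j:
            pseudo.append(0)       # stopped at this node
        else:
            pseudo.append(None)    # node never reached
    return pseudo

print(linear_irtree(3, 5))  # [1, 1, 0, None]
print(linear_irtree(1, 5))  # [0, None, None, None]
```

Fitting an item response model to each column of pseudo-items then allows separate latent traits per node, which is what makes multidimensionality across scale points testable.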


2018 ◽  
Vol 37 (4) ◽  
pp. 788-804 ◽  
Author(s):  
Michelle Lasen ◽  
Snowy Evans ◽  
Komla Tsey ◽  
Claire Campbell ◽  
Irina Kinchin

1987 ◽  
Vol 29 ◽  
pp. 67-82
Author(s):  
W. Jochems ◽  
F. Montens

This article reports on empirical research on the psychometric quality of multiple-choice cloze tests, specifically their validity. The command of Dutch as a second language was measured in groups of foreign students who attended the course "Dutch for foreigners" at the Technical University of Delft. There were high correlations between the scores on a number of multiple-choice cloze tests and achievement on (part of) a four-skills test. In addition, a clear correlation was found between the degree of language proficiency and the subjects' scores on a multiple-choice cloze test. These results suggest that a subject's score on a good-quality multiple-choice cloze test is a good indicator of their proficiency in a second language.
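The validity evidence in studies like this rests on score correlations. As a generic illustration (the score pairs below are invented, not the Delft data), Pearson's r between cloze scores and four-skills scores can be computed as:

```python
import math

# Pearson correlation between two lists of test scores.
# The score pairs are invented example data for illustration only.
def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

cloze = [12, 18, 25, 31, 40]
four_skills = [30, 41, 55, 62, 80]
print(round(pearson_r(cloze, four_skills), 3))  # 0.998
```

A correlation this close to 1 would, as in the study, support using the cheaper cloze score as a proxy for the full four-skills result.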


1988 ◽  
Vol 31 ◽  
pp. 116-126
Author(s):  
W. Jochems ◽  
F. Montens

This article presents and discusses a number of empirical findings concerning the psychometric quality of multiple-choice cloze tests as tests of general language proficiency, with emphasis on their validity and efficiency. The Dutch proficiency of various groups of foreign speakers was measured both by a series of separate proficiency tests in listening, speaking, reading, and writing and by a series of multiple-choice cloze tests. Scores on multiple-choice cloze tests were found to correlate significantly with those on each of the proficiency tests. In addition, scores on multiple-choice cloze tests appeared to form a solid basis for predicting the total scores for listening, speaking, reading, and writing taken together. Further, a clear relation was found to exist between levels of language proficiency and subjects' scores on multiple-choice cloze tests. Our conclusion is that the multiple-choice cloze tests under investigation have proved to be high-quality instruments for measuring proficiency in Dutch as a second language. Compared with a four-skills test, a multiple-choice cloze test is a very efficient instrument: administering and processing take little time, and it can be administered to very large groups of subjects. Because of its quality and efficiency, multiple-choice cloze testing should be preferred to four-skills testing.
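The prediction claim above amounts to a simple regression of the combined four-skills total on the cloze score. A toy sketch with invented numbers (not the study's data) shows the least-squares fit and a prediction:

```python
# Least-squares line for predicting a four-skills total from a cloze score.
# The score pairs are invented for illustration only.

def fit_line(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / \
            sum((a - mx) ** 2 for a in x)
    intercept = my - slope * mx
    return slope, intercept

cloze = [10, 20, 30, 40]
total = [25, 45, 65, 85]           # lies exactly on total = 2 * cloze + 5
slope, intercept = fit_line(cloze, total)
print(slope, intercept)            # 2.0 5.0
print(slope * 25 + intercept)      # 55.0 (predicted total for cloze = 25)
```

With real scores the fit is of course not exact; the study's point is that the residual error is small enough for the cloze score to serve as an efficient substitute.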


2019 ◽  
Vol 36 (4) ◽  
pp. 595-616 ◽  
Author(s):  
Stefanie A. Wind

Differences in rater judgments that are systematically related to construct-irrelevant characteristics threaten the fairness of rater-mediated writing assessments. Accordingly, it is essential that researchers and practitioners examine the degree to which the psychometric quality of rater judgments is comparable across test-taker subgroups. Nonparametric procedures for exploring these differences are promising because they allow researchers and practitioners to examine important characteristics of ratings without potentially inappropriate parametric transformations or assumptions. This study illustrates a nonparametric method based on Mokken scale analysis (MSA) that researchers and practitioners can use to identify and explore differences in the quality of rater judgments between subgroups of test-takers. Overall, the results suggest that MSA provides insight into differences in rating quality across test-taker subgroups based on demographic characteristics. Differences in the degree to which raters adhere to basic measurement properties suggest that the interpretation of ratings may vary across subgroups. The implications of this study for research and practice are discussed.
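Mokken scale analysis is built around Loevinger's scalability coefficient H: the ratio of summed inter-item covariances to their maxima given the item margins. As a minimal sketch of that coefficient for dichotomous items (the matrix below is invented; a perfect Guttman pattern yields H = 1), not a reproduction of the study's subgroup analyses:

```python
# Loevinger's H for a persons x items matrix of 0/1 scores:
# H = sum of item-pair covariances / sum of their maxima given the margins.
# The small matrix below is invented and forms a perfect Guttman pattern.

def loevinger_h(data):
    n = len(data)                                # persons
    k = len(data[0])                             # items
    p = [sum(row[j] for row in data) / n for j in range(k)]  # item means
    num = den = 0.0
    for i in range(k):
        for j in range(i + 1, k):
            pij = sum(row[i] * row[j] for row in data) / n
            num += pij - p[i] * p[j]             # observed covariance
            den += min(p[i], p[j]) - p[i] * p[j] # max given the margins
    return num / den

guttman = [[1, 1, 1], [1, 1, 0], [1, 0, 0],
           [0, 0, 0], [1, 1, 1], [1, 1, 0]]
print(loevinger_h(guttman))  # 1.0 (perfect Guttman scaling)
```

Computing H (and related monotonicity checks) separately per test-taker subgroup is the kind of comparison the MSA-based approach described above enables, without parametric model assumptions.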


2014 ◽  
Vol 27 (1) ◽  
pp. 33 ◽  
Author(s):  
Ana Catarina Moura ◽  
Maria Amélia Ferreira ◽  
Joselina Barbosa ◽  
Joana Mourão

<strong>Introduction:</strong> The satisfaction level with health care reflects the quality of care from the patient’s perspective. The aim of this study is to assess patient satisfaction with anesthesia care in a Portuguese general hospital using “The Heidelberg Peri-anaesthetic Questionnaire”.<br /><strong>Material and Methods:</strong> The questionnaire was translated and tested against psychometric quality criteria in a sample of 107 patients who underwent elective surgery as inpatients at Hospital de São João. Global satisfaction and satisfaction with each dimension of care were calculated. We analyzed the differences between patients with different levels of satisfaction, identifying potential confounding factors.<br /><strong>Results:</strong> The Portuguese version of the questionnaire has 32 items distributed across three dimensions: ‘staff’, ‘discomfort’ and ‘fear’. The mean satisfaction values for these dimensions were 83.4%, 66.8% and 65.9%, respectively. Internal consistency was demonstrated by Cronbach’s alpha coefficients ranging from 0.776 to 0.875 across the three dimensions. Satisfied and dissatisfied patients differed in all three dimensions, but to a lesser degree in ‘staff’. In the multivariate analysis we found a significant influence of gender on the ‘discomfort’ dimension.<br /><strong>Discussion: </strong>The questionnaire has good psychometric characteristics. The ‘staff’ dimension includes three domains of the source questionnaire.<br /><strong>Conclusions:</strong> Its application revealed high satisfaction levels regarding the staff. Dissatisfaction was mainly seen in the ‘fear’ and ‘discomfort’ dimensions, the latter being significantly lower in males.
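The Cronbach's alpha coefficients reported per dimension follow the standard ratio-of-variances formula. A generic sketch (the item responses are invented, and this is not the study's analysis):

```python
# Cronbach's alpha for a persons x items response matrix (invented data).
# alpha = k/(k-1) * (1 - sum of item variances / variance of total score)

def cronbach_alpha(data):
    n = len(data)                 # respondents
    k = len(data[0])              # items
    def var(values):              # population variance
        m = sum(values) / len(values)
        return sum((v - m) ** 2 for v in values) / len(values)
    item_vars = [var([row[j] for row in data]) for j in range(k)]
    total_var = var([sum(row) for row in data])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

responses = [[3, 3, 4], [2, 2, 2], [4, 4, 5], [1, 2, 1], [5, 4, 5]]
print(round(cronbach_alpha(responses), 3))  # 0.957
```

Values in the 0.776–0.875 range reported above are conventionally read as acceptable-to-good internal consistency for a questionnaire dimension.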

