A nonparametric procedure for exploring differences in rating quality across test-taker subgroups in rater-mediated writing assessments

Differences in rater judgments that are systematically related to construct-irrelevant characteristics threaten the fairness of rater-mediated writing assessments. Accordingly, it is essential that researchers and practitioners examine the degree to which the psychometric quality of rater judgments is comparable across test-taker subgroups. Nonparametric procedures for exploring these differences are promising because they allow researchers and practitioners to examine important characteristics of ratings without potentially inappropriate parametric transformations or assumptions. This study illustrates a nonparametric method based on Mokken scale analysis (MSA) that researchers and practitioners can use to identify and explore differences in the quality of rater judgments between subgroups of test-takers. Overall, the results suggest that MSA provides insight into differences in rating quality across test-taker subgroups based on demographic characteristics. Differences in the degree to which raters adhere to basic measurement properties suggest that the interpretation of ratings may vary across subgroups. The implications of this study for research and practice are discussed.

Download Full-text

The Influence of Rater Effects in Training Sets on the Psychometric Quality of Automated Scoring for Writing Assessments

International Journal of Testing ◽

10.1080/15305058.2017.1361426 ◽

2017 ◽

Vol 18 (1) ◽

pp. 27-49 ◽

Cited By ~ 3

Author(s):

Stefanie A. Wind ◽

Edward W. Wolfe ◽

George Engelhard ◽

Peter Foltz ◽

Mark Rosenstein

Keyword(s):

Automated Scoring ◽

Rater Effects ◽

Writing Assessments ◽

Psychometric Quality ◽

Training Sets

Download Full-text

Implementation outcome assessment instruments used in physical healthcare settings and their measurement properties: a systematic review protocol

BMJ Open ◽

10.1136/bmjopen-2017-017972 ◽

2017 ◽

Vol 7 (10) ◽

pp. e017972 ◽

Cited By ~ 5

Author(s):

Zarnie Khadjesari ◽

Silia Vitoratou ◽

Nick Sevdalis ◽

Louise Hull

Keyword(s):

Systematic Review ◽

Measurement Properties ◽

Cochrane Library ◽

Health Measurement ◽

Healthcare Settings ◽

Implementation Outcomes ◽

Psychometric Quality ◽

The Impact ◽

Selection Of

IntroductionOver the past 10 years, research into methods that promote the uptake, implementation and sustainability of evidence-based interventions has gathered pace. However, implementation outcomes are defined in different ways and assessed by different measures; the extent to which these measures are valid and reliable is unknown. The aim of this systematic review is to identify and appraise studies that assess the measurement properties of quantitative implementation outcome instruments used in physical healthcare settings, to advance the use of precise and accurate measures.Methods and analysisThe following databases will be searched from inception to March 2017: MEDLINE, EMBASE, PsycINFO, CINAHL and the Cochrane Library. Grey literature will be sought via HMIC, OpenGrey, ProQuest for theses and Web of Science Conference Proceedings Citation Index-Science. Reference lists of included studies and relevant reviews will be hand searched. Three search strings will be combined to identify eligible studies: (1) implementation literature, (2) implementation outcomes and (3) measurement properties. Screening of titles, abstracts and full papers will be assessed for eligibility by two reviewers independently and any discrepancies resolved via consensus with the wider team. The methodological quality of the studies will be assessed using the COnsensus-based Standards for the selection of health Measurement INstruments checklist. A set of bespoke criteria to determine the quality of the instruments will be used, and the relationship between instrument usability and quality will be explored.Ethics and disseminationEthical approval is not necessary for systematic review protocols. Researchers and healthcare professionals can use the findings of this systematic review to guide the selection of implementation outcomes instruments, based on their psychometric quality, to assess the impact of their implementation efforts. The findings will also provide a useful guide for reviewers of papers and grants to determine the psychometric quality of the measures used in implementation research.Trial registration numberInternational Prospective Register of Systematic Reviews (PROSPERO):CRD42017065348.

Download Full-text

On the Practical Consequences of Misfit in Mokken Scaling

10.31234/osf.io/kgc3a ◽

2019 ◽

Author(s):

Daniela Ramona Crișan ◽

Jorge Tendeiro ◽

Rob Meijer

Keyword(s):

Test Scores ◽

Simulated Data ◽

Scale Analysis ◽

Mokken Scaling ◽

Psychometric Instruments ◽

Rules Of Thumb ◽

Personality Questionnaires ◽

Psychometric Quality ◽

Selection Of

Mokken scale analysis is a popular method to evaluate the psychometric quality of clinical and personality questionnaires and their individual items. Although many empirical papers report on the extent to which sets of items form Mokken scales, there is less attention for the effect of violations of commonly used rules of thumb. In this study we investigated the practical consequences of retaining or removing items with psychometric properties that do not comply with these rules-of-thumb. Using simulated data, we concluded that items with low scalability had some influence on the reliability of test scores, person ordering and selection, and criterion-related validity estimates. Removing the misfitting items from the scale had, in general, a small effect on the outcomes. Although important outcome variables were fairly robust against scale violations in some conditions, we conclude that researchers should not rely exclusively on algorithms allowing automatic selection of items. In particular, content validity must be taken into account in order to build sensible psychometric instruments.

Download Full-text

On the Practical Consequences of Misfit in Mokken Scaling

Applied Psychological Measurement ◽

10.1177/0146621620920925 ◽

2020 ◽

Vol 44 (6) ◽

pp. 482-496 ◽

Cited By ~ 1

Author(s):

Daniela Ramona Crişan ◽

Jorge N. Tendeiro ◽

Rob R. Meijer

Keyword(s):

Test Scores ◽

Simulated Data ◽

Scale Analysis ◽

Mokken Scaling ◽

Psychometric Instruments ◽

Rules Of Thumb ◽

Personality Questionnaires ◽

Psychometric Quality ◽

Selection Of

Mokken scale analysis is a popular method to evaluate the psychometric quality of clinical and personality questionnaires and their individual items. Although many empirical papers report on the extent to which sets of items form Mokken scales, there is less attention for the effect of violations of commonly used rules of thumb. In this study, the authors investigated the practical consequences of retaining or removing items with psychometric properties that do not comply with these rules of thumb. Using simulated data, they concluded that items with low scalability had some influence on the reliability of test scores, person ordering and selection, and criterion-related validity estimates. Removing the misfitting items from the scale had, in general, a small effect on the outcomes. Although important outcome variables were fairly robust against scale violations in some conditions, authors conclude that researchers should not rely exclusively on algorithms allowing automatic selection of items. In particular, content validity must be taken into account to build sensible psychometric instruments.

Download Full-text

The adaptation process and preliminary psychometric evaluation of the Polish version of Kiddo-KINDL questionnaire

Anthropological Review ◽

10.2478/anre-2019-0021 ◽

2019 ◽

Vol 82 (3) ◽

pp. 287-295

Author(s):

Tomasz Hanć ◽

Ulrike Ravens-Sieberer

Keyword(s):

Discriminant Validity ◽

Psychometric Evaluation ◽

Measurement Properties ◽

Public Health Medicine ◽

Floor Effect ◽

Basic Measurement ◽

Polish Version ◽

Related Quality ◽

Health Related

Abstract The assessment of health-related quality of life (HRQoL) is increasingly important in fields of public health, medicine, sociology and psychology. The aim of the study was to evaluate the psychometric properties of the Polish version of generic Kiddo-KINDL questionnaire for adolescents. The psychometric evaluation was performed using 96 questionnaires fulfilled by adolescents aged 12–16 years. Cronbach’s α coefficient for internal consistency and split-half reliability was estimated as well as ceiling, floor effect and correlations among the subscales and total score. The mean reliability for subscales was 0.65 and the Cronbach’s α coefficient for the total score was 0.85. The lowest α coefficient was for the School dimension (0.44) and the highest was achieved for the Self-esteem (0.80). The correlation between two parts of the questionnaire and split-half reliability was 0.66 and 0.80 respectively. The first psychometric evaluation of the Polish Kiddo-KINDL showed promising basic measurement properties, but it needs farther assessment, including convergent, construct and discriminant validity estimation.

Download Full-text

Athlete Self-Report Measures in Research and Practice: Considerations for the Discerning Reader and Fastidious Practitioner

International Journal of Sports Physiology and Performance ◽

10.1123/ijspp.2016-0395 ◽

2017 ◽

Vol 12 (s2) ◽

pp. S2-127-S2-135 ◽

Cited By ~ 31

Author(s):

Anna E. Saw ◽

Michael Kellmann ◽

Luana C. Main ◽

Paul B. Gastin

Keyword(s):

Psychometric Properties ◽

Self Report ◽

Valuable Insight ◽

Research Papers ◽

Empirical Measures ◽

Research And Practice ◽

Training Response ◽

Insight Into

Athlete self-report measures (ASRM) have the potential to provide valuable insight into the training response; however, there is a disconnect between research and practice that needs to be addressed; namely, the measure or methods used in research are not always reflective of practice, or data primarily obtained from practice lacks empirical quality. This commentary reviews existing empirical measures and the psychometric properties required to be considered acceptable for research and practice. This information will allow discerning readers to make a judgment on the quality of ASRM data being reported in research papers. Fastidious practitioners and researchers are also provided with explicit guidelines for selecting and implementing an ASRM and reporting these details in research papers.

Download Full-text

Psychometric evaluation of the Mobile Application Rating Scale (MARS) (Preprint)

10.2196/preprints.17689 ◽

2020 ◽

Cited By ~ 2

Author(s):

Yannik Terhorst ◽

Paula Philippi ◽

Lasse Sander ◽

Dana Schultchen ◽

Sarah Paganini ◽

...

Keyword(s):

Health Care ◽

Construct Validity ◽

Mobile Application ◽

Concurrent Validity ◽

Rating Scale ◽

Psychometric Evaluation ◽

Confirmatory Factor ◽

Health Domains ◽

Psychometric Quality

BACKGROUND Mobile health apps (MHA) have the potential to improve health care. The commercial MHA market is rapidly growing, but the content and quality of available MHA are unknown. Consequently, instruments of high psychometric quality for the assessment of the quality and content of MHA are highly needed. The Mobile Application Rating Scale (MARS) is one of the most widely used tools to evaluate the quality of MHA in various health domains. Only few validation studies investigating its psychometric quality exist with selected samples of MHAs. No study has evaluated the construct validity of the MARS and concurrent validity to other instruments. OBJECTIVE This study evaluates the construct validity, concurrent validity, reliability, and objectivity, of the MARS. METHODS MARS scoring data was pooled from 15 international app quality reviews to evaluate the psychometric properties of the MARS. The MARS measures app quality across four dimensions: engagement, functionality, aesthetics and information quality. App quality is determined for each dimension and overall. Construct validity was evaluated by assessing related competing confirmatory models that were explored by confirmatory factor analysis (CFA). A combination of non-centrality (RMSEA), incremental (CFI, TLI) and residual (SRMR) fit indices was used to evaluate the goodness of fit. As a measure of concurrent validity, the correlations between the MARS and 1) another quality assessment tool called ENLIGHT, and 2) user star-rating extracted from app stores were investigated. Reliability was determined using Omega. Objectivity was assessed in terms of intra-class correlation. RESULTS In total, MARS ratings from 1,299 MHA covering 15 different health domains were pooled for the analysis. Confirmatory factor analysis confirmed a bifactor model with a general quality factor and an additional factor for each subdimension (RMSEA=0.074, TLI=0.922, CFI=0.940, SRMR=0.059). Reliability was good to excellent (Omega 0.79 to 0.93). Objectivity was high (ICC=0.82). The overall MARS rating was positively associated with ENLIGHT (r=0.91, P<0.01) and user-ratings (r=0.14, P<0.01). CONCLUSIONS he psychometric evaluation of the MARS demonstrated its suitability for the quality assessment of MHAs. As such, the MARS could be used to make the quality of MHA transparent to health care stakeholders and patients. Future studies could extend the present findings by investigating the re-test reliability and predictive validity of the MARS.

Download Full-text

Attitudes of Elderly People in the South Moravian Region to Further Education

Lifelong Learning ◽

10.11118/lifele2014040254 ◽

2014 ◽

Vol 4 (2) ◽

pp. 54-78

Author(s):

Petr Adamec ◽

Marián Svoboda

Keyword(s):

Quality Of Life ◽

Elderly People ◽

Further Education ◽

The South ◽

The Third ◽

Interesting Insight ◽

Survey Results ◽

Insight Into ◽

Demographic Features

This paper deals with the results of sociological survey focused on identification of the attitudes of elderly people to further education. The research was carried out in September 2010. Experience of elderly people with further education, their readiness (determination) for further education as well as their motivation and barriers in further education were also subjects of this research. Detecting elderly population’s awareness of universities of the third age and finding out their further education preferences were an integral part of the research. Research sample consisted of citizens over 55 years living in the South Moravian region. The survey results are structured by socio-demographic features e.g.: age, sex, educational attainment etc. and provide an interesting insight into the attitudes of the target group to one of the activities that contributes to improvement of their quality of life.

Download Full-text

Comparisons among rainbow trout, Oncorhynchus mykiss, populations of maternal transcript profile associated with egg viability

BMC Genomics ◽

10.1186/s12864-021-07773-1 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Gregory M. Weber ◽

Jill Birkett ◽

Kyle Martin ◽

Doug Dixon ◽

Guangtu Gao ◽

...

Keyword(s):

Rainbow Trout ◽

Egg Quality ◽

Transcript Abundance ◽

Differentially Expressed ◽

Rainbow Trout Oncorhynchus Mykiss ◽

Analysis Group ◽

Maternal Transcripts ◽

Single Batch ◽

Insight Into

Abstract Background Transcription is arrested in the late stage oocyte and therefore the maternal transcriptome stored in the oocyte provides nearly all the mRNA required for oocyte maturation, fertilization, and early cleavage of the embryo. The transcriptome of the unfertilized egg, therefore, has potential to provide markers for predictors of egg quality and diagnosing problems with embryo production encountered by fish hatcheries. Although levels of specific transcripts have been shown to associate with measures of egg quality, these differentially expressed genes (DEGs) have not been consistent among studies. The present study compares differences in select transcripts among unfertilized rainbow trout eggs of different quality based on eyeing rate, among 2 year classes of the same line (A1, A2) and a population from a different hatchery (B). The study compared 65 transcripts previously reported to be differentially expressed with egg quality in rainbow trout. Results There were 32 transcripts identified as DEGs among the three groups by regression analysis. Group A1 had the most DEGs, 26; A2 had 15, 14 of which were shared with A1; and B had 12, 7 of which overlapped with A1 or A2. Six transcripts were found in all three groups, dcaf11, impa2, mrpl39_like, senp7, tfip11 and uchl1. Conclusions Our results confirmed maternal transcripts found to be differentially expressed between low- and high-quality eggs in one population of rainbow trout can often be found to overlap with DEGs in other populations. The transcripts differentially expressed with egg quality remain consistent among year classes of the same line. Greater similarity in dysregulated transcripts within year classes of the same line than among lines suggests patterns of transcriptome dysregulation may provide insight into causes of decreased viability within a hatchery population. Although many DEGs were identified, for each of the genes there is considerable variability in transcript abundance among eggs of similar quality and low correlations between transcript abundance and eyeing rate, making it highly improbable to predict the quality of a single batch of eggs based on transcript abundance of just a few genes.

Download Full-text

Exploring task features that predict psychometric quality of test items: the case for the Dutch driving theory exam

International Journal of Testing ◽

10.1080/15305058.2021.1916506 ◽

2021 ◽

pp. 1-25

Author(s):

Erik C. Roelofs ◽

Wilco H. M. Emons ◽

Angela J. Verschoor

Keyword(s):

Test Items ◽

Task Features ◽

Psychometric Quality

Download Full-text