Psychometric properties of gross motor assessment tools for children: a systematic review

Objective Gross motor assessment tools have a critical role in identifying, diagnosing and evaluating motor difficulties in childhood. The objective of this review was to systematically evaluate the psychometric properties and clinical utility of gross motor assessment tools for children aged 2–12 years. Method A systematic search of MEDLINE, Embase, CINAHL and AMED was performed between May and July 2017. Methodological quality was assessed with the COnsensus-based Standards for the selection of health status Measurement INstruments checklist, and an outcome measures rating form was used to evaluate reliability, validity and clinical utility of assessment tools. Results Seven assessment tools from 37 studies/manuals met the inclusion criteria: Bayley Scale of Infant and Toddler Development-III (Bayley-III), Bruininks-Oseretsky Test of Motor Proficiency-2 (BOT-2), Movement Assessment Battery for Children-2 (MABC-2), McCarron Assessment of Neuromuscular Development (MAND), Neurological Sensory Motor Developmental Assessment (NSMDA), Peabody Developmental Motor Scales-2 (PDMS-2) and Test of Gross Motor Development-2 (TGMD-2). Methodological quality varied from poor to excellent. Validity and internal consistency varied from fair to excellent (α=0.5–0.99). The Bayley-III, NSMDA and MABC-2 have evidence of predictive validity. Test–retest reliability is excellent in the BOT-2 (intraclass correlation coefficient (ICC)=0.80–0.99), PDMS-2 (ICC=0.97), MABC-2 (ICC=0.83–0.96) and TGMD-2 (ICC=0.81–0.92). TGMD-2 has the highest inter-rater (ICC=0.88–0.93) and intrarater reliability (ICC=0.92–0.99). Conclusions The majority of gross motor assessments for children have good-excellent validity. Test–retest reliability is highest in the BOT-2, MABC-2, PDMS-2 and TGMD-2. The Bayley-III has the best predictive validity at 2 years of age for later motor outcome. None of the assessment tools demonstrate good evaluative validity. Further research on evaluative gross motor assessment tools is urgently needed.


Measuring test-retest reliability (TRR) of AMSTAR provides moderate to perfect agreement – a contribution to the discussion of the importance of TRR in relation to the psychometric properties of assessment tools

BMC Medical Research Methodology ◽

10.1186/s12874-021-01231-y ◽

2021 ◽

Vol 21 (1) ◽

Author(s):

Stefanie Bühn ◽

Peggy Ober ◽

Tim Mathes ◽

Uta Wegewitz ◽

Anja Jacobs ◽

...

Keyword(s):

Psychometric Properties ◽

Systematic Reviews ◽

Methodological Quality ◽

Assessment Tools ◽

Measurement Properties ◽

Perfect Agreement ◽

Retest Reliability ◽

Test Retest Reliability ◽

The Impact

Abstract Background Systematic Reviews (SRs) can build the groundwork for evidence-based health care decision-making. A sound methodological quality of SRs is crucial. AMSTAR (A Measurement Tool to Assess Systematic Reviews) is a widely used tool developed to assess the methodological quality of SRs of randomized controlled trials (RCTs). Research shows that AMSTAR seems to be valid and reliable in terms of interrater reliability (IRR), but the test-retest reliability (TRR) of AMSTAR has never been investigated. In our study we investigated the TRR of AMSTAR to evaluate the importance of its measurement and contribute to the discussion of the measurement properties of AMSTAR and other quality assessment tools. Methods Seven raters at three institutions independently assessed the methodological quality of SRs in the field of occupational health with AMSTAR. Approximately two years elapsed between the first and second ratings. Answers were dichotomized, and we calculated the TRR of all raters and AMSTAR items using Gwet’s AC1 coefficient. To investigate the impact of variation in the ratings over time, we obtained summary scores for each review. Results AMSTAR item 4 (Was the status of publication used as an inclusion criterion?) provided the lowest median TRR of 0.53 (moderate agreement). All reviewers agreed perfectly on AMSTAR item 1 (Gwet’s AC1 of 1). The median TRR of the single raters varied between 0.69 (substantial agreement) and 0.89 (almost perfect agreement). Variation of two or more points in yes-scored AMSTAR items was observed in 65% (73/112) of all assessments. Conclusions The high variation between the first and second AMSTAR ratings suggests that consideration of the TRR is important when evaluating the psychometric properties of AMSTAR. However, more evidence is needed to investigate this neglected issue of measurement properties. Our results may initiate discussion of the importance of considering the TRR of assessment tools. A further examination of the TRR of AMSTAR, as well as other recently established rating tools such as AMSTAR 2 and ROBIS (Risk Of Bias In Systematic reviews), would be useful.
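
As an illustration of the agreement statistic named in the abstract above, the following is a minimal sketch of Gwet's AC1 for two sets of dichotomized ratings of the same items (for example, one rater's first and second pass). The function name and the example ratings are hypothetical and are not taken from the study; the chance-agreement term uses the standard two-category form, 2π(1 − π), where π is the mean proportion of "yes" ratings.

```python
def gwet_ac1_binary(first, second):
    """Gwet's AC1 for two sets of dichotomous (0/1) ratings of the same items."""
    if len(first) != len(second) or not first:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(first)
    # Observed agreement: share of items rated identically on both occasions.
    p_a = sum(a == b for a, b in zip(first, second)) / n
    # Mean prevalence of the "yes" (1) category across the two occasions.
    pi = (sum(first) + sum(second)) / (2 * n)
    # Chance agreement under Gwet's model for two categories.
    p_e = 2 * pi * (1 - pi)
    return (p_a - p_e) / (1 - p_e)


# Hypothetical ratings of ten reviews on one item (1 = "yes", 0 = "no/other").
first_rating = [1, 1, 1, 0, 1, 1, 0, 1, 1, 1]
second_rating = [1, 1, 1, 0, 1, 0, 0, 1, 1, 1]
ac1 = gwet_ac1_binary(first_rating, second_rating)
print(f"AC1 = {ac1:.2f}")
```

Because the chance-agreement term cannot exceed 0.5 for two categories, the coefficient stays defined even when every rating is "yes", which is the situation in which an item can reach an AC1 of exactly 1.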


ShareDisk: A novel visual tool to assess perceptions about who should be responsible for supporting persons with mental health problems

International Journal of Social Psychiatry ◽

10.1177/0020764020913580 ◽

2020 ◽

Vol 66 (4) ◽

pp. 411-418

Author(s):

Srividya N Iyer ◽

Megan A Pope ◽

Gerald Jordan ◽

Greeshma Mohan ◽

Heleen Loohuis ◽

...

Keyword(s):

Mental Health ◽

Psychometric Properties ◽

Clinical Utility ◽

Mental Health Problems ◽

Health Problems ◽

Language And Literacy ◽

Retest Reliability ◽

Visual Tool ◽

Test Retest Reliability ◽

High Test

Objectives: Views on who bears how much responsibility for supporting individuals with mental health problems may vary across stakeholders (patients, families, clinicians) and cultures. Perceptions about responsibility may influence the extent to which stakeholders get involved in treatment. Our objective was to report on the development, psychometric properties and usability of a first-ever tool of this construct. Methods: We created a visual weighting disk called ‘ShareDisk’, measuring perceived extent of responsibility for supporting persons with mental health problems. It was administered (twice, 2 weeks apart) to patients, family members and clinicians in Chennai, India (N = 30, 30 and 15, respectively) and Montreal, Canada (N = 30, 32 and 15, respectively). Feedback regarding its usability was also collected. Results: The English, French and Tamil versions of the ShareDisk demonstrated high test–retest reliability (rs = .69–.98) and were deemed easy to understand and use. Conclusion: The ShareDisk is a promising measure of a hitherto unmeasured construct that is easily deployable in settings varying in language and literacy levels. Its clinical utility lies in clarifying stakeholder roles. It can help researchers investigate how stakeholders’ roles are perceived and how these perceptions may be shaped by and shape the organization and experience of healthcare across settings.


Psychometric properties of the PHQ-9 depression scale in people with multiple sclerosis: a systematic review

10.1101/321653 ◽

2018 ◽

Author(s):

Sarah Patrick ◽

Peter Connick

Keyword(s):

Multiple Sclerosis ◽

Psychometric Properties ◽

Depression Scale ◽

Assessment Tools ◽

Patient Health Questionnaire ◽

Health Questionnaire ◽

Eligibility Criteria ◽

Retest Reliability ◽

Patient Health ◽

Test Retest Reliability

Abstract Background Depression affects approximately 25% of people with MS (pwMS) at any given time. It is, however, under-recognised in clinical practice, in part due to a lack of uptake for brief assessment tools and uncertainty about their psychometric properties. The 9-item Patient Health Questionnaire (PHQ-9) is an attractive candidate for this role. Objective To synthesise published findings on the psychometric properties of the 9-item Patient Health Questionnaire (PHQ-9) when applied to people with multiple sclerosis (pwMS). Data sources PubMed, Medline and ISI Web of Science databases, supplemented by hand-searching of references from all eligible sources. Study eligibility criteria Primary literature written in English and published following peer-review with a primary aim to evaluate the performance of the PHQ-9 in pwMS. Outcome measures Psychometric performance with respect to appropriateness, reliability, validity, responsiveness, precision, interpretability, acceptability, and feasibility. Results Seven relevant studies were identified; these were of high quality and included 5080 participants from all MS disease-course groups. Strong evidence was found supporting the validity of the PHQ-9 as a unidimensional measure of depression. Used as a screening tool for major depressive disorder (MDD) with a cut-point of 11, sensitivity was 95% and specificity 88.3% (PPV 51.4%, NPV 48.6%). Alternative scoring systems that may address the issue of overlap between somatic features of depression and features of MS per se are being developed, although their utility remains unclear. However, data on reliability were limited, and no specific evidence was available on test-retest reliability, responsiveness, acceptability, or feasibility. Conclusions The PHQ-9 represents a suitable tool to screen for MDD in pwMS. However, use as a diagnostic tool cannot currently be recommended, and the potential value for monitoring depressive symptoms cannot be established without further evidence on test-retest reliability, responsiveness, acceptability, and feasibility. PROSPERO register ID: CRD42017067814
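
The screening metrics quoted in the abstract above are linked through prevalence. The sketch below shows the general relationship between sensitivity, specificity, prevalence and the predictive values. The function name is hypothetical, and the prevalence of 0.25 is an illustrative assumption only: it is not the MDD prevalence of the screened samples, so the output is not expected to reproduce the predictive values reported in the abstract.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a screening test applied at a given prevalence."""
    # Expected proportions of the screened population in each cell of the 2x2 table.
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Sensitivity and specificity as quoted in the abstract; prevalence is assumed.
ppv, npv = predictive_values(sensitivity=0.95, specificity=0.883, prevalence=0.25)
print(f"PPV = {ppv:.3f}, NPV = {npv:.3f}")
```

Lowering the assumed prevalence reduces the PPV and raises the NPV, which is why predictive values from one sample do not transfer directly to populations with a different base rate.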


Comparative study of psychometric properties of three assessment tools for degenerative rotator cuff disease

Clinical Rehabilitation ◽

10.1177/0269215518796888 ◽

2018 ◽

Vol 33 (2) ◽

pp. 277-284 ◽

Cited By ~ 3

Author(s):

Etienne James-Belin ◽

Anne Laure Roy ◽

Sandra Lasbleiz ◽

Agnès Ostertag ◽

Alain Yelnik ◽

...

Keyword(s):

Rotator Cuff ◽

Psychometric Properties ◽

Intraclass Correlation ◽

Assessment Tools ◽

University Hospital ◽

Rotator Cuff Disease ◽

Retest Reliability ◽

Good For ◽

Test Retest Reliability ◽

Improvement Score

Objective: To compare psychometric properties of the Disabilities of the Arm, Shoulder and Hand (DASH) questionnaire, Shoulder Pain and Disability Index (SPADI) and Constant–Murley scale in patients with degenerative rotator cuff disease (DRCD). Design: Longitudinal cohort. Setting: One French university hospital. Methods: The scales were applied twice, one week apart, before physiotherapy and once after physiotherapy two months later. The perceived improvement after treatment was self-assessed on a numerical scale (0–4). The test–retest reliability of the DASH, SPADI and Constant–Murley scales was assessed before treatment by the intraclass correlation coefficient (ICC). The responsiveness was assessed by the paired t-test (P < 0.05) and standardized mean difference (SMD). The correlation between the percentage of variation in scale scores and the self-assessed improvement score after treatment was measured by the Spearman coefficient. Results: Fifty-three patients were included. Only 26 were available for the reliability analysis. The test–retest reliability was very good for the DASH (ICC = 0.97), SPADI (0.95) and Constant–Murley (0.92). The scale score was improved after treatment for each scale (P < 0.05). The SMD was moderate for the DASH (0.56) and SPADI (0.56) scales, and small for the Constant–Murley (0.44). The correlation between the percentage of variation in scores and self-assessed improvement score after treatment was high, moderate and not significant for the SPADI (0.59, P < 0.0001), DASH (0.42, P < 0.01) and Constant–Murley scales, respectively. Conclusion: The test–retest reliability of the DASH, SPADI and Constant–Murley scales is very good for patients with DRCD. The highest responsiveness was achieved with the SPADI.


Psychometric Properties of Working Memory Test for Cycle Two (Grades 5-7) of Basic Education in Muscat Governorate in Oman

Journal of Educational and Psychological Studies [JEPS] ◽

10.24200/jeps.vol12iss3pp484-503 ◽

2018 ◽

Vol 12 (3) ◽

pp. 484

Author(s):

Faiza G. Albalushi ◽

Rashid S. Almehirzi ◽

Abdulqawi S. Al Zubaidi

Keyword(s):

Working Memory ◽

Psychometric Properties ◽

Predictive Validity ◽

Basic Education ◽

Memory Test ◽

Retest Reliability ◽

Cronbach Alpha ◽

Alpha Reliability ◽

Test Retest Reliability ◽

Norm Scores

The study aimed to develop a test to measure working memory for students in grades 5-7 in Basic Education and examine its psychometric properties. The study also aimed to develop the norm scores required for score interpretation. The sample consisted of 300 male and female students from grades 5, 6 and 7 in Muscat governorate. The validity of the working memory test was examined through face validity, construct validity, predictive validity and concurrent validity. The results supported the validity of the test. Confirmatory factor analysis showed that the data fit the factorial structure of working memory. In addition, the test showed high predictive validity of student achievement for several subjects. The reliability of the test was examined using Cronbach alpha reliability and test-retest reliability. Cronbach alpha reliability ranged between 0.62 and 0.86 for the three grades, and test-retest reliability ranged between 0.35 and 0.75. The norm scores were computed using standardized scores and percentile ranks for each of the three subtests within each grade.
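
For readers unfamiliar with the internal-consistency coefficient reported above, here is a minimal sketch of Cronbach's alpha computed from an item-score matrix, using α = k/(k − 1) · (1 − Σ item variances / variance of total scores). The function name and the five-respondent, three-item data set are hypothetical and unrelated to the working memory test.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(scores[0])
    if k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need at least two items and equal-length rows")
    # Sample variance of each item across respondents.
    item_variances = [variance([row[i] for row in scores]) for i in range(k)]
    # Sample variance of the respondents' total scores.
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses: five respondents, three items.
responses = [
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 2],
    [5, 5, 4],
    [3, 3, 4],
]
alpha = cronbach_alpha(responses)
print(f"alpha = {alpha:.3f}")
```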


Reliability and validity of the Persian version of 5-D itching scale among patients with chronic kidney disease

BMC Nephrology ◽

10.1186/s12882-020-02220-x ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Amin Kordi Yoosefinejad ◽

Fatemeh Karjalian ◽

Marzieh Momennasab ◽

Shahrokh Ezzatzadegan Jahromi

Keyword(s):

Chronic Kidney Disease ◽

Kidney Disease ◽

Psychometric Properties ◽

Internal Consistency ◽

Reliability And Validity ◽

Hemodialysis Patients ◽

Life Questionnaire ◽

Retest Reliability ◽

Persian Version ◽

Test Retest Reliability

Abstract Background Hemodialysis is considered a major therapeutic method for patients with chronic kidney disease. Pruritus is a common complaint of hemodialysis patients. The 5-D pruritus scale is amongst the most common tools to evaluate several dimensions of itch. Psychometric properties of the 5-D scale have not been evaluated in the Persian-speaking hemodialysis population; hence, the objective of this study was to assess the reliability and validity of the Persian version of the scale. Methods Ninety hemodialysis patients (men: 50, women: 40, mean age: 54.4 years) participated in this cross-sectional study. The final Persian version of the 5-D scale was given to the participants. One-third of the participants completed the scale twice, 3–7 days apart, to evaluate test-retest reliability. Other psychometric properties including internal consistency, absolute reliability, convergent, discriminative and construct validity, and floor/ceiling effects were also evaluated. Results The Persian 5-D scale has strong test-retest reliability (ICC = 0.98) and internal consistency (Cronbach’s alpha = 0.99). Standard error of measurement and minimal detectable change were 0.33 and 0.91, respectively. Regarding convergent validity, the scale had moderate correlation with the numeric rating scale (r = 0.67) and the quality of life questionnaire related to itch (r = 0.59). Exploratory factor analysis revealed two factors within the scale. No floor or ceiling effect was found for the scale. Conclusion The Persian version of the 5-D itching scale is a brief instrument with acceptable reliability and validity. Therefore, the scale could be used by experts, nurses, and other health service providers to evaluate pruritus among Persian-speaking hemodialysis patients.
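
The two absolute-reliability figures in the abstract above are related by a commonly used convention, MDC95 = 1.96 · √2 · SEM, with SEM = SD · √(1 − ICC). The sketch below assumes that convention; the abstract does not state which formulas the authors used, and the function names are hypothetical. Under this assumption, the reported SEM of 0.33 gives an MDC of about 0.91, consistent with the reported value.

```python
import math


def sem_from_icc(sd, icc):
    """Standard error of measurement from the score SD and a reliability coefficient."""
    return sd * math.sqrt(1 - icc)


def mdc95(sem):
    """Minimal detectable change at the 95% confidence level."""
    # The sqrt(2) accounts for measurement error in both test and retest scores.
    return 1.96 * math.sqrt(2) * sem


# SEM as reported in the abstract; the formula linking it to the MDC is assumed.
mdc = mdc95(0.33)
print(f"MDC95 = {mdc:.2f}")
```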


The design fluency test: a reliable and valid instrument for the assessment of game intelligence?

German Journal of Exercise and Sport Research ◽

10.1007/s12662-020-00697-0 ◽

2021 ◽

Author(s):

Thomas Finkenzeller ◽

Björn Krenn ◽

Sabine Würth ◽

Günter Amesberger

Keyword(s):

Psychometric Properties ◽

Team Sports ◽

Correlation Coefficients ◽

Soccer Players ◽

Scientific Instrument ◽

Retest Reliability ◽

Adolescent Students ◽

Design Fluency ◽

Test Retest Reliability ◽

Fluency Test

Abstract The design fluency test (DFT) has been reported to predict successful sports performance of soccer players and has therefore been in the spotlight of sport psychology research. There is, however, a lack of research regarding the psychometric properties of the DFT in elite sports. Thus, the aim of this research was to provide findings of test–retest reliability, practice effects and the diagnostic power of the DFT. Multiple studies of youth and adult elite athletes, as well as nonathlete students, were conducted in applied settings. Test–retest relationship demonstrated poor to acceptable short-term and long-term correlations. Furthermore, significant changes between test and retest were obtained in some variables that differed among samples. The differential value of the DFT was corroborated by significant differences between adolescent students and adolescent elite soccer players. Regarding the prospective value, significant partial correlation coefficients were found between DFT scores and volleyball performance in adult elite players. Although our research partially confirmed previous findings on the differential and prospective power of the DFT, the findings on test–retest reliability indicate that the DFT cannot be recommended for application in sports. The psychometric properties of the DFT, in particular the findings on test–retest reliability, have to be improved before research can be carried out on the application for the selection of team sport athletes and for the prediction of future success in team sports. Further research is needed to develop a scientific instrument for the assessment of game intelligence.


Psychometric properties of the critical thinking disposition assessment test amongst medical students in China: a cross-sectional study

BMC Medical Education ◽

10.1186/s12909-020-02437-2 ◽

2021 ◽

Vol 21 (1) ◽

Author(s):

Liyuan Cui ◽

Yaxin Zhu ◽

Jinglou Qu ◽

Liming Tie ◽

Ziqi Wang ◽

...

Keyword(s):

Factor Analysis ◽

Critical Thinking ◽

Medical Students ◽

Psychometric Properties ◽

Internal Consistency ◽

Discriminant Validity ◽

Retest Reliability ◽

Critical Thinking Disposition ◽

Thinking Disposition ◽

Test Retest Reliability

Abstract Background Critical thinking disposition helps medical students and professionals overcome the effects of personal values and beliefs when exercising clinical judgment. The lack of effective instruments to measure critical thinking disposition in medical students has become an obstacle for training and evaluating students in undergraduate programs in China. The aim of this study was to evaluate the psychometric properties of the CTDA test. Methods A total of 278 students participated in this study and responded to the CTDA test. Cronbach’s α coefficient, internal consistency, test-retest reliability, floor effects and ceiling effects were measured to assess the reliability of the questionnaire. Construct validity of the pre-specified three-domain structure of the CTDA was evaluated by exploratory factor analysis (EFA) and confirmatory factor analysis (CFA). The convergent validity and discriminant validity were also analyzed. Results Cronbach’s alpha coefficient for the entire questionnaire was 0.92; all of the domains showed acceptable internal consistency (0.81–0.86), and the test-retest reliability indicated acceptable intra-class correlation coefficients (ICCs) (0.93, p < 0.01). The EFA and the CFA demonstrated that the three-domain model fitted the data adequately. The test showed satisfactory convergent and discriminant validity. Conclusions The CTDA is a reliable and valid questionnaire to evaluate the disposition of medical students towards critical thinking in China and can reasonably be applied in critical thinking programs and medical education research.


Validity, reliability, and calibration of the physical activity unit 7 item screener (PAU-7S) at population scale

International Journal of Behavioral Nutrition and Physical Activity ◽

10.1186/s12966-021-01169-w ◽

2021 ◽

Vol 18 (1) ◽

Author(s):

Helmut Schröder ◽

Isaac Subirana ◽

Julia Wärnberg ◽

María Medrano ◽

Marcela González-Gross ◽

...

Keyword(s):

Physical Activity ◽

Predictive Validity ◽

Internal Consistency ◽

Regression Models ◽

Weighted Kappa ◽

Linear Regression Models ◽

Retest Reliability ◽

Activity Unit ◽

Test Retest Reliability ◽

Acceptable Internal Consistency

Abstract Background Validation of self-reported tools, such as physical activity (PA) questionnaires, is crucial. The aim of this study was to determine test-retest reliability, internal consistency, and the concurrent, construct, and predictive validity of the short semi-quantitative Physical Activity Unit 7 item Screener (PAU-7S), using accelerometry as the reference measurement. The effect of linear calibration on PAU-7S validity was tested. Methods A randomized sample of 321 healthy children aged 8–16 years (149 boys, 172 girls) from the nationwide representative PASOS study completed the PAU-7S before and after wearing an accelerometer for at least 7 consecutive days. Weight, height, and waist circumference were measured. Cronbach alpha was calculated for internal consistency. Test-retest reliability was determined by intra-class correlation (ICC). Concurrent validity was assessed by ICC and Spearman correlation coefficient between moderate to vigorous PA (MVPA) derived by the PAU-7S and by accelerometer. Concordance between both methods was analyzed by absolute agreement, weighted kappa, and Bland-Altman statistics. Multiple linear regression models were fitted for construct validity and predictive validity was determined by leave-one-out cross-validation. Results The PAU-7S overestimated MVPA by 18%, compared to accelerometers (106.5 ± 77.0 vs 95.2 ± 33.2 min/day, respectively). A Cronbach alpha of 0.76 showed an acceptable internal consistency of the PAU-7S. Test-retest reliability was good (ICC 0.71, p < 0.001). Spearman correlation and ICC coefficients of MVPA derived by the PAU-7S and accelerometers increased from 0.31 to 0.62 and 0.20 to 0.62, respectively, after calibration of the PAU-7S. Between-methods concordance improved from a weighted kappa of 0.24 to 0.50 after calibration. A slight reduction in ICC, from 0.62 to 0.60, yielded good predictive validity. Multiple linear regression models showed an inverse association of MVPA with standardized body mass index (β − 0.162; p < 0.077) and waist to height ratio (β − 0.010; p < 0.014). All validity dimensions were somewhat stronger in boys compared to girls. Conclusion The PAU-7S shows a good test-retest reliability and acceptable internal consistency. All dimensions of validity increased from poor/fair to moderate/good after calibration. The PAU-7S is a valid instrument for measuring MVPA in children and adolescents. Trial registration Trial registration number: ISRCTN34251612.
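
One of the concordance analyses named in the abstract above, Bland-Altman statistics, can be sketched as follows: the bias is the mean of the paired differences between the two methods, and the 95% limits of agreement are the bias ± 1.96 standard deviations of those differences. The function name and the paired MVPA values are hypothetical and do not come from the PASOS data.

```python
from statistics import mean, stdev


def bland_altman(method_a, method_b):
    """Return (bias, lower limit, upper limit) for paired measurements."""
    if len(method_a) != len(method_b) or len(method_a) < 2:
        raise ValueError("need at least two paired measurements")
    # Paired differences: positive values mean method A reads higher than method B.
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)
    return bias, bias - spread, bias + spread


# Hypothetical MVPA in min/day for five children: questionnaire vs accelerometer.
questionnaire = [100, 120, 90, 110, 80]
accelerometer = [95, 100, 92, 100, 78]
bias, lower, upper = bland_altman(questionnaire, accelerometer)
print(f"bias = {bias:.1f}, limits of agreement = ({lower:.1f}, {upper:.1f})")
```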


Psychometric properties of the short form of the Stroke Impact Scale in German-speaking stroke survivors

Health and Quality of Life Outcomes ◽

10.1186/s12955-021-01826-5 ◽

2021 ◽

Vol 19 (1) ◽

Author(s):

Anna Coppers ◽

Jens Carsten Möller ◽

Detlef Marks

Keyword(s):

Factor Analysis ◽

Psychometric Properties ◽

Short Form ◽

Trial Registration ◽

Cognitive Domain ◽

Retest Reliability ◽

Stroke Impact Scale ◽

Impact Scale ◽

German Speaking ◽

Test Retest Reliability

Abstract Background The short form of the Stroke Impact Scale (SF-SIS) consists of eight questions and provides an overall index of health-related quality of life after stroke. The goal of the study was to evaluate the construct validity, reliability and responsiveness of the SF-SIS for use in German-speaking stroke patients in rehabilitation. Methods The SF-SIS, the Stroke Impact Scale 2.0 (SIS 2.0), EQ-5D-5L, National Institutes of Health Stroke Scale (NIHSS) and de Morton Mobility Index were assessed in 150 inpatients after stroke, with a second measurement two weeks later for the analyses of responsiveness. In 55 participants, the test–retest reliability was assessed one week after the first measurement. The study was designed following the recommendations of the COSMIN initiative. Results The correlations of the SF-SIS with the SIS 2.0 (ρ = 0.90), as well as the EQ-5D-5L (ρ = 0.79), were high, as expected. There was adequate discriminatory ability of the SF-SIS index between patients who were less and more severely affected by stroke, as assessed by the NIHSS. Exploratory factor analysis indicated a two-factor structure of the SF-SIS explaining 59.9% of the total variance, providing better model fit in the confirmatory factor analysis than the one-factorial structure. Analyses of test–retest reliability showed an intraclass correlation coefficient of 0.88 (95% CI 0.75–0.94). Hypotheses concerning responsiveness were not confirmed due to lower correlations between the assessments’ change scores. Conclusion The results of this analysis of the SF-SIS’s psychometric properties match the validity analysis of the English original version, confirming the high correlations with the Stroke Impact Scale and the EQ-5D-5L. Examination of structural validity did not confirm the presumed unidimensionality of the scale and found evidence of an underlying two-factor solution with a physical and cognitive domain. Sufficient test–retest reliability and internal consistency were found. In addition, this study provides first results for the responsiveness of the German version. Trial registration The study was registered at the German Clinical Trials Register. Trial registration number: DRKS00011933, date of registration: 07.04.2017
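
Several abstracts in this listing report test–retest reliability as an intraclass correlation coefficient without naming the ICC form. As one possible illustration, the sketch below computes ICC(2,1) (two-way random effects, absolute agreement, single measurement) from its ANOVA mean squares. The choice of this form is an assumption, and the function name and scores are hypothetical; other ICC forms would give different values for the same data.

```python
def icc_2_1(scores):
    """ICC(2,1) for a list of subjects, each a list of k repeated measurements."""
    n, k = len(scores), len(scores[0])
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need at least two subjects and two measurements each")
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    # Two-way ANOVA sums of squares: subjects (rows), occasions (columns), residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical test and retest scores for five participants.
test_retest = [[10, 11], [12, 12], [14, 15], [16, 15], [18, 19]]
icc = icc_2_1(test_retest)
print(f"ICC(2,1) = {icc:.3f}")
```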
