A Dynamic Speech Comprehension Test for Assessing Real-World Listening Ability

Background: Many listeners with hearing loss report particular difficulties with multitalker communication situations, but these difficulties are not well predicted using current clinical and laboratory assessment tools. Purpose: The overall aim of this work is to create new speech tests that capture key aspects of multitalker communication situations and ultimately provide better predictions of real-world communication abilities and the effect of hearing aids. Research Design: A test of ongoing speech comprehension introduced previously was extended to include naturalistic conversations between multiple talkers as targets, and a reverberant background environment containing competing conversations. In this article, we describe the development of this test and present a validation study. Study Sample: Thirty listeners with normal hearing participated in this study. Data Collection and Analysis: Speech comprehension was measured for one-, two-, and three-talker passages at three different signal-to-noise ratios (SNRs), and working memory ability was measured using the reading span test. Analyses were conducted to examine passage equivalence, learning effects, and test–retest reliability, and to characterize the effects of number of talkers and SNR. Results: Although we observed differences in difficulty across passages, it was possible to group the passages into four equivalent sets. Using this grouping, we achieved good test–retest reliability and observed no significant learning effects. Comprehension performance was sensitive to the SNR but did not decrease as the number of talkers increased. Individual performance showed associations with age and reading span score. Conclusions: This new dynamic speech comprehension test appears to be valid and suitable for experimental purposes. Further work will explore its utility as a tool for predicting real-world communication ability and hearing aid benefit.

The Brain Metastases Symptom Checklist as a novel tool for symptom measurement in patients with brain metastases undergoing whole-brain radiotherapy

Current Oncology ◽

10.3747/co.23.2936 ◽

2016 ◽

Vol 23 (3) ◽

pp. 239 ◽

Cited By ~ 5

Author(s):

D. Rodin ◽

B. Banihashemi ◽

L. Wang ◽

A. Lau ◽

S. Harris ◽

...

Keyword(s):

Brain Metastases ◽

Intraclass Correlation ◽

Whole Brain Radiotherapy ◽

Assessment Tools ◽

Symptom Checklist ◽

Retest Reliability ◽

Brain Radiotherapy ◽

Test Retest Reliability ◽

The Brain ◽

Responsiveness To Change

Purpose We evaluated the feasibility, reliability, and validity of the Brain Metastases Symptom Checklist (BMSC), a novel self-report measure of common symptoms experienced by patients with brain metastases. Methods Patients with first-presentation symptomatic brain metastases (n = 137) referred for whole-brain radiotherapy (WBRT) completed the BMSC at time points before and after treatment. Their caregivers (n = 48) provided proxy ratings twice on the day of consultation to assess reliability, and at week 4 after WBRT to assess responsiveness to change. Correlations with 4 other validated assessment tools were evaluated. Results The symptoms reported on the BMSC were largely mild to moderate, with tiredness (71%) and difficulties with balance (61%) reported most commonly at baseline. Test–retest reliability for individual symptoms had a median intraclass correlation of 0.59 (range: 0.23–0.85). Caregiver proxy and patient responses had a median intraclass correlation of 0.52. Correlation of absolute scores on the BMSC and other symptom assessment tools was low, but consistency in the direction of symptom change was observed. At week 4, change in symptoms was variable, with improvements in weight gain and sleep of 42% and 41%, respectively, and worsening of tiredness and drowsiness of 62% and 59%, respectively. Conclusions The BMSC captures a wide range of symptoms experienced by patients with brain metastases, and it is sensitive to change. It demonstrated adequate test–retest reliability and face validity in terms of its responsiveness to change. Future research is needed to determine whether modifications to the BMSC itself or correlation with more symptom-specific measures will enhance validity.
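The intraclass correlations quoted above summarise how closely repeated ratings of the same patients agree. The abstract does not say which ICC form was used, so the following is only a minimal sketch of one common choice, the two-way random-effects, absolute-agreement, single-measure ICC (often written ICC(2,1)). The function name and the data layout are assumptions made for illustration.

```python
def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    scores: list of rows, one row per subject and one column per
    measurement occasion (hypothetical layout, chosen for illustration).
    """
    n = len(scores)        # number of subjects
    k = len(scores[0])     # number of occasions (or raters)
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Sums of squares for subjects, occasions, and the residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)                # between-subject mean square
    msc = ss_cols / (k - 1)                # between-occasion mean square
    mse = ss_err / ((n - 1) * (k - 1))     # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
```

Because this form penalises systematic shifts between occasions as well as random disagreement, it tends to give lower values than a consistency-type ICC on the same data.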

Psychometric properties of gross motor assessment tools for children: a systematic review

BMJ Open ◽

10.1136/bmjopen-2018-021734 ◽

2018 ◽

Vol 8 (10) ◽

pp. e021734 ◽

Cited By ~ 33

Author(s):

Alison Griffiths ◽

Rachel Toovey ◽

Prue E Morgan ◽

Alicia J Spittle

Keyword(s):

Psychometric Properties ◽

Predictive Validity ◽

Methodological Quality ◽

Clinical Utility ◽

Assessment Tools ◽

Motor Assessment ◽

Gross Motor ◽

Retest Reliability ◽

Validity Test ◽

Test Retest Reliability

Objective Gross motor assessment tools have a critical role in identifying, diagnosing and evaluating motor difficulties in childhood. The objective of this review was to systematically evaluate the psychometric properties and clinical utility of gross motor assessment tools for children aged 2–12 years. Method A systematic search of MEDLINE, Embase, CINAHL and AMED was performed between May and July 2017. Methodological quality was assessed with the COnsensus-based Standards for the selection of health status Measurement INstruments checklist and an outcome measures rating form was used to evaluate reliability, validity and clinical utility of assessment tools. Results Seven assessment tools from 37 studies/manuals met the inclusion criteria: Bayley Scale of Infant and Toddler Development-III (Bayley-III), Bruininks-Oseretsky Test of Motor Proficiency-2 (BOT-2), Movement Assessment Battery for Children-2 (MABC-2), McCarron Assessment of Neuromuscular Development (MAND), Neurological Sensory Motor Developmental Assessment (NSMDA), Peabody Developmental Motor Scales-2 (PDMS-2) and Test of Gross Motor Development-2 (TGMD-2). Methodological quality varied from poor to excellent. Validity and internal consistency varied from fair to excellent (α=0.5–0.99). The Bayley-III, NSMDA and MABC-2 have evidence of predictive validity. Test–retest reliability is excellent in the BOT-2 (intraclass correlation coefficient (ICC)=0.80–0.99), PDMS-2 (ICC=0.97), MABC-2 (ICC=0.83–0.96) and TGMD-2 (ICC=0.81–0.92). TGMD-2 has the highest inter-rater (ICC=0.88–0.93) and intrarater reliability (ICC=0.92–0.99). Conclusions The majority of gross motor assessments for children have good to excellent validity. Test–retest reliability is highest in the BOT-2, MABC-2, PDMS-2 and TGMD-2. The Bayley-III has the best predictive validity at 2 years of age for later motor outcome. None of the assessment tools demonstrate good evaluative validity. Further research on evaluative gross motor assessment tools is urgently needed.
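The internal consistency values reported as α are conventionally Cronbach's alpha, which compares the sum of the item variances with the variance of the total score. The sketch below shows that calculation under stated assumptions: the function name and the row-per-respondent layout are hypothetical, and the review does not describe how the included studies computed their coefficients.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a set of item scores.

    items: list of rows, one row per respondent and one column per item
    (hypothetical layout, chosen for illustration).
    """
    k = len(items[0])  # number of items in the scale
    # Sample variance of each item across respondents.
    item_vars = [variance([row[j] for row in items]) for j in range(k)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(row) for row in items])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)
```

Alpha approaches 1 when items rise and fall together across respondents, and falls when item scores vary independently of each other.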

Reliability of Translated Measures Assessing Dating Violence Among Mexican Adolescents

Violence and Victims ◽

10.1891/0886-6708.21.1.117 ◽

2006 ◽

Vol 21 (1) ◽

pp. 117-127 ◽

Cited By ~ 8

Author(s):

Audrey Hokoda ◽

Luciana Ramos-Lira ◽

Patricia Celaya ◽

Keleigh Vilhauer ◽

Manuel Angeles ◽

...

Keyword(s):

Dating Violence ◽

Internal Consistency ◽

Research Team ◽

Assessment Tools ◽

Dating Relationships ◽

Retest Reliability ◽

Test Retest Reliability ◽

Acceptable Internal Consistency ◽

Reliability Coefficients ◽

Mexican Adolescents

Research on the prevalence and correlates of dating violence in Mexican teens is challenged by the lack of culturally and linguistically appropriate assessment tools. This study modified, translated, and back-translated the Conflict in Adolescent Dating Relationships Inventory (CADRI; Wolfe et al., 2001) and the Attitudes Towards Dating Violence Scales (Price, Byers, & the Dating Violence Research Team, 1999) for Mexican adolescents. Analyses on 307 adolescents (15–18 years old) from Monterrey and Mexicali, Mexico, revealed that most of the translated CADRI subscales and Attitudes Towards Dating Violence Scales had acceptable internal consistency and test-retest reliability coefficients. The study offers some evidence that the measures may be useful in assessing dating violence in Mexican teens.

Measuring test-retest reliability (TRR) of AMSTAR provides moderate to perfect agreement – a contribution to the discussion of the importance of TRR in relation to the psychometric properties of assessment tools

BMC Medical Research Methodology ◽

10.1186/s12874-021-01231-y ◽

2021 ◽

Vol 21 (1) ◽

Author(s):

Stefanie Bühn ◽

Peggy Ober ◽

Tim Mathes ◽

Uta Wegewitz ◽

Anja Jacobs ◽

...

Keyword(s):

Psychometric Properties ◽

Systematic Reviews ◽

Methodological Quality ◽

Assessment Tools ◽

Measurement Properties ◽

Perfect Agreement ◽

Retest Reliability ◽

Test Retest Reliability ◽

The Impact

Abstract Background Systematic Reviews (SRs) can build the groundwork for evidence-based health care decision-making. A sound methodological quality of SRs is crucial. AMSTAR (A Measurement Tool to Assess Systematic Reviews) is a widely used tool developed to assess the methodological quality of SRs of randomized controlled trials (RCTs). Research shows that AMSTAR seems to be valid and reliable in terms of interrater reliability (IRR), but the test-retest reliability (TRR) of AMSTAR has never been investigated. In our study we investigated the TRR of AMSTAR to evaluate the importance of its measurement and contribute to the discussion of the measurement properties of AMSTAR and other quality assessment tools. Methods Seven raters at three institutions independently assessed the methodological quality of SRs in the field of occupational health with AMSTAR. Approximately two years elapsed between the first and second ratings. Answers were dichotomized, and we calculated the TRR of all raters and AMSTAR items using Gwet’s AC1 coefficient. To investigate the impact of variation in the ratings over time, we obtained summary scores for each review. Results AMSTAR item 4 (Was the status of publication used as an inclusion criterion?) provided the lowest median TRR of 0.53 (moderate agreement). All reviewers agreed perfectly on AMSTAR item 1 (Gwet’s AC1 of 1). The median TRR of the single raters varied between 0.69 (substantial agreement) and 0.89 (almost perfect agreement). Variation of two or more points in yes-scored AMSTAR items was observed in 65% (73/112) of all assessments. Conclusions The high variation between the first and second AMSTAR ratings suggests that consideration of the TRR is important when evaluating the psychometric properties of AMSTAR. However, more evidence is needed to investigate this neglected issue of measurement properties. Our results may initiate discussion of the importance of considering the TRR of assessment tools. A further examination of the TRR of AMSTAR, as well as other recently established rating tools such as AMSTAR 2 and ROBIS (Risk Of Bias In Systematic reviews), would be useful.
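For dichotomized answers, Gwet's AC1 compares the observed agreement between two sets of ratings with a chance-agreement term based on the average proportion of "yes" answers. The following is a minimal sketch for the simplest case of one rater's first and second ratings of the same items, coded 1 for "yes" and 0 for "no". The function name and the coding are assumptions made for illustration, and the study's own computation (across seven raters and all AMSTAR items) may have differed.

```python
def gwet_ac1_binary(first, second):
    """Gwet's AC1 for two binary rating sets of the same items.

    first, second: equal-length lists of 0/1 ratings (hypothetical coding:
    1 = "yes", 0 = "no").
    """
    n = len(first)
    # Observed proportion of items rated identically on both occasions.
    agree = sum(1 for a, b in zip(first, second) if a == b) / n
    # Average proportion of "yes" ratings across both occasions.
    pi_yes = (sum(first) + sum(second)) / (2 * n)
    # Chance agreement for two categories under Gwet's model.
    chance = 2 * pi_yes * (1 - pi_yes)
    return (agree - chance) / (1 - chance)
```

The chance term never exceeds 0.5 for two categories, so the denominator cannot reach zero. This is one reason AC1 is sometimes preferred over Cohen's kappa when almost all ratings fall in one category.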

Tracking sentence comprehension: Test-retest reliability in people with aphasia and unimpaired adults

Journal of Neurolinguistics ◽

10.1016/j.jneuroling.2016.06.001 ◽

2016 ◽

Vol 40 ◽

pp. 98-111 ◽

Cited By ~ 4

Author(s):

Jennifer E. Mack ◽

Andrew Zu-Sern Wei ◽

Stephanie Gutierrez ◽

Cynthia K. Thompson

Keyword(s):

Sentence Comprehension ◽

Retest Reliability ◽

Comprehension Test ◽

Test Retest Reliability

Psychometric properties of the PHQ-9 depression scale in people with multiple sclerosis: a systematic review

10.1101/321653 ◽

2018 ◽

Author(s):

Sarah Patrick ◽

Peter Connick

Keyword(s):

Multiple Sclerosis ◽

Psychometric Properties ◽

Depression Scale ◽

Assessment Tools ◽

Patient Health Questionnaire ◽

Health Questionnaire ◽

Eligibility Criteria ◽

Retest Reliability ◽

Patient Health ◽

Test Retest Reliability

Abstract Background Depression affects approximately 25% of people with MS (pwMS) at any given time. It is, however, under-recognised in clinical practice, in part due to a lack of uptake for brief assessment tools and uncertainty about their psychometric properties. The 9-item Patient Health Questionnaire (PHQ-9) is an attractive candidate for this role. Objective To synthesise published findings on the psychometric properties of the 9-item Patient Health Questionnaire (PHQ-9) when applied to people with multiple sclerosis (pwMS). Data sources PubMed, Medline and ISI Web of Science databases, supplemented by hand-searching of references from all eligible sources. Study eligibility criteria Primary literature written in English and published following peer-review with a primary aim to evaluate the performance of the PHQ-9 in pwMS. Outcome measures Psychometric performance with respect to appropriateness, reliability, validity, responsiveness, precision, interpretability, acceptability, and feasibility. Results Seven relevant studies were identified; these were of high quality and included 5080 participants from all MS disease-course groups. Strong evidence was found supporting the validity of the PHQ-9 as a unidimensional measure of depression. Used as a screening tool for major depressive disorder (MDD) with a cut-point of 11, sensitivity was 95% and specificity 88.3% (PPV 51.4%, NPV 48.6%). Alternative scoring systems that may address the issue of overlap between somatic features of depression and features of MS per se are being developed, although their utility remains unclear. However, data on reliability were limited, and no specific evidence was available on test-retest reliability, responsiveness, acceptability, or feasibility. Conclusions The PHQ-9 represents a suitable tool to screen for MDD in pwMS. However, use as a diagnostic tool cannot currently be recommended, and the potential value for monitoring depressive symptoms cannot be established without further evidence on test-retest reliability, responsiveness, acceptability, and feasibility. PROSPERO register ID: CRD42017067814
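Positive and negative predictive values depend on how common the condition is in the screened group as well as on sensitivity and specificity. The sketch below shows the standard Bayes calculation. The function name is hypothetical, and the prevalence argument is an assumption the caller must supply, since the abstract does not report the prevalence behind its screening figures.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a screening test.

    All three arguments are proportions between 0 and 1. The prevalence
    is supplied by the caller; it is not taken from the abstract.
    """
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv
```

With sensitivity and specificity held fixed, lowering the prevalence reduces the PPV and raises the NPV. A highly sensitive screening tool can therefore still have a modest PPV in a low-prevalence population.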

The reliability and validity of novel clinical strength measures of the upper body in older adults

Hand Therapy ◽

10.1177/1758998320957373 ◽

2020 ◽

Vol 25 (4) ◽

pp. 130-138

Author(s):

Hayley S Legg ◽

Jeff Spindor ◽

Reanne Dziendzielowski ◽

Sarah Sharkey ◽

Joel L Lanovaz ◽

...

Keyword(s):

Older Adults ◽

Upper Limb ◽

Assessment Tools ◽

Upper Body ◽

Minimal Detectable Change ◽

The Novel ◽

Precision Error ◽

Retest Reliability ◽

Strength Assessment ◽

Test Retest Reliability

Introduction Research investigating psychometric properties of multi-joint upper body strength assessment tools for older adults is limited. This study aimed to assess the test–retest reliability and concurrent validity of novel clinical strength measures assessing functional concentric and eccentric pushing activities compared to other more traditional upper limb strength measures. Methods Seventeen participants (6 males and 11 females; 71 ± 10 years) were tested two days apart, performing three maximal repetitions of the novel measurements: vertical push-off test and dynamometer-controlled concentric and eccentric single-arm press. Three maximal repetitions of hand-grip dynamometry and isometric hand-held dynamometry for shoulder flexion, shoulder abduction and elbow extension were also collected. Results For all measures, strong test–retest reliability was shown (all ICC > 0.90, p < 0.001), root-mean-squared coefficient of variation percentage: 5–13.6%; standard error of mean: 0.17–1.15 kg; and minimal detectable change (90%): 2.1–9.9. There were good to high significant correlations between the novel and traditional strength measures (all r > 0.8, p < 0.001). Discussion The push-off test and dynamometer-controlled concentric and eccentric single-arm press are reliable and valid strength measures feasible for testing multi-joint functional upper limb strength assessment in older adults. Higher precision error compared to traditional uni-planar measures warrants caution when completing comparative clinical assessments over time.
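A minimal detectable change at 90% confidence is commonly derived from the standard error of measurement (SEM), which can itself be estimated from a between-subject standard deviation and the ICC. The sketch below shows those two textbook formulas. It assumes that convention rather than reproducing the study's calculation, which the abstract does not describe, and both function names are hypothetical.

```python
from math import sqrt


def sem_from_icc(sd, icc):
    """Standard error of measurement from a between-subject SD and an ICC."""
    return sd * sqrt(1 - icc)


def minimal_detectable_change(sem, z=1.645):
    """Minimal detectable change; z = 1.645 gives the 90% level (MDC90).

    The sqrt(2) factor reflects that a change score combines the
    measurement error of two separate test occasions.
    """
    return z * sqrt(2) * sem
```

A change between two sessions that is smaller than the MDC cannot be distinguished from measurement noise at the chosen confidence level.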

Correction to: Test-Retest Reliability and Interpretation of Common Concussion Assessment Tools: Findings from the NCAA-DoD CARE Consortium

Sports Medicine ◽

10.1007/s40279-018-0906-4 ◽

2018 ◽

Vol 48 (7) ◽

pp. 1761-1761

Author(s):

Steven P. Broglio ◽

Barry P. Katz ◽

Shi Zhao ◽

Michael McCrea ◽

...

Keyword(s):

Assessment Tools ◽

Retest Reliability ◽

Test Retest Reliability

Comparative study of psychometric properties of three assessment tools for degenerative rotator cuff disease

Clinical Rehabilitation ◽

10.1177/0269215518796888 ◽

2018 ◽

Vol 33 (2) ◽

pp. 277-284 ◽

Cited By ~ 3

Author(s):

Etienne James-Belin ◽

Anne Laure Roy ◽

Sandra Lasbleiz ◽

Agnès Ostertag ◽

Alain Yelnik ◽

...

Keyword(s):

Rotator Cuff ◽

Psychometric Properties ◽

Intraclass Correlation ◽

Assessment Tools ◽

University Hospital ◽

Rotator Cuff Disease ◽

Retest Reliability ◽

Good For ◽

Test Retest Reliability ◽

Improvement Score

Objective: To compare psychometric properties of the Disabilities of the Arm, Shoulder and Hand (DASH) questionnaire, the Shoulder Pain and Disability Index (SPADI) and the Constant–Murley scale in patients with degenerative rotator cuff disease (DRCD). Design: Longitudinal cohort. Setting: One French university hospital. Methods: The scales were applied twice at a one-week interval before physiotherapy and once after physiotherapy two months later. The perceived improvement after treatment was self-assessed on a numerical scale (0–4). The test–retest reliability of the DASH, SPADI and Constant–Murley scales was assessed before treatment by the intraclass correlation coefficient (ICC). The responsiveness was assessed by the paired t-test (P < 0.05) and standardized mean difference (SMD). The correlation between the percentage of variation in scale scores and the self-assessed improvement score after treatment was measured by the Spearman coefficient. Results: Fifty-three patients were included. Only 26 were available for reliability. The test–retest reliability was very good for the DASH (ICC = 0.97), SPADI (0.95) and Constant–Murley (0.92). The scale score was improved after treatment for each scale (P < 0.05). The SMD was moderate for the DASH (0.56) and SPADI (0.56) scales, and small for the Constant–Murley (0.44). The correlation between the percentage of variation in scores and self-assessed improvement score after treatment was high, moderate and not significant for the SPADI (0.59, P < 0.0001), DASH (0.42, P < 0.01) and Constant–Murley scales, respectively. Conclusion: The test–retest reliability of the DASH, SPADI and Constant–Murley scales is very good for patients with DRCD. The highest responsiveness was achieved with the SPADI.
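A standardized mean difference expresses the average pre-to-post change in units of a standard deviation, which allows responsiveness to be compared across scales with different ranges. The abstract does not state which denominator was used. The sketch below assumes the baseline standard deviation, which is one common convention; dividing by the standard deviation of the change scores is another. The function name is hypothetical.

```python
from statistics import mean, stdev


def standardized_mean_difference(before, after):
    """Mean paired change divided by the SD of the baseline scores.

    before, after: equal-length lists of scores for the same patients.
    The choice of baseline SD as denominator is an assumption.
    """
    changes = [a - b for a, b in zip(after, before)]
    return mean(changes) / stdev(before)
```

The sign follows the direction of scoring, so for scales where lower scores mean less disability an improvement appears as a negative value unless the change is reversed.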
