Putting the individual into reliability: Bayesian testing of homogeneous within-person variance in hierarchical models

Abstract: Measurement reliability is a fundamental concept in psychology. It is traditionally considered a stable property of a questionnaire, measurement device, or experimental task. Although intraclass correlation coefficients (ICCs) are often used to assess reliability in repeated measure designs, their descriptive nature depends upon the assumption of a common within-person variance. This work focuses on the presumption that each individual is adequately described by the average within-person variance in hierarchical models, and thus on whether reliability generalizes to the individual level, which leads directly to the notion of individually varying ICCs. In particular, we introduce a novel approach, using the Bayes factor, wherein a researcher can directly test for homogeneous within-person variance in hierarchical models. Additionally, we introduce a membership model that allows for classifying which (and how many) individuals belong to the common variance model. The utility of our methodology is demonstrated on cognitive inhibition tasks. We find that heterogeneous within-person variance is a defining feature of these tasks and that, in one case, the ratio of the largest to the smallest within-person variance exceeded 20. This translates into a tenfold difference in person-specific reliability! We also find that few individuals belong to the common variance model, and thus traditional reliability indices are potentially masking important individual variation. We discuss the implications of our findings and possible future directions. The methods are implemented in the R package vICC.
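To make the notion of individually varying ICCs concrete, here is a minimal sketch; it is not the vICC implementation. It assumes the usual variance-ratio definition of the ICC in a random-intercept model, ICC_i = τ² / (τ² + σ_i²), where τ² is the between-person variance and σ_i² is person i's within-person variance. The function name and all numbers are hypothetical and chosen only for illustration.

```python
def person_specific_icc(between_var, within_vars):
    """Return one ICC per person: between_var / (between_var + within_var_i)."""
    if between_var < 0 or any(v <= 0 for v in within_vars):
        raise ValueError("between_var must be >= 0 and within_vars must be > 0")
    return [between_var / (between_var + v) for v in within_vars]


# Hypothetical values: one shared between-person variance and three people
# whose within-person variances span a 20-fold range.
between_var = 0.5
within_vars = [0.25, 1.0, 5.0]

iccs = person_specific_icc(between_var, within_vars)

# The traditional ICC plugs in the average within-person variance instead.
mean_within = sum(within_vars) / len(within_vars)
common_icc = between_var / (between_var + mean_within)

for v, icc in zip(within_vars, iccs):
    print(f"within-person variance {v:.2f} -> ICC {icc:.3f}")
print(f"common-variance ICC: {common_icc:.3f}")
```

With these made-up inputs the single common-variance ICC falls between the most and least reliable individuals, which is the sense in which one index can mask individual variation.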

Putting the Individual into Reliability: Bayesian Testing of Homogeneous Within-Person Variance in Hierarchical Models

10.31234/osf.io/hpq7w ◽

2019 ◽

Author(s):

Donald Ray Williams ◽

Stephen Ross Martin ◽

Philippe Rast

Keyword(s):

Hierarchical Models ◽

Bayes Factor ◽

Intraclass Correlation ◽

Correlation Coefficients ◽

Repeated Measure ◽

Common Variance ◽

Variance Model ◽

Reliability Indices ◽

The Common ◽

The Individual

Research on calcifications in the cervical region using panoramic and teleradiographic techniques

RGO - Revista Gaúcha de Odontologia ◽

10.1590/1981-863720160003000063211 ◽

2016 ◽

Vol 64 (3) ◽

pp. 280-286

Author(s):

Rosangela Sayuri Saga KAMIKAWA ◽

Ricardo RAITZ ◽

Marlene Fenyo PEREIRA

Keyword(s):

Carotid Artery ◽

Common Carotid Artery ◽

Soft Tissues ◽

Intraclass Correlation ◽

Correlation Coefficients ◽

Cervical Region ◽

Panoramic Radiographs ◽

Intraclass Correlation Coefficients ◽

Gutta Percha ◽

The Common

ABSTRACT Objective: The aim of the study was to evaluate the contribution of lateral and frontal teleradiographs to the identification and location of calcifications in soft tissues, when compared with those observed in panoramic radiographs. Methods: Radiopaque references in gutta-percha were placed unilaterally on the heads of three cadavers, endeavoring at all times to keep to the same level as the bifurcation of the common carotid artery in different structures, sites of possible calcifications, and three radiographic incidences were obtained for each anatomic part. Thus, the sample of this study was composed of 27 panoramic radiographs, 27 lateral teleradiographs and 27 frontal teleradiographs, totaling 81 radiographs. Results: According to the criteria of Cicchetti and Sparrow, the intraclass correlation coefficients (ICCs) obtained were below 0.40. Conclusion: It can be concluded that the lateral and frontal teleradiographs did not contribute efficiently to the identification and location of radiopacities in the cervical region, and that the anatomic conformation interferes in the observation of the presence of radiopacity in the cervical region.

Genetic variation in the quantitative levels of an NADP(H)-binding protein (FX) in human erythrocytes

Blood ◽

10.1182/blood.v57.2.209.209 ◽

1981 ◽

Vol 57 (2) ◽

pp. 209-217 ◽

Cited By ~ 4

Author(s):

L Lenzerini ◽

U Benatti ◽

A Morelli ◽

S Pontremoli ◽

A De Flora ◽

...

Keyword(s):

Genetic Variation ◽

Preliminary Data ◽

Binding Protein ◽

Intraclass Correlation ◽

Correlation Coefficients ◽

Red Cell ◽

Genetic Origin ◽

Intraclass Correlation Coefficients ◽

The Individual

Abstract FX is a red cell NADP(H)-binding protein that has been well defined biochemically and immunologically but whose function is still unknown. Preliminary data indicated that the levels of this protein are significantly increased in hemizygotes, heterozygotes, and homozygotes for the G6PD Mediterranean mutant, thus raising the question of whether or not the individual variation in FX levels is more or less directly influenced by X-linked genes. The present study, based on a large series of population and family data collected in Sardinia, confirms unequivocally the above-mentioned interaction, but shows at the same time that the variances in FX levels “between sibships” are 2–3 times larger than those “within sibships” when the analysis is done separately for the G6PD-normal or the G6PD-deficient sibs. From the comparison of the interclass and intraclass correlation coefficients, it appears that about 60% of the total variation of FX is of genetic origin. Moreover, the FX levels of children, analyzed in a pairwise manner, were found to be more positively correlated with those of their fathers (r = 0.39) than with those of their maternal grandfathers (r = 0.20). This latter finding obviously favors the conclusion that “autosomal” rather than “X-linked” genes are involved in the determination of the FX levels.

Reliability and Validity of the Posture and Fine Motor Assessment of Infants

The Occupational Therapy Journal of Research ◽

10.1177/153944928900900501 ◽

1989 ◽

Vol 9 (5) ◽

pp. 259-272 ◽

Cited By ~ 3

Author(s):

Jane Case-Smith

Keyword(s):

Intraclass Correlation ◽

Reliability And Validity ◽

Correlation Coefficients ◽

Fine Motor ◽

Motor Assessment ◽

Intraclass Correlation Coefficients ◽

Highly Correlated ◽

Peabody Developmental Motor Scales ◽

The Individual ◽

Test Retest Reliability

The Posture and Fine Motor Assessment of Infants (PFMAI) (Case-Smith, 1987) is a newly developed instrument for assessing the quality of motor function in infants. The test measures components of posture and fine motor control as they first develop. The purpose of this study was to support the test's reliability and validity. Interrater reliability, analyzed with intraclass correlation coefficients (ICCs), was high (.989 for total scores). Test-retest reliability, measured by ICCs, was .853 and .913 for the two test sections. The PFMAI demonstrated concurrent validity with the Peabody Developmental Motor Scales, Revised (Folio & Fewell, 1983) (correlations were .673 and .829 for the individual sections). Scores on the PFMAI were highly correlated with the infants' ages (.892 to .941); this finding provided one indication of construct validity.

The reliability of the augmented Lehnert-Schroth and Rigo classification in scoliosis management

South African Journal of Physiotherapy ◽

10.4102/sajp.v77i2.1568 ◽

2021 ◽

Vol 77 (2) ◽

Author(s):

Burçin Akçay ◽

Tuğba Kuru Çolak ◽

Adnan Apti ◽

İlker Çolak ◽

Önder Kızıltaş

Keyword(s):

Treatment Plan ◽

Intraclass Correlation ◽

Correlation Coefficients ◽

Error Measurement ◽

X Rays ◽

X Ray ◽

Intraclass Correlation Coefficients ◽

Observer Reliability ◽

The Individual ◽

Curve Patterns

Background: In pattern-specific scoliosis exercises and bracing, the corrective treatment plan differs according to different curve patterns. There are a limited number of studies investigating the reliability of the commonly used classification systems. Objective: To test the reliability of the augmented Lehnert-Schroth (ALS) classification and the Rigo classification. Methods: X-rays and posterior photographs of 45 patients with scoliosis were sent by the first author to three clinicians twice at 1-week intervals. The clinicians classified images according to the ALS and Rigo classifications, and the data were analysed using SPSS V-16. Intraclass correlation coefficients (ICCs) and the standard error of measurement (SEM) were calculated to evaluate the inter- and intra-observer reliability. Results: The inter-observer ICC values were 0.552 (ALS) and 0.452 (Rigo) for X-ray images and 0.494 (ALS) and 0.518 (Rigo) for the photographs. The average intra-observer ICC value was 0.720 (ALS) and 0.581 (Rigo) for the X-ray images and 0.726 (ALS) and 0.467 (Rigo) for the photographs. Conclusions: The results of our study indicate moderate inter-observer reliability for X-ray images using the ALS classification and clinical photographs using the Rigo classification. Intra-observer reliability was moderate to good for X-ray images and clinical photographs using the ALS classification and poor to moderate for X-ray and clinical photographs using the Rigo classification. Clinical implications: Pattern classifications assist in creating a plan and indication of correction in specific scoliosis physiotherapy and pattern-specific brace applications and surgical treatment. More sub-types are needed to address the individual patterns of curvature. The optimisation of curve classification will likely reduce failures in diagnosis and treatment.

Reproducibility of the Evolution of Stride Biomechanics During Exhaustive Runs

Journal of Human Kinetics ◽

10.1515/hukin-2017-0184 ◽

2018 ◽

Vol 64 (1) ◽

pp. 57-69

Author(s):

Géraldine Martens ◽

Dorian Deflandre ◽

Cédric Schwartz ◽

Nadia Dardenne ◽

Thierry Bury

Keyword(s):

Intraclass Correlation ◽

Three Dimensional ◽

Correlation Coefficients ◽

Step Length ◽

Stride Frequency ◽

Time To Exhaustion ◽

Centre Of Gravity ◽

Maximal Aerobic Speed ◽

S Period ◽

The Individual

Abstract Running biomechanics and their evolution over intensive trials are widely studied, but few studies have focused on the reproducibility of stride evolution in these runs. The purpose of this investigation was to assess the reproducibility of changes in eight biomechanical variables during exhaustive runs, using three-dimensional analysis. Ten male athletes (age: 23 ± 4 years; maximal oxygen uptake: 57.5 ± 4.4 ml O2·min⁻¹·kg⁻¹; maximal aerobic speed: 19.3 ± 0.8 km·h⁻¹) performed a maximal treadmill test. Between 3 and 10 days later, they started a series of three time-to-exhaustion trials at 90% of the individual maximal aerobic speed, seven days apart. During these trials, eight biomechanical variables were recorded over a 20-s period every 4 min until exhaustion. The evolution of a variable over a trial was represented as the slope of the linear regression of that variable over time. Reproducibility was assessed with intraclass correlation coefficients, and variability was quantified as the standard error of measurement. Changes in five variables (swing duration, stride frequency, step length, centre of gravity vertical and lateral amplitude) showed moderate to good reproducibility (0.48 ≤ ICC ≤ 0.72), while changes in stance duration, reactivity and foot orientation showed poor reproducibility (-0.71 ≤ ICC ≤ 0.04). Fatigue-induced changes in stride biomechanics do not follow a reproducible course across the board; however, several variables do show satisfactory stability: swing duration, stride frequency, step length and centre of gravity shift.
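The slope-based summary described in this abstract can be sketched as an ordinary least-squares slope of one variable against time. This is only an illustration under that assumption; the function name and the sample values are hypothetical and are not data from the study.

```python
def ols_slope(times, values):
    """Least-squares slope of values regressed on times."""
    n = len(times)
    if n != len(values) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 points")
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("times must not all be identical")
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    return sxy / sxx


# Hypothetical trial: stride frequency (Hz) sampled every 4 min until exhaustion.
minutes = [0, 4, 8, 12]
stride_frequency = [2.90, 2.92, 2.94, 2.96]

slope = ols_slope(minutes, stride_frequency)
print(f"change in stride frequency: {slope:.4f} Hz per minute")
```

One slope per athlete and trial would then be the quantity whose agreement across the three trials is summarized by an ICC.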

Reliability Analysis of Traditional and Ballistic Bench Press Exercises at Different Loads

Journal of Human Kinetics ◽

10.1515/hukin-2015-0061 ◽

2015 ◽

Vol 47 (1) ◽

pp. 51-59 ◽

Cited By ~ 11

Author(s):

Amador García-Ramos ◽

Paulino Padial ◽

Miguel García-Ramos ◽

Javier Conde-Pipó ◽

Javier Argüelles-Cienfuegos ◽

...

Keyword(s):

Recovery Time ◽

Bench Press ◽

Intraclass Correlation ◽

Correlation Coefficients ◽

Upper Body ◽

Physically Active ◽

Velocity Relationship ◽

Body Velocity ◽

The Individual ◽

Test Retest Reliability

Abstract The purpose of this study was to determine test–retest reliability for peak barbell velocity (Vpeak) during the bench press (BP) and bench press throw (BPT) exercises for loads corresponding to 20–70% of one-repetition maximum (1RM). Thirty physically active collegiate men conducted four evaluations after a preliminary BP 1RM determination (1RM·bw⁻¹ = 1.02 ± 0.16 kg·kg⁻¹). In counterbalanced order, participants performed two sessions of the BP in one week and two sessions of the BPT in another week. Recovery time between sessions within the same week was 48 hours and recovery time between sessions of different weeks was 120 hours. On each day of evaluation, the individual load–velocity relationship at each tenth percentile (20–70% of 1RM) in a Smith machine for the BP or BPT was determined. Participants performed three attempts per load, but only the best repetition (highest Vpeak), registered by a linear position transducer, was analysed. The BPT resulted in a significantly lower coefficient of variation (CV) for the whole load–velocity relationship, compared to the BP (2.48% vs. 3.22%; p = 0.040). Test–retest intraclass correlation coefficients (ICCs) ranged from r = 0.94–0.85 for the BPT and r = 0.91–0.71 for the BP (p < 0.001). The reduction in the biological within-subject variation in BPT exercise could be promoted by the braking phase that obligatorily occurs during a BP executed with light or moderate loads. Therefore, we recommend the BPT exercise for the most accurate assessment of upper-body velocity.

Validation of a Classroom Version of the Eating in the Absence of Hunger Paradigm in Preschoolers

Frontiers in Nutrition ◽

10.3389/fnut.2021.787461 ◽

2022 ◽

Vol 8 ◽

Author(s):

Emily E. Hohman ◽

Katherine M. McNitt ◽

Sally G. Eagleton ◽

Lori A. Francis ◽

Kathleen L. Keller ◽

...

Keyword(s):

Eating Behavior ◽

Weight Status ◽

Eating Behaviors ◽

Ecological Validity ◽

Intraclass Correlation ◽

Correlation Coefficients ◽

Free Access ◽

Classroom Setting ◽

Significant Difference ◽

The Individual

Eating in the absence of hunger (EAH), a measure of children's propensity to eat beyond satiety in the presence of highly palatable food, has been associated with childhood obesity and later binge eating behavior. The EAH task is typically conducted in a research laboratory setting, which is resource intensive and lacks ecological validity. Assessing EAH in a group classroom setting is feasible and may be a more efficient alternative, but the validity of the classroom assessment against the traditional individually-administered paradigm has not been tested. The objective of this study was to compare EAH measured in a classroom setting to the one-on-one version of the paradigm in a sample of Head Start preschoolers. Children (n = 35) from three classrooms completed both classroom and individual EAH tasks in a random, counterbalanced order. In the group condition, children sat with peers at their classroom lunch tables; in the individual condition, children met individually with a researcher in a separate area near their classroom. In both conditions, following a meal, children were provided free access to generous portions of six snack foods (~750 kcal) and a selection of toys for 7 min. Snacks were pre- and post-weighed to calculate intake. Parents completed a survey of their child's eating behaviors, and child height and weight were measured. Paired t-tests and intraclass correlation coefficients were used to compare energy intake between conditions, and correlations between EAH intake and child BMI, eating behaviors, and parent feeding practices were examined to evaluate concurrent validity. Average intake was 63.0 ± 50.4 kcal in the classroom setting and 53.7 ± 44.6 in the individual setting, with no significant difference between settings. The intraclass correlation coefficient was 0.57, indicating moderate agreement between conditions. Overall, the EAH protocol appears to perform similarly in classroom and individual settings, suggesting the classroom protocol is a valid alternative. Future studies should further examine the role of age, sex, and weight status on eating behavior measurement paradigms.

An international tool to measure perceived stressors in intensive care units: the PS-ICU scale

Annals of Intensive Care ◽

10.1186/s13613-021-00846-0 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Alexandra Laurent ◽

Alicia Fournier ◽

Florent Lheureux ◽

Maria Cruz Martin Delgado ◽

Maria G. Bocci ◽

...

Keyword(s):

Intensive Care ◽

Intensive Care Units ◽

Convergent Validity ◽

Healthcare Professionals ◽

Intraclass Correlation ◽

International Comparisons ◽

Correlation Coefficients ◽

Maslach Burnout Inventory ◽

Stressful Environment ◽

The Individual

Abstract Background: The intensive care unit is increasingly recognized as a stressful environment for healthcare professionals. This context has an impact on the health of these professionals but also on the quality of their personal and professional life. However, there is currently no validated scale to measure specific stressors perceived by healthcare professionals in intensive care. The aim of this study was to construct and validate in three languages a perceived stressors scale more specific to intensive care units (ICU). Results: We conducted a three-phase study between 2016 and 2019: (1) identification of stressors based on the verbatim of 165 nurses and physicians from 4 countries (Canada, France, Italy, and Spain). We identified 99 stressors, including those common to most healthcare professions (called generic), as well as stressors more specific to ICU professionals (called specific); (2) item elaboration and selection by a panel of interdisciplinary experts to build a provisional 99-item version of the scale. This version was pre-tested with 70 professionals in the 4 countries and enabled us to select 50 relevant items; (3) test of the validity of the scale in 497 ICU healthcare professionals. Factor analyses identified six dimensions: lack of fit with families and organizational functioning; patient- and family-related emotional load; complex/at risk situations and skill-related issues; workload and human resource management issues; difficulties related to team working; and suboptimal care situations. Correlations of the PS-ICU scale with a generic stressors measure (i.e., the Job Content Questionnaire) tested its convergent validity, while its correlations with the Maslach Burnout Inventory-HSS examined its concurrent validity. We also assessed the test–retest reliability of PS-ICU with intraclass correlation coefficients. Conclusions: The perceived stressors in intensive care units (PS-ICU) scale has good psychometric properties in all countries. It includes six broad dimensions covering stressors that are generic or specific to the ICU, and thus enables the identification of work situations that are likely to generate high levels of stress at the individual and unit levels. For future studies, this tool will enable the implementation of targeted corrective actions on which intervention research can be based. It also enables national and international comparisons of stressors’ impact.

Inter-rater reliability of two paediatric early warning score tools

Dansk Tidsskrift for Akutmedicin ◽

10.7146/akut.v2i3.112944 ◽

2019 ◽

Vol 2 (3) ◽

pp. 37

Author(s):

Claus Sixtus Jensen

Keyword(s):

Early Warning ◽

Intraclass Correlation ◽

Healthcare Providers ◽

Correlation Coefficients ◽

Assessment Tools ◽

Early Warning Score ◽

Rater Reliability ◽

Intraclass Correlation Coefficients ◽

Paediatric Early Warning Score ◽

The Individual

Background: Paediatric early warning score (PEWS) assessment tools can assist healthcare providers in the timely detection and recognition of subtle patient condition changes signalling clinical deterioration. However, PEWS tools instrument data are only as reliable and accurate as the caregivers who obtain and document the parameters. The aim of this study is to evaluate inter-rater reliability among nurses using PEWS systems. Method: The study was carried out in five paediatrics departments in the Central Denmark Region. Inter-rater reliability was investigated through parallel observations. A total of 108 children and 69 nurses participated. Two nurses simultaneously performed a PEWS assessment on the same patient. Before the assessment, the two participating nurses drew lots to decide who would be the active observer. Intraclass correlation coefficient, Fleiss’ κ and Bland–Altman limits of agreement were used to determine inter-rater reliability. Results: The intraclass correlation coefficients for the aggregated PEWS score of the two PEWS models were 0.98 and 0.95, respectively. The κ value on the individual PEWS measurements ranged from 0.70 to 1.0, indicating good to very good agreement. The nurses assigned the exact same aggregated score for both PEWS models in 76% of the cases. In 98% of the PEWS assessments, the aggregated PEWS scores assigned by the nurses were equal to or below 1 point in both models. Conclusion: The study showed good to very good interrater reliability in the two PEWS models used in the Central Denmark Region.
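As a rough illustration of one of the agreement statistics named in this abstract, the sketch below computes Bland–Altman limits of agreement in their textbook form: the mean of the paired differences plus or minus 1.96 sample standard deviations. It is an assumption that the study used this standard form; the function name and the scores are hypothetical and are not data from the study.

```python
from statistics import mean, stdev


def limits_of_agreement(rater_a, rater_b, z=1.96):
    """Return (lower limit, bias, upper limit) for paired ratings."""
    if len(rater_a) != len(rater_b) or len(rater_a) < 2:
        raise ValueError("need two equally long sequences with at least 2 pairs")
    diffs = [a - b for a, b in zip(rater_a, rater_b)]
    bias = mean(diffs)            # average disagreement between the raters
    spread = z * stdev(diffs)     # z times the sample SD of the differences
    return bias - spread, bias, bias + spread


# Hypothetical aggregated scores from two nurses assessing the same 8 children.
nurse_1 = [2, 0, 3, 1, 4, 2, 0, 1]
nurse_2 = [2, 1, 3, 1, 3, 2, 0, 2]

lower, bias, upper = limits_of_agreement(nurse_1, nurse_2)
print(f"bias {bias:.3f}, limits of agreement [{lower:.3f}, {upper:.3f}]")
```

Narrow limits centred near zero indicate that the two raters rarely differ by more than a small amount, which complements a high ICC.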
