Robustness of Statistical Power in Group-Randomized Studies of Mediation Under an Optimal Sampling Framework

Methodology ◽  
2019 ◽  
Vol 15 (3) ◽  
pp. 106-118
Author(s):  
Kyle Cox ◽  
Benjamin Kelcey

Abstract. When planning group-randomized studies probing mediation, effective and efficient sample allocation is governed by several parameters including treatment-mediator and mediator-outcome path coefficients and the mediator and outcome intraclass correlation coefficients. In the design stage, these parameters are typically approximated using information from prior research and these approximations are likely to deviate from the true values eventually realized in the study. This study investigates the robustness of statistical power under an optimal sampling framework to misspecified parameter values in group-randomized designs with group- or individual-level mediators. The results suggest that estimates of statistical power are robust to misspecified parameter values across a variety of conditions and tests. Relative power remained above 90% in most conditions when the incorrect parameter value ranged between 50% and 150% of the true parameter.
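The robustness question can be illustrated with a much simpler design than the mediation designs studied here. The sketch below is a minimal illustration, not the authors' method. It uses the conventional cost-optimal cluster size for a two-level cluster-randomized test of a main effect (no mediator), picks the cluster size from an assumed ICC, and reports relative efficiency (the ratio of sampling variances) when the true ICC differs. The function names, costs, and ICC values are hypothetical, and relative efficiency is only a proxy for the relative power reported in the abstract.

```python
import math


def optimal_cluster_size(icc: float, cost_cluster: float, cost_person: float) -> float:
    """Cost-optimal individuals per cluster for a two-level cluster-randomized
    main-effect test: n* = sqrt((1 - icc) / icc * cost_cluster / cost_person)."""
    return math.sqrt((1 - icc) / icc * cost_cluster / cost_person)


def variance_under_budget(icc: float, n: float, cost_cluster: float,
                          cost_person: float, budget: float) -> float:
    """Treatment-effect sampling variance, up to a constant factor, when the whole
    budget buys J = budget / (cost_cluster + n * cost_person) clusters of size n."""
    clusters = budget / (cost_cluster + n * cost_person)
    return (icc + (1 - icc) / n) / clusters


def relative_efficiency(true_icc: float, assumed_icc: float, cost_cluster: float = 100.0,
                        cost_person: float = 10.0, budget: float = 10_000.0) -> float:
    """Variance of the truly optimal design divided by the variance of the design
    chosen with the assumed ICC (1.0 means no loss); both are evaluated at the true ICC."""
    n_assumed = optimal_cluster_size(assumed_icc, cost_cluster, cost_person)
    n_true = optimal_cluster_size(true_icc, cost_cluster, cost_person)
    v_assumed = variance_under_budget(true_icc, n_assumed, cost_cluster, cost_person, budget)
    v_true = variance_under_budget(true_icc, n_true, cost_cluster, cost_person, budget)
    return v_true / v_assumed


if __name__ == "__main__":
    true_icc = 0.20  # hypothetical planning value
    for share in (0.5, 1.0, 1.5):  # assumed ICC as 50%, 100%, 150% of the true value
        print(share, round(relative_efficiency(true_icc, share * true_icc), 3))
```

The sketch treats cluster counts and sizes as continuous and assumes balanced allocation; both are simplifications.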

2016 ◽  
Vol 40 (6) ◽  
pp. 500-525 ◽  
Author(s):  
Ben Kelcey ◽  
Zuchao Shen ◽  
Jessaca Spybrook

Objective: Over the past two decades, the lack of reliable empirical evidence concerning the effectiveness of educational interventions has motivated a new wave of research in education in sub-Saharan Africa (and across most of the world) that focuses on impact evaluation through rigorous research designs such as experiments. Often these experiments draw on the random assignment of entire clusters, such as schools, to accommodate the multilevel structure of schooling and the theory of action underlying many school-based interventions. Planning effective and efficient school-randomized studies, however, requires plausible values of the intraclass correlation coefficient (ICC) and the variance explained by covariates during the design stage. The purpose of this study was to improve the planning of two-level school-randomized studies in sub-Saharan Africa by providing empirical estimates of the ICC and the variance explained by covariates for education outcomes in 15 countries. Method: Our investigation drew on large-scale representative samples of sixth-grade students in 15 countries in sub-Saharan Africa and included over 60,000 students across 2,500 schools. We examined two core education outcomes: standardized achievement in reading and mathematics. We estimated a series of two-level hierarchical linear models with students nested within schools to inform the design of two-level school-randomized trials. Results: The analyses suggested that outcomes were substantially clustered within schools but that the magnitude of the clustering varied considerably across countries. Similarly, the results indicated that covariance adjustment generally reduced clustering but that the prognostic value of such adjustment varied across countries.
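To see why the ICC and the variance explained by covariates matter at the design stage, the sketch below computes an approximate minimum detectable effect size (MDES) for a two-level school-randomized trial in the style of widely used MDES formulas. It is a sketch under stated assumptions: it uses a normal-approximation multiplier instead of a t multiplier with school-level degrees of freedom (so it is optimistic when there are few schools), and the function name and planning values are hypothetical, not estimates from this study.

```python
import math
from statistics import NormalDist


def mdes_two_level(icc: float, students_per_school: int, schools: int,
                   r2_school: float = 0.0, r2_student: float = 0.0,
                   p_treated: float = 0.5, alpha: float = 0.05, power: float = 0.80) -> float:
    """Approximate MDES (in outcome SD units) for a two-level school-randomized trial.

    icc        -- unconditional intraclass correlation of the outcome
    r2_school  -- share of between-school variance explained by covariates
    r2_student -- share of within-school variance explained by covariates
    """
    z = NormalDist()
    multiplier = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)  # about 2.80 for 0.05 / 0.80
    scale = p_treated * (1 - p_treated) * schools
    variance = (icc * (1 - r2_school) / scale
                + (1 - icc) * (1 - r2_student) / (scale * students_per_school))
    return multiplier * math.sqrt(variance)


if __name__ == "__main__":
    # Hypothetical planning values.
    print(round(mdes_two_level(icc=0.20, students_per_school=20, schools=40), 3))
    print(round(mdes_two_level(icc=0.20, students_per_school=20, schools=40,
                               r2_school=0.5), 3))
```

Because the school-level term usually dominates, a larger ICC raises the MDES and school-level covariates that explain between-school variance lower it, which is why country-specific estimates of both quantities change the required number of schools.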


2020 ◽  
Vol 45 (4) ◽  
pp. 446-474
Author(s):  
Zuchao Shen ◽  
Benjamin Kelcey

Conventional optimal design frameworks consider a narrow range of sampling cost structures, which constricts their capacity to identify the most powerful and efficient designs. We relax several constraints of previous optimal design frameworks by allowing for variable sampling costs in cluster-randomized trials. The proposed framework introduces additional design considerations and has the potential to identify designs with more statistical power, even when some parameters are constrained due to immutable practical concerns. The results also suggest that the gains in efficiency introduced through the expanded framework are fairly robust to misspecifications of the expanded cost structure and concomitant design parameters (e.g., intraclass correlation coefficient). The proposed framework is implemented in the R package odr.
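For reference, the conventional baseline that this framework relaxes assumes one fixed cost per cluster and one per individual. The sketch below implements only that baseline: the closed-form optimal cluster size and an integer search that buys whole clusters within a budget. It is not the variable-cost framework and does not reproduce the API of the R package odr; the function names and cost values are hypothetical.

```python
import math


def closed_form_cluster_size(icc: float, cost_cluster: float, cost_person: float) -> float:
    """Conventional optimum with a fixed cost per cluster and per person."""
    return math.sqrt((1 - icc) / icc * cost_cluster / cost_person)


def best_integer_design(icc: float, cost_cluster: float, cost_person: float,
                        budget: float, max_size: int = 500) -> tuple[int, int]:
    """Search integer cluster sizes n; for each, buy as many whole clusters J as the
    budget allows and keep the (n, J) with the smallest treatment-effect variance."""
    best = None
    for n in range(1, max_size + 1):
        clusters = math.floor(budget / (cost_cluster + n * cost_person))
        if clusters < 2:  # need at least one treated and one control cluster
            break  # J only shrinks as n grows, so no later n can qualify
        variance = (icc + (1 - icc) / n) / clusters  # up to a constant factor
        if best is None or variance < best[0]:
            best = (variance, n, clusters)
    if best is None:
        raise ValueError("budget too small for two clusters")
    return best[1], best[2]


if __name__ == "__main__":
    # Hypothetical costs: 100 per cluster, 10 per individual, budget 10,000.
    print(closed_form_cluster_size(0.2, 100, 10))
    print(best_integer_design(0.2, 100, 10, 10_000))
```

The closed form shows the familiar pattern: a smaller ICC or a higher cluster-to-person cost ratio favors larger clusters. Letting costs vary with the design, as the abstract describes, changes this objective and is outside the sketch.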


2017 ◽  
Vol 44 (8) ◽  
pp. 1249-1256 ◽  
Author(s):  
Wineke Armbrust ◽  
G.J.F. Joyce Bos ◽  
Jan H.B. Geertzen ◽  
Pieter J.J. Sauer ◽  
Pieter U. Dijkstra ◽  
...  

Objective. (1) To determine the convergent validity of an activity diary (AD) and an accelerometer (Actical brand/Philips Respironics) in measuring physical activity (PA) in children with juvenile idiopathic arthritis (JIA). (2) To determine how many days of measurement give reliable results. (3) To analyze the effects of correcting accelerometer data for non-wear. Methods. Patients with JIA (8–13 yrs) were recruited from 3 Dutch pediatric rheumatology centers. PA was assessed for 7 days with an AD and an accelerometer, and was expressed as mean min/day of rest, light PA (LPA), and moderate to vigorous PA (MVPA), and as PA level (PAL). To analyze convergent validity, intraclass correlation coefficients (ICC) were calculated and paired-sample Student t tests were performed. The number of days required to achieve reliable results was calculated using the Spearman-Brown prophecy formula. Results. Convergent validity between AD and accelerometer was moderate for rest and PAL (ICC 0.41). ICCs for LPA and MVPA were < 0.24. The AD overestimated PAL and MVPA compared with the accelerometer. Wearing the accelerometer for 7–19 days gave reliable PA estimates at the group and individual levels. For the AD, 13–36 days were needed. Adjusting accelerometer data for non-wear resulted in a clinically relevant higher mean number of min/day spent in LPA (effect size 1.12), but not in MVPA (effect size 0.44). Conclusion. Convergent validity between AD and accelerometer is moderate to poor. In children with JIA, a 1-week assessment with an accelerometer is sufficient to measure PA (all levels) reliably. At the individual level and for clinical use, 3 weeks are required. Additional use of an AD enables correction of accelerometer data for non-wear.
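The Spearman-Brown prophecy formula projects the reliability of an average over k days from the reliability of a single day, and solving it for k gives the number of days needed. The sketch below shows that calculation. The target reliability of 0.80 is a common convention assumed here; the abstract does not state which target or single-day ICCs were used, and the example ICCs are hypothetical.

```python
import math


def spearman_brown(single_day_reliability: float, days: float) -> float:
    """Reliability of the mean of `days` days given the reliability of one day."""
    r = single_day_reliability
    return days * r / (1 + (days - 1) * r)


def days_needed(single_day_reliability: float, target: float = 0.80) -> int:
    """Smallest whole number of days whose mean reaches the target reliability
    (Spearman-Brown prophecy formula solved for the lengthening factor k)."""
    r = single_day_reliability
    k = target * (1 - r) / (r * (1 - target))
    return math.ceil(k - 1e-9)  # small tolerance guards against float round-up


if __name__ == "__main__":
    for r in (0.3, 0.5):  # hypothetical single-day ICCs
        k = days_needed(r)
        print(r, k, round(spearman_brown(r, k), 3))
```

A lower single-day ICC, as for the diary here, sharply increases the number of days required, which is consistent with the longer ranges reported for the AD than for the accelerometer.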


Symmetry ◽  
2021 ◽  
Vol 13 (10) ◽  
pp. 1890
Author(s):  
Kyle Davey ◽  
Paul Read ◽  
Joseph Coyne ◽  
Paul Jarvis ◽  
Anthony Turner ◽  
...  

The aims of the present study are to: (1) determine the within- and between-session reliability of multiple metrics obtained during the triple hop test; and (2) determine any systematic bias in both the test and inter-limb asymmetry scores for these metrics. Thirteen young male American football athletes performed three trials of a triple hop test on each leg on two separate occasions. In addition to the total distance hopped, touchdown and toe-off were identified manually via video analysis, enabling flight time (for each hop), ground contact time (GCT), reactive strength index (RSI), and leg stiffness (between hops) to be calculated. Results showed that all coefficient of variation (CV) values were ≤ 10.67% and that intraclass correlation coefficients (ICC) ranged from moderate to excellent (0.53–0.95) in both test sessions. Intrarater reliability was excellent for all metrics (CV ≤ 3.60%, ICC ≥ 0.97). No systematic bias was evident between test sessions for raw test scores (g = −0.34 to 0.32) or the magnitude of asymmetry (g = −0.19 to 0.43). However, ‘real’ changes in asymmetry (i.e., greater than the CV in session 1) were evident at the individual level for all metrics. For the direction of asymmetry, kappa coefficients revealed poor-to-fair agreement between test sessions for all metrics (K = −0.10 to 0.39), with the exception of the first hop (K = 0.69). These data show that, given the inherent limitations of distance jumped in the triple hop test, practitioners can confidently gather a range of reliable data when metrics are computed manually, provided sufficient test familiarization is conducted. In addition, although the magnitude of asymmetry appears to change only slightly between test sessions, limb dominance does appear to fluctuate between sessions, highlighting the value of also monitoring the direction of the imbalance.
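The abstract separates the magnitude of asymmetry from its direction and uses kappa for agreement on direction. The sketch below shows one common way to compute the trial CV, a percentage asymmetry with its direction, and Cohen's kappa across sessions. The asymmetry definition ((stronger − weaker) / stronger × 100) is an assumption, since the abstract does not give its formula, and it treats larger values as better, which suits distance but would need inverting for metrics such as GCT. Function names are hypothetical.

```python
from collections import Counter
from statistics import mean, stdev


def cv_percent(trials: list[float]) -> float:
    """Coefficient of variation (%) across repeated trials."""
    return stdev(trials) / mean(trials) * 100


def asymmetry(left: float, right: float) -> tuple[float, str]:
    """Magnitude (%) as (stronger - weaker) / stronger * 100, plus the favored limb.
    Assumes larger values are better (e.g., distance hopped)."""
    stronger, weaker = max(left, right), min(left, right)
    magnitude = (stronger - weaker) / stronger * 100
    if left == right:
        return magnitude, "none"
    return magnitude, "left" if left > right else "right"


def cohens_kappa(session1: list[str], session2: list[str]) -> float:
    """Chance-corrected agreement between two sessions' asymmetry directions."""
    n = len(session1)
    observed = sum(a == b for a, b in zip(session1, session2)) / n
    c1, c2 = Counter(session1), Counter(session2)
    expected = sum(c1[c] * c2[c] for c in set(c1) | set(c2)) / n ** 2
    return (observed - expected) / (1 - expected)  # undefined if expected == 1
```

Kappa can be low even when magnitudes are stable, because a small asymmetry can flip sides between sessions; this is the pattern the abstract describes.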


2006 ◽  
Vol 86 (5) ◽  
pp. 646-655 ◽  
Author(s):  
Ellinor Nordin ◽  
Erik Rosendahl ◽  
Lillemor Lundin-Olsson

Abstract Background and Purpose. It is unknown how cognitive impairment affects the reliability of Timed “Up & Go” Test (TUG) scores. The aim of the present study was to investigate the expected variability of TUG scores in older subjects dependent in activities of daily living (ADL) and with different levels of cognitive state. The hypothesis was that cognitive impairment would increase the variability of TUG scores. Subjects. Seventy-eight subjects with multiple impairments, dependent in ADL, and living in residential care facilities were included in this study. The subjects were 84.8±5.7 (mean±SD) years of age, and their Mini-Mental State Examination score was 18.7±5.6. Methods. The TUG assessments were performed on 3 different days. Intrarater and interrater analyses were carried out. Results. Cognitive impairment was not related to the size of the variability of TUG scores. There was a significant relationship between the variability and the time taken to perform the TUG. The intraclass correlations were greater than .90 and were similar within and between raters. In repeated measurements at the individual level, an observed value of 10 seconds was expected to vary from 7 to 15 seconds and an observed value of 40 seconds was expected to vary from 26 to 61 seconds for 95% of the observations. Discussion and Conclusion. The measurement error of a TUG assessment is substantial for a frail older person dependent in ADL, regardless of the level of cognitive function, when verbal cuing is permitted during testing. The variability increases with the time to perform the TUG. Despite high intraclass correlation coefficients, the ranges of expected variability can be wide and are similar within and between raters. Physical therapists should be aware of this variability before they interpret the TUG score for a particular individual.
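The abstract reports that variability grows with TUG time and gives asymmetric ranges (for example, 7 to 15 seconds around an observed 10 seconds). One common way to produce ranges like these is to model measurement error on the log scale, so that error is proportional to the time taken. The sketch below assumes that approach; the abstract does not say how the ranges were computed, and the trial data are hypothetical. The factor √2 reflects the difference between two measurements that each carry the within-subject error.

```python
import math
from statistics import mean


def within_subject_sd_log(trials_per_subject: list[list[float]]) -> float:
    """Pooled within-subject SD of log-transformed times.
    Each inner list holds one subject's repeated TUG times in seconds."""
    ss, df = 0.0, 0
    for trials in trials_per_subject:
        logs = [math.log(t) for t in trials]
        m = mean(logs)
        ss += sum((x - m) ** 2 for x in logs)
        df += len(logs) - 1
    return math.sqrt(ss / df)


def expected_range(observed: float, sw_log: float, z: float = 1.96) -> tuple[float, float]:
    """Range expected for a repeat measurement, symmetric on the log scale."""
    half_width = z * math.sqrt(2) * sw_log
    return observed * math.exp(-half_width), observed * math.exp(half_width)


if __name__ == "__main__":
    # Hypothetical repeated TUG times (seconds) for three subjects.
    trials = [[10.2, 11.5, 9.1], [38.0, 45.5, 33.0], [21.0, 18.4, 24.9]]
    sw = within_subject_sd_log(trials)
    print(expected_range(10.0, sw), expected_range(40.0, sw))
```

Under this model the absolute width of the range scales with the observed time, so a high ICC can coexist with wide individual ranges for slow performers.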


2012 ◽  
Vol 2012 ◽  
pp. 1-8 ◽  
Author(s):  
Donald D. Anderson ◽  
Neil A. Segal ◽  
Andrew M. Kern ◽  
Michael C. Nevitt ◽  
James C. Torner ◽  
...  

Recent findings suggest that contact stress is a potent predictor of subsequent symptomatic osteoarthritis development in the knee. However, much larger numbers of knees (likely on the order of hundreds, if not thousands) need to be reliably analyzed to achieve the statistical power necessary to clarify this relationship. This study assessed the reliability of new semiautomated computational methods for estimating contact stress in knees from large population-based cohorts. Ten knees of subjects from the Multicenter Osteoarthritis Study were included. Bone surfaces were manually segmented from sequential 1.0 Tesla magnetic resonance imaging slices by three individuals on two nonconsecutive days. Four individuals then registered the resulting bone surfaces to corresponding bone edges on weight-bearing radiographs, using a semiautomated algorithm. Discrete element analysis methods were used to estimate contact stress distributions for each knee. Segmentation and registration reliabilities (day-to-day and interrater) for peak and mean medial and lateral tibiofemoral contact stress were assessed with Shrout-Fleiss intraclass correlation coefficients (ICCs). The segmentation and registration steps of the modeling approach were found to have excellent day-to-day (ICC 0.93–0.99) and good interrater reliability (0.84–0.97). This approach for estimating compartment-specific tibiofemoral contact stress appears to be sufficiently reliable for use in large population-based cohorts.
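Shrout and Fleiss defined several ICC forms, and the abstract does not say which one was used. As an assumed example, the sketch below computes ICC(2,1) (two-way random effects, absolute agreement, single rater) from two-way ANOVA mean squares, with knees as targets and raters or days as columns. The function name and example data are hypothetical.

```python
def icc_2_1(ratings: list[list[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rater.
    ratings[i][j] is the value for target i (e.g., a knee) from rater or day j."""
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between targets
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between raters
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols  # residual

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error
                                   + k * (ms_cols - ms_error) / n)


if __name__ == "__main__":
    # Hypothetical peak contact stress (MPa) for four knees rated by two raters.
    print(icc_2_1([[3.1, 3.0], [4.2, 4.4], [2.5, 2.6], [5.0, 4.8]]))
```

Because this form penalizes systematic differences between raters (the ms_cols term), it is stricter than a consistency ICC such as ICC(3,1) on the same data.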


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Xiaofu Du ◽  
Le Fang ◽  
Jing Guo ◽  
Xiangyu Chen ◽  
Shuoci Su ◽  
...  

Abstract Spot urine (SU) collection is a convenient method commonly used for sodium estimation, but its validity in predicting 24-h urinary sodium (24-hUNa) excretion has not been thoroughly evaluated in the general population. The aim of this study was to comprehensively assess the validity of eight existing methods for predicting 24-hUNa excretion from SU samples among Chinese adults. We analyzed 1424 representative individuals aged 18 to 69 years. At the population level, we compared measured and estimated 24-hUNa by examining bias, correlation, intraclass correlation coefficients (ICCs), receiver operating characteristic (ROC) curves, and Bland–Altman plots; at the individual level, we analyzed relative and absolute differences and misclassification. The bias for all methods was significant (all p < 0.001); the smallest bias was −7.9 mmol for the Toft formula and the largest was −53.8 mmol for the Mage formula. Correlation coefficients were all less than 0.380, all formulas had an area under the ROC curve below 0.683, and the Bland–Altman plots indicated somewhat greater dispersion of the estimation biases at higher sodium levels regardless of the formula. For all eight methods, over one-third of individuals had relative differences > 40%, over 40% had absolute differences > 51.3 mmol/24 h (3 g/day NaCl), and misclassification rates (with 7, 10, and 13 g/day NaCl as cutoff points) were all over 65%. Because agreement between estimated and measured values was poor, caution is warranted when using these eight formulas to estimate sodium excretion for population surveillance in China, and the results do not support using SU to estimate sodium intake at the individual level given its poor classification performance.
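The individual-level threshold of 51.3 mmol/24 h corresponds to 3 g of NaCl (molar mass about 58.44 g/mol, one sodium ion per formula unit). The sketch below shows that conversion, Bland–Altman bias with 95% limits of agreement, and the share of individuals beyond a threshold. The sign convention (estimated minus measured) is an assumption, since the abstract does not state it; function names are hypothetical.

```python
from statistics import mean, stdev

NACL_MOLAR_MASS = 58.44  # g/mol


def nacl_grams_to_sodium_mmol(grams_nacl: float) -> float:
    """Sodium (mmol) contained in a given mass of NaCl (one Na per NaCl)."""
    return grams_nacl / NACL_MOLAR_MASS * 1000


def bland_altman(estimated: list[float], measured: list[float]) -> tuple[float, float, float]:
    """Mean bias (estimated - measured) and 95% limits of agreement."""
    diffs = [e - m for e, m in zip(estimated, measured)]
    bias, sd = mean(diffs), stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


def share_beyond(estimated: list[float], measured: list[float], threshold_mmol: float) -> float:
    """Proportion of individuals whose absolute difference exceeds the threshold."""
    diffs = [abs(e - m) for e, m in zip(estimated, measured)]
    return sum(d > threshold_mmol for d in diffs) / len(diffs)
```

A small mean bias can coexist with wide limits of agreement, which is why population-level bias alone does not justify individual-level use.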


Author(s):  
Pedro L. Valenzuela ◽  
Almudena Montalvo-Perez ◽  
Lidia B. Alejo ◽  
Mario Castellanos ◽  
Jaime Gil-Cabrera ◽  
...  

Purpose: Some power meters are available in both bilateral and unilateral versions. However, despite the popularity of the latter, their validity remains unknown. We aimed to analyze the validity of a unilateral pedal power meter for estimating actual (“bilateral”) power output (PO). Methods: Thirty-three male cyclists were assessed at different POs (steady cycling at 100–500 W, as well as all-out sprints), pedaling cadences (70, 85, and 100 repetitions·min⁻¹), and cycling positions (seated and standing). The PO estimated by a left-only power meter (Favero Assioma Uno) was compared with the actual PO computed by a bilateral power meter (Favero Assioma Duo), and the level of bilateral asymmetry (most- vs least-powerful leg) was also computed with the latter system. Results: Nonsignificant differences, high intraclass correlation coefficients (≥.90), and low coefficients of variation (consistently ≤5% except at low PO levels, i.e., 5%–7% at 100 W) were found between Favero Assioma Uno and Favero Assioma Duo. However, although a strong intraclass correlation coefficient (.995) was found between both legs, asymmetry values of 4% to 6% were found for all conditions except when pedaling at the lowest PO (100 W), at which asymmetry increased to 10% to 13%. Conclusions: Although cyclists tend to present some level of bilateral asymmetry during cycling (particularly at low PO), Favero Assioma Uno provides overall valid estimates of actual PO and is, therefore, an economical alternative to bilateral power meters. Caution is needed, however, when interpreting data at the individual level in cyclists with high levels of asymmetry.
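Why asymmetry matters for a left-only meter can be shown with simple arithmetic. Assuming the unilateral meter estimates total power by doubling left-leg power (an assumption; the abstract does not describe the estimation), the relative error is (L − R)/(L + R). With an asymmetry fraction a and a weaker left leg this equals −a/(2 − a), so a 10% asymmetry gives roughly a −5.3% error. The function names below are hypothetical.

```python
def asymmetry_pct(left_w: float, right_w: float) -> float:
    """(most powerful - least powerful) / most powerful * 100."""
    hi, lo = max(left_w, right_w), min(left_w, right_w)
    return (hi - lo) / hi * 100


def left_only_error_pct(left_w: float, right_w: float) -> float:
    """Percent error of a left-only estimate that doubles left-leg power,
    relative to the bilateral total (assumed estimation rule)."""
    actual = left_w + right_w
    estimate = 2 * left_w
    return (estimate - actual) / actual * 100
```

When some riders favor the left leg and others the right, errors of opposite sign can partly cancel in group means, while an individual with marked asymmetry keeps a systematic error.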


2019 ◽  
Vol 3 (Supplement_1) ◽  
Author(s):  
Leah Chapman ◽  
Scott Richardson ◽  
Lori Mcleod ◽  
Eric Rimm ◽  
Juliana Cohen

Abstract Objectives: Weighing individual plate waste provides reliable estimates of food intake by physically weighing individual food components to the nearest gram before and after a meal. Weighing aggregate, school-level food waste may be an inexpensive and less time-consuming alternative. However, it has not been determined whether aggregate plate waste is an accurate measure of individual plate waste. This pilot study therefore aimed to evaluate the accuracy of aggregate plate waste for quantifying food waste in a school cafeteria setting in comparison to individual weighed plate waste. Methods: This study took place in an urban, low-income school district in Massachusetts. Four elementary schools that shared two identical cafeterias and served the same foods each day participated in the study. Participating students in the four schools had similar demographic characteristics. Cafeterias were randomly assigned to either individual or aggregate plate waste measurements. Plate waste was collected for 4 days from approximately n = 850 students in each cafeteria on the same days. For individual plate waste, the % consumed was calculated for each food item on each student's tray. In the cafeteria with aggregate-level measurements, waste was separated by component (entrée, vegetable, fruit, and milk) and weighed to calculate the % consumed. Intraclass correlation coefficients (ICCs) were calculated to assess the agreement between aggregate plate waste and individual-level plate waste. Results: Agreement was excellent for entrées (ICC = 0.90) and vegetables (ICC = 0.78), but poor for milk (ICC = 0.22) and fruits (ICC = 0.23). The overall agreement for all four components combined was excellent (ICC = 0.75). Conclusions: Results suggest that aggregate plate waste may be a reasonable measure of individually weighed plate waste, but additional research is warranted. Funding Sources: A grant from Arbella Insurance funded the data collection.
The current analysis had no funding or support.
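The two measurement approaches compute "% consumed" at different levels. The sketch below, with hypothetical served and waste weights, contrasts the mean of per-tray percentages (each tray counts equally) with a percentage from pooled weights (larger servings count more). This weighting difference is one possible contributor to disagreement between the methods, offered as an illustration rather than a finding of the study; the function names are hypothetical.

```python
from statistics import mean


def pct_consumed(served_g: float, waste_g: float) -> float:
    """Percent of the served weight that was eaten."""
    return (served_g - waste_g) / served_g * 100


def mean_individual_pct(trays: list[tuple[float, float]]) -> float:
    """Average of per-tray % consumed; trays are (served_g, waste_g) pairs."""
    return mean(pct_consumed(s, w) for s, w in trays)


def aggregate_pct(trays: list[tuple[float, float]]) -> float:
    """% consumed from pooled served and waste weights."""
    served = sum(s for s, _ in trays)
    waste = sum(w for _, w in trays)
    return pct_consumed(served, waste)


if __name__ == "__main__":
    trays = [(100.0, 0.0), (200.0, 200.0)]  # hypothetical: one small clean tray, one large untouched
    print(mean_individual_pct(trays), aggregate_pct(trays))
```

The two summaries coincide only when serving sizes are equal or consumption is unrelated to serving size.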

