Knowing how effective an intervention, treatment, or manipulation is and increasing replication rates: accuracy in parameter estimation as a partial solution to the replication crisis

Author(s):  
Gjalt-Jorn Ygram Peters ◽  
Rik Crutzen

Although basing conclusions on confidence intervals for effect size estimates is preferred over relying on null hypothesis significance testing alone, confidence intervals in psychology are typically very wide. One reason may be a lack of easily applicable methods for planning studies to achieve sufficiently tight confidence intervals. This paper presents tables and freely accessible tools to facilitate planning studies for the desired accuracy in parameter estimation for a common effect size (Cohen's d). In addition, the importance of such accuracy is demonstrated using data from the Reproducibility Project: Psychology (RPP). It is shown that the sampling distribution of Cohen's d is very wide unless sample sizes are considerably larger than what is common in psychology studies. This means that effect size estimates can vary substantially from sample to sample, even with perfect replications. The RPP replications' confidence intervals for Cohen's d have widths of around 1 standard deviation (95% confidence interval from 1.05 to 1.39). Therefore, point estimates obtained in replications are likely to vary substantially from the estimates from earlier studies. The implication is that researchers in psychology, and funders, will have to get used to conducting considerably larger studies if they are to build a strong evidence base.
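A minimal sketch of this kind of accuracy-in-parameter-estimation planning, assuming equal group sizes and a 95% confidence level (the paper itself provides tables and an online tool; the helper functions below are illustrative, not the authors'): the exact confidence interval for Cohen's d is obtained by inverting the noncentral t distribution, and the per-group sample size is increased until the interval is no wider than a target width.

```python
import numpy as np
from scipy import stats, optimize

def d_confidence_interval(d, n1, n2, level=0.95):
    """Exact CI for Cohen's d, obtained by inverting the noncentral t distribution."""
    df = n1 + n2 - 2
    scale = np.sqrt(n1 * n2 / (n1 + n2))          # observed t = d * scale
    t_obs = d * scale
    alpha = 1 - level
    # Lower/upper noncentrality parameters whose tail areas at t_obs equal alpha/2.
    ncp_lo = optimize.brentq(lambda ncp: stats.nct.cdf(t_obs, df, ncp) - (1 - alpha / 2),
                             t_obs - 10, t_obs + 10)
    ncp_hi = optimize.brentq(lambda ncp: stats.nct.cdf(t_obs, df, ncp) - alpha / 2,
                             t_obs - 10, t_obs + 10)
    return ncp_lo / scale, ncp_hi / scale

def n_for_ci_width(d, target_width, level=0.95):
    """Smallest per-group n (equal groups) whose CI for d is no wider than target_width."""
    for n in range(4, 100_000):
        lo, hi = d_confidence_interval(d, n, n, level)
        if hi - lo <= target_width:
            return n
    raise ValueError("target width not reached")

# Planning example: how many participants per group shrink the 95% CI around
# d = 0.5 to half a standard deviation?
print(n_for_ci_width(d=0.5, target_width=0.5))
```

With a point estimate around d = 0.5, a 95% interval only half a standard deviation wide already requires group sizes in the low hundreds, which is why intervals from typical psychology samples are so wide.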

2017 ◽  
Author(s):  
Gjalt-Jorn Ygram Peters

The experimental method is one of the staple methodological tools of the scientific method and, as such, is prevalent in the psychological literature. It relies on the process of randomization to create equivalent groups. However, this procedure requires sufficiently large samples to succeed. In the current paper, we introduce tools that are based on the sampling distribution of Cohen's d and that enable computing the likelihood that randomization succeeded in creating equivalent groups, as well as the sample size required to achieve a desired likelihood of randomization success. The required sample sizes are considerable, and to illustrate this, we compute the likelihood of randomization failure using data from the Reproducibility Project: Psychology. It is shown that many original studies, as well as many replications, likely failed to create equivalent groups. For the replications, the mean likelihood of randomization failure was 44.54% (with a 95% confidence interval of [35.03%; 54.05%]) in the most liberal scenario, and 100% in the most conservative scenario. This means that many studies were in fact not experiments: the observed effects were at best conditional upon the values of unknown confounders, and at worst biased. In any case, replication is unlikely when the randomization procedure failed to generate equivalent groups in either the original study or the replication. The consequence is that researchers in psychology, as well as the funders of psychological research, will have to get used to conducting considerably larger studies if they are to build a strong evidence base.
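The central idea can be sketched as follows, under simplifying assumptions (equal group sizes, independent nuisance variables); this is an illustration of the logic, not the authors' exact computation, which their paper and accompanying tools implement: under pure randomization, the standardized difference between the two groups on any given nuisance variable follows the null sampling distribution of Cohen's d.

```python
import numpy as np
from scipy import stats

def p_imbalance(n_per_group, delta):
    """P(|d| >= delta) on one covariate under pure randomization (true d = 0)."""
    df = 2 * n_per_group - 2
    scale = np.sqrt(n_per_group / 2)          # t = d * sqrt(n1*n2/(n1+n2))
    return 2 * stats.t.sf(delta * scale, df)

def p_any_imbalance(n_per_group, delta, k_covariates):
    """P(at least one of k independent covariates is imbalanced by >= delta SD)."""
    p = p_imbalance(n_per_group, delta)
    return 1 - (1 - p) ** k_covariates

# With 20 participants per group, the chance that at least one of, say, 10
# independent nuisance variables differs between groups by 0.5 SD or more:
print(round(p_any_imbalance(20, delta=0.5, k_covariates=10), 2))
```

With 20 participants per group the printed probability is roughly 0.7, illustrating why the required sample sizes are considerable.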


2021 ◽  
Vol 10 (2) ◽  
pp. 59-64
Author(s):  
Hugo Salazar ◽  
Franc Garcia ◽  
Luka Svilar ◽  
Julen Castellano ◽  
...  

The goal of this study was to compare the physical demands placed on the same team in three different basketball competitions (the EBA league (EBA), the U18 regional league (U18L), and a U18 international tournament (U18T)) during the same season. Data from eleven U18 players (age: 16.92 ± 0.67 years) were collected using inertial movement units. As external load variables, Player Load (PL), accelerations (ACC), decelerations (DEC), changes of direction (COD), and jumps (JUMP) were expressed as total (t) and high-intensity (h) values. Analysis of variance (ANOVA) and effect sizes (ES, Cohen's d) with their respective 90% confidence intervals were used to identify differences between the competitions. U18T showed the highest values for PL, tACC, tDEC, hDEC, tCOD, tJUMP, and hJUMP (small to moderate ES). However, hACC and hCOD values were greater in EBA (small ES) than in U18L and U18T. In conclusion, the three competitions presented different external load demands for the same group of players. These data could help basketball coaches optimize the training process based on the competition in which their team plays. Furthermore, the data could also indicate the most suitable competition for players' development.
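For readers unfamiliar with this analysis pipeline, a brief sketch on hypothetical Player Load values (the numbers below are made up, not the study's data): a one-way ANOVA across the three competitions, followed by a pairwise Cohen's d with an approximate 90% confidence interval based on the large-sample standard error of d.

```python
import numpy as np
from scipy import stats

eba  = np.array([78, 82, 75, 88, 91, 84, 79, 86], float)   # hypothetical Player Load
u18l = np.array([72, 70, 77, 74, 69, 75, 71, 73], float)
u18t = np.array([90, 95, 88, 93, 97, 91, 89, 94], float)

print(stats.f_oneway(eba, u18l, u18t))                      # omnibus ANOVA

def cohens_d_ci(a, b, level=0.90):
    """Cohen's d between two groups with an approximate CI (large-sample SE)."""
    n1, n2 = len(a), len(b)
    sp = np.sqrt(((n1 - 1) * a.var(ddof=1) + (n2 - 1) * b.var(ddof=1)) / (n1 + n2 - 2))
    d = (a.mean() - b.mean()) / sp
    se = np.sqrt((n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2)))
    z = stats.norm.ppf(0.5 + level / 2)
    return d, (d - z * se, d + z * se)

print(cohens_d_ci(u18t, eba))                               # e.g., U18T vs EBA
```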


2005 ◽  
Vol 35 (1) ◽  
pp. 1-20 ◽  
Author(s):  
G. K. Huysamen

Criticisms of traditional null hypothesis significance testing (NHST) became more pronounced during the 1960s and reached a climax during the past decade. Among other criticisms, NHST says nothing about the size of the population parameter of interest, and its result is influenced by sample size. Estimation of confidence intervals around point estimates of the relevant parameters, model fitting, and Bayesian statistics represent some major departures from conventional NHST. Testing non-nil null hypotheses, determining the optimal sample size to uncover only substantively meaningful effect sizes, and reporting effect-size estimates may be regarded as minor extensions of NHST. Although there seems to be growing support for the estimation of confidence intervals around point estimates of the relevant parameters, it is unlikely that NHST-based procedures will disappear in the near future. In the meantime, it is widely accepted that effect-size estimates should be reported as a mandatory adjunct to conventional NHST results.
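As a small illustration of one of the "minor extensions" listed above, a non-nil null hypothesis replaces the usual zero benchmark with a substantively meaningful value; the scores and the benchmark of 100 below are hypothetical.

```python
import numpy as np
from scipy import stats

scores = np.array([103, 99, 108, 101, 97, 110, 105, 102, 100, 106], float)  # hypothetical
# Non-nil null: H0 states the population mean equals 100, not 0.
res = stats.ttest_1samp(scores, popmean=100)
print(res.statistic, res.pvalue)
```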


2020 ◽  
Author(s):  
Jörn Lötsch ◽  
Alfred Ultsch

Abstract Calculating the magnitude of treatment effects or of differences between two groups is a common task in quantitative science. Standard effect size measures based on differences, such as the commonly used Cohen's d, fail to capture treatment-related effects on the data when those effects are not reflected in the central tendency. "Impact" is a novel nonparametric measure of effect size obtained as the sum of two separate components: (i) the change in the central tendency of the group-specific data, normalized to the overall variability, and (ii) the difference in the probability density of the group-specific data. Results obtained on artificial data and empirical biomedical data showed that, owing to this additional component, Impact outperforms Cohen's d. It is shown that in a multivariate setting, while standard statistical analyses and Cohen's d are unable to identify effects that change the form of the data distribution, Impact correctly captures them. The proposed effect size measure shares the ability to detect such effects with machine learning algorithms. It is numerically stable even for degenerate distributions consisting of singular values. Therefore, the proposed effect size measure is particularly well suited for data science and for artificial intelligence-based knowledge discovery from big and heterogeneous data.
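Because the exact algorithm is defined in the paper itself, the following is only a rough sketch of the two-component idea on simulated data: the location term below is a median difference scaled by the pooled interquartile range, and the distribution term is one minus the overlap of kernel density estimates. Both are assumptions chosen for illustration and differ in detail from the published definition of Impact.

```python
import numpy as np
from scipy.stats import gaussian_kde, iqr

def impact_like(a, b, grid_size=512):
    """Toy two-component effect size: location change plus distribution-shape change."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    pooled = np.concatenate([a, b])
    spread = iqr(pooled) or pooled.std(ddof=1) or 1.0        # robust scale, guarded
    central = (np.median(b) - np.median(a)) / spread          # component (i)

    grid = np.linspace(pooled.min() - spread, pooled.max() + spread, grid_size)
    ka, kb = gaussian_kde(a)(grid), gaussian_kde(b)(grid)
    overlap = np.minimum(ka, kb).sum() * (grid[1] - grid[0])  # density overlap in [0, 1]
    shape = 1.0 - overlap                                      # component (ii)
    return abs(central) + shape

# Two groups with identical central tendency but different spread.
rng = np.random.default_rng(1)
a = rng.normal(0, 1, 200)
b = rng.normal(0, 3, 200)
print(impact_like(a, b))
```

In this example the two groups share the same mean and median, so Cohen's d is close to zero, while the shape component still registers the change in spread.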


SLEEP ◽  
2020 ◽  
Vol 43 (Supplement_1) ◽  
pp. A243-A243
Author(s):  
W Hevener ◽  
B Beine ◽  
J Woodruff ◽  
D Munafo ◽  
C Fernandez ◽  
...  

Abstract

Introduction: Clinical management of CPAP adherence remains an ongoing challenge. Behavioral and technical interventions such as patient outreach, coaching, troubleshooting, and resupply may be deployed to positively impact adherence. Previous authors have described adherence phenotypes that retrospectively categorize patients by discrete usage patterns. We design an AI model that predictively categorizes patients into previously studied adherence phenotypes, and we analyze the statistical significance and effect size of several types of interventions on subsequent CPAP adherence.

Methods: We collected a cross-sectional cohort of subjects (N = 13,917) with 455 days of daily CPAP usage data. Patient outreach notes and resupply data were temporally synchronized with daily CPAP usage. Each 30-day window of usage was categorized into one of four adherence phenotypes as defined by Aloia et al. (2008): Good Users, Variable Users, Occasional Attempters, and Non-Users. Cross-validation was used to train and evaluate a recurrent neural network model for predicting future adherence phenotypes from the dynamics of prior usage patterns. Two-sided 95% bootstrap confidence intervals and Cohen's d were used to analyze the significance and effect size of changes in usage behavior in the 30 days before and after administration of several resupply interventions.

Results: The AI model predicted the next 30-day adherence phenotype with an average of 90% sensitivity, 96% specificity, 95% accuracy, and a Cohen's kappa of 0.83. The model predicted the number of days of CPAP non-use, use under 4 hours, and use over 4 hours for the next 30 days with OLS regression R-squared values of 0.94, 0.88, and 0.95 against ground truth. Ten resupply interventions were associated with statistically significant increases in adherence and were ranked by effect size using Cohen's d. The most impactful were new cushions or masks, with a mean post-intervention increase in CPAP adherence of 7-14% observed in the Variable User, Occasional Attempter, and Non-User groups.

Conclusion: The AI model applied past CPAP usage data to predict future adherence phenotypes and usage with high sensitivity and specificity. We identified resupply interventions that were associated with significant increases in adherence for struggling patients. This work demonstrates a novel application of AI to aid clinicians in maintaining CPAP adherence.
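A minimal sketch of the pre/post comparison step (two-sided 95% bootstrap confidence interval plus Cohen's d) on simulated usage numbers; the data, the average improvement built into the simulation, and the paired form of d are assumptions for illustration, not values or choices taken from the study.

```python
import numpy as np

rng = np.random.default_rng(0)
pre  = rng.normal(3.5, 1.5, 500)            # hypothetical hours/night, 30 days before
post = pre + rng.normal(0.4, 1.0, 500)      # hypothetical usage after a resupply event

diff = post - pre
boot = np.array([rng.choice(diff, size=diff.size, replace=True).mean()
                 for _ in range(5000)])
ci = np.percentile(boot, [2.5, 97.5])       # two-sided 95% bootstrap CI for mean change
d = diff.mean() / diff.std(ddof=1)          # paired (within-subject) Cohen's d
print(f"mean change = {diff.mean():.2f} h, 95% CI [{ci[0]:.2f}, {ci[1]:.2f}], d = {d:.2f}")
```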


2007 ◽  
Vol 21 (2) ◽  
pp. 87-100 ◽  
Author(s):  
James M. Ferrin ◽  
Malachy Bishop ◽  
Timothy N. Tansey ◽  
Michael Frain ◽  
Elizabeth A. Swett ◽  
...  

2018 ◽  
Vol 30 (6) ◽  
pp. 779-789 ◽  
Author(s):  
Mary Sherman Mittelman ◽  
Panayiota Maria Papayannopoulou

Summary/Abstract: Our experience evaluating a museum program for people with dementia together with their family members demonstrated benefits for all participants. We hypothesized that participation in a chorus would also have positive effects, giving them an opportunity to share a stimulating and social activity that could improve their quality of life. We inaugurated a chorus for people with dementia and their family caregivers in 2011, which rehearses and performs regularly. Each person with dementia must be accompanied by a friend or family member and must commit to attending all rehearsals and the concert that follows. A pilot study included a structured assessment, take-home questionnaires, and focus groups. Analyses of pre-post scores were conducted; effect size was quantified using Cohen's d. Results showed that quality of life and communication with the other member of the dyad improved (effect size: Cohen's d between 0.32 and 0.72) for people with dementia; quality of life, social support, communication, and self-esteem improved (d between 0.29 and 0.68) for caregivers. Most participants stated that benefits included belonging to a group, having a normal activity together, and learning new skills. Participants attended rehearsals in spite of harsh weather conditions. The chorus has been rehearsing and performing together for more than 6 years, with participants contributing to its costs. Results of this pilot study suggest that people in the early to middle stages of dementia and their family members and friends can enjoy and learn from rehearsing and performing in concerts that also engage the wider community. It is essential to conduct additional, larger studies of the benefits of participating in a chorus, which may include improved quality of life and social support for all, and reduced cognitive decline among people with dementia.
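As a brief illustration of the pre-post effect-size computation mentioned above, on hypothetical quality-of-life scores (not the study's data); standardizing the mean change by the baseline standard deviation is one common convention for pre-post Cohen's d, and the authors' exact computation may differ.

```python
import numpy as np

pre  = np.array([30, 34, 28, 31, 35, 29, 33, 32, 27, 36], float)   # hypothetical QoL scores
post = np.array([33, 36, 30, 35, 38, 30, 36, 34, 29, 39], float)
d = (post.mean() - pre.mean()) / pre.std(ddof=1)   # mean change / baseline SD
print(round(d, 2))
```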


2019 ◽  
Vol 3 (4) ◽  
Author(s):  
Christopher R Brydges

Abstract

Background and Objectives: Researchers typically use Cohen's guidelines of Pearson's r = .10, .30, and .50, and Cohen's d = 0.20, 0.50, and 0.80 to interpret observed effect sizes as small, medium, or large, respectively. However, these guidelines were not based on quantitative estimates and are only recommended when field-specific estimates are unknown. This study investigated the distribution of effect sizes in both individual differences research and group differences research in gerontology to provide estimates of effect sizes in the field.

Research Design and Methods: Effect sizes (Pearson's r, Cohen's d, and Hedges' g) were extracted from meta-analyses published in 10 top-ranked gerontology journals. The 25th, 50th, and 75th percentile ranks were calculated for Pearson's r (individual differences) and Cohen's d or Hedges' g (group differences) values as indicators of small, medium, and large effects. A priori power analyses were conducted for sample size calculations given the observed effect size estimates.

Results: Effect sizes of Pearson's r = .12, .20, and .32 for individual differences research and Hedges' g = 0.16, 0.38, and 0.76 for group differences research were interpreted as small, medium, and large effects in gerontology.

Discussion and Implications: Cohen's guidelines appear to overestimate effect sizes in gerontology. Researchers are encouraged to use Pearson's r = .10, .20, and .30, and Cohen's d or Hedges' g = 0.15, 0.40, and 0.75 to interpret small, medium, and large effects in gerontology, and to recruit larger samples.
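The two steps of the method can be sketched as follows, using made-up effect sizes rather than the values extracted from the gerontology meta-analyses: field-specific benchmarks are the 25th, 50th, and 75th percentiles of the collected effect sizes, and an a priori power analysis then gives the per-group sample size needed to detect the medium benchmark.

```python
import numpy as np
from statsmodels.stats.power import TTestIndPower

# Hypothetical collection of standardized mean differences from meta-analyses.
effect_sizes = np.array([0.11, 0.18, 0.25, 0.31, 0.38, 0.44, 0.52, 0.67, 0.80, 0.95])
small, medium, large = np.percentile(effect_sizes, [25, 50, 75])
print(small, medium, large)

# Per-group n for 80% power at alpha = .05 (two-sided) to detect the medium benchmark.
n_per_group = TTestIndPower().solve_power(effect_size=medium, power=0.80, alpha=0.05)
print(round(n_per_group))
```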


2019 ◽  
Author(s):  
Adib Rifqi Setiawan

In this work, I investigate a question I was curious about: the implications for claims about student learning of choosing between two metrics. The metrics are the normalized gain g, which is the most common method used in Physics Education Research (PER), and the effect size Cohen's d, which is broadly used in Discipline-Based Education Research (DBER), including Biology Education Research (BER). Data for the analyses came from research on scientific literacy in physics and biology education courses at institutions across Indonesia. This work reveals that the bias in the normalized gain g can harm efforts to improve students' scientific literacy by misrepresenting the efficacy of teaching practices across populations of students and across institutions. It also recommends using the effect size Cohen's d to measure student learning, as a statistically more reliable way of quantifying it.
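For concreteness, the two metrics being compared can be computed as follows on hypothetical class-average pre/post test percentages (not the study's data): the normalized gain g = (post − pre) / (100 − pre) used in PER, and a pre/post Cohen's d standardized by the pooled standard deviation (one common convention; other choices exist).

```python
import numpy as np

pre  = np.array([42, 55, 38, 60, 47, 51, 35, 58], float)   # hypothetical % correct, pre-test
post = np.array([68, 74, 60, 81, 70, 69, 57, 80], float)   # hypothetical % correct, post-test

g = (post.mean() - pre.mean()) / (100 - pre.mean())         # class-average normalized gain
sp = np.sqrt((pre.var(ddof=1) + post.var(ddof=1)) / 2)      # pooled SD of pre and post
d = (post.mean() - pre.mean()) / sp                         # pre/post Cohen's d
print(round(g, 2), round(d, 2))
```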

