Lay people are unimpressed by the effect sizes typically reported in psychological science

2020 ◽  
Author(s):  
Jonathon McPhetres ◽  
Gordon Pennycook

It is recommended that researchers report effect sizes along with statistical results to aid in interpreting the magnitude of results. According to recent surveys of published research, psychologists typically find effect sizes ranging from r = .11 to r = .30. While these numbers may be informative for scientists, no research has examined how lay people perceive the range of effect sizes typically reported in psychological research. In two studies, we showed online participants (N = 1,204) graphs depicting a range of effect sizes in different formats. We demonstrate that lay people perceive psychological effects to be small, rather meaningless, and unconvincing. Even the largest effects we examined (corresponding to a Cohen’s d = .90), which are exceedingly uncommon in reality, were considered small-to-moderate in size by lay people. Science communicators and policymakers should consider this obstacle when attempting to communicate the effectiveness of research results.
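For readers who want to move between the two effect size metrics quoted in this abstract, the standard conversion between Pearson's r and Cohen's d can be sketched in a few lines of Python. This is an illustrative aid, not the authors' code, and the conversion assumes two groups of equal size; the numbers below simply re-express the values quoted above.

```python
import math

def r_to_d(r: float) -> float:
    """Convert Pearson's r to Cohen's d (assumes two equal-sized groups)."""
    return 2 * r / math.sqrt(1 - r ** 2)

def d_to_r(d: float) -> float:
    """Convert Cohen's d to a point-biserial r (same equal-n assumption)."""
    return d / math.sqrt(d ** 2 + 4)

# Re-expressing the values quoted in the abstract:
for r in (0.11, 0.30):
    print(f"r = {r:.2f} -> d = {r_to_d(r):.2f}")   # d ≈ 0.22 and 0.63
print(f"d = 0.90 -> r = {d_to_r(0.90):.2f}")       # r ≈ 0.41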

2005 ◽  
Vol 29 (5) ◽  
pp. 615-620 ◽  
Author(s):  
Marnie E. Rice ◽  
Grant T. Harris

2019 ◽  
Vol 3 (4) ◽  
Author(s):  
Christopher R Brydges

Abstract
Background and Objectives: Researchers typically use Cohen’s guidelines of Pearson’s r = .10, .30, and .50, and Cohen’s d = 0.20, 0.50, and 0.80 to interpret observed effect sizes as small, medium, or large, respectively. However, these guidelines were not based on quantitative estimates and are only recommended if field-specific estimates are unknown. This study investigated the distribution of effect sizes in both individual differences research and group differences research in gerontology to provide estimates of effect sizes in the field.
Research Design and Methods: Effect sizes (Pearson’s r, Cohen’s d, and Hedges’ g) were extracted from meta-analyses published in 10 top-ranked gerontology journals. The 25th, 50th, and 75th percentile ranks were calculated for Pearson’s r (individual differences) and Cohen’s d or Hedges’ g (group differences) values as indicators of small, medium, and large effects. A priori power analyses were conducted for sample size calculations given the observed effect size estimates.
Results: Effect sizes of Pearson’s r = .12, .20, and .32 for individual differences research and Hedges’ g = 0.16, 0.38, and 0.76 for group differences research were interpreted as small, medium, and large effects in gerontology.
Discussion and Implications: Cohen’s guidelines appear to overestimate effect sizes in gerontology. Researchers are encouraged to use Pearson’s r = .10, .20, and .30, and Cohen’s d or Hedges’ g = 0.15, 0.40, and 0.75 to interpret small, medium, and large effects in gerontology, and recruit larger samples.
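As a rough illustration of the "recruit larger samples" point, the sample size needed to detect a correlation at 80% power can be approximated with the standard Fisher z formula. This is a minimal sketch, not the authors' power-analysis code, and it assumes a two-tailed test at alpha = .05.

```python
import math
from scipy.stats import norm

def n_for_correlation(r: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate sample size needed to detect a correlation r
    (two-tailed) via the Fisher z transformation."""
    z_r = math.atanh(r)               # Fisher z transform of the target r
    z_a = norm.ppf(1 - alpha / 2)     # two-tailed critical value (≈ 1.96)
    z_b = norm.ppf(power)             # quantile for the desired power (≈ 0.84)
    return math.ceil(((z_a + z_b) / z_r) ** 2 + 3)

# Sample sizes implied by the gerontology-specific benchmarks above:
for r in (0.12, 0.20, 0.32):
    print(f"r = {r:.2f} -> n ≈ {n_for_correlation(r)}")   # ≈ 543, 194, 75
```

Note how steeply the required n grows as the target effect shrinks, which is the practical force behind the authors' recommendation.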


2018 ◽  
Author(s):  
Jonathon McPhetres

Concerns about the generalizability, veracity, and relevance of social psychological research resurface periodically within psychology. While many changes are being implemented to improve the integrity of published research and to clarify the publication record, less attention has been given to questions of relevance. In this short commentary, I offer my perspective on these questions and present some data from the website Reddit. The data show that people care greatly about psychological research: social psychology studies are among the most highly upvoted on the subreddit r/science. However, upvotes on Reddit are unrelated to the metrics researchers use to gauge importance (e.g., impact factor, journal rankings, and citations), suggesting a disconnect between what psychologists and lay audiences see as relevant. I interpret these data in light of the replication crisis and suggest that the spotlight on our field makes the need for reform more pressing. Whether we like it or not, people care about, share, and use psychological research in their lives, which means we should ensure that our findings are reported accurately and transparently.


2018 ◽  
Author(s):  
Richard Anthony Klein ◽  
Michelangelo Vianello ◽  
Fred Hasselman ◽  
Byron Gregory Adams ◽  
Reginald B. Adams ◽  
...  

We conducted preregistered replications of 28 classic and contemporary published findings, with protocols that were peer reviewed in advance, to examine variation in effect magnitudes across samples and settings. Each protocol was administered to approximately half of 125 samples and 15,305 total participants from 36 countries and territories. Using conventional statistical significance (p < .05), fifteen (54%) of the replications provided statistically significant evidence in the same direction as the original finding. With a strict significance criterion (p < .0001), fourteen (50%) provided such evidence, reflecting the extremely high-powered design. Seven (25%) of the replications had effect sizes larger than the original finding, and 21 (75%) had effect sizes smaller than the original finding. The median comparable Cohen’s d effect size was 0.60 for original findings and 0.15 for replications. Sixteen replications (57%) had small effect sizes (< 0.20), and nine (32%) were in the opposite direction from the original finding. Across settings, 11 (39%) showed significant heterogeneity using the Q statistic, and most of those were among the findings eliciting the largest overall effect sizes; only one effect that was near zero in the aggregate showed significant heterogeneity. Only one effect showed a Tau > 0.20, indicating moderate heterogeneity; nine others had a Tau near or slightly above 0.10, indicating slight heterogeneity. In moderation tests, very little heterogeneity was attributable to task order, administration in lab versus online, or exploratory comparisons of WEIRD versus less WEIRD cultures. Cumulatively, variability in observed effect sizes was attributable more to the effect being studied than to the sample or setting in which it was studied.
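For readers unfamiliar with the heterogeneity statistics cited here, Cochran's Q and the DerSimonian-Laird estimate of tau can be computed from per-sample effect estimates and their variances, as sketched below. The numbers in the example are toy values, not data from the project.

```python
import numpy as np

def q_and_tau(effects, variances):
    """Cochran's Q and the DerSimonian-Laird estimate of tau for one
    effect measured across several samples."""
    w = 1.0 / np.asarray(variances)            # inverse-variance weights
    y = np.asarray(effects)
    pooled = np.sum(w * y) / np.sum(w)         # weighted mean effect
    q = float(np.sum(w * (y - pooled) ** 2))   # Cochran's Q
    df = len(y) - 1
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - df) / c)              # DL estimator, floored at zero
    return q, float(np.sqrt(tau2))

# Toy values (not data from the project): per-sample d's and variances.
q, tau = q_and_tau([0.10, 0.25, 0.05, 0.40, 0.15],
                   [0.02, 0.03, 0.02, 0.04, 0.03])
print(f"Q = {q:.2f} on 4 df, tau = {tau:.3f}")
```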


2021 ◽  
Vol 5 (Supplement_1) ◽  
pp. 650-650 ◽  
Author(s):  
Sangwoo Ahn ◽  
Joel Anderson

Abstract
Given the lack of a cure for Alzheimer’s disease (AD), the number of people with AD is expected to surge unless its onset can be delayed. Although there have been efforts to examine the effects of single-domain neuroprotective interventions on cognition, no conclusive results have been found so far. Because of the multifactorial causes of AD, interventions combining multiple neuroprotective components may produce greater benefits. However, there are few comprehensive reviews evaluating the effects of multi-domain programs on cognition. Thus, the purpose of this systematic review was to evaluate the effects of currently available multi-component interventions on cognitive domains affected early in AD, such as global cognition, episodic memory, and executive function. The literature search was conducted using PubMed, CINAHL, Web of Science, Scopus, and PsycINFO up to September 2020. Of the 1,445 articles located, 17 met eligibility criteria (n = 10,056, mean age = 72.8 years). According to the Effective Public Health Practice Project Quality Assessment Tool for Quantitative Studies, 8 and 9 studies had strong and moderate overall quality, respectively. The effect sizes of each included study were calculated using Cohen’s d. Multi-component interventions comprising physical activity, cognitive exercise, cardioprotective nutrition, and/or cardiovascular health consultation/education exerted beneficial effects on cognition (very small to moderate effect sizes; Cohen’s d = 0.16 to 0.77). Clinically, health care providers should consider incorporating these elements to help stave off AD. There is a pressing need for researchers to identify optimally effective doses of neuroprotective multi-component interventions.
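The abstract notes that Cohen's d was calculated for each included study; for a two-group comparison, the usual pooled-SD formula looks like the sketch below. The means, SDs, and group sizes shown are hypothetical, not taken from any included study.

```python
import math

def cohens_d(m1, m2, sd1, sd2, n1, n2):
    """Cohen's d for two independent groups, using the pooled SD."""
    pooled = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2)
                       / (n1 + n2 - 2))
    return (m1 - m2) / pooled

# Hypothetical intervention-vs-control means on a cognition composite:
print(round(cohens_d(27.5, 26.3, 3.0, 3.2, 60, 60), 2))   # ≈ 0.39
```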


Author(s):  
Shaul Kimhi ◽  
Yohanan Eshel ◽  
Hadas Marciano ◽  
Bruria Adini

Considering the potential impact of COVID-19 on civil society, a longitudinal study was conducted to identify levels of distress, resilience, and subjective well-being in the population. The study is based on two repeated measurements, conducted at the end of the pandemic’s “first wave” and the beginning of its “second wave,” on a sample (n = 906) of Jewish Israeli respondents who completed an online questionnaire distributed by an Internet panel company. Three groups of indicators were assessed: signs of distress (sense of danger, distress symptoms, and perceived threats), resilience (individual, community, and national), and subjective well-being (well-being, hope, and morale). Results indicated the following: (a) a significant increase in the distress indicators of sense of danger, distress symptoms, and perceived threats (Cohen’s d = 0.614, 0.120, and 0.248, respectively); (b) a significant decrease in the resilience indicators of individual, community, and national resilience (Cohen’s d = 0.153, 0.428, and 0.793, respectively); and (c) a significant decrease in the subjective well-being indicators of well-being, hope, and morale (Cohen’s d = 0.116, 0.336, and 0.199, respectively). To conclude, COVID-19 had a severe, large-scale impact on civil society, leading to multidimensional damage and a marked decrease in the individual, community, and national resilience of the population.
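With two repeated measurements on the same respondents, one common convention is to divide the mean change score by the standard deviation of the change scores (sometimes written d_z). The sketch below illustrates that convention with simulated values; the paper's exact denominator may differ.

```python
import numpy as np

def repeated_measures_d(t1, t2):
    """Effect size for two repeated measurements on the same respondents:
    mean change divided by the SD of the change scores (often called d_z)."""
    diff = np.asarray(t2) - np.asarray(t1)
    return float(np.mean(diff) / np.std(diff, ddof=1))

# Simulated wave-1 and wave-2 scores for n = 906 respondents:
rng = np.random.default_rng(0)
wave1 = rng.normal(3.0, 1.0, size=906)
wave2 = wave1 + rng.normal(0.6, 1.0, size=906)   # simulated mean increase
print(round(repeated_measures_d(wave1, wave2), 2))  # ≈ 0.6
```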


2020 ◽  
Author(s):  
Kathleen Joyce Porter ◽  
Katherine E. Moon ◽  
Virginia T. LeBaron ◽  
Jamie Marie Zoellner

BACKGROUND: Addressing the modifiable health behaviors of cancer survivors is important in rural communities disproportionately impacted by cancer, such as those in Central Appalachia. Yet such efforts are limited, and existing interventions may not meet the needs of rural communities.
OBJECTIVE: To describe the development and proof-of-concept testing of weSurvive, a behavioral intervention for rural Appalachian cancer survivors.
METHODS: The ORBIT Model, a systematic model for designing behavioral interventions, informed the study design. An advisory team (n=10) of community stakeholders and researchers engaged in a participatory process to identify desirable features for an intervention targeting rural cancer survivors. The resulting multimodal, 13-week weSurvive intervention was tested with two cohorts of participants (n=12). Intervention components included in-person group classes and group and individualized telehealth calls. Indicators reflecting five feasibility domains (acceptability, demand, practicality, implementation, and limited efficacy) were measured using concurrent mixed methods. Pre-post changes and effect sizes were assessed for limited-efficacy data. Descriptive statistics and content analysis were used to summarize data for the other domains.
RESULTS: Participants reported high program satisfaction (acceptability). Indicators of demand included enrollment of cancer survivors with a variety of cancer types and the attrition (8%), recruitment (59%), and attendance (62%) rates. Dietary (59%) and physical activity (83%) behaviors were the most frequently chosen behavioral targets. However, findings indicate that participants did not fully engage with the action-planning activities, including setting specific goals. Implementation indicators showed 100% researcher fidelity to delivery and retention protocols, while practicality indicators highlighted participation barriers. Pre-post changes in limited-efficacy outcomes regarding cancer-specific beliefs/knowledge and behavior-specific self-efficacy, intentions, and behaviors were in the desired directions and demonstrated small to moderate effect sizes. Regarding dietary and physical activity behaviors, effect sizes for fruit and vegetable intake, snack foods, dietary fat, and minutes of moderate-to-vigorous activity were small (Cohen’s d = 0.00 to 0.32), while effect sizes for change in physical activity were small to medium (Cohen’s d = 0.22 to 0.45).
CONCLUSIONS: weSurvive has the potential to be a feasible intervention for rural Appalachian cancer survivors. It will be refined and further tested based on the study findings, which also provide recommendations for other behavioral interventions targeting rural cancer survivors. These include adding recruitment and engagement strategies to increase demand and practicality, as well as increasing accountability and motivation for participant involvement in self-monitoring activities through the use of technology (e.g., text messaging). Furthermore, this study highlights the importance of using a systematic model (e.g., the ORBIT framework) and small-scale proof-of-concept studies when adapting or developing behavioral interventions, as doing so identifies an intervention’s potential for feasibility and areas needing improvement before the more time- and resource-intensive efficacy testing.
CLINICALTRIAL: N/A; not a randomized controlled trial.
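With cohorts this small (n=12), Cohen's d is known to be upwardly biased, which is why many reports prefer Hedges' g. The correction factor is a one-liner, sketched below for a two-group layout with hypothetical numbers; a pre-post design like this study's would use df = n - 1 instead.

```python
def hedges_g(d, n1, n2):
    """Hedges' g: Cohen's d shrunk by the small-sample correction J.
    Shown for two independent groups; a pre-post design would use
    df = n - 1 instead of n1 + n2 - 2."""
    df = n1 + n2 - 2
    j = 1 - 3 / (4 * df - 1)        # common approximation to the exact J
    return j * d

# With pilot-sized cohorts the correction is visible:
print(round(hedges_g(0.45, 6, 6), 2))   # hypothetical d = 0.45 -> g ≈ 0.42
```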


2017 ◽  
Author(s):  
Coosje Lisabet Sterre Veldkamp

THE HUMAN FALLIBILITY OF SCIENTISTS: Dealing with error and bias in academic research.
Recent studies have highlighted that not all published findings in the scientific literature are trustworthy, suggesting that currently implemented control mechanisms, such as high standards for the reporting of research methods and results, peer review, and replication, are not sufficient. In psychology in particular, solutions are being sought to deal with the poor reproducibility and replicability of research results. In this dissertation project I considered these problems from the perspective that the scientific enterprise must better recognize the human fallibility of scientists, and I examined potential solutions aimed at dealing with human error and bias in psychological science.
First, I studied whether the human fallibility of scientists is actually recognized (Chapter 2). I examined the degree to which scientists and lay people believe in the storybook image of the scientist: the image that scientists are more objective, rational, open-minded, intelligent, honest, and communal than other human beings. The results suggested that belief in this storybook image is strong, particularly among scientists themselves. In addition, I found indications that scientists believe that scientists like themselves fit the storybook image better than other scientists do. I consider scientists’ lack of acknowledgement of their own fallibility problematic, because critical self-reflection is the first line of defense against potential human error aggravated by confirmation bias, hindsight bias, motivated reasoning, and other human cognitive biases that could affect any professional in their work.
Then I zoomed in on psychological science and focused on human error in the use of the most widely used statistical framework in psychology: null hypothesis significance testing (NHST). In Chapters 3 and 4, I examined the prevalence of errors in the reporting of statistical results in published articles and evaluated a potential best practice to reduce such errors: the so-called ‘co-pilot model’ of statistical analysis. This model entails a simple code of conduct prescribing that statistical analyses are always conducted independently by at least two persons (typically co-authors). Using statcheck, a software package that can quickly retrieve and check statistical results in large sets of published articles, I replicated the alarmingly high error rates found in earlier studies. Although I did not find support for the effectiveness of the co-pilot model in reducing these errors, I proposed several ways to deal with human error in (psychological) research and suggested how the effectiveness of the proposed practices might be studied in future research.
Finally, I turned to the risk of bias in psychological science. Psychological data can often be analyzed in many different ways, and the often arbitrary choices that researchers face in analyzing their data are called researcher degrees of freedom. Researchers might be tempted to use these researcher degrees of freedom opportunistically in their pursuit of statistical significance (often called p-hacking). This is problematic because it renders research results unreliable. In Chapter 5 I presented a list of researcher degrees of freedom in psychological studies, focusing on the use of NHST.
This list can be used to assess the potential for bias in psychological studies, in research methods education, and to examine the effectiveness of a potential solution for restricting opportunistic use of researcher degrees of freedom: study pre-registration. Pre-registration requires researchers to stipulate in advance the research hypothesis, the data collection plan, the data analyses, and what will be reported in the paper. Different forms of pre-registration are currently emerging in psychology, varying mainly in the level of detail of the research plan they require researchers to provide. In Chapter 6, I assessed the extent to which current pre-registrations restricted opportunistic use of the researcher degrees of freedom on the list presented in Chapter 5. We found that most pre-registrations were not sufficiently restrictive, but that those written following better guidelines and requirements restricted opportunistic use of researcher degrees of freedom considerably better than basic pre-registrations written following a limited set of guidelines and requirements. We concluded that better instructions, specific questions, and stricter requirements are necessary for pre-registrations to do what they are supposed to do: protect researchers from their own biases.
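The statcheck step mentioned above lends itself to a small illustration: the core idea is to recompute a p-value from the reported test statistic and degrees of freedom and flag mismatches. The function below is a toy version for t tests only and is not statcheck's actual API; the real package parses APA-formatted results from full texts, covers many test types, and allows for rounding.

```python
from scipy import stats

def significance_claim_holds(t: float, df: int, alpha: float = 0.05) -> bool:
    """Recompute the two-tailed p for a reported t(df) and check whether a
    claim of 'p < .05' survives recomputation. A toy sketch of the kind of
    consistency check statcheck automates."""
    recomputed_p = 2 * stats.t.sf(abs(t), df)   # two-tailed p from t and df
    return recomputed_p < alpha

# The two-tailed .05 critical value at df = 28 is about 2.048, so:
print(significance_claim_holds(2.10, 28))  # True:  recomputed p ≈ .045
print(significance_claim_holds(2.00, 28))  # False: recomputed p ≈ .055, flagged
```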

