P-value, Confidence Intervals and Statistical Inference: A New Dataset of Misinterpretation

2017
Author(s): Ziyang Lyu, Kaiping Peng, Chuan-Peng Hu

Previous surveys have shown that most students and researchers in psychology misinterpret P-values and confidence intervals (CIs), yet presenting results as CIs may help them make better statistical inferences. In this data report, we describe a dataset of 362 valid responses from students and researchers in China that replicates these misinterpretations. Part of these data was reported in [Hu, C.-P., Wang, F., Guo, J., Song, M., Sui, J., & Peng, K. (2016). The replication crisis in psychological research (in Chinese). Advances in Psychological Science, 24(9), 1504–1518. doi:10.3724/SP.J.1042.2016.01504]. This dataset can be used for educational purposes. It can also serve as pilot data for future studies on the relationship between the understanding of P-values/CIs and statistical inference based on P-values/CIs.
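The frequentist reading of a 95% CI that respondents in such surveys most often get wrong is a statement about the procedure, not about any single interval. A minimal simulation sketch (parameters are illustrative, not from the dataset) makes this concrete: across repeated samples, roughly 95% of the computed intervals cover the true mean, while any single interval either does or does not.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
true_mean, sigma, n, reps = 100.0, 15.0, 30, 10_000

covered = 0
for _ in range(reps):
    sample = rng.normal(true_mean, sigma, size=n)
    m, se = sample.mean(), sample.std(ddof=1) / np.sqrt(n)
    t_crit = stats.t.ppf(0.975, df=n - 1)      # two-sided 95% t interval
    lo, hi = m - t_crit * se, m + t_crit * se
    covered += (lo <= true_mean <= hi)

print(f"Coverage over {reps} replications: {covered / reps:.3f}")  # ~0.95
```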

2021
Author(s): Azad Rasul

Abstract: Most transmissible diseases appear in a specific season, and the effect of climate on COVID-19 is of special interest. This study investigated the relationship between climatic variables and the R0 of COVID-19 cases in one hundred areas around the world. Daily confirmed COVID-19 cases and daily climatic data for each area from January 2020 to March 2021 were used. Geographically weighted regression (GWR) and multiple linear regression (MLR) were used to identify the relationship between the R0 of COVID-19 cases and climatic variables. The MLR results showed a weak but significant (p-value < 0.05) inverse relationship between R0 and wind speed, and a significant (p-value < 0.01) positive relationship with precipitation; that is, fewer COVID-19 cases were recorded with high wind speed and low precipitation. Based on GWR, the relationship between the R0 of COVID-19 infection and the principal climatic variables was statistically significant under a Monte Carlo p-value test, and the effect of climatic variables on COVID-19 infection appears to vary geographically. However, besides climatic variables, many socio-economic factors could influence virus transmission and will be considered in future studies.
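The abstract names MLR without giving the model specification. As a hedged sketch, a multiple linear regression of this kind could look as follows with statsmodels; the file name and the columns r0, wind_speed, and precipitation are assumptions for illustration, and the GWR step would additionally require a spatial library and area coordinates.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical input: one row per area-day with R0 and climatic variables.
# File and column names (r0, wind_speed, precipitation) are illustrative.
df = pd.read_csv("covid_climate.csv")

model = smf.ols("r0 ~ wind_speed + precipitation", data=df).fit()
print(model.summary())
# The abstract reports a weak negative coefficient for wind speed (p < 0.05)
# and a positive coefficient for precipitation (p < 0.01).
```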


Author(s): Valentin Amrhein, David Trafimow, Sander Greenland

Statistical inference often fails to replicate. One reason is that many results may be selected for drawing inference because some threshold of a statistic like the P-value was crossed, leading to biased reported effect sizes. Nonetheless, considerable non-replication is to be expected even without selective reporting, and generalizations from single studies are rarely if ever warranted. Honestly reported results must vary from replication to replication because of varying assumption violations and random variation; excessive agreement itself would suggest deeper problems, such as failure to publish results in conflict with group expectations or desires. A general perception of a "replication crisis" may thus reflect failure to recognize that statistical tests not only test hypotheses, but countless assumptions and the entire environment in which research takes place. Because of all the uncertain and unknown assumptions that underpin statistical inferences, we should treat inferential statistics as highly unstable local descriptions of relations between assumptions and data, rather than as generalizable inferences about hypotheses or models. And that means we should treat statistical results as being much more incomplete and uncertain than is currently the norm. Acknowledging this uncertainty could help reduce the allure of selective reporting: Since a small P-value could be large in a replication study, and a large P-value could be small, there is simply no need to selectively report studies based on statistical results. Rather than focusing our study reports on uncertain conclusions, we should thus focus on describing accurately how the study was conducted, what problems occurred, what data were obtained, what analysis methods were used and why, and what output those methods produced.
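The point that a small P-value could be large in a replication is easy to demonstrate by simulation. The sketch below (parameters chosen for illustration, not taken from the paper) runs the same two-group study twenty times with an identical true effect; the resulting P-values typically span from below .01 to above .5.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
effect, n, reps = 0.4, 30, 20   # true standardized effect, per-group sample size

pvals = []
for _ in range(reps):
    a = rng.normal(0.0, 1.0, n)        # control group
    b = rng.normal(effect, 1.0, n)     # treatment group, same true effect every run
    pvals.append(stats.ttest_ind(a, b).pvalue)

print([f"{p:.3f}" for p in sorted(pvals)])
# Same design, same true effect: the P-values scatter across orders of magnitude.
```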


2021
Author(s): Michele B. Nuijten

Increasing evidence indicates that many published findings in psychology may be overestimated or even false. An often-heard response to this “replication crisis” is to replicate more: replication studies should weed out false positives over time and increase the robustness of psychological science. However, replications take time and money, resources that are often scarce. In this chapter, I propose an efficient alternative strategy: a four-step robustness check that first focuses on verifying reported numbers through reanalysis before replicating studies in a new sample.
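The first and cheapest step of such a check, verifying reported numbers, often requires nothing more than recomputing a p-value from the reported test statistic and degrees of freedom. A minimal sketch of that idea (the function and tolerance are illustrative, not the chapter's procedure):

```python
from scipy import stats

def check_t_report(t: float, df: int, reported_p: float, tol: float = 0.005) -> bool:
    """Recompute the two-tailed p for a reported t(df) and flag mismatches."""
    recomputed = 2 * stats.t.sf(abs(t), df)
    consistent = abs(recomputed - reported_p) <= tol
    print(f"t({df}) = {t}: recomputed p = {recomputed:.4f}, reported p = {reported_p}")
    return consistent

# A report of "t(28) = 2.20, p = .04" checks out (recomputed p ~ .036)...
check_t_report(2.20, 28, 0.04)
# ...while "t(28) = 1.70, p = .03" would be flagged (recomputed p ~ .10).
check_t_report(1.70, 28, 0.03)
```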


2018
Author(s): Jonathon McPhetres

Concerns about the generalizability, veracity, and relevance of social psychological research often resurface within psychology. While many changes are being implemented to improve the integrity of published research and to clarify the publication record, less attention has been given to questions of relevance. In this short commentary, I offer my perspective on questions of relevance and present some data from the website Reddit. The data show that people care greatly about psychological research: social psychology studies are among the highest upvoted on the subreddit r/science. However, upvotes on Reddit are unrelated to metrics used by researchers to gauge importance (e.g., impact factor, journal rankings, and citations), suggesting a disconnect between what psychologists and lay audiences may see as relevant. I interpret these data in light of the replication crisis and suggest that the spotlight on our field puts greater importance on the need for reform. Whether we like it or not, people care about, share, and use psychological research in their lives, which means we should ensure that our findings are reported accurately and transparently.


2018
Author(s): Valentin Amrhein, David Trafimow, Sander Greenland

Statistical inference often fails to replicate. One reason is that many results may be selected for drawing inference because some threshold of a statistic like the P-value was crossed, leading to biased effect sizes. Nonetheless, considerable non-replication is to be expected even without selective reporting, and generalizations from single studies are rarely if ever warranted. Honestly reported results must vary from replication to replication because of varying assumption violations and random variation; excessive agreement itself would suggest deeper problems, such as failure to publish results in conflict with group expectations or desires. A general perception of a "replication crisis" may thus reflect failure to recognize that statistical tests not only test hypotheses, but countless assumptions and the entire environment in which research takes place. Because of all the uncertain and unknown assumptions that underpin statistical inferences, we should treat inferential statistics as highly unstable local descriptions of relations between assumptions and data, rather than as generalizable inferences about hypotheses or models. And that means we should treat statistical results as being much more incomplete and uncertain than is currently the norm. Acknowledging this uncertainty could help reduce the allure of selective reporting: Since a small P-value could be large in a replication study, and a large P-value could be small, there is simply no need to selectively report studies based on statistical results. Rather than focusing our study reports on uncertain conclusions, we should thus focus on describing accurately how the study was conducted, what data resulted, what analysis methods were used and why, and what problems occurred.


2018
Vol 32 (7), pp. 245
Author(s): Nella Mutia Arwin, Suyud Suyud

Pesticide exposure and anemia incidence among horticultural farmers in Cikajang district, Garut, in 2016

Purpose: This study aimed to determine the relationship between pesticide exposure and anemia.

Methods: A cross-sectional design was used. The population was male horticulture farmers domiciled in Cikajang, Garut. A total of 106 farmers were selected as the sample, and blood samples were taken to determine hemoglobin (Hb) concentration.

Results: The farmers' average Hb was 16.65 g/dL. Bivariate analysis showed no association between the incidence of anemia and the components of pesticide exposure: working period (p = 0.440; OR = 1.944; 95% CI: 0.51 to 7.325), duration of spraying (p = 1.000), spraying time (p = 1.000), spraying frequency (p = 1.000; OR = 0.698; 95% CI: 0.091 to 5.334), pesticide dose (p = 1.000; OR = 1.244; 95% CI: 0.164 to 9.444), and mixing of pesticides (p = 1.000; OR = 1.337; 95% CI: 0.176 to 10.181). Multivariate analysis showed that working period was the dominant factor affecting the incidence of anemia in horticulture farmers.

Conclusion: Pesticide exposure was not associated with anemia. Future studies should therefore include appropriate biomarker testing to link pesticide exposure to its biological effects on farmers' health.
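The bivariate results above are odds ratios with 95% confidence intervals computed from 2x2 tables. A minimal sketch of one common way to obtain such an OR and a Wald-type CI, using made-up counts because the abstract reports only the summary statistics:

```python
import numpy as np

# Hypothetical 2x2 table: rows = exposure level, columns = anemic / not anemic.
# Counts are illustrative, not the study's data.
a, b = 7, 43    # long working period: anemic, not anemic
c, d = 2, 54    # short working period: anemic, not anemic

or_ = (a * d) / (b * c)
se_log_or = np.sqrt(1/a + 1/b + 1/c + 1/d)       # SE of log(OR), Wald method
lo = np.exp(np.log(or_) - 1.96 * se_log_or)
hi = np.exp(np.log(or_) + 1.96 * se_log_or)
print(f"OR = {or_:.3f}, 95% CI: {lo:.3f} to {hi:.3f}")
```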


2020
Vol 79 (Suppl 1), pp. 1198.1-1199
Author(s): C. Thurston, R. Tribbick, J. Kerns, F. Dondelinger, M. Bukhari

Background: A decreased body mass index (BMI) is associated with poorer bone health, a decreased bone mineral density (BMD), and an increased fracture risk. Cardiovascular (CVS) data have shown that the waist:hip ratio is a more robust measurement for CVS outcomes than BMI (1). The waist:hip ratio has never been evaluated as an outcome measure for bone health. Dual-energy x-ray absorptiometry (DEXA) can measure average percentage fat in the L1-L4 region and at the hip, which relates directly to the measurement of the waist:hip ratio.

Objectives: To evaluate the relationship between BMD and average percent fat in a cohort referred for DEXA scanning.

Methods: We analysed data routinely collected from patients referred for DEXA between 2004 and 2010 at the Royal Lancaster Infirmary in the North of England. Data collected for these patients included DEXA scans of BMD at the left and right hip and at the lumbar spine, as well as average percent fat and other risk factors for osteoporosis, including the FRAX risk factors. We used only the measures collected at baseline (time of first scan). We modelled the T scores of the BMD measurements using a linear regression model with percentage fat and BMI as explanatory variables, adjusting for gender, age at scan, and other known risk factors for osteoporosis, including the FRAX risk factors. BMI and average percent fat were standardised.

Results: 33,037 patients were included (82% female). Standardised effect size estimates for average percent fat and BMI from both regression models are shown in Table 1.

Table 1.
Anatomical location | Effect size for average percent fat (95% CI) | P value | Effect size for BMI (95% CI) | P value
Left neck | -0.156 (-0.171, -0.141) | <0.001 | -0.0255 (-0.0441, -0.00701) | 0.00692
Left total | -0.225 (-0.241, -0.208) | <0.001 | -0.0680 (-0.0882, -0.0477) | <0.001
Left Ward's | -0.181 (-0.196, -0.166) | <0.001 | -0.0268 (-0.0456, -0.00813) | 0.00493
Left trochanter | -0.263 (-0.281, -0.246) | <0.001 | -0.0667 (-0.0882, -0.0451) | <0.001
Right neck | -0.139 (-0.154, -0.124) | <0.001 | -0.0131 (-0.0317, 0.00549) | 0.167
Right total | -0.221 (-0.237, -0.204) | <0.001 | -0.0611 (-0.0811, -0.0411) | <0.001
Right Ward's | -0.180 (-0.196, -0.165) | <0.001 | -0.0193 (-0.0381, -0.000586) | 0.0433
Right trochanter | -0.261 (-0.278, -0.243) | <0.001 | -0.0598 (-0.0810, -0.0386) | <0.001
Spine (averaged L1-L4) | 0.219 (0.195, 0.242) | <0.001 | -0.00846 (-0.0379, 0.0206) | 0.563

Conclusion: Average percent fat is a statistically significant predictor of BMD at the different anatomical locations, and a stronger predictor than BMI when both are evaluated in the same model. In the right hip neck and the spine, BMI was not predictive of changes in BMD. Higher average percent fat increases BMD in the spine, compared to a decline at the hip. Further research is needed to characterise the relationship more precisely and to identify whether there is a causal link.

References: [1] Obes Rev. 2012 Mar;13(3):275-86. doi:10.1111/j.1467-789X.2011.00952.x

Disclosure of Interests: None declared
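A hedged sketch of the modelling step described in the Methods, with hypothetical file and column names; percent fat and BMI are z-standardised so their coefficients are directly comparable, and gender and age at scan stand in here for the fuller FRAX adjustment set:

```python
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("dexa_baseline.csv")  # hypothetical file; one row per patient

# Standardise so the percent-fat and BMI coefficients are comparable effect sizes.
for col in ["percent_fat", "bmi"]:
    df[col + "_z"] = (df[col] - df[col].mean()) / df[col].std()

# One model per anatomical site; here the left total hip T score.
model = smf.ols(
    "t_score_left_total ~ percent_fat_z + bmi_z + C(gender) + age_at_scan",
    data=df,
).fit()
print(model.params[["percent_fat_z", "bmi_z"]])
print(model.pvalues[["percent_fat_z", "bmi_z"]])
```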


2018
Author(s): Jan Peter De Ruiter

Benjamin et al. (2017) proposed improving the reproducibility of findings in psychological research by lowering the alpha level of our conventional null hypothesis significance tests from .05 to .005, because findings with p-values close to .05 represent insufficient empirical evidence. They argued that findings with a p-value between .005 and .05 should still be published, but no longer be called “significant”. This proposal was criticized and rejected in a response by Lakens et al. (2018), who argued that instead of lowering the traditional alpha threshold to .005, we should stop using the term “statistically significant” and require researchers to determine and justify their alpha levels before they collect data. In this contribution, I argue that the arguments presented by Lakens et al. against the proposal by Benjamin et al. (2017) are not convincing. Given that it is highly unlikely that our field will abandon the NHST paradigm any time soon, lowering our alpha level to .005 is at this moment the best way to combat the replication crisis in psychology.
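One practical consequence of the Benjamin et al. proposal is the larger sample needed to keep power constant at the stricter threshold. A sketch of that trade-off (a generic power calculation, not taken from either paper) for a two-sample t test at a medium effect size:

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
for alpha in (0.05, 0.005):
    # Solve for the per-group sample size at 80% power, Cohen's d = 0.5.
    n = analysis.solve_power(effect_size=0.5, power=0.8, alpha=alpha)
    print(f"alpha = {alpha}: n per group ~ {n:.0f}")
# Roughly 64 per group at alpha = .05 versus roughly 109 at alpha = .005,
# i.e. about a 70% increase in sample size for the same design.
```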


2018
Author(s): Michele B. Nuijten

SUMMARY OF DOCTORAL DISSERTATION: Psychology is facing a “replication crisis”: many psychological findings could not be replicated in novel samples, which has led to the growing concern that many published findings are overly optimistic or even false. In this dissertation, we investigated potential indicators of problems in the published psychological literature. In Part I, we looked at inconsistencies in reported statistical results in published psychology papers. To facilitate our research, we developed the free tool statcheck, a “spellchecker” for statistics. In Part II, we investigated bias in published effect sizes. We showed that in the presence of publication bias, the overestimation of effects can become worse when studies are combined. Indeed, in meta-analyses from the social sciences we found strong evidence that published effects are overestimated. These are worrying findings, and it is important to think about concrete solutions to improve the quality of psychological research. Among the solutions we propose are preregistration, replication, and transparency. We argue that to select the best strategies to improve psychological science, we need research on research: meta-research.
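The Part II claim, that publication bias inflates pooled effects, can be illustrated with a small simulation (parameters are illustrative, not the dissertation's): when only significant studies are “published”, their average effect size estimate ends up several times larger than the small true effect.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
true_d, n, reps = 0.2, 20, 5_000   # small true effect, small per-group samples

published = []
for _ in range(reps):
    a = rng.normal(0.0, 1.0, n)
    b = rng.normal(true_d, 1.0, n)
    t, p = stats.ttest_ind(b, a)
    # Cohen's d estimate with pooled standard deviation.
    d_hat = (b.mean() - a.mean()) / np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
    if p < 0.05:                   # publication bias: only significant results survive
        published.append(d_hat)

print(f"True d = {true_d}; mean published d = {np.mean(published):.2f}")
# With n = 20 per group, the mean published estimate is roughly 0.7,
# several times the true effect of 0.2.
```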


