Beyond Psychology: Prevalence of p-value and confidence interval misinterpretation across different fields

2019 ◽  
Author(s):  
Xiaokang Lyu ◽  
Yuepei Xu ◽  
Xiaofan Zhao ◽  
Xi-Nian Zuo ◽  
Hu Chuan-Peng

P-values and confidence intervals (CIs) are the most widely used statistical indices in scientific literature. Several surveys have revealed that these two indices are generally misunderstood. However, existing surveys on this subject are confined to psychology and biomedical research, and data from other disciplines are rare. Moreover, how confident researchers are in their judgments remains unclear. To fill this research gap, we surveyed 1,479 researchers and students from different fields in China. Results reveal that in both the significant (p < .05, CI does not include 0) and non-significant (p > .05, CI includes 0) conditions, most respondents, regardless of academic degree, research field, and career stage, could not interpret p-values and CIs accurately. Moreover, the majority of them were confident about their (inaccurate) judgments (see osf.io/mcu9q/ for raw data, materials, and supplementary analyses). Therefore, misinterpretations of p-values and CIs prevail across the whole scientific community, which points to a need for statistical training in science.

2020 ◽  
Vol 14 ◽  
Author(s):  
Xiao-Kang Lyu ◽  
Yuepei Xu ◽  
Xiao-Fan Zhao ◽  
Xi-Nian Zuo ◽  
Chuan-Peng Hu

P values and confidence intervals (CIs) are the most widely used statistical indices in scientific literature. Several surveys have revealed that these two indices are generally misunderstood. However, existing surveys on this subject fall under psychology and biomedical research, and data from other disciplines are rare. Moreover, the confidence of researchers when constructing judgments remains unclear. To fill this research gap, we surveyed 1,479 researchers and students from different fields in China. Results reveal that for significant (i.e., p < .05, CI does not include zero) and non-significant (i.e., p > .05, CI includes zero) conditions, most respondents, regardless of academic degrees, research fields and stages of career, could not interpret p values and CIs accurately. Moreover, the majority were confident about their (inaccurate) judgements (see osf.io/mcu9q/ for raw data, materials, and supplementary analyses). Therefore, as misinterpretations of p values and CIs prevail in the whole scientific community, there is a need for better statistical training in science.


2009 ◽  
Vol 33 (2) ◽  
pp. 87-90 ◽  
Author(s):  
Douglas Curran-Everett

Learning about statistics is a lot like learning about science: the learning is more meaningful if you can actively explore. This third installment of Explorations in Statistics investigates confidence intervals. A confidence interval is a range that we expect, with some level of confidence, to include the true value of a population parameter such as the mean. A confidence interval provides the same statistical information as the P value from a hypothesis test, but it circumvents the drawbacks of that hypothesis test. Even more important, a confidence interval focuses our attention on the scientific importance of some experimental result.
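
The correspondence between a confidence interval and the P value of a hypothesis test can be sketched in a few lines of Python. This is only an illustration, not code from the article: the function name z_interval_and_p and the measurements are hypothetical, and a large-sample normal approximation is assumed (for a sample this small a t distribution would normally be preferred).

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def z_interval_and_p(sample, mu0=0.0, alpha=0.05):
    """Return (lower, upper, p) for the mean under a normal approximation.

    Hypothetical helper for illustration only.
    """
    n = len(sample)
    m = mean(sample)
    se = stdev(sample) / sqrt(n)  # estimated standard error of the mean
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    lower, upper = m - z_crit * se, m + z_crit * se
    z = (m - mu0) / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided P value
    return lower, upper, p


# Made-up measurements, not data from any study listed here.
sample = [2.1, 1.4, 3.3, 0.8, 2.7, 1.9, 2.4, 3.0, 1.1, 2.2]
lower, upper, p = z_interval_and_p(sample, mu0=0.0, alpha=0.05)

# The (1 - alpha) interval excludes mu0 exactly when p < alpha.
excludes_null = not (lower <= 0.0 <= upper)
print(lower, upper, p, excludes_null == (p < 0.05))
```

Because the interval and the test share the same standard error and critical value, the two decisions always agree; the interval additionally shows the range of plausible values for the mean.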


2019 ◽  
Author(s):  
Marshall A. Taylor

Coefficient plots are a popular tool for visualizing regression estimates. The appeal of these plots is that they visualize confidence intervals around the estimates and generally center the plot around zero, meaning that any estimate that crosses zero is statistically non-significant at least at the alpha-level around which the confidence intervals are constructed. For models with statistical significance levels determined via randomization models of inference and for which there is no standard error or confidence intervals for the estimate itself, these plots appear less useful. In this paper, I illustrate a variant of the coefficient plot for regression models with p-values constructed using permutation tests. These visualizations plot each estimate's p-value and its associated confidence interval in relation to a specified alpha-level. These plots can help the analyst interpret and report both the statistical and substantive significance of their models. Illustrations are provided using a nonprobability sample of activists and participants at a 1962 anti-Communism school.
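
A permutation-test p-value for a regression slope, the kind of quantity such a plot would display, can be sketched as follows. This is not the author's implementation: the functions slope and permutation_p_value, the data, and the choice to shuffle the outcome are all assumptions made for illustration.

```python
import random


def slope(x, y):
    """Ordinary least-squares slope of y on x (single predictor)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def permutation_p_value(x, y, n_perm=5000, seed=0):
    """Two-sided permutation p-value for the slope (hypothetical helper).

    The outcome is shuffled to break any association with the predictor.
    """
    rng = random.Random(seed)
    observed = abs(slope(x, y))
    y_perm = list(y)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if abs(slope(x, y_perm)) >= observed:
            extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (extreme + 1) / (n_perm + 1)


# Made-up data with a strong positive trend.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [1.2, 1.9, 3.2, 3.8, 5.1, 6.3, 6.8, 8.1, 9.0, 9.7]
p_value = permutation_p_value(x, y)
print(p_value, p_value < 0.05)
```

Since the p-value is itself estimated from a finite number of permutations, it carries Monte Carlo uncertainty, which is one reason to display an interval around it.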


2007 ◽  
Vol 97 (2) ◽  
pp. 165-170 ◽  
Author(s):  
Garry T. Allison

There is a well-known phenomenon of publication bias toward manuscripts that report statistically significant differences. The clinical implications of these statistically significant differences are not always clear because the magnitude of the changes may be clinically meaningless. This article relates the critical P value threshold to the magnitude of the actual observed change and provides a rationale for reporting confidence intervals in clinical studies. Strategies for improving statistical power and reducing the magnitude of the confidence interval range for clinical trials are also described. (J Am Podiatr Med Assoc 97(2): 165–170, 2007)


2021 ◽  
Author(s):  
Rishikesh U Kulkarni ◽  
Catherine L Wang ◽  
Carolyn R Bertozzi

While hierarchical experimental designs are near-ubiquitous in neuroscience and biomedical research, researchers often do not take the structure of their datasets into account while performing statistical hypothesis tests. Resampling-based methods are a flexible strategy for performing these analyses but are difficult due to the lack of open-source software to automate test construction and execution. To address this, we report Hierarch, a Python package to perform hypothesis tests and compute confidence intervals on hierarchical experimental designs. Using a combination of permutation resampling and bootstrap aggregation, Hierarch can be used to perform hypothesis tests that maintain nominal Type I error rates and generate confidence intervals that maintain the nominal coverage probability without making distributional assumptions about the dataset of interest. Hierarch makes use of the Numba JIT compiler to reduce p-value computation times to under one second for typical datasets in biomedical research. Hierarch also enables researchers to construct user-defined resampling plans that take advantage of Hierarch's Numba-accelerated functions. Hierarch is freely available as a Python package at https://github.com/rishi-kulkarni/hierarch.


Author(s):  
Gary G. Yen

Scientific literature can be organized to serve as a roadmap for researchers by showing where the scientific community has been and where it is heading. It presents historic and current state-of-the-art knowledge in the areas of study concerned. It also documents valuable information, including author lists, affiliated institutions, citation information, and keywords, which can be used to extract further information that assists in analyzing the content of documents and their relationships with one another. However, the tremendous growth in the size of the literature and the increasing diversity of research fields have become a major concern, especially for the organization, analysis, and exploration of such documents. This chapter proposes an automatic scientific literature classification method (ASLCM) that uses different kinds of information extracted from the literature to organize and present it in a structured manner. In the proposed ASLCM, multiple kinds of similarity information are extracted from all available sources and fused by a genetic algorithm to give an optimized and more meaningful classification. The final result is used to identify the different research disciplines within the collection, their emergence and termination, major collaborators, centers of excellence, their influence, and the flow of information among the multidisciplinary research areas.


Author(s):  
Marshall A. Taylor

Coefficient plots are a popular tool for visualizing regression estimates. The appeal of these plots is that they visualize confidence intervals around the estimates and generally center the plot around zero, meaning that any estimate that crosses zero is statistically nonsignificant at least at the alpha level around which the confidence intervals are constructed. For models with statistical significance levels determined via randomization models of inference and for which there is no standard error or confidence intervals for the estimate itself, these plots appear less useful. In this article, I illustrate a variant of the coefficient plot for regression models with p-values constructed using permutation tests. These visualizations plot each estimate’s p-value and its associated confidence interval in relation to a specified alpha level. These plots can help the analyst interpret and report the statistical and substantive significances of their models. I illustrate using a nonprobability sample of activists and participants at a 1962 anticommunism school.


2016 ◽  
Vol 10 (1) ◽  
pp. 196-200 ◽  
Author(s):  
Varin Sacha ◽  
Demosthenes B. Panagiotakos

P values are commonly used for inference in biomedical and social fields of research. Unfortunately, the p value is very often misused and misinterpreted; for this reason, resampling methods such as the bootstrap have been recommended for calculating confidence intervals, which provide more robust results for inference than the p value does. This review discusses the use of p values in hypothesis testing and the alternative of using resampling methods to construct confidence intervals for the tested statistic or effect measure.
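
As a rough sketch of the resampling approach described here, a percentile bootstrap confidence interval can be computed with the Python standard library alone. The function bootstrap_percentile_ci and the data are hypothetical, and the percentile method is only one of several bootstrap interval constructions; the review is not known to prescribe this one.

```python
import random
from statistics import mean


def bootstrap_percentile_ci(sample, stat=mean, n_resamples=2000,
                            alpha=0.05, seed=0):
    """Percentile bootstrap interval for `stat` (hypothetical helper)."""
    rng = random.Random(seed)
    n = len(sample)
    # Resample with replacement and recompute the statistic each time.
    estimates = sorted(
        stat(rng.choices(sample, k=n)) for _ in range(n_resamples)
    )
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = int((1 - alpha / 2) * n_resamples) - 1
    return estimates[lo_idx], estimates[hi_idx]


# Made-up effect measurements for illustration only.
data = [4.1, 5.6, 3.9, 6.2, 5.0, 4.7, 5.9, 4.4, 5.3, 6.0, 4.9, 5.1]
ci_low, ci_high = bootstrap_percentile_ci(data)
print(ci_low, ci_high)
```

Any statistic can be passed as stat, for example statistics.median, which is where resampling is most useful because no closed-form standard error is required.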


2019 ◽  
Vol 6 (1) ◽  
pp. 19-28
Author(s):  
Rakhmie Rafie ◽  
Yusmaidi Yusmaidi ◽  
Mira Fitriyani

According to Permenkes 585/1989, informed consent is the consent given by a patient or the patient's family on the basis of an explanation of the medical procedure to be performed on that patient. The role and responsibility of the physician in carrying out medical procedures on the basis of informed consent are very important for preventing what might later happen to the patient. Understanding of the information provided is influenced by several factors, among them the characteristics of the person. This was an analytic survey with a cross-sectional design, using guided interviews with a questionnaire administered to 100 respondents, and the data were processed with univariate and bivariate analysis using the Chi-Square test. The results showed that 84 respondents (84%) were adults and 16 respondents (16%) were young; 63 respondents (63%) were male and 37 respondents (37%) were female; 41 respondents (41%) had a low level of education and 59 respondents had a high level of education; 24 respondents (24%) were not working while 76 respondents (76%) were working; and 58 respondents (58%) had a good understanding while 42 respondents (42%) did not. The variables significantly associated with understanding of consent to medical procedures for surgery at RSPBA in March 2015 were age (p value = 0.037), OR = 3.761 with a Confidence Interval of (1.195-11.835), and education (p value = 0.00), OR = 8.551 with a Confidence Interval of (3.436-21.285). The variables not significantly associated with understanding of consent to medical procedures for surgery at RSPBA in March 2015 were sex (p value = 0.987) and employment (p value = 0.251). There is a significant association between age and education and understanding of consent to medical procedures for surgery at RS Pertamina Bintang Aamin (RSPBA) in March 2015.
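
For readers unfamiliar with how an odds ratio and its confidence interval are obtained from a 2x2 table, a minimal sketch follows. The abstract does not report the cell counts, the interval method, or the confidence level, so the counts below are invented, and the Wald interval on the log odds ratio at the 95% level is an assumption rather than the authors' stated procedure.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def odds_ratio_wald_ci(a, b, c, d, alpha=0.05):
    """Odds ratio and Wald interval for the 2x2 table [[a, b], [c, d]].

    Hypothetical helper; all four counts must be greater than zero.
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    lower = exp(log(odds_ratio) - z * se_log_or)
    upper = exp(log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Invented counts, NOT taken from the study:
# rows = exposure groups, columns = good / poor understanding.
or_hat, or_low, or_high = odds_ratio_wald_ci(30, 10, 20, 40)
print(or_hat, or_low, or_high)
```

An interval that lies entirely above 1, as both reported intervals do, corresponds to a statistically significant positive association at the matching alpha level.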


Genetics ◽  
1998 ◽  
Vol 148 (1) ◽  
pp. 525-535
Author(s):  
Claude M Lebreton ◽  
Peter M Visscher

Several nonparametric bootstrap methods are tested to obtain better confidence intervals for the quantitative trait loci (QTL) positions, i.e., with minimal width and unbiased coverage probability. Two selective resampling schemes are proposed as a means of conditioning the bootstrap on the number of genetic factors in our model inferred from the original data. The selection is based on criteria related to the estimated number of genetic factors, and only the retained bootstrapped samples will contribute a value to the empirically estimated distribution of the QTL position estimate. These schemes are compared with a nonselective scheme across a range of simple configurations of one QTL on a one-chromosome genome. In particular, the effect of the chromosome length and the relative position of the QTL are examined for a given experimental power, which determines the confidence interval size. With the test protocol used, it appears that the selective resampling schemes are either unbiased or least biased when the QTL is situated near the middle of the chromosome. When the QTL is closer to one end, the likelihood curve of its position along the chromosome becomes truncated, and the nonselective scheme then performs better inasmuch as the percentage of estimated confidence intervals that actually contain the real QTL's position is closer to expectation. The nonselective method, however, produces larger confidence intervals. Hence, we advocate use of the selective methods, regardless of the QTL position along the chromosome (to reduce confidence interval sizes), but we leave the problem open as to how the method should be altered to take into account the bias of the original estimate of the QTL's position.

