Inferential statistics as descriptive statistics: there is no replication crisis if we don't expect replication

Author(s):  
Valentin Amrhein ◽  
David Trafimow ◽  
Sander Greenland

Statistical inference often fails to replicate. One reason is that many results may be selected for drawing inference because some threshold of a statistic like the P-value was crossed, leading to biased effect sizes. Nonetheless, considerable non-replication is to be expected even without selective reporting, and generalizations from single studies are rarely if ever warranted. Honestly reported results must vary from replication to replication because of varying assumption violations and random variation; excessive agreement itself would suggest deeper problems, such as failure to publish results in conflict with group expectations or desires. A general perception of a "replication crisis" may thus reflect failure to recognize that statistical tests not only test hypotheses, but countless assumptions and the entire environment in which research takes place. Because of all the uncertain and unknown assumptions that underpin statistical inferences, we should treat inferential statistics as highly unstable local descriptions of relations between assumptions and data, rather than as generalizable inferences about hypotheses or models. And that means we should treat statistical results as being much more incomplete and uncertain than is currently the norm. Acknowledging this uncertainty could help reduce the allure of selective reporting: Since a small P-value could be large in a replication study, and a large P-value could be small, there is simply no need to selectively report studies based on statistical results. Rather than focusing our study reports on uncertain conclusions, we should thus focus on describing accurately how the study was conducted, what data resulted, what analysis methods were used and why, and what problems occurred.
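The observation that a small P-value could be large in a replication study, and vice versa, is easy to demonstrate by simulation. The following Python sketch is illustrative only and not from the paper: it assumes a true effect of 0.5 standard deviations with 30 observations per group (roughly 50% power) and runs twenty exact replications of the same two-group experiment. The resulting P-values typically scatter from well below 0.01 to well above 0.5.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Assumed scenario (illustrative, not from the paper): a true mean
# difference of 0.5 SD between two groups of n = 30, giving roughly
# 50% power. Every run below is an exact replication of the same study.
n, effect = 30, 0.5
p_values = []
for _ in range(20):
    a = rng.normal(0.0, 1.0, n)
    b = rng.normal(effect, 1.0, n)
    p_values.append(stats.ttest_ind(a, b).pvalue)

# Honest replications of one experiment produce wildly varying P-values.
print([round(p, 3) for p in sorted(p_values)])
```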

Author(s):  
Valentin Amrhein ◽  
David Trafimow ◽  
Sander Greenland

Statistical inference often fails to replicate. One reason is that many results may be selected for drawing inference because some threshold of a statistic like the P-value was crossed, leading to biased reported effect sizes. Nonetheless, considerable non-replication is to be expected even without selective reporting, and generalizations from single studies are rarely if ever warranted. Honestly reported results must vary from replication to replication because of varying assumption violations and random variation; excessive agreement itself would suggest deeper problems, such as failure to publish results in conflict with group expectations or desires. A general perception of a "replication crisis" may thus reflect failure to recognize that statistical tests not only test hypotheses, but countless assumptions and the entire environment in which research takes place. Because of all the uncertain and unknown assumptions that underpin statistical inferences, we should treat inferential statistics as highly unstable local descriptions of relations between assumptions and data, rather than as generalizable inferences about hypotheses or models. And that means we should treat statistical results as being much more incomplete and uncertain than is currently the norm. Acknowledging this uncertainty could help reduce the allure of selective reporting: Since a small P-value could be large in a replication study, and a large P-value could be small, there is simply no need to selectively report studies based on statistical results. Rather than focusing our study reports on uncertain conclusions, we should thus focus on describing accurately how the study was conducted, what problems occurred, what data were obtained, what analysis methods were used and why, and what output those methods produced.


2018 ◽  
Author(s):  
Valentin Amrhein ◽  
David Trafimow ◽  
Sander Greenland

There has been much discussion of a "replication crisis" related to statistical inference, which has largely been attributed to overemphasis on and abuse of hypothesis testing. Much of the abuse stems from failure to recognize that statistical tests not only test hypotheses, but countless assumptions and the entire environment in which research takes place. Honestly reported results must vary from replication to replication because of varying assumption violations and random variation; excessive agreement itself would suggest deeper problems, such as failure to publish results in conflict with group expectations or desires. Considerable non-replication is thus to be expected even with the best reporting practices, and generalizations from single studies are rarely if ever warranted. Because of all the uncertain and unknown assumptions that underpin statistical inferences, we should treat inferential statistics as highly unstable local descriptions of relations between assumptions and data, rather than as generalizable inferences about hypotheses or models. And that means we should treat statistical results as being much more incomplete and uncertain than is currently the norm. Rather than focusing our study reports on uncertain conclusions, we should thus focus on describing accurately how the study was conducted, what data resulted, what analysis methods were used and why, and what problems occurred.


Author(s):  
Valentin Amrhein ◽  
David Trafimow ◽  
Sander Greenland

There is a massive crisis of confidence in statistical inference, which has largely been attributed to overemphasis on and abuse of hypothesis testing. Much of the abuse stems from failure to recognize that statistical tests not only test hypotheses, but countless assumptions and the entire environment in which research takes place. Unedited and unselected results must vary from replication to replication because of varying assumption violations and random variation; excessive agreement itself would suggest deeper problems, such as failure to publish results in conflict with group expectations or desires. Considerable non-replication is thus to be expected even with honest and complete reporting practices, and generalizations from single studies are rarely if ever warranted. Because of all the uncertain and unknown assumptions that underpin statistical inferences, we should treat inferential statistics as highly unstable local descriptions of relations between model predictions and data, rather than as generalizable inferences about hypotheses or models. And that means we should treat statistical results as being much more incomplete and uncertain than is currently the norm. Rather than focusing our study reports on uncertain conclusions, we should thus focus on describing accurately how the study was conducted, what problems occurred, and what analysis methods were used.
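The point that unselected results must vary with assumption violations can also be illustrated by simulation. In the hedged Python sketch below (group sizes and variances are assumptions chosen for illustration, not from the paper), the pooled t-test's equal-variance assumption is violated and the smaller group has the larger spread; the test then rejects a true null hypothesis far more often than its nominal 5% rate, so results depend on assumptions that are rarely checked.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Assumed scenario: no true effect, but the equal-variance assumption
# of the pooled t-test is violated, with the smaller group having the
# larger variance. This is a classic case of assumption violation.
reps = 5000
rejections = 0
for _ in range(reps):
    a = rng.normal(0.0, 3.0, 10)   # small group, large spread
    b = rng.normal(0.0, 1.0, 50)   # large group, small spread
    p = stats.ttest_ind(a, b, equal_var=True).pvalue
    rejections += p < 0.05

# The false-positive rate comes out well above the nominal 0.05.
print(f"False-positive rate: {rejections / reps:.3f}")
```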


2018 ◽  
Vol 46 (02) ◽  
pp. 150-171 ◽  
Author(s):  
Roberto Rosales ◽  
Isam Atroshi

Abstract: Statistics, the science of numerical evaluation, helps in determining the real value of a hand surgical intervention. Clinical research in hand surgery cannot improve without considering the application of the most appropriate statistical procedures. The purpose of the present paper is to approach the basics of data analysis using a database of carpal tunnel syndrome (CTS) to understand the data matrix, the generation of variables, the descriptive statistics, the most appropriate statistical tests based on how the data were collected, parameter estimation (inferential statistics) with a p-value or confidence interval, and, finally, the important concept of generalized linear models (GLMs), or regression analysis.
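As a sketch of the workflow this abstract outlines (data matrix, descriptive statistics, parameter estimation with P-values and confidence intervals, and a GLM), the following Python example uses an invented CTS-like data set; the variable names, sample size, and effect sizes are assumptions for illustration, not values from the paper.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(1)

# Hypothetical data matrix: one row per patient (all values invented).
n = 120
df = pd.DataFrame({
    "age": rng.normal(55, 10, n).round(),
    "severity": rng.normal(3.0, 1.0, n),   # baseline symptom score
})
df["outcome"] = (1.0 + 0.5 * df["severity"] + 0.01 * df["age"]
                 + rng.normal(0, 0.8, n))  # post-treatment score

# Descriptive statistics for the data matrix.
print(df.describe())

# Parameter estimation with a Gaussian GLM (equivalent here to linear
# regression): coefficients, P-values, and confidence intervals.
X = sm.add_constant(df[["age", "severity"]])
model = sm.GLM(df["outcome"], X, family=sm.families.Gaussian()).fit()
print(model.summary())
print(model.conf_int())  # 95% confidence intervals for the coefficients
```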


Author(s):  
T. K. Hariprasath ◽  
Palati Sinduja ◽  
R. Priyadharshini

Introduction: The palatine tonsils are paired lymphoid organs located on each side of the back of the throat. They function as a defense mechanism and help prevent the body from getting an infection. When the tonsils become infected, the condition is called tonsillitis. Aim: This article aims to assess the knowledge and awareness of dental students regarding tonsillitis. Materials and Methods: A questionnaire of 16 questions was created in the online survey creator 'Google Forms' and shared individually and privately with about 100 students; the collected data were subjected to statistical analysis using SPSS software. The statistical tests used were descriptive statistics and Chi-square tests. A P-value less than 0.05 was considered statistically significant. Results and Conclusion: The results of this study suggest that third-year students were more aware of the symptoms of tonsillitis (20%), the complications of tonsillitis (20%), and the symptoms associated with strep throat (18%) than students of other years, and that students need an effective education and awareness campaign to increase their knowledge and awareness of tonsillitis.
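A minimal sketch of the kind of Chi-square analysis this abstract describes, shown here in Python rather than SPSS; the contingency table (year of study versus awareness of tonsillitis symptoms) contains invented counts for illustration only.

```python
from scipy.stats import chi2_contingency

# Hypothetical counts: rows = year of study,
# columns = (aware, not aware) of tonsillitis symptoms.
table = [
    [12, 18],  # first-year students
    [15, 15],  # second-year students
    [20, 10],  # third-year students
]

chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, dof = {dof}, P-value = {p:.3f}")
# Under the criterion stated in the abstract, a P-value below 0.05
# would be declared statistically significant.
```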


2017 ◽  
Author(s):  
Ziyang Lyu ◽  
Kaiping Peng ◽  
Chuan-Peng Hu

Previous surveys showed that most students and researchers in psychology misinterpreted P-values and confidence intervals (CIs), yet presenting results as CIs may help them to make better statistical inferences. In this data report, we describe a dataset of 362 valid responses from students and researchers in China that replicates these misinterpretations. Part of these data had been reported in [Hu, C.-P., Wang, F., Guo, J., Song, M., Sui, J., & Peng, K. (2016). The replication crisis in psychological research (in Chinese). Advances in Psychological Science, 24(9), 1504–1518. doi:10.3724/SP.J.1042.2016.01504]. This dataset can be used for educational purposes. It can also serve as pilot data for future studies on the relationship between the understanding of P-values/CIs and statistical inference based on P-values/CIs.


1981 ◽  
Vol 41 (4) ◽  
pp. 993-1000 ◽  
Author(s):  
David L. Ronis

It is often interesting to compare the size of treatment effects in analysis of variance designs. Many researchers, however, draw the conclusion that one independent variable has more impact than another without testing the null hypothesis that their impact is equal. Most often, investigators compute the proportion of variance each factor accounts for and infer population characteristics from these values. Because such analyses are based on descriptive rather than inferential statistics, they never justify the conclusion that one factor has more impact than the other. This paper presents a novel technique for testing the relative magnitude of effects. It is recommended that researchers interested in comparing effect sizes apply this technique rather than basing their conclusions solely on descriptive statistics.
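The descriptive practice this abstract criticizes is easy to picture: compute each factor's proportion of variance explained (eta-squared) and compare the two numbers directly. The Python sketch below, on invented two-factor data, shows exactly that descriptive comparison; note that it is the step Ronis argues cannot by itself justify the conclusion that one factor has more impact, and it is not the paper's own significance test, which the abstract does not spell out.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

rng = np.random.default_rng(7)

# Invented 2x2 factorial data: factor A has the larger true effect.
rows = []
for a in (0, 1):
    for b in (0, 1):
        y = 0.8 * a + 0.4 * b + rng.normal(0, 1, 25)
        rows += [{"A": a, "B": b, "y": v} for v in y]
df = pd.DataFrame(rows)

anova = sm.stats.anova_lm(ols("y ~ C(A) + C(B)", data=df).fit(), typ=2)
eta_sq = anova["sum_sq"] / anova["sum_sq"].sum()
# Descriptive proportions of variance; comparing them directly is
# not an inferential test that the two effects differ.
print(eta_sq)
```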


Author(s):  
James Carr

Here, we'll attempt to provide an introduction to what statistics are, some key concepts, and some of the more common tests used in clinical research. It is not a definitive chapter — whole books exist detailing just one of the tests we'll talk about here, and it's not likely that you have the time or inclination for that! Rather, it is an attempt to help you think about the uses and limitations of statistics and how they might fit into the overall process of your research design. In clinical research much of the data you will collect will be numerical. Collecting the data is, however, just the first stage; then, you have to make sense of it. This is where statistics come into play. The science of collecting and interpreting numerical data, statistics can be used to describe data, such as by calculating averages and distributions (descriptive statistics), or to draw inferences by analysing patterns and relationships within the data (inferential statistics). Inferential statistics will usually form the main part of any analysis. Statistical analysis hinges on the use of sampling. In clinical research it is rarely possible to examine whole populations, and as a result a sample is drawn from the relevant population. The difficulty with this is that one can never be certain that the sample is representative of the population as a whole, or that some form of bias is not operating. Good experimental design can minimize, but not eliminate, this possibility; consequently, there will always be an element of doubt as to whether a genuine effect is being observed or whether we are simply witnessing random variation in a data set. Statistics allow us to analyse patterns within the sample data and to draw inferences about the wider population. One must always bear in mind that statistical tests are used to determine whether a prediction we make can actually be supported. They do not provide actual proof that we are correct; if our theory holds up statistically, we are simply less likely to be incorrect. Equally, if our theory were incorrect, the statistics would be unlikely to support it.
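As an illustrative Python sketch of the distinction drawn in this chapter (all numbers are assumptions, not from the text): descriptive statistics summarize the sample we actually collected, while inferential statistics use that sample to test a prediction about the wider population.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# Invented population: systolic blood pressure, true mean 130 mmHg.
population = rng.normal(130, 15, 100_000)

# In clinical research we only ever see a sample of the population.
sample = rng.choice(population, size=40, replace=False)

# Descriptive statistics: summarize the sample itself.
print(f"sample mean = {sample.mean():.1f}, sd = {sample.std(ddof=1):.1f}")

# Inferential statistics: test the prediction that the population
# mean differs from a reference value of 125 mmHg.
t, p = stats.ttest_1samp(sample, popmean=125)
print(f"t = {t:.2f}, P-value = {p:.3f}")
```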

