Manipulating the alpha level cannot cure significance testing

2018 ◽  
Author(s):  
David Trafimow ◽  
Valentin Amrhein ◽  
Corson N. Areshenkoff ◽  
Carlos Barrera-Causil ◽  
Eric J. Beh ◽  
...  

We argue that making accept/reject decisions on scientific hypotheses, including a recent call for changing the canonical alpha level from p = .05 to .005, is deleterious to the discovery of new findings and to the progress of science. Given that both blanket and variable alpha levels are problematic, it is sensible to dispense with significance testing altogether. There are alternatives that address study design and sample size much more directly than significance testing does; but no statistical tool should be taken as the new magic method giving clear-cut mechanical answers. Inference should not be based on single studies at all, but on cumulative evidence from multiple independent studies. When evaluating the strength of the evidence, we should consider, for example, auxiliary assumptions, the strength of the experimental design, and implications for applications. To boil all this down to a binary decision based on a p-value threshold of .05, .01, .005, or anything else is not acceptable.
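
The abstract's call to weigh cumulative evidence from multiple independent studies, rather than thresholding any single p-value, can be made concrete with inverse-variance pooling of study estimates. Below is a minimal fixed-effect sketch in Python; the three estimates and standard errors are invented for illustration and stand in for effects reported by independent studies.

```python
import math

def pool_fixed_effect(estimates, std_errors):
    """Inverse-variance (fixed-effect) pooling of independent study estimates.
    Each study is weighted by 1/SE**2, so more precise studies contribute
    more to the combined estimate."""
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * est for w, est in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, pooled_se

# Hypothetical effect estimates (e.g., mean differences) from three studies.
estimates = [0.42, 0.18, 0.30]
std_errors = [0.20, 0.12, 0.15]

est, se = pool_fixed_effect(estimates, std_errors)
print(f"pooled = {est:.3f}, SE = {se:.3f}, "
      f"95% CI = [{est - 1.96 * se:.3f}, {est + 1.96 * se:.3f}]")
```

Reporting the pooled estimate with its interval keeps magnitude and uncertainty in view instead of collapsing each study to a binary verdict.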


Author(s):  
David Trafimow ◽  
Valentin Amrhein ◽  
Corson N. Areshenkoff ◽  
Carlos Barrera-Causil ◽  
Eric J. Beh ◽  
...  

We argue that making accept/reject decisions on scientific hypotheses, including a recent call for changing the canonical alpha level from p= .05 to .005, is deleterious for the finding of new discoveries and the progress of science. Given that blanket and variable alpha levels both are problematic, it is sensible to dispense with significance testing altogether. There are alternatives that address study design and sample size much more directly than significance testing does; but none of the statistical tools should be taken as the new magic method giving clear-cut mechanical answers. Inference should not be based on single studies at all, but on cumulative evidence from multiple independent studies. When evaluating the strength of the evidence, we should consider, for example, auxiliary assumptions, the strength of the experimental design, and implications for applications. To boil all this down to a binary decision based on a p-value threshold of .05, .01, .005, or anything else, is not acceptable.


Author(s):  
David Trafimow ◽  
Valentin Amrhein ◽  
Corson N. Areshenkoff ◽  
Carlos Barrera-Causil ◽  
Eric J. Beh ◽  
...  

We argue that depending on p-values to reject null hypotheses, including a recent call for changing the canonical alpha level for statistical significance from .05 to .005, is deleterious for the finding of new discoveries and the progress of science. Given that blanket and variable criterion levels both are problematic, it is sensible to dispense with significance testing altogether. There are alternatives that address study design and determining sample sizes much more directly than significance testing does; but none of the statistical tools should replace significance testing as the new magic method giving clear-cut mechanical answers. Inference should not be based on single studies at all, but on cumulative evidence from multiple independent studies. When evaluating the strength of the evidence, we should consider, for example, auxiliary assumptions, the strength of the experimental design, or implications for applications. To boil all this down to a binary decision based on a p-value threshold of .05, .01, .005, or anything else, is not acceptable.


2017 ◽  
Vol 23 (5) ◽  
pp. 644-646 ◽  
Author(s):  
Maria Pia Sormani

The calculation of the sample size needed for a clinical study is the challenge most frequently put to statisticians, and it is one of the most relevant issues in study design. The correct sample size optimizes the number of patients needed to obtain the result, that is, to detect the minimum treatment effect that is clinically relevant. Minimizing the sample size of a study reduces costs, enhances feasibility, and also has ethical implications. In this brief report, I explore the main concepts on which the sample size calculation is based.
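
As an illustration of the concepts the report describes, a common normal-approximation formula gives the sample size per group for comparing two means. The sketch below assumes a standardized effect size d (the minimum clinically relevant difference divided by the standard deviation), a two-sided alpha, and a target power; the function name and example numbers are ours, not the author's.

```python
import math
from scipy.stats import norm

def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate patients per group for a two-sample comparison of means,
    using the normal-approximation formula
        n = 2 * ((z_{1-alpha/2} + z_{power}) / d)**2,
    where d is the smallest clinically relevant difference in SD units."""
    z_alpha = norm.ppf(1 - alpha / 2)  # critical value for a two-sided test
    z_beta = norm.ppf(power)           # quantile matching the desired power
    return math.ceil(2 * ((z_alpha + z_beta) / d) ** 2)

# Detecting a 0.5 SD difference with 80% power at alpha = 0.05:
print(n_per_group(0.5))  # 63 per group (the exact t-test version gives ~64)
```

Halving the detectable effect quadruples the required sample size, which is why the minimum clinically relevant effect drives the whole calculation.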


2016 ◽  
Vol 11 (4) ◽  
pp. 551-554 ◽  
Author(s):  
Martin Buchheit

The first sport-science-oriented and comprehensive paper on magnitude-based inferences (MBI) was published 10 years ago in the first issue of this journal. While debate continues, MBI is today well established in sport science and in other fields, particularly clinical medicine, where practical/clinical significance often takes priority over statistical significance. In this commentary, some reasons why both academics and sport scientists should abandon null-hypothesis significance testing and embrace MBI are reviewed. Apparent limitations and future areas of research are also discussed. The following arguments are presented: P values and, in turn, study conclusions are sample-size dependent, irrespective of the size of the effect; significance does not inform about the magnitude of effects, yet magnitude is what matters most; MBI allows authors to be honest with their sample size and to better acknowledge trivial effects; the examination of magnitudes per se helps produce better research questions; MBI can be applied to assess changes in individuals; MBI improves data visualization; and MBI is supported by spreadsheets freely available on the Internet. Finally, recommendations to define the smallest important effect and to improve the presentation of standardized effects are presented.
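
The core MBI-style calculation the commentary advocates can be sketched as follows: given an observed effect and its standard error, compute the chances that the true effect is beneficial, trivial, or harmful relative to a smallest important effect. This assumes a normal sampling distribution centred on the estimate; the 0.2 threshold and the example numbers are illustrative, not taken from the paper.

```python
from scipy.stats import norm

def magnitude_chances(effect, se, smallest_important=0.2):
    """Chances that the true effect is beneficial, trivial, or harmful,
    judged against a smallest important effect, assuming a normal
    sampling distribution centred on the observed estimate."""
    p_beneficial = 1 - norm.cdf(smallest_important, loc=effect, scale=se)
    p_harmful = norm.cdf(-smallest_important, loc=effect, scale=se)
    p_trivial = 1 - p_beneficial - p_harmful
    return p_beneficial, p_trivial, p_harmful

# Hypothetical standardized effect of 0.35 with a standard error of 0.18:
b, t, h = magnitude_chances(0.35, 0.18)
print(f"beneficial: {b:.0%}, trivial: {t:.0%}, harmful: {h:.0%}")
```

Unlike a bare p-value, the three probabilities remain interpretable whatever the sample size, because they are statements about the magnitude of the effect.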


2011 ◽  
Vol 6 (2) ◽  
pp. 252-277 ◽  
Author(s):  
Stephen T. Ziliak

Student's exacting theory of errors, both random and real, marked a significant advance over the ambiguous reports of plant life and fermentation asserted by chemists from Priestley and Lavoisier down to Pasteur and Johannsen, working at the Carlsberg Laboratory. One reason seems to be that William Sealy Gosset (1876–1937), aka "Student" – he of Student's t-table and test of statistical significance – rejected artificial rules about sample size, experimental design, and the level of significance, and took instead an economic approach to the logic of decisions made under uncertainty. In his job as Apprentice Brewer, Head Experimental Brewer, and finally Head Brewer of Guinness, Student produced small samples of experimental barley, malt, and hops, seeking guidance for industrial quality control and maximum expected profit at the large-scale brewery. In the process Student invented or inspired half of modern statistics. This article draws on original archival evidence, shedding light on several core yet neglected aspects of Student's methods, that is, Guinnessometrics, not discussed by Ronald A. Fisher (1890–1962). The focus is on Student's small-sample, economic approach to real error minimization, particularly in the field and laboratory experiments he conducted on barley and malt from 1904 to 1937. Balanced designs of experiments, he found, are more efficient than randomized ones and have higher power to detect large and real treatment differences in a series of repeated and independent experiments. Student's world-class achievement poses a challenge to every science. Should statistical methods – such as the choice of sample size, experimental design, and level of significance – follow the purpose of the experiment, rather than the other way around? (JEL classification codes: C10, C90, C93, L66)


2005 ◽  
Vol 6 ◽  
pp. 98-108
Author(s):  
Bal K Joshi ◽  
Madhusudan P Upadhyay ◽  
Hari P Bimb ◽  
D Gauchan ◽  
BK Baniya

Synthesizing the data analysis methods adopted under the in situ global project in Nepal, along with the variables and nature of each study, can serve as a guiding reference for researchers, especially those involved in on-farm research. The review was conducted with the objective of helping to utilize and manage the in situ database system. The objectives of the experiment, the structure of the treatments, and the experimental design used primarily determine the type of analysis. Sixty papers from this project were published in Nepal. These papers are grouped under 8 thematic groups, namely 1. Agroecosystem (3 papers), 2. Agromorphological and farmers' perception (7 papers), 3. Crop population structure (5 papers), 4. Gender, policy and general (15 papers), 5. Isozyme and molecular (6 papers), 6. Seed systems and farmers' networks (5 papers), 7. Social, cultural and economical (11 papers) and 8. Value addition (8 papers). All papers were reviewed for data type, sample size, sampling methods, statistical methods and tools, varieties, and purposes. Descriptive and inferential statistics, along with multivariate methods, were commonly used in on-farm research. Experimental design, the most common approach in on-station trials, was least used. Studies over space and time were not adopted. Five kinds of data were generated and 45 statistical tools were adopted across eight crop species. Among the five kinds of data in these eight subject areas, the categorical type was most frequent, followed by discrete numerical; the binary type was least frequent. Most papers were related to rice, followed by taro and finger millet; cucumber and pigeon pea were studied least. Descriptive statistics, along with χ², multivariate analysis, and regression approaches, would be appropriate tools. Similarly, SPSS and MINITAB may be good software. The best among the available statistical tools should be selected, and utmost care must be exercised while collecting data.
Key words: Data analysis methods; on-farm research; on-station research; subject areas
DOI: 10.3126/narj.v6i0.3371
Nepal Agriculture Research Journal Vol. 6, 2005, pp. 98-108
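
As a concrete instance of the χ² analysis the review recommends for categorical on-farm data, here is a minimal sketch; the contingency table of household counts by crop and seed source is entirely hypothetical and not drawn from the project database.

```python
from scipy.stats import chi2_contingency

# Hypothetical household counts by crop grown and seed source
# (columns: formal sector, own saved seed, farmer network).
observed = [
    [12, 30, 18],  # rice
    [ 5, 22, 13],  # taro
    [ 4, 15,  9],  # finger millet
]

chi2, p, dof, expected = chi2_contingency(observed)
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p:.3f}")
```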


Author(s):  
Anupam Dakua ◽  
Kalyan Ghadei

Aim: Land being the most important consideration for social status in rural areas, selling it is considered a bad sign in India. It is often observed that farmers are compelled to sell their lands for many reasons; depeasantisation is one of them. In the current paper, the land-selling scenario of depeasantised persons is analysed. Study Design and Place of Study: An ex-post-facto study was conducted in Nayagarh District of Odisha, one of the peri-urban districts of the capital city of Odisha. Methodology: A total of 280 depeasantised persons were selected randomly from 5 of the 8 blocks of the district. A structured interview schedule was prepared for collecting data from the respondents. The data were analysed and interpreted with appropriate statistical tools. The proportion of respondents who sold land, the category of farmers who sold land, the reasons for selling, and the persons to whom the land was sold were determined during the investigation. Result: Almost half of the depeasantised persons had sold their lands, and all of them had sold only a portion of their lands. More than 85 percent of the respondents who had sold their lands belonged to the marginal farmer category. Debt repayment was the primary reason for selling land in the study area. Most of the depeasantised persons (more than 60 percent) had sold their lands to landlords and other moneyed persons. Conclusion: To safeguard the interests of farmers and to prevent land selling by small and marginal farmers, the government should take a more focused approach to this issue.


Author(s):  
Pavel Bednář ◽  
Jakub Černý

This paper analyses the development of beech plantations aged 7 to 18 years that were planted in gap cuts (0.1–0.25 ha; ISF 50%), clear cuts (0.5–1.0 ha; ISF 87%) and underplanted areas in shelterwood cuts in mature spruce stands (G = 22–26 m²/ha; ISF 28%). The research consisted of the following analyses: height growth, diameter growth, and beech quality development. We used standard statistical tests (p < 0.05) to evaluate height and diameter growth, which showed significant differences in both characteristics (total height and DBH) among 7-year-old and 13-year-old plantations grown under all three regeneration treatments. The tallest beech trees with the greatest DBH at the ages of 7 and 13 were found in clear cuts, whereas the shortest and thinnest trees grew in shelterwoods. However, at the age of 18, there was no significant difference between the gap cut and the clear cut in either parameter. The best quality was observed in shelterwoods.
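
The comparisons the authors describe (height and DBH across three regeneration treatments at p < 0.05) might be run as a one-way ANOVA, as in the sketch below; the height values are invented and do not come from the paper.

```python
from scipy.stats import f_oneway

# Hypothetical beech heights (m) under the three regeneration treatments.
clear_cut   = [4.1, 4.5, 3.9, 4.8, 4.3]
gap_cut     = [3.6, 3.9, 3.4, 4.0, 3.7]
shelterwood = [2.9, 3.1, 2.7, 3.3, 3.0]

f_stat, p_value = f_oneway(clear_cut, gap_cut, shelterwood)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("At least one treatment mean differs; follow up with post-hoc tests.")
```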

