Explorations in statistics: hypothesis tests and P values

2009 ◽  
Vol 33 (2) ◽  
pp. 81-86 ◽  
Author(s):  
Douglas Curran-Everett

Learning about statistics is a lot like learning about science: the learning is more meaningful if you can actively explore. This second installment of Explorations in Statistics delves into test statistics and P values, two concepts fundamental to the test of a scientific null hypothesis. The essence of a test statistic is that it compares what we observe in the experiment to what we expect to see if the null hypothesis is true. The P value associated with the magnitude of that test statistic answers this question: if the null hypothesis is true, what proportion of possible values of the test statistic are at least as extreme as the one I got? Although statisticians continue to stress the limitations of hypothesis tests, there are two realities we must acknowledge: hypothesis tests are ingrained within science, and the simple test of a null hypothesis can be useful. As a result, it behooves us to explore the notions of hypothesis tests, test statistics, and P values.
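The abstract's definition of a P value — the proportion of possible values of the test statistic at least as extreme as the observed one, computed under the null hypothesis — can be made concrete with a small permutation test. The data below are hypothetical; the test statistic is the difference in group means, and the null hypothesis is that group labels are exchangeable.

```python
from itertools import combinations
from statistics import fmean

# Hypothetical data: outcomes for a treated and a control group.
treated = [4.8, 5.6, 5.1, 6.0, 5.4]
control = [4.2, 4.9, 4.4, 5.0, 4.6]

# Test statistic: what we observe in the experiment.
observed = fmean(treated) - fmean(control)

# Under the null hypothesis the labels are exchangeable, so enumerate
# every way of relabeling the pooled data and recompute the statistic.
pooled = treated + control
n = len(treated)
null_stats = []
for idx in combinations(range(len(pooled)), n):
    group_a = [pooled[i] for i in idx]
    group_b = [pooled[i] for i in range(len(pooled)) if i not in idx]
    null_stats.append(fmean(group_a) - fmean(group_b))

# Two-sided P value: proportion of possible values of the test statistic
# at least as extreme as the one we got (small tolerance absorbs
# floating-point ties).
p_value = sum(abs(s) >= abs(observed) - 1e-9 for s in null_stats) / len(null_stats)
print(f"observed difference = {observed:.2f}, P = {p_value:.4f}")
```

Because every relabeling is enumerated, the P value here is exact rather than approximated from a reference distribution.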

2016 ◽  
Vol 77 (3) ◽  
pp. 529-539 ◽  
Author(s):  
Maarten Marsman ◽  
Eric-Jan Wagenmakers

P values have been critiqued on several grounds but remain entrenched as the dominant inferential method in the empirical sciences. In this article, we elaborate on the fact that in many statistical models, the one-sided P value has a direct Bayesian interpretation as the approximate posterior mass for values lower than zero. The connection between the one-sided P value and posterior probability mass reveals three insights: (1) P values can be interpreted as Bayesian tests of direction, to be used only when the null hypothesis is known from the outset to be false; (2) as a measure of evidence, P values are biased against a point null hypothesis; and (3) with N fixed and effect size variable, there is an approximately linear relation between P values and Bayesian point null hypothesis tests.
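The correspondence the abstract describes is exact in the simplest case: for a normal estimate with a flat prior on the effect, the one-sided P value and the posterior mass below zero are the same tail area. A minimal sketch, with hypothetical numbers for the estimate and its standard error:

```python
from statistics import NormalDist

# Hypothetical point estimate b of an effect beta, with standard error se.
b, se = 0.31, 0.17

# One-sided P value for the test of direction (H0: beta <= 0).
z = b / se
p_one_sided = NormalDist().cdf(-z)

# Under a flat prior the posterior for beta is N(b, se^2), so the
# posterior mass on values lower than zero is the same normal tail.
posterior_mass_below_zero = NormalDist(mu=b, sigma=se).cdf(0.0)

print(p_one_sided, posterior_mass_below_zero)  # identical in this model
```

Both quantities equal Φ(−b/se), which is the sense in which a one-sided P value acts as a Bayesian test of direction.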


2017 ◽  
Author(s):  
Christopher Green ◽  
Sahir Abbas ◽  
Arlie Belliveau ◽  
Nataly Beribisky ◽  
Ian Davidson ◽  
...  

Using a computer program called “Statcheck,” a 2016 digital survey of several prestigious American and European psychology journals showed that the p-values reported in research articles failed to agree with the corresponding test statistics (e.g., F, t, χ2) at surprisingly high rates: nearly half of all articles contained at least one such error, as did about 10% of all null hypothesis significance tests. We investigated whether this problem was present in Canadian psychology journals and, if so, at what frequency. We discovered similar rates of p-value errors in Canadian journals over the past 30 years. However, we also noticed a large number of typographical errors in the electronic versions of the articles. When we hand-corrected a sample of our articles, the per-article error rate remained about the same, but the per-test error rate dropped to 6.3%. We recommend that, in future, journals include explicit checks of statistics in their editorial processes.
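The core of a Statcheck-style scan is recomputing each reported p-value from its reported test statistic and flagging disagreements. Statcheck itself parses APA-formatted F, t, and χ2 results; the sketch below uses a z statistic purely so the recomputation needs only the standard library, and the reported numbers are invented.

```python
from statistics import NormalDist

def check_reported_p(z: float, reported_p: float, tol: float = 0.0005) -> bool:
    """Recompute the two-sided p from a reported z statistic. Rounding of
    values in articles makes small discrepancies unavoidable, hence tol."""
    recomputed = 2 * NormalDist().cdf(-abs(z))
    return abs(recomputed - reported_p) <= tol

# Hypothetical reported results, in the spirit of a Statcheck scan:
consistent = check_reported_p(z=2.10, reported_p=0.036)    # True: matches
flagged = check_reported_p(z=2.10, reported_p=0.015)       # False: flag it
print(consistent, flagged)
```

For t, F, or χ2 statistics the same logic applies, with the appropriate distribution substituted for the normal.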


2019 ◽  
Author(s):  
Daniel Lakens

Due to the strong overreliance on p-values in the scientific literature, some researchers have argued that p-values should be abandoned or banned, and that we need to move beyond p-values and embrace practical alternatives. When proposing alternatives to p-values, statisticians often commit the ‘Statistician’s Fallacy’, where they declare which statistic researchers really ‘want to know’. Instead of telling researchers what they want to know, statisticians should teach researchers which questions they can ask. In some situations, the answer to the question they are most interested in will be the p-value. For as long as null-hypothesis tests have been criticized, researchers have suggested including minimum-effect tests and equivalence tests in our statistical toolbox, and these tests (even though they return p-values) have the potential to greatly improve the questions researchers ask. It is clear there is room for improvement in how we teach p-values. If anyone really believes p-values are an important cause of problems in science, preventing the misinterpretation of p-values by developing better evidence-based education and user-centered statistical software should be a top priority. Telling researchers which statistic they should use has distracted us from examining more important questions, such as asking researchers what they want to know when they do scientific research. Before we can improve our statistical inferences, we need to improve our statistical questions.


Econometrics ◽  
2019 ◽  
Vol 7 (1) ◽  
pp. 11 ◽  
Author(s):  
Richard Startz

As a contribution toward the ongoing discussion about the use and misuse of p-values, numerical examples are presented demonstrating that a p-value can, as a practical matter, give you a very different answer from the one you want.
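A classic numerical example of this gap — not necessarily one of the paper's own — contrasts the p-value with the posterior probability that the null is true. In a hypothetical two-point setup (H0: μ = 0 versus H1: μ = μ1, 50/50 prior odds), an estimate sitting exactly at "p = .05" can leave H0 with substantial posterior probability:

```python
from statistics import NormalDist

# Hypothetical setup: xbar ~ N(mu, se^2); H0: mu = 0 vs H1: mu = mu1.
se = 1.0
mu1 = 2.0 * se            # the effect size H1 posits
xbar = 1.96 * se          # observed estimate, sitting right at "p = .05"

# Frequentist answer: the two-sided P value.
p_value = 2 * NormalDist().cdf(-abs(xbar) / se)

# Bayesian answer: posterior probability that H0 is true, with prior odds 1.
likelihood_ratio = NormalDist(mu1, se).pdf(xbar) / NormalDist(0.0, se).pdf(xbar)
p_h0 = 1 / (1 + likelihood_ratio)

print(f"P value = {p_value:.3f}, P(H0 | data) = {p_h0:.3f}")
```

Here the p-value is .05 while the posterior probability of the null is roughly .13 — a "significant" result by convention, yet far from the answer a researcher who wants P(H0 | data) is after.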


2021 ◽  
Author(s):  
Ronald J Yurko ◽  
Kathryn Roeder ◽  
Bernie Devlin ◽  
Max G'Sell

In genome-wide association studies (GWAS), it has become commonplace to test millions of SNPs for phenotypic association. Gene-based testing can improve power to detect weak signal by reducing multiple testing and pooling signal strength. While such tests account for linkage disequilibrium (LD) structure of SNP alleles within each gene, current approaches do not capture LD of SNPs falling in different nearby genes, which can induce correlation of gene-based test statistics. We introduce an algorithm to account for this correlation. When a gene's test statistic is independent of others, it is assessed separately; when test statistics for nearby genes are strongly correlated, their SNPs are agglomerated and tested as a locus. To provide insight into SNPs and genes driving association within loci, we develop an interactive visualization tool to explore localized signal. We demonstrate our approach in the context of weakly powered GWAS for autism spectrum disorder, which is contrasted to more highly powered GWAS for schizophrenia and educational attainment. To increase power for these analyses, especially those for autism, we use adaptive p-value thresholding (AdaPT), guided by high-dimensional metadata modeled with gradient boosted trees, highlighting when and how it can be most useful. Notably our workflow is based on summary statistics.
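The agglomeration step described above — test independent genes alone, merge strongly correlated genes into a locus — can be sketched as connected components over a thresholded correlation graph. The abstract does not specify the grouping criterion, so the threshold, gene names, and correlations below are all invented for illustration:

```python
# Hypothetical correlations between gene-based test statistics; pairs with
# |r| >= 0.2 (an arbitrary threshold) are linked into one locus.
THRESHOLD = 0.2
genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
corr = {
    ("GENE_A", "GENE_B"): 0.55,   # nearby genes sharing LD
    ("GENE_B", "GENE_C"): 0.30,
    ("GENE_A", "GENE_C"): 0.05,
    ("GENE_A", "GENE_D"): 0.02,   # GENE_D independent: test it alone
    ("GENE_B", "GENE_D"): 0.01,
    ("GENE_C", "GENE_D"): 0.03,
}

# Union-find over genes; strongly correlated pairs are merged.
parent = {g: g for g in genes}
def find(g):
    while parent[g] != g:
        parent[g] = parent[parent[g]]   # path compression
        g = parent[g]
    return g
def union(a, b):
    parent[find(a)] = find(b)

for (a, b), r in corr.items():
    if abs(r) >= THRESHOLD:
        union(a, b)

loci = {}
for g in genes:
    loci.setdefault(find(g), []).append(g)
groups = sorted(sorted(members) for members in loci.values())
print(groups)  # one locus {A, B, C} (chained correlation), singleton {D}
```

Note that correlation chains merge transitively: GENE_A and GENE_C end up in the same locus even though their direct correlation is weak, because each is correlated with GENE_B.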


PEDIATRICS ◽  
1989 ◽  
Vol 84 (6) ◽  
pp. A30-A30
Author(s):  
Student

Often investigators report many P values in the same study. The expected number of P values smaller than 0.05 is 1 in 20 tests of true null hypotheses; therefore the probability that at least one P value will be smaller than 0.05 increases with the number of tests, even when the null hypothesis is correct for each test. This increase is known as the "multiple-comparisons" problem...One reasonable way to correct for multiplicity is simply to multiply the P value by the number of tests. Thus, with five tests, an original 0.05 level for each is increased, perhaps to a value as high as 0.25 for the set. To achieve a level of not more than 0.05 for the set, we need to choose a level of 0.05/5 = 0.01 for the individual tests. This adjustment is conservative. We know only that the probability does not exceed 0.05 for the set.
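The multiple-comparisons inflation described above is easy to verify by simulation: under a true null hypothesis a P value is uniform on (0, 1), so with five independent tests the chance that at least one falls below 0.05 is 1 − 0.95⁵ ≈ 0.23, and the Bonferroni level 0.05/5 = 0.01 bounds the set-level rate at 0.05.

```python
import random

random.seed(1)
n_tests, alpha, n_sims = 5, 0.05, 100_000

# Under a true null hypothesis a P value is uniform on (0, 1). Simulate
# five independent tests and ask how often at least one P is below 0.05.
hits = 0
for _ in range(n_sims):
    p_values = [random.random() for _ in range(n_tests)]
    if min(p_values) < alpha:
        hits += 1

familywise = hits / n_sims
print(f"P(at least one P < .05) ~ {familywise:.3f}")  # near 1 - 0.95**5 = 0.226

# Bonferroni: test each at alpha / n_tests to keep the set-level rate <= alpha.
bonferroni_level = alpha / n_tests    # 0.01, as in the passage above
```

The simulated familywise rate of about 0.23 sits just below the conservative 0.25 bound quoted in the passage, since that bound simply multiplies 0.05 by the number of tests.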


1999 ◽  
Vol 1 (2) ◽  
pp. 83-91 ◽  
Author(s):  
E. M. C. MICHIELS ◽  
E. OUSSOREN ◽  
M. VAN GROENIGEN ◽  
E. PAUWS ◽  
P. M. M. BOSSUYT ◽  
...  

Michiels, E. M. C., E. Oussoren, M. van Groenigen, E. Pauws, P. M. M. Bossuyt, P. A. Voûte, and F. Baas. Genes differentially expressed in medulloblastoma and fetal brain. Physiol. Genomics 1: 83–91, 1999.—Serial analysis of gene expression (SAGE) was used to identify genes that might be involved in the development or growth of medulloblastoma, a childhood brain tumor. Sequence tags from medulloblastoma (10229) and fetal brain (10692) were determined. The distributions of sequence tags in each population were compared, and for each sequence tag, pairwise χ2 test statistics were calculated. Northern blot was used to confirm some of the results obtained by SAGE. For 16 tags, the χ2 test statistic was associated with a P value < 10⁻⁴. Among those transcripts with a higher expression in medulloblastoma were the genes for ZIC1 protein and the OTX2 gene, both of which are expressed in the cerebellar germinal layers. The high expression of these two genes strongly supports the hypothesis that medulloblastoma arises from the germinal layer of the cerebellum. This analysis shows that SAGE can be used as a rapid differential screening procedure.
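The pairwise χ2 comparison of one tag's counts between two SAGE libraries amounts to a Pearson test on a 2×2 table (tag vs. all other tags, in each library). The library sizes below come from the abstract; the tag counts themselves are hypothetical:

```python
from math import sqrt
from statistics import NormalDist

def pairwise_chi2(count1, total1, count2, total2):
    """Pearson chi-square (1 df) comparing one tag's count in two
    libraries of different sizes."""
    a, b = count1, total1 - count1
    c, d = count2, total2 - count2
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 df, P(chi2 > x) equals the two-sided normal tail at sqrt(x).
    p = 2 * NormalDist().cdf(-sqrt(chi2))
    return chi2, p

# Hypothetical counts for one tag: 30 of 10229 medulloblastoma tags versus
# 2 of 10692 fetal-brain tags (library sizes from the abstract).
chi2, p = pairwise_chi2(30, 10229, 2, 10692)
print(f"chi2 = {chi2:.1f}, P = {p:.1e}")   # well below the 1e-4 cutoff
```

A tag this unbalanced would clear the abstract's P < 10⁻⁴ screen comfortably.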


Entropy ◽  
2020 ◽  
Vol 22 (6) ◽  
pp. 630 ◽  
Author(s):  
Boris Ryabko

The problem of constructing effective statistical tests for random number generators (RNGs) is considered. Currently, there are hundreds of RNG statistical tests that are often combined into so-called batteries, each containing from a dozen to more than one hundred tests. When a battery is used, it is applied to a sequence generated by the RNG, and the calculation time is determined by the length of the sequence and the number of tests. Generally speaking, the longer the sequence, the smaller the deviations from randomness that a specific test can find. Thus, when a battery is applied, on the one hand, the “better” the tests in the battery, the better the chances of rejecting a “bad” RNG. On the other hand, the larger the battery, the less time can be spent on each test and, therefore, the shorter the test sequence. In turn, this reduces the ability to find small deviations from randomness. To reduce this trade-off, we propose an adaptive way to use batteries (and other sets) of tests, which requires less time but, in a certain sense, preserves the power of the original battery. We call this method a time-adaptive battery of tests. The suggested method is based on a theorem which describes asymptotic properties of the so-called p-values of tests. Namely, the theorem claims that, if the RNG can be modeled by a stationary ergodic source, the value −log π(x₁x₂…xₙ)/n goes to 1 − h as n grows, where x₁x₂… is the sequence, π(·) is the p-value of the most powerful test, and h is the limit Shannon entropy of the source.
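The theorem can be illustrated for the simplest stationary ergodic source, an i.i.d. Bernoulli stream, tested against the fair-coin null. The choice of source, bias, and base-2 logarithm below are assumptions made for the sketch; by Neyman–Pearson, the most powerful test rejects for large counts of ones, so its p-value is a binomial upper tail under the null.

```python
from math import lgamma, log, log2

# I.i.d. Bernoulli source with P(1) = 0.9; null: fair coin. The p-value of
# the most powerful test is pi = P(Bin(n, 1/2) >= k) for k observed ones.
n, p1 = 10_000, 0.9
k = round(n * p1)       # use the expected count for a deterministic sketch

# log2 of the dominant tail term C(n, k) / 2^n; for k far above n/2 the
# first term dominates the tail sum, so this approximates log2(pi) well.
log2_choose = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)) / log(2)
log2_pi = log2_choose - n   # pi itself is far below floating-point underflow

rate = -log2_pi / n         # the quantity -log2 pi(x_1...x_n) / n
h = -(p1 * log2(p1) + (1 - p1) * log2(1 - p1))   # entropy, ~0.469 bits
print(f"-log2(pi)/n = {rate:.3f}, 1 - h = {1 - h:.3f}")
```

At n = 10000 the two quantities already agree to about three decimal places, consistent with the claimed limit.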


2010 ◽  
Vol 51 ◽  
Author(s):  
Kęstutis Dučinskas ◽  
Lina Dreižienė

This paper deals with the problem of testing isotropy against geometric anisotropy for Gaussian spatial data. An original, simple test statistic based on directional empirical semivariograms is proposed. Under the assumption of independence of the classical semivariogram estimators, and for increasing-domain asymptotics, the distribution of the test statistic is approximated by a chi-squared distribution. Simulation experiments demonstrate the efficacy of the proposed test.


2020 ◽  
Vol 7 (1) ◽  
pp. 35-42
Author(s):  
Raisa Anakotta ◽  
Nursalim Nursalim ◽  
Reka Judahida Latuheru

The objective of this research is to describe whether or not the fishbowl technique can improve the speaking skills of tenth-grade IPS 1 students at SMA N 2 Sorong Regency. The researcher conducted quantitative research using a pre-experimental, one-group pre-test design, taking 30 students as the sample from the population. Using SPSS version 20.0, the researcher interpreted the "t" score by comparing the t-value with the t-table value. The t-value is 3.048 and the t-table value is 2.045 at the 0.05 significance level with df = 29. Since the P-value (0.0005) is less than 0.05 and the t-value exceeds the t-table value (3.048 > 2.045), the alternative hypothesis (H1) is accepted and the null hypothesis (H0) is rejected. This means that using the fishbowl technique can improve students' speaking skills in the tenth grade of SMA N 2 Sorong Regency. However, the technique was not fully effective for students' speaking skills, because the students did not achieve the KKM (minimum mastery) score of > 68.
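The decision rule used above — reject H0 when the t-value exceeds the tabled critical value — can be checked by simulation without assuming access to t tables. A Monte Carlo sketch, under the hypothetical assumption of a one-sample t statistic with the reported df = 29:

```python
import random
from math import sqrt

random.seed(7)

# Under H0, draw samples of size 30 from a standard normal, compute the
# one-sample t statistic, and see how often it is at least as extreme as
# the reported t-value of 3.048.
n, t_obs, n_sims = 30, 3.048, 50_000
extreme = 0
for _ in range(n_sims):
    xs = [random.gauss(0.0, 1.0) for _ in range(n)]
    m = sum(xs) / n
    s = sqrt(sum((x - m) ** 2 for x in xs) / (n - 1))   # sample sd
    t = m / (s / sqrt(n))
    if abs(t) >= t_obs:
        extreme += 1

p_two_sided = extreme / n_sims
print(f"two-sided P for t = 3.048, df = 29: ~{p_two_sided:.4f}")
```

The simulation puts the two-sided P at roughly .005, comfortably below the .05 level, which agrees with the decision to reject H0 since 3.048 > 2.045.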

