A Note on Computing Interval Overlap Statistics

2019 ◽  
Author(s):  
Shahab Sarmashghi ◽  
Vineet Bafna

Abstract: We consider the following problem. Let I and I_f describe collections of n and m non-overlapping intervals, respectively, on a line segment of finite length. Suppose that k of the m intervals of I_f are intersected by some interval(s) in I. Under the null hypothesis that the intervals in I are randomly arranged with respect to I_f, what is the significance of this overlap? This is a natural abstraction of statistical questions that are ubiquitous in the post-genomic era. The interval collections represent annotations that reveal structural or functional regions of the genome, and overlap statistics can provide insight into the correlation between different structural and functional regions. However, the statistics of interval overlaps have not been systematically explored. In this manuscript, we formulate a statistical significance problem which considers the length and structure of intervals. We describe a combinatorial algorithm for a constrained interval overlap problem that can accurately compute very small p-values. We also propose a fast approximate method to handle problems involving very large numbers of intervals. These methods are all implemented in a tool, iStat. We applied iStat to simulated interval data to obtain precise estimates of low p-values and to characterize the performance of our methods. We also test iStat on real datasets from previous studies and compare iStat results with the p-values reported using basic permutation or parametric tests. The iStat software is publicly available at https://github.com/shahab-sarmashghi/ISTAT.git
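
For intuition, the kind of basic permutation test that the abstract compares iStat against can be sketched as follows. This is a minimal Monte Carlo baseline, not the paper's combinatorial algorithm; the function names are ours, and for simplicity the shuffled intervals are placed uniformly without enforcing the non-overlap constraint among them.

```python
import numpy as np

def overlaps(a, b):
    """True if intervals a = (start, end) and b = (start, end) intersect."""
    return a[0] <= b[1] and b[0] <= a[1]

def count_hit(query, reference):
    """Number of reference intervals intersected by at least one query interval."""
    return sum(any(overlaps(q, r) for q in query) for r in reference)

def permutation_pvalue(I, I_f, segment_length, n_perm=10_000, seed=None):
    """Monte Carlo p-value for seeing at least the observed number of overlapped
    intervals of I_f when the intervals of I are re-placed at random (lengths kept)."""
    rng = np.random.default_rng(seed)
    k_obs = count_hit(I, I_f)
    lengths = np.array([e - s for s, e in I], dtype=float)
    exceed = 0
    for _ in range(n_perm):
        starts = rng.uniform(0.0, segment_length - lengths)
        shuffled = list(zip(starts, starts + lengths))
        if count_hit(shuffled, I_f) >= k_obs:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)  # add-one correction avoids reporting p = 0
```

Such a sampling-based estimate cannot resolve p-values much below 1/n_perm, which is precisely the regime that the combinatorial algorithm described in the paper is designed to handle.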

2016 ◽  
Vol 11 (4) ◽  
pp. 551-554 ◽  
Author(s):  
Martin Buchheit

The first sport-science-oriented and comprehensive paper on magnitude-based inferences (MBI) was published 10 years ago in the first issue of this journal. While debate continues, MBI is today well established in sport science and in other fields, particularly clinical medicine, where practical/clinical significance often takes priority over statistical significance. In this commentary, some reasons why both academics and sport scientists should abandon null-hypothesis significance testing and embrace MBI are reviewed. Apparent limitations and future areas of research are also discussed. The following arguments are presented: P values and, in turn, study conclusions are sample-size dependent, irrespective of the size of the effect; significance does not convey the magnitude of an effect, yet magnitude is what matters most; MBI allows authors to be honest about their sample size and to better acknowledge trivial effects; examining magnitudes per se helps frame better research questions; MBI can be applied to assess changes in individuals; MBI improves data visualization; and MBI is supported by spreadsheets freely available on the Internet. Finally, recommendations for defining the smallest important effect and improving the presentation of standardized effects are presented.
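
The first argument, that P values are driven by sample size irrespective of the size of the effect, is easy to verify with a toy simulation (our illustration, not part of the original commentary): holding a small standardized effect fixed and growing n pushes the typical p-value from "nonsignificant" to "highly significant".

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
effect = 0.2  # fixed, small standardized effect (difference of means, in SD units)

# The same true effect yields very different typical p-values as n grows.
for n in (20, 100, 500, 2000):
    pvals = [stats.ttest_ind(rng.normal(effect, 1.0, n),
                             rng.normal(0.0, 1.0, n)).pvalue
             for _ in range(1000)]
    print(f"n per group = {n:4d}: median p = {np.median(pvals):.4f}")
```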


2021 ◽  
Author(s):  
Tsz Keung Wong ◽  
Henk Kiers ◽  
Jorge Tendeiro

The aim of this study is to investigate whether there is a potential mismatch between the usability of a statistical tool and psychology researchers' expectations of it. Bayesian statistics is often promoted as an ideal substitute for frequentist statistics because it aligns better with researchers' expectations and needs. A particular instance of this is the proposal to replace Null Hypothesis Significance Testing (NHST) with Null Hypothesis Bayesian Testing (NHBT) using the Bayes factor. In this paper, we study to what extent the usability of NHBT matches those expectations. First, a study of the reporting practices in 73 psychological publications was carried out. It was found that eight Questionable Reporting and Interpreting Practices (QRIPs) each occurred more than once among practitioners using NHBT. Specifically, our analysis provides insight into possible mismatches and their frequencies of occurrence. A follow-up survey study was then conducted to assess such mismatches. The sample (N = 108) consisted of psychology researchers, experts in methodology (and/or statistics), and applied researchers in fields other than psychology. The data show that discrepancies exist among the participants. Interpreting the Bayes factor as posterior odds and not acknowledging that the Bayes factor expresses relative evidence are arguably the most concerning. The results of the paper suggest that a shift of statistical paradigm cannot solve the problem of misinterpretation if users are not well acquainted with their tools.
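
The distinction behind the most concerning mismatch can be made concrete with a toy binomial example (our illustration, not taken from the paper): the Bayes factor quantifies relative evidence and only becomes posterior odds after multiplication by the prior odds.

```python
from scipy import stats
from scipy.integrate import quad

k, n = 61, 100  # hypothetical data: 61 successes in 100 trials

# Marginal likelihood under H0: theta fixed at 0.5.
m0 = stats.binom.pmf(k, n, 0.5)

# Marginal likelihood under H1: theta ~ Beta(1, 1), i.e. uniform on (0, 1).
m1, _ = quad(lambda theta: stats.binom.pmf(k, n, theta), 0.0, 1.0)

bf10 = m1 / m0  # Bayes factor: relative evidence for H1 over H0
print(f"BF10 = {bf10:.2f}")

# The Bayes factor is not the posterior odds; it must be combined with prior odds.
for prior_odds in (1.0, 0.1):  # equal prior belief vs. strong prior belief in H0
    print(f"prior odds {prior_odds}: posterior odds = {bf10 * prior_odds:.2f}")
```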


Author(s):  
Valentin Amrhein ◽  
Fränzi Korner-Nievergelt ◽  
Tobias Roth

The widespread use of 'statistical significance' as a license for making a claim of a scientific finding leads to considerable distortion of the scientific process (American Statistical Association, Wasserstein & Lazar 2016). We review why degrading p-values into 'significant' and 'nonsignificant' contributes to making studies irreproducible, or to making them seem irreproducible. A major problem is that we tend to take small p-values at face value but mistrust results with larger p-values. In either case, p-values can tell us little about the reliability of research, because they are hardly replicable even if an alternative hypothesis is true. Significance (p ≤ 0.05) itself is hardly replicable: at a realistic statistical power of 40%, given that there is a true effect, only one in six studies will significantly replicate the significant result of another study. Even at a good power of 80%, results from two studies will be conflicting, in terms of significance, in one third of cases if there is a true effect. This means that a replication cannot be interpreted as having failed merely because it is nonsignificant. Many apparent replication failures may thus reflect faulty judgement based on significance thresholds rather than a crisis of unreplicable research. Reliable conclusions on the replicability and practical importance of a finding can only be drawn using cumulative evidence from multiple independent studies. However, applying significance thresholds makes cumulative knowledge unreliable. One reason is that with anything but ideal statistical power, significant effect sizes will be biased upwards. Interpreting inflated significant results while ignoring nonsignificant results will thus lead to wrong conclusions. Yet current incentives to hunt for significance lead to publication bias against nonsignificant findings. Data dredging, p-hacking and publication bias should be addressed by removing fixed significance thresholds. Consistent with the recommendations of the late Ronald Fisher, p-values should be interpreted as graded measures of the strength of evidence against the null hypothesis. Larger p-values also offer some evidence against the null hypothesis; they cannot be interpreted as supporting the null hypothesis, and doing so falsely concludes that 'there is no effect'. Information on possible true effect sizes that are compatible with the data must be obtained from the observed effect size, e.g. a sample average, and from a measure of uncertainty such as a confidence interval. We review how confusion about the interpretation of larger p-values can be traced back to historical disputes among the founders of modern statistics. We further discuss potential arguments against removing significance thresholds, such as 'we need more stringent decision rules', 'sample sizes will decrease' or 'we need to get rid of p-values'.
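
The replication figures quoted above follow from simple probability arithmetic, which a short simulation can confirm (an illustrative sketch, not part of the original text): with power 0.4 the chance that two independent studies of a true effect are both significant is about 0.4 × 0.4 ≈ 1/6, and with power 0.8 the chance that they conflict is about 2 × 0.8 × 0.2 ≈ 1/3.

```python
import numpy as np

rng = np.random.default_rng(1)
n_pairs = 100_000  # pairs of (original study, replication), each testing a true effect

for power in (0.4, 0.8):
    # Under a true effect, each study is significant with probability equal to its power.
    sig_original = rng.random(n_pairs) < power
    sig_replication = rng.random(n_pairs) < power
    both = np.mean(sig_original & sig_replication)        # both significant
    conflict = np.mean(sig_original != sig_replication)   # one significant, one not
    print(f"power {power:.0%}: both significant {both:.2f}, conflicting {conflict:.2f}")
```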


2014 ◽  
Vol 85 (4) ◽  
pp. 638-644 ◽  
Author(s):  
Shikha Jain ◽  
K Sadashiva Shetty ◽  
Shweta Jain ◽  
Sachin Jain ◽  
A.T. Prakash ◽  
...  

ABSTRACT Objectives:  To assess the null hypothesis that there is no difference in the rate of dental development and the occurrence of selected developmental anomalies related to shape, number, structure, and position of teeth between subjects with impacted mandibular canines and those with normally erupted canines. Materials and Methods:  Pretreatment records of 42 subjects diagnosed with mandibular canine impaction (impaction group: IG) were compared with those of 84 subjects serving as a control reference sample (control group: CG). Independent t-tests were used to compare mean dental ages between the groups. Intergroup differences in the distribution of subjects based on the rate of dental development and the occurrence of selected dental anomalies were assessed using χ2 tests. The odds of late, normal, and early developers and of various categories of developmental anomalies in the IG relative to the CG were evaluated in terms of odds ratios. Results:  Mean dental age for the IG was lower than that for the CG in general. Specifically, this was true for girls (P < .05). Differences in the distribution of subjects based on the rate of dental development and the occurrence of positional anomalies also reached statistical significance (P < .05). The IG showed a higher frequency of late developers and positional anomalies compared with controls (odds ratios 3.00 and 2.82, respectively; P < .05). Conclusions:  The null hypothesis was rejected. Female subjects in the IG showed delayed dental development compared with female orthodontic control patients. An increased frequency of positional developmental anomalies was also notable in the IG.


1926 ◽  
Vol 4 (2) ◽  
pp. 186-195 ◽  
Author(s):  
GERT BONNIER

1. The time of development at 25° C. up to the moment of pupation is found to be, for females and males respectively, 116.62 ± 0.19 and 116.78 ± 0.20 hours. During the pupal stage the two times are 111.36 ± 0.15 and 115.46 ± 0.13 hours.
2. At 30° C. the corresponding figures are (in the same order): 99.95 ± 0.49, 103.37 ± 0.43, 78.15 ± 0.50 and 84.26 ± 0.34 hours.
3. These figures show that the differences in the times of development of the two sexes are statistically significant for both periods at 30° C., but only for the pupal stage at 25° C. It is pointed out that the fact that the longer time of male development, as compared with female development at 25° C., is confined to the pupal stage may be correlated with the fact that the essential parts of the secondary sexual characters are developed during this stage.
4. It is shown that there is a negative correlation between the pre-pupal and pupal times of development, indicating that, as a rule, the longer the first period, the shorter the second, and vice versa.
5. With the aid of statistical methods it is shown that the shortening of the time of development at 30° C., as compared with the time at 25° C., is much more pronounced for the pupal than for the pre-pupal stage.
6. This last fact is discussed, and it is emphasised that the ordinary methods of studying the influence of temperature on development are too rough to be of more than descriptive value; the only way of getting deeper insight into the processes of development through temperature studies is the separate study of a number of short intervals.
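
As an illustration of the kind of comparison behind point 3, the reported pupal-stage means and standard errors at 25° C. can be compared with a normal approximation for the difference of two independent means (a sketch under our own assumptions; the paper's exact procedure may differ):

```python
import math
from scipy import stats

# Reported pupal-stage means and standard errors at 25° C. (hours).
female_mean, female_se = 111.36, 0.15
male_mean, male_se = 115.46, 0.13

# Normal approximation for the difference of two independent means.
z = (male_mean - female_mean) / math.sqrt(female_se**2 + male_se**2)
p = 2 * stats.norm.sf(abs(z))
print(f"z = {z:.1f}, two-sided p = {p:.2e}")
```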


1998 ◽  
Vol 21 (2) ◽  
pp. 228-235 ◽  
Author(s):  
Siu L. Chow

Entertaining diverse assumptions about empirical research, commentators give a wide range of verdicts on the NHSTP defence in Statistical significance. The null-hypothesis significance-test procedure (NHSTP) is defended in a framework in which deductive and inductive rules are deployed in theory corroboration in the spirit of Popper's Conjectures and refutations (1968b). The defensible hypothetico-deductive structure of the framework is used to make explicit the distinctions between (1) substantive and statistical hypotheses, (2) statistical alternative and conceptual alternative hypotheses, and (3) making statistical decisions and drawing theoretical conclusions. These distinctions make it easier to show that (1) H0 can be true, (2) the effect size is irrelevant to theory corroboration, and (3) “strong” hypotheses make no difference to NHSTP. Reservations about statistical power, meta-analysis, and the Bayesian approach are still warranted.


1983 ◽  
Vol 20 (1) ◽  
pp. 1-11 ◽  
Author(s):  
Shelby H. McIntyre ◽  
David B. Montgomery ◽  
V. Srinivasan ◽  
Barton A. Weitz

Information for evaluating the statistical significance of stepwise regression models developed with a forward selection procedure is presented. Cumulative distributions of the adjusted coefficient of determination (adjusted R²) under the null hypothesis of no relationship between the dependent variable and m potential independent variables are derived from a Monte Carlo simulation study. The study design included sample sizes of 25, 50, and 100, available independent variables of 10, 20, and 40, and three criteria for including variables in the regression model. The results reveal that the biases involved in testing statistical significance by two well-known rules are very large, thus demonstrating the desirability of using the Monte Carlo cumulative adjusted-R² distributions developed by the authors. Although the results were derived under the assumption of uncorrelated predictors, the authors show that the results continue to be useful for the correlated predictor case.
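
The core idea, building a null distribution of adjusted R² when the dependent variable is unrelated to any of the m candidate predictors and the model is chosen by forward selection, can be sketched as follows. The sample size, number of predictors, and fixed number of inclusion steps here are illustrative placeholders, not the study's design (which used samples of 25, 50, and 100, pools of 10, 20, and 40 predictors, and three inclusion criteria).

```python
import numpy as np

def r_squared(X, y):
    """R^2 from an ordinary least-squares fit with an intercept."""
    Xd = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    resid = y - Xd @ beta
    return 1.0 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))

def forward_selection_adj_r2(X, y, n_select):
    """Greedy forward selection; returns the adjusted R^2 of the final model."""
    n, m = X.shape
    selected = []
    for _ in range(n_select):
        best = max((j for j in range(m) if j not in selected),
                   key=lambda j: r_squared(X[:, selected + [j]], y))
        selected.append(best)
    r2 = r_squared(X[:, selected], y)
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_select - 1)

# Null distribution: y is unrelated to all m candidate predictors.
rng = np.random.default_rng(0)
n, m, n_select, reps = 50, 20, 3, 1000
null_adj_r2 = [forward_selection_adj_r2(rng.standard_normal((n, m)),
                                         rng.standard_normal(n), n_select)
               for _ in range(reps)]
print("95th percentile of adjusted R^2 under the null:",
      round(float(np.quantile(null_adj_r2, 0.95)), 3))
```

Comparing an observed adjusted R² against such a simulated null distribution, rather than against the nominal F-test or a rule of thumb, is the correction the article argues for, since selection inflates the apparent fit under the null.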


2019 ◽  
Vol 2 (3) ◽  
pp. 233-239 ◽  
Author(s):  
Scott A. Cassidy ◽  
Ralitza Dimova ◽  
Benjamin Giguère ◽  
Jeffrey R. Spence ◽  
David J. Stanley

Null-hypothesis significance testing (NHST) is commonly used in psychology; however, it is widely acknowledged that NHST is not well understood by either psychology professors or psychology students. In the current study, we investigated whether introduction-to-psychology textbooks accurately define and explain statistical significance. We examined 30 introductory-psychology textbooks, including the best-selling books from the United States and Canada, and found that 89% incorrectly defined or explained statistical significance. Incorrect definitions and explanations were most often consistent with the odds-against-chance fallacy. These results suggest that it is common for introduction-to-psychology students to be taught incorrect interpretations of statistical significance. We hope that our results will create awareness among authors of introductory-psychology books and provide the impetus for corrective action. To help with classroom instruction, we provide slides that correctly describe NHST and may be useful for introductory-psychology instructors.
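
The odds-against-chance fallacy, reading p = .05 as "a 5% probability that the result is due to chance", can be shown to be wrong with a small simulation (our illustration, with arbitrary choices for the base rate of true effects, the effect size, and the sample size): among significant results, the fraction that actually arose under the null can be far larger than the significance level.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_tests, n_per_group, prior_true, d = 10_000, 30, 0.2, 0.5

p_values, h0_true = [], []
for _ in range(n_tests):
    effect_present = rng.random() < prior_true   # only 20% of tested effects are real
    mu = d if effect_present else 0.0
    x = rng.normal(mu, 1.0, n_per_group)
    y = rng.normal(0.0, 1.0, n_per_group)
    p_values.append(stats.ttest_ind(x, y).pvalue)
    h0_true.append(not effect_present)

p_values, h0_true = np.array(p_values), np.array(h0_true)
sig = p_values < 0.05
print("Fraction of significant results where H0 was actually true:",
      round(float(np.mean(h0_true[sig])), 2))  # typically far above 0.05
```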

