Testing Significance Testing

Joachim I. Krueger; Patrick R. Heck

doi:10.1525/collabra.108

Testing Significance Testing

Collabra Psychology ◽

10.1525/collabra.108 ◽

2018 ◽

Vol 4 (1) ◽

Cited By ~ 2

Author(s):

Joachim I. Krueger ◽

Patrick R. Heck

Keyword(s):

Posterior Probability ◽

Inductive Inference ◽

Significance Testing ◽

Likelihood Ratios ◽

Simulation Experiments ◽

P Values ◽

Psychological Science ◽

Research Findings ◽

Inductive Inferences ◽

Better Than

The practice of Significance Testing (ST) remains widespread in psychological science despite continual criticism of its flaws and abuses. Using simulation experiments, we address four concerns about ST and for two of these we compare ST’s performance with prominent alternatives. We find the following: First, the p values delivered by ST predict the posterior probability of the tested hypothesis well under many research conditions. Second, low p values support inductive inferences because they are most likely to occur when the tested hypothesis is false. Third, p values track likelihood ratios without raising the uncertainties of relative inference. Fourth, p values predict the replicability of research findings better than confidence intervals do. Given these results, we conclude that p values may be used judiciously as a heuristic tool for inductive inference. Yet, p values cannot bear the full burden of inference. We encourage researchers to be flexible in their selection and use of statistical methods.
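The second finding above (low p values occur mostly when the tested hypothesis is false) can be illustrated with a small simulation. The sketch below is not the authors' simulation design. It assumes a two-group z test with known unit variance, an effect size of 0.5, 30 observations per group, and a 50% prior probability that the null hypothesis is true; all of these values and all function names are illustrative assumptions.

```python
import math
import random

def two_sided_p(z):
    """Two-sided p value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))

def simulate_p(effect, n, rng):
    """p value of a two-group z test with known unit variance (a simplification)."""
    a = [rng.gauss(0.0, 1.0) for _ in range(n)]
    b = [rng.gauss(effect, 1.0) for _ in range(n)]
    z = (sum(b) / n - sum(a) / n) / math.sqrt(2.0 / n)
    return two_sided_p(z)

def count_significant(effect=0.5, n=30, prior_null=0.5,
                      runs=5000, alpha=0.05, seed=1):
    """Count significant results that arise under a true vs. a false null."""
    rng = random.Random(seed)
    sig_null = sig_alt = 0
    for _ in range(runs):
        null_true = rng.random() < prior_null  # assumed prior, not from the paper
        p = simulate_p(0.0 if null_true else effect, n, rng)
        if p < alpha:
            if null_true:
                sig_null += 1
            else:
                sig_alt += 1
    return sig_null, sig_alt

sig_null, sig_alt = count_significant()
# Share of significant results for which the null was actually true.
share_null = sig_null / (sig_null + sig_alt)
print(f"significant under H0: {sig_null}, under H1: {sig_alt}, "
      f"P(H0 | p < .05) ~ {share_null:.2f}")
```

Under these assumptions most significant results come from studies in which the null is false. Changing the assumed prior or the effect size changes the share, which is consistent with the abstract's caution that p values cannot bear the full burden of inference.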

The replication and reproducibility crises: origins and consequences for studies of ecology and evolution

Septentrio Conference Series ◽

10.7557/5.4525 ◽

2018 ◽

Cited By ~ 2

Author(s):

Nigel Gilles Yoccoz

Keyword(s):

Scientific Progress ◽

Present Situation ◽

Open Science ◽

P Values ◽

Psychological Science ◽

Alternative Hypotheses ◽

Research Findings ◽

Fundamental Misunderstanding ◽

Science Collaboration ◽

Published Research

There is a widespread discussion around a scientific crisis, resulting from a lack of reproducibility of published scientific studies. This was exemplified by Ioannidis’ 2005 paper “Why most published research findings are false” or the 2015 Open Science Collaboration study assessing reproducibility of psychological science. An often-cited reason for this reproducibility crisis is a fundamental misunderstanding of what statistical methods, and in particular P-values, can achieve. In the context of studies of ecology and evolution, I will show how 1) the pressure for publishing “novel” results, 2) what Gelman has called the “garden of forking paths”, i.e. the fact that published analyses represent only one out of many possible analyses, and 3) the often fruitless dichotomy between null and alternative hypotheses, have led to the present situation. While scientific progress is dependent on major breakthroughs, we also need to find a better balance between confirmatory research (understanding how known effects vary in size according to the context) and exploratory, non-incremental research (finding new effects).

The logic of p-values

10.31234/osf.io/z9ua2 ◽

2017 ◽

Author(s):

Jose D. Perezgonzalez

Keyword(s):

Null Hypothesis ◽

Formal Logic ◽

Significance Testing ◽

P Value ◽

Null Hypothesis Significance Testing ◽

P Values ◽

Logical Interpretation ◽

Psychological Science ◽

Tests Of Significance

Wagenmakers et al. addressed the illogical use of p-values in 'Psychological Science under Scrutiny'. While historical criticisms mostly deal with the illogical nature of null hypothesis significance testing (NHST), Wagenmakers et al. generalize such argumentation to the p-value itself. Unfortunately, Wagenmakers et al. misinterpret the formal logic basis of tests of significance (and, by extension, of tests of acceptance). This article highlights three instances where such logical interpretation fails and provides plausible corrections and further clarification.

Using Connectionist Modules for Decision Support

Methods of Information in Medicine ◽

10.1055/s-0038-1634790 ◽

1990 ◽

Vol 29 (03) ◽

pp. 167-181 ◽

Cited By ~ 6

Author(s):

G. Hripcsak

Keyword(s):

Decision Support ◽

Standard Deviation ◽

Confidence Interval ◽

Posterior Probability ◽

Back Propagation ◽

Connectionist Model ◽

Test Set ◽

The Third ◽

Independent Test ◽

Better Than

A connectionist model for decision support was constructed out of several back-propagation modules. Manifestations serve as input to the model; they may be real-valued, and the confidence in their measurement may be specified. The model produces as its output the posterior probability of disease. The model was trained on 1,000 cases taken from a simulated underlying population with three conditionally independent manifestations. The first manifestation had a linear relationship between value and posterior probability of disease, the second had a stepped relationship, and the third was normally distributed. An independent test set of 30,000 cases showed that the model was better able to estimate the posterior probability of disease (the standard deviation of residuals was 0.046, with a 95% confidence interval of 0.046-0.047) than a model constructed using logistic regression (with a standard deviation of residuals of 0.062, with a 95% confidence interval of 0.062-0.063). The model fitted the normal and stepped manifestations better than the linear one. It accommodated intermediate levels of confidence well.

p-Values of Likelihood Ratios

Probability and Forensic Evidence ◽

10.1017/9781108596176.010 ◽

2021 ◽

pp. 254-282

Keyword(s):

Likelihood Ratios ◽

P Values

A Frequentist Alternative to Significance Testing, p-Values, and Confidence Intervals

Econometrics ◽

10.3390/econometrics7020026 ◽

2019 ◽

Vol 7 (2) ◽

pp. 26 ◽

Cited By ~ 7

Author(s):

David Trafimow

Keyword(s):

Present Article ◽

Confidence Intervals ◽

Null Hypothesis ◽

A Priori ◽

Significance Testing ◽

Population Parameters ◽

Null Hypothesis Significance Testing ◽

P Values ◽

Statistical Procedures ◽

Major Section

There has been much debate about null hypothesis significance testing, p-values without null hypothesis significance testing, and confidence intervals. The first major section of the present article addresses some of the main reasons these procedures are problematic. The conclusion is that none of them are satisfactory. However, there is a new procedure, termed the a priori procedure (APP), that validly aids researchers in obtaining sample statistics that have acceptable probabilities of being close to their corresponding population parameters. The second major section provides a description and review of APP advances. Not only does the APP avoid the problems that plague other inferential statistical procedures, but it is easy to perform too. Although the APP can be performed in conjunction with other procedures, the present recommendation is that it be used alone.

Application of Bayesian Network in Safety Evaluation of Metro Construction

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.838-841.1463 ◽

2013 ◽

Vol 838-841 ◽

pp. 1463-1468

Author(s):

Xiang Ke Liu ◽

Zhi Shen Wang ◽

Hai Liang Wang ◽

Jun Tao Wang

Keyword(s):

Bayesian Networks ◽

Bayesian Network ◽

Posterior Probability ◽

Fault Tree ◽

Fault Tree Analysis ◽

Safety Evaluation ◽

Tree Analysis ◽

Tunnel Blasting ◽

Better Than

The paper briefly introduces Bayesian networks and discusses the algorithm for transforming a fault tree into a Bayesian network. Taking structural damage caused by tunnel blasting construction as an example, it then describes how to build and evaluate the Bayesian network in MATLAB. Assuming probabilities for the essential events, it calculates the probability of the top event and the posterior probability of each essential event with the Bayesian network. Finally, the paper contrasts the characteristics of fault tree analysis and Bayesian networks, finds that Bayesian networks are better than fault tree analysis for safety evaluation in some cases, and provides a valid way to assess risk in metro construction.
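The calculation described in this abstract (top-event probability, then the posterior probability of each essential event given that the top event occurred) can be sketched by brute-force enumeration. The paper works in MATLAB; the Python sketch below is only an illustration, and the event names, prior probabilities, and gate structure are hypothetical, not taken from the paper. It also assumes the essential events are independent.

```python
from itertools import product

def top_event_analysis(priors, gate):
    """Return P(top) and P(event | top) for independent essential events.

    priors: dict mapping event name -> prior probability of occurrence.
    gate:   function taking a dict of event states (bool) and returning
            True when the top event occurs (the fault-tree logic).
    """
    names = list(priors)
    p_top = 0.0
    joint = {name: 0.0 for name in names}  # P(event and top)
    for states in product([False, True], repeat=len(names)):
        assignment = dict(zip(names, states))
        weight = 1.0
        for name in names:
            weight *= priors[name] if assignment[name] else 1.0 - priors[name]
        if gate(assignment):
            p_top += weight
            for name in names:
                if assignment[name]:
                    joint[name] += weight
    posteriors = {name: joint[name] / p_top for name in names}
    return p_top, posteriors

# Hypothetical essential events and priors, for illustration only.
priors = {"overcharge": 0.05, "short_distance": 0.10, "weak_structure": 0.20}

def damage(e):
    # Hypothetical tree: (overcharge OR short_distance) AND weak_structure.
    return (e["overcharge"] or e["short_distance"]) and e["weak_structure"]

p_top, posteriors = top_event_analysis(priors, damage)
print(f"P(top) = {p_top:.4f}")
for name, value in posteriors.items():
    print(f"P({name} | top) = {value:.4f}")
```

Enumeration grows exponentially with the number of events, so it only suits small trees. The posterior ranking it produces is the kind of diagnostic output that plain fault tree analysis does not give directly.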

Outcomes of full-term infants with bilious vomiting: observational study of a retrieved cohort

Archives of Disease in Childhood ◽

10.1136/archdischild-2013-305724 ◽

2014 ◽

Vol 100 (1) ◽

pp. 14-17 ◽

Cited By ~ 9

Author(s):

Syed Mohinuddin ◽

Pankaj Sakhuja ◽

Benjie Bermundo ◽

Nandiran Ratnavel ◽

Stephen Kempley ◽

...

Keyword(s):

Posterior Probability ◽

Abdominal Distension ◽

Clinical Findings ◽

Likelihood Ratios ◽

Term Infants ◽

X Ray ◽

Bilious Vomiting ◽

Term Newborns ◽

Time Critical ◽

Surgical Condition

Bilious vomiting in a neonate may be a sign of intestinal obstruction often resulting in transfer requests to surgical centres. The aim of this study was to assess the use of clinical findings at referral in predicting outcomes and to determine how often such patients have a time-critical surgical condition (eg, volvulus, where a delay in treatment is likely to compromise gut viability). Methods: 4-year data and outcomes of all term newborns aged ≤7 days with bilious vomiting transferred by a regional transfer service were analysed. Specificity, sensitivity, likelihood ratios, correlations, prior and posterior probability of clinical findings in predicting newborns with surgical diagnosis were calculated. Results: Of 163 neonates with bilious vomiting, 75 (46%) had a surgical diagnosis and 23 (14.1%) had a time-critical surgical condition. The diagnosis of a surgical condition in neonates with bilious vomiting was significantly associated with abdominal distension (χ²=5.17, p=0.023), abdominal tenderness (χ²=5.90, p=0.015) and abnormal abdominal X-ray findings (χ²=5.68, p=0.017) but not with palpation findings of a soft as compared with a tense abdomen (χ²=3.21, p=0.073). Abnormal abdominal X-ray, abdominal distension and tenderness had 97%, 74% and 62% sensitivity, respectively, with regard to association with an underlying surgical diagnosis. Normal abdominal X-ray reduced the posterior probability of surgical diagnosis from 50% to 16%. Overall, clinical findings at referral did not differentiate between infants with or without surgical or time-critical condition. Conclusions: We recommend that term neonates with bilious vomiting referred for transfer are prioritised as time critical.
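The step from a prior to a posterior probability reported here follows the odds form of Bayes' rule: posterior odds = prior odds × likelihood ratio. The sketch below shows that arithmetic and backs out the likelihood ratio implied by the reported change from 50% to 16%. The abstract does not state this likelihood ratio, so the derived value is an inference from the two reported probabilities, and the function names are illustrative.

```python
def posterior_probability(prior, likelihood_ratio):
    """Update a prior probability with a likelihood ratio (odds form of Bayes' rule)."""
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)

def implied_likelihood_ratio(prior, posterior):
    """Likelihood ratio that would move `prior` to `posterior`."""
    return (posterior / (1.0 - posterior)) / (prior / (1.0 - prior))

# Reported in the abstract: a normal X-ray moves 50% down to 16%.
lr_negative = implied_likelihood_ratio(0.50, 0.16)
print(f"implied negative likelihood ratio: {lr_negative:.3f}")
print(f"check: {posterior_probability(0.50, lr_negative):.2f}")
```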

The Application of Baum-Welch Algorithm in Multistep Attack

The Scientific World JOURNAL ◽

10.1155/2014/374260 ◽

2014 ◽

Vol 2014 ◽

pp. 1-7 ◽

Cited By ~ 1

Author(s):

Yanxue Zhang ◽

Dongmei Zhao ◽

Jinxing Liu

Keyword(s):

Markov Model ◽

Hidden Markov Model ◽

Hidden Markov Models ◽

Viterbi Algorithm ◽

Markov Models ◽

Hidden Markov ◽

Simulation Experiments ◽

Forward Algorithm ◽

Better Than

The biggest difficulty in applying hidden Markov models to multistep attacks is determining the observations. Research on how to determine the observations is still lacking, and current practice involves a certain degree of subjectivity. To address this, we integrate attack intentions with the hidden Markov model (HMM) and propose a method for forecasting multistep attacks based on the HMM. First, we train the existing hidden Markov model(s) with the Baum-Welch algorithm. Then we recognize the alerts belonging to attack scenarios with the Forward algorithm. Finally, we forecast the next possible attack sequence with the Viterbi algorithm. The results of simulation experiments show that the trained hidden Markov models are better than the untrained ones in recognition and prediction.
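The recognition and decoding steps named in this abstract can be sketched on a toy HMM. The sketch below implements only the Forward algorithm (likelihood of an alert sequence) and the Viterbi algorithm (most likely hidden state sequence); Baum-Welch training is omitted. The state names, alert names, and all probabilities are hypothetical and are not taken from the paper.

```python
# Hypothetical two-state attack model; every number here is illustrative.
STATES = ("probe", "exploit")
START = {"probe": 0.8, "exploit": 0.2}
TRANS = {"probe": {"probe": 0.6, "exploit": 0.4},
         "exploit": {"probe": 0.1, "exploit": 0.9}}
EMIT = {"probe": {"scan_alert": 0.7, "shell_alert": 0.3},
        "exploit": {"scan_alert": 0.2, "shell_alert": 0.8}}

def forward(obs):
    """Forward algorithm: probability of the observation sequence under the model."""
    alpha = {s: START[s] * EMIT[s][obs[0]] for s in STATES}
    for o in obs[1:]:
        alpha = {s: EMIT[s][o] * sum(alpha[r] * TRANS[r][s] for r in STATES)
                 for s in STATES}
    return sum(alpha.values())

def viterbi(obs):
    """Viterbi algorithm: most likely hidden state path for the observations."""
    delta = {s: START[s] * EMIT[s][obs[0]] for s in STATES}
    backpointers = []
    for o in obs[1:]:
        step, new_delta = {}, {}
        for s in STATES:
            best_prev = max(STATES, key=lambda r: delta[r] * TRANS[r][s])
            step[s] = best_prev
            new_delta[s] = delta[best_prev] * TRANS[best_prev][s] * EMIT[s][o]
        backpointers.append(step)
        delta = new_delta
    state = max(STATES, key=lambda s: delta[s])
    path = [state]
    for step in reversed(backpointers):  # trace the best path backwards
        state = step[state]
        path.append(state)
    return path[::-1]

alerts = ["scan_alert", "shell_alert", "shell_alert"]
print("sequence likelihood:", forward(alerts))
print("most likely stages:", viterbi(alerts))
```

In a pipeline like the one the abstract describes, the parameter tables would come from Baum-Welch training rather than being set by hand as they are here.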

Research on thermal property and temperature rating prediction of Mongolian robe ensembles

International Journal of Clothing Science and Technology ◽

10.1108/ijcst-03-2017-0030 ◽

2018 ◽

Vol 30 (6) ◽

pp. 747-756 ◽

Cited By ~ 1

Author(s):

Xiaofang Guo ◽

Hui Shi ◽

Chenglong Wei ◽

Xiao Dong Chen

Keyword(s):

Prediction Model ◽

Thermal Property ◽

Thermal Insulation ◽

Design Methodology ◽

Environmental Adaptation ◽

Photographic Method ◽

Mongolian Plateau ◽

Content Type ◽

Research Findings ◽

Better Than

Purpose: The purpose of this paper is to reveal the unique thermal property of Mongolian clothing compared with current western clothing and to explain its environmental adaptation to the climate of the Mongolian plateau in China. Design/methodology/approach: Thermal insulation and the temperature rating (TR) of eight Mongolian robe ensembles and two western clothing ensembles were investigated by manikin testing and wearing trials, respectively. The clothing area factor (fcl) of the Mongolian clothing was measured by the photographic method and estimated with the equation from ISO 15831. Finally, the TR prediction model for Mongolian clothing was built and compared with current models for western clothing in ISO 7730 and for Tibetan clothing in a previous article. Findings: The results demonstrated that the total thermal insulation of Mongolian robe ensembles was much greater than that of western clothing ensembles and ranged from 1.81 clo to 3.11 clo over the whole year. The fcl of the Mongolian clothing should be determined by the photographic method because the differences between the two methods were large, ranging from 0.6 to 13.9 percent. The TR prediction model for Mongolian robe ensembles is TR=25.57−7.13Icl, which revealed that the environmental adaptation of Mongolian clothing was much better than that of western clothing and similar to that of Tibetan clothing. Originality/value: The research findings give detailed information about the thermal property of Chinese Mongolian clothing and explain the environmental adaptation of Mongolian clothing to the cold and changing climate.
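The reported prediction model is a linear function of clothing insulation, so it can be evaluated directly. The sketch below applies TR=25.57−7.13Icl to two illustrative Icl values. These inputs are hypothetical and are not measurements from the paper, and the units (Icl in clo, TR in degrees Celsius) are an assumption, since the abstract does not state them.

```python
def temperature_rating(icl_clo):
    """Temperature rating from the abstract's model: TR = 25.57 - 7.13 * Icl.

    icl_clo is assumed to be clothing insulation in clo; the result is
    assumed to be in degrees Celsius.
    """
    return 25.57 - 7.13 * icl_clo

# Hypothetical insulation values chosen only to show the trend.
for icl in (1.0, 2.0):
    print(f"Icl = {icl:.1f} clo -> TR = {temperature_rating(icl):.2f}")
```

Each additional clo of insulation lowers the predicted temperature rating by 7.13 units in this model.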

Erratum

Sociological Research Online ◽

10.1177/1360780417731066 ◽

2017 ◽

Vol 22 (4) ◽

pp. 255-255

Keyword(s):

T Test ◽

Significance Testing ◽

Sociological Research ◽

P Values ◽

Power Of The Test ◽

Large N

Gorard S (2016) Damaging Real Lives Through Obstinacy: Re-emphasising Why Significance Testing is Wrong. Sociological Research Online 21(1): 2. DOI: 10.5153/sro.3857 It has been brought to the attention of the Editors and the Publishers that some corrections requested by the author while reviewing the proofs had inadvertently been missed ahead of first publication of the above article on 28 February 2016. The author’s corrections were incorporated into subsequent versions of the online article, and there was an unintentional delay in uploading the corrected PDF version of the article online. For clarity of the scientific record, the corrections are outlined in this erratum: In paragraph 4.4 The sentence ‘On the first run, 1217 p-values were below 0.05 (this represents around 5.5% of the samples).’ was corrected as follows: ‘On the first run, 1217 p-values were below 0.05 (this represents around 12% of the samples)’. In paragraph 4.5 The sentence ‘Lack of normality may reduce the so-called “power” of the test slightly, but with 268 cases (deemed a very large N in most resources), this has been shown not to matter ( http://thestatsgeek.com/2013/09/28/the-t-test-and-robustness-to-non-normality/ )’ was corrected as follows: ‘Lack of normality may reduce the so-called “power” of the test slightly, but with 200 cases (deemed a very large N in most resources), this has been shown not to matter ( http://thestatsgeek.com/2013/09/28/the-t-test-and-robustness-to-non-normality/ )’ Sociological Research Online apologises to the author and the readers for any inconvenience this may have caused. The correct and citable version of the article is accessible at the following DOI: 10.5153/sro.3857
