Simple Statistical Tests and P Values

2019 ◽  
pp. 285-299
Author(s):  
Charles H. Goldsmith ◽  
Eric K. Duku ◽  
Achilles Thoma ◽  
Jessica Murphy
2020 ◽  
Vol 132 (6) ◽  
pp. 1970-1976
Author(s):  
Ashwin G. Ramayya ◽  
H. Isaac Chen ◽  
Paul J. Marcotte ◽  
Steven Brem ◽  
Eric L. Zager ◽  
...  

OBJECTIVE: Although it is known that intersurgeon variability in offering elective surgery can have major consequences for patient morbidity and healthcare spending, data addressing variability within neurosurgery are scarce. The authors performed a prospective peer review study of randomly selected neurosurgery cases in order to assess the extent of consensus regarding the decision to offer elective surgery among attending neurosurgeons across one large academic institution.

METHODS: All consecutive patients who had undergone standard inpatient surgical interventions of 1 of 4 types (craniotomy for tumor [CFT], nonacute redo CFT, first-time spine surgery with/without instrumentation, and nonacute redo spine surgery with/without instrumentation) during the period 2015–2017 were retrospectively enrolled (n = 9156 patient surgeries, n = 80 randomly selected individual cases, n = 20 index cases of each type randomly selected for review). The selected cases were scored by attending neurosurgeons using a need for surgery (NFS) score based on clinical data (patient demographics, preoperative notes, radiology reports, and operative notes; n = 616 independent case reviews). Attending neurosurgeon reviewers were blinded as to performing provider and surgical outcome. Aggregate NFS scores across various categories were measured. The authors employed a repeated-measures mixed ANOVA model with autoregressive variance structure to compute omnibus statistical tests across the various surgery types. Interrater reliability (IRR) was measured using Cohen's kappa based on binary NFS scores.

RESULTS: Overall, the authors found that most of the neurosurgical procedures studied were rated as "indicated" by blinded attending neurosurgeons (mean NFS = 88.3, all p values < 0.001) with greater agreement among neurosurgeon raters than expected by chance (IRR = 81.78%, p = 0.016). Redo surgery had lower NFS scores and IRR scores than first-time surgery, both for craniotomy and spine surgery (ANOVA, all p values < 0.01). Spine surgeries with fusion had lower NFS scores than spine surgeries without fusion procedures (p < 0.01).

CONCLUSIONS: There was general agreement among neurosurgeons in terms of indication for surgery; however, revision surgery of all types and spine surgery with fusion procedures had the lowest amount of decision consensus. These results should guide efforts aimed at reducing unnecessary variability in surgical practice with the goal of effective allocation of healthcare resources to advance the value paradigm in neurosurgery.
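To make the agreement statistic concrete, below is a minimal Python sketch of Cohen's kappa on binary "surgery indicated" scores; the two rater vectors are invented for illustration, and the function is a textbook implementation, not the authors' analysis code.

```python
import numpy as np

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters' binary scores (1 = surgery indicated)."""
    a, b = np.asarray(rater_a), np.asarray(rater_b)
    p_observed = np.mean(a == b)  # raw proportion of agreement
    # chance agreement from each rater's marginal rates of "yes" and "no"
    p_expected = a.mean() * b.mean() + (1 - a.mean()) * (1 - b.mean())
    return (p_observed - p_expected) / (1 - p_expected)

# hypothetical binary NFS ratings from two blinded reviewers
rater_1 = [1, 1, 0, 1, 1, 0, 1, 1]
rater_2 = [1, 1, 0, 1, 0, 0, 1, 1]
print(f"kappa = {cohens_kappa(rater_1, rater_2):.3f}")  # ~0.71
```

Kappa discounts the agreement two raters would reach by chance alone, which is why it is preferred over raw percent agreement for reliability claims like the one above.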


2018 ◽  
Author(s):  
Daniel Mortlock

Mathematics is the language of quantitative science, and probability and statistics are the extension of classical logic to real-world data analysis and experimental design. The basics of mathematical functions and probability theory are summarized here, providing the tools for statistical modeling and the assessment of experimental results. There is a focus on the Bayesian approach to such problems (i.e., Bayesian data analysis); therefore, the basic laws of probability are stated, along with several standard probability distributions (e.g., binomial, Poisson, Gaussian). A number of standard classical tests (e.g., p values, the t-test) are also defined and, to the degree possible, linked to the underlying principles of probability theory. This review contains 5 figures, 1 table, and 15 references. Keywords: Bayesian data analysis, mathematical models, power analysis, probability, p values, statistical tests, statistics, survey design
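As a concrete illustration of the two viewpoints the review connects, the sketch below computes a classical two-sample t-test p-value and a Bayesian conjugate posterior for a binomial success rate. SciPy is assumed to be available; all numbers are simulated toy data, not examples from the review.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Classical: two-sample t-test on simulated measurements
group_a = rng.normal(loc=10.0, scale=2.0, size=30)
group_b = rng.normal(loc=11.0, scale=2.0, size=30)
t_stat, p_value = stats.ttest_ind(group_a, group_b)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")

# Bayesian: binomial success rate with a conjugate Beta(1, 1) prior,
# so the posterior after s successes in n trials is Beta(1 + s, 1 + n - s)
successes, trials = 14, 20
posterior = stats.beta(1 + successes, 1 + trials - successes)
lo, hi = posterior.ppf([0.025, 0.975])
print(f"posterior mean = {posterior.mean():.3f}, "
      f"95% credible interval = ({lo:.3f}, {hi:.3f})")
```

The p-value answers "how surprising are these data under the null hypothesis," while the posterior summarizes belief about the parameter itself; that is the distinction the review draws between the classical and Bayesian approaches.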


Entropy ◽  
2020 ◽  
Vol 22 (6) ◽  
pp. 630 ◽  
Author(s):  
Boris Ryabko

The problem of constructing effective statistical tests for random number generators (RNGs) is considered. Currently, there are hundreds of RNG statistical tests that are often combined into so-called batteries, each containing from a dozen to more than one hundred tests. When a battery is used, it is applied to a sequence generated by the RNG, and the calculation time is determined by the length of the sequence and the number of tests. Generally speaking, the longer the sequence, the smaller the deviations from randomness that a specific test can detect. Thus, when a battery is applied, on the one hand, the "better" the tests in the battery, the better the chances of rejecting a "bad" RNG. On the other hand, the larger the battery, the less time can be spent on each test and, therefore, the shorter the test sequence; this, in turn, reduces the ability to find small deviations from randomness. To ease this trade-off, we propose an adaptive way to use batteries (and other sets) of tests that requires less time but, in a certain sense, preserves the power of the original battery. We call this method a time-adaptive battery of tests. The suggested method is based on a theorem describing the asymptotic properties of the p-values of tests. Namely, the theorem claims that, if the RNG can be modeled by a stationary ergodic source, then $-\log \pi(x_1 x_2 \ldots x_n)/n \to 1 - h$ as $n$ grows, where $x_1 x_2 \ldots$ is the generated sequence, $\pi(\cdot)$ is the p-value of the most powerful test, and $h$ is the limit Shannon entropy of the source.
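A minimal sketch of the two-stage idea behind a time-adaptive battery follows: screen every test on a short prefix, then spend the remaining budget re-running only the suspicious tests on a longer sequence. The two component tests (monobit and Wald-Wolfowitz runs) and all thresholds are illustrative stand-ins, not the batteries analyzed in the paper.

```python
import numpy as np
from scipy import stats

def monobit_p(bits):
    """Frequency (monobit) test: two-sided p-value for the ones/zeros balance."""
    s = abs(2 * bits.sum() - len(bits)) / np.sqrt(len(bits))
    return float(2 * stats.norm.sf(s))

def runs_p(bits):
    """Wald-Wolfowitz runs test: too few or too many runs signals structure."""
    n, n1 = len(bits), int(bits.sum())
    n2 = n - n1
    if n1 == 0 or n2 == 0:
        return 0.0  # constant sequence: maximally non-random
    runs = 1 + int(np.count_nonzero(bits[1:] != bits[:-1]))
    mu = 2 * n1 * n2 / n + 1
    var = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
    return float(2 * stats.norm.sf(abs(runs - mu) / np.sqrt(var)))

def time_adaptive_battery(gen, tests, short_n=10_000, long_n=200_000, alpha=0.05):
    """Stage 1: screen all tests cheaply; stage 2: confirm only suspects."""
    short_bits = gen(short_n)
    suspects = [t for t in tests if t(short_bits) < alpha]
    if not suspects:
        return {}
    long_bits = gen(long_n)
    return {t.__name__: t(long_bits) for t in suspects}

rng = np.random.default_rng(1)

def biased(n):
    """Slightly biased bit source standing in for the RNG under test."""
    return (rng.random(n) < 0.51).astype(int)

print(time_adaptive_battery(biased, [monobit_p, runs_p]))
```

This captures the trade-off described above: the short first pass keeps per-test cost low across a large battery, while the long second pass restores sensitivity to small deviations for the few tests that matter.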


2013 ◽  
Vol 690-693 ◽  
pp. 1553-1567 ◽  
Author(s):  
Md Arifuzzaman ◽  
Rafiqul A. Tarefder

This study evaluates the role of antistripping agents in resisting moisture-induced damage in asphalt binders. A total of five different types of antistripping agents are used. Plastomer- and elastomer-modified asphalt binders are used to modify the original base binder. Functionalized and non-functionalized AFM tips are used to determine adhesion in asphalt. With the CH3-functionalized tip, lime is found to be the most effective agent for protecting against moisture damage in the asphalt binder, as the adhesion loss is almost zero. The statistical tests show that the Pearson correlation values are very close to -1, which indicates a strong (negative) correlation among the variables. In addition, the p-values are well below the prescribed significance level of 0.2%, indicating that the test results are statistically significant.
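As a hedged illustration of the reported statistics, the sketch below computes a Pearson correlation coefficient and its p-value with SciPy and checks the p-value against the stated 0.2% level; the moisture-exposure and adhesion numbers are invented for the example, not data from the study.

```python
from scipy import stats

# hypothetical paired measurements: moisture conditioning time vs. adhesion force
moisture_hours = [0, 6, 12, 24, 48, 72]
adhesion_nN = [52.1, 47.8, 44.3, 38.9, 31.2, 26.7]

r, p = stats.pearsonr(moisture_hours, adhesion_nN)
print(f"Pearson r = {r:.3f}, p = {p:.5f}")  # r near -1: strong negative correlation
print("significant at the 0.2% level" if p < 0.002 else "not significant")
```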


2018 ◽  
Author(s):  
Diana Domanska ◽  
Chakravarthi Kanduri ◽  
Boris Simovski ◽  
Geir Kjetil Sandve

Background: The difficulties associated with sequencing and assembling some regions of the DNA sequence result in gaps in the reference genomes, typically represented as stretches of Ns. Although the presence of assembly gaps causes a slight reduction in the mapping rate in many experimental settings, this does not invalidate the typical statistical testing that compares read count distributions across experimental conditions. However, we hypothesize that not handling assembly gaps in the null model may confound statistical testing of the co-localization of genomic features.

Results: First, we performed a series of explorative analyses to understand whether and how public genomic tracks intersect the assembly gaps track (hg19). The findings confirm that the genomic regions in public genomic tracks intersect assembly gaps very little, and that the intersections observed are confined to the beginning and end regions of the assembly gaps rather than spanning the gaps entirely. Further, we simulated a set of query and reference genomic tracks in a way that nullified any dependence between them, in order to test our hypothesis that not avoiding assembly gaps in the null model would result in a spurious inflation of statistical significance. We then contrasted the distributions of test statistics and p-values of Monte Carlo simulation-based permutation tests that either avoided or did not avoid assembly gaps in the null model when testing for significant co-localization between a pair of query and reference tracks. We observed that the statistical tests that did not account for assembly gaps in the null model produced a distribution of the test statistic shifted to the right and a distribution of p-values shifted to the left (leading to inflated significance).

Conclusion: Our results show that not accounting for assembly gaps in statistical testing of co-localization analyses may lead to false positives and over-optimistic findings.
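A minimal sketch of a gap-aware permutation test follows: randomized query intervals are drawn only from non-gap regions, so the Monte Carlo null model respects assembly gaps as the authors recommend. The coordinates and helper functions are illustrative toys, not the authors' pipeline.

```python
import numpy as np

rng = np.random.default_rng(42)

def overlap_count(queries, references):
    """Number of query intervals overlapping at least one reference interval."""
    return sum(
        any(q0 < r1 and r0 < q1 for r0, r1 in references)
        for q0, q1 in queries
    )

def gap_aware_permutation_test(queries, references, valid_regions, n_perm=1000):
    """Place each query interval uniformly within the non-gap regions and
    compare the observed overlap against this gap-aware null distribution."""
    observed = overlap_count(queries, references)
    lengths = [q1 - q0 for q0, q1 in queries]
    sizes = np.array([r1 - r0 for r0, r1 in valid_regions], dtype=float)
    weights = sizes / sizes.sum()  # sample regions proportionally to their size
    null_stats = []
    for _ in range(n_perm):
        placed = []
        for length in lengths:
            r0, r1 = valid_regions[rng.choice(len(valid_regions), p=weights)]
            start = int(rng.integers(r0, max(r1 - length, r0) + 1))
            placed.append((start, start + length))
        null_stats.append(overlap_count(placed, references))
    # permutation p-value with the standard +1 correction
    p = (1 + sum(s >= observed for s in null_stats)) / (n_perm + 1)
    return observed, p

# toy single-chromosome tracks; the assembly gap spans (1000, 3000)
queries = [(100, 200), (900, 950), (4000, 4100)]
references = [(150, 180), (3900, 4050)]
valid_regions = [(0, 1000), (3000, 5000)]
print(gap_aware_permutation_test(queries, references, valid_regions))
```

Dropping the `valid_regions` restriction and sampling from the whole chromosome instead reproduces the flawed null model the authors warn about: randomized intervals land in gaps where real features cannot occur, deflating the null overlap and inflating significance.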


2020 ◽  
Vol 18 (2) ◽  
pp. 2-16
Author(s):  
Christina Chatzipantsiou ◽  
Marios Dimitriadis ◽  
Manos Papadakis ◽  
Michail Tsagris

Resampling-based statistical tests are known to be computationally heavy but reliable when only small sample sizes are available. Despite their nice theoretical properties, not much effort has been put into making them efficient. A computationally efficient method for calculating permutation-based p-values for the Pearson correlation coefficient and the two-independent-samples t-test is proposed. The method is general and can be applied to other, similar cases involving two sample means or two mean vectors.
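A minimal NumPy sketch of the general idea follows: after standardizing both variables once, each permuted Pearson statistic reduces to a dot product, so thousands of permutations collapse into a single matrix-vector product. This is a plain illustration of a vectorized permutation test, not the authors' implementation.

```python
import numpy as np

def perm_pvalue_pearson(x, y, n_perm=9999, seed=0):
    """Two-sided permutation p-value for Pearson's r.
    Standardizing once means r = zx . zy / n, so permuting y only
    re-orders zy and every permuted statistic is one dot product."""
    rng = np.random.default_rng(seed)
    zx = (x - x.mean()) / x.std()
    zy = (y - y.mean()) / y.std()
    n = len(x)
    r_obs = zx @ zy / n
    # one (n_perm x n) block of shuffled indices, then one matrix-vector product
    perms = rng.permuted(np.tile(np.arange(n), (n_perm, 1)), axis=1)
    r_perm = (zy[perms] @ zx) / n
    p = (1 + np.sum(np.abs(r_perm) >= abs(r_obs))) / (n_perm + 1)
    return r_obs, p

rng = np.random.default_rng(1)
x = rng.normal(size=25)
y = 0.5 * x + rng.normal(size=25)  # moderately correlated toy data
print(perm_pvalue_pearson(x, y))
```

Because the data are standardized only once and the permutations are expressed as array indexing, the cost per permutation is a single length-n dot product rather than a full recomputation of the correlation.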

