Indices of Rank Histogram Flatness and Their Sampling Properties

2019 ◽  
Vol 147 (2) ◽  
pp. 763-769 ◽  
Author(s):  
D. S. Wilks

Abstract. Quantitative evaluation of the flatness of the verification rank histogram can be approached through formal hypothesis testing. Traditionally, the familiar χ2 test has been used for this purpose. Recently, two alternatives—the reliability index (RI) and an entropy statistic (Ω)—have been suggested in the literature. This paper presents approximations to the sampling distributions of these latter two rank histogram flatness metrics, and compares the statistical power of tests based on the three statistics in a controlled setting. The χ2 test is generally most powerful (i.e., most sensitive to violations of the null hypothesis of rank uniformity), although for overdispersed ensembles and small sample sizes, the test based on the entropy statistic Ω is more powerful. The RI-based test is preferred only for unbiased forecasts with small ensembles and very small sample sizes.
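
A minimal sketch of how the three flatness statistics can be computed from rank-histogram bin counts. The χ2 test uses scipy directly; the RI and Ω formulas below follow common conventions (summed absolute deviation from uniformity, and normalized Shannon entropy) and are assumptions rather than the paper's exact definitions.

```python
# Sketch: chi-square, reliability index (RI), and entropy (Omega) statistics
# for rank-histogram flatness. RI and Omega use common textbook definitions,
# which may differ in detail from the paper's conventions.
import numpy as np
from scipy import stats

def flatness_statistics(counts):
    counts = np.asarray(counts, dtype=float)
    k = counts.size                 # number of rank-histogram bins (ensemble size + 1)
    n = counts.sum()                # number of verification cases
    p = counts / n                  # observed relative frequencies

    # Pearson chi-square test against the uniform null (expected n / k per bin)
    chi2, p_value = stats.chisquare(counts, f_exp=np.full(k, n / k))

    # Reliability index: summed absolute deviation of frequencies from uniformity
    ri = np.sum(np.abs(p - 1.0 / k))

    # Entropy statistic: Shannon entropy of p, normalized so a flat histogram gives 1
    nonzero = p[p > 0]
    omega = -np.sum(nonzero * np.log(nonzero)) / np.log(k)

    return chi2, p_value, ri, omega

# Example: hypothetical rank histogram from 500 cases with a 9-member ensemble
counts = [62, 55, 48, 47, 43, 44, 46, 49, 50, 56]
print(flatness_statistics(counts))
```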


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Florent Le Borgne ◽  
Arthur Chatton ◽  
Maxime Léger ◽  
Rémi Lenain ◽  
Yohann Foucher

Abstract. In clinical research, there is a growing interest in the use of propensity score-based methods to estimate causal effects. G-computation is an alternative because of its high statistical power. Machine learning is also increasingly used because of its possible robustness to model misspecification. In this paper, we aimed to propose an approach that combines machine learning and G-computation when both the outcome and the exposure status are binary and that is able to deal with small samples. We evaluated the performances of several methods, including penalized logistic regressions, a neural network, a support vector machine, boosted classification and regression trees, and a super learner, through simulations. We proposed six different scenarios characterised by various sample sizes, numbers of covariates, and relationships between covariates, exposure statuses, and outcomes. We also illustrated the application of these methods by using them to estimate the efficacy of barbiturates prescribed during the first 24 h of an episode of intracranial hypertension. In the context of G-computation, for estimating the individual outcome probabilities in two counterfactual worlds, we found that the super learner tended to outperform the other approaches in terms of both bias and variance, especially for small sample sizes. The support vector machine also performed well, but its mean bias was slightly higher than that of the super learner. In the investigated scenarios, G-computation combined with the super learner was a performant method for drawing causal inferences, even from small sample sizes.
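
A minimal G-computation sketch for a binary exposure and binary outcome, using a plain scikit-learn logistic regression as the outcome model in place of the super learner the authors evaluate; the simulated data and effect sizes are illustrative only.

```python
# G-computation sketch: fit an outcome model, predict each subject's outcome
# probability in the two counterfactual worlds (everyone exposed / unexposed),
# and average. The paper plugs a super learner (or SVM, neural network,
# boosted trees) into the same outcome-model step; this uses logistic regression.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 200                                               # deliberately small sample
X = rng.normal(size=(n, 3))                           # baseline covariates
A = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))       # exposure depends on covariates
Y = rng.binomial(1, 1 / (1 + np.exp(-(0.5 * A + X[:, 1]))))  # binary outcome

# 1) Fit the outcome model Q(A, X) = P(Y = 1 | A, X)
outcome_model = LogisticRegression(max_iter=1000).fit(np.column_stack([A, X]), Y)

# 2) Predict counterfactual outcome probabilities with exposure forced to 1 and 0
risk_exposed = outcome_model.predict_proba(np.column_stack([np.ones(n), X]))[:, 1]
risk_unexposed = outcome_model.predict_proba(np.column_stack([np.zeros(n), X]))[:, 1]

# 3) Average over the sample to obtain marginal (causal) effect estimates
p1, p0 = risk_exposed.mean(), risk_unexposed.mean()
print(f"Risk difference: {p1 - p0:.3f}")
print(f"Marginal odds ratio: {(p1 / (1 - p1)) / (p0 / (1 - p0)):.3f}")
```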


2016 ◽  
Vol 2 (1) ◽  
pp. 41-54
Author(s):  
Ashleigh Saunders ◽  
Karen E. Waldie

Purpose – Autism spectrum disorder (ASD) is a lifelong neurodevelopmental condition for which there is no known cure. The rate of psychiatric comorbidity in autism is extremely high, which raises questions about the nature of the co-occurring symptoms: it is unclear whether these additional conditions are true comorbid conditions or can simply be accounted for by the ASD diagnosis. The paper aims to discuss this issue.
Design/methodology/approach – A number of questionnaires and a computer-based task were used in the current study. The authors asked the participants about symptoms of ASD, attention deficit hyperactivity disorder (ADHD) and anxiety, as well as about overall adaptive functioning.
Findings – The results demonstrate that the conditions, in their pure forms, can be clearly differentiated from one another (and from neurotypical controls). Further analyses revealed that when ASD occurs together with anxiety, anxiety appears to be a separate condition. In contrast, there is no clear behavioural profile when ASD and ADHD co-occur.
Research limitations/implications – First, due to small sample sizes, some analyses were targeted to specific groups (i.e., comparing the ADHD and ASD groups with the comorbid ADHD+ASD group). Larger sample sizes would have given the statistical power to perform a full-scale comparative analysis of all experimental groups when split by their comorbid conditions. Second, males were over-represented in the ASD group and females were over-represented in the anxiety group, owing to the uneven gender balance in the prevalence of these conditions. Lastly, the main profiling techniques used were questionnaires; clinical interviews would have been preferable, as they give a more objective account of behavioural difficulties.
Practical implications – The rate of psychiatric comorbidity in autism is extremely high, which raises questions about the nature of the co-occurring symptoms. It is unclear whether these additional conditions are true comorbid conditions or can simply be accounted for through the ASD diagnosis.
Social implications – This information will be important not only to healthcare practitioners when making a diagnosis, but also to therapists who need to apply evidence-based treatment to comorbid and stand-alone conditions.
Originality/value – This study is the first to investigate the nature of co-existing conditions in ASD in a New Zealand population.


2019 ◽  
Vol 50 (2) ◽  
pp. 127-132 ◽  
Author(s):  
Christopher F. Chabris ◽  
Patrick R. Heck ◽  
Jaclyn Mandart ◽  
Daniel J. Benjamin ◽  
Daniel J. Simons

Abstract. Williams and Bargh (2008) reported that holding a hot cup of coffee caused participants to judge a person’s personality as warmer and that holding a therapeutic heat pad caused participants to choose rewards for other people rather than for themselves. These experiments featured large effects (r = .28 and .31), small sample sizes (41 and 53 participants), and barely statistically significant results. We attempted to replicate both experiments in field settings with more than triple the sample sizes (128 and 177) and double-blind procedures, but found near-zero effects (r = −.03 and .02). In both cases, Bayesian analyses suggest there is substantially more evidence for the null hypothesis of no effect than for the original physical warmth priming hypothesis.
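
As an illustration of the kind of evidence comparison described (not the authors' actual analysis), the sketch below approximates a Bayes factor for "zero correlation" versus "nonzero correlation" from a reported r and sample size, using the BIC approximation of Wagenmakers (2007); the numbers plugged in are the effect sizes and sample sizes quoted in the abstract.

```python
# Rough Bayes factor BF01 (evidence for the null of zero correlation) from a
# Pearson r and sample size n, via the BIC approximation: compare a
# one-predictor linear regression to an intercept-only model. Illustrative
# only; the authors' Bayesian analysis may use a different default test.
import numpy as np

def bf01_from_correlation(r, n):
    sse_full = 1.0 - r**2          # residual variance ratio of the full model
    bic_null = 1 * np.log(n)       # intercept only (SSE ratio 1, so its log term is 0)
    bic_full = n * np.log(sse_full) + 2 * np.log(n)
    return np.exp((bic_full - bic_null) / 2.0)

# Original studies: large effects but small n
print(bf01_from_correlation(0.28, 41), bf01_from_correlation(0.31, 53))
# Replications: near-zero effects with larger n -> clearer support for the null
print(bf01_from_correlation(-0.03, 128), bf01_from_correlation(0.02, 177))
```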


2018 ◽  
Vol 13 (4) ◽  
pp. 403-408 ◽  
Author(s):  
Jeff Bodington ◽  
Manuel Malfeito-Ferreira

Abstract. Much research shows that women and men have different taste acuities and preferences. If female and male judges tend to assign different ratings to the same wines, then the gender balance of the judge panels will bias awards. Existing research supports the null hypothesis; however, that finding is based on small sample sizes. This article presents the results for a large sample: 260 wines and 1,736 wine-score observations. Subject to the strong qualification that non-gender-related variation is material, the results affirm that female and male judges do assign about the same ratings to the same wines. The expected value of the difference in their mean ratings is zero. (JEL Classifications: A10, C00, C10, C12, D12)
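
A simplified sketch of testing the null hypothesis that female and male judges assign the same mean ratings, here as a permutation test on pooled scores; the data below are placeholders, and the article's actual analysis accounts for the non-gender-related (e.g., wine-to-wine) variation that this sketch ignores.

```python
# Permutation test of equal mean ratings between female and male judges.
# Placeholder data; not the 1,736 observations analyzed in the article.
import numpy as np

rng = np.random.default_rng(42)

def permutation_test_mean_diff(scores_f, scores_m, n_perm=10_000):
    observed = scores_f.mean() - scores_m.mean()
    pooled = np.concatenate([scores_f, scores_m])
    n_f = scores_f.size
    hits = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)          # reassign scores to genders at random
        diff = perm[:n_f].mean() - perm[n_f:].mean()
        if abs(diff) >= abs(observed):
            hits += 1
    return observed, hits / n_perm

# Placeholder scores on a 100-point scale
scores_f = rng.normal(88, 3, size=900)
scores_m = rng.normal(88, 3, size=836)
diff, p = permutation_test_mean_diff(scores_f, scores_m)
print(f"Mean difference: {diff:.2f}, permutation p-value: {p:.3f}")
```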


2016 ◽  
Author(s):  
Brian Keith Lohman ◽  
Jesse N Weber ◽  
Daniel I Bolnick

RNAseq is a relatively new tool for ecological genetics that offers researchers insight into changes in gene expression in response to a myriad of natural or experimental conditions. However, standard RNAseq methods (e.g., Illumina TruSeq® or NEBNext®) can be cost prohibitive, especially when study designs require large sample sizes. Consequently, RNAseq is often underused as a method, or is applied to small sample sizes that confer poor statistical power. Low-cost RNAseq methods could therefore enable far greater and more powerful applications of transcriptomics in ecological genetics and beyond. Standard mRNAseq is costly partly because one sequences portions of the full length of all transcripts. Such whole-mRNA data is redundant for estimates of relative gene expression. TagSeq is an alternative method that focuses sequencing effort on the 3′ end of mRNAs, thereby reducing the necessary sequencing depth per sample, and thus cost. Here we present a revised TagSeq protocol and compare its performance against NEBNext®, the gold-standard whole-mRNAseq method. We built both TagSeq and NEBNext® libraries from the same biological samples, each spiked with control RNAs. We found that TagSeq measured the control RNA distribution more accurately than NEBNext®, for a fraction of the cost per sample (~10%). The higher accuracy of TagSeq was particularly apparent for transcripts of moderate to low abundance. Technical replicates of TagSeq libraries were highly correlated with each other and with NEBNext® results. Overall, we show that our modified TagSeq protocol is an efficient alternative to traditional whole mRNAseq, offering researchers comparable data at greatly reduced cost.
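
An illustrative sketch of the spike-in accuracy check described: correlate observed counts for RNA control sequences against their known input concentrations on a log scale, separately for each library type. All values below are placeholders, not the study's measurements.

```python
# Compare how well each library preparation recovers known spike-in
# concentrations, using a log-log Pearson correlation. Placeholder numbers.
import numpy as np
from scipy import stats

known_conc = np.array([0.5, 1, 2, 4, 8, 16, 32, 64])            # relative input amounts
tagseq_counts = np.array([11, 25, 46, 98, 210, 395, 820, 1650])
nebnext_counts = np.array([9, 30, 40, 120, 180, 430, 700, 1900])

for name, counts in [("TagSeq", tagseq_counts), ("NEBNext", nebnext_counts)]:
    r, p = stats.pearsonr(np.log2(known_conc), np.log2(counts + 1))
    print(f"{name}: log-log Pearson r = {r:.3f} (p = {p:.2g})")
```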


2021 ◽  
pp. 016327872110243
Author(s):  
Donna Chen ◽  
Matthew S. Fritz

Although the bias-corrected (BC) bootstrap is an often-recommended method for testing mediation due to its higher statistical power relative to other tests, it has also been found to have elevated Type I error rates with small sample sizes. Under limitations for participant recruitment, obtaining a larger sample size is not always feasible. Thus, this study examines whether using alternative corrections for bias in the BC bootstrap test of mediation for small sample sizes can achieve equal levels of statistical power without the associated increase in Type I error. A simulation study was conducted to compare Efron and Tibshirani’s original correction for bias, z0, to six alternative corrections for bias: (a) mean, (b–e) Winsorized mean with 10%, 20%, 30%, and 40% trimming in each tail, and (f) medcouple (robust skewness measure). Most variation in Type I error (given a medium effect size of one regression slope and zero for the other slope) and power (small effect size in both regression slopes) was found with small sample sizes. Recommendations for applied researchers are made based on the results. An empirical example using data from the ATLAS drug prevention intervention study is presented to illustrate these results. Limitations and future directions are discussed.
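
A minimal sketch of the bias-corrected bootstrap test of an indirect (mediated) effect a*b with Efron and Tibshirani's original z0 correction; the simulated data, small n, and effect sizes are illustrative, and the alternative corrections studied in the paper replace how z0 is estimated.

```python
# Bias-corrected (BC) bootstrap confidence interval for the indirect effect a*b.
# z0 is estimated from the fraction of bootstrap estimates falling below the
# original estimate (Efron & Tibshirani's correction). Illustrative simulated data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, n_boot = 50, 5000                                  # small sample, as in the paper's focus
X = rng.normal(size=n)
M = 0.3 * X + rng.normal(size=n)                      # mediator model (a path)
Y = 0.3 * M + rng.normal(size=n)                      # outcome model (b path), no direct effect

def indirect_effect(x, m, y):
    a = np.polyfit(x, m, 1)[0]                        # slope of M on X
    design = np.column_stack([m, x, np.ones_like(x)]) # Y regressed on M and X
    b = np.linalg.lstsq(design, y, rcond=None)[0][0]
    return a * b

ab_hat = indirect_effect(X, M, Y)
boot = np.empty(n_boot)
for i in range(n_boot):
    idx = rng.integers(0, n, n)                       # resample cases with replacement
    boot[i] = indirect_effect(X[idx], M[idx], Y[idx])

z0 = stats.norm.ppf(np.mean(boot < ab_hat))           # bias-correction constant
alpha = 0.05
lo = stats.norm.cdf(2 * z0 + stats.norm.ppf(alpha / 2))
hi = stats.norm.cdf(2 * z0 + stats.norm.ppf(1 - alpha / 2))
ci = np.quantile(boot, [lo, hi])
print(f"ab = {ab_hat:.3f}, BC 95% CI = [{ci[0]:.3f}, {ci[1]:.3f}]")
```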


2020 ◽  
Vol 117 (32) ◽  
pp. 19151-19158 ◽  
Author(s):  
M.-A. C. Bind ◽  
D. B. Rubin

In randomized experiments, Fisher-exact P values are available and should be used to help evaluate results rather than the more commonly reported asymptotic P values. One reason is that using the latter can effectively alter the question being addressed by including irrelevant distributional assumptions. The Fisherian statistical framework, proposed in 1925, calculates a P value in a randomized experiment by using the actual randomization procedure that led to the observed data. Here, we illustrate this Fisherian framework in a crossover randomized experiment. First, we consider the first period of the experiment and analyze its data as a completely randomized experiment, ignoring the second period; then, we consider both periods. For each analysis, we focus on 10 outcomes that illustrate important differences between the asymptotic and Fisher tests for the null hypothesis of no ozone effect. For some outcomes, the traditional P value based on the approximating asymptotic Student’s t distribution substantially subceeded the minimum attainable Fisher-exact P value. For the other outcomes, the Fisher-exact null randomization distribution substantially differed from the bell-shaped one assumed by the asymptotic t test. Our conclusions: When researchers choose to report P values in randomized experiments, 1) Fisher-exact P values should be used, especially in studies with small sample sizes, and 2) the shape of the actual null randomization distribution should be examined for the recondite scientific insights it may reveal.
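
A minimal sketch of the Fisherian approach for the first-period analysis: under the sharp null of no ozone effect for any unit, re-draw treatment assignments with the same randomization procedure and locate the observed test statistic in that null randomization distribution. Group sizes and outcome values below are placeholders.

```python
# Fisher randomization test under the sharp null of no treatment effect.
# The P value comes from the actual (here Monte Carlo-approximated)
# randomization distribution rather than an asymptotic t distribution.
import numpy as np

rng = np.random.default_rng(7)
outcome = rng.normal(size=20)                              # placeholder outcomes for 20 units
treated = rng.permutation(np.repeat([True, False], 10))    # 10 of 20 units exposed

obs_stat = outcome[treated].mean() - outcome[~treated].mean()

n_draws = 100_000
null_stats = np.empty(n_draws)
for i in range(n_draws):
    assignment = rng.permutation(treated)                  # re-run the randomization procedure
    null_stats[i] = outcome[assignment].mean() - outcome[~assignment].mean()

p_value = np.mean(np.abs(null_stats) >= abs(obs_stat))
print(f"Observed difference: {obs_stat:.3f}, randomization P value: {p_value:.4f}")
# A histogram of null_stats shows the shape of the actual null randomization
# distribution, which the authors recommend examining.
```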


2009 ◽  
Vol 4 (3) ◽  
pp. 294-298 ◽  
Author(s):  
Tal Yarkoni

Vul, Harris, Winkielman, and Pashler (2009, this issue) argue that correlations in many cognitive neuroscience studies are grossly inflated due to a widespread tendency to use nonindependent analyses. In this article, I argue that Vul et al.'s primary conclusion is correct, but for different reasons than they suggest. I demonstrate that the primary cause of grossly inflated correlations in whole-brain fMRI analyses is not nonindependence, but the pernicious combination of small sample sizes and stringent alpha-correction levels. Far from defusing Vul et al.'s conclusions, the simulations presented here suggest that the level of inflation may be even worse than Vul et al.'s empirical analysis would suggest.
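
A small simulation in the spirit of this argument: with a modest true correlation, a small sample, and a stringent per-test alpha, the tests that survive thresholding necessarily report inflated correlations. All parameter values are illustrative.

```python
# Selection-induced inflation: only voxels whose sample correlation clears a
# stringent threshold survive, so their average |r| exceeds the true r.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
true_r, n, n_voxels, alpha = 0.3, 20, 50_000, 0.001     # small n, stringent alpha

observed = np.empty(n_voxels)
for v in range(n_voxels):
    x = rng.normal(size=n)
    y = true_r * x + np.sqrt(1 - true_r**2) * rng.normal(size=n)
    observed[v] = np.corrcoef(x, y)[0, 1]

# Critical |r| corresponding to the alpha threshold (two-sided, df = n - 2)
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 2)
r_crit = t_crit / np.sqrt(n - 2 + t_crit**2)

surviving = observed[np.abs(observed) >= r_crit]
print(f"True r = {true_r}; mean |r| among the {surviving.size} surviving voxels "
      f"= {np.abs(surviving).mean():.2f}")
```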

