How to use and report Bayesian hypothesis tests

Mapping Intimacies ◽

10.31234/osf.io/bua5n ◽

2020 ◽

Cited By ~ 1

Author(s):

Zoltan Dienes

Keyword(s):

Bayes Factor ◽

Hypothesis Test ◽

Effect Sizes ◽

Conscious Perception ◽

Hypothesis Tests ◽

Bayesian Analyses ◽

Interesting Effect ◽

Frequentist Statistics ◽

Scale Of Effect ◽

Bayesian Hypothesis Test

This article provides guidance on interpreting and reporting Bayesian hypothesis tests, in order to aid their understanding. To use and report a Bayesian hypothesis test, predicted effect sizes must be specified. The paper will provide guidance in specifying effect sizes of interest (which also will be of relevance to those using frequentist statistics). First, if a minimally interesting effect size can be specified, a null interval is defined as the effects smaller in magnitude than the minimally interesting effect. Then the proportion of the posterior distribution that falls in the null interval indicates the plausibility of the null interval hypothesis. Second, if a rough scale of effect can be determined, a Bayes factor can indicate evidence for a model representing that scale of effect versus a model of H0. Both methods allow data to count against a theory that predicts a difference. By contrast, non-significance does not count against such a theory. Various examples are provided including the suitability of Bayesian analyses for demonstrating the absence of conscious perception under putative subliminal conditions, and its presence in supraliminal conditions.
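
The two approaches named in this abstract can be sketched numerically. The following is a minimal illustration, not the paper's own procedure. It assumes the observed effect is normally distributed around the true effect with a known standard error, that the posterior comes from a flat prior, and that H1 is modelled as a half-normal distribution whose standard deviation equals the rough scale of effect. The function names and example numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def null_interval_probability(obs, se, min_effect):
    """Posterior mass inside the null interval [-min_effect, +min_effect].

    Assumption: flat prior, so the posterior for the true effect is
    Normal(obs, se).
    """
    posterior = NormalDist(mu=obs, sigma=se)
    return posterior.cdf(min_effect) - posterior.cdf(-min_effect)


def bayes_factor_half_normal(obs, se, scale):
    """Bayes factor for H1 (half-normal prior, SD = scale) versus a point H0 at 0.

    Assumption: normal likelihood with known standard error `se`.
    """
    total_sd = sqrt(se ** 2 + scale ** 2)
    # Closed-form marginal likelihood of the data under the half-normal prior.
    marginal_h1 = (
        2
        * NormalDist(0, total_sd).pdf(obs)
        * NormalDist().cdf(obs * scale / (se * total_sd))
    )
    # Likelihood of the data if the true effect is exactly zero.
    likelihood_h0 = NormalDist(0, se).pdf(obs)
    return marginal_h1 / likelihood_h0


if __name__ == "__main__":
    # Hypothetical numbers: observed effect 0.5, standard error 1.0.
    print(null_interval_probability(obs=0.5, se=1.0, min_effect=2.0))
    print(bayes_factor_half_normal(obs=0.5, se=1.0, scale=5.0))
```

Under these assumptions, a Bayes factor well below 1 means the data favour H0 over the model of the predicted effect, which is how a null result can count against a theory that predicts a difference.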


The reign of the p-value is over: what alternative analyses could we employ to fill the power vacuum?

Biology Letters ◽

10.1098/rsbl.2019.0174 ◽

2019 ◽

Vol 15 (5) ◽

pp. 20190174 ◽

Cited By ~ 51

Author(s):

Lewis G. Halsey

Keyword(s):

Bayes Factor ◽

Model Building ◽

Information Criteria ◽

Effect Sizes ◽

P Value ◽

Significant Finding ◽

Limited Information ◽

Frequentist Statistics ◽

Alternative Hypotheses ◽

Multiple Variables

The p-value has long been the figurehead of statistical analysis in biology, but its position is under threat. p is now widely recognized as providing quite limited information about our data, and as being easily misinterpreted. Many biologists are aware of p's frailties, but less clear about how they might change the way they analyse their data in response. This article highlights and summarizes four broad statistical approaches that augment or replace the p-value, and that are relatively straightforward to apply. First, you can augment your p-value with information about how confident you are in it, how likely it is that you will get a similar p-value in a replicate study, or the probability that a statistically significant finding is in fact a false positive. Second, you can enhance the information provided by frequentist statistics with a focus on effect sizes and a quantified confidence that those effect sizes are accurate. Third, you can augment or substitute p-values with the Bayes factor to inform on the relative levels of evidence for the null and alternative hypotheses; this approach is particularly appropriate for studies where you wish to keep collecting data until clear evidence for or against your hypothesis has accrued. Finally, specifically where you are using multiple variables to predict an outcome through model building, Akaike information criteria can take the place of the p-value, providing quantified information on what model is best. Hopefully, this quick-and-easy guide to some simple yet powerful statistical options will support biologists in adopting new approaches where they feel that the p-value alone is not doing their data justice.
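
Two of the quantities mentioned in this abstract are simple enough to sketch. The example below is illustrative only and may not match the exact formulations used in the article. It assumes the textbook false-positive calculation (significance level, power, and a prior probability that the effect is real) and the standard definition of the Akaike information criterion with Akaike weights. All function names and inputs are hypothetical.

```python
from math import exp


def false_positive_risk(alpha, power, prior_h1):
    """Probability that a statistically significant result is a false positive.

    alpha    : significance threshold
    power    : probability of a significant result when the effect is real
    prior_h1 : assumed prior probability that the effect is real
    """
    false_positives = alpha * (1 - prior_h1)
    true_positives = power * prior_h1
    return false_positives / (false_positives + true_positives)


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood


def akaike_weights(aic_values):
    """Relative support for each candidate model; the weights sum to 1."""
    best = min(aic_values)
    relative = [exp(-(value - best) / 2) for value in aic_values]
    total = sum(relative)
    return [r / total for r in relative]


if __name__ == "__main__":
    print(false_positive_risk(alpha=0.05, power=0.8, prior_h1=0.5))
    # Hypothetical log-likelihoods and parameter counts for two models.
    scores = [aic(-100.0, 3), aic(-99.5, 5)]
    print(scores, akaike_weights(scores))
```

The model with the lowest AIC receives the largest weight. The extra parameters of the second hypothetical model are penalised even though its log-likelihood is slightly higher.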


Subjective Likelihood and the Construal Level of Future Events: A Replication Study of Wakslak, Trope, Liberman, and Alony (2006)

10.31234/osf.io/gd6ej ◽

2020 ◽

Author(s):

Sofia Calderon ◽

Erik Mac Giolla ◽

Karl Ask ◽

Pär Anders Granhag

Keyword(s):

Mental Representation ◽

Effect Sizes ◽

Major Influence ◽

Construal Level ◽

Bayesian Analyses ◽

Previous Conclusion ◽

Strong Trend ◽

Future Events ◽

The Relationship ◽

Diagnostic Support

C. J. Wakslak, Y. Trope, N. Liberman, and R. Alony (2006), Seeing the forest when entry is unlikely: Probability and the mental representation of events, Journal of Experimental Psychology: General, examined the effect of manipulating the likelihood of future events on level of construal (i.e., mental abstraction). Over seven experiments, they consistently found that subjectively unlikely (vs. likely) future events were more abstractly (vs. concretely) construed. This well-cited but understudied finding has had a major influence on the construal level theory (CLT) literature: Likelihood is considered to be one of four psychological distances assumed to influence mental abstraction in similar ways (Trope & Liberman, 2010). Contrary to the original empirical findings, we present two close replication attempts (N = 115 and N = 120; the original studies had N = 20 and N = 34) which failed to find the effect of likelihood on construal level. Bayesian analyses provided diagnostic support for the absence of an effect. In light of the failed replications, we present a meta-analytic summary of the accumulated evidence on the effect. It suggests a strong trend of declining effect sizes as a function of larger samples. These results call into question the previous conclusion that likelihood has a reliable influence on construal level. We discuss the implications of these findings for construal level theory, and advise against treating likelihood as a psychological distance until further tests have established the relationship.


Improving Transparency, Falsifiability, and Rigour by Making Hypothesis Tests Machine Readable

10.31234/osf.io/5xcda ◽

2020 ◽

Cited By ~ 2

Author(s):

Daniel Lakens ◽

Lisa Marie DeBruine

Keyword(s):

Peer Review ◽

Hypothesis Test ◽

Scientific Information ◽

Real Life ◽

R Package ◽

Statistical Prediction ◽

Hypothesis Tests ◽

Research Questions ◽

Meta Analyses ◽

Machine Readable

Making scientific information machine-readable greatly facilitates its re-use. Many scientific articles aim to test a hypothesis, so making the tests of statistical predictions easier to find and access could be very beneficial. We propose an approach that can be used to make hypothesis tests machine readable. We believe there are two benefits to specifying a hypothesis test in a way that a computer can evaluate whether the statistical prediction is corroborated or not. First, hypothesis tests will become more transparent, falsifiable, and rigorous. Second, scientists will benefit if information related to hypothesis tests in scientific articles is easily findable and re-usable, for example when performing meta-analyses, during peer review, and when examining meta-scientific research questions. We examine what a machine readable hypothesis test should look like, and demonstrate the feasibility of machine readable hypothesis tests in a real-life example using the fully operational prototype R package scienceverse.
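
To make the idea concrete, here is a rough sketch of what a machine-readable hypothesis test could look like. It is written in Python for illustration and does not reproduce the scienceverse API. The data layout, the field names (`criteria`, `result`, `operator`, `comparator`) and the `evaluate` function are all assumptions made for this example.

```python
import operator

# Comparison operators that a criterion is allowed to name.
OPERATORS = {
    "<": operator.lt,
    ">": operator.gt,
    "<=": operator.le,
    ">=": operator.ge,
}

# A hypothetical hypothesis, specified before the data are analysed.
hypothesis = {
    "id": "H1",
    "description": "Group A scores higher than group B",
    "criteria": [
        {"result": "p_value", "operator": "<", "comparator": 0.05},
        {"result": "mean_difference", "operator": ">", "comparator": 0.0},
    ],
}


def evaluate(hypothesis, results):
    """Check every criterion against the analysis results.

    The hypothesis counts as corroborated only if all criteria hold.
    """
    checks = {}
    for criterion in hypothesis["criteria"]:
        compare = OPERATORS[criterion["operator"]]
        observed = results[criterion["result"]]
        checks[criterion["result"]] = compare(observed, criterion["comparator"])
    return {
        "id": hypothesis["id"],
        "checks": checks,
        "corroborated": all(checks.values()),
    }


if __name__ == "__main__":
    # Hypothetical analysis output.
    print(evaluate(hypothesis, {"p_value": 0.01, "mean_difference": 0.4}))
```

Because the criteria are plain data, the same specification can be stored alongside an article and re-evaluated by a reviewer or a meta-analyst without reading the prose.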


Response to Technical Comment on Rickles, Heppen, Allensworth, Sorensen, and Walters (2018)

Educational Researcher ◽

10.3102/0013189x19848731 ◽

2019 ◽

Vol 48 (4) ◽

pp. 241-243

Author(s):

Jordan Rickles ◽

Jessica B. Heppen ◽

Elaine Allensworth ◽

Nicholas Sorensen ◽

Kirk Walters

Keyword(s):

Confidence Intervals ◽

Null Hypothesis ◽

Hypothesis Test ◽

Hypothesis Tests

In response to the concerns White raises in his technical comment on Rickles, Heppen, Allensworth, Sorensen, and Walters (2018), we discuss whether it would have been appropriate to test for nominally equivalent outcomes, given that the study was initially conceived and designed to test for significant differences, and that the conclusion of no difference was not solely based on a null hypothesis test. To further support the article’s conclusion, confidence intervals for the null hypothesis tests and a test of equivalence are provided.
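
As a generic illustration of the two tools this abstract mentions, the sketch below computes a confidence interval and a test of equivalence using two one-sided tests. It is not the authors' analysis. It assumes a normally distributed estimate with a known standard error, and the equivalence bounds and numbers are hypothetical.

```python
from statistics import NormalDist


def confidence_interval(estimate, se, level=0.95):
    """Two-sided confidence interval under a normal approximation."""
    critical = NormalDist().inv_cdf(0.5 + level / 2)
    return estimate - critical * se, estimate + critical * se


def equivalence_test(estimate, se, lower, upper):
    """Two one-sided tests (TOST) against the bounds [lower, upper].

    Returns the larger of the two one-sided p-values. Equivalence is
    claimed only if this value is below the chosen alpha.
    """
    z = NormalDist()
    p_lower = 1 - z.cdf((estimate - lower) / se)  # H0: effect <= lower
    p_upper = z.cdf((estimate - upper) / se)      # H0: effect >= upper
    return max(p_lower, p_upper)


if __name__ == "__main__":
    # Hypothetical estimate and bounds.
    print(confidence_interval(estimate=0.02, se=0.05))
    print(equivalence_test(estimate=0.02, se=0.05, lower=-0.2, upper=0.2))
```

A non-significant difference test and a significant equivalence test answer different questions. Only the second supports a claim that any true difference is smaller than the chosen bounds.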


A default Bayesian hypothesis test for correlations and partial correlations

Psychonomic Bulletin & Review ◽

10.3758/s13423-012-0295-x ◽

2012 ◽

Vol 19 (6) ◽

pp. 1057-1064 ◽

Cited By ~ 244

Author(s):

Ruud Wetzels ◽

Eric-Jan Wagenmakers

Keyword(s):

Hypothesis Test ◽

Partial Correlations ◽

Bayesian Hypothesis Test


A phase II trial of gemcitabine (G), cisplatin (C), and nab-paclitaxel (N) in advanced biliary tract cancers (aBTCs): Updated survival analysis.

Journal of Clinical Oncology ◽

10.1200/jco.2018.36.4_suppl.350 ◽

2018 ◽

Vol 36 (4_suppl) ◽

pp. 350-350

Author(s):

Rachna T. Shroff ◽

Milind M. Javle ◽

Lianchun Xiao ◽

Ahmed Omar Kaseb ◽

Gauri R. Varadhachary ◽

...

Keyword(s):

Dose Level ◽

Phase Ii ◽

Initial Dose ◽

Hypothesis Test ◽

Progression Free Survival ◽

Median Number ◽

Controlled Study ◽

Grade 3 ◽

Bayesian Hypothesis Test ◽

Md Anderson

Background: BTCs are often diagnosed at an advanced stage and have a poor prognosis. The standard therapy for aBTCs is the combination of GC. However, the median overall survival (mOS) is dismal at 11.7 months (mos) with a median progression free survival (mPFS) of 8 mos. Methods: A single arm, phase II study was conducted at MD Anderson and Mayo Clinic Arizona. Patients (pts) with aBTC were treated at an initial dose level of G/C/N (in mg/m2) at 1000/25/125 (n = 30), which was reduced to lower doses due to grade 3/4 hematological (heme) toxicity (tox) - G/C/N: 800/25/100 (n = 30). Cycles were q21 days with restaging q3 cycles until progression. PFS was the primary endpoint (endpt). Using a Bayesian hypothesis test-based design, we assumed mPFS of 8 mos under the null hypothesis (H0) and 10 mos under the alternative (H1). Secondary endpts included mOS, RECIST v1.1 response rate (RR), safety and CA19-9 response. Results: 60 pts were enrolled with 51 being response-evaluable having received more than 1 cycle of therapy (age: median 60 yrs [range 31-77], ECOG PS 0/1 (22/38), M/F (33/27), intrahepatic cholangiocarcinoma/extrahepatic/gallbladder (38/9/13)). Median follow-up was 14 mos and median number of treatment (trmt) cycles = 5.24. Pts at initial dose level had significant grade 3/4 heme tox: neutropenia, febrile neutropenia, anemia, and thrombocytopenia leading to trmt discontinuation in 6/30 pts. After dose reduction to G/C/N (in mg/m2) at 800/25/100, trmt was better tolerated with only 3 pts experiencing grade 4 heme tox. Non-heme tox were grade 3 in 19 pts: nausea/vomiting, diarrhea, thromboembolic event/CVA, hypokalemia, constipation, cystitis, LFT elevations. The mPFS = 11.4 mos (95% CI: 6.1, 16.1) and mOS = 19.2 mos (95% CI: 13.6, NA), 1-year survival rate 67.6%. 51 pts evaluable for response: disease control rate (PR+CR+SD)-84.3% and RR-39%. 12 unresectable cases were operated post trmt with 1 pathologic CR. Conclusions: The combination of GCN was well tolerated at adjusted doses and demonstrates encouraging efficacy having met its mPFS endpt and an impressive mOS higher than historical control. These results merit evaluating GC +/-N in a randomized controlled study. Clinical trial information: NCT02392637.


Assessing and Comparing Anesthesiologists’ Performance on Mandated Metrics Using a Bayesian Approach

Anesthesiology ◽

10.1097/aln.0000000000000667 ◽

2015 ◽

Vol 123 (1) ◽

pp. 101-115 ◽

Cited By ~ 10

Author(s):

Emine Ozgur Bayman ◽

Franklin Dexter ◽

Michael M. Todd

Keyword(s):

Blood Pressure ◽

Information Management System ◽

Covariate Adjustment ◽

Time Interval ◽

Anesthesia Induction ◽

Bayesian Analyses ◽

Bayesian Hierarchical ◽

Professional Performance ◽

Frequentist Statistics ◽

American Society

Background: Periodic assessment of performance by anesthesiologists is required by The Joint Commission Ongoing Professional Performance Evaluation program. Methods: The metrics used in this study were the (1) measurement of blood pressure and (2) oxygen saturation (Spo2) either before or less than 5 min after anesthesia induction. Noncompliance was defined as no measurement within this time interval. The authors assessed the frequency of noncompliance using information from 63,913 cases drawn from the anesthesia information management system. To adjust for differences in patient and procedural characteristics, 135 preoperative variables were analyzed with decision trees. The retained covariate for the blood pressure metric was the patient’s age; for the Spo2 metric, the retained covariates were American Society of Anesthesiologists physical status, whether the patient was coming from an intensive care unit, and whether induction occurred within 5 min of the start of the scheduled workday. A Bayesian hierarchical model, designed to identify anesthesiologists as “performance outliers,” after adjustment for covariates, was developed and was compared with frequentist methods. Results: The global incidences of noncompliance (with frequentist 95% CI) were 5.35% (5.17 to 5.53%) for blood pressure and 1.22% (1.14 to 1.30%) for Spo2 metrics. By using unadjusted rates and frequentist statistics, it was found that up to 43% of anesthesiologists would be deemed noncompliant for the blood pressure metric and 70% of anesthesiologists for the Spo2 metric. By using Bayesian analyses with covariate adjustment, only 2.44% (1.28 to 3.60%) and 0.00% of the anesthesiologists would be deemed “noncompliant” for blood pressure and Spo2, respectively. Conclusion: Bayesian hierarchical multivariate methodology with covariate adjustment is better suited to faculty monitoring than the nonhierarchical frequentist approach.


A default Bayesian hypothesis test for mediation

Behavior Research Methods ◽

10.3758/s13428-014-0470-2 ◽

2014 ◽

Vol 47 (1) ◽

pp. 85-97 ◽

Cited By ~ 37

Author(s):

Michèle B. Nuijten ◽

Ruud Wetzels ◽

Dora Matzke ◽

Conor V. Dolan ◽

Eric-Jan Wagenmakers

Keyword(s):

Hypothesis Test ◽

Bayesian Hypothesis Test


Reexamining the effect of gustatory disgust on moral judgment: A multi-lab direct replication of Eskine, Kacinik, and Prinz (2011)

10.31234/osf.io/349pk ◽

2018 ◽

Cited By ~ 1

Author(s):

Eric Ghelfi ◽

Cody D Christopherson ◽

Heather L. Urry ◽

Richie L Lenne ◽

Nicole Legate ◽

...

Keyword(s):

Large Scale ◽

Bayes Factor ◽

Mixed Effects ◽

Original Study ◽

Taste Perception ◽

Effect Sizes ◽

Linear Mixed Effects ◽

Standardized Effect Sizes ◽

Meta Analyses ◽

Moral Wrongness

Eskine, Kacinik, and Prinz’s (2011) influential experiment demonstrated that gustatory disgust triggers a heightened sense of moral wrongness. We report a large-scale multi-site direct replication of this study conducted by participants in the Collaborative Replications and Education Project. Participants in each sample were randomly assigned to one of three beverage conditions: bitter/disgusting, control, or sweet. Then, participants made a series of judgments indicating the moral wrongness of the behavior depicted in each of six vignettes. In the original study (N = 57), drinking the bitter beverage led to higher ratings of moral wrongness than drinking the control and sweet beverages; a beverage contrast was significant among conservative (N = 19) but not liberal (N = 25) participants. In this report, random effects meta-analyses across all participants (N = 1,137 in k = 11 studies), conservative participants (N = 142, k = 5), and liberal participants (N = 635, k = 9) revealed standardized effect sizes that were smaller than reported in the original study. Some were in the opposite of the predicted direction, all had 95% confidence intervals containing zero, and most were smaller than the effect size the original authors could meaningfully detect. In linear mixed-effects regressions, drinking the bitter beverage led to higher ratings of moral wrongness than drinking the control beverage but not the sweet beverage. Bayes Factor tests reveal greater relative support for the null hypothesis. The overall pattern provides little to no support for the theory that physical disgust via taste perception harshens judgments of moral wrongness.


Conserved novel ORFs in the mitochondrial genome of the ctenophore Beroe forskalii

PeerJ ◽

10.7717/peerj.8356 ◽

2020 ◽

Vol 8 ◽

pp. e8356

Author(s):

Darrin T. Schultz ◽

Jordan M. Eizenga ◽

Russell B. Corbett-Detig ◽

Warren R. Francis ◽

Lynne M. Christianson ◽

...

Keyword(s):

Mitochondrial Genome ◽

Hypothesis Test ◽

Open Reading Frames ◽

Mitochondrial Genomes ◽

Intergenic Sequence ◽

Computational Tools ◽

Protein Coding ◽

Bayesian Hypothesis Test ◽

And Function ◽

Reading Frames

To date, five ctenophore species’ mitochondrial genomes have been sequenced, and each contains open reading frames (ORFs) that if translated have no identifiable orthologs. ORFs with no identifiable orthologs are called unidentified reading frames (URFs). If truly protein-coding, ctenophore mitochondrial URFs represent a little understood path in early-diverging metazoan mitochondrial evolution and metabolism. We sequenced and annotated the mitochondrial genomes of three individuals of the beroid ctenophore Beroe forskalii and found that in addition to sharing the same canonical mitochondrial genes as other ctenophores, the B. forskalii mitochondrial genome contains two URFs. These URFs are conserved among the three individuals but not found in other sequenced species. We developed computational tools called pauvre and cuttlery to determine the likelihood that URFs are protein coding. There is evidence that the two URFs are under negative selection, and a novel Bayesian hypothesis test of trinucleotide frequency shows that the URFs are more similar to known coding genes than noncoding intergenic sequence. Protein structure and function prediction of all ctenophore URFs suggests that they all code for transmembrane transport proteins. These findings, along with the presence of URFs in other sequenced ctenophore mitochondrial genomes, suggest that ctenophores may have uncharacterized transmembrane proteins present in their mitochondria.
