Do sexist jokes increase rape proclivity among males high in hostile sexism? Evidence from two pre-registered direct replications of Thomae & Viki (2013)

2021 ◽  
Author(s):  
Neil McLatchie ◽  
Manuela Thomae

Thomae and Viki (2013) reported that increased exposure to sexist humour can increase rape proclivity among males, specifically those who score high on measures of Hostile Sexism. Here we report two pre-registered direct replications (N = 530) of Study 2 from Thomae and Viki (2013) and assess replicability via (i) statistical significance, (ii) Bayes factors, (iii) the small-telescope approach, and (iv) an internal meta-analysis across the original and replication studies. The original results were not supported by any of the approaches. Combining the original study and the replications yielded moderate evidence in support of the null over the alternative hypothesis with a Bayes factor of B = 0.13. In light of the combined evidence, we encourage researchers to exercise caution before claiming that brief exposure to sexist humour increases males’ proclivity towards rape, until further pre-registered and open research demonstrates the effect is reliably reproducible.
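The abstracts on this page report Bayes factors in both directions (B = 0.13 here, BF01 = 11.04 elsewhere), so a small conversion may help when comparing them. The sketch below assumes that B = 0.13 expresses evidence for the alternative relative to the null, which is consistent with the abstract reading a value below 1 as support for the null; the function name `invert_bayes_factor` is hypothetical and introduced only for illustration.

```python
def invert_bayes_factor(bf: float) -> float:
    """Return the reciprocal of a Bayes factor.

    If `bf` is evidence for the alternative over the null (BF10),
    the result is evidence for the null over the alternative (BF01),
    and vice versa.
    """
    if bf <= 0:
        raise ValueError("A Bayes factor must be strictly positive.")
    return 1.0 / bf


# Assumption: B = 0.13 is read as BF10, so its reciprocal is BF01.
bf01 = invert_bayes_factor(0.13)
print(round(bf01, 2))
```

Under that assumption, the reciprocal puts the reported value on the same null-over-alternative scale as the BF01 figures quoted in the other abstracts.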

2020 ◽  
Vol 31 (7) ◽  
pp. 858-864 ◽  
Author(s):  
Will M. Gervais ◽  
Stephanie E. McKee ◽  
Sarah Malik

Do reminders of God encourage people to take more risks? Kupor, Laurin, and Levav (2015) reported nine studies that all yielded statistically significant results consistent with the hypothesis that they do. We conducted two large-sample Preregistered Direct Replications (N = 1,104) of studies in Kupor et al.’s article (Studies 1a and 1b) and evaluated replicability via (a) statistical significance, (b) a “small-telescopes” approach, (c) Bayes factors (BFs), and (d) meta-analyses pooled across original and replication studies. None of these approaches replicated the original studies’ effects. Combining both original studies and both replications yielded strong evidence in support of the null over a default alternative hypothesis, BF01 = 11.04, meaning that the totality of evidence speaks against the possibility that religious primes increased nonmoral risk taking in these designs. This suggests that support for the “anticipating-divine-protection” hypothesis may be overstated.


2017 ◽  
Vol 4 (1) ◽  
pp. 160426 ◽  
Author(s):  
Maarten Marsman ◽  
Felix D. Schönbrodt ◽  
Richard D. Morey ◽  
Yuling Yao ◽  
Andrew Gelman ◽  
...  

We applied three Bayesian methods to reanalyse the preregistered contributions to the Social Psychology special issue ‘Replications of Important Results in Social Psychology’ (Nosek & Lakens. 2014 Registered reports: a method to increase the credibility of published results. Soc. Psychol. 45, 137–141. (doi:10.1027/1864-9335/a000192)). First, individual-experiment Bayesian parameter estimation revealed that for directed effect size measures, only three out of 44 central 95% credible intervals did not overlap with zero and fell in the expected direction. For undirected effect size measures, only four out of 59 credible intervals contained values greater than 0.10 (10% of variance explained) and only 19 intervals contained values larger than 0.05. Second, a Bayesian random-effects meta-analysis for all 38 t-tests showed that only one out of the 38 hierarchically estimated credible intervals did not overlap with zero and fell in the expected direction. Third, a Bayes factor hypothesis test was used to quantify the evidence for the null hypothesis against a default one-sided alternative. Only seven out of 60 Bayes factors indicated non-anecdotal support in favour of the alternative hypothesis (BF10 > 3), whereas 51 Bayes factors indicated at least some support for the null hypothesis. We hope that future analyses of replication success will embrace a more inclusive statistical approach by adopting a wider range of complementary techniques.


2018 ◽  
Author(s):  
Will M Gervais ◽  
Stephanie Elizabeth McKee ◽  
Sarah Malik

Do reminders of God encourage people to take more risks? A recent paper (Kupor, Laurin, & Levav, 2015) reported 9 studies that all yielded statistically significant results consistent with the hypothesis that they do. We conducted two large-sample preregistered direct replications (total N = 1104) of studies in this paper (Studies 1a and 1b), and evaluated replicability via 1) statistical significance, 2) a “small telescopes” approach (Simonsohn, 2015), 3) Bayes factors (Gronau, Ly, & Wagenmakers, 2017), and 4) meta-analyses pooling across original and replication studies. None of these approaches found replicable effects. Combining both original studies and both replications yields strong evidence in support of the null over a default alternative hypothesis, BF01 = 11.04, meaning that the totality of evidence speaks against the possibility that religious primes increase nonmoral risk taking in these designs. This suggests that support for the “anticipating divine protection” hypothesis may be overstated. Preprint https://psyarxiv.com/8f7qd/


2016 ◽  
Vol 27 (2) ◽  
pp. 364-383 ◽  
Author(s):  
Stefano Cabras

The problem of multiple hypothesis testing can be represented as a Markov process where a new alternative hypothesis is accepted in accordance with its relative evidence to the currently accepted one. This virtual and not formally observed process provides the most probable set of non-null hypotheses given the data; it plays the same role as Markov Chain Monte Carlo in approximating a posterior distribution. To apply this representation and obtain the posterior probabilities over all alternative hypotheses, it is enough to have, for each test, barely defined Bayes Factors, e.g. Bayes Factors obtained up to an unknown constant. Such Bayes Factors may arise either from using default and improper priors or from calibrating p-values with respect to their corresponding Bayes Factor lower bound. Both sources of evidence are used to form a Markov transition kernel on the space of hypotheses. The approach leads to easily interpretable results and involves very simple formulas suitable for analyzing large datasets such as those arising from gene expression data (microarray or RNA-seq experiments).


Author(s):  
Colin Foster

Confidence assessment (CA) involves students stating alongside each of their answers a confidence rating (e.g. 0 low to 10 high) to express how certain they are that their answer is correct. Each student’s score is calculated as the sum of the confidence ratings on the items that they answered correctly, minus the sum of the confidence ratings on the items that they answered incorrectly; this scoring system is designed to incentivize students to give truthful confidence ratings. Previous research found that secondary-school mathematics students readily understood the negative-marking feature of a CA instrument used during one lesson, and that they were generally positive about the CA approach. This paper reports on a quasi-experimental trial of CA in four secondary-school mathematics lessons (N = 475 students) across time periods ranging from 3 weeks up to one academic year, compared to business-as-usual controls. A meta-analysis of the effect sizes across the four schools gave an aggregated Cohen’s d of –0.02 [95% CI –0.22, 0.19] and an overall Bayes Factor B01 of 8.48. This indicated substantial evidence for the null hypothesis that there was no difference between the attainment gains of the intervention group and the control group, relative to the alternative hypothesis that the gains were different. I conclude that incorporating confidence assessment into low-stakes classroom mathematics formative assessments does not appear to be detrimental to students’ attainment, and I suggest reasons why a clear positive outcome was not obtained.
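The scoring rule described in this abstract (confidence on correct items minus confidence on incorrect items) can be illustrated with a minimal sketch. The function name `ca_score`, the `(is_correct, confidence)` input layout, and the example ratings are assumptions made for illustration; they are not taken from the paper's instrument.

```python
from typing import Iterable, Tuple


def ca_score(responses: Iterable[Tuple[bool, int]]) -> int:
    """Compute a confidence-assessment score.

    Each response is an (is_correct, confidence) pair. Confidence is
    added for a correct answer and subtracted for an incorrect one.
    """
    total = 0
    for is_correct, confidence in responses:
        total += confidence if is_correct else -confidence
    return total


# Hypothetical student: two correct answers (confidence 8 and 10),
# two incorrect answers (confidence 3 and 0).
example = [(True, 8), (False, 3), (True, 10), (False, 0)]
print(ca_score(example))  # 8 - 3 + 10 - 0 = 15
```

In this sketch, a confident wrong answer costs as much as a confident right answer gains, which is the negative-marking feature the abstract refers to.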


2018 ◽  
Vol 1 (2) ◽  
pp. 198-218 ◽  
Author(s):  
Gerd Gigerenzer

The “replication crisis” has been attributed to misguided external incentives gamed by researchers (the strategic-game hypothesis). Here, I want to draw attention to a complementary internal factor, namely, researchers’ widespread faith in a statistical ritual and associated delusions (the statistical-ritual hypothesis). The “null ritual,” unknown in statistics proper, eliminates judgment precisely at points where statistical theories demand it. The crucial delusion is that the p value specifies the probability of a successful replication (i.e., 1 – p), which makes replication studies appear to be superfluous. A review of studies with 839 academic psychologists and 991 students shows that the replication delusion existed among 20% of the faculty teaching statistics in psychology, 39% of the professors and lecturers, and 66% of the students. Two further beliefs, the illusion of certainty (e.g., that statistical significance proves that an effect exists) and Bayesian wishful thinking (e.g., that the probability of the alternative hypothesis being true is 1 – p), also make successful replication appear to be certain or almost certain, respectively. In every study reviewed, the majority of researchers (56%–97%) exhibited one or more of these delusions. Psychology departments need to begin teaching statistical thinking, not rituals, and journal editors should no longer accept manuscripts that report results as “significant” or “not significant.”


2020 ◽  
Author(s):  
Hidde Jelmer Leplaa ◽  
Charlotte Rietbergen ◽  
Herbert Hoijtink

In this paper a method is proposed to determine whether the result from an original study is corroborated in a replication study. The paper is illustrated using data from the reproducibility project psychology by the Open Science Collaboration. This method emphasizes the need to determine what one wants to replicate: the hypotheses as formulated in the introduction of the original paper, or hypotheses derived from the research results presented in the original paper. The Bayes factor will be used to determine whether the hypotheses evaluated in/resulting from the original study are corroborated by the replication study. Our method to assess the successfulness of replication will better fit the needs and desires of researchers in fields that use replication studies.


Author(s):  
Darius Adam Rohani ◽  
Maria Faurholt-Jepsen ◽  
Lars Vedel Kessing ◽  
Jakob Eyvind Bardram

BACKGROUND Several studies have recently reported on the correlation between objective behavioral features collected via mobile and wearable technologies and depressive mood symptoms in affective disorders (unipolar disorder and bipolar disorder). However, individual studies have reported different and sometimes contradictory results, and no quantitative systematic review of the correlation between objective behavioral features and depressive mood symptoms has been published. OBJECTIVE The objectives of this systematic review were to 1) provide an overview of correlations between objective behavioral features and depressive mood symptoms reported in the literature, and 2) investigate the strength and statistical significance of these correlations across studies. The answers to these questions could potentially help identify which objective features have shown the most promising results across studies. METHODS A systematic review of the scientific literature, reported according to the Preferred Reporting Items for Systematic Reviews and Meta-Analysis (PRISMA) guidelines, was conducted. IEEE Xplore, ACM Digital Library, Web of Science, PsycINFO, PubMed, DBLP computer science bibliography, HTA, DARE, Scopus and Science Direct were searched and supplemented by hand examination of reference lists. The search ended 04/27-2017 and was limited to studies published 2007-2017. RESULTS A total of 46 studies were eligible for the review. These studies identified and investigated 85 unique objective behavioral features covering 17 various sensor data inputs. These features can be categorized into seven overall categories. Several features were found to have statistically significant and consistent correlation directionality with mood assessment (e.g., the amount of home stay, sleep duration, vigorous activity), while others showed directionality discrepancies across the studies (e.g., amount of SMS sent, time you spend between locations, frequency of smartphone screen activity). CONCLUSIONS Several studies showed consistent and statistically significant correlations between objective behavioral features collected by mobile and wearable technology and depressive mood symptoms. Hence, continuous and every-day monitoring of behavioral aspects in affective disorders could be a promising supplementary objective measure to estimate depressive mood symptoms. However, the evidence is limited by methodological issues in individual studies and by a lack of standardization of 1) the collected objective features, 2) the mood assessment methodology, and 3) the statistical methods applied. Therefore, consistency in data collection and analysis is needed in future studies to make replication studies as well as meta-analyses possible.


Author(s):  
Min Yuan ◽  
Xiaoqing Pan ◽  
Yaning Yang

Adaptive transmission disequilibrium test (aTDT) and MAX3 test are two robust-efficient association tests for case-parent family trio data. Both tests incorporate information of common genetic models including recessive, additive and dominant models and are efficient in power and robust to genetic model specifications. The aTDT uses information of departure from Hardy-Weinberg disequilibrium to identify the potential genetic model underlying the data and then applies the corresponding TDT-type test, and the MAX3 test is defined as the maximum of the absolute value of three TDT-type tests under the three common genetic models. In this article, we propose three robust Bayes procedures, the aTDT based Bayes factor, MAX3 based Bayes factor and Bayes model averaging (BMA), for association analysis with case-parent trio design. The asymptotic distributions of aTDT under the null and alternative hypothesis are derived in order to calculate its Bayes factor. Extensive simulations show that the Bayes factors and the


2019 ◽  
Author(s):  
Francesco Margoni ◽  
Martin Shepperd

Infant research is making considerable progress. However, among infant researchers there is growing concern regarding the widespread habit of undertaking studies that have small sample sizes and employ tests with low statistical power (to detect a wide range of possible effects). For many researchers, issues of confidence may be partially resolved by relying on replications. Here, we bring further evidence that the classical logic of confirmation, according to which the result of a replication study confirms the original finding when it reaches statistical significance, could be usefully abandoned. With real examples taken from the infant literature and Monte Carlo simulations, we show that a very wide range of possible replication results would in a formal statistical sense constitute confirmation as they can be explained simply due to sampling error. Thus, often no useful conclusion can be derived from a single or small number of replication studies. We suggest that, in order to accumulate and generate new knowledge, the dichotomous view of replication as confirmatory/disconfirmatory can be replaced by an approach that emphasizes the estimation of effect sizes via meta-analysis. Moreover, we discuss possible solutions for reducing problems affecting the validity of conclusions drawn from meta-analyses in infant research.

