Bayesian paired comparison with the bpcs package

Author(s):  
David Issa Mattos ◽  
Érika Martins Silva Ramos

Abstract This article introduces the R package bpcs (Bayesian Paired Comparison in Stan) and the statistical models implemented in the package. The package aims to facilitate the use of Bayesian models for paired comparison data in behavioral research. Bayesian analysis of paired comparison data allows parameter estimation even in conditions where the maximum likelihood estimate does not exist, allows easy extension of paired comparison models, provides straightforward interpretation of the results with credible intervals, has better control of type I error, provides more robust evidence towards the null hypothesis, allows propagation of uncertainties, incorporates prior information, and performs well when handling models with many parameters and latent variables. The package provides a consistent interface for R users and several functions to evaluate the posterior distribution of all parameters, to estimate the posterior distribution of any contest between items, and to obtain the posterior distribution of the ranks. Three reanalyses of recent studies that used the frequentist Bradley–Terry model are presented. These reanalyses are conducted with the Bayesian models of the package, and all the code used to fit the models and to generate the figures and tables is available in the online appendix.
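A minimal usage sketch of the workflow the abstract describes, fitting a Bayesian Bradley–Terry model with bpcs. The function and argument names below are assumptions based on my recollection of the package README, not a verified transcript of its API; consult the bpcs documentation for the exact interface.

```r
# Sketch: fitting a Bayesian Bradley-Terry model to paired-comparison data
# with the bpcs package. Function and argument names are assumptions based
# on the package README; check the bpcs documentation for the exact API.
library(bpcs)

matches <- data.frame(
  player0 = c("A", "A", "B"),
  player1 = c("B", "C", "C"),
  y       = c(0, 1, 0)        # win indicator; check the package's coding convention
)

fit <- bpc(
  data          = matches,
  player0       = "player0",
  player1       = "player1",
  result_column = "y",
  model_type    = "bt"         # simple Bradley-Terry model
)

summary(fit)                   # posterior summaries of the ability parameters
```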

2019 ◽  
Author(s):  
Alvin Vista

Cheating detection is an important issue in standardized testing, especially in large-scale settings. Statistical approaches are often computationally intensive and require specialised software. We present a two-stage approach that quickly filters suspected groups using statistical testing on an IRT-based answer-copying index. We also present an approach to mitigate data contamination and improve the performance of the index. The computation of the index was implemented through a modified version of an open-source R package, thus enabling wider access to the method. Using data from PIRLS 2011 (N = 64,232), we conduct a simulation to demonstrate our approach. Type I error was well controlled and no control group was falsely flagged for cheating, while 16 (combined n = 12,569) of the 18 (combined n = 14,149) simulated groups were detected. Implications for system-level cheating detection and further improvements of the approach are discussed.
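The two-stage idea, screening groups with a cheap statistic before applying the full IRT-based index, can be illustrated generically. The screening statistic below (identical incorrect responses compared against a binomial reference) is a stand-in for illustration, not the copying index used in the study.

```r
# Generic sketch of a two-stage screen for answer copying.
# Stage 1: flag groups whose rate of identical incorrect responses is
# implausibly high under a simple binomial reference. Stage 2 (not shown)
# would apply the full IRT-based copying index only to flagged groups.
set.seed(1)

n_groups        <- 50
pairs_per_group <- 200
n_items         <- 40
p_match         <- 0.10     # baseline chance of an identical wrong answer on an item

# Simulated counts of identical-incorrect matches per examinee pair
matches <- lapply(seq_len(n_groups), function(g) {
  rate <- if (g <= 2) 0.25 else p_match   # groups 1-2 simulate copying
  rbinom(pairs_per_group, size = n_items, prob = rate)
})

# Stage 1: one-sided binomial test per group on the pooled match count
screen_p <- sapply(matches, function(x) {
  binom.test(sum(x), n = length(x) * n_items, p = p_match,
             alternative = "greater")$p.value
})

flagged <- which(p.adjust(screen_p, method = "bonferroni") < 0.05)
flagged   # only these groups go on to the expensive stage-2 analysis
```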


2019 ◽  
Vol 35 (24) ◽  
pp. 5155-5162 ◽  
Author(s):  
Chengzhong Ye ◽  
Terence P Speed ◽  
Agus Salim

Abstract Motivation Dropout is a common phenomenon in single-cell RNA-seq (scRNA-seq) data, and when left unaddressed it affects the validity of the statistical analyses. Despite this, few current methods for differential expression (DE) analysis of scRNA-seq data explicitly model the process that gives rise to the dropout events. We develop DECENT, a method for DE analysis of scRNA-seq data that explicitly and accurately models the molecule capture process in scRNA-seq experiments. Results We show that DECENT demonstrates improved DE performance over existing DE methods that do not explicitly model dropout. This improvement is consistently observed across several public scRNA-seq datasets generated using different technological platforms, and is especially large when the capture process is overdispersed. DECENT maintains type I error well while achieving better sensitivity. Its performance without spike-ins is almost as good as when spike-ins are used to calibrate the capture model. Availability and implementation The method is implemented in a publicly available R package at https://github.com/cz-ye/DECENT. Supplementary information Supplementary data are available at Bioinformatics online.
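A small simulation sketch of the phenomenon the abstract targets: true counts passed through a binomial capture (dropout) process, after which a test on the observed counts loses sensitivity relative to the unobservable true counts. This illustrates the motivation only; it is not the DECENT model.

```r
# Sketch: molecule capture as a binomial thinning process.
# True expression differs between groups, but low capture efficiency
# produces many zeros ("dropouts") in the observed counts.
set.seed(1)

n_cells  <- 200
group    <- rep(c("A", "B"), each = n_cells / 2)
true_mu  <- ifelse(group == "A", 20, 40)          # true mean molecules per cell
true_cnt <- rnbinom(n_cells, mu = true_mu, size = 2)

capture_eff <- 0.05                               # fraction of molecules captured
obs_cnt     <- rbinom(n_cells, size = true_cnt, prob = capture_eff)

mean(obs_cnt == 0)                                # dropout rate in the observed data

# Same comparison on observed counts vs. the (unobservable) true counts
wilcox.test(obs_cnt ~ group)$p.value
wilcox.test(true_cnt ~ group)$p.value
```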


Biometrika ◽  
2019 ◽  
Vol 106 (2) ◽  
pp. 353-367 ◽  
Author(s):  
B Karmakar ◽  
B French ◽  
D S Small

Summary A sensitivity analysis for an observational study assesses how much bias, due to nonrandom assignment of treatment, would be necessary to change the conclusions of an analysis that assumes treatment assignment was effectively random. The evidence for a treatment effect can be strengthened if two different analyses, which could be affected by different types of biases, are both somewhat insensitive to bias. The finding from the observational study is then said to be replicated. Evidence factors allow for two independent analyses to be constructed from the same dataset. When combining the evidence factors, the Type I error rate must be controlled to obtain valid inference. A powerful method is developed for controlling the familywise error rate for sensitivity analyses with evidence factors. It is shown that the Bahadur efficiency of sensitivity analysis for the combined evidence is greater than for either evidence factor alone. The proposed methods are illustrated through a study of the effect of radiation exposure on the risk of cancer. An R package, evidenceFactors, is available from CRAN to implement the methods of the paper.
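A generic sketch of how two (approximately) independent evidence-factor p-values can be combined into a single test; Fisher's combination is used here purely for illustration and is not necessarily the combining method of the paper.

```r
# Sketch: combining p-values from two independent evidence factors.
# Each factor comes from a separate analysis of the same data that is
# vulnerable to different biases; under independence, Fisher's statistic
# -2 * sum(log p) follows a chi-squared distribution with 2k degrees of freedom.
fisher_combine <- function(p) {
  stat <- -2 * sum(log(p))
  pchisq(stat, df = 2 * length(p), lower.tail = FALSE)
}

# p-values from two sensitivity analyses (illustrative numbers)
p_factor1 <- 0.04
p_factor2 <- 0.08

fisher_combine(c(p_factor1, p_factor2))
```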


2021 ◽  
Author(s):  
Ye Yue ◽  
Yi-Juan Hu

Background: Understanding whether and which microbes play a mediating role between an exposure and a disease outcome is essential for researchers to develop clinical interventions that treat the disease by modulating the microbes. Existing methods for mediation analysis of the microbiome are often limited to a global test of community-level mediation or to selection of mediating microbes without control of the false discovery rate (FDR). Further, while the null hypothesis of no mediation at each microbe is a composite null that consists of three types of null (no exposure-microbe association, no microbe-outcome association given the exposure, or both), most existing methods for the global test, such as MedTest and MODIMA, treat the microbes as if they were all under the same type of null. Methods: We propose a new approach based on inverse regression that regresses the (possibly transformed) relative abundance of each taxon on the exposure and the exposure-adjusted outcome to assess the exposure-taxon and taxon-outcome associations simultaneously. The association p-values are then used to test mediation at both the community and individual-taxon levels. This approach fits nicely into our Linear Decomposition Model (LDM) framework, so the new method is implemented in the LDM and enjoys all of its features: it allows an arbitrary number of taxa to be tested, supports continuous, discrete, or multivariate exposures and outcomes as well as adjustment for confounding covariates, accommodates clustered data, and offers analysis at the relative-abundance or presence-absence scale. We refer to this new method as LDM-med. Results: Using extensive simulations, we showed that LDM-med always controlled the type I error of the global test and had compelling power over existing methods; LDM-med always preserved the FDR when testing individual taxa and had much better sensitivity than alternative approaches. In contrast, MedTest and MODIMA had severely inflated type I error when different taxa were under different types of null. The flexibility of LDM-med for a variety of mediation analyses is illustrated by an application to a murine microbiome dataset. Availability and Implementation: The new method has been added to our R package LDM, which is available on GitHub at https://github.com/yijuanhu/LDM.
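A generic sketch of the inverse-regression idea in the abstract: regress each taxon's abundance on the exposure and on the exposure-adjusted outcome, then combine the two p-values per taxon (here with the joint-significance, max-p rule) and control the FDR across taxa. This is a conceptual illustration, not the LDM-med implementation.

```r
# Sketch: per-taxon inverse regression for mediation screening.
set.seed(1)
n_samples <- 100
n_taxa    <- 50

exposure <- rnorm(n_samples)
outcome  <- rnorm(n_samples) + 0.5 * exposure         # outcome depends on exposure
abund    <- matrix(rnorm(n_samples * n_taxa), n_samples, n_taxa)
abund[, 1] <- abund[, 1] + 0.8 * exposure             # taxon 1: exposure-associated ...
outcome    <- outcome + 0.8 * abund[, 1]              # ... and outcome-associated (a mediator)

# Exposure-adjusted outcome: residuals of the outcome regressed on the exposure
out_adj <- resid(lm(outcome ~ exposure))

p_exposure <- apply(abund, 2, function(a) summary(lm(a ~ exposure))$coefficients[2, 4])
p_outcome  <- apply(abund, 2, function(a) summary(lm(a ~ out_adj))$coefficients[2, 4])

# Joint-significance rule: a taxon mediates only if both associations are present
p_mediation <- pmax(p_exposure, p_outcome)
which(p.adjust(p_mediation, method = "BH") < 0.05)
```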


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
William R. P. Denault ◽  
Astanand Jugessur

Abstract Background We present here a computational shortcut to improve a powerful wavelet-based method by Shim and Stephens (Ann Appl Stat 9(2):665–686, 2015. 10.1214/14-AOAS776) called WaveQTL that was originally designed to identify DNase I hypersensitivity quantitative trait loci (dsQTL). Results WaveQTL relies on permutations to evaluate the significance of an association. We applied a recent method by Zhou and Guan (J Am Stat Assoc 113(523):1362–1371, 2017. 10.1080/01621459.2017.1328361) to boost computational speed, which involves calculating the distribution of Bayes factors and estimating the significance of an association by simulations rather than permutations. We called this simulation-based approach “fast functional wavelet” (FFW), and tested it on a publicly available DNA methylation (DNAm) dataset on colorectal cancer. The simulations confirmed a substantial gain in computational speed compared to the permutation-based approach in WaveQTL. Furthermore, we show that FFW controls the type I error satisfactorily and has good power for detecting differentially methylated regions. Conclusions Our approach has broad utility and can be applied to detect associations between different types of functions and phenotypes. As more and more DNAm datasets are being made available through public repositories, an attractive application of FFW would be to re-analyze these data and identify associations that might have been missed by previous efforts. The full R package for FFW is freely available at GitHub https://github.com/william-denault/ffw.
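The computational shortcut described, replacing permutations with draws from the null distribution of the test statistic, can be sketched generically. The standard normal below is an arbitrary stand-in for the simulated null Bayes factors, used only to show the empirical p-value calculation.

```r
# Sketch: simulation-based significance instead of permutations.
# Rather than permuting the phenotype B times and refitting, draw the test
# statistic directly from its null distribution and compare the observed
# value against those draws.
set.seed(1)

observed_stat <- 2.4
null_draws    <- rnorm(1e5)               # stand-in for simulated null statistics

# Empirical one-sided p-value with the usual +1 correction
p_emp <- (sum(null_draws >= observed_stat) + 1) / (length(null_draws) + 1)
p_emp
```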


2019 ◽  
Author(s):  
Axel Mayer ◽  
Felix Thoemmes

The analysis of variance (ANOVA) is still one of the most widely used statistical methods in the social sciences. This paper is about stochastic group weights in ANOVA models, a neglected aspect in the literature. Stochastic group weights are present whenever the experimenter does not determine the exact group sizes before conducting the experiment. We show that classic ANOVA tests based on estimated marginal means can have an inflated type I error rate when stochastic group weights are not taken into account, even in randomized experiments. We propose two new ways to incorporate stochastic group weights in the tests of average effects: one based on the general linear model and one based on multigroup structural equation models (SEMs). We show in simulation studies that our methods have nominal type I error rates in experiments with stochastic group weights, while classic approaches show an inflated type I error rate. The SEM approach can additionally deal with heteroscedastic residual variances and latent variables. An easy-to-use software package with a graphical user interface is provided.
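A small simulation sketch of the general setup: group sizes are drawn stochastically on each replication, a classic linear model is fit, and the rejection rate under the null is recorded. This only illustrates the simulation scaffolding, not the authors' proposed corrections or the conditions under which the inflation appears.

```r
# Sketch: empirical type I error check when group sizes are stochastic.
# Group membership is random (multinomial) rather than fixed by design.
set.seed(1)

one_rep <- function(n = 120) {
  g <- factor(sample(c("a", "b", "c"), size = n, replace = TRUE,
                     prob = c(0.2, 0.3, 0.5)))        # stochastic group weights
  y <- rnorm(n)                                       # null: no group effect
  anova(lm(y ~ g))[["Pr(>F)"]][1] < 0.05
}

mean(replicate(2000, one_rep()))   # empirical rejection rate at the nominal 5% level
```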


2014 ◽  
Vol 53 (01) ◽  
pp. 54-61 ◽  
Author(s):  
M. Preuß ◽  
A. Ziegler

Summary Background: The random-effects (RE) model is the standard choice for meta-analysis in the presence of heterogeneity, and the standard RE method is the DerSimonian and Laird (DSL) approach, in which the degree of heterogeneity is estimated using a moment estimator. The DSL approach does not take into account the variability of the estimated heterogeneity variance in the estimation of Cochran's Q. Biggerstaff and Jackson derived the exact cumulative distribution function (CDF) of Q to account for the variability of the estimated τ². Objectives: The first objective is to show that the explicit numerical computation of the density function of Cochran's Q is not required. The second objective is to develop an R package that makes it easy to calculate both the classical RE method and the new exact RE method. Methods: The novel approach was validated in extensive simulation studies. The different approaches used in the simulation studies, including the exact weights RE meta-analysis and the I² and τ² estimates together with their confidence intervals, were implemented in the R package metaxa. Results: The comparison with the classical DSL method showed that the exact weights RE meta-analysis kept the nominal type I error level better and had greater power in the case of many small studies and a single large study. The Hedges RE approach had inflated type I error levels. Another advantage of the exact weights RE meta-analysis is that an exact confidence interval for τ² is readily available. The exact weights RE approach had greater power in the case of few studies, while the restricted maximum likelihood (REML) approach was superior in the case of a large number of studies. Application of the exact weights RE meta-analysis, REML, and the DSL approach to real data sets showed that the conclusions of these methods differed. Conclusions: The simplification requires only the calculation of the cumulative distribution function of Cochran's Q, not of its density, whereas the previous approach required both. It thus reduces computation time, improves numerical stability, and reduces the approximation error in meta-analysis. The different approaches, including the exact weights RE meta-analysis and the I² and τ² estimates together with their confidence intervals, are available in the R package metaxa, which can be used in applications.
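For reference, the classical DerSimonian–Laird computation that the exact-weights method refines can be written in a few lines. This sketch implements only the standard moment estimator and RE pooling, not the exact-CDF weighting developed in the paper.

```r
# Sketch: classical DerSimonian-Laird random-effects meta-analysis.
# yi: study effect estimates; vi: their within-study variances.
dsl_meta <- function(yi, vi) {
  wi    <- 1 / vi                                   # fixed-effect weights
  ybar  <- sum(wi * yi) / sum(wi)
  Q     <- sum(wi * (yi - ybar)^2)                  # Cochran's Q
  k     <- length(yi)
  tau2  <- max(0, (Q - (k - 1)) / (sum(wi) - sum(wi^2) / sum(wi)))  # moment estimator
  wstar <- 1 / (vi + tau2)                          # random-effects weights
  est   <- sum(wstar * yi) / sum(wstar)
  se    <- sqrt(1 / sum(wstar))
  c(estimate = est, se = se, tau2 = tau2, Q = Q)
}

dsl_meta(yi = c(0.10, 0.30, 0.25, 0.60),
         vi = c(0.04, 0.02, 0.05, 0.08))
```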


2021 ◽  
Author(s):  
Jack Wolf ◽  
Jason Westra ◽  
Nathan Tintle

Abstract While the promise of electronic medical record and biobank data is large, major questions remain about patient privacy, computational hurdles, and data access. One promising area of recent development is pre-computing non-individually-identifiable summary statistics to be made publicly available for exploration and downstream analysis. In this manuscript we demonstrate how to utilize pre-computed linear association statistics between individual genetic variants and phenotypes to infer genetic relationships between products of phenotypes (e.g., ratios; logical combinations of binary phenotypes using 'and' and 'or') with customized covariate choices. We propose a method to approximate covariate-adjusted linear models for products and logical combinations of phenotypes using only pre-computed summary statistics. We evaluate our method's accuracy through several simulation studies and an application modeling various fatty acid ratios using data from the Framingham Heart Study. These studies show a consistent ability to recapitulate the results of analyses performed on individual-level data, including maintenance of the type I error rate, power, and effect size estimates. An implementation of the proposed method is available in the publicly available R package pcsstools.
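The general principle behind building models from pre-computed summary statistics is easiest to see in the simplest case, a linear combination of phenotypes: because ordinary least squares is linear in the response, per-phenotype genotype coefficients combine exactly. Products and ratios, as in the abstract, require the approximations the authors propose; the sketch below covers only the exact linear case.

```r
# Sketch: recovering the association of a derived phenotype z = a*y1 + b*y2
# with a variant g from the per-phenotype summary coefficients alone.
set.seed(1)
n  <- 500
g  <- rbinom(n, 2, 0.3)                 # genotype coded 0/1/2
y1 <- 0.2 * g + rnorm(n)
y2 <- -0.1 * g + rnorm(n)

b1 <- coef(lm(y1 ~ g))["g"]             # pre-computed summary statistic for y1
b2 <- coef(lm(y2 ~ g))["g"]             # pre-computed summary statistic for y2

a <- 2; b <- 3
z <- a * y1 + b * y2

coef(lm(z ~ g))["g"]                    # individual-level fit
a * b1 + b * b2                         # reconstructed from summary statistics only
```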


2019 ◽  
Author(s):  
John Kitchener Sakaluk ◽  
Robyn Kilshaw ◽  
Alexandra Noelle Fisher ◽  
Connor Emont Leshner

Comparisons of group means, variances, correlations, and/or regression slopes involving psychological variables rely on an assumption of measurement invariance: that the latent variables under investigation have equivalent meaning and measurement across groups. When measures are noninvariant, replicability suffers, as comparisons are either conceptually meaningless or hindered by inflated Type I error rates. We propose that the failure to account for interdependence among dyad members when testing measurement invariance may be a potential source of unreplicable findings in relationship research. We develop fully dyadic versions of invariance testing in an Actor-Partner Interdependence Model framework, and propose a Registered Report for gauging the extent of dyadic (non)invariance in romantic relationship research.
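For readers unfamiliar with the building block, a conventional (non-dyadic) multigroup invariance test in lavaan looks like the sketch below, using lavaan's bundled example data. The abstract's point is precisely that this standard setup ignores the interdependence between dyad members, which the authors' dyadic APIM-based approach is designed to handle.

```r
# Sketch: conventional multigroup measurement invariance testing with lavaan,
# using the HolzingerSwineford1939 dataset shipped with the package.
library(lavaan)
data(HolzingerSwineford1939)

model <- 'visual =~ x1 + x2 + x3'

fit_configural <- cfa(model, data = HolzingerSwineford1939, group = "school")
fit_metric     <- cfa(model, data = HolzingerSwineford1939, group = "school",
                      group.equal = "loadings")
fit_scalar     <- cfa(model, data = HolzingerSwineford1939, group = "school",
                      group.equal = c("loadings", "intercepts"))

# Compare nested models to gauge the level of invariance supported
anova(fit_configural, fit_metric, fit_scalar)
```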


2021 ◽  
Author(s):  
Daniel Lakens ◽  
Friedrich Pahlke ◽  
Gernot Wassmer

This tutorial illustrates how to design, analyze, and report group sequential designs. In these designs, groups of observations are collected and repeatedly analyzed while controlling error rates. Compared to a fixed sample size design, where the data are analyzed only once, group sequential designs offer the possibility to stop the study at interim looks at the data, either for efficacy or for futility. Hence, they provide greater flexibility and are more efficient: because of early stopping, the expected sample size is smaller than the sample size of a design with no interim looks. In this tutorial we illustrate how to use the R package 'rpact' and the associated Shiny app to design studies that control the Type I error rate when repeatedly analyzing data, even when neither the number of looks at the data nor their exact timing is specified. Specifically for t-tests, we illustrate how to perform an a priori power analysis for group sequential designs, and explain how to stop the data collection for futility by rejecting the presence of an effect of interest based on a beta-spending function. Finally, we discuss how to report adjusted effect size estimates and confidence intervals. The recent availability of accessible software such as 'rpact' makes it possible for psychologists to benefit from the efficiency gains provided by group sequential designs.
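A brief sketch of the kind of design specification the tutorial covers, as I recall rpact's interface: a three-look group sequential design with an O'Brien–Fleming-type alpha-spending function and a beta-spending function for futility. Argument names should be verified against the rpact documentation.

```r
# Sketch: a three-look group sequential design in rpact with alpha- and
# beta-spending functions (argument names as recalled; see ?getDesignGroupSequential).
library(rpact)

design <- getDesignGroupSequential(
  kMax             = 3,        # number of looks (two interim + final)
  alpha            = 0.025,
  beta             = 0.1,
  sided            = 1,
  typeOfDesign     = "asOF",   # O'Brien-Fleming-type alpha-spending function
  typeBetaSpending = "bsOF"    # O'Brien-Fleming-type beta-spending function (futility)
)

# Sample size for a two-group comparison of means at a standardized effect of 0.5
sampleSize <- getSampleSizeMeans(design, alternative = 0.5, stDev = 1)
summary(sampleSize)
```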

