SampleSizePlanner: A Tool to Estimate and Justify Sample Size for Two-Group Studies

2021 ◽  
Author(s):  
Marton Kovacs ◽  
Don van Ravenzwaaij ◽  
Rink Hoekstra ◽  
Balazs Aczel

Planning sample size often requires researchers to identify a statistical technique and to make several choices during the calculation. Currently, researchers lack clear guidelines for finding and applying the appropriate procedure. In the present tutorial, we introduce a web app and R package that offer nine different procedures to determine and justify the sample size for independent two-group study designs. The application highlights the most important decision points of each procedure and suggests example justifications for them. The resulting sample size report can serve as a template for preregistrations and manuscripts.
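SampleSizePlanner's own interface is not shown in the abstract, so as a minimal sketch of the kind of calculation such a tool automates, the snippet below runs an a priori power analysis for an independent two-group design using base R's power.t.test(); the effect size and design targets are illustrative assumptions, not recommendations.

```r
# A priori sample size for a two-sample t-test (base R, stats package).
# delta = 0.5 with sd = 1 corresponds to a Cohen's d of 0.5, an assumed
# smallest effect size of interest.
res <- power.t.test(
  delta = 0.5,              # assumed standardized effect size
  sd = 1,
  sig.level = 0.05,         # two-sided alpha
  power = 0.80,             # desired power
  type = "two.sample",
  alternative = "two.sided"
)
ceiling(res$n)              # required sample size per group (~64)
```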

2021 ◽  
Vol 11 (3) ◽  
pp. 234
Author(s):  
Abigail R. Basson ◽  
Fabio Cominelli ◽  
Alexander Rodriguez-Palacios

Poor study reproducibility is a concern in translational research. As a solution, it is often recommended to increase the sample size (N), i.e., to add more subjects to experiments. The goal of this study was to examine and visualize data multimodality (data with more than one peak/mode) as a cause of study irreproducibility. To emulate the repetition of studies and random sampling of study subjects, we first used various simulation methods of random number generation based on published preclinical disease outcome data (univariate/continuous, e.g., intestinal inflammation) from human gut microbiota-transplantation rodent studies. Using unimodal distributions (one-mode, Gaussian, and binomial) to generate random numbers, we showed that increasing N does not reproducibly identify statistical differences when group comparisons are repeatedly simulated. We then used multimodal distributions (>1 mode, with Markov chain Monte Carlo methods of random sampling) to simulate similar multimodal datasets A and B (t-test p = 0.95; N = 100,000) and confirmed that increasing N does not improve the 'reproducibility of statistical results or direction of the effects'. Violin plots of categorical random-data simulations (five integer categories across five groups) illustrated how multimodality leads to irreproducibility. Re-analysis of data from a human clinical trial that used maltodextrin as a dietary placebo revealed multimodal responses between human groups and after placebo consumption. In conclusion, increasing N does not necessarily ensure reproducible statistical findings across repeated simulations, owing to randomness and multimodality. Herein, we clarify how to quantify, visualize, and address disease data multimodality in research. Data visualization could facilitate study designs focused on disease subtypes/modes, helping to understand person–person differences and personalized medicine.
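A minimal sketch (not the authors' simulation code) of the core phenomenon: when both groups are drawn from the same bimodal mixture, each simulated study captures the two modes in different proportions, so effect estimates and p-values vary widely across repeats. The mixture parameters below are illustrative assumptions.

```r
# Repeatedly simulate a two-group comparison where every subject comes from
# a bimodal mixture; p-values and effect direction are unstable between
# simulated 'studies' even though the generating process never changes.
set.seed(1)
rbimodal <- function(n) {
  modes <- sample(c(0, 5), n, replace = TRUE)  # which mode each subject falls in
  rnorm(n, mean = modes, sd = 1)
}
sim_once <- function(n_per_group) {
  a <- rbimodal(n_per_group)
  b <- rbimodal(n_per_group)
  c(diff = mean(a) - mean(b), p = t.test(a, b)$p.value)
}
results <- replicate(1000, sim_once(30))
mean(results["p", ] < 0.05)   # 'significant' studies arise by chance alone
range(results["diff", ])      # effect direction flips across repeats
```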


2021 ◽  
pp. 193229682110289
Author(s):  
Evan Olawsky ◽  
Yuan Zhang ◽  
Lynn E Eberly ◽  
Erika S Helgeson ◽  
Lisa S Chow

Background: With the development of continuous glucose monitoring systems (CGMS), detailed glycemic data are now available for analysis. Yet analysis of this data-rich information can be formidable. The power of CGMS-derived data lies in its characterization of glycemic variability. In contrast, many standard glycemic measures, like hemoglobin A1c (HbA1c) and self-monitored blood glucose, inadequately describe glycemic variability and run the risk of bias toward overreporting hyperglycemia. Methods that adjust for this bias are often overlooked in clinical research due to the difficulty of computation and a lack of accessible analysis tools. Methods: In response, we have developed a new R package, rGV, which calculates a suite of 16 glycemic variability metrics when provided a single individual's CGM data. rGV is versatile and robust; it is capable of handling data of many formats from many sensor types. We also created a companion R Shiny web app that provides these glycemic variability analysis tools without requiring prior knowledge of R coding. We analyzed the statistical reliability of all the glycemic variability metrics included in rGV and illustrated the clinical utility of rGV by analyzing CGM data from three studies. Results: In subjects without diabetes, greater glycemic variability was associated with higher HbA1c values. In patients with type 2 diabetes mellitus (T2DM), we found that high glucose is the primary driver of glycemic variability. In patients with type 1 diabetes (T1DM), we found that naltrexone use may potentially reduce glycemic variability. Conclusions: We present a new R package and accompanying web app to facilitate quick and easy computation of a suite of glycemic variability metrics.
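rGV's function names are not listed in the abstract, so the snippet below is a hand-rolled sketch of two standard glycemic variability summaries computed from a vector of CGM readings, not the package's API; the toy glucose values are illustrative.

```r
# Two common glycemic variability metrics from a CGM glucose trace (mg/dL):
# the coefficient of variation and the J-index, 0.001 * (mean + SD)^2.
glucose <- c(95, 110, 150, 180, 140, 100, 90, 130, 170, 120)  # toy readings
gv_cv <- sd(glucose) / mean(glucose) * 100        # coefficient of variation (%)
gv_j  <- 0.001 * (mean(glucose) + sd(glucose))^2  # J-index
c(CV = gv_cv, J_index = gv_j)
```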


2021 ◽  
Author(s):  
Florian Privé ◽  
Bjarni J. Vilhjálmsson ◽  
Timothy S. H. Mak

Abstract: We present lassosum2, a new version of the polygenic score method lassosum, which we re-implement in the R package bigsnpr. This new version uses exactly the same input data as LDpred2 and is also very fast, which means it can be run with almost no extra coding or computational time when LDpred2 is already being run. It can also be more robust than LDpred2, e.g., when the GWAS sample size is badly misspecified. Therefore, lassosum2 is complementary to LDpred2.
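A minimal sketch of running lassosum2 alongside LDpred2 in bigsnpr, assuming the LDpred2 inputs have already been prepared (a summary-statistics data frame df_beta with columns beta, beta_se, and n_eff; an on-disk sparse LD matrix corr; and a genotype matrix G); argument defaults follow the package documentation and may differ across versions.

```r
library(bigsnpr)

# lassosum2 consumes the same inputs as LDpred2-grid and returns one column
# of candidate effect sizes per (lambda, delta) hyperparameter pair.
beta_grid <- snp_lassosum2(corr, df_beta)

# Score individuals and tune on a validation set, as one would for LDpred2-grid.
pred_grid <- big_prodMat(G, beta_grid)
```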


Endoscopy ◽  
2013 ◽  
Vol 45 (11) ◽  
pp. 922-927 ◽  
Author(s):  
Frank van den Broek ◽  
Teaco Kuiper ◽  
Evelien Dekker ◽  
Aeilko Zwinderman ◽  
Paul Fockens ◽  
...  

2014 ◽  
Vol 42 (15) ◽  
pp. e121-e121 ◽  
Author(s):  
Hari Krishna Yalamanchili ◽  
Zhaoyuan Li ◽  
Panwen Wang ◽  
Maria P. Wong ◽  
Jianfeng Yao ◽  
...  

Abstract: Conventionally, overall gene expression from microarrays is used to infer gene networks, but accounting for splicing isoforms is challenging. High-throughput RNA sequencing has made splice variant profiling practical. However, its true merit in quantifying splicing isoforms and isoform-specific exon expression has not been well explored for inferring gene networks. This study demonstrates SpliceNet, a method that infers isoform-specific co-expression networks from exon-level RNA-Seq data using a large-dimensional trace statistic. It goes beyond differentially expressed genes and infers changes in splicing isoform networks between normal and diseased samples. It also eases the sample size bottleneck: evaluations on simulated data and on lung cancer-specific ERBB2 and MAPK signaling pathways, with varying numbers of samples, demonstrate its merit in handling datasets with high exon-to-sample-size ratios. The inferred rewiring of the well-established Bcl-x- and EGFR-centered networks from lung adenocarcinoma expression data is in good agreement with the literature. Gene-level evaluations show that SpliceNet substantially outperforms canonical correlation analysis, a method currently applied to exon-level RNA-Seq data. SpliceNet can also be applied to exon array data. It is distributed as an R package available at http://www.jjwanglab.org/SpliceNet.
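SpliceNet's package interface is not shown here, so as a hedged illustration of the underlying idea (building a co-expression network at the isoform rather than the gene level), the sketch below correlates per-isoform expression profiles and thresholds the result into an adjacency matrix; the data and cutoff are toy assumptions.

```r
# Toy isoform-level co-expression network: correlate isoform expression
# profiles across samples, then keep strong correlations as network edges.
set.seed(42)
iso_expr <- matrix(rnorm(6 * 20), nrow = 6,
                   dimnames = list(paste0("isoform", 1:6), NULL))
cor_mat <- cor(t(iso_expr))        # isoform-by-isoform correlation
adj <- abs(cor_mat) > 0.5          # crude threshold defines edges
diag(adj) <- FALSE
which(adj, arr.ind = TRUE)         # edge list of the inferred network
```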


2019 ◽  
Author(s):  
Andrew J. Bass ◽  
John D. Storey

Analysis of biological data often involves the simultaneous testing of thousands of genes. This requires two key steps: ranking the genes and selecting important genes based on a significance threshold. One such testing procedure, the "optimal discovery procedure" (ODP), leverages information across different tests to provide an optimal ranking of genes. This approach can lead to substantial improvements in statistical power compared to other methods. However, current applications of the ODP have only been established for simple study designs using microarray technology. Here we extend this work to complex study designs and RNA sequencing studies. We then apply the extended framework to a static RNA sequencing study, a longitudinal and an independent-sampling time-series study, and an independent-sampling dose-response study. We find that our method shows improved performance compared to other testing procedures, finding more differentially expressed genes and increasing power for enrichment analysis. Thus the extended ODP enables a superior significance analysis of genomic studies. The algorithm is implemented in our freely available R package, edge.
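A minimal sketch of the documented edge workflow on toy data for a static two-group design; the argument names follow the Bioconductor edge vignette and may differ between package versions.

```r
library(edge)  # Bioconductor package implementing the ODP

expr <- matrix(rnorm(100 * 10), nrow = 100)   # 100 genes x 10 samples (toy)
grp  <- factor(rep(c("control", "treated"), each = 5))

de_obj  <- build_study(data = expr, grp = grp, sampling = "static")
odp_fit <- odp(de_obj, bs.its = 50)           # ODP with 50 bootstrap iterations
summary(odp_fit)                              # significance summary (q-values)
```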


Author(s):  
David Benkeser ◽  
Iván Díaz ◽  
Alex Luedtke ◽  
Jodi Segal ◽  
Daniel Scharfstein ◽  
...  

Summary: Time is of the essence in evaluating potential drugs and biologics for the treatment and prevention of COVID-19. There are currently over 400 clinical trials (phase 2 and 3) of treatments for COVID-19 registered on clinicaltrials.gov. Covariate adjustment is a statistical analysis method with the potential to improve precision and reduce the required sample size for a substantial number of these trials. Though covariate adjustment is recommended by the U.S. Food and Drug Administration and the European Medicines Agency, it is underutilized, especially for the types of outcomes (binary, ordinal, and time-to-event) that are common in COVID-19 trials. To demonstrate the potential value added by covariate adjustment in this context, we simulated two-arm randomized trials comparing a hypothetical COVID-19 treatment with standard of care, where the primary outcome is binary, ordinal, or time-to-event. Our simulated distributions are derived from two sources: longitudinal data on over 500 patients hospitalized at Weill Cornell Medicine New York Presbyterian Hospital, and a Centers for Disease Control and Prevention (CDC) preliminary description of 2449 cases. We found substantial precision gains from using covariate adjustment, equivalent to 9-21% reductions in the required sample size to achieve a desired power, for a variety of estimands (targets of inference) when the trial sample size was at least 200. We provide an R package and practical recommendations for implementing covariate adjustment. The estimators that we consider are robust to model misspecification.
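The authors' package is not named in the abstract, so as a hedged sketch of the general technique for a binary outcome, the snippet below implements covariate adjustment via logistic regression standardization (g-computation): fit an outcome model that includes baseline covariates, then average model predictions under treatment and under control to obtain a marginal, covariate-adjusted risk difference. All data below are simulated toys.

```r
# Covariate-adjusted marginal risk difference via g-computation (base R).
set.seed(7)
n   <- 400
age <- rnorm(n, 60, 10)                       # baseline covariate
trt <- rbinom(n, 1, 0.5)                      # randomized treatment assignment
y   <- rbinom(n, 1, plogis(-4 + 0.05 * age - 0.5 * trt))
dat <- data.frame(y, trt, age)

fit <- glm(y ~ trt + age, family = binomial, data = dat)
p1  <- predict(fit, transform(dat, trt = 1), type = "response")  # everyone treated
p0  <- predict(fit, transform(dat, trt = 0), type = "response")  # everyone control
mean(p1) - mean(p0)                           # adjusted marginal risk difference
```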

