Sample Size Growth with an Increasing Number of Comparisons

2012, Vol 2012, pp. 1-10
Author(s): Chi-Hong Tseng, Yongzhao Shao

An appropriate sample size is crucial for the success of many studies that involve a large number of comparisons. Sample size formulas for testing multiple hypotheses are provided in this paper. They can be used to determine the sample sizes required to provide adequate power while controlling the familywise error rate or the false discovery rate, to derive the growth rate of sample size with respect to an increasing number of comparisons or a decreasing effect size, and to assess the reliability of study designs. It is demonstrated that practical sample sizes can often be achieved even when adjustments for a large number of comparisons are made, as in many genome-wide studies.
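The paper's exact formulas are not reproduced in the abstract, so the following is only a minimal sketch of the kind of calculation involved, assuming two-sided two-sample z-tests with the familywise error rate controlled by a Bonferroni adjustment across m comparisons.

```python
# A minimal sketch (not the authors' exact formulas): per-group sample size
# for two-sample z-tests with a Bonferroni adjustment across m comparisons.
from scipy.stats import norm

def bonferroni_sample_size(m, effect_size, alpha=0.05, power=0.8):
    """Per-group n for a two-sided two-sample z-test at level alpha/m."""
    z_alpha = norm.ppf(1 - alpha / (2 * m))   # critical value after Bonferroni adjustment
    z_beta = norm.ppf(power)                  # quantile matching the target per-test power
    return 2 * (z_alpha + z_beta) ** 2 / effect_size ** 2

# Required n grows only slowly (roughly with log m) for a fixed effect size:
for m in (1, 100, 10_000, 1_000_000):
    print(m, round(bonferroni_sample_size(m, effect_size=1.0)))
```

Because the adjusted critical value grows only on the order of the square root of log m, the required sample size increases slowly with the number of comparisons, consistent with the paper's point that practical sample sizes remain achievable in genome-wide settings.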

Biometrika, 2020, Vol 107 (3), pp. 761-768
Author(s): E. Dobriban

Summary: Multiple hypothesis testing problems arise naturally in science. This note introduces a new fast closed testing method for multiple testing which controls the familywise error rate. Controlling the familywise error rate is state of the art in many important application areas and is preferred over false discovery rate control for many reasons, including that it leads to stronger reproducibility. The closure principle rejects an individual hypothesis if the global nulls of all subsets containing it are rejected using some test statistics; this takes exponential time in the worst case. When the tests are symmetric and monotone, the proposed method is an exact algorithm for computing the closure; its running time is quadratic in the number of tests and linear in the number of discoveries. Our framework generalizes most examples of closed testing, such as Holm's method and the Bonferroni method. As a special case of the method, we propose the Simes and higher criticism fusion test, which is powerful both for detecting a few strong signals and for detecting many moderate signals.
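As a concrete special case of the closure principle mentioned above, here is a hedged sketch of Holm's step-down method, the classical shortcut of closed testing with Bonferroni local tests; it is not the paper's fusion test.

```python
# Holm's step-down method: a shortcut of closed testing that controls the FWER.
import numpy as np

def holm(pvals, alpha=0.05):
    """Boolean rejection vector; controls the FWER at level alpha."""
    p = np.asarray(pvals)
    m = len(p)
    order = np.argsort(p)                  # visit hypotheses from smallest p upward
    reject = np.zeros(m, dtype=bool)
    for step, i in enumerate(order):
        if p[i] <= alpha / (m - step):     # Bonferroni threshold relaxes at each step
            reject[i] = True
        else:
            break                          # step-down: stop at the first non-rejection
    return reject

print(holm([0.001, 0.02, 0.04, 0.30]))     # rejects only the first hypothesis here
```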


Author(s): Michael J. Adjabui, Jakperik Dioggban, Irene D. Angbing

We propose a stepwise confidence procedure for identifying the minimum effective dose (MED) without multiplicity adjustment. Stepwise procedures strongly control the familywise error rate (FWER), which is a critical requirement for statistical methodologies used to identify the MED. The partitioning principle is invoked to validate control of the FWER. Our simulation study indicates that the FWER was properly controlled under balanced designs but failed for some sample-size configurations under unbalanced designs. In addition, the power of the procedure increases with the mean of the ratio differences and with the sample sizes.
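The abstract does not give the procedure's details, so the sketch below is only an illustration, under assumed settings, of how the empirical FWER of a generic fixed-sequence step-down dose comparison can be estimated by simulation under the global null; it is not the authors' partitioning-based procedure.

```python
# Monte Carlo estimate of the FWER of a generic fixed-sequence step-down test
# under the global null (no dose differs from control). Settings are assumptions.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def one_trial(n_per_group=20, n_doses=4, alpha=0.05):
    """One simulated trial under the global null."""
    control = rng.normal(size=n_per_group)
    any_false_rejection = False
    for _ in range(n_doses):                          # doses tested in a fixed sequence
        dose = rng.normal(size=n_per_group)           # null: same distribution as control
        if stats.ttest_ind(dose, control).pvalue > alpha:
            break                                     # step-down: stop at the first non-rejection
        any_false_rejection = True                    # any rejection under the null is an error
    return any_false_rejection

fwer = np.mean([one_trial() for _ in range(5_000)])
print(f"empirical FWER ~ {fwer:.3f}")                 # should sit near or below the nominal 0.05
```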


Circulation, 2014, Vol 130 (suppl_2)
Author(s): Sonia Jain, Sushil A. Luis, Vivien Coelho, Limor Ilan Bushari, Hilma Holm, et al.

Background: It is generally perceived that sample sizes of randomized clinical trials (RCTs) have increased over the years, particularly in specialties such as cardiology that have a robust evidence base. The aim of this study was to analyze temporal trends in sample sizes of RCTs in cardiology journals compared to other specialties. Methods: Abstracts of RCTs involving humans from PubMed for 1970-2013 were analyzed using a digital search algorithm. The sample size of each study was extracted from its abstract, along with the date of publication and journal name. Journals from several medical subspecialties were selected for comparison, using the journal impact factor as a measure of clinical relevance. Sample sizes of studies published in 1990 were compared with those published in 2010 for each journal using the Mann-Whitney U test. Graphical comparisons of sample size trends are presented. Results: 272,054 abstracts of human RCTs were identified. Median sample sizes for the years 1990 and 2010 are shown in Table 1. The median sample size for all RCTs published in Circulation was 99 subjects per study in 1990, increasing to 630 subjects per study in 2010 (p < 0.01). All cardiology journals had a significant increase in study sample size over the 20-year period, as did the multispecialty journals (JAMA, NEJM, Lancet). In contrast, only a few non-cardiology specialty journals published studies with increasing sample sizes (Table 1). Figure 1 shows the sample size trend for 1970-2013. Conclusions: Our study demonstrates a dramatic temporal trend of increasing sample sizes in RCTs in cardiology compared to other specialties. Since sample size is estimated from the effect size under study, one explanation for this observation is that the more obvious, larger effects have already been elucidated, leaving only smaller associations to be studied. This requires increasing resources, highlighting the importance of alternative study designs and collaborative registries in developing a cost-effective evidence base.
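A minimal sketch of the statistical comparison described in the Methods, using hypothetical placeholder sample-size lists rather than the study's extracted data:

```python
# Compare the distribution of per-study sample sizes in 1990 vs 2010 for one
# journal with the Mann-Whitney U test. The numbers below are placeholders.
from scipy.stats import mannwhitneyu

sizes_1990 = [40, 85, 99, 120, 310]          # hypothetical per-study sample sizes
sizes_2010 = [150, 630, 890, 1200, 4500]

stat, p = mannwhitneyu(sizes_1990, sizes_2010, alternative="two-sided")
print(f"U = {stat}, p = {p:.3f}")            # a small p indicates a shift in sample sizes
```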


2012, Vol 2012, pp. 1-13
Author(s): Shulian Shang, Qianhe Zhou, Mengling Liu, Yongzhao Shao

The false discovery proportion (FDP), the proportion of incorrect rejections among all rejections, is a direct measure of the abundance of false positive findings in multiple testing. Many methods have been proposed to control the FDP, but they are too conservative to be useful for power analysis. Study designs controlling the mean of the FDP, which is the false discovery rate, have been commonly used. However, there has been little attempt to design studies with direct FDP control to achieve a certain level of efficiency. We provide a sample size calculation method that uses the variance formula of the FDP under weak-dependence assumptions to achieve the desired overall power. The relationship between the design parameters and the sample size is explored. The adequacy of the procedure is assessed by simulation. We illustrate the method using estimated correlations from a prostate cancer dataset.
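The variance formula itself is not reproduced in the abstract, so the sketch below only illustrates, under an assumed independent-test setting, how the distribution of the FDP for a candidate sample size can be examined by simulation with Benjamini-Hochberg rejections.

```python
# Simulate the FDP of the Benjamini-Hochberg procedure over repeated studies
# with m two-sample tests per study. All settings here are assumptions.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def bh_reject(pvals, q=0.05):
    """Benjamini-Hochberg step-up rejections at nominal FDR level q."""
    m = len(pvals)
    order = np.argsort(pvals)
    passed = pvals[order] <= q * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if passed.any():
        k = np.max(np.nonzero(passed)[0])          # largest rank meeting its threshold
        reject[order[:k + 1]] = True
    return reject

def simulate_fdp(n=30, m=1000, m1=100, effect=0.8, q=0.05):
    """One simulated study: m two-sample tests with n per group, m1 true signals."""
    truth = np.zeros(m, dtype=bool)
    truth[:m1] = True
    x = rng.normal(size=(m, n))
    y = rng.normal(size=(m, n)) + effect * truth[:, None]
    pvals = stats.ttest_ind(x, y, axis=1).pvalue
    reject = bh_reject(pvals, q)
    return (reject & ~truth).sum() / reject.sum() if reject.any() else 0.0

fdps = np.array([simulate_fdp() for _ in range(200)])
print(f"mean FDP = {fdps.mean():.3f}, sd = {fdps.std():.3f}")
```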


2021, Vol 11 (3), pp. 234
Author(s): Abigail R. Basson, Fabio Cominelli, Alexander Rodriguez-Palacios

Poor study reproducibility is a concern in translational research. As a solution, it is often recommended to increase the sample size (N), i.e., to add more subjects to experiments. The goal of this study was to examine and visualize data multimodality (data with more than one peak/mode) as a cause of study irreproducibility. To emulate the repetition of studies and random sampling of study subjects, we first used various simulation methods of random number generation based on published preclinical disease outcome data from human gut microbiota-transplantation rodent studies (e.g., univariate/continuous measures of intestinal inflammation). We first used unimodal distributions (one mode; Gaussian and binomial) to generate random numbers and showed that increasing N does not reproducibly identify statistical differences when group comparisons are repeatedly simulated. We then used multimodal distributions (more than one mode; Markov chain Monte Carlo methods of random sampling) to simulate similar multimodal datasets A and B (t-test p = 0.95; N = 100,000), and confirmed that increasing N does not improve the reproducibility of statistical results or the direction of the effects. Data visualization with violin plots of categorical random data simulations with five integer categories/five groups illustrated how multimodality leads to irreproducibility. Re-analysis of data from a human clinical trial that used maltodextrin as a dietary placebo illustrated multimodal responses between human groups and after placebo consumption. In conclusion, increasing N does not necessarily ensure reproducible statistical findings across repeated simulations, owing to randomness and multimodality. Herein, we clarify how to quantify, visualize, and address disease data multimodality in research. Data visualization could facilitate study designs focused on disease subtypes/modes, helping to understand person-to-person differences and personalized medicine.
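A hedged sketch of the kind of repeated-sampling simulation described above, with an illustrative two-mode Gaussian mixture whose parameters are assumed rather than taken from the paper: both groups are drawn from the same multimodal distribution, the "study" is repeated many times, and the fraction of repeats reporting p < 0.05 stays near the nominal rate rather than shrinking as N grows.

```python
# Repeatedly simulate two-group comparisons of bimodal data under the null.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

def bimodal_sample(n, modes=(0.0, 3.0), sd=0.5):
    """Each subject falls into one of two response modes (e.g., responders vs non-responders)."""
    centers = rng.choice(modes, size=n)
    return rng.normal(loc=centers, scale=sd)

for n in (10, 100, 1000):
    flagged = [
        stats.ttest_ind(bimodal_sample(n), bimodal_sample(n)).pvalue < 0.05
        for _ in range(2000)
    ]
    print(f"N = {n:4d}: {np.mean(flagged):.3f} of repeats report p < 0.05")
```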


2021, Vol 13 (3), pp. 368
Author(s): Christopher A. Ramezan, Timothy A. Warner, Aaron E. Maxwell, Bradley S. Price

The size of the training data set is a major determinant of classification accuracy. Nevertheless, the collection of a large training data set for supervised classifiers can be a challenge, especially for studies covering a large area, which may be typical of many real-world applied projects. This work investigates how variations in training set size, ranging from a large sample size (n = 10,000) to a very small sample size (n = 40), affect the performance of six supervised machine-learning algorithms applied to classify large-area high-spatial-resolution (HR) (1–5 m) remotely sensed data within the context of a geographic object-based image analysis (GEOBIA) approach. GEOBIA, in which adjacent similar pixels are grouped into image-objects that form the unit of the classification, offers the potential benefit of allowing multiple additional variables, such as measures of object geometry and texture, thus increasing the dimensionality of the classification input data. The six supervised machine-learning algorithms are support vector machines (SVM), random forests (RF), k-nearest neighbors (k-NN), single-layer perceptron neural networks (NEU), learning vector quantization (LVQ), and gradient-boosted trees (GBM). RF, the algorithm with the highest overall accuracy, was notable for its negligible decrease in overall accuracy, 1.0%, when the training sample size decreased from 10,000 to 315 samples. GBM provided overall accuracy similar to RF; however, the algorithm was very expensive in terms of training time and computational resources, especially with large training sets. In contrast to RF and GBM, NEU and SVM were particularly sensitive to decreasing sample size, with NEU classifications generally producing overall accuracies that were on average slightly higher than SVM classifications for larger sample sizes, but lower than SVM for the smallest sample sizes; NEU, however, required a longer processing time. The k-NN classifier saw less of a drop in overall accuracy than NEU and SVM as the training set size decreased; however, the overall accuracies of k-NN were typically lower than those of the RF, NEU, and SVM classifiers. LVQ generally had the lowest overall accuracy of all six methods, but was relatively insensitive to sample size, down to the smallest sample sizes. Overall, due to its relatively high accuracy with small training sample sets, minimal variation in overall accuracy between very large and small sample sets, and relatively short processing time, RF was a good classifier for large-area land-cover classifications of HR remotely sensed data, especially when training data are scarce. However, as the performance of different supervised classifiers varies in response to training set size, investigating multiple classification algorithms is recommended to achieve optimal accuracy for a project.
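A hedged sketch of the core experiment using synthetic features in place of the GEOBIA imagery (scikit-learn; class counts, feature dimensions, and training-set sizes below are assumptions chosen for illustration): train RF and SVM on progressively smaller training sets and track overall accuracy on a fixed test set.

```python
# Vary training-set size and compare overall accuracy of two classifiers on a
# fixed held-out test set. Synthetic stand-in data, not the study's imagery.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=12_000, n_features=20, n_informative=10,
                           n_classes=5, random_state=0)
X_pool, X_test, y_pool, y_test = train_test_split(
    X, y, test_size=2_000, random_state=0, stratify=y)

for n_train in (10_000, 1_000, 315, 40):           # shrinking training sets
    X_tr, y_tr = X_pool[:n_train], y_pool[:n_train]
    for name, model in (("RF", RandomForestClassifier(random_state=0)),
                        ("SVM", SVC())):
        acc = accuracy_score(y_test, model.fit(X_tr, y_tr).predict(X_test))
        print(f"n = {n_train:5d}  {name:3s}  overall accuracy = {acc:.3f}")
```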

