PREFERRED SEQUENCES OF GENETIC EVENTS IN CARCINOGENESIS: QUANTITATIVE ASPECTS OF THE PROBLEM

2001 ◽  
Vol 09 (02) ◽  
pp. 105-121 ◽  
Author(s):  
ANIKO SZABO ◽  
ANDREI YAKOVLEV

In this paper we discuss some natural limitations in quantitative inference about the frequency, correlation and ordering of genetic events occurring in the course of tumor development. We consider a simple, yet frequently used, experimental design under which independent tumors are examined once for the presence/absence of specific mutations of interest. The most typical factors affecting inference about the chronological order of genetic events are: a possible dependence between mutation rates, the sampling bias that arises from the observation process, and small sample sizes. Our results clearly indicate that these three factors alone may dramatically distort the outcome of data analysis, thereby leading to estimates of limited utility as an underpinning for mechanistic models of carcinogenesis.
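The distortion described above can be illustrated with a minimal simulation. The rates, the single fixed observation time, and the naive "more frequent means earlier" ordering rule below are illustrative assumptions, not the authors' model:

```python
import numpy as np

rng = np.random.default_rng(0)

RATE_A, RATE_B = 1.0, 0.8   # hypothetical occurrence rates; A truly tends to occur first
T_OBS = 1.0                 # each tumor is examined once, at a fixed time
N_TUMORS = 20               # a small sample, as discussed above
N_REPLICATES = 2000

flips = 0
for _ in range(N_REPLICATES):
    t_a = rng.exponential(1 / RATE_A, N_TUMORS)   # waiting time to mutation A
    t_b = rng.exponential(1 / RATE_B, N_TUMORS)   # waiting time to mutation B
    freq_a = np.mean(t_a < T_OBS)                 # observed frequency of A
    freq_b = np.mean(t_b < T_OBS)
    if freq_b > freq_a:   # naive rule: the more frequent mutation is called earlier
        flips += 1

flip_rate = flips / N_REPLICATES
print(f"ordering inferred incorrectly in {flip_rate:.1%} of samples")
```

With these rates the true presence probabilities at the observation time differ by only about 0.08, so a sample of 20 tumors reverses the inferred order in a sizable fraction of replicates.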

The Auk ◽  
1985 ◽  
Vol 102 (3) ◽  
pp. 493-499 ◽  
Author(s):  
David W. Bradley

Abstract Four time-budget estimation strategies are compared with respect to their sensitivity to two components of visibility bias in the observation process: discovery bias and loss bias. Monte Carlo simulations and a brief field study both indicate that visibility bias (particularly discovery bias) can substantially affect the results of time-budget studies. Estimators designed to curtail these biases performed best. Counting only initial contacts was least satisfactory. Bootstrap confidence intervals for niche overlap from the field study were so broad that overlap estimates seem nearly useless with very small sample sizes, such as the 93 observation series with 1,065 data points obtained here. Investigators who measure time or energy budgets in the field should take care to minimize sample biases, obtain adequate sample sizes, select analysis techniques appropriate for their sampling scheme, and confine inference to a scope compatible with the temporal and spatial scale of their study.
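The width of bootstrap intervals at small sample sizes can be sketched as follows; the counts, the four behavior categories, and the choice of Schoener's index as the overlap measure are illustrative assumptions, not the study's data or estimator:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical time-budget counts for two species over four behavior categories
counts_a = np.array([12, 5, 3, 2])
counts_b = np.array([6, 8, 4, 4])

def schoener_overlap(ca, cb):
    """Schoener's overlap: 1 - 0.5 * sum |p_i - q_i| over behavior categories."""
    p, q = ca / ca.sum(), cb / cb.sum()
    return 1.0 - 0.5 * np.abs(p - q).sum()

def bootstrap_ci(ca, cb, n_boot=5000, alpha=0.05):
    """Percentile bootstrap, resampling observations within each species."""
    stats = np.empty(n_boot)
    for i in range(n_boot):
        ra = rng.multinomial(ca.sum(), ca / ca.sum())
        rb = rng.multinomial(cb.sum(), cb / cb.sum())
        stats[i] = schoener_overlap(ra, rb)
    return np.quantile(stats, [alpha / 2, 1 - alpha / 2])

point = schoener_overlap(counts_a, counts_b)
lo, hi = bootstrap_ci(counts_a, counts_b)
print(f"overlap = {point:.2f}, 95% CI = ({lo:.2f}, {hi:.2f})")
```

Even with everything else idealized, the interval from roughly 20 observations per species spans a large slice of the [0, 1] overlap scale, echoing the abstract's caution about small samples.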


2014 ◽  
Vol 11 (Suppl 1) ◽  
pp. S2 ◽  
Author(s):  
Joanna Zyla ◽  
Paul Finnon ◽  
Robert Bulman ◽  
Simon Bouffler ◽  
Christophe Badie ◽  
...  

2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Yasir Suhail ◽  
Junaid Afzal ◽  
Kshitiz

Abstract Background The disease burden of SARS-CoV-2, as measured by tests from various localities and at different time points, presents varying estimates of infection and fatality rates. Models based on these acquired data may suffer from systematic errors and large estimation variances due to the biases associated with testing. Unbiased randomized testing to estimate the true fatality rate is still missing. Methods Here, we characterize the effect of incidental sampling bias in the estimation of epidemic dynamics. Towards this, we explicitly modeled sampling bias in an augmented compartment model to predict epidemic dynamics. We further calculate the bias from the differences in disease prediction between biased and randomized sampling, proposing a strategy to obtain unbiased estimates. Results Our simulations demonstrate that sampling biases in favor of patients with higher disease manifestation can significantly affect direct estimates of infection and fatality rates calculated from the numbers of confirmed cases and deaths, and that serological testing can partially mitigate these biases. Conclusions The augmented compartmental model allows explicit modeling of different testing policies and their effects on disease estimates. Our calculations of the dependence of expected confidence on randomized sample size show that relatively small samples can provide statistically significant estimates of SARS-CoV-2-related death rates.
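The core effect can be sketched with a bare-bones SIR model and a severity-biased testing layer; every parameter below is an illustrative assumption, and this is a simplification, not the authors' augmented compartment model:

```python
import numpy as np

# Illustrative parameters only; not the authors' fitted model.
BETA, GAMMA = 0.3, 0.1                  # transmission and recovery rates per day
IFR = 0.005                             # assumed true infection fatality rate
P_SEVERE = 0.1                          # fraction of infections that are severe
P_TEST_SEVERE, P_TEST_MILD = 0.9, 0.05  # testing strongly favors severe cases

def simulate_sir(days=120, n=1_000_000, i0=100):
    """Discrete-time SIR; returns daily new infections."""
    s, i, r = n - i0, i0, 0
    new_inf = []
    for _ in range(days):
        di = BETA * s * i / n
        dr = GAMMA * i
        s, i, r = s - di, i + di - dr, r + dr
        new_inf.append(di)
    return np.array(new_inf)

total_inf = simulate_sir().sum()
deaths = IFR * total_inf
# Biased testing: severe infections are far more likely to be confirmed
detect_frac = P_SEVERE * P_TEST_SEVERE + (1 - P_SEVERE) * P_TEST_MILD
confirmed = detect_frac * total_inf
naive_cfr = deaths / confirmed
print(f"true IFR = {IFR:.4f}, naive fatality rate from confirmed cases = {naive_cfr:.4f}")
```

Because confirmed cases undercount mild infections, deaths divided by confirmed cases overstates the fatality rate by the reciprocal of the detection fraction, here several-fold.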


Author(s):  
Shiqi Cui ◽  
Tieming Ji ◽  
Jilong Li ◽  
Jianlin Cheng ◽  
Jing Qiu

Abstract Identifying differentially expressed (DE) genes between different conditions is one of the main goals of RNA-seq data analysis. Although a large amount of RNA-seq data was produced for two-group comparisons with small sample sizes at an early stage, more and more RNA-seq data are being produced in the setting of complex experimental designs such as split-plot designs and repeated-measures designs. Data arising from such experiments are traditionally analyzed by mixed-effects models. An appropriate statistical approach for analyzing RNA-seq data from such designs should therefore be a generalized linear mixed model (GLMM) or a similar approach that allows for random effects. However, common practices for analyzing such data in the literature either treat random effects as fixed or completely ignore the experimental design and focus on two-group comparisons using partial data. In this paper, we examine the effect of ignoring random effects when analyzing RNA-seq data. We accomplish this goal by comparing the standard GLMM to methods that ignore random effects, through simulation studies and real data analysis. Our studies show that ignoring random effects in a multi-factor experiment can lead to an increase in false positives among the top selected genes, or to lower power when the nominal FDR level is controlled.
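The false-positive inflation from ignoring a random effect can be demonstrated with a simplified null simulation of clustered (e.g., per-animal) measurements; the variances and design below are illustrative assumptions, and a t-test on cluster means stands in here for a full GLMM:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Null simulation: no condition effect, but replicates share a random
# cluster (e.g., animal) effect. Hypothetical variances, not the paper's data.
N_CLUSTERS, N_REP = 4, 5          # clusters per condition, replicates per cluster
SD_CLUSTER, SD_NOISE = 1.0, 1.0
N_SIM, ALPHA = 2000, 0.05

naive_rej = proper_rej = 0
for _ in range(N_SIM):
    def sample():
        u = rng.normal(0, SD_CLUSTER, N_CLUSTERS)   # random cluster effects
        return u[:, None] + rng.normal(0, SD_NOISE, (N_CLUSTERS, N_REP))
    a, b = sample(), sample()
    # Naive analysis: treat all replicates as independent observations
    naive_rej += stats.ttest_ind(a.ravel(), b.ravel()).pvalue < ALPHA
    # Respecting the random effect by testing cluster means
    proper_rej += stats.ttest_ind(a.mean(axis=1), b.mean(axis=1)).pvalue < ALPHA

naive_rate = naive_rej / N_SIM
proper_rate = proper_rej / N_SIM
print(f"false positive rate: naive = {naive_rate:.3f}, cluster means = {proper_rate:.3f}")
```

With these variances the intraclass correlation is 0.5, so the naive test's effective sample size is far smaller than the number of rows it sees, and its type I error runs several times above the nominal 5%.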


2019 ◽  
Author(s):  
Dustin Fife

Data analysis is a risky endeavor, particularly among those unaware of its dangers. In the words of Cook and Campbell (1976; see also Cook, Campbell, and Shadish 2002), “Statistical Conclusions Validity” threatens all experiments that subject themselves to the dark arts of statistical magic. Although traditional statistics classes may advise against certain practices (e.g., multiple comparisons, small sample sizes, violating normality), they may fail to cover others (e.g., outlier detection and violating linearity). More common, perhaps, is that researchers may fail to remember them. In this paper, rather than rehashing old warnings and diatribes against this practice or that, I instead advocate a general statistical analysis strategy. This graphically based eight-step strategy promises to resolve the majority of statistical traps researchers may fall into, without their having to remember large lists of problematic statistical practices. These steps will assist in preventing both Type I and Type II errors and yield critical insights about the data that would otherwise have been missed. I conclude with an applied example that shows how the eight steps highlight data problems that would not be detected with standard statistical practices.


2017 ◽  
Vol 45 (1) ◽  
pp. 23-27
Author(s):  
Gergely Tóth ◽  
Pál Szepesváry

Abstract The use of biased estimators can be found in some historically important, and still widely used, tools of statistical data analysis. In this paper their replacement with unbiased estimators is proposed, at least in the case of the estimator of the population standard deviation for normal distributions. By removing the incoherence that the biased estimator introduces into Student's t-distribution, a corrected t-distribution may be defined. Although the quantitative results in most data analysis applications are identical for the original and corrected t-distributions, the use of the latter is suggested because of its theoretical consistency. Moreover, the frequent qualitative discussion of the t-distribution has come under much criticism, because it concerns artefacts of the biased estimator. In the case of Geary's kurtosis, the same correction yields an unbiased estimate of (2/π)^(1/2) for normally distributed data, independent of the sample size. It is believed that by removing the sample-size-dependent bias, the applicability domain of some normality tests can be expanded to include small sample sizes.
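For normal samples the bias in question is the classical c4(n) factor, E[s] = c4(n)·σ with c4(n) = sqrt(2/(n-1))·Γ(n/2)/Γ((n-1)/2); a short simulation check of the corrected estimator (the sample size and seed below are arbitrary choices):

```python
import math
import numpy as np

def c4(n):
    """Unbiasing factor: E[s] = c4(n) * sigma for normal samples of size n."""
    return math.sqrt(2.0 / (n - 1)) * math.exp(math.lgamma(n / 2) - math.lgamma((n - 1) / 2))

rng = np.random.default_rng(3)
n, n_sim = 5, 200_000
s = rng.normal(0.0, 1.0, (n_sim, n)).std(axis=1, ddof=1)   # usual sample sd, sigma = 1

mean_s = s.mean()
mean_s_corrected = (s / c4(n)).mean()
print(f"c4({n}) = {c4(n):.4f}")
print(f"mean of s          = {mean_s:.4f}  (biased low)")
print(f"mean of s / c4(n)  = {mean_s_corrected:.4f}  (close to sigma = 1)")
```

The log-gamma form avoids overflow for large n; the same Γ-ratio correction applied to Geary's statistic gives the sample-size-independent (2/π)^(1/2) value mentioned in the abstract.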


2020 ◽  
Vol 15 (4) ◽  
pp. 1054-1075
Author(s):  
Dustin Fife

Data analysis is a risky endeavor, particularly among people who are unaware of its dangers. According to some researchers, “statistical conclusions validity” threatens all research subjected to the dark arts of statistical magic. Although traditional statistics classes may advise against certain practices (e.g., multiple comparisons, small sample sizes, violating normality), they may fail to cover others (e.g., outlier detection and violating linearity). More common, perhaps, is that researchers may fail to remember them. In this article, rather than rehashing old warnings and diatribes against this practice or that, I instead advocate a general statistical-analysis strategy. This graphic-based eight-step strategy promises to resolve the majority of statistical traps researchers may fall into—without having to remember large lists of problematic statistical practices. These steps will assist in preventing both false positives and false negatives and yield critical insights about the data that would have otherwise been missed. I conclude with an applied example that shows how the eight steps reveal interesting insights that would not be detected with standard statistical practices.


2017 ◽  
Author(s):  
Colleen Molloy Farrelly

Studies of highly and profoundly gifted children typically involve small sample sizes, as the population is relatively rare, and many statistical methods cannot handle such small sample sizes well. However, topological data analysis (TDA) tools are robust even with very small samples and can provide useful information as well as robust statistical tests. This study demonstrates these capabilities on data simulated from previous talent search results (small and large samples), as well as on a subset of data from Ruf's cohort of gifted children. TDA methods show strong, robust performance and uncover insight into sample characteristics and subgroups, including the appearance of similar subgroups across assessment populations.


2020 ◽  
Author(s):  
Yasir Suhail ◽  
Junaid Afzal ◽  
Kshitiz

ABSTRACT The disease burden of SARS-CoV-2, as measured by tests from various countries, presents varying estimates of infection and fatality rates. Models based on these acquired data may suffer from systematic errors and large estimation variances due to the biases associated with testing and the lags between infection and death counts. Here, we present an augmented compartment model that predicts epidemic dynamics while explicitly modeling the sampling bias involved in testing. Our simulations show that sampling biases in favor of patients with higher disease manifestation can significantly affect direct estimates of infection and fatality rates calculated from the numbers of confirmed cases and deaths, and that serological testing can partially mitigate these biases. We further recommend a strategy to obtain unbiased estimates, calculating the dependence of expected confidence on randomized sample size and showing that relatively small samples can provide statistically significant estimates of SARS-CoV-2-related death rates.

