Replacement of Biased Estimators with Unbiased Ones in the Case of Student’s t-Distribution and Geary’s Kurtosis

2017 ◽  
Vol 45 (1) ◽  
pp. 23-27
Author(s):  
Gergely Tóth ◽  
Pál Szepesváry

Abstract The use of biased estimators can be found in some historically important and still widely used tools of statistical data analysis. In this paper we propose replacing them with unbiased estimators, at least in the case of the estimator of the population standard deviation for normal distributions. By removing the incoherence that the biased estimator introduces into Student’s t-distribution, a corrected t-distribution may be defined. Although the quantitative results in most data-analysis applications are identical for the original and corrected t-distributions, the use of the corrected one is suggested because of its theoretical consistency. Moreover, much of the frequent qualitative discussion of the t-distribution is open to criticism, because it concerns artefacts of the biased estimator. In the case of Geary’s kurtosis, the same correction results in an unbiased kurtosis estimate of (2/π)^(1/2) for normally distributed data, independent of the sample size. It is believed that by removing the sample-size-dependent bias, the applicability domain of some normality tests can be expanded to include small sample sizes.
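The standard-deviation correction the abstract refers to is the classical c4 factor: for a normal population, E[s] = c4(n)·σ, so s/c4(n) is unbiased. A minimal sketch (the paper's corrected t-distribution itself is not reproduced here); note that c4(2) equals (2/π)^(1/2), the same constant quoted for Geary's kurtosis:

```python
import math

def c4(n: int) -> float:
    """Bias-correction factor for the sample standard deviation of a
    normal population: E[s] = c4(n) * sigma, so s / c4(n) is unbiased."""
    return math.sqrt(2.0 / (n - 1)) * math.gamma(n / 2) / math.gamma((n - 1) / 2)

# For n = 2 the factor equals sqrt(2/pi) ~ 0.798, the same constant that
# appears as the unbiased Geary's kurtosis value for normal data.
print(c4(2))
# The factor approaches 1 as n grows, so the bias matters mainly for
# small sample sizes.
print(c4(100))
```

The factor shrinks quickly toward 1, which is why the distinction between the original and corrected t-distributions is mostly a matter of small samples and theoretical consistency.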

2014 ◽  
Vol 11 (Suppl 1) ◽  
pp. S2 ◽  
Author(s):  
Joanna Zyla ◽  
Paul Finnon ◽  
Robert Bulman ◽  
Simon Bouffler ◽  
Christophe Badie ◽  
...  

2001 ◽  
Vol 09 (02) ◽  
pp. 105-121 ◽  
Author(s):  
ANIKO SZABO ◽  
ANDREI YAKOVLEV

In this paper we discuss some natural limitations in quantitative inference about the frequency, correlation and ordering of genetic events occurring in the course of tumor development. We consider a simple, yet frequently used experimental design, under which independent tumors are examined once for the presence/absence of specific mutations of interest. The most typical factors that affect inference on the chronological order of genetic events are: a possible dependence of mutation rates, the sampling bias that arises from the observation process, and small sample sizes. Our results clearly indicate that these three factors alone may dramatically distort the outcome of data analysis, thereby leading to estimates of limited utility as an underpinning for mechanistic models of carcinogenesis.


Recycling ◽  
2020 ◽  
Vol 5 (3) ◽  
pp. 19
Author(s):  
Paul Martin Mählitz ◽  
Nathalie Korf ◽  
Kristine Sperlich ◽  
Olivier Münch ◽  
Matthias Rösslein ◽  
...  

Comprehensive knowledge of built-in batteries in waste electrical and electronic equipment (WEEE) is required for sound and safe WEEE management. However, representative sampling is challenging due to the constantly changing composition of WEEE flows and battery systems. Necessary knowledge, such as methodologically uniform procedures and recommendations for the determination of minimum sample sizes (MSS) for representative results, is missing. The direct consequences are increased sampling efforts, lack of quality-assured data, gaps in the monitoring of battery losses in complementary flows, and impeded quality control of depollution during WEEE treatment. In this study, we provide detailed data sets on built-in batteries in WEEE and propose a non-parametric approach (NPA) to determine MSS. For the pilot dataset, more than 23 Mg of WEEE (6500 devices) were sampled, examined for built-in batteries, and classified according to product-specific keys (UNUkeys and BATTkeys). The results show that 21% of the devices had battery compartments, distributed over almost all UNUkeys considered, and that only about every third battery was removed prior to treatment. Moreover, the characterization of battery masses (BM) and battery mass shares (BMS) using descriptive statistical analysis showed that neither product- nor battery-specific characteristics are given and that the assumption of (log-)normally distributed data is not generally applicable. Consequently, parametric approaches (PA) to determine the MSS for representative sampling are prone to bias. The presented NPA for MSS using data-driven simulation (bootstrapping) shows its applicability despite small sample sizes and inconclusive data distributions. If consistently applied, the method presented can be used to optimize future sampling and thus reduce sampling costs and efforts while increasing data quality.
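The bootstrap idea behind such an NPA can be sketched as follows: repeatedly resample the pilot data at candidate sizes and take the smallest size at which the resampled estimate is stable to a chosen tolerance. This is an illustrative sketch, not the authors' exact procedure; the tolerance, confidence level, and synthetic "battery mass" data are assumptions.

```python
import random
import statistics

def minimum_sample_size(data, rel_tol=0.2, conf=0.95, n_boot=2000, seed=1):
    """Bootstrap sketch of a minimum-sample-size (MSS) search: find the
    smallest subsample size m for which the bootstrap confidence interval
    of the mean lies within +/- rel_tol of the full-sample mean.
    No distributional (e.g. log-normal) assumption is made."""
    rng = random.Random(seed)
    target = statistics.fmean(data)
    alpha = (1 - conf) / 2
    for m in range(5, len(data) + 1):
        # Simulate many samples of size m drawn with replacement
        means = sorted(statistics.fmean(rng.choices(data, k=m))
                       for _ in range(n_boot))
        lo = means[int(alpha * n_boot)]
        hi = means[int((1 - alpha) * n_boot) - 1]
        if (abs(lo - target) <= rel_tol * abs(target)
                and abs(hi - target) <= rel_tol * abs(target)):
            return m
    return len(data)

# Skewed synthetic pilot data (log-normal-like, as WEEE masses often are)
r = random.Random(42)
data = [r.lognormvariate(0.0, 0.5) for _ in range(500)]
mss = minimum_sample_size(data)
print(mss)
```

Because the reference distribution is built from the data themselves, the approach remains usable when normality tests are inconclusive, which is the situation the abstract describes.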


Author(s):  
Shiqi Cui ◽  
Tieming Ji ◽  
Jilong Li ◽  
Jianlin Cheng ◽  
Jing Qiu

Abstract Identifying differentially expressed (DE) genes between different conditions is one of the main goals of RNA-seq data analysis. Although a large amount of RNA-seq data was produced for two-group comparisons with small sample sizes at an early stage, more and more RNA-seq data are being produced in the setting of complex experimental designs such as split-plot designs and repeated-measures designs. Data arising from such experiments are traditionally analyzed by mixed-effects models. Therefore, an appropriate statistical approach for analyzing RNA-seq data from such designs should be generalized linear mixed models (GLMM) or similar approaches that allow for random effects. However, common practices for analyzing such data in the literature either treat random effects as fixed or completely ignore the experimental design and focus on two-group comparison using partial data. In this paper, we examine the effect of ignoring the random effects when analyzing RNA-seq data. We accomplish this goal by comparing the standard GLMM to methods that ignore the random effects through simulation studies and real data analysis. Our studies show that ignoring random effects in a multi-factor experiment can lead to an increase in false positives among the top selected genes, or to lower power when the nominal FDR level is controlled.
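Why ignoring random effects inflates false positives can be seen in a toy simulation: when observations share a random subject (or block) effect, the naive standard error that treats all observations as independent understates the true sampling variability of the group mean, so tests become overconfident. A minimal sketch with hypothetical parameters, not the paper's GLMM analysis:

```python
import random
import statistics

def simulate(n_subjects=20, n_reps=5, subj_sd=1.0, noise_sd=1.0,
             n_sims=500, seed=7):
    """Compare the true sampling SD of a group mean under a random
    subject effect with the naive SE that pretends all
    n_subjects * n_reps observations are independent."""
    rng = random.Random(seed)
    group_means, naive_ses = [], []
    for _ in range(n_sims):
        obs = []
        for _ in range(n_subjects):
            u = rng.gauss(0.0, subj_sd)  # shared random subject effect
            obs.extend(u + rng.gauss(0.0, noise_sd) for _ in range(n_reps))
        group_means.append(statistics.fmean(obs))
        naive_ses.append(statistics.stdev(obs) / len(obs) ** 0.5)
    # Empirical SD of the mean vs. the average naive standard error
    return statistics.stdev(group_means), statistics.fmean(naive_ses)

true_sd, naive_se = simulate()
# naive_se is markedly smaller than true_sd: tests using it reject too
# often under the null, i.e. false positives increase.
print(true_sd, naive_se)
```

A GLMM recovers the correct uncertainty by modeling the subject effect explicitly; the sketch only shows what goes wrong when it is omitted.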


2019 ◽  
Author(s):  
Dustin Fife

Data analysis is a risky endeavor, particularly among those unaware of its dangers. In the words of Cook and Campbell (1976; see also Cook, Campbell, and Shadish 2002), “Statistical Conclusions Validity” threatens all experiments that subject themselves to the dark arts of statistical magic. Although traditional statistics classes may advise against certain practices (e.g., multiple comparisons, small sample sizes, violating normality), they may fail to cover others (e.g., outlier detection and violating linearity). More common, perhaps, is that researchers may fail to remember them. In this paper, rather than rehashing old warnings and diatribes against this practice or that, I instead advocate a general statistical analysis strategy. This graphically-based eight-step strategy promises to resolve the majority of statistical traps researchers may fall into without having to remember large lists of problematic statistical practices. These steps will assist in preventing both Type I and Type II errors and yield critical insights about the data that would have otherwise been missed. I conclude with an applied example that shows how the eight steps highlight data problems that would not be detected with standard statistical practices.


2019 ◽  
Vol 35 (20) ◽  
pp. 3996-4003
Author(s):  
Insha Ullah ◽  
Sudhir Paul ◽  
Zhenjie Hong ◽  
You-Gan Wang

Abstract
Motivation: Under two biologically different conditions, we are often interested in identifying differentially expressed genes. It is usually the case that the assumption of equal variances in the two groups is violated for many genes, and a large number of genes must be filtered or ranked. In these cases exact tests are unavailable, and Welch’s approximate test is the most reliable one. Welch’s test involves two layers of approximation: the distribution of the statistic is approximated by a t-distribution, which in turn depends on approximate degrees of freedom. This study attempts to improve upon Welch’s approximate test by avoiding one layer of approximation.
Results: We introduce a new distribution that generalizes the t-distribution and propose a Monte Carlo based test that uses only one layer of approximation for statistical inference. Experimental results based on extensive simulation studies show that the Monte Carlo based tests enhance statistical power and perform better than Welch’s t-approximation, especially when the equal-variance assumption is not met and the sample with the larger variance has the smaller size. We analyzed two gene-expression datasets: the childhood acute lymphoblastic leukemia gene-expression dataset with 22 283 genes, and the Golden Spike dataset, produced by a controlled experiment, with 13 966 genes. The new test identified additional genes of interest in both datasets; some of these genes have been shown in the medical literature to play important roles.
Availability and implementation: The R package mcBFtest is available in CRAN; R scripts to reproduce all reported results are available at the GitHub repository, https://github.com/iullah1980/MCTcodes.
Supplementary information: Supplementary data are available at Bioinformatics online.
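The two layers of approximation in Welch's test, and the Monte Carlo idea of dropping one of them, can be sketched as follows: compute the Welch statistic and its Satterthwaite degrees of freedom, then obtain a p-value from a simulated null reference distribution instead of the t-approximation. This is a generic sketch of the idea, not the paper's generalized distribution or the mcBFtest implementation; the sample data are hypothetical.

```python
import random
import statistics

def welch_stat(x, y):
    """Welch's t statistic and Welch-Satterthwaite approximate df."""
    nx, ny = len(x), len(y)
    vx, vy = statistics.variance(x), statistics.variance(y)
    se2 = vx / nx + vy / ny
    t = (statistics.fmean(x) - statistics.fmean(y)) / se2 ** 0.5
    df = se2 ** 2 / ((vx / nx) ** 2 / (nx - 1) + (vy / ny) ** 2 / (ny - 1))
    return t, df

def mc_pvalue(x, y, n_sim=2000, seed=3):
    """Two-sided p-value from a Monte Carlo null reference distribution:
    simulate equal-mean normal samples with the observed (unequal)
    standard deviations, avoiding the t-distribution approximation."""
    rng = random.Random(seed)
    t_obs, _ = welch_stat(x, y)
    sx, sy = statistics.stdev(x), statistics.stdev(y)
    hits = 0
    for _ in range(n_sim):
        xs = [rng.gauss(0.0, sx) for _ in range(len(x))]
        ys = [rng.gauss(0.0, sy) for _ in range(len(y))]
        t_sim, _ = welch_stat(xs, ys)
        if abs(t_sim) >= abs(t_obs):
            hits += 1
    return hits / n_sim

# Hypothetical expression values: larger variance in the smaller group
# is exactly the regime where the t-approximation is weakest.
x = [4.1, 5.2, 6.3, 5.8, 4.9, 6.7]
y = [5.0, 5.1, 4.9, 5.2]
t, df = welch_stat(x, y)
p = mc_pvalue(x, y)
print(t, df, p)
```

Only the simulated null still assumes normality with plug-in variances, so one approximation layer remains, mirroring the "one layer of approximation" framing in the abstract.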


2020 ◽  
Vol 15 (4) ◽  
pp. 1054-1075
Author(s):  
Dustin Fife

Data analysis is a risky endeavor, particularly among people who are unaware of its dangers. According to some researchers, “statistical conclusions validity” threatens all research subjected to the dark arts of statistical magic. Although traditional statistics classes may advise against certain practices (e.g., multiple comparisons, small sample sizes, violating normality), they may fail to cover others (e.g., outlier detection and violating linearity). More common, perhaps, is that researchers may fail to remember them. In this article, rather than rehashing old warnings and diatribes against this practice or that, I instead advocate a general statistical-analysis strategy. This graphic-based eight-step strategy promises to resolve the majority of statistical traps researchers may fall into—without having to remember large lists of problematic statistical practices. These steps will assist in preventing both false positives and false negatives and yield critical insights about the data that would have otherwise been missed. I conclude with an applied example that shows how the eight steps reveal interesting insights that would not be detected with standard statistical practices.


1988 ◽  
Vol 15 (5) ◽  
pp. 515 ◽  
Author(s):  
DJ Slip ◽  
R Shine

Miniature radio transmitters were surgically implanted in 15 adult diamond pythons from two areas near Sydney, N.S.W., in south-eastern Australia, and the snakes were monitored for intervals of 4-32 months. We document patterns of habitat use and movements, and interpret these in terms of the feeding habits and reproductive biology of the pythons. These snakes were usually sedentary in summer and autumn, with occasional long movements to new sites. During spring (the mating season), males moved long distances, often daily. Telemetered pythons were generally diurnal and terrestrial rather than arboreal. Snakes were most commonly recorded coiled under vegetation which provided filtering cover (34% of locations). The relative use of different habitats by diamond pythons changed with season. In summer and autumn, snakes were most frequently in disturbed habitats (such as areas around houses), where prey are relatively common. In winter the snakes used rocky habitats, especially sandstone crevices. No winter aggregations were observed. The radio-tracked snakes had large (up to 124 ha), well-defined but overlapping home ranges, and these varied significantly between sexes and among seasons. Detailed analysis of python movements shows that at least two assumptions of many home-range analyses (normally distributed data and adequacy of small sample sizes) are invalid for our study.


2017 ◽  
Author(s):  
Colleen Molloy Farrelly

Studies of highly and profoundly gifted children typically involve small sample sizes, as the population is relatively rare, and many statistical methods cannot handle these small sample sizes well. However, topological data analysis (TDA) tools are robust even with very small samples, and can provide useful information as well as robust statistical tests. This study demonstrates these capabilities on data simulated from previous talent search results (small and large samples), as well as a subset of data from Ruf’s cohort of gifted children. TDA methods show strong, robust performance and uncover insight into sample characteristics and subgroups, including the appearance of similar subgroups across assessment populations.
