Tolerance Bounds and Cpk Confidence Bounds Under Batch Effects

Author(s): Fritz Scholz, Mark Vangel

Author(s): Ryan Ka Yau Lai, Youngah Do

This article explores a method for constructing confidence bounds for information-theoretic measures used in linguistics, such as entropy, Kullback-Leibler divergence (KLD), and mutual information. We show that a useful measure of uncertainty can be derived from simple statistical principles, namely the asymptotic distribution of the maximum likelihood estimator (MLE) and the delta method. Three case studies from phonology and corpus linguistics demonstrate how to apply the method and examine its robustness to violations of its assumptions that are common in linguistics, such as insufficient sample size and non-independence of data points.
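The general recipe (MLE plus delta method) can be illustrated for the simplest measure, the plug-in Shannon entropy of a categorical distribution estimated from counts. The Python below is a minimal sketch assuming independent observations from a single multinomial distribution; the function name `entropy_ci` and the Wald-type interval form are illustrative choices, not taken from the article. With the multinomial covariance of the MLE, the delta method gives Var(Ĥ) ≈ (Σ p_i (ln p_i)² − H²) / n for entropy in nats.

```python
import math
from statistics import NormalDist


def entropy_ci(counts, level=0.95):
    """Plug-in Shannon entropy (nats) with a delta-method Wald interval.

    Sketch only: assumes the counts come from independent draws of one
    categorical (multinomial) distribution.
    """
    n = sum(counts)
    if n <= 0:
        raise ValueError("counts must sum to a positive number")
    # MLE of each category probability is its relative frequency;
    # empty categories contribute nothing because p*log(p) -> 0.
    p = [c / n for c in counts if c > 0]
    h = -sum(pi * math.log(pi) for pi in p)
    # Delta method with the multinomial covariance (diag(p) - p p^T) / n:
    # Var(H_hat) ~= (sum_i p_i (ln p_i)^2 - H^2) / n
    var = (sum(pi * math.log(pi) ** 2 for pi in p) - h ** 2) / n
    se = math.sqrt(max(var, 0.0))  # guard against tiny negative rounding error
    z = NormalDist().inv_cdf(0.5 + level / 2)  # two-sided normal quantile
    return h, h - z * se, h + z * se


h, lo, hi = entropy_ci([50, 30, 20])
print(f"H = {h:.3f} nats, 95% CI [{lo:.3f}, {hi:.3f}]")
```

One caveat of this first-order approximation: at the uniform distribution the variance term is exactly zero, so the interval collapses to a point (try `entropy_ci([25, 25, 25, 25])`). Small samples are another case where the normal approximation can be poor, which is the kind of assumption violation the case studies examine.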


GigaScience, 2020, Vol 9 (11)
Author(s): Alexandra J Lee, YoSon Park, Georgia Doing, Deborah A Hogan, Casey S Greene

Abstract
Motivation: In the past two decades, scientists in different laboratories have assayed gene expression from millions of samples. These experiments can be combined into compendia and analyzed collectively to extract novel biological patterns. Technical variability, or "batch effects," may result from combining samples collected and processed at different times and in different settings. Such variability may distort our ability to extract true underlying biological patterns. As more integrative analysis methods arise and data collections get bigger, we must determine how technical variability affects our ability to detect desired patterns when many experiments are combined.
Objective: We sought to determine the extent to which an underlying signal was masked by technical variability by simulating compendia comprising data aggregated across multiple experiments.
Method: We developed a generative multi-layer neural network to simulate compendia of gene expression experiments from large-scale microbial and human datasets. We compared simulated compendia before and after introducing varying numbers of sources of undesired variability.
Results: The signal from a baseline compendium was obscured when the number of added sources of variability was small. Applying statistical correction methods rescued the underlying signal in these cases. However, as the number of sources of variability increased, it became easier to detect the original signal even without correction. In fact, statistical correction reduced our power to detect the underlying signal.
Conclusion: When combining a modest number of experiments, it is best to correct for experiment-specific noise. However, when many experiments are combined, statistical correction reduces our ability to extract underlying patterns.
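A toy illustration of the two ingredients discussed here, additive experiment-specific variability and a statistical correction, is sketched below in Python. It is deliberately far simpler than the study: one gene, a hypothetical two-group biological signal, a random offset per experiment, and per-experiment mean-centering as the correction. The function names and parameter values are hypothetical, and the study itself simulated compendia with a generative neural network rather than this model.

```python
import random
from statistics import mean


def simulate(n_experiments=4, n_per_group=5, effect=1.0, batch_sd=3.0,
             confounded=False, seed=1):
    """Toy one-gene compendium: a group effect plus an additive per-experiment offset."""
    rng = random.Random(seed)
    rows = []  # (experiment, group, expression)
    for e in range(n_experiments):
        offset = rng.gauss(0.0, batch_sd)  # experiment-specific technical shift
        # balanced: both groups in every experiment; confounded: one group per experiment
        groups = (e % 2,) if confounded else (0, 1)
        for group in groups:
            for _ in range(n_per_group):
                value = effect * group + offset + rng.gauss(0.0, 0.5)
                rows.append((e, group, value))
    return rows


def center_by_experiment(rows):
    """Simplest correction: subtract each experiment's mean expression."""
    experiments = {e for e, _, _ in rows}
    means = {x: mean(v for e, _, v in rows if e == x) for x in experiments}
    return [(e, g, v - means[e]) for e, g, v in rows]


def group_difference(rows):
    """Mean expression of group 1 minus mean expression of group 0."""
    return (mean(v for _, g, v in rows if g == 1)
            - mean(v for _, g, v in rows if g == 0))


balanced = simulate()
print(group_difference(balanced), group_difference(center_by_experiment(balanced)))
confounded = simulate(confounded=True)
print(group_difference(confounded), group_difference(center_by_experiment(confounded)))
```

When both groups appear in every experiment, centering removes the offsets while leaving the group difference unchanged. When each experiment contains only one group, the same correction also removes the biological difference, which is one simple way a correction can erase real signal.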


2021
Author(s): Konrad H. Stopsack, Molin Wang, Svitlana Tyekucheva, Travis A. Gerke, J. Bailey Vaselkiv, ...

Talanta, 2019, Vol 195, pp. 77-86
Author(s): Julien Boccard, David Tonoli, Petra Strajhar, Fabienne Jeanneret, Alex Odermatt, ...
