Tolerance Bounds and Cpk Confidence Bounds Under Batch Effects

2020 ◽

Vol 6 (1) ◽

pp. 19-54

Author(s):

Ryan Ka Yau Lai ◽

Youngah Do

Keyword(s):

Maximum Likelihood ◽

Corpus Linguistics ◽

Delta Method ◽

Confidence Bounds ◽

Likelihood Estimator ◽

Information Theoretic ◽

Leibler Divergence ◽

Information Theoretic Measures ◽

Data Points ◽

Measure Of Uncertainty

This article explores a method of creating confidence bounds for information-theoretic measures in linguistics, such as entropy, Kullback-Leibler Divergence (KLD), and mutual information. We show that a useful measure of uncertainty can be derived from simple statistical principles, namely the asymptotic distribution of the maximum likelihood estimator (MLE) and the delta method. Three case studies from phonology and corpus linguistics are used to demonstrate how to apply it and examine its robustness against common violations of its assumptions in linguistics, such as insufficient sample size and non-independence of data points.
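As a rough illustration of the idea in this abstract, the sketch below applies the delta method to the plug-in (maximum likelihood) estimate of Shannon entropy for categorical counts. It is a textbook construction, not necessarily the authors' exact procedure. It assumes independent multinomial draws, natural-log units, and a normal approximation; the function name `entropy_with_bounds` and the default `z = 1.96` (about 95% coverage) are illustrative choices.

```python
import math


def entropy_with_bounds(counts, z=1.96):
    """Plug-in entropy (nats) with delta-method confidence bounds.

    Assumes `counts` are independent multinomial counts. The asymptotic
    variance of the plug-in estimator is
        (sum_i p_i * log(p_i)^2 - H^2) / n,
    which follows from the MLE covariance (diag(p) - p p^T) / n and the
    gradient dH/dp_i = -(log(p_i) + 1).
    """
    n = sum(counts)
    # Zero counts contribute nothing to the plug-in entropy.
    probs = [c / n for c in counts if c > 0]
    h = -sum(p * math.log(p) for p in probs)
    second_moment = sum(p * math.log(p) ** 2 for p in probs)
    # Guard against tiny negative values caused by floating-point rounding.
    variance = max(second_moment - h ** 2, 0.0) / n
    se = math.sqrt(variance)
    return h, h - z * se, h + z * se


if __name__ == "__main__":
    estimate, lower, upper = entropy_with_bounds([50, 30, 20])
    print(f"H = {estimate:.4f} nats, bounds = ({lower:.4f}, {upper:.4f})")
```

For a perfectly uniform sample the estimated variance collapses to zero, so the interval degenerates. That is one reason such bounds can be unreliable with small or non-independent samples, the kinds of assumption violations the abstract mentions.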

Capturing Confidence Bounds on Heat Transfer Using Large Eddy Simulations

50th AIAA/ASME/ASCE/AHS/ASC Structures, Structural Dynamics, and Materials Conference ◽

10.2514/6.2009-2286 ◽

2009 ◽

Author(s):

Paul Constantine ◽

Gianluca Iaccarino

Keyword(s):

Heat Transfer ◽

Large Eddy Simulations ◽

Confidence Bounds ◽

Large Eddy

Correcting for experiment-specific variability in expression compendia can remove underlying signals

GigaScience ◽

10.1093/gigascience/giaa117 ◽

2020 ◽

Vol 9 (11) ◽

Author(s):

Alexandra J Lee ◽

YoSon Park ◽

Georgia Doing ◽

Deborah A Hogan ◽

Casey S Greene

Keyword(s):

Gene Expression ◽

Large Scale ◽

Original Signal ◽

Batch Effects ◽

Technical Variability ◽

The Past ◽

Statistical Correction ◽

Before And After ◽

Data Collections ◽

Biological Patterns

Abstract

Motivation: In the past two decades, scientists in different laboratories have assayed gene expression from millions of samples. These experiments can be combined into compendia and analyzed collectively to extract novel biological patterns. Technical variability, or "batch effects," may result from combining samples collected and processed at different times and in different settings. Such variability may distort our ability to extract true underlying biological patterns. As more integrative analysis methods arise and data collections get bigger, we must determine how technical variability affects our ability to detect desired patterns when many experiments are combined.

Objective: We sought to determine the extent to which an underlying signal was masked by technical variability by simulating compendia comprising data aggregated across multiple experiments.

Method: We developed a generative multi-layer neural network to simulate compendia of gene expression experiments from large-scale microbial and human datasets. We compared simulated compendia before and after introducing varying numbers of sources of undesired variability.

Results: The signal from a baseline compendium was obscured when the number of added sources of variability was small. Applying statistical correction methods rescued the underlying signal in these cases. However, as the number of sources of variability increased, it became easier to detect the original signal even without correction. In fact, statistical correction reduced our power to detect the underlying signal.

Conclusion: When combining a modest number of experiments, it is best to correct for experiment-specific noise. However, when many experiments are combined, statistical correction reduces our ability to extract underlying patterns.
