An Empirical Bayes Approach to Partially Labeled and Shuffled Data Sets

Abstract Batch effects present challenges in the analysis of high-throughput molecular data and are particularly problematic in longitudinal studies when interest lies in identifying genes/features whose expression changes over time but time is confounded with batch. While many methods to correct for batch effects exist, most assume independence across samples, an assumption that is unlikely to hold in longitudinal microarray studies. We propose Batch effect Reduction of mIcroarray data with Dependent samples usinG Empirical Bayes (BRIDGE), a three-step parametric empirical Bayes approach that leverages technical replicate samples profiled at multiple timepoints/batches, so-called “bridge samples”, to inform batch-effect reduction/attenuation in longitudinal microarray studies. Extensive simulation studies and an analysis of a real biological data set were conducted to benchmark the performance of BRIDGE against both ComBat and longitudinal ComBat. Our results demonstrate that while all methods perform well in facilitating accurate estimates of time effects, BRIDGE outperforms both ComBat and longitudinal ComBat in the removal of batch effects in data sets with bridging samples and, perhaps as a result, was observed to have improved statistical power for detecting genes with a time effect. BRIDGE demonstrated competitive performance in batch-effect reduction of confounded longitudinal microarray studies, in both simulated and real data sets, and may serve as a useful preprocessing method for researchers conducting longitudinal microarray studies that include bridging samples.
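The abstract does not spell out BRIDGE's three steps, so the following is only a generic illustration of the parametric empirical Bayes idea that ComBat-type methods build on: per-gene batch effects are estimated, then shrunk toward a batch-level prior mean before being subtracted. It is a location-only sketch with method-of-moments hyperparameters (an assumption of this sketch); the function name `eb_location_adjust` is hypothetical, and the code is neither BRIDGE nor the full ComBat model. It treats samples as independent and would also remove any time signal that is confounded with batch, which is the situation the abstract describes and that bridge samples are meant to address.

```python
import numpy as np

def eb_location_adjust(y, batch):
    """Subtract empirical-Bayes-shrunk additive batch effects from y.

    y: (genes, samples) array; batch: per-sample batch labels (>= 2 samples per batch).
    Hypothetical, location-only sketch; not BRIDGE and not the full ComBat model.
    """
    y = np.asarray(y, dtype=float)
    batch = np.asarray(batch)
    resid = y - y.mean(axis=1, keepdims=True)      # centre each gene on its overall mean
    adjusted = y.copy()
    for b in np.unique(batch):
        cols = batch == b
        n_b = cols.sum()
        gamma_hat = resid[:, cols].mean(axis=1)    # raw per-gene batch effect
        s2 = resid[:, cols].var(axis=1, ddof=1)    # within-batch variance per gene
        # Prior gamma_g ~ N(gamma_bar, tau2) across genes; hyperparameters by method of moments.
        gamma_bar = gamma_hat.mean()
        tau2 = max(gamma_hat.var(ddof=1) - np.mean(s2 / n_b), 1e-8)
        # Posterior mean: weight each gene's own estimate by its approximate precision.
        shrink = tau2 / (tau2 + s2 / n_b)
        gamma_star = shrink * gamma_hat + (1 - shrink) * gamma_bar
        adjusted[:, cols] -= gamma_star[:, None]
    return adjusted
```

With only a few samples per batch, the shrinkage pulls noisy per-gene estimates toward the common batch shift, which is the main benefit of the empirical Bayes step over gene-by-gene mean centring.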

An Empirical Bayes Approach to Topic Modeling

2020 25th International Conference on Pattern Recognition (ICPR)
DOI: 10.1109/icpr48806.2021.9412837
2021

Author(s): Anirban Gangopadhyay

Keyword(s): Topic Modeling, Empirical Bayes, Empirical Bayes Approach, Bayes Approach

An Empirical Bayes approach for the identification of long-range chromosomal interaction from Hi-C data

Statistical Applications in Genetics and Molecular Biology
DOI: 10.1515/sagmb-2020-0026
2021, Vol 20 (1), pp. 1-15

Author(s): Qi Zhang, Zheng Xu, Yutong Lai

Keyword(s): Long Range, Latent Variables, Empirical Bayes, Genome Structure, Peak Detection, Data Matrix, Dispersion Modeling, Empirical Bayes Approach, Over Dispersion, Bayes Approach

Abstract Hi-C experiments have become very popular for studying 3D genome structure in recent years. Identification of long-range chromosomal interactions, i.e., peak detection, is crucial for Hi-C data analysis, but it remains challenging due to the inherent high dimensionality, sparsity, and over-dispersion of the Hi-C count data matrix. We propose EBHiC, an empirical Bayes approach for peak detection from Hi-C data. The proposed framework provides flexible over-dispersion modeling by explicitly including the “true” interaction intensities as latent variables. To implement the proposed peak identification method (via an empirical Bayes test), we estimate the overall distribution of the observed counts semiparametrically using a Smoothed Expectation Maximization algorithm, and the empirical null based on the zero assumption. We conducted extensive simulations to validate and evaluate the performance of our proposed approach and applied it to real datasets. Our results suggest that EBHiC identifies better peaks in terms of accuracy, biological interpretability, and consistency across biological replicates. The source code is available on GitHub (https://github.com/QiZhangStat/EBHiC).
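The ingredients named in the abstract (a semiparametric marginal for the counts fitted by a smoothed EM, an empirical null, and an empirical Bayes test) can be illustrated with a minimal local false discovery rate sketch. It rests on simplifying assumptions that are not from the paper: counts are Poisson given a latent intensity, latent intensities sit on a fixed grid, the smoothing step is a three-point moving average of the mixing weights (an EMS-style choice), the null is a single Poisson whose rate the user supplies, and the null proportion π0 is bounded using counts assumed to be almost all null (the zero assumption). It ignores the matrix structure of Hi-C data and is not the EBHiC implementation (see the GitHub repository above for that); all function names are hypothetical.

```python
import numpy as np

def pois_pmf(y, lam):
    """Poisson pmf for integer counts y and rates lam (broadcasting), via log-factorials."""
    y = np.asarray(y, dtype=int)
    lam = np.asarray(lam, dtype=float)
    log_fact = np.concatenate(([0.0], np.cumsum(np.log(np.arange(1, y.max() + 1)))))
    return np.exp(y * np.log(lam) - lam - log_fact[y])

def smoothed_em(y, grid, n_iter=300, smooth=0.1):
    """Mixing weights w for f(y) = sum_k w[k] * Poisson(y; grid[k]), fitted by EM
    with a smoothing step after each M-step (hypothetical EMS-style sketch)."""
    lik = pois_pmf(np.asarray(y)[:, None], grid[None, :])     # (n, K) component likelihoods
    w = np.full(len(grid), 1.0 / len(grid))
    for _ in range(n_iter):
        resp = lik * w                                          # E-step: responsibilities
        resp /= resp.sum(axis=1, keepdims=True) + 1e-300
        w = resp.mean(axis=0)                                   # M-step
        nb = np.pad(w, 1, mode="edge")
        w = (1 - 2 * smooth) * w + smooth * (nb[:-2] + nb[2:])  # S-step: local averaging
        w /= w.sum()
    return w

def local_fdr(y, grid, w, null_lambda, center):
    """lfdr(y) = pi0 * f0(y) / f(y) with a Poisson(null_lambda) empirical null.

    Zero assumption: counts in `center` are (almost) all null; since
    f >= pi0 * f0 everywhere, pi0 is taken as min f / f0 over `center`."""
    def marginal(v):
        return pois_pmf(np.asarray(v)[:, None], grid[None, :]) @ w
    c = np.asarray(center)
    pi0 = min(1.0, float(np.min(marginal(c) / pois_pmf(c, null_lambda))))
    y = np.asarray(y)
    ratio = pi0 * pois_pmf(y, null_lambda) / np.maximum(marginal(y), 1e-300)
    return np.clip(ratio, 0.0, 1.0)
```

In use, one might take the median count as the null rate and flag counts above it with a small lfdr (for example below 0.1) as candidate peaks; both choices are illustrative assumptions rather than settings from the paper.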