Batch effect reduction of microarray data with dependent samples using an empirical Bayes approach (BRIDGE)

Author(s):  
Qing Xia ◽  
Jeffrey A. Thompson ◽  
Devin C. Koestler

Abstract Batch-effects present challenges in the analysis of high-throughput molecular data and are particularly problematic in longitudinal studies when interest lies in identifying genes/features whose expression changes over time, but time is confounded with batch. While many methods to correct for batch-effects exist, most assume independence across samples, an assumption that is unlikely to hold in longitudinal microarray studies. We propose Batch effect Reduction of mIcroarray data with Dependent samples usinG Empirical Bayes (BRIDGE), a three-step parametric empirical Bayes approach that leverages technical replicate samples profiled at multiple timepoints/batches, so-called “bridge samples”, to inform batch-effect reduction/attenuation in longitudinal microarray studies. Extensive simulation studies and an analysis of a real biological data set were conducted to benchmark the performance of BRIDGE against both ComBat and longitudinal ComBat. Our results demonstrate that while all methods perform well in facilitating accurate estimates of time effects, BRIDGE outperforms both ComBat and longitudinal ComBat in the removal of batch-effects in data sets with bridging samples and, perhaps as a result, was observed to have improved statistical power for detecting genes with a time effect. BRIDGE demonstrated competitive performance in batch-effect reduction of confounded longitudinal microarray studies, in both simulated and real data sets, and may serve as a useful preprocessing method for researchers conducting longitudinal microarray studies that include bridging samples.
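To make the role of bridging samples concrete, the following is a minimal sketch and not the BRIDGE procedure itself, whose three steps the abstract does not spell out. It assumes a purely additive per-gene batch shift, estimated as the mean difference between the same bridge samples measured in two batches. All function names and the toy values are hypothetical.

```python
from statistics import mean

def estimate_batch_shift(bridge_batch1, bridge_batch2):
    """Estimate a per-gene additive shift from bridge samples.

    Each argument is a list of samples; each sample is a list of per-gene
    expression values. The same bridge samples must appear in the same order
    in both batches (assumption of this sketch).
    """
    n_genes = len(bridge_batch1[0])
    return [
        mean(b2[g] - b1[g] for b1, b2 in zip(bridge_batch1, bridge_batch2))
        for g in range(n_genes)
    ]

def remove_shift(samples, shift):
    """Subtract the estimated per-gene shift from every sample in a batch."""
    return [[x - s for x, s in zip(sample, shift)] for sample in samples]

# Toy data: two bridge samples, two genes, profiled in both batches.
bridge_b1 = [[5.0, 7.0], [6.0, 8.0]]
bridge_b2 = [[6.0, 6.5], [7.0, 7.5]]
shift = estimate_batch_shift(bridge_b1, bridge_b2)

# Apply the shift to a non-bridge sample that was profiled only in batch 2.
batch2_samples = [[9.0, 3.5]]
corrected = remove_shift(batch2_samples, shift)
```

Because the bridge samples are technical replicates, any systematic difference between their two measurements can be attributed to batch rather than to time, which is what lets the shift be estimated even when time and batch are confounded.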

2020 ◽  
Vol 80 (6) ◽  
pp. 1090-1114
Author(s):  
Xinya Liang ◽  
Akihito Kamata ◽  
Ji Li

One important issue in Bayesian estimation is the determination of an effective informative prior. In hierarchical Bayes models, the uncertainty of hyperparameters in a prior can be further modeled via their own priors, namely, hyper priors. This study introduces a framework to construct hyper priors for both the mean and the variance hyperparameters for estimating the treatment effect in a two-group randomized controlled trial. Assuming a random sample of treatment effect sizes is obtained from past studies, the hyper priors can be constructed based on the sampling distributions of the effect size mean and precision. The performance of the hierarchical Bayes approach was compared with the empirical Bayes approach (hyperparameters are fixed values or point estimates) and the ordinary least squares (OLS) method via simulation. The design factors for data generation included the sample treatment effect size, treatment/control group size ratio, and sample size. Each generated data set was analyzed using the hierarchical Bayes approach with three hyper priors, the empirical Bayes approach with twelve priors (including correct and inaccurate priors), and the OLS method. Results indicated that the proposed hierarchical Bayes approach generally outperformed the empirical Bayes approach and the OLS method, especially with small samples. When more sample effect sizes were available, the treatment effect was estimated more accurately regardless of the sample sizes. Practical implications and future research directions are discussed.
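As an illustration of the contrast drawn above between OLS and a prior with fixed hyperparameters, here is a minimal sketch using the standard normal-normal conjugate update. It assumes the standard error of the effect estimate is known and that the prior on the treatment effect is normal with fixed mean and standard deviation. The function names and numbers are hypothetical and do not come from the study.

```python
from statistics import mean

def ols_effect(treatment, control):
    """OLS estimate of the treatment effect in a two-group design:
    the difference in group means."""
    return mean(treatment) - mean(control)

def posterior_effect(estimate, se, prior_mean, prior_sd):
    """Posterior mean and SD of the effect under a normal prior with fixed
    hyperparameters (prior_mean, prior_sd) and a normal likelihood with
    known standard error `se`."""
    w_data = 1.0 / se ** 2          # precision of the data estimate
    w_prior = 1.0 / prior_sd ** 2   # precision of the prior
    post_var = 1.0 / (w_data + w_prior)
    post_mean = post_var * (w_data * estimate + w_prior * prior_mean)
    return post_mean, post_var ** 0.5

# Toy example: the observed effect is pulled toward the prior mean.
observed = ols_effect([1.0, 2.0, 3.0], [1.0, 1.5, 2.0])  # 2.0 - 1.5 = 0.5
post_mean, post_sd = posterior_effect(observed, se=0.5, prior_mean=0.1, prior_sd=0.5)
```

With equal data and prior precision, the posterior mean lands halfway between the OLS estimate and the prior mean. A hierarchical version would place hyper priors on `prior_mean` and `prior_sd` instead of fixing them, which this sketch does not attempt.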


2021 ◽  
Vol 20 (1) ◽  
pp. 1-15
Author(s):  
Qi Zhang ◽  
Zheng Xu ◽  
Yutong Lai

Abstract Hi-C experiments have become very popular for studying the 3D genome structure in recent years. Identification of long-range chromosomal interactions, i.e., peak detection, is crucial for Hi-C data analysis. However, it remains a challenging task due to the inherent high dimensionality, sparsity, and over-dispersion of the Hi-C count data matrix. We propose EBHiC, an empirical Bayes approach for peak detection from Hi-C data. The proposed framework provides flexible over-dispersion modeling by explicitly including the “true” interaction intensities as latent variables. To implement the proposed peak identification method (via the empirical Bayes test), we estimate the overall distributions of the observed counts semiparametrically using a Smoothed Expectation Maximization algorithm, and the empirical null based on the zero assumption. We conducted extensive simulations to validate and evaluate the performance of our proposed approach and applied it to real datasets. Our results suggest that EBHiC can identify better peaks in terms of accuracy, biological interpretability, and consistency across biological replicates. The source code is available on GitHub (https://github.com/QiZhangStat/EBHiC).


1995 ◽  
Vol 6 (1) ◽  
pp. 1-5
Author(s):  
Andrew R. Solow ◽  
Arthur G. Gaines
