Empirical Bayes Approach
Recently Published Documents


TOTAL DOCUMENTS: 172 (FIVE YEARS: 19)
H-INDEX: 22 (FIVE YEARS: 3)

Author(s): Qing Xia, Jeffrey A. Thompson, Devin C. Koestler

Abstract Batch effects present challenges in the analysis of high-throughput molecular data and are particularly problematic in longitudinal studies when interest lies in identifying genes/features whose expression changes over time, but time is confounded with batch. While many methods to correct for batch effects exist, most assume independence across samples, an assumption that is unlikely to hold in longitudinal microarray studies. We propose Batch effect Reduction of mIcroarray data with Dependent samples usinG Empirical Bayes (BRIDGE), a three-step parametric empirical Bayes approach that leverages technical replicate samples profiled at multiple timepoints/batches, so-called “bridge samples”, to inform batch-effect reduction/attenuation in longitudinal microarray studies. Extensive simulation studies and an analysis of a real biological data set were conducted to benchmark the performance of BRIDGE against both ComBat and longitudinal ComBat. Our results demonstrate that while all methods perform well in facilitating accurate estimates of time effects, BRIDGE outperforms both ComBat and longitudinal ComBat in the removal of batch effects in data sets with bridging samples and, perhaps as a result, was observed to have improved statistical power for detecting genes with a time effect. BRIDGE demonstrated competitive performance in batch-effect reduction of confounded longitudinal microarray studies, in both simulated and real data sets, and may serve as a useful preprocessing method for researchers conducting longitudinal microarray studies that include bridging samples.
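To make the "parametric empirical Bayes" idea concrete, here is a minimal sketch of the normal-normal shrinkage step that ComBat-style methods share, applied to an additive batch shift estimated from bridge samples. It is not the BRIDGE algorithm (which has three steps and a fuller model); the function name `eb_shrink_bridge_shift`, the genes-by-samples array layout, and the prior (each gene's shift drawn from N(mu, tau²), with mu and tau² estimated by method of moments across genes) are all illustrative assumptions.

```python
import numpy as np

def eb_shrink_bridge_shift(bridge_b1, bridge_b2):
    """Empirical Bayes shrinkage of per-gene batch shifts (illustrative sketch).

    bridge_b1, bridge_b2: arrays of shape (n_genes, n_bridge) holding the same
    bridge samples profiled in batch 1 and batch 2 (hypothetical layout).
    Assumed prior: true shift_g ~ N(mu, tau2), shared across genes.
    Returns the posterior-mean shift for each gene.
    """
    diffs = bridge_b2 - bridge_b1                    # paired differences per gene
    n_bridge = diffs.shape[1]
    d_hat = diffs.mean(axis=1)                       # raw per-gene shift estimate
    se2 = diffs.var(axis=1, ddof=1) / n_bridge       # sampling variance of d_hat
    mu = d_hat.mean()                                # prior mean, pooled over genes
    tau2 = max(d_hat.var(ddof=1) - se2.mean(), 0.0)  # method-of-moments prior variance
    denom = tau2 + se2
    # Shrinkage weight in [0, 1]; fall back to weight 1 if both variances are zero.
    weight = np.divide(tau2, denom, out=np.ones_like(se2), where=denom > 0)
    return weight * d_hat + (1.0 - weight) * mu      # posterior mean per gene

# Usage on simulated data: 200 genes, 4 bridge samples seen in both batches.
rng = np.random.default_rng(0)
true_shift = rng.normal(0.5, 0.2, size=200)
b1 = rng.normal(size=(200, 4))
b2 = b1 + true_shift[:, None] + rng.normal(scale=0.3, size=(200, 4))
shift = eb_shrink_bridge_shift(b1, b2)
b2_corrected = b2 - shift[:, None]                   # remove the estimated batch shift
```

Genes with noisy bridge differences (large `se2`) are pulled harder toward the pooled mean, which is the borrowing of strength across genes that motivates empirical Bayes here.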


2021
Author(s): Sara Venkatraman, Sumanta Basu, Andrew G. Clark, Sofie Y.N. Delbare, Myung Hee Lee, ...

Time-course gene expression datasets provide insight into the dynamics of complex biological processes, such as immune response and organ development. It is of interest to identify genes with similar temporal expression patterns because such genes are often biologically related. However, this task is challenging due to the high dimensionality of such datasets and the nonlinearity of gene expression time dynamics. We propose an empirical Bayes approach to estimating ordinary differential equation (ODE) models of gene expression, from which we derive similarity metrics that can be used to identify groups of genes with co-moving or time-delayed expression patterns. These metrics, which we call the Bayesian lead-lag R² values, can be used to construct clusters or networks of functionally related genes. A key feature of this method is that it leverages biological databases that document known interactions between genes. This information is automatically used to define informative prior distributions on the ODE model's parameters. We then derive data-driven shrinkage parameters from Stein's unbiased risk estimate (SURE) that optimally balance the ODE model's fit to both the data and the external biological information. Using real gene expression data, we demonstrate that our biologically informed similarity metrics allow us to recover sparse, interpretable gene networks. These networks reveal new insights about the dynamics of biological systems.
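The abstract tunes shrinkage toward prior information with Stein's unbiased risk estimate. A minimal sketch of that general idea, assuming a plain linear model y ≈ Xβ with known noise variance σ², is ridge regression that pulls β toward a prior mean β0 rather than toward zero, with the penalty λ chosen to minimize SURE(λ) = ‖y − ŷ_λ‖² + 2σ² tr(H_λ) − nσ², where H_λ = X(XᵀX + λI)⁻¹Xᵀ. This is not the authors' ODE model or their lead-lag R² computation; the design matrix, β0, σ², the λ grid, and the function names are hypothetical stand-ins.

```python
import numpy as np

def ridge_toward_prior(X, y, beta0, lam):
    """Minimize ||y - X b||^2 + lam * ||b - beta0||^2 (shrink toward beta0)."""
    p = X.shape[1]
    A = X.T @ X + lam * np.eye(p)
    return np.linalg.solve(A, X.T @ y + lam * beta0)

def sure(X, y, beta0, lam, sigma2):
    """Stein's unbiased risk estimate of E||X b_hat - X beta||^2 (sigma2 known)."""
    n, p = X.shape
    A = X.T @ X + lam * np.eye(p)
    H = X @ np.linalg.solve(A, X.T)       # fitted values are H y + constant offset
    resid = y - X @ ridge_toward_prior(X, y, beta0, lam)
    # The divergence of an affine estimator H y + c is tr(H).
    return resid @ resid + 2.0 * sigma2 * np.trace(H) - n * sigma2

def choose_lambda(X, y, beta0, sigma2, grid):
    """Pick the shrinkage parameter on a grid by minimizing SURE."""
    scores = [sure(X, y, beta0, lam, sigma2) for lam in grid]
    return grid[int(np.argmin(scores))]

# Usage: the "prior" coefficients here are arbitrary, not from a real database.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 3))
beta0 = np.array([1.0, -0.5, 0.2])
beta_true = beta0 + rng.normal(scale=0.1, size=3)
y = X @ beta_true + rng.normal(scale=0.5, size=30)
grid = np.logspace(-3, 3, 25)
lam_star = choose_lambda(X, y, beta0, sigma2=0.25, grid=grid)
beta_hat = ridge_toward_prior(X, y, beta0, lam_star)
```

When the prior mean is close to the truth, SURE tends to favor larger λ; when it is misleading, the data term dominates and λ shrinks, which is the data-versus-prior balance the abstract describes.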


2021, Vol 31 (4)
Author(s): Vojtech Kejzlar, Mookyong Son, Shrijita Bhattacharya, Tapabrata Maiti

2021, Vol 20 (1), pp. 1–15
Author(s): Qi Zhang, Zheng Xu, Yutong Lai

Abstract Hi-C experiments have become very popular for studying 3D genome structure in recent years. Identification of long-range chromosomal interactions, i.e., peak detection, is crucial for Hi-C data analysis, but it remains a challenging task due to the inherent high dimensionality, sparsity, and over-dispersion of the Hi-C count data matrix. We propose EBHiC, an empirical Bayes approach for peak detection from Hi-C data. The proposed framework provides flexible over-dispersion modeling by explicitly including the “true” interaction intensities as latent variables. To implement the proposed peak identification method (via an empirical Bayes test), we estimate the overall distributions of the observed counts semiparametrically using a Smoothed Expectation Maximization algorithm, and the empirical null based on the zero assumption. We conducted extensive simulations to validate and evaluate the performance of the proposed approach and applied it to real datasets. Our results suggest that EBHiC identifies better peaks in terms of accuracy, biological interpretability, and consistency across biological replicates. The source code is available on GitHub (https://github.com/QiZhangStat/EBHiC).
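As a rough illustration of estimating a count distribution with latent intensities via Smoothed EM (EMS), the sketch below assumes each count is Poisson given a latent rate, discretizes the unknown rate distribution onto a fixed grid, and alternates E, M, and smoothing steps on the grid weights. The Poisson-on-a-grid model, the three-point smoothing kernel, the grid, and all function names are assumptions for illustration; this is not EBHiC's code, and it does not implement the empirical null or the zero assumption.

```python
import numpy as np
from math import lgamma

def poisson_logpmf(y, lam):
    """log P(Y = y | rate) for count vector y and positive rate vector lam."""
    log_fact = np.array([lgamma(v + 1.0) for v in y])
    return y[:, None] * np.log(lam)[None, :] - lam[None, :] - log_fact[:, None]

def responsibilities(loglik, w):
    """Posterior probability that each count came from each grid rate."""
    log_r = loglik + np.log(np.maximum(w, 1e-300))[None, :]
    log_r -= log_r.max(axis=1, keepdims=True)        # stabilize before exp
    r = np.exp(log_r)
    return r / r.sum(axis=1, keepdims=True)

def ems_poisson_mixture(y, grid, n_iter=200, kernel=(0.25, 0.5, 0.25)):
    """Smoothed EM for the mixing weights of a Poisson mixture on a fixed grid."""
    y = np.asarray(y, dtype=float)
    loglik = poisson_logpmf(y, grid)                 # (n, k), fixed across iterations
    w = np.full(len(grid), 1.0 / len(grid))          # start from uniform weights
    for _ in range(n_iter):
        r = responsibilities(loglik, w)              # E-step
        w = r.mean(axis=0)                           # M-step: average responsibilities
        w = np.convolve(w, kernel, mode="same")      # S-step: smooth neighbouring weights
        w /= w.sum()                                 # renormalize after edge losses
    return w

def posterior_mean_rate(y, grid, w):
    """Empirical Bayes estimate E[rate | y] under the fitted grid prior."""
    y = np.asarray(y, dtype=float)
    return responsibilities(poisson_logpmf(y, grid), w) @ grid

# Usage on simulated over-dispersed counts (gamma-distributed latent rates).
rng = np.random.default_rng(1)
lam_true = rng.gamma(shape=2.0, scale=3.0, size=500)
y = rng.poisson(lam_true)
grid = np.linspace(0.1, 40.0, 160)
w = ems_poisson_mixture(y, grid)
post = posterior_mean_rate(y, grid, w)
```

The smoothing step keeps the estimated rate distribution from collapsing onto a few spikes, as an unsmoothed nonparametric EM would tend to do.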

