A Markov random field-based approach for joint estimation of differentially expressed genes in mouse transcriptome data

Author(s):  
Zhixiang Lin ◽  
Mingfeng Li ◽  
Nenad Sestan ◽  
Hongyu Zhao

Abstract The statistical methodology developed in this study was motivated by our interest in studying neurodevelopment using the mouse brain RNA-Seq data set, where gene expression levels were measured in multiple layers in the somatosensory cortex across time in both female and male samples. We aim to identify differentially expressed genes between adjacent time points, which may provide insights into the dynamics of brain development. Because of the extremely small sample size (one male and one female at each time point), simple marginal analysis may be underpowered. We propose a Markov random field (MRF)-based approach that capitalizes on the similarity between layers, the temporal dependency and the similarity between sexes. The model parameters are estimated by an efficient EM algorithm with mean field-like approximation. Simulation results and real data analysis suggest that the proposed model has greater power to detect differentially expressed genes than simple marginal analysis. Our method also reveals biologically interesting results in the mouse brain RNA-Seq data set.
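
To illustrate how a mean field-like approximation lets neighboring conditions share evidence, here is a minimal Python sketch. It is not the authors' model: the binary latent state per node, the log-likelihood ratios `llr`, the neighbor lists, and the parameters `gamma` (baseline log-odds) and `beta` (coupling strength) are all assumptions made for the example, and the EM step that would estimate `gamma` and `beta` is omitted.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def mean_field_posterior(llr, neighbors, gamma=-1.0, beta=0.5, n_iter=50):
    """Approximate P(node i is differentially expressed) in a binary MRF.

    llr[i]       : assumed log-likelihood ratio (DE vs. not DE) for node i
    neighbors[i] : indices of nodes linked to i (e.g. adjacent layer or time)
    gamma, beta  : hypothetical baseline log-odds and coupling strength
    """
    # Start from the posterior that ignores all neighbors.
    q = [sigmoid(gamma + l) for l in llr]
    for _ in range(n_iter):
        for i in range(len(q)):
            # Replace the neighbors' unknown states by their current means.
            field = gamma + beta * sum(q[j] for j in neighbors[i])
            q[i] = sigmoid(field + llr[i])
    return q


# Three nodes in a chain; the middle node has no evidence of its own.
llr = [3.0, 0.0, 3.0]
neighbors = [[1], [0, 2], [1]]
q_mrf = mean_field_posterior(llr, neighbors, beta=1.0)
q_marginal = mean_field_posterior(llr, neighbors, beta=0.0)
```

With `beta=0.0` each node is analyzed on its own, which corresponds to a simple marginal analysis; with a positive `beta` the middle node's posterior is pulled up by its two strongly supported neighbors.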

2020 ◽  
Author(s):  
Hongyu Li ◽  
Zhichao Xu ◽  
Taylor Adams ◽  
Naftali Kaminski ◽  
Hongyu Zhao

Abstract Motivation Recent development of single cell sequencing technologies has made it possible to identify genes with different expression (DE) levels at the cell type level between different groups of samples. However, the often-low sample size of single cell data limits the statistical power to identify DE genes. Results In this article, we propose to borrow information through known biological networks. Our approach is based on a Markov Random Field (MRF) model to appropriately accommodate gene network information as well as dependencies among cells to identify cell-type specific DE genes. We implement an Expectation-Maximization (EM) algorithm with mean field-like approximation to estimate model parameters and a Gibbs sampler to infer DE status. A simulation study shows that our method has better power to detect cell-type specific DE genes than conventional methods while appropriately controlling the type I error rate. The usefulness of our method is demonstrated through its application to study the pathogenesis and biological processes of idiopathic pulmonary fibrosis (IPF) using a single-cell RNA-sequencing (scRNA-seq) data set, which contains 18,150 protein-coding genes across 38 cell types in lung tissue from 32 IPF patients and 28 normal controls. Availability The algorithm is implemented in R. The source code can be downloaded at https://github.com/eddiehli/[email protected] Supplementary information Supplementary data are available online.
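
As a rough illustration of how a Gibbs sampler can infer DE status on a gene network, the following Python sketch samples binary DE indicators from a pairwise MRF. It is not the published R implementation: the function name `gibbs_de_status`, the log-likelihood ratios `llr`, the toy network, and the fixed parameters `gamma` and `beta` are assumptions for the example (in the described method the parameters would come from the EM step).

```python
import math
import random


def gibbs_de_status(llr, neighbors, gamma=-1.0, beta=1.0,
                    n_sweeps=2000, burn_in=500, seed=0):
    """Estimate posterior DE probabilities by Gibbs sampling.

    llr[i]       : assumed log-likelihood ratio (DE vs. not DE) for gene i
    neighbors[i] : genes linked to gene i in a symmetric network
    gamma, beta  : hypothetical baseline log-odds and network coupling
    """
    rng = random.Random(seed)
    n = len(llr)
    z = [0] * n        # current DE indicators
    counts = [0] * n   # number of kept sweeps in which z[i] == 1
    for sweep in range(n_sweeps):
        for i in range(n):
            # Full conditional log-odds of z[i] = 1 given all other states.
            logit = gamma + beta * sum(z[j] for j in neighbors[i]) + llr[i]
            p = 1.0 / (1.0 + math.exp(-logit))
            z[i] = 1 if rng.random() < p else 0
        if sweep >= burn_in:
            for i in range(n):
                counts[i] += z[i]
    kept = n_sweeps - burn_in
    return [c / kept for c in counts]


# Genes 0-2 form a triangle in the network; gene 3 is isolated.
llr = [2.0, 2.0, 0.0, 0.0]
neighbors = [[1, 2], [0, 2], [0, 1], []]
posterior = gibbs_de_status(llr, neighbors)
```

Genes 2 and 3 carry the same (null) evidence, but gene 2 is connected to two well-supported genes, so its estimated DE probability ends up higher than that of the isolated gene 3.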


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Hongyu Li ◽  
Biqing Zhu ◽  
Zhichao Xu ◽  
Taylor Adams ◽  
Naftali Kaminski ◽  
...  

Abstract Background Recent development of single cell sequencing technologies has made it possible to identify genes with different expression (DE) levels at the cell type level between different groups of samples. In this article, we propose to borrow information through known biological networks to increase statistical power to identify differentially expressed genes (DEGs). Results We develop MRFscRNAseq, which is based on a Markov random field (MRF) model to appropriately accommodate gene network information as well as dependencies among cell types to identify cell-type specific DEGs. We implement an Expectation-Maximization (EM) algorithm with mean field-like approximation to estimate model parameters and a Gibbs sampler to infer DE status. A simulation study shows that our method has better power to detect cell-type specific DEGs than conventional methods while appropriately controlling the type I error rate. The usefulness of our method is demonstrated through its application to study the pathogenesis and biological processes of idiopathic pulmonary fibrosis (IPF) using a single-cell RNA-sequencing (scRNA-seq) data set, which contains 18,150 protein-coding genes across 38 cell types in lung tissue from 32 IPF patients and 28 normal controls. Conclusions The proposed MRF model is implemented in the R package MRFscRNAseq available on GitHub. By utilizing gene-gene and cell-cell networks, our method increases statistical power to detect differentially expressed genes from scRNA-seq data.


2014 ◽  
Vol 696 ◽  
pp. 114-118 ◽  
Author(s):  
Wen Long Yin ◽  
Hong Song Li ◽  
Hao Ran Zhang ◽  
Shu Ting Zhao

Some diseases, particularly cardiovascular disease, change the shape and structure of retinal vessels. Observation and detection of retinal vessels therefore play an important role in diagnosis. Traditionally, ophthalmologists assess the retinal vessels by manual visual inspection. Image segmentation based on the Markov random field (MRF) is a statistical method that takes into account the correlation between local pixels, uses prior knowledge effectively, has few model parameters and is easy to combine with other methods, so it has been widely researched and applied in the field of image segmentation. This paper mainly studies how the Markov random field is applied to image segmentation, and compares the iterated conditional modes algorithm with a traditional segmentation (clustering) algorithm on medical retinal vessel images. The MRF method can effectively suppress noise in vessel segmentation.
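
The iterated conditional modes idea mentioned above can be sketched in a few lines of Python. This is an illustrative toy, not the paper's implementation: it assumes known class means, a Gaussian data term with standard deviation `sigma`, a 4-neighbor Potts smoothness prior with weight `beta`, and a tiny hand-made image.

```python
def icm_segment(image, means, beta=1.0, sigma=1.0, n_iter=10):
    """Label each pixel by iterated conditional modes (ICM).

    image : 2-D list of gray values
    means : assumed intensity mean of each class
    beta  : hypothetical weight of the Potts smoothness prior
    """
    rows, cols = len(image), len(image[0])
    classes = range(len(means))
    # Initialize with the nearest class mean (a plain clustering result).
    labels = [[min(classes, key=lambda k: (image[r][c] - means[k]) ** 2)
               for c in range(cols)] for r in range(rows)]
    for _ in range(n_iter):
        changed = False
        for r in range(rows):
            for c in range(cols):
                nbrs = [labels[rr][cc]
                        for rr, cc in ((r - 1, c), (r + 1, c),
                                       (r, c - 1), (r, c + 1))
                        if 0 <= rr < rows and 0 <= cc < cols]

                def energy(k):
                    # Data term plus a penalty per disagreeing neighbor.
                    data = (image[r][c] - means[k]) ** 2 / (2 * sigma ** 2)
                    smooth = beta * sum(1 for lab in nbrs if lab != k)
                    return data + smooth

                best = min(classes, key=energy)
                if best != labels[r][c]:
                    labels[r][c] = best
                    changed = True
        if not changed:
            break  # a local minimum of the energy has been reached
    return labels


# Dark background with one noisy bright pixel in the center.
image = [[0.0, 0.0, 0.0],
         [0.0, 0.9, 0.0],
         [0.0, 0.0, 0.0]]
labels = icm_segment(image, means=[0.0, 1.0])
```

Nearest-mean clustering alone would assign the center pixel to the bright class; the neighborhood prior overrides that isolated label, which is the noise-suppressing effect the abstract describes. A larger `beta` smooths more aggressively and may also erase thin structures.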


2015 ◽  
Author(s):  
Hugo Varet ◽  
Jean-Yves Coppée ◽  
Marie-Agnès Dillies

Background Several R packages exist for the detection of differentially expressed genes from RNA-Seq data. The analysis process includes three main steps, namely normalization, dispersion estimation and test for differential expression. Quality control steps along this process are recommended but not mandatory, and failing to check the characteristics of the dataset may lead to spurious results. In addition, normalization methods and statistical models are not exchangeable across the packages without adequate transformations that users are often not aware of. Thus, dedicated analysis pipelines are needed to include systematic quality control steps and prevent errors from misusing the proposed methods. Results SARTools is an R pipeline for differential analysis of RNA-Seq count data. It can handle designs involving two or more conditions of a single biological factor with or without a blocking factor (such as a batch effect or a sample pairing). It is based on DESeq2 and edgeR and is composed of an R package and two R script templates (for DESeq2 and edgeR respectively). By tuning a small number of parameters and executing one of the R scripts, users have access to the full results of the analysis, including lists of differentially expressed genes and an HTML report that (i) displays diagnostic plots for quality control and model hypotheses checking and (ii) keeps track of the whole analysis process, parameter values and versions of the R packages used. Conclusions SARTools provides systematic quality controls of the dataset as well as diagnostic plots that help to tune the model parameters. It gives access to the main parameters of DESeq2 and edgeR and prevents untrained users from misusing some functionalities of both packages. By keeping track of all the parameters of the analysis process it fits the requirements of reproducible research.
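
To make the normalization step concrete, the sketch below computes per-sample size factors with the median-of-ratios method, the default normalization in DESeq2. It is a plain Python illustration under simplifying assumptions (genes with any zero count are skipped), not code from SARTools, DESeq2 or edgeR; the function name `size_factors` and the toy count matrix are made up for the example.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    counts[g][s] is the raw read count of gene g in sample s.
    """
    n_samples = len(counts[0])
    log_ratios = [[] for _ in range(n_samples)]
    for gene in counts:
        if any(c == 0 for c in gene):
            continue  # the log geometric mean is undefined for zero counts
        log_geo_mean = sum(math.log(c) for c in gene) / n_samples
        for s, c in enumerate(gene):
            log_ratios[s].append(math.log(c) - log_geo_mean)
    # One factor per sample: the median ratio to the per-gene geometric mean.
    return [math.exp(median(r)) for r in log_ratios]


# Sample 2 was sequenced twice as deeply as sample 1.
counts = [[10, 20],
          [100, 200],
          [50, 100],
          [0, 5]]
factors = size_factors(counts)
normalized = [[c / f for c, f in zip(gene, factors)] for gene in counts]
```

Dividing each count by its sample's factor puts the samples on a common scale. edgeR's default normalization (TMM) produces factors with a different definition, which is one reason normalized values are not directly exchangeable between the two packages.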


2011 ◽  
Vol 403-408 ◽  
pp. 3438-3445
Author(s):  
Sucheta Panda ◽  
P.K. Nanda

In this paper, an unsupervised color image segmentation scheme is proposed that preserves both strong and weak edges. A Constrained Compound Markov Random Field (MRF) is proposed as the a priori model for the color labels. We use the Ohta (I1, I2, I3) color model, and a controlled correlation of the color space is accomplished by the proposed compound MRF model. The Constrained Compound MRF (CCMRF) is found to possess the unifying property of modeling scenes as well as color textures. In the unsupervised scheme, the associated model parameters and the image labels are estimated recursively. The model parameters are the Maximum Conditional Pseudo Likelihood (MCPL) estimates and the labels are the Maximum a Posteriori (MAP) estimates. The proposed scheme has been compared with Yu's method and exhibits a lower misclassification error.
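
For readers unfamiliar with the Ohta color model, the following Python sketch converts an RGB pixel to (I1, I2, I3) using the commonly cited definition of the Ohta features. The function name `rgb_to_ohta` is hypothetical, and the paper may apply its own scaling or preprocessing.

```python
def rgb_to_ohta(r, g, b):
    """Convert one RGB pixel to the Ohta (I1, I2, I3) features."""
    i1 = (r + g + b) / 3.0        # intensity
    i2 = (r - b) / 2.0            # red-blue opponent component
    i3 = (2.0 * g - r - b) / 4.0  # green-magenta opponent component
    return i1, i2, i3


gray = rgb_to_ohta(90, 90, 90)
red = rgb_to_ohta(255, 0, 0)
```

A gray pixel has zero I2 and I3, so all of its information sits in the intensity channel, whereas a saturated color spreads across the components.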


2010 ◽  
Vol 32 (8) ◽  
pp. 1392-1405 ◽  
Author(s):  
Victor Lempitsky ◽  
Carsten Rother ◽  
Stefan Roth ◽  
Andrew Blake
