A Markov Random Field Model for Network-based Differential Expression Analysis of Single-cell RNA-seq Data

2020 ◽  
Author(s):  
Hongyu Li ◽  
Zhichao Xu ◽  
Taylor Adams ◽  
Naftali Kaminski ◽  
Hongyu Zhao

Abstract Motivation: Recent development of single cell sequencing technologies has made it possible to identify genes with different expression (DE) levels at the cell type level between different groups of samples. However, the often-low sample size of single cell data limits the statistical power to identify DE genes. Results: In this article, we propose to borrow information through known biological networks. Our approach is based on a Markov Random Field (MRF) model to appropriately accommodate gene network information as well as dependencies among cells to identify cell-type specific DE genes. We implement an Expectation-Maximization (EM) algorithm with mean field-like approximation to estimate model parameters and a Gibbs sampler to infer DE status. Simulation study shows that our method has better power to detect cell-type specific DE genes than conventional methods while appropriately controlling type I error rate. The usefulness of our method is demonstrated through its application to study the pathogenesis and biological processes of idiopathic pulmonary fibrosis (IPF) using a single-cell RNA-sequencing (scRNA-seq) data set, which contains 18,150 protein-coding genes across 38 cell types on lung tissues from 32 IPF patients and 28 normal controls. Availability: The algorithm is implemented in R. The source code can be downloaded at https://github.com/eddiehli/[email protected] information: Supplementary data are available online.

2020 ◽  
Author(s):  
Hongyu Li ◽  
Zhichao Xu ◽  
Taylor Adams ◽  
Naftali Kaminski ◽  
Hongyu Zhao

Abstract Background: Recent development of single cell sequencing technologies has made it possible to identify genes with different expression (DE) levels at the cell type level between different groups of samples. However, the often-low sample size of single cell data limits the statistical power to identify DE genes. In this article, we propose to borrow information through known biological networks. Results: We develop MRFscRNAseq, which is based on a Markov Random Field (MRF) model to appropriately accommodate gene network information as well as dependencies among cell types to identify cell-type specific DE genes. We implement an Expectation-Maximization (EM) algorithm with mean field-like approximation to estimate model parameters and a Gibbs sampler to infer DE status. Simulation study shows that our method has better power to detect cell-type specific DE genes than conventional methods while appropriately controlling type I error rate. The usefulness of our method is demonstrated through its application to study the pathogenesis and biological processes of idiopathic pulmonary fibrosis (IPF) using a single-cell RNA-sequencing (scRNA-seq) data set, which contains 18,150 protein-coding genes across 38 cell types on lung tissues from 32 IPF patients and 28 normal controls. Conclusions: The proposed MRF model is implemented in the R package MRFscRNAseq available on GitHub. By utilizing gene-gene and cell-cell networks, our method provides differential expression analysis for scRNA-seq data with increased statistical power.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Hongyu Li ◽  
Biqing Zhu ◽  
Zhichao Xu ◽  
Taylor Adams ◽  
Naftali Kaminski ◽  
...  

Abstract Background: Recent development of single cell sequencing technologies has made it possible to identify genes with different expression (DE) levels at the cell type level between different groups of samples. In this article, we propose to borrow information through known biological networks to increase statistical power to identify differentially expressed genes (DEGs). Results: We develop MRFscRNAseq, which is based on a Markov random field (MRF) model to appropriately accommodate gene network information as well as dependencies among cell types to identify cell-type specific DEGs. We implement an Expectation-Maximization (EM) algorithm with mean field-like approximation to estimate model parameters and a Gibbs sampler to infer DE status. Simulation study shows that our method has better power to detect cell-type specific DEGs than conventional methods while appropriately controlling type I error rate. The usefulness of our method is demonstrated through its application to study the pathogenesis and biological processes of idiopathic pulmonary fibrosis (IPF) using a single-cell RNA-sequencing (scRNA-seq) data set, which contains 18,150 protein-coding genes across 38 cell types on lung tissues from 32 IPF patients and 28 normal controls. Conclusions: The proposed MRF model is implemented in the R package MRFscRNAseq available on GitHub. By utilizing gene-gene and cell-cell networks, our method increases statistical power to detect differentially expressed genes from scRNA-seq data.
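
To make the idea of inferring DE status with a Gibbs sampler on a gene network concrete, the following is a minimal Python sketch. It is not the MRFscRNAseq implementation (which is in R) and it omits the cell-type dimension and the EM step entirely. The function name `gibbs_de_status`, the parameters `gamma` and `beta`, and the per-gene log-likelihood ratios are all assumptions introduced here for illustration; the prior is a generic Ising-type binary MRF.

```python
import math
import random


def _sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def gibbs_de_status(log_lik_ratio, neighbors, gamma=-1.0, beta=0.5,
                    n_sweeps=500, burn_in=100, seed=0):
    """Posterior DE probabilities under a toy binary MRF (illustrative only).

    log_lik_ratio[g]: log p(data_g | DE) - log p(data_g | not DE), assumed given.
    neighbors[g]:     indices of genes adjacent to gene g in the network.
    gamma:            baseline log-odds of being DE (hypothetical parameter).
    beta:             strength of agreement with network neighbors (hypothetical).
    """
    rng = random.Random(seed)
    n = len(log_lik_ratio)
    state = [0] * n       # current DE indicator for each gene
    counts = [0] * n      # number of retained sweeps in which each gene was DE
    for sweep in range(n_sweeps):
        for g in range(n):
            # Each DE neighbor adds +1 and each non-DE neighbor adds -1.
            vote = sum(2 * state[j] - 1 for j in neighbors[g])
            logit = gamma + beta * vote + log_lik_ratio[g]
            state[g] = 1 if rng.random() < _sigmoid(logit) else 0
        if sweep >= burn_in:
            for g in range(n):
                counts[g] += state[g]
    kept = n_sweeps - burn_in
    return [c / kept for c in counts]


# Usage: gene 1 has no evidence of its own but sits between two genes
# with strong evidence; gene 3 is isolated with evidence against DE.
llr = [3.0, 0.0, 3.0, -3.0]
network = [[1], [0, 2], [1], []]
posterior = gibbs_de_status(llr, network, beta=1.0)
```

In this toy setting, setting `beta=0.0` removes the network term and each gene is judged on its own evidence, which is the sense in which the network "lends" information to weakly supported genes.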


Author(s):  
Zhixiang Lin ◽  
Mingfeng Li ◽  
Nenad Sestan ◽  
Hongyu Zhao

Abstract: The statistical methodology developed in this study was motivated by our interest in studying neurodevelopment using the mouse brain RNA-Seq data set, where gene expression levels were measured in multiple layers in the somatosensory cortex across time in both female and male samples. We aim to identify differentially expressed genes between adjacent time points, which may provide insights into the dynamics of brain development. Because of the extremely small sample size (one male and one female at each time point), simple marginal analysis may be underpowered. We propose a Markov random field (MRF)-based approach that capitalizes on the similarity between layers, the temporal dependency, and the similarity between sexes. The model parameters are estimated by an efficient EM algorithm with mean field-like approximation. Simulation results and real data analysis suggest that the proposed model has greater power to detect differentially expressed genes than simple marginal analysis. Our method also reveals biologically interesting results in the mouse brain RNA-Seq data set.


2018 ◽  
Author(s):  
Wennan Chang ◽  
Changlin Wan ◽  
Xiaoyu Lu ◽  
Szu-wei Tu ◽  
Yifan Sun ◽  
...  

Abstract: We developed a novel deconvolution method, namely Inference of Cell Types and Deconvolution (ICTD), that addresses the fundamental issue of identifiability and robustness in the current tissue data deconvolution problem. ICTD provides substantially new capabilities for omics data based characterization of a tissue microenvironment, including (1) maximizing the resolution in identifying resident cell types and subtypes that truly exist in a tissue, (2) identifying the most reliable marker genes for each cell type, which are tissue and data set specific, (3) handling the stability problem with co-linear cell types, (4) co-deconvoluting with available matched multi-omics data, and (5) inferring functional variations specific to one or several cell types. ICTD is empowered by (i) rigorously derived mathematical conditions of identifiable cell types and cell type specific functions in tissue transcriptomics data, (ii) a semi-supervised approach to maximize the knowledge transfer of cell type and functional marker genes identified in single cell or bulk cell data in the analysis of tissue data, and (iii) a novel unsupervised approach to minimize the bias brought by training data. Application of ICTD to real and single cell simulated tissue data validated that the method has consistently good performance for tissue data coming from different species, tissue microenvironments, and experimental platforms. Beyond the new capabilities, ICTD outperformed other state-of-the-art deconvolution methods on prediction accuracy, the resolution of identifiable cell types, detection of unknown cell subtypes, and assessment of cell type specific functions. The promise of ICTD also lies in characterizing cell-cell interactions and discovering cell types and prognostic markers that are predictive of clinical outcomes.


2021 ◽  
pp. 002203452110497
Author(s):  
Y. Chiba ◽  
K. Yoshizaki ◽  
T. Tian ◽  
K. Miyazaki ◽  
D. Martin ◽  
...  

Organ development is dictated by the regulation of genes preferentially expressed in tissues or cell types. Gene expression profiling and identification of specific genes in organs can provide insights into organogenesis. Therefore, genome-wide analysis is a powerful tool for clarifying the mechanisms of development during organogenesis as well as tooth development. Single-cell RNA sequencing (scRNA-seq) is a suitable tool for unraveling the gene expression profile of dental cells. Using scRNA-seq, we can obtain a large pool of information on gene expression; however, identification of functional genes, which are key molecules for tooth development, via this approach remains challenging. In the present study, we performed cap analysis of gene expression sequence (CAGE-seq) using mouse tooth germ to identify the genes preferentially expressed in teeth. The CAGE-seq counts short reads at the 5′-end of transcripts; therefore, this method can quantify the amount of transcripts without bias related to the transcript length. We hypothesized that this CAGE data set would be of great help for further understanding a gene expression profile through scRNA-seq. We aimed to identify the important genes involved in tooth development via bioinformatics analyses, using a combination of scRNA-seq and CAGE-seq. We obtained the scRNA-seq data set of 12,212 cells from postnatal day 1 mouse molars and the CAGE-seq data set from postnatal day 1 molars. scRNA-seq analysis revealed the spatiotemporal expression of cell type–specific genes, and CAGE-seq helped determine whether these genes are preferentially expressed in tooth or ubiquitously. Furthermore, we identified candidate genes as novel tooth-enriched and dental cell type–specific markers. Our results show that the integration of scRNA-seq and CAGE-seq highlights the genes important for tooth development among numerous gene expression profiles. 
These findings should contribute to resolving the mechanism of tooth development and establishing the basis for tooth regeneration in the future.


2019 ◽  
Vol 18 (2) ◽  
pp. 181-197 ◽  
Author(s):  
Xiaofeng Xu ◽  
Yue Guan ◽  
Hui Gong ◽  
Zhao Feng ◽  
Wenjuan Shi ◽  
...  

2019 ◽  
Author(s):  
Nigatu A. Adossa ◽  
Leif Schauser ◽  
Vivi G. Gregersen ◽  
Laura L. Elo

Abstract Background: Recent advances in single-cell gene expression profiling technology have revolutionized the understanding of molecular processes underlying developmental cell and tissue differentiation, enabling the discovery of novel cell-types and molecular markers that characterize developmental trajectories. Common approaches for identifying marker genes are based on pairwise statistical testing for differential gene expression between cell-types in heterogeneous cell populations, which is challenging due to unequal sample sizes and variance between groups resulting in little statistical power and inflated type I errors. Results: We developed an alternative feature extraction method, Marker gene Identification for Cell-type Identity (MICTI), that encodes the cell-type specific expression information to each gene in every single cell. This approach identifies features (genes) that are cell-type specific for a given cell-type in a heterogeneous cell population. To validate this approach, we used (i) simulated single cell RNA-seq data, (ii) human pancreatic islet single-cell RNA-seq data and (iii) a simulated mixture of human single-cell RNA-seq data related to immune cells, particularly B cells, CD4+ memory cells, CD8+ memory cells, dendritic cells, fibroblast cells, and lymphoblast cells. For all cases, we were able to identify established cell-type-specific markers. Conclusions: Our approach represents a highly efficient and fast method as an alternative to differential expression analysis for molecular marker identification in heterogeneous single-cell RNA-seq data.


2020 ◽  
Author(s):  
Nithin Joshy ◽  
Kyuson Yun

Abstract Motivation: Single-cell RNA sequencing (scRNA-seq) is a recent technology that has provided many valuable biological insights. Notable uses include identifying novel cell-types, measuring the cellular response to treatment, and tracking trajectories of distinct cell lineages in time. The raw data generated in this process typically amounts to hundreds of millions of sequencing reads and requires substantial computational infrastructure for downstream analysis, a major hurdle for a biological research lab. Fortunately, the preprocessing step that converts this huge sequence data into manageable cell-specific expression profiles is standardized and can be performed in the cloud. We demonstrate how a cloud-based computational framework can be used to transform the raw data into biologically interpretable cell-type-specific information, using either 3’ or 5’ transcriptome libraries from 10x Genomics. The processed data, which is an order of magnitude smaller in size, can be easily downloaded to a laptop for customized analysis to gain deeper biological insights. Results: We produced an automated and easily extensible pipeline in the cloud for the analysis of single-cell RNA-seq data, which provides a convenient method to handle post-processing of scRNA sequencing using next generation sequencing platforms. The basic step transforms the scRNA-seq data into cell-type-specific expression profiles and computes the quality control metrics for the dataset. The extensibility of the platform is demonstrated by adding a doublet-removal algorithm and recomputing the clustering of the cells. Any additional computational steps that take a cell-type expression counts matrix as input can be easily added to this framework with minimal effort. Availability: The framework and its installation documentation are available at the GitHub repository http://github.com/nj3252/CB-Source/[email protected] information: Supplementary data available at Bioinformatics online.


2014 ◽  
Vol 696 ◽  
pp. 114-118 ◽  
Author(s):  
Wen Long Yin ◽  
Hong Song Li ◽  
Hao Ran Zhang ◽  
Shu Ting Zhao

Some diseases, particularly cardiovascular disease, change the shape and structure of retinal vessels. Observation and detection of retinal vessels therefore play an important role in the diagnosis of these diseases. Traditionally, ophthalmologists examine the retinal vessels by manual visual inspection. Image segmentation based on Markov random fields (MRF) is a method grounded in statistical theory: it takes into account the correlation between local pixels, uses prior knowledge effectively, has few model parameters, and is easy to combine with other methods, so it has been widely researched and applied in the field of image segmentation. This paper mainly studies how the Markov random field can be applied to image segmentation, and compares the iterated conditional modes algorithm with a traditional segmentation (clustering) algorithm on medical retinal vessel images. The MRF method can effectively suppress noise in vessel segmentation.
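
As an illustration of how iterated conditional modes (ICM) uses the correlation between neighboring pixels, here is a minimal Python sketch. It is not the paper's implementation: the Gaussian data term, the Potts smoothness penalty, the 4-neighborhood, and the names `icm_segment`, `means`, `beta`, and `sigma` are all assumptions chosen for a small, self-contained example.

```python
def icm_segment(image, means, beta=1.0, sigma=1.0, max_iters=10):
    """Label each pixel with a class index using ICM on a toy MRF.

    image: 2D list of intensities; means: assumed intensity of each class.
    beta:  penalty for each 4-neighbor that carries a different label.
    """
    rows, cols = len(image), len(image[0])
    n_classes = len(means)

    # Start from the pixel-wise nearest class mean (no spatial prior yet).
    labels = [[min(range(n_classes), key=lambda k: (image[r][c] - means[k]) ** 2)
               for c in range(cols)] for r in range(rows)]

    def local_energy(r, c, k):
        # Data term: squared distance to the class mean, scaled by the noise level.
        energy = (image[r][c] - means[k]) ** 2 / (2.0 * sigma ** 2)
        # Smoothness term: count disagreeing 4-neighbors (Potts model).
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            rr, cc = r + dr, c + dc
            if 0 <= rr < rows and 0 <= cc < cols and labels[rr][cc] != k:
                energy += beta
        return energy

    for _ in range(max_iters):
        changed = False
        for r in range(rows):
            for c in range(cols):
                best_k = labels[r][c]
                best_e = local_energy(r, c, best_k)
                for k in range(n_classes):
                    e = local_energy(r, c, k)
                    if e < best_e:  # switch only on a strict improvement
                        best_k, best_e = k, e
                if best_k != labels[r][c]:
                    labels[r][c] = best_k
                    changed = True
        if not changed:
            break
    return labels


# Usage: a dark background with one noisy bright pixel in the middle.
noisy = [[0.1, 0.1, 0.1],
         [0.1, 0.6, 0.1],
         [0.1, 0.1, 0.1]]
smoothed = icm_segment(noisy, means=[0.0, 1.0], beta=1.0)
```

With `beta=0.0` the spatial term vanishes and the result is just nearest-mean clustering, which is one way to see the contrast between the MRF approach and a purely intensity-based segmentation.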


2011 ◽  
Vol 403-408 ◽  
pp. 3438-3445
Author(s):  
Sucheta Panda ◽  
P.K. Nanda

In this paper, an unsupervised color image segmentation scheme is proposed that preserves both strong and weak edges. A Constrained Compound Markov Random Field (MRF) is proposed as the a priori model for the color labels. We use the Ohta (I1, I2, I3) color model, and a controlled correlation of the color space is accomplished by the proposed compound MRF model. The Constrained Compound MRF (CCMRF) is found to possess the unifying property of modeling scenes as well as color textures. In the unsupervised scheme, the associated model parameters and the image labels are estimated recursively. The model parameters are the Maximum Conditional Pseudo Likelihood (MCPL) estimates and the labels are the Maximum a Posteriori (MAP) estimates. The performance of the proposed scheme has been compared with that of Yu’s method and has been found to be better in terms of misclassification error.

