A Markov Random Field Model for Network-based Differential Expression Analysis of Single-cell RNA-seq Data

Abstract Background: Recent development of single cell sequencing technologies has made it possible to identify genes with different expression (DE) levels at the cell type level between different groups of samples. However, the often-low sample size of single cell data limits the statistical power to identify DE genes. In this article, we propose to borrow information through known biological networks. Results: We develop MRFscRNAseq, which is based on a Markov Random Field (MRF) model to appropriately accommodate gene network information as well as dependencies among cell types to identify cell-type specific DE genes. We implement an Expectation-Maximization (EM) algorithm with mean field-like approximation to estimate model parameters and a Gibbs sampler to infer DE status. Simulation study shows that our method has better power to detect cell-type specific DE genes than conventional methods while appropriately controlling type I error rate. The usefulness of our method is demonstrated through its application to study the pathogenesis and biological processes of idiopathic pulmonary fibrosis (IPF) using a single-cell RNA-sequencing (scRNA-seq) data set, which contains 18,150 protein-coding genes across 38 cell types on lung tissues from 32 IPF patients and 28 normal controls. Conclusions: The proposed MRF model is implemented in the R package MRFscRNAseq available on GitHub. By utilizing gene-gene and cell-cell networks, our method provides differential expression analysis for scRNA-seq data with increased statistical power.


A Markov random field model for network-based differential expression analysis of single-cell RNA-seq data

BMC Bioinformatics ◽

10.1186/s12859-021-04412-0 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Hongyu Li ◽

Biqing Zhu ◽

Zhichao Xu ◽

Taylor Adams ◽

Naftali Kaminski ◽

...

Keyword(s):

Random Field ◽

Single Cell ◽

Differentially Expressed Genes ◽

Markov Random Field ◽

Statistical Power ◽

Cell Types ◽

Differentially Expressed ◽

Cell Type ◽

Markov Random ◽

Cell Type Specific

Abstract Background Recent development of single cell sequencing technologies has made it possible to identify genes with different expression (DE) levels at the cell type level between different groups of samples. In this article, we propose to borrow information through known biological networks to increase statistical power to identify differentially expressed genes (DEGs). Results We develop MRFscRNAseq, which is based on a Markov random field (MRF) model to appropriately accommodate gene network information as well as dependencies among cell types to identify cell-type specific DEGs. We implement an Expectation-Maximization (EM) algorithm with mean field-like approximation to estimate model parameters and a Gibbs sampler to infer DE status. Simulation study shows that our method has better power to detect cell-type specific DEGs than conventional methods while appropriately controlling type I error rate. The usefulness of our method is demonstrated through its application to study the pathogenesis and biological processes of idiopathic pulmonary fibrosis (IPF) using a single-cell RNA-sequencing (scRNA-seq) data set, which contains 18,150 protein-coding genes across 38 cell types on lung tissues from 32 IPF patients and 28 normal controls. Conclusions The proposed MRF model is implemented in the R package MRFscRNAseq available on GitHub. By utilizing gene-gene and cell-cell networks, our method increases statistical power to detect differentially expressed genes from scRNA-seq data.
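The idea of inferring DE status with a Gibbs sampler on a gene network can be illustrated with a minimal sketch. This is not the MRFscRNAseq implementation: it assumes a generic Ising-style prior over binary DE labels, a simple two-component Gaussian model for per-gene z-scores, and fixed parameters `h`, `beta`, and `mu1` (all hypothetical names; the abstract states that model parameters are estimated by EM, which is skipped here). It also uses only a gene-gene network and ignores dependencies among cell types.

```python
import math
import random


def log_lr(z, mu1):
    """Log-likelihood ratio of DE vs non-DE for one z-score.

    Assumed toy model: non-DE z ~ N(0, 1); DE z is an equal mixture of
    N(mu1, 1) and N(-mu1, 1). The ratio simplifies to
    log(cosh(mu1 * z)) - mu1**2 / 2, evaluated in a numerically stable way.
    """
    a = abs(mu1 * z)
    log_cosh = a + math.log1p(math.exp(-2.0 * a)) - math.log(2.0)
    return log_cosh - 0.5 * mu1 * mu1


def gibbs_de_status(z, edges, h=-1.0, beta=1.0, mu1=2.0,
                    n_iter=2000, burn_in=500, seed=0):
    """Estimate P(DE) per gene by Gibbs sampling under an Ising-style prior.

    z     : per-gene z-scores (hypothetical summary statistics)
    edges : (i, j) pairs from a gene-gene network
    h     : baseline log-odds of DE (fixed here, not estimated)
    beta  : coupling strength; larger values borrow more from neighbours
    """
    rng = random.Random(seed)
    n = len(z)
    nbrs = [[] for _ in range(n)]
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)
    llr = [log_lr(zi, mu1) for zi in z]

    x = [0] * n      # current DE labels, all start as non-DE
    hits = [0] * n   # post-burn-in counts of x[i] == 1
    for it in range(n_iter):
        for i in range(n):
            # Each DE neighbour adds +beta to the log-odds, each non-DE one -beta.
            field = sum(2 * x[j] - 1 for j in nbrs[i])
            logit = h + beta * field + llr[i]
            p = 1.0 / (1.0 + math.exp(-logit))
            x[i] = 1 if rng.random() < p else 0
        if it >= burn_in:
            for i in range(n):
                hits[i] += x[i]
    return [c / (n_iter - burn_in) for c in hits]


# Gene 0 has weak evidence but two strongly DE neighbours; gene 3 has the
# same weak evidence and no neighbours.
z = [1.5, 4.0, 4.0, 1.5]
edges = [(0, 1), (0, 2)]
post = gibbs_de_status(z, edges)
```

In this toy setting the connected gene ends up with a higher posterior DE probability than the isolated gene with an identical z-score, which is the sense in which a network prior can raise power when per-gene evidence is weak.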


A Markov Random Field Model for Network-based Differential Expression Analysis of Single-cell RNA-seq Data

10.1101/2020.11.11.378976 ◽

2020 ◽

Author(s):

Hongyu Li ◽

Zhichao Xu ◽

Taylor Adams ◽

Naftali Kaminski ◽

Hongyu Zhao

Keyword(s):

Random Field ◽

Single Cell ◽

Markov Random Field ◽

Supplementary Information ◽

Model Parameters ◽

Type I ◽

Cell Type ◽

Data Set ◽

Markov Random ◽

Cell Type Specific

Abstract Motivation: Recent development of single cell sequencing technologies has made it possible to identify genes with different expression (DE) levels at the cell type level between different groups of samples. However, the often-low sample size of single cell data limits the statistical power to identify DE genes. Results: In this article, we propose to borrow information through known biological networks. Our approach is based on a Markov Random Field (MRF) model to appropriately accommodate gene network information as well as dependencies among cells to identify cell-type specific DE genes. We implement an Expectation-Maximization (EM) algorithm with mean field-like approximation to estimate model parameters and a Gibbs sampler to infer DE status. Simulation study shows that our method has better power to detect cell-type specific DE genes than conventional methods while appropriately controlling type I error rate. The usefulness of our method is demonstrated through its application to study the pathogenesis and biological processes of idiopathic pulmonary fibrosis (IPF) using a single-cell RNA-sequencing (scRNA-seq) data set, which contains 18,150 protein-coding genes across 38 cell types on lung tissues from 32 IPF patients and 28 normal controls. Availability: The algorithm is implemented in R. The source code can be downloaded at https://github.com/eddiehli/[email protected] Supplementary information: Supplementary data are available online.


Estimating and Correcting for Off-Target Cellular Contamination in Brain Cell Type Specific RNA-Seq Data

Frontiers in Molecular Neuroscience ◽

10.3389/fnmol.2021.637143 ◽

2021 ◽

Vol 14 ◽

Author(s):

Jordan Sicherman ◽

Dwight F. Newton ◽

Paul Pavlidis ◽

Etienne Sibille ◽

Shreejoy J. Tripathy

Keyword(s):

Single Cell ◽

Differential Expression ◽

Rna Sequencing ◽

Target Cell ◽

Differential Expression Analysis ◽

Cell Types ◽

Brain Cell ◽

Cell Type ◽

Cell Purification ◽

Single Cell Type

Transcriptionally profiling minor cellular populations remains an ongoing challenge in molecular genomics. Single-cell RNA sequencing has provided valuable insights into a number of hypotheses, but practical and analytical challenges have limited its widespread adoption. A similar approach, which we term single-cell type RNA sequencing (sctRNA-seq), involves the enrichment and sequencing of a pool of cells, yielding cell type-level resolution transcriptomes. While this approach offers benefits in terms of mRNA sampling from targeted cell types, it is potentially affected by off-target contamination from surrounding cell types. Here, we leveraged single-cell sequencing datasets to apply a computational approach for estimating and controlling the amount of off-target cell type contamination in sctRNA-seq datasets. In datasets obtained using a number of technologies for cell purification, we found that most sctRNA-seq datasets tended to show some amount of off-target mRNA contamination from surrounding cells. However, using covariates for cellular contamination in downstream differential expression analyses increased the quality of our models for differential expression analysis in case/control comparisons and typically resulted in the discovery of more differentially expressed genes. In general, our method provides a flexible approach for detecting and controlling off-target cell type contamination in sctRNA-seq datasets.


Confronting false discoveries in single-cell differential expression

Nature Communications ◽

10.1038/s41467-021-25960-2 ◽

2021 ◽

Vol 12 (1) ◽

Author(s):

Jordan W. Squair ◽

Matthieu Gautier ◽

Claudia Kathe ◽

Mark A. Anderson ◽

Nicholas D. James ◽

...

Keyword(s):

Single Cell ◽

Differential Expression ◽

Differentially Expressed Genes ◽

Differential Expression Analysis ◽

Differentially Expressed ◽

Cell Type ◽

Mouse Spinal Cord ◽

Cell Type Specific ◽

False Discoveries ◽

Biological Differences

Abstract Differential expression analysis in single-cell transcriptomics enables the dissection of cell-type-specific responses to perturbations such as disease, trauma, or experimental manipulations. While many statistical methods are available to identify differentially expressed genes, the principles that distinguish these methods and their performance remain unclear. Here, we show that the relative performance of these methods is contingent on their ability to account for variation between biological replicates. Methods that ignore this inevitable variation are biased and prone to false discoveries. Indeed, the most widely used methods can discover hundreds of differentially expressed genes in the absence of biological differences. To exemplify these principles, we exposed true and false discoveries of differentially expressed genes in the injured mouse spinal cord.
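Why ignoring variation between biological replicates inflates significance can be shown with a small worked example. The sketch below assumes one common remedy, aggregating cells to one value per replicate (often called pseudobulk) before testing; the abstract does not name a specific method, so this choice, the helper names, and the synthetic numbers are all illustrative assumptions.

```python
import math
from collections import defaultdict
from statistics import mean, stdev


def pseudobulk(cells):
    """Collapse (replicate_id, expression) pairs to one mean per replicate."""
    by_rep = defaultdict(list)
    for rep, value in cells:
        by_rep[rep].append(value)
    return {rep: mean(vals) for rep, vals in by_rep.items()}


def welch_t(a, b):
    """Welch's t statistic for two independent samples."""
    va, vb = stdev(a) ** 2, stdev(b) ** 2
    return (mean(a) - mean(b)) / math.sqrt(va / len(a) + vb / len(b))


def make_cells(rep_baselines, cells_per_rep=60):
    """Synthetic cells: replicate baseline plus a small cyclic within-replicate offset."""
    offsets = [-0.1, 0.0, 0.1]
    return [(rep, base + offsets[k % 3])
            for rep, base in rep_baselines.items()
            for k in range(cells_per_rep)]


# Replicates differ a lot from each other; the group shift (0.5) is small
# relative to that replicate-to-replicate spread.
control = make_cells({"c1": 5.0, "c2": 6.0, "c3": 7.0})
case = make_cells({"k1": 5.5, "k2": 6.5, "k3": 7.5})

# Treating every cell as an independent sample (n = 180 per group).
t_cells = welch_t([v for _, v in case], [v for _, v in control])

# Testing on replicate-level means (n = 3 per group).
t_reps = welch_t(list(pseudobulk(case).values()),
                 list(pseudobulk(control).values()))
```

Here the cell-level statistic is large because the 180 cells per group are treated as independent, while the replicate-level statistic is small because only three independent samples per group actually exist.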


JIND: Joint Integration and Discrimination for Automated Single-Cell Annotation

10.1101/2020.10.06.327601 ◽

2020 ◽

Author(s):

Mohit Goyal ◽

Guillermo Serrano ◽

Ilan Shomorony ◽

Mikel Hernaez ◽

Idoia Ochoa

Keyword(s):

Single Cell ◽

Cell Types ◽

Marker Genes ◽

Specific Marker ◽

Rna Seq ◽

Batch Effects ◽

Cell Type ◽

Latent Space ◽

Cell Type Specific ◽

Low Dimensional

Abstract Single-cell RNA-seq is a powerful tool in the study of the cellular composition of different tissues and organisms. A key step in the analysis pipeline is the annotation of cell-types based on the expression of specific marker genes. Since manual annotation is labor-intensive and does not scale to large datasets, several methods for automated cell-type annotation have been proposed based on supervised learning. However, these methods generally require feature extraction and batch alignment prior to classification, and their performance may become unreliable in the presence of cell-types with very similar transcriptomic profiles, such as differentiating cells. We propose JIND, a framework for automated cell-type identification based on neural networks that directly learns a low-dimensional representation (latent code) in which cell-types can be reliably determined. To account for batch effects, JIND performs a novel asymmetric alignment in which the transcriptomic profile of unseen cells is mapped onto the previously learned latent space, hence avoiding the need to retrain the model whenever a new dataset becomes available. JIND also learns cell-type-specific confidence thresholds to identify and reject cells that cannot be reliably classified. We show on datasets with and without batch effects that JIND classifies cells more accurately than previously proposed methods while rejecting only a small proportion of cells. Moreover, JIND batch alignment is parallelizable, being more than five or six times faster than Seurat integration. Availability: https://github.com/mohit1997/JIND.
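The rejection step described above, where a cell is left unclassified if the classifier is not confident enough for the predicted type, can be sketched as follows. This is not JIND's code or API: the function name, the "Unassigned" label, and the threshold values are hypothetical, and the thresholds are hard-coded here rather than learned.

```python
def classify_with_rejection(probs, labels, thresholds, reject_label="Unassigned"):
    """Return the most probable label, or reject_label if its probability
    falls below the threshold set for that particular cell type.

    probs      : class probabilities for one cell (same order as labels)
    labels     : cell-type names
    thresholds : per-cell-type minimum confidence (assumed given, not learned)
    """
    best = max(range(len(probs)), key=lambda k: probs[k])
    if probs[best] < thresholds[best]:
        return reject_label
    return labels[best]


labels = ["B cell", "T cell", "Monocyte"]
# A stricter threshold for T cells than for the other two types (made-up values).
thresholds = [0.6, 0.9, 0.6]

confident = classify_with_rejection([0.7, 0.2, 0.1], labels, thresholds)
rejected = classify_with_rejection([0.1, 0.8, 0.1], labels, thresholds)
```

Using a separate threshold per cell type, rather than one global cutoff, lets types that are easily confused with others demand more confidence before a label is assigned.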


CellO: Comprehensive and hierarchical cell type classification of human cells with the Cell Ontology

10.1101/634097 ◽

2019 ◽

Cited By ~ 1

Author(s):

Matthew N. Bernstein ◽

Zhongjie Ma ◽

Michael Gleicher ◽

Colin N. Dewey

Keyword(s):

Single Cell ◽

Web Application ◽

Cell Types ◽

Rna Seq ◽

Cell Type ◽

Training Set ◽

Sequence Read Archive ◽

Cell Ontology ◽

Cell Type Specific ◽

Type Classification

Summary Cell type annotation is a fundamental task in the analysis of single-cell RNA-sequencing data. In this work, we present CellO, a machine learning-based tool for annotating human RNA-seq data with the Cell Ontology. CellO enables accurate and standardized cell type classification by considering the rich hierarchical structure of known cell types, a source of prior knowledge that is not utilized by existing methods. Furthermore, CellO comes pre-trained on a novel, comprehensive dataset of human, healthy, untreated primary samples in the Sequence Read Archive, which to the best of our knowledge, is the most diverse curated collection of primary cell data to date. CellO’s comprehensive training set enables it to run out-of-the-box on diverse cell types and achieves superior or competitive performance when compared to existing state-of-the-art methods. Lastly, CellO’s linear models are easily interpreted, thereby enabling exploration of cell type-specific expression signatures across the ontology. To this end, we also present the CellO Viewer: a web application for exploring CellO’s models across the ontology. Highlights (1) We present CellO, a tool for hierarchically classifying cell type from single-cell RNA-seq data against the graph-structured Cell Ontology. (2) CellO is pre-trained on a comprehensive dataset comprising nearly all bulk RNA-seq primary cell samples in the Sequence Read Archive. (3) CellO achieves superior or comparable performance with existing methods while featuring a more comprehensive pre-packaged training set. (4) CellO is built with easily interpretable models which we expose through a novel web application, the CellO Viewer, for exploring cell type-specific signatures across the Cell Ontology.


ICTD: A semi-supervised cell type identification and deconvolution method for multi-omics data

10.1101/426593 ◽

2018 ◽

Cited By ~ 2

Author(s):

Wennan Chang ◽

Changlin Wan ◽

Xiaoyu Lu ◽

Szu-wei Tu ◽

Yifan Sun ◽

...

Keyword(s):

Single Cell ◽

Cell Types ◽

Training Data ◽

Marker Genes ◽

Cell Detection ◽

Omics Data ◽

Deconvolution Method ◽

Cell Type ◽

Data Set ◽

Cell Type Specific

Abstract We developed a novel deconvolution method, Inference of Cell Types and Deconvolution (ICTD), that addresses the fundamental issues of identifiability and robustness in current tissue data deconvolution problems. ICTD provides substantially new capabilities for omics data based characterization of a tissue microenvironment, including (1) maximizing the resolution in identifying resident cell types and subtypes that truly exist in a tissue, (2) identifying the most reliable marker genes for each cell type, which are tissue and data set specific, (3) handling the stability problem with co-linear cell types, (4) co-deconvoluting with available matched multi-omics data, and (5) inferring functional variations specific to one or several cell types. ICTD is empowered by (i) rigorously derived mathematical conditions of identifiable cell type and cell type specific functions in tissue transcriptomics data, (ii) a semi-supervised approach to maximize the knowledge transfer of cell type and functional marker genes identified in single cell or bulk cell data in the analysis of tissue data, and (iii) a novel unsupervised approach to minimize the bias brought by training data. Application of ICTD on real and single cell simulated tissue data validated that the method has consistently good performance for tissue data coming from different species, tissue microenvironments, and experimental platforms. Other than the new capabilities, ICTD outperformed other state-of-the-art deconvolution methods on prediction accuracy, the resolution of identifiable cell, detection of unknown sub cell types, and assessment of cell type specific functions. The premise of ICTD also lies in characterizing cell-cell interactions and discovering cell types and prognostic markers that are predictive of clinical outcomes.


Integrated Single Cell Atlas of Endothelial Cells of the Human Lung

Circulation ◽

10.1161/circulationaha.120.052318 ◽

2021 ◽

Author(s):

Jonas C. Schupp ◽

Taylor S. Adams ◽

Carlos Cosme Jr. ◽

Micha Sam Brickman Raredon ◽

Yifan Yuan ◽

...

Keyword(s):

Endothelial Cells ◽

Pulmonary Hypertension ◽

Single Cell ◽

Differential Expression ◽

Human Lung ◽

Differential Expression Analysis ◽

Cell Types ◽

Marker Genes ◽

Lung Endothelium ◽

Lung Endothelial Cells

Background: The cellular diversity of the lung endothelium has not been systematically characterized in humans. Here, we provide a reference atlas of human lung endothelial cells (ECs) to facilitate a better understanding of the phenotypic diversity and composition of cells comprising the lung endothelium. Methods: We reprocessed human control single cell RNA sequencing (scRNAseq) data from six datasets. EC populations were characterized through iterative clustering with subsequent differential expression analysis. Marker genes were validated by fluorescent microscopy and in situ hybridization. scRNAseq of primary lung ECs cultured in-vitro was performed. The signaling network between different lung cell types was studied. For cross species analysis or disease relevance, we applied the same methods to scRNAseq data obtained from mouse lungs or from human lungs with pulmonary hypertension. Results: Six lung scRNAseq datasets were reanalyzed and annotated to identify over 15,000 vascular EC cells from 73 individuals. Differential expression analysis of EC revealed signatures corresponding to endothelial lineage, including pan-endothelial, pan-vascular and subpopulation-specific marker gene sets. Beyond the broad cellular categories of lymphatic, capillary, arterial and venous ECs, we found previously indistinguishable subpopulations: among venous EC, we identified two previously indistinguishable populations, pulmonary-venous ECs (COL15A1neg) localized to the lung parenchyma and systemic-venous ECs (COL15A1pos) localized to the airways and the visceral pleura; among capillary EC, we confirmed their subclassification into recently discovered aerocytes characterized by EDNRB, SOSTDC1 and TBX2 and general capillary EC. We confirmed that all six endothelial cell types, including the systemic-venous EC and aerocytes, are present in mice and identified endothelial marker genes conserved in humans and mice. 
Ligand-receptor connectome analysis revealed important homeostatic crosstalk of EC with other lung resident cell types. scRNAseq of commercially available primary lung ECs demonstrated a loss of their native lung phenotype in culture. scRNAseq revealed that the endothelial diversity is maintained in pulmonary hypertension. Our manuscript is accompanied by an online data mining tool (www.LungEndothelialCellAtlas.com). Conclusions: Our integrated analysis provides the comprehensive and well-crafted reference atlas of lung endothelial cells in the normal lung and confirms and describes in detail previously unrecognized endothelial populations across a large number of humans and mice.


Single-cell RNA sequencing reveals cell type- and artery type-specific vascular remodelling in male spontaneously hypertensive rats

Cardiovascular Research ◽

10.1093/cvr/cvaa164 ◽

2020 ◽

Cited By ~ 1

Author(s):

Jun Cheng ◽

Wenduo Gu ◽

Ting Lan ◽

Jiacheng Deng ◽

Zhichao Ni ◽

...

Keyword(s):

Single Cell ◽

Rna Sequencing ◽

Spontaneously Hypertensive Rats ◽

Cell Types ◽

Vascular Remodelling ◽

Cell Type ◽

Hypertensive Rats ◽

Spontaneously Hypertensive ◽

Single Cell Rna Sequencing ◽

Cell Type Specific

Abstract Aims Hypertension is a major risk factor for cardiovascular diseases. However, vascular remodelling, a hallmark of hypertension, has not been systematically characterized yet. We described systematic vascular remodelling, especially the artery type- and cell type-specific changes, in hypertension using spontaneously hypertensive rats (SHRs). Methods and results Single-cell RNA sequencing was used to depict the cell atlas of mesenteric artery (MA) and aortic artery (AA) from SHRs. More than 20 000 cells were included in the analysis. The number of immune cells more than doubled in aortic aorta in SHRs compared to Wistar Kyoto controls, whereas an expansion of MA mesenchymal stromal cells (MSCs) was observed in SHRs. Comparison of corresponding artery types and cell types identified in integrated datasets unravels dysregulated genes specific for artery types and cell types. Intersection of dysregulated genes with curated gene sets including cytokines, growth factors, extracellular matrix (ECM), receptors, etc. revealed vascular remodelling events involving cell–cell interaction and ECM re-organization. Particularly, AA remodelling encompasses upregulated cytokine genes in smooth muscle cells, endothelial cells, and especially MSCs, whereas in MA, change of genes involving the contractile machinery and downregulation of ECM-related genes were more prominent. Macrophages and T cells within the aorta demonstrated significant dysregulation of cellular interaction with vascular cells. Conclusion Our findings provide the first cell landscape of resistant and conductive arteries in hypertensive animal models. Moreover, it also offers a systematic characterization of the dysregulated gene profiles with unbiased, artery type-specific and cell type-specific manners during hypertensive vascular remodelling.


B-Cell and Monocyte Contribution to Systemic Lupus Erythematosus Identified by Cell-Type-Specific Differential Expression Analysis in RNA-Seq Data

Bioinformatics and Biology Insights ◽

10.4137/bbi.s29470 ◽

2015 ◽

Vol 9s3 ◽

pp. BBI.S29470 ◽

Cited By ~ 6

Author(s):

Mikhail G. Dozmorov ◽

Nicolas Dominguez ◽

Krista Bean ◽

Susan R. Macwana ◽

Virginia Roberts ◽

...

Keyword(s):

Gene Expression ◽

B Cell ◽

Differential Expression ◽

Expression Analysis ◽

Lupus Erythematosus ◽

Differential Expression Analysis ◽

Specific Gene ◽

Cell Type ◽

Specific Gene Expression ◽

Cell Type Specific

Systemic lupus erythematosus (SLE) is an autoimmune disease characterized by complex interplay among immune cell types. SLE activity is experimentally assessed by several blood tests, including gene expression profiling of heterogeneous populations of cells in peripheral blood. To better understand the contribution of different cell types in SLE pathogenesis, we applied the two methods in cell-type-specific differential expression analysis, csSAM and DSection, to identify cell-type-specific gene expression differences in heterogeneous gene expression measures obtained using RNA-seq technology. We identified B-cell-, monocyte-, and neutrophil-specific gene expression differences. Immunoglobulin-coding gene expression was altered in B-cells, while a ribosomal signature was prominent in monocytes. On the contrary, genes differentially expressed in the heterogeneous mixture of cells did not show any functional enrichment. Our results identify antigen binding and structural constituents of ribosomes as functions altered by B-cell- and monocyte-specific gene expression differences, respectively. Finally, these results position both csSAM and DSection methods as viable techniques for cell-type-specific differential expression analysis, which may help uncover pathogenic, cell-type-specific processes in SLE.
