ClusterMap: compare multiple single cell RNA-Seq datasets across different experimental conditions

2019 ◽  
Vol 35 (17) ◽  
pp. 3038-3045 ◽  
Author(s):  
Xin Gao ◽  
Deqing Hu ◽  
Madelaine Gogol ◽  
Hua Li

Abstract Motivation Single cell RNA-Seq (scRNA-Seq) facilitates the characterization of cell type heterogeneity and developmental processes. Further study of single cell profiles across different conditions enables the understanding of biological processes and underlying mechanisms at the sub-population level. However, developing proper methodology to compare multiple scRNA-Seq datasets remains challenging. Results We have developed ClusterMap, a systematic method and workflow to facilitate the comparison of scRNA-seq profiles across distinct biological contexts. Using hierarchical clustering of the marker genes of each sub-group, ClusterMap matches the sub-types of cells across different samples and provides ‘similarity’ as a metric to quantify the quality of the match. We introduce a purity tree cut method designed specifically for this matching problem. We use a Circos plot and a regrouping method to visualize the results concisely. Furthermore, we propose a new metric, ‘separability’, to summarize sub-population changes among all sample pairs. In the case studies, we demonstrate that ClusterMap can provide further insight into the different molecular mechanisms of cellular sub-populations across different conditions. Availability and implementation ClusterMap is implemented in R and available at https://github.com/xgaoo/ClusterMap. Supplementary information Supplementary data are available at Bioinformatics online.
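
The matching idea in the abstract above can be illustrated with a small sketch. This is not ClusterMap's algorithm: the abstract describes hierarchical clustering of marker genes followed by a purity tree cut, whereas the sketch below swaps in a much simpler best-overlap matching based on Jaccard similarity of marker-gene sets. The function names, the `min_similarity` cutoff, and the toy marker lists are all hypothetical.

```python
def jaccard(genes_a, genes_b):
    """Jaccard similarity between two marker-gene lists."""
    a, b = set(genes_a), set(genes_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_clusters(markers_a, markers_b, min_similarity=0.2):
    """For each cluster in sample A, find the most similar cluster in sample B.

    Simplified stand-in for tree-based matching: each A cluster is paired
    with its best-overlapping B cluster, or with None when the best
    similarity falls below min_similarity (a hypothetical cutoff).
    """
    matches = {}
    for name_a, genes_a in markers_a.items():
        best_name, best_score = None, 0.0
        for name_b, genes_b in markers_b.items():
            score = jaccard(genes_a, genes_b)
            if score > best_score:
                best_name, best_score = name_b, score
        if best_score < min_similarity:
            best_name = None
        matches[name_a] = (best_name, best_score)
    return matches


# Toy marker lists for clusters found in two samples (illustrative only).
control = {"c0": ["CD3D", "CD3E", "IL7R"], "c1": ["MS4A1", "CD79A", "CD79B"]}
treated = {
    "t0": ["CD79A", "MS4A1", "CD74"],
    "t1": ["CD3D", "CD3E", "CCR7"],
    "t2": ["HBB", "HBA1"],
}
matches = match_clusters(control, treated)
```

Here `t2` has no counterpart in the control sample, which is the kind of sample-specific sub-population a cross-condition comparison is meant to surface.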

2018 ◽  
Author(s):  
Xin Gao ◽  
Deqing Hu ◽  
Madelaine Gogol ◽  
Hua Li

Abstract Single cell RNA-Seq facilitates the characterization of cell type heterogeneity and developmental processes. Further study of single cell profiles across different conditions enables the understanding of biological processes and underlying mechanisms at the sub-population level. However, developing proper methodology to compare multiple scRNA-Seq datasets remains challenging. We have developed ClusterMap, a systematic method and workflow to facilitate the comparison of scRNA-Seq profiles across distinct biological contexts. Using hierarchical clustering of the marker genes of each sub-group, ClusterMap matches the sub-types of cells across different samples and provides “similarity” as a metric to quantify the quality of the match. We introduce a purity tree cut method designed specifically for this matching problem. We use a Circos plot and a regrouping method to visualize the results concisely. Furthermore, we propose a new metric, “separability”, to summarize sub-population changes among all sample pairs. In three case studies, we demonstrate that ClusterMap can offer further insight into the different molecular mechanisms of cellular sub-populations across different conditions. ClusterMap is implemented in R and available at https://github.com/xgaoo/ClusterMap.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Momoko Hamano ◽  
Seitaro Nomura ◽  
Midori Iida ◽  
Issei Komuro ◽  
Yoshihiro Yamanishi

Abstract Heart failure is a heterogeneous disease with multiple risk factors and various pathophysiological types, which makes it difficult to understand the molecular mechanisms involved. In this study, we proposed a trans-omics approach for predicting molecular pathological mechanisms of heart failure and identifying marker genes to distinguish heterogeneous phenotypes, by integrating multiple omics data including single-cell RNA-seq, ChIP-seq, and gene interactome data. We detected a significant increase in the expression level of natriuretic peptide A (Nppa) after stress loading with transverse aortic constriction (TAC), and showed that cardiomyocytes with high Nppa expression displayed specific gene expression patterns. Multiple members of the NADH ubiquinone complex family, which are associated with the mitochondrial electron transport system, were negatively correlated with Nppa expression during the early stages of cardiac hypertrophy. Large-scale ChIP-seq data analysis showed that Nkx2-5 and Gtf2b were transcription factors characteristic of high-Nppa-expressing cardiomyocytes. Nppa expression levels may, therefore, represent a useful diagnostic marker for heart failure.


Author(s):  
Yiheng Peng ◽  
Huanyu Qiao

Meiosis is a cellular division process that produces gametes for sexual reproduction. Disruption of complex events throughout meiosis, such as synapsis and homologous recombination, can lead to infertility and aneuploidy. To reveal the molecular mechanisms of these events, transcriptome studies of specific substages must be conducted. However, conventional methods, such as bulk RNA-seq and RT-qPCR, cannot detect transcriptional variations effectively and precisely, especially when identifying cell types and stages with subtle differences. In recent years, mammalian meiotic transcriptomes have been intensively studied at the single-cell level by using single-cell RNA-seq (scRNA-seq) approaches, especially through two widely used platforms, Smart-seq2 and Drop-seq. The scRNA-seq protocols, along with their downstream analysis, enable researchers to accurately identify cell heterogeneity and investigate meiotic transcriptomes at a higher resolution. In this review, we compared bulk RNA-seq and scRNA-seq to show the advantages of scRNA-seq in meiosis studies; we also pointed out the challenges and limitations of scRNA-seq. We listed recent findings from mammalian meiosis (male and female) studies in which scRNA-seq was applied. Next, we summarized the scRNA-seq analysis methods and the meiotic marker genes from spermatocytes and oocytes. Specifically, we emphasized the different features of the two scRNA-seq protocols (Smart-seq2 and Drop-seq) in the context of meiosis studies and discussed their strengths and weaknesses for different research purposes. Finally, we discussed the future applications of scRNA-seq in the meiosis field.


Author(s):  
Irzam Sarfraz ◽  
Muhammad Asif ◽  
Joshua D Campbell

Abstract Motivation R Experiment objects such as the SummarizedExperiment or SingleCellExperiment are data containers for storing one or more matrix-like assays along with associated row and column data. These objects have been used to facilitate the storage and analysis of high-throughput genomic data generated from technologies such as single-cell RNA sequencing. One common computational task in many genomics analysis workflows is to perform subsetting of the data matrix before applying down-stream analytical methods. For example, one may need to subset the columns of the assay matrix to exclude poor-quality samples or subset the rows of the matrix to select the most variable features. Traditionally, a second object is created that contains the desired subset of assay from the original object. However, this approach is inefficient as it requires the creation of an additional object containing a copy of the original assay and leads to challenges with data provenance. Results To overcome these challenges, we developed an R package called ExperimentSubset, which is a data container that implements classes for efficient storage and streamlined retrieval of assays that have been subsetted by rows and/or columns. These classes are able to inherently provide data provenance by maintaining the relationship between the subsetted and parent assays. We demonstrate the utility of this package on a single-cell RNA-seq dataset by storing and retrieving subsets at different stages of the analysis while maintaining a lower memory footprint. Overall, the ExperimentSubset is a flexible container for the efficient management of subsets. Availability and implementation ExperimentSubset package is available at Bioconductor: https://bioconductor.org/packages/ExperimentSubset/ and Github: https://github.com/campbio/ExperimentSubset. Supplementary information Supplementary data are available at Bioinformatics online.
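
The core idea described above, storing a subset as row and column indices that point back to a parent assay rather than as a copied matrix, can be sketched in a few lines. The sketch is in Python rather than R and does not reproduce the ExperimentSubset API; the class name `Experiment` and the methods `create_subset`, `get` and `lineage` are hypothetical names chosen for illustration.

```python
class Experiment:
    """Toy container: full assays are stored once, subsets only as indices."""

    def __init__(self, assays):
        self.assays = dict(assays)   # name -> matrix (list of rows)
        self.subsets = {}            # name -> (parent name, row idx, col idx)

    def create_subset(self, name, parent, rows, cols):
        """Record a subset of `parent` (an assay or another subset) by index."""
        if parent not in self.assays and parent not in self.subsets:
            raise KeyError(f"unknown parent: {parent}")
        self.subsets[name] = (parent, list(rows), list(cols))

    def get(self, name):
        """Materialize an assay or subset on demand by walking up to the parent."""
        if name in self.assays:
            return self.assays[name]
        parent, rows, cols = self.subsets[name]
        matrix = self.get(parent)
        return [[matrix[r][c] for c in cols] for r in rows]

    def lineage(self, name):
        """Provenance chain from a subset back to its root assay."""
        chain = [name]
        while chain[-1] in self.subsets:
            chain.append(self.subsets[chain[-1]][0])
        return chain


exp = Experiment({"counts": [[1, 2, 3], [4, 5, 6], [7, 8, 9]]})
exp.create_subset("filtered", "counts", rows=[0, 2], cols=[1, 2])
exp.create_subset("top", "filtered", rows=[1], cols=[0, 1])
```

Because each subset keeps only its parent's name and two index lists, the provenance chain is available for free and no second copy of the assay is held in memory until a subset is actually retrieved.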


Author(s):  
Yixuan Qiu ◽  
Jiebiao Wang ◽  
Jing Lei ◽  
Kathryn Roeder

Abstract Motivation Marker genes, defined as genes that are expressed primarily in a single cell type, can be identified from the single cell transcriptome; however, such data are not always available for the many uses of marker genes, such as deconvolution of bulk tissue. Marker genes for a cell type, however, are highly correlated in bulk data, because their expression levels depend primarily on the proportion of that cell type in the samples. Therefore, when many tissue samples are analyzed, it is possible to identify these marker genes from the correlation pattern. Results To capitalize on this pattern, we develop a new algorithm to detect marker genes by combining published information about likely marker genes with bulk transcriptome data in the form of a semi-supervised algorithm. The algorithm then exploits the correlation structure of the bulk data to refine the published marker genes by adding or removing genes from the list. Availability and implementation We implement this method as an R package markerpen, hosted on CRAN (https://CRAN.R-project.org/package=markerpen). Supplementary information Supplementary data are available at Bioinformatics online.
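
The observation that marker genes of one cell type co-vary across bulk samples can be illustrated with a deliberately simple sketch. This is not the markerpen algorithm: it only scores every gene by its Pearson correlation with the mean profile of a published seed list and keeps the genes above a cutoff, so that seed genes can drop out and new genes can enter. The function names, the 0.8 threshold and the toy expression values are assumptions made for the example.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


def refine_markers(bulk, seed_markers, threshold=0.8):
    """Keep genes whose bulk profile tracks the mean profile of the seed list.

    bulk maps gene -> expression values across bulk samples. Seed genes that
    do not follow the shared pattern are removed, and non-seed genes that do
    follow it are added. The threshold is a hypothetical cutoff.
    """
    n_samples = len(next(iter(bulk.values())))
    profile = [
        sum(bulk[g][i] for g in seed_markers) / len(seed_markers)
        for i in range(n_samples)
    ]
    return sorted(g for g, values in bulk.items()
                  if pearson(values, profile) >= threshold)


# Toy bulk data over five samples (illustrative values only).
bulk = {
    "G1": [1.0, 2.0, 3.0, 4.0, 5.0],    # seed, follows the cell-type proportion
    "G2": [2.0, 4.0, 6.0, 8.0, 10.5],   # seed, follows the cell-type proportion
    "G5": [3.0, 3.2, 2.9, 3.1, 3.0],    # seed, but flat across samples
    "G3": [1.1, 2.1, 2.9, 4.2, 5.1],    # not in the seed list, yet co-varies
    "G4": [5.0, 1.0, 4.0, 2.0, 3.0],    # unrelated gene
}
refined = refine_markers(bulk, seed_markers=["G1", "G2", "G5"])
```

In this toy run the flat seed gene `G5` is dropped and the co-varying gene `G3` is added, which mirrors the "adding or removing genes" step described in the abstract.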


Author(s):  
Yang Xu ◽  
Priyojit Das ◽  
Rachel Patton McCord

Abstract Motivation Deep learning approaches have empowered single-cell omics data analysis in many ways and generated new insights from complex cellular systems. As there is an increasing need for single cell omics data to be integrated across sources, types, and features of data, the challenges of integrating single-cell omics data are rising. Here, we present an unsupervised deep learning algorithm that learns discriminative representations for single-cell data via maximizing mutual information, SMILE (Single-cell Mutual Information Learning). Results Using a unique cell-pairing design, SMILE successfully integrates multi-source single-cell transcriptome data, removing batch effects and projecting similar cell types, even from different tissues, into the shared space. SMILE can also integrate data from two or more modalities, such as joint profiling technologies using single-cell ATAC-seq, RNA-seq, DNA methylation, Hi-C, and ChIP data. When paired cells are known, SMILE can integrate data with unmatched features, such as genes for RNA-seq and genome-wide peaks for ATAC-seq. Integrated representations learned from joint profiling technologies can then be used as a framework for comparing independent single source data. Supplementary information Supplementary data are available at Bioinformatics online. The source code of SMILE, including analyses of key results in the study, can be found at: https://github.com/rpmccordlab/SMILE.


Author(s):  
Tongbin Wu ◽  
Zhengyu Liang ◽  
Zengming Zhang ◽  
Canzhao Liu ◽  
Lunfeng Zhang ◽  
...  

Background: Left ventricular noncompaction cardiomyopathy (LVNC) was discovered half a century ago as a cardiomyopathy with excessive trabeculation and a thin ventricular wall. In the decades since, numerous studies have demonstrated that LVNC primarily impacts left ventricles (LVs), and is often associated with LV dilation and dysfunction. However, owing in part to the lack of suitable mouse models that faithfully mirror the selective LV vulnerability in patients, mechanisms underlying susceptibility of LV to dilation and dysfunction in LVNC remain unknown. Genetic studies have revealed that deletions and mutations in PRDM16 cause LVNC, but previous conditional Prdm16 knockout mouse models do not mirror the LVNC phenotype in patients, and importantly, the underlying molecular mechanisms by which PRDM16 deficiency causes LVNC are still unclear. Methods: Prdm16 cardiomyocyte (CM)-specific knockout (Prdm16 cKO) mice were generated and analyzed for cardiac phenotypes. RNA sequencing and ChIP sequencing were performed to identify direct transcriptional targets of PRDM16 in CMs. Single cell RNA sequencing in combination with Spatial Transcriptomics was employed to determine CM identity at the single cell level. Results: CM-specific ablation of Prdm16 in mice caused LV-specific dilation and dysfunction, as well as biventricular noncompaction, which fully recapitulated LVNC in patients. Mechanistically, PRDM16 functioned as a compact myocardium-enriched transcription factor, which activated compact myocardial genes while repressing trabecular myocardial genes in LV compact myocardium. Consequently, Prdm16 cKO LV compact myocardial CMs shifted from their normal transcriptomic identity to a transcriptional signature resembling trabecular myocardial CMs and/or neurons. Chamber-specific transcriptional regulation by PRDM16 was in part due to its cooperation with LV-enriched transcription factors Tbx5 and Hand1.
Conclusions: These results demonstrate that disruption of proper specification of compact CM may play a key role in the pathogenesis of LVNC. They also shed light on underlying mechanisms of LV-restricted transcriptional program governing LV chamber growth and maturation, providing a tangible explanation for the susceptibility of LV in a subset of LVNC cardiomyopathies.


2020 ◽  
Author(s):  
Mohit Goyal ◽  
Guillermo Serrano ◽  
Ilan Shomorony ◽  
Mikel Hernaez ◽  
Idoia Ochoa

Abstract Single-cell RNA-seq is a powerful tool in the study of the cellular composition of different tissues and organisms. A key step in the analysis pipeline is the annotation of cell-types based on the expression of specific marker genes. Since manual annotation is labor-intensive and does not scale to large datasets, several methods for automated cell-type annotation have been proposed based on supervised learning. However, these methods generally require feature extraction and batch alignment prior to classification, and their performance may become unreliable in the presence of cell-types with very similar transcriptomic profiles, such as differentiating cells. We propose JIND, a framework for automated cell-type identification based on neural networks that directly learns a low-dimensional representation (latent code) in which cell-types can be reliably determined. To account for batch effects, JIND performs a novel asymmetric alignment in which the transcriptomic profile of unseen cells is mapped onto the previously learned latent space, hence avoiding the need to retrain the model whenever a new dataset becomes available. JIND also learns cell-type-specific confidence thresholds to identify and reject cells that cannot be reliably classified. We show on datasets with and without batch effects that JIND classifies cells more accurately than previously proposed methods while rejecting only a small proportion of cells. Moreover, JIND batch alignment is parallelizable, being more than five or six times faster than Seurat integration. Availability: https://github.com/mohit1997/JIND.
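
The rejection step mentioned in the abstract above, where a cell is left unlabelled when the classifier is not confident enough for the predicted type, can be sketched as follows. This is only an illustration of per-class thresholding, not JIND's implementation: the thresholds are supplied by hand here, whereas JIND learns them, and the function name, the `"Unassigned"` label and the `default_threshold` value are hypothetical.

```python
def classify_with_rejection(probabilities, thresholds, default_threshold=0.5):
    """Return (label, confidence), rejecting low-confidence predictions.

    probabilities maps cell type -> predicted probability for one cell.
    thresholds maps cell type -> minimum confidence required to accept it.
    """
    label = max(probabilities, key=probabilities.get)
    confidence = probabilities[label]
    if confidence < thresholds.get(label, default_threshold):
        return "Unassigned", confidence
    return label, confidence


# Hand-set thresholds for illustration; a stricter cutoff for T cells.
thresholds = {"T cell": 0.7, "B cell": 0.6}
ambiguous = classify_with_rejection({"T cell": 0.55, "B cell": 0.45}, thresholds)
confident = classify_with_rejection({"T cell": 0.2, "B cell": 0.8}, thresholds)
```

Using a separate threshold per cell type lets closely related types, such as differentiating cells, demand higher confidence than well-separated ones.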


2017 ◽  
Author(s):  
Zhun Miao ◽  
Ke Deng ◽  
Xiaowo Wang ◽  
Xuegong Zhang

Abstract Summary The excess zeros in single-cell RNA-seq data include “real” zeros due to the on-off nature of gene transcription in single cells and “dropout” zeros due to technical reasons. Existing differential expression (DE) analysis methods cannot distinguish these two types of zeros. We developed an R package, DEsingle, which employs a Zero-Inflated Negative Binomial model to estimate the proportion of real and dropout zeros and to define and detect 3 types of DE genes in single-cell RNA-seq data with higher accuracy. Availability and Implementation The R package DEsingle is freely available at https://github.com/miaozhun/DEsingle and is under Bioconductor’s consideration. [email protected] Supplementary information Supplementary data are available at bioRxiv online.
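
A zero-inflated negative binomial (ZINB) model mixes a point mass at zero with an ordinary negative binomial count distribution. With mixing weight $\pi$, mean $\mu$ and size (dispersion) $r$, the standard form is $P(X=0) = \pi + (1-\pi)\,\mathrm{NB}(0;\mu,r)$ and $P(X=k) = (1-\pi)\,\mathrm{NB}(k;\mu,r)$ for $k>0$. The sketch below evaluates this textbook distribution only; it does not reproduce DEsingle's parameter estimation or its DE tests, and the function names and parameter values are assumptions for the example.

```python
import math


def nb_pmf(k, mu, size):
    """Negative binomial pmf parameterized by mean mu > 0 and size > 0."""
    log_coef = math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
    log_prob = (size * math.log(size / (size + mu))
                + k * math.log(mu / (size + mu)))
    return math.exp(log_coef + log_prob)


def zinb_pmf(k, pi, mu, size):
    """ZINB pmf: extra point mass pi at zero, NB counts with weight 1 - pi."""
    base = (1.0 - pi) * nb_pmf(k, mu, size)
    return pi + base if k == 0 else base


def excess_zero_fraction(pi, mu, size):
    """Share of all expected zeros that come from the zero-inflation part."""
    return pi / zinb_pmf(0, pi, mu, size)


p_zero = zinb_pmf(0, pi=0.5, mu=1.0, size=1.0)
share = excess_zero_fraction(pi=0.5, mu=1.0, size=1.0)
```

With these toy parameters the negative binomial part alone gives a zero with probability 0.5, so the overall zero probability is 0.75 and two thirds of the zeros are attributed to the zero-inflation component.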


2018 ◽  
Vol 34 (12) ◽  
pp. 2077-2086 ◽  
Author(s):  
Suoqin Jin ◽  
Adam L MacLean ◽  
Tao Peng ◽  
Qing Nie

Abstract Motivation Single-cell RNA-sequencing (scRNA-seq) offers unprecedented resolution for studying cellular decision-making processes. Robust inference of cell state transition paths and probabilities is an important yet challenging step in the analysis of these data. Results Here we present scEpath, an algorithm that calculates energy landscapes and probabilistic directed graphs in order to reconstruct developmental trajectories. We quantify the energy landscape using ‘single-cell energy’ and distance-based measures, and find that the combination of these enables robust inference of the transition probabilities and lineage relationships between cell states. We also identify marker genes and gene expression patterns associated with cell state transitions. Our approach produces pseudotemporal orderings that are—in combination—more robust and accurate than current methods, and offers higher resolution dynamics of the cell state transitions, leading to new insight into key transition events during differentiation and development. Moreover, scEpath is robust to variation in the size of the input gene set, and is broadly unsupervised, requiring few parameters to be set by the user. Applications of scEpath led to the identification of a cell-cell communication network implicated in early human embryo development, and novel transcription factors important for myoblast differentiation. scEpath allows us to identify common and specific temporal dynamics and transcriptional factor programs along branched lineages, as well as the transition probabilities that control cell fates. Availability and implementation A MATLAB package of scEpath is available at https://github.com/sqjin/scEpath. Supplementary information Supplementary data are available at Bioinformatics online.

