XCVATR: Detection and Characterization of Variant Impact on the Embeddings of Single-Cell and Bulk RNA-Sequencing Samples

2021 ◽  
Author(s):  
Arif Ozgun Harmanci ◽  
Akdes Serin Harmanci ◽  
Tiemo Klisch ◽  
Akash J Patel

Gene expression profiling via RNA-sequencing has become the standard for measuring and analyzing gene activity in bulk and at the single-cell level. Increasing sample sizes and cell counts provide substantial information about the transcriptional architecture of samples. In addition to quantifying expression at the cellular level, RNA-seq can be used to detect variants, including single nucleotide variants, small insertions/deletions, and large variants such as copy number variants. The joint analysis of variants with the transcriptional state of cells or samples can provide insight into the impact of mutations. To provide a comprehensive method for jointly analyzing genetic variants and cellular states, we introduce XCVATR, a method that can identify variants and detect local enrichment of expressed variants within an embedding of samples and cells. The embeddings provide information about cellular states by defining a cell-cell distance metric. Unlike clustering algorithms, which depend on a cell-cell distance and use it to define clusters that explain cells globally, XCVATR detects the local enrichment of expressed variants in the embedding space, and the embedding can be computed using any type of measurement or method, for example by PCA or tSNE of the expression levels. In other words, XCVATR searches for patterns of association between each variant and the positions of cells in an embedding of the cells. XCVATR also visualizes the local clumps of small and large-scale variant calls in single-cell and bulk RNA-sequencing datasets. We perform simulations to demonstrate that XCVATR can identify the enrichments of expressed variants, and we demonstrate its application on several single-cell and bulk RNA-seq datasets.
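The idea of local enrichment in an embedding can be illustrated with a small sketch. The code below is not the XCVATR implementation; it assumes a fixed-radius neighborhood around one cell and a one-sided hypergeometric test, and the names `hypergeom_sf` and `local_enrichment` are hypothetical helpers introduced only for illustration.

```python
import math

def hypergeom_sf(k, N, K, n):
    """P(X >= k) when n cells are drawn from N cells, K of which carry the variant."""
    upper = min(K, n)
    total = math.comb(N, n)
    hits = sum(math.comb(K, i) * math.comb(N - K, n - i)
               for i in range(max(k, 0), upper + 1))
    return hits / total

def local_enrichment(coords, has_variant, center, radius):
    """Count variant-carrying cells within `radius` of cell `center` in the
    embedding and return (carriers nearby, cells nearby, enrichment p-value)."""
    near = [i for i, p in enumerate(coords)
            if math.dist(p, coords[center]) <= radius]
    k = sum(has_variant[i] for i in near)
    p_value = hypergeom_sf(k, len(coords), sum(has_variant), len(near))
    return k, len(near), p_value

# Toy 2-D embedding: four variant-carrying cells clumped together, four without.
coords = [(0, 0), (0.1, 0), (0, 0.1), (0.1, 0.1),
          (5, 5), (5.1, 5), (5, 5.1), (5.1, 5.1)]
has_variant = [1, 1, 1, 1, 0, 0, 0, 0]
print(local_enrichment(coords, has_variant, center=0, radius=0.5))
```

Because the test only looks at a neighborhood, it does not require the cells to be partitioned into clusters first, which is the contrast with clustering drawn in the abstract. The radius and the choice of test are assumptions of this sketch.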

2018 ◽  
Author(s):  
Xianwen Ren ◽  
Liangtao Zheng ◽  
Zemin Zhang

ABSTRACT Clustering is a prevalent means of analyzing single-cell RNA sequencing data, but the rapidly expanding data volume can make this process computationally challenging. New methods for both accurate and efficient clustering are urgently needed. Here we propose a new clustering framework based on random projection and feature construction for large-scale single-cell RNA sequencing data, which greatly improves clustering accuracy, robustness, and computational efficiency for various state-of-the-art algorithms benchmarked on multiple real datasets. On a dataset with 68,578 human blood cells, our method achieved a 20% improvement in clustering accuracy and a 50-fold speedup while consuming only 66% of the memory used by the widely used software package SC3. Compared to k-means, the accuracy improvement can reach 3-fold, depending on the dataset. An R implementation of the framework is available from https://github.com/Japrin/sscClust.
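As a rough illustration of the random-projection step mentioned above, the sketch below reduces each cell's expression vector to a few random directions before any clustering is run. It is not taken from sscClust; the Gaussian projection matrix, the function name `random_projection`, and the `seed` parameter are assumptions made for this example, and the feature-construction step of the framework is not shown.

```python
import random

def random_projection(cells, k, seed=0):
    """Project each cell (a list of d expression values) onto k random directions."""
    rng = random.Random(seed)
    d = len(cells[0])
    scale = 1.0 / k ** 0.5
    # d x k matrix of i.i.d. Gaussian entries; the 1/sqrt(k) scale keeps
    # squared distances unbiased in expectation.
    R = [[rng.gauss(0.0, 1.0) * scale for _ in range(k)] for _ in range(d)]
    return [[sum(x * R[j][c] for j, x in enumerate(cell)) for c in range(k)]
            for cell in cells]

# Two toy cells with six genes each, reduced to three dimensions.
cells = [[1.0, 0.0, 2.0, 0.0, 3.0, 0.0],
         [0.0, 0.0, 0.0, 0.0, 0.0, 0.0]]
low = random_projection(cells, k=3)
print(len(low), len(low[0]))
```

A clustering algorithm such as k-means would then run on the reduced vectors instead of the full expression matrix, which is where the savings in time and memory would come from.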


2017 ◽  
Author(s):  
Bo Wang ◽  
Daniele Ramazzotti ◽  
Luca De Sano ◽  
Junjie Zhu ◽  
Emma Pierson ◽  
...  

Abstract Motivation: We here present SIMLR (Single-cell Interpretation via Multi-kernel LeaRning), an open-source tool that implements a novel framework to learn a cell-to-cell similarity measure from single-cell RNA-seq data. SIMLR can be effectively used to perform tasks such as dimension reduction, clustering, and visualization of heterogeneous populations of cells. SIMLR was benchmarked against state-of-the-art methods for these three tasks on several public datasets, showing it to be scalable and capable of greatly improving clustering performance, as well as providing valuable insights by making the data more interpretable via a better visualization. Availability and Implementation: SIMLR is available on GitHub in both R and MATLAB implementations. Furthermore, it is also available as an R package on [email protected] or [email protected] Supplementary Information: Supplementary data are available at Bioinformatics online.


2020 ◽  
Author(s):  
Lin Li ◽  
Hao Dai ◽  
Zhaoyuan Fang ◽  
Luonan Chen

Abstract The rapid advancement of single-cell technologies has shed new light on the complex mechanisms of cellular heterogeneity. However, compared with bulk RNA sequencing (RNA-seq), single-cell RNA-seq (scRNA-seq) suffers from higher noise and lower coverage, which brings new computational difficulties. Based on statistical independence, the cell-specific network (CSN) is able to quantify the overall associations between genes for each cell, yet it suffers from overestimation related to indirect effects. To overcome this problem, we propose the “conditional cell-specific network” (CCSN) method, which can measure the direct associations between genes by eliminating the indirect associations. CCSN can be used for cell clustering and dimension reduction on a network basis of single cells. Intuitively, each CCSN can be viewed as the transformation from less “reliable” gene expression to more “reliable” gene-gene associations in a cell. Based on CCSN, we further design network flow entropy (NFE) to estimate the differentiation potency of a single cell. A number of scRNA-seq datasets were used to demonstrate the advantages of our approach: (1) one direct association network for one cell; (2) most existing scRNA-seq methods designed for gene expression matrices are also applicable to CCSN-transformed degree matrices; (3) CCSN-based NFE helps resolve the direction of differentiation trajectories by quantifying the potency of each cell. CCSN is publicly available at http://sysbio.sibcb.ac.cn/cb/chenlab/soft/CCSN.zip.


2018 ◽  
Author(s):  
Shibiao Wan ◽  
Junil Kim ◽  
Kyoung Jae Won

ABSTRACT To process large-scale single-cell RNA-sequencing (scRNA-seq) data effectively without excessive distortion during dimension reduction, we present SHARP, an ensemble random projection-based algorithm that scales to clustering 10 million cells. Comprehensive benchmarking tests on 17 public scRNA-seq datasets demonstrate that SHARP outperforms existing methods in terms of speed and accuracy. In particular, for large datasets (>40,000 cells), SHARP runs far faster than its competitors while maintaining high clustering accuracy and robustness. To the best of our knowledge, SHARP is the only R-based tool that scales to clustering scRNA-seq data with 10 million cells.


2021 ◽  
Author(s):  
Feiyang Ma ◽  
Patrice A Salomé ◽  
Sabeeha S Merchant ◽  
Matteo Pellegrini

Abstract The photosynthetic unicellular alga Chlamydomonas (Chlamydomonas reinhardtii) is a versatile reference for algal biology because of its ease of culture in the laboratory. Genomic and systems biology approaches have previously described transcriptome responses to environmental changes using bulk data, thus representing the average behavior from pools of cells. Here, we apply single-cell RNA sequencing (scRNA-seq) to probe the heterogeneity of Chlamydomonas cell populations under three environments and in two genotypes differing by the presence of a cell wall. First, we determined that RNA can be extracted from single algal cells with or without a cell wall, offering the possibility to sample natural algal communities. Second, scRNA-seq successfully separated single cells into non-overlapping cell clusters according to their growth conditions. Cells exposed to iron or nitrogen deficiency were easily distinguished despite a shared tendency to arrest photosynthesis and cell division to economize resources. Notably, these groups of cells recapitulated known patterns observed with bulk RNA-seq, but also revealed their inherent heterogeneity. A substantial source of variation between cells originated from their endogenous diurnal phase, although cultures were grown in constant light. We exploited this result to show that circadian iron responses may be conserved from algae to land plants. We document experimentally that bulk RNA-seq data represent an average of typically hidden heterogeneity in the population.


2021 ◽  
Vol 12 ◽  
Author(s):  
Wenbo Yu ◽  
Ahmed Mahfouz ◽  
Marcel J. T. Reinders

The power of single-cell RNA sequencing (scRNA-seq) in detecting cell heterogeneity or developmental process is becoming more and more evident every day. The granularity of this knowledge is further propelled when combining two batches of scRNA-seq into a single large dataset. This strategy is however hampered by technical differences between these batches. Typically, these batch effects are resolved by matching similar cells across the different batches. Current approaches, however, do not take into account that we can constrain this matching further as cells can also be matched on their cell type identity. We use an auto-encoder to embed two batches in the same space such that cells are matched. To accomplish this, we use a loss function that preserves: (1) cell-cell distances within each of the two batches, as well as (2) cell-cell distances between two batches when the cells are of the same cell-type. The cell-type guidance is unsupervised, i.e., a cell-type is defined as a cluster in the original batch. We evaluated the performance of our cluster-guided batch alignment (CBA) using pancreas and mouse cell atlas datasets, against six state-of-the-art single cell alignment methods: Seurat v3, BBKNN, Scanorama, Harmony, LIGER, and BERMUDA. Compared to other approaches, CBA preserves the cluster separation in the original datasets while still being able to align the two datasets. We confirm that this separation is biologically meaningful by identifying relevant differential expression of genes for these preserved clusters.
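The two distance-preservation terms described above can be read as a single pairwise objective. The sketch below is only a literal reading of the abstract, not the CBA loss itself: it assumes Euclidean distances, a squared-error penalty, and cluster labels that have already been matched across batches. The function name `distance_preservation_loss` is hypothetical.

```python
import math
from itertools import combinations

def distance_preservation_loss(x, z, batch, label):
    """Mean squared change in pairwise distance between original space `x`
    and embedding `z`, over pairs that share a batch or a cell-type label."""
    loss, n_pairs = 0.0, 0
    for i, j in combinations(range(len(x)), 2):
        same_batch = batch[i] == batch[j]   # term (1): within-batch pairs
        same_type = label[i] == label[j]    # term (2): same cell type (also covers cross-batch pairs)
        if same_batch or same_type:
            diff = math.dist(z[i], z[j]) - math.dist(x[i], x[j])
            loss += diff * diff
            n_pairs += 1
    return loss / n_pairs if n_pairs else 0.0

# Four toy cells: two batches, two (already matched) cell types.
x = [(0, 0), (3, 4), (0, 1), (6, 8)]
batch = [0, 0, 1, 1]
label = ["a", "b", "a", "b"]
print(distance_preservation_loss(x, x, batch, label))             # identity embedding
print(distance_preservation_loss(x, [(0, 0)] * 4, batch, label))  # collapsed embedding
```

In an auto-encoder setting, a term of this kind would be minimized over the encoder's parameters; pairs from different batches and different cell types are left unconstrained, which is what allows the batches to be aligned.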


2020 ◽  
Vol 9 (1) ◽  
Author(s):  
Jiaqi Li ◽  
Chengxuan Yu ◽  
Lifeng Ma ◽  
Jingjing Wang ◽  
Guoji Guo

Abstract With the development of single-cell RNA sequencing (scRNA-seq) technology, analysts need to integrate hundreds of thousands of cells with multiple experimental batches. It is becoming increasingly difficult for users to select the best integration methods to remove batch effects. Here, we compared the advantages and limitations of four commonly used Scanpy-based batch-correction methods using two representative and large-scale scRNA-seq datasets. We quantitatively evaluated batch-correction performance and efficiency. Furthermore, we discussed the performance differences among the evaluated methods at the algorithm level.


Author(s):  
Feiyang Ma ◽  
Patrice A. Salomé ◽  
Sabeeha S. Merchant ◽  
Matteo Pellegrini

ABSTRACT The photosynthetic unicellular alga Chlamydomonas (Chlamydomonas reinhardtii) is a versatile reference for algal biology because of the facility with which it can be cultured in the laboratory. Genomic and systems biology approaches have previously been used to describe how the transcriptome responds to environmental changes, but this analysis has been limited to bulk data, representing the average behavior from pools of cells. Here, we apply single-cell RNA sequencing (scRNA-seq) to probe the heterogeneity of Chlamydomonas cell populations under three environments and in two genotypes differing in the presence of a cell wall. First, we determined that RNA can be extracted from single algal cells with or without a cell wall, offering the possibility to sample algae communities in the wild. Second, scRNA-seq successfully separated single cells into non-overlapping cell clusters according to their growth conditions. Cells exposed to iron or nitrogen deficiency were easily distinguished despite a shared tendency to arrest cell division to economize resources. Notably, these groups of cells recapitulated known patterns observed with bulk RNA-seq, but also revealed their inherent heterogeneity. A substantial source of variation between cells originated from their endogenous diurnal phase, although cultures were grown in constant light. We exploited this result to show that circadian iron responses may be conserved from algae to land plants. We propose that bulk RNA-seq data represent an average of varied cell states that hides underappreciated heterogeneity.
One-sentence summary: We show that single-cell RNA-seq (scRNA-seq) can be applied to Chlamydomonas cultures to reveal that the heterogeneity in bulk cultures is largely driven by diurnal cycle phases.
The author responsible for distribution of materials integral to the findings presented in this article in accordance with the policy described in the Instructions for Authors (www.plantcell.org) is: Matteo Pellegrini ([email protected])


2022 ◽  
Vol 12 ◽  
Author(s):  
Shunuo Zhang ◽  
Yixin Zhang ◽  
Peiru Min

Hypertrophic scar (HS) is a common skin disorder characterized by excessive extracellular matrix (ECM) deposition. However, it is still unclear how the cellular composition, cell-cell communications, and crucial transcriptional regulatory networks are changed in HS. In the present study, by integrative analysis of single-cell and bulk RNA sequencing (RNA-seq) data, we found that FB-1, which was identified as a major type of fibroblast with the characteristics of myofibroblasts, was significantly expanded in HS. Moreover, the proportion of KC-2, which might be a differentiated type of keratinocyte (KC), was reduced in HS. To decipher the intercellular signaling, we conducted cell-cell communication analysis between the cell types and found autocrine signaling of HB-1 through COL1A1/2-CD44 and CD99-CD99, and intercellular contacts between FB-1/FB-5 and KC-2 through COL1A1/COL1A2/COL6A1/COL6A2-SDC4. Almost all the ligands and receptors involved in the autocrine signaling of HB-1 were upregulated in HS in both the scRNA-seq and bulk RNA-seq data. In contrast, the KC-2 receptor SDC4, which can bind multiple ligands, was downregulated in HS, suggesting that the reduced proportion and apoptotic phenotype of KC-2 might be associated with the downregulation of SDC4. Furthermore, we investigated the transcriptional regulatory network involved in HS formation. The integrative analysis of the scRNA-seq and bulk RNA-seq data identified CREB3L1 and TWIST2 as the critical TFs involved in the myofibroblasts of HS. In summary, the integrative analysis of single-cell RNA sequencing (scRNA-seq) and bulk RNA-seq data greatly improved our understanding of the biological characteristics of HS formation.


2018 ◽  
Author(s):  
Juan Xie ◽  
Anjun Ma ◽  
Yu Zhang ◽  
Bingqiang Liu ◽  
Changlin Wan ◽  
...  

ABSTRACT The combination of biclustering and large-scale gene expression data holds promising potential for inferring condition-specific functional pathways/networks. However, existing biclustering tools do not perform satisfactorily on high-resolution RNA-sequencing (RNA-Seq) data, mainly due to the lack of (i) a consideration of the high sparsity of RNA-Seq data, e.g., the massive zeros or lowly expressed genes in the data, especially for single-cell RNA-Seq (scRNA-Seq) data, and (ii) an understanding of the underlying transcriptional regulation signals of the observed gene expression values. Here we present a novel biclustering algorithm, QUBIC2, for the analysis of large-scale bulk RNA-Seq and scRNA-Seq data. Key novelties of the algorithm are that it (i) uses a truncated model to handle the unreliable quantification of genes with low or moderate expression, (ii) adopts the mixture Gaussian distribution and an information-divergency objective function to capture shared transcriptional regulation signals among a set of genes, (iii) utilizes a Core-Dual strategy to identify biclusters and optimize relevant parameters, and (iv) provides a size-based P-value framework to evaluate the statistical significance of all the identified biclusters. Our method validation on comprehensive bulk and single-cell RNA-seq datasets suggests that QUBIC2 has superior performance in functional module detection and cell type classification compared with five other widely used biclustering tools. In addition, applications to temporal and spatial data demonstrated that QUBIC2 can derive meaningful biological information from scRNA-Seq data. The source code for QUBIC2 can be freely accessed at https://github.com/maqin2001/qubic2.

