CBA: Cluster-Guided Batch Alignment for Single Cell RNA-seq

2021 ◽  
Vol 12 ◽  
Author(s):  
Wenbo Yu ◽  
Ahmed Mahfouz ◽  
Marcel J. T. Reinders

Single-cell RNA sequencing (scRNA-seq) is increasingly powerful for detecting cell heterogeneity and developmental processes. Combining two batches of scRNA-seq into a single large dataset increases the granularity of this knowledge further, but the strategy is hampered by technical differences between the batches. Typically, these batch effects are resolved by matching similar cells across the different batches. Current approaches, however, do not take into account that the matching can be constrained further, because cells can also be matched on their cell-type identity. We use an auto-encoder to embed two batches in the same space such that cells are matched. To accomplish this, we use a loss function that preserves (1) cell-cell distances within each of the two batches and (2) cell-cell distances between the two batches when the cells are of the same cell type. The cell-type guidance is unsupervised, i.e., a cell type is defined as a cluster in the original batch. We evaluated our cluster-guided batch alignment (CBA) on pancreas and mouse cell atlas datasets against six state-of-the-art single-cell alignment methods: Seurat v3, BBKNN, Scanorama, Harmony, LIGER, and BERMUDA. Compared to the other approaches, CBA preserves the cluster separation in the original datasets while still aligning the two datasets. We confirm that this separation is biologically meaningful by identifying relevant differentially expressed genes for the preserved clusters.
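The two-term loss described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the auto-encoder is omitted, the matrix of target distances is assumed to be supplied by the caller, cluster labels are assumed to be already matched across batches, and the function name `cluster_guided_loss` is hypothetical.

```python
import math

def euclid(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cluster_guided_loss(target, emb, batch, cluster):
    """Return (within_batch, cross_batch) squared-distance errors.

    target  : n x n matrix of distances the embedding should preserve (assumed given)
    emb     : list of n embedded points
    batch   : batch label per cell
    cluster : cluster label per cell (assumed already matched across batches)
    """
    within, cross = 0.0, 0.0
    n = len(emb)
    for i in range(n):
        for j in range(i + 1, n):
            err = (target[i][j] - euclid(emb[i], emb[j])) ** 2
            if batch[i] == batch[j]:
                within += err          # term (1): same batch, any cluster
            elif cluster[i] == cluster[j]:
                cross += err           # term (2): different batch, same cluster
            # different batch and different cluster: pair is unconstrained
    return within, cross

# Toy data: two cells per batch, one cell of each cluster per batch.
points = [(0.0, 0.0), (3.0, 4.0), (0.0, 1.0), (3.0, 5.0)]
batch = ["a", "a", "b", "b"]
cluster = [0, 1, 0, 1]
target = [[euclid(p, q) for q in points] for p in points]

perfect = cluster_guided_loss(target, points, batch, cluster)
collapsed = cluster_guided_loss(target, [(0.0, 0.0)] * 4, batch, cluster)
```

An embedding that reproduces every target distance gives zero for both terms, while collapsing all cells onto one point is penalised mainly by the within-batch term. How the two terms are weighted, and how the cross-batch target distances are defined, is not stated in the abstract.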

Author(s):  
Yinghao Cao ◽  
Xiaoyue Wang ◽  
Gongxin Peng

Currently, most methods rely on manual strategies to annotate cell types after clustering single-cell RNA sequencing (scRNA-seq) data. Such methods are labor-intensive and rely heavily on user expertise, which may lead to inconsistent results. We present SCSA, an automatic tool to annotate cell types from scRNA-seq data, based on a score annotation model combining differentially expressed genes (DEGs) and confidence levels of cell markers from both known and user-defined information. Evaluation against other methods on real scRNA-seq datasets from different sources shows that SCSA assigns cells to the correct types in a fully automated mode with desirable precision.
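A score model that combines DEGs with marker confidence could look roughly like the following sketch. The scoring rule (log fold change multiplied by marker confidence, summed per cell type), the function name `annotate_cluster`, and all numeric values are assumptions made for illustration; the abstract does not give SCSA's actual formula.

```python
def annotate_cluster(deg_log_fc, marker_db):
    """Score each candidate cell type for one cluster.

    deg_log_fc : {gene: log fold change} for the cluster's DEGs
    marker_db  : {cell_type: {gene: confidence}} (hypothetical marker table)
    """
    scores = {}
    for cell_type, markers in marker_db.items():
        # A marker contributes only if it is also a DEG of this cluster.
        scores[cell_type] = sum(
            deg_log_fc[gene] * conf
            for gene, conf in markers.items()
            if gene in deg_log_fc
        )
    best = max(scores, key=scores.get)
    return best, scores

# Illustrative values only; fold changes and confidences are made up.
deg_log_fc = {"CD3E": 2.0, "CD8A": 1.5, "MS4A1": 0.2}
marker_db = {
    "T cell": {"CD3E": 1.0, "CD8A": 0.5},
    "B cell": {"MS4A1": 1.0, "CD79A": 1.0},
}
label, scores = annotate_cluster(deg_log_fc, marker_db)
```

Here the cluster is labelled with the cell type whose markers overlap most strongly with its DEGs. User-defined markers would simply be additional entries in the marker table.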


2020 ◽  
Vol 11 ◽  
Author(s):  
Yinghao Cao ◽  
Xiaoyue Wang ◽  
Gongxin Peng
Keyword(s):  
Rna Seq ◽  

2019 ◽  
Vol 36 (8) ◽  
pp. 2474-2485 ◽  
Author(s):  
Zhanying Feng ◽  
Xianwen Ren ◽  
Yuan Fang ◽  
Yining Yin ◽  
Chutian Huang ◽  
...  

Motivation: Single-cell RNA-seq data offers new resources and resolution for studying cell type identity and its conversion. However, data analysis is challenging because of noise, sparsity and poor annotation at single-cell resolution. Detecting cell-type-indicative markers is a promising way to help with denoising, clustering and cell type annotation. Results: We developed a new method, scTIM, to reveal cell-type-indicative markers. scTIM is based on a multi-objective optimization framework that simultaneously maximizes gene specificity by considering the gene–cell relationship, maximizes the genes’ ability to reconstruct the cell–cell relationship, and minimizes gene redundancy by considering the gene–gene relationship. Furthermore, consensus optimization is introduced to obtain a robust solution. Experimental results on three diverse single-cell RNA-seq datasets show scTIM’s advantages in identifying cell types (clustering), annotating cell types and reconstructing cell development trajectories. Applying scTIM to the large-scale mouse cell atlas data identifies critical markers for 15 tissues as a ‘mouse cell marker atlas’, which allows us to investigate the identities of different tissues and subtle cell types within a tissue. scTIM will serve as a useful method for single-cell RNA-seq data mining. Availability and implementation: scTIM is freely available at https://github.com/Frank-Orwell/scTIM. Supplementary information: Supplementary data are available at Bioinformatics online.


Author(s):  
Maria Brbić ◽  
Marinka Zitnik ◽  
Sheng Wang ◽  
Angela O. Pisco ◽  
Russ B. Altman ◽  
...  

Although tremendous effort has been put into cell type annotation and classification, identification of previously uncharacterized cell types in heterogeneous single-cell RNA-seq data remains a challenge. Here we present MARS, a meta-learning approach for identifying and annotating known as well as novel cell types. MARS overcomes the heterogeneity of cell types by transferring latent cell representations across multiple datasets. MARS uses deep learning to learn a cell embedding function as well as a set of landmarks in the cell embedding space. The method annotates cells by probabilistically defining a cell type based on the nearest landmarks in the embedding space. MARS has a unique ability to discover cell types that have never been seen before and to annotate experiments that are as yet unannotated. We apply MARS to a large aging cell atlas of 23 tissues covering the life span of a mouse. MARS accurately identifies cell types, even when it has never seen them before. Further, the method automatically generates interpretable names for novel cell types. Remarkably, MARS estimates meaningful cell-type-specific signatures of aging and visualizes them as trajectories reflecting temporal relationships of cells in a tissue.
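The idea of probabilistically assigning a cell to its nearest landmarks can be sketched as follows. This is an illustration under stated assumptions, not the MARS model: the learned embedding function is replaced by fixed 2-D coordinates, the probabilities come from a softmax over negative squared distances (an assumed choice), and `landmark_probabilities` is a hypothetical name.

```python
import math

def landmark_probabilities(cell, landmarks):
    """Softmax over negative squared distances from a cell to each landmark."""
    neg_sq = {
        name: -sum((c - l) ** 2 for c, l in zip(cell, point))
        for name, point in landmarks.items()
    }
    top = max(neg_sq.values())  # subtract the max for numerical stability
    weights = {name: math.exp(v - top) for name, v in neg_sq.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}

# Two hypothetical landmarks in an already-computed embedding space.
landmarks = {"type_A": (0.0, 0.0), "type_B": (4.0, 0.0)}
probs = landmark_probabilities((1.0, 0.0), landmarks)
assigned = max(probs, key=probs.get)
```

A cell close to one landmark receives nearly all of its probability mass from that landmark, while a cell midway between two landmarks would receive a split assignment.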


2019 ◽  
Author(s):  
Jingxin Liu ◽  
You Song ◽  
Jinzhi Lei

We present the use of single-cell entropy (scEntropy) to measure the order of the cellular transcriptome profile from single-cell RNA-seq data, which leads to a method of unsupervised cell type classification through scEntropy followed by the Gaussian mixture model (scEGMM). scEntropy is straightforward in defining an intrinsic transcriptional state of a cell. scEGMM is a coherent method of cell type classification that includes no parameters and no clustering; however, it is comparable to existing machine learning-based methods in benchmarking studies and facilitates biological interpretation.


2021 ◽  
Author(s):  
Arif Ozgun Harmanci ◽  
Akdes Serin Harmanci ◽  
Tiemo Klisch ◽  
Akash J Patel

Gene expression profiling via RNA-sequencing has become standard for measuring and analyzing gene activity in bulk and at the single-cell level. Increasing sample sizes and cell counts provide substantial information about the transcriptional architecture of samples. In addition to quantifying expression at the cellular level, RNA-seq can be used for detecting variants, including single nucleotide variants and small insertions/deletions as well as large variants such as copy number variants. The joint analysis of variants with the transcriptional state of cells or samples can provide insight into the impact of mutations. To provide a comprehensive method for jointly analyzing genetic variants and cellular states, we introduce XCVATR, a method that can identify variants and detect local enrichment of expressed variants within an embedding of samples and cells. The embeddings provide information about cellular states by defining a cell-cell distance metric. Unlike clustering algorithms, which depend on a cell-cell distance and use it to define clusters that explain cells globally, XCVATR detects the local enrichment of expressed variants in the embedding space, so that the embedding can be computed using any type of measurement or method, for example by PCA or tSNE of the expression levels. In other words, XCVATR searches for patterns of association between each variant and the positions of cells in an embedding of the cells. XCVATR also visualizes the local clumps of small and large-scale variant calls in single-cell and bulk RNA-sequencing datasets. We perform simulations and demonstrate that XCVATR can identify the enrichment of expressed variants, and we demonstrate its application on several single-cell and bulk RNA-seq datasets.
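Local enrichment of a variant in an embedding can be pictured with a small sketch. It assumes a fixed-radius neighbourhood and a simple frequency ratio as the enrichment measure. Neither is taken from the abstract, and `local_enrichment` is a hypothetical helper rather than part of XCVATR.

```python
import math

def local_enrichment(center, coords, has_variant, radius):
    """Ratio of the variant's frequency near `center` to its overall frequency."""
    nearby = [
        flag for point, flag in zip(coords, has_variant)
        if math.dist(center, point) <= radius
    ]
    overall = sum(has_variant) / len(has_variant)
    if not nearby or overall == 0:
        return 0.0
    return (sum(nearby) / len(nearby)) / overall

# Toy embedding: the variant is expressed only in the clump near the origin.
coords = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (5.0, 5.0), (6.0, 5.0), (5.0, 6.0)]
has_variant = [True, True, True, False, False, False]

clump = local_enrichment((0.0, 0.0), coords, has_variant, radius=1.0)
elsewhere = local_enrichment((5.0, 5.0), coords, has_variant, radius=1.0)
```

Because the function only needs coordinates, the embedding can come from any method, which matches the abstract's point that PCA or tSNE coordinates would both work.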


2020 ◽  
Vol 15 (01) ◽  
pp. 35-49
Author(s):  
Jingxin Liu ◽  
You Song ◽  
Jinzhi Lei

The cell is the basic functional and biological unit of life, and a complex system that contains a huge number of molecular components. How can we quantify the macroscopic state of a cell from the microscopic information of these molecular components? This is a fundamental question to increase the understanding of the human body. The recent maturation of single-cell RNA sequencing (scRNA-seq) technologies has allowed researchers to gain information on the transcriptomes of individual cells. Although considerable progress has been made in terms of cell-type clustering over the past few years, there is no strong consensus about how to define a cell state from scRNA-seq data. Here, we present single-cell entropy (scEntropy) as an order parameter for cellular transcriptome profiles from scRNA-seq data. scEntropy is a straightforward parameter with which to define the intrinsic transcriptional state of a cell that can provide a quantity to measure the developmental process and to distinguish different cell types. The proposed scEntropy followed by Gaussian mixture model (scEGMM) provides a coherent method of cell-type classification that is simple, includes no parameters or clustering and is comparable to existing machine learning-based methods in benchmarking studies. The results of cell-type classification based on scEGMM are robust and easy to biologically interpret.
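An entropy-style order parameter for a single cell's profile can be illustrated as below. The abstract does not define scEntropy, so this sketch makes its own assumptions: the cell is compared with a reference profile, the differences are binned, and the Shannon entropy of the bin frequencies is returned. The name `difference_entropy` and the `bin_width` parameter are hypothetical, and the Gaussian mixture step is omitted.

```python
import math
from collections import Counter

def difference_entropy(cell, reference, bin_width=1.0):
    """Shannon entropy (nats) of the binned differences cell - reference."""
    bins = Counter(math.floor((c - r) / bin_width) for c, r in zip(cell, reference))
    n = len(cell)
    return -sum((k / n) * math.log(k / n) for k in bins.values())

reference = [1.0, 1.0, 1.0, 1.0]
ordered = [1.0, 1.0, 1.0, 1.0]       # identical to the reference
disordered = [1.0, 3.0, 5.0, 7.0]    # every gene deviates differently

low = difference_entropy(ordered, reference)
high = difference_entropy(disordered, reference)
```

Each cell is reduced to one number, so a one-dimensional mixture model could then be fitted to those values to separate cell types, as the abstract describes for scEGMM.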


2021 ◽  
Vol 4 (6) ◽  
pp. e202001004
Author(s):  
Almut Lütge ◽  
Joanna Zyprych-Walczak ◽  
Urszula Brykczynska Kunzmann ◽  
Helena L Crowell ◽  
Daniela Calini ◽  
...  

A key challenge in single-cell RNA-sequencing (scRNA-seq) data analysis is batch effects that can obscure the biological signal of interest. Although there are various tools and methods to correct for batch effects, their performance can vary. Therefore, it is important to understand how batch effects manifest in order to adjust for them. Here, we systematically explore batch effects across various scRNA-seq datasets according to magnitude, cell type specificity, and complexity. We developed a cell-specific mixing score (cms) that quantifies mixing of cells from multiple batches. By considering distance distributions, the score is able to detect local batch bias as well as differentiate between unbalanced batches and systematic differences between cells of the same cell type. We compare metrics in scRNA-seq data using real and synthetic datasets. Although these metrics target the same question and are used interchangeably, we find differences in scalability, sensitivity, and ability to handle differentially abundant cell types. We find that cell-specific metrics outperform cell type–specific and global metrics and recommend them for both method benchmarks and batch exploration.
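A per-cell score built on batch-wise distance distributions can be sketched as follows. This is a simplified stand-in, not the cms itself: the abstract does not say which statistical test is applied, so the sketch swaps in a two-sample Kolmogorov-Smirnov statistic, assumes exactly two batches, and uses the hypothetical names `ks_statistic` and `local_batch_gap`.

```python
import math

def ks_statistic(xs, ys):
    """Two-sample Kolmogorov-Smirnov statistic: max gap between empirical CDFs."""
    xs, ys = sorted(xs), sorted(ys)
    gap = 0.0
    for v in xs + ys:
        fx = sum(x <= v for x in xs) / len(xs)
        fy = sum(y <= v for y in ys) / len(ys)
        gap = max(gap, abs(fx - fy))
    return gap

def local_batch_gap(cell_idx, coords, batch, k):
    """Compare per-batch distance distributions among a cell's k nearest neighbours."""
    origin = coords[cell_idx]
    dists = sorted(
        (math.dist(origin, p), batch[i])
        for i, p in enumerate(coords) if i != cell_idx
    )[:k]
    by_batch = {}
    for d, b in dists:
        by_batch.setdefault(b, []).append(d)
    if len(by_batch) < 2:
        return 1.0  # only one batch nearby: treat as maximally unmixed
    first, second = list(by_batch.values())[:2]
    return ks_statistic(first, second)

# Well-mixed neighbourhood: both batches sit at the same distance from cell 0.
mixed = local_batch_gap(
    0, [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)], ["a", "a", "b", "a", "b"], k=4
)
# Batch-biased neighbourhood: batch "b" cells are systematically further away.
separated = local_batch_gap(
    0, [(0, 0), (1, 0), (0, 1), (5, 0), (0, 5)], ["a", "a", "a", "b", "b"], k=4
)
```

Because the comparison uses the shape of the distance distributions and not the batch counts, a neighbourhood with unequal batch sizes but similar distances still scores as mixed, which is the distinction the abstract draws between unbalanced batches and systematic differences.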


Author(s):  
Qianhui Huang ◽  
Yu Liu ◽  
Yuheng Du ◽  
Lana X. Garmire
Keyword(s):  
Rna Seq ◽  

Genomics ◽  
2021 ◽  
Vol 113 (6) ◽  
pp. 3582-3598
Author(s):  
Xiujun Sun ◽  
Li Li ◽  
Biao Wu ◽  
Jianlong Ge ◽  
Yanxin Zheng ◽  
...  
