CellFishing.jl: an ultrafast and scalable cell search method for single-cell RNA-sequencing

2018 ◽  
Author(s):  
Kenta Sato ◽  
Koki Tsuyuzaki ◽  
Kentaro Shimizu ◽  
Itoshi Nikaido

Abstract: Recent technical improvements in single-cell RNA sequencing (scRNA-seq) have enabled massively parallel profiling of transcriptomes, thereby promoting large-scale studies encompassing a wide range of cell types of multicellular organisms. Against this background, we propose CellFishing.jl, a new method for searching atlas-scale datasets for similar cells and detecting noteworthy genes of query cells with high accuracy and throughput. Using multiple scRNA-seq datasets, we validate that our method achieves accuracy comparable to, and is markedly faster than, the state-of-the-art software. Moreover, CellFishing.jl is scalable to more than one million cells, and the throughput of the search is approximately 1,600 cells per second.

2021 ◽  
Vol 12 ◽  
Author(s):  
Bin Zou ◽  
Tongda Zhang ◽  
Ruilong Zhou ◽  
Xiaosen Jiang ◽  
Huanming Yang ◽  
...  

It is well recognized that batch effects in single-cell RNA sequencing (scRNA-seq) data remain a major challenge when integrating different datasets. Here, we proposed deepMNN, a novel deep learning-based method to correct batch effects in scRNA-seq data. We first searched for mutual nearest neighbor (MNN) pairs across different batches in a principal component analysis (PCA) subspace. Subsequently, a batch correction network was constructed by stacking two residual blocks and applied to remove batch effects. The loss function of deepMNN was defined as the sum of a batch loss and a weighted regularization loss. The batch loss computed the distance between cells in MNN pairs in the PCA subspace, while the regularization loss kept the output of the network similar to the input. The experimental results showed that deepMNN can successfully remove batch effects across datasets with identical cell types, datasets with non-identical cell types, datasets with multiple batches, and large-scale datasets. We compared the performance of deepMNN with state-of-the-art batch correction methods, including the widely used methods Harmony, Scanorama, and Seurat V4 as well as the recently developed deep learning-based methods MMD-ResNet and scGen. The results demonstrated that deepMNN achieved better or comparable performance in terms of both qualitative analysis using uniform manifold approximation and projection (UMAP) plots and quantitative metrics such as batch and cell entropies, ARI F1 score, and ASW F1 score under various scenarios. Additionally, deepMNN allowed scRNA-seq datasets with multiple batches to be integrated in one step. Furthermore, deepMNN ran much faster than the other methods on large-scale datasets. These characteristics give deepMNN the potential to be a new choice for large-scale single-cell gene expression data analysis.
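The MNN pairing and the two-term loss described in this abstract can be illustrated with a minimal sketch. It is not the deepMNN implementation: the residual network is omitted (its outputs are passed in as plain lists), and the function names, the Euclidean distance, the mean normalization, and the default `reg_weight` are all assumptions made for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length coordinate lists."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_nearest(query, reference, k):
    """Indices of the k points in `reference` closest to `query`."""
    order = sorted(range(len(reference)),
                   key=lambda j: euclidean(query, reference[j]))
    return set(order[:k])


def mnn_pairs(batch_a, batch_b, k=1):
    """Pairs (i, j) where a_i and b_j are each among the other's k nearest."""
    a_to_b = [k_nearest(a, batch_b, k) for a in batch_a]
    b_to_a = [k_nearest(b, batch_a, k) for b in batch_b]
    return [(i, j)
            for i in range(len(batch_a))
            for j in sorted(a_to_b[i])
            if i in b_to_a[j]]


def total_loss(out_a, out_b, in_a, in_b, pairs, reg_weight=0.1):
    """Batch loss plus weighted regularization loss (assumed forms).

    batch loss: mean distance between network outputs of MNN-paired cells.
    regularization loss: mean distance between each output and its input.
    """
    batch_loss = sum(euclidean(out_a[i], out_b[j]) for i, j in pairs)
    batch_loss /= max(len(pairs), 1)
    reg_loss = sum(euclidean(o, x) for o, x in zip(out_a, in_a))
    reg_loss += sum(euclidean(o, x) for o, x in zip(out_b, in_b))
    reg_loss /= max(len(in_a) + len(in_b), 1)
    return batch_loss + reg_weight * reg_loss


# Toy PCA coordinates for two batches (made-up values).
batch_a = [[0.0, 0.0], [5.0, 5.0]]
batch_b = [[0.5, 0.0], [5.0, 5.5], [20.0, 20.0]]
pairs = mnn_pairs(batch_a, batch_b, k=1)
# With an identity "network" the regularization term is zero.
loss = total_loss(batch_a, batch_b, batch_a, batch_b, pairs)
```

In this toy input the third cell of `batch_b` has no mutual neighbor, so it contributes to the regularization term only; this is how the pairing step restricts the batch loss to cells that plausibly share a type across batches.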


2019 ◽  
Author(s):  
Allison Jevitt ◽  
Deeptiman Chatterjee ◽  
Gengqiang Xie ◽  
Xian-Feng Wang ◽  
Taylor Otwell ◽  
...  

Abstract: Oogenesis is a complex developmental process that involves spatiotemporally regulated coordination between the germline and supporting, somatic cell populations. This process has been modelled extensively using the Drosophila ovary. While different ovarian cell types have been identified through traditional means, the large-scale expression profiles underlying each cell type remain unknown. Using single-cell RNA sequencing technology, we have built a transcriptomic dataset for the adult Drosophila ovary and connected tissues. This dataset captures the entire transcriptional trajectory of the developing follicle cell population over time. Our findings provide detailed insight into processes such as cell-cycle switching, migration, symmetry breaking, nurse cell engulfment, egg-shell formation, and signaling during corpus luteum formation, marking a newly identified oogenesis-to-ovulation transition. Altogether, these findings provide a broad perspective on oogenesis at a single-cell resolution while revealing new genetic markers and fate-specific transcriptional signatures to facilitate future studies.


2020 ◽  
Vol 21 (8) ◽  
pp. 585-601
Author(s):  
Zhongli Chen ◽  
Liang Wei ◽  
Firat Duru ◽  
Liang Chen

Background: The cardiac system combines a complex structure, various cell types, versatile specialized functions, and sophisticated regulatory mechanisms. Moreover, cardiac diseases, which encompass a wide range of endogenous conditions, remain a serious health burden worldwide. Recent genome-wide profiling techniques have taken the lead in uncovering a new realm of cell types and molecular programs driving physiological and pathological processes in various organs and diseases. In particular, the emerging technique of single-cell RNA sequencing has led a breakthrough in decoding cell heterogeneity, phenotype transition, and developmental dynamics in cardiovascular science. Conclusion: Herein, we review recent advances in single-cell studies of the cardiovascular system and summarize new insights provided by single-cell RNA sequencing in heart developmental science, stem-cell research, and normal or disease-related working mechanisms.


2017 ◽  
Author(s):  
Lihua Zhang ◽  
Shihua Zhang

Abstract: Single-cell RNA-sequencing (scRNA-seq) is a recent breakthrough technology, which paves the way for measuring RNA levels at single-cell resolution to study precise biological functions. One of the main challenges when analyzing scRNA-seq data is the presence of zeros or dropout events, which may mislead downstream analyses. To compensate for the dropout effect, several methods have been developed to impute gene expression since the first Bayesian-based method was proposed in 2016. However, these methods have shown very diverse characteristics in terms of model hypothesis and imputation performance. Thus, large-scale comparison and evaluation of these methods is urgently needed. To this end, we compared eight imputation methods, evaluated their power in recovering original real data, and performed broad analyses to explore their effects on clustering cell types, detecting differentially expressed genes, and reconstructing lineage trajectories in the context of both simulated and real data. Simulated datasets and case studies highlight that no single method performs best in all situations. Some defects of these methods, such as limited scalability, limited robustness, and unavailability in some situations, need to be addressed in future studies.


2021 ◽  
Author(s):  
Mohammad Lotfollahi ◽  
Leander Dony ◽  
Harshita Agarwala ◽  
Fabian J Theis

Learning robust representations can help uncover underlying biological variation in scRNA-seq data. Disentangled representation learning is one approach to obtaining such informative as well as interpretable representations. Here, we learn disentangled representations of scRNA-seq data using a β-variational autoencoder (β-VAE) and apply the model to out-of-distribution (OOD) prediction. We demonstrate accurate gene expression predictions for cell types absent from training in a perturbation and a developmental dataset. We further show that β-VAE outperforms a state-of-the-art disentanglement method for scRNA-seq in OOD prediction while achieving better disentanglement performance.
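For readers unfamiliar with the β-VAE objective, the following minimal sketch shows its general form for a single cell: a reconstruction term plus a KL divergence term scaled by β. It is not the authors' model. The squared-error reconstruction term, the function names, and the default `beta` are assumptions for illustration; the encoder and decoder are omitted, and their outputs are passed in directly.

```python
import math


def gaussian_kl(mu, log_var):
    """KL divergence from N(mu, diag(exp(log_var))) to the standard normal."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, log_var))


def beta_vae_loss(x, x_recon, mu, log_var, beta=4.0):
    """Reconstruction error plus beta-weighted KL term for one cell.

    x, x_recon: observed and reconstructed expression values.
    mu, log_var: encoder outputs describing the latent Gaussian.
    beta: KL weight; beta = 1 gives the ordinary VAE objective.
    """
    # Squared error is an assumed stand-in for the reconstruction likelihood.
    recon = sum((a - b) ** 2 for a, b in zip(x, x_recon))
    return recon + beta * gaussian_kl(mu, log_var)


# Toy example with two genes and one latent dimension (made-up values).
example_loss = beta_vae_loss([1.0, 2.0], [1.0, 1.0], [1.0], [0.0], beta=4.0)
```

Raising `beta` above 1 penalizes the latent code more strongly for deviating from the prior, which is the mechanism usually credited with encouraging disentangled latent dimensions, at some cost in reconstruction accuracy.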


2021 ◽  
Vol 2 (12) ◽  
pp. 1283-1290
Author(s):  
Safir Ullah Khan ◽  
Munir Ullah Khan

Multicellular organisms are complex and contain many cell types, and heterogeneity among cells is common. Single-cell RNA sequencing (scRNA-seq) is a new technique for studying the transcriptional activity of a single cell that is still in its early stages of development. It generates transcriptional profiles from thousands of cells in parallel to reveal differential gene expression in individual cells. These profiles reflect the heterogeneity between cells, which makes it possible to identify different cell types and build cell maps of tissues or organs that play an essential role in biology and clinical medicine. After introducing and comparing scRNA-seq sequencing platforms, this paper focuses on the application of scRNA-seq in the exploration of cell types in the nervous system and immune system and summarizes the research results of combining scRNA-seq with spatial transcriptome technology.


2019 ◽  
Author(s):  
Baolin Liu ◽  
Chenwei Li ◽  
Ziyi Li ◽  
Xianwen Ren ◽  
Zemin Zhang

Abstract: Single-cell RNA sequencing (scRNA-seq) is a versatile tool for discovering and annotating cell types and states, but the determination and annotation of cell subtypes is often subjective and arbitrary. Often, it is not even clear whether a given cluster is uniform. Here we present an entropy-based statistic, ROGUE, to accurately quantify the purity of identified cell clusters. We demonstrated that our ROGUE metric is generalizable across datasets, and enables accurate, sensitive and robust assessment of cluster purity on a wide range of simulated and real datasets. Applying this metric to fibroblast and B cell datasets, we identified additional subtypes and demonstrated the application of ROGUE-guided analyses to detect true signals in specific subpopulations. ROGUE can be applied to all tested scRNA-seq datasets, and has important implications for evaluating the quality of putative clusters, discovering pure cell subtypes and constructing comprehensive, detailed and standardized single-cell atlases.


Author(s):  
Yinlei Hu ◽  
Bin Li ◽  
Falai Chen ◽  
Kun Qu

Abstract: Unsupervised clustering is a fundamental step of single-cell RNA sequencing data analysis. This need has inspired several clustering methods that classify cells in single-cell RNA sequencing data. However, accurate prediction of the cell clusters remains a substantial challenge. In this study, we propose a new algorithm for single-cell RNA sequencing data clustering based on Sparse Optimization and low-rank matrix factorization (scSO). We applied our scSO algorithm to analyze multiple benchmark datasets and showed that the cluster number predicted by scSO was close to the number of reference cell types and that most cells were correctly classified. Our scSO algorithm is available at https://github.com/QuKunLab/scSO. Overall, this study demonstrates a potent cell clustering approach that can help researchers distinguish cell types in single-cell RNA sequencing data.

