scJoint: transfer learning for data integration of single-cell RNA-seq and ATAC-seq

2021 ◽  
Author(s):  
Yingxin Lin ◽  
Tung-Yu Wu ◽  
Sheng Wan ◽  
Jean Y.H. Yang ◽  
Y. X. Rachel Wang ◽  
...  

Abstract: Single-cell multi-omics data continue to grow at an unprecedented pace, and while integrating different modalities holds promise for better characterisation of cell identities, it remains a significant computational challenge. In particular, extreme sparsity is a hallmark of many modalities, such as scATAC-seq data, and often limits their power in cell type identification. Here we present scJoint, a transfer learning method to integrate heterogeneous collections of scRNA-seq and scATAC-seq data. scJoint uses a neural network trained simultaneously on labelled and unlabelled data to embed cells from both modalities in a common lower-dimensional space, enabling label transfer and joint visualisation in an integrative framework. We demonstrate that scJoint consistently provides meaningful joint visualisations and achieves significantly higher label transfer accuracy than existing methods, using complex cell atlas data and biologically varying multi-modal data. This suggests scJoint is effective in overcoming the heterogeneity in different modalities towards a more comprehensive understanding of cellular phenotypes.
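Once cells from both modalities sit in one embedding space, label transfer can be illustrated with a nearest-neighbour vote. The following is a minimal sketch under that assumption, not scJoint's implementation: the function name `knn_label_transfer`, the choice of Euclidean distance, the value of `k`, and the toy coordinates are all hypothetical, and the neural network that would produce the embeddings is left out entirely.

```python
import math
from collections import Counter


def knn_label_transfer(ref_embeddings, ref_labels, query_embeddings, k=3):
    """Give each query cell the majority label of its k nearest reference cells."""
    predictions = []
    for query in query_embeddings:
        # Euclidean distance from this query cell to every labelled reference cell
        distances = [
            (math.dist(query, ref), label)
            for ref, label in zip(ref_embeddings, ref_labels)
        ]
        distances.sort(key=lambda pair: pair[0])
        nearest_labels = [label for _, label in distances[:k]]
        # Majority vote among the k nearest labelled neighbours
        predictions.append(Counter(nearest_labels).most_common(1)[0][0])
    return predictions


# Toy 2-D embeddings (hypothetical values, not taken from the paper).
# Labelled scRNA-seq cells act as the reference; scATAC-seq cells are the query.
rna_embeddings = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2),
                  (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
rna_labels = ["T cell", "T cell", "T cell", "B cell", "B cell", "B cell"]
atac_embeddings = [(0.1, 0.1), (5.1, 5.0)]

atac_labels = knn_label_transfer(rna_embeddings, rna_labels, atac_embeddings, k=3)
print(atac_labels)
```

The vote only makes sense if the two modalities are well mixed in the shared space, which is the part the abstract attributes to the joint training.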

BMC Genomics ◽  
2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Wenbin Ye ◽  
Guoli Ji ◽  
Pengchao Ye ◽  
Yuqi Long ◽  
Xuesong Xiao ◽  
...  

2021 ◽  
Author(s):  
Dongshunyi Li ◽  
Jun Ding ◽  
Ziv Bar-Joseph

One of the first steps in the analysis of single-cell RNA-sequencing (scRNA-Seq) data is the assignment of cell types. While a number of supervised methods have been developed for this, in most cases such assignment is performed by first clustering cells in a low-dimensional space and then assigning cell types to the resulting clusters. To overcome noise and to improve cell type assignments we developed UNIFAN, a neural network method that simultaneously clusters and annotates cells using known gene sets. UNIFAN combines a low-dimensional representation of all genes with cell-specific gene set activity scores to determine the clustering. We applied UNIFAN to human and mouse scRNA-Seq datasets from several different organs. As we show, by using knowledge of gene sets, UNIFAN greatly outperforms prior methods developed for clustering scRNA-Seq data. The gene sets assigned by UNIFAN to different clusters provide strong evidence for the cell type represented by each cluster, making annotation easier.


2019 ◽  
Author(s):  
Wenbo Guo ◽  
Dongfang Wang ◽  
Shicheng Wang ◽  
Yiran Shan ◽  
Jin Gu

Abstract
Summary: Molecular heterogeneity poses great challenges for cancer diagnosis and treatment. Recent advances in single-cell RNA-sequencing (scRNA-seq) technology make it possible to study cancer transcriptomic heterogeneity at the single-cell level. Here, we develop an R package named scCancer, which focuses on processing and analyzing scRNA-seq data for cancer research. Beyond basic data-processing steps, the package gives special consideration to cancer-specific features. First, it introduces comprehensive quality control metrics. Second, it uses a data-driven machine learning algorithm to accurately identify major cancer microenvironment cell populations. Third, it estimates a malignancy score to classify malignant (cancerous) and non-malignant cells. It then analyzes intra-tumor heterogeneity by key cellular phenotypes (such as cell cycle and stemness) and gene signatures. Finally, it generates a user-friendly graphic report covering all the analyses.
Availability: http://lifeome.net/software/sccancer/
Contact: [email protected]


2018 ◽  
Author(s):  
Laura T. Donlin ◽  
Deepak A. Rao ◽  
Kevin Wei ◽  
Kamil Slowikowski ◽  
Mandy J. McGeachy ◽  
...  

Abstract
Background: Detailed molecular analyses of cells from rheumatoid arthritis (RA) synovium hold promise in identifying cellular phenotypes that drive tissue pathology and joint damage. The Accelerating Medicines Partnership (AMP) RA/SLE network aims to deconstruct autoimmune pathology by examining cells within target tissues through multiple high-dimensional assays. Robust standardized protocols need to be developed before cellular phenotypes at a single cell level can be effectively compared across patient samples.
Methods: Multiple clinical sites collected cryopreserved synovial tissue fragments from arthroplasty and synovial biopsy in a 10%-DMSO solution. Mechanical and enzymatic dissociation parameters were optimized for viable cell extraction and surface protein preservation for cell sorting and mass cytometry, as well as for reproducibility in RNA sequencing (RNA-seq). Cryopreserved synovial samples were collectively analyzed at a central processing site by a custom-designed and validated 35-marker mass cytometry panel. In parallel, each sample was flow sorted into fibroblast, T cell, B cell, and macrophage suspensions for bulk population RNA-seq and plate-based single cell CEL-Seq2 RNA-seq.
Results: Upon dissociation, cryopreserved synovial tissue fragments yielded a high frequency of viable cells, comparable to samples undergoing immediate processing. Optimization of synovial tissue dissociation across six clinical collection sites with ∼30 arthroplasty and ∼20 biopsy samples yielded a consensus digestion protocol using 100 µg/mL of Liberase TL™ enzyme. This protocol yielded immune and stromal cell lineages with preserved surface markers and minimized variability across replicate RNA-seq transcriptomes. Mass cytometry analysis of cells from cryopreserved synovium distinguished: 1) diverse fibroblast phenotypes, 2) distinct populations of memory B cells and antibody-secreting cells, and 3) multiple CD4+ and CD8+ T cell activation states. Bulk RNA sequencing of sorted cell populations demonstrated robust separation of synovial lymphocytes, fibroblasts, and macrophages. Single cell RNA-seq produced transcriptomes of over 1000 genes/cell, including transcripts encoding characteristic lineage markers.
Conclusion: We have established a robust protocol to acquire viable cells from cryopreserved synovial tissue with intact transcriptomes and cell surface phenotypes. A centralized pipeline to generate multiple high-dimensional analyses of synovial tissue samples collected across a collaborative network was developed. Integrated analysis of such datasets from large patient cohorts may help define molecular heterogeneity within RA pathology and identify new therapeutic targets and biomarkers.


Author(s):  
Bowei Kang ◽  
Eroma Abeysinghe ◽  
Divyansh Agarwal ◽  
Quanli Wang ◽  
Sudhakar Pamidighantam ◽  
...  

2020 ◽  
Vol 2 (10) ◽  
pp. 607-618 ◽  
Author(s):  
Jian Hu ◽  
Xiangjie Li ◽  
Gang Hu ◽  
Yafei Lyu ◽  
Katalin Susztak ◽  
...  

2020 ◽  
Author(s):  
Kevin Z. Lin ◽  
Jing Lei ◽  
Kathryn Roeder

Abstract: Scientists often embed cells into a lower-dimensional space when studying single-cell RNA-seq data for improved downstream analyses such as developmental trajectory analyses, but the statistical properties of such non-linear embedding methods are often not well understood. In this article, we develop the eSVD (exponential-family SVD), a non-linear embedding method for both cells and genes jointly with respect to a random dot product model using exponential-family distributions. Our estimator uses alternating minimization, which enables us to have a computationally efficient method, prove the identifiability conditions and consistency of our method, and provide statistically principled procedures to tune our method. All these qualities help advance the single-cell embedding literature, and we provide extensive simulations to demonstrate that the eSVD is competitive compared to other embedding methods.
We apply the eSVD via Gaussian distributions where the standard deviations are proportional to the means to analyze a single-cell dataset of oligodendrocytes in mouse brains (Marques et al., 2016). Using the eSVD estimated embedding, we then investigate the cell developmental trajectories of the oligodendrocytes. While previous results are not able to distinguish the trajectories among the mature oligodendrocyte cell types, our diagnostics and results demonstrate there are two major developmental trajectories that diverge at mature oligodendrocytes.
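The alternating-minimization idea mentioned in the abstract can be illustrated in its simplest form: hold the gene factors fixed and solve for the cell factors, then swap roles and repeat. The sketch below is a deliberately reduced stand-in, not the eSVD estimator. It assumes a plain squared-error loss (a Gaussian model with constant variance, rather than the exponential-family likelihoods of the paper) and a rank-1 embedding; the function name and the toy matrix are hypothetical.

```python
def rank1_alternating_minimization(matrix, n_iter=20):
    """Fit matrix[i][j] ~ u[i] * v[j] by alternating least-squares updates."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    u = [1.0] * n_rows  # one factor per cell (row)
    v = [1.0] * n_cols  # one factor per gene (column)
    for _ in range(n_iter):
        # Step 1: with v fixed, each u[i] has a closed-form least-squares solution
        v_norm = sum(x * x for x in v)
        u = [sum(matrix[i][j] * v[j] for j in range(n_cols)) / v_norm
             for i in range(n_rows)]
        # Step 2: with u fixed, each v[j] has a closed-form least-squares solution
        u_norm = sum(x * x for x in u)
        v = [sum(matrix[i][j] * u[i] for i in range(n_rows)) / u_norm
             for j in range(n_cols)]
    return u, v


# Toy cells-by-genes matrix that is exactly rank 1 (hypothetical values)
counts = [[2.0, 1.0, 4.0],
          [4.0, 2.0, 8.0],
          [6.0, 3.0, 12.0]]

cell_factors, gene_factors = rank1_alternating_minimization(counts)
max_error = max(
    abs(counts[i][j] - cell_factors[i] * gene_factors[j])
    for i in range(3) for j in range(3)
)
print(max_error)
```

Each half-step is a convex problem even though the joint problem is not, which is the usual reason alternating schemes of this kind are computationally convenient.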


2019 ◽  
Vol 9 (1) ◽  
Author(s):  
Bettina Mieth ◽  
James R. F. Hockley ◽  
Nico Görnitz ◽  
Marina M.-C. Vidovic ◽  
Klaus-Robert Müller ◽  
...  

Abstract: In many research areas, scientists are interested in clustering objects within small datasets while making use of prior knowledge from large reference datasets. We propose a method to apply the machine learning concept of transfer learning to unsupervised clustering problems and show its effectiveness in the field of single-cell RNA sequencing (scRNA-Seq). The goal of scRNA-Seq experiments is often the definition and cataloguing of cell types from the transcriptional output of individual cells. To improve the clustering of small disease- or tissue-specific datasets, for which the identification of rare cell types is often problematic, we propose a transfer learning method to utilize large and well-annotated reference datasets, such as those produced by the Human Cell Atlas. Our approach modifies the dataset of interest while incorporating key information from the larger reference dataset via Non-negative Matrix Factorization (NMF). The modified dataset is subsequently provided to a clustering algorithm. We empirically evaluate the benefits of our approach on simulated scRNA-Seq data as well as on publicly available datasets. Finally, we present results for the analysis of a recently published small dataset and find improved clustering when transferring knowledge from a large reference dataset. Implementations of the method are available at https://github.com/nicococo/scRNA.


2021 ◽  
Author(s):  
Shijie C. Zheng ◽  
Genevieve Stein-O’Brien ◽  
Jonathan J. Augustin ◽  
Jared Slosberg ◽  
Giovanni A. Carosso ◽  
...  

Abstract: The cell cycle is a highly conserved, continuous process which controls faithful replication and division of cells. Single-cell technologies have enabled increasingly precise measurements of the cell cycle both as a biological process of interest and as a possible confounding factor. Despite its importance and conservation, there is no universally applicable approach to infer position in the cell cycle at high resolution from single-cell RNA-seq data. Here, we present tricycle, an R/Bioconductor package, to address this challenge by leveraging key features of the biology of the cell cycle, the mathematical properties of principal component analysis of periodic functions, and the ubiquitous applicability of transfer learning. We show that tricycle can predict any cell’s position in the cell cycle regardless of the cell type, species of origin, and even sequencing assay. The accuracy of tricycle compares favorably to gold-standard experimental assays which generally require specialized measurements in specifically constructed in vitro systems. Unlike gold-standard assays, tricycle is easily applicable to any single-cell RNA-seq dataset. Tricycle is highly scalable, universally accurate, and eminently pertinent for atlas-level data.
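The link between principal components of a periodic process and a continuous cycle position can be sketched as follows. The sketch assumes, beyond what the abstract states, that each cell has already been projected onto two principal components in which cycling cells trace a closed loop around the origin, so that the polar angle serves as the position. It is written in Python for illustration only and is not the API of the R package; the function name and toy coordinates are hypothetical.

```python
import math


def cycle_position(pc1, pc2):
    """Polar angle, in [0, 2*pi), of a cell's coordinates in a 2-D embedding."""
    return math.atan2(pc2, pc1) % (2 * math.pi)


# Toy projections of four cells onto two reference components (hypothetical values)
projected_cells = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)]
positions = [cycle_position(pc1, pc2) for pc1, pc2 in projected_cells]
print(positions)
```

Because the angle wraps around, positions just below 2π and just above 0 are neighbours, so downstream comparisons should use circular rather than linear distance.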

