Solo: doublet identification via semi-supervised deep learning

10.1101/841981 ◽

2019 ◽

Cited By ~ 1

Author(s):

Nicholas Bernstein ◽

Nicole Fong ◽

Irene Lam ◽

Margaret Roy ◽

David G. Hendrickson ◽

...

Keyword(s):

Gene Expression ◽

Deep Learning ◽

High Resolution ◽

Single Cell ◽

Single Cells ◽

Detection Methods ◽

Learning Approach ◽

Rna Seq ◽

Previous Approach ◽

Cell Technology

Single cell RNA-seq (scRNA-seq) measurements of gene expression enable an unprecedented high-resolution view into cellular state. However, current methods often result in two or more cells that share the same cell-identifying barcode; these “doublets” violate the fundamental premise of single cell technology and can lead to incorrect inferences. Here, we describe Solo, a semi-supervised deep learning approach that identifies doublets with greater accuracy than existing methods. Solo can be applied in combination with experimental doublet detection methods to further purify scRNA-seq data to true single cells beyond any previous approach.
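The abstract does not spell out how training labels are obtained. One common strategy for classifier-based doublet detection, assumed here purely for illustration, is to simulate doublets by summing the count vectors of two randomly chosen observed cells and then train a classifier to separate simulated doublets from observed cells. The function names and the toy count matrix below are hypothetical and are not taken from the Solo code base.

```python
import random


def simulate_doublets(counts, n_doublets, rng):
    """Create artificial doublets by summing the counts of two distinct cells."""
    doublets = []
    for _ in range(n_doublets):
        i, j = rng.sample(range(len(counts)), 2)  # two different parent cells
        doublets.append([a + b for a, b in zip(counts[i], counts[j])])
    return doublets


def build_training_set(counts, n_doublets, seed=0):
    """Return (profiles, labels): observed cells get label 0, simulated doublets 1.

    Observed cells are only presumed to be singlets; some may be real doublets,
    which is why such a setup is usually described as semi-supervised.
    """
    rng = random.Random(seed)
    doublets = simulate_doublets(counts, n_doublets, rng)
    profiles = [list(cell) for cell in counts] + doublets
    labels = [0] * len(counts) + [1] * len(doublets)
    return profiles, labels


# Toy example: 3 cells x 3 genes (made-up counts)
cells = [[1, 0, 3], [0, 2, 1], [4, 4, 0]]
X, y = build_training_set(cells, n_doublets=5)
```

The resulting `X` and `y` could be passed to any classifier; the choice of model is left open here.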

SCAPTURE: a deep learning-embedded pipeline that captures polyadenylation information from 3' tag-based RNA-seq of single cells

10.1101/2021.03.17.435782 ◽

2021 ◽

Author(s):

Guo-Wei Li ◽

Fang Nan ◽

Guo-Hua Yuan ◽

Bin Tian ◽

Li Yang

Keyword(s):

Gene Expression ◽

Deep Learning ◽

Single Cell ◽

De Novo ◽

Single Cells ◽

High Sensitivity ◽

Computational Pipeline ◽

Rna Seq ◽

Cleavage And Polyadenylation ◽

Polyadenylation Sites

Single-cell RNA-seq (scRNA-seq) profiles gene expression at a resolution that enables the construction of cell atlases for complex systems. Here, we developed SCAPTURE, a stepwise computational pipeline, to identify, evaluate, and quantify cleavage and polyadenylation sites (PASs) from 3' tag-based scRNA-seq. SCAPTURE detects PASs de novo in single cells with high sensitivity and accuracy, enabling detection of previously unannotated PASs. Quantified alternative PAS transcripts refine cell identities, enriching the information extracted from scRNA-seq.

Clustering single-cell RNA-seq data with a model-based deep learning approach

Nature Machine Intelligence ◽

10.1038/s42256-019-0037-0 ◽

2019 ◽

Vol 1 (4) ◽

pp. 191-198 ◽

Cited By ~ 22

Author(s):

Tian Tian ◽

Ji Wan ◽

Qi Song ◽

Zhi Wei

Keyword(s):

Deep Learning ◽

Single Cell ◽

Learning Approach ◽

Rna Seq ◽

Model Based

SHERRY2: A method for rapid and sensitive single cell RNA-seq

10.1101/2021.12.25.474161 ◽

2021 ◽

Author(s):

Lin Di ◽

Bo Liu ◽

Yuzhu Lyu ◽

Shihui Zhao ◽

Yuhong Pang ◽

...

Keyword(s):

Gene Expression ◽

Single Cell ◽

Dynamic Range ◽

Single Cells ◽

Rna Seq ◽

Wide Dynamic Range ◽

Uniform Coverage ◽

Optimized Protocol ◽

Tn5 Transposase ◽

Higher Sensitivity

Many single cell RNA-seq applications aim to probe a wide dynamic range of gene expression, but most still struggle to accurately quantify low-abundance transcripts. Based on our previous finding that Tn5 transposase can directly cut-and-tag DNA/RNA hetero-duplexes, we present SHERRY2, an optimized protocol for sequencing transcriptomes of single cells or single nuclei. SHERRY2 is robust and scalable, and it has higher sensitivity and more uniform coverage than prevalent scRNA-seq methods. With a throughput of a few thousand cells per batch, SHERRY2 can reveal subtle transcriptomic differences between cells and facilitate important biological discoveries.

Bayesian inference of the gene expression states of single cells from scRNA-seq data

10.1101/2019.12.28.889956 ◽

2019 ◽

Cited By ~ 3

Author(s):

Jérémie Breda ◽

Mihaela Zavolan ◽

Erik van Nimwegen

Keyword(s):

Gene Expression ◽

Single Cell ◽

Single Cells ◽

Downstream Processing ◽

Noise Removal ◽

Rna Seq ◽

Expression Of Genes ◽

Normalization Methods ◽

Quantify Gene Expression ◽

Selection Of

In spite of a large investment in the development of methodologies for analysis of single-cell RNA-seq data, there is still little agreement on how to best normalize such data, i.e. how to quantify gene expression states of single cells from such data. Starting from a few basic requirements, such as that inferred expression states should correct for both intrinsic biological fluctuations and measurement noise, and that changes in expression state should be measured in terms of fold-changes rather than changes in absolute levels, we here derive a unique Bayesian procedure for normalizing single-cell RNA-seq data from first principles. Our implementation of this normalization procedure, called Sanity (SAmpling Noise corrected Inference of Transcription activitY), estimates log expression values and associated error bars directly from raw UMI counts without any tunable parameters.

Comparison of Sanity with other recent normalization methods on a selection of scRNA-seq datasets shows that Sanity outperforms other methods on basic downstream processing tasks such as clustering cells into subtypes and identification of differentially expressed genes. More importantly, we show that all other normalization methods present severely distorted pictures of the data. By failing to account for biological and technical Poisson noise, many methods systematically predict the lowest expressed genes to be most variable in expression, whereas in reality these genes provide least evidence of true biological variability. In addition, by confounding noise removal with lower-dimensional representation of the data, many methods introduce strong spurious correlations of expression levels with the total UMI count of each cell as well as spurious co-expression of genes.
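The point about Poisson noise can be illustrated with a small simulation. This is not the Sanity procedure, only a sketch of the effect the abstract describes: every simulated gene has exactly the same true expression in all cells (no biological variability), yet the relative fluctuation of the sampled counts grows as the mean count shrinks, because a Poisson variable with mean λ has coefficient of variation 1/√λ. The function names, mean counts, and cell number are arbitrary illustrative choices.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) sample with Knuth's method (fine for small lam)."""
    threshold = math.exp(-lam)
    k = 0
    p = rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def cv_by_expression(mean_counts, n_cells=2000, seed=0):
    """Coefficient of variation of sampled counts for genes with constant true expression."""
    rng = random.Random(seed)
    result = {}
    for lam in mean_counts:
        counts = [sample_poisson(lam, rng) for _ in range(n_cells)]
        mean = sum(counts) / n_cells
        var = sum((c - mean) ** 2 for c in counts) / n_cells
        result[lam] = math.sqrt(var) / mean
    return result


# Three hypothetical genes with low, medium, and high mean UMI counts
cv = cv_by_expression([0.5, 5.0, 50.0])
for lam, value in cv.items():
    print(f"mean count {lam:5.1f}: observed CV {value:.3f}, Poisson expectation {lam ** -0.5:.3f}")
```

A method that reads this sampling-induced spread as biological signal will rank the lowest expressed gene as the most variable, even though none of the three genes varies at all.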

Single-cell RNA-seq data reveals TNBC tumor heterogeneity through characterizing subclone compositions and proportions

10.1101/858290 ◽

2019 ◽

Author(s):

Weida Wang ◽

Jinyuan Xu ◽

Shuyuan Wang ◽

Peng Xia ◽

Li Zhang ◽

...

Keyword(s):

Gene Expression ◽

Single Cell ◽

Tumor Heterogeneity ◽

Single Cells ◽

Rna Seq ◽

Biological Functions ◽

Gene Markers ◽

Gene Expression Matrix ◽

Deconvolution Algorithm ◽

Expression Matrix

Understanding subclonal architecture and the biological functions of subclones poses one of the key challenges in portraying and investigating the cause of triple-negative breast cancer (TNBC). Here we combine single-cell and bulk sequencing data to analyze tumor heterogeneity through characterizing subclone compositions and proportions. Based on single-cell RNA-seq data (GSE118389), we identified five distinct cell subpopulations and characterized their biological functions based on their gene markers. According to the results of functional annotation, we found that C1 and C2 are related to immune functions, while C5 is related to programmed cell death. Then, based on the subclonal basis gene expression matrix, we applied a deconvolution algorithm to TCGA tissue RNA-seq data and observed that the microenvironment is diverse among TNBC subclones; in particular, C1 is closely related to T cells. Moreover, we found that high C5 proportions led to poor survival outcomes: the log-rank test p-value and HR [95% CI] for five-year overall survival in the GSE96058 dataset were 0.0158 and 2.557 [1.160-5.636], respectively. Collectively, our analysis reveals both intra-tumor and inter-tumor heterogeneity and their association with the subclonal microenvironment in TNBC (subclone compositions and proportions), and uncovers the organic combination of subclones dictating poor outcomes in this disease.

Highlights: We applied a deconvolution algorithm on the subclonal basis gene expression matrix to link single cells and bulk tissue together.
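The abstract does not name the deconvolution algorithm used. As a generic sketch of the idea, a bulk expression vector can be modeled as a non-negative mixture of subclone signature profiles, and the mixing proportions can be estimated by non-negative least squares. The version below uses simple multiplicative updates in pure Python; the function name, the signature matrix, and the proportions are all made up for illustration, and the method assumes non-negative inputs.

```python
def deconvolve(signature, bulk, n_iter=500):
    """Estimate subclone proportions p >= 0 such that signature @ p approximates bulk.

    signature: genes x subclones matrix (list of rows), non-negative
    bulk:      per-gene bulk expression, non-negative
    Uses multiplicative updates p <- p * (S^T b) / (S^T S p), then rescales to sum to 1.
    """
    n_genes, k = len(signature), len(signature[0])
    stb = [sum(signature[g][i] * bulk[g] for g in range(n_genes)) for i in range(k)]
    sts = [[sum(signature[g][i] * signature[g][j] for g in range(n_genes))
            for j in range(k)] for i in range(k)]
    p = [1.0 / k] * k  # start from equal proportions
    for _ in range(n_iter):
        denom = [sum(sts[i][j] * p[j] for j in range(k)) for i in range(k)]
        p = [p[i] * stb[i] / denom[i] if denom[i] > 0 else 0.0 for i in range(k)]
    total = sum(p)
    return [v / total for v in p]


# Hypothetical signature: 4 marker genes x 3 subclones
signature = [
    [10.0, 1.0, 1.0],
    [1.0, 10.0, 1.0],
    [1.0, 1.0, 10.0],
    [5.0, 5.0, 1.0],
]
true_props = [0.6, 0.3, 0.1]
bulk = [sum(s * p for s, p in zip(row, true_props)) for row in signature]

estimated = deconvolve(signature, bulk)
```

On this noiseless toy input the estimate recovers the mixing proportions; with real bulk data the fit is only approximate and depends heavily on how well the signature genes discriminate the subclones.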

Using Single Nucleotide Variations in Single-Cell RNA-Seq to Identify Subpopulations and Genotype-phenotype Linkage

10.1101/095810 ◽

2016 ◽

Cited By ~ 4

Author(s):

Olivier Poirion ◽

Xun Zhu ◽

Travers Ching ◽

Lana X. Garmire

Keyword(s):

Gene Expression ◽

Single Cell ◽

Single Cells ◽

Transcript Abundance ◽

Rna Seq ◽

Linear Modeling ◽

Modeling Framework ◽

Single Nucleotide ◽

Single Nucleotide Variations

Despite its popularity, characterization of subpopulations with transcript abundance is subject to a significant amount of noise. We propose to use effective and expressed nucleotide variations (eeSNVs) from scRNA-seq as alternative features for tumor subpopulation identification. We developed a linear modeling framework, SSrGE, to link eeSNVs associated with gene expression. In all the datasets tested, eeSNVs achieve better accuracies than gene expression for identifying subpopulations. Previously validated cancer-relevant genes are also highly ranked, confirming the significance of the method. Moreover, SSrGE is capable of analyzing coupled DNA-seq and RNA-seq data from the same single cells, demonstrating its value in integrating multi-omics single cell techniques. In summary, SNV features from scRNA-seq data have merits for both subpopulation identification and linkage of genotype-phenotype relationship. The method SSrGE is available at https://github.com/lanagarmire/SSrGE.

Artificial Intelligence Technique for Gene Expression by Tumor RNA-Seq Data: A Novel Optimized Deep Learning Approach

IEEE Access ◽

10.1109/access.2020.2970210 ◽

2020 ◽

Vol 8 ◽

pp. 22874-22883 ◽

Cited By ~ 13

Author(s):

Nour Eldeen M. Khalifa ◽

Mohamed Hamed N. Taha ◽

Dalia Ezzat Ali ◽

Adam Slowik ◽

Aboul Ella Hassanien

Keyword(s):

Gene Expression ◽

Artificial Intelligence ◽

Deep Learning ◽

Learning Approach ◽

Rna Seq ◽

Artificial Intelligence Technique ◽

Intelligence Technique

Comparative analysis of sequencing technologies platforms for single-cell transcriptomics

10.1101/463117 ◽

2018 ◽

Cited By ~ 1

Author(s):

Kedar Nath Natarajan ◽

Zhichao Miao ◽

Miaomiao Jiang ◽

Xiaoyun Huang ◽

Hongpo Zhou ◽

...

Keyword(s):

Gene Expression ◽

Single Cell ◽

Single Cells ◽

K562 Cells ◽

Library Preparation ◽

Rna Seq ◽

Illumina Hiseq ◽

Technical Variability ◽

Sequencing Technologies ◽

Sequencing Platforms

All single-cell RNA-seq protocols and technologies require library preparation prior to sequencing on a platform such as Illumina. Here, we present the first report to utilize the BGISEQ-500 platform for scRNA-seq, and compare the sensitivity and accuracy to Illumina sequencing. We generate a scRNA-seq resource of 468 unique single-cells and 1,297 matched single cDNA samples, performing SMARTer and Smart-seq2 protocols on mESCs and K562 cells with RNA spike-ins. We sequence these libraries on both BGISEQ-500 and Illumina HiSeq platforms using single- and paired-end reads. The two platforms have comparable sensitivity and accuracy in terms of quantification of gene expression, and low technical variability. Our study provides a standardised scRNA-seq resource to benchmark new scRNA-seq library preparation protocols and sequencing platforms.

LTMG: A novel statistical modeling of transcriptional expression states in single-cell RNA-Seq data

10.1101/430009 ◽

2018 ◽

Cited By ~ 1

Author(s):

Changlin Wan ◽

Wennan Chang ◽

Yu Zhang ◽

Fenil Shah ◽

Xiaoyu Lu ◽

...

Keyword(s):

Gene Expression ◽

Single Cell ◽

Single Cells ◽

Cell Types ◽

R Package ◽

Data Sets ◽

Rna Seq ◽

Cell Functions ◽

Transcriptional Regulatory ◽

A Cell

A key challenge in modeling single-cell RNA-seq (scRNA-seq) data is to capture the diverse gene expression states regulated by different transcriptional regulatory inputs across single cells, which is further complicated by a large number of observed zero and low expressions. We developed a left truncated mixture Gaussian (LTMG) model that stems from the kinetic relationships between the transcriptional regulatory inputs and metabolism of mRNA and gene expression abundance in a cell. LTMG infers the expression multi-modalities across single cell entities, representing a gene’s diverse expression states; meanwhile the dropouts and low expressions are treated as left truncated, specifically representing an expression state that is under suppression. We demonstrated that LTMG has significantly better goodness of fit on an extensive number of single-cell data sets, compared with three other state-of-the-art models. In addition, our systems kinetic approach to handling the low and zero expressions and the correctness of the identified multimodality are validated on several independent experimental data sets. Application to data from complex tissues demonstrated the capability of LTMG in extracting varied expression states specific to cell types or cell functions. Based on LTMG, a differential gene expression test and a co-regulation module identification method, namely LTMG-DGE and LTMG-GCR, are further developed. We experimentally validated that LTMG-DGE has higher sensitivity and specificity in detecting differentially expressed genes than five other popular methods, and that LTMG-GCR is capable of retrieving the gene co-regulation modules corresponding to perturbed transcriptional regulations. A user-friendly R package with all the analysis power is available at https://github.com/zy26/LTMGSCA.
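One way to read "treated as left truncated" is that a zero or low observation is not trusted as an exact value and instead contributes only the probability that the underlying log expression lies below some cutoff. The sketch below implements that reading for a single Gaussian component; it is an assumption made for illustration and is not the code from the LTMGSCA package. A mixture version would sum weighted components inside each logarithm. The function names, the cutoff, and the data are hypothetical.

```python
import math


def normal_cdf(x, mu, sigma):
    """Gaussian CDF via erfc, which stays accurate in the lower tail."""
    return 0.5 * math.erfc(-(x - mu) / (sigma * math.sqrt(2.0)))


def normal_logpdf(x, mu, sigma):
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma * math.sqrt(2.0 * math.pi))


def left_censored_loglik(values, cutoff, mu, sigma):
    """Log-likelihood of one Gaussian where values <= cutoff count only as 'below cutoff'."""
    total = 0.0
    for v in values:
        if v <= cutoff:
            total += math.log(normal_cdf(cutoff, mu, sigma))  # mass below the cutoff
        else:
            total += normal_logpdf(v, mu, sigma)  # exact density for measured values
    return total


# Three dropout-like observations: a suppressed state (mean below the cutoff)
# explains them far better than an active state (mean above the cutoff).
dropouts = [0.0, 0.0, 0.0]
suppressed = left_censored_loglik(dropouts, cutoff=0.5, mu=-2.0, sigma=1.0)
active = left_censored_loglik(dropouts, cutoff=0.5, mu=2.0, sigma=1.0)
mixed = left_censored_loglik([0.0, 3.0], cutoff=0.5, mu=3.0, sigma=1.0)
```

Under this reading, a component whose mean sits below the cutoff plays the role of the suppressed expression state mentioned in the abstract.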

Highly multiplexed spatially resolved gene expression profiling of mouse organogenesis

10.1101/2020.11.20.391896 ◽

2020 ◽

Author(s):

T. Lohoff ◽

S. Ghazanfar ◽

A. Missarova ◽

N. Koulena ◽

N. Pierson ◽

...

Keyword(s):

Gene Expression ◽

High Resolution ◽

Single Cell ◽

Cell Fate ◽

Target Genes ◽

Single Cells ◽

Spatial Context ◽

Sequencing Data ◽

Cell Fate Decisions ◽

Spatially Resolved

Transcriptional and epigenetic profiling of single-cells has advanced our knowledge of the molecular bases of gastrulation and early organogenesis. However, current approaches rely on dissociating cells from tissues, thereby losing the crucial spatial context that is necessary for understanding cell and tissue interactions during development. Here, we apply an image-based single-cell transcriptomics method, seqFISH, to simultaneously and precisely detect mRNA molecules for 387 selected target genes in 8-12 somite stage mouse embryo tissue sections. By integrating spatial context and highly multiplexed transcriptional measurements with two single-cell transcriptome atlases we accurately characterize cell types across the embryo and demonstrate how spatially-resolved expression of genes not profiled by seqFISH can be imputed. We use this high-resolution spatial map to characterize fundamental steps in the patterning of the midbrain-hindbrain boundary and the developing gut tube. Our spatial atlas uncovers axes of resolution that are not apparent from single-cell RNA sequencing data – for example, in the gut tube we observe early dorsal-ventral separation of esophageal and tracheal progenitor populations. In sum, by computationally integrating high-resolution spatially-resolved gene expression maps with single-cell genomics data, we provide a powerful new approach for studying how and when cell fate decisions are made during early mammalian development.