scholarly journals Solo: doublet identification via semi-supervised deep learning

2019 ◽  
Author(s):  
Nicholas Bernstein ◽  
Nicole Fong ◽  
Irene Lam ◽  
Margaret Roy ◽  
David G. Hendrickson ◽  
...  

AbstractSingle cell RNA-seq (scRNA-seq) measurements of gene expression enable an unprecedented high-resolution view into cellular state. However, current methods often result in two or more cells that share the same cell-identifying barcode; these “doublets” violate the fundamental premise of single cell technology and can lead to incorrect inferences. Here, we describe Solo, a semi-supervised deep learning approach that identifies doublets with greater accuracy than existing methods. Solo can be applied in combination with experimental doublet detection methods to further purify scRNA-seq data to true single cells beyond any previous approach.

2021 ◽  
Author(s):  
Guo-Wei Li ◽  
Fang Nan ◽  
Guo-Hua Yuan ◽  
Bin Tian ◽  
Li Yang

Single-cell RNA-seq (scRNA-seq) profiles gene expression with a resolution that empowers depiction of cell atlas in complex systems. Here, we developed a stepwise computational pipeline SCAPTURE to identify, evaluate, and quantify cleavage and polyadenylation sites (PASs) from 3' tag-based scRNA-seq. SCAPTURE detects PASs de novo in single cells with high sensitivity and accuracy, enabling detection of previously unannotated PASs. Quantified alternative PAS transcripts refine cell identities, enriching information extracted from scRNA-seq.


2019 ◽  
Vol 1 (4) ◽  
pp. 191-198 ◽  
Author(s):  
Tian Tian ◽  
Ji Wan ◽  
Qi Song ◽  
Zhi Wei

2021 ◽  
Author(s):  
Lin Di ◽  
Bo Liu ◽  
Yuzhu Lyu ◽  
Shihui Zhao ◽  
Yuhong Pang ◽  
...  

Many single cell RNA-seq applications aim to probe a wide dynamic range of gene expression, but most of them are still challenging to accurately quantify low-aboundance transcripts. Based on our previous finding that Tn5 transposase can directly cut-and-tag DNA/RNA hetero-duplexes, we present SHERRY2, an optimized protocol for sequencing transcriptomes of single cells or single nuclei. SHERRY2 is robust and scalable, and it has higher sensitivity and more uniform coverage in comparison with prevalent scRNA-seq methods. With throughput of a few thousand cells per batch, SHERRY2 can reveal the subtle transcriptomic differences between cells and facilitate important biological discoveries.


Author(s):  
Jérémie Breda ◽  
Mihaela Zavolan ◽  
Erik van Nimwegen

AbstractIn spite of a large investment in the development of methodologies for analysis of single-cell RNA-seq data, there is still little agreement on how to best normalize such data, i.e. how to quantify gene expression states of single cells from such data. Starting from a few basic requirements such as that inferred expression states should correct for both intrinsic biological fluctuations and measurement noise, and that changes in expression state should be measured in terms of fold-changes rather than changes in absolute levels, we here derive a unique Bayesian procedure for normalizing single-cell RNA-seq data from first principles. Our implementation of this normalization procedure, called Sanity (SAmpling Noise corrected Inference of Transcription activitY), estimates log expression values and associated errors bars directly from raw UMI counts without any tunable parameters.Comparison of Sanity with other recent normalization methods on a selection of scRNA-seq datasets shows that Sanity outperforms other methods on basic downstream processing tasks such as clustering cells into subtypes and identification of differentially expressed genes. More importantly, we show that all other normalization methods present severely distorted pictures of the data. By failing to account for biological and technical Poisson noise, many methods systematically predict the lowest expressed genes to be most variable in expression, whereas in reality these genes provide least evidence of true biological variability. In addition, by confounding noise removal with lower-dimensional representation of the data, many methods introduce strong spurious correlations of expression levels with the total UMI count of each cell as well as spurious co-expression of genes.


2019 ◽  
Author(s):  
Weida Wang ◽  
Jinyuan Xu ◽  
Shuyuan Wang ◽  
Peng Xia ◽  
Li Zhang ◽  
...  

AbstractUnderstanding subclonal architecture and their biological functions poses one of the key challenges to deeply portray and investigative the cause of triple-negative breast cancer (TNBC). Here we combine single-cell and bulk sequencing data to analyze tumor heterogeneity through characterizing subclone compositions and proportions. Based on sing-cell RNA-seq data (GSE118389) we identified five distinct cell subpopulations and characterized their biological functions based on their gene markers. According to the results of functional annotation, we found that C1 and C2 are related to immune functions, while C5 is related to programmed cell death. Then based on subclonal basis gene expression matrix, we applied deconvolution algorithm on TCGA tissue RNA-seq data and observed that microenvironment is diverse among TNBC subclones, especially C1 is closely related to T cells. What’s more, we also found that high C5 proportions would led to poor survival outcome, log-rank test p-value and HR [95%CI] for five years overall survival in GSE96058 dataset were 0.0158 and 2.557 [1.160-5.636]. Collectively, our analysis reveals both intra-tumor and inter-tumor heterogeneity and their association with subclonal microenvironment in TNBC (subclone compositions and proportions), and uncovers the organic combination of subclones dictating poor outcomes in this disease.HighlightsWe applied deconvolution algorithm on subclonal basis gene expression matrix to link single cells and bulk tissue together.


2016 ◽  
Author(s):  
Olivier Poirion ◽  
Xun Zhu ◽  
Travers Ching ◽  
Lana X. Garmire

AbstractDespite its popularity, characterization of subpopulations with transcript abundance is subject to a significant amount of noise. We propose to use effective and expressed nucleotide variations (eeSNVs) from scRNA-seq as alternative features for tumor subpopulation identification. We developed a linear modeling framework, SSrGE, to link eeSNVs associated with gene expression. In all the datasets tested, eeSNVs achieve better accuracies than gene expression for identifying subpopulations. Previously validated cancer-relevant genes are also highly ranked, confirming the significance of the method. Moreover, SSrGE is capable of analyzing coupled DNA-seq and RNA-seq data from the same single cells, demonstrating its value in integrating multi-omics single cell techniques. In summary, SNV features from scRNA-seq data have merits for both subpopulation identification and linkage of genotype-phenotype relationship. The method SSrGE is available at https://github.com/lanagarmire/SSrGE.


IEEE Access ◽  
2020 ◽  
Vol 8 ◽  
pp. 22874-22883 ◽  
Author(s):  
Nour Eldeen M. Khalifa ◽  
Mohamed Hamed N. Taha ◽  
Dalia Ezzat Ali ◽  
Adam Slowik ◽  
Aboul Ella Hassanien

2018 ◽  
Author(s):  
Kedar Nath Natarajan ◽  
Zhichao Miao ◽  
Miaomiao Jiang ◽  
Xiaoyun Huang ◽  
Hongpo Zhou ◽  
...  

AbstractAll single-cell RNA-seq protocols and technologies require library preparation prior to sequencing on a platform such as Illumina. Here, we present the first report to utilize the BGISEQ-500 platform for scRNA-seq, and compare the sensitivity and accuracy to Illumina sequencing. We generate a scRNA-seq resource of 468 unique single-cells and 1,297 matched single cDNA samples, performing SMARTer and Smart-seq2 protocols on mESCs and K562 cells with RNA spike-ins. We sequence these libraries on both BGISEQ-500 and Illumina HiSeq platforms using single- and paired-end reads. The two platforms have comparable sensitivity and accuracy in terms of quantification of gene expression, and low technical variability. Our study provides a standardised scRNA-seq resource to benchmark new scRNA-seq library preparation protocols and sequencing platforms.


2018 ◽  
Author(s):  
Changlin Wan ◽  
Wennan Chang ◽  
Yu Zhang ◽  
Fenil Shah ◽  
Xiaoyu Lu ◽  
...  

ABSTRACTA key challenge in modeling single-cell RNA-seq (scRNA-seq) data is to capture the diverse gene expression states regulated by different transcriptional regulatory inputs across single cells, which is further complicated by a large number of observed zero and low expressions. We developed a left truncated mixture Gaussian (LTMG) model that stems from the kinetic relationships between the transcriptional regulatory inputs and metabolism of mRNA and gene expression abundance in a cell. LTMG infers the expression multi-modalities across single cell entities, representing a gene’s diverse expression states; meanwhile the dropouts and low expressions are treated as left truncated, specifically representing an expression state that is under suppression. We demonstrated that LTMG has significantly better goodness of fitting on an extensive number of single-cell data sets, comparing to three other state of the art models. In addition, our systems kinetic approach of handling the low and zero expressions and correctness of the identified multimodality are validated on several independent experimental data sets. Application on data of complex tissues demonstrated the capability of LTMG in extracting varied expression states specific to cell types or cell functions. Based on LTMG, a differential gene expression test and a co-regulation module identification method, namely LTMG-DGE and LTMG-GCR, are further developed. We experimentally validated that LTMG-DGE is equipped with higher sensitivity and specificity in detecting differentially expressed genes, compared with other five popular methods, and that LTMG-GCR is capable to retrieve the gene co-regulation modules corresponding to perturbed transcriptional regulations. A user-friendly R package with all the analysis power is available at https://github.com/zy26/LTMGSCA.


2020 ◽  
Author(s):  
T. Lohoff ◽  
S. Ghazanfar ◽  
A. Missarova ◽  
N. Koulena ◽  
N. Pierson ◽  
...  

AbstractTranscriptional and epigenetic profiling of single-cells has advanced our knowledge of the molecular bases of gastrulation and early organogenesis. However, current approaches rely on dissociating cells from tissues, thereby losing the crucial spatial context that is necessary for understanding cell and tissue interactions during development. Here, we apply an image-based single-cell transcriptomics method, seqFISH, to simultaneously and precisely detect mRNA molecules for 387 selected target genes in 8-12 somite stage mouse embryo tissue sections. By integrating spatial context and highly multiplexed transcriptional measurements with two single-cell transcriptome atlases we accurately characterize cell types across the embryo and demonstrate how spatially-resolved expression of genes not profiled by seqFISH can be imputed. We use this high-resolution spatial map to characterize fundamental steps in the patterning of the midbrain-hindbrain boundary and the developing gut tube. Our spatial atlas uncovers axes of resolution that are not apparent from single-cell RNA sequencing data – for example, in the gut tube we observe early dorsal-ventral separation of esophageal and tracheal progenitor populations. In sum, by computationally integrating high-resolution spatially-resolved gene expression maps with single-cell genomics data, we provide a powerful new approach for studying how and when cell fate decisions are made during early mammalian development.


Sign in / Sign up

Export Citation Format

Share Document