Expression Based Species Deconvolution and Realignment Removes Misalignment Error in Multi-species Single Cell Data

2021 ◽  
Author(s):  
Jaeyong Choi ◽  
Woochan Lee ◽  
Jung-Ki Yoon ◽  
Jong-Il Kim

Background: Although single cell RNAseq of xenograft samples is widely used, there is no comprehensive pipeline for human and mouse mixed single cell analysis. Method: We used public data to assess misalignment error when using a human and mouse combined reference, and generated a pipeline based on expression-based species deconvolution with species-matching reference realignment to remove errors. We also found false-positive signals presumed to originate from ambient RNA of the other species, and used a computational method to remove them. Result: Misaligned reads accounted for 0.5% of total reads on average, but the expression of a few genes was greatly affected, leading to a 99.8% loss in expression. Human and mouse mixed single cell data analyzed by our pipeline clustered well with unmixed data. We also applied our pipeline to a multi-species, multi-sample single cell library containing breast cancer xenograft tissue and successfully identified all identities along with the diverse cell types of the tumor microenvironment. Conclusion: We present our pipeline for mixed human and mouse single cell data, which can also be applied to pooled libraries to obtain cost-effective single cell data. We also address points to consider when analyzing mixed single cell data for future development.
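The expression-based species deconvolution step described above can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes counts come from a combined reference whose gene names carry a species prefix (`GRCh38_` and `mm10_` here, a hypothetical naming choice), and it uses an arbitrary purity threshold `min_fraction` to call each barcode human, mouse, or mixed. Barcodes called for one species would then be realigned to that species' reference alone.

```python
from collections import defaultdict

# Assumed gene-name prefixes in the combined reference (hypothetical).
HUMAN_PREFIX = "GRCh38_"
MOUSE_PREFIX = "mm10_"


def assign_species(counts, min_fraction=0.9):
    """Call a species for each barcode from its expression.

    counts: iterable of (barcode, gene, n) tuples from a combined
            human + mouse alignment.
    min_fraction: assumed purity threshold; a barcode is called for a
            species only if at least this fraction of its counts map
            to that species' genes.
    """
    human = defaultdict(int)
    mouse = defaultdict(int)
    for barcode, gene, n in counts:
        if gene.startswith(HUMAN_PREFIX):
            human[barcode] += n
        elif gene.startswith(MOUSE_PREFIX):
            mouse[barcode] += n

    calls = {}
    for barcode in set(human) | set(mouse):
        h, m = human[barcode], mouse[barcode]
        total = h + m
        if total == 0:
            calls[barcode] = "unassigned"
            continue
        human_fraction = h / total
        if human_fraction >= min_fraction:
            calls[barcode] = "human"
        elif human_fraction <= 1 - min_fraction:
            calls[barcode] = "mouse"
        else:
            # Possible doublet or heavy ambient RNA from the other species.
            calls[barcode] = "mixed"
    return calls
```

A barcode labeled `mixed` is a candidate doublet or a cell with substantial cross-species ambient RNA; how such barcodes are handled is a design choice this sketch leaves open.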

2020 ◽  
Author(s):  
Cody N. Heiser ◽  
Victoria M. Wang ◽  
Bob Chen ◽  
Jacob J. Hughey ◽  
Ken S. Lau

Abstract: A major challenge for droplet-based single-cell sequencing technologies is distinguishing true cells from uninformative barcodes in datasets with disparate library sizes confounded by high technical noise (i.e. batch-specific ambient RNA). We present dropkick, a fully automated software tool for quality control and filtering of single-cell RNA sequencing (scRNA-seq) data with a focus on excluding ambient barcodes and recovering real cells bordering the quality threshold. By automatically determining dataset-specific training labels based on predictive global heuristics, dropkick learns a gene-based representation of real cells and ambient noise, calculating a cell probability score for each barcode. Using simulated and real-world scRNA-seq data, we benchmarked dropkick against a conventional thresholding approach and EmptyDrops, a popular computational method, demonstrating greater recovery of rare cell types and exclusion of empty droplets and noisy, uninformative barcodes. We show for both low and high-background datasets that dropkick’s weakly supervised model reliably learns which genes are enriched in ambient barcodes and draws a multidimensional boundary that is more robust to dataset-specific variation than existing filtering approaches. dropkick provides a fast, automated tool for reproducible cell identification from scRNA-seq data that is critical to downstream analysis and compatible with popular single-cell analysis Python packages.


2021 ◽  
Author(s):  
Hope M. Healey ◽  
Susan Bassham ◽  
William A. Cresko

Abstract: Single cell RNA sequencing (scRNAseq) is a powerful technique that continues to expand across various biological applications. However, incomplete 3′ UTR annotations in less developed or non-model systems can impede single cell analysis resulting in genes that are partially or completely uncounted. Performing scRNAseq with incomplete 3′ UTR annotations can impede the identification of cell identities and gene expression patterns and lead to erroneous biological inferences. We demonstrate that performing single cell isoform sequencing (ScISOr-Seq) in tandem with scRNAseq can rapidly improve 3′ UTR annotations. Using threespine stickleback fish (Gasterosteus aculeatus), we show that gene models resulting from a minimal embryonic ScISOr-Seq dataset retained 26.1% greater scRNAseq reads than gene models from Ensembl alone. Furthermore, pooling our ScISOr-Seq isoforms with a previously published adult bulk Iso-Seq dataset from stickleback, and merging the annotation with the Ensembl gene models, resulted in a marginal improvement (+0.8%) over the ScISOr-Seq only dataset. In addition, isoforms identified by ScISOr-Seq included thousands of new splicing variants. The improved gene models obtained using ScISOr-Seq lead to successful identification of cell types and increased the reads identified of many genes in our scRNAseq stickleback dataset. Our work illuminates ScISOr-Seq as a cost-effective and efficient mechanism to rapidly annotate genomes for scRNAseq.


2021 ◽  
Author(s):  
Jordan W. Squair ◽  
Michael A. Skinnider ◽  
Matthieu Gautier ◽  
Leonard J. Foster ◽  
Grégoire Courtine

2021 ◽  
Vol 22 (S3) ◽  
Author(s):  
Yuanyuan Li ◽  
Ping Luo ◽  
Yi Lu ◽  
Fang-Xiang Wu

Background: With the development of single-cell sequencing technology, revealing homogeneity and heterogeneity between cells has become a new area of computational systems biology research. However, the clustering of cell types becomes more complex with the mutual penetration between different types of cells and the instability of gene expression. One way of overcoming this problem is to group similar, related single cells together by means of various clustering analysis methods. Although some methods such as spectral clustering can do well in the identification of cell types, they only consider the similarities between cells and ignore the influence of dissimilarities on clustering results. This may limit the performance of most conventional clustering algorithms in identifying clusters, so special methods need to be developed for high-dimensional sparse categorical data. Results: Inspired by the phenomenon that cells of the same type have similar gene expression patterns, whereas different types of cells show dissimilar gene expression patterns, we improve the existing spectral clustering method for clustering single-cell data so that it is based on both similarities and dissimilarities between cells. The method first measures the similarity/dissimilarity among cells, then constructs the incidence matrix by fusing the similarity matrix with the dissimilarity matrix, and, finally, uses the eigenvalues of the incidence matrix to perform dimensionality reduction and employs the K-means algorithm in the low dimensional space to achieve clustering. The proposed improved spectral clustering method is compared with the conventional spectral clustering method in recognizing cell types on several real single-cell RNA-seq datasets.
Conclusions: In summary, we show that adding intercellular dissimilarity can effectively improve accuracy and achieve robustness, and that the improved spectral clustering method outperforms the traditional spectral clustering method in grouping cells.
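The three steps in this abstract (measure similarity and dissimilarity, fuse them into one matrix, then embed and run K-means) can be sketched as follows. This is an illustration under stated assumptions, not the authors' method: the Gaussian kernel, the dissimilarity `1 - S`, the fusion rule `S - alpha * D`, and the signed graph Laplacian are all choices made here for the example, and `fused_spectral_clustering`, `_kmeans` and `alpha` are hypothetical names. It requires NumPy.

```python
import numpy as np


def _kmeans(E, k, n_iter=50):
    """Plain K-means with deterministic farthest-point initialisation."""
    centers = [E[0]]
    for _ in range(1, k):
        # Distance of every point to its nearest already-chosen center.
        dist = np.min([np.sum((E - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(E[np.argmax(dist)])
    centers = np.array(centers)
    labels = np.zeros(len(E), dtype=int)
    for _ in range(n_iter):
        dist = np.sum((E[:, None, :] - centers[None, :, :]) ** 2, axis=2)
        labels = np.argmin(dist, axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = E[labels == j].mean(axis=0)
    return labels


def fused_spectral_clustering(X, k, alpha=1.0):
    """Cluster rows of X using both similarity and dissimilarity.

    alpha (assumed parameter) weights how strongly dissimilarity
    pushes cells apart in the fused matrix.
    """
    X = np.asarray(X, dtype=float)
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)

    sigma2 = np.median(d2[d2 > 0])      # kernel width from the data
    S = np.exp(-d2 / sigma2)            # similarity in (0, 1]
    D = 1.0 - S                         # dissimilarity (assumed definition)
    W = S - alpha * D                   # fused, signed weights (assumed rule)
    np.fill_diagonal(W, 0.0)

    # Signed Laplacian: degrees use absolute weights so L stays PSD.
    degree = np.sum(np.abs(W), axis=1)
    L = np.diag(degree) - W

    # eigh returns eigenvalues in ascending order; keep the k smallest.
    _, eigvecs = np.linalg.eigh(L)
    E = eigvecs[:, :k]
    E = E / np.maximum(np.linalg.norm(E, axis=1, keepdims=True), 1e-12)
    return _kmeans(E, k)
```

With `alpha = 0` the fused matrix reduces to the similarity matrix alone, which gives a rough baseline for comparing against similarity-only spectral clustering.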


2020 ◽  
Vol 8 (Suppl 3) ◽  
pp. A520-A520
Author(s):  
Son Pham ◽  
Tri Le ◽  
Tan Phan ◽  
Minh Pham ◽  
Huy Nguyen ◽  
...  

Background: Single-cell sequencing technology has opened an unprecedented ability to interrogate cancer. It reveals significant insights into intratumoral heterogeneity, metastasis, and therapeutic resistance, which facilitates target discovery and validation in cancer treatment. With rapid advancements in throughput and strategies, a particular immuno-oncology study can produce multi-omics profiles for several thousands of individual cells. This overflow of single-cell data poses formidable challenges, including standardizing data formats across studies, performing reanalysis for individual datasets, and meta-analysis. Methods: N/A. Results: We present BioTuring Browser, an interactive platform for accessing and reanalyzing published single-cell omics data. The platform is currently hosting a curated database of more than 10 million cells from 247 projects, covering more than 120 immune cell types and subtypes, and 15 different cancer types. All data are processed and annotated with standardized labels of cell types, diseases, therapeutic responses, etc. to be instantly accessed and explored in a uniform visualization and analytics interface. Based on this massive curated database, BioTuring Browser supports searching similar expression profiles, querying a target across datasets and automatic cell type annotation. The platform supports single-cell RNA-seq, CITE-seq and TCR-seq data. BioTuring Browser is now available for download at www.bioturing.com. Conclusions: N/A.


2019 ◽  
Vol 2 (1) ◽  
pp. 97-109 ◽  
Author(s):  
Jinchu Vijay ◽  
Marie-Frédérique Gauthier ◽  
Rebecca L. Biswell ◽  
Daniel A. Louiselle ◽  
Jeffrey J. Johnston ◽  
...  

2021 ◽  
Author(s):  
Wei Vivian Li ◽  
Dinghai Zheng ◽  
Ruijia Wang ◽  
Bin Tian

Most eukaryotic genes harbor multiple cleavage and polyadenylation sites (PASs), leading to expression of alternative polyadenylation (APA) isoforms. APA regulation has been implicated in a diverse array of physiological and pathological conditions. While RNA sequencing tools that generate reads containing the PAS, named onSite reads, have been instrumental in identifying PASs, they have not been widely used. By contrast, a growing number of methods generate reads that are close to the PAS, named nearSite reads, including the 3' end counting strategy commonly used in single cell analysis. How these nearSite reads can be used for APA analysis, however, is poorly studied. Here, we present a computational method, named model-based analysis of alternative polyadenylation using 3' end-linked reads (MAAPER), to examine APA using nearSite reads. MAAPER uses a probabilistic model to predict PASs for nearSite reads with high accuracy and sensitivity, and examines different types of APA events, including those in 3'UTRs and introns, with robust statistics. We show MAAPER's accuracy with data from both bulk and single cell RNA samples and its applicability in unpaired or paired experimental designs. Our result also highlights the importance of using well annotated PASs for nearSite read analysis.


eLife ◽  
2021 ◽  
Vol 10 ◽  
Author(s):  
Alexander J Tarashansky ◽  
Jacob M Musser ◽  
Margarita Khariton ◽  
Pengyang Li ◽  
Detlev Arendt ◽  
...  

Comparing single-cell transcriptomic atlases from diverse organisms can elucidate the origins of cellular diversity and assist the annotation of new cell atlases. Yet, comparison between distant relatives is hindered by complex gene histories and diversifications in expression programs. Previously, we introduced the self-assembling manifold (SAM) algorithm to robustly reconstruct manifolds from single-cell data (Tarashansky et al., 2019). Here, we build on SAM to map cell atlas manifolds across species. This new method, SAMap, identifies homologous cell types with shared expression programs across distant species within phyla, even in complex examples where homologous tissues emerge from distinct germ layers. SAMap also finds many genes with more similar expression to their paralogs than their orthologs, suggesting paralog substitution may be more common in evolution than previously appreciated. Lastly, comparing species across animal phyla, spanning mouse to sponge, reveals ancient contractile and stem cell families, which may have arisen early in animal evolution.


2021 ◽  
Vol 19 (1) ◽  
Author(s):  
Hongyu Zhao ◽  
Yu Teng ◽  
Wende Hao ◽  
Jie Li ◽  
Zhefeng Li ◽  
...  

Background: Ovarian cancer (OC) is one of the leading causes of death in women. Patients with OC are essentially incurable and have a poor prognosis, presumably because profound genetic heterogeneity limits reproducible prognostic classifications. Methods: We comprehensively analyzed an ovarian cancer single-cell RNA sequencing dataset, GSE118828, and identified nine major cell types. The relationship between the clusters was explored with CellPhoneDB. A malignant epithelial cluster was confirmed using pseudotime analysis, CNV and GSVA. Furthermore, we constructed a prediction model (i.e., RiskScore) consisting of 10 prognosis-specific genes selected from 2397 malignant epithelial genes using the LASSO Cox regression algorithm based on public datasets. Then, the prognostic value of RiskScore was assessed with Kaplan–Meier survival analysis and time-dependent ROC curves. Finally, a series of in-vitro assays were conducted to explore the roles of IL4I1, an important gene in RiskScore, in OC progression. Results: We found that macrophages possessed the most interaction pairs with other clusters, and M2-like TAMs were the dominant type of macrophages. C0 was identified as the malignant epithelial cluster. Patients with a lower RiskScore had a greater OS (log-rank P < 0.01). In the training set, the AUC of RiskScore was 0.666, 0.743 and 0.809 for 1-year, 3-year and 5-year survival, respectively. This was also validated in another two cohorts. Moreover, downregulation of IL4I1 inhibited OC cell proliferation, migration and invasion. Conclusions: Our work provides novel insights into the heterogeneity among OCs, and would help elucidate the biology of OC and provide clinical guidance in prognosis for OC patients.
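A risk score from a LASSO Cox model is commonly computed as a weighted sum of gene expression values, with the fitted regression coefficients as weights. The sketch below illustrates that form and a median split into high- and low-risk groups. It assumes the abstract's RiskScore follows this standard form, which the abstract does not state; the coefficient values, the gene name `GENE_B`, the patient data, and the median cutoff are all invented for the example. Only IL4I1 is named in the source as a RiskScore gene.

```python
from statistics import median


def risk_score(expression, coefficients):
    """Weighted sum of expression values over the model's genes."""
    return sum(coef * expression[gene] for gene, coef in coefficients.items())


def stratify(patients, coefficients):
    """Score every patient and split at the cohort median (assumed cutoff).

    patients: mapping of patient id -> {gene: expression value}.
    Returns (scores, groups), where groups maps patient id to
    "high" or "low".
    """
    scores = {
        pid: risk_score(expr, coefficients) for pid, expr in patients.items()
    }
    cutoff = median(scores.values())
    groups = {
        pid: ("high" if score > cutoff else "low")
        for pid, score in scores.items()
    }
    return scores, groups


# Hypothetical coefficients and expression values, for illustration only.
example_coefficients = {"IL4I1": 0.5, "GENE_B": -0.25}
example_patients = {
    "p1": {"IL4I1": 2.0, "GENE_B": 4.0},
    "p2": {"IL4I1": 4.0, "GENE_B": 0.0},
    "p3": {"IL4I1": 2.0, "GENE_B": 0.0},
}
```

The resulting groups would then be compared with a survival analysis such as Kaplan–Meier curves and a log-rank test, as the abstract describes.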


2020 ◽  
Author(s):  
N. Kakava-Georgiadou ◽  
J.F. Severens ◽  
A.M. Jørgensen ◽  
K.M. Garner ◽  
M.C.M Luijendijk ◽  
...  

Abstract: Hypothalamic nuclei which regulate homeostatic functions express leptin receptor (LepR), the primary target of the satiety hormone leptin. Single-cell RNA sequencing (scRNA-seq) has facilitated the discovery of a variety of hypothalamic cell types. However, low abundance of LepR transcripts prevented further characterization of LepR cells. Therefore, we perform scRNA-seq on isolated LepR cells and identify eight neuronal clusters, including three uncharacterized Trh-expressing populations as well as 17 non-neuronal populations including tanycytes, oligodendrocytes and endothelial cells. Food restriction had a major impact on Agrp neurons and changed the expression of obesity-associated genes. Multiple cell clusters were enriched for GWAS signals of obesity. We further explored changes in the gene regulatory landscape of LepR cell types. We thus reveal the molecular signature of distinct populations with diverse neurochemical profiles, which will aid efforts to illuminate the multi-functional nature of leptin’s action in the hypothalamus.

