Deep Learning model to Automate the process of mapping Cancer Cells to Cell Lines & Cancer Types from Single Cell RNA-Seq Data

Author(s):  
Vatsal Patel

Single-cell RNA sequencing has given us a broad domain in which to study the heterogeneity and expression profiles of cells. Downstream analysis of such data has led to important observations and to the classification of cell types. However, these approaches demand considerable effort, and they appear to be the only way to proceed when a dataset is analyzed for the first time. The results of such verified analyses allow us to create labels for our dataset. We can use the same labeled data as input to a neural network and thereby automate the tedious and time-consuming process of downstream analysis. In this paper, we automate the process of mapping cancer cells to cancer cell lines and cancer types. To do so, we use pan-cancer single-cell sequencing data of 53513 cells from 198 cell lines reflecting 22 cancer types.

2020 ◽  
Vol 8 (Suppl 3) ◽  
pp. A520-A520
Author(s):  
Son Pham ◽  
Tri Le ◽  
Tan Phan ◽  
Minh Pham ◽  
Huy Nguyen ◽  
...  

Background Single-cell sequencing technology has opened an unprecedented ability to interrogate cancer. It reveals significant insights into intratumoral heterogeneity, metastasis, and therapeutic resistance, which facilitates target discovery and validation in cancer treatment. With rapid advancements in throughput and strategies, a particular immuno-oncology study can produce multi-omics profiles for several thousands of individual cells. This overflow of single-cell data poses formidable challenges, including standardizing data formats across studies, performing reanalysis for individual datasets, and meta-analysis. Methods N/A Results We present BioTuring Browser, an interactive platform for accessing and reanalyzing published single-cell omics data. The platform is currently hosting a curated database of more than 10 million cells from 247 projects, covering more than 120 immune cell types and subtypes, and 15 different cancer types. All data are processed and annotated with standardized labels of cell types, diseases, therapeutic responses, etc. to be instantly accessed and explored in a uniform visualization and analytics interface. Based on this massive curated database, BioTuring Browser supports searching similar expression profiles, querying a target across datasets and automatic cell type annotation. The platform supports single-cell RNA-seq, CITE-seq and TCR-seq data. BioTuring Browser is now available for download at www.bioturing.com. Conclusions N/A


2021 ◽  
Vol 19 (1) ◽  
Author(s):  
Jiani Wu ◽  
Dongqiang Zeng ◽  
Shimeng Zhi ◽  
Zilan Ye ◽  
Wenjun Qiu ◽  
...  

Abstract Background Tumor-derived exosomes (TEXs) are involved in tumor progression and the immune modulation process and mediate intercellular communication in the tumor microenvironment. Although exosomes are considered promising liquid biomarkers for disease diagnosis, it is difficult to discriminate TEXs and to develop TEX-based predictive biomarkers. Methods In this study, the gene expression profiles and clinical information were collected from The Cancer Genome Atlas (TCGA) database, IMvigor210 cohorts, and six independent Gene Expression Omnibus datasets. A TEXs-associated signature named TEXscore was established to predict overall survival in multiple cancer types and in patients undergoing immune checkpoint blockade therapies. Results Based on exosome-associated genes, we first constructed a tumor-derived exosome signature named TEXscore using a principal component analysis algorithm. In single-cell RNA-sequencing data analysis, ascending TEXscore was associated with disease progression and poor clinical outcomes. In the TCGA Pan-Cancer cohort, TEXscore was elevated in tumor samples rather than in normal tissues, thereby serving as a reliable biomarker to distinguish cancer from non-cancer sources. Moreover, high TEXscore was associated with shorter overall survival across 12 cancer types. TEXscore showed great potential in predicting immunotherapy response in melanoma, urothelial cancer, and renal cancer. The immunosuppressive microenvironment characterized by macrophages, cancer-associated fibroblasts, and myeloid-derived suppressor cells was associated with high TEXscore in the TCGA and immunotherapy cohorts. Besides, TEXscore-associated miRNAs and gene mutations were also identified. Further experimental research will facilitate the extending of TEXscore in tumor-associated exosomes. Conclusions TEXscore capturing tumor-derived exosome features might be a robust biomarker for prognosis and treatment responses in independent cohorts.


2018 ◽  
Author(s):  
Matthew D Young ◽  
Sam Behjati

Abstract Background Droplet based single-cell RNA sequence analyses assume all acquired RNAs are endogenous to cells. However, any cell free RNAs contained within the input solution are also captured by these assays. This sequencing of cell free RNA constitutes a background contamination that confounds the biological interpretation of single-cell transcriptomic data. Results We demonstrate that contamination from this ‘soup’ of cell free RNAs is ubiquitous, with experiment-specific variations in composition and magnitude. We present a method, SoupX, for quantifying the extent of the contamination and estimating ‘background corrected’ cell expression profiles that seamlessly integrate with existing downstream analysis tools. Applying this method to several datasets using multiple droplet sequencing technologies, we demonstrate that its application improves biological interpretation of otherwise misleading data, as well as improving quality control metrics. Conclusions We present ‘SoupX’, a tool for removing ambient RNA contamination from droplet based single cell RNA sequencing experiments. This tool has broad applicability and its application can improve the biological utility of existing and future data sets. Key Points The signal from droplet based single cell RNA sequencing is ubiquitously contaminated by capture of ambient mRNA. SoupX is a method to quantify the abundance of these ambient mRNAs and remove them. Correcting for ambient mRNA contamination improves biological interpretation.


2021 ◽  
Author(s):  
Ayshwarya Subramanian ◽  
Mikhail Alperovich ◽  
Bo Li ◽  
Yiming Yang

Quality control (QC) of cells, a critical step in single-cell RNA sequencing data analysis, has largely relied on arbitrarily fixed data-agnostic thresholds on QC metrics such as gene complexity and fraction of reads mapping to mitochondrial genes. The few existing data-driven approaches perform QC at the level of samples or studies without accounting for biological variation in the commonly used QC criteria. We demonstrate that the QC metrics vary both at the tissue and cell state level across technologies, study conditions, and species. We propose data-driven QC (ddqc), an unsupervised adaptive quality control framework that performs flexible and data-driven quality control at the level of cell states while retaining critical biological insights and improved power for downstream analysis. On applying ddqc to 6,228,212 cells and 835 mouse and human samples, we retain a median of 39.7% more cells when compared to conventional data-agnostic QC filters. With ddqc, we recover biologically meaningful trends in gene complexity and ribosomal expression among cell-types enabling exploration of cell states with minimal transcriptional diversity or maximum ribosomal protein expression. Moreover, ddqc allows us to retain cell-types often lost by conventional QC such as metabolically active parenchymal cells, and specialized cells such as neutrophils or gastric chief cells. Taken together, our work proposes a revised paradigm to quality filtering best practices - iterative QC, providing a data-driven quality control framework compatible with observed biological diversity.
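The contrast between a fixed threshold and a threshold adapted to each cell state can be illustrated with a minimal Python sketch. It assumes a cutoff of the cluster median minus a multiple of the median absolute deviation (MAD); the abstract does not state the rule ddqc uses, so the function names, the `n_mads` parameter, and the toy gene counts below are all hypothetical.

```python
from statistics import median

def mad_lower_bound(values, n_mads=2.0):
    """Lower cutoff for one cluster: median minus n_mads times the MAD (assumed rule)."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    return med - n_mads * mad

def adaptive_filter(cells, n_mads=2.0):
    """Keep cells whose gene count is at or above their own cluster's cutoff."""
    by_cluster = {}
    for cell in cells:
        by_cluster.setdefault(cell["cluster"], []).append(cell["n_genes"])
    bounds = {name: mad_lower_bound(vals, n_mads) for name, vals in by_cluster.items()}
    return [c for c in cells if c["n_genes"] >= bounds[c["cluster"]]]

# Toy data: a low-complexity cluster and a high-complexity cluster,
# each with one clearly degraded cell.
cells = (
    [{"cluster": "neutrophil", "n_genes": n} for n in (300, 320, 310, 330, 50)]
    + [{"cluster": "hepatocyte", "n_genes": n} for n in (2000, 2100, 1900, 2050, 400)]
)
kept = adaptive_filter(cells)
fixed = [c for c in cells if c["n_genes"] >= 500]  # conventional fixed cutoff
```

In this toy example the fixed cutoff of 500 genes discards every cell in the low-complexity cluster, whereas the per-cluster cutoff removes only the one outlier in each cluster.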


GigaScience ◽  
2020 ◽  
Vol 9 (12) ◽  
Author(s):  
Matthew D Young ◽  
Sam Behjati

Abstract Background Droplet-based single-cell RNA sequence analyses assume that all acquired RNAs are endogenous to cells. However, any cell-free RNAs contained within the input solution are also captured by these assays. This sequencing of cell-free RNA constitutes a background contamination that confounds the biological interpretation of single-cell transcriptomic data. Results We demonstrate that contamination from this "soup" of cell-free RNAs is ubiquitous, with experiment-specific variations in composition and magnitude. We present a method, SoupX, for quantifying the extent of the contamination and estimating "background-corrected" cell expression profiles that seamlessly integrate with existing downstream analysis tools. Applying this method to several datasets using multiple droplet sequencing technologies, we demonstrate that its application improves biological interpretation of otherwise misleading data, as well as improving quality control metrics. Conclusions We present SoupX, a tool for removing ambient RNA contamination from droplet-based single-cell RNA sequencing experiments. This tool has broad applicability, and its application can improve the biological utility of existing and future datasets.
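The idea of estimating a background profile and subtracting it can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the SoupX implementation: it assumes the ambient profile is taken from empty droplets, that a contamination fraction `rho` is already known, and that the expected ambient counts are subtracted and clipped at zero. All function names and numbers are hypothetical.

```python
def soup_profile(empty_droplet_counts):
    """Fraction of ambient RNA attributed to each gene, pooled over empty droplets."""
    n_genes = len(empty_droplet_counts[0])
    gene_totals = [sum(d[g] for d in empty_droplet_counts) for g in range(n_genes)]
    grand_total = sum(gene_totals)
    return [t / grand_total for t in gene_totals]

def correct_cell(counts, profile, rho):
    """Subtract the expected ambient counts (rho * total UMIs * profile), clipping at zero."""
    n_umi = sum(counts)
    return [max(c - rho * n_umi * p, 0.0) for c, p in zip(counts, profile)]

# Toy example with three genes: the ambient RNA is dominated by gene 0.
empty = [[6, 2, 0], [2, 0, 0]]
profile = soup_profile(empty)
corrected = correct_cell([10, 0, 90], profile, rho=0.1)
```

Here most of the cell's counts for gene 0 are attributed to the background, while gene 2, which is absent from the ambient profile, is left unchanged.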


2021 ◽  
Vol 11 ◽  
Author(s):  
Xiaoteng Cui ◽  
Qixue Wang ◽  
Junhu Zhou ◽  
Yunfei Wang ◽  
Can Xu ◽  
...  

Background The main immune cells in GBM are tumor-associated macrophages (TAMs). Thus far, the studies investigating the activation status of TAM in GBM are mainly limited to bulk RNA analyses of individual tumor biopsies. The activation states and transcriptional signatures of TAMs in GBM remain poorly characterized. Methods We comprehensively analyzed single-cell RNA-sequencing data, covering a total of 16,201 cells, to clarify the relative proportions of the immune cells infiltrating GBMs. The origin and TAM states in GBM were characterized using the expression profiles of differential marker genes. The vital transcription factors were examined by SCENIC analysis. By comparing the variable gene expression patterns in different clusters and cell types, we identified components and characteristics of TAMs unique to each GBM subtype. Meanwhile, we interrogated the correlation between SPI1 expression and macrophage infiltration in the TCGA-GBM dataset. Results The expression patterns of TMEM119 and MHC-II can be utilized to distinguish the origin and activation states of TAMs. In TCGA-Mixed tumors, almost all TAMs were bone marrow-derived macrophages. The TAMs in TCGA-proneural tumors were characterized by primed microglia. A different composition was observed in TCGA-classical tumors, which were infiltrated by repressed microglia. Our results further identified SPI1 as a crucial regulon and potential immunotherapeutic target important for TAM maturation and polarization in GBM. Conclusions We describe the immune landscape of human GBM at a single-cell level and define a novel categorization scheme for TAMs in GBM. The immunotherapy against SPI1 would reprogram the immune environment of GBM and enhance the treatment effect of conventional chemotherapy drugs.


Genes ◽  
2020 ◽  
Vol 11 (4) ◽  
pp. 377 ◽  
Author(s):  
Maryam Zand ◽  
Jianhua Ruan

Single-cell RNA sequencing is a powerful technology for obtaining transcriptomes at single-cell resolutions. However, it suffers from dropout events (i.e., excess zero counts) since only a small fraction of transcripts get sequenced in each cell during the sequencing process. This inherent sparsity of expression profiles hinders further characterizations at cell/gene-level such as cell type identification and downstream analysis. To alleviate this dropout issue we introduce a network-based method, netImpute, by leveraging the hidden information in gene co-expression networks to recover real signals. netImpute employs Random Walk with Restart (RWR) to adjust the gene expression level in a given cell by borrowing information from its neighbors in a gene co-expression network. Performance evaluation and comparison with existing tools on simulated data and seven real datasets show that netImpute substantially enhances clustering accuracy and data visualization clarity, thanks to its effective treatment of dropouts. While the idea of netImpute is general and can be applied with other types of networks such as cell co-expression network or protein–protein interaction (PPI) network, evaluation results show that gene co-expression network is consistently more beneficial, presumably because PPI network usually lacks cell type context, while cell co-expression network can cause information loss for rare cell types. Evaluation results on several biological datasets show that netImpute can more effectively recover missing transcripts in scRNA-seq data and enhance the identification and visualization of heterogeneous cell types than existing methods.
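As a rough illustration of how a random walk with restart can borrow signal from neighboring genes, the following Python sketch smooths one cell's expression vector over a small gene network. It is a generic RWR iteration, not netImpute's code: the column normalization, the self-loop for isolated genes, the `restart` value, and the toy network are all assumptions made for the example.

```python
def column_normalize(adj):
    """Turn an adjacency matrix into a column-stochastic transition matrix.
    Isolated genes get a self-loop so they keep their own signal (an assumption)."""
    n = len(adj)
    w = [[0.0] * n for _ in range(n)]
    for j in range(n):
        col_sum = sum(adj[i][j] for i in range(n))
        if col_sum == 0:
            w[j][j] = 1.0
        else:
            for i in range(n):
                w[i][j] = adj[i][j] / col_sum
    return w

def rwr_smooth(adj, expression, restart=0.5, tol=1e-9, max_iter=1000):
    """Iterate x <- (1 - restart) * W x + restart * x0 until convergence."""
    w = column_normalize(adj)
    n = len(expression)
    x0 = [float(v) for v in expression]
    x = list(x0)
    for _ in range(max_iter):
        nxt = [
            (1 - restart) * sum(w[i][j] * x[j] for j in range(n)) + restart * x0[i]
            for i in range(n)
        ]
        if max(abs(a - b) for a, b in zip(nxt, x)) < tol:
            return nxt
        x = nxt
    return x

# Genes 0 and 1 are co-expressed neighbors; gene 2 is isolated.
adjacency = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
smoothed = rwr_smooth(adjacency, [4.0, 0.0, 2.0])
```

In this toy case the zero count of gene 1 is replaced by a positive value drawn from its neighbor, the isolated gene is unchanged, and the total signal in the cell is preserved.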


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Yin Zhang ◽  
Fei Wang

Abstract Background As sequencing technology has matured, different laboratories and sequencing platforms have generated a large amount of single-cell transcriptome sequencing data for the same or different tissues. Due to batch effects and the high dimensionality of scRNA data, downstream analysis often faces challenges. Although a number of algorithms and tools have been proposed for removing batch effects, current mainstream algorithms tend to overcorrect the data when the cell type composition varies greatly between batches. Results In this paper, we propose a novel method named SSBER that utilizes biological prior knowledge to guide the correction, aiming to solve the problem of poor batch-effect correction when the cell type composition differs greatly between batches. Conclusions SSBER effectively solves the above problems and outperforms other algorithms when the cell type structure among batches or distribution of cell population varies considerably, or some similar cell types exist across batches.


2021 ◽  
Author(s):  
David Peeney ◽  
Yu Fan ◽  
Sadeechya Gurung ◽  
Carolyn Lazaroff ◽  
Shashikala Ratnayake ◽  
...  

Tissue inhibitors of metalloproteinases (TIMPs/Timps) are an endogenous family of widely expressed matrisome-associated proteins that were initially identified as inhibitors of matrix metalloproteinase activity (Metzincin family proteases). Consequently, TIMPs are often considered simply as protease inhibitors by many investigators. However, an evolving list of new metalloproteinase-independent functions for TIMP family members suggests that this concept is outdated. These novel TIMP functions include direct agonism/antagonism of multiple transmembrane receptors, as well as interactions with matrisome targets. While the family was fully identified over 2 decades ago, there has yet to be an in-depth study describing the expression of TIMPs in normal tissues of adult mammals. An understanding of the tissues and cell types that express TIMPs 1 through 4, in both normal and disease states, is important to contextualize the growing functional capabilities of TIMP proteins, which are often dismissed as non-canonical. Using publicly available single cell RNA sequencing data from the Tabula Muris Consortium, we analyzed approximately 100,000 murine cells across nineteen tissues from non-diseased organs, representing seventy-three annotated cell types, to define the diversity in Timp gene expression across healthy tissues. We describe that Timp genes display unique expression profiles across tissues and organ-specific cell types. Within annotated cell types, we identify clear and discrete cluster-specific patterns of Timp expression, particularly in cells of stromal and endothelial origins. Differential expression and gene set pathway analysis provide evidence of the biological significance of Timp expression in these identified cell sub-types, which are consistent with novel roles in normal tissue homeostasis and changing roles in disease progression. This understanding of the tissues, specific cell types and conditions of the microenvironment in which Timp genes are expressed adds important physiological context to the growing array of novel TIMP protein functions.


2021 ◽  
Author(s):  
Min Dai ◽  
Xiaobing Pei ◽  
Xiu-Jie Wang

Accurate cell classification is the groundwork for downstream analysis of single-cell sequencing data, yet identifying marker genes that distinguish different cell types remains a major challenge. We developed COSG as a cosine similarity-based method for more accurate and scalable marker gene identification. COSG is applicable to single-cell RNA sequencing data, single-cell ATAC sequencing data and spatially resolved transcriptome data. COSG is fast and scalable for ultra-large datasets of million-scale cells. Application on both simulated and real experimental datasets demonstrates the superior performance of COSG in terms of both accuracy and efficiency as compared with other available methods. Marker genes or genomic regions identified by COSG are more indicative and have greater cell-type specificity.
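To make the cosine-similarity idea concrete, the sketch below ranks genes by how closely their expression across cells matches a binary indicator of the target cell type. This is a simplified illustration only; COSG's actual scoring is not given in the abstract and is not reproduced here, and the gene names, labels, and function names are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def rank_markers(expr_by_gene, labels, target):
    """Score each gene against the indicator vector of the target group, highest first."""
    indicator = [1.0 if label == target else 0.0 for label in labels]
    scores = {gene: cosine(values, indicator) for gene, values in expr_by_gene.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Four cells: two labeled "T" and two labeled "B".
labels = ["T", "T", "B", "B"]
expression = {
    "GENE_A": [5, 4, 0, 0],  # expressed only in T cells
    "GENE_B": [1, 1, 1, 1],  # expressed everywhere
    "GENE_C": [0, 0, 3, 3],  # expressed only in B cells
}
ranked = rank_markers(expression, labels, target="T")
```

A gene restricted to the target group scores close to 1, a uniformly expressed gene scores lower, and a gene restricted to another group scores 0, which is the sense in which the score reflects cell-type specificity.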

