Dhaka: variational autoencoder for unmasking tumor heterogeneity from single cell genomic data

Abstract Motivation: Intra-tumor heterogeneity is one of the key confounding factors in deciphering tumor evolution. Malignant cells exhibit variations in their gene expression, copy numbers, and mutations even when originating from a single progenitor cell. Single cell sequencing of tumor cells has recently emerged as a viable option for unmasking the underlying tumor heterogeneity. However, extracting features from single cell genomic data in order to infer their evolutionary trajectory remains computationally challenging due to the extremely noisy and sparse nature of the data. Results: Here we describe ‘Dhaka’, a variational autoencoder method which transforms single cell genomic data to a reduced dimension feature space that is more efficient in differentiating between (hidden) tumor subpopulations. Our method is general and can be applied to several different types of genomic data, including copy number variation from scDNA-Seq and gene expression from scRNA-Seq experiments. We tested the method on synthetic data and six single cell cancer datasets where the number of cells ranges from 250 to 6000 for each sample. Analysis of the resulting feature space revealed subpopulations of cells and their marker genes. The features are also able to infer the lineage and/or differentiation trajectory between cells, greatly improving upon prior methods suggested for feature extraction and dimensionality reduction of such data. Availability and Implementation: All the datasets used in the paper are publicly available, and the developed software package and supporting information are available on GitHub: https://github.com/MicrosoftGenomics/Dhaka
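As a rough illustration of the variational autoencoder idea named in the abstract, the sketch below shows the two pieces every VAE objective combines: a reparameterized sample from the latent distribution and a loss made of reconstruction error plus a KL penalty. The abstract does not give Dhaka's network architecture, latent dimension, or reconstruction loss, so the function names, the squared-error term, and the numbers here are assumptions chosen for illustration, not the authors' implementation.

```python
import math
import random

def reparameterize(mu, log_var, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, 1), one value per latent dimension."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]

def kl_to_standard_normal(mu, log_var):
    """KL divergence from N(mu, sigma^2) to N(0, 1), summed over latent dimensions."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, log_var))

def squared_error(x, x_hat):
    """Reconstruction term (an assumed choice; other likelihoods are possible)."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))

def vae_loss(x, x_hat, mu, log_var):
    """Negative ELBO up to constants: reconstruction error plus KL penalty."""
    return squared_error(x, x_hat) + kl_to_standard_normal(mu, log_var)

# Hypothetical encoder output for one cell with a 3-dimensional latent space.
rng = random.Random(0)
mu = [0.5, -1.0, 0.0]
log_var = [0.0, -0.2, 0.1]
z = reparameterize(mu, log_var, rng)

# Hypothetical input profile and its reconstruction from a decoder.
x = [2.0, 2.0, 3.0, 1.0]
x_hat = [2.1, 1.8, 2.9, 1.2]
loss = vae_loss(x, x_hat, mu, log_var)
```

In a trained model, `mu` and `log_var` would come from an encoder network and `x_hat` from a decoder applied to `z`; the latent vector `z` is the reduced-dimension feature on which clustering or trajectory analysis would then run.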


PhyliCS: a Python library to explore scCNA data and quantify spatial tumor heterogeneity

BMC Bioinformatics ◽ 10.1186/s12859-021-04277-3 ◽ 2021 ◽ Vol 22 (1)

Author(s): Marilisa Montemurro ◽ Elena Grassi ◽ Carmelo Gabriele Pizzino ◽ Andrea Bertotti ◽ Elisa Ficarra ◽ ...

Keyword(s): Spatial Heterogeneity ◽ Single Cell ◽ Tumor Heterogeneity ◽ Third Party ◽ Sequencing Technology ◽ Reduction Methods ◽ Visualization Tools ◽ Cell Subpopulations ◽ Multiple Samples ◽ Analytical Approaches

Abstract Background: Tumors are composed of a number of cancer cell subpopulations (subclones), each characterized by a distinguishable set of mutations. This phenomenon, known as intra-tumor heterogeneity (ITH), may be studied using Copy Number Aberrations (CNAs). ITH can now be assessed at the highest possible resolution using single-cell DNA (scDNA) sequencing technology. Additionally, single-cell CNA (scCNA) profiles from multiple samples of the same tumor can in principle be exploited to study the spatial distribution of subclones within a tumor mass. However, since the technology required to generate large scDNA sequencing datasets is relatively recent, dedicated analytical approaches are still lacking. Results: We present PhyliCS, the first tool that exploits scCNA data from multiple samples of the same tumor to estimate whether the different clones of a tumor are well mixed or spatially separated. Starting from the CNA data produced with third-party instruments, it computes a score, the Spatial Heterogeneity score, aimed at distinguishing spatially intermixed cell populations from spatially segregated ones. Additionally, it provides functionalities to facilitate scDNA analysis, such as feature selection and dimensionality reduction methods, visualization tools, and a flexible clustering module. Conclusions: PhyliCS represents a valuable instrument to explore the extent of spatial heterogeneity in multi-regional tumor sampling, exploiting the potential of scCNA data.
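The abstract does not say how the Spatial Heterogeneity score is computed, so the sketch below only illustrates the general idea of scoring intermixing versus segregation. It uses a mean silhouette coefficient with the sample of origin as the label, which is an assumed stand-in rather than PhyliCS's documented definition; the function names and the toy copy-number profiles are hypothetical. Values near 1 suggest that cells from different samples have distinct profiles (segregated), while values near or below 0 suggest intermixing.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two copy-number profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean_silhouette(profiles, sample_labels):
    """Mean silhouette coefficient, grouping cells by their sample of origin."""
    groups = {}
    for i, lab in enumerate(sample_labels):
        groups.setdefault(lab, []).append(i)
    if len(groups) < 2:
        raise ValueError("need cells from at least two samples")
    total = 0.0
    for i, lab in enumerate(sample_labels):
        own = [j for j in groups[lab] if j != i]
        if not own:
            continue  # a lone cell in its sample contributes 0 by convention
        # a: mean distance to cells from the same sample
        a = sum(euclidean(profiles[i], profiles[j]) for j in own) / len(own)
        # b: mean distance to the nearest other sample
        b = min(
            sum(euclidean(profiles[i], profiles[j]) for j in idx) / len(idx)
            for other, idx in groups.items() if other != lab
        )
        denom = max(a, b)
        total += (b - a) / denom if denom > 0 else 0.0
    return total / len(profiles)

# Toy copy-number profiles for four cells (hypothetical values).
profiles = [[2, 2, 2, 2], [2, 2, 3, 2], [4, 4, 1, 1], [4, 3, 1, 1]]

# Similar cells come from the same sample: spatially segregated clones.
segregated = mean_silhouette(profiles, ["A", "A", "B", "B"])
# Similar cells are split across samples: spatially intermixed clones.
intermixed = mean_silhouette(profiles, ["A", "B", "A", "B"])
```

With these toy profiles, `segregated` comes out clearly positive and `intermixed` negative, which is the contrast such a score is meant to capture.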


Tumor Heterogeneity Regarding Radiosensitivity, Recurrence Risk, and Immune-Checkpoint in Breast Cancer: Transcriptome Analysis of Single-cell RNA Sequencing Data

International Journal of Radiation Oncology*Biology*Physics ◽ 10.1016/j.ijrobp.2019.06.1062 ◽ 2019 ◽ Vol 105 (1) ◽ pp. E664

Author(s): B.S. Jang ◽ W. Han ◽ I.A. Kim

Keyword(s): Breast Cancer ◽ Single Cell ◽ RNA Sequencing ◽ Transcriptome Analysis ◽ Immune Checkpoint ◽ Tumor Heterogeneity ◽ Recurrence Risk ◽ Sequencing Data ◽ Single Cell RNA Sequencing ◽ Cancer Transcriptome


Single-Cell Transcriptomic Analysis Revealed a Critical Role of SPP1/CD44-Mediated Crosstalk Between Macrophages and Cancer Cells in Glioma

Frontiers in Cell and Developmental Biology ◽ 10.3389/fcell.2021.779319 ◽ 2021 ◽ Vol 9

Author(s): Cong He ◽ Luoyan Sheng ◽ Deshen Pan ◽ Shuai Jiang ◽ Li Ding ◽ ...

Keyword(s): Single Cell ◽ Tumor Heterogeneity ◽ Molecular Mechanisms ◽ Critical Role ◽ Cell Communication ◽ High Grade Glioma ◽ Sequencing Analysis ◽ High Grade ◽ Cellular Components

High-grade glioma is one of the most lethal human cancers characterized by extensive tumor heterogeneity. In order to identify cellular and molecular mechanisms that drive tumor heterogeneity of this lethal disease, we performed single-cell RNA sequencing analysis of one high-grade glioma. Accordingly, we analyzed the individual cellular components in the ecosystem of this tumor. We found that tumor-associated macrophages are predominant in the immune microenvironment. Furthermore, we identified five distinct subpopulations of tumor cells, including one cycling, two OPC/NPC-like and two MES-like cell subpopulations. Moreover, we revealed the evolutionary transition from the cycling to OPC/NPC-like and MES-like cells by trajectory analysis. Importantly, we found that SPP1/CD44 interaction plays a critical role in macrophage-mediated activation of MES-like cells by exploring the cell-cell communication among all cellular components in the tumor ecosystem. Finally, we showed that high expression levels of both SPP1 and CD44 correlate with an increased infiltration of macrophages and poor prognosis of glioma patients. Taken together, this study provided a single-cell atlas of one high-grade glioma and revealed a critical role of macrophage-mediated SPP1/CD44 signaling in glioma progression, indicating that the SPP1/CD44 axis is a potential target for glioma treatment.


Abstract 4697: Expression variation analysis for tumor heterogeneity in single-cell RNA-sequencing data

10.1158/1538-7445.am2019-4697 ◽ 2019

Author(s): Emily F. Davis-Marcisak ◽ Pranay Orugunta ◽ Genevieve Stein-O'Brien ◽ Sidharth V. Puram ◽ Evanthia Roussos Torres ◽ ...

Keyword(s): Single Cell ◽ RNA Sequencing ◽ Tumor Heterogeneity ◽ Variation Analysis ◽ Sequencing Data ◽ Expression Variation ◽ Single Cell RNA Sequencing


ME-VAE: Multi-Encoder Variational AutoEncoder for Controlling Multiple Transformational Features in Single Cell Image Analysis

10.1101/2021.04.22.441005 ◽ 2021

Author(s): Luke Ternes ◽ Mark Dane ◽ Marilyne Labrie ◽ Gordon Mills ◽ Joe Gray ◽ ...

Keyword(s): Image Analysis ◽ Single Cell ◽ Imaging Features ◽ Phenotypic Differences ◽ Cell Image ◽ Intensity Measurements ◽ Quantitative Measurements ◽ Variational Autoencoder ◽ Cell Image Analysis ◽ Organizational Features

Abstract Image-based cell phenotyping relies on quantitative measurements as encoded representations of cells; however, defining suitable representations that capture complex imaging features is challenging since there are many obstacles, including segmentation and identifying subcellular compartments for feature extraction. Variational autoencoder (VAE) approaches produce encouraging results by mapping from an image to a representative descriptor, and outperform classical hand-crafted features for morphology, intensity, and texture at differentiating data. Although VAEs show promising results for capturing morphological and organizational features in tissue, single cell image analyses based on VAEs often fail to identify biologically informative features due to the intrinsic amount of uninformative variability. Herein, we propose a multi-encoder VAE (ME-VAE) in single cell image analysis using transformed images as a self-supervised signal to extract transform-invariant, biologically meaningful features. We show that the proposed architecture improves analysis by making distinct populations more separable compared to traditional VAEs and intensity measurements, by enhancing phenotypic differences between cells, and by improving correlations to other modalities.


Single-Cell Transcriptomic Analysis of Tumor Heterogeneity and Intercellular Networks in Human Urothelial Carcinoma

SSRN Electronic Journal ◽ 10.2139/ssrn.3978564 ◽ 2021

Author(s): Xingwei Jin ◽ Guoliang Lu ◽ Fangxiu Luo ◽ Junwei Pan ◽ Tingwei Lu ◽ ...

Keyword(s): Urothelial Carcinoma ◽ Single Cell ◽ Tumor Heterogeneity ◽ Transcriptomic Analysis


Single-Cell Genomic Sequencing of Three Peritrichs (Protista, Ciliophora) Reveals Less Biased Stop Codon Usage and More Prevalent Programmed Ribosomal Frameshifting Than in Other Ciliates

Frontiers in Marine Science ◽ 10.3389/fmars.2020.602323 ◽ 2020 ◽ Vol 7

Author(s): Xiao Chen ◽ Chundi Wang ◽ Bo Pan ◽ Borong Lu ◽ Chao Li ◽ ...

Keyword(s): Codon Usage ◽ Single Cell ◽ Stop Codon ◽ Genomic Data ◽ Evolutionary Strategy ◽ Genomic Sequencing ◽ Ribosomal Frameshifting ◽ Genomic Features ◽ First Time ◽ Stop Codon Reassignment

Peritrichs are one of the largest groups of ciliates with over 1,000 species described so far. However, their genomic features are largely unknown. By single-cell genomic sequencing, we acquired the genomic data of three sessilid peritrichs (Cothurnia ceramicola, Vaginicola sp., and Zoothamnium sp. 2). Using genomic data from another 53 ciliates including 14 peritrichs, we reconstructed their evolutionary relationships and confirmed genome skimming as an efficient approach for expanding sampling. In addition, we profiled the stop codon usage and programmed ribosomal frameshifting (PRF) events in peritrichs for the first time. Our analysis reveals no evidence of stop codon reassignment for peritrichs, but they have prevalent +1 or -1 PRF events. These genomic features are distinguishable from other ciliates, and our observations suggest a unique evolutionary strategy for peritrichs.


Prediction of condition-specific regulatory genes using machine learning

Nucleic Acids Research ◽ 10.1093/nar/gkaa264 ◽ 2020 ◽ Vol 48 (11) ◽ pp. e62-e62 ◽ Cited By ~ 2

Author(s): Qi Song ◽ Jiyoung Lee ◽ Shamima Akter ◽ Matthew Rogers ◽ Ruth Grene ◽ ...

Keyword(s): Machine Learning ◽ Transcription Factors ◽ Single Cell ◽ Control Cell ◽ Genomic Data ◽ Regulatory Genes ◽ Genomic Research ◽ Open Chromatin ◽ Data Set ◽ Better Than

Abstract Recent advances in genomic technologies have generated data on large-scale protein–DNA interactions and open chromatin regions for many eukaryotic species. How to identify condition-specific functions of transcription factors using these data has become a major challenge in genomic research. To solve this problem, we have developed a method called ConSReg, which provides a novel approach to integrate regulatory genomic data into predictive machine learning models of key regulatory genes. Using Arabidopsis as a model system, we tested our approach to identify regulatory genes in data sets from single cell gene expression and from abiotic stress treatments. Our results showed that ConSReg accurately predicted transcription factors that regulate differentially expressed genes with an average auROC of 0.84, which is 23.5–25% better than enrichment-based approaches. To further validate the performance of ConSReg, we analyzed an independent data set related to plant nitrogen responses. ConSReg provided better rankings of the correct transcription factors in 61.7% of cases, which is three times better than other plant tools. We applied ConSReg to Arabidopsis single cell RNA-seq data, successfully identifying candidate regulatory genes that control cell wall formation. Our methods provide a new approach to define candidate regulatory genes using integrated genomic data in plants.
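The auROC figure quoted in the abstract measures how often a model ranks a true regulator above a non-regulator. The sketch below computes that quantity directly from its pairwise definition; it is a generic illustration, not ConSReg code, and the scores and labels are made-up values for hypothetical transcription factors.

```python
def auroc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive (label 1)
    receives a higher score than a randomly chosen negative (label 0),
    counting ties as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical model scores for four candidate transcription factors,
# where label 1 marks a known regulator and 0 a non-regulator.
scores = [0.9, 0.2, 0.8, 0.1]
labels = [1, 1, 0, 0]
result = auroc(scores, labels)
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, so the reported average of 0.84 sits well above chance.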
