Dhaka: variational autoencoder for unmasking tumor heterogeneity from single cell genomic data

Author(s):  
Sabrina Rashid ◽  
Sohrab Shah ◽  
Ziv Bar-Joseph ◽  
Ravi Pandya
2017 ◽  

Abstract Motivation Intra-tumor heterogeneity is one of the key confounding factors in deciphering tumor evolution. Malignant cells exhibit variations in their gene expression, copy numbers, and mutations even when originating from a single progenitor cell. Single cell sequencing of tumor cells has recently emerged as a viable option for unmasking the underlying tumor heterogeneity. However, extracting features from single cell genomic data in order to infer their evolutionary trajectory remains computationally challenging due to the extremely noisy and sparse nature of the data. Results Here we describe ‘Dhaka’, a variational autoencoder method which transforms single cell genomic data to a reduced dimension feature space that is more efficient in differentiating between (hidden) tumor subpopulations. Our method is general and can be applied to several different types of genomic data, including copy number variation from scDNA-Seq and gene expression from scRNA-Seq experiments. We tested the method on synthetic data and 6 single cell cancer datasets where the number of cells ranges from 250 to 6000 for each sample. Analysis of the resulting feature space revealed subpopulations of cells and their marker genes. The features can also be used to infer the lineage and/or differentiation trajectory between cells, greatly improving upon prior methods suggested for feature extraction and dimensionality reduction of such data. Availability and Implementation All the datasets used in the paper are publicly available, and the developed software package is available on GitHub (https://github.com/MicrosoftGenomics/Dhaka). Supporting info and Software: https://github.com/MicrosoftGenomics/Dhaka
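To make the idea of a variational autoencoder mapping cells to a reduced feature space more concrete, here is a minimal NumPy sketch of a single forward pass and its loss. It is not the Dhaka implementation: the layer sizes, the three-dimensional latent space, the squared-error reconstruction term, the log-transformed Poisson toy data, and every function name are assumptions chosen for illustration, and no training loop is included.

```python
import numpy as np


def init_params(n_features, n_hidden, n_latent, rng):
    """Small random weights for a one-hidden-layer encoder and decoder (illustrative sizes)."""
    def weights(rows, cols):
        return rng.normal(0.0, 0.1, size=(rows, cols))

    return {
        "enc_h": weights(n_features, n_hidden),
        "enc_mu": weights(n_hidden, n_latent),
        "enc_logvar": weights(n_hidden, n_latent),
        "dec_h": weights(n_latent, n_hidden),
        "dec_out": weights(n_hidden, n_features),
    }


def encode(x, p):
    """Map each cell to the mean and log-variance of a Gaussian in latent space."""
    h = np.maximum(x @ p["enc_h"], 0.0)  # ReLU hidden layer
    return h @ p["enc_mu"], h @ p["enc_logvar"]


def reparameterize(mu, logvar, rng):
    """Sample z = mu + sigma * eps so the sample stays differentiable in mu and sigma."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps


def decode(z, p):
    """Reconstruct the input profile from a latent sample."""
    h = np.maximum(z @ p["dec_h"], 0.0)
    return h @ p["dec_out"]


def negative_elbo(x, p, rng):
    """Return the mean per-cell loss (reconstruction + KL) and the latent means."""
    mu, logvar = encode(x, p)
    z = reparameterize(mu, logvar, rng)
    x_hat = decode(z, p)
    recon = np.sum((x - x_hat) ** 2, axis=1)
    # Closed-form KL divergence between N(mu, sigma^2) and the N(0, I) prior.
    kl = -0.5 * np.sum(1.0 + logvar - mu ** 2 - np.exp(logvar), axis=1)
    return float(np.mean(recon + kl)), mu


rng = np.random.default_rng(0)
# Toy stand-in for a cells-by-genes matrix: 250 cells, 50 features (made-up data).
cells = np.log1p(rng.poisson(1.0, size=(250, 50)).astype(float))
params = init_params(n_features=50, n_hidden=32, n_latent=3, rng=rng)
loss, latent = negative_elbo(cells, params, rng)
print(latent.shape)  # (250, 3)
```

In a sketch like this, the latent means would serve as the reduced feature space in which subpopulations are then sought, for example by clustering; a real model would first minimise the loss with a gradient-based optimiser.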


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Marilisa Montemurro ◽  
Elena Grassi ◽  
Carmelo Gabriele Pizzino ◽  
Andrea Bertotti ◽  
Elisa Ficarra ◽  
...  

Abstract Background Tumors are composed of a number of cancer cell subpopulations (subclones), characterized by a distinguishable set of mutations. This phenomenon, known as intra-tumor heterogeneity (ITH), may be studied using Copy Number Aberrations (CNAs). Nowadays ITH can be assessed at the highest possible resolution using single-cell DNA (scDNA) sequencing technology. Additionally, single-cell CNA (scCNA) profiles from multiple samples of the same tumor can in principle be exploited to study the spatial distribution of subclones within a tumor mass. However, since the technology required to generate large scDNA sequencing datasets is relatively recent, dedicated analytical approaches are still lacking. Results We present PhyliCS, the first tool that exploits scCNA data from multiple samples of the same tumor to estimate whether the different clones of a tumor are well mixed or spatially separated. Starting from the CNA data produced with third party instruments, it computes a score, the Spatial Heterogeneity score, aimed at distinguishing spatially intermixed cell populations from spatially segregated ones. Additionally, it provides functionalities to facilitate scDNA analysis, such as feature selection and dimensionality reduction methods, visualization tools and a flexible clustering module. Conclusions PhyliCS represents a valuable instrument to explore the extent of spatial heterogeneity in multi-regional tumour sampling, exploiting the potential of scCNA data.


Author(s):  
Cong He ◽  
Luoyan Sheng ◽  
Deshen Pan ◽  
Shuai Jiang ◽  
Li Ding ◽  
...  

High-grade glioma is one of the most lethal human cancers characterized by extensive tumor heterogeneity. In order to identify cellular and molecular mechanisms that drive tumor heterogeneity of this lethal disease, we performed single-cell RNA sequencing analysis of one high-grade glioma. Accordingly, we analyzed the individual cellular components in the ecosystem of this tumor. We found that tumor-associated macrophages are predominant in the immune microenvironment. Furthermore, we identified five distinct subpopulations of tumor cells, including one cycling, two OPC/NPC-like and two MES-like cell subpopulations. Moreover, we revealed the evolutionary transition from the cycling to OPC/NPC-like and MES-like cells by trajectory analysis. Importantly, we found that SPP1/CD44 interaction plays a critical role in macrophage-mediated activation of MES-like cells by exploring the cell-cell communication among all cellular components in the tumor ecosystem. Finally, we showed that high expression levels of both SPP1 and CD44 correlate with an increased infiltration of macrophages and poor prognosis of glioma patients. Taken together, this study provided a single-cell atlas of one high-grade glioma and revealed a critical role of macrophage-mediated SPP1/CD44 signaling in glioma progression, indicating that the SPP1/CD44 axis is a potential target for glioma treatment.


2019 ◽  
Author(s):  
Emily F. Davis-Marcisak ◽  
Pranay Orugunta ◽  
Genevieve Stein-O'Brien ◽  
Sidharth V. Puram ◽  
Evanthia Roussos Torres ◽  
...  

2021 ◽  
Author(s):  
Luke Ternes ◽  
Mark Dane ◽  
Marilyne Labrie ◽  
Gordon Mills ◽  
Joe Gray ◽  
...  

Abstract Image-based cell phenotyping relies on quantitative measurements as encoded representations of cells; however, defining suitable representations that capture complex imaging features is challenging since there are many obstacles, including segmentation and identifying subcellular compartments for feature extraction. Variational autoencoder (VAE) approaches produce encouraging results by mapping from an image to a representative descriptor, and outperform classical hand-crafted features for morphology, intensity, and texture at differentiating data. Although VAEs show promising results for capturing morphological and organizational features in tissue, single cell image analyses based on VAEs often fail to identify biologically informative features due to the intrinsic amount of uninformative variability. Herein, we propose a multi-encoder VAE (ME-VAE) in single cell image analysis using transformed images as a self-supervised signal to extract transform-invariant biologically meaningful features. We show that the proposed architecture improves analysis by making distinct populations more separable compared to traditional VAEs and intensity measurements by enhancing phenotypic differences between cells and by improving correlations to other modalities.


2020 ◽  
Vol 7 ◽  
Author(s):  
Xiao Chen ◽  
Chundi Wang ◽  
Bo Pan ◽  
Borong Lu ◽  
Chao Li ◽  
...  

Peritrichs are one of the largest groups of ciliates with over 1,000 species described so far. However, their genomic features are largely unknown. By single-cell genomic sequencing, we acquired the genomic data of three sessilid peritrichs (Cothurnia ceramicola, Vaginicola sp., and Zoothamnium sp. 2). Using genomic data from another 53 ciliates including 14 peritrichs, we reconstructed their evolutionary relationships and confirmed genome skimming as an efficient approach for expanding sampling. In addition, we profiled the stop codon usage and programmed ribosomal frameshifting (PRF) events in peritrichs for the first time. Our analysis reveals no evidence of stop codon reassignment for peritrichs, but they have prevalent +1 or -1 PRF events. These genomic features are distinguishable from other ciliates, and our observations suggest a unique evolutionary strategy for peritrichs.


2020 ◽  
Vol 48 (11) ◽  
pp. e62-e62 ◽  
Author(s):  
Qi Song ◽  
Jiyoung Lee ◽  
Shamima Akter ◽  
Matthew Rogers ◽  
Ruth Grene ◽  
...  

Abstract Recent advances in genomic technologies have generated data on large-scale protein–DNA interactions and open chromatin regions for many eukaryotic species. How to identify condition-specific functions of transcription factors using these data has become a major challenge in genomic research. To solve this problem, we have developed a method called ConSReg, which provides a novel approach to integrate regulatory genomic data into predictive machine learning models of key regulatory genes. Using Arabidopsis as a model system, we tested our approach to identify regulatory genes in data sets from single cell gene expression and from abiotic stress treatments. Our results showed that ConSReg accurately predicted transcription factors that regulate differentially expressed genes with an average auROC of 0.84, which is 23.5–25% better than enrichment-based approaches. To further validate the performance of ConSReg, we analyzed an independent data set related to plant nitrogen responses. ConSReg provided better rankings of the correct transcription factors in 61.7% of cases, which is three times better than other plant tools. We applied ConSReg to Arabidopsis single cell RNA-seq data, successfully identifying candidate regulatory genes that control cell wall formation. Our methods provide a new approach to define candidate regulatory genes using integrated genomic data in plants.
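For readers unfamiliar with the auROC metric quoted above, the following sketch shows one common way to compute it: as the fraction of positive/negative pairs in which the positive item receives the higher score, with ties counted as one half. The function name, scores, and labels are made up for illustration and are not taken from ConSReg or its evaluation data.

```python
def auroc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # a tie counts as half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical ranking of six candidate regulators; label 1 marks a true regulator.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]
value = auroc(scores, labels)
print(round(value, 3))  # 0.889
```

A value of 0.5 corresponds to a random ranking and 1.0 to a perfect one, which is the scale on which the reported average of 0.84 should be read.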

