scholarly journals Deep feature extraction of single-cell transcriptomes by generative adversarial network

Author(s):  
Mojtaba Bahrami ◽  
Malosree Maitra ◽  
Corina Nagy ◽  
Gustavo Turecki ◽  
Hamid R Rabiee ◽  
...  

Abstract Motivation Single-cell RNA-sequencing (scRNA-seq) offers the opportunity to dissect heterogeneous cellular compositions and interrogate the cell-type-specific gene expression patterns across diverse conditions. However, batch effects such as laboratory conditions and individual-variability hinder their usage in cross-condition designs. Results Here, we present a single-cell Generative Adversarial Network (scGAN) to simultaneously acquire patterns from raw data while minimizing the confounding effect driven by technical artifacts or other factors inherent to the data. Specifically, scGAN models the data likelihood of the raw scRNA-seq counts by projecting each cell onto a latent embedding. Meanwhile, scGAN attempts to minimize the correlation between the latent embeddings and the batch labels across all cells. We demonstrate scGAN on three public scRNA-seq datasets and show that our method confers superior performance over the state-of-the-art methods in forming clusters of known cell types and identifying known psychiatric genes that are associated with major depressive disorder. Availabilityand implementation The scGAN code and the information for the public scRNA-seq datasets are available at https://github.com/li-lab-mcgill/singlecell-deepfeature. Supplementary information Supplementary data are available at Bioinformatics online.

2020 ◽  
Author(s):  
Mojtaba Bahrami ◽  
Malosree Maitra ◽  
Corina Nagy ◽  
Gustavo Turecki ◽  
Hamid R. Rabiee ◽  
...  

AbstractMotivationSingle-cell RNA-sequencing (scRNA-seq) has opened the opportunities to dissect the heterogeneous cellular composition and interrogate the cell-type-specific gene expression patterns across diverse conditions. However, batch effects such as laboratory conditions and individual-variability hinder their usage in cross-condition design.ResultsWe present single-cell Generative Adversarial Network (scGAN). Our main contribution is to introduce an adversarial network to predict batch effects using the embeddings from the variational autoencoder network, which does not only need to maximize the Negative Binomial data likelihood of the raw scRNA-seq counts but also minimize the correlation between the latent embeddings and the batch effects. We demonstrate scGAN on three public scRNA-seq datasets and show that our method confers superior performance over the state-of-the-art methods in forming clusters of known cell types and identifying known psychiatric genes that are associated with major depressive disorder.AvailabilityThe code is available at https://github.com/li-lab-mcgill/[email protected]


2020 ◽  
Author(s):  
Michael C. Dimmick ◽  
Leo J. Lee ◽  
Brendan J. Frey

AbstractMotivationHi-C data has enabled the genome-wide study of chromatin folding and architecture, and has led to important discoveries in the structure and function of chromatin conformation. Here, high resolution data plays a particularly important role as many chromatin substructures such as Topologically Associating Domains (TADs) and chromatin loops cannot be adequately studied with low resolution contact maps. However, the high sequencing costs associated with the generation of high resolution Hi-C data has become an experimental barrier. Data driven machine learning models, which allow low resolution Hi-C data to be computationally enhanced, offer a promising avenue to address this challenge.ResultsBy carefully examining the properties of Hi-C maps and integrating various recent advances in deep learning, we developed a Hi-C Super-Resolution (HiCSR) framework capable of accurately recovering the fine details, textures, and substructures found in high resolution contact maps. This was achieved using a novel loss function tailored to the Hi-C enhancement problem which optimizes for an adversarial loss from a Generative Adversarial Network (GAN), a feature reconstruction loss derived from the latent representation of a denoising autoencoder, and a pixel-wise loss. Not only can the resulting framework generate enhanced Hi-C maps more visually similar to the original high resolution maps, it also excels on a suite of reproducibility metrics produced by members of the ENCODE Consortium compared to existing approaches, including HiCPlus, HiCNN, hicGAN and DeepHiC. Finally, we demonstrate that HiCSR is capable of enhancing Hi-C data across sequencing depth, cell types, and species, recovering biologically significant contact domain boundaries.AvailabilityWe make our implementation available for download at: https://github.com/PSI-Lab/[email protected] informationAvailable Online


2017 ◽  
Author(s):  
Garth R. Ilsley ◽  
Ritsuko Suyama ◽  
Takeshi Noda ◽  
Nori Satoh ◽  
Nicholas M. Luscombe

AbstractSingle-cell RNA-seq has been established as a reliable and accessible technique enabling new types of analyses, such as identifying cell types and studying spatial and temporal gene expression variation and change at single-cell resolution. Recently, single-cell RNA-seq has been applied to developing embryos, which offers great potential for finding and characterising genes controlling the course of development along with their expression patterns. In this study, we applied single-cell RNA-seq to the 16-cell stage of the Ciona embryo, a marine chordate and performed a computational search for cell-specific gene expression patterns. We recovered many known expression patterns from our single-cell RNA-seq data and despite extensive previous screens, we succeeded in finding new cell-specific patterns, which we validated by in situ and single-cell qPCR.


2018 ◽  
Author(s):  
Michael L. Mucenski ◽  
Robert Mahoney ◽  
Mike Adam ◽  
Andrew S. Potter ◽  
S. Steven Potter

AbstractThe uterus is a remarkable organ that must guard against infections while maintaining the ability to support growth of a fetus without rejection. The Hoxa10 and Hoxa11 genes have previously been shown to play essential roles in uterus development and function. In this report we show that the Hoxc9,10,11 genes play a redundant role in the formation of uterine glands. In addition, we use single cell RNA-seq to create a high resolution gene expression atlas of the developing wild type mouse uterus. Cell types and subtypes are defined, for example dividing endothelial cells into arterial, venous, capillary, and lymphatic, while epithelial cells separate into luminal and glandular subtypes. Further, a surprising heterogeneity of stromal and myocyte cell types are identified. Transcription factor codes and ligand/receptor interactions are characterized. We also used single cell RNA-seq to globally define the altered gene expression patterns in all developing uterus cell types for two Hox mutants, with 8 or 9 mutant Hox genes. The mutants show a striking disruption of Wnt signaling as well as the Cxcl12/Cxcr4 ligand/receptor axis.Summary statementA single cell RNA-seq study of the developing mouse uterus defines cellular heterogeneities, lineage specific gene expression programs and perturbed pathways in Hox9,10,11 mutants.


2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Tatyana Dobreva ◽  
David Brown ◽  
Jong Hwee Park ◽  
Matt Thomson

AbstractAn individual’s immune system is driven by both genetic and environmental factors that vary over time. To better understand the temporal and inter-individual variability of gene expression within distinct immune cell types, we developed a platform that leverages multiplexed single-cell sequencing and out-of-clinic capillary blood extraction to enable simplified, cost-effective profiling of the human immune system across people and time at single-cell resolution. Using the platform, we detect widespread differences in cell type-specific gene expression between subjects that are stable over multiple days.


2020 ◽  
Author(s):  
Shunfu Mao ◽  
Yue Zhang ◽  
Georg Seelig ◽  
Sreeram Kannan

AbstractSingle-cell RNA sequencing (scRNA-seq) is widely used for analyzing gene expression in multi-cellular systems and provides unprecedented access to cellular heterogeneity. scRNA-seq experiments aim to identify and quantify all cell types present in a sample. Measured single-cell transcriptomes are grouped by similarity and the resulting clusters are mapped to cell types based on cluster-specific gene expression patterns. While the process of generating clusters has become largely automated, annotation remains a laborious ad-hoc effort that requires expert biological knowledge. Here, we introduce CellMeSH - a new automated approach to identifying cell types based on prior literature. CellMeSH combines a database of gene-cell type associations with a probabilistic method for database querying. The database is constructed by automatically linking gene and cell type information from millions of publications using existing indexed literature resources. Compared to manually constructed databases, CellMeSH is more comprehensive and scales automatically. The probabilistic query method enables reliable information retrieval even though the gene-cell type associations extracted from the literature are necessarily noisy. CellMeSH achieves up to 60% top-1 accuracy and 90% top-3 accuracy in annotating the cell types on a human dataset, and up to 58.8% top-1 accuracy and 88.2% top-3 accuracy on three mouse datasets, which is consistently better than existing approaches.AvailabilityWeb server: https://uncurl.cs.washington.edu/db_query and API: https://github.com/shunfumao/cellmesh


2021 ◽  
Vol 22 (S3) ◽  
Author(s):  
Yuanyuan Li ◽  
Ping Luo ◽  
Yi Lu ◽  
Fang-Xiang Wu

Abstract Background With the development of the technology of single-cell sequence, revealing homogeneity and heterogeneity between cells has become a new area of computational systems biology research. However, the clustering of cell types becomes more complex with the mutual penetration between different types of cells and the instability of gene expression. One way of overcoming this problem is to group similar, related single cells together by the means of various clustering analysis methods. Although some methods such as spectral clustering can do well in the identification of cell types, they only consider the similarities between cells and ignore the influence of dissimilarities on clustering results. This methodology may limit the performance of most of the conventional clustering algorithms for the identification of clusters, it needs to develop special methods for high-dimensional sparse categorical data. Results Inspired by the phenomenon that same type cells have similar gene expression patterns, but different types of cells evoke dissimilar gene expression patterns, we improve the existing spectral clustering method for clustering single-cell data that is based on both similarities and dissimilarities between cells. The method first measures the similarity/dissimilarity among cells, then constructs the incidence matrix by fusing similarity matrix with dissimilarity matrix, and, finally, uses the eigenvalues of the incidence matrix to perform dimensionality reduction and employs the K-means algorithm in the low dimensional space to achieve clustering. The proposed improved spectral clustering method is compared with the conventional spectral clustering method in recognizing cell types on several real single-cell RNA-seq datasets. Conclusions In summary, we show that adding intercellular dissimilarity can effectively improve accuracy and achieve robustness and that improved spectral clustering method outperforms the traditional spectral clustering method in grouping cells.


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Rongxin Fang ◽  
Sebastian Preissl ◽  
Yang Li ◽  
Xiaomeng Hou ◽  
Jacinta Lucero ◽  
...  

AbstractIdentification of the cis-regulatory elements controlling cell-type specific gene expression patterns is essential for understanding the origin of cellular diversity. Conventional assays to map regulatory elements via open chromatin analysis of primary tissues is hindered by sample heterogeneity. Single cell analysis of accessible chromatin (scATAC-seq) can overcome this limitation. However, the high-level noise of each single cell profile and the large volume of data pose unique computational challenges. Here, we introduce SnapATAC, a software package for analyzing scATAC-seq datasets. SnapATAC dissects cellular heterogeneity in an unbiased manner and map the trajectories of cellular states. Using the Nyström method, SnapATAC can process data from up to a million cells. Furthermore, SnapATAC incorporates existing tools into a comprehensive package for analyzing single cell ATAC-seq dataset. As demonstration of its utility, SnapATAC is applied to 55,592 single-nucleus ATAC-seq profiles from the mouse secondary motor cortex. The analysis reveals ~370,000 candidate regulatory elements in 31 distinct cell populations in this brain region and inferred candidate cell-type specific transcriptional regulators.


Author(s):  
Yixuan Qiu ◽  
Jiebiao Wang ◽  
Jing Lei ◽  
Kathryn Roeder

Abstract Motivation Marker genes, defined as genes that are expressed primarily in a single cell type, can be identified from the single cell transcriptome; however, such data are not always available for the many uses of marker genes, such as deconvolution of bulk tissue. Marker genes for a cell type, however, are highly correlated in bulk data, because their expression levels depend primarily on the proportion of that cell type in the samples. Therefore, when many tissue samples are analyzed, it is possible to identify these marker genes from the correlation pattern. Results To capitalize on this pattern, we develop a new algorithm to detect marker genes by combining published information about likely marker genes with bulk transcriptome data in the form of a semi-supervised algorithm. The algorithm then exploits the correlation structure of the bulk data to refine the published marker genes by adding or removing genes from the list. Availability and implementation We implement this method as an R package markerpen, hosted on CRAN (https://CRAN.R-project.org/package=markerpen). Supplementary information Supplementary data are available at Bioinformatics online.


2020 ◽  
Author(s):  
Tatyana Dobreva ◽  
David Brown ◽  
Jong Hwee Park ◽  
Matt Thomson

AbstractAn individual’s immune system is driven by both genetic and environmental factors that vary over time. To better understand the temporal and inter-individual variability of gene expression within distinct immune cell types, we developed a platform that leverages multiplexed single-cell sequencing and out-of-clinic capillary blood extraction to enable simplified, cost-effective profiling of the human immune system across people and time at single-cell resolution. Using the platform, we detect widespread differences in cell type-specific gene expression between subjects that are stable over multiple days.SummaryIncreasing evidence implicates the immune system in an overwhelming number of diseases, and distinct cell types play specific roles in their pathogenesis.1,2 Studies of peripheral blood have uncovered a wealth of associations between gene expression, environmental factors, disease risk, and therapeutic efficacy.4 For example, in rheumatoid arthritis, multiple mechanistic paths have been found that lead to disease, and gene expression of specific immune cell types can be used as a predictor of therapeutic non-response.12 Furthermore, vaccines, drugs, and chemotherapy have been shown to yield different efficacy based on time of administration, and such findings have been linked to the time-dependence of gene expression in downstream pathways.21,22,23 However, human immune studies of gene expression between individuals and across time remain limited to a few cell types or time points per subject, constraining our understanding of how networks of heterogeneous cells making up each individual’s immune system respond to adverse events and change over time.


Sign in / Sign up

Export Citation Format

Share Document