Deep feature extraction of single-cell transcriptomes by generative adversarial network

Bioinformatics

10.1093/bioinformatics/btaa976

2020

Author(s):

Mojtaba Bahrami

Malosree Maitra

Corina Nagy

Gustavo Turecki

Hamid R Rabiee

...

Keyword(s):

Single Cell

Expression Patterns

Cell Types

Individual Variability

Superior Performance

Supplementary Information

Specific Gene

Generative Adversarial Network

Adversarial Network

Deep Feature Extraction

Abstract. Motivation: Single-cell RNA-sequencing (scRNA-seq) offers the opportunity to dissect heterogeneous cellular compositions and interrogate cell-type-specific gene expression patterns across diverse conditions. However, batch effects such as laboratory conditions and individual variability hinder the use of these data in cross-condition designs. Results: Here, we present a single-cell Generative Adversarial Network (scGAN) that learns expression patterns from raw data while minimizing the confounding effects of technical artifacts or other factors inherent to the data. Specifically, scGAN models the data likelihood of the raw scRNA-seq counts by projecting each cell onto a latent embedding. At the same time, scGAN attempts to minimize the correlation between the latent embeddings and the batch labels across all cells. We demonstrate scGAN on three public scRNA-seq datasets and show that it outperforms state-of-the-art methods both in forming clusters of known cell types and in identifying known psychiatric genes associated with major depressive disorder. Availability and implementation: The scGAN code and information on the public scRNA-seq datasets are available at https://github.com/li-lab-mcgill/singlecell-deepfeature. Supplementary information: Supplementary data are available at Bioinformatics online.
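The abstract frames scGAN as a tug-of-war: the encoder should explain the counts well while leaving as little batch information as possible in the latent embedding. One common way to write such a min-max objective is sketched below in numpy. It assumes the reconstruction negative log-likelihood has already been computed and that a separate adversary outputs one logit per batch; the function names and the trade-off weight `lam` are hypothetical and not taken from the scGAN code.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    # Row-wise softmax; subtracting the row max keeps exp() from overflowing.
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def adversary_loss(adv_logits: np.ndarray, batch_labels: np.ndarray) -> float:
    # Mean cross-entropy of the adversary predicting each cell's batch
    # from its latent embedding. The adversary is trained to MINIMIZE this.
    probs = softmax(adv_logits)
    picked = probs[np.arange(batch_labels.shape[0]), batch_labels]
    return float(-np.mean(np.log(picked + 1e-12)))


def encoder_objective(recon_nll: float, adv_logits: np.ndarray,
                      batch_labels: np.ndarray, lam: float = 1.0) -> float:
    # The autoencoder MINIMIZES this: fit the counts (low recon_nll) while
    # making the adversary's job hard (high adversary_loss).
    # `lam` is a hypothetical weight balancing the two terms.
    return recon_nll - lam * adversary_loss(adv_logits, batch_labels)


# Usage: an adversary that outputs identical logits for two batches is at
# chance level, so its loss is log(2) and the encoder is rewarded by that amount.
labels = np.array([0, 1, 0, 1])
uninformative = np.zeros((4, 2))
print(adversary_loss(uninformative, labels))
print(encoder_objective(5.0, uninformative, labels, lam=1.0))
```

In practice the two objectives are optimized alternately (or through a gradient-reversal layer); the sketch only shows how the two losses relate.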


Deep feature extraction of single-cell transcriptomes by generative adversarial network

10.1101/2020.04.29.066464

2020

Author(s):

Mojtaba Bahrami

Malosree Maitra

Corina Nagy

Gustavo Turecki

Hamid R. Rabiee

...

Keyword(s):

Single Cell

Negative Binomial

Expression Patterns

Cell Types

Individual Variability

Superior Performance

Specific Gene

Batch Effects

Generative Adversarial Network

Adversarial Network

Abstract. Motivation: Single-cell RNA-sequencing (scRNA-seq) has opened opportunities to dissect heterogeneous cellular compositions and interrogate cell-type-specific gene expression patterns across diverse conditions. However, batch effects such as laboratory conditions and individual variability hinder the use of these data in cross-condition designs. Results: We present the single-cell Generative Adversarial Network (scGAN). Our main contribution is an adversarial network that predicts batch effects from the embeddings of a variational autoencoder; the autoencoder must therefore not only maximize the Negative Binomial likelihood of the raw scRNA-seq counts but also minimize the correlation between the latent embeddings and the batch effects. We demonstrate scGAN on three public scRNA-seq datasets and show that it outperforms state-of-the-art methods both in forming clusters of known cell types and in identifying known psychiatric genes associated with major depressive disorder. Availability: The code is available at https://github.com/li-lab-mcgill/…
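For reference, the Negative Binomial log-likelihood named here can be evaluated per gene with the mean/inverse-dispersion parameterization that is common in scRNA-seq autoencoders. Whether scGAN uses exactly this parameterization is an assumption; the sketch only illustrates the formula and the function names are hypothetical.

```python
import math
from typing import Sequence


def nb_log_likelihood(x: int, mu: float, theta: float) -> float:
    """log P(x) for a Negative Binomial with mean mu and inverse dispersion theta.

    Var(x) = mu + mu**2 / theta, so a small theta means strong overdispersion.
    """
    if x < 0 or mu <= 0 or theta <= 0:
        raise ValueError("require x >= 0, mu > 0 and theta > 0")
    log_theta_mu = math.log(theta + mu)
    return (
        math.lgamma(x + theta) - math.lgamma(theta) - math.lgamma(x + 1)
        + theta * (math.log(theta) - log_theta_mu)
        + x * (math.log(mu) - log_theta_mu)
    )


def cell_negative_log_likelihood(counts: Sequence[int], means: Sequence[float],
                                 thetas: Sequence[float]) -> float:
    # Genes are treated as conditionally independent given the decoder output,
    # so the cell's log-likelihood is a sum over genes.
    return -sum(nb_log_likelihood(x, m, t) for x, m, t in zip(counts, means, thetas))


# Usage: the reconstruction loss for one cell with three genes.
print(cell_negative_log_likelihood([0, 3, 12], [0.5, 2.0, 10.0], [1.0, 5.0, 2.0]))
```

Working in log space with `lgamma` avoids the overflow that evaluating the Gamma functions directly would cause for large counts.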


HiCSR: a Hi-C super-resolution framework for producing highly realistic contact maps

10.1101/2020.02.24.961714

2020

Author(s):

Michael C. Dimmick

Leo J. Lee

Brendan J. Frey

Keyword(s):

High Resolution

Super Resolution

Cell Types

Supplementary Information

Low Resolution

Generative Adversarial Network

High Resolution Data

Contact Maps

Adversarial Network

And Function

Abstract. Motivation: Hi-C data has enabled the genome-wide study of chromatin folding and architecture, and has led to important discoveries about the structure and function of chromatin conformation. High-resolution data play a particularly important role here, because many chromatin substructures, such as Topologically Associating Domains (TADs) and chromatin loops, cannot be adequately studied with low-resolution contact maps. However, the high sequencing cost of generating high-resolution Hi-C data has become an experimental barrier. Data-driven machine learning models, which computationally enhance low-resolution Hi-C data, offer a promising way to address this challenge. Results: By carefully examining the properties of Hi-C maps and integrating various recent advances in deep learning, we developed a Hi-C Super-Resolution (HiCSR) framework capable of accurately recovering the fine details, textures, and substructures found in high-resolution contact maps. This was achieved with a novel loss function tailored to the Hi-C enhancement problem, which combines an adversarial loss from a Generative Adversarial Network (GAN), a feature reconstruction loss derived from the latent representation of a denoising autoencoder, and a pixel-wise loss. The resulting framework not only generates enhanced Hi-C maps that are more visually similar to the original high-resolution maps, but also outperforms existing approaches, including HiCPlus, HiCNN, hicGAN and DeepHiC, on a suite of reproducibility metrics developed by members of the ENCODE Consortium. Finally, we demonstrate that HiCSR can enhance Hi-C data across sequencing depths, cell types, and species, recovering biologically significant contact domain boundaries. Availability: Our implementation is available for download at https://github.com/PSI-Lab/… Supplementary information: Available online.
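The three-part loss described above can be sketched as a weighted sum. This numpy version is illustrative only: the L1 pixel term, the mean-squared feature distance, the non-saturating adversarial term and the weights `w_adv`, `w_feat` and `w_pix` are assumptions rather than HiCSR's published settings, and `encoder` is a hypothetical stand-in for the pretrained denoising autoencoder.

```python
from typing import Callable

import numpy as np


def pixel_loss(pred: np.ndarray, target: np.ndarray) -> float:
    # Mean absolute (L1) error over every contact-map entry (assumed choice).
    return float(np.mean(np.abs(pred - target)))


def feature_loss(pred: np.ndarray, target: np.ndarray,
                 encoder: Callable[[np.ndarray], np.ndarray]) -> float:
    # Compare the two maps in the latent space of a frozen encoder
    # instead of in pixel space.
    return float(np.mean((encoder(pred) - encoder(target)) ** 2))


def adversarial_loss(disc_prob_fake: np.ndarray) -> float:
    # Non-saturating generator loss -log D(G(x)): small when the discriminator
    # believes the enhanced maps are real high-resolution maps.
    return float(-np.mean(np.log(np.clip(disc_prob_fake, 1e-12, 1.0))))


def combined_loss(pred: np.ndarray, target: np.ndarray, disc_prob_fake: np.ndarray,
                  encoder: Callable[[np.ndarray], np.ndarray],
                  w_adv: float = 1.0, w_feat: float = 1.0, w_pix: float = 1.0) -> float:
    # Hypothetical weights; the abstract does not state how the terms are balanced.
    return (w_adv * adversarial_loss(disc_prob_fake)
            + w_feat * feature_loss(pred, target, encoder)
            + w_pix * pixel_loss(pred, target))


# Usage with a toy "encoder" that summarizes each map by its column sums.
def column_sums(m: np.ndarray) -> np.ndarray:
    return m.sum(axis=0)


low_res_guess = np.zeros((4, 4))
high_res_truth = np.ones((4, 4))
print(combined_loss(low_res_guess, high_res_truth, np.array([0.5]), column_sums))
```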


Finding cell-specific expression patterns in the early Ciona embryo with single-cell RNA-seq

10.1101/197699

2017

Author(s):

Garth R. Ilsley

Ritsuko Suyama

Takeshi Noda

Nori Satoh

Nicholas M. Luscombe

Keyword(s):

Gene Expression

Single Cell

Expression Patterns

Cell Types

Specific Gene

Rna Seq

Cell Stage

Specific Expression

Temporal Gene Expression

Abstract: Single-cell RNA-seq has become a reliable and accessible technique that enables new types of analysis, such as identifying cell types and studying spatial and temporal variation in gene expression at single-cell resolution. Recently, single-cell RNA-seq has been applied to developing embryos, which offers great potential for finding and characterising the genes that control the course of development, together with their expression patterns. In this study, we applied single-cell RNA-seq to the 16-cell stage embryo of Ciona, a marine chordate, and performed a computational search for cell-specific gene expression patterns. We recovered many known expression patterns from our single-cell RNA-seq data and, despite extensive previous screens, found new cell-specific patterns, which we validated by in situ hybridisation and single-cell qPCR.


Single cell RNA-seq study of wild type and Hox9,10,11 mutant developing uterus

10.1101/395574

2018

Cited By ~ 1

Author(s):

Michael L. Mucenski

Robert Mahoney

Mike Adam

Andrew S. Potter

S. Steven Potter

Keyword(s):

Gene Expression

Single Cell

Hox Genes

Expression Patterns

Wild Type Mouse

Cell Types

Specific Gene

Rna Seq

Wild Type

Mouse Uterus

Abstract: The uterus is a remarkable organ that must guard against infection while supporting the growth of a fetus without rejecting it. The Hoxa10 and Hoxa11 genes have previously been shown to play essential roles in uterine development and function. In this report, we show that the Hoxc9,10,11 genes play a redundant role in the formation of uterine glands. In addition, we use single cell RNA-seq to create a high-resolution gene expression atlas of the developing wild type mouse uterus. Cell types and subtypes are defined; for example, endothelial cells are divided into arterial, venous, capillary, and lymphatic subtypes, while epithelial cells separate into luminal and glandular subtypes. Further, a surprising heterogeneity of stromal and myocyte cell types is identified. Transcription factor codes and ligand/receptor interactions are characterized. We also used single cell RNA-seq to globally define the altered gene expression patterns in all developing uterine cell types for two Hox mutants, with 8 or 9 mutant Hox genes. The mutants show a striking disruption of Wnt signaling as well as of the Cxcl12/Cxcr4 ligand/receptor axis.

Summary statement: A single cell RNA-seq study of the developing mouse uterus defines cellular heterogeneities, lineage-specific gene expression programs, and perturbed pathways in Hox9,10,11 mutants.


Single cell profiling of capillary blood enables out of clinic human immunity studies

Scientific Reports

10.1038/s41598-020-77073-3

2020

Vol 10 (1)

Author(s):

Tatyana Dobreva

David Brown

Jong Hwee Park

Matt Thomson

Keyword(s):

Gene Expression

Immune System

Single Cell

Immune Cell

Cost Effective

Cell Types

Individual Variability

Capillary Blood

Specific Gene

Human Immune System


CellMeSH: Probabilistic Cell-Type Identification Using Indexed Literature

10.1101/2020.05.29.124743

2020

Author(s):

Shunfu Mao

Yue Zhang

Georg Seelig

Sreeram Kannan

Keyword(s):

Gene Expression

Single Cell

Probabilistic Method

Expression Patterns

Cell Types

Cellular Systems

Biological Knowledge

Specific Gene

Cell Type

Link Type

Abstract: Single-cell RNA sequencing (scRNA-seq) is widely used to analyze gene expression in multicellular systems and provides unprecedented access to cellular heterogeneity. scRNA-seq experiments aim to identify and quantify all cell types present in a sample. Measured single-cell transcriptomes are grouped by similarity, and the resulting clusters are mapped to cell types based on cluster-specific gene expression patterns. While cluster generation has become largely automated, annotation remains a laborious, ad hoc effort that requires expert biological knowledge. Here, we introduce CellMeSH, a new automated approach to identifying cell types based on prior literature. CellMeSH combines a database of gene-cell type associations with a probabilistic method for querying that database. The database is constructed by automatically linking gene and cell type information from millions of publications using existing indexed literature resources. Compared with manually constructed databases, CellMeSH is more comprehensive and scales automatically. The probabilistic query method enables reliable information retrieval even though the gene-cell type associations extracted from the literature are necessarily noisy. CellMeSH achieves up to 60% top-1 accuracy and 90% top-3 accuracy in annotating cell types on a human dataset, and up to 58.8% top-1 accuracy and 88.2% top-3 accuracy on three mouse datasets, consistently better than existing approaches. Availability: Web server: https://uncurl.cs.washington.edu/db_query and API: https://github.com/shunfumao/cellmesh
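The accuracies quoted above are top-k accuracies: an annotation counts as correct at top-k if the true cell type is among the k highest-ranked candidates returned for a query. A minimal sketch of that metric follows; whether CellMeSH scores queries per cluster or per cell is not stated in the abstract, and the cell-type rankings in the usage example are made up.

```python
from typing import Sequence


def top_k_accuracy(ranked_predictions: Sequence[Sequence[str]],
                   true_labels: Sequence[str], k: int) -> float:
    """Fraction of queries whose true label is among the first k ranked predictions."""
    if len(ranked_predictions) != len(true_labels):
        raise ValueError("need exactly one ranked list per true label")
    if not true_labels:
        raise ValueError("need at least one query")
    if k < 1:
        raise ValueError("k must be >= 1")
    hits = sum(truth in preds[:k] for preds, truth in zip(ranked_predictions, true_labels))
    return hits / len(true_labels)


# Usage: five annotated clusters, each with a ranked list of candidate cell types.
ranked = [
    ["B cell", "T cell", "NK cell"],
    ["T cell", "Monocyte", "B cell"],
    ["NK cell", "B cell", "T cell"],
    ["Monocyte", "NK cell", "T cell"],
    ["T cell", "B cell", "NK cell"],
]
truth = ["B cell", "B cell", "T cell", "Monocyte", "NK cell"]
print(top_k_accuracy(ranked, truth, k=1), top_k_accuracy(ranked, truth, k=3))  # prints 0.4 1.0
```

Top-k accuracy can only grow with k, which is why the top-3 figures in the abstract exceed the top-1 figures.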


Identifying cell types from single-cell data based on similarities and dissimilarities between cells

BMC Bioinformatics

10.1186/s12859-020-03873-z

2021

Vol 22 (S3)

Author(s):

Yuanyuan Li

Ping Luo

Yi Lu

Fang-Xiang Wu

Keyword(s):

Gene Expression

Single Cell

Spectral Clustering

Incidence Matrix

Expression Patterns

Cell Types

Clustering Method

Different Types

Cell Data

Spectral Clustering Method

Abstract. Background: With the development of single-cell sequencing technology, revealing the homogeneity and heterogeneity among cells has become a new area of computational systems biology research. However, clustering cells into types is complicated by the overlap between different cell types and by the instability of gene expression. One way to overcome this problem is to group similar, related single cells together using various clustering methods. Although some methods, such as spectral clustering, perform well at identifying cell types, they consider only the similarities between cells and ignore the influence of dissimilarities on the clustering results. This limitation may hold back the performance of most conventional clustering algorithms, so special methods are needed for high-dimensional, sparse categorical data. Results: Inspired by the observation that cells of the same type have similar gene expression patterns whereas cells of different types have dissimilar ones, we improve the existing spectral clustering method for single-cell data so that it uses both the similarities and the dissimilarities between cells. The method first measures the similarity and dissimilarity among cells, then constructs an incidence matrix by fusing the similarity matrix with the dissimilarity matrix, and finally uses the eigendecomposition of the incidence matrix for dimensionality reduction and applies the K-means algorithm in the low-dimensional space to obtain clusters. The improved spectral clustering method is compared with the conventional spectral clustering method at recognizing cell types on several real single-cell RNA-seq datasets.
Conclusions: In summary, we show that adding intercellular dissimilarity can effectively improve accuracy and robustness, and that the improved spectral clustering method outperforms the traditional spectral clustering method in grouping cells.
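The pipeline described in the Results (similarity and dissimilarity matrices, a fused matrix, a spectral embedding, then K-means) can be sketched in numpy as follows. The abstract does not give the similarity measure or the fusion formula, so the Gaussian kernel, the distance-based dissimilarity and the element-wise fusion `W = S * (1 - alpha * D)` are hypothetical choices made only to keep the affinity non-negative; the K-means step is a small hand-written version with deterministic initialization.

```python
import numpy as np


def fused_affinity(X: np.ndarray, sigma: float = 1.0, alpha: float = 0.5) -> np.ndarray:
    """Cell-by-cell affinity combining similarity and dissimilarity.

    HYPOTHETICAL fusion rule: W = S * (1 - alpha * D), where S is a Gaussian-kernel
    similarity and D is the Euclidean distance rescaled to [0, 1].
    With 0 <= alpha <= 1, W stays non-negative.
    """
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    S = np.exp(-sq / (2.0 * sigma ** 2))
    dist = np.sqrt(sq)
    D = dist / dist.max() if dist.max() > 0 else dist
    W = S * (1.0 - alpha * D)
    np.fill_diagonal(W, 0.0)
    return W


def spectral_embedding(W: np.ndarray, k: int) -> np.ndarray:
    # Symmetric normalization D^{-1/2} W D^{-1/2}, then the eigenvectors of its
    # k largest eigenvalues, with each row rescaled to unit length.
    deg = np.maximum(W.sum(axis=1), 1e-12)
    d = 1.0 / np.sqrt(deg)
    M = d[:, None] * W * d[None, :]
    _, vecs = np.linalg.eigh(M)  # eigenvalues come back in ascending order
    U = vecs[:, -k:]
    return U / np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)


def kmeans(Y: np.ndarray, k: int, n_iter: int = 100) -> np.ndarray:
    # Farthest-point initialization keeps the sketch deterministic.
    centers = [Y[0]]
    for _ in range(1, k):
        d2 = np.min([np.sum((Y - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(Y[np.argmax(d2)])
    centers = np.array(centers)
    labels = np.zeros(len(Y), dtype=int)
    for _ in range(n_iter):
        labels = np.argmin(((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1), axis=1)
        new = np.array([Y[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels


def cluster_cells(X: np.ndarray, k: int, sigma: float = 1.0, alpha: float = 0.5) -> np.ndarray:
    return kmeans(spectral_embedding(fused_affinity(X, sigma, alpha), k), k)


# Usage: two well-separated synthetic "cell types" in a 5-gene space.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, (20, 5)), rng.normal(5.0, 0.3, (20, 5))])
print(cluster_cells(X, k=2))
```

Setting `alpha=0` reduces the sketch to ordinary Gaussian-kernel spectral clustering, which makes the effect of the dissimilarity term easy to compare.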


Comprehensive analysis of single cell ATAC-seq data with SnapATAC

Nature Communications

10.1038/s41467-021-21583-9

2021

Vol 12 (1)

Author(s):

Rongxin Fang

Sebastian Preissl

Yang Li

Xiaomeng Hou

Jacinta Lucero

...

Keyword(s):

Single Cell

Single Cell Analysis

Expression Patterns

Regulatory Elements

Cellular Heterogeneity

Specific Gene

Open Chromatin

Cell Type

Process Data

Cell Type Specific

Abstract: Identifying the cis-regulatory elements that control cell-type-specific gene expression patterns is essential for understanding the origin of cellular diversity. Conventional assays that map regulatory elements through open chromatin analysis of primary tissues are hindered by sample heterogeneity. Single cell analysis of accessible chromatin (scATAC-seq) can overcome this limitation. However, the high level of noise in each single cell profile and the large volume of data pose unique computational challenges. Here, we introduce SnapATAC, a software package for analyzing scATAC-seq datasets. SnapATAC dissects cellular heterogeneity in an unbiased manner and maps the trajectories of cellular states. Using the Nyström method, SnapATAC can process data from up to a million cells. Furthermore, SnapATAC incorporates existing tools into a comprehensive package for analyzing single cell ATAC-seq datasets. As a demonstration of its utility, SnapATAC is applied to 55,592 single-nucleus ATAC-seq profiles from the mouse secondary motor cortex. The analysis reveals ~370,000 candidate regulatory elements in 31 distinct cell populations in this brain region and infers candidate cell-type-specific transcriptional regulators.
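The Nyström method mentioned here avoids building the full cell-by-cell similarity matrix: it computes similarities only between all cells and a small set of landmark cells, then extends the landmark eigenvectors to every cell. Below is a generic numpy sketch, assuming a Jaccard similarity between binarized cell-by-bin profiles and uniformly sampled landmarks; SnapATAC's own landmark selection and normalization may differ, and the function names are hypothetical.

```python
import numpy as np


def jaccard_similarity(A: np.ndarray, B: np.ndarray) -> np.ndarray:
    # A: (n, p), B: (m, p) binary matrices of cells x genomic bins.
    A = A.astype(float)
    B = B.astype(float)
    inter = A @ B.T
    union = A.sum(axis=1)[:, None] + B.sum(axis=1)[None, :] - inter
    return np.divide(inter, union, out=np.zeros_like(inter), where=union > 0)


def nystrom_embedding(X: np.ndarray, n_landmarks: int, n_components: int, seed: int = 0):
    rng = np.random.default_rng(seed)
    idx = rng.choice(X.shape[0], size=n_landmarks, replace=False)
    W = jaccard_similarity(X[idx], X[idx])  # landmark x landmark
    C = jaccard_similarity(X, X[idx])       # every cell x landmark
    vals, vecs = np.linalg.eigh(W)
    top = np.argsort(vals)[::-1][:n_components]
    vals, vecs = vals[top], vecs[:, top]
    # Nystrom extension: each cell's coordinates come from its similarity to the
    # landmarks, projected onto the landmark eigenvectors.
    return C @ vecs / vals, idx


def nystrom_similarity(X: np.ndarray, idx: np.ndarray) -> np.ndarray:
    # Low-rank approximation K ~= C W^+ C^T of the full similarity matrix.
    C = jaccard_similarity(X, X[idx])
    W = jaccard_similarity(X[idx], X[idx])
    return C @ np.linalg.pinv(W) @ C.T


# Usage: 500 synthetic cells, 1,000 bins, 50 landmarks.
rng = np.random.default_rng(1)
X = (rng.random((500, 1000)) < 0.05).astype(np.uint8)
emb, landmarks = nystrom_embedding(X, n_landmarks=50, n_components=5)
print(emb.shape)  # prints (500, 5)
```

The cost is dominated by the n-by-m matrix `C` rather than an n-by-n matrix, which is what makes landmark-based methods practical for very large numbers of cells.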


Identification of cell-type-specific marker genes from co-expression patterns in tissue samples

Bioinformatics

10.1093/bioinformatics/btab257

2021

Author(s):

Yixuan Qiu

Jiebiao Wang

Jing Lei

Kathryn Roeder

Keyword(s):

Single Cell

Expression Patterns

R Package

Supplementary Information

Marker Genes

Specific Marker

Cell Type

Correlation Pattern

Tissue Samples

Bulk Data

Abstract. Motivation: Marker genes, defined as genes expressed primarily in a single cell type, can be identified from single cell transcriptomes; however, such data are not always available for the many uses of marker genes, such as deconvolution of bulk tissue. Marker genes for a cell type are, however, highly correlated in bulk data, because their expression levels depend primarily on the proportion of that cell type in the samples. Therefore, when many tissue samples are analyzed, these marker genes can be identified from the correlation pattern. Results: To capitalize on this pattern, we develop a semi-supervised algorithm that detects marker genes by combining published information about likely marker genes with bulk transcriptome data. The algorithm exploits the correlation structure of the bulk data to refine the published marker gene list by adding or removing genes. Availability and implementation: We implement this method as an R package, markerpen, hosted on CRAN (https://CRAN.R-project.org/package=markerpen). Supplementary information: Supplementary data are available at Bioinformatics online.
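The key observation here, that markers of one cell type co-vary across bulk samples because they all track that cell type's proportion, can be illustrated with a deliberately simplified correlation filter. This is not markerpen's penalized semi-supervised algorithm; the function name, the median rule, the 0.8 threshold and the synthetic data are assumptions made for illustration.

```python
from typing import List

import numpy as np


def refine_markers(expr: np.ndarray, genes: List[str], seed: List[str],
                   threshold: float = 0.8) -> List[str]:
    """Keep genes whose median Pearson correlation with the seed markers is >= threshold.

    expr: (n_genes, n_samples) bulk expression matrix, rows aligned with `genes`.
    """
    index = {g: i for i, g in enumerate(genes)}
    seed_rows = [index[g] for g in seed if g in index]
    if len(seed_rows) < 2:
        raise ValueError("need at least two seed markers present in the data")
    R = np.corrcoef(expr)  # gene-by-gene correlation across bulk samples
    kept = []
    for g, i in index.items():
        others = [j for j in seed_rows if j != i]  # never score a gene against itself
        # The median tolerates a few wrong entries in the published seed list.
        if np.median(R[i, others]) >= threshold:
            kept.append(g)
    return kept


# Usage: synthetic bulk data in which genes A-D track one cell type's proportion.
rng = np.random.default_rng(0)
proportion = rng.uniform(0.1, 0.9, size=30)
expr = np.vstack([
    3.0 * proportion + rng.normal(0, 0.01, 30),  # A: seed, true marker
    5.0 * proportion + rng.normal(0, 0.01, 30),  # B: seed, true marker
    2.0 * proportion + rng.normal(0, 0.01, 30),  # C: seed, true marker
    4.0 * proportion + rng.normal(0, 0.01, 30),  # D: true marker missing from the seed
    rng.normal(1.0, 0.3, 30),                    # E: wrongly listed in the seed
    rng.normal(1.0, 0.3, 30),                    # F: unrelated gene
])
genes = ["A", "B", "C", "D", "E", "F"]
print(refine_markers(expr, genes, seed=["A", "B", "C", "E"]))
```

The filter both adds a gene (D) and removes one (E), mirroring the "adding or removing genes" behavior described above, but without the regularization a real method would need on noisy data.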


Enabling out-of-clinic human immunity studies via single-cell profiling of capillary blood

10.1101/2020.07.25.210468

2020

Author(s):

Tatyana Dobreva

David Brown

Jong Hwee Park

Matt Thomson

Keyword(s):

Gene Expression

Immune System

Environmental Factors

Single Cell

Immune Cell

Cell Types

Capillary Blood

Specific Gene

Human Immune

Over Time

Abstract: An individual's immune system is driven by both genetic and environmental factors that vary over time. To better understand the temporal and inter-individual variability of gene expression within distinct immune cell types, we developed a platform that combines multiplexed single-cell sequencing with out-of-clinic capillary blood extraction to enable simple, cost-effective profiling of the human immune system across people and time at single-cell resolution. Using the platform, we detect widespread differences in cell-type-specific gene expression between subjects that are stable over multiple days.

Summary: Increasing evidence implicates the immune system in an overwhelming number of diseases, and distinct cell types play specific roles in their pathogenesis [1,2]. Studies of peripheral blood have uncovered a wealth of associations between gene expression, environmental factors, disease risk, and therapeutic efficacy [4]. For example, in rheumatoid arthritis, multiple mechanistic paths that lead to disease have been found, and gene expression in specific immune cell types can be used to predict therapeutic non-response [12]. Furthermore, vaccines, drugs, and chemotherapy have been shown to differ in efficacy depending on the time of administration, and such findings have been linked to the time-dependence of gene expression in downstream pathways [21,22,23]. However, human immune studies of gene expression between individuals and across time remain limited to a few cell types or time points per subject, constraining our understanding of how the networks of heterogeneous cells that make up each individual's immune system respond to adverse events and change over time.
