Integrating Pathway Knowledge with Deep Neural Networks to Reduce the Dimensionality in Single-Cell RNA-Seq Data

10.21203/rs.3.rs-847372/v1, 2021

Author(s): Pelin Gundogdu, Carlos Loucera, Inmaculada Alamo-Alvarez, Joaquin Dopazo, Isabel Nepomuceno

Keyword(s): Neural Networks, Single Cell, Deep Neural Networks, Cell Types, Cellular Heterogeneity, Biological Information, Superior Performance, Biological Knowledge, Additional Advantage, Functional Relationships

Abstract:

Background: Single-cell RNA sequencing (scRNA-seq) data provide valuable insights into cellular heterogeneity, which is significantly improving current knowledge of biology and human disease. One of the main applications of scRNA-seq data analysis is the identification of new cell types and cell states. Deep neural networks (DNNs) are among the best methods to address this problem. However, this performance comes at the cost of interpretability. In this work we propose an intelligible pathway-driven neural network to correctly solve cell-type related problems at single-cell resolution while providing a biologically meaningful representation of the data.

Results: In this study, we explored deep neural networks constrained by several types of prior biological information, e.g. signaling pathway information, as a way to reduce the dimensionality of scRNA-seq data. We tested the proposed biologically based architectures on thousands of cells of human and mouse origin across a collection of public datasets in order to check the performance of the model. Specifically, we tested the architecture across different validation scenarios that try to mimic how unknown cell types are clustered by the DNN and how it correctly annotates cell types by querying a database in a retrieval problem. Moreover, our approach proved comparable to other, less interpretable DNN approaches constrained by protein-protein interaction and gene regulation data. Finally, we show how the latent structure learned by the network could be used to visualize and interpret the composition of human single-cell datasets.

Conclusions: Here we demonstrate how the integration of pathways, which convey fundamental information on functional relationships between genes, with DNNs, which provide an excellent classification framework, results in an excellent alternative for learning a biologically meaningful representation of scRNA-seq data. In addition, the introduction of prior biological knowledge in the DNN reduces the size of the network architecture. Comparative results demonstrate a superior performance of this approach with respect to other similar approaches. As an additional advantage, the use of pathways within the DNN structure enables easy interpretability of the results by connecting features to cell functionalities by means of the pathway nodes, as demonstrated with an example with human melanoma tumor cells.
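
One common way to constrain a network layer with pathway knowledge is to multiply its weight matrix by a binary gene-to-pathway membership mask, so that each hidden node only receives input from the genes of one pathway. The sketch below illustrates that general idea in NumPy; it is not taken from the paper. The gene names, pathway names, memberships, `build_mask`, and `pathway_layer` are all hypothetical, and the weights are random rather than trained.

```python
import numpy as np

# Hypothetical toy annotation (not real biology): genes grouped into pathways.
genes = ["gene_a", "gene_b", "gene_c", "gene_d", "gene_e"]
pathways = {
    "pathway_A": ["gene_a", "gene_b"],
    "pathway_B": ["gene_c", "gene_d", "gene_e"],
}


def build_mask(genes, pathways):
    """Binary (n_genes, n_pathways) matrix: 1 where a gene belongs to a pathway."""
    index = {g: i for i, g in enumerate(genes)}
    mask = np.zeros((len(genes), len(pathways)))
    for j, members in enumerate(pathways.values()):
        for g in members:
            mask[index[g], j] = 1.0
    return mask


def pathway_layer(x, weights, mask):
    """Dense layer whose weights are zeroed outside gene-pathway memberships."""
    return np.tanh(x @ (weights * mask))


rng = np.random.default_rng(0)
mask = build_mask(genes, pathways)
weights = rng.normal(size=mask.shape)         # untrained, for illustration only
cells = rng.normal(size=(4, len(genes)))      # 4 cells x 5 genes of toy expression
latent = pathway_layer(cells, weights, mask)  # 4 cells x 2 pathway nodes
```

Because masked weights are fixed at zero, each column of `latent` depends only on the genes of its own pathway. This is what makes such a node readable as a pathway activity score, and it also cuts the number of free parameters relative to a fully connected layer.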


A multiresolution framework to characterize single-cell state landscapes

Nature Communications, 10.1038/s41467-020-18416-6, 2020, Vol 11 (1), Cited By ~ 1

Author(s): Shahin Mohammadi, Jose Davila-Velderrain, Manolis Kellis

Keyword(s): Single Cell, Single Cell Analysis, Real Data, Cell Types, Cellular Heterogeneity, Superior Performance, Data Sets, Structural Representation, Archetypal Analysis, Cell State

Abstract: Dissecting the cellular heterogeneity embedded in single-cell transcriptomic data is challenging. Although many methods and approaches exist, identifying cell states and their underlying topology is still a major challenge. Here, we introduce the concept of multiresolution cell-state decomposition as a practical approach to simultaneously capture both fine- and coarse-grain patterns of variability. We implement this concept in ACTIONet, a comprehensive framework that combines archetypal analysis and manifold learning to provide a ready-to-use analytical approach for multiresolution single-cell state characterization. ACTIONet provides a robust, reproducible, and highly interpretable single-cell analysis platform that couples dominant pattern discovery with a corresponding structural representation of the cell state landscape. Using multiple synthetic and real data sets, we demonstrate ACTIONet’s superior performance relative to existing alternatives. We use ACTIONet to integrate and annotate cells across three human cortex data sets. Through integrative comparative analysis, we define a consensus vocabulary and a consistent set of gene signatures discriminating among the transcriptomic cell types and subtypes of the human prefrontal cortex.


An efficient pruning scheme of deep neural networks for Internet of Things applications

EURASIP Journal on Advances in Signal Processing, 10.1186/s13634-021-00744-4, 2021, Vol 2021 (1)

Author(s): Chen Qi, Shibo Shen, Rongpeng Li, Zhifeng Zhao, Qing Liu, ...

Keyword(s): Neural Network, Neural Networks, Internet Of Things, Deep Neural Networks, Computational Cost, Superior Performance, Compact Structure, Resource Limited, Benchmark Datasets, Iot Devices

Abstract: Nowadays, deep neural networks (DNNs) have been rapidly deployed to realize a number of functionalities such as sensing, imaging, classification, and recognition. However, the computationally intensive requirements of DNNs make them difficult to deploy on resource-limited Internet of Things (IoT) devices. In this paper, we propose a novel pruning-based paradigm that aims to reduce the computational cost of DNNs by uncovering a more compact structure and learning the effective weights therein, without compromising the expressive capability of DNNs. In particular, our algorithm can achieve efficient end-to-end training that directly transfers a redundant neural network to a compact one with a specifically targeted compression rate. We comprehensively evaluate our approach on various representative benchmark datasets and compare it with typical advanced convolutional neural network (CNN) architectures. The experimental results verify the superior performance and robust effectiveness of our scheme. For example, when pruning VGG on CIFAR-10, our proposed scheme is able to reduce its FLOPs (floating-point operations) and number of parameters by 76.2% and 94.1%, respectively, while still maintaining a satisfactory accuracy. To sum up, our scheme could facilitate the integration of DNNs into the common machine-learning-based IoT framework and establish distributed training of neural networks in both cloud and edge.
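
To make the notion of a targeted compression rate concrete, the sketch below applies plain magnitude pruning to a single weight matrix: the fraction of weights with the smallest absolute values is set to zero. This is a deliberately simpler, swapped-in technique, not the end-to-end training scheme the abstract describes. The function name `magnitude_prune`, the matrix size, and the reuse of 94.1% as the target rate are illustrative assumptions.

```python
import numpy as np


def magnitude_prune(weights, compression_rate):
    """Zero out the `compression_rate` fraction of weights with smallest magnitude.

    Returns the pruned weights and a boolean mask of the weights that were kept.
    """
    flat = np.abs(weights).ravel()
    n_prune = int(round(compression_rate * flat.size))
    if n_prune == 0:
        return weights.copy(), np.ones_like(weights, dtype=bool)
    # Indices of the n_prune smallest magnitudes (order among them is irrelevant).
    prune_idx = np.argpartition(flat, n_prune - 1)[:n_prune]
    keep = np.ones(flat.size, dtype=bool)
    keep[prune_idx] = False
    keep = keep.reshape(weights.shape)
    return weights * keep, keep


rng = np.random.default_rng(0)
w = rng.normal(size=(100, 100))              # toy layer with 10,000 weights
pruned, keep = magnitude_prune(w, 0.941)     # remove 94.1% of the parameters
remaining = int(keep.sum())                  # 590 weights survive
```

With 10,000 weights and a 94.1% rate, 9,410 weights are removed and 590 remain. Note that an unstructured mask like this reduces the parameter count but does not by itself reduce FLOPs on dense hardware, which is one reason structured schemes that remove whole filters or channels are often preferred in practice.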


Single-cell transcriptomics following ischemic injury identifies a role for B2M in cardiac repair

Communications Biology, 10.1038/s42003-020-01636-3, 2021, Vol 4 (1)

Author(s): Bas Molenaar, Louk T. Timmer, Marjolein Droog, Ilaria Perini, Danielle Versteeg, ...

Keyword(s): Single Cell, Communication Networks, Cardiac Remodeling, Cardiac Injury, Ischemic Injury, Cell Types, Repair Process, Cardiac Repair, Cellular Heterogeneity, Intercellular Signaling

Abstract: The efficiency of the repair process following ischemic cardiac injury is a crucial determinant for the progression into heart failure and is controlled by both intra- and intercellular signaling within the heart. An enhanced understanding of this complex interplay will enable better exploitation of these mechanisms for therapeutic use. We used single-cell transcriptomics to collect gene expression data of all main cardiac cell types at different time-points after ischemic injury. These data unveiled cellular and transcriptional heterogeneity and changes in cellular function during cardiac remodeling. Furthermore, we established potential intercellular communication networks after ischemic injury. Follow-up experiments confirmed that cardiomyocytes express and secrete elevated levels of beta-2 microglobulin in response to ischemic damage, which can activate fibroblasts in a paracrine manner. Collectively, our data indicate phase-specific changes in cellular heterogeneity during different stages of cardiac remodeling and allow for the identification of therapeutic targets relevant for cardiac repair.


Sparsely-connected autoencoder (SCA) for single cell RNAseq data mining

npj Systems Biology and Applications, 10.1038/s41540-020-00162-6, 2021, Vol 7 (1)

Author(s): Luca Alessandri, Francesca Cordero, Marco Beccuti, Nicola Licheri, Maddalena Arigoni, ...

Keyword(s): Quality Control, Single Cell, Medical Personnel, Cellular Heterogeneity, Biological Information, Cell Clusters, The Neural Network, Rnaseq Data, Functional Features, Cell Subpopulations

Abstract: Single-cell RNA sequencing (scRNAseq) is an essential tool to investigate cellular heterogeneity. Thus, it would be of great interest to be able to uncover biological information belonging to cell subpopulations, which can be defined by clustering analysis of scRNAseq data. In this manuscript, we report a tool that we developed for the functional mining of single-cell clusters based on a Sparsely-Connected Autoencoder (SCA). This tool allows uncovering hidden features associated with scRNAseq data. We implemented two new metrics, QCC (Quality Control of Cluster) and QCM (Quality Control of Model), which quantify the ability of SCA to reconstruct valuable cell clusters and evaluate the quality of the neural network's results, respectively. Our data indicate that the SCA encoded space, derived from different experimentally validated data (TF targets, miRNA targets, kinase targets, and cancer-related immune signatures), can be used to grasp single-cell cluster-specific functional features. In our implementation, SCA efficacy comes from its ability to reconstruct only specific clusters, thus indicating only those clusters where the SCA encoding space is a key element for cell aggregation. SCA analysis is implemented as a module in the rCASC framework and is supported by a GUI to simplify its usage for biologists and medical personnel.


Single-cell analysis reveals cellular heterogeneity and molecular determinants of hypothalamic leptin-receptor cells

10.1101/2020.07.23.217729, 2020

Author(s): N. Kakava-Georgiadou, J.F. Severens, A.M. Jørgensen, K.M. Garner, M.C.M Luijendijk, ...

Keyword(s): Single Cell, Leptin Receptor, Single Cell Analysis, Cell Types, Cellular Heterogeneity, Molecular Signature, Neuronal Populations, Hypothalamic Nuclei, Satiety Hormone, Multiple Cell

Abstract: Hypothalamic nuclei that regulate homeostatic functions express leptin receptor (LepR), the primary target of the satiety hormone leptin. Single-cell RNA sequencing (scRNA-seq) has facilitated the discovery of a variety of hypothalamic cell types. However, the low abundance of LepR transcripts has prevented further characterization of LepR cells. Therefore, we perform scRNA-seq on isolated LepR cells and identify eight neuronal clusters, including three uncharacterized Trh-expressing populations, as well as 17 non-neuronal populations including tanycytes, oligodendrocytes and endothelial cells. Food restriction had a major impact on Agrp neurons and changed the expression of obesity-associated genes. Multiple cell clusters were enriched for GWAS signals of obesity. We further explored changes in the gene regulatory landscape of LepR cell types. We thus reveal the molecular signature of distinct populations with diverse neurochemical profiles, which will aid efforts to illuminate the multi-functional nature of leptin’s action in the hypothalamus.


Critical Assessment of Artificial Intelligence Methods for Prediction of hERG Channel Inhibition in the ‘Big Data’ Era

10.26434/chemrxiv.12119040, 2020, Cited By ~ 1

Author(s): Vishal Babu Siramshetty, Dac-Trung Nguyen, Natalia J. Martinez, Anton Simeonov, Noel T. Southall, ...

Keyword(s): Artificial Intelligence, Neural Networks, Big Data, Recurrent Neural Networks, Deep Neural Networks, Prediction Models, Chemical Space, Superior Performance, Gradient Boosting, Artificial Intelligence Methods

The rise of novel artificial intelligence methods necessitates a comparison of this wave of new approaches with classical machine learning for a typical drug discovery project. Inhibition of the potassium ion channel, whose alpha subunit is encoded by the human Ether-à-go-go-Related Gene (hERG), leads to a prolonged QT interval of the cardiac action potential and is a significant safety pharmacology target for the development of new medicines. Several computational approaches have been employed to develop prediction models for assessment of hERG liabilities of small molecules, including recent work using deep learning methods. Here we perform a comprehensive comparison of prediction models based on classical (random forests and gradient boosting) and modern (deep neural networks and recurrent neural networks) artificial intelligence methods. The training set (~9000 compounds) was compiled by integrating hERG bioactivity data from the ChEMBL database with experimental data generated from an in-house, high-throughput thallium flux assay. We utilized different molecular descriptors including the latent descriptors, which are real-valued continuous vectors derived from chemical autoencoders trained on a large chemical space (> 1.5 million compounds). The models were prospectively validated on ~840 in-house compounds screened in the same thallium flux assay. The deep neural networks performed significantly better than the classical methods with the latent descriptors. The recurrent neural networks that operate on SMILES provided the highest model sensitivity. The best models were merged into a consensus model that offered superior performance compared to reference models from academic and commercial domains. Further, we shed light on the potential of artificial intelligence methods to exploit the chemistry big data and generate novel chemical representations useful in predictive modeling and tailoring new chemical space.


Infinity Flow: High-throughput single-cell quantification of 100s of proteins using conventional flow cytometry and machine learning

10.1101/2020.06.17.152926, 2020

Author(s): Etienne Becht, Daniel Tolstrup, Charles-Antoine Dutertre, Florent Ginhoux, Evan W. Newell, ...

Keyword(s): Machine Learning, Flow Cytometry, Single Cell, Low Cost, Expression Patterns, Cell Types, Cellular Heterogeneity, Supervised Machine Learning, Melanoma Metastasis, Immunologic Research

Abstract: Modern immunologic research increasingly requires high-dimensional analyses in order to understand the complex milieu of cell types that comprise the tissue microenvironments of disease. To achieve this, we developed Infinity Flow, which combines hundreds of overlapping flow cytometry panels using machine learning to enable the simultaneous analysis of the co-expression patterns of 100s of surface-expressed proteins across millions of individual cells. In this study, we demonstrate that this approach allows comprehensive analysis of the cellular constituency of the steady-state murine lung and the identification of novel cellular heterogeneity in the lungs of melanoma metastasis-bearing mice. We show that by using supervised machine learning, Infinity Flow enhances the accuracy and depth of clustering or dimensionality reduction algorithms. Infinity Flow is a highly scalable, low-cost and accessible solution to single-cell proteomics in complex tissues.


Single-cell transcriptome analysis reveals cellular heterogeneity in the ascending aortas of normal and high-fat diet-fed mice

Experimental & Molecular Medicine, 10.1038/s12276-021-00671-2, 2021, Vol 53 (9), pp. 1379-1389

Author(s): Hao Kan, Ka Zhang, Aiqin Mao, Li Geng, Mengru Gao, ...

Keyword(s): Endothelial Cells, Single Cell, High Fat Diet, Cell Types, Ascending Aorta, Normal Diet, Cellular Heterogeneity, Cellular Composition, High Fat, Aortic Diseases

Abstract: The aorta contains numerous cell types that contribute to vascular inflammation and thus the progression of aortic diseases. However, the heterogeneity and cellular composition of the ascending aorta in the setting of a high-fat diet (HFD) have not been fully assessed. We performed single-cell RNA sequencing on ascending aortas from mice fed a normal diet and mice fed a HFD. Unsupervised cluster analysis of the transcriptional profiles from 24,001 aortic cells identified 27 clusters representing 10 cell types: endothelial cells (ECs), fibroblasts, vascular smooth muscle cells (SMCs), immune cells (B cells, T cells, macrophages, and dendritic cells), mesothelial cells, pericytes, and neural cells. After HFD intake, subpopulations of endothelial cells with lipid transport and angiogenesis capacity and extensive expression of contractile genes were defined. In the HFD group, three major SMC subpopulations showed increased expression of extracellular matrix-degradation genes, and a synthetic SMC subcluster was proportionally increased. This increase was accompanied by upregulation of proinflammatory genes. Under HFD conditions, aortic-resident macrophage numbers were increased, and blood-derived macrophages showed the strongest expression of proinflammatory cytokines. Our study elucidates the nature and range of the cellular composition of the ascending aorta and increases understanding of the development and progression of aortic inflammatory disease.


Cellular heterogeneity and lineage restriction during mouse digit tip regeneration at single cell resolution

10.1101/737023, 2019

Author(s): Gemma L. Johnson, Erick J. Masias, Jessica A. Lehoczky

Keyword(s): Single Cell, Limb Regeneration, Cell Types, Cellular Heterogeneity, Specific Gene, Genetic Lineage, Epimorphic Regeneration, Related Cell, Lower Vertebrates, Single Cell Rna Sequencing

Abstract: Innate regeneration following digit tip amputation is one of the few examples of epimorphic regeneration in mammals. Digit tip regeneration is mediated by the blastema, the same structure invoked during limb regeneration in some lower vertebrates. By genetic lineage analyses in mice, the digit tip blastema has been defined as a population of heterogeneous, lineage-restricted progenitor cells. These previous studies, however, do not comprehensively evaluate blastema heterogeneity or address lineage restriction of closely related cell types. In this report we present single-cell RNA sequencing of over 38,000 cells from mouse digit tip blastemas and unamputated control digit tips and generate an atlas of the cell types participating in digit tip regeneration. We define the differentiation trajectories of vascular, monocytic, and fibroblastic lineages over regeneration, and while our data confirm broad lineage restriction of progenitors, our analysis reveals an early blastema fibroblast population expressing a novel regeneration-specific gene, Mest.
