Integrating pathway knowledge with deep neural networks to reduce the dimensionality in single-cell RNA-seq data

2022, Vol 15 (1)
Author(s): Pelin Gundogdu, Carlos Loucera, Inmaculada Alamo-Alvarez, Joaquin Dopazo, Isabel Nepomuceno

Abstract Background: Single-cell RNA sequencing (scRNA-seq) data provide valuable insights into cellular heterogeneity, significantly improving current knowledge of biology and human disease. One of the main applications of scRNA-seq data analysis is the identification of new cell types and cell states. Deep neural networks (DNNs) are among the best methods to address this problem. However, this performance comes at the cost of interpretability. In this work we propose an intelligible pathway-driven neural network to correctly solve cell-type related problems at single-cell resolution while providing a biologically meaningful representation of the data. Results: In this study, we explored deep neural networks constrained by several types of prior biological information, e.g. signaling pathway information, as a way to reduce the dimensionality of scRNA-seq data. We tested the proposed biologically based architectures on thousands of cells of human and mouse origin across a collection of public datasets to check the performance of the model. Specifically, we tested the architecture across different validation scenarios that mimic how unknown cell types are clustered by the DNN and how it annotates cell types by querying a database in a retrieval problem. Moreover, our approach proved comparable to other, less interpretable DNN approaches constrained by protein-protein interaction and gene regulation data. Finally, we show how the latent structure learned by the network can be used to visualize and interpret the composition of human single-cell datasets. Conclusions: Here we demonstrate how integrating pathways, which convey fundamental information on functional relationships between genes, with DNNs, which provide an excellent classification framework, yields an excellent alternative for learning a biologically meaningful representation of scRNA-seq data. In addition, introducing prior biological knowledge into the DNN reduces the size of the network architecture. Comparative results demonstrate superior performance of this approach with respect to other similar approaches. As an additional advantage, the use of pathways within the DNN structure enables easy interpretability of the results by connecting features to cell functionalities by means of the pathway nodes, as demonstrated with an example using human melanoma tumor cells.
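
To make the idea of a pathway-constrained layer concrete, the following is a minimal sketch, not the authors' implementation. It assumes that the constraint can be expressed as a binary gene-to-pathway mask, so that each pathway node only receives input from its member genes. The gene names, pathway names, weight initialization, and tanh activation are all hypothetical choices made for illustration.

```python
import math
import random

# Hypothetical toy gene sets; real pathway memberships would come from a
# curated resource, which is not specified here.
PATHWAYS = {
    "pathway_A": ["gene1", "gene2"],
    "pathway_B": ["gene2", "gene3", "gene4"],
}
GENES = ["gene1", "gene2", "gene3", "gene4"]


def build_mask(genes, pathways):
    """Return a genes x pathways 0/1 matrix (1 where the gene is a member)."""
    names = sorted(pathways)
    mask = [[1.0 if g in pathways[p] else 0.0 for p in names] for g in genes]
    return mask, names


def init_weights(mask, seed=0):
    """Draw small random weights, then zero every connection the mask forbids."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.5, 0.5) * m for m in row] for row in mask]


def pathway_layer(expression, weights, mask):
    """Project one cell's expression vector onto the pathway nodes."""
    n_pathways = len(mask[0])
    out = []
    for j in range(n_pathways):
        # The mask is applied again here so that forbidden connections stay
        # at zero even if the weights were later updated by training.
        total = sum(x * w[j] * m[j] for x, w, m in zip(expression, weights, mask))
        out.append(math.tanh(total))
    return out


mask, pathway_names = build_mask(GENES, PATHWAYS)
weights = init_weights(mask)
cell = [2.0, 0.0, 1.5, 0.3]  # made-up expression values for one cell
latent = pathway_layer(cell, weights, mask)
print(dict(zip(pathway_names, latent)))
```

In this sketch the latent vector has one entry per pathway, which is what makes each dimension nameable. The mask also removes every gene-pathway weight that has no biological support, which illustrates how prior knowledge can shrink the network.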


2020, Vol 11 (1)
Author(s): Shahin Mohammadi, Jose Davila-Velderrain, Manolis Kellis

Abstract Dissecting the cellular heterogeneity embedded in single-cell transcriptomic data is challenging. Although many methods and approaches exist, identifying cell states and their underlying topology is still a major challenge. Here, we introduce the concept of multiresolution cell-state decomposition as a practical approach to simultaneously capture both fine- and coarse-grain patterns of variability. We implement this concept in ACTIONet, a comprehensive framework that combines archetypal analysis and manifold learning to provide a ready-to-use analytical approach for multiresolution single-cell state characterization. ACTIONet provides a robust, reproducible, and highly interpretable single-cell analysis platform that couples dominant pattern discovery with a corresponding structural representation of the cell state landscape. Using multiple synthetic and real data sets, we demonstrate ACTIONet’s superior performance relative to existing alternatives. We use ACTIONet to integrate and annotate cells across three human cortex data sets. Through integrative comparative analysis, we define a consensus vocabulary and a consistent set of gene signatures discriminating among the transcriptomic cell types and subtypes of the human prefrontal cortex.


Author(s): Chen Qi, Shibo Shen, Rongpeng Li, Zhifeng Zhao, Qing Liu, ...

Abstract Nowadays, deep neural networks (DNNs) are being rapidly deployed to realize a number of functionalities such as sensing, imaging, classification, and recognition. However, the computationally intensive requirements of DNNs make them difficult to apply on resource-limited Internet of Things (IoT) devices. In this paper, we propose a novel pruning-based paradigm that aims to reduce the computational cost of DNNs by uncovering a more compact structure and learning the effective weights therein, without compromising the expressive capability of DNNs. In particular, our algorithm achieves efficient end-to-end training that directly transforms a redundant neural network into a compact one with a specifically targeted compression rate. We comprehensively evaluate our approach on various representative benchmark datasets and compare it with typical advanced convolutional neural network (CNN) architectures. The experimental results verify the superior performance and robust effectiveness of our scheme. For example, when pruning VGG on CIFAR-10, our proposed scheme reduces its FLOPs (floating-point operations) and number of parameters by 76.2% and 94.1%, respectively, while still maintaining a satisfactory accuracy. To sum up, our scheme could facilitate the integration of DNNs into the common machine-learning-based IoT framework and establish distributed training of neural networks in both cloud and edge.
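
As a rough illustration of what pruning to a targeted compression rate means, the sketch below uses plain one-shot magnitude pruning on a flat list of weights. This is a deliberately simpler, swapped-in technique, not the end-to-end training scheme the abstract describes. The function name `prune_to_rate` and the example weights are hypothetical.

```python
def prune_to_rate(weights, compression_rate):
    """Zero the smallest-magnitude weights so that the requested fraction is removed."""
    if not 0.0 <= compression_rate < 1.0:
        raise ValueError("compression_rate must be in [0, 1)")
    n_prune = int(len(weights) * compression_rate)
    # Indices ordered by absolute weight value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    to_zero = set(order[:n_prune])
    return [0.0 if i in to_zero else w for i, w in enumerate(weights)]


# Made-up weights standing in for one layer of a network.
layer = [0.8, -0.05, 0.3, 0.01, -0.6, 0.02, 0.4, -0.1, 0.07, 0.9]
pruned = prune_to_rate(layer, 0.5)
kept = sum(1 for w in pruned if w != 0.0)
print(pruned, kept)
```

Fixing the rate up front, rather than a magnitude threshold, is what lets the size of the compact network be chosen in advance. A real scheme would also retrain or fine-tune the surviving weights.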


2021, Vol 4 (1)
Author(s): Bas Molenaar, Louk T. Timmer, Marjolein Droog, Ilaria Perini, Danielle Versteeg, ...

Abstract The efficiency of the repair process following ischemic cardiac injury is a crucial determinant for the progression into heart failure and is controlled by both intra- and intercellular signaling within the heart. An enhanced understanding of this complex interplay will enable better exploitation of these mechanisms for therapeutic use. We used single-cell transcriptomics to collect gene expression data of all main cardiac cell types at different time-points after ischemic injury. These data unveiled cellular and transcriptional heterogeneity and changes in cellular function during cardiac remodeling. Furthermore, we established potential intercellular communication networks after ischemic injury. Follow-up experiments confirmed that cardiomyocytes express and secrete elevated levels of beta-2 microglobulin in response to ischemic damage, which can activate fibroblasts in a paracrine manner. Collectively, our data indicate phase-specific changes in cellular heterogeneity during different stages of cardiac remodeling and allow for the identification of therapeutic targets relevant for cardiac repair.


2021, Vol 7 (1)
Author(s): Luca Alessandri, Francesca Cordero, Marco Beccuti, Nicola Licheri, Maddalena Arigoni, ...

Abstract Single-cell RNA sequencing (scRNAseq) is an essential tool to investigate cellular heterogeneity. Thus, it would be of great interest to be able to disclose biological information belonging to cell subpopulations, which can be defined by clustering analysis of scRNAseq data. In this manuscript, we report a tool that we developed for the functional mining of single-cell clusters based on a Sparsely-Connected Autoencoder (SCA). This tool allows uncovering hidden features associated with scRNAseq data. We implemented two new metrics, QCC (Quality Control of Cluster) and QCM (Quality Control of Model), which quantify the ability of SCA to reconstruct valuable cell clusters and evaluate the quality of the neural network achievements, respectively. Our data indicate that the SCA encoded space, derived from different experimentally validated data (TF targets, miRNA targets, kinase targets, and cancer-related immune signatures), can be used to grasp single-cell cluster-specific functional features. In our implementation, SCA efficacy comes from its ability to reconstruct only specific clusters, thus indicating only those clusters where the SCA encoding space is a key element for cell aggregation. SCA analysis is implemented as a module in the rCASC framework and is supported by a GUI to simplify its usage for biologists and medical personnel.


2020
Author(s): N. Kakava-Georgiadou, J.F. Severens, A.M. Jørgensen, K.M. Garner, M.C.M Luijendijk, ...

Abstract Hypothalamic nuclei which regulate homeostatic functions express leptin receptor (LepR), the primary target of the satiety hormone leptin. Single-cell RNA sequencing (scRNA-seq) has facilitated the discovery of a variety of hypothalamic cell types. However, low abundance of LepR transcripts prevented further characterization of LepR cells. Therefore, we perform scRNA-seq on isolated LepR cells and identify eight neuronal clusters, including three uncharacterized Trh-expressing populations as well as 17 non-neuronal populations including tanycytes, oligodendrocytes and endothelial cells. Food restriction had a major impact on Agrp neurons and changed the expression of obesity-associated genes. Multiple cell clusters were enriched for GWAS signals of obesity. We further explored changes in the gene regulatory landscape of LepR cell types. We thus reveal the molecular signature of distinct populations with diverse neurochemical profiles, which will aid efforts to illuminate the multi-functional nature of leptin’s action in the hypothalamus.


Author(s): Vishal Babu Siramshetty, Dac-Trung Nguyen, Natalia J. Martinez, Anton Simeonov, Noel T. Southall, ...

The rise of novel artificial intelligence methods necessitates a comparison of this wave of new approaches with classical machine learning for a typical drug discovery project. Inhibition of the potassium ion channel, whose alpha subunit is encoded by the human Ether-à-go-go-Related Gene (hERG), leads to a prolonged QT interval of the cardiac action potential and is a significant safety pharmacology target for the development of new medicines. Several computational approaches have been employed to develop prediction models for assessment of hERG liabilities of small molecules, including recent work using deep learning methods. Here we perform a comprehensive comparison of prediction models based on classical (random forests and gradient boosting) and modern (deep neural networks and recurrent neural networks) artificial intelligence methods. The training set (~9000 compounds) was compiled by integrating hERG bioactivity data from the ChEMBL database with experimental data generated from an in-house, high-throughput thallium flux assay. We utilized different molecular descriptors including the latent descriptors, which are real-valued continuous vectors derived from chemical autoencoders trained on a large chemical space (> 1.5 million compounds). The models were prospectively validated on ~840 in-house compounds screened in the same thallium flux assay. The deep neural networks performed significantly better than the classical methods with the latent descriptors. The recurrent neural networks that operate on SMILES provided the highest model sensitivity. The best models were merged into a consensus model that offered superior performance compared to reference models from academic and commercial domains. Further, we shed light on the potential of artificial intelligence methods to exploit the chemistry big data and generate novel chemical representations useful in predictive modeling and tailoring new chemical space.
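
The abstract does not say how the best models were merged, so the sketch below assumes the simplest possible rule: average each model's predicted probability per compound and apply a 0.5 threshold. It also shows how sensitivity, the metric highlighted for the recurrent networks, is computed. All probabilities, labels, and function names are made up for illustration.

```python
def consensus_probability(model_probs):
    """Average per-compound probabilities across models (assumed merging rule)."""
    n_models = len(model_probs)
    n_compounds = len(model_probs[0])
    return [sum(m[i] for m in model_probs) / n_models for i in range(n_compounds)]


def sensitivity(y_true, y_pred):
    """Fraction of true positives (here: true hERG blockers) that were predicted positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn)


# Hypothetical probabilities from three models for four compounds.
random_forest = [0.9, 0.2, 0.6, 0.1]
deep_network = [0.8, 0.4, 0.3, 0.2]
recurrent_network = [0.7, 0.6, 0.9, 0.3]

averaged = consensus_probability([random_forest, deep_network, recurrent_network])
predicted = [1 if p >= 0.5 else 0 for p in averaged]
observed = [1, 0, 1, 1]  # made-up assay outcomes (1 = blocker)
print(predicted, sensitivity(observed, predicted))
```

With these made-up numbers the consensus flags two of the three true blockers. Other merging rules, such as majority voting or weighted averaging, would fit the same interface.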


2020
Author(s): Etienne Becht, Daniel Tolstrup, Charles-Antoine Dutertre, Florent Ginhoux, Evan W. Newell, ...

Abstract Modern immunologic research increasingly requires high-dimensional analyses in order to understand the complex milieu of cell types that comprise the tissue microenvironments of disease. To achieve this, we developed Infinity Flow, which combines hundreds of overlapping flow cytometry panels using machine learning to enable the simultaneous analysis of the co-expression patterns of hundreds of surface-expressed proteins across millions of individual cells. In this study, we demonstrate that this approach allows comprehensive analysis of the cellular constituency of the steady-state murine lung and the identification of novel cellular heterogeneity in the lungs of melanoma metastasis-bearing mice. We show that by using supervised machine learning, Infinity Flow enhances the accuracy and depth of clustering or dimensionality reduction algorithms. Infinity Flow is a highly scalable, low-cost and accessible solution to single-cell proteomics in complex tissues.


2021, Vol 53 (9), pp. 1379-1389
Author(s): Hao Kan, Ka Zhang, Aiqin Mao, Li Geng, Mengru Gao, ...

Abstract The aorta contains numerous cell types that contribute to vascular inflammation and thus the progression of aortic diseases. However, the heterogeneity and cellular composition of the ascending aorta in the setting of a high-fat diet (HFD) have not been fully assessed. We performed single-cell RNA sequencing on ascending aortas from mice fed a normal diet and mice fed a HFD. Unsupervised cluster analysis of the transcriptional profiles from 24,001 aortic cells identified 27 clusters representing 10 cell types: endothelial cells (ECs), fibroblasts, vascular smooth muscle cells (SMCs), immune cells (B cells, T cells, macrophages, and dendritic cells), mesothelial cells, pericytes, and neural cells. After HFD intake, subpopulations of endothelial cells with lipid transport and angiogenesis capacity and extensive expression of contractile genes were defined. In the HFD group, three major SMC subpopulations showed increased expression of extracellular matrix-degradation genes, and a synthetic SMC subcluster was proportionally increased. This increase was accompanied by upregulation of proinflammatory genes. Under HFD conditions, aortic-resident macrophage numbers were increased, and blood-derived macrophages showed the strongest expression of proinflammatory cytokines. Our study elucidates the nature and range of the cellular composition of the ascending aorta and increases understanding of the development and progression of aortic inflammatory disease.


2019
Author(s): Gemma L. Johnson, Erick J. Masias, Jessica A. Lehoczky

Abstract Innate regeneration following digit tip amputation is one of the few examples of epimorphic regeneration in mammals. Digit tip regeneration is mediated by the blastema, the same structure invoked during limb regeneration in some lower vertebrates. By genetic lineage analyses in mice, the digit tip blastema has been defined as a population of heterogeneous, lineage restricted progenitor cells. These previous studies, however, do not comprehensively evaluate blastema heterogeneity or address lineage restriction of closely related cell types. In this report we present single cell RNA sequencing of over 38,000 cells from mouse digit tip blastemas and unamputated control digit tips and generate an atlas of the cell types participating in digit tip regeneration. We define the differentiation trajectories of vascular, monocytic, and fibroblastic lineages over regeneration, and while our data confirm broad lineage restriction of progenitors, our analysis reveals an early blastema fibroblast population expressing a novel regeneration-specific gene, Mest.

