Decomposing Cell Identity for Transfer Learning across Cellular Measurements, Platforms, Tissues, and Species

Cell Systems ◽  
2019 ◽  
Vol 8 (5) ◽  
pp. 395-411.e8 ◽  
Author(s):  
Genevieve L. Stein-O’Brien ◽  
Brian S. Clark ◽  
Thomas Sherman ◽  
Cristina Zibetti ◽  
Qiwen Hu ◽  
...  

Cell Systems ◽  
2021 ◽  
Vol 12 (2) ◽  
pp. 203
Author(s):  
Genevieve L. Stein-O’Brien ◽  
Brian S. Clark ◽  
Thomas Sherman ◽  
Cristina Zibetti ◽  
Qiwen Hu ◽  
...  

2018 ◽  
Author(s):  
Genevieve L. Stein-O’Brien ◽  
Brian S. Clark ◽  
Thomas Sherman ◽  
Cristina Zibetti ◽  
Qiwen Hu ◽  
...  

ABSTRACT
New approaches are urgently needed to glean biological insights from the vast amounts of single-cell RNA sequencing (scRNA-Seq) data now being generated. To this end, we propose that cell identity should map to a reduced set of factors that describe both the exclusive and shared biology of individual cells, and that the dimensions containing these factors reflect biologically meaningful relationships across different platforms, tissues, and species. To find a robust set of dependent factors in large-scale scRNA-Seq data, we developed a Bayesian non-negative matrix factorization (NMF) algorithm, scCoGAPS. Application of scCoGAPS to scRNA-Seq data obtained over the course of mouse retinal development identified gene expression signatures for factors associated with specific cell types and continuous biological processes. To test whether these signatures are shared across diverse cellular contexts, we developed projectR to map biologically disparate datasets into the factors learned by scCoGAPS. Because projection into these dimensions preserves relative distances between samples, biologically meaningful relationships and factors will stratify new data consistent with their underlying processes, allowing labels or information from one dataset to be used to annotate the other, a machine learning concept called transfer learning. Using projectR, data from multiple datasets were used to annotate latent spaces and reveal novel parallels between developmental programs in other tissues, species, and cellular assays. Using this approach, we are able to transfer cell type and state designations across datasets to rapidly annotate cellular features in a new dataset without a priori knowledge of their type, identify a species-specific signature of microglial cells, and identify a previously undescribed subpopulation of neurosecretory cells within the lung. Together, these algorithms define biologically meaningful dimensions of cellular identity, state, and trajectories that persist across technologies, molecular features, and species.
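For a concrete picture of the projection step, the sketch below illustrates the core idea in Python: learn non-negative gene-by-factor signatures on one dataset, then fix those signatures and solve for factor weights in a second dataset so that annotations can transfer. The published tools are R/Bioconductor packages (CoGAPS/scCoGAPS and projectR); this example substitutes standard NMF for scCoGAPS's Bayesian NMF and a non-negative least-squares fit for projectR's regression, and the matrices `source_expr` and `target_expr` are hypothetical placeholders, so treat it as an illustration of the concept rather than the authors' implementation.

```python
# Minimal sketch (not the authors' implementation): scCoGAPS and projectR are
# R/Bioconductor packages; this Python example only illustrates the core idea of
# learning non-negative factors on one dataset and projecting another into them.
import numpy as np
from sklearn.decomposition import NMF
from scipy.optimize import nnls

rng = np.random.default_rng(0)
# Hypothetical data: cells x genes matrices for a source and a target dataset
# that share the same gene set (in practice, intersect and reorder gene names).
source_expr = rng.poisson(1.0, size=(300, 500)).astype(float)  # e.g., mouse retina
target_expr = rng.poisson(1.0, size=(120, 500)).astype(float)  # e.g., a new assay

# 1) Learn k latent factors on the source data with (non-Bayesian) NMF:
#    source_expr ~ W @ H, with W: cells x k pattern weights, H: k x genes signatures.
k = 10
nmf = NMF(n_components=k, init="nndsvda", max_iter=500, random_state=0)
W_source = nmf.fit_transform(source_expr)
H_genes = nmf.components_                      # gene-level factor signatures

# 2) Transfer step: fix the gene signatures H and solve a non-negative
#    least-squares problem per target cell to get its factor weights.
#    (projectR's actual projection method may differ; NNLS is used for clarity.)
def project(expr, signatures):
    weights = np.zeros((expr.shape[0], signatures.shape[0]))
    for i, cell in enumerate(expr):
        weights[i], _ = nnls(signatures.T, cell)
    return weights

W_target = project(target_expr, H_genes)

# Target cells now live in the same factor space as the source cells, so labels
# associated with source factors (cell types, processes) can annotate them.
print(W_target.shape)  # (120, 10)
```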


Diabetes ◽  
2019 ◽  
Vol 68 (Supplement 1) ◽  
pp. 2184-P
Author(s):  
Noriko Kodani ◽  
Masaki Kobayashi ◽  
Osamu Kikuchi ◽  
Tadahiro Kitamura ◽  
Hiroshi Itoh ◽  
...  

2019 ◽  
Author(s):  
Qi Yuan ◽  
Alejandro Santana-Bonilla ◽  
Martijn Zwijnenburg ◽  
Kim Jelfs

The chemical space of novel electronic donor-acceptor oligomers with targeted properties was explored using deep generative models and transfer learning. A general recurrent neural network model was trained on the ChEMBL database to generate chemically valid SMILES strings. The parameters of the general recurrent neural network were then fine-tuned via transfer learning, using the electronic donor-acceptor database from the Computational Materials Repository, to generate novel donor-acceptor oligomers. Six different transfer learning models were developed, each trained on a different subset of the donor-acceptor database. We concluded that electronic properties such as HOMO-LUMO gaps and dipole moments of the training sets can be learned from the SMILES representation with deep generative models, and that the chemical space of the training sets can be explored efficiently. This approach identified approximately 1700 new molecules with promising electronic properties (HOMO-LUMO gap < 2 eV and dipole moment < 2 Debye), six times more than in the original database. Among the molecular transformations it applies, the deep generative model learned to produce novel molecules by trading off selected atomic substitutions (such as halogenation or methylation) against molecular features such as the spatial extension of the oligomer. The method can be extended as a plausible source of new chemical combinations for effectively exploring chemical space for targeted properties.
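As a rough illustration of the workflow described above, the following Python/PyTorch sketch pretrains a character-level recurrent model on a large SMILES corpus and then fine-tunes the same weights on a smaller, property-filtered set, which is the transfer-learning step. The architecture, vocabulary, hyperparameters, and the `dummy` batches are illustrative assumptions and do not reproduce the paper's models or training data.

```python
# Minimal sketch of the described workflow (pretrain a SMILES language model,
# then fine-tune on a smaller donor-acceptor set). Architecture, vocabulary,
# and hyperparameters are illustrative assumptions, not the paper's settings.
import torch
import torch.nn as nn

class SmilesRNN(nn.Module):
    def __init__(self, vocab_size, embed_dim=64, hidden_dim=256, num_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, num_layers, batch_first=True)
        self.head = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens, state=None):
        x = self.embed(tokens)
        out, state = self.lstm(x, state)
        return self.head(out), state          # next-character logits

def train(model, batches, epochs, lr):
    """Teacher-forced next-token training on (input, target) token batches."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for inp, tgt in batches:              # both: (batch, seq_len) int tensors
            logits, _ = model(inp)
            loss = loss_fn(logits.reshape(-1, logits.size(-1)), tgt.reshape(-1))
            opt.zero_grad()
            loss.backward()
            opt.step()

# Toy vocabulary and data in place of the real ChEMBL / donor-acceptor corpora.
vocab = ["<pad>", "C", "c", "N", "O", "S", "1", "2", "(", ")", "=", "#"]
model = SmilesRNN(vocab_size=len(vocab))
dummy = [(torch.randint(1, len(vocab), (8, 20)),
          torch.randint(1, len(vocab), (8, 20)))]

# 1) Pretrain the general model on the large corpus (ChEMBL SMILES in the paper).
train(model, dummy, epochs=1, lr=1e-3)

# 2) Transfer learning: reuse the pretrained weights and fine-tune at a lower
#    learning rate on the small, property-filtered donor-acceptor subset.
train(model, dummy, epochs=1, lr=1e-4)
```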


2014 ◽  
Author(s):  
Hiroshi Kanayama ◽  
Youngja Park ◽  
Yuta Tsuboi ◽  
Dongmook Yi