Aligning single-cell developmental and reprogramming trajectories identifies molecular determinants of reprogramming outcome

Abstract: Cellular reprogramming through manipulation of defined factors holds great promise for large-scale production of cell types needed for use in therapy, as well as for expanding our understanding of the general principles of gene regulation. MYOD-mediated myogenic reprogramming, which converts many cell types into contractile myotubes, remains one of the best-characterized model systems for direct conversion by defined factors. However, why MYOD can efficiently convert some cell types into myotubes but not others remains poorly understood. Here, we analyze MYOD-mediated reprogramming of human fibroblasts at pseudotemporal resolution using single-cell RNA-Seq. Successfully reprogrammed cells navigate a trajectory with two branches that correspond to two barriers to reprogramming, with cells that select incorrect branches terminating at aberrant or incomplete reprogramming outcomes. Differential analysis of the major branch points, alongside alignment of the successful reprogramming path to a primary myoblast trajectory, revealed insulin and BMP signaling as crucial molecular determinants of an individual cell’s reprogramming outcome that, when appropriately modulated, increased efficiency more than five-fold. Our single-cell analysis reveals that MYOD is sufficient to reprogram cells only when the extracellular milieu is favorable, supporting MYOD with upstream signaling pathways that drive normal myogenesis in development.


Production, crystallization and preliminary X-ray analysis of the human integrin α1 I domain

Acta Crystallographica Section D Biological Crystallography, 1999, Vol 55 (7), pp. 1365-1367. DOI: 10.1107/s0907444999006009. Cited By ~ 10

Author(s): Tiina A. Salminen, Yvonne Nymalm, Jussi Kankare, Jarmo Käpylä, Jyrki Heino, ...

Keyword(s): Large Scale, Cell Types, Diffusion Method, Scale Production, Data Set, Cell Parameters, Collagen Receptors, Large Scale Production, PEG 4000, Reservoir Solution

Integrin α1β1 is one of the main collagen receptors in many cell types. A fast large-scale production, purification and crystallization method for the integrin α1 I domain is reported here. The α1 I domain was crystallized using the vapour-diffusion method with a reservoir solution containing a mixture of PEG 4000, sodium acetate, glycerol and Tris–HCl buffer. The crystals belong to the C2 space group, with unit-cell parameters a = 74.5, b = 81.9, c = 37.3 Å, α = γ = 90.0°, β = 90.8°. The crystals diffract to 2.0 Å, and a 94.2% complete data set to 2.2 Å has been collected from a single crystal with an Rmerge of 5.8%.
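As a quick check on the reported cell, the volume of a monoclinic unit cell follows from V = a·b·c·sin β. The short sketch below plugs in the parameters quoted in the abstract. The function name is illustrative, and the calculation is not taken from the authors' own data processing.

```python
import math

def monoclinic_cell_volume(a, b, c, beta_deg):
    """Volume (in Å^3) of a monoclinic unit cell, where alpha = gamma = 90°."""
    return a * b * c * math.sin(math.radians(beta_deg))

# Unit-cell parameters quoted in the abstract (lengths in Å, angle in degrees).
volume = monoclinic_cell_volume(74.5, 81.9, 37.3, 90.8)
print(f"Unit-cell volume: {volume:.0f} Å^3")
```

Because β is so close to 90°, the sin β factor changes the result by only about 0.01%, so the volume is nearly the plain product abc, roughly 2.28 × 10^5 Å^3.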


Abstract 262: Derivation of Duchenne Muscular Dystrophy Disease Specific Cardiomyocytes From Patient Urine

Circulation Research, 2013, Vol 113 (suppl_1). DOI: 10.1161/res.113.suppl_1.a262

Author(s): Xuan Guan, David L Mack, Claudia M Moreno, Fernando Santana, Charles E Murry, ...

Keyword(s): Stem Cells, Human Urine, Large Scale, Lentiviral Vector, Cellular Reprogramming, Urine Samples, PCR Analysis, Scale Production, Mechanism Study

Introduction: Human somatic cells can be reprogrammed into primitive stem cells, termed induced pluripotent stem cells (iPSCs). These iPSCs can be extensively expanded in vitro and differentiated into multiple functional cell types, enabling faithful preservation of an individual’s genotype and large-scale production of disease-targeted cellular components. These unique cellular reagents thus hold tremendous potential in disease mechanism study, drug screening and cell replacement therapy. Due to genetic mutation of the protein dystrophin, many DMD patients develop fatal cardiomyopathy with no effective treatment. The underlying pathogenesis has not been fully elucidated. Hypothesis: We tested the hypothesis that iPSCs could be generated from DMD patients’ urine samples and differentiated into cardiomyocytes, recapitulating the dystrophic phenotype. Methods: iPSC generation was achieved by introducing a lentiviral vector expressing Oct4, Sox2, c-Myc and Klf4 into cells derived from patient’s (n=1) and healthy volunteers’ (n=3) urine. Cardiomyocytes were derived by sequentially treating iPSCs with GSK3 inhibitor CHIR99021 and Wnt inhibitor IWP4. Differentiated cardiomyocytes were subjected to calcium imaging, electrophysiology recording, polymerase chain reaction (PCR) analysis, and immunostaining. Results: iPSCs were efficiently generated from human urine samples and further forced to differentiate into contracting cardiomyocytes. PCR analysis and immunostaining confirmed the expression of a panel of cardiac markers. Both normal and patient iPSC-derived cardiomyocytes exhibited spontaneous and field-stimulated calcium transients (up to 2 Hz), as well as action potentials with ventricular-like and nodal-like characteristics. Anti-dystrophin antibodies stained normal iPSC-derived cardiomyocyte membranes but did not react against DMD iPSC-derived cardiomyocytes.
Conclusions: Cardiomyocytes can be efficiently generated from human urine through cellular reprogramming technology. DMD cardiomyocytes retained the patient’s genetic information and manifested a dystrophin-null phenotype. Functional assessments are underway to determine differences that may exist between genotypes.


Single-cell RNA counting at allele- and isoform-resolution using Smart-seq3

2019. DOI: 10.1101/817924. Cited By ~ 6

Author(s): Michael Hagemann-Jensen, Christoph Ziegenhain, Ping Chen, Daniel Ramsköld, Gert-Jan Hendriks, ...

Keyword(s): Single Cell, Large Scale, Cell Types, Mouse Strains, RNA Molecules, Counting Strategy, Long Read, Sequencing Strategy, Transcriptome Coverage, Scale Characterization

Abstract: Large-scale sequencing of RNAs from individual cells can reveal patterns of gene, isoform and allelic expression across cell types and states [1]. However, current single-cell RNA-sequencing (scRNA-seq) methods have limited ability to count RNAs at allele- and isoform resolution, and long-read sequencing techniques lack the depth required for large-scale applications across cells [2,3]. Here, we introduce Smart-seq3 that combines full-length transcriptome coverage with a 5’ unique molecular identifier (UMI) RNA counting strategy that enabled in silico reconstruction of thousands of RNA molecules per cell. Importantly, a large portion of counted and reconstructed RNA molecules could be directly assigned to specific isoforms and allelic origin, and we identified significant transcript isoform regulation in mouse strains and human cell types. Moreover, Smart-seq3 showed a dramatic increase in sensitivity and typically detected thousands more genes per cell than Smart-seq2. Altogether, we developed a short-read sequencing strategy for single-cell RNA counting at isoform and allele-resolution applicable to large-scale characterization of cell types and states across tissues and organisms.
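The general idea behind UMI-based RNA counting can be sketched in a few lines: reads that share the same gene and the same UMI are treated as copies of one original molecule, so the molecule count for a gene is its number of distinct UMIs. The example below is a minimal illustration of that principle only. The read tuples, gene names and UMI sequences are hypothetical, and it does not reproduce the Smart-seq3 pipeline (it ignores UMI sequencing errors, isoform reconstruction and allele assignment).

```python
from collections import defaultdict

def count_molecules(reads):
    """Count distinct UMIs per gene; reads sharing (gene, UMI) count as one molecule."""
    umis_per_gene = defaultdict(set)
    for gene, umi in reads:
        umis_per_gene[gene].add(umi)
    return {gene: len(umis) for gene, umis in umis_per_gene.items()}

# Hypothetical (gene, UMI) pairs from one cell; the repeated pair is an amplification copy.
reads = [
    ("Actb", "AACGTTGA"),
    ("Actb", "AACGTTGA"),
    ("Actb", "CCGATAGT"),
    ("Gapdh", "TTGACCAA"),
]
counts = count_molecules(reads)
print(counts)
```

Collapsing on the (gene, UMI) pair rather than on reads is what makes the count insensitive to amplification bias, which is the motivation for UMI strategies in general.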


An Algorithm for Cellular Reprogramming

2017. DOI: 10.1101/162974. Cited By ~ 1

Author(s): Scott Ronquist, Geoff Patterson, Markus Brown, Stephen Lindsly, Haiming Chen, ...

Keyword(s): Cell Cycle, Transcription Factor, Transcription Factors, Cell Biology, Human Fibroblasts, Cell Types, Approximate Model, Cellular Reprogramming, Biological Processes, Moderate Complexity

Abstract: The day we understand the time evolution of subcellular elements at a level of detail comparable to physical systems governed by Newton’s laws of motion seems far away. Even so, quantitative approaches to cellular dynamics add to our understanding of cell biology, providing data-guided frameworks that allow us to develop better predictions about, and methods for, control over specific biological processes and system-wide cell behavior. In this paper, we describe an approach to optimizing the use of transcription factors (TFs) in the context of cellular reprogramming. We construct an approximate model for the natural evolution of a cell cycle synchronized population of human fibroblasts, based on data obtained by sampling the expression of 22,083 genes at several time points along the cell cycle. In order to arrive at a model of moderate complexity, we cluster gene expression based on the division of the genome into topologically associating domains (TADs) and then model the dynamics of the TAD expression levels. Based on this dynamical model and known bioinformatics, such as transcription factor binding sites (TFBS) and functions, we develop a methodology for identifying the top transcription factor candidates for a specific cellular reprogramming task. The approach used is based on a device commonly used in optimal control. Our data-guided methodology identifies a number of transcription factors previously validated for reprogramming and/or natural differentiation. Our findings highlight the immense potential of dynamical models, mathematics, and data-guided methodologies for improving strategies for control over biological processes.

Significance Statement: Reprogramming the human genome toward any desirable state is within reach; application of select transcription factors drives cell types toward different lineages in many settings. We introduce the concept of data-guided control in building a universal algorithm for directly reprogramming any human cell type into any other type. Our algorithm is based on time series genome transcription and architecture data and known regulatory activities of transcription factors, with natural dimension reduction using genome architectural features. Our algorithm predicts known reprogramming factors, top candidates for new settings, and ideal timing for application of transcription factors. This framework can be used to develop strategies for tissue regeneration, cancer cell reprogramming, and control of dynamical systems beyond cell biology.


scAIDE: clustering of large-scale single-cell RNA-seq data reveals putative and rare cell types

NAR Genomics and Bioinformatics, 2020, Vol 2 (4). DOI: 10.1093/nargab/lqaa082

Author(s): Kaikun Xie, Yu Huang, Feng Zeng, Zehua Liu, Ting Chen

Keyword(s): Single Cell, Large Scale, Developmental Trajectories, Cell Types, Random Projection, Good Representation, RNA Seq, Unsupervised Deep Learning, High Level, Computational Resources

Abstract: Recent advancements in both single-cell RNA-sequencing technology and computational resources facilitate the study of cell types on global populations. Up to millions of cells can now be sequenced in one experiment; thus, accurate and efficient computational methods are needed for clustering and for the downstream assignment of putative and rare cell types. Here, we present a novel unsupervised deep learning clustering framework that is robust and highly scalable. To overcome the high level of noise, scAIDE first incorporates an autoencoder-imputation network with a distance-preserved embedding network (AIDE) to learn a good representation of data, and then applies a random projection hashing based k-means algorithm to accommodate the detection of rare cell types. We analyzed a 1.3 million neural cell dataset within 30 min, obtaining 64 clusters which were mapped to 19 putative cell types. In particular, we further identified three different neural stem cell developmental trajectories in these clusters. We also classified two subpopulations of malignant cells in a small glioblastoma dataset using scAIDE. We anticipate that scAIDE will provide a more in-depth understanding of cell development and diseases.


Linear-time cluster ensembles of large-scale single-cell RNA-seq and multimodal data

2020. DOI: 10.1101/2020.06.15.151910

Author(s): Van Hoan Do, Francisca Rojas Ringeling, Stefan Canzar

Keyword(s): Single Cell, Large Scale, Linear Time, Cell Types, Substantial Improvement, RNA Seq, Sampling Step, Protein Marker, Cluster Ensembles, Sequencing Technologies

Abstract: A fundamental task in single-cell RNA-seq (scRNA-seq) analysis is the identification of transcriptionally distinct groups of cells. Numerous methods have been proposed for this problem, with a recent focus on methods for the cluster analysis of ultra-large scRNA-seq data sets produced by droplet-based sequencing technologies. Most existing methods rely on a sampling step to bridge the gap between algorithm scalability and volume of the data. Ignoring large parts of the data, however, often yields inaccurate groupings of cells and risks overlooking rare cell types. We propose Specter, a method that adopts and extends recent algorithmic advances in (fast) spectral clustering. In contrast to methods that cluster a (random) subsample of the data, we adopt the idea of landmarks that are used to create a sparse representation of the full data from which a spectral embedding can then be computed in linear time. We exploit Specter’s speed in a cluster ensemble scheme that achieves a substantial improvement in accuracy over existing methods and that is sensitive to rare cell types. Its linear time complexity allows Specter to scale to millions of cells and leads to fast computation times in practice. Furthermore, on CITE-seq data that simultaneously measures gene and protein marker expression, we demonstrate that Specter is able to utilize multimodal omics measurements to resolve subtle transcriptomic differences between subpopulations of cells. Specter is open source and available at https://github.com/canzarlab/Specter.


SCISSOR™: a single-cell inferred site-specific omics resource for tumor microenvironment association study

NAR Cancer, 2021, Vol 3 (3). DOI: 10.1093/narcan/zcab037

Author(s): Xiang Cui, Fei Qin, Xuanxuan Yu, Feifei Xiao, Guoshuai Cai

Keyword(s): Tumor Microenvironment, Single Cell, Clinical Outcomes, Large Scale, Cell Types, Cell Interaction, Specific Cell, Dynamic Visualization, Tissue Specific, Cell Composition

Abstract: Tumor tissues are heterogeneous, with different cell types in the tumor microenvironment that play an important role in tumorigenesis and tumor progression. Several computational algorithms and tools have been developed to infer the cell composition from bulk transcriptome profiles. However, they ignore tissue specificity, so a new resource of tissue-specific cell transcriptomic references is needed for inferring cell composition in the tumor microenvironment and exploring its association with clinical outcomes and tumor omics. In this study, we developed SCISSOR™ (https://thecailab.com/scissor/), an online open resource to fulfill that demand by integrating five orthogonal omics data of >6031 large-scale bulk samples, patient clinical outcomes and 451 917 high-granularity tissue-specific single-cell transcriptomic profiles of 16 cancer types. SCISSOR™ provides five major analysis modules that enable flexible modeling with adjustable parameters and dynamic visualization approaches. SCISSOR™ is valuable as a new resource for promoting tumor heterogeneity and tumor–tumor microenvironment cell interaction research, by delineating cells in the tissue-specific tumor microenvironment and characterizing their associations with tumor omics and clinical outcomes.


deepMNN: Deep Learning-Based Single-Cell RNA Sequencing Data Batch Correction Using Mutual Nearest Neighbors

Frontiers in Genetics, 2021, Vol 12. DOI: 10.3389/fgene.2021.708981

Author(s): Bin Zou, Tongda Zhang, Ruilong Zhou, Xiaosen Jiang, Huanming Yang, ...

Keyword(s): Deep Learning, Single Cell, RNA Sequencing, Large Scale, Cell Types, Batch Effect, Batch Effects, Batch Correction, Single Cell RNA Sequencing, Identical Cell

It is well recognized that batch effect in single-cell RNA sequencing (scRNA-seq) data remains a big challenge when integrating different datasets. Here, we proposed deepMNN, a novel deep learning-based method to correct batch effect in scRNA-seq data. We first searched mutual nearest neighbor (MNN) pairs across different batches in a principal component analysis (PCA) subspace. Subsequently, a batch correction network was constructed by stacking two residual blocks and further applied for the removal of batch effects. The loss function of deepMNN was defined as the sum of a batch loss and a weighted regularization loss. The batch loss measured the distance between cells in MNN pairs in the PCA subspace, while the regularization loss kept the output of the network similar to the input. The experimental results showed that deepMNN can successfully remove batch effects across datasets with identical cell types, datasets with non-identical cell types, datasets with multiple batches, and large-scale datasets. We compared the performance of deepMNN with state-of-the-art batch correction methods, including the widely used methods Harmony, Scanorama, and Seurat V4 as well as the recently developed deep learning-based methods MMD-ResNet and scGen. The results demonstrated that deepMNN achieved better or comparable performance in terms of both qualitative analysis using uniform manifold approximation and projection (UMAP) plots and quantitative metrics such as batch and cell entropies, ARI F1 score, and ASW F1 score under various scenarios. Additionally, deepMNN allowed for integrating scRNA-seq datasets with multiple batches in one step. Furthermore, deepMNN ran much faster than the other methods for large-scale datasets. These characteristics make deepMNN a potential new choice for large-scale single-cell gene expression data analysis.
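The two ingredients described above, the MNN pair search and a loss that adds a batch term to a weighted regularization term, can be illustrated with a small pure-Python sketch. This is not the deepMNN implementation: the residual network is replaced by an identity mapping, and the Euclidean distance, the averaging, the `weight` value and all function names are assumptions made for illustration.

```python
import math

def nearest(query, refs, k):
    """Indices of the k points in refs closest to query (Euclidean distance)."""
    order = sorted(range(len(refs)), key=lambda j: math.dist(query, refs[j]))
    return set(order[:k])

def mnn_pairs(batch_a, batch_b, k=1):
    """Pairs (i, j) where cell i of A and cell j of B are in each other's k nearest."""
    a_to_b = [nearest(p, batch_b, k) for p in batch_a]
    b_to_a = [nearest(q, batch_a, k) for q in batch_b]
    return [(i, j) for i in range(len(batch_a))
            for j in sorted(a_to_b[i]) if i in b_to_a[j]]

def total_loss(out_a, out_b, in_a, in_b, pairs, weight=0.1):
    """Batch loss over MNN pairs plus a weighted regularization loss (assumed form)."""
    # Batch term: mean distance between the corrected cells of each MNN pair.
    batch = sum(math.dist(out_a[i], out_b[j]) for i, j in pairs) / max(len(pairs), 1)
    # Regularization term: mean distance between each corrected cell and its input.
    outputs, inputs = out_a + out_b, in_a + in_b
    reg = sum(math.dist(o, x) for o, x in zip(outputs, inputs)) / len(inputs)
    return batch + weight * reg

# Hypothetical cells already projected into a 2-D PCA subspace.
batch_a = [(0.0, 0.0), (5.0, 5.0)]
batch_b = [(0.5, 0.0), (5.0, 5.5), (20.0, 20.0)]
pairs = mnn_pairs(batch_a, batch_b)
# With an identity "network", outputs equal inputs, so only the batch term is non-zero.
loss = total_loss(batch_a, batch_b, batch_a, batch_b, pairs)
print(pairs, loss)
```

The third cell of the second batch has no mutual partner and therefore contributes only to the regularization term, which is how an MNN-based loss avoids forcing cell types that exist in just one batch onto unrelated cells.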


Phenotypic convergence in the brain: distinct transcription factors regulate common terminal neuronal characters

2018. DOI: 10.1101/243113. Cited By ~ 2

Author(s): Nikos Konstantinides, Katarina Kapuralin, Chaimaa Fadil, Luendreo Barboza, Rahul Satija, ...

Keyword(s): Transcription Factors, Single Cell, Large Scale, Single Cells, Deep Understanding, Cell Types, Marker Genes, Cell Type, Functional Specification, Phenotypic Convergence

Summary: Transcription factors regulate the molecular, morphological, and physiological characters of neurons and generate their impressive cell type diversity. To gain insight into general principles that govern how transcription factors regulate cell type diversity, we used large-scale single-cell mRNA sequencing to characterize the extensive cellular diversity in the Drosophila optic lobes. We sequenced 55,000 single optic lobe neurons and glia and assigned them to 52 clusters of transcriptionally distinct single cells. We validated the clustering and annotated many of the clusters using RNA sequencing of characterized FACS-sorted single cell types, as well as marker genes specific to given clusters. To identify transcription factors responsible for inducing specific terminal differentiation features, we used machine-learning to generate a ‘random forest’ model. The predictive power of the model was confirmed by showing that two transcription factors expressed specifically in cholinergic (apterous) and glutamatergic (traffic-jam) neurons are necessary for the expression of ChAT and VGlut in many, but not all, cholinergic or glutamatergic neurons, respectively. We used a transcriptome-wide approach to show that the same terminal characters, including but not restricted to neurotransmitter identity, can be regulated by different transcription factors in different cell types, arguing for extensive phenotypic convergence. Our data provide a deep understanding of the developmental and functional specification of a complex brain structure.


The single-cell eQTLGen consortium

eLife, 2020, Vol 9. DOI: 10.7554/elife.52155. Cited By ~ 18

Author(s): MGP van der Wijst, DH de Vries, HE Groot, G Trynka, CC Hon, ...

Keyword(s): Single Cell, RNA Sequencing, Large Scale, Cell Types, eQTL Analysis, Sequencing Data, Scale Population, Trait Locus, Different Cell Types, Affect Gene Expression

In recent years, functional genomics approaches combining genetic information with bulk RNA-sequencing data have identified the downstream expression effects of disease-associated genetic risk factors through so-called expression quantitative trait locus (eQTL) analysis. Single-cell RNA-sequencing creates enormous opportunities for mapping eQTLs across different cell types and in dynamic processes, many of which are obscured when using bulk methods. Rapid increase in throughput and reduction in cost per cell now allow this technology to be applied to large-scale population genetics studies. To fully leverage these emerging data resources, we have founded the single-cell eQTLGen consortium (sc-eQTLGen), aimed at pinpointing the cellular contexts in which disease-causing genetic variants affect gene expression. Here, we outline the goals, approach and potential utility of the sc-eQTLGen consortium. We also provide a set of study design considerations for future single-cell eQTL studies.
