Predicting drug polypharmacology from cell morphology readouts using variational autoencoder latent space arithmetic

Mapping Intimacies; 10.1101/2021.09.02.458673; 2021

Author(s): Yuen Ler Chow, Shantanu Singh, Anne E Carpenter, Gregory P. Way

Keyword(s): Gene Expression, Cell Morphology, Learning Algorithm, Simulated Data, Biomedical Data, Data Types, Generative Capacity, Latent Space, Variational Autoencoder, Target Effects

A variational autoencoder (VAE) is a machine learning algorithm useful for generating a compressed and interpretable latent space. These representations have been generated from various biomedical data types and can be used to produce realistic-looking simulated data. However, vanilla VAEs suffer from entangled and uninformative latent spaces, which can be mitigated using other types of VAEs such as β-VAE and MMD-VAE. In this project, we evaluated the ability of VAEs to learn cell morphology characteristics derived from cell images. We trained and evaluated three VAE variants (vanilla VAE, β-VAE, and MMD-VAE) on cell morphology readouts and explored the generative capacity of each model to predict compound polypharmacology (the interactions of a drug with more than one target) using an approach called latent space arithmetic (LSA). To test the generalizability of the strategy, we also trained these VAEs using gene expression data of the same compound perturbations and found that gene expression provides complementary information. We found that the β-VAE and MMD-VAE disentangle morphology signals and reveal a more interpretable latent space. We reliably simulated morphology and gene expression readouts from certain compounds, thereby predicting cell states perturbed with compounds of known polypharmacology. Inferring cell state for specific drug mechanisms could aid researchers in developing and identifying targeted therapeutics and categorizing off-target effects in the future.
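
The latent space arithmetic idea in the abstract above can be illustrated with a minimal sketch. The abstract does not give the exact formula, so this assumes one common form: add the mean latent vectors of two single-mechanism compounds and subtract the mean latent vector of untreated controls. The linear `encode` and `decode` functions, the weight matrix `W`, and the simulated profiles are hypothetical stand-ins for a trained VAE and real data.

```python
import numpy as np

def latent_space_arithmetic(z_a, z_b, z_control):
    """Assumed LSA form: mean(A) + mean(B) - mean(control), all in latent space."""
    return z_a.mean(axis=0) + z_b.mean(axis=0) - z_control.mean(axis=0)

# Toy stand-in for a trained VAE (assumption): a linear encoder and a
# pseudo-inverse decoder. A real model would be nonlinear and learned.
rng = np.random.default_rng(0)
n_features, n_latent = 20, 4
W = rng.normal(size=(n_features, n_latent))  # hypothetical encoder weights

def encode(x):
    return x @ W

def decode(z):
    return z @ np.linalg.pinv(W)

# Simulated profiles: controls plus a fixed shift for each mechanism.
control = rng.normal(size=(50, n_features))
effect_a = np.zeros(n_features)
effect_a[0] = 2.0
effect_b = np.zeros(n_features)
effect_b[1] = -1.5
profiles_a = control + effect_a
profiles_b = control + effect_b

# Predict the latent state of a compound hitting both mechanisms, then decode it.
z_pred = latent_space_arithmetic(encode(profiles_a), encode(profiles_b), encode(control))
predicted_profile = decode(z_pred)
```

In practice the decoded prediction would be compared against measured profiles of compounds annotated with both mechanisms.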

Regression on imperfect class labels derived by unsupervised clustering

Briefings in Bioinformatics; 10.1093/bib/bbaa014; 2020

Author(s): Rasmus Froberg Brøndum, Thomas Yssing Michaelsen, Martin Bøgsted

Keyword(s): Gene Expression, Multiple Myeloma, Learning Algorithm, Gaussian Mixture Models, Simulated Data, Gaussian Mixture, Unsupervised Clustering, Expression Data, Method Performance, Class Labels

Regressing an outcome on class labels identified by unsupervised clustering is customary in many applications. However, it is common to ignore the misclassification of class labels caused by the learning algorithm, which can seriously bias the estimated effect parameters. We suggest addressing the problem with regression calibration or the misclassification simulation and extrapolation method, both chosen for their generality. Performance is illustrated by simulated data from Gaussian mixture models, documenting a reduced bias and improved coverage of confidence intervals when adjusting for misclassification with either method. Finally, we apply our method to data from a previous study, which regressed overall survival on class labels derived from unsupervised clustering of gene expression data from bone marrow samples of multiple myeloma patients.
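
The bias described above can be reproduced in a small simulation. This is a sketch under stated assumptions, not the authors' implementation: a two-component one-dimensional Gaussian mixture with known parameters stands in for the clustering step, and the correction shown is a regression-calibration-style substitution of the posterior class probability for the hard label. All variable names and parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n, beta = 20_000, 2.0  # sample size and true class effect (assumed values)

# True classes, a clustering feature x from a two-component Gaussian mixture
# (means -1 and +1, unit variance), and an outcome that depends on the class.
true_class = rng.integers(0, 2, size=n)
x = rng.normal(loc=np.where(true_class == 1, 1.0, -1.0), scale=1.0)
y = beta * true_class + rng.normal(size=n)

# Hard labels as an idealized clustering would assign them (threshold at 0).
hard_label = (x > 0).astype(float)

# Posterior P(class = 1 | x) for this mixture with equal weights.
# Assumption: the mixture parameters are known; in practice they are estimated.
posterior = 1.0 / (1.0 + np.exp(-2.0 * x))

def ols_slope(predictor, outcome):
    """Slope from ordinary least squares with an intercept."""
    design = np.column_stack([np.ones_like(predictor), predictor])
    coef, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    return coef[1]

naive_slope = ols_slope(hard_label, y)       # attenuated toward zero
calibrated_slope = ols_slope(posterior, y)   # close to the true effect
print(f"naive: {naive_slope:.2f}, calibrated: {calibrated_slope:.2f}, true: {beta}")
```

The naive slope is attenuated because some observations carry the wrong hard label, while regressing on the posterior probability targets the true effect when the mixture model is correct.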

Modulation of Human Mesenchymal Stem Cells by Electrical Stimulation Using an Enzymatic Biofuel Cell

Catalysts; 10.3390/catal11010062; 2021; Vol 11 (1); pp. 62

Author(s): Won-Yong Jeon, Seyoung Mun, Wei Beng Ng, Keunsoo Kang, Kyudong Han, ...

Keyword(s): Gene Expression, Stem Cells, Mesenchymal Stem Cells, Electrical Stimulation, Regenerative Medicine, Cell Morphology, Electrical Current, Specific Gene, Next Generation, The Impact

Enzymatic biofuel cells (EBFCs) have excellent potential as components in bioelectronic devices, especially as active biointerfaces to regulate stem cell behavior for regenerative medicine applications. However, it remains unclear to what extent EBFC-generated electrical stimulation can regulate the functional behavior of human adipose-derived mesenchymal stem cells (hAD-MSCs) at the morphological and gene expression levels. Herein, we investigated the effect of EBFC-generated electrical stimulation on hAD-MSC cell morphology and gene expression using next-generation RNA sequencing. We tested three different electrical currents, 127 ± 9, 248 ± 15, and 598 ± 75 nA/cm², in mesenchymal stem cells. We performed transcriptome profiling to analyze the impact of EBFC-derived electrical current on gene expression using next-generation sequencing (NGS). We also observed changes in cytoskeleton arrangement and analyzed gene expression that depends on the electrical stimulation. EBFC-generated electrical stimulation changes cell morphology through cytoskeleton rearrangement. In particular, the results of whole-transcriptome NGS showed that specific gene clusters were up- or down-regulated depending on the magnitude of the electrical current applied by the EBFC. In conclusion, this study demonstrates that EBFC-generated electrical stimulation can influence the morphological and gene expression properties of stem cells; such capabilities can be useful for regenerative medicine applications such as bioelectronic devices.

Estimating Simultaneous Equation Models through an Entropy-Based Incremental Variational Bayes Learning Algorithm

Entropy; 10.3390/e23040384; 2021; Vol 23 (4); pp. 384

Author(s): Rocío Hernández-Sanjaime, Martín González, Antonio Peñalver, Jose J. López-Espín

Keyword(s): Statistical Theory, Learning Algorithm, Real Life, Simulated Data, Simultaneous Equation, Variational Bayes, Parameter Estimates, Step Method, The One, Simultaneous Equation Models

The presence of unaccounted heterogeneity in simultaneous equation models (SEMs) is frequently problematic in many real-life applications. Under the usual assumption of homogeneity, the model can be seriously misspecified, and it can potentially induce an important bias in the parameter estimates. This paper focuses on SEMs in which data are heterogeneous and tend to form clustering structures in the endogenous-variable dataset. Because the identification of different clusters is not straightforward, a two-step strategy that first forms groups among the endogenous observations and then uses the standard simultaneous equation scheme is provided. Methodologically, the proposed approach is based on a variational Bayes learning algorithm and does not need to be executed for varying numbers of groups in order to identify the one that adequately fits the data. We describe the statistical theory, evaluate the performance of the suggested algorithm by using simulated data, and apply the two-step method to a macroeconomic problem.

A deep metric learning algorithm for similarity measure of the gene expression profile

2020 IEEE International Conference on E-health Networking, Application & Services (HEALTHCOM); 10.1109/healthcom49281.2021.9398919; 2021

Author(s): Shaoliang Peng, Lei Zhang, Yaning Yang, Wei Liu, Fei Li, ...

Keyword(s): Gene Expression, Gene Expression Profile, Similarity Measure, Expression Profile, Learning Algorithm, Metric Learning, Deep Metric Learning

Spectral Convolution Feature-Based SPD Matrix Representation for Signal Detection Using a Deep Neural Network

Entropy; 10.3390/e22090949; 2020; Vol 22 (9); pp. 949

Author(s): Jiangyi Wang, Min Liu, Xinwu Zeng, Xiaoqiang Hua

Keyword(s): Neural Network, Signal Detection, Convolutional Neural Network, Deep Neural Network, Detection Method, Learning Algorithm, Simulated Data, Data Sets, Feature Maps, Simulated Data Sets

Convolutional neural networks perform strongly in many visual tasks because of their hierarchical structures and powerful feature extraction capabilities. The SPD (symmetric positive definite) matrix has attracted attention in visual classification because it can learn a proper statistical representation and distinguish samples carrying different information. In this paper, a deep neural network signal detection method based on spectral convolution features is proposed. In this method, local features extracted from a convolutional neural network are used to construct the SPD matrix, and a deep learning algorithm for the SPD matrix is used to detect target signals. Feature maps extracted by two kinds of convolutional neural network models are applied in this study. With this method, signal detection becomes a binary classification problem over the signals in the samples. To demonstrate the effectiveness and superiority of this method, simulated and semi-physical simulated data sets are used. The results show that, under low SCR (signal-to-clutter ratio), this method obtains a gain of 0.5–2 dB over the spectral signal detection method based on the deep neural network, on both simulated and semi-physical simulated data sets.
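
The step of building an SPD matrix from convolutional feature maps can be sketched as follows. The abstract does not say which construction the authors use, so this assumes a common one: the covariance of the channel responses across spatial positions, with a small multiple of the identity added so the result is strictly positive definite. The function name, the `eps` parameter, and the random feature maps are hypothetical.

```python
import numpy as np

def spd_from_feature_maps(feature_maps, eps=1e-4):
    """Build a (C, C) SPD matrix from feature maps of shape (C, H, W).

    Assumed construction: regularized covariance of channel responses,
    treating each spatial position as one local feature vector.
    """
    n_channels = feature_maps.shape[0]
    local = feature_maps.reshape(n_channels, -1)          # (C, H*W)
    centered = local - local.mean(axis=1, keepdims=True)  # remove channel means
    cov = centered @ centered.T / (local.shape[1] - 1)    # sample covariance
    return cov + eps * np.eye(n_channels)                 # ensure positive definiteness

# Random feature maps stand in for the output of a convolutional layer.
rng = np.random.default_rng(0)
maps = rng.normal(size=(8, 16, 16))
spd = spd_from_feature_maps(maps)
```

The regularization term matters when the number of spatial positions is smaller than the number of channels, since the plain covariance is then rank deficient.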

FC1000: normalized gene expression changes of systematically perturbed human cells

Statistical Applications in Genetics and Molecular Biology; 10.1515/sagmb-2016-0072; 2017; Vol 16 (4); Cited By ~ 1

Author(s): Ingrid M. Lönnstedt, Sven Nelander

Keyword(s): Gene Expression, Expression Profiles, Gene Expression Profiles, Estimation Procedure, Human Cells, Biomedical Data, Transcriptional Responses, Statistical Framework, Statistical Measures, Change Response

The systematic study of transcriptional responses to genetic and chemical perturbations in human cells is still in its early stages. The largest available dataset to date is the newly released L1000 compendium. With its 1.3 million gene expression profiles of treated human cells, it offers many opportunities for biomedical data mining, but also data normalization challenges of new dimensions. We developed a novel and practical approach to obtain accurate estimates of fold change response profiles from L1000, based on the RUV (Remove Unwanted Variation) statistical framework. Extending RUV to a big data setting, we propose an estimation procedure in which an underlying RUV model is tuned by feedback through dataset-specific statistical measures, reflecting

From correlation to causation networks: a simple approximate learning algorithm and its application to high-dimensional plant gene expression data

BMC Systems Biology; 10.1186/1752-0509-1-37; 2007; Vol 1 (1); Cited By ~ 212

Author(s): Rainer Opgen-Rhein, Korbinian Strimmer

Keyword(s): Gene Expression, Gene Expression Data, Learning Algorithm, High Dimensional, Expression Data, Plant Gene Expression, Plant Gene

A deep generative model of 3D single-cell organization

10.1101/2021.06.09.447725; 2021

Author(s): Rory Donovan-Maiye, Jackson Brown, Caleb Chan, Liya Ding, Calysta Yan, ...

Keyword(s): Single Cell, Cell Morphology, Image Data, Subcellular Structure, Cell Image, Fluorescent Image, Trade Offs, Cell Organization, Latent Representations, Target Effects

We introduce a framework for end-to-end integrative modeling of 3D single-cell multi-channel fluorescent image data of diverse subcellular structures. We employ stacked conditional β-variational autoencoders to first learn a latent representation of cell morphology, and then learn a latent representation of subcellular structure localization which is conditioned on the learned cell morphology. Our model is flexible and can be trained on images of arbitrary subcellular structures and at varying degrees of sparsity and reconstruction fidelity. We train our full model on 3D cell image data and explore design trade-offs in the 2D setting. Once trained, our model can be used to impute structures in cells where they were not imaged and to quantify the variation in the location of all subcellular structures by generating plausible instantiations of each structure in arbitrary cell geometries. We apply our trained model to a small drug perturbation screen to demonstrate its applicability to new data. We show how the latent representations of drugged cells differ from unperturbed cells as expected by on-target effects of the drugs.

Moderate and intensive mechanical loading differentially modulate the phenotype of tendon stem/progenitor cells in vivo

PLoS ONE; 10.1371/journal.pone.0242640; 2020; Vol 15 (12); pp. e0242640

Author(s): Jianying Zhang, Daibang Nie, Kelly Williamson, Arthur McDowell, MaCalus V. Hogan, ...

Keyword(s): Gene Expression, Progenitor Cells, Cell Morphology, Fluorescent Protein, Treadmill Running, Whole Body, Tendon Cells, Elevated Expression, Green Fluorescent

To examine the differential mechanobiological responses of specific resident tendon cells, we developed an in vivo model of whole-body irradiation followed by injection of either tendon stem/progenitor cells (TSCs) expressing green fluorescent protein (GFP-TSCs) or mature tenocytes expressing GFP (GFP-TNCs) into the patellar tendons of wild-type C57 mice. Injected mice were subjected to short-term (3 weeks) treadmill running, specifically moderate treadmill running (MTR) and intensive treadmill running (ITR). In MTR mice, both GFP-TSC and GFP-TNC injected tendons maintained normal cell morphology with elevated expression of the tendon-related markers collagen I and tenomodulin. In ITR mice injected with GFP-TNCs, cells also maintained an elongated shape similar to the shape found in normal/untreated control mice, as well as elevated expression of tendon-related markers. However, ITR mice injected with GFP-TSCs showed abnormal changes, such as cell morphology transitioning to a round shape, elevated chondrogenic differentiation, and increased gene expression of the non-tenocyte-related genes LPL, Runx-2, and SOX-9. The increased gene expression was supported by immunostaining showing elevated expression of SOX-9, Runx-2, and PPARγ. This study provides evidence that while MTR maintains tendon homeostasis by promoting the differentiation of TSCs into TNCs, ITR causes the onset of tendinopathy development by inducing non-tenocyte differentiation of TSCs, which may eventually lead to the formation of non-tendinous tissues in tendon tissue after long-term mechanical overloading of the tendon.

Shrinkage of dispersion parameters in the binomial family, with application to differential exon skipping

10.1101/012823; 2014

Author(s): Sean Ruddy, Marla Johnson, Elizabeth Purdom

Keyword(s): Gene Expression, Gene Expression Data, Empirical Bayes, Simulated Data, Exon Skipping, Expression Data, Weighted Likelihood, Sequencing Data, Dispersion Parameters, Per Gene

The prevalence of sequencing experiments in genomics has led to an increased use of methods for count data in analyzing high-throughput genomic data. The importance of shrinkage methods in improving the performance of statistical methods remains. A common example is that of gene expression data, where the counts per gene are often modeled as some form of an over-dispersed Poisson. In this case, shrinkage estimates of the per-gene dispersion parameter have led to improved estimation of dispersion in the case of a small number of samples. We address a different count setting introduced by the use of sequencing data: comparing differential proportional usage via an over-dispersed binomial model. This is motivated by our interest in testing for differential exon skipping in mRNA-Seq experiments. We introduce a novel method that is developed by modeling the dispersion based on the double binomial distribution proposed by Efron (1986). Our method (WEB-Seq) is an empirical Bayes strategy for producing a shrunken estimate of dispersion that effectively detects differential proportional usage, and it has close ties to the weighted-likelihood strategy of edgeR developed for gene expression data (Robinson and Smyth, 2007; Robinson et al., 2010). We analyze its behavior on simulated data sets as well as real data and show that our method is fast, powerful and gives accurate control of the FDR compared to alternative approaches. We provide an implementation of our methods in the R package DoubleExpSeq available on CRAN.
