Predicting drug polypharmacology from cell morphology readouts using variational autoencoder latent space arithmetic

2021 ◽  
Author(s):  
Yuen Ler Chow ◽  
Shantanu Singh ◽  
Anne E Carpenter ◽  
Gregory P. Way

A variational autoencoder (VAE) is a machine learning algorithm useful for generating a compressed and interpretable latent space. These representations have been generated from various biomedical data types and can be used to produce realistic-looking simulated data. However, vanilla VAEs suffer from entangled and uninformative latent spaces, which can be mitigated using other types of VAEs such as β-VAE and MMD-VAE. In this project, we evaluated the ability of VAEs to learn cell morphology characteristics derived from cell images. We trained and evaluated these three VAE variants (Vanilla VAE, β-VAE, and MMD-VAE) on cell morphology readouts and explored the generative capacity of each model to predict compound polypharmacology (the interactions of a drug with more than one target) using an approach called latent space arithmetic (LSA). To test the generalizability of the strategy, we also trained these VAEs using gene expression data of the same compound perturbations and found that gene expression provides complementary information. We found that the β-VAE and MMD-VAE disentangle morphology signals and reveal a more interpretable latent space. We reliably simulated morphology and gene expression readouts from certain compounds, thereby predicting cell states perturbed with compounds of known polypharmacology. Inferring cell state for specific drug mechanisms could aid researchers in developing and identifying targeted therapeutics and categorizing off-target effects in the future.
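As a rough illustration of latent space arithmetic, the sketch below combines the mean latent codes of two single-mechanism groups and subtracts the control mean to estimate the latent code of a dual-mechanism state. This is one common form of the technique and is an assumption here; the abstract does not spell out the exact procedure. The function names (`centroid`, `lsa_combine`) and the toy 2-D latent values are hypothetical.

```python
from statistics import fmean


def centroid(vectors):
    """Return the per-dimension mean of a list of equal-length latent vectors."""
    return [fmean(dim) for dim in zip(*vectors)]


def lsa_combine(latent_a, latent_b, latent_control):
    """Estimate a dual-mechanism latent code as mean(A) + mean(B) - mean(control).

    Assumption: this additive form is one common way to do latent space
    arithmetic; it is not confirmed to be the exact formula used by the authors.
    """
    a = centroid(latent_a)
    b = centroid(latent_b)
    c = centroid(latent_control)
    return [ai + bi - ci for ai, bi, ci in zip(a, b, c)]


# Hypothetical 2-D latent codes, as if produced by a trained VAE encoder.
control = [[0.0, 0.0], [0.2, -0.2]]
mech_a = [[1.0, 0.0], [1.2, 0.2]]
mech_b = [[0.0, 2.0], [0.2, 1.8]]

predicted = lsa_combine(mech_a, mech_b, control)
print(predicted)
```

In a full pipeline, the resulting vector would be passed through the trained decoder to simulate a morphology or gene expression profile; the decoder is omitted here because it depends on the trained model.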

Author(s):  
Rasmus Froberg Brøndum ◽  
Thomas Yssing Michaelsen ◽  
Martin Bøgsted

Regressing an outcome on class labels identified by unsupervised clustering is customary in many applications. However, it is common to ignore the misclassification of class labels caused by the learning algorithm, which potentially leads to serious bias in the estimated effect parameters. Because of their generality, we suggest addressing the problem with regression calibration or the misclassification simulation and extrapolation method. Performance is illustrated by simulated data from Gaussian mixture models, documenting a reduced bias and improved coverage of confidence intervals when adjusting for misclassification with either method. Finally, we apply our method to data from a previous study, which regressed overall survival on class labels derived from unsupervised clustering of gene expression data from bone marrow samples of multiple myeloma patients.
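The bias described above can be seen in a small simulation. The sketch below is not regression calibration or the misclassification simulation and extrapolation method from the abstract; it only shows how randomly flipped labels attenuate an estimated group effect, and applies a simple moment correction that is valid under the stated assumptions (two balanced classes, a known and symmetric flip probability). All names and parameter values are hypothetical.

```python
import random
from statistics import fmean

FLIP_PROB = 0.2  # assumed known, symmetric misclassification rate


def simulate(n=20000, effect=2.0, flip_prob=FLIP_PROB, seed=0):
    """Simulate an outcome that depends on a true binary class, plus noisy labels."""
    rng = random.Random(seed)
    true_labels = [rng.random() < 0.5 for _ in range(n)]
    outcome = [effect * t + rng.gauss(0.0, 1.0) for t in true_labels]
    # Each observed label is flipped independently with probability flip_prob.
    observed = [(not t) if rng.random() < flip_prob else t for t in true_labels]
    return outcome, true_labels, observed


def group_difference(outcome, labels):
    """Difference in mean outcome between label == True and label == False."""
    ones = [y for y, lab in zip(outcome, labels) if lab]
    zeros = [y for y, lab in zip(outcome, labels) if not lab]
    return fmean(ones) - fmean(zeros)


outcome, true_labels, observed = simulate()
oracle = group_difference(outcome, true_labels)  # uses the true classes
naive = group_difference(outcome, observed)      # ignores misclassification
# With balanced classes and symmetric flips, the naive estimate shrinks by
# a factor of (1 - 2 * flip_prob), so dividing by that factor undoes the bias.
corrected = naive / (1 - 2 * FLIP_PROB)
print(oracle, naive, corrected)
```

The naive estimate is pulled toward zero relative to the oracle estimate, which is the attenuation that a misclassification-aware method is meant to counter.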


Catalysts ◽  
2021 ◽  
Vol 11 (1) ◽  
pp. 62
Author(s):  
Won-Yong Jeon ◽  
Seyoung Mun ◽  
Wei Beng Ng ◽  
Keunsoo Kang ◽  
Kyudong Han ◽  
...  

Enzymatic biofuel cells (EBFCs) have excellent potential as components in bioelectronic devices, especially as active biointerfaces to regulate stem cell behavior for regenerative medicine applications. However, it remains unclear to what extent EBFC-generated electrical stimulation can regulate the functional behavior of human adipose-derived mesenchymal stem cells (hAD-MSCs) at the morphological and gene expression levels. Herein, we investigated the effect of EBFC-generated electrical stimulation on hAD-MSC cell morphology and gene expression using next-generation RNA sequencing. We tested three different electrical currents, 127 ± 9, 248 ± 15, and 598 ± 75 nA/cm², in mesenchymal stem cells. We performed transcriptome profiling to analyze the impact of EBFC-derived electrical current on gene expression using next-generation sequencing (NGS). We also observed changes in cytoskeleton arrangement and analyzed gene expression that depends on the electrical stimulation. The electrical stimulation of EBFC changes cell morphology through cytoskeleton re-arrangement. In particular, the results of whole transcriptome NGS showed that specific gene clusters were up- or down-regulated depending on the magnitude of applied electrical current of EBFC. In conclusion, this study demonstrates that EBFC-generated electrical stimulation can influence the morphological and gene expression properties of stem cells; such capabilities can be useful for regenerative medicine applications such as bioelectronic devices.


Entropy ◽  
2021 ◽  
Vol 23 (4) ◽  
pp. 384
Author(s):  
Rocío Hernández-Sanjaime ◽  
Martín González ◽  
Antonio Peñalver ◽  
Jose J. López-Espín

The presence of unaccounted heterogeneity in simultaneous equation models (SEMs) is frequently problematic in many real-life applications. Under the usual assumption of homogeneity, the model can be seriously misspecified, and it can potentially induce an important bias in the parameter estimates. This paper focuses on SEMs in which data are heterogeneous and tend to form clustering structures in the endogenous-variable dataset. Because the identification of different clusters is not straightforward, a two-step strategy that first forms groups among the endogenous observations and then uses the standard simultaneous equation scheme is provided. Methodologically, the proposed approach is based on a variational Bayes learning algorithm and does not need to be executed for varying numbers of groups in order to identify the one that adequately fits the data. We describe the statistical theory, evaluate the performance of the suggested algorithm by using simulated data, and apply the two-step method to a macroeconomic problem.


Entropy ◽  
2020 ◽  
Vol 22 (9) ◽  
pp. 949
Author(s):  
Jiangyi Wang ◽  
Min Liu ◽  
Xinwu Zeng ◽  
Xiaoqiang Hua

Convolutional neural networks perform strongly in many visual tasks because of their hierarchical structures and powerful feature extraction capabilities. The SPD (symmetric positive definite) matrix has attracted attention in visual classification because it can learn a proper statistical representation and distinguish samples carrying different information. In this paper, a deep neural network signal detection method based on spectral convolution features is proposed. In this method, local features extracted from a convolutional neural network are used to construct the SPD matrix, and a deep learning algorithm for the SPD matrix is used to detect target signals. Feature maps extracted by two kinds of convolutional neural network models are applied in this study. With this method, signal detection becomes a binary classification problem over the signals in samples. To demonstrate the effectiveness and superiority of this method, simulated and semi-physical simulated data sets are used. The results show that, under low SCR (signal-to-clutter ratio), this method obtains a gain of 0.5–2 dB over the spectral signal detection method based on the deep neural network on simulated and semi-physical simulated data sets.
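One common way to build an SPD matrix from a set of local feature vectors is a regularized sample covariance matrix. The sketch below assumes that construction; the abstract does not state how the authors build their SPD matrix, so the function `covariance_spd`, the `eps` regularizer, and the toy feature values are hypothetical.

```python
def covariance_spd(features, eps=1e-6):
    """Build a regularized sample covariance matrix from local feature vectors.

    features: list of n vectors, each of length d (for example, one vector per
    spatial location of a CNN feature map). Adding eps to the diagonal keeps
    the result strictly positive definite even when the covariance is singular.
    """
    n, d = len(features), len(features[0])
    mean = [sum(f[j] for f in features) / n for j in range(d)]
    centered = [[f[j] - mean[j] for j in range(d)] for f in features]
    cov = [
        [sum(row[i] * row[j] for row in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    for i in range(d):
        cov[i][i] += eps
    return cov


# Hypothetical 2-D local features standing in for CNN activations.
features = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]]
spd = covariance_spd(features)
print(spd)
```

For a 2×2 matrix, positive definiteness can be checked directly: the top-left entry and the determinant must both be positive. Larger matrices would typically be checked with a Cholesky factorization.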


Author(s):  
Ingrid M. Lönnstedt ◽  
Sven Nelander

The systematic study of transcriptional responses to genetic and chemical perturbations in human cells is still in its early stages. The largest available dataset to date is the newly released L1000 compendium. With its 1.3 million gene expression profiles of treated human cells, it offers many opportunities for biomedical data mining, but also data normalization challenges of new dimensions. We developed a novel and practical approach to obtain accurate estimates of fold change response profiles from L1000, based on the RUV (Remove Unwanted Variation) statistical framework. Extending RUV to a big data setting, we propose an estimation procedure in which an underlying RUV model is tuned by feedback through dataset-specific statistical measures, reflecting


2021 ◽  
Author(s):  
Rory Donovan-Maiye ◽  
Jackson Brown ◽  
Caleb Chan ◽  
Liya Ding ◽  
Calysta Yan ◽  
...  

We introduce a framework for end-to-end integrative modeling of 3D single-cell multi-channel fluorescent image data of diverse subcellular structures. We employ stacked conditional β-variational autoencoders to first learn a latent representation of cell morphology, and then learn a latent representation of subcellular structure localization which is conditioned on the learned cell morphology. Our model is flexible and can be trained on images of arbitrary subcellular structures and at varying degrees of sparsity and reconstruction fidelity. We train our full model on 3D cell image data and explore design trade-offs in the 2D setting. Once trained, our model can be used to impute structures in cells where they were not imaged and to quantify the variation in the location of all subcellular structures by generating plausible instantiations of each structure in arbitrary cell geometries. We apply our trained model to a small drug perturbation screen to demonstrate its applicability to new data. We show how the latent representations of drugged cells differ from unperturbed cells as expected by on-target effects of the drugs.


PLoS ONE ◽  
2020 ◽  
Vol 15 (12) ◽  
pp. e0242640
Author(s):  
Jianying Zhang ◽  
Daibang Nie ◽  
Kelly Williamson ◽  
Arthur McDowell ◽  
MaCalus V. Hogan ◽  
...  

To examine the differential mechanobiological responses of specific resident tendon cells, we developed an in vivo model of whole-body irradiation followed by injection of either tendon stem/progenitor cells (TSCs) expressing green fluorescent protein (GFP-TSCs) or mature tenocytes expressing GFP (GFP-TNCs) into the patellar tendons of wild type C57 mice. Injected mice were subjected to short term (3 weeks) treadmill running, specifically moderate treadmill running (MTR) and intensive treadmill running (ITR). In MTR mice, both GFP-TSC and GFP-TNC injected tendons maintained normal cell morphology with elevated expression of tendon related markers collagen I and tenomodulin. In ITR mice injected with GFP-TNCs, cells also maintained an elongated shape similar to the shape found in normal/untreated control mice, as well as elevated expression of tendon related markers. However, ITR mice injected with GFP-TSCs showed abnormal changes, such as cell morphology transitioning to a round shape, elevated chondrogenic differentiation, and increased gene expression of non-tenocyte related genes LPL, Runx-2, and SOX-9. Increased gene expression data was supported by immunostaining showing elevated expression of SOX-9, Runx-2, and PPARγ. This study provides evidence that while MTR maintains tendon homeostasis by promoting the differentiation of TSCs into TNCs, ITR causes the onset of tendinopathy development by inducing non-tenocyte differentiation of TSCs, which may eventually lead to the formation of non-tendinous tissues in tendon tissue after long term mechanical overloading conditions on the tendon.


2014 ◽  
Author(s):  
Sean Ruddy ◽  
Marla Johnson ◽  
Elizabeth Purdom

The prevalence of sequencing experiments in genomics has led to an increased use of methods for count data in analyzing high-throughput genomic data. Shrinkage methods remain important for improving the performance of statistical methods. A common example is that of gene expression data, where the counts per gene are often modeled as some form of an over-dispersed Poisson. In this case, shrinkage estimates of the per-gene dispersion parameter have led to improved estimation of dispersion in the case of a small number of samples. We address a different count setting introduced by the use of sequencing data: comparing differential proportional usage via an over-dispersed binomial model. This is motivated by our interest in testing for differential exon skipping in mRNA-Seq experiments. We introduce a novel method that is developed by modeling the dispersion based on the double binomial distribution proposed by Efron (1986). Our method (WEB-Seq) is an empirical Bayes strategy for producing a shrunken estimate of dispersion; it effectively detects differential proportional usage and has close ties to the weighted-likelihood strategy of edgeR developed for gene expression data (Robinson and Smyth, 2007; Robinson et al., 2010). We analyze its behavior on simulated data sets as well as real data and show that our method is fast, powerful and gives accurate control of the FDR compared to alternative approaches. We provide an implementation of our methods in the R package DoubleExpSeq, available on CRAN.

