An Algorithm for the Stochastic Simulation of Gene Expression and Heterogeneous Population Dynamics

2011 ◽  
Vol 9 (1) ◽  
pp. 89-112 ◽  
Author(s):  
Daniel A. Charlebois ◽  
Jukka Intosalmi ◽  
Dawn Fraser ◽  
Mads Kærn

Abstract: We present an algorithm for the stochastic simulation of gene expression and heterogeneous population dynamics. The algorithm combines an exact method to simulate molecular-level fluctuations in single cells and a constant-number Monte Carlo method to simulate time-dependent statistical characteristics of growing cell populations. To benchmark performance, we compare simulation results with steady-state and time-dependent analytical solutions for several scenarios, including steady-state and time-dependent gene expression, and the effects on population heterogeneity of cell growth, division, and DNA replication. This comparison demonstrates that the algorithm provides an efficient and accurate approach to simulate how complex biological features influence gene expression. We also use the algorithm to model gene expression dynamics within “bet-hedging” cell populations during their adaptation to environmental stress. These simulations indicate that the algorithm provides a framework suitable for simulating and analyzing realistic models of heterogeneous population dynamics combining molecular-level stochastic reaction kinetics, relevant physiological details and phenotypic variability.
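
The "exact method" for single-cell fluctuations mentioned in this abstract can be illustrated with a Gillespie-style direct-method simulation. The following is only a minimal sketch under stated assumptions, not the authors' algorithm: it covers a single cell with constitutive production and first-order degradation, and it omits the constant-number Monte Carlo population step entirely. The function names and the rate parameters `k_prod` and `k_deg` are hypothetical.

```python
import random


def gillespie_birth_death(k_prod, k_deg, t_end, n0=0, seed=0):
    """Direct-method stochastic simulation of a birth-death process.

    Assumed (hypothetical) model: molecules are produced at constant rate
    k_prod and each molecule is degraded at rate k_deg.
    Returns the event times and the molecule count after each event.
    """
    rng = random.Random(seed)
    t, n = 0.0, n0
    times, counts = [t], [n]
    while True:
        a_prod = k_prod          # propensity of production
        a_deg = k_deg * n        # propensity of degradation
        a_total = a_prod + a_deg
        if a_total <= 0.0:
            break                # no reaction can fire any more
        t += rng.expovariate(a_total)   # exponential waiting time
        if t >= t_end:
            break
        # Choose which reaction fires, proportionally to its propensity.
        if rng.random() * a_total < a_prod:
            n += 1
        else:
            n -= 1
        times.append(t)
        counts.append(n)
    return times, counts


def time_average(times, counts, t_end):
    """Time-weighted mean of a piecewise-constant trajectory on [0, t_end]."""
    total = 0.0
    for i, n in enumerate(counts):
        t_next = times[i + 1] if i + 1 < len(times) else t_end
        total += n * (t_next - times[i])
    return total / t_end
```

For this simple model the steady-state mean is `k_prod / k_deg`, so a long run such as `gillespie_birth_death(10.0, 1.0, 2000.0)` should give a time average close to 10, which is the kind of comparison against an analytical solution that the abstract describes.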

2020 ◽  
Vol 117 (46) ◽  
pp. 28784-28794
Author(s):  
Sisi Chen ◽  
Paul Rivaud ◽  
Jong H. Park ◽  
Tiffany Tsou ◽  
Emeric Charles ◽  
...  

Single-cell measurement techniques can now probe gene expression in heterogeneous cell populations from the human body across a range of environmental and physiological conditions. However, new mathematical and computational methods are required to represent and analyze gene-expression changes that occur in complex mixtures of single cells as they respond to signals, drugs, or disease states. Here, we introduce a mathematical modeling platform, PopAlign, that automatically identifies subpopulations of cells within a heterogeneous mixture and tracks gene-expression and cell-abundance changes across subpopulations by constructing and comparing probabilistic models. Probabilistic models provide a low-error, compressed representation of single-cell data that enables efficient large-scale computations. We apply PopAlign to analyze the impact of 40 different immunomodulatory compounds on a heterogeneous population of donor-derived human immune cells as well as patient-specific disease signatures in multiple myeloma. PopAlign scales to comparisons involving tens to hundreds of samples, enabling large-scale studies of natural and engineered cell populations as they respond to drugs, signals, or physiological change.


2005 ◽  
Vol 03 (05) ◽  
pp. 1191-1205 ◽  
Author(s):  
ENRICO CAPOBIANCO

This paper presents an application of the Independent Component Analysis (ICA) method to genomic data. In particular, experimentally produced perturbation effects on the E. coli bacterium are monitored through the changes in gene expression values observed at regular times until steady state has been reached. The aim is to control the response of the SOS system to DNA damage. We might assume that only part of the genetic regulatory network is affected directly by the perturbation conditions, as indirect cascade effects might also be present, and some genes may change just because of randomness. ICA decomposes the gene matrix and identifies groups of genes belonging to a certain estimated component by virtue of co-expression; it is of course of interest to establish co-regulation dynamics, which might underlie the captured correlation. Stronger forms of dependence, like Mutual Information, are thus computed and compared with linear correlation in order to validate the results and establish the role of the identified components in determining the network dynamics.


2019 ◽  
Author(s):  
Z. Cao ◽  
T. Filatova ◽  
D. A. Oyarzún ◽  
R. Grima

Abstract: Transcriptional bursting is a major source of noise in gene expression. The telegraph model of gene expression, whereby transcription switches between “on” and “off” states, is the dominant model for bursting. Recently it was shown that the telegraph model cannot explain a number of experimental observations from perturbation data. Here we study an alternative model that is consistent with the data and which explicitly describes RNA polymerase recruitment and polymerase pause release, two steps necessary for mRNA production. We derive the exact steady-state distribution of mRNA numbers and an approximate steady-state distribution of protein numbers which are given by generalized hypergeometric functions. The theory is used to calculate the relative sensitivity of the coefficient of variation of mRNA fluctuations for thousands of genes in mouse fibroblasts. This indicates that the size of fluctuations is mostly sensitive to the rate of burst initiation and the mRNA degradation rate. Furthermore, we show that (i) the time-dependent distribution of mRNA numbers is accurately approximated by a modified telegraph model with a Michaelis-Menten-like dependence of the effective transcription rate on RNA polymerase abundance, and (ii) the model predicts that if the polymerase recruitment rate is comparable to or less than the pause release rate, then upon gene replication the mean number of RNA per cell remains approximately constant. This gene dosage compensation property has been experimentally observed and cannot be explained by the telegraph model with constant rates.
Statement of Significance: The random nature of gene expression is well established experimentally. Mathematical modelling provides a means of understanding the factors leading to the observed stochasticity. There is evidence that the classical two-state model of stochastic mRNA dynamics (the telegraph model) cannot describe perturbation experiments and a new model that includes polymerase dynamics has been proposed. In this paper, we present the first detailed study of this model, deriving an exact solution for the mRNA distribution in steady-state conditions, an approximate time-dependent solution and showing the model can explain gene dosage compensation. In addition, we use the theory together with transcriptomic data to deduce which parameters, when perturbed, lead to a maximal change in the size of mRNA fluctuations.
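
For reference, the classical telegraph model that this abstract uses as its baseline can be sketched as a small stochastic simulation. This is an illustrative sketch only: it implements the standard two-state model with constant rates, not the polymerase recruitment and pause-release model studied in the paper. The function name and the rate names `k_on`, `k_off`, `k_syn` and `k_deg` are hypothetical.

```python
import random


def simulate_telegraph(k_on, k_off, k_syn, k_deg, t_end, seed=0):
    """Time-averaged mRNA count of a two-state (telegraph) gene.

    Assumed (hypothetical) rates: the promoter switches on at k_on and off
    at k_off, transcribes at k_syn only while on, and each mRNA is degraded
    at k_deg. The simulation starts with the gene off and no mRNA.
    """
    rng = random.Random(seed)
    t, gene_on, mrna = 0.0, False, 0
    weighted_sum = 0.0  # integral of the mRNA count over time
    while t < t_end:
        switch_rate = k_off if gene_on else k_on
        synth_rate = k_syn if gene_on else 0.0
        decay_rate = k_deg * mrna
        total = switch_rate + synth_rate + decay_rate
        if total <= 0.0:
            weighted_sum += mrna * (t_end - t)
            break
        wait = rng.expovariate(total)
        if t + wait >= t_end:
            weighted_sum += mrna * (t_end - t)
            break
        weighted_sum += mrna * wait
        t += wait
        # Pick the next event proportionally to its rate.
        u = rng.random() * total
        if u < switch_rate:
            gene_on = not gene_on
        elif u < switch_rate + synth_rate:
            mrna += 1
        else:
            mrna -= 1
    return weighted_sum / t_end
```

With constant rates, the steady-state mean of this model is `k_on / (k_on + k_off) * k_syn / k_deg`, so `simulate_telegraph(1.0, 1.0, 20.0, 1.0, 5000.0)` should return a value near 10.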


2020 ◽  
Author(s):  
H. Medini ◽  
T. Cohen ◽  
D. Mishmar

Abstract: Mitochondrial gene expression is pivotal to cell metabolism. Nevertheless, it is unknown whether it diverges within a given cell type. Here, we analysed single-cell RNA-seq experiments from ∼4600 human pancreatic alpha and beta cells, as well as ∼900 mouse beta cells. Cluster analysis revealed two distinct human beta cell populations, which diverged by mitochondrial (mtDNA) and nuclear DNA (nDNA)-encoded oxidative phosphorylation (OXPHOS) gene expression in healthy and diabetic individuals, and in newborn but not in adult mice. Insulin gene expression was elevated in beta cells with higher mtDNA gene expression in humans and in young mice. Such human beta cell populations also diverged in mt-RNA mutational repertoire, and in their selective signature, thus implying the existence of two previously overlooked distinct and conserved beta cell populations. When we applied our approach to alpha cells, two sub-populations of cells were identified which diverged in mtDNA gene expression, yet these cellular populations did not consistently diverge in nDNA OXPHOS gene expression, nor did they correlate with the expression of glucagon, the hallmark of alpha cells. Thus, pancreatic beta cells within an individual are divided into distinct groups with unique metabolic-mitochondrial signatures.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Jennifer Ma ◽  
Gary Tran ◽  
Alwin M. D. Wan ◽  
Edmond W. K. Young ◽  
Eugenia Kumacheva ◽  
...  

Abstract: Gene expression analysis of individual cells enables characterization of heterogeneous and rare cell populations, yet widespread implementation of existing single-cell gene analysis techniques has been hindered by limitations in scale, ease, and cost. Here, we present a novel microdroplet-based, one-step reverse-transcriptase polymerase chain reaction (RT-PCR) platform and demonstrate the detection of three targets simultaneously in over 100,000 single cells in a single experiment with a rapid read-out. Our customized reagent cocktail incorporates the bacteriophage T7 gene 2.5 protein to overcome cell lysate-mediated inhibition and allows for one-step RT-PCR of single cells encapsulated in nanoliter droplets. Fluorescent signals indicative of gene expression are analyzed using a probabilistic deconvolution method to account for ambient RNA and cell doublets and produce single-cell gene signature profiles, as well as predict cell frequencies within heterogeneous samples. We also developed a simulation model to guide experimental design and optimize the accuracy and precision of the assay. Using mixtures of in vitro transcripts and murine cell lines, we demonstrated the detection of single RNA molecules and rare cell populations at a frequency of 0.1%. This low-cost, sensitive, and adaptable technique will provide an accessible platform for high-throughput single-cell analysis and enable a wide range of research and clinical applications.


2021 ◽  
Author(s):  
Tanya Grancharova ◽  
Kaytlyn A. Gerbin ◽  
Alexander B. Rosenberg ◽  
Charles M. Roco ◽  
Joy Arakaki ◽  
...  

Abstract: We performed a comprehensive analysis of the transcriptional changes within and across cell populations during human induced pluripotent stem cell (hiPSC) differentiation to cardiomyocytes. Using the single-cell RNA-seq combinatorial barcoding method SPLiT-seq, we sequenced >20,000 single cells from 55 independent samples representing two differentiation protocols and multiple hiPSC lines. Samples included experimental replicates ranging from undifferentiated hiPSCs to mixed populations of cells at D90 post-differentiation. As expected, differentiated cell populations clustered by time point, with differential expression analysis revealing markers of cardiomyocyte differentiation and maturation changing from D12 to D90. We next performed a complementary cluster-independent sparse regression analysis to identify and rank genes that best assigned cells to differentiation time points. The two highest ranked genes between D12 and D24 (MYH7 and MYH6) resulted in an accuracy of 0.84, and the three highest ranked genes between D24 and D90 (A2M, H19, IGF2) resulted in an accuracy of 0.94, revealing that low-dimensional gene features can identify differentiation or maturation stages in differentiating cardiomyocytes. Expression levels of select genes were validated using RNA FISH. Finally, we interrogated differences in differentiation population composition and cardiac gene expression resulting from two differentiation protocols, experimental replicates, and three hiPSC lines in the WTC-11 background to identify sources of variation across these experimental variables.


2019 ◽  
Author(s):  
Kodai Minoura ◽  
Ko Abe ◽  
Yuka Maeda ◽  
Hiroyoshi Nishikawa ◽  
Teppei Shimamura

Abstract
Motivation: Modern flow cytometry technology has enabled the simultaneous analysis of multiple cell markers at the single-cell level, and it is widely used in a broad field of research. The detection of cell populations in flow cytometry data has long been dependent on “manual gating” by visual inspection. Recently, numerous software tools have been developed for automatic, computationally guided detection of cell populations; however, they are not designed for time-series flow cytometry data. Time-series flow cytometry data are indispensable for investigating the dynamics of cell populations that could not be elucidated by static time-point analysis. Therefore, there is a great need for tools to systematically analyze time-series flow cytometry data.
Results: We propose a simple and efficient statistical framework, named CYBERTRACK (CYtometry-Based Estimation and Reasoning for TRACKing cell populations), to perform clustering and cell population tracking for time-series flow cytometry data. CYBERTRACK assumes that flow cytometry data are generated from a multivariate Gaussian mixture distribution with its mixture proportion at the current time dependent on that at a previous timepoint. Using simulation data, we evaluate the performance of CYBERTRACK when estimating parameters for a multivariate Gaussian mixture distribution, tracking time-dependent transitions of mixture proportions, and detecting change-points in the overall mixture proportion. The CYBERTRACK performance is validated using two real flow cytometry datasets, which demonstrate that the population dynamics detected by CYBERTRACK are consistent with our prior knowledge of lymphocyte behavior.
Conclusions: Our results indicate that CYBERTRACK offers a better understanding of time-dependent cell population dynamics to cytometry users by systematically analyzing time-series flow cytometry data.
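
The Gaussian mixture assumption stated in this abstract can be made concrete with a small expectation-maximization (EM) fit. The sketch below is an assumption-laden illustration, not CYBERTRACK itself: it fits a two-component, one-dimensional mixture at a single time point and does not model the dependence of mixture proportions on the previous time point. The function names are hypothetical.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of a univariate normal distribution."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def fit_gmm_1d(data, n_iter=100):
    """Fit a two-component 1D Gaussian mixture by EM.

    Returns (weights, means, sigmas). Initialization at the data extremes
    is an illustrative choice, not a recommendation.
    """
    lo, hi = min(data), max(data)
    spread = (hi - lo) / 4.0 or 1.0
    means = [lo, hi]
    sigmas = [spread, spread]
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: responsibility of each component for each data point.
        resp = []
        for x in data:
            p = [weights[k] * normal_pdf(x, means[k], sigmas[k]) for k in range(2)]
            total = p[0] + p[1]
            resp.append([0.5, 0.5] if total == 0.0 else [p[0] / total, p[1] / total])
        # M-step: re-estimate proportions, means and standard deviations.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk == 0.0:
                continue  # leave an empty component unchanged
            weights[k] = nk / len(data)
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            sigmas[k] = max(math.sqrt(var), 1e-6)
    return weights, means, sigmas
```

Fitting such a model separately at each time point and comparing the estimated `weights` gives a crude view of changing mixture proportions; coupling the proportions across time points, as the abstract describes, requires a more elaborate model.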


2017 ◽  
Vol 14 (136) ◽  
pp. 20170467 ◽  
Author(s):  
Philipp Thomas

Population growth is often ignored when quantifying gene expression levels across clonal cell populations. We develop a framework for obtaining the molecule number distributions in an exponentially growing cell population taking into account its age structure. In the presence of generation time variability, the average acquired across a population snapshot does not obey the average of a dividing cell over time, apparently contradicting ergodicity between single cells and the population. Instead, we show that the variation observed across snapshots with known cell age is captured by cell histories, a single-cell measure obtained from tracking an arbitrary cell of the population back to the ancestor from which it originated. The correspondence between cells of known age in a population with their histories represents an ergodic principle that provides a new interpretation of population snapshot data. We illustrate the principle using analytical solutions of stochastic gene expression models in cell populations with arbitrary generation time distributions. We further elucidate that the principle breaks down for biochemical reactions that are under selection, such as the expression of genes conveying antibiotic resistance, which gives rise to an experimental criterion with which to probe selection on gene expression fluctuations.
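
To illustrate what "age structure" means here, the following sketch evaluates the textbook age density of an exponentially growing population in the special case where every cell divides at exactly the same age. This is a simplifying assumption made for illustration: the paper treats arbitrary generation time distributions, which this sketch does not cover. The function names are hypothetical.

```python
import math


def age_density(age, generation_time):
    """Age density in a snapshot of an exponentially growing population.

    Assumes (for illustration only) that all cells divide at exactly
    `generation_time`, so the growth rate is ln(2) / generation_time and
    the density is 2 * rate * exp(-rate * age) on [0, generation_time].
    """
    if age < 0.0 or age > generation_time:
        return 0.0
    rate = math.log(2.0) / generation_time
    return 2.0 * rate * math.exp(-rate * age)


def integrate_midpoint(f, a, b, n=10000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))
```

Under this assumption newborn cells are twice as frequent as cells about to divide, which is one reason a population snapshot average can differ from the time average of a single dividing cell.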


2021 ◽  
Author(s):  
Joshua Burton ◽  
Cerys S Manning ◽  
Magnus Rattray ◽  
Nancy Papalopulu ◽  
Jochen Kursawe

Gene expression dynamics, such as stochastic oscillations and aperiodic fluctuations, have been associated with cell fate changes in multiple contexts, including development and cancer. Single cell live imaging of protein expression with endogenous reporters is widely used to observe such gene expression dynamics. However, the experimental investigation of regulatory mechanisms underlying the observed dynamics is challenging, since these mechanisms include complex interactions of multiple processes, including transcription, translation, and protein degradation. Here, we present a Bayesian method to infer kinetic parameters of oscillatory gene expression regulation using an auto-negative feedback motif with delay. Specifically, we use a delay-adapted nonlinear Kalman filter within a Metropolis-adjusted Langevin algorithm to identify posterior probability distributions. Our method can be applied to time series data on gene expression from single cells and is able to infer multiple parameters simultaneously. We apply it to published data on murine neural progenitor cells and show that it outperforms alternative methods. We further analyse how parameter uncertainty depends on the duration and time resolution of an imaging experiment, to make experimental design recommendations. This work demonstrates the utility of parameter inference on time course data from single cells and enables new studies on cell fate changes and population heterogeneity.


2021 ◽  
Vol 18 (182) ◽  
Author(s):  
Joshua Burton ◽  
Cerys S. Manning ◽  
Magnus Rattray ◽  
Nancy Papalopulu ◽  
Jochen Kursawe

Gene expression dynamics, such as stochastic oscillations and aperiodic fluctuations, have been associated with cell fate changes in multiple contexts, including development and cancer. Single cell live imaging of protein expression with endogenous reporters is widely used to observe such gene expression dynamics. However, the experimental investigation of regulatory mechanisms underlying the observed dynamics is challenging, since these mechanisms include complex interactions of multiple processes, including transcription, translation and protein degradation. Here, we present a Bayesian method to infer kinetic parameters of oscillatory gene expression regulation using an auto-negative feedback motif with delay. Specifically, we use a delay-adapted nonlinear Kalman filter within a Metropolis-adjusted Langevin algorithm to identify posterior probability distributions. Our method can be applied to time-series data on gene expression from single cells and is able to infer multiple parameters simultaneously. We apply it to published data on murine neural progenitor cells and show that it outperforms alternative methods. We further analyse how parameter uncertainty depends on the duration and time resolution of an imaging experiment, to make experimental design recommendations. This work demonstrates the utility of parameter inference on time course data from single cells and enables new studies on cell fate changes and population heterogeneity.
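
The Metropolis-adjusted Langevin algorithm (MALA) named in this abstract can be sketched on a toy one-dimensional target. This is a minimal sketch under stated assumptions, not the authors' method: in the paper the likelihood comes from a delay-adapted nonlinear Kalman filter, whereas here the target density and its gradient are supplied directly as hypothetical callables `log_density` and `grad_log_density`.

```python
import math
import random


def mala_sample(log_density, grad_log_density, x0, step, n_samples, seed=0):
    """Draw samples from a 1D target with MALA.

    Each proposal takes a gradient-informed drift step plus Gaussian noise
    and is then accepted or rejected with a Metropolis-Hastings correction.
    Returns (samples, acceptance_rate).
    """
    rng = random.Random(seed)

    def log_q(to, frm):
        # Log proposal density q(to | frm), up to a constant that cancels.
        mean = frm + 0.5 * step ** 2 * grad_log_density(frm)
        return -((to - mean) ** 2) / (2.0 * step ** 2)

    x = x0
    samples, accepted = [], 0
    for _ in range(n_samples):
        proposal = x + 0.5 * step ** 2 * grad_log_density(x) + step * rng.gauss(0.0, 1.0)
        log_alpha = (log_density(proposal) + log_q(x, proposal)) - (
            log_density(x) + log_q(proposal, x)
        )
        if rng.random() < math.exp(min(0.0, log_alpha)):
            x = proposal
            accepted += 1
        samples.append(x)
    return samples, accepted / n_samples
```

For example, with `log_density = lambda x: -0.5 * (x - 3.0) ** 2` and `grad_log_density = lambda x: -(x - 3.0)`, the samples should concentrate around 3 with roughly unit variance once the initial burn-in is discarded.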

