Metabolic Modeling Combined With Machine Learning Integrates Longitudinal Data and Identifies the Origin of LXR-Induced Hepatic Steatosis

Author(s):  
Natal A. W. van Riel ◽  
Christian A. Tiemann ◽  
Peter A. J. Hilbers ◽  
Albert K. Groen

Temporal multi-omics data can provide information about the dynamics of disease development and therapeutic response. However, statistical analysis of high-dimensional time-series data is challenging. Here we develop a novel approach to model temporal metabolomic and transcriptomic data by combining machine learning with metabolic models. ADAPT (Analysis of Dynamic Adaptations in Parameter Trajectories) performs metabolic trajectory modeling by introducing time-dependent parameters in differential equation models of metabolic systems. ADAPT translates structural uncertainty in the model, such as missing information about regulation, into a parameter estimation problem that is solved by iterative learning. We have now extended ADAPT to include both metabolic and transcriptomic time-series data by introducing a regularization function in the learning algorithm. The ADAPT learning algorithm was reformulated as a multi-objective optimization problem in which the estimation of trajectories of metabolic parameters is constrained by the metabolite data and refined by gene expression data. ADAPT was applied to a model of hepatic lipid and plasma lipoprotein metabolism to predict metabolic adaptations induced by pharmacological treatment of mice with a Liver X receptor (LXR) agonist. We investigated the excessive accumulation of triglycerides (TG) in the liver that results in the development of hepatic steatosis. ADAPT predicted that 80% of the hepatic TG accumulation after LXR activation originates from an increased influx of free fatty acids. The model also correctly estimated that TG was stored in the cytosol rather than transferred to nascent very-low-density lipoproteins. Through model-based integration of temporal metabolic and gene expression data we discovered that increased free fatty acid influx, rather than de novo lipogenesis, is the main driver of LXR-induced hepatic steatosis.
This study illustrates how ADAPT provides estimates for biomedically important parameters that cannot be measured directly, explaining (side-)effects of pharmacological treatment with LXR agonists.
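ADAPT's core idea, introducing a time-dependent parameter trajectory and regularizing its changes during estimation, can be illustrated with a toy one-metabolite model dx/dt = u(t) - k*x. This is a minimal sketch with made-up numbers, not the ADAPT implementation (which handles full metabolic networks and multi-objective refinement by gene expression data):

```python
import numpy as np

# Simulate a "treatment" that doubles the influx parameter u at t = 5,
# then recover the trajectory u(t) from noisy metabolite measurements
# by regularized least squares (penalizing rapid parameter changes).
rng = np.random.default_rng(0)
k, dt, n = 0.5, 0.05, 200
t = np.arange(n) * dt

u_true = np.where(t < 5, 1.0, 2.0)
x = np.empty(n)
x[0] = u_true[0] / k                               # start at steady state
for i in range(n - 1):
    x[i + 1] = x[i] + dt * (u_true[i] - k * x[i])  # forward Euler
x_obs = x + rng.normal(0, 0.02, n)                 # noisy observations

# Pointwise estimate of u from the model equation, then smoothing:
# minimize ||u - y||^2 + lam * ||D u||^2  (D = first differences)
y = np.diff(x_obs) / dt + k * x_obs[:-1]
m = len(y)
D = np.eye(m - 1, m, k=1) - np.eye(m - 1, m)
lam = 100.0
u_hat = np.linalg.solve(np.eye(m) + lam * D.T @ D, y)
```

The recovered trajectory steps from roughly 1 to roughly 2, mirroring how ADAPT exposes gradual or abrupt parameter adaptations that cannot be measured directly.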

PLoS ONE ◽  
2020 ◽  
Vol 15 (11) ◽  
pp. e0241686
Author(s):  
Nafis Irtiza Tripto ◽  
Mohimenul Kabir ◽  
Md. Shamsuzzoha Bayzid ◽  
Atif Rahman

Time series gene expression data are widely used to study different dynamic biological processes. Although gene expression datasets share many of the characteristics of time series data from other domains, most analyses in this field do not fully leverage the time-ordered nature of the data and focus on clustering the genes based on their expression values. Other domains, such as financial stock and weather prediction, utilize time series data for forecasting purposes. Moreover, many studies have been conducted to classify generic time series data based on trend, seasonality, and other patterns. Therefore, an assessment of these approaches on gene expression data would be of great interest to evaluate their adequacy in this domain. Here, we perform a comprehensive evaluation of traditional unsupervised and supervised machine learning approaches as well as deep learning-based techniques for time series gene expression classification and forecasting on five real datasets. In addition, we propose deep learning-based methods for both classification and forecasting, and compare their performance with the state-of-the-art methods. We find that deep learning-based methods generally outperform traditional approaches for time series classification. Experiments also suggest that supervised classification on gene expression is more effective than clustering when labels are available. In time series gene expression forecasting, we observe that an autoregressive statistical approach has the best performance for short-term forecasting, whereas deep learning-based methods are better suited for long-term forecasting.
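The autoregressive baseline that the abstract finds strong for short-term forecasting can be sketched as an AR(p) model fitted by least squares. The synthetic series and all parameters below are illustrative, not the paper's data or method:

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_ar(x, p):
    """Fit x[t] ~ c + sum_i a_i * x[t-i] by ordinary least squares."""
    n = len(x)
    X = np.column_stack([x[p - 1 - i:n - 1 - i] for i in range(p)])
    X = np.column_stack([X, np.ones(n - p)])
    beta, *_ = np.linalg.lstsq(X, x[p:], rcond=None)
    return beta[:p], beta[p]

def forecast(x, coef, intercept, steps):
    """Roll the fitted model forward, feeding predictions back in."""
    hist = list(x)
    out = []
    for _ in range(steps):
        nxt = intercept + sum(c * hist[-1 - i] for i, c in enumerate(coef))
        hist.append(nxt)
        out.append(nxt)
    return np.array(out)

# Synthetic AR(2) "expression profile"
x = np.zeros(500)
for t in range(2, 500):
    x[t] = 0.6 * x[t - 1] - 0.2 * x[t - 2] + rng.normal(0, 0.1)

coef, c0 = fit_ar(x, 2)
pred = forecast(x, coef, c0, 5)
```

Least squares recovers the generating coefficients (0.6, -0.2) closely, which is why such simple statistical models remain competitive for short horizons.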


2017 ◽  
Author(s):  
Anthony Szedlak ◽  
Spencer Sims ◽  
Nicholas Smith ◽  
Giovanni Paternostro ◽  
Carlo Piermarocchi

Abstract
Modern time series gene expression and other omics data sets have enabled unprecedented resolution of the dynamics of cellular processes such as cell cycle and response to pharmaceutical compounds. In anticipation of the proliferation of time series data sets in the near future, we use the Hopfield model, a recurrent neural network based on spin glasses, to model the dynamics of cell cycle in HeLa (human cervical cancer) and S. cerevisiae cells. We study some of the rich dynamical properties of these cyclic Hopfield systems, including the ability of populations of simulated cells to recreate experimental expression data and the effects of noise on the dynamics. Next, we use a genetic algorithm to identify sets of genes which, when selectively inhibited by local external fields representing gene silencing compounds such as kinase inhibitors, disrupt the encoded cell cycle. We find, for example, that inhibiting the set of four kinases BRD4, MAPK1, NEK7, and YES1 in HeLa cells causes simulated cells to accumulate in the M phase. Finally, we suggest possible improvements and extensions to our model.

Author Summary
Cell cycle – the process in which a parent cell replicates its DNA and divides into two daughter cells – is an upregulated process in many forms of cancer. Identifying gene inhibition targets to regulate cell cycle is important to the development of effective therapies. Although modern high throughput techniques offer unprecedented resolution of the molecular details of biological processes like cell cycle, analyzing the vast quantities of the resulting experimental data and extracting actionable information remains a formidable task. Here, we create a dynamical model of the process of cell cycle using the Hopfield model (a type of recurrent neural network) and gene expression data from human cervical cancer cells and yeast cells. We find that the model recreates the oscillations observed in experimental data. Tuning the level of noise (representing the inherent randomness in gene expression and regulation) to the “edge of chaos” is crucial for the proper behavior of the system. We then use this model to identify potential gene targets for disrupting the process of cell cycle. This method could be applied to other time series data sets and used to predict the effects of untested targeted perturbations.
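A cyclic Hopfield network can be sketched with an asymmetric Hebbian rule that maps each stored state to its successor, so the dynamics step through the sequence rather than settling on a fixed point. The binarized random states and sizes below are illustrative stand-ins for the paper's expression-derived states:

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 64, 3
patterns = rng.choice([-1, 1], size=(K, N))    # K binarized "cell states"

# Asymmetric Hebbian weights encode the cycle p0 -> p1 -> p2 -> p0
W = sum(np.outer(patterns[(k + 1) % K], patterns[k]) for k in range(K)) / N

s = patterns[0].copy()
overlaps = []
for k in range(1, K + 1):
    s = np.where(W @ s >= 0, 1, -1)            # synchronous sign update
    overlaps.append(s @ patterns[k % K] / N)   # agreement with expected state
```

Each update carries the network to the next stored state (overlap near 1), which is the noiseless skeleton of the cyclic attractor; the paper additionally studies stochastic updates and external inhibitory fields.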


2021 ◽  
Author(s):  
Jingyi Zhang ◽  
Farhan Ibrahim ◽  
Doaa Altarawy ◽  
Lenwood S Heath ◽  
Sarah Tulin

Abstract
Background
Gene regulatory network (GRN) inference can now take advantage of powerful machine learning algorithms to predict the entire landscape of gene-to-gene interactions with the potential to complement traditional experimental methods in building gene networks. However, the dynamical nature of embryonic development -- representing the time-dependent interactions between thousands of transcription factors, signaling molecules, and effector genes -- is one of the most challenging arenas for GRN prediction.
Results
In this work, we show that successful GRN predictions for developmental systems from gene expression data alone can be obtained with the Priors Enriched Absent Knowledge (PEAK) network inference algorithm. PEAK is a noise-robust method that models gene expression dynamics via ordinary differential equations and selects the best network based on information-theoretic criteria coupled with the elastic net machine learning algorithm. We test our GRN prediction methodology using two gene expression data sets for the purple sea urchin (S. purpuratus) and cross-check our results against existing GRN models that have been constructed and validated by over 30 years of experimental results. Our method identified known gene interactions in the network with remarkably high sensitivity (maximum 76.32%). We also generated 838 novel predictions for interactions that have not yet been described, which provide a resource for researchers to use to further complete the sea urchin GRN.
Conclusions
GRN predictions that match known gene interactions can be produced using gene expression data alone from developmental time series experiments.
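The sparse-regression step can be sketched with a generic elastic-net coordinate descent that proposes regulators for one target gene from expression data. This is not the PEAK implementation (which adds ODE modeling and information-theoretic model selection); the objective here is 0.5*||y - Xw||^2 + a1*||w||_1 + 0.5*a2*||w||^2, and all data are synthetic:

```python
import numpy as np

rng = np.random.default_rng(0)

def elastic_net(X, y, a1=5.0, a2=1.0, sweeps=200):
    """Coordinate descent with soft-thresholding (L1) and shrinkage (L2)."""
    p = X.shape[1]
    w = np.zeros(p)
    denom = (X ** 2).sum(axis=0) + a2
    for _ in range(sweeps):
        for j in range(p):
            r = y - X @ w + X[:, j] * w[j]       # residual excluding gene j
            rho = X[:, j] @ r
            w[j] = np.sign(rho) * max(abs(rho) - a1, 0.0) / denom[j]
    return w

# Synthetic target: its rate of change depends on 2 of 8 candidate regulators
X = rng.normal(size=(100, 8))                    # regulator expression levels
w_true = np.array([1.5, -2.0, 0, 0, 0, 0, 0, 0])
y = X @ w_true + rng.normal(0, 0.1, 100)         # e.g. dx/dt of the target
w_hat = elastic_net(X, y)
```

The L1 term zeroes out the six non-regulators while keeping the two true edges, which is the property that makes elastic net attractive for proposing GRN edges.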


2018 ◽  
Author(s):  
César Capinha

Abstract
Spatiotemporal forecasts of ecological phenomena are highly useful in scientific and socio-economic applications. Nevertheless, the development of correlative models to make such forecasts is often hindered by the limited availability of ecological time-series data. In contrast, considerable amounts of temporally discrete biological records are being stored in public databases, often including the site and date of observation. While these data are reasonably suitable for the development of spatiotemporal forecast models, this possibility remains largely untested.

In this paper, we test an approach to develop spatiotemporal forecasts based on the dates and locations found in species occurrence records. This approach draws on 'time-series classification', a field of machine learning, and involves applying a machine-learning algorithm to distinguish between time series representing the environmental conditions that precede the occurrence records and time series representing other environmental conditions, such as those that generally occur in the sites of the records. We employed this framework to predict the timing of emergence of fruiting bodies of two mushroom species (Boletus edulis and Macrolepiota procera) in countries of Europe from 2009 to 2015, and compared the predictions with those from a 'null' model based on the calendar dates of the records.

Forecasts made with the environment-based approach were consistently superior to those drawn from the date-based approach, averaging an area under the receiver operating characteristic curve (AUC) of 0.9 for B. edulis and 0.88 for M. procera, compared to an average AUC of 0.83 achieved by the null models for both species. Prediction errors were distributed across the study area and along the years, lending support to the spatiotemporal representativeness of the accuracy values measured. Our approach, based on species occurrence records, was able to provide useful forecasts of the timing of emergence of two mushroom species across Europe. Given the increasing availability and information content of this type of record, particularly those supplemented with photographs, the range of events that could be forecast in this way is vast.
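The framing can be sketched as follows: summarize the environmental window preceding each record with simple features, train a classifier against background windows, and score with AUC. The pre-event trend, window length, and features below are made up for illustration; the paper uses richer time-series classifiers and real environmental data:

```python
import numpy as np

rng = np.random.default_rng(1)

def window(event):
    """30 days of one environmental variable; events follow a made-up trend."""
    w = rng.normal(0, 1, 30)
    return w + 0.1 * np.arange(30) if event else w

def features(w):
    t = np.arange(len(w))
    return np.array([w.mean(), np.polyfit(t, w, 1)[0]])   # mean and slope

X = np.array([features(window(i < 100)) for i in range(200)])
y = np.array([1] * 100 + [0] * 100)

# Logistic regression by gradient descent
w, b = np.zeros(2), 0.0
for _ in range(500):
    p = 1 / (1 + np.exp(-(X @ w + b)))
    g = p - y
    w -= 0.1 * X.T @ g / 200
    b -= 0.1 * g.mean()

# AUC from rank statistics of the classifier scores
ranks = (X @ w + b).argsort().argsort() + 1
auc = (ranks[y == 1].sum() - 100 * 101 / 2) / (100 * 100)
```

Because the classes are separated by conditions preceding the event rather than by calendar date, the classifier generalizes to years with shifted phenology, which is the advantage the abstract reports over the date-based null model.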


2015 ◽  
Author(s):  
Aman Gupta ◽  
Haohan Wang ◽  
Madhavi Ganapathiraju

Genes play a central role in all biological processes. DNA microarray technology has made it possible to study the expression behavior of thousands of genes at once. Often, gene expression data are used to generate features for supervised and unsupervised learning tasks. At the same time, advances in the field of deep learning have made available a plethora of architectures. In this paper, we use deep architectures pre-trained in an unsupervised manner using denoising autoencoders as a preprocessing step for a popular unsupervised learning task. Denoising autoencoders (DA) can be used to learn a compact representation of the input and have been used to generate features for further supervised learning tasks. We propose that our deep architectures can be treated as empirical versions of Deep Belief Networks (DBNs). We use our deep architectures to regenerate gene expression time series data for two different datasets. We test our hypothesis on these two popular datasets for the unsupervised learning task of clustering and find promising improvements in performance.
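The denoising-autoencoder idea can be sketched with a tiny single-hidden-layer network trained by hand-written gradients: corrupt the input, reconstruct the clean profile. The architecture, sizes, and low-rank synthetic "expression" data are illustrative stand-ins, not the paper's stacked pre-trained networks:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, h, lr = 200, 20, 8, 0.05
Z = rng.normal(size=(n, 3)) @ rng.normal(size=(3, d))   # low-rank profiles

def sigmoid(a):
    return 1 / (1 + np.exp(-a))

W1, b1 = rng.normal(0, 0.1, (d, h)), np.zeros(h)
W2, b2 = rng.normal(0, 0.1, (h, d)), np.zeros(d)

def recon_mse():
    """Reconstruction error on clean inputs."""
    return ((sigmoid(Z @ W1 + b1) @ W2 + b2 - Z) ** 2).mean()

mse_before = recon_mse()
for _ in range(500):
    Xn = Z + rng.normal(0, 0.3, Z.shape)     # corrupt the input each epoch
    H = sigmoid(Xn @ W1 + b1)
    err = H @ W2 + b2 - Z                    # target is the CLEAN data
    gW2, gb2 = H.T @ err / n, err.mean(0)
    gH = err @ W2.T * H * (1 - H)            # backprop through the sigmoid
    gW1, gb1 = Xn.T @ gH / n, gH.mean(0)
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2
mse_after = recon_mse()
```

Training against clean targets forces the hidden layer to encode the underlying structure rather than the noise; the hidden activations H are then the compact features used for clustering.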


2021 ◽  
Vol 13 (1) ◽  
pp. 35-44
Author(s):  
Daniel Vajda ◽  
Adrian Pekar ◽  
Karoly Farkas

The complexity of network infrastructures is growing exponentially. Real-time monitoring of these infrastructures is essential to secure their reliable operation. The concept of telemetry has been introduced in recent years to foster this process by streaming time-series data that contain feature-rich information concerning the state of network components. In this paper, we focus on a particular application of telemetry: anomaly detection on time-series data. We rigorously examined state-of-the-art anomaly detection methods and observed that none of them suits our requirements, as they typically face several limitations when applied to time-series data. This paper presents Alter-Re2, an improved version of ReRe, a state-of-the-art Long Short-Term Memory-based machine learning algorithm. Through a systematic examination, we demonstrate that by introducing the concepts of ageing and sliding window, the major limitations of ReRe can be overcome. We assessed the efficacy of Alter-Re2 using ten different datasets and achieved promising results: Alter-Re2 performs three times better on average when compared to ReRe.
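The sliding-window idea can be illustrated with a much simpler detector than ReRe/Alter-Re2: score each point only against the statistics of the most recent window, so stale history "ages out" and the detector tracks drifting baselines. All thresholds and data below are made up:

```python
import numpy as np

def detect(x, win=50, thresh=4.0):
    """Flag points that deviate strongly from the recent window's statistics."""
    flags = np.zeros(len(x), dtype=bool)
    for i in range(win, len(x)):
        w = x[i - win:i]                     # only recent history is used
        mu, sd = w.mean(), w.std() + 1e-9
        flags[i] = abs(x[i] - mu) > thresh * sd
    return flags

rng = np.random.default_rng(2)
x = np.linspace(0, 1, 500) + rng.normal(0, 0.1, 500)   # slow drift + noise
x[300] += 2.0                                          # injected anomaly
flags = detect(x)
```

A fixed global threshold would either miss the spike or misfire on the drift; windowing is the generic mechanism that Alter-Re2's ageing refines for LSTM-based prediction errors.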


Author(s):  
Ching Wei Wang

One of the most active areas of research in supervised machine learning has been the study of methods for constructing good ensembles of classifiers. The main discovery is that an ensemble classifier often performs much better than the single classifiers that compose it. Recent studies (Dettling, 2004; Tan & Gilbert, 2003) have confirmed the utility of ensemble machine learning algorithms for gene expression analysis. The motivation of this work is to investigate a suitable machine learning algorithm for classification and prediction on gene expression data. The research starts by analyzing the behavior and weaknesses of three popular ensemble machine learning methods (Bagging, Boosting, and Arcing), followed by the presentation of a new ensemble machine learning algorithm. The proposed method is evaluated against the existing ensemble machine learning algorithms on 12 gene expression datasets (Alon et al., 1999; Armstrong et al., 2002; Ash et al., 2000; Catherine et al., 2003; Dinesh et al., 2002; Gavin et al., 2002; Golub et al., 1999; Scott et al., 2002; van ’t Veer et al., 2002; Yeoh et al., 2002; Zembutsu et al., 2002). The experimental results show that the proposed algorithm greatly outperforms existing methods, achieving high classification accuracy. The outline of this chapter is as follows: the ensemble machine learning approach and three popular ensembles (i.e., Bagging, Boosting, and Arcing) are introduced in the Background section; the analyses of existing ensembles, details of the proposed algorithm, and experimental results are presented in the Method section, followed by a discussion of future trends and the conclusion.
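Bagging, the first of the three ensembles discussed, can be sketched in a few lines: bootstrap-resample the training set, fit a weak learner (here a decision stump) to each replicate, and classify by majority vote. The toy two-class "expression" data, with a mean shift on one gene, are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)

def fit_stump(X, y):
    """Exhaustively pick the (feature, threshold, polarity) with lowest error."""
    best, best_err = (0, 0.0, 1), 1.0
    for j in range(X.shape[1]):
        for t in X[:, j]:
            for pol in (1, -1):
                pred = (pol * (X[:, j] - t) > 0).astype(int)
                err = (pred != y).mean()
                if err < best_err:
                    best, best_err = (j, t, pol), err
    return best

def predict(stump, X):
    j, t, pol = stump
    return (pol * (X[:, j] - t) > 0).astype(int)

n = 200
X = rng.normal(size=(n, 5))                       # 5 "genes", 200 samples
y = (rng.random(n) < 0.5).astype(int)
X[y == 1, 0] += 1.5                               # class signal on gene 0

stumps = []
for _ in range(25):                               # 25 bootstrap replicates
    idx = rng.integers(0, n, n)                   # sample with replacement
    stumps.append(fit_stump(X[idx], y[idx]))

votes = np.mean([predict(s, X) for s in stumps], axis=0)
bag_acc = ((votes > 0.5).astype(int) == y).mean()
```

Averaging over bootstrap replicates reduces the variance of the unstable base learner, which is the effect Boosting and Arcing then refine by reweighting hard examples.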

