Learning structure in gene expression data using deep architectures, with an application to gene clustering

Genes play a central role in all biological processes. DNA microarray technology has made it possible to study the expression behavior of thousands of genes in one go. Often, gene expression data is used to generate features for supervised and unsupervised learning tasks. At the same time, advances in the field of deep learning have made available a plethora of architectures. In this paper, we use deep architectures pre-trained in an unsupervised manner using denoising autoencoders as a preprocessing step for a popular unsupervised learning task. Denoising autoencoders (DA) can be used to learn a compact representation of input, and have been used to generate features for further supervised learning tasks. We propose that our deep architectures can be treated as empirical versions of Deep Belief Networks (DBNs). We use our deep architectures to regenerate gene expression time series data for two different data sets. We test our hypothesis on two popular datasets for the unsupervised learning task of clustering and find promising improvements in performance.
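The denoising-autoencoder idea in the abstract above can be illustrated with a minimal NumPy sketch. This is not the authors' architecture: the single hidden layer, tied weights, masking noise, linear decoder, learning rate, and synthetic sine/cosine "expression profiles" are all assumptions chosen to keep the example small. The learned hidden codes could then be passed to any clustering algorithm.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_denoising_autoencoder(X, n_hidden=4, corruption=0.3,
                                lr=0.1, epochs=300, seed=0):
    """Single-layer denoising autoencoder with tied weights (illustrative only)."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    W = rng.normal(scale=0.1, size=(n_features, n_hidden))
    b_h = np.zeros(n_hidden)
    b_o = np.zeros(n_features)
    for _ in range(epochs):
        # Masking noise: randomly zero out a fraction of the inputs.
        mask = rng.random(X.shape) > corruption
        X_noisy = X * mask
        H = sigmoid(X_noisy @ W + b_h)      # encoder
        X_hat = H @ W.T + b_o               # linear decoder, tied weights
        err = X_hat - X                     # compare against the CLEAN input
        dZ = (err @ W) * H * (1.0 - H)      # backprop through the sigmoid
        grad_W = (X_noisy.T @ dZ + err.T @ H) / n_samples
        W -= lr * grad_W
        b_h -= lr * dZ.mean(axis=0)
        b_o -= lr * err.mean(axis=0)
    return W, b_h, b_o

def encode(X, W, b_h):
    return sigmoid(X @ W + b_h)

def reconstruction_error(X, W, b_h, b_o):
    return float(np.mean((encode(X, W, b_h) @ W.T + b_o - X) ** 2))

# Synthetic data: 40 "genes" x 8 time points, two underlying profiles.
rng = np.random.default_rng(1)
t = np.linspace(0, 2 * np.pi, 8)
X_demo = np.vstack([np.sin(t) + 0.1 * rng.normal(size=(20, 8)),
                    np.cos(t) + 0.1 * rng.normal(size=(20, 8))])

params_untrained = train_denoising_autoencoder(X_demo, epochs=0)
params_trained = train_denoising_autoencoder(X_demo, epochs=300)
err_before = reconstruction_error(X_demo, *params_untrained)
err_after = reconstruction_error(X_demo, *params_trained)
codes = encode(X_demo, params_trained[0], params_trained[1])  # compact features
```

Stacking several such layers, each trained on the codes of the previous one, is the usual route to a deeper pre-trained architecture; that step is omitted here.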

Evaluation of classification and forecasting methods on time series gene expression data

PLoS ONE ◽

10.1371/journal.pone.0241686 ◽

2020 ◽

Vol 15 (11) ◽

pp. e0241686

Author(s):

Nafis Irtiza Tripto ◽

Mohimenul Kabir ◽

Md. Shamsuzzoha Bayzid ◽

Atif Rahman

Keyword(s):

Gene Expression ◽

Time Series ◽

Deep Learning ◽

Gene Expression Data ◽

Time Series Data ◽

Weather Prediction ◽

Supervised Machine Learning ◽

Series Data ◽

Expression Data ◽

Time Series Gene Expression

Time series gene expression data is widely used to study different dynamic biological processes. Although gene expression datasets share many of the characteristics of time series data from other domains, most of the analyses in this field do not fully leverage the time-ordered nature of the data and focus on clustering the genes based on their expression values. Other domains, such as financial stock and weather prediction, utilize time series data for forecasting purposes. Moreover, many studies have been conducted to classify generic time series data based on trend, seasonality, and other patterns. Therefore, an assessment of these approaches on gene expression data would be of great interest to evaluate their adequacy in this domain. Here, we perform a comprehensive evaluation of different traditional unsupervised and supervised machine learning approaches as well as deep learning based techniques for time series gene expression classification and forecasting on five real datasets. In addition, we propose deep learning based methods for both classification and forecasting, and compare their performances with the state-of-the-art methods. We find that deep learning based methods generally outperform traditional approaches for time series classification. Experiments also suggest that supervised classification on gene expression is more effective than clustering when labels are available. In time series gene expression forecasting, we observe that an autoregressive statistical approach has the best performance for short term forecasting, whereas deep learning based methods are better suited for long term forecasting.
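As a rough illustration of short-term autoregressive forecasting of an expression profile, the sketch below fits a plain AR(p) model by ordinary least squares and rolls it forward. The abstract does not say which autoregressive method was evaluated, so the model form, the function names `fit_ar` and `forecast_ar`, and the toy series are assumptions.

```python
import numpy as np

def fit_ar(series, order):
    """Least-squares fit of y[t] = c + a1*y[t-1] + ... + ap*y[t-p]."""
    y = np.asarray(series, dtype=float)
    # Each design row is [1, y[t-1], ..., y[t-order]].
    rows = [np.concatenate(([1.0], y[t - order:t][::-1]))
            for t in range(order, len(y))]
    coef, *_ = np.linalg.lstsq(np.array(rows), y[order:], rcond=None)
    return coef  # coef[0] is the intercept, coef[1:] the lag weights

def forecast_ar(series, coef, steps):
    """Iterated one-step-ahead forecasts, feeding predictions back in."""
    order = len(coef) - 1
    history = list(np.asarray(series, dtype=float))
    forecasts = []
    for _ in range(steps):
        lags = history[-order:][::-1]          # most recent value first
        nxt = coef[0] + float(np.dot(coef[1:], lags))
        forecasts.append(nxt)
        history.append(nxt)
    return np.array(forecasts)

# Toy noise-free series following x[t] = 1 + 0.5 * x[t-1], starting at 0.
series_demo = [0.0]
for _ in range(9):
    series_demo.append(1.0 + 0.5 * series_demo[-1])

coef_demo = fit_ar(series_demo, order=1)
forecast_demo = forecast_ar(series_demo, coef_demo, steps=2)
```

Because each forecast is fed back as an input, errors compound with the horizon, which is consistent with the abstract's observation that such models suit short-term rather than long-term forecasting.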

Metabolic Modeling Combined With Machine Learning Integrates Longitudinal Data and Identifies the Origin of LXR-Induced Hepatic Steatosis

Frontiers in Bioengineering and Biotechnology ◽

10.3389/fbioe.2020.536957 ◽

2021 ◽

Vol 8 ◽

Author(s):

Natal A. W. van Riel ◽

Christian A. Tiemann ◽

Peter A. J. Hilbers ◽

Albert K. Groen

Keyword(s):

Gene Expression ◽

Machine Learning ◽

Time Series ◽

Hepatic Steatosis ◽

Gene Expression Data ◽

Pharmacological Treatment ◽

Time Series Data ◽

Learning Algorithm ◽

Series Data ◽

Expression Data

Temporal multi-omics data can provide information about the dynamics of disease development and therapeutic response. However, statistical analysis of high-dimensional time-series data is challenging. Here we develop a novel approach to model temporal metabolomic and transcriptomic data by combining machine learning with metabolic models. ADAPT (Analysis of Dynamic Adaptations in Parameter Trajectories) performs metabolic trajectory modeling by introducing time-dependent parameters in differential equation models of metabolic systems. ADAPT translates structural uncertainty in the model, such as missing information about regulation, into a parameter estimation problem that is solved by iterative learning. We have now extended ADAPT to include both metabolic and transcriptomic time-series data by introducing a regularization function in the learning algorithm. The ADAPT learning algorithm was (re)formulated as a multi-objective optimization problem in which the estimation of trajectories of metabolic parameters is constrained by the metabolite data and refined by gene expression data. ADAPT was applied to a model of hepatic lipid and plasma lipoprotein metabolism to predict metabolic adaptations that are induced upon pharmacological treatment of mice with a liver X receptor (LXR) agonist. We investigated the excessive accumulation of triglycerides (TG) in the liver resulting in the development of hepatic steatosis. ADAPT predicted that 80% of the hepatic TG accumulation after LXR activation originates from an increased influx of free fatty acids. The model also correctly estimated that TG was stored in the cytosol rather than transferred to nascent very-low-density lipoproteins. Through model-based integration of temporal metabolic and gene expression data we discovered that increased free fatty acid influx, rather than de novo lipogenesis, is the main driver of LXR-induced hepatic steatosis. This study illustrates how ADAPT provides estimates for biomedically important parameters that cannot be measured directly, explaining (side-)effects of pharmacological treatment with LXR agonists.

An Approach to reduce the large feature space of Microarray Gene Expression Data by gene clustering for efficient sample classification

International Journal of Computer Application ◽

10.26808/rs.ca.i8v3.01 ◽

2018 ◽

Vol 3 (8) ◽

Author(s):

Sheela T ◽

Lalitha Rangarajan

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Feature Space ◽

Microarray Gene Expression Data ◽

Gene Clustering ◽

Expression Data ◽

Microarray Gene Expression ◽

Sample Classification ◽

Microarray Gene

Cell cycle time series gene expression data encoded as cyclic attractors in Hopfield systems

10.1101/170027 ◽

2017 ◽

Author(s):

Anthony Szedlak ◽

Spencer Sims ◽

Nicholas Smith ◽

Giovanni Paternostro ◽

Carlo Piermarocchi

Keyword(s):

Neural Network ◽

Gene Expression ◽

Cell Cycle ◽

Time Series ◽

Time Series Data ◽

Series Data ◽

Data Sets ◽

Expression Data ◽

Time Series Gene Expression ◽

Human Cervical Cancer

Modern time series gene expression and other omics data sets have enabled unprecedented resolution of the dynamics of cellular processes such as cell cycle and response to pharmaceutical compounds. In anticipation of the proliferation of time series data sets in the near future, we use the Hopfield model, a recurrent neural network based on spin glasses, to model the dynamics of cell cycle in HeLa (human cervical cancer) and S. cerevisiae cells. We study some of the rich dynamical properties of these cyclic Hopfield systems, including the ability of populations of simulated cells to recreate experimental expression data and the effects of noise on the dynamics. Next, we use a genetic algorithm to identify sets of genes which, when selectively inhibited by local external fields representing gene silencing compounds such as kinase inhibitors, disrupt the encoded cell cycle. We find, for example, that inhibiting the set of four kinases BRD4, MAPK1, NEK7, and YES1 in HeLa cells causes simulated cells to accumulate in the M phase. Finally, we suggest possible improvements and extensions to our model.

Author Summary: Cell cycle, the process in which a parent cell replicates its DNA and divides into two daughter cells, is an upregulated process in many forms of cancer. Identifying gene inhibition targets to regulate cell cycle is important to the development of effective therapies. Although modern high throughput techniques offer unprecedented resolution of the molecular details of biological processes like cell cycle, analyzing the vast quantities of the resulting experimental data and extracting actionable information remains a formidable task. Here, we create a dynamical model of the process of cell cycle using the Hopfield model (a type of recurrent neural network) and gene expression data from human cervical cancer cells and yeast cells. We find that the model recreates the oscillations observed in experimental data. Tuning the level of noise (representing the inherent randomness in gene expression and regulation) to the “edge of chaos” is crucial for the proper behavior of the system. We then use this model to identify potential gene targets for disrupting the process of cell cycle. This method could be applied to other time series data sets and used to predict the effects of untested targeted perturbations.
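One standard way to store a cycle rather than fixed points in a Hopfield-type network is an asymmetric coupling matrix that maps each stored pattern onto its successor. The sketch below shows that construction under strong simplifying assumptions that are not taken from the paper: binary ±1 "expression states", mutually orthogonal patterns (rows of a Hadamard matrix), synchronous updates, and no noise or external fields.

```python
import numpy as np

def hadamard(n):
    """Sylvester construction; n must be a power of two."""
    H = np.array([[1]])
    while H.shape[0] < n:
        H = np.kron(H, np.array([[1, 1], [1, -1]]))
    return H

def cyclic_couplings(patterns):
    """W = (1/N) * sum_mu xi^(mu+1) (xi^mu)^T, with the last pattern wrapping to the first."""
    n_patterns, n_units = patterns.shape
    successors = np.roll(patterns, -1, axis=0)   # row mu holds pattern mu+1
    return successors.T @ patterns / n_units

def step(W, state):
    """Synchronous zero-noise update: every unit takes the sign of its input."""
    return np.where(W @ state >= 0, 1, -1)

# Four orthogonal 8-unit patterns standing in for successive cell cycle phases.
patterns_demo = hadamard(8)[1:5]
W_demo = cyclic_couplings(patterns_demo)

state = patterns_demo[0]
trajectory_demo = []
for _ in range(4):
    state = step(W_demo, state)
    trajectory_demo.append(state)
trajectory_demo = np.array(trajectory_demo)
```

With orthogonal patterns, applying `W_demo` to pattern `mu` returns pattern `mu+1` exactly, so the state walks around the stored cycle and returns to its starting point after four steps. Real expression patterns are correlated and noisy, so a faithful model needs more than this.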

Extracting compact representation of knowledge from gene expression data for protein-protein interaction

International Journal of Data Mining and Bioinformatics ◽

10.1504/ijdmb.2017.10006873 ◽

2017 ◽

Vol 17 (4) ◽

pp. 279

Author(s):

Haohan Wang ◽

Ming Xu ◽

Aman Gupta

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Protein Interaction ◽

Compact Representation ◽

Expression Data ◽

Protein Protein Interaction ◽

Representation Of Knowledge

Learning structure in gene expression data using deep architectures, with an application to gene clustering

2015 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) ◽

10.1109/bibm.2015.7359871 ◽

2015 ◽

Cited By ~ 18

Author(s):

Aman Gupta ◽

Haohan Wang ◽

Madhavi Ganapathiraju

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Gene Clustering ◽

Expression Data

A Joint Optimization Framework Integrated with Biological Knowledge for Clustering Incomplete Gene Expression Data

10.21203/rs.3.rs-1087790/v1 ◽

2021 ◽

Author(s):

Dan Li ◽

Hong Gu ◽

Qiaozhen Chang ◽

Jia Wang ◽

Pan Qin

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Missing Values ◽

Clustering Algorithms ◽

Joint Optimization ◽

Gene Clustering ◽

Biological Knowledge ◽

Data Sets ◽

Expression Data ◽

Optimization Framework

Clustering algorithms have been successfully applied to identify co-expressed gene groups from gene expression data. Missing values often occur in gene expression data, which presents a challenge for gene clustering. When partitioning incomplete gene expression data into co-expressed gene groups, missing value imputation and clustering are generally performed as two separate processes. These two-stage methods are likely to produce imputed values that are unsuitable for the clustering task and, in turn, unsatisfactory clustering performance. This paper proposes a multi-objective joint optimization framework for clustering incomplete gene expression data that addresses this problem. The proposed framework imputes the missing expression values under the guidance of clustering, and thereby achieves a synergistic improvement of imputation and clustering. In addition, gene expression similarity and gene semantic similarity extracted from the Gene Ontology are combined, in the form of a functional neighbor interval for each missing expression value, to provide reasonable constraints for the joint optimization framework. Experiments on several benchmark data sets confirm the effectiveness of the proposed framework.
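The contrast with two-stage methods can be made concrete with a much simpler alternating scheme: cluster the currently filled matrix, then re-impute each missing entry from the centroid of the cluster its gene was assigned to, and repeat. This is only a sketch of the general "impute under the guidance of clustering" idea. It is not the paper's multi-objective framework, it uses no Gene Ontology constraints, and the k-means-style updates, farthest-point initialization, and toy data are assumptions.

```python
import numpy as np

def cluster_with_imputation(X, k, n_iter=20):
    """Alternate k-means-style clustering with centroid-based re-imputation."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    # Start from column-mean imputation (what a two-stage method would keep).
    filled = np.where(missing, np.nanmean(X, axis=0), X)

    # Deterministic farthest-point initialization of the centroids.
    centroids = [filled[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(filled - c, axis=1) for c in centroids], axis=0)
        centroids.append(filled[np.argmax(d)])
    centroids = np.array(centroids)

    for _ in range(n_iter):
        # Assign each gene to its nearest centroid.
        dists = np.linalg.norm(filled[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update centroids; an empty cluster keeps its previous centroid.
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = filled[labels == j].mean(axis=0)
        # Re-impute only the missing entries from the assigned centroid.
        filled[missing] = centroids[labels][missing]
    return filled, labels

# Toy data: two well-separated groups of 5 genes x 4 conditions, two gaps.
rng = np.random.default_rng(0)
X_demo = np.vstack([0.0 + 0.1 * rng.normal(size=(5, 4)),
                    5.0 + 0.1 * rng.normal(size=(5, 4))])
X_demo[0, 1] = np.nan
X_demo[7, 2] = np.nan

filled_demo, labels_demo = cluster_with_imputation(X_demo, k=2)
```

Observed values are never overwritten; only the gaps move as the cluster structure sharpens, which is the coupling between imputation and clustering that a two-stage pipeline lacks.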

Extracting compact representation of knowledge from gene expression data for protein-protein interaction

International Journal of Data Mining and Bioinformatics ◽

10.1504/ijdmb.2017.085711 ◽

2017 ◽

Vol 17 (4) ◽

pp. 279

Author(s):

Haohan Wang ◽

Aman Gupta ◽

Ming Xu

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Protein Interaction ◽

Compact Representation ◽

Expression Data ◽

Protein Protein Interaction ◽

Representation Of Knowledge

A performance analysis of clustering based algorithms for the microarray gene expression data

International Journal of Engineering & Technology ◽

10.14419/ijet.v7i2.21.12172 ◽

2018 ◽

Vol 7 (2.21) ◽

pp. 201 ◽

Cited By ~ 2

Author(s):

K Yuvaraj ◽

D Manjula

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

DNA Sequences ◽

Clustering Algorithms ◽

Expression Patterns ◽

Microarray Gene Expression Data ◽

Gene Clustering ◽

Expression Data ◽

Microarray Gene Expression ◽

Microarray Gene

Current advances in microarray technology permit simultaneous observation of the expression levels of a large number of genes over various time points. Microarrays have gained considerable importance in the field of bioinformatics. A microarray comprises an ordered set of a large number of different deoxyribonucleic acid (DNA) sequences that can be used to measure differences in both DNA and ribonucleic acid (RNA). The gene expression (GE) summary aids in understanding the basic cause of gene activities and the growth of genes, in identifying disorders such as cancer, and in analyzing their molecular pharmacology. Clustering is an important tool for analyzing such microarray gene expression data and has become a major part of gene expression analysis. Grouping genes that have similar expression patterns is known as gene clustering. A number of clustering algorithms have been applied to the analysis of microarray gene expression data. The aim of this paper is to analyze the precision achieved on microarray data by various clustering algorithms.

Feature selection and gene clustering from gene expression data

Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004. ◽

10.1109/icpr.2004.1334213 ◽

2004 ◽

Cited By ~ 7

Author(s):

P. Mitra ◽

D.D. Majumder

Keyword(s):

Gene Expression ◽

Feature Selection ◽

Gene Expression Data ◽

Gene Clustering ◽

Expression Data
