Discovering Key Transcriptomic Regulators in Pancreatic Ductal Adenocarcinoma using Dirichlet Process Gaussian Mixture Model

Abstract: Pancreatic ductal adenocarcinoma (PDAC) is the most lethal type of pancreatic cancer (PC), and its late detection leads to therapeutic failure. This study aims to identify key regulatory genes and their impact on disease progression, helping to clarify the etiology of the disease, which is still largely unknown. We leverage time-series gene expression data for this disease, so that the identified key regulators capture the characteristics of gene activity patterns during cancer progression. We have identified the key modules and predicted the functions of top genes from the compiled gene association network (GAN). We used the natural cubic spline regression model (splineTimeR) to identify differentially expressed genes (DEGs) from PDAC microarray time-series data downloaded from the Gene Expression Omnibus (GEO). First, we identified key transcriptomic regulators (TRs) and DNA-binding transcription factors (DbTFs). Subsequently, the Dirichlet process and Gaussian process (DPGP) mixture model is used to identify the key gene modules. A variation of the partial correlation method is used to analyze the GAN, followed by gene function prediction from the network. Finally, a panel of key genes related to PDAC is highlighted from each of the analyses performed.

Discovering key transcriptomic regulators in pancreatic ductal adenocarcinoma using Dirichlet process Gaussian mixture model

Scientific Reports ◽

10.1038/s41598-021-87234-7 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Sk Md Mosaddek Hossain ◽

Aanzil Akram Halsana ◽

Lutfunnesa Khatun ◽

Sumanta Ray ◽

Anirban Mukhopadhyay

Keyword(s):

Pancreatic Ductal Adenocarcinoma ◽

Mixture Model ◽

Cancer Progression ◽

Dirichlet Process ◽

Network Inference ◽

Correlation Method ◽

Gaussian Mixture ◽

Ductal Adenocarcinoma ◽

Gene Regulatory Network Inference ◽

Gene Modules

Abstract: Pancreatic ductal adenocarcinoma (PDAC) is the most lethal type of pancreatic cancer, and its late detection leads to therapeutic failure. This study aims to determine the key regulatory genes and their impact on the disease's progression, helping to clarify the disease's etiology, which is still mostly unknown. We leverage time-series gene expression data for this disease and thereby identify the key regulators that capture the characteristics of gene activity patterns during cancer progression. We have identified the key gene modules and predicted the functions of top genes from a reconstructed gene association network (GAN). A variation of the partial correlation method is used to analyze the GAN, followed by a gene function prediction task. Moreover, we have identified regulators for each target gene by gene regulatory network inference using the dynamical GENIE3 (dynGENIE3) algorithm. The Dirichlet process Gaussian process mixture model and the cubic spline regression model (splineTimeR) are employed to identify the key gene modules and the differentially expressed genes, respectively. Our analysis yields a panel of key regulators and gene modules that are crucial for PDAC progression.
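
The abstract mentions a variation of the partial correlation method without describing it. As a rough illustration only, the sketch below computes the plain first-order partial correlation between two expression profiles after removing the linear effect of a third. It is not the authors' variant, and the names `gene_a`, `gene_b`, and `regulator` and the toy values are hypothetical.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def partial_corr(x, y, z):
    """First-order partial correlation of x and y, controlling for z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical toy profiles: both genes follow the same regulator,
# plus independent gene-specific variation.
regulator = [1, 1, 1, 1, -1, -1, -1, -1]
gene_a = [2, 2, 0, 0, 0, 0, -2, -2]
gene_b = [2, 0, 2, 0, 0, -2, 0, -2]

print("marginal correlation:", pearson(gene_a, gene_b))
print("partial correlation given regulator:", partial_corr(gene_a, gene_b, regulator))
```

In this toy case the two genes are correlated only through the shared regulator, so the partial correlation is close to zero even though the marginal correlation is not. This is the reason partial correlations are often preferred over raw correlations when building association networks.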

Detecting variability in massive astronomical time series data – I. Application of an infinite Gaussian mixture model

Monthly Notices of the Royal Astronomical Society ◽

10.1111/j.1365-2966.2009.15576.x ◽

2009 ◽

Vol 400 (4) ◽

pp. 1897-1910 ◽

Cited By ~ 28

Author(s):

Min-Su Shin ◽

Michael Sekora ◽

Yong-Ik Byun

Keyword(s):

Time Series ◽

Gaussian Mixture Model ◽

Mixture Model ◽

Time Series Data ◽

Gaussian Mixture ◽

Series Data ◽

Astronomical Time ◽

Astronomical Time Series

Clustering gene expression time series data using an infinite Gaussian process mixture model

10.1101/131151 ◽

2017 ◽

Cited By ~ 1

Author(s):

Ian C. McDowell ◽

Dinesh Manandhar ◽

Christopher M. Vockley ◽

Amy K. Schmid ◽

Timothy E. Reddy ◽

...

Keyword(s):

Time Series ◽

Gaussian Process ◽

Mixture Model ◽

Dirichlet Process ◽

Cellular Response ◽

Time Series Data ◽

Series Data ◽

Nonparametric Model ◽

Cluster Number ◽

Temporal Dependencies

Abstract: Transcriptome-wide time series expression profiling is used to characterize the cellular response to environmental perturbations. The first step in analyzing transcriptional response data is often to cluster genes with similar responses. Here, we present a nonparametric model-based method, the Dirichlet process Gaussian process mixture model (DPGP), which jointly models cluster number with a Dirichlet process and temporal dependencies with Gaussian processes. We demonstrate the accuracy of DPGP in comparison with state-of-the-art approaches using hundreds of simulated data sets. To further test our method, we apply DPGP to published microarray data from a microbial model organism exposed to stress and to novel RNA-seq data from a human cell line exposed to the glucocorticoid dexamethasone. We validate our clusters by examining local transcription factor binding and histone modifications. Our results demonstrate that jointly modeling cluster number and temporal dependencies can reveal novel regulatory mechanisms. DPGP software is freely available online at https://github.com/PrincetonUniversity/DP_GP_cluster.
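
To give a feel for how a Dirichlet process lets the number of clusters grow with the data rather than being fixed in advance, here is a minimal sketch of the Chinese restaurant process, a standard way to sample partitions from a Dirichlet process prior. It covers the prior over cluster assignments only and omits the Gaussian process likelihood entirely, so it is not the DPGP algorithm. The function name and the concentration value `alpha` are illustrative assumptions.

```python
import random


def crp_assignments(n_items, alpha, seed=0):
    """Sample cluster labels for n_items from a Chinese restaurant process.

    Item i joins existing cluster k with probability counts[k] / (i + alpha)
    and opens a new cluster with probability alpha / (i + alpha).
    """
    rng = random.Random(seed)
    counts = []  # counts[k] = number of items currently in cluster k
    labels = []
    for _ in range(n_items):
        # Unnormalised weights: one per existing cluster, plus one for a new cluster.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)  # open a new cluster
        else:
            counts[k] += 1
        labels.append(k)
    return labels, counts


labels, counts = crp_assignments(n_items=50, alpha=1.0)
print("number of clusters:", len(counts))
print("cluster sizes:", counts)
```

Larger values of `alpha` tend to produce more clusters. In a full mixture model these prior weights would be multiplied by the likelihood of each gene's trajectory under each cluster before sampling.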

Clustering gene expression time series data using an infinite Gaussian process mixture model

PLoS Computational Biology ◽

10.1371/journal.pcbi.1005896 ◽

2018 ◽

Vol 14 (1) ◽

pp. e1005896 ◽

Cited By ~ 29

Author(s):

Ian C. McDowell ◽

Dinesh Manandhar ◽

Christopher M. Vockley ◽

Amy K. Schmid ◽

Timothy E. Reddy ◽

...

Keyword(s):

Gene Expression ◽

Time Series ◽

Gaussian Process ◽

Mixture Model ◽

Time Series Data ◽

Series Data ◽

Gene Expression Time Series ◽

Expression Time

An Integrative DTW-based imputation method for gene expression time series data

2012 6th IEEE INTERNATIONAL CONFERENCE INTELLIGENT SYSTEMS ◽

10.1109/is.2012.6335145 ◽

2012 ◽

Cited By ~ 3

Author(s):

Elena Kostadinova ◽

Veselka Boeva ◽

Liliana Boneva ◽

Elena Tsiporkova

Keyword(s):

Gene Expression ◽

Time Series ◽

Time Series Data ◽

Imputation Method ◽

Series Data ◽

Gene Expression Time Series ◽

Expression Time

GeneShelf: A Web-based Visual Interface for Large Gene Expression Time-Series Data Repositories

IEEE Transactions on Visualization and Computer Graphics ◽

10.1109/tvcg.2009.146 ◽

2009 ◽

Vol 15 (6) ◽

pp. 905-912 ◽

Cited By ~ 9

Author(s):

Bohyoung Kim ◽

Bongshin Lee ◽

S. Knoblach ◽

E. Hoffman ◽

Jinwook Seo

Keyword(s):

Gene Expression ◽

Time Series ◽

Time Series Data ◽

Series Data ◽

Data Repositories ◽

Web Based ◽

Large Gene ◽

Gene Expression Time Series ◽

Visual Interface ◽

Expression Time

Jonckheere–Terpstra–Kendall-based non-parametric analysis of temporal differential gene expression

NAR Genomics and Bioinformatics ◽

10.1093/nargab/lqab021 ◽

2021 ◽

Vol 3 (1) ◽

Author(s):

Hitoshi Iuchi ◽

Michiaki Hamada

Keyword(s):

Gene Expression ◽

Time Series ◽

Time Course ◽

Time Series Data ◽

Expression Patterns ◽

Detection Methods ◽

Series Data ◽

Expression Levels ◽

Over Time ◽

Non Parametric

Abstract: Time-course experiments using parallel sequencers have the potential to uncover gradual changes in cells over time that cannot be observed in a two-point comparison. An essential step in time-series data analysis is the identification of temporal differentially expressed genes (TEGs) under two conditions (e.g. control versus case). Model-based approaches, which are typical TEG detection methods, often set one parameter (e.g. degree or degrees of freedom) for one dataset. This approach risks modeling linearly increasing genes with higher-order functions, or fitting cyclic gene expression with linear functions, thereby leading to false positives/negatives. Here, we present a Jonckheere–Terpstra–Kendall (JTK)-based non-parametric algorithm for TEG detection. Benchmarks using simulation data show that the JTK-based approach outperforms existing methods, especially in long time-series experiments. Additionally, application of JTK to the analysis of time-series RNA-seq data from seven tissue types, across developmental stages in mouse and rat, suggested that the wave pattern, not the difference in expression levels, contributes to TEG identification by JTK. This result suggests that JTK is a suitable algorithm when focusing on expression patterns over time rather than expression levels, such as in comparisons between different species. These results show that JTK is an excellent candidate for TEG detection.
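
The claim that a rank-based method responds to the pattern over time rather than to expression levels can be illustrated with Kendall's rank correlation, one of the ingredients the JTK name refers to. The sketch below computes Kendall's tau-a between time points and expression values. It is a building block for intuition only, not the authors' TEG detection algorithm, and the sample values are made up.

```python
from itertools import combinations


def kendall_tau(times, values):
    """Kendall's tau-a: (concordant - discordant) pairs over all pairs."""
    concordant = discordant = 0
    for (t1, v1), (t2, v2) in combinations(zip(times, values), 2):
        sign = (t2 - t1) * (v2 - v1)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
        # Tied pairs (sign == 0) count toward neither total.
    n_pairs = len(times) * (len(times) - 1) // 2
    return (concordant - discordant) / n_pairs


times = [0, 1, 2, 3]
low_expression = [1, 3, 2, 4]  # hypothetical gene at a low expression level
high_expression = [100 * v + 500 for v in low_expression]  # same shape, higher level

print(kendall_tau(times, low_expression))
print(kendall_tau(times, high_expression))
```

Both calls give the same value because the statistic depends only on the ordering of the values, which is why such tests are insensitive to differences in overall expression level between conditions or species.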

Cell cycle time series gene expression data encoded as cyclic attractors in Hopfield systems

10.1101/170027 ◽

2017 ◽

Author(s):

Anthony Szedlak ◽

Spencer Sims ◽

Nicholas Smith ◽

Giovanni Paternostro ◽

Carlo Piermarocchi

Keyword(s):

Neural Network ◽

Gene Expression ◽

Cell Cycle ◽

Time Series ◽

Time Series Data ◽

Series Data ◽

Data Sets ◽

Expression Data ◽

Time Series Gene Expression ◽

Human Cervical Cancer

Abstract: Modern time series gene expression and other omics data sets have enabled unprecedented resolution of the dynamics of cellular processes such as cell cycle and response to pharmaceutical compounds. In anticipation of the proliferation of time series data sets in the near future, we use the Hopfield model, a recurrent neural network based on spin glasses, to model the dynamics of cell cycle in HeLa (human cervical cancer) and S. cerevisiae cells. We study some of the rich dynamical properties of these cyclic Hopfield systems, including the ability of populations of simulated cells to recreate experimental expression data and the effects of noise on the dynamics. Next, we use a genetic algorithm to identify sets of genes which, when selectively inhibited by local external fields representing gene silencing compounds such as kinase inhibitors, disrupt the encoded cell cycle. We find, for example, that inhibiting the set of four kinases BRD4, MAPK1, NEK7, and YES1 in HeLa cells causes simulated cells to accumulate in the M phase. Finally, we suggest possible improvements and extensions to our model.

Author Summary: Cell cycle, the process in which a parent cell replicates its DNA and divides into two daughter cells, is an upregulated process in many forms of cancer. Identifying gene inhibition targets to regulate cell cycle is important to the development of effective therapies. Although modern high throughput techniques offer unprecedented resolution of the molecular details of biological processes like cell cycle, analyzing the vast quantities of the resulting experimental data and extracting actionable information remains a formidable task. Here, we create a dynamical model of the process of cell cycle using the Hopfield model (a type of recurrent neural network) and gene expression data from human cervical cancer cells and yeast cells. We find that the model recreates the oscillations observed in experimental data. Tuning the level of noise (representing the inherent randomness in gene expression and regulation) to the “edge of chaos” is crucial for the proper behavior of the system. We then use this model to identify potential gene targets for disrupting the process of cell cycle. This method could be applied to other time series data sets and used to predict the effects of untested targeted perturbations.
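
One common textbook way to store a cycle of states in a Hopfield-type network is an asymmetric Hebbian coupling that maps each stored pattern onto the next one. The sketch below uses that construction with three small orthogonal patterns and deterministic, noise-free synchronous updates. It is an assumption made for illustration and may differ from the couplings, noise model, and update rule used by the authors; the patterns are toy values, not expression data.

```python
def cyclic_couplings(patterns):
    """Build J[i][j] = (1/N) * sum over mu of next_pattern[i] * pattern[j]."""
    n = len(patterns[0])
    couplings = [[0.0] * n for _ in range(n)]
    for mu, current in enumerate(patterns):
        following = patterns[(mu + 1) % len(patterns)]  # wrap around to close the cycle
        for i in range(n):
            for j in range(n):
                couplings[i][j] += following[i] * current[j] / n
    return couplings


def step(couplings, state):
    """One synchronous, noise-free update: each unit takes the sign of its input."""
    return [
        1 if sum(w * s for w, s in zip(row, state)) >= 0 else -1
        for row in couplings
    ]


# Three mutually orthogonal +/-1 patterns standing in for binarised expression states.
PATTERNS = [
    [1, 1, 1, 1, -1, -1, -1, -1],
    [1, 1, -1, -1, 1, 1, -1, -1],
    [1, -1, 1, -1, 1, -1, 1, -1],
]

J = cyclic_couplings(PATTERNS)
state = PATTERNS[0]
for _ in range(3):
    state = step(J, state)
    print(state)
```

Because the stored patterns are orthogonal, each update moves the network exactly to the next pattern, and after three updates it returns to the starting state. With correlated patterns or added noise the transitions are only approximate.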

Inference of gene regulatory networks based on nonlinear ordinary differential equations

Bioinformatics ◽

10.1093/bioinformatics/btaa032 ◽

2020 ◽

Vol 36 (19) ◽

pp. 4885-4893 ◽

Cited By ~ 2

Author(s):

Baoshan Ma ◽

Mingkun Fang ◽

Xiangtian Jiao

Keyword(s):

Gene Expression ◽

Time Series ◽

Steady State ◽

Differential Equations ◽

Gene Regulatory Networks ◽

Regulatory Networks ◽

Time Series Data ◽

Series Data ◽

State Data ◽

Gene Regulatory

Abstract: Motivation: Gene regulatory networks (GRNs) capture the regulatory interactions between genes, resulting from the fundamental biological process of transcription and translation. In some cases, the topology of GRNs is not known, and has to be inferred from gene expression data. Most of the existing GRN reconstruction algorithms are applied to either time-series data or steady-state data. Although time-series data include more information about the system dynamics, steady-state data imply stability of the underlying regulatory networks. Results: In this article, we propose a method for inferring GRNs from time-series and steady-state data jointly. We make use of a non-linear ordinary differential equations framework to model dynamic gene regulation and an importance measurement strategy to infer all putative regulatory links efficiently. The proposed method is evaluated extensively on the artificial DREAM4 dataset and two real gene expression datasets of yeast and Escherichia coli. Based on public benchmark datasets, the proposed method outperforms other popular inference algorithms in terms of overall score. By comparing the performance on datasets with different scales, the results show that our method retains good robustness and accuracy at a low computational complexity. Availability and implementation: The proposed method is written in the Python language, and is available at: https://github.com/lab319/GRNs_nonlinear_ODEs Supplementary information: Supplementary data are available at Bioinformatics online.

A Tutorial to Identify Nonlinear Associations in Gene Expression Time Series Data

Transcription Factor Regulatory Networks - Methods in Molecular Biology ◽

10.1007/978-1-4939-0805-9_8 ◽

2014 ◽

pp. 87-95

Author(s):

André Fujita ◽

Satoru Miyano

Keyword(s):

Gene Expression ◽

Time Series ◽

Time Series Data ◽

Series Data ◽

Gene Expression Time Series ◽

Nonlinear Associations ◽

Expression Time
