Jonckheere–Terpstra–Kendall-based non-parametric analysis of temporal differential gene expression

Abstract Time-course experiments using parallel sequencers have the potential to uncover gradual changes in cells over time that cannot be observed in a two-point comparison. An essential step in time-series data analysis is the identification of temporal differentially expressed genes (TEGs) under two conditions (e.g. control versus case). Model-based approaches, which are typical TEG detection methods, often set one parameter (e.g. degree or degrees of freedom) for one dataset. This approach risks modeling linearly increasing genes with higher-order functions, or fitting cyclic gene expression with linear functions, thereby leading to false positives or negatives. Here, we present a Jonckheere–Terpstra–Kendall (JTK)-based non-parametric algorithm for TEG detection. Benchmarks using simulation data show that the JTK-based approach outperforms existing methods, especially in long time-series experiments. Additionally, application of JTK to time-series RNA-seq data from seven tissue types, across developmental stages in mouse and rat, suggested that the wave pattern, rather than the difference in expression levels, drives TEG identification by JTK. This result suggests that JTK is a suitable algorithm when the focus is on expression patterns over time rather than expression levels, as in comparisons between different species. These results show that JTK is an excellent candidate for TEG detection.
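The abstract does not describe the JTK-based algorithm itself, so the following is only a minimal sketch of the rank-based idea behind a Kendall-type statistic, not the authors' method. It computes Kendall's tau-a between one gene's time course under two conditions; the function name `kendall_tau_a` and the toy expression values are hypothetical. Because the statistic uses only the ordering of time points, a constant shift in expression level leaves it unchanged, while a reversed trend flips its sign.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant pairs - discordant pairs) / all pairs."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    score = 0
    for i, j in combinations(range(len(x)), 2):
        # A pair of time points is concordant when both series move
        # in the same direction between them, discordant otherwise.
        prod = (x[j] - x[i]) * (y[j] - y[i])
        if prod > 0:
            score += 1
        elif prod < 0:
            score -= 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return score / n_pairs


# Hypothetical expression values for one gene at six time points.
control = [1.0, 2.0, 3.5, 5.0, 6.2, 8.0]
case_shifted = [11.0, 12.5, 13.0, 15.5, 16.0, 18.5]  # same trend, higher level
case_reversed = [8.0, 6.5, 5.0, 3.2, 2.1, 1.0]       # opposite trend

print(kendall_tau_a(control, case_shifted))
print(kendall_tau_a(control, case_reversed))
```

In this toy case the shifted series is perfectly concordant with the control and the reversed series perfectly discordant, which mirrors the abstract's point that pattern, not level, determines the outcome of a rank-based comparison.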


IDENTIFYING DIFFERENTIALLY EXPRESSED GENES IN TIME-COURSE MICROARRAY EXPERIMENT WITHOUT REPLICATE

Journal of Bioinformatics and Computational Biology ◽

10.1142/s0219720007002655 ◽

2007 ◽

Vol 05 (02a) ◽

pp. 281-296 ◽

Cited By ~ 5

Author(s):

XU HAN ◽

WING-KIN SUNG ◽

LIN FENG

Keyword(s):

Time Series ◽

Differentially Expressed Genes ◽

Time Course ◽

Time Series Data ◽

Expression Patterns ◽

Microarray Experiment ◽

Differentially Expressed ◽

Series Data ◽

Gene Expressions ◽

Partial Energy

Replication of time series in microarray experiments is costly. To analyze time series data without replicates, many model-specific approaches have been proposed. However, they fail to identify genes whose expression patterns do not fit the pre-defined models. Moreover, modeling temporal expression patterns is difficult when the dynamics of gene expression in the experiment are poorly understood. We propose a method called Partial Energy ratio for Microarray (PEM) for the analysis of time course microarray data. In the PEM method, we assume that gene expression varies smoothly in the temporal domain. This assumption is comparatively weak, and hence the method is general enough to identify genes expressed in unexpected patterns. To identify the differentially expressed genes, a new statistic is developed by comparing the energies of two convolved profiles. We further improve the statistic for microarray analysis by introducing the concept of partial energy. The PEM statistic can be easily incorporated into the SAM framework for significance analysis. We evaluated the PEM method with an artificial dataset and two published time course cDNA microarray datasets on yeast. The experimental results show the robustness and generality of the PEM method in identifying the genes of interest.
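The idea of comparing the energies of two convolved profiles can be sketched as follows. The abstract gives neither the kernels nor the definition of partial energy, so everything here is an assumption made for illustration: a two-tap summing kernel `[1, 1]`, a two-tap differencing kernel `[1, -1]`, and a plain (not partial) energy ratio. This is not the PEM statistic itself.

```python
def convolve_valid(signal, kernel):
    """Discrete convolution, keeping only fully overlapping positions."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[k - 1 - j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def energy(values):
    """Energy of a profile: the sum of squared values."""
    return sum(v * v for v in values)


def energy_ratio(profile):
    """Ratio of summed-profile energy to differenced-profile energy.

    Assumed kernels: [1, 1] emphasises slowly varying signal,
    [1, -1] emphasises point-to-point fluctuation.
    """
    smooth = convolve_valid(profile, [1, 1])
    rough = convolve_valid(profile, [1, -1])
    rough_energy = energy(rough)
    if rough_energy == 0:
        return float("inf")
    return energy(smooth) / rough_energy


# Hypothetical log-ratio profiles over six time points.
smooth_profile = [1.0, 1.2, 1.5, 1.7, 1.8, 1.6]      # sustained, smooth change
noisy_profile = [0.3, -0.2, 0.25, -0.3, 0.2, -0.25]  # fluctuation around zero

print(energy_ratio(smooth_profile))
print(energy_ratio(noisy_profile))
```

Under the smoothness assumption stated in the abstract, a gene with a sustained change keeps most of its energy after summing neighbouring time points, while a profile that only fluctuates around zero keeps most of its energy after differencing, so the ratio separates the two.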


A non-parametric model for fuzzy forecasting time series data

Computational and Applied Mathematics ◽

10.1007/s40314-021-01534-2 ◽

2021 ◽

Vol 40 (4) ◽

Author(s):

Gholamreza Hesamian ◽

Mohammad Ghasem Akbari

Keyword(s):

Time Series ◽

Time Series Data ◽

Parametric Model ◽

Series Data ◽

Non Parametric


An Integrative DTW-based imputation method for gene expression time series data

2012 6th IEEE INTERNATIONAL CONFERENCE INTELLIGENT SYSTEMS ◽

10.1109/is.2012.6335145 ◽

2012 ◽

Cited By ~ 3

Author(s):

Elena Kostadinova ◽

Veselka Boeva ◽

Liliana Boneva ◽

Elena Tsiporkova

Keyword(s):

Gene Expression ◽

Time Series ◽

Time Series Data ◽

Imputation Method ◽

Series Data ◽

Gene Expression Time Series ◽

Expression Time


GeneShelf: A Web-based Visual Interface for Large Gene Expression Time-Series Data Repositories

IEEE Transactions on Visualization and Computer Graphics ◽

10.1109/tvcg.2009.146 ◽

2009 ◽

Vol 15 (6) ◽

pp. 905-912 ◽

Cited By ~ 9

Author(s):

Bohyoung Kim ◽

Bongshin Lee ◽

S. Knoblach ◽

E. Hoffman ◽

Jinwook Seo

Keyword(s):

Gene Expression ◽

Time Series ◽

Time Series Data ◽

Series Data ◽

Data Repositories ◽

Web Based ◽

Large Gene ◽

Gene Expression Time Series ◽

Visual Interface ◽

Expression Time


Generalized Correlation Coefficient for Non-Parametric Analysis of Microarray Time-Course Data

Journal of Integrative Bioinformatics ◽

10.1515/jib-2017-0011 ◽

2017 ◽

Vol 14 (2) ◽

Author(s):

Qihua Tan ◽

Mads Thomassen ◽

Mark Burton ◽

Kristian Fredløv Mose ◽

Klaus Ejner Andersen ◽

...

Keyword(s):

Gene Expression ◽

Correlation Coefficient ◽

Parametric Analysis ◽

Time Course ◽

Expression Patterns ◽

Gene Expression Patterns ◽

Microarray Time Course ◽

Time Course Data ◽

Generalized Correlation ◽

Non Parametric

Abstract Modeling complex time-course patterns is a challenging issue in microarray studies due to complex gene expression patterns in response to the time-course experiment. We introduce the generalized correlation coefficient and propose a combinatory approach for detecting, testing and clustering the heterogeneous time-course gene expression patterns. Application of the method identified nonlinear time-course patterns in high agreement with parametric analysis. We conclude that the non-parametric generalized correlation analysis could be a useful and efficient tool for analyzing microarray time-course data and for exploring the complex relationships in omics data to study their association with disease and health.


Evidence Graphs: Supporting Transparent and FAIR Computation, with Defeasible Reasoning on Data, Methods and Results

10.1101/2021.03.29.437561 ◽

2021 ◽

Author(s):

Sadnan Al Manir ◽

Justin Niestroy ◽

Maxwell Adam Levinson ◽

Timothy Clark

Keyword(s):

Time Series ◽

Large Scale ◽

Time Series Data ◽

Predictive Analytics ◽

Defeasible Reasoning ◽

Series Data ◽

Inference Rules ◽

Deep Networks ◽

Evidence Graph ◽

Over Time

Introduction: Transparency of computation is a requirement for assessing the validity of computed results and research claims based upon them; and it is essential for access to, assessment, and reuse of computational components. These components may be subject to methodological or other challenges over time. While reference to archived software and/or data is increasingly common in publications, a single machine-interpretable, integrative representation of how results were derived, that supports defeasible reasoning, has been absent. Methods: We developed the Evidence Graph Ontology, EVI, in OWL 2, with a set of inference rules, to provide deep representations of supporting and challenging evidence for computations, services, software, data, and results, across arbitrarily deep networks of computations, in connected or fully distinct processes. EVI integrates FAIR practices on data and software, with important concepts from provenance models, and argumentation theory. It extends PROV for additional expressiveness, with support for defeasible reasoning. EVI treats any computational result or component of evidence as a defeasible assertion, supported by a DAG of the computations, software, data, and agents that produced it. Results: We have successfully deployed EVI for very-large-scale predictive analytics on clinical time-series data. Every result may reference its own evidence graph as metadata, which can be extended when subsequent computations are executed. Discussion: Evidence graphs support transparency and defeasible reasoning on results. They are first-class computational objects, and reference the datasets and software from which they are derived. They support fully transparent computation, with challenge and support propagation. The EVI approach may be extended to include instruments, animal models, and critical experimental reagents.


Cell cycle time series gene expression data encoded as cyclic attractors in Hopfield systems

10.1101/170027 ◽

2017 ◽

Author(s):

Anthony Szedlak ◽

Spencer Sims ◽

Nicholas Smith ◽

Giovanni Paternostro ◽

Carlo Piermarocchi

Keyword(s):

Neural Network ◽

Gene Expression ◽

Cell Cycle ◽

Time Series ◽

Time Series Data ◽

Series Data ◽

Data Sets ◽

Expression Data ◽

Time Series Gene Expression ◽

Human Cervical Cancer

Abstract Modern time series gene expression and other omics data sets have enabled unprecedented resolution of the dynamics of cellular processes such as cell cycle and response to pharmaceutical compounds. In anticipation of the proliferation of time series data sets in the near future, we use the Hopfield model, a recurrent neural network based on spin glasses, to model the dynamics of cell cycle in HeLa (human cervical cancer) and S. cerevisiae cells. We study some of the rich dynamical properties of these cyclic Hopfield systems, including the ability of populations of simulated cells to recreate experimental expression data and the effects of noise on the dynamics. Next, we use a genetic algorithm to identify sets of genes which, when selectively inhibited by local external fields representing gene silencing compounds such as kinase inhibitors, disrupt the encoded cell cycle. We find, for example, that inhibiting the set of four kinases BRD4, MAPK1, NEK7, and YES1 in HeLa cells causes simulated cells to accumulate in the M phase. Finally, we suggest possible improvements and extensions to our model. Author Summary Cell cycle – the process in which a parent cell replicates its DNA and divides into two daughter cells – is an upregulated process in many forms of cancer. Identifying gene inhibition targets to regulate cell cycle is important to the development of effective therapies. Although modern high throughput techniques offer unprecedented resolution of the molecular details of biological processes like cell cycle, analyzing the vast quantities of the resulting experimental data and extracting actionable information remains a formidable task. Here, we create a dynamical model of the process of cell cycle using the Hopfield model (a type of recurrent neural network) and gene expression data from human cervical cancer cells and yeast cells. We find that the model recreates the oscillations observed in experimental data. Tuning the level of noise (representing the inherent randomness in gene expression and regulation) to the “edge of chaos” is crucial for the proper behavior of the system. We then use this model to identify potential gene targets for disrupting the process of cell cycle. This method could be applied to other time series data sets and used to predict the effects of untested targeted perturbations.
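How a Hopfield-type network can encode a cycle rather than a fixed point can be illustrated with a small sketch. It uses the textbook asymmetric coupling, in which each stored pattern is mapped onto the next one, with deterministic synchronous updates. The abstract does not give the authors' coupling matrix, noise model or external fields, so none of those are reproduced here; the pattern values and function names are hypothetical.

```python
def sign(value):
    """Binarise a local field to a +1/-1 spin state."""
    return 1 if value >= 0 else -1


def cyclic_weights(patterns):
    """Asymmetric couplings W = (1/N) * sum_mu outer(p[mu+1], p[mu]).

    Each stored pattern is mapped to its successor, and the last
    pattern is mapped back to the first, closing the cycle.
    """
    n = len(patterns[0])
    weights = [[0.0] * n for _ in range(n)]
    for mu, current in enumerate(patterns):
        following = patterns[(mu + 1) % len(patterns)]
        for i in range(n):
            for j in range(n):
                weights[i][j] += following[i] * current[j] / n
    return weights


def step(weights, state):
    """One synchronous, noise-free update of all units."""
    return [sign(sum(w * s for w, s in zip(row, state))) for row in weights]


# Three mutually orthogonal +1/-1 patterns standing in for
# binarised expression states at successive cell-cycle phases.
phases = [
    [1, -1, 1, -1, 1, -1, 1, -1],
    [1, 1, -1, -1, 1, 1, -1, -1],
    [1, 1, 1, 1, -1, -1, -1, -1],
]

W = cyclic_weights(phases)
state = phases[0]
for _ in range(6):
    state = step(W, state)
    print(state)
```

Because the stored patterns are orthogonal, each update moves the state exactly onto the next phase, and the trajectory repeats with period three. With correlated patterns or added noise the behavior is less clean, which is the regime the abstract describes exploring.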


Periodicity Detection Method for Small-Sample Time Series Datasets

Bioinformatics and Biology Insights ◽

10.4137/bbi.s5983 ◽

2010 ◽

Vol 4 ◽

pp. BBI.S5983 ◽

Cited By ~ 3

Author(s):

Daisuke Tominaga

Keyword(s):

Time Series ◽

Time Series Data ◽

Small Sample ◽

Detection Methods ◽

Series Data ◽

Computational Time ◽

Signal Pathways ◽

Periodicity Detection ◽

Sampling Points ◽

Sample Time

Time series of gene expression often exhibit periodic behavior under the influence of multiple signal pathways, and are represented by a model that incorporates multiple harmonics and noise. Most of these data, which are observed using DNA microarrays, consist of few sampling points in time, but most periodicity detection methods require a relatively large number of sampling points. We have previously developed a detection algorithm based on the discrete Fourier transform and Akaike's information criterion. Here we demonstrate the performance of the algorithm for small-sample time series data through a comparison with conventional and newly proposed periodicity detection methods based on a statistical analysis of the power of harmonics. We show that this method has higher sensitivity for data consisting of multiple harmonics, and is more robust against noise than other methods. Although “combinatorial explosion” occurs for large datasets, the computational time is not a problem for small-sample datasets. The MATLAB/GNU Octave script of the algorithm is available on the author's web site: http://www.cbrc.jp/%7Etominaga/piccolo/ .
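The combination of a discrete Fourier transform with Akaike's information criterion can be sketched as follows. This is a simplified illustration under stated assumptions, not the author's published script: it assumes evenly spaced samples, ranks harmonics by DFT power, excludes the Nyquist term, and picks the number of harmonics that minimises AIC = n ln(RSS/n) + 2(2m + 1), where m is the number of harmonics kept. The function names and toy signal are hypothetical.

```python
import cmath
import math


def harmonic_powers(series):
    """Sum of squares explained by each Fourier harmonic (Nyquist excluded)."""
    n = len(series)
    mean = sum(series) / n
    centered = [v - mean for v in series]
    powers = {}
    for k in range(1, (n - 1) // 2 + 1):
        coef = sum(
            c * cmath.exp(-2j * math.pi * k * t / n)
            for t, c in enumerate(centered)
        )
        # By Parseval's theorem, harmonic k and its conjugate n-k
        # together account for 2*|X_k|^2 / n of the total sum of squares.
        powers[k] = 2 * abs(coef) ** 2 / n
    total = sum(c * c for c in centered)
    return powers, total


def select_harmonics(series):
    """Return the harmonic indices whose inclusion minimises AIC."""
    n = len(series)
    powers, total = harmonic_powers(series)
    ranked = sorted(powers, key=powers.get, reverse=True)
    best_aic, best_m = None, 0
    explained = 0.0
    for m in range(len(ranked) + 1):
        if m > 0:
            explained += powers[ranked[m - 1]]
        rss = max(total - explained, 1e-12)  # guard against log(0)
        aic = n * math.log(rss / n) + 2 * (2 * m + 1)
        if best_aic is None or aic < best_aic:
            best_aic, best_m = aic, m
    return sorted(ranked[:best_m])


# Toy signal: harmonics 1 and 3 plus a small alternating residual,
# kept deliberately simple so the example is deterministic.
n_points = 12
signal = [
    2.0 * math.sin(2 * math.pi * t / n_points)
    + 1.0 * math.cos(2 * math.pi * 3 * t / n_points)
    + 0.1 * (-1) ** t
    for t in range(n_points)
]

print(select_harmonics(signal))
```

The abstract's remark about combinatorial explosion suggests that the published method considers combinations of harmonics. This sketch sidesteps that by adding harmonics greedily in order of power, which is adequate only when samples are evenly spaced and the harmonics are orthogonal.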


P–368 Dynamic metabolomic profiling during early implantation period

Human Reproduction ◽

10.1093/humrep/deab130.367 ◽

2021 ◽

Vol 36 (Supplement_1) ◽

Author(s):

Y Zhang ◽

Y W Zhao ◽

C C Wang ◽

T C Li

Keyword(s):

Mass Spectrometry ◽

Time Series ◽

Embryo Transfer ◽

Study Design ◽

Empirical Bayes ◽

Time Course ◽

Time Series Data ◽

Series Data ◽

Metabolomic Profiling ◽

Implantation Period

Abstract Study question: To investigate differences in serum metabolomic profiles between pregnant and non-pregnant women during the early implantation period. Summary answer: Progesterone-related hormone metabolites increase from ET day 3 in pregnant women compared with non-pregnant women. What is known already: Metabolomics uses high-throughput analytical methods to identify and quantify metabolites. Compared with other omics approaches, metabolomics is the closest to the phenotype, allowing the observation of dynamic changes in phenotype at specific timepoints. So far, there is no published work on the metabolomic profile of the human early implantation period. Study design, size, duration: Study design: comparative study. Size: 14 pregnant women and 14 non-pregnant women. Duration: time-course. Participants/materials, setting, methods: Participants: pregnant and non-pregnant women after embryo transfer (ET). Setting: university-based study. Methods: Peripheral blood was collected at ET day 0, 3, 6 and 9. Metabolomic profiling of serum was performed on capillary electrophoresis-mass spectrometry (CE-MS) and liquid chromatography–mass spectrometry (LC-MS) platforms. Main results and the role of chance: There was no statistical difference between the two groups in age, BMI, basal FSH level, endometrium thickness on the day of embryo transfer, distribution of primary and secondary fertility, embryo transfer cycle, or infertility type. After removing those with over 50% missing data, 310 metabolites entered the statistical analysis. Among the 310 metabolites, lipid metabolites accounted for the largest share, nearly half of all metabolites. The second-largest class of metabolites in our data was organic acids. Combining the results of repeated-measures ANOVA (RM-ANOVA), ANOVA-simultaneous component analysis (ASCA) and multivariate empirical Bayes time-series analysis (MEBA), we found that progesterone-related hormones were the most important metabolites across the whole time-series data. These metabolites showed significant downregulation from ET day 0 to ET day 3 and upregulation from ET day 3 to ET day 9. Limitations, reasons for caution: The sample size of this study is limited, and further validation is necessary for confirmation. Wider implications of the findings: The upregulation of progesterone-related hormones from day 3 in the pregnant group might be related to embryo-derived hCG, because by day 3 the embryo has entered the endometrium and produces cytokines and hCG and interacts with the endometrium in other ways. Trial registration number: NA


Inference of gene regulatory networks based on nonlinear ordinary differential equations

Bioinformatics ◽

10.1093/bioinformatics/btaa032 ◽

2020 ◽

Vol 36 (19) ◽

pp. 4885-4893 ◽

Cited By ~ 2

Author(s):

Baoshan Ma ◽

Mingkun Fang ◽

Xiangtian Jiao

Keyword(s):

Gene Expression ◽

Time Series ◽

Steady State ◽

Differential Equations ◽

Gene Regulatory Networks ◽

Regulatory Networks ◽

Time Series Data ◽

Series Data ◽

State Data ◽

Gene Regulatory

Abstract Motivation: Gene regulatory networks (GRNs) capture the regulatory interactions between genes, resulting from the fundamental biological processes of transcription and translation. In some cases, the topology of GRNs is not known and has to be inferred from gene expression data. Most existing GRN reconstruction algorithms are applied either to time-series data or to steady-state data. Although time-series data include more information about the system dynamics, steady-state data imply stability of the underlying regulatory networks. Results: In this article, we propose a method for inferring GRNs from time-series and steady-state data jointly. We make use of a non-linear ordinary differential equations framework to model dynamic gene regulation and an importance measurement strategy to infer all putative regulatory links efficiently. The proposed method is evaluated extensively on the artificial DREAM4 dataset and two real gene expression datasets of yeast and Escherichia coli. Based on public benchmark datasets, the proposed method outperforms other popular inference algorithms in terms of overall score. A comparison of performance on datasets of different scales shows that our method retains good robustness and accuracy at a low computational complexity. Availability and implementation: The proposed method is written in the Python language, and is available at: https://github.com/lab319/GRNs_nonlinear_ODEs Supplementary information: Supplementary data are available at Bioinformatics online.
