Increasing Power by Sharing Information from Genetic Background and Treatment in Clustering of Gene Expression Time Series

2018 ◽  
Vol 26 (4) ◽  
pp. 253-267 ◽  
Author(s):  
Sura Zaki Alrashid ◽  
Muhammad Arifur Rahman ◽  
Nabeel H Al-Aaraji ◽  
Neil D Lawrence ◽  
Paul R Heath

Clustering of gene expression time series gives insight into which genes may be co-regulated, allowing us to discern the activity of pathways in a given microarray experiment. Of particular interest is how a given group of genes varies with different conditions or genetic background. This paper develops a new clustering method that allows each cluster to be parameterised according to whether the behaviour of the genes across conditions is correlated or anti-correlated. By specifying the correlation between such genes, more information is gained within the cluster about how the genes interrelate. Amyotrophic lateral sclerosis (ALS) is an irreversible neurodegenerative disorder that kills the motor neurons and results in death within 2 to 3 years of symptom onset. The speed of progression is heterogeneous, varying significantly between patients. The SOD1G93A transgenic mice from different backgrounds (129Sv and C57) showed consistent phenotypic differences in disease progression. A hierarchy of Gaussian processes is used to model condition-specific and gene-specific temporal covariances. This study identifies significant gene expression profiles and clusters of associated or co-regulated genes across four groups of data (SOD1G93A and Ntg from 129Sv and C57 backgrounds). Our study shows the effectiveness of sharing information between replicates and different model conditions when modelling gene expression time series. Further gene enrichment score analysis and ontology pathway analysis of selected clusters for a particular group may help identify features underlying the differential speed of disease progression.
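The idea of a hierarchy of Gaussian processes can be illustrated with a small covariance construction. The sketch below is not the authors' model: it assumes a squared-exponential kernel, a single cluster-level function shared by all genes, and an independent gene-specific deviation for each gene. The function names and hyperparameter values are hypothetical.

```python
import numpy as np

def rbf(t1, t2, variance=1.0, lengthscale=1.0):
    """Squared-exponential kernel between two 1-D arrays of time points."""
    d = t1[:, None] - t2[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

def hierarchical_cov(times, n_genes, shared=(1.0, 2.0), specific=(0.3, 1.0)):
    """Joint covariance of n_genes profiles observed at the same time points.

    Every pair of genes shares the cluster-level kernel; observations of the
    same gene additionally share a gene-specific kernel (assumed structure).
    """
    k_shared = rbf(times, times, *shared)
    k_specific = rbf(times, times, *specific)
    # All-ones block pattern for the shared part, identity for the gene part.
    return (np.kron(np.ones((n_genes, n_genes)), k_shared)
            + np.kron(np.eye(n_genes), k_specific))

times = np.linspace(0.0, 10.0, 5)
K = hierarchical_cov(times, n_genes=3)
```

In this layout, two observations of the same gene covary through both kernels, while observations of different genes in the cluster covary only through the shared kernel, which is how information is pooled across genes.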

2007 ◽  
Vol 05 (05) ◽  
pp. 1005-1022 ◽  
Author(s):  
ELENA TSIPORKOVA ◽  
VESELKA BOEVA

Gene expression microarray experiments frequently generate datasets with multiple missing values. However, most analysis, mining, and classification methods for gene expression data require a complete matrix of gene array values. Therefore, the accurate estimation of missing values in such datasets has been recognized as an important issue, and several imputation algorithms have already been proposed to the biological community. Most of these approaches, however, are not particularly suitable for time series expression profiles. In view of this, we propose a novel imputation algorithm, which is specially suited for the estimation of missing values in gene expression time series data. The algorithm utilizes Dynamic Time Warping (DTW) distance to measure the similarity between time expression profiles, and subsequently selects, for each gene expression profile with missing values, a dedicated set of candidate profiles for estimation. Three different DTW-based imputation (DTWimpute) algorithms have been considered: position-wise, neighborhood-wise, and two-pass imputation. These have initially been prototyped in Perl, and their accuracy has been evaluated on yeast expression time series data using several different parameter settings. The experiments have shown that the two-pass algorithm consistently outperforms the neighborhood-wise and position-wise algorithms, in particular for datasets with a higher level of missing entries. The performance of the two-pass DTWimpute algorithm has further been benchmarked against the weighted K-Nearest Neighbors algorithm, which is widely used in the biological community; the former has appeared superior to the latter. Motivated by these findings, which clearly indicate the added value of DTW techniques for missing value estimation in time series data, we have built an optimized C++ implementation of the two-pass DTWimpute algorithm. The software also provides a choice between three different initial rough imputation methods.
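The role of DTW distance in selecting candidate profiles can be sketched as follows. This is a simplified stand-in, not the DTWimpute algorithms themselves: it assumes an absolute-difference local cost, compares profiles only at the target's observed positions, and fills each gap with the plain mean of the `k` nearest complete profiles. The names `dtw_distance` and `impute_missing` are hypothetical.

```python
import numpy as np

def dtw_distance(a, b):
    """Classic dynamic-programming DTW distance with absolute-difference cost."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible warping steps.
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]

def impute_missing(target, candidates, k=2):
    """Fill NaNs in target with the mean of the k DTW-nearest complete profiles."""
    target = np.asarray(target, dtype=float)
    candidates = np.asarray(candidates, dtype=float)
    observed = ~np.isnan(target)
    dists = [dtw_distance(target[observed], c[observed]) for c in candidates]
    nearest = np.argsort(dists)[:k]
    filled = target.copy()
    filled[~observed] = candidates[nearest][:, ~observed].mean(axis=0)
    return filled

candidates = np.array([[1.0, 2.0, 3.0, 4.0],
                       [1.0, 2.0, 3.2, 4.0],
                       [9.0, 9.0, 9.0, 9.0]])
filled = impute_missing([1.0, 2.0, np.nan, 4.0], candidates, k=2)
```

Here the third candidate is far from the target under DTW and is excluded, so the gap is filled from the two similar profiles only.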


2003 ◽  
Vol 83 (4) ◽  
pp. 835-858 ◽  
Author(s):  
Harri Lähdesmäki ◽  
Heikki Huttunen ◽  
Tommi Aho ◽  
Marja-Leena Linne ◽  
Jari Niemi ◽  
...  

2008 ◽  
Vol 7 (1) ◽  
pp. 44-55 ◽  
Author(s):  
Zidong Wang* ◽  
Fuwen Yang ◽  
Daniel W. C. Ho ◽  
Stephen Swift ◽  
Allan Tucker ◽  
...  

2007 ◽  
Vol 17 (07) ◽  
pp. 2477-2483 ◽  
Author(s):  
D. REMONDINI ◽  
N. NERETTI ◽  
C. FRANCESCHI ◽  
P. TIERI ◽  
J. M. SEDIVY ◽  
...  

We address the problem of finding large-scale functional and structural relationships between genes, given a time series of gene expression data, namely mRNA concentration values measured from genetically engineered rat fibroblast cell lines responding to conditional cMyc proto-oncogene activation. We show how it is possible to retrieve suitable information about molecular mechanisms governing the cell response to conditional perturbations. This task is complex because typical high-throughput genomics experiments are performed with a high number of probesets (10³–10⁴ genes) and a limited number of observations (< 10² time points). In this paper, we develop a deeper analysis of our previous work [Remondini et al., 2005] in which we characterized some of the main features of a gene-gene interaction network reconstructed from temporal correlation of gene expression time series. One first advancement is based on the comparison of the reconstructed network with networks obtained from randomly generated data, in order to characterize which features retrieve real biological information, and which are instead due to the characteristics of the network reconstruction method. The second and perhaps more relevant advancement is the characterization of the global change in co-expression pattern following cMyc activation as compared to a basal unperturbed state. We propose an analogy with a physical system in a critical state close to a phase transition (e.g. Potts ferromagnet), since the cell responds to the stimulus with high susceptibility, such that a single gene activation propagates to almost the entire genome. Our result relates to temporal properties of gene network dynamics, and there is experimental evidence that this can be related to spatial properties regarding the global organization of chromatin structure [Knoepfler et al., 2006].
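The comparison between a correlation-based network and one built from randomized data can be sketched roughly as below. This is an illustration under stated assumptions, not the authors' reconstruction method: it uses Pearson correlation with an arbitrary threshold of 0.8, synthetic data with one planted group of co-expressed genes, and a null model that shuffles each gene's time points independently. All names and parameter values are hypothetical.

```python
import numpy as np

def correlation_edges(expr, threshold=0.8):
    """Count gene pairs whose absolute temporal correlation exceeds threshold.

    expr has one row per gene and one column per time point.
    """
    corr = np.corrcoef(expr)
    upper = np.triu_indices_from(corr, k=1)  # each pair counted once
    return int(np.sum(np.abs(corr[upper]) > threshold))

def shuffled_null(expr, rng):
    """Destroy temporal structure by permuting each gene's time points."""
    return np.array([rng.permutation(row) for row in expr])

rng = np.random.default_rng(0)
n_time = 30
signal = rng.normal(size=n_time)
# Synthetic data: 10 genes follow a common signal, 10 genes are pure noise.
co_expressed = signal + 0.1 * rng.normal(size=(10, n_time))
background = rng.normal(size=(10, n_time))
expr = np.vstack([co_expressed, background])

real_edges = correlation_edges(expr)
null_edges = correlation_edges(shuffled_null(expr, rng))
```

Edges that survive in the real data but vanish under shuffling are the ones that plausibly reflect shared temporal structure rather than properties of the reconstruction procedure.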

