A resource for analyzing C. elegans’ gene expression data using transcriptional gene modules and module-weighted annotations

10.1101/678482 ◽

2019 ◽

Author(s):

Michael Cary ◽

Katie Podshivalova ◽

Cynthia Kenyon

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Expression Patterns ◽

Expression Data ◽

Functional Interpretation ◽

C Elegans ◽

Experimental Organism ◽

Gene Modules ◽

Term Analysis ◽

Do So

Identification of gene co-expression patterns (gene modules) is widely used for grouping functionally-related genes during transcriptomic data analysis. An organism-wide atlas of high quality fundamental gene modules would provide a powerful tool for unbiased detection of biological signals from gene expression data. Here, using a method of independent component analysis we call DEXICA, we have defined and optimized 209 modules that broadly represent transcriptional wiring of the key experimental organism C. elegans. Interrogation of these modules reveals processes that are activated in long-lived mutants in cases where traditional analyses of differentially-expressed genes fail to do so. Using this resource, users can easily identify active modules in their gene expression data and access detailed descriptions of each module. Additionally, we show that modules can inform the strength of the association between a gene and an annotation (e.g. GO term). Analysis of “module-weighted annotations” improves on several aspects of traditional annotation-enrichment tests and can aid in functional interpretation of poorly annotated genes. Interactive access to the resource is provided at http://genemodules.org/.

Application of Transcriptional Gene Modules to Analysis of Caenorhabditis elegans’ Gene Expression Data

G3 Genes|Genome|Genetics ◽

10.1534/g3.120.401270 ◽

2020 ◽

Vol 10 (10) ◽

pp. 3623-3638

Author(s):

Michael Cary ◽

Katie Podshivalova ◽

Cynthia Kenyon

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Expression Data ◽

Transcriptional Responses ◽

Functional Interpretation ◽

C Elegans ◽

Link Type ◽

Muscle Genes ◽

Gene Modules ◽

Or Gene

Identification of co-expressed sets of genes (gene modules) is used widely for grouping functionally related genes during transcriptomic data analysis. An organism-wide atlas of high-quality gene modules would provide a powerful tool for unbiased detection of biological signals from gene expression data. Here, using a method based on independent component analysis that we call DEXICA, we have defined and optimized 209 modules that broadly represent transcriptional wiring of the key experimental organism C. elegans. These modules represent responses to changes in the environment (e.g., starvation, exposure to xenobiotics), genes regulated by transcription factors (e.g., ATFS-1, DAF-16), genes specific to tissues (e.g., neurons, muscle), genes that change during development, and other complex transcriptional responses to genetic, environmental and temporal perturbations. Interrogation of these modules reveals processes that are activated in long-lived mutants in cases where traditional analyses of differentially expressed genes fail to do so. Additionally, we show that modules can inform the strength of the association between a gene and an annotation (e.g., GO term). Analysis of “module-weighted annotations” improves on several aspects of traditional annotation-enrichment tests and can aid in functional interpretation of poorly annotated genes. We provide an online interactive resource with tutorials at http://genemodules.org/, in which users can find detailed information on each module, check genes for module-weighted annotations, and use both of these to analyze their own gene expression data (generated using any platform) or gene sets of interest.
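
One way to picture how a user might look for active modules in their own data is the sketch below. It is only an illustration under stated assumptions, not the DEXICA procedure or the genemodules.org implementation: each module is assumed to be a mapping from gene to weight, the experiment is a mapping from gene to log2 fold change, and activity is taken to be a plain weighted sum. All gene names, module names, and numbers are hypothetical.

```python
# Illustrative sketch only: this is NOT the DEXICA method.
# Assumption: a module is a dict of gene -> weight, and an experiment is a
# dict of gene -> log2 fold change. All names and values are hypothetical.

def module_activity(module_weights, log_fold_changes):
    """Weighted sum of fold changes over genes present in both inputs."""
    shared = module_weights.keys() & log_fold_changes.keys()
    return sum(module_weights[g] * log_fold_changes[g] for g in shared)


def rank_modules(modules, log_fold_changes):
    """Return (module name, score) pairs, largest absolute score first."""
    scores = {
        name: module_activity(weights, log_fold_changes)
        for name, weights in modules.items()
    }
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


modules = {
    "module_1": {"gene_a": 0.9, "gene_b": 0.8, "gene_c": 0.0},
    "module_2": {"gene_a": 0.0, "gene_c": -0.7, "gene_d": 0.6},
}
log_fold_changes = {"gene_a": 2.0, "gene_b": 1.5, "gene_c": 0.1, "gene_d": -0.2}

ranked = rank_modules(modules, log_fold_changes)
for name, score in ranked:
    print(f"{name}: {score:.2f}")
```

In this toy input, module_1 ranks first because its heavily weighted genes are the ones that change most. A real analysis would also need a significance test, which this sketch omits.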

Building Gene Networks by Analyzing Gene Expression Profiles

Advanced Methodologies and Technologies in Medicine and Healthcare - Advances in Medical Diagnosis, Treatment, and Care ◽

10.4018/978-1-5225-7489-7.ch003 ◽

2019 ◽

pp. 27-44

Author(s):

Crescenzio Gallo

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Gene Networks ◽

Dna Microarrays ◽

Expression Profiles ◽

Expression Patterns ◽

Gene Expression Profiles ◽

Expression Data ◽

Gene Expressions ◽

Over Time

The possible applications of modeling and simulation in the field of bioinformatics are very extensive, ranging from understanding basic metabolic pathways to exploring genetic variability. Experiments carried out with DNA microarrays allow researchers to measure expression levels for thousands of genes simultaneously, across different conditions and over time. A key step in the analysis of gene expression data is the detection of groups of genes that manifest similar expression patterns. In this chapter, the authors examine various methods for analyzing gene expression data, addressing the important topics of (1) selecting the most differentially expressed genes, (2) grouping them by means of their relationships, and (3) classifying samples based on gene expressions.
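
Step (1) above, selecting the most differentially expressed genes, can be sketched as follows. This is a minimal illustration that assumes a simple ranking by absolute log2 fold change of group means with a pseudocount. It is not taken from the chapter, and the gene names and expression values are hypothetical.

```python
import math
from statistics import mean

# Illustrative sketch: rank genes by absolute log2 fold change between the
# mean expression in two conditions. The pseudocount avoids division by zero.
# Gene names and values are hypothetical.

def log2_fold_change(control, treated, pseudocount=1.0):
    return math.log2((mean(treated) + pseudocount) / (mean(control) + pseudocount))


def top_genes(expression, n):
    """expression maps gene -> (control values, treated values)."""
    scored = {
        gene: log2_fold_change(control, treated)
        for gene, (control, treated) in expression.items()
    }
    return sorted(scored, key=lambda gene: abs(scored[gene]), reverse=True)[:n]


expression = {
    "gene_a": ([10, 12, 11], [40, 44, 48]),  # strongly up in treated
    "gene_b": ([20, 19, 21], [21, 20, 19]),  # unchanged
    "gene_c": ([30, 33, 27], [7, 6, 8]),     # strongly down in treated
}

selected = top_genes(expression, 2)
print(selected)
```

A fold-change ranking ignores within-group variance, so in practice it is usually combined with a statistical test.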

CURVE-BASED CLUSTERING OF TIME COURSE GENE EXPRESSION DATA USING SELF-ORGANIZING MAPS

Journal of Bioinformatics and Computational Biology ◽

10.1142/s0219720009004291 ◽

2009 ◽

Vol 07 (04) ◽

pp. 645-661 ◽

Cited By ~ 11

Author(s):

XIN CHEN

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Regulatory Networks ◽

Time Course ◽

Clustering Algorithm ◽

Expression Patterns ◽

Self Organizing Map ◽

Expression Data ◽

Wide Range ◽

Self Organizing

There is an increasing interest in clustering time course gene expression data to investigate a wide range of biological processes. However, developing a clustering algorithm ideal for time course gene expression data is still challenging. As timing is an important factor in defining true clusters, a clustering algorithm should explore expression correlations between time points in order to achieve a high clustering accuracy. Moreover, inter-cluster gene relationships are often desired in order to facilitate the computational inference of biological pathways and regulatory networks. In this paper, a new clustering algorithm called CurveSOM is developed to offer both features above. It first represents each gene by a cubic smoothing spline fitted to the time course expression profile, and then groups genes into clusters by applying a self-organizing map-based clustering on the resulting splines. CurveSOM has been tested on three well-studied yeast cell cycle datasets, and compared with four popular programs including Cluster 3.0, GENECLUSTER, MCLUST, and SSClust. The results show that CurveSOM is a very promising tool for the exploratory analysis of time course expression data, as it is not only able to group genes into clusters with high accuracy but also able to find true time-shifted correlations of expression patterns across clusters.
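
The idea of a time-shifted correlation between expression patterns can be illustrated with a small lagged Pearson correlation. This sketch is not CurveSOM, which works on smoothing splines and a self-organizing map. It only assumes two equally sampled profiles and searches a few integer lags. The profiles are hypothetical.

```python
import math

# Illustrative sketch: find the integer lag at which profile y best matches
# profile x, using Pearson correlation. This is not the CurveSOM algorithm.

def pearson(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def best_lag(x, y, max_lag):
    """Return (lag, r) maximizing the correlation of x[t] with y[t + lag]."""
    best = None
    for lag in range(max_lag + 1):
        r = pearson(x[: len(x) - lag], y[lag:])
        if best is None or r > best[1]:
            best = (lag, r)
    return best


early = [0, 1, 4, 1, 0, 0, 0, 0]  # hypothetical profile peaking at t = 2
late = [0, 0, 0, 1, 4, 1, 0, 0]   # same shape, peaking at t = 4

lag, r = best_lag(early, late, 3)
print(lag, round(r, 3))
```

The function assumes neither truncated profile is constant, since a constant profile has zero variance and no defined correlation.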

Synaptic polarity and sign-balance prediction using gene expression data in the Caenorhabditis elegans chemical synapse neuronal connectome network

10.1101/2020.05.22.110312 ◽

2020 ◽

Author(s):

Bánk G. Fenyves ◽

Gábor S. Szilágyi ◽

Zsolt Vassy ◽

Csaba Sőti ◽

Péter Csermely

Keyword(s):

Gene Expression ◽

Caenorhabditis Elegans ◽

Gene Expression Data ◽

Temporal Dynamics ◽

Receptor Gene ◽

Inhibitory Synapses ◽

Directed Network ◽

Chemical Synapse ◽

Expression Data ◽

C Elegans

Abstract: Graph theoretical analyses of nervous systems usually omit the aspect of connection polarity, due to data insufficiency. The chemical synapse network of Caenorhabditis elegans is a well-reconstructed directed network, but the signs of its connections are yet to be elucidated. Here, we present the gene expression-based sign prediction of the C. elegans connectome, incorporating presynaptic neurotransmitter and postsynaptic receptor gene expression data (3,638 connections and 20,589 synapses total). We made successful predictions for more than two-thirds of all chemical synapses and determined a ratio of excitatory-inhibitory (E:I) interneuronal ionotropic chemical connections close to 4:1 which was found similar to that observed in many real-world networks. Our open source tool (http://EleganSign.linkgroup.hu) is simple but efficient in predicting polarities by integrating neuronal connectome and gene expression data.

Author Summary: The fundamental way neurons communicate is by activating or inhibiting each other via synapses. The balance between the two is crucial for the optimal functioning of a nervous system. However, whole-brain synaptic polarity information is unavailable for any species and experimental validation is challenging. The roundworm Caenorhabditis elegans possesses a fully mapped connectome with a comprehensive gene expression profile of its 302 neurons. Based on the consideration that the polarity of a synapse must be determined by the neurotransmitter(s) expressed in the presynaptic neuron and the receptors expressed in the postsynaptic neuron, we conceptualized and created a tool that predicts synaptic polarities based on connectivity and gene expression information. We were able to show for the first time that the ratio of excitatory and inhibitory synapses in C. elegans is around 4 to 1 which is in line with the balance observed in many natural systems. Our method opens a way to include spatial and temporal dynamics of synaptic polarity that would add a new dimension of plasticity in the excitatory:inhibitory balance. Our tool is freely available to be used on any network accompanied by any expression atlas.
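
The principle that polarity follows from the presynaptic neurotransmitter and the postsynaptic receptors can be sketched as a simple rule lookup. This is a hedged illustration, not the EleganSign tool: the receptor table, the transmitter and receptor names, and the four output labels are all hypothetical assumptions made for the example.

```python
# Illustrative sketch only: NOT the EleganSign implementation.
# Hypothetical lookup: receptor gene -> (transmitter it responds to, sign),
# where +1 means excitatory and -1 means inhibitory.
RECEPTOR_SIGN = {
    "rec_1": ("nt_x", +1),
    "rec_2": ("nt_x", -1),
    "rec_3": ("nt_y", -1),
}


def predict_polarity(pre_transmitters, post_receptors):
    """Combine presynaptic transmitters with matching postsynaptic receptors."""
    signs = set()
    for receptor in post_receptors:
        transmitter, sign = RECEPTOR_SIGN.get(receptor, (None, 0))
        if transmitter in pre_transmitters:
            signs.add(sign)
    if signs == {+1}:
        return "excitatory"
    if signs == {-1}:
        return "inhibitory"
    if signs == {+1, -1}:
        return "complex"
    return "unpredicted"  # no transmitter-receptor match


# Each connection: (transmitters of the presynaptic neuron,
#                   receptors of the postsynaptic neuron)
connections = [
    ({"nt_x"}, {"rec_1"}),
    ({"nt_x"}, {"rec_2"}),
    ({"nt_y"}, {"rec_1"}),
    ({"nt_x"}, {"rec_1", "rec_2"}),
]

labels = [predict_polarity(pre, post) for pre, post in connections]
print(labels)
```

Counting the "excitatory" and "inhibitory" labels over all connections would give an E:I ratio for the toy network.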

Gene Expression Studies to Identify Significant Genes in AR, MTOR, MAPK Pathways and their Overlapping Regulatory Role in Prostate Cancer

Journal of Integrative Bioinformatics ◽

10.1515/jib-2018-0080 ◽

2019 ◽

Vol 16 (3) ◽

Author(s):

Nimisha Asati ◽

Abhinav Mishra ◽

Ankita Shukla ◽

Tiratha Raj Singh

Keyword(s):

Gene Expression ◽

Prostate Cancer ◽

Gene Expression Data ◽

Meta Analysis ◽

Expression Patterns ◽

Mitogen Activated Protein Kinase ◽

Expression Data ◽

Mapk Pathways ◽

Expression Studies ◽

Gene Expression Studies

Gene expression studies have revealed a large degree of variability in gene expression patterns, particularly in tissues, even in genetically identical individuals. Such studies help to reveal the components that fluctuate most during the disease condition. With the advent of gene expression studies, many microarray studies have been conducted in prostate cancer, but the results have varied across studies. To better understand the genetic and biological regulatory mechanisms of prostate cancer, we conducted a meta-analysis of three major pathways, i.e., androgen receptor (AR), mechanistic target of rapamycin (mTOR) and mitogen-activated protein kinase (MAPK), in prostate cancer. The meta-analysis was performed on human gene expression data related to prostate cancer. Twelve datasets comprising the AR, mTOR, and MAPK pathways were taken for analysis, from which thirteen potential biomarkers were identified through meta-analysis. These findings were compiled based upon quantitative data analysis using different tools. Various interconnections were also found amongst the pathways under study. Our study suggests that microarray analysis of gene expression data and their pathway-level connections allows detection of potential predictors that can prove to be putative therapeutic targets with biological and functional significance in the progression of prostate cancer.

A performance analysis of clustering based algorithms for the microarray gene expression data

International Journal of Engineering & Technology ◽

10.14419/ijet.v7i2.21.12172 ◽

2018 ◽

Vol 7 (2.21) ◽

pp. 201 ◽

Cited By ~ 2

Author(s):

K Yuvaraj ◽

D Manjula

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Dna Sequences ◽

Clustering Algorithms ◽

Expression Patterns ◽

Microarray Gene Expression Data ◽

Gene Clustering ◽

Expression Data ◽

Microarray Gene Expression ◽

Microarray Gene

Current advances in microarray technology permit the simultaneous observation of the expression levels of a huge number of genes over various time points. Microarrays have gained considerable importance in the field of bioinformatics. A microarray comprises an ordered set of a large number of different deoxyribonucleic acid (DNA) sequences that can be used to measure variations in both DNA and ribonucleic acid (RNA). The gene expression (GE) profile aids in understanding the basic causes of gene activity and the development of genes, in identifying disorders such as cancer, and in analysing their molecular pharmacology. Clustering is a significant tool for analyzing such microarray gene expression data and has become a major part of gene expression analysis. Grouping genes that have identical expression patterns is known as gene clustering. A number of clustering algorithms have been applied to the analysis of microarray gene expression data. The aim of this paper is to analyze the precision level of the microarray data by using various clustering algorithms.

Performance Analysis of Hard and Soft Clustering Approaches For Gene Expression Data

International Journal of Rough Sets and Data Analysis ◽

10.4018/ijrsda.2015010104 ◽

2015 ◽

Vol 2 (1) ◽

pp. 58-69 ◽

Cited By ~ 8

Author(s):

P. K. Nizar Banu ◽

S. Andrews

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Mean Squared Error ◽

Clustering Algorithms ◽

Expression Patterns ◽

Absolute Error ◽

Tumor Formation ◽

Expression Data ◽

Self Organizing Maps ◽

Soft Clustering

Mining of gene expression data is growing rapidly as a way to predict gene expression patterns and assist clinicians in the early diagnosis of tumor formation. Clustering gene expression data is the most important phase, as it helps in finding groups of genes that are highly expressed or suppressed. This paper analyses the performance of the most representative hard and soft off-line clustering algorithms: K-Means, Fuzzy C-Means, Self Organizing Maps (SOM) based clustering and Genetic Algorithm (GA) based clustering, on a brain tumor gene expression dataset. Clusters produced by the clustering algorithms are indications of cellular processes. Clustering results are evaluated using clustering indices such as the Xie-Beni index (XB), Davies-Bouldin index (DB), Mean Absolute Error (MAE), Root Mean Squared Error (RMSE) and Dunn's Index (DI), along with the time taken, to assess the compactness and separation of clusters. Experimental results show that soft clustering approaches work well for predicting clusters of highly expressed and suppressed genes.
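
Of the indices listed above, Dunn's Index is the easiest to show compactly. The sketch below uses one common definition, assumed here rather than taken from the paper: the smallest distance between points in different clusters divided by the largest within-cluster diameter, so that higher values indicate compact, well-separated clusters. The points are hypothetical.

```python
import math
from itertools import combinations

# Illustrative sketch of Dunn's Index under one common definition:
# (minimum distance between points of different clusters) divided by
# (maximum distance between two points of the same cluster).

def dunn_index(clusters):
    min_separation = min(
        math.dist(p, q)
        for a, b in combinations(clusters, 2)
        for p in a
        for q in b
    )
    max_diameter = max(
        math.dist(p, q)
        for cluster in clusters
        for p, q in combinations(cluster, 2)
    )
    return min_separation / max_diameter


# Two hypothetical clusters of 2-D expression profiles.
clusters = [
    [(0.0, 0.0), (0.0, 1.0)],
    [(5.0, 0.0), (5.0, 1.0)],
]

score = dunn_index(clusters)
print(score)
```

The function needs at least two clusters and at least one cluster with two or more points; other variants of the index use centroid distances instead.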

A Fast Quad-Tree Based Two Dimensional Hierarchical Clustering

Bioinformatics and Biology Insights ◽

10.4137/bbi.s10383 ◽

2012 ◽

Vol 6 ◽

pp. BBI.S10383

Author(s):

Priscilla Rajadurai ◽

Swamynathan Sankaranarayanan

Keyword(s):

Gene Expression ◽

Hierarchical Clustering ◽

Gene Expression Data ◽

Biological Networks ◽

Processing Time ◽

Clustering Algorithm ◽

Expression Patterns ◽

Expression Data ◽

Important Time ◽

Analogous Expression

Recently, microarray technologies have become a robust technique in the area of genomics. An important step in the analysis of gene expression data is the identification of groups of genes showing analogous expression patterns. Cluster analysis partitions a given dataset into groups based on specified features. Euclidean distance is a widely used similarity measure for gene expression data that considers the amount of change in gene expression. However, the huge number of genes and the intricacy of biological networks have greatly increased the challenges of comprehending and interpreting the resulting groups of data, and have increased processing time. The proposed technique focuses on a quad-tree (QT) based fast two-dimensional hierarchical clustering algorithm. The construction of the closest-pair data structure at each level is an important time factor, which determines the processing time of clustering. The proposed model reduces the processing time and improves analysis of gene expression data.
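
The closest-pair step that hierarchical clustering repeats at each level can be sketched with a brute-force search over Euclidean distances. This is a baseline illustration only, assuming a small dictionary of hypothetical expression profiles. It is not the quad-tree method, whose purpose is to avoid exactly this all-pairs comparison.

```python
import math
from itertools import combinations

# Illustrative brute-force baseline: compare every pair of profiles and keep
# the pair with the smallest Euclidean distance. This takes O(n^2) distance
# computations, which is the cost that faster data structures aim to reduce.

def closest_pair(profiles):
    """profiles maps gene name -> expression vector; returns the nearest two names."""
    return min(
        combinations(profiles, 2),
        key=lambda pair: math.dist(profiles[pair[0]], profiles[pair[1]]),
    )


profiles = {
    "gene_a": (1.0, 2.0, 3.0),
    "gene_b": (1.0, 2.0, 4.0),
    "gene_c": (9.0, 9.0, 9.0),
}

pair = closest_pair(profiles)
print(pair, math.dist(profiles[pair[0]], profiles[pair[1]]))
```

In agglomerative clustering, the pair found this way would be merged and the search repeated on the reduced set.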

TimesVector-Web: A Web Service for Analysing Time Course Transcriptome Data with Multiple Conditions

Genes ◽

10.3390/genes13010073 ◽

2021 ◽

Vol 13 (1) ◽

pp. 73

Author(s):

Jaeyeon Jang ◽

Inseung Hwang ◽

Inuk Jung

Keyword(s):

Gene Expression ◽

Web Service ◽

Gene Expression Data ◽

Time Course ◽

Expression Patterns ◽

Differentially Expressed Gene ◽

Expression Data ◽

Transcriptomic Response ◽

Time Course Data ◽

Multiple Conditions

From time course gene expression data, we may identify genes that modulate in a certain pattern across time. Such patterns are useful for investigating the transcriptomic response to a certain condition. In particular, it is of interest to compare two or more conditions to detect gene expression patterns that significantly differ between them. Time course analysis can become difficult using traditional differentially expressed gene (DEG) analysis methods since they are based on pair-wise sample comparison instead of a series of time points. Most importantly, the related tools are mostly available as local software, requiring technical expertise. Here, we present TimesVector-web, which is an easy-to-use web service for analysing time course gene expression data with multiple conditions. The web service was developed to (1) alleviate the burden of analyzing multi-class time course data and (2) provide downstream analysis on the results for biological interpretation including TF, miRNA target, gene ontology and pathway analysis. TimesVector-web was validated using three case studies that use both microarray and RNA-seq time course data and showed that the results captured important biological findings from the original studies.

Comprehensive analysis integrating both clinicopathological and gene expression data in more than 1,500 samples: Proliferation captured by gene expression grade index appears to be the strongest prognostic factor in breast cancer (BC)

Journal of Clinical Oncology ◽

10.1200/jco.2006.24.18_suppl.507 ◽

2006 ◽

Vol 24 (18_suppl) ◽

pp. 507-507 ◽

Cited By ~ 3

Author(s):

C. Sotiriou ◽

P. Wirapati ◽

S. Loi ◽

B. Haibe-Kains ◽

C. Desmedt ◽

...

Keyword(s):

Gene Expression ◽

Clinical Outcome ◽

Gene Expression Data ◽

Prognostic Value ◽

Expression Patterns ◽

P53 Mutation ◽

Comprehensive Analysis ◽

Biological Factor ◽

Biological Knowledge ◽

Expression Data

Background: Although the development of high-throughput gene expression technologies has allowed the identification of several “molecular signatures” predicting clinical outcome, no attempt has been made yet to perform a comprehensive analysis integrating both clinicopathological and gene expression data. Here, we aim to elucidate the relationship between clinical parameters and tumor markers, with gene expression patterns and their interaction with prognosis. Methods: We analyzed gene expression and clinical data from several published studies, including more than 1500 BC patients. We developed several gene expression indices associated with different biological stages of disease characterized by the expression of hormone receptors, HER2 amplification, p53 mutation, angiogenesis, tumor invasion and proliferation. Multivariable analyses were used to characterize the dependency patterns between these indices and their impact on survival. Results: Estrogen receptor (ER) and HER2 indices were the most prominent discriminators dichotomizing tumor samples into two main subsets in agreement with the previously proposed BC subtypes. Tumor proliferation, assessed by our previously reported gene expression index (GGI), was the most strongly associated with prognosis (HR 2.29, CI 1.88–2.78, p<0.0001). Almost all ER- and HER2+ tumors were associated with high GGI scores. In contrast, ER+ and HER2- tumors showed a whole range of GGI values. Within the high proliferation subset, ER- and HER2+ indices did not have any prognostic value. Similar results were found for the p53 mutation index. Nodal status and tumor size, which essentially measure the duration of disease, retained prognostic value in addition to proliferation. Conclusions: Proliferation captured by the GGI appears to be a key biological factor, downstream of ER, HER2 and p53. Although understanding the upstream factors is important for advancing biological knowledge and therapeutic interventions, GGI seems to be the most important factor predicting clinical outcome in BC and deserves consideration as a stratification factor in clinical trials. No significant financial relationships to disclose.