Mining Frequent and Associated Gene Expression Patterns from Spatial Gene Expression Data: A Proposed Approach

Mapping global shifts in Saccharomyces cerevisiae gene expression across asynchronous time trajectories with diffusion maps

10.1101/2021.02.11.430862; 2021

Author(s): Taylor Reiter, Rachel Montpetit, Ron Runnebaum, C. Titus Brown, Ben Montpetit

Keyword(s): Gene Expression, Saccharomyces Cerevisiae, Expression Patterns, Pinot Noir, Wine Fermentation, Gene Expression Patterns, Expression Data, Site Specific, Time Series Gene Expression, Specific Factors

Abstract: Grapes grown in a particular geographic region often produce wines with consistent characteristics, suggesting that site-specific factors drive recurrent fermentation outcomes. However, the relationships among site-specific factors, microbial metabolism, and wine fermentation outcomes are not well understood. Here, we used differences in Saccharomyces cerevisiae gene expression as a biosensor for differences among Pinot noir fermentations from 15 vineyard sites. We profiled time series gene expression patterns of primary fermentations, but fermentations proceeded at different rates, making these data difficult to analyze with conventional differential expression tools. This led us to develop a novel approach that combines diffusion mapping with continuous differential expression analysis. Using this method, we identified vineyard-specific deviations in gene expression, including changes in gene expression correlated with the activity of the non-Saccharomyces yeast Hanseniaspora uvarum, as well as with initial nitrogen concentrations in grape musts. These results highlight novel relationships between site-specific variables and Saccharomyces cerevisiae gene expression that are linked to repeated wine fermentation outcomes. In addition, we demonstrate that our analysis approach can extract biologically relevant gene expression patterns in other contexts (e.g., the hypoxic response of Saccharomyces cerevisiae), indicating that this approach offers a general method for investigating asynchronous time series gene expression data.

Importance: While it is generally accepted that foods, in particular wine, possess sensory characteristics associated with or derived from their place of origin, we lack knowledge of the biotic and abiotic factors central to this phenomenon. We have used Saccharomyces cerevisiae gene expression as a biosensor to capture differences in fermentations of Pinot noir grapes from 15 vineyards across two vintages. We find that gene expression by non-Saccharomyces yeasts and initial nitrogen content in the grape must correlate with differences in gene expression among fermentations from these vintages. These findings highlight important relationships between site-specific variables and gene expression that can be used to understand, or possibly modify, wine fermentation outcomes. Our work also provides a novel analysis method for investigating asynchronous gene expression data sets that can reveal both global shifts and subtle differences in gene expression due to varied cell-environment interactions.
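The abstract pairs diffusion mapping with continuous differential expression but does not spell out the computation. As a rough illustration only, the Python sketch below orders samples along the first non-trivial diffusion component, which can serve as a shared progress axis (pseudotime) when fermentations advance at different rates. The Gaussian bandwidth `epsilon`, the power-iteration solver, and the name `diffusion_component` are assumptions made for this example, not the authors' implementation.

```python
import math

def diffusion_component(samples, epsilon, n_iter=1000):
    """First non-trivial diffusion component of each sample (a pseudotime proxy).

    samples: list of equal-length expression vectors, one per sample.
    epsilon: Gaussian kernel bandwidth (an assumed tuning parameter).
    """
    n = len(samples)
    # Gaussian affinity between every pair of samples
    k = [[math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)) / epsilon)
          for y in samples] for x in samples]
    d = [sum(row) for row in k]  # degree of each sample
    # Symmetric S = D^-1/2 K D^-1/2 shares its eigenvalues with the Markov matrix P = D^-1 K
    s = [[k[i][j] / math.sqrt(d[i] * d[j]) for j in range(n)] for i in range(n)]
    # S has eigenvalue 1 with eigenvector proportional to sqrt(degree): the trivial component
    top = [math.sqrt(di) for di in d]
    top_norm = math.sqrt(sum(t * t for t in top))
    top = [t / top_norm for t in top]

    def deflate(v):
        # Project out the trivial eigenvector so iteration converges to the next one
        proj = sum(a * b for a, b in zip(v, top))
        return [a - proj * b for a, b in zip(v, top)]

    v = deflate([float(i) for i in range(n)])  # arbitrary non-constant start
    for _ in range(n_iter):
        # Power iteration; S is positive semi-definite for a Gaussian kernel
        w = deflate([sum(s[i][j] * v[j] for j in range(n)) for i in range(n)])
        w_norm = math.sqrt(sum(x * x for x in w))
        v = [x / w_norm for x in w]
    # Convert to a right eigenvector of P: psi = D^-1/2 v
    return [vi / math.sqrt(di) for vi, di in zip(v, d)]
```

Sorting samples by this component gives a common axis on which expression can then be modelled continuously. Its sign is arbitrary, so the direction has to be fixed using samples known to be early or late.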

Building Gene Networks by Analyzing Gene Expression Profiles

Advanced Methodologies and Technologies in Medicine and Healthcare - Advances in Medical Diagnosis, Treatment, and Care; 10.4018/978-1-5225-7489-7.ch003; 2019; pp. 27-44

Author(s): Crescenzio Gallo

Keyword(s): Gene Expression, Gene Expression Data, Gene Networks, DNA Microarrays, Expression Profiles, Expression Patterns, Gene Expression Profiles, Expression Data, Gene Expressions, Over Time

The possible applications of modeling and simulation in bioinformatics are very extensive, ranging from understanding basic metabolic pathways to exploring genetic variability. Experiments with DNA microarrays allow researchers to measure expression levels for thousands of genes simultaneously, across different conditions and over time. A key step in the analysis of gene expression data is detecting groups of genes that show similar expression patterns. In this chapter, the authors examine various methods for analyzing gene expression data, addressing the important topics of (1) selecting the most differentially expressed genes, (2) grouping them according to their relationships, and (3) classifying samples based on gene expression.
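For step (1), one common choice is to rank genes by a two-sample test statistic. The chapter summary does not say which statistic the authors use, so the sketch below assumes Welch's t statistic between two groups of samples; `rank_genes`, the 0/1 group labels, and the toy gene names are illustrative.

```python
import math
import statistics

def welch_t(group_a, group_b):
    """Welch's t statistic for two groups of expression values (unequal variances)."""
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    # Raises ZeroDivisionError if both groups have zero variance
    return (mean_a - mean_b) / math.sqrt(var_a / len(group_a) + var_b / len(group_b))

def rank_genes(expression, labels, top_k):
    """Return the top_k genes ranked by |t| between samples labelled 0 and 1.

    expression: dict mapping gene name -> list of values, one per sample.
    labels: list of 0/1 group labels aligned with the samples.
    """
    scores = {}
    for gene, values in expression.items():
        a = [v for v, lab in zip(values, labels) if lab == 0]
        b = [v for v, lab in zip(values, labels) if lab == 1]
        scores[gene] = abs(welch_t(a, b))
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```

In practice the statistic would be converted to a p-value and corrected for multiple testing across thousands of genes; the ranking alone is enough to show the selection step.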

Curve-Based Clustering of Time Course Gene Expression Data Using Self-Organizing Maps

Journal of Bioinformatics and Computational Biology; 10.1142/s0219720009004291; 2009; Vol 07 (04); pp. 645-661; Cited by ~11

Author(s): Xin Chen

Keyword(s): Gene Expression, Gene Expression Data, Regulatory Networks, Time Course, Clustering Algorithm, Expression Patterns, Self Organizing Map, Expression Data, Wide Range, Self Organizing

There is increasing interest in clustering time course gene expression data to investigate a wide range of biological processes. However, developing a clustering algorithm well suited to time course gene expression data remains challenging. Because timing is an important factor in defining true clusters, a clustering algorithm should exploit expression correlations between time points to achieve high clustering accuracy. Moreover, inter-cluster gene relationships are often desired to facilitate the computational inference of biological pathways and regulatory networks. In this paper, a new clustering algorithm called CurveSOM is developed to offer both features. It first represents each gene by a cubic smoothing spline fitted to its time course expression profile, and then groups genes into clusters by applying self-organizing map (SOM) based clustering to the resulting splines. CurveSOM has been tested on three well-studied yeast cell cycle datasets and compared with four popular programs: Cluster 3.0, GENECLUSTER, MCLUST, and SSClust. The results show that CurveSOM is a very promising tool for exploratory analysis of time course expression data, as it not only groups genes into clusters with high accuracy but also finds true time-shifted correlations of expression patterns across clusters.
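The two stages described here, smoothing each expression profile and then clustering the curves with a self-organizing map, can be sketched in plain Python. This is a simplified stand-in, not CurveSOM itself: a centred moving average replaces the cubic smoothing spline, the map is a one-dimensional chain of nodes, and the initialisation and the learning-rate and radius schedules are arbitrary assumptions.

```python
import math

def smooth(profile, window=3):
    """Centred moving average; a simple stand-in for a cubic smoothing spline."""
    half = window // 2
    out = []
    for i in range(len(profile)):
        lo, hi = max(0, i - half), min(len(profile), i + half + 1)
        out.append(sum(profile[lo:hi]) / (hi - lo))
    return out

def best_matching_unit(nodes, profile):
    """Index of the node whose weight curve is closest (squared Euclidean)."""
    return min(range(len(nodes)),
               key=lambda k: sum((a - b) ** 2 for a, b in zip(nodes[k], profile)))

def train_som(profiles, n_nodes, epochs=50, lr0=0.5, radius0=1.0):
    """Train a 1-D self-organizing map (a chain of nodes) on smoothed curves."""
    dim = len(profiles[0])
    # Deterministic initialisation: interpolate between the first and last profile
    nodes = [[profiles[0][t] + (profiles[-1][t] - profiles[0][t]) * k / max(1, n_nodes - 1)
              for t in range(dim)] for k in range(n_nodes)]
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1 - frac)                      # decaying learning rate
        radius = max(radius0 * (1 - frac), 1e-3)   # shrinking neighbourhood
        for p in profiles:
            bmu = best_matching_unit(nodes, p)
            for k, w in enumerate(nodes):
                # Gaussian neighbourhood measured along the node chain
                h = math.exp(-((k - bmu) ** 2) / (2 * radius ** 2))
                for t in range(dim):
                    w[t] += lr * h * (p[t] - w[t])
    return nodes
```

Because neighbouring nodes on the chain are updated together, adjacent nodes tend to hold related curves, which is the general reason SOM-based methods can expose relationships between clusters.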

Gene Expression Studies to Identify Significant Genes in AR, MTOR, MAPK Pathways and their Overlapping Regulatory Role in Prostate Cancer

Journal of Integrative Bioinformatics; 10.1515/jib-2018-0080; 2019; Vol 16 (3)

Author(s): Nimisha Asati, Abhinav Mishra, Ankita Shukla, Tiratha Raj Singh

Keyword(s): Gene Expression, Prostate Cancer, Gene Expression Data, Meta Analysis, Expression Patterns, Mitogen Activated Protein Kinase, Expression Data, MAPK Pathways, Expression Studies, Gene Expression Studies

Abstract: Gene expression studies have revealed a large degree of variability in gene expression patterns, particularly in tissues, even among genetically identical individuals. Such variability helps reveal the components that fluctuate most during disease. With the advent of gene expression profiling, many microarray studies of prostate cancer have been conducted, but their results have varied. To better understand the genetic and biological regulatory mechanisms of prostate cancer, we conducted a meta-analysis of three major pathways in prostate cancer: androgen receptor (AR), mechanistic target of rapamycin (mTOR), and mitogen-activated protein kinase (MAPK). The meta-analysis was performed on human gene expression data from prostate cancer. Twelve datasets covering the AR, mTOR, and MAPK pathways were analyzed, from which thirteen potential biomarkers were identified through meta-analysis. These findings were compiled from quantitative data analysis using different tools. Various interconnections were also found among the pathways under study. Our study suggests that microarray analysis of gene expression data and their pathway-level connections allows detection of potential predictors that may prove to be putative therapeutic targets with biological and functional significance in prostate cancer progression.
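The abstract does not state how evidence was combined across the twelve datasets. One standard option for per-gene meta-analysis is Fisher's method for combining independent p-values; the sketch below assumes that choice, and `fisher_combined_p` is a hypothetical helper name. It relies on the closed-form chi-square survival function that holds for even degrees of freedom, so no statistics library is needed.

```python
import math

def fisher_combined_p(p_values):
    """Combine independent p-values for one gene with Fisher's method.

    Under the null, X = -2 * sum(ln p_i) follows a chi-square distribution with
    2k degrees of freedom; for even degrees of freedom its survival function is
    exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!.
    """
    k = len(p_values)
    x = -2.0 * sum(math.log(p) for p in p_values)
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i  # builds (X/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total
```

Fisher's method assumes the datasets are independent and is sensitive to a single very small p-value; effect-size (inverse-variance) pooling is the usual alternative when comparable fold changes are available.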

A performance analysis of clustering based algorithms for the microarray gene expression data

International Journal of Engineering & Technology; 10.14419/ijet.v7i2.21.12172; 2018; Vol 7 (2.21); pp. 201; Cited by ~2

Author(s): K Yuvaraj, D Manjula

Keyword(s): Gene Expression, Gene Expression Data, DNA Sequences, Clustering Algorithms, Expression Patterns, Microarray Gene Expression Data, Gene Clustering, Expression Data, Microarray Gene Expression, Microarray Gene

Current advances in microarray technology allow simultaneous monitoring of the expression levels of a huge number of genes over various time points. Microarrays have gained great significance in the field of bioinformatics. A microarray consists of an ordered set of many different deoxyribonucleic acid (DNA) sequences that can be used to measure differences in both DNA and ribonucleic acid (RNA). Gene expression (GE) profiles help in understanding the basic causes of gene activity and the growth of genes, in diagnosing disorders such as cancer, and in analysing their molecular pharmacology. Clustering is a significant tool for analyzing such microarray gene expression data and has become a major part of gene expression analysis. Grouping genes that have similar expression patterns is known as gene clustering. A number of clustering algorithms have been applied to the analysis of microarray gene expression data. The aim of this paper is to analyze the precision achieved on microarray data by various clustering algorithms.
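To make gene clustering concrete, the sketch below implements Lloyd's k-means, a classic algorithm in comparisons of this kind; the summary does not list which algorithms the paper evaluates, so this choice is an assumption. Initialising from the first k profiles is a simplification for reproducibility (k-means++ seeding is more usual).

```python
def nearest(profile, centroids):
    """Index of the centroid closest to a profile (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda c: sum((a - b) ** 2 for a, b in zip(profile, centroids[c])))

def kmeans(profiles, k, n_iter=100):
    """Lloyd's k-means on gene expression profiles; returns (labels, centroids)."""
    centroids = [list(p) for p in profiles[:k]]  # simplistic deterministic seeding
    labels = None
    for _ in range(n_iter):
        # Assignment step: each gene joins its nearest centroid
        new_labels = [nearest(p, centroids) for p in profiles]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each centroid moves to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

Each profile here would be one gene's expression vector across samples or time points, so genes with similar patterns end up sharing a label.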

Performance Analysis of Hard and Soft Clustering Approaches For Gene Expression Data

International Journal of Rough Sets and Data Analysis; 10.4018/ijrsda.2015010104; 2015; Vol 2 (1); pp. 58-69; Cited by ~8

Author(s): P. K. Nizar Banu, S. Andrews

Keyword(s): Gene Expression, Gene Expression Data, Mean Squared Error, Clustering Algorithms, Expression Patterns, Absolute Error, Tumor Formation, Expression Data, Self Organizing Maps, Soft Clustering

Mining of gene expression data is growing rapidly as a way to predict gene expression patterns and assist clinicians in the early diagnosis of tumor formation. Clustering gene expression data is the most important phase, as it helps find groups of genes that are highly expressed or suppressed. This paper analyses the performance of the most representative hard and soft off-line clustering algorithms, namely K-Means, Fuzzy C-Means, Self Organizing Map (SOM) based clustering, and Genetic Algorithm (GA) based clustering, on a brain tumor gene expression dataset. Clusters produced by the clustering algorithms are indications of cellular processes. Clustering results are evaluated with clustering indices such as the Xie-Beni index (XB), Davies-Bouldin index (DB), Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and Dunn's Index (DI), along with the time taken, to assess the compactness and separation of clusters. Experimental results show that soft clustering approaches work well for predicting clusters of highly expressed and suppressed genes.
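Of the indices listed, the Davies-Bouldin index is straightforward to compute: for each cluster it takes the worst ratio (s_i + s_j) / d(c_i, c_j) of combined within-cluster scatter to centroid distance over all other clusters j, then averages over clusters, so lower values indicate more compact, better separated clusters. The sketch below uses the average Euclidean distance to the centroid as the scatter s_i, one common variant; it is an illustration, not the paper's evaluation code.

```python
import math

def davies_bouldin(points, labels):
    """Davies-Bouldin index of a hard clustering (lower is better)."""
    clusters = sorted(set(labels))
    centroids, scatter = {}, {}
    for c in clusters:
        members = [p for p, lab in zip(points, labels) if lab == c]
        centroid = [sum(col) / len(members) for col in zip(*members)]
        centroids[c] = centroid
        # Scatter: mean Euclidean distance of members to their centroid
        scatter[c] = sum(math.dist(p, centroid) for p in members) / len(members)
    total = 0.0
    for i in clusters:
        # Worst-case similarity of cluster i to any other cluster
        total += max((scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
                     for j in clusters if j != i)
    return total / len(clusters)
```

Comparing the index across algorithms run on the same data, as the paper does with several indices, rewards partitions whose clusters are tight relative to their spacing.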

A Fast Quad-Tree Based Two Dimensional Hierarchical Clustering

Bioinformatics and Biology Insights; 10.4137/bbi.s10383; 2012; Vol 6; pp. BBI.S10383

Author(s): Priscilla Rajadurai, Swamynathan Sankaranarayanan

Keyword(s): Gene Expression, Hierarchical Clustering, Gene Expression Data, Biological Networks, Processing Time, Clustering Algorithm, Expression Patterns, Expression Data, Important Time, Analogous Expression

Recently, microarray technologies have become a robust technique in genomics. An important step in the analysis of gene expression data is identifying groups of genes that show similar expression patterns. Cluster analysis partitions a given dataset into groups based on specified features. Euclidean distance is a widely used distance measure for gene expression data that takes into account the magnitude of changes in gene expression. However, the huge number of genes and the intricacy of biological networks greatly increase the challenge of comprehending and interpreting the resulting groups, and they increase processing time. The proposed technique is a fast quad-tree (QT) based two-dimensional hierarchical clustering algorithm. Constructing the closest-pair data structure at each level is an important time factor that determines the processing time of clustering. The proposed model reduces processing time and improves the analysis of gene expression data.
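A naive agglomerative (bottom-up) hierarchical clustering shows why the closest-pair search dominates the cost: every merge level rescans all cluster pairs. The sketch below uses average linkage with Euclidean distance; it does not implement the quad-tree structure the paper proposes, and the linkage choice is an assumption.

```python
import math
from itertools import combinations

def agglomerative(points, n_clusters):
    """Naive average-linkage agglomerative clustering with Euclidean distance.

    Returns a list of clusters, each a list of point indices.
    """
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        # Average Euclidean distance between members of two clusters
        return sum(math.dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Closest-pair search: scans every pair of current clusters at this level
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters
```

Recording each merge instead of stopping at `n_clusters` would yield the full dendrogram; the repeated pair scan is what index structures such as the paper's quad-tree aim to make cheaper.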

TimesVector-Web: A Web Service for Analysing Time Course Transcriptome Data with Multiple Conditions

Genes; 10.3390/genes13010073; 2021; Vol 13 (1); pp. 73

Author(s): Jaeyeon Jang, Inseung Hwang, Inuk Jung

Keyword(s): Gene Expression, Web Service, Gene Expression Data, Time Course, Expression Patterns, Differentially Expressed Gene, Expression Data, Transcriptomic Response, Time Course Data, Multiple Conditions

From time course gene expression data, we may identify genes that change in a certain pattern over time. Such patterns are useful for investigating the transcriptomic response to a given condition. In particular, it is of interest to compare two or more conditions to detect gene expression patterns that differ significantly between them. Time course analysis can be difficult with traditional differentially expressed gene (DEG) analysis methods, since they are based on pairwise sample comparisons rather than a series of time points. Most importantly, the related tools are mostly available only as local software, requiring technical expertise. Here, we present TimesVector-web, an easy-to-use web service for analysing time course gene expression data with multiple conditions. The web service was developed to (1) alleviate the burden of analyzing multi-class time course data and (2) provide downstream analysis of the results for biological interpretation, including TF, miRNA target, gene ontology, and pathway analysis. TimesVector-web was validated using three case studies with both microarray and RNA-seq time course data, and the results captured important biological findings from the original studies.
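The contrast drawn here, between pairwise sample comparisons and whole time series, can be illustrated by treating each gene's trajectory in a condition as a single vector. The sketch below flags genes whose trajectories are poorly correlated between two conditions; the correlation criterion, the 0.5 threshold, and the function names are assumptions for illustration and are not TimesVector's algorithm.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length trajectories."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)  # undefined (division by zero) for flat trajectories

def pattern_shifted_genes(cond_a, cond_b, threshold=0.5):
    """Genes whose time-course shape differs between two conditions.

    cond_a, cond_b: dicts mapping gene -> expression at the same time points.
    """
    return sorted(g for g in cond_a if pearson(cond_a[g], cond_b[g]) < threshold)
```

Correlation ignores differences in overall level or amplitude (a gene that is simply twice as high in one condition is not flagged), so in practice it would be paired with a magnitude-based comparison.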

Comprehensive analysis integrating both clinicopathological and gene expression data in more than 1,500 samples: Proliferation captured by gene expression grade index appears to be the strongest prognostic factor in breast cancer (BC)

Journal of Clinical Oncology; 10.1200/jco.2006.24.18_suppl.507; 2006; Vol 24 (18_suppl); pp. 507-507; Cited by ~3

Author(s): C. Sotiriou, P. Wirapati, S. Loi, B. Haibe-Kains, C. Desmedt, ...

Keyword(s): Gene Expression, Clinical Outcome, Gene Expression Data, Prognostic Value, Expression Patterns, p53 Mutation, Comprehensive Analysis, Biological Factor, Biological Knowledge, Expression Data

Background: Although the development of high-throughput gene expression technologies has allowed the identification of several "molecular signatures" predicting clinical outcome, no attempt has yet been made to perform a comprehensive analysis integrating both clinicopathological and gene expression data. Here, we aim to elucidate the relationships of clinical parameters and tumor markers with gene expression patterns, and their interaction with prognosis. Methods: We analyzed gene expression and clinical data from several published studies, including more than 1500 BC patients. We developed several gene expression indices associated with different biological stages of disease, characterized by the expression of hormone receptors, HER2 amplification, p53 mutation, angiogenesis, tumor invasion, and proliferation. Multivariable analyses were used to characterize the dependency patterns between these indices and their impact on survival. Results: Estrogen receptor (ER) and HER2 indices were the most prominent discriminators, dichotomizing tumor samples into two main subsets in agreement with previously proposed BC subtypes. Tumor proliferation, assessed by our previously reported gene expression grade index (GGI), was most strongly associated with prognosis (HR 2.29, CI 1.88–2.78, p<0.0001). Almost all ER- and HER2+ tumors were associated with high GGI scores. In contrast, ER+ and HER2- tumors showed the whole range of GGI values. Within the high proliferation subset, ER- and HER2+ indices did not have any prognostic value. Similar results were found with respect to the p53 mutation index. Nodal status and tumor size, which essentially measure the duration of disease, retained prognostic value in addition to proliferation. Conclusions: Proliferation captured by the GGI appears to be a key biological factor, downstream of ER, HER2, and p53.
Although understanding the upstream factors is important for advancing biological knowledge and therapeutic interventions, GGI seems to be the most important factor predicting clinical outcome in BC and deserves consideration as a stratification factor in clinical trials. No significant financial relationships to disclose.
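The abstract describes its indices only as gene expression scores tied to biological processes. A common generic construction, assumed here and not the published GGI formula, scores each sample by the mean z-scored expression of a signature gene set; `signature_index` and the toy data are hypothetical.

```python
import statistics

def signature_index(expression, signature):
    """Per-sample score: mean z-scored expression of the signature genes.

    expression: dict mapping gene -> list of values, one per sample.
    signature: genes that define the index (e.g. proliferation-related genes).
    """
    n_samples = len(next(iter(expression.values())))
    z = {}
    for gene in signature:
        values = expression[gene]
        mu, sd = statistics.mean(values), statistics.stdev(values)
        # Standardise each gene so that no single gene dominates the score
        z[gene] = [(v - mu) / sd for v in values]
    return [statistics.mean(z[g][s] for g in signature) for s in range(n_samples)]
```

Scores like this can then enter a multivariable survival model alongside clinical covariates, which is the kind of analysis the abstract reports.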

Discovering the molecular differences between right- and left-sided colon cancer using machine learning methods

BMC Cancer; 10.1186/s12885-020-07507-8; 2020; Vol 20 (1)

Author(s): Yimei Jiang, Xiaowei Yan, Kun Liu, Yiqing Shi, Changgang Wang, ...

Keyword(s): Gene Expression, Machine Learning, Colon Cancer, Gene Mutation, Gene Expression Data, Expression Patterns, High Accuracy, BRAF V600E, Expression Data, Mutation Data

Background: In recent years, the differences between left-sided colon cancer (LCC) and right-sided colon cancer (RCC) have received increasing attention because of the clinicopathological variation between them. However, some of these differences remain unclear, and conflicting results have been reported. Methods: From The Cancer Genome Atlas (TCGA), we obtained RNA sequencing data and gene mutation data for 323 and 283 colon cancer patients, respectively. Differential analysis between LCC and RCC was first performed separately on the gene expression data and the mutation data. Machine learning (ML) methods were then used to select key genes or mutations as features for constructing models that classify LCC and RCC patients. Finally, we conducted correlation analysis to identify correlations between differentially expressed genes (DEGs) and mutations using logistic regression (LR) models. Results: We found distinct gene mutation and expression patterns between LCC and RCC patients and further selected the 30 most important mutations and the 17 most important gene expression features using ML methods. The classification models built on these features classified LCC and RCC patients with high accuracy (areas under the curve (AUC) of 0.8 and 0.96 for mutation and gene expression data, respectively). PRAC1 expression and the BRAF V600E mutation (rs113488022) were the most important features of the gene expression and mutation models, respectively. Correlations between mutations and gene expression data were also identified using LR models. Among them, rs113488022 was significantly associated with the expression of four genes and thus warrants further study. Conclusions: On the basis of ML methods, we found key molecular differences between LCC and RCC that could differentiate these two groups of patients with high accuracy.
These differences might be key factors behind the variation in clinical features between LCC and RCC and could thus help improve treatment, for example by guiding the choice of appropriate therapy.
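The reported AUC values summarise how well each classifier's scores separate the two patient groups. The sketch below computes ROC AUC through its rank interpretation (the Mann-Whitney statistic divided by the number of positive-negative pairs), counting ties as one half; coding RCC as 1 and LCC as 0 is an arbitrary assumption, and the quadratic loop suits only small examples.

```python
def roc_auc(scores, labels):
    """ROC AUC: probability that a random positive outscores a random negative.

    scores: classifier outputs; labels: 1 for the positive class, 0 otherwise.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which puts the reported 0.8 and 0.96 in context.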
