System Approach to Understanding the Metabolic Diversity in Melon

Author(s):  
Asaph Aharoni ◽  
Zhangjun Fei ◽  
Efraim Lewinsohn ◽  
Arthur Schaffer ◽  
Yaakov Tadmor

Fruit quality is determined by numerous genetic factors that affect taste, aroma, color, texture, nutritional value and shelf life. To unravel the genetic components involved in the metabolic pathways behind these traits, the major goal of the project was to identify novel genes that are involved in, or that regulate, these pathways using correlation analysis between genotype, metabolite and gene expression data. The original and specific research objectives were: (1) Collection of replicated fruit from a population of 96 recombinant inbred (RI) lines derived from parents distinguished by great diversity in fruit development and quality phenotypes; (2) Phenotypic and metabolic profiling of mature fruit from all 96 RI lines and their parents; (3) 454 pyrosequencing of cDNA representing mRNA of mature fruit from each line to facilitate gene expression analysis based on relative EST abundance; (4) Development of a database modeled after an existing database developed for tomato introgression lines (ILs) to facilitate online data analysis by members of this project and by researchers around the world, the main functions of which will be to store and present metabolite and gene expression data so that correlations can be drawn between variation in target traits or metabolites across the RI population members and variation in gene expression, to identify candidate genes which may impact phenotypic and chemical traits of interest; (5) Selection of RI lines for segregation and/or hybridization (crosses) analysis to ascertain whether or not genes associated with traits through gene expression/metabolite correlation analysis are indeed contributors to said traits. The overall research strategy was to utilize an available recombinant inbred population of melon (Cucumis melo L.) derived from phenotypically diverse parents and for which over 800 molecular markers have been mapped for the association of metabolic trait and gene expression QTLs.
Transcriptomic data were obtained by high-throughput sequencing using the Illumina platform instead of the originally planned 454 platform. The change was due to the fast advancement and proven advantages of the Illumina platform, as explained in the first annual scientific report. Metabolic data were collected using both targeted (sugars, organic acids, carotenoids) and non-targeted metabolomics analysis methodologies. Genes whose expression patterns were associated with variation of particular metabolites or fruit quality traits represent candidates for the molecular mechanisms that underlie them. Candidate genes that may encode enzymes catalyzing biosynthetic steps in the production of volatile compounds of interest, downstream catabolic processes of aromatic amino acids and regulatory genes were selected and are undergoing functional analysis. Several of these genes represent unanticipated effectors of compound accumulation that could not be identified using traditional approaches. According to the original plan, the Cucurbit Genomics Network (http://www.icugi.org/), developed through an earlier BARD project (IS-3333-02), was expanded to serve as a public portal for the extensive metabolomics and transcriptomic data resulting from the current project. Importantly, this database was also expanded to include genomic and metabolomic resources of all the cucurbit crops, including genomes of cucumber and watermelon, EST collections, genetic maps, metabolite data and additional information. In addition, the database provides tools enabling researchers to identify genes whose expression patterns correlate with traits of interest. The project has significantly expanded the existing EST resource for melon and provides new molecular tools for marker-assisted selection. This information will be opened to the public by the end of 2013, upon the first publication describing the transcriptomic and metabolomics resources developed through the project.
In addition, well-characterized RI lines are available to enable targeted breeding for genes of interest. Segregation of the RI lines for specific metabolites of interest has been shown, demonstrating the utility of these lines and of our new molecular and metabolic data as a basis for selection targeting specific flavor, quality, nutritional and/or defensive compounds. To summarize, all the specific goals of the project have been achieved and in many cases exceeded. Large-scale transcriptomic and metabolomic resources have been developed for melon and will soon become available to the community. The usefulness of these has been validated. A number of novel genes involved in fruit ripening have been selected and are currently being functionally analyzed. We thus fully addressed our obligations to the project. In our view, however, the potential value of the project outcomes as ultimately manifested may be far greater than originally anticipated. The resources developed and expanded under this project, and the tools created for using them, will enable us, and others, to continue to employ the resulting data and discoveries in future studies, with benefits in both basic and applied agricultural-scientific research.
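The expression/metabolite correlation analysis described in this abstract can be illustrated with a minimal Python sketch. It assumes Pearson correlation as the association measure (the abstract does not say which statistic the project used), and the gene names, expression values and metabolite levels are hypothetical toy data, not project results.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant profile carries no correlation signal
    return cov / sqrt(var_x * var_y)


def rank_candidates(expression, metabolite):
    """Rank genes by |r| between their expression and a metabolite level.

    expression: dict mapping gene name -> expression values, one per line
    metabolite: metabolite levels measured in the same lines, same order
    """
    scores = {gene: pearson(values, metabolite)
              for gene, values in expression.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Hypothetical toy data: five lines, one metabolite, three genes.
metabolite = [1.0, 2.0, 3.0, 4.0, 5.0]
expression = {
    "geneA": [2.0, 4.1, 5.9, 8.2, 10.0],  # rises with the metabolite
    "geneB": [5.0, 3.0, 4.0, 3.5, 4.5],   # essentially unrelated
    "geneC": [9.0, 7.0, 5.2, 3.1, 1.0],   # falls as the metabolite rises
}
ranked = rank_candidates(expression, metabolite)
for gene, r in ranked:
    print(f"{gene}: r = {r:+.3f}")
```

Ranking by absolute correlation keeps both positively and negatively associated genes as candidates. With tens of thousands of genes, a real analysis would also need multiple-testing correction, which this sketch omits.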

2020 ◽  
Vol 15 ◽  
Author(s):  
Chen-An Tsai ◽  
James J. Chen

Background: Gene set enrichment analyses (GSEA) provide a useful and powerful approach to identify differentially expressed gene sets with prior biological knowledge. Several GSEA algorithms have been proposed to perform enrichment analyses on groups of genes. However, many of these algorithms have focused on identification of differentially expressed gene sets in a given phenotype. Objective: In this paper, we propose a gene set analytic framework, Gene Set Correlation Analysis (GSCoA), that simultaneously measures within- and between-gene-set variation to identify sets of genes enriched for differential expression and highly co-related pathways. Methods: We apply co-inertia analysis to cross-gene-set comparisons in gene expression data to measure the co-structure of expression profiles in pairs of gene sets. Co-inertia analysis (CIA) is a multivariate method for identifying trends or co-relationships in multiple datasets that contain the same samples. The objective of CIA is to seek ordinations (dimension reduction diagrams) of two gene sets such that the squared covariance between the projections of the gene sets on successive axes is maximized. Simulation studies illustrate that CIA offers superior performance in identifying co-relationships between gene sets in all simulation settings when compared to correlation-based gene set methods. Results and Conclusion: We also combine between-gene-set CIA and GSEA to discover the relationships between gene sets significantly associated with phenotypes. In addition, we provide a graphical technique for visualizing and simultaneously exploring the associations between and within gene sets and their interaction and network. We then demonstrate integration of within- and between-gene-set variation using CIA and GSEA, applied to the p53 gene expression data using the c2 curated gene sets. Ultimately, the GSCoA approach provides an attractive tool for identification and visualization of novel associations between pairs of gene sets by integrating co-relationships between gene sets into gene set analysis.
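The CIA objective stated above, maximizing the squared covariance between projections of two gene sets measured on the same samples, can be sketched for the first pair of axes. This is a simplified illustration rather than the GSCoA implementation: it assumes uniform sample and gene weights, finds the leading singular vectors of the cross-covariance matrix by power iteration, and uses hypothetical toy matrices.

```python
from math import sqrt


def center(matrix):
    """Subtract each column mean (rows = samples, columns = genes)."""
    n = len(matrix)
    means = [sum(col) / n for col in zip(*matrix)]
    return [[v - m for v, m in zip(row, means)] for row in matrix]


def cross_covariance(x, y):
    """Cross-covariance matrix between the columns of x and those of y."""
    n = len(x)
    return [[sum(x[i][a] * y[i][b] for i in range(n)) / n
             for b in range(len(y[0]))]
            for a in range(len(x[0]))]


def normalize(v):
    norm = sqrt(sum(c * c for c in v))
    return [c / norm for c in v] if norm > 0 else v


def first_coinertia_axis(x, y, iterations=200):
    """Return (u, v, sigma): unit axes u, v maximizing cov(Xu, Yv)^2.

    Power iteration on the cross-covariance matrix; the all-ones start
    is assumed not to be orthogonal to the leading singular vector.
    """
    c = cross_covariance(center(x), center(y))
    p, q = len(c), len(c[0])
    u = normalize([1.0] * p)
    for _ in range(iterations):
        v = normalize([sum(c[a][b] * u[a] for a in range(p)) for b in range(q)])
        u = normalize([sum(c[a][b] * v[b] for b in range(q)) for a in range(p)])
    sigma = sum(u[a] * c[a][b] * v[b] for a in range(p) for b in range(q))
    return u, v, sigma


# Hypothetical toy data: four samples, two gene sets of two genes each.
set_x = [[1.0, 0.0], [2.0, 1.0], [3.0, 0.0], [4.0, 1.0]]
set_y = [[2.0, 5.0], [4.0, 5.0], [6.0, 5.0], [8.0, 5.0]]
u, v, sigma = first_coinertia_axis(set_x, set_y)
print("squared covariance on the first axis pair:", sigma ** 2)
```

Successive axes would come from the remaining singular vectors of the same matrix. A full CIA also applies row and column weights, which this sketch leaves out.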


Author(s):  
Crescenzio Gallo

The possible applications of modeling and simulation in the field of bioinformatics are very extensive, ranging from understanding basic metabolic paths to exploring genetic variability. Experimental results carried out with DNA microarrays allow researchers to measure expression levels for thousands of genes simultaneously, across different conditions and over time. A key step in the analysis of gene expression data is the detection of groups of genes that manifest similar expression patterns. In this chapter, the authors examine various methods for analyzing gene expression data, addressing the important topics of (1) selecting the most differentially expressed genes, (2) grouping them by means of their relationships, and (3) classifying samples based on gene expressions.


2009 ◽  
Vol 07 (04) ◽  
pp. 645-661 ◽  
Author(s):  
XIN CHEN

There is an increasing interest in clustering time course gene expression data to investigate a wide range of biological processes. However, developing a clustering algorithm ideal for time course gene expression data is still challenging. As timing is an important factor in defining true clusters, a clustering algorithm should explore expression correlations between time points in order to achieve a high clustering accuracy. Moreover, inter-cluster gene relationships are often desired in order to facilitate the computational inference of biological pathways and regulatory networks. In this paper, a new clustering algorithm called CurveSOM is developed to offer both features above. It first represents each gene by a cubic smoothing spline fitted to the time course expression profile, and then groups genes into clusters by applying a self-organizing map-based clustering on the resulting splines. CurveSOM has been tested on three well-studied yeast cell cycle datasets, and compared with four popular programs including Cluster 3.0, GENECLUSTER, MCLUST, and SSClust. The results show that CurveSOM is a very promising tool for the exploratory analysis of time course expression data, as it is not only able to group genes into clusters with high accuracy but also able to find true time-shifted correlations of expression patterns across clusters.
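The notion of a time-shifted correlation between expression profiles can be illustrated with a small lag search. This is not the CurveSOM algorithm, which fits cubic smoothing splines and clusters them with a self-organizing map. It is a simplified sketch that assumes equally spaced time points and Pearson correlation, and the function name `best_lag` and both profiles are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def best_lag(a, b, max_lag=3, min_overlap=3):
    """Find the shift of profile b that best matches profile a.

    A positive lag means b follows a by that many time points.
    Returns (lag, correlation) for the highest correlation found.
    """
    best = None
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            part_a, part_b = a[:len(a) - lag], b[lag:]
        else:
            part_a, part_b = a[-lag:], b[:len(b) + lag]
        if len(part_a) < min_overlap:
            continue  # too few overlapping time points to correlate
        r = pearson(part_a, part_b)
        if best is None or r > best[1]:
            best = (lag, r)
    return best


# Hypothetical profiles: profile_b peaks one time point after profile_a.
profile_a = [0.0, 1.0, 4.0, 9.0, 4.0, 1.0, 0.0, 0.0]
profile_b = [0.0, 0.0, 1.0, 4.0, 9.0, 4.0, 1.0, 0.0]
lag, r = best_lag(profile_a, profile_b)
print(f"best lag = {lag}, correlation = {r:.3f}")
```

Fitting a smooth curve to each profile first, as the abstract describes, would allow shifts that are not whole multiples of the sampling interval.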


2019 ◽  
Vol 16 (3) ◽  
Author(s):  
Nimisha Asati ◽  
Abhinav Mishra ◽  
Ankita Shukla ◽  
Tiratha Raj Singh

Gene expression studies have revealed a large degree of variability in gene expression patterns, particularly in tissues, even in genetically identical individuals. Such studies help to reveal the components that fluctuate most during the disease condition. With the advent of gene expression studies, many microarray studies have been conducted in prostate cancer, but the results have varied across different studies. To better understand the genetic and biological regulatory mechanisms of prostate cancer, we conducted a meta-analysis of three major pathways, i.e. androgen receptor (AR), mechanistic target of rapamycin (mTOR) and Mitogen-Activated Protein Kinase (MAPK), in prostate cancer. The meta-analysis was performed on human gene expression data related to prostate cancer. Twelve datasets comprising AR, mTOR, and MAPK pathways were taken for analysis, out of which thirteen potential biomarkers were identified through meta-analysis. These findings were compiled based upon quantitative data analysis using different tools. Also, various interconnections were found amongst the pathways under study. Our study suggests that microarray analysis of gene expression data and their pathway-level connections allows detection of potential predictors that can prove to be putative therapeutic targets with biological and functional significance in the progression of prostate cancer.


2018 ◽  
Vol 7 (2.21) ◽  
pp. 201 ◽  
Author(s):  
K Yuvaraj ◽  
D Manjula

Current advancements in microarray technology permit simultaneous observation of the expression levels of a huge number of genes over various time points. Microarrays have gained considerable importance in the field of bioinformatics. A microarray comprises an ordered set of a large number of different Deoxyribonucleic Acid (DNA) sequences that can be used to measure both DNA and Ribonucleic Acid (RNA) dissimilarities. The Gene Expression (GE) summary aids in understanding the basic cause of gene activities and the growth of genes, in detecting disorders such as cancer, and in analysing their molecular pharmacology. Clustering is a significant tool for analyzing such microarray gene expression data, and it has become a major part of gene expression analysis. Grouping genes that have similar expression patterns is known as gene clustering. A number of clustering algorithms have been applied to the analysis of microarray gene expression data. The aim of this paper is to analyze the precision achieved on microarray data using various clustering algorithms.


2015 ◽  
Vol 2 (1) ◽  
pp. 58-69 ◽  
Author(s):  
P. K. Nizar Banu ◽  
S. Andrews

Mining of gene expression data is growing rapidly as a way to predict gene expression patterns and assist clinicians in the early diagnosis of tumor formation. Clustering gene expression data is the most important phase and helps in finding groups of genes that are highly expressed or suppressed. This paper analyses the performance of the most representative hard and soft off-line clustering algorithms, namely K-Means, Fuzzy C-Means, Self Organizing Maps (SOM) based clustering and Genetic Algorithm (GA) based clustering, on a brain tumor gene expression dataset. Clusters produced by the clustering algorithms are indications of the cellular processes. Clustering results are evaluated using clustering indices such as the Xie-Beni index (XB), Davies-Bouldin index (DB), Mean Absolute Error (MAE), Root Mean Squared Error (RMSE) and Dunn's Index (DI), along with the time taken, to assess the compactness and separation of clusters. Experimental results show that soft clustering approaches work well to predict clusters of highly expressed and suppressed genes.
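Of the validity indices named above, the Davies-Bouldin index is simple enough to sketch directly. The version below assumes Euclidean distance and takes the mean distance to the centroid as the within-cluster scatter. The two-dimensional points are hypothetical toy data, not values from the paper.

```python
from math import dist


def centroid(points):
    """Coordinate-wise mean of a list of equal-length points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def davies_bouldin(clusters):
    """Davies-Bouldin index of a partition (lower is better).

    clusters: list of clusters, each a non-empty list of coordinate tuples.
    For each cluster, take the worst ratio of summed scatter to centroid
    separation against any other cluster, then average over clusters.
    """
    centers = [centroid(c) for c in clusters]
    scatter = [sum(dist(p, center) for p in c) / len(c)
               for c, center in zip(clusters, centers)]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        total += max((scatter[i] + scatter[j]) / dist(centers[i], centers[j])
                     for j in range(k) if j != i)
    return total / k


# Hypothetical partitions: well-separated versus overlapping clusters.
well_separated = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 0.0), (10.0, 2.0)]]
overlapping = [[(0.0, 0.0), (0.0, 2.0)], [(1.0, 0.0), (1.0, 2.0)]]
db_good = davies_bouldin(well_separated)
db_poor = davies_bouldin(overlapping)
print(db_good, db_poor)
```

The index falls as clusters become more compact and better separated, so the well-separated partition scores lower than the overlapping one.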


2012 ◽  
Vol 6 ◽  
pp. BBI.S10383
Author(s):  
Priscilla Rajadurai ◽  
Swamynathan Sankaranarayanan

Recently, microarray technologies have become a robust technique in the area of genomics. An important step in the analysis of gene expression data is the identification of groups of genes showing similar expression patterns. Cluster analysis partitions a given dataset into groups based on specified features. Euclidean distance is a widely used similarity measure for gene expression data that considers the amount of change in gene expression. However, the huge number of genes and the intricacy of biological networks have greatly increased the challenges of comprehending and interpreting the resulting groups of data, increasing processing time. The proposed technique focuses on a QT-based fast 2-dimensional hierarchical clustering algorithm to perform clustering. The construction of the closest-pair data structure at each level is an important time factor, which determines the processing time of clustering. The proposed model reduces the processing time and improves analysis of gene expression data.


Genes ◽  
2021 ◽  
Vol 13 (1) ◽  
pp. 73
Author(s):  
Jaeyeon Jang ◽  
Inseung Hwang ◽  
Inuk Jung

From time course gene expression data, we may identify genes that modulate in a certain pattern across time. Such patterns are useful for investigating the transcriptomic response to a certain condition. In particular, it is of interest to compare two or more conditions to detect gene expression patterns that significantly differ between them. Time course analysis can become difficult using traditional differentially expressed gene (DEG) analysis methods since they are based on pair-wise sample comparison instead of a series of time points. Most importantly, the related tools are mostly available as local software, requiring technical expertise. Here, we present TimesVector-web, which is an easy-to-use web service for analyzing time course gene expression data with multiple conditions. The web service was developed to (1) alleviate the burden of analyzing multi-class time course data and (2) provide downstream analysis on the results for biological interpretation, including TF, miRNA target, gene ontology and pathway analysis. TimesVector-web was validated using three case studies that use both microarray and RNA-seq time course data and showed that the results captured important biological findings from the original studies.


2006 ◽  
Vol 24 (18_suppl) ◽  
pp. 507-507 ◽  
Author(s):  
C. Sotiriou ◽  
P. Wirapati ◽  
S. Loi ◽  
B. Haibe-Kains ◽  
C. Desmedt ◽  
...  

Background: Although the development of high-throughput gene expression technologies has allowed the identification of several “molecular signatures” predicting clinical outcome, no attempt has been made yet to perform a comprehensive analysis integrating both clinicopathological and gene expression data. Here, we aim to elucidate the relationship between clinical parameters and tumor markers, with gene expression patterns and their interaction with prognosis. Methods: We analyzed gene expression and clinical data from several published studies, including more than 1500 BC patients. We developed several gene expression indices associated with different biological stages of disease characterized by the expression of hormone receptors, HER2 amplification, p53 mutation, angiogenesis, tumor invasion and proliferation. Multivariable analyses were used to characterize the dependency patterns between these indices and their impact on survival. Results: Estrogen receptor (ER) and HER2 indices were the most prominent discriminators dichotomizing tumor samples into two main subsets in agreement with the previously proposed BC subtypes. Tumor proliferation, assessed by our previously reported gene expression index (GGI), was most strongly associated with prognosis (HR 2.29, CI 1.88–2.78, p<0.0001). Almost all ER- and HER2+ tumors were associated with high GGI scores. In contrast, ER+ and HER2- tumors showed a whole range of GGI values. Within the high proliferation subset, ER- and HER2+ indices did not have any prognostic value. Similar results were found in relation to the p53 mutation index. Nodal status and tumor size, which essentially measure the duration of disease, retained prognostic value in addition to proliferation. Conclusions: Proliferation captured by the GGI appears to be a key biological factor, downstream of ER, HER2 and p53. Although understanding the upstream factors is important for advancing biological knowledge and therapeutic interventions, GGI seems to be the most important factor predicting clinical outcome in BC and deserves consideration as a stratification factor in clinical trials. No significant financial relationships to disclose.

