Optimizing the use of gene expression data to predict plant metabolic pathway memberships

2020 ◽  
Author(s):  
Peipei Wang ◽  
Bethany M. Moore ◽  
Sahra Uygun ◽  
Melissa D. Lehti-Shiu ◽  
Cornelius S. Barry ◽  
...  

Abstract
Plant metabolites produced via diverse pathways are important for plant survival, human nutrition, and medicine. However, the pathway memberships of most plant enzyme genes are unknown. While co-expression is useful for assigning genes to pathways, expression correlation may exist only in specific spatiotemporal and conditional contexts. Using more than 600 combinations of expression values and similarity data from tomato, we explored three strategies for predicting membership in 85 pathways: naive prediction (identifying the pathway with the most similarly expressed genes), unsupervised learning, and supervised learning. Optimal predictions for different pathways require distinct data combinations that, in some cases, are indicative of biological processes relevant to pathway functions. Naive prediction produced higher error rates than the machine learning methods. In 52 pathways, unsupervised learning performed better than the supervised approach, possibly because of the limited availability of training data. Furthermore, models using gene-to-pathway expression similarities outperformed those based simply on gene expression levels. Our study highlights the need to explore expression-based features and prediction strategies extensively to maximize the accuracy of metabolic pathway membership assignment. We anticipate that the prediction framework outlined here can be applied to other species and used to improve plant pathway annotation.
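The naive strategy can be made concrete with a short Python sketch. It assigns a gene to the pathway whose member genes are, on average, most similarly expressed to it. The sketch assumes Pearson correlation as the similarity measure and the mean as the gene-to-pathway aggregation. The study compared many expression and similarity combinations, so this is not its exact procedure, and the function names and toy profiles are hypothetical.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length, non-constant expression profiles."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def gene_to_pathway_similarity(profile, member_profiles):
    """Mean correlation between one gene and every annotated member of a pathway."""
    return mean(pearson(profile, member) for member in member_profiles)


def naive_predict(profile, pathways):
    """Return the best-scoring pathway plus the similarity score for every pathway."""
    scores = {
        name: gene_to_pathway_similarity(profile, members)
        for name, members in pathways.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical expression profiles across four conditions.
pathways = {
    "pathway_A": [[1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.5]],
    "pathway_B": [[4.0, 3.0, 2.0, 1.0], [8.0, 6.0, 4.5, 2.0]],
}
query = [1.1, 2.1, 2.9, 4.2]
best, scores = naive_predict(query, pathways)
print(best)  # pathway_A
```

The per-pathway `scores` are gene-to-pathway similarities. Under the same assumptions, they could also serve as input features for a supervised classifier in place of raw expression levels.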

2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Marcelo P. Segura-Lepe ◽  
Hector C. Keun ◽  
Timothy M. D. Ebbels

Abstract
Background: Transcriptomic data are often used to build statistical models that predict a given phenotype, such as disease status. Genes work together in pathways, and it is widely thought that pathway representations will be more robust to noise in gene expression levels. We aimed to test this hypothesis by constructing models based either on genes alone or on sample-specific scores for each pathway, thus transforming the data to a ‘pathway space’. We progressively degraded the raw data by adding noise and examined how well the models maintained predictivity.
Results: Models in the pathway space indeed had higher predictive robustness than models in the gene space. This result was independent of the workflow, parameters, classifier, and data set used. Surprisingly, randomised pathway mappings produced models of similar accuracy and robustness to true mappings, suggesting that the success of pathway space models is not conferred by the specific pathway definitions. However, models built on the true pathway mappings led to prediction rules with fewer influential pathways than those built on randomised pathways. The extent of this effect was used to differentiate pathway collections from a variety of widely used pathway databases.
Conclusions: Prediction models based on pathway scores are more robust to degradation of gene expression information than equivalent models based on ungrouped genes. While models based on true pathway scores are not more robust or accurate than those based on randomised pathways, true pathways produced simpler prediction rules that emphasize a smaller number of pathways.
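The sketch below illustrates the "pathway space" transformation. It assumes a sample-specific pathway score equal to the mean z-scored expression of the pathway's member genes. That is one simple scoring choice; the study compared several workflows, and this is not claimed to be its method. The sketch also shows degradation by additive Gaussian noise and a size-matched randomised mapping used as a null. All names and values are hypothetical.

```python
import random
from statistics import mean, pstdev


def zscore_genes(expr):
    """Standardise each gene across samples; expr maps gene -> list of values."""
    z = {}
    for gene, values in expr.items():
        mu, sd = mean(values), pstdev(values)
        # Constant genes carry no information, so their z-scores are set to zero.
        z[gene] = [(v - mu) / sd if sd > 0 else 0.0 for v in values]
    return z


def pathway_scores(expr, pathways):
    """Sample-specific pathway score: mean z-score of the member genes present."""
    z = zscore_genes(expr)
    n_samples = len(next(iter(z.values())))
    return {
        name: [mean(z[g][s] for g in genes if g in z) for s in range(n_samples)]
        for name, genes in pathways.items()
    }


def add_noise(expr, sigma, rng):
    """Degrade the data by adding Gaussian noise with standard deviation sigma."""
    return {g: [v + rng.gauss(0.0, sigma) for v in vals] for g, vals in expr.items()}


def randomise_mapping(pathways, all_genes, rng):
    """Size-matched random 'pathways', used as a null comparison."""
    return {name: rng.sample(all_genes, len(genes)) for name, genes in pathways.items()}


expr = {"g1": [1.0, 2.0, 3.0], "g2": [2.0, 4.0, 6.0], "g3": [3.0, 2.0, 1.0]}
pathways = {"P1": ["g1", "g2"]}
rng = random.Random(0)

clean = pathway_scores(expr, pathways)
noisy = pathway_scores(add_noise(expr, 0.5, rng), pathways)
null = pathway_scores(expr, randomise_mapping(pathways, sorted(expr), rng))
```

In a robustness experiment, a classifier would be trained on `clean`, `noisy`, and `null` scores at increasing noise levels, and its accuracy compared with that of a gene-level model.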


2019 ◽  
Author(s):  
Anna Mikhaylova ◽  
Timothy Thornton

Abstract
Predicting gene expression from genetic data has garnered significant attention in recent years. PrediXcan, which tests imputed gene expression values for association with a phenotype, is one of the most widely used gene-based association methods, owing to the insight it has provided into the relationship between complex traits and the component of gene expression attributable to genetic variation. However, the PrediXcan prediction models were obtained using supervised machine learning methods and training data from the Depression Genes and Networks (DGN) and Genotype-Tissue Expression (GTEx) data sets, in which most subjects are of European descent. Many genetic studies include samples from multi-ethnic populations, so in this paper we assess the accuracy of PrediXcan gene expression predictions in diverse populations. Using transcriptomic data from the GEUVADIS (Genetic European Variation in Health and Disease) RNA sequencing project and whole-genome sequencing data from the 1000 Genomes Project, we evaluate and compare the predictive performance of PrediXcan in an African population (Yoruban) and four European populations. Predictions are obtained using a range of models from PrediXcan weight databases, and Pearson’s correlation coefficient is used to measure prediction accuracy. We demonstrate that the predictive performance of PrediXcan varies across populations (F-test p-value < 0.001), with the lowest prediction accuracy in the Yoruban sample compared with the European samples. Moreover, the performance of PrediXcan varies not only among distant populations but also among closely related ones. We also find that the qualitative performance of PrediXcan for the populations considered is consistent across all weight databases used.
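PrediXcan imputes a gene's expression as a weighted sum of an individual's allele dosages at nearby (cis) SNPs. The study measures accuracy as the Pearson correlation between predicted and observed expression. The sketch below illustrates both steps. The SNP identifiers, weights, and expression values are invented for illustration and are not taken from any PrediXcan weight database.

```python
from statistics import mean


def predict_expression(dosages, weights):
    """Weighted sum of allele dosages (0, 1 or 2); SNPs without a weight contribute nothing."""
    return sum(weights.get(snp, 0.0) * dose for snp, dose in dosages.items())


def pearson(x, y):
    """Pearson correlation between predicted and observed expression."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical weights for one gene, genotypes for four individuals, observed levels.
weights = {"rs_a": 0.5, "rs_b": -0.2}
genotypes = [
    {"rs_a": 0, "rs_b": 2},
    {"rs_a": 1, "rs_b": 1},
    {"rs_a": 2, "rs_b": 0},
    {"rs_a": 2, "rs_b": 1},
]
observed = [-0.3, 0.2, 1.1, 0.7]

predicted = [predict_expression(g, weights) for g in genotypes]
accuracy = pearson(predicted, observed)
```

Repeating this per population with the same weights, and comparing the resulting correlations, mirrors the cross-population comparison described above.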


2019 ◽  
Author(s):  
Yu-Hao Zhao ◽  
Jin-Peng Wang ◽  
Jia-Qing Yuan ◽  
Jing Li ◽  
Wei-Na Ge ◽  
...  

Abstract
Background: Autopolyploids arise from an increase in genome copy number within a single species, usually through direct doubling of diploid chromosome sets; a polyploid formed by chromosome doubling within the same species is called a homologous polyploid.
Results: To further test whether the tetraploidy shared by the Salicaceae was homologous or heterologous, we performed gene collinearity analysis with grape as the outgroup and examined whether the two sets of poplar chromosomes or chromosomal regions have balanced gene expression levels and similar gene functions. A paired t-test showed that collinear duplicated genes were balanced in expression, as expected if the tetraploid ancestor arose by homologous whole-genome duplication, or autopolyploidization. Moreover, KEGG enrichment analysis and pathway annotation showed that most of the differentially expressed genes were related to metabolism. A comparison of different groups of flowering plants suggests that autopolyploidization may not provide biological and evolutionary vigor comparable to that which established large plant groups such as the Poaceae and Brassicaceae families. The present analysis contributes to understanding the biology and evolution of Salicaceae plants and beyond.
Conclusions: There was no significant difference in gene expression or gene function between the two sets of poplar genomes.
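The paired t-test can be sketched as follows. It compares the two collinear copies of each duplicated gene pair. This is a standard-library sketch that returns only the t statistic and degrees of freedom. The expression values are hypothetical, and the study's exact preprocessing (for example, any transformation of expression values) is not stated in the abstract.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(copy_a, copy_b):
    """Paired t statistic and degrees of freedom for duplicated gene pairs.

    copy_a[i] and copy_b[i] are expression levels of the two collinear copies of pair i.
    """
    diffs = [a - b for a, b in zip(copy_a, copy_b)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


# Hypothetical expression values for four duplicated gene pairs.
t, df = paired_t([5.0, 6.0, 7.0, 8.0], [4.0, 6.0, 6.0, 9.0])
```

A t statistic near zero (and a correspondingly large p-value) is consistent with balanced expression between the two copies. Obtaining the p-value requires the t distribution, for example via `scipy.stats.ttest_rel`, which this stdlib-only sketch omits.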


Genes ◽  
2019 ◽  
Vol 10 (4) ◽  
pp. 319 ◽  
Author(s):  
Bhise ◽  
Elsayed ◽  
Cao ◽  
Pounds ◽  
Lamba

The nucleoside analog cytarabine (ara-C) is the mainstay of acute myeloid leukemia (AML) chemotherapy. Cytarabine and other nucleoside analogs require activation to the triphosphate form (ara-CTP). Intracellular ara-CTP levels show significant inter-patient variation and have been related to therapeutic response in AML patients. Inter-patient variation in the expression levels of drug transporters, or of enzymes involved in the activation or inactivation of cytarabine and other analogs, is a prime mechanism contributing to the development of drug resistance. Since microRNAs (miRNAs) are known to regulate gene expression, the aim of this study was to identify miRNAs involved in regulating messenger RNA expression levels of cytarabine pathway genes. We evaluated miRNA levels and the expression levels of cytarabine metabolic pathway genes in 8 AML cell lines and in The Cancer Genome Atlas (TCGA) database. Using correlation analysis and functional validation experiments, our data demonstrate that miR-34a-5p and miR-24-3p regulate the expression of DCK, an enzyme involved in the activation of cytarabine, and DCTD, an enzyme involved in its metabolic inactivation, respectively. Furthermore, our results from gel shift assays confirmed binding of these mRNA-miRNA pairs. Our results show that miRNA-mediated regulation of nucleoside metabolic pathway genes can impact interindividual variation in their expression levels, which in turn may influence treatment outcomes.
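The correlation step can be sketched as a screen for miRNA-mRNA pairs whose levels are negatively correlated across samples, since a repressing miRNA is expected to show an inverse relationship with its target. The -0.5 threshold, the identifiers, and the values below are hypothetical. In the study, correlated candidates were further tested with functional validation experiments and gel shift assays.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length, non-constant profiles."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def candidate_pairs(mirna_expr, gene_expr, threshold=-0.5):
    """Return (miRNA, gene, r) for pairs with r <= threshold, most negative first."""
    hits = []
    for mir, m_vals in mirna_expr.items():
        for gene, g_vals in gene_expr.items():
            r = pearson(m_vals, g_vals)
            if r <= threshold:
                hits.append((mir, gene, r))
    return sorted(hits, key=lambda hit: hit[2])


# Hypothetical levels in eight cell lines (not data from the study).
mirna_expr = {"miR_X": [1, 2, 3, 4, 5, 6, 7, 8]}
gene_expr = {
    "GENE_1": [8, 7, 6, 5, 4, 3, 2, 1],
    "GENE_2": [1, 3, 2, 4, 3, 5, 4, 6],
}
hits = candidate_pairs(mirna_expr, gene_expr)
```

Correlation alone cannot show direct targeting. That is why a screen like this only nominates pairs for the kind of experimental validation the abstract describes.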


BMC Genomics ◽  
2020 ◽  
Vol 21 (S11) ◽  
Author(s):  
Dan Zhang ◽  
Yan Guo ◽  
Ni Xie

Abstract
Background: Abnormal metabolic pathways are considered one of the hallmarks of cancer. While numerous metabolic pathways have been studied in various cancers, a direct link between metabolic pathway gene expression and cancer prognosis has not been established.
Results: Using two recently developed bioinformatics analysis methods, we evaluated the prognostic potential of metabolic pathway expression and tumor-vs-normal dysregulation for up to 29 metabolic pathways in 33 cancer types. Increased metabolic gene expression within tumors corresponded to poor cancer prognosis. Meta differential co-expression analysis identified four metabolic pathways with significant global co-expression network disturbance between tumor and normal samples. Differential expression analysis of metabolic pathways also demonstrated strong gene expression disturbance between paired tumor and normal samples.
Conclusion: Taken together, these results strongly suggest that metabolic pathway gene expression is disturbed after tumorigenesis. Within tumors, many metabolic pathways are upregulated so that tumor cells can activate the corresponding metabolism and sustain the energy required for cell division.


2021 ◽  
Vol 7 (7) ◽  
pp. eabe1767
Author(s):  
Tatyana E. Saleski ◽  
Meng Ting Chung ◽  
David N. Carruthers ◽  
Azzaya Khasbaatar ◽  
Katsuo Kurabayashi ◽  
...  

Chromosomal integration of recombinant genes is preferable to expression from plasmids because of increased stability, reduced cell-to-cell variability, and no need for antibiotics to maintain plasmids. Here, we present a new approach for tuning pathway gene expression levels via random integration and high-throughput screening. We demonstrate multiplexed gene integration and expression-level optimization for isobutanol production in Escherichia coli. Despite far lower expression levels than with plasmid-based expression, the integrated strains produced high titers (10.0 ± 0.9 g/liter isobutanol in 48 hours) and yields (69% of the theoretical maximum). Close examination of pathway expression in the top-performing isolate and in other isolates reveals the complexity of cellular metabolism and regulation, underscoring the need for precise optimization when integrating pathway genes into the chromosome. We expect that this method for pathway integration and optimization can be readily extended to a wide range of pathways and chassis to create robust and efficient production strains.
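As a worked check of what "69% of the theoretical maximum" implies, the sketch below assumes glucose as the carbon source. It also assumes the usual keto-acid route, in which one glucose yields at most one isobutanol (C6H12O6 → C4H10O + 2 CO2 + H2O). The abstract states neither the substrate nor the stoichiometry, so both are assumptions here.

```python
# Standard molar masses in g/mol.
M_GLUCOSE = 180.16
M_ISOBUTANOL = 74.12


def theoretical_yield_g_per_g(mol_isobutanol_per_mol_glucose=1.0):
    """Maximum mass yield of isobutanol on glucose for an assumed molar stoichiometry."""
    return mol_isobutanol_per_mol_glucose * M_ISOBUTANOL / M_GLUCOSE


def fraction_of_theoretical(observed_g_per_g):
    """Observed mass yield expressed as a fraction of the theoretical maximum."""
    return observed_g_per_g / theoretical_yield_g_per_g()


max_yield = theoretical_yield_g_per_g()  # about 0.411 g isobutanol per g glucose
implied_yield = 0.69 * max_yield         # about 0.284 g/g
```

Under these assumptions, a yield of 69% of the theoretical maximum corresponds to roughly 0.28 g of isobutanol per gram of glucose consumed.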


2021 ◽  
Author(s):  
Pavel V. Mazin ◽  
Philipp Khaitovich ◽  
Margarida Cardoso-Moreira ◽  
Henrik Kaessmann

Abstract
Alternative splicing (AS) is pervasive in mammalian genomes, yet cross-species comparisons have been largely restricted to adult tissues and the functionality of most AS events remains unclear. We assessed AS patterns across pre- and postnatal development of seven organs in six mammals and a bird. Our analyses revealed that developmentally dynamic AS events, which are especially prevalent in the brain, are substantially more conserved than nondynamic ones. Cassette exons with increasing inclusion frequencies during development show the strongest signals of conserved and regulated AS. Newly emerged cassette exons are typically incorporated late in testis development, but those retained during evolution are predominantly brain specific. Our work suggests that an intricate interplay of programs controlling gene expression levels and AS is fundamental to organ development, especially for the brain and heart. In these regulatory networks, AS affords substantial functional diversification of genes through the generation of tissue- and time-specific isoforms from broadly expressed genes.
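The inclusion frequency of a cassette exon is commonly summarized as percent spliced in (PSI). The sketch below uses a simple junction-read estimator and a range-based rule for calling an event developmentally dynamic. Both the estimator and the 0.2 cutoff are assumptions for illustration, not the study's exact procedure, and the read counts are hypothetical.

```python
def psi(inclusion_reads, exclusion_reads):
    """Percent spliced in (as a fraction) for a cassette exon from junction read counts.

    Inclusion reads come from two junctions (upstream exon to cassette exon and
    cassette exon to downstream exon), so they are halved to be comparable with the
    single skipping junction. Returns None when there are no informative reads.
    """
    inclusion = inclusion_reads / 2
    total = inclusion + exclusion_reads
    return None if total == 0 else inclusion / total


def is_dynamic(psi_by_stage, min_range=0.2):
    """Call an event developmentally dynamic if its PSI range across stages is large."""
    values = [p for p in psi_by_stage if p is not None]
    return len(values) >= 2 and max(values) - min(values) >= min_range


# Hypothetical (inclusion, exclusion) junction counts at three developmental stages.
counts = [(4, 18), (20, 10), (36, 2)]
stages = [psi(inc, exc) for inc, exc in counts]
increasing = all(a < b for a, b in zip(stages, stages[1:]))
```

A PSI that rises across stages, as in this toy example, corresponds to the "increasing inclusion frequencies" pattern the abstract associates with the strongest conservation signals.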


Genes ◽  
2021 ◽  
Vol 12 (6) ◽  
pp. 854
Author(s):  
Yishu Wang ◽  
Lingyun Xu ◽  
Dongmei Ai

DNA methylation is an important regulator of gene expression that can influence tumor heterogeneity, and its effects on expression levels are weak and vary among genes. Gastric cancer (GC) is a highly heterogeneous cancer of the digestive system with a high mortality rate worldwide, and its heterogeneous subtypes lead to different prognoses. In this study, we explored the relationships between DNA methylation and gene expression levels by introducing a sparse low-rank regression model, applied to a GC dataset of 375 tumor samples and 32 normal samples from The Cancer Genome Atlas database. Differences in DNA methylation levels and sites were associated with differences in expressed genes related to GC development. Overall, 29 methylation-driven genes were found to be related to the GC subtypes, and in the prognostic model we explored five prognosis-related methylation sites. Finally, based on a low-rank matrix, seven subgroups with different methylation statuses were identified. These DNA methylation-based classifications may help to account for heterogeneity and aid in personalized treatment.

