E.PAGE: A curated database and enrichment tool to predict modules associated with gene-environment interactions

2022 ◽  
Author(s):  
Sachin Muralidharan ◽  
Farah Zahir ◽  
Ahmed M. Mehdi

Aims/hypothesis: The purpose of this study is to manually and semi-automatically curate a database and develop an R package that will act as a comprehensive resource for understanding how biological processes are dysregulated by interactions with environmental factors. Methods: We followed a two-step process to achieve the objectives of this study. First, we conducted a systematic review of existing gene expression datasets to identify the integrated genomic and environmental factors used in available studies. This enabled us to curate a comprehensive genomic-environmental database for four key environmental factors (smoking, diet, infections and toxic chemicals) associated with various autoimmune and chronic conditions. Second, we developed a statistical analysis package that allows users to understand the relationships between differentially expressed genes and environmental factors under different disease conditions. Results: The initial database search run on the Gene Expression Omnibus (GEO) and the Molecular Signature Database (MSigDB) retrieved a total of 90,018 articles. After title and abstract screening against pre-set criteria, a total of 186 studies were selected. From those, 243 individual sets of genes, or gene modules, were obtained. We then curated a database containing four environmental factors, namely cigarette smoking, diet, infections and toxic chemicals, along with a total of 25,789 genes that had an association with one or more of these factors. In six case studies, the database and statistical analysis package were then tested with lists of differentially expressed genes obtained from the published literature related to type 1 diabetes, rheumatoid arthritis, small cell lung cancer, cobalt exposure, COVID-19 and smoking. On testing, we uncovered statistically enriched biological processes, which could help us understand the pathways associated with environmental factors and gene modules.
Conclusions: A novel curated database and software tool is provided as an R Package. Users can enter a list of genes to discover associated environmental factors under various disease conditions.
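
The abstract above does not say which statistical test the package uses. As a rough illustration of how a gene list can be tested for enrichment against a curated gene module, the sketch below uses a one-sided hypergeometric test, a common choice for this kind of analysis and an assumption here. The function name, the gene identifiers and the universe size are all hypothetical and do not come from the E.PAGE package.

```python
from math import comb

def hypergeom_enrichment_p(query_genes, module_genes, universe_size):
    """One-sided hypergeometric p-value P(X >= k) for the overlap between
    a query gene list and a gene module.

    universe_size is the number of background genes (an assumption the
    caller must supply); both gene sets are assumed to lie inside it.
    """
    query = set(query_genes)
    module = set(module_genes)
    k = len(query & module)          # observed overlap
    n = len(query)                   # genes drawn (query list size)
    big_k = len(module)              # "successes" in the universe
    total = comb(universe_size, n)   # all ways to draw n genes
    upper = min(n, big_k)
    tail = sum(
        comb(big_k, i) * comb(universe_size - big_k, n - i)
        for i in range(k, upper + 1)
    )
    return tail / total

# Toy usage with hypothetical gene identifiers and a universe of 10 genes.
p = hypergeom_enrichment_p({"G1", "G2"}, {"G1", "G2", "G3"}, universe_size=10)
print(round(p, 4))
```

In a real analysis the p-values obtained across many modules would also need a multiple-testing correction.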

2020 ◽  
Vol 14 ◽  
pp. 117793222090616
Author(s):  
Badreddine Nouadi ◽  
Yousra Sbaoui ◽  
Mariame El Messal ◽  
Faiza Bennis ◽  
Fatima Chegdani

Nowadays, the integration of biological data is a major challenge for bioinformatics. Many studies have examined gene expression in the intestinal epithelial tissue of breastfed infants born at term, generating a large amount of data. Integrating these data is important for understanding the biological processes involved in bacterial colonization of the newborn's intestine, particularly through breast milk. This work aims to use bioinformatics approaches to provide a new representation and interpretation of the interactions between differentially expressed genes in the host intestine induced by the microbiota.


2020 ◽  
Author(s):  
Wei Han ◽  
Guo-liang Shen

Abstract Background: Skin cutaneous melanoma (SKCM) is an aggressive malignant cancer that can be derived directly from melanocytic nevi. However, the molecular mechanisms underlying the malignant transformation of melanocytes and melanoma tumor progression remain unclear. A growing body of research has shown significant roles for epigenetic modifications, especially DNA methylation, in melanoma. This study focused on the identification and analysis of methylation-regulated differentially expressed genes (MeDEGs) between melanocytic nevus and malignant melanoma in genome-wide profiles. Methods: The gene expression profiling datasets (GSE3189 and GSE114445) and gene methylation profiling datasets (GSE86355 and GSE120878) were downloaded from the Gene Expression Omnibus (GEO) database. Differentially expressed genes (DEGs) and differentially methylated genes (DMGs) were identified via GEO2R. MeDEGs were obtained by integrating the DEGs and DMGs. Then, functional enrichment analysis of the MeDEGs was performed. STRING and Cytoscape were used to describe the protein-protein interaction (PPI) network. Furthermore, survival analysis was implemented to select the prognostic hub genes. Finally, we conducted gene set enrichment analysis (GSEA) of the hub genes. Results: We identified 237 hypomethylated, upregulated genes and 182 hypermethylated, downregulated genes. Hypomethylation-upregulated genes were enriched in the biological processes of oxidation-reduction process, cell proliferation, cell division, phosphorylation, extracellular matrix disassembly and protein sumoylation. Pathway enrichment showed selenocompound metabolism, small cell lung cancer and lysosome. Hypermethylation-downregulated genes were enriched in the biological processes of positive regulation of transcription from RNA polymerase II promoter; cell adhesion; cell proliferation; positive regulation of transcription, DNA-templated; and angiogenesis.
The most significantly enriched pathways involved transcriptional misregulation in cancer, circadian rhythm, tight junction, protein digestion and absorption and the Hippo signaling pathway. After PPI network construction and survival analysis, seven prognostic hub genes were identified: CKS2, DTL, KIF2C, KPNA2, MYBL2, TPX2 and FBL. Moreover, the most involved hallmarks obtained by GSEA were E2F targets, G2M checkpoint and mitotic spindle. Conclusions: Our study identified potential aberrantly methylated differentially expressed genes participating in the malignant transformation from nevus to melanoma tissues based on comprehensive genomic profiles. Transcription profiles of CKS2, DTL, KIF2C, KPNA2, MYBL2, TPX2 and FBL provided clues to aberrant methylation-based biomarkers, which might improve the development of precision medicine.
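
The integration step described above (combining DEGs and DMGs into MeDEGs) is not specified in detail in the abstract. One plausible reading, sketched here as an assumption, is a pair of set intersections: hypomethylated genes that are upregulated, and hypermethylated genes that are downregulated. The function name and the gene identifiers are hypothetical placeholders.

```python
def integrate_medegs(up_degs, down_degs, hypo_dmgs, hyper_dmgs):
    """Intersect expression and methylation gene sets by direction.

    Returns the two MeDEG classes named in the abstract:
    hypomethylated-upregulated and hypermethylated-downregulated genes.
    """
    up, down = set(up_degs), set(down_degs)
    hypo, hyper = set(hypo_dmgs), set(hyper_dmgs)
    return {
        "hypo_up": sorted(up & hypo),        # less methylation, more expression
        "hyper_down": sorted(down & hyper),  # more methylation, less expression
    }

# Toy usage with placeholder gene names (not results from the study).
medegs = integrate_medegs(
    up_degs=["GENE_A", "GENE_B", "GENE_C"],
    down_degs=["GENE_D", "GENE_E"],
    hypo_dmgs=["GENE_B", "GENE_C", "GENE_E"],
    hyper_dmgs=["GENE_A", "GENE_D"],
)
print(medegs)
```

Genes whose methylation and expression change in the same direction (for example GENE_A and GENE_E in the toy input) fall outside both classes and are dropped.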


2021 ◽  
Author(s):  
Cailin Xue ◽  
Peng Gao ◽  
Xudong Zhang ◽  
Xiaohan Cui ◽  
Lei Jin ◽  
...  

Abstract Background: Abnormal methylation of DNA sequences plays an important role in the development and progression of pancreatic cancer (PC). The purpose of this study was to identify abnormally methylated genes and related signaling pathways in PC by comprehensive bioinformatic analysis of three datasets in the Gene Expression Omnibus (GEO). Methods: Datasets of gene expression microarrays (GSE91035, GSE15471) and gene methylation microarrays (GSE37480) were downloaded from the GEO database. Aberrantly methylated differentially expressed genes (DEGs) were analyzed using GEO2R. GO and KEGG enrichment analyses of selected genes were performed using the DAVID database. A protein–protein interaction (PPI) network was constructed with STRING and visualized in Cytoscape. Core module analysis was performed with MCODE in Cytoscape. Hub genes were obtained using the CytoHubba app in Cytoscape. Results: A total of 267 hypomethylation-high expression genes were obtained, which were enriched in the biological processes of cell adhesion, biological adhesion and regulation of signaling. KEGG pathway enrichment showed ECM-receptor interaction, focal adhesion and the PI3K-Akt signaling pathway. The top five hub genes of the PPI network were EZH2, CCNA2, CDC20, KIF11 and UBE2C. As for hypermethylation-low expression genes, 202 genes were identified, which were enriched in biological processes such as cellular amino acid biosynthetic process and positive regulation of PI3K activity. The enriched pathways included pancreatic secretion and biosynthesis of amino acids. The five significant hub genes were DLG3, GPT2, PLCB1, CXCL12 and GNG7. In addition, five genes significantly associated with patient prognosis were identified: CCNA2, KIF11, UBE2C, PLCB1 and GNG7.
Conclusion: Novel genes with abnormal expression were identified, which will help us further understand the molecular mechanism and related signaling pathways of PC; these aberrant genes could possibly serve as biomarkers for precise diagnosis and treatment of PC.


2021 ◽  
Vol 19 (1) ◽  
Author(s):  
Guanyi Wang ◽  
Yibin Jia ◽  
Yuqin Ye ◽  
Enming Kang ◽  
Huijun Chen ◽  
...  

Abstract Background: Posterior fossa ependymoma (EPN-PF) can be classified into Group A posterior fossa ependymoma (EPN-PFA) and Group B posterior fossa ependymoma (EPN-PFB) according to DNA CpG island methylation profile status and gene expression. EPN-PFA usually occurs in children younger than 5 years and has a poor prognosis. Methods: Using epigenome and transcriptome microarray data, a multi-component weighted gene co-expression network analysis (WGCNA) was used to systematically identify the hub genes of EPN-PF. We downloaded two microarray datasets (GSE66354 and GSE114523) from the Gene Expression Omnibus (GEO) database. The limma R package was used to identify differentially expressed genes (DEGs), and the ChAMP R package was used to analyze the differentially methylated genes (DMGs) between EPN-PFA and EPN-PFB. GO and KEGG enrichment analyses were performed using the Metascape database. Results: GO analysis showed that the genes were significantly enriched in extracellular matrix organization, adaptive immune response, membrane raft, focal adhesion, the NF-kappa B pathway, and axon guidance, as also suggested by KEGG analysis. Through WGCNA, we found that MEblue had a significant correlation with EPN-PF (R = 0.69, P = 1 × 10^-8) and selected the 180 hub genes in the blue module. By comparing the DEGs, DMGs, and hub genes in the co-expression network, we identified five hypermethylated, lower expressed genes in EPN-PFA (ATP4B, CCDC151, DMKN, SCN4B, and TUBA4B), and three of them were confirmed by IHC. Conclusion: ssGSEA and GSVA analysis indicated that these five hub genes could lead to poor prognosis by inducing the hypoxia, PI3K-Akt-mTOR, and TNFα-NFKB pathways. Further study of these dysmethylated hub genes in EPN-PF and the pathways they participate in may provide new ideas for EPN-PF treatment.


2003 ◽  
Vol 12 (2) ◽  
pp. 159-162 ◽  
Author(s):  
Chiara Romualdi ◽  
Stefania Bortoluzzi ◽  
Fabio d’Alessi ◽  
Gian Antonio Danieli

Here we present a novel web tool for the statistical analysis of gene expression data in multiple tag sampling experiments. Differentially expressed genes are detected by using six different test statistics. Result tables, linked to the GenBank, UniGene, or LocusLink database, can be browsed or searched in different ways. Software is freely available at the site: http://telethon.bio.unipd.it/bioinfo/IDEG6_form/ , together with additional information on statistical methodologies.


2018 ◽  
Author(s):  
Adam McDermaid ◽  
Brandon Monier ◽  
Jing Zhao ◽  
Qin Ma

Abstract Differential gene expression (DGE) is one of the most common applications of RNA-sequencing (RNA-seq) data. This process allows for the elucidation of differentially expressed genes (DEGs) across two or more conditions. Interpretation of the DGE results can be non-intuitive and time-consuming due to the variety of formats based on the tool of choice and the numerous pieces of information provided in these results files. Here we present an R package, ViDGER (Visualization of Differential Gene Expression Results using R), which contains nine functions that generate information-rich visualizations for the interpretation of DGE results from three widely used tools: Cuffdiff, DESeq2, and edgeR.


2021 ◽  
Author(s):  
Guanyi Wang ◽  
Yibin Jia ◽  
Yuqing Ye ◽  
Enming Kang ◽  
Huijun Chen ◽  
...  

Abstract Background: Posterior fossa ependymoma (EPN-PF) can be classified into Group A posterior fossa ependymoma (EPN-PFA) and Group B posterior fossa ependymoma (EPN-PFB) according to DNA CpG island methylation profile status and gene expression. EPN-PFA usually occurs in children younger than 5 years and has a poor prognosis. Methods: Using epigenome and transcriptome microarray data, a multi-component weighted gene co-expression network analysis (WGCNA) was used to systematically identify the hub genes of EPN-PF. We downloaded two microarray datasets (GSE66354 and GSE114523) from the Gene Expression Omnibus (GEO) database. The limma R package was used to identify differentially expressed genes (DEGs), and the ChAMP R package was used to analyze the differentially methylated genes (DMGs) between EPN-PFA and EPN-PFB. GO and KEGG enrichment analyses were performed using the Metascape database. Results: GO analysis showed that the genes were significantly enriched in extracellular matrix organization, adaptive immune response, membrane raft, focal adhesion, the NF-kappa B pathway, and axon guidance, as also suggested by KEGG analysis. Through WGCNA, we found that MEblue had a significant correlation with EPN-PF (R = 0.69, P = 1 × 10^-8) and selected the 180 hub genes in the blue module. By comparing the DEGs, DMGs, and hub genes in the co-expression network, we identified five hypermethylated, lower expressed genes in EPN-PFA (ATP4B, CCDC151, DMKN, SCN4B, and TUBA4B), and three of them were confirmed by IHC. Conclusion: ssGSEA and GSVA analysis indicated that these five hub genes could lead to poor prognosis by inducing the hypoxia, PI3K-Akt-mTOR, and TNFα-NFKB pathways. Further study of these dysmethylated hub genes in EPN-PF and the pathways they participate in may provide new ideas for EPN-PF treatment.


2007 ◽  
Vol 35 (04) ◽  
pp. 609-620 ◽  
Author(s):  
Liping Yang ◽  
Miqu Wang ◽  
Wei Wu ◽  
Louxin Zhang

Microarrays are widely used to study changes in gene expression in diseases. In this paper, we use this technology to discover gene expression patterns in the cold syndrome in Chinese medicine. We identify differentially expressed genes and extract gene modules that are enriched with differentially expressed genes in the cold syndrome by analyzing cDNA samples purified from blood taken from a pedigree. Our results suggest that the cold syndrome might be caused by a physiological imbalance and/or a disorder of metabolite processes. The study confirms the hypotheses about the molecular pathways responsible for human metabolism-related diseases.


2021 ◽  
Author(s):  
Yanzhou Zhang ◽  
Qing Zhu ◽  
Xiufeng Cao ◽  
Bin Ni

Abstract Background and objective: Esophageal cancer (ESCA) ranks eleventh in incidence and eighth in mortality among malignant tumors worldwide. Because effective early diagnostic approaches are lacking, many patients miss the optimal treatment window and are already at an advanced stage at first diagnosis. Continuous advances in high-throughput sequencing technologies and analytical techniques have provided novel concepts and approaches for the study of cancer biomarkers in esophageal cancer. Cancer development is a complex biological process involving multiple genes, interacting factors and multiple stages. This process includes mutations in proto-oncogenes, changes in transcript expression profiles, and abnormalities of protein structure, function, or expression levels. Studying the molecular mechanism of ESCA with high-throughput sequencing technology will lay a theoretical foundation for the early diagnosis and targeted therapy of ESCA. Materials and methods: A search was conducted in two commonly used public databases, UCSC XENA and GEO; one UCSC XENA RNA-seq dataset and two GEO datasets were included. Differential expression analysis was performed using limma in R. Weighted gene co-expression network analysis (WGCNA) was used to analyze the transcriptome expression profile of 181 ESCA tissues and 181 normal tissues (controls) and to construct a topology network. We constructed gene modules and searched for modules closely associated with ESCA, and gene ontology (GO) and KEGG pathway enrichment analyses were performed to probe the functions of the DEGs and of the differentially expressed hub genes in key modules. By combining the results of differential gene expression analysis with the WGCNA results (hub genes), we obtained 30 differentially expressed genes in the module closely associated with ESCA.
Next, we extracted the expression data of these genes from the normalized transcriptome expression data to construct an ESCA predictive model. Ten-fold cross-validation combined with machine learning algorithms was then used to construct prediction models for ESCA. Finally, we verified the four screened biomarkers used to build the predictive model against the GEO datasets. Results: Differential expression analysis was conducted using the limma package, and differentially expressed genes were defined by |log2FC| > 1 and adj.P.Val < 0.01. According to the limma results, 15,814 genes were up-regulated and 6,176 genes were down-regulated in ESCA. WGCNA identified 7 gene modules, 2 of which were strongly correlated with ESCA (brown module: R2 = 0.87; lightcyan module: R2 = -0.75; both P < 0.001). The brown module was closely related to ESCA. Combining the WGCNA results with the differentially expressed genes revealed 4,419 differentially expressed genes in the brown module that were closely related to ESCA.
Thirty hub genes were selected as the top 30 by kWithin in the brown module, and all of them were differentially expressed. GO analysis of the differentially expressed genes from the brown module revealed that these genes are associated with cellular components such as immunoglobulin complex; “chromosome, centromeric region”; condensed chromosome; “immunoglobulin complex, circulating”; “condensed chromosome, centromeric region”; and other components. They participate in molecular functions such as antigen binding, immunoglobulin receptor binding, ATPase activity, cadherin binding and DNA helicase activity, and are involved in biological processes such as adaptive immune response based on somatic recombination of immune receptors built from immunoglobulin superfamily domains, mitotic nuclear division, lymphocyte mediated immunity, nuclear division, and DNA replication. KEGG pathway analysis showed that the differentially expressed genes in the brown module are mainly enriched in signaling pathways such as cell cycle, pathogenic Escherichia coli infection, DNA replication, IL-17 signaling pathway and human T-cell leukemia virus 1 infection. This sheds new light on the molecular mechanisms of ESCA development. Twelve ESCA prediction models, constructed from the expression matrix of the 30 genes in 362 subjects using 10-fold cross-validation combined with machine learning algorithms, showed good prediction performance in the validation dataset; among them, models from the gbm, BoostGLM and C5.0 algorithms achieved higher accuracy than those from other algorithms. Although the transparent or semi-transparent models constructed by the JRip, PART and Rpart algorithms had acceptable accuracy in the validation dataset, their sensitivity was lower. Overall, two black-box models, gbm and BoostGLM, were selected as the final models.
This study successfully constructed ESCA prediction models with accuracies higher than 0.97. Finally, three of the four screened biomarkers were validated. Conclusions: In the current study, differential expression analysis and WGCNA of ESCA-related RNA-seq data available in public databases were used to screen DEGs and genes closely associated with ESCA. Results from GO and KEGG analysis further revealed the underlying mechanisms of ESCA. Normalized gene expression data were fed to several machine learning techniques, and 10-fold cross-validation was used to construct high-accuracy ESCA predictive models. Eventually, several ESCA predictive models with accuracy higher than 0.96 in the validation group were constructed. In addition, three biomarkers (G3BP1, CHEK1 and MOB1A) were screened and validated; in particular, G3BP1 may be a potential therapeutic target, as overall survival analysis showed it to be an adverse prognostic factor. The current study lays the basis for applying RNA-seq data to the early genetic diagnosis of ESCA and provides a prognostic marker that might contribute to the treatment of ESCA.
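
The DEG definition quoted in this abstract (|log2FC| > 1 and adj.P.Val < 0.01) can be illustrated with a short filter. This is only a sketch, assuming the limma results have been exported to rows with `gene`, `logFC` and `adj.P.Val` fields. The function name and the example rows are hypothetical, and the authors' own analysis was done in R.

```python
def filter_degs(rows, lfc_cutoff=1.0, padj_cutoff=0.01):
    """Split result rows into up- and down-regulated gene lists.

    Each row is a dict with 'gene', 'logFC' (log2 fold change) and
    'adj.P.Val' (adjusted p-value). A gene is kept only if
    |logFC| > lfc_cutoff and adj.P.Val < padj_cutoff.
    """
    up, down = [], []
    for row in rows:
        if row["adj.P.Val"] < padj_cutoff and abs(row["logFC"]) > lfc_cutoff:
            (up if row["logFC"] > 0 else down).append(row["gene"])
    return up, down

# Toy rows with placeholder gene names and made-up statistics.
rows = [
    {"gene": "GENE_A", "logFC": 2.3, "adj.P.Val": 0.0004},
    {"gene": "GENE_B", "logFC": -1.8, "adj.P.Val": 0.002},
    {"gene": "GENE_C", "logFC": 0.6, "adj.P.Val": 0.0001},  # effect too small
    {"gene": "GENE_D", "logFC": 3.1, "adj.P.Val": 0.2},     # not significant
]
up, down = filter_degs(rows)
print(up, down)
```

Both conditions must hold, so a gene with a large fold change but a weak adjusted p-value is excluded, as is a highly significant gene with a small effect.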


2009 ◽  
Vol 52 (1) ◽  
pp. 65-78
Author(s):  
A. Hartmann ◽  
G. Nuernberg ◽  
D. Repsilber ◽  
P. Janczyk ◽  
C. Walz ◽  
...  

Abstract. Global gene expression studies using microarray technology are widely employed to identify biological processes that are influenced by a treatment (e.g., a specific diet). Affected processes are characterized by a significant enrichment of differentially expressed genes (functional annotation analysis). However, different choices of statistical thresholds to select candidates for differential expression will alter the resulting candidate list. This study was conducted to investigate the effect of applying a False Discovery Rate (FDR) correction and different fold change thresholds in statistical analysis of microarray data on diet-affected biological processes based on a significantly increased proportion of differentially expressed genes. In a model feeding experiment with rats fed genetically modified food additives, animals received a supplement of either lyophilized inactivated recombinant VP60 baculovirus (rBV-VP60) or lyophilized inactivated wild type baculovirus (wtBV). Comparative expression profiling was done in spleen, liver and small intestine mucosa. We demonstrated the extent to which threshold choice can affect the biological processes identified as significantly regulated and thus the conclusions drawn from the microarray data. In our study, the combined application of a moderate fold change threshold (FC≥1.5) and a stringent FDR threshold (q≤0.05) exhibited high reliability of biological processes identified as differentially regulated. The application of a stringent FDR threshold of q≤0.05 seems to be an essential prerequisite for considerably reducing the number of false positives. Microarray results of selected differentially expressed molecules were validated successfully by using real-time RT-PCR.
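
The combined thresholding described above (FC≥1.5 together with q≤0.05) can be sketched as follows. The abstract does not name the FDR procedure, so the Benjamini-Hochberg adjustment used here is an assumption, as is treating the fold change as a ratio that qualifies in either direction (≥1.5 or ≤1/1.5). Function names and example values are hypothetical.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def select_candidates(genes, fold_changes, pvalues, fc_cutoff=1.5, q_cutoff=0.05):
    """Keep genes passing both the fold change and the FDR threshold.

    fold_changes are ratios (treatment / control); a gene qualifies if the
    ratio is >= fc_cutoff or <= 1 / fc_cutoff (assumed two-sided rule).
    """
    qvalues = bh_adjust(pvalues)
    return [
        gene
        for gene, fc, q in zip(genes, fold_changes, qvalues)
        if q <= q_cutoff and (fc >= fc_cutoff or fc <= 1.0 / fc_cutoff)
    ]

# Toy usage with made-up values.
genes = ["g1", "g2", "g3", "g4"]
print(select_candidates(genes, [2.0, 1.2, 0.5, 1.6], [0.01, 0.04, 0.03, 0.005]))
```

Relaxing either cutoff changes the candidate list and therefore the biological processes that appear enriched downstream, which is the sensitivity the study examines.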

