BioMethyl: an R package for biological interpretation of DNA methylation data

2019 ◽  
Vol 35 (19) ◽  
pp. 3635-3641 ◽  
Author(s):  
Yue Wang ◽  
Jennifer M Franks ◽  
Michael L Whitfield ◽  
Chao Cheng

Abstract Motivation: The accumulation of publicly available DNA methylation datasets has resulted in the need for tools to interpret the specific cellular phenotypes in bulk tissue data. Current approaches use either single differentially methylated CpG sites or differentially methylated regions that map to genes. However, these approaches may introduce biases in downstream analyses of biological interpretation because of the variability in gene length. There is a lack of approaches to interpret DNA methylation effectively. Therefore, we have developed computational models to provide biological interpretation of relevant gene sets using DNA methylation data in the context of The Cancer Genome Atlas. Results: We illustrate that Biological interpretation of DNA Methylation (BioMethyl) utilizes the complete DNA methylation data for a given cancer type to reflect corresponding gene expression profiles and performs pathway enrichment analyses, providing unique biological insight. Using breast cancer as an example, BioMethyl shows high consistency in the identification of enriched biological pathways from DNA methylation data compared to the results calculated from RNA sequencing data. We find that 12 out of 14 pathways identified by BioMethyl are shared with those identified using RNA-seq data, with a Jaccard score of 0.8 for estrogen receptor (ER) positive samples. For ER negative samples, three pathways are shared between the two enrichments, with a slightly lower similarity (Jaccard score = 0.6). Using BioMethyl, we can successfully identify hidden biological pathways in DNA methylation data when gene expression profiles are lacking. Availability and implementation: The BioMethyl R package is freely available in the GitHub repository (https://github.com/yuewangpanda/BioMethyl). Supplementary information: Supplementary data are available at Bioinformatics online.
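
The Jaccard score quoted above is the size of the intersection of two pathway sets divided by the size of their union. The following is a minimal sketch of that calculation, not code from the BioMethyl package. The pathway names are hypothetical placeholders, and the size of the RNA-seq set (13) is an assumption chosen only because it reproduces the reported figures of 12 shared pathways out of 14 and a score of 0.8; the abstract does not state it.

```python
def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two collections, treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty sets
    return len(a & b) / len(union)


# Hypothetical pathway identifiers, purely illustrative.
methylation_pathways = {f"pathway_{i}" for i in range(1, 15)}  # 14 pathways: 1..14
rnaseq_pathways = {f"pathway_{i}" for i in range(3, 16)}       # 13 pathways: 3..15 (assumed size)

shared = methylation_pathways & rnaseq_pathways
score = jaccard(methylation_pathways, rnaseq_pathways)
print(len(shared), round(score, 2))
```

With these placeholder sets the union holds 15 pathways, so 12 shared pathways give 12 / 15 = 0.8.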

2019 ◽  
Author(s):  
Haiwei Wang ◽  
Xinrui Wang ◽  
Liangpu Xu ◽  
Ji Zhang ◽  
Hua Cao

Abstract Background: For a specific cancer type, the transcriptional profile is determined by the combination of innate transcriptional features of the original normal tissue and the acquired transcriptional characteristics mediated by genomic and epigenetic aberrations during tumor development. However, the classification of innate normal tissue-specific genes and acquired tumor-specific genes has not been studied in a pan-cancer manner. Methods: The innate and acquired gene expression profiles in each tumor type were studied using The Cancer Genome Atlas (TCGA) RNA-seq dataset. The prognostic effects of the tumor-acquired genes were determined using the “survival” package in R. The methylation of the tumor-acquired genes was delineated using TCGA HumanMethylation450 microarray data. Results: 90% of liver hepatocellular carcinoma (LIHC)-specific genes are derived from innate normal liver-specific genes. In contrast, 90.3% of kidney clear cell carcinoma (KIRC)-specific genes and 90.9% of lung squamous cell carcinoma (LUSC)-specific genes are acquired during tumor development. The innate normal tissue-specific genes are downregulated in tumor tissues, while the tumor-acquired specific genes are upregulated in tumor tissues. Both the innate normal tissue-specific genes and the tumor-acquired specific genes are associated with overall survival in some tumor types. DNA hypermethylation of normal tissue-specific genes contributes to the inhibition of their expression in cancer cells, and the tumor-acquired specific genes are activated by DNA hypomethylation and genomic aberrations. Conclusions: Our results describe the specific transcriptional features across cancer types and suggest that the tumor-acquired specific genes are potential targets for anti-cancer therapy.


2020 ◽  
Vol 40 (5) ◽  
Author(s):  
Xinhua Liu ◽  
Yonglin Peng ◽  
Ju Wang

Abstract Breast cancer is a common malignant tumor among women whose prognosis is largely determined by the timing and accuracy of diagnosis. We here propose to identify a robust DNA methylation-based breast cancer-specific diagnostic signature. Genome-wide DNA methylation and gene expression profiles of breast cancer patients along with their adjacent normal tissues from The Cancer Genome Atlas (TCGA) were obtained as the training set. CpGs with significantly higher methylation levels in breast cancer than in adjacent normal tissues, in the other ten common cancers from TCGA, and in healthy breast tissues from the Gene Expression Omnibus (GEO) were retained for logistic regression analysis. Another independent breast cancer DNA methylation dataset from GEO was used as the testing set. Many CpGs were hypermethylated in breast cancer samples compared with adjacent normal tissues, and their methylation tended to be negatively correlated with gene expression. Eight CpGs located at RIIAD1, ENPP2, ESPN, and ETS1 were finally retained. The diagnostic model was reliable in separating BRCA from normal samples. In addition, the chromatin accessibility status of RIIAD1, ENPP2, ESPN, and ETS1 differed greatly between the MCF-7 and MDA-MB-231 cell lines. In conclusion, the present study should be helpful for the early and accurate diagnosis of breast cancer.
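
To make the diagnostic step concrete, the sketch below shows how a fitted logistic regression model turns the methylation (beta) values of eight CpGs into a tumor probability. It is an illustration under stated assumptions, not the authors' model: the weights, the intercept, and the sample values are all hypothetical, since the abstract reports no coefficients.

```python
import math


def predict_probability(beta_values, weights, intercept):
    """Logistic model: p = 1 / (1 + exp(-(intercept + sum(w_i * x_i))))."""
    if len(beta_values) != len(weights):
        raise ValueError("one weight is needed per CpG")
    logit = intercept + sum(w * x for w, x in zip(weights, beta_values))
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical coefficients for eight CpGs; NOT fitted values from the study.
WEIGHTS = [1.0] * 8
INTERCEPT = -4.0

tumor_like = [0.8] * 8   # high methylation at every CpG (illustrative)
normal_like = [0.2] * 8  # low methylation at every CpG (illustrative)

p_tumor = predict_probability(tumor_like, WEIGHTS, INTERCEPT)
p_normal = predict_probability(normal_like, WEIGHTS, INTERCEPT)
print(round(p_tumor, 3), round(p_normal, 3))
```

In practice the coefficients would be estimated on the training set and the resulting probabilities evaluated on the independent testing set, as the abstract describes.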


2018 ◽  
Author(s):  
Hong-Dong Li ◽  
Yunpei Xu ◽  
Xiaoshu Zhu ◽  
Quan Liu ◽  
Gilbert S. Omenn ◽  
...  

Abstract Motivation: Clustering analysis is essential for understanding complex biological data. In widely used methods such as hierarchical clustering (HC) and consensus clustering (CC), expression profiles of all genes are often used to assess similarity between samples for clustering. These methods output sample clusters but cannot indicate which gene sets (functions) contribute most to the clustering, so the interpretability of their results is limited. We hypothesized that integrating prior knowledge of annotated biological processes would not only achieve satisfying clustering performance but also, more importantly, enable potential biological interpretation of clusters. Results: Here we report ClusterMine, a novel approach that identifies clusters by assessing functional similarity between samples through integrating known annotated gene sets, e.g., in Gene Ontology. In addition to outputting the cluster membership of each sample as conventional approaches do, it outputs the gene sets that are most likely to contribute to the clustering, a feature facilitating biological interpretation. Using three cancer datasets, two single cell RNA-sequencing based cell differentiation datasets, one cell cycle dataset and two datasets of cells of different tissue origins, we found that ClusterMine achieved similar or better clustering performance and that top-scored gene sets prioritized by ClusterMine are biologically relevant. Implementation and availability: ClusterMine is implemented as an R package and is freely available at: www.genemine.org/[email protected] Supplementary Information: Supplementary data are available at Bioinformatics online.


2021 ◽  
Author(s):  
Qian Yan ◽  
Baoqian Ye ◽  
Boqing Wang ◽  
Wenjiang Zheng ◽  
Xiongwen Wang

Abstract The purpose of this study is to analyze the DNA methylation and gene expression profiles of immune-related CpG sites to identify the molecular subtypes and CpG sites related to the prognosis of HCC. The DNA methylation and gene expression datasets were downloaded from The Cancer Genome Atlas database, together with immune-related genes downloaded from the immunology database and analysis portal database, to explore the prognostic molecular subtypes of HCC. By performing consistent clustering analysis on 830 immune-related CpG sites, we identified seven subgroups with significant differences in overall survival. Finally, 16 classifiers of immune-related CpG sites were constructed and used in the testing set to verify the prognosis of DNA methylation subgroups, and the results were consistent with the training set. Using the TIMER database, we analyzed the association between the 16 immune-related CpG sites and the abundance of six types of infiltrating immune cells and found that most are positively correlated with the level of infiltration of multiple immune cells in HCC. This study screened potential immune-related prognostic methylation sites and established a new prognosis model of HCC based on DNA methylation molecular subtype, which may help in the early diagnosis of HCC and in developing more effective personalized treatments.


Genes ◽  
2021 ◽  
Vol 12 (6) ◽  
pp. 854
Author(s):  
Yishu Wang ◽  
Lingyun Xu ◽  
Dongmei Ai

DNA methylation is an important regulator of gene expression that can influence tumor heterogeneity and shows weak and varying expression levels among different genes. Gastric cancer (GC) is a highly heterogeneous cancer of the digestive system with a high mortality rate worldwide. The heterogeneous subtypes of GC lead to different prognoses. In this study, we explored the relationships between DNA methylation and gene expression levels by introducing a sparse low-rank regression model based on a GC dataset with 375 tumor samples and 32 normal samples from The Cancer Genome Atlas database. Differences in the DNA methylation levels and sites were found to be associated with differences in the expressed genes related to GC development. Overall, 29 methylation-driven genes were found to be related to the GC subtypes, and in the prognostic model, we explored five prognoses related to the methylation sites. Finally, based on a low-rank matrix, seven subgroups were identified with different methylation statuses. These specific classifications based on DNA methylation levels may help to account for heterogeneity and aid in personalized treatments.


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Ji Li ◽  
Chen Zhu ◽  
Peipei Yue ◽  
Tianyu Zheng ◽  
Yan Li ◽  
...  

Abstract Background Abnormal energy metabolism is one of the characteristics of tumor cells and has been a research hotspot in recent years. Due to the structural complexity of the digestive system, tumors occur there relatively frequently. We aim to clarify the prognostic significance of energy metabolism in digestive system tumors and the underlying mechanisms. Methods The gene set variation analysis (GSVA) R package was used to establish the metabolic score, and the score was used to represent the metabolic level. The relationship between the metabolism and prognosis of digestive system tumors was explored using The Cancer Genome Atlas (TCGA) and Gene Expression Omnibus (GEO) databases. Volcano plots and gene ontology (GO) analyses were used to show the genes and functions that differed between glycolysis levels, and GSEA was used to analyze pathway enrichment. A nomogram was constructed with an R package based on gene characteristics and clinical parameters. qPCR and Western blot were applied to analyze gene expression. All statistical analyses were conducted using SPSS, GraphPad Prism 7, and R software. All validation experiments were performed three times independently. Results A high glycolysis metabolism score was significantly associated with poor prognosis in pancreatic adenocarcinoma (PAAD) and liver hepatocellular carcinoma (LIHC). The STAT3 (signal transducer and activator of transcription 3) and YAP1 (Yes1-associated transcriptional regulator) pathways were the most critical signaling pathways in glycolysis modulation in PAAD and LIHC, respectively. Interestingly, elevated glycolysis levels could also enhance STAT3 and YAP1 activity in PAAD and LIHC cells, respectively, forming a positive feedback loop. Conclusions Our results may provide new insights into the indispensable role of glycolysis metabolism in digestive system tumors and guide the direction of future metabolism–signaling target combined therapy.
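
The idea of a per-sample metabolic score can be sketched with a much simpler stand-in than GSVA: the mean z-score of the genes in a set. This is explicitly not the GSVA algorithm and not the authors' pipeline. The function name, the tiny expression table, and the two-gene glycolysis set are hypothetical and serve only to show how a gene-set score summarizes a pathway's activity in each sample.

```python
from statistics import mean, pstdev


def gene_set_scores(expression, gene_set):
    """Mean z-score of the gene-set members, one score per sample.

    expression: dict mapping gene name -> list of values (one per sample).
    """
    genes = [g for g in gene_set if g in expression]
    if not genes:
        raise ValueError("no gene-set members found in the expression table")
    n_samples = len(next(iter(expression.values())))
    z_rows = []
    for g in genes:
        values = expression[g]
        mu, sd = mean(values), pstdev(values)
        # A gene with zero variance carries no information; give it z = 0.
        z_rows.append([(v - mu) / sd if sd > 0 else 0.0 for v in values])
    return [mean(row[i] for row in z_rows) for i in range(n_samples)]


# Hypothetical expression values for three samples (illustrative only).
expression = {
    "HK2": [1.0, 2.0, 3.0],
    "LDHA": [2.0, 4.0, 6.0],
    "ACTB": [5.0, 5.0, 5.0],  # not in the gene set, so it is ignored
}
glycolysis_set = ["HK2", "LDHA"]  # illustrative subset, not a curated pathway

scores = gene_set_scores(expression, glycolysis_set)
print([round(s, 3) for s in scores])
```

Samples can then be split into high- and low-score groups for survival comparison. A mean z-score is sensitive to outliers in a way that rank-based methods such as GSVA are designed to limit, which is one reason to prefer the latter on real data.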


2021 ◽  
Vol 18 (1) ◽  
Author(s):  
Katherine R. Dobbs ◽  
Paula Embury ◽  
Emmily Koech ◽  
Sidney Ogolla ◽  
Stephen Munga ◽  
...  

Abstract Background Age-related changes in adaptive and innate immune cells have been associated with a decline in effective immunity and chronic, low-grade inflammation. Epigenetic, transcriptional, and functional changes in monocytes occur with aging, though most studies to date have focused on differences between young adults and the elderly in populations with European ancestry; few data exist regarding changes that occur in circulating monocytes during the first few decades of life or in African populations. We analyzed DNA methylation profiles, cytokine production, and inflammatory gene expression profiles in monocytes from young adults and children from western Kenya. Results We identified several hypo- and hyper-methylated CpG sites in monocytes from Kenyan young adults vs. children that replicated findings in the current literature of differential DNA methylation in monocytes from elderly persons vs. young adults across diverse populations. Differentially methylated CpG sites were also noted in gene regions important to inflammation and innate immune responses. Monocytes from Kenyan young adults vs. children displayed increased production of IL-8, IL-10, and IL-12p70 in response to TLR4 and TLR2/1 stimulation as well as distinct inflammatory gene expression profiles. Conclusions These findings complement previous reports of age-related methylation changes in isolated monocytes and provide novel insights into the role of age-associated changes in innate immune functions.


2021 ◽  
Vol 14 (1) ◽  
Author(s):  
Guoliang Jia ◽  
Zheyu Song ◽  
Zhonghang Xu ◽  
Youmao Tao ◽  
Yuanyu Wu ◽  
...  

Abstract Background Bioinformatics was used to analyze the skin cutaneous melanoma (SKCM) gene expression profile to provide a theoretical basis for further studying the mechanism underlying metastatic SKCM and the clinical prognosis. Methods We downloaded the gene expression profiles of 358 metastatic and 102 primary (nonmetastatic) CM samples from The Cancer Genome Atlas (TCGA) database as a training dataset and the GSE65904 dataset from the National Center for Biotechnology Information database as a validation dataset. Differentially expressed genes (DEGs) were screened using the limma package in R 3.4.1, and prognosis-related feature DEGs were screened using logistic regression (LR) and survival analyses. We also used the STRING online database, Cytoscape software, and Database for Annotation, Visualization and Integrated Discovery software for protein–protein interaction network, Gene Ontology, and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analyses based on the screened DEGs. Results Of the 876 DEGs selected, 11 (ZNF750, NLRP6, TGM3, KRTDAP, CAMSAP3, KRT6C, CALML5, SPRR2E, CD3G, RTP5, and FAM83C) were screened using LR analysis. The survival prognosis of the nonmetastatic group was better than that of the metastatic group in both the TCGA training and validation datasets. The 11 DEGs were involved in 9 KEGG signaling pathways, and of these 11 DEGs, CALML5 was a feature DEG involved in the melanogenesis pathway, 12 targets of which were collected. Conclusion The feature DEGs screened, such as CALML5, are related to the prognosis of metastatic CM according to LR. Our results provide new ideas for exploring the molecular mechanism underlying CM metastasis and finding new diagnostic and prognostic markers.


Author(s):  
Martin Pirkl ◽  
Niko Beerenwinkel

Abstract Motivation Cancer is one of the most prevalent diseases in the world. Tumors arise due to important genes changing their activity, e.g. when inhibited or over-expressed. But these gene perturbations are difficult to observe directly. Molecular profiles of tumors can provide indirect evidence of gene perturbations. However, inferring perturbation profiles from molecular alterations is challenging due to error-prone molecular measurements and incomplete coverage of all possible molecular causes of gene perturbations. Results We have developed a novel mathematical method to analyze cancer driver genes and their patient-specific perturbation profiles. We combine genetic aberrations with gene expression data in a causal network derived across patients to infer unobserved perturbations. We show that our method can predict perturbations in simulations, CRISPR perturbation screens and breast cancer samples from The Cancer Genome Atlas. Availability and implementation The method is available as the R-package nempi at https://github.com/cbg-ethz/nempi and http://bioconductor.org/packages/nempi. Supplementary information Supplementary data are available at Bioinformatics online.

