Multi-view Clustering for the Integration Analysis of Gene Expression and Methylation Data

Author(s):  
Xiaowei Gao ◽  
Xiaogang Liu ◽  
Xiaoke Ma
2020 ◽  
Author(s):  
Zengyan Zong ◽  
Dayang Chen ◽  
Wei Wu ◽  
Xiaowen Dou ◽  
Mengmeng Wang ◽  
...  

Abstract Background: The pathogenesis of nasopharyngeal carcinoma (NPC) is very complicated. The present study aimed to identify candidate genes as biomarkers for NPC diagnosis and pathogenesis. Methods: Three microarray datasets (GSE53819, GSE64634 and GSE12452) and a methylation array (GSE52068) were re-analyzed. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses of the differentially expressed genes (DEGs) were performed. STRING was used to construct a protein-protein interaction (PPI) network of the DEGs, which was visualized with Cytoscape. A Random Forest (RF) algorithm was used to construct classifiers and identify key genes. Results: A total of 91 DEGs were screened from the three datasets. GO term and KEGG pathway analysis suggested that the DEGs were predominantly enriched in the drug metabolism-cytochrome P450 pathway, the metabolism of xenobiotics by cytochrome P450 pathway, the chemical carcinogenesis pathway, ciliary part, motile cilium, axoneme, microtubule and ciliary plasm. We obtained nine hub genes and one significant module. We constructed a classifier based on the DEGs and found that CLIC6 and CLU have the best classification ability. Finally, five hypermethylated and downregulated (hyper-down) genes were identified by integrating methylation data. Conclusions: By integrating gene expression and methylation data, several key genes were identified that may be potential biomarkers for NPC diagnosis and pathogenesis.
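
The final integration step in this abstract (finding genes that are both hypermethylated and downregulated) can be pictured as a set intersection. The following is only a minimal sketch under stated assumptions: the function name, the cutoffs and the toy values are hypothetical and are not taken from the study, which does not report its thresholds here.

```python
def hyper_down_genes(expr_log_fc, meth_delta_beta, fc_cutoff=-1.0, beta_cutoff=0.2):
    """Return genes that are both downregulated and hypermethylated.

    expr_log_fc:     {gene: log2 fold change, tumor vs. control}
    meth_delta_beta: {gene: methylation difference, tumor minus control}
    Both cutoffs are illustrative assumptions, not values from the abstract.
    """
    down = {g for g, fc in expr_log_fc.items() if fc <= fc_cutoff}
    hyper = {g for g, d in meth_delta_beta.items() if d >= beta_cutoff}
    return sorted(down & hyper)


# Toy values invented for illustration only (placeholder gene names).
expr = {"GENE_A": -2.1, "GENE_B": 0.4, "GENE_C": -1.5, "GENE_D": -0.2}
meth = {"GENE_A": 0.35, "GENE_B": 0.30, "GENE_C": 0.05, "GENE_D": 0.40}

print(hyper_down_genes(expr, meth))
```

In this toy input only one gene passes both filters; a real analysis would also need per-gene significance testing and a mapping from CpG probes to genes, which the sketch leaves out.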


Circulation ◽  
2017 ◽  
Vol 135 (suppl_1) ◽  
Author(s):  
Xiaoling Wang ◽  
Yue Pan ◽  
Haidong Zhu ◽  
Guang Hao ◽  
Xin Wang ◽  
...  

Background: Several large-scale epigenome-wide association studies on obesity-related DNA methylation changes have been published and in total identified 46 CpG sites. These studies were conducted in middle-aged and older Caucasian and African American (AA) adults using leukocytes. To what extent these signals are independent of cell composition, and to what extent they may influence gene expression, has not been systematically investigated. Furthermore, the high prevalence of obesity comorbidities in middle-aged and older populations may mask or bias the DNA methylation changes related to obesity itself. Methods: In this study of healthy AA youth and young adults, genome-wide DNA methylation data from leukocytes were obtained from three independent studies using the Infinium HumanMethylation450 BeadChip: the EpiGO study (96 obese cases vs. 92 lean controls, aged 14-21, 50% females, test of interest is obesity status), the LACHY study (284 participants from the general population, aged 14-18, 50% females, test of interest is BMI), and the Georgia Stress and Heart study (298 participants from the general population, aged 18-38, 52% females, test of interest is BMI). Genome-wide DNA methylation data from purified neutrophils, as well as genome-wide gene expression data from leukocytes using the Illumina HT12 V4 array, were also obtained for the EpiGO samples. Results: The meta-analysis of the 3 cohorts identified 76 obesity-related CpG sites in leukocytes with p<1×10⁻⁷. Out of the 46 previously identified CpG sites, 36 could be replicated in this AA youth and young adult sample with the same direction and p<0.05. Out of the 107 CpG sites, including the 36 replicated ones and the 71 newly identified ones, 71 CpG sites (66%) had their relationship with obesity replicated in purified neutrophils (p<0.05). 
The analysis of cis regulation of gene expression by the 107 CpG sites showed that 59 CpG sites had at least one gene within 250 kb with an expression difference between obese cases and lean controls. Furthermore, out of the 59 CpG sites, 6 showed significantly negative correlations and 1 showed a significantly positive correlation with the differentially expressed genes. These CpG sites were located in the SOCS3, CISH, ABCG1, PIM3 and PTGDS genes. Conclusion: In this study of AA youth and young adults, we identified novel CpG sites associated with obesity and replicated the majority of the CpG sites previously identified in middle-aged and older adults. For the first time, we showed that the majority of the obesity-related CpG sites identified from leukocytes are not driven by cell composition, and we provided a direct link between DNA methylation, gene expression and obesity status for 7 CpG sites in 5 genes.
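
The cis analysis described above pairs each CpG site with genes within 250 kb and then checks the sign of the methylation-expression correlation. A rough sketch of those two steps follows. It assumes that a gene is represented by a single coordinate (for example its transcription start site) and uses a plain Pearson correlation; neither choice is stated in the abstract, and all names and numbers are hypothetical.

```python
from math import sqrt

WINDOW_BP = 250_000  # cis window from the abstract (250 kb)


def genes_in_cis(cpg, genes, window=WINDOW_BP):
    """List genes on the same chromosome within `window` bp of a CpG site.

    cpg:   (chromosome, position)
    genes: {gene name: (chromosome, position)}; one coordinate per gene
           is a simplifying assumption of this sketch.
    """
    chrom, pos = cpg
    return sorted(
        name
        for name, (g_chrom, g_pos) in genes.items()
        if g_chrom == chrom and abs(g_pos - pos) <= window
    )


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(sxx * syy)


# Toy coordinates and measurements, invented for illustration only.
cpg = ("chr1", 1_000_000)
genes = {
    "GENE_A": ("chr1", 1_200_000),  # 200 kb away: inside the window
    "GENE_B": ("chr1", 1_300_000),  # 300 kb away: outside the window
    "GENE_C": ("chr2", 1_000_000),  # different chromosome
}
methylation = [0.1, 0.2, 0.3, 0.4]  # per-sample methylation at the CpG
expression = [8.0, 6.0, 4.0, 2.0]   # per-sample expression of GENE_A

print(genes_in_cis(cpg, genes))
print(pearson(methylation, expression) < 0)  # negative: methylation up, expression down
```

A negative coefficient corresponds to the "negative correlation" case in the abstract; testing whether it is significant would need a p-value, which is omitted here.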


2016 ◽  
Vol 2016 ◽  
pp. 1-7 ◽  
Author(s):  
Jun Li ◽  
Siyuan Li ◽  
Ying Hu ◽  
Guolei Cao ◽  
Siyao Wang ◽  
...  

Objective. We investigated the expression levels of FOSL2 mRNA and protein and evaluated DNA methylation in the blood of Uyghur patients with type 2 diabetes mellitus (T2DM) from Xinjiang. This study also evaluated whether FOSL2 gene expression was associated with clinical and biochemical indicators of T2DM. Methods. One hundred Uyghur subjects were divided into two groups, T2DM and nonimpaired glucose tolerance (NGT). DNA methylation of FOSL2 was analyzed by MassARRAY spectrometry, and methylation data of individual units were generated by the EpiTyper v1.0.5 software. The mRNA and protein expression levels of FOS-like antigen 2 (FOSL2) were analyzed. Results. Significant differences in mRNA and protein levels were observed in the T2DM group compared with the NGT group, while methylation rates of eight CpG units within the FOSL2 gene were higher in the T2DM group. Methylation of CpG sites was found to inversely correlate with expression of other markers. Conclusions. The results show that mRNA, protein, and DNA methylation of the FOSL2 gene are correlated among Uyghur T2DM patients. FOSL2 protein and mRNA were downregulated and the DNA was hypermethylated, all of which may be involved in T2DM pathogenesis in this population.


2016 ◽  
Vol 2016 ◽  
pp. 1-10 ◽  
Author(s):  
Xindong Zhang ◽  
Lin Gao ◽  
Zhi-Ping Liu ◽  
Songwei Jia ◽  
Luonan Chen

As smoking rates decrease, proportionally more cases of lung adenocarcinoma occur in never-smokers, while aberrant DNA methylation has been suggested to contribute to the tumorigenesis of lung adenocarcinoma. It is extremely difficult to distinguish, among a large number of differentially methylated genes, which genes play key roles in tumorigenic processes via DNA methylation-mediated gene silencing. By integrating gene expression and DNA methylation data, a pipeline combined with differential network analysis is designed to uncover driver methylation genes and responsive modules, which demonstrate distinctive expression and network topology in tumors with aberrant DNA methylation. In total, 135 genes are recognized as candidate driver genes in early-stage lung adenocarcinoma, and the top-ranked 30 genes are recognized as driver methylation genes. Functional annotation and the differential network analysis indicate the roles of the identified driver genes in tumorigenesis, while a literature study reveals significant correlations of the top 30 genes with early-stage lung adenocarcinoma in never-smokers. The analysis pipeline can also be employed to identify driver epigenetic events in other cancers characterized by matched gene expression and DNA methylation data.


Blood ◽  
2012 ◽  
Vol 120 (21) ◽  
pp. 2395-2395
Author(s):  
Karen Dybkaer ◽  
Magdalena Julia Dabrowska ◽  
Ditte Ejegod ◽  
Louise Berkhoudt Lassen ◽  
Hans Erik Johnsen ◽  
...  

Abstract 2395 Human T-cell lymphoblastic lymphomas (T-LBLs) are neoplasms of immature T-cells and constitute a group of rare, heterogeneous and clinically very aggressive tumors. The molecular pathogenesis that contributes to T-LBL development is not fully elucidated. Since murine T-LBLs are histopathologically and phenotypically comparable to human T-LBLs, mouse models of T-LBLs are ideal to obtain additional insight into the mechanism of T-LBL development in humans. When injected into newborn mice of the NMRI inbred strain, the SL3-3 murine leukemia virus (MLV) induces various types of hematological malignancies, including T-LBLs. The oncogenic effects of the SL3-3 MLV are caused by integration of the viral genome into the host cell DNA through multiple rounds of infection, and subsequent deregulation of nearby cellular genes, a process defined as insertional mutagenesis. If the integration occurs near or within a gene of importance for cancer development, the cell in which the virus has integrated may gain a growth advantage, eventually leading to malignant transformation and development of a full-blown tumor. Screening the murine genome for the resulting integration sites in the end-stage tumors is therefore an efficient method for identifying genes involved in murine and potentially also human T-cell lymphomagenic processes. In a search for genes and pathways implicated in T-LBL development, we used a murine lymphoma model in which mice of the NMRI inbred strain were inoculated with mutants of SL3-3 MLV. The mutants were affected in the glucocorticoid response element and an overlapping E-box of the viral enhancer in the long terminal repeat. By performing integration analysis on 19 and global gene expression profiling on 22 of the resulting T-LBL tumors, we determined both the effect of the retroviral integrations on the summarized expression of the nearby genes, and the deregulated pathways in the tumors. 
Fifty-two different genes were identified within a 10 kb distance of the retroviral integrations, of which 15 were specifically involved in G1/S phase transition. Gene expression dot-plots showed an activating effect of the retrovirus on Mr1, Stx6, Cask and Sh3gl3. Gene expression profiling identified increased expression of genes involved in the minichromosome maintenance (Mcm) and origin of recognition (Orc) pathway as well as downregulation of negative regulators of G1/S transition, indicating that murine T-LBLs have increased S-phase initiation. In conclusion, both the integration analysis and the patterns of mRNA expression identified by gene expression profiling in the mouse models of T-LBL strongly indicate that genes involved in G1/S phase transition and/or S-phase initiation are deregulated, suggesting that similar mechanisms are of importance in human T-LBL pathogenesis. Disclosures: No relevant conflicts of interest to declare.


2014 ◽  
Vol 11 (2) ◽  
pp. 1-14 ◽  
Author(s):  
Markus List ◽  
Anne-Christin Hauschild ◽  
Qihua Tan ◽  
Torben A. Kruse ◽  
Jan Baumbach ◽  
...  

Summary Selecting the most promising treatment strategy for breast cancer crucially depends on determining the correct subtype. In recent years, gene expression profiling has been investigated as an alternative to histochemical methods. Since databases like TCGA provide easy and unrestricted access to gene expression data for hundreds of patients, the challenge is to extract a minimal optimal set of genes with good prognostic properties from a large bulk of genes making a moderate contribution to classification. Several studies have successfully applied machine learning algorithms to solve this so-called gene selection problem. However, more diverse data from other OMICS technologies are available, including methylation. We hypothesize that combining methylation and gene expression data could already lead to a largely improved classification model, since the resulting model will reflect differences not only on the transcriptomic, but also on an epigenetic level. We compared so-called random forest derived classification models based on gene expression and methylation data alone, to a model based on the combined features and to a model based on the gold standard PAM50. We obtained bootstrap errors of 10-20% and classification error of 1-50%, depending on breast cancer subtype and model. The gene expression model was clearly superior to the methylation model, which was also reflected in the combined model, which mainly selected features from gene expression data. However, the methylation model was able to identify unique features not considered as relevant by the gene expression model, which might provide deeper insights into breast cancer subtype differentiation on an epigenetic level.


Genes ◽  
2019 ◽  
Vol 10 (8) ◽  
pp. 571 ◽  
Author(s):  
Ze-Jia Cui ◽  
Xiong-Hui Zhou ◽  
Hong-Yu Zhang

Achieving cancer prognosis and molecular typing is critical for cancer treatment. Previous studies have identified some gene signatures for the prognosis and typing of cancer based on gene expression data. Some studies have shown that DNA methylation is associated with cancer development, progression, and metastasis. In addition, DNA methylation data are more stable than gene expression data in cancer prognosis. Therefore, in this work, we focused on DNA methylation data. Some prior studies have shown that gene modules are more reliable in cancer prognosis than are gene signatures and that gene modules are not isolated. However, few studies have considered cross-talk among the gene modules, which may allow some important gene modules for cancer to be overlooked. Therefore, we constructed a gene co-methylation network based on the DNA methylation data of cancer patients, and detected the gene modules in the co-methylation network. Then, by permutation testing, cross-talk between every two modules was identified; thus, the module network was generated. Next, the core gene modules in the module network of cancer were identified using the K-shell method, and these core gene modules were used as features to study the prognosis and molecular typing of cancer. Our method was applied to three types of cancer (breast invasive carcinoma, skin cutaneous melanoma, and uterine corpus endometrial carcinoma). Based on the core gene modules identified from the constructed DNA methylation module networks, we can not only distinguish the prognosis of cancer patients but also perform molecular typing of cancer. These results indicate that our method has important application value for the diagnosis of cancer and may reveal potential carcinogenic mechanisms.
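
The K-shell step mentioned in this abstract can be illustrated with the standard k-shell (k-core) decomposition: nodes of the lowest remaining degree are pruned repeatedly, and the nodes that survive longest form the core. The sketch below applies it to a small, made-up module network. It assumes an undirected network given as a symmetric adjacency mapping; the module names are placeholders, and the authors' actual implementation may differ.

```python
def k_shell(adjacency):
    """Assign each node its k-shell index by iterative degree pruning.

    adjacency: {node: iterable of neighbours}, assumed symmetric
               (undirected network). Returns {node: shell index}.
    """
    adj = {n: set(nb) for n, nb in adjacency.items()}
    shell = {}
    k = 0
    while adj:
        # Nodes whose remaining degree is at most k belong to shell k.
        low = [n for n, nb in adj.items() if len(nb) <= k]
        if not low:
            k += 1  # nothing left to prune at this level
            continue
        for n in low:
            for nb in adj.pop(n):
                if nb in adj:
                    adj[nb].discard(n)  # removal lowers neighbours' degree
            shell[n] = k
    return shell


def core_nodes(adjacency):
    """Return the nodes in the innermost (highest-index) shell."""
    shell = k_shell(adjacency)
    top = max(shell.values())
    return sorted(n for n, s in shell.items() if s == top)


# Toy module network: a triangle (M1, M2, M3), a pendant module M4
# attached to M1, and an isolated module M5. Invented for illustration.
modules = {
    "M1": ["M2", "M3", "M4"],
    "M2": ["M1", "M3"],
    "M3": ["M1", "M2"],
    "M4": ["M1"],
    "M5": [],
}

print(k_shell(modules))
print(core_nodes(modules))
```

Here the triangle ends up in the innermost shell, while the pendant and isolated modules fall into lower shells; in the abstract's workflow, the innermost modules would be the candidates used as features.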

