Integrative analysis of gene expression and DNA methylation using unsupervised feature extraction for detecting candidate cancer biomarkers

2018, Vol 16 (02), pp. 1850006
Author(s): Myungjin Moon, Kenta Nakai

Currently, cancer biomarker discovery is an important research topic worldwide. In particular, detecting significant cancer-related genes is an important task for the early diagnosis and treatment of cancer. Conventional studies mostly focus on genes that are differentially expressed in different states of cancer; however, noise in gene expression datasets and insufficient information in limited datasets impede precise analysis of novel candidate biomarkers. In this study, we propose an integrative analysis of gene expression and DNA methylation using normalization and unsupervised feature extraction to identify candidate cancer biomarkers in renal cell carcinoma RNA-seq datasets. Gene expression and DNA methylation datasets are normalized by Box–Cox transformation and integrated, by unsupervised feature extraction methods, into a one-dimensional dataset that retains the major characteristics of the original datasets; differentially expressed genes are then selected from the integrated dataset. The integrated dataset yielded improved performance compared with conventional approaches that use gene expression or DNA methylation datasets alone. Literature-based validation showed that a considerable number of top-ranked genes from the integrated dataset have known relationships with cancer, implying that novel candidate biomarkers can also be obtained with the proposed method. Furthermore, we expect that the proposed method can be extended to various types of multi-omics datasets.
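The normalize-then-compress idea can be illustrated with a small Python sketch. It is not the authors' pipeline: it assumes a fixed Box–Cox λ per data type (the λ values below are arbitrary; in practice λ is usually estimated), assumes principal component analysis on the two standardized features as the unsupervised feature extraction step (the abstract does not name the methods used), and treats one gene's paired expression and methylation values across samples as the input. All data values are toy numbers.

```python
import math
from statistics import mean, pstdev

def box_cox(x, lam):
    """Box–Cox transform of a strictly positive value x with parameter lam."""
    if x <= 0:
        raise ValueError("Box–Cox requires strictly positive input")
    return math.log(x) if lam == 0 else (x ** lam - 1) / lam

def standardize(values):
    """Center to mean 0 and scale to (population) standard deviation 1."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]

def first_pc_2d(a, b):
    """Project paired, already-centered features onto their leading principal axis.

    Returns one integrated value per sample. A principal axis has arbitrary
    sign; here it is fixed so that the first feature gets a non-negative weight.
    """
    n = len(a)
    saa = sum(x * x for x in a) / n
    sbb = sum(y * y for y in b) / n
    sab = sum(x * y for x, y in zip(a, b)) / n
    tr, det = saa + sbb, saa * sbb - sab * sab
    lam1 = tr / 2 + math.sqrt(max(tr * tr / 4 - det, 0.0))  # largest eigenvalue
    if abs(sab) > 1e-12:
        vx, vy = lam1 - sbb, sab  # solves (C - lam1 * I) v = 0
    else:
        vx, vy = (1.0, 0.0) if saa >= sbb else (0.0, 1.0)
    norm = math.hypot(vx, vy)
    return [(vx * x + vy * y) / norm for x, y in zip(a, b)]

# Toy values for one gene across five samples (hypothetical).
expression = [5.0, 12.0, 30.0, 80.0, 200.0]   # e.g. normalized read counts
methylation = [0.90, 0.75, 0.50, 0.30, 0.10]  # e.g. beta values in (0, 1)

expr_bc = standardize([box_cox(v, 0.0) for v in expression])
meth_bc = standardize([box_cox(v, 0.5) for v in methylation])
integrated = first_pc_2d(expr_bc, meth_bc)  # one value per sample
```

Because expression and methylation are anticorrelated in this toy gene, the leading axis weights them with opposite signs. The single integrated value then rises with expression and falls with methylation, and a downstream differential test can be run on that one dimension.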

2021
Author(s): Qingxian Li, Weiyi Cai, Jianhong Chen, Jie Ning

Abstract Background: DNA methylation has been reported as one of the most critical epigenetic aberrations during the tumorigenesis and development of papillary thyroid carcinoma (PTC). Although PTC has been explored in gene expression and DNA methylation studies, the regulatory effect of methylation on gene expression remains poorly understood. Results: In this study, comparisons between PTC and normal tissue (NT) revealed 4995 differentially methylated probes and 1446 differentially expressed transcripts, cross-validated with The Cancer Genome Atlas (TCGA) database. Integrative analysis of DNA methylation and gene expression revealed 123 genes with hypomethylation/overexpression correlations and 29 genes with hypermethylation/downregulation correlations. Seven candidate promoter CpG sites (A: UNC80-cg04507925; B: TPO-cg09757588; C: LHX8-cg11842415; D: DLG2-cg16986720; E: FOXJ1-cg20373432; F: PALM2-cg21204870; G: IPCEF1-cg24635109) were preliminarily identified with least absolute shrinkage and selection operator (LASSO) regression analysis. A prognostic risk model was then constructed by stepwise regression analysis. Furthermore, receiver operating characteristic (ROC) analysis and a nomogram built on the verified independent prognostic factors showed that the model accurately predicted 3-, 5-, and 7-year survival. Kaplan–Meier survival estimates showed that low DLG2 expression and DLG2-cg16986720 hypermethylation were independent biomarkers for overall survival (OS). In a comprehensive meta-analysis, the combined standardized mean difference (SMD) of DLG2 was 0.94 (95% CI: 0.46, 1.43), indicating that DLG2 expression was lower in PTC tissue than in normal tissue (P < 0.05). Bisulfite sequencing PCR also showed that DLG2 methylation was higher in the tumor group than in the normal group.
The components of the immune microenvironment were analyzed using TIMER, and the correlation between immune cells and DLG2 differed across cancer types. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses implicated DLG2 in pathways involved in immunity, metabolism, cancer, and infectious diseases. PTC patients with DLG2-cg16986720 hypermethylation had significantly shorter progression-free survival, concomitant with reduced infiltration of myeloid dendritic cells. Conclusions: The current study confirmed that DLG2 is expressed at low levels in PTC. More importantly, DLG2 hypermethylation might serve as a latent tumor biomarker for prognosis prediction in PTC. The results of these bioinformatics analyses may offer a new approach to investigating the pathogenesis of PTC. DNA methylation loss in non-promoter, CpG island (CGI)-poor, and enhancer-enriched regions was a significant event in PTC. In addition to the promoter region, gene body and 3′ UTR methylation can also influence gene expression levels, either repressing or inducing them. The integrative analysis revealed genes potentially regulated by DNA methylation, pointing to potential drivers and biomarkers of PTC development.
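A combined SMD with a 95% CI, as reported for DLG2 above, is the kind of summary produced by inverse-variance meta-analysis. The Python sketch below shows a fixed-effect version under stated assumptions: the abstract does not say whether a fixed- or random-effects model was used, the per-study effect size is assumed to be Hedges' g, and the study-level summary statistics are made up for illustration, not taken from the paper.

```python
import math

def hedges_g(m1, s1, n1, m2, s2, n2):
    """Hedges' g (group 1 minus group 2) and its approximate sampling variance."""
    df = n1 + n2 - 2
    sp = math.sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df)  # pooled SD
    d = (m1 - m2) / sp
    j = 1 - 3 / (4 * df - 1)  # small-sample bias correction
    var_d = (n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2))
    return j * d, j ** 2 * var_d

def pooled_smd_fixed(effects, variances, z=1.96):
    """Fixed-effect inverse-variance pooled SMD with a normal-approximation CI."""
    if not effects or len(effects) != len(variances):
        raise ValueError("need one variance per effect size")
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    pooled = sum(w * g for w, g in zip(weights, effects)) / total
    se = math.sqrt(1.0 / total)
    return pooled, (pooled - z * se, pooled + z * se)

# Made-up (mean, SD, n) for three hypothetical datasets: normal tissue first, tumor second.
studies = [
    ((8.1, 1.2, 30), (7.0, 1.3, 35)),
    ((7.5, 1.0, 20), (6.8, 1.1, 22)),
    ((9.0, 1.5, 50), (7.9, 1.4, 48)),
]
per_study = [hedges_g(*normal, *tumor) for normal, tumor in studies]
pooled, (ci_low, ci_high) = pooled_smd_fixed(
    [g for g, _ in per_study], [v for _, v in per_study]
)
```

With normal tissue as group 1, a positive pooled value means lower expression in tumor tissue, which appears to be the sign convention behind the reported 0.94. A random-effects model would add a between-study variance term to each weight and typically widen the interval.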


Genes, 2021, Vol 12 (9), pp. 1442
Author(s): Y-H. Taguchi, Turki Turki

Analysis of single-cell multiomics datasets is a new and considerably challenging topic because such datasets contain a large number of features with numerous missing values. In this study, we implemented a recently proposed tensor-decomposition (TD)-based unsupervised feature extraction (FE) technique to address this difficult problem. The technique can successfully integrate single-cell multiomics data composed of gene expression, DNA methylation, and accessibility. Although the latter two have as many as ten million dimensions, only a few percent of which are nonzero, TD-based unsupervised FE can integrate the three omics datasets without imputing missing values. Combined with UMAP, which is frequently used to embed single-cell measurements in two-dimensional space, TD-based unsupervised FE produces a two-dimensional embedding that agrees with the classification when integrating single-cell omics datasets. Genes selected by TD-based unsupervised FE are also significantly associated with biologically reasonable roles.


2021, Vol 14 (1)
Author(s): Xiao-Liang Xing, Zhi-Yong Yao, Chaoqun Xing, Zhi Huang, Jing Peng, ...

Abstract Background: Colorectal cancer (CRC) is the second most prevalent cancer, accounting for approximately 10% of all cancers diagnosed annually. Studies have indicated that DNA methylation is involved in carcinogenesis. The purpose of this study was to investigate the relationships among DNA methylation, gene expression, and the tumor-immune microenvironment of CRC and, finally, to identify potential key genes related to immune cell infiltration in CRC. Methods: In the present study, we used the ChAMP and DESeq2 packages, correlation analyses, and Cox regression analyses to identify immune-related differentially expressed genes (IR-DEGs) correlated with aberrant methylation and to construct a risk assessment model. Results: HSPA1A expression and CCRL2 expression were positively and negatively associated with the CRC risk score, respectively. The high-risk group was positively correlated with some types of tumor-infiltrating immune cells and negatively correlated with others. After the patients were regrouped by the median risk score, they could be more effectively distinguished by survival outcome, clinicopathological characteristics, specific tumor-immune infiltration status, and highly expressed immune-related biomarkers. Conclusion: This study suggests that a risk assessment model constructed by pairing immune-related differentially expressed genes correlated with aberrant DNA methylation could predict the outcome of CRC patients and might help identify patients who could benefit from antitumor immunotherapy.
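Cox-based risk models of this kind commonly score each patient as a linear combination of model-gene expression weighted by regression coefficients and then split the cohort at the median score. The Python sketch below assumes that form. The coefficients are hypothetical placeholders whose signs merely match the reported positive (HSPA1A) and negative (CCRL2) associations; they are not the values fitted in the study, and the patient data are invented.

```python
from statistics import median

# Hypothetical Cox coefficients; the fitted values are not given in the abstract.
COEFFICIENTS = {"HSPA1A": 0.5, "CCRL2": -0.4}

def risk_score(expression, coefficients=COEFFICIENTS):
    """Linear predictor: sum of coefficient * expression over the model genes."""
    return sum(beta * expression[gene] for gene, beta in coefficients.items())

def split_by_median(patients, coefficients=COEFFICIENTS):
    """Label each patient 'high' or 'low' risk relative to the cohort median score.

    Patients whose score equals the median are placed in the low-risk group.
    """
    scores = {pid: risk_score(expr, coefficients) for pid, expr in patients.items()}
    cutoff = median(scores.values())
    groups = {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}
    return groups, cutoff

# Invented expression values for four patients.
patients = {
    "P1": {"HSPA1A": 8.0, "CCRL2": 2.0},
    "P2": {"HSPA1A": 3.0, "CCRL2": 6.0},
    "P3": {"HSPA1A": 6.0, "CCRL2": 3.0},
    "P4": {"HSPA1A": 2.0, "CCRL2": 7.0},
}
groups, cutoff = split_by_median(patients)
```

Here P1 and P3 fall in the high-risk group. The resulting labels are what downstream survival comparisons (for example, Kaplan–Meier curves per group) and immune-infiltration comparisons would be stratified on.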


Polymers, 2021, Vol 13 (23), pp. 4117
Author(s): Y-h. Taguchi, Turki Turki

Developing medical applications for substances or materials that come into contact with cells is important. Hence, it is necessary to elucidate how substances surrounding cells affect gene expression during incubation. In the current study, we compared the gene expression profiles of cell lines in contact with a collagen–glycosaminoglycan mesh with those of control cells. Principal component analysis-based unsupervised feature extraction was applied to identify genes whose expression changed during incubation in the treated cell lines but not in the controls. The identified genes were enriched in various biological terms. Our method also outperformed a conventional methodology, namely, gene selection based on linear regression against time.


2021, Vol 11 (1)
Author(s): Kota Fujisawa, Mamoru Shimo, Y.-H. Taguchi, Shinya Ikematsu, Ryota Miyata

Abstract: Coronavirus disease 2019 (COVID-19) is raging worldwide. This potentially fatal infectious disease is caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). However, the complete mechanism of COVID-19 is not well understood. Therefore, we analyzed gene expression profiles of COVID-19 patients to identify disease-related genes through an innovative machine learning method that enables a data-driven strategy for gene selection from a data set with a small number of samples and many candidates. Principal-component-analysis-based unsupervised feature extraction (PCAUFE) was applied to the RNA expression profiles of 16 COVID-19 patients and 18 healthy control subjects. From 60,683 candidate probes, PCAUFE identified 123 genes, including immune-related genes, as critical for COVID-19 progression. These 123 genes were enriched in binding sites for the transcription factors NFKB1 and RELA, which are involved in various biological phenomena such as immune response and cell survival; the primary mediator of canonical nuclear factor-kappa B (NF-κB) activity is the RelA–p50 heterodimer. The genes were also enriched in the histone modification H3K36me3, and they largely overlapped the target genes of NFKB1 and RELA. We found that the overlapping genes were downregulated in COVID-19 patients. These results suggest that canonical NF-κB activity was suppressed by H3K36me3 in the blood of COVID-19 patients.
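The gene-selection step of PCA-based unsupervised feature extraction is commonly described as follows: each gene's score u on a chosen principal component is assumed to be Gaussian under the null, a P-value P = P_χ²[> (u/σ)²] with one degree of freedom is attached to each gene, the P-values are corrected with the Benjamini–Hochberg procedure, and genes with adjusted P below a threshold (often 0.01) are kept. The Python sketch below implements only that step. It assumes the PCA has already been run and the relevant component chosen, and the scores are toy values rather than data from the study.

```python
import math
from statistics import pstdev

def benjamini_hochberg(pvalues):
    """Benjamini–Hochberg adjusted P-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    for rank in range(m, 0, -1):  # walk from the largest P-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def pcaufe_select(scores, threshold=0.01):
    """Select genes whose PC score is an outlier under a Gaussian null.

    scores: one value per gene on a principal component chosen beforehand
    (e.g. one whose sample loadings separate patients from controls).
    """
    sigma = pstdev(scores)
    # (u / sigma)^2 ~ chi-square(1) under the null, so P = erfc(|u / sigma| / sqrt(2)).
    pvalues = [math.erfc(abs(u / sigma) / math.sqrt(2)) for u in scores]
    adjusted = benjamini_hochberg(pvalues)
    selected = [i for i, q in enumerate(adjusted) if q < threshold]
    return selected, adjusted

# Toy scores: 200 genes near zero plus two clear outliers (indices 200 and 201).
scores = [0.1 * ((i % 7) - 3) for i in range(200)] + [8.0, -9.0]
selected, adjusted = pcaufe_select(scores)
print(selected)  # [200, 201]
```

Because σ is estimated from all genes, including the outliers, this test is conservative. In the COVID-19 study, the same kind of thresholding reduced 60,683 probes to the 123 reported genes.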


2018, Vol 234 (7), pp. 11942–11950
Author(s): Gan‐xun Li, Ze‐yang Ding, Yu‐wei Wang, Tong‐tong Liu, Wei‐xun Chen, ...

2020, Vol 14, pp. 117793222090616
Author(s): Badreddine Nouadi, Yousra Sbaoui, Mariame El Messal, Faiza Bennis, Fatima Chegdani

Nowadays, the integration of biological data is a major challenge in bioinformatics. Many studies have examined gene expression in the intestinal epithelial tissue of full-term, breastfed infants, generating a large amount of data. Integrating these data is important for understanding the biological processes involved in bacterial colonization of the newborn intestine, particularly through breast milk. This work aims to use bioinformatics approaches to provide a new representation and interpretation of the interactions between differentially expressed genes in the host intestine induced by the microbiota.


2020, Vol 11
Author(s): Yan Zhang, Dianjing Guo

Gastric adenocarcinoma (GC) is one of the most common malignant tumors worldwide, yet GC and its prognosis are still poorly understood. Various genetic and epigenetic factors have been implicated in GC carcinogenesis. However, a comprehensive and in-depth investigation of epigenetic alterations in gastric cancer is still missing. In this study, we systematically investigated key epigenetic features in GC, including DNA methylation and five core histone modifications. Data from The Cancer Genome Atlas Program and other studies (Gene Expression Omnibus) were collected, analyzed, and validated with multivariate statistical analysis methods. The landscape of epi-modifications in gastric cancer was described. Chromatin state transition analysis using a hidden Markov model (HMM)-based approach showed a shift in histone marks in the GC genome, indicating that histone marks tend to label different sets of genes in GC compared with controls. Integrated analysis with gene expression data revealed an additive effect of these epigenetic marks, suggesting that epigenetic modifications may cooperatively regulate gene expression. However, the effect of DNA methylation was more significant in the absence of the five histone modifications in our study. By constructing a protein–protein interaction (PPI) network, we identified key genes that distinguish GC from normal samples and revealed distinct patterns of oncogenic pathways in GC. Some of these genes can also serve as potential biomarkers for classifying GC molecular subtypes. Our results provide important insights into epigenetic regulation in gastric cancer and in other cancers in general. This study describes the aberrant epigenetic variation pattern in GC and provides potential directions for epigenetic biomarker discovery.

