A Linear Regression and Deep Learning Approach for Detecting Reliable Genetic Alterations in Cancer Using DNA Methylation and Gene Expression Data

Genes ◽  
2020 ◽  
Vol 11 (8) ◽  
pp. 931 ◽  
Author(s):  
Saurav Mallik ◽  
Soumita Seth ◽  
Tapas Bhadra ◽  
Zhongming Zhao

DNA methylation changes have been useful for cancer biomarker discovery, classification, and potential treatment development. Existing methods use either differentially methylated CpG sites or combined CpG sites (differentially methylated regions) that can be mapped to genes. However, such methylation signal mapping has limitations. To address these limitations, we introduced a combinatorial framework that uses linear regression, differential expression analysis, and a deep learning method for accurate biological interpretation of DNA methylation by integrating DNA methylation data with the corresponding TCGA gene expression data. We demonstrated it for uterine cervical cancer. First, we pre-filtered outliers from the data set and then determined the predicted gene expression values from the pre-filtered methylation data through linear regression. We identified differentially expressed genes (DEGs) with the empirical Bayes test in Limma. We then applied a deep learning method, “nnet”, to classify the cervical cancer labels from those DEGs and computed classification metrics, including accuracy and area under the curve (AUC), through 10-fold cross-validation. We applied our approach to a uterine cervical cancer DNA methylation dataset (NCBI accession ID: GSE30760, 27,578 features covering 63 tumor and 152 matched normal samples). After linear regression and differential expression analysis, we obtained 6287 DEGs with false discovery rate (FDR) < 0.001. After deep learning analysis, we obtained an average classification accuracy of 90.69% (±1.97%) for the uterine cervical cancer labels. This performance is better than that of other peer methods. We performed in-degree and out-degree hub gene network analysis using Cytoscape and reported the five top in-degree genes (PAIP2, GRWD1, VPS4B, CRADD and LLPH) and the five top out-degree genes (MRPL35, FAM177A1, STAT4, ASPSCR1 and FABP7). We then performed KEGG pathway and Gene Ontology enrichment analysis of the DEGs using the WebGestalt tool (WEB-based Gene SeT AnaLysis Toolkit). In summary, our proposed framework, which integrates linear regression, differential expression analysis, and deep learning, provides a robust approach to better interpret DNA methylation and gene expression data in disease studies.
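The first two stages of the pipeline described in this abstract can be illustrated with a minimal Python sketch. It rests on assumptions not stated in the abstract: a separate one-predictor least-squares fit per gene, and Benjamini-Hochberg adjustment as the FDR procedure. The function names `fit_line`, `predict_expression`, and `benjamini_hochberg` are hypothetical, and the sketch does not reproduce Limma's empirical Bayes moderated test, which would supply the p-values.

```python
from typing import List, Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def predict_expression(methylation: Sequence[float],
                       expression: Sequence[float]) -> List[float]:
    """Predicted expression of one gene from its methylation values
    (assumption: one regression per gene across samples)."""
    intercept, slope = fit_line(methylation, expression)
    return [intercept + slope * m for m in methylation]


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each adjusted value is the
    # minimum of p * m / rank over all ranks at or above the current one.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Genes whose adjusted p-value falls below the chosen cutoff (the abstract uses FDR < 0.001) would then be passed to the classifier stage.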

2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Weijing Wang ◽  
Weilong Li ◽  
Yili Wu ◽  
Xiaocao Tian ◽  
Haiping Duan ◽  
...  

Abstract: Depression is currently the leading cause of disability around the world. We conducted an epigenome-wide association study (EWAS) in a sample of 58 depression score-discordant monozygotic twin pairs, aiming to detect specific epigenetic variants potentially related to depression and to integrate them with gene expression profile data. The association between the methylation level of each CpG site and the depression score was tested with a linear mixed effect model. Weighted gene co-expression network analysis (WGCNA) was performed on the gene expression data. The association of DNA methylation levels with depression score reached P < 1 × 10⁻⁴ for 66 CpG sites. These top CpG sites were located at 34 genes, notably PTPRN2, HES5, GATA2, PRDM7, and KCNIP1. Many ontology enrichments were highlighted, including Notch signaling pathway, Huntington disease, p53 pathway by glucose deprivation, hedgehog signaling pathway, DNA binding, and nucleic acid metabolic process. We detected 19 differentially methylated regions (DMRs), some of which were located at GRIK2, DGKA, and NIPA2. When integrated with the gene expression data, HELZ2, PTPRN2, GATA2, and ZNF624 were differentially expressed. In WGCNA, one specific module was positively correlated with depression score (r = 0.62, P = 0.002). Some common genes (including BMP2, PRDM7, KCNIP1, and GRIK2) and enrichment terms (including complement and coagulation cascades pathway, DNA binding, neuron fate specification, glial cell differentiation, and thyroid gland development) were identified in both the methylation analysis and WGCNA. Our study identifies specific epigenetic variations that are significantly involved in the regions, functional genes, biological functions, and pathways that mediate depression.
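The logic of a discordant-twin design can be shown with a small Python sketch for a single CpG site. It is not the linear mixed effect model used in the study. It swaps in a simpler within-pair difference estimator, which regresses the co-twin difference in methylation on the co-twin difference in depression score through the origin, so that everything shared within a pair cancels out. The function name `within_pair_slope` and its argument layout are hypothetical.

```python
from typing import Sequence


def within_pair_slope(score_twin1: Sequence[float],
                      score_twin2: Sequence[float],
                      meth_twin1: Sequence[float],
                      meth_twin2: Sequence[float]) -> float:
    """Change in methylation per unit of depression score, estimated from
    within-pair differences only (index i refers to twin pair i)."""
    d_score = [a - b for a, b in zip(score_twin1, score_twin2)]
    d_meth = [a - b for a, b in zip(meth_twin1, meth_twin2)]
    denom = sum(d * d for d in d_score)
    if denom == 0:
        raise ValueError("no pair is discordant for the score")
    # Least squares through the origin on the paired differences.
    return sum(ds * dm for ds, dm in zip(d_score, d_meth)) / denom
```

Running this once per CpG site yields one effect estimate per site. A full analysis would also need standard errors, covariates, and the multiple-testing threshold reported in the abstract.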


2020 ◽  
Vol 14 ◽  
Author(s):  
Mette Soerensen ◽  
Dominika Marzena Hozakowska-Roszkowska ◽  
Marianne Nygaard ◽  
Martin J. Larsen ◽  
Veit Schwämmle ◽  
...  

2015 ◽  
Vol 11 (7) ◽  
pp. 1786-1793 ◽  
Author(s):  
Yuanyuan Zhang ◽  
Junying Zhang

DNA methylation is essential not only in cellular differentiation but also in diseases.


Symmetry ◽  
2020 ◽  
Vol 12 (1) ◽  
pp. 154 ◽  
Author(s):  
Ho Sun Shon ◽  
Erdenebileg Batbaatar ◽  
Kyoung Ok Kim ◽  
Eun Jong Cha ◽  
Kyung-Ah Kim

Recently, large-scale bioinformatics and genomic data have been generated using advanced biotechnology methods, increasing the importance of analyzing such data. Numerous data mining methods have been developed to process genomic data in the field of bioinformatics. Using gene expression data from 1157 patients with kidney cancer, we extracted significant genes for prognosis prediction. We then proposed an end-to-end, cost-sensitive hybrid deep learning (COST-HDL) approach with a cost-sensitive loss function for classification tasks on imbalanced kidney cancer data. To address the data imbalance problem, we combined a deep symmetric autoencoder (the decoder mirrors the encoder's layer structure), trained with a reconstruction loss for non-linear feature extraction, with a neural network trained with a balanced classification loss for prognosis prediction. Combined clinical and gene data from patients with kidney cancer were used to determine the optimal classification model and to estimate classification accuracy by sample type, primary diagnosis, tumor stage, and vital status, which serve as risk factors representing the state of patients. Experimental results showed that the COST-HDL approach was more efficient with gene expression data for kidney cancer prognosis than other conventional machine learning and data mining techniques. These results could be applied to extract features from gene biomarkers for the prognosis prediction, prevention, and early diagnosis of kidney cancer.
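A class-weighted cross-entropy is one common way to make a classification loss cost-sensitive on imbalanced data, and the Python sketch below illustrates the idea. The abstract does not give the exact form of the COST-HDL loss, so the inverse-frequency weighting here is an assumption, and `balanced_class_weights` and `weighted_cross_entropy` are hypothetical names. The autoencoder and its reconstruction loss are omitted.

```python
import math
from collections import Counter
from typing import Dict, Sequence


def balanced_class_weights(labels: Sequence[int]) -> Dict[int, float]:
    """Weight each class by n_samples / (n_classes * class_count), so
    rarer classes receive proportionally larger weights."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}


def weighted_cross_entropy(probs: Sequence[Sequence[float]],
                           labels: Sequence[int],
                           weights: Dict[int, float]) -> float:
    """Weighted mean of -log p(true class); probs[i][c] is the predicted
    probability of class c for sample i."""
    total = 0.0
    weight_sum = 0.0
    for p, y in zip(probs, labels):
        w = weights[y]
        total += -w * math.log(p[y])
        weight_sum += w
    return total / weight_sum
```

With three samples of class 0 and one of class 1, the minority class gets weight 2.0 against roughly 0.67 for the majority class, so misclassifying the rare class costs about three times as much.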


BMC Genomics ◽  
2017 ◽  
Vol 18 (1) ◽  
Author(s):  
Behrooz Torabi Moghadam ◽  
Neda Zamani ◽  
Jan Komorowski ◽  
Manfred Grabherr
