A Linear Regression and Deep Learning Approach for Detecting Reliable Genetic Alterations in Cancer Using DNA Methylation and Gene Expression Data

DNA methylation changes have been useful for cancer biomarker discovery, classification, and potential treatment development. Existing methods use either differentially methylated CpG sites or combined CpG sites, namely differentially methylated regions, that can be mapped to genes. However, such methylation signal mapping has limitations. To address them, we introduce a combinatorial framework that uses linear regression, differential expression analysis, and a deep learning method to interpret DNA methylation more accurately by integrating DNA methylation data with the corresponding TCGA gene expression data. We demonstrate it on uterine cervical cancer. First, we pre-filtered outliers from the data set and then predicted gene expression values from the pre-filtered methylation data through linear regression. We identified differentially expressed genes (DEGs) with the empirical Bayes test in Limma. We then applied a deep learning method, “nnet”, to classify the cervical cancer labels of those DEGs and computed classification metrics, including accuracy and area under the curve (AUC), through 10-fold cross-validation. We applied our approach to a uterine cervical cancer DNA methylation dataset (NCBI accession ID: GSE30760; 27,578 features covering 63 tumor and 152 matched normal samples). After linear regression and differential expression analysis, we obtained 6287 DEGs with a false discovery rate (FDR) < 0.001. The deep learning analysis yielded an average classification accuracy of 90.69% (±1.97%) for the uterine cervical cancer labels. This performance is better than that of other peer methods. We performed in-degree and out-degree hub gene network analysis using Cytoscape and report the five top in-degree genes (PAIP2, GRWD1, VPS4B, CRADD and LLPH) and the five top out-degree genes (MRPL35, FAM177A1, STAT4, ASPSCR1 and FABP7). We then performed KEGG pathway and Gene Ontology enrichment analysis of the DEGs using WebGestalt (WEB-based Gene SeT AnaLysis Toolkit). In summary, our proposed framework, which integrates linear regression, differential expression analysis, and deep learning, provides a robust approach to better interpret DNA methylation and gene expression data in disease studies.
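
The evaluation step described above (10-fold cross-validation reported as mean accuracy with a ± spread) can be sketched roughly as follows. This is only an illustration on synthetic data: the nearest-centroid classifier is a stand-in for the “nnet” model, the function names are hypothetical, and treating the ± term as a standard deviation across folds is an assumption the abstract does not confirm.

```python
import random
import statistics


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def nearest_centroid_fit(rows, labels):
    """Stand-in classifier (not nnet): store one mean vector per class."""
    centroids = {}
    for label in set(labels):
        members = [r for r, y in zip(rows, labels) if y == label]
        centroids[label] = [statistics.fmean(col) for col in zip(*members)]
    return centroids


def nearest_centroid_predict(centroids, row):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(row, centroid))
    return min(centroids, key=lambda label: dist(centroids[label]))


def cross_validated_accuracy(rows, labels, k=10, seed=0):
    """Return the list of per-fold accuracies from k-fold cross-validation."""
    accuracies = []
    for test_idx in k_fold_indices(len(rows), k, seed):
        held_out = set(test_idx)
        train_rows = [r for i, r in enumerate(rows) if i not in held_out]
        train_labels = [y for i, y in enumerate(labels) if i not in held_out]
        model = nearest_centroid_fit(train_rows, train_labels)
        correct = sum(
            nearest_centroid_predict(model, rows[i]) == labels[i] for i in test_idx
        )
        accuracies.append(correct / len(test_idx))
    return accuracies


# Synthetic toy data (hypothetical values): 63 "tumor" and 152 "normal" samples.
rng = random.Random(42)
rows = [[rng.gauss(1.0, 1.0) for _ in range(5)] for _ in range(63)]
rows += [[rng.gauss(-1.0, 1.0) for _ in range(5)] for _ in range(152)]
labels = ["tumor"] * 63 + ["normal"] * 152

fold_acc = cross_validated_accuracy(rows, labels, k=10)
mean_acc = statistics.fmean(fold_acc)
sd_acc = statistics.stdev(fold_acc)
print(f"accuracy: {100 * mean_acc:.2f}% (±{100 * sd_acc:.2f}%)")
```

Each sample is held out exactly once, so the spread across folds gives a rough sense of how stable the accuracy estimate is for a data set of this size.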

Genome-wide DNA methylation and gene expression analyses in monozygotic twins identify potential biomarkers of depression

Translational Psychiatry ◽

10.1038/s41398-021-01536-y ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Weijing Wang ◽

Weilong Li ◽

Yili Wu ◽

Xiaocao Tian ◽

Haiping Duan ◽

...

Keyword(s):

Gene Expression ◽

DNA Methylation ◽

DNA Binding ◽

Signaling Pathway ◽

Gene Expression Data ◽

Depression Score ◽

Linear Mixed Effect Model ◽

Expression Data ◽

Mixed Effect ◽

CpG Sites

Depression is currently the leading cause of disability around the world. We conducted an epigenome-wide association study (EWAS) in a sample of 58 depression score-discordant monozygotic twin pairs, aiming to detect specific epigenetic variants potentially related to depression and to integrate them with gene expression profile data. The association between the methylation level of each CpG site and depression score was tested with a linear mixed effect model. Weighted gene co-expression network analysis (WGCNA) was performed on the gene expression data. The association of DNA methylation levels with depression score reached P < 1 × 10⁻⁴ for 66 CpG sites. These top CpG sites were located at 34 genes, notably PTPRN2, HES5, GATA2, PRDM7, and KCNIP1. Many ontology enrichments were highlighted, including Notch signaling pathway, Huntington disease, p53 pathway by glucose deprivation, hedgehog signaling pathway, DNA binding, and nucleic acid metabolic process. We detected 19 differentially methylated regions (DMRs), some of which were located at GRIK2, DGKA, and NIPA2. When integrating with gene expression data, HELZ2, PTPRN2, GATA2, and ZNF624 were differentially expressed. In WGCNA, one specific module was positively correlated with depression score (r = 0.62, P = 0.002). Some common genes (including BMP2, PRDM7, KCNIP1, and GRIK2) and enrichment terms (including complement and coagulation cascades pathway, DNA binding, neuron fate specification, glial cell differentiation, and thyroid gland development) were identified in both the methylation analysis and WGCNA. Our study identifies specific epigenetic variations that are significantly involved in regions, functional genes, biological functions, and pathways that mediate depression disorder.
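
To give a feel for the per-CpG association test in a discordant twin design, here is a minimal sketch. It is not the linear mixed effect model the abstract describes: it swaps in a simpler within-pair difference regression, which removes the shared pair-level effect by differencing instead of modelling it as a random intercept. All data values and function names are hypothetical.

```python
import random


def within_pair_slope(pairs):
    """Regress within-pair methylation difference on within-pair score difference.

    Each pair is ((score_a, meth_a), (score_b, meth_b)). The fit goes through
    the origin because differencing cancels the effect shared by both twins.
    Returns (slope, standard error, t statistic).
    """
    d_score = [a[0] - b[0] for a, b in pairs]
    d_meth = [a[1] - b[1] for a, b in pairs]
    sxx = sum(x * x for x in d_score)
    sxy = sum(x * y for x, y in zip(d_score, d_meth))
    slope = sxy / sxx
    residuals = [y - slope * x for x, y in zip(d_score, d_meth)]
    dof = len(pairs) - 1  # one estimated parameter (the slope)
    sigma2 = sum(r * r for r in residuals) / dof
    std_err = (sigma2 / sxx) ** 0.5
    return slope, std_err, slope / std_err


# Synthetic toy data (hypothetical): 58 twin pairs, one CpG site,
# true effect of 0.02 methylation units per depression-score point.
rng = random.Random(7)
pairs = []
for _ in range(58):
    pair_effect = rng.gauss(0.5, 0.1)  # shared genetics and environment
    twins = []
    for _ in range(2):
        score = rng.uniform(0, 30)
        meth = pair_effect + 0.02 * score + rng.gauss(0, 0.01)
        twins.append((score, meth))
    pairs.append((twins[0], twins[1]))

slope, std_err, t_stat = within_pair_slope(pairs)
print(f"slope={slope:.4f}, se={std_err:.5f}, t={t_stat:.1f}")
```

In an epigenome-wide scan, a test like this would be repeated for every CpG site, which is why a stringent threshold is applied to the resulting P values.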

Gene Expression Data Based Deep Learning Model for Accurate Prediction of Drug-Induced Liver Injury in Advance

Journal of Chemical Information and Modeling ◽

10.1021/acs.jcim.9b00143 ◽

2019 ◽

Vol 59 (7) ◽

pp. 3240-3250 ◽

Cited By ~ 3

Author(s):

Chunlai Feng ◽

Hengwei Chen ◽

Xianqin Yuan ◽

Mengqiu Sun ◽

Kexin Chu ◽

...

Keyword(s):

Gene Expression ◽

Deep Learning ◽

Liver Injury ◽

Gene Expression Data ◽

Learning Model ◽

Accurate Prediction ◽

Expression Data ◽

Drug Induced ◽

Drug Induced Liver Injury ◽

Deep Learning Model

Imaging Biomarkers and Gene Expression Data Correlation Framework for Lung Cancer Radiogenomics Analysis Based on Deep Learning

IEEE Access ◽

10.1109/access.2021.3071466 ◽

2021 ◽

pp. 1-1

Author(s):

Dong Sui ◽

Maozu Guo ◽

Xiaoxuan Ma ◽

Julian Baptiste ◽

Lei Zhang

Keyword(s):

Gene Expression ◽

Lung Cancer ◽

Deep Learning ◽

Gene Expression Data ◽

Imaging Biomarkers ◽

Expression Data ◽

Data Correlation

A Genome-Wide Integrative Association Study of DNA Methylation and Gene Expression Data and Later Life Cognitive Functioning in Monozygotic Twins

Frontiers in Neuroscience ◽

10.3389/fnins.2020.00233 ◽

2020 ◽

Vol 14 ◽

Cited By ~ 1

Author(s):

Mette Soerensen ◽

Dominika Marzena Hozakowska-Roszkowska ◽

Marianne Nygaard ◽

Martin J. Larsen ◽

Veit Schwämmle ◽

...

Keyword(s):

Gene Expression ◽

DNA Methylation ◽

Cognitive Functioning ◽

Association Study ◽

Gene Expression Data ◽

Monozygotic Twins ◽

Later Life ◽

Expression Data ◽

Genome Wide ◽

A Genome

Microarray Gene Expression Data for Detection Alzheimer’s Disease Using k-means and Deep Learning

2021 7th International Engineering Conference “Research & Innovation amid Global Pandemic” (IEC) ◽

10.1109/iec52205.2021.9476128 ◽

2021 ◽

Author(s):

Heba M. AL-Bermany ◽

Sura Z. AL-Rashid

Keyword(s):

Gene Expression ◽

Alzheimer's Disease ◽

Deep Learning ◽

Gene Expression Data ◽

Microarray Gene Expression Data ◽

Expression Data ◽

Microarray Gene Expression ◽

Microarray Gene

Identification of functionally methylated regions based on discriminant analysis through integrating methylation and gene expression data

Molecular BioSystems ◽

10.1039/c5mb00141b ◽

2015 ◽

Vol 11 (7) ◽

pp. 1786-1793 ◽

Cited By ~ 4

Author(s):

Yuanyuan Zhang ◽

Junying Zhang

Keyword(s):

Gene Expression ◽

DNA Methylation ◽

Discriminant Analysis ◽

Gene Expression Data ◽

Cellular Differentiation ◽

Expression Data

DNA methylation is essential not only in cellular differentiation but also in diseases.

Deep learning for stage prediction in neuroblastoma using gene expression data

Genomics & Informatics ◽

10.5808/gi.2019.17.3.e30 ◽

2019 ◽

Vol 17 (3) ◽

pp. e30 ◽

Cited By ~ 1

Author(s):

Aron Park ◽

Seungyoon Nam

Keyword(s):

Gene Expression ◽

Deep Learning ◽

Gene Expression Data ◽

Expression Data

Classification of Kidney Cancer Data Using Cost-Sensitive Hybrid Deep Learning Approach

Symmetry ◽

10.3390/sym12010154 ◽

2020 ◽

Vol 12 (1) ◽

pp. 154 ◽

Cited By ~ 5

Author(s):

Ho Sun Shon ◽

Erdenebileg Batbaatar ◽

Kyoung Ok Kim ◽

Eun Jong Cha ◽

Kyung-Ah Kim

Keyword(s):

Gene Expression ◽

Data Mining ◽

Deep Learning ◽

Kidney Cancer ◽

Gene Expression Data ◽

Genomic Data ◽

Classification Model ◽

Expression Data ◽

Cancer Data ◽

Prognosis Prediction

Recently, large-scale bioinformatics and genomic data have been generated using advanced biotechnology methods, thus increasing the importance of analyzing such data. Numerous data mining methods have been developed to process genomic data in the field of bioinformatics. We extracted significant genes for the prognosis prediction of 1157 patients using gene expression data from patients with kidney cancer. We then proposed an end-to-end, cost-sensitive hybrid deep learning (COST-HDL) approach with a cost-sensitive loss function for classification tasks on imbalanced kidney cancer data. To address the data imbalance problem, we combined a deep symmetric autoencoder (the decoder mirrors the encoder in layer structure), trained with a reconstruction loss for non-linear feature extraction, with a neural network trained with a balanced classification loss for prognosis prediction. Clinical data from patients with kidney cancer were combined with gene data to determine the optimal classification model and to estimate classification accuracy by sample type, primary diagnosis, tumor stage, and vital status as risk factors representing the state of patients. Experimental results showed that the COST-HDL approach was more efficient with gene expression data for kidney cancer prognosis than other conventional machine learning and data mining techniques. These results could be applied to extract features from gene biomarkers for the prognosis prediction, prevention, and early diagnosis of kidney cancer.
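
One common way to make a classification loss cost-sensitive is to weight each class inversely to its frequency. The sketch below shows that idea only; it is not the COST-HDL loss itself, whose exact form the abstract does not give. The weighting scheme, label names, and function names are assumptions for illustration.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * m) for c, m in counts.items()}


def weighted_cross_entropy(probs, labels, weights):
    """Weighted mean of -log p(true class), normalised by the summed weights."""
    total = sum(-weights[y] * math.log(p[y]) for p, y in zip(probs, labels))
    return total / sum(weights[y] for y in labels)


# Hypothetical imbalanced toy set: 9 majority-class and 1 minority-class sample.
labels = ["alive"] * 9 + ["deceased"]
# A lazy classifier that always assigns 0.9 probability to the majority class.
probs = [{"alive": 0.9, "deceased": 0.1} for _ in labels]

uniform = {c: 1.0 for c in set(labels)}
balanced = inverse_frequency_weights(labels)

plain_loss = weighted_cross_entropy(probs, labels, uniform)
balanced_loss = weighted_cross_entropy(probs, labels, balanced)
print(f"unweighted: {plain_loss:.4f}, class-balanced: {balanced_loss:.4f}")
```

With uniform weights the lazy classifier looks acceptable because the majority class dominates the average; with balanced weights the single misjudged minority sample counts as much as all nine majority samples together, so the loss rises sharply.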
