Correlating predicted epigenetic marks with expression data to find interactions between SNPs and genes

2020
Author(s): Antoine Despinasse, Yongjin Park, Michael Lapi, Manolis Kellis

ABSTRACT
Despite extensive work, mapping GWAS SNPs in non-coding regions to their target genes remains a challenge. A SNP can be associated with target genes through eQTL analysis. Here we introduce a method to make these eQTLs more robust: instead of correlating gene expression with the SNP genotype, as in eQTL analysis, we correlate it with epigenomic data. Because measured epigenomic data are very expensive and noisy, we instead predict them from the DNA sequence using the deep learning framework DeepSEA (Zhou and Troyanskaya, 2015).
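To make the contrast with a classic eQTL concrete, here is a minimal sketch that correlates expression with a per-individual predicted mark rather than with one SNP's genotype. It assumes (hypothetically, not from the paper) that DeepSEA-style outputs are available as alt-minus-ref score differences for one chromatin feature, and that an individual's predicted mark in a region is the reference score plus the dosage-weighted sum of those differences. All names and values (`predicted_mark`, `deltas`, the toy data) are illustrative.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def predicted_mark(dosages, ref_score, deltas):
    """Hypothetical additive model of a predicted epigenetic mark for one region.

    dosages: alt-allele counts (0, 1, 2), one per SNP in the region
    ref_score: predicted mark for the reference sequence (assumed given)
    deltas: predicted alt-minus-ref change per SNP (assumed given)
    """
    return ref_score + sum(d * e for d, e in zip(dosages, deltas))


# Toy data (made up): 6 individuals x 3 SNPs in one regulatory region.
genotypes = [
    [0, 0, 1],
    [1, 0, 0],
    [2, 1, 0],
    [0, 2, 1],
    [1, 1, 2],
    [2, 2, 2],
]
deltas = [0.30, -0.10, 0.05]                 # per-SNP predicted effects (made up)
expression = [1.1, 1.4, 2.0, 0.9, 1.5, 1.9]  # expression of the candidate gene (made up)

marks = [predicted_mark(g, 0.5, deltas) for g in genotypes]
r_mark = pearson(marks, expression)                     # association via the predicted mark
r_snp = pearson([g[0] for g in genotypes], expression)  # classic single-SNP eQTL correlation
print(f"predicted-mark r = {r_mark:.3f}, single-SNP r = {r_snp:.3f}")
```

Under this additive assumption, a region with a single SNP gives a mark that is a linear function of dosage, so its correlation with expression equals the eQTL correlation up to sign; the two diverge only when several variants with different predicted effects fall in the same region. A real DeepSEA-based pipeline would score each individual's actual sequence, which need not be additive.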

Genes, 2019, Vol 10 (10), pp. 807
Author(s): Pan, Liu, Wen, Liu, Zhang, ...

Whole-genome bisulfite sequencing provides comprehensive profiles of gene methylation levels but is costly. Recent studies have partitioned genes into landmark genes and target genes and suggested that the expression levels of landmark genes capture enough information to reconstruct the expression levels of target genes. This inspired us to hypothesize that the promoter methylation levels of landmark genes might likewise suffice to reconstruct the promoter methylation levels of target genes, which would reduce the cost of promoter methylation profiling. Here, we propose a deep learning model called Deep-Gene Promoter Methylation (D-GPM) to predict genome-wide promoter methylation levels from the promoter methylation profiles of landmark genes, using data from The Cancer Genome Atlas (TCGA). D-GPM-15%-7000 × 5, the optimal D-GPM architecture, achieves the lowest overall mean absolute error (MAE) and the highest overall Pearson correlation coefficient (PCC) on the test data, with values of 0.0329 and 0.8186, respectively. Additionally, D-GPM outperforms regression tree (RT), linear regression (LR), and support vector machine (SVM) models, achieving a lower MAE for 95.66%, 92.65%, and 85.49% of the target genes and a higher PCC for 98.25%, 91.00%, and 81.56% of the target genes, respectively. More importantly, when the best model is identified for each target gene, D-GPM is the best model for 79.86% of the target genes by lowest MAE and for 78.34% by highest PCC.
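The per-gene comparisons above (share of target genes on which one model has a lower MAE or a higher PCC than another) can be sketched as follows. This is an illustrative reconstruction of the evaluation logic only; the data layout (a dict from gene to test-set values), the function names, and the toy numbers are assumptions, and ties are counted as no win.

```python
from math import sqrt


def mae(y_true, y_pred):
    """Mean absolute error across test samples for one gene."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


def pcc(y_true, y_pred):
    """Pearson correlation coefficient across test samples for one gene."""
    n = len(y_true)
    mx, my = sum(y_true) / n, sum(y_pred) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(y_true, y_pred))
    vx = sum((a - mx) ** 2 for a in y_true)
    vy = sum((b - my) ** 2 for b in y_pred)
    return cov / sqrt(vx * vy)


def win_fraction(truth, pred_a, pred_b, metric, higher_is_better):
    """Fraction of genes on which model A strictly beats model B.

    truth, pred_a, pred_b: dict mapping gene -> list of methylation values
    (one per test sample); hypothetical layout.
    """
    wins = 0
    for gene, y in truth.items():
        score_a = metric(y, pred_a[gene])
        score_b = metric(y, pred_b[gene])
        wins += (score_a > score_b) if higher_is_better else (score_a < score_b)
    return wins / len(truth)


# Toy beta values for two target genes over three test samples (made up).
truth = {"g1": [0.1, 0.5, 0.9], "g2": [0.2, 0.4, 0.8]}
model_a = {"g1": [0.12, 0.48, 0.88], "g2": [0.25, 0.35, 0.8]}
model_b = {"g1": [0.3, 0.3, 0.6], "g2": [0.21, 0.41, 0.79]}

print("A beats B by MAE on", win_fraction(truth, model_a, model_b, mae, False))
print("A beats B by PCC on", win_fraction(truth, model_a, model_b, pcc, True))
```

Counting per-gene wins, rather than comparing only the overall MAE and PCC, shows whether an advantage is spread across many genes or driven by a few.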


Symmetry, 2020, Vol 12 (1), pp. 154
Author(s): Ho Sun Shon, Erdenebileg Batbaatar, Kyoung Ok Kim, Eun Jong Cha, Kyung-Ah Kim

Recently, advanced biotechnology methods have generated large-scale bioinformatics and genomic data, increasing the importance of analyzing such data. Numerous data mining methods have been developed to process genomic data in bioinformatics. We extracted significant genes for prognosis prediction of 1157 patients using gene expression data from patients with kidney cancer. We then proposed an end-to-end, cost-sensitive hybrid deep learning (COST-HDL) approach with a cost-sensitive loss function for classification tasks on imbalanced kidney cancer data. Specifically, we combined a deep symmetric autoencoder (whose decoder mirrors the encoder's layer structure), trained with a reconstruction loss for non-linear feature extraction, with a neural network trained with a balanced classification loss for prognosis prediction, to address the data imbalance problem. Combined clinical and gene data from patients with kidney cancer were used to determine the optimal classification model and to estimate classification accuracy by sample type, primary diagnosis, tumor stage, and vital status, as risk factors representing patient state. Experimental results showed that the COST-HDL approach was more efficient with gene expression data for kidney cancer prognosis than other conventional machine learning and data mining techniques. These results could be applied to extracting features from gene biomarkers for kidney cancer prognosis prediction, prevention, and early diagnosis.
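A hybrid objective of this kind (autoencoder reconstruction loss plus a class-balanced classification loss) can be sketched in plain Python. The abstract does not give the exact COST-HDL loss, so the inverse-frequency class weights, the weight-normalized cross-entropy, and the mixing weight `lam` below are assumptions chosen for illustration; all function names are hypothetical.

```python
from collections import Counter
from math import log


def inverse_frequency_weights(labels):
    """Assumed class weights w_c = N / (K * n_c): rarer classes weigh more."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}


def weighted_cross_entropy(probs, labels, weights):
    """Weighted mean of -log p(true class).

    probs: one list of predicted class probabilities per sample.
    Normalizing by the summed weights keeps the loss scale comparable
    across batches with different class mixes.
    """
    total = sum(weights[y] * -log(p[y]) for p, y in zip(probs, labels))
    return total / sum(weights[y] for y in labels)


def reconstruction_mse(x, x_hat):
    """Mean squared error between inputs and autoencoder reconstructions."""
    errs = [(a - b) ** 2 for row, row_hat in zip(x, x_hat) for a, b in zip(row, row_hat)]
    return sum(errs) / len(errs)


def hybrid_loss(x, x_hat, probs, labels, weights, lam=1.0):
    """Total objective = reconstruction loss + lam * balanced classification loss."""
    return reconstruction_mse(x, x_hat) + lam * weighted_cross_entropy(probs, labels, weights)


# Toy imbalanced batch: three samples of class 0, one of class 1 (made up).
labels = [0, 0, 0, 1]
weights = inverse_frequency_weights(labels)  # class 1 gets 3x the weight of class 0
x = [[0.2, 0.4], [0.1, 0.3], [0.5, 0.5], [0.9, 0.8]]
x_hat = [[0.25, 0.35], [0.1, 0.3], [0.4, 0.6], [0.7, 0.9]]
probs = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.4, 0.6]]
print(hybrid_loss(x, x_hat, probs, labels, weights, lam=0.5))
```

Up-weighting the minority class makes each of its misclassifications cost more, which counteracts a classifier's tendency to favor the majority class on imbalanced prognosis labels.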


2019, Vol 16 (12), pp. 5078-5088
Author(s): Rahul Shahane, Md. Ismail, C. S. R. Prabhu

Classifying and identifying gene expression patterns from DNA microarray data is an efficient technique for the diagnosis and prognosis of specific cancer subtypes. DNA microarray technology has great potential to extract information from the expression levels of thousands of genes. Identifying a set of significant genes that improves classification accuracy can guide early cancer diagnosis. Cancers can also be of different subtypes. Detecting cancer from microarray gene expression data faces major challenges: low sample size, high dimensionality, and complex data. Fast, computationally efficient methods are needed to address these challenges. Deep learning has succeeded in numerous fields such as image, video, speech, and text processing. Gene expression analysis poses a unique challenge for deep learning in cancer detection and prediction tasks aimed at identifying specific biomarkers for different cancer subtypes. In this paper, we briefly discuss the strengths of different deep learning architectures for detecting and predicting various types of cancer through gene expression analysis.
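One common way to pick a small set of significant genes from a high-dimensional, small-sample expression matrix is a filter that ranks genes by a two-sample statistic. The sketch below ranks genes by the absolute Welch t-statistic between two groups (for example tumor vs. normal) and keeps the top k. It is a generic illustration, not a method described in the paper; the gene names, data layout, and values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch two-sample t-statistic (does not assume equal variances)."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))


def top_k_genes(expr, groups, k):
    """Return the k genes with the largest |t| between group 1 and group 0.

    expr: dict gene -> list of expression values, one per sample
    groups: list of 0/1 labels aligned with the samples
    """
    scores = {}
    for gene, values in expr.items():
        g0 = [v for v, g in zip(values, groups) if g == 0]
        g1 = [v for v, g in zip(values, groups) if g == 1]
        scores[gene] = abs(welch_t(g1, g0))
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy microarray values: 6 samples, first three normal (0), last three tumor (1).
groups = [0, 0, 0, 1, 1, 1]
expr = {
    "GENE_A": [1.0, 1.2, 0.9, 3.1, 2.9, 3.3],  # strongly differential
    "GENE_B": [2.0, 2.1, 1.9, 2.0, 2.2, 1.8],  # no group difference
    "GENE_C": [0.5, 0.7, 0.6, 1.0, 1.2, 0.9],  # moderately differential
}
print(top_k_genes(expr, groups, 2))
```

Filtering this way shrinks the feature space before a deep model is trained, which can help when samples number in the tens or hundreds but genes number in the thousands; with so few samples, the selection should be done inside each cross-validation fold to avoid optimistic accuracy estimates.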
