Predicting gene expression using DNA methylation in two human populations

2018 ◽
Author(s):
Huan Zhong ◽
Soyeon Kim ◽
Degui Zhi ◽
Xiangqin Cui

Background. DNA methylation, an important epigenetic mark, is well known for its regulatory role in gene expression, especially its negative regulation of expression in promoter regions. However, its correlation with gene expression at the population level has not been well studied. In particular, it is unclear whether an individual's genome-wide DNA methylation profile can predict his or her gene expression profile. Previous studies were mostly limited to association analyses between methylation at single CpG sites and gene expression. It is not known whether the DNA methylation of a gene has enough predictive power to serve as a surrogate for its expression in existing human study cohorts that have DNA samples but not RNA samples. Results. We studied two human population datasets, adipose tissue from the Multiple Tissue Human Expression Resource Projects (MuTHER) and peripheral blood mononuclear cells (PBMC) from asthma patients and healthy controls, and predicted gene expression from the methylation of all CpG sites in each gene region. Three prediction models were investigated: single linear regression, multiple linear regression, and least absolute shrinkage and selection operator (LASSO) penalized regression. LASSO regression performed best of the three. However, even with LASSO regression, the prediction R² was very small for most genes, and only about one thousand genes had a prediction R² greater than 0.1. GO term and pathway analyses showed that these more predictable genes are enriched for immune and defense genes. Conclusion. In human populations, DNA methylation of CpG sites in the gene region has weak predictive power for gene expression. The relatively more predictable genes tend to be defense and immune genes.


PeerJ ◽  
2019 ◽  
Vol 7 ◽  
pp. e6757 ◽  
Author(s):  
Huan Zhong ◽  
Soyeon Kim ◽  
Degui Zhi ◽  
Xiangqin Cui

Background DNA methylation, an important epigenetic mark, is well known for its regulatory role in gene expression, especially its negative correlation with expression in the promoter region. However, its correlation with gene expression across the genome at the human population level has not been well studied. In particular, it is unclear whether an individual's genome-wide DNA methylation profile can predict his or her gene expression profile. Previous studies were mostly limited to association analyses between methylation at single CpG sites and gene expression. It is not known whether the DNA methylation of a gene has enough predictive power to serve as a surrogate for its expression in existing human study cohorts that have DNA samples but not RNA samples. Results We examined DNA methylation in the gene region for predicting gene expression across individuals in non-cancer tissues from three human population datasets: adipose tissue from the Multiple Tissue Human Expression Resource Projects (MuTHER), peripheral blood mononuclear cells (PBMC) from participants in an asthma and normal control study, and lymphoblastoid cell lines (LCL) from healthy individuals. Three prediction models were investigated: single linear regression, multiple linear regression, and least absolute shrinkage and selection operator (LASSO) penalized regression. LASSO regression performed best of the three. However, the prediction power was generally low and varied across datasets. Only 30 and 42 genes had a cross-validation R² greater than 0.3 in the PBMC and adipose datasets, respectively. A substantially larger number of genes (258) was identified in the LCL dataset, which was generated from a more homogeneous cell-line sample source. We also showed that retaining all CpG probes, rather than excluding those affected by cross-hybridization or SNPs, gives better prediction power.
Conclusion In our analyses of three populations, DNA methylation of CpG sites in the gene region had limited power to predict gene expression across individuals with linear regression models. The prediction power potentially varies with tissue, cell type, and data source. In our analyses, combining LASSO regression with all probes on the methylation array, without excluding any, provided the best prediction of gene expression.
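The per-gene model comparison described in the abstract above can be sketched in pure Python. This is a minimal illustration, not the authors' implementation: it assumes one gene whose expression is regressed on methylation beta values at p CpG sites in its region, fits LASSO by cyclic coordinate descent on standardized CpG columns, and scores the fit by k-fold cross-validated R² = 1 - SSE/SST on held-out samples. Setting the penalty `lam` to 0 turns the same routine into ordinary multiple linear regression. The function names, penalty values, fold count, and simulated data are all hypothetical.

```python
import random
import statistics


def soft_threshold(z, lam):
    """LASSO proximal step: shrink z toward zero by lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def fit_lasso(X, y, lam, n_iter=100):
    """Cyclic coordinate-descent LASSO (hypothetical helper).

    Minimizes (1/2n) * ||y - b0 - Z b||^2 + lam * ||b||_1, where Z holds the
    columns of X standardized to mean 0 and variance 1. lam = 0 gives ordinary
    multiple linear regression. Returns (predict, coefficients on Z).
    """
    n, p = len(X), len(X[0])
    means = [statistics.fmean(row[j] for row in X) for j in range(p)]
    sds = [statistics.pstdev([row[j] for row in X]) or 1.0 for j in range(p)]
    Z = [[(row[j] - means[j]) / sds[j] for j in range(p)] for row in X]
    y_mean = statistics.fmean(y)
    resid = [yi - y_mean for yi in y]  # residuals while every b_j is 0
    b = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # correlation of column j with the partial residual (b_j added back)
            rho = sum(Z[i][j] * (resid[i] + Z[i][j] * b[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam)
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= Z[i][j] * delta
                b[j] = new_bj

    def predict(rows):
        return [y_mean + sum(b[j] * (r[j] - means[j]) / sds[j] for j in range(p))
                for r in rows]

    return predict, b


def cv_r2(X, y, lam, k=5, seed=0):
    """k-fold cross-validated prediction R^2; can be negative for useless models."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    preds = [0.0] * len(y)
    for f in range(k):
        test = idx[f::k]
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        # standardization and fitting use the training fold only
        predict, _ = fit_lasso([X[i] for i in train], [y[i] for i in train], lam)
        for i, y_hat in zip(test, predict([X[i] for i in test])):
            preds[i] = y_hat
    y_bar = statistics.fmean(y)
    sse = sum((yi - pi) ** 2 for yi, pi in zip(y, preds))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - sse / sst


if __name__ == "__main__":
    rng = random.Random(42)
    n, p = 60, 8
    X = [[rng.random() for _ in range(p)] for _ in range(n)]  # simulated beta values
    y = [2.0 * r[0] - 1.0 * r[2] + rng.gauss(0, 0.2) for r in X]  # only CpGs 0 and 2 matter
    for lam in (0.0, 0.05):
        print(f"lam={lam}: CV R^2 = {cv_r2(X, y, lam):.3f}")
```

Standardizing inside each training fold keeps the held-out samples from leaking into the fit, which matters when the goal is an honest prediction R² rather than an in-sample association.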


Genes ◽  
2020 ◽  
Vol 11 (8) ◽  
pp. 931 ◽  
Author(s):  
Saurav Mallik ◽  
Soumita Seth ◽  
Tapas Bhadra ◽  
Zhongming Zhao

DNA methylation change has been useful for cancer biomarker discovery, classification, and potential treatment development. So far, existing methods use either differentially methylated CpG sites or combined CpG sites, namely differentially methylated regions, that can be mapped to genes. However, such methylation signal mapping has limitations. To address them, we introduced a combinatorial framework that uses linear regression, differential expression analysis, and a deep learning method for accurate biological interpretation of DNA methylation by integrating DNA methylation data with the corresponding TCGA gene expression data. We demonstrated it on uterine cervical cancer. First, we filtered outliers from the data set and then predicted gene expression values from the pre-filtered methylation data through linear regression. We identified differentially expressed genes (DEGs) with an empirical Bayes test using limma. Then we applied a deep learning method, “nnet”, to classify the cervical cancer labels using those DEGs and computed classification metrics, including accuracy and area under the curve (AUC), through 10-fold cross-validation. We applied our approach to a uterine cervical cancer DNA methylation dataset (NCBI accession ID: GSE30760, 27,578 features covering 63 tumor and 152 matched normal samples). After linear regression and differential expression analysis, we obtained 6287 DEGs with a false discovery rate (FDR) < 0.001. After deep learning analysis, we obtained an average classification accuracy of 90.69% (±1.97%) for the uterine cervical cancer labels, which is better than that of other peer methods. We performed in-degree and out-degree hub gene network analysis using Cytoscape and report the five top in-degree genes (PAIP2, GRWD1, VPS4B, CRADD and LLPH) and five top out-degree genes (MRPL35, FAM177A1, STAT4, ASPSCR1 and FABP7).
After that, we performed KEGG pathway and Gene Ontology enrichment analysis of the DEGs using WebGestalt (WEB-based Gene SeT AnaLysis Toolkit). In summary, our proposed framework, which integrates linear regression, differential expression analysis, and deep learning, provides a robust approach to better interpret DNA methylation and gene expression data in disease studies.
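The abstract above reports DEGs at FDR < 0.001 but does not say which multiple-testing adjustment produced the FDR values. As an assumption, the sketch below uses the Benjamini-Hochberg step-up procedure, which is the default `adjust.method` of limma's `topTable`; it converts raw p-values into adjusted values and keeps the genes below the threshold. The function names, gene names, and p-values are hypothetical.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # rank 1 = smallest p-value
    adjusted = [0.0] * m
    running_min = 1.0
    # walk from the largest p-value down, keeping adjusted values monotone and <= 1
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_degs(genes, pvalues, fdr=0.001):
    """Names of genes whose BH-adjusted p-value is below fdr."""
    return [g for g, q in zip(genes, bh_adjust(pvalues)) if q < fdr]


if __name__ == "__main__":
    genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]  # hypothetical genes
    pvals = [1e-6, 0.2, 3e-4, 1e-5]                    # hypothetical p-values
    print(select_degs(genes, pvals))  # ['GENE_A', 'GENE_C', 'GENE_D']
```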


2021 ◽  
Vol 18 (1) ◽  
Author(s):  
Katherine R. Dobbs ◽  
Paula Embury ◽  
Emmily Koech ◽  
Sidney Ogolla ◽  
Stephen Munga ◽  
...  

Abstract Background Age-related changes in adaptive and innate immune cells have been associated with a decline in effective immunity and chronic, low-grade inflammation. Epigenetic, transcriptional, and functional changes in monocytes occur with aging, though most studies to date have focused on differences between young adults and the elderly in populations with European ancestry; few data exist regarding changes that occur in circulating monocytes during the first few decades of life or in African populations. We analyzed DNA methylation profiles, cytokine production, and inflammatory gene expression profiles in monocytes from young adults and children from western Kenya. Results We identified several hypo- and hyper-methylated CpG sites in monocytes from Kenyan young adults vs. children that replicated findings in the current literature of differential DNA methylation in monocytes from elderly persons vs. young adults across diverse populations. Differentially methylated CpG sites were also noted in gene regions important to inflammation and innate immune responses. Monocytes from Kenyan young adults vs. children displayed increased production of IL-8, IL-10, and IL-12p70 in response to TLR4 and TLR2/1 stimulation as well as distinct inflammatory gene expression profiles. Conclusions These findings complement previous reports of age-related methylation changes in isolated monocytes and provide novel insights into the role of age-associated changes in innate immune functions.


2017 ◽  
Vol 121 (suppl_1) ◽  
Author(s):  
Mark E Pepin ◽  
David K Crossman ◽  
Joseph P Barchue ◽  
Salpy V Pamboukian ◽  
Steven M Pogwizd ◽  
...  

To identify the role of glucose in the development of diabetic cardiomyopathy, we directly assessed the effects of glucose delivery to the intact heart on DNA methylation and gene expression using both an inducible heart-specific transgene (glucose transporter 4; mG4H) and streptozotocin-induced diabetes (STZ) mouse models. We aimed to determine whether long-lasting diabetic complications arise from prior transient exposure to hyperglycemia via a process termed “glycemic memory.” We identified DNA methylation changes associated with significant gene expression regulation. Comparing our results from STZ, mG4H, and the modifications that persist after transgene silencing, we now provide evidence for cardiac DNA methylation as a persistent epigenetic mark contributing to glycemic memory. To begin to determine which changes contribute to human heart failure, we measured both RNA transcript levels and whole-genome DNA methylation in heart failure biopsy samples (n = 12) from male patients, collected at left ventricular assist device placement, using RNA sequencing and the Methylation450 assay, respectively. We hypothesized that epigenetic changes such as DNA methylation distinguish between heart failure etiologies. Type 2 diabetic heart failure patients (n = 6) had an overall signature of hypomethylation, whereas patients listed as ischemic (n = 5) had a distinct hypermethylation signature for regulated transcripts. This initial analysis focused on promoter-associated CpG islands with inverse changes in gene transcript levels, from which diabetes-specific (14 genes; e.g. IGFBP4) and ischemia-specific (12 genes; e.g. PFKFB3) targets emerged with significant regulation of both measures. By combining our mouse and human molecular analyses, we provide evidence that diabetes mellitus governs direct regulation of cellular function through DNA methylation and the corresponding gene expression in diabetic mouse and human hearts.
Importantly, many of the changes seen in mouse type 1 diabetes and human type 2 diabetes were similar, supporting a consistent mechanism of regulation. These studies are among the first steps toward defining mechanisms of epigenetic regulation in diabetic cardiomyopathy.
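Selecting promoter CpG islands with inverse transcript changes, as in the abstract above, amounts to a sign filter over per-gene summary statistics. A minimal sketch follows, assuming each gene already has a promoter methylation difference (delta beta), an expression log2 fold change, and a q-value for each measure; the thresholds, tuple layout, and gene names are hypothetical and are not the study's criteria.

```python
def inverse_targets(genes, max_q=0.05, min_abs_delta=0.05):
    """Genes whose promoter methylation and expression change in opposite directions.

    genes maps name -> (delta_beta, log2_fc, q_meth, q_expr); all fields and
    thresholds are hypothetical placeholders.
    """
    hits = []
    for name, (delta_beta, log2_fc, q_meth, q_expr) in genes.items():
        both_significant = q_meth < max_q and q_expr < max_q
        large_enough = abs(delta_beta) >= min_abs_delta
        opposite = delta_beta * log2_fc < 0  # hypo + up, or hyper + down
        if both_significant and large_enough and opposite:
            hits.append(name)
    return sorted(hits)


if __name__ == "__main__":
    example = {
        "GENE_A": (-0.10, 1.2, 0.01, 0.02),   # hypomethylated, upregulated -> kept
        "GENE_B": (0.08, 0.9, 0.01, 0.01),    # same direction -> dropped
        "GENE_C": (0.12, -1.5, 0.001, 0.03),  # hypermethylated, downregulated -> kept
        "GENE_D": (-0.02, 2.0, 0.01, 0.01),   # methylation change too small -> dropped
        "GENE_E": (-0.20, 1.0, 0.20, 0.01),   # methylation change not significant -> dropped
    }
    print(inverse_targets(example))  # ['GENE_A', 'GENE_C']
```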


2021 ◽  
Author(s):  
Jumpei Yamazaki ◽  
Yuki Matsumoto ◽  
Jaroslav Jelinek ◽  
Teita Ishizaki ◽  
Shingo Maeda ◽  
...  

Abstract Background: DNA methylation plays important roles in regulating gene expression and is involved in development and various diseases. DNA methylation has been well studied in humans and model organisms, but only limited data exist for companion animals such as dogs. Results: Using methylation-sensitive restriction enzyme-based next-generation sequencing (Canine DREAM), we obtained canine DNA methylation maps from 16 somatic tissues. In total, we evaluated 130,861 CpG sites. The majority of CpG sites were either highly methylated (>70%; 52.5%-64.6% of all CpG sites analyzed) or unmethylated (<30%; 22.5%-28.0% of all CpG sites analyzed), a methylation pattern similar to that of other species. The overall methylation status of CpG sites across the 32 methylomes was remarkably similar. However, the tissue types were clearly separated by principal component analysis and hierarchical clustering analysis of the DNA methylomes. We found 6416 CpG sites located close to gene promoters and an inverse correlation between DNA methylation and the expression of these genes. Conclusions: Our study provides a basic dataset of DNA methylation profiles in dogs.
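Two descriptive analyses in the abstract above, binning CpG sites by methylation level and checking for an inverse relationship between promoter methylation and expression, can be sketched as follows. The >70% and <30% cut-offs come from the abstract; the use of Spearman rank correlation, the function names, and the example values are assumptions for illustration.

```python
import math
import statistics


def methylation_classes(levels, high=70.0, low=30.0):
    """Fractions of CpG sites that are highly methylated (> high %), unmethylated (< low %), or in between."""
    n = len(levels)
    n_high = sum(1 for v in levels if v > high)
    n_low = sum(1 for v in levels if v < low)
    return {"high": n_high / n, "low": n_low / n, "intermediate": (n - n_high - n_low) / n}


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = statistics.fmean(rx), statistics.fmean(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    print(methylation_classes([85, 90, 10, 50, 75]))  # hypothetical CpG levels (%)
    # promoter methylation (%) vs. expression for one hypothetical gene across tissues
    print(spearman([10, 40, 80, 95], [9.0, 5.0, 2.0, 1.0]))  # -1.0
```

A rank correlation is used here because it only assumes a monotone, not linear, relationship; that choice is the sketch's, not necessarily the authors'.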


2020 ◽  
Author(s):  
Paras Garg ◽  
Alejandro Martin-Trujillo ◽  
Oscar L. Rodriguez ◽  
Scott J. Gies ◽  
Bharati Jadhav ◽  
...  

Variable Number Tandem Repeats (VNTRs) are composed of large tandemly repeated motifs, many of which are highly polymorphic in copy number. However, due to their large size and repetitive nature, they remain poorly studied. To investigate the regulatory potential of VNTRs, we used read-depth data from Illumina whole genome sequencing to perform association analysis between the copy number of ~70,000 VNTRs (motif size ≥10 bp) and both gene expression (404 samples in 48 tissues) and DNA methylation (235 samples in peripheral blood), identifying thousands of VNTRs that are associated with local gene expression (eVNTRs) and DNA methylation levels (mVNTRs). Using large-scale replication analysis in an independent cohort, we validated 73-80% of signals observed in the two discovery cohorts, providing robust evidence that these represent genuine associations. Further, conditional analysis indicated that many eVNTRs and mVNTRs act as QTLs independently of other local variation. We also observed strong enrichment of eVNTRs and mVNTRs for regulatory features such as enhancers and promoters. Using the Human Genome Diversity Panel, we defined sets of VNTRs that show highly divergent copy numbers among human populations, and showed that these are enriched for regulatory effects on gene expression and epigenetics and preferentially associate with genes that have been linked with human phenotypes through GWAS. Our study provides strong evidence supporting functional variation at thousands of VNTRs, and defines candidate sets of VNTRs whose copy number variation potentially plays a role in numerous human phenotypes.
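As a rough illustration of a single VNTR-gene association test from the abstract above, the sketch below correlates copy number with expression across samples and assesses significance by permutation. This is a swapped-in technique chosen for a stdlib-only example: the abstract does not state the authors' test, real QTL scans typically adjust for covariates (which this sketch omits), and the simulated data and function names are hypothetical. Both inputs must vary across samples, or the correlation is undefined.

```python
import math
import random
import statistics


def pearson(x, y):
    """Pearson correlation coefficient (inputs must not be constant)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def permutation_test(copy_number, expression, n_perm=2000, seed=0):
    """Observed r and two-sided permutation p-value for VNTR copy number vs. expression."""
    observed = pearson(copy_number, expression)
    rng = random.Random(seed)
    shuffled = list(expression)
    as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any copy-number/expression link
        if abs(pearson(copy_number, shuffled)) >= abs(observed):
            as_extreme += 1
    # the add-one correction keeps the p-value strictly positive
    return observed, (as_extreme + 1) / (n_perm + 1)


if __name__ == "__main__":
    rng = random.Random(1)
    cn = [rng.randint(2, 12) for _ in range(40)]       # simulated VNTR copy numbers
    expr = [0.5 * c + rng.gauss(0, 1.0) for c in cn]   # simulated expression levels
    r, p = permutation_test(cn, expr)
    print(f"r = {r:.2f}, permutation p = {p:.4f}")
```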


Circulation ◽  
2017 ◽  
Vol 135 (suppl_1) ◽  
Author(s):  
Xiaoling Wang ◽  
Yue Pan ◽  
Haidong Zhu ◽  
Guang Hao ◽  
Xin Wang ◽  
...  

Background: Several large-scale epigenome-wide association studies of obesity-related DNA methylation changes have been published, together identifying 46 CpG sites. These studies were conducted using leukocytes from middle-aged and older Caucasian and African American (AA) adults. To what extent these signals are independent of cell composition, and to what extent they may influence gene expression, has not been systematically investigated. Furthermore, the high prevalence of obesity comorbidities in middle-aged or older populations may mask or bias DNA methylation changes related to obesity itself. Methods: In this study of healthy AA youth and young adults, genome-wide DNA methylation data from leukocytes were obtained from three independent studies using the Infinium HumanMethylation450 BeadChip: the EpiGO study (96 obese cases vs. 92 lean controls, aged 14-21, 50% females, test of interest is obesity status), the LACHY study (284 participants from the general population, aged 14-18, 50% females, test of interest is BMI), and the Georgia Stress and Heart study (298 participants from the general population, aged 18-38, 52% females, test of interest is BMI). Genome-wide DNA methylation data from purified neutrophils, as well as genome-wide gene expression data from leukocytes using the Illumina HT12 V4 array, were also obtained for the EpiGO samples. Results: The meta-analysis of the 3 cohorts identified 76 obesity-related CpG sites in leukocytes with p < 1 × 10⁻⁷. Of the 46 previously identified CpG sites, 36 were replicated in this AA youth and young adult sample with the same direction of effect and p < 0.05. Of the 107 CpG sites, comprising the 36 replicated ones and the 71 newly identified ones, 71 (66%) had their relationship with obesity replicated in purified neutrophils (p < 0.05).
The analysis of the cis regulation of gene expression by the 107 CpG sites showed that 59 CpG sites had at least one gene within 250 kb that was differentially expressed between obese cases and lean controls. Furthermore, of these 59 CpG sites, 6 showed significant negative correlations and 1 showed a significant positive correlation with the differentially expressed genes. These CpG sites are located in the SOCS3, CISH, ABCG1, PIM3 and PTGDS genes. Conclusion: In this study of AA youth and young adults, we identified novel CpG sites associated with obesity and replicated the majority of the CpG sites previously identified in middle-aged and older adults. For the first time, we showed that the majority of the obesity-related CpG sites identified from leukocytes are not driven by cell composition, and we provided a direct link between DNA methylation, gene expression, and obesity status for 7 CpG sites in 5 genes.
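Finding genes within 250 kb of each CpG site, as in the cis analysis above, is an interval lookup. The sketch below indexes gene positions per chromosome and uses binary search (`bisect`) to return genes whose reference coordinate lies within ±250 kb of a CpG, then flags CpGs with at least one differentially expressed gene in that window. Using the transcription start site (TSS) as the reference coordinate is an assumption, and all coordinates, CpG IDs, and gene names in the example are hypothetical.

```python
import bisect


def build_gene_index(genes):
    """genes: iterable of (chrom, tss, name) -> {chrom: (sorted TSS list, names)}."""
    by_chrom = {}
    for chrom, tss, name in genes:
        by_chrom.setdefault(chrom, []).append((tss, name))
    index = {}
    for chrom, entries in by_chrom.items():
        entries.sort()
        index[chrom] = ([tss for tss, _ in entries], [name for _, name in entries])
    return index


def cis_genes(index, chrom, pos, window=250_000):
    """Names of genes whose TSS lies within +/- window bp of pos on the same chromosome."""
    if chrom not in index:
        return []
    starts, names = index[chrom]
    lo = bisect.bisect_left(starts, pos - window)
    hi = bisect.bisect_right(starts, pos + window)
    return names[lo:hi]


def cpgs_with_cis_de_gene(cpgs, index, de_genes, window=250_000):
    """IDs of CpGs with at least one differentially expressed gene in their cis window."""
    de = set(de_genes)
    return [cpg_id for cpg_id, chrom, pos in cpgs
            if any(g in de for g in cis_genes(index, chrom, pos, window))]


if __name__ == "__main__":
    index = build_gene_index([("chr1", 100_000, "G1"), ("chr1", 400_000, "G2"),
                              ("chr1", 900_000, "G3"), ("chr2", 150_000, "G4")])
    print(cis_genes(index, "chr1", 300_000))  # ['G1', 'G2']
    cpgs = [("cg_a", "chr1", 300_000), ("cg_b", "chr2", 500_000), ("cg_c", "chr1", 650_000)]
    print(cpgs_with_cis_de_gene(cpgs, index, de_genes={"G3"}))  # ['cg_c']
```

Sorting once per chromosome makes each lookup logarithmic in the number of genes, which matters when every CpG on an array is scanned.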


2020 ◽  
Vol 21 (12) ◽  
pp. 4476 ◽
Author(s):  
Marcela A S Pinhel ◽  
Natália Y Noronha ◽  
Carolina F Nicoletti ◽  
Vanessa AB Pereira ◽  
Bruno AP de Oliveira ◽  
...  

Weight regulation and the magnitude of weight loss after a Roux-en-Y gastric bypass (RYGB) can be genetically determined. DNA methylation patterns and the expression of some genes can be altered after weight loss interventions, including RYGB. The present study aimed to evaluate how the gene expression and DNA methylation of PIK3R1, an obesity- and insulin-related gene, change after RYGB. Blood samples were obtained from 13 women (35.9 ± 9.2 years) with severe obesity before and six months after the surgical procedure. Whole blood transcriptome and epigenomic patterns were assessed by microarray-based, genome-wide technologies. A total of 1966 differentially expressed genes were identified between the pre- and postoperative periods of RYGB. Among these, genes involved in obesity and insulin pathways were upregulated after surgery. The PIK3R1 gene was then selected for further RT-qPCR analysis and evaluation of cytosine-guanine dinucleotide (CpG) site methylation. We observed that PIK3R1 was upregulated, and six CpG sites were differentially methylated after bariatric surgery. In conclusion, we found that RYGB upregulates genes involved in obesity and insulin pathways.


2019 ◽  
Vol 35 (19) ◽  
pp. 3786-3793 ◽  
Author(s):  
Pietro Di Lena ◽  
Claudia Sala ◽  
Andrea Prodi ◽  
Christine Nardini

Abstract Motivation DNA methylation is a stable epigenetic mark with major implications in both physiological (development, aging) and pathological conditions (cancers and numerous diseases). Recent research involving methylation focuses on the development of molecular age estimation methods based on DNA methylation levels (mAge). An increasing number of studies indicate that divergences between mAge and chronological age may be associated with age-related diseases. Current advances in high-throughput technologies have allowed the characterization of DNA methylation levels throughout the human genome. However, experimental methylation profiles often contain multiple missing values that can affect the analysis of the data and also mAge estimation. Although several imputation methods exist, their major deficiency is the inability to cope with large datasets, such as DNA methylation chips. Specific methods for imputing missing methylation data are therefore needed. Results We present a simple and computationally efficient imputation method, methyLImp, based on linear regression. The rationale of the approach lies in the observation that methylation levels show a high degree of inter-sample correlation. We performed a comparative study of our approach with other imputation methods on DNA methylation data of healthy and disease samples from different tissues. Performance was assessed both in terms of imputation accuracy and in terms of the impact of imputed values on mAge estimation. Compared with existing methods, our linear regression model performs equally well or better, with good computational efficiency. The results of our analysis provide recommendations for accurate estimation of missing methylation values. Availability and implementation The R-package methyLImp is freely available at https://github.com/pdilena/methyLImp. Supplementary information Supplementary data are available at Bioinformatics online.
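The idea of exploiting inter-sample correlation for imputation, described in the abstract above, can be illustrated with a deliberately simplified sketch. It is not methyLImp's algorithm: it fills the missing beta values of one sample by fitting a simple linear regression on the most correlated fully observed reference sample, then clips predictions to [0, 1]. The function names and example values are hypothetical, and the reference values at the observed CpGs must not be constant.

```python
import math
import statistics


def _pearson(x, y):
    """Pearson correlation (inputs must not be constant)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def impute_sample(target, references):
    """Fill None entries of one sample's beta values from a correlated reference sample.

    target: list of beta values with None where missing.
    references: fully observed samples measured on the same CpGs, in the same order.
    """
    obs = [i for i, v in enumerate(target) if v is not None]
    t_obs = [target[i] for i in obs]
    # pick the reference sample most correlated with the target on observed CpGs
    best = max(references, key=lambda ref: _pearson([ref[i] for i in obs], t_obs))
    x_obs = [best[i] for i in obs]
    # ordinary least squares for target = a + b * reference
    mx, my = statistics.fmean(x_obs), statistics.fmean(t_obs)
    b = (sum((x - mx) * (y - my) for x, y in zip(x_obs, t_obs))
         / sum((x - mx) ** 2 for x in x_obs))
    a = my - b * mx
    # beta values live in [0, 1], so clip the predictions
    return [v if v is not None else min(1.0, max(0.0, a + b * best[i]))
            for i, v in enumerate(target)]


if __name__ == "__main__":
    ref1 = [0.10, 0.20, 0.50, 0.70, 0.90, 0.30]
    ref2 = [0.90, 0.10, 0.40, 0.20, 0.60, 0.80]
    target = [0.05 + 0.9 * v for v in ref1]
    target[2] = target[4] = None  # two missing CpGs
    print([round(v, 3) for v in impute_sample(target, [ref1, ref2])])
    # → [0.14, 0.23, 0.5, 0.68, 0.86, 0.32]
```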

