Predicting gene expression using DNA methylation in three human populations

PeerJ ◽  
2019 ◽  
Vol 7 ◽  
pp. e6757 ◽  
Author(s):  
Huan Zhong ◽  
Soyeon Kim ◽  
Degui Zhi ◽  
Xiangqin Cui

Background DNA methylation, an important epigenetic mark, is well known for its regulatory role in gene expression, especially its negative correlation with expression in the promoter region. However, its correlation with gene expression across the genome at the human population level has not been well studied. In particular, it is unclear whether the genome-wide DNA methylation profile of an individual can predict her or his gene expression profile. Previous studies were mostly limited to association analyses between single CpG site methylation and gene expression. It is not known whether DNA methylation of a gene has enough prediction power to serve as a surrogate for gene expression in existing human study cohorts that have DNA samples but not RNA samples.
Results We examined DNA methylation in the gene region for predicting gene expression across individuals in non-cancer tissues of three human population datasets: adipose tissue from the Multiple Tissue Human Expression Resource Projects (MuTHER), peripheral blood mononuclear cells (PBMC) from asthma and normal control study participants, and lymphoblastoid cell lines (LCL) from healthy individuals. Three prediction models were investigated: single linear regression, multiple linear regression, and least absolute shrinkage and selection operator (LASSO) penalized regression. Our results showed that LASSO regression has superior performance among these methods. However, the prediction power is generally low and varies across datasets. Only 30 and 42 genes were found to have cross-validation R² greater than 0.3 in the PBMC and adipose datasets, respectively. A substantially larger number of genes (258) were identified in the LCL dataset, which was generated from a more homogeneous cell line sample source. We also demonstrated that prediction power is better when no CpG probe is excluded because of cross-hybridization or SNP effects.
Conclusion In our analyses of three populations, DNA methylation of CpG sites in the gene region has limited prediction power for gene expression across individuals with linear regression models. The prediction power potentially varies depending on tissue, cell type, and data source. In our analyses, combining LASSO regression with all probes on the methylation array, without excluding any, provides the best prediction of gene expression.
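
The per-gene workflow described in the abstract (fit a LASSO model on the CpG methylation values of a gene region, then score it by cross-validated R²) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the data are simulated, the penalty `lam` is picked by hand rather than tuned, R² is taken as 1 - SS_res/SS_tot on out-of-fold predictions (the paper may define it differently), and all function names are hypothetical.

```python
import random

def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the LASSO update step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def fit_lasso(X, y, lam, n_iter=100):
    """Coordinate descent for (1/2n)*RSS + lam*sum|beta_j|, intercept unpenalized."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    resid = [v - y_mean for v in y]  # residuals for beta = 0
    col_sq = [sum(Xc[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # constant probe carries no information
            # Covariance of probe j with the partial residual (beta_j removed).
            rho = sum(Xc[i][j] * (resid[i] + Xc[i][j] * beta[j]) for i in range(n)) / n
            new_b = soft_threshold(rho, lam) / col_sq[j]
            delta = new_b - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= Xc[i][j] * delta
                beta[j] = new_b
    intercept = y_mean - sum(b * m for b, m in zip(beta, x_mean))
    return intercept, beta

def predict(intercept, beta, X):
    return [intercept + sum(b * v for b, v in zip(beta, row)) for row in X]

def cv_r2(X, y, lam, k=5, seed=0):
    """Cross-validated R² computed from out-of-fold predictions."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    preds = [0.0] * len(y)
    for f in range(k):
        test = idx[f::k]
        held = set(test)
        train = [i for i in idx if i not in held]
        b0, b = fit_lasso([X[i] for i in train], [y[i] for i in train], lam)
        for i, value in zip(test, predict(b0, b, [X[i] for i in test])):
            preds[i] = value
    y_mean = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, preds))
    ss_tot = sum((a - y_mean) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot

def simulate(n=120, p=10, seed=1):
    """Simulated gene: 10 CpG beta values in [0, 1]; only probe 0 affects expression."""
    rng = random.Random(seed)
    X = [[rng.random() for _ in range(p)] for _ in range(n)]
    y = [5.0 - 3.0 * row[0] + rng.gauss(0.0, 0.3) for row in X]
    return X, y

X, y = simulate()
intercept, beta = fit_lasso(X, y, lam=0.02)
r2 = cv_r2(X, y, lam=0.02)
```

In a real analysis this would be repeated for every gene, and genes would then be counted against a threshold such as the R² > 0.3 cut-off quoted above. The simulated gene is deliberately easy to predict, which the abstract suggests is the exception rather than the rule.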

2018 ◽  
Author(s):  
Huan Zhong ◽  
Soyeon Kim ◽  
Degui Zhi ◽  
Xiangqin Cui

Background. DNA methylation, an important epigenetic mark, is well known for its regulatory role in gene expression, especially negative regulation in the promoter region. However, its correlation with gene expression at the population level has not been well studied. In particular, it is unclear whether the genome-wide DNA methylation profile of an individual can predict her or his gene expression profile. Previous studies were mostly limited to association analyses between single CpG site methylation and gene expression. It is not known whether DNA methylation of a gene has enough prediction power to serve as a surrogate for gene expression in existing human study cohorts that have DNA samples but not RNA samples. Results. We studied two human population datasets, adipose tissue from the Multiple Tissue Human Expression Resource Projects (MuTHER) and peripheral blood mononuclear cells (PBMC) from people with asthma and normal controls, to predict gene expression using methylation of all CpG sites in the gene region. Three prediction models were investigated: single linear regression, multiple linear regression, and least absolute shrinkage and selection operator (LASSO) penalized regression. Our results showed that LASSO regression has superior performance among these methods. However, even with LASSO regression, prediction R² was very small for the majority of genes, and only about one thousand genes had prediction R² greater than 0.1. GO term and pathway analyses of these more predictable genes showed that they are enriched for immune and defense genes. Conclusion. In human populations, DNA methylation of CpG sites in the gene region has weak prediction power for gene expression. The relatively more predictable genes tend to be defense and immune genes.


2017 ◽  
Vol 121 (suppl_1) ◽  
Author(s):  
Mark E Pepin ◽  
David K Crossman ◽  
Joseph P Barchue ◽  
Salpy V Pamboukian ◽  
Steven M Pogwizd ◽  
...  

To identify the role of glucose in the development of diabetic cardiomyopathy, we directly assessed the effect of glucose delivery to the intact heart on DNA methylation and gene expression using both an inducible heart-specific transgene (glucose transporter 4; mG4H) and streptozotocin-induced diabetes (STZ) mouse models. We aimed to determine whether long-lasting diabetic complications arise from prior transient exposure to hyperglycemia via a process termed “glycemic memory.” We identified DNA methylation changes associated with significant gene expression regulation. Comparing our results from STZ, mG4H, and the modifications that persist following transgene silencing, we now provide evidence for cardiac DNA methylation as a persistent epigenetic mark contributing to glycemic memory. To begin to determine which changes contribute to human heart failure, we measured both RNA transcript levels and whole-genome DNA methylation in heart failure biopsy samples (n = 12) from male patients, collected at left ventricular assist device placement, using RNA-sequencing and the Methylation450 assay, respectively. We hypothesized that epigenetic changes such as DNA methylation distinguish between heart failure etiologies. Our findings demonstrated that type 2 diabetic heart failure patients (n = 6) had an overall signature of hypomethylation, whereas patients listed as ischemic (n = 5) had a distinct hypermethylation signature for regulated transcripts. The focus of this initial analysis was on promoter-associated CpG islands with inverse changes in gene transcript levels, from which diabetes-specific (14 genes; e.g. IGFBP4) and ischemia-specific (12 genes; e.g. PFKFB3) targets emerged with significant regulation of both measures. By combining our mouse and human molecular analyses, we provide evidence that diabetes mellitus governs direct regulation of cellular function by DNA methylation and the corresponding gene expression in diabetic mouse and human hearts.
Importantly, many of the changes seen in either mouse type 1 diabetes or human type 2 diabetes were similar, supporting a consistent mechanism of regulation. These studies are some of the first steps toward defining mechanisms of epigenetic regulation in diabetic cardiomyopathy.


2020 ◽  
Author(s):  
Paras Garg ◽  
Alejandro Martin-Trujillo ◽  
Oscar L. Rodriguez ◽  
Scott J. Gies ◽  
Bharati Jadhav ◽  
...  

Abstract Variable Number Tandem Repeats (VNTRs) are composed of large tandemly repeated motifs, many of which are highly polymorphic in copy number. However, due to their large size and repetitive nature, they remain poorly studied. To investigate the regulatory potential of VNTRs, we used read-depth data from Illumina whole genome sequencing to perform association analysis between copy number of ~70,000 VNTRs (motif size ≥10 bp) and both gene expression (404 samples in 48 tissues) and DNA methylation (235 samples in peripheral blood), identifying thousands of VNTRs that are associated with local gene expression (eVNTRs) and DNA methylation levels (mVNTRs). Using large-scale replication analysis in an independent cohort, we validated 73-80% of signals observed in the two discovery cohorts, providing robust evidence that these represent genuine associations. Further, conditional analysis indicated that many eVNTRs and mVNTRs act as QTLs independently of other local variation. We also observed strong enrichments of eVNTRs and mVNTRs for regulatory features such as enhancers and promoters. Using the Human Genome Diversity Panel, we defined sets of VNTRs that show highly divergent copy numbers among human populations, showed that these are enriched for regulatory effects on gene expression and epigenetics, and found that they preferentially associate with genes that have been linked with human phenotypes through GWAS. Our study provides strong evidence supporting functional variation at thousands of VNTRs, and defines candidate sets of VNTRs whose copy number variation potentially plays a role in numerous human phenotypes.


2019 ◽  
Vol 35 (19) ◽  
pp. 3786-3793 ◽  
Author(s):  
Pietro Di Lena ◽  
Claudia Sala ◽  
Andrea Prodi ◽  
Christine Nardini

Abstract Motivation DNA methylation is a stable epigenetic mark with major implications in both physiological (development, aging) and pathological conditions (cancers and numerous diseases). Recent research involving methylation focuses on the development of molecular age estimation methods based on DNA methylation levels (mAge). An increasing number of studies indicate that divergences between mAge and chronological age may be associated with age-related diseases. Current advances in high-throughput technologies have allowed the characterization of DNA methylation levels throughout the human genome. However, experimental methylation profiles often contain multiple missing values that can affect the analysis of the data and also mAge estimation. Although several imputation methods exist, a major deficiency lies in their inability to cope with large datasets, such as DNA methylation chips. Specific methods for imputing missing methylation data are therefore needed. Results We present a simple and computationally efficient imputation method, methyLImp, based on linear regression. The rationale of the approach lies in the observation that methylation levels show a high degree of inter-sample correlation. We performed a comparative study of our approach with other imputation methods on DNA methylation data of healthy and disease samples from different tissues. Performance was assessed both in terms of imputation accuracy and in terms of the impact imputed values have on mAge estimation. In comparison to existing methods, our linear regression model performs equally well or better, with good computational efficiency. The results of our analysis provide recommendations for accurate estimation of missing methylation values. Availability and implementation The R-package methyLImp is freely available at https://github.com/pdilena/methyLImp. Supplementary information Supplementary data are available at Bioinformatics online.
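
The general idea of regression-based imputation can be illustrated with a deliberately simplified sketch. It is not the methyLImp algorithm and does not use the package's API: it regresses one probe with missing values on a single fully observed probe, the function name `impute_by_regression` is hypothetical, and the toy values are made up so that the two probes are exactly linearly related.

```python
def impute_by_regression(target, predictor):
    """Fill None entries of `target` from a least-squares fit on `predictor`.

    Both lists hold methylation beta values for the same samples, in the
    same order; `predictor` must be fully observed.
    """
    pairs = [(x, t) for x, t in zip(predictor, target) if t is not None]
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_t = sum(t for _, t in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxt = sum((x - mean_x) * (t - mean_t) for x, t in pairs)
    slope = sxt / sxx if sxx else 0.0  # fall back to the mean if x is constant
    intercept = mean_t - slope * mean_x
    filled = []
    for x, t in zip(predictor, target):
        if t is None:
            estimate = intercept + slope * x
            t = min(1.0, max(0.0, estimate))  # keep the value a valid beta value
        filled.append(t)
    return filled

# Toy data: probe_b equals 0.05 + 1.5 * probe_a wherever it is observed.
probe_a = [0.10, 0.20, 0.30, 0.40, 0.50]
probe_b = [0.20, 0.35, None, 0.65, None]
filled = impute_by_regression(probe_b, probe_a)
```

A practical method would draw on many observed probes at once rather than one, which is where the computational cost on array-scale data comes from.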


2019 ◽  
Vol 116 (14) ◽  
pp. 6938-6943 ◽  
Author(s):  
Alain Pacis ◽  
Florence Mailhot-Léonard ◽  
Ludovic Tailleux ◽  
Haley E. Randolph ◽  
Vania Yotova ◽  
...  

DNA methylation is considered to be a relatively stable epigenetic mark. However, a growing body of evidence indicates that DNA methylation levels can change rapidly, for example in innate immune cells facing an infectious agent. Nevertheless, the causal relationship between changes in DNA methylation and gene expression during infection remains to be elucidated. Here, we generated time-course data on DNA methylation, gene expression, and chromatin accessibility patterns during infection of human dendritic cells with Mycobacterium tuberculosis. We found that the immune response to infection is accompanied by active demethylation of thousands of CpG sites overlapping distal enhancer elements. However, virtually all changes in gene expression in response to infection occur before detectable changes in DNA methylation, indicating that the observed losses in methylation are a downstream consequence of transcriptional activation. Footprinting analysis revealed that immune-related transcription factors (TFs), such as NF-κB/Rel, are recruited to enhancer elements before the observed losses in methylation, suggesting that DNA demethylation is mediated by TF binding to cis-acting elements. Collectively, our results show that DNA demethylation plays a limited role in the establishment of the core regulatory program engaged upon infection.


Author(s):  
Chubing Zeng ◽  
Duncan Campbell Thomas ◽  
Juan Pablo Lewinger

Abstract Motivation Genomic features used in statistical modeling of health outcomes, such as gene expression, methylation, and genotypes, are accompanied by a rich set of meta-features, such as functional annotations, pathway information, and knowledge from previous studies, that can be used post hoc to facilitate the interpretation of a model. However, using this meta-feature information a priori rather than post hoc can yield improved prediction performance as well as enhanced model interpretation. Results We propose a new penalized regression approach that allows a priori integration of external meta-features. The method extends LASSO regression by incorporating individualized penalty parameters for each regression coefficient. The penalty parameters are in turn modeled as a log-linear function of the meta-features and are estimated from the data using an approximate empirical Bayes approach. Optimization of the marginal likelihood on which the empirical Bayes estimation is based is performed using a fast and stable majorization-minimization procedure. Through simulations, we show that the proposed regression with individualized penalties can outperform the standard LASSO in terms of both parameter estimation and prediction performance when the external data is informative. We further demonstrate our approach with applications to gene expression studies of bone density and breast cancer. Availability and implementation The methods have been implemented in the R package xtune, freely available for download from CRAN.
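
The effect of individualized penalties that are log-linear in the meta-features can be seen in a small sketch. It rests on strong assumptions and is not the xtune implementation: the meta-feature coefficients `alpha` are picked by hand (the abstract estimates them by empirical Bayes), the design is assumed orthonormal so that each LASSO coefficient is simply the soft-thresholded least-squares estimate, and all names and numbers are hypothetical.

```python
import math

def penalties_from_meta(meta, alpha):
    """Per-feature penalty lambda_j = exp(alpha . z_j), log-linear in meta-features z_j."""
    return [math.exp(sum(a * z for a, z in zip(alpha, row))) for row in meta]

def soft_threshold(value, threshold):
    """LASSO solution for one coefficient under an orthonormal design."""
    return math.copysign(max(abs(value) - threshold, 0.0), value)

# Least-squares estimates for three features (made-up numbers).
ols = [0.8, 0.8, -0.3]

# Meta-features: an intercept column plus a flag such as
# "feature lies in an annotated regulatory region" (hypothetical).
meta = [[1.0, 1.0],
        [1.0, 0.0],
        [1.0, 0.0]]

# Hand-picked for illustration: a negative second entry means
# flagged features receive a smaller penalty.
alpha = [-1.0, -2.0]

lambdas = penalties_from_meta(meta, alpha)
shrunk = [soft_threshold(b, lam) for b, lam in zip(ols, lambdas)]
```

Features 0 and 1 start from the same estimate, but the flagged feature is shrunk less, and the weak unflagged feature is set to zero. A standard LASSO with one shared penalty could not treat the first two features differently.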


2020 ◽  
Author(s):  
Sean K. Maden ◽  
Reid F. Thompson ◽  
Kasper D. Hansen ◽  
Abhinav Nellore

Abstract While DNA methylation (DNAm) is the most-studied epigenetic mark, few recent studies probe the breadth of publicly available DNAm array samples. We collectively analyzed 35,360 Illumina Infinium HumanMethylation450K DNAm array samples published on the Gene Expression Omnibus (GEO). We learned a controlled vocabulary of sample labels by applying regular expressions to metadata and used existing models to predict various sample properties including epigenetic age. We found approximately two-thirds of samples were from blood, one-quarter were from brain, and one-third were from cancer patients. 19% of samples failed at least one of Illumina’s 17 prescribed quality assessments; signal distributions across samples suggest modifying manufacturer-recommended thresholds for failure would make these assessments more informative. We further analyzed DNAm variances in seven tissues (adipose, nasal, blood, brain, buccal, sperm, and liver) and characterized specific probes distinguishing them. Finally, we compiled DNAm array data and metadata, including our learned and predicted sample labels, into database files accessible via the recountmethylation R/Bioconductor companion package. Its vignettes walk the user through some analyses contained in this paper.
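
Mapping free-text sample metadata onto a controlled vocabulary with regular expressions might look like the sketch below. The patterns, labels, and example descriptions are hypothetical and much cruder than anything a study of this size would need. They are not taken from the paper or from the recountmethylation package.

```python
import re

# Hypothetical controlled vocabulary: label -> pattern matched against free text.
TISSUE_PATTERNS = {
    "blood": re.compile(r"\b(blood|pbmc|leukocyte)s?\b", re.IGNORECASE),
    "brain": re.compile(r"\b(brain|cortex|cerebellum)\b", re.IGNORECASE),
    "adipose": re.compile(r"\b(adipose|fat)\b", re.IGNORECASE),
}

def label_sample(description):
    """Return the sorted vocabulary labels whose pattern matches the description."""
    return sorted(
        label
        for label, pattern in TISSUE_PATTERNS.items()
        if pattern.search(description)
    )

blood_labels = label_sample("Peripheral blood mononuclear cells (PBMCs), donor 12")
brain_labels = label_sample("frontal cortex, male, 64y")
no_labels = label_sample("HeLa cell line")
```

Samples that match no pattern, like the cell line here, would need either more patterns or manual review.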


2021 ◽  
Author(s):  
Kurosh S Mehershahi ◽  
Swaine Chen

DNA methylation is a common epigenetic mark that influences transcriptional regulation, and therefore cellular phenotype, across all domains of life, extending also to bacterial virulence. Both orphan methyltransferases and those from restriction modification systems (RMSs) have been co-opted to regulate virulence epigenetically in many bacteria. However, the potential regulatory role of DNA methylation mediated by archetypal Type I systems in Escherichia coli has never been studied. We demonstrated that removal of DNA methylation mediated by three different Escherichia coli Type I RMSs in three distinct E. coli strains had no detectable effect on gene expression or growth in a screen of 1190 conditions. Additionally, deletion of the Type I RMS EcoUTI in UTI89, a prototypical cystitis strain of E. coli, which led to loss of methylation at >750 sites across the genome, had no detectable effect on virulence in a murine model of ascending urinary tract infection (UTI). Finally, introduction of two heterologous Type I RMSs into UTI89 also resulted in no detectable change in gene expression or growth phenotypes. These results stand in sharp contrast with many reports of RMSs regulating gene expression in other bacteria, leading us to propose the concept of “regulation avoidance” for these E. coli Type I RMSs. We hypothesize that regulation avoidance is a consequence of evolutionary adaptation of both the RMSs and the E. coli genome. Our results provide a clear and (currently) rare example of regulation avoidance for Type I RMSs in multiple strains of E. coli, further study of which may provide deeper insights into the evolution of gene regulation and horizontal gene transfer.

