Human Age Prediction Based on DNA Methylation Using a Gradient Boosting Regressor

Genes ◽  
2018 ◽  
Vol 9 (9) ◽  
pp. 424 ◽  
Author(s):  
Xingyan Li ◽  
Weidong Li ◽  
Yan Xu

All tissues of an organism age over time. In recent years, epigenetic investigations have found a close correlation between DNA methylation and aging. As DNA methylation research has developed, a quantitative statistical relationship between DNA methylation and age has been established from the way methylation changes with age, making it possible to predict the age of individuals. All the data in this work were retrieved from the Illumina HumanMethylation BeadChip platform (27K or 450K). We analyzed 16 sets of healthy samples and 9 sets of diseased samples. The healthy samples included a total of 1899 publicly available blood samples (0–103 years old) and the diseased samples included 2395 blood samples. Six age-related CpG sites were selected by calculating Pearson correlation coefficients between age and DNA methylation values. We built a gradient boosting regressor model on these age-related CpG sites. In each dataset, 70% of the data were randomly selected as training data and the other 30% as independent data, for 25 runs in total. For the healthy samples, the correlation between predicted age and DNA methylation was 0.97 in the training dataset, and the mean absolute deviation (MAD) was 2.72 years. In the independent dataset, the MAD was 4.06 years. The proposed model was further tested using the diseased samples, where the MAD was 5.44 years for the training dataset and 7.08 years for the independent dataset. Furthermore, our model worked well when applied to saliva samples. These results show that age prediction based on six DNA methylation markers is very effective with the gradient boosting regressor.
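The pipeline described in this abstract (Pearson-based site selection, a 70/30 split, gradient boosting, MAD) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the data are synthetic, the boosting uses hand-written one-split stumps rather than the authors' regressor, and names such as `fit_gbr` and `selected` are hypothetical. Selection is done on the training split only, which is a choice made here and not something the abstract specifies.

```python
import random

random.seed(0)

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5 if vx > 0 and vy > 0 else 0.0

def fit_stump(X, residuals):
    """Find the single split (feature, threshold) with the lowest squared error."""
    best = (float("inf"), 0, 0.0, 0.0, 0.0)
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for t in values[:-1]:  # skip the maximum so the right side is never empty
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            ml, mr = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - ml) ** 2 for r in left) + sum((r - mr) ** 2 for r in right)
            if sse < best[0]:
                best = (sse, j, t, ml, mr)
    return best[1:]

def fit_gbr(X, y, n_rounds=40, lr=0.3):
    """Gradient boosting for squared loss: each stump fits the current residuals."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        j, t, ml, mr = fit_stump(X, residuals)
        stumps.append((j, t, ml, mr))
        pred = [pi + lr * (ml if row[j] <= t else mr) for row, pi in zip(X, pred)]
    return base, lr, stumps

def predict(model, row):
    base, lr, stumps = model
    return base + lr * sum(ml if row[j] <= t else mr for j, t, ml, mr in stumps)

def mad(y_true, y_pred):
    """Mean absolute deviation between true and predicted ages, in years."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)

def synthetic_sample(age):
    # Two hypothetical age-related sites plus four pure-noise sites.
    related = [0.2 + 0.006 * age + random.gauss(0, 0.03),
               0.9 - 0.005 * age + random.gauss(0, 0.03)]
    return related + [random.random() for _ in range(4)]

data = [(synthetic_sample(a), a) for a in (random.uniform(0, 100) for _ in range(100))]
random.shuffle(data)
train, test = data[:70], data[70:]  # 70% training, 30% independent

y_train = [age for _, age in train]
y_test = [age for _, age in test]

# Rank sites by |r| with age on the training split and keep the top two.
scores = [abs(pearson([x[j] for x, _ in train], y_train)) for j in range(6)]
selected = sorted(range(6), key=lambda j: scores[j], reverse=True)[:2]

X_train = [[x[j] for j in selected] for x, _ in train]
X_test = [[x[j] for j in selected] for x, _ in test]

model = fit_gbr(X_train, y_train)
mad_test = mad(y_test, [predict(model, row) for row in X_test])
mad_baseline = mad(y_test, [model[0]] * len(y_test))  # always predict the mean age
print("selected sites:", sorted(selected))
print("independent-set MAD:", round(mad_test, 2), "baseline:", round(mad_baseline, 2))
```

The baseline that always predicts the mean training age gives a reference point for judging whether the selected sites carry any age signal at all.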

Genes ◽  
2021 ◽  
Vol 12 (6) ◽  
pp. 870
Author(s):  
Jiansheng Zhang ◽  
Hongli Fu ◽  
Yan Xu

In recent years, epigenetic studies have found a close correlation between DNA methylation and aging. With in-depth research in the field of DNA methylation, researchers have established quantitative statistical relationships to predict the age of individuals. This work used human blood samples to study the association between age and DNA methylation. We built two predictors, based on healthy and disease data, respectively. For the healthy data, we retrieved a total of 1191 samples from four previous reports. By calculating the Pearson correlation coefficient between age and DNA methylation values, 111 age-related CpG sites were selected. Gradient boosting regression was used to build the predictive model, which obtained an R2 value of 0.86 and a MAD of 3.90 years on the testing dataset, better than four other regression methods as well as Horvath’s results. For the disease data, 354 rheumatoid arthritis samples were retrieved from a previous study. Then, 45 CpG sites were selected to build the predictor; the corresponding MAD and R2 were 3.11 years and 0.89, respectively, on the testing dataset, which showed the robustness of our predictor. These results were also better than those from four other regression methods. Finally, we analyzed the twenty-four CpG sites common to the healthy and disease datasets, which illustrated the functional relevance of the selected CpG sites.
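The two reported metrics can be made concrete with a small worked example. This sketch assumes R2 means the coefficient of determination (one minus the ratio of residual to total sum of squares); the abstract does not say whether the authors used that definition or a squared Pearson correlation. The ages and predictions are invented for illustration.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def mean_absolute_deviation(y_true, y_pred):
    """Average absolute gap between chronological and predicted age."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical chronological ages and model predictions, in years.
ages = [20.0, 40.0, 60.0, 80.0]
predicted = [24.0, 38.0, 63.0, 75.0]

r2 = r_squared(ages, predicted)                 # 1 - 54 / 2000
mad = mean_absolute_deviation(ages, predicted)  # (4 + 2 + 3 + 5) / 4
print(round(r2, 3), mad)
```

MAD is in years and is easy to interpret directly, while R2 depends on the age spread of the cohort, so the two are best read together when comparing predictors trained on different datasets.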


Genes ◽  
2019 ◽  
Vol 10 (12) ◽  
pp. 969
Author(s):  
Zahra Momeni ◽  
Mohammad Saniee Abadeh

Genomic biomarkers such as DNA methylation (DNAm) are employed for age prediction. In recent years, several studies have suggested the association between changes in DNAm and its effect on human age. The high dimensional nature of this type of data significantly increases the execution time of modeling algorithms. To mitigate this problem, we propose a two-stage parallel algorithm for selection of age-related CpG-sites. The algorithm first attempts to cluster the data into similar age ranges. In the next stage, a parallel genetic algorithm (PGA), based on the MapReduce paradigm (MR-based PGA), is used for selecting age-related features of each individual age range. In the proposed method, the execution of the algorithm for each age range (data parallel), the evaluation of chromosomes (task parallel) and the calculation of the fitness function (data parallel) are performed using a novel parallel framework. In this paper, we consider 16 different healthy DNAm datasets that are related to the human blood tissue and that contain the relevant age information. These datasets are combined into a single unioned set, which is in turn randomly divided into two sets of train and test data with a ratio of 7:3, respectively. We build a Gradient Boosting Regressor (GBR) model on the selected CpG-sites from the train set. To evaluate the model accuracy, we compared our results with state-of-the-art approaches that used these datasets, and observed that our method performs better on the unseen test dataset with a Mean Absolute Deviation (MAD) of 3.62 years, and a correlation (R2) of 95.96% between age and DNAm. In the train data, the MAD and R2 are 1.27 years and 99.27%, respectively. Finally, we evaluate our method in terms of the effect of parallelization in computation time. The algorithm without parallelization requires 4123 min to complete, whereas the parallelized execution on 3 computing machines having 32 processing cores each, only takes a total of 58 min. This shows that our proposed algorithm is both efficient and scalable.
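The timing figures at the end of this abstract imply a speedup and a parallel efficiency, which can be worked out directly. This sketch assumes the 4123 min baseline ran on a single core of comparable hardware, which the abstract does not state; the variable names are illustrative.

```python
# Figures quoted in the abstract.
serial_minutes = 4123    # run without parallelization
parallel_minutes = 58    # run on the parallel framework
cores = 3 * 32           # 3 machines with 32 processing cores each

# Speedup: how many times faster the parallel run is.
speedup = serial_minutes / parallel_minutes

# Efficiency: fraction of ideal linear speedup achieved per core
# (only meaningful under the single-core-baseline assumption above).
efficiency = speedup / cores

print(f"speedup ~ {speedup:.1f}x on {cores} cores, efficiency ~ {efficiency:.2f}")
```

Under that assumption the run is roughly 71 times faster on 96 cores, or about 74% of ideal linear scaling.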


2021 ◽  
Author(s):  
Lucas Paulo de Lima ◽  
Louis R Lapierre ◽  
Ritambhara Singh

Several age predictors based on DNA methylation, dubbed epigenetic clocks, have been created in recent years. Their accuracy and potential for generalization vary widely based on the training data. Here, we gathered 143 publicly available data sets from several human tissues to develop AltumAge, a highly accurate and precise age predictor based on deep learning. Compared to Horvath's 2013 model, AltumAge performs better across both normal and malignant tissues and is more generalizable to new data sets. Interestingly, it can predict gestational week from placental tissue with low error. Lastly, we used deep learning interpretation methods to learn which methylation sites contributed to the final model predictions. We observed that while most important CpG sites are linearly related to age, some highly interacting CpG sites can influence the relevance of such relationships. We studied the associated genes of these CpG sites and found evidence in the literature of their involvement in age-related gene regulation. Using chromatin annotations, we observed that the CpG sites with the highest contribution to the model predictions were related to heterochromatin and gene regulatory regions in the genome. We also found age-related KEGG pathways for genes containing these CpG sites. In general, neural networks are better predictors due to their ability to capture complex feature interactions compared to the typically used regularized linear regression. Altogether, our neural network approach provides significant improvement and flexibility to current epigenetic clocks without sacrificing model interpretability.


2019 ◽  
Vol 117 (38) ◽  
pp. 23329-23335 ◽  
Author(s):  
Lisa M. McEwen ◽  
Kieran J. O’Donnell ◽  
Megan G. McGill ◽  
Rachel D. Edgar ◽  
Meaghan J. Jones ◽  
...  

The development of biological markers of aging has primarily focused on adult samples. Epigenetic clocks are a promising tool for measuring biological age that show impressive accuracy across most tissues and age ranges. In adults, deviations from the DNA methylation (DNAm) age prediction are correlated with several age-related phenotypes, such as mortality and frailty. In children, however, fewer such associations have been made, possibly because DNAm changes are more dynamic in pediatric populations as compared to adults. To address this gap, we aimed to develop a highly accurate, noninvasive, biological measure of age specific to pediatric samples using buccal epithelial cell DNAm. We gathered 1,721 genome-wide DNAm profiles from 11 different cohorts of typically developing individuals aged 0 to 20 y. Elastic net penalized regression was used to select 94 CpG sites from a training dataset (n = 1,032), with performance assessed in a separate test dataset (n = 689). DNAm at these 94 CpG sites was highly predictive of age in the test cohort (median absolute error = 0.35 y). The Pediatric-Buccal-Epigenetic (PedBE) clock was characterized in additional cohorts, showcasing the accuracy in longitudinal data, the performance in nonbuccal tissues and adult age ranges, and the association with obstetric outcomes. The PedBE tool for measuring biological age in children might help in understanding the environmental and contextual factors that shape the DNA methylome during child development, and how it, in turn, might relate to child health and disease.
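How an elastic net penalty ends up selecting a subset of CpG sites can be shown with a small coordinate descent sketch. This is an illustration, not the PedBE implementation: it assumes the common parameterization that minimizes (1/2n)||y - Xb||^2 + lam * (alpha * ||b||_1 + (1 - alpha)/2 * ||b||_2^2), uses centred toy data with no intercept, and the values of `lam` and `alpha` are arbitrary.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return exactly 0.0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def elastic_net(X, y, lam, alpha, n_iter=100):
    """Cyclic coordinate descent for the elastic net (X columns and y centred)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of feature j with the residual that ignores feature j.
            rho = sum(
                X[i][j] * (y[i] - sum(X[i][k] * beta[k] for k in range(p) if k != j))
                for i in range(n)
            ) / n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            # L1 part zeroes weak features, L2 part shrinks the survivors.
            beta[j] = soft_threshold(rho, lam * alpha) / (z + lam * (1 - alpha))
    return beta

# Toy centred "methylation" features: site 0 tracks age strongly, site 1 weakly.
site0 = [-2.0, -1.0, 0.0, 1.0, 2.0]
site1 = [1.0, -2.0, 2.0, -2.0, 1.0]
X = [[a, b] for a, b in zip(site0, site1)]
age = [3.0 * a + 0.1 * b for a, b in zip(site0, site1)]  # centred age

beta = elastic_net(X, age, lam=1.0, alpha=0.5)
print(beta)  # the weak site is dropped, the strong site is kept but shrunk
```

In this toy case the weakly related site receives a coefficient of exactly zero, which is the mechanism by which a penalized fit over many candidate sites can return a short list such as the 94 reported here.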


2020 ◽  
Vol 27 ◽  
Author(s):  
Zaheer Ullah Khan ◽  
Dechang Pi

Background: S-sulfenylation (S-sulphenylation, or sulfenic acid) of proteins is a special kind of post-translational modification that plays an important role in various physiological and pathological processes such as cytokine signaling, transcriptional regulation, and apoptosis. Given this significance, and to complement existing wet-lab methods, several computational models have been developed for the prediction of sulfenylation cysteine sites. However, the performance of these models was not satisfactory due to inefficient feature schemes, severe class imbalance, and the lack of an intelligent learning engine. Objective: In this study, our motivation is to establish a strong and novel computational predictor for discriminating sulfenylation from non-sulfenylation sites. Methods: We report an innovative bioinformatics feature encoding tool, named DeepSSPred, in which the encoded features are obtained via an n-segmented hybrid feature scheme; the resampling technique called synthetic minority oversampling (SMOTE) was then employed to cope with the severe imbalance between SC-sites (minority class) and non-SC sites (majority class). A state-of-the-art 2D convolutional neural network was employed, with a rigorous 10-fold jackknife cross-validation technique for model validation and authentication. Results: The proposed framework, with a strong discrete representation of the feature space, a machine learning engine, and an unbiased presentation of the underlying training data, yielded an excellent model that outperforms all existing established studies. The proposed approach is 6% higher in terms of MCC than the first best. On an independent dataset, the existing first best study failed to provide sufficient details. The model obtained an increase of 7.5% in accuracy, 1.22% in Sn, 12.91% in Sp and 13.12% in MCC on the training data and 12.13% in ACC, 27.25% in Sn, 2.25% in Sp, and 30.37% in MCC on an independent dataset in comparison with the second best method. These empirical analyses show the superior performance of the proposed model on both the training and independent datasets in comparison with existing studies in the literature. Conclusion: In this research, we have developed a novel sequence-based automated predictor for SC-sites, called DeepSSPred. The empirical simulation outcomes with a training dataset and an independent validation dataset have revealed the efficacy of the proposed theoretical model. The good performance of DeepSSPred is due to several factors, such as novel discriminative feature encoding schemes, the SMOTE technique, and careful construction of the prediction model through the tuned 2D-CNN classifier. We believe that our research work will provide potential insight for further prediction of S-sulfenylation characteristics and functionalities. Thus, we hope that our predictor will be significantly helpful for large-scale discrimination of unknown SC-sites in particular and for designing new pharmaceutical drugs in general.
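The four metrics quoted in this abstract (ACC, Sn, Sp, MCC) all derive from a confusion matrix, and a short sketch shows why MCC matters when classes are imbalanced. The counts below are invented for illustration and are not taken from the DeepSSPred study; `classification_metrics` is a hypothetical helper name.

```python
import math

def classification_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, specificity and Matthews correlation coefficient."""
    acc = (tp + tn) / (tp + fp + tn + fn)
    sn = tp / (tp + fn)   # sensitivity: fraction of true sites recovered
    sp = tn / (tn + fp)   # specificity: fraction of non-sites rejected
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"ACC": acc, "Sn": sn, "Sp": sp, "MCC": mcc}

# Hypothetical imbalanced test set: 50 positive sites, 950 negative sites.
metrics = classification_metrics(tp=40, fp=50, tn=900, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these made-up counts the accuracy is 0.94 while the MCC is only about 0.57, because the large negative class dominates accuracy. That gap is one reason studies on imbalanced site-prediction data tend to report MCC alongside accuracy.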


2021 ◽  
Vol 18 (1) ◽  
Author(s):  
Katherine R. Dobbs ◽  
Paula Embury ◽  
Emmily Koech ◽  
Sidney Ogolla ◽  
Stephen Munga ◽  
...  

Background: Age-related changes in adaptive and innate immune cells have been associated with a decline in effective immunity and chronic, low-grade inflammation. Epigenetic, transcriptional, and functional changes in monocytes occur with aging, though most studies to date have focused on differences between young adults and the elderly in populations with European ancestry; few data exist regarding changes that occur in circulating monocytes during the first few decades of life or in African populations. We analyzed DNA methylation profiles, cytokine production, and inflammatory gene expression profiles in monocytes from young adults and children from western Kenya. Results: We identified several hypo- and hyper-methylated CpG sites in monocytes from Kenyan young adults vs. children that replicated findings in the current literature of differential DNA methylation in monocytes from elderly persons vs. young adults across diverse populations. Differentially methylated CpG sites were also noted in gene regions important to inflammation and innate immune responses. Monocytes from Kenyan young adults vs. children displayed increased production of IL-8, IL-10, and IL-12p70 in response to TLR4 and TLR2/1 stimulation as well as distinct inflammatory gene expression profiles. Conclusions: These findings complement previous reports of age-related methylation changes in isolated monocytes and provide novel insights into the role of age-associated changes in innate immune functions.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Zhonghui Thong ◽  
Jolena Ying Ying Tan ◽  
Eileen Shuzhen Loo ◽  
Yu Wei Phua ◽  
Xavier Liang Shun Chan ◽  
...  

An amendment to this paper has been published and can be accessed via a link at the top of the paper.


BMC Genomics ◽  
2020 ◽  
Vol 21 (1) ◽  
Author(s):  
Yunsung Lee ◽  
Kristine L. Haftorn ◽  
William R. P. Denault ◽  
Haakon E. Nustad ◽  
Christian M. Page ◽  
...  

Background: Epigenetic clocks have been recognized for their precise prediction of chronological age, age-related diseases, and all-cause mortality. Existing epigenetic clocks are based on CpGs from the Illumina HumanMethylation450 BeadChip (450K) which has now been replaced by the latest platform, Illumina MethylationEPIC BeadChip (EPIC). Thus, it remains unclear to what extent EPIC contributes to increased precision and accuracy in the prediction of chronological age. Results: We developed three blood-based epigenetic clocks for human adults using EPIC-based DNA methylation (DNAm) data from the Norwegian Mother, Father and Child Cohort Study (MoBa) and the Gene Expression Omnibus (GEO) public repository: 1) an Adult Blood-based EPIC Clock (ABEC) trained on DNAm data from MoBa (n = 1592, age-span: 19 to 59 years), 2) an extended ABEC (eABEC) trained on DNAm data from MoBa and GEO (n = 2227, age-span: 18 to 88 years), and 3) a common ABEC (cABEC) trained on the same training set as eABEC but restricted to CpGs common to 450K and EPIC. Our clocks showed high precision (Pearson correlation between chronological and epigenetic age (r) > 0.94) in independent cohorts, including GSE111165 (n = 15), GSE115278 (n = 108), GSE132203 (n = 795), and the Epigenetics in Pregnancy (EPIPREG) study of the STORK Groruddalen Cohort (n = 470). This high precision is unlikely due to the use of EPIC, but rather due to the large sample size of the training set. Conclusions: Our ABECs predicted adults’ chronological age precisely in independent cohorts. As EPIC is now the dominant platform for measuring DNAm, these clocks will be useful in further predictions of chronological age, age-related diseases, and mortality.


2014 ◽  
Vol 11 ◽  
pp. 117-125 ◽  
Author(s):  
Shao Hua Yi ◽  
Long Chang Xu ◽  
Kun Mei ◽  
Rong Zhi Yang ◽  
Dai Xin Huang

2020 ◽  
Author(s):  
Katherine Rose Dobbs ◽  
Paula Embury ◽  
Emmily Koech ◽  
Sidney Ogolla ◽  
Stephen Munga ◽  
...  

Background: Age-related changes in adaptive and innate immune cells have been associated with a decline in effective immunity and chronic, low-grade inflammation. Epigenetic, transcriptional, and functional changes in monocytes occur with aging, though most studies to date have focused on differences between young adults and the elderly in populations with European ancestry; few data exist regarding changes that occur in circulating monocytes during the first few decades of life or in African populations. We analyzed DNA methylation profiles, cytokine production, and inflammatory gene expression profiles in monocytes from young adults and children from western Kenya. Results: We identified several hypo- and hyper-methylated CpG sites in monocytes from Kenyan young adults vs. children that replicated findings in the current literature of differential DNA methylation in monocytes from elderly persons vs. young adults across diverse populations. Differentially methylated CpG sites were also noted in gene regions important to inflammation and innate immune responses. Monocytes from Kenyan young adults vs. children displayed increased production of IL-8, IL-10, and IL-12p70 in response to TLR4 and TLR2/1 stimulation as well as distinct inflammatory gene expression profiles. Conclusions: These findings complement previous reports of age-related methylation changes in isolated monocytes and provide novel insights into the role of age-associated changes in innate immune functions.

