Comparing the genetic and environmental architecture of blood count, blood biochemistry and urine biochemistry biological ages with machine learning

Author(s):  
Alan Le Goallec ◽  
Samuel Diai ◽  
Theo Vincent ◽  
Chirag J Patel

While a large number of biological age predictors have been built from blood samples, a blood count-based biological age predictor is lacking, and the genetic and environmental factors associated with blood-measured accelerated aging remain elusive. Here, we leveraged 31 blood count biomarkers measured from 489,079 blood samples, 28 blood biochemistry biomarkers measured from 245,147 blood samples, and four urine biochemistry biomarkers measured from 158,381 samples to build three distinct biological age predictors by training machine learning models to predict age. Blood biochemistry significantly outperformed blood count and urine biochemistry in terms of age prediction (RMSE: 5.92±0.02 vs. 7.60±0.02 years and 7.72±0.04 years). We performed genome-wide association studies (GWASs) and found accelerated blood biochemistry, blood count and urine biochemistry aging to be respectively 26.2±0.3%, 18.1±0.2% and 10.5±0.5% GWAS-heritable. We identified 1,081 single nucleotide polymorphisms (SNPs) associated with accelerated blood biochemistry aging, 2,636 SNPs associated with accelerated blood cell aging and 24 SNPs associated with accelerated urine biochemistry aging. Similarly, we identified biomarkers, clinical phenotypes, diseases, environmental and socioeconomic factors associated with accelerated blood biochemistry, blood cell and urine biochemistry aging.
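The RMSE figures quoted above summarize how far predicted ages fall from chronological ages. The following is a minimal sketch of that metric, not the authors' pipeline. It also assumes, as an illustration only, that "accelerated aging" can be read as predicted age minus chronological age; the abstract does not state the exact definition, and the function names and sample values are hypothetical.

```python
import math


def rmse(chronological, predicted):
    """Root mean squared error between true and predicted ages, in years."""
    if len(chronological) != len(predicted) or not chronological:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - c) ** 2 for c, p in zip(chronological, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def age_acceleration(chronological, predicted):
    """Assumed definition: predicted minus chronological age.

    A positive value means the biomarkers look 'older' than the person is.
    """
    return [p - c for c, p in zip(chronological, predicted)]


# Hypothetical ages for two participants (not data from the study).
true_ages = [50.0, 60.0]
predicted_ages = [53.0, 56.0]
error = rmse(true_ages, predicted_ages)
residuals = age_acceleration(true_ages, predicted_ages)
```

Under this reading, a lower RMSE means the biomarker panel tracks chronological age more closely, and the per-person residual is the quantity that would then be tested for genetic and environmental associations.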

2021 ◽  
Vol 11 (3) ◽  
pp. 195
Author(s):  
Yitang Sun ◽  
Jingqi Zhou ◽  
Kaixiong Ye

Increasing evidence shows that white blood cells are associated with the risk of coronavirus disease 2019 (COVID-19), but the direction and causality of this association are not clear. To evaluate the causal associations between various white blood cell traits and COVID-19 susceptibility and severity, we conducted two-sample bidirectional Mendelian Randomization (MR) analyses with summary statistics from the largest and most recent genome-wide association studies. Our MR results indicated causal protective effects of higher basophil count, basophil percentage of white blood cells, and myeloid white blood cell count on severe COVID-19, with odds ratios (OR) per standard deviation increment of 0.75 (95% CI: 0.60–0.95), 0.70 (95% CI: 0.54–0.92), and 0.85 (95% CI: 0.73–0.98), respectively. Neither COVID-19 severity nor susceptibility was associated with white blood cell traits in our reverse MR results. Genetically predicted high basophil count, basophil percentage of white blood cells, and myeloid white blood cell count are associated with a lower risk of developing severe COVID-19. Individuals with a lower genetic capacity for basophils are likely at risk, while enhancing the production of basophils may be an effective therapeutic strategy.
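Odds ratios with 95% confidence intervals, such as those reported above, are conventionally obtained by exponentiating a log-odds estimate and its normal-approximation bounds. The sketch below shows that conversion in both directions. It assumes a symmetric interval on the log scale with z = 1.96; the function names are hypothetical, and this is not a reconstruction of the authors' MR software.

```python
import math

Z_95 = 1.96  # normal quantile for a two-sided 95% interval


def odds_ratio_ci(beta, se, z=Z_95):
    """Convert a log-odds estimate and its standard error to (OR, lower, upper)."""
    return (math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se))


def log_odds_from_ci(lower, upper, z=Z_95):
    """Recover (beta, se) on the log-odds scale from a reported OR interval.

    Assumes the interval is symmetric around beta on the log scale.
    """
    log_lower, log_upper = math.log(lower), math.log(upper)
    beta = (log_lower + log_upper) / 2
    se = (log_upper - log_lower) / (2 * z)
    return beta, se


# Reported interval for basophil count on severe COVID-19: 0.60 to 0.95.
beta, se = log_odds_from_ci(0.60, 0.95)
point, lower, upper = odds_ratio_ci(beta, se)
```

Because published values are rounded, the point estimate recovered from the interval midpoint will be close to, but not exactly, the reported odds ratio. An interval that excludes 1 corresponds to a nominally significant effect at the 5% level.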


2012 ◽  
Vol 215 (1) ◽  
pp. 17-28 ◽  
Author(s):  
Georg Homuth ◽  
Alexander Teumer ◽  
Uwe Völker ◽  
Matthias Nauck

The metabolome, defined as the reflection of metabolic dynamics derived from parameters measured primarily in easily accessible body fluids such as serum, plasma, and urine, can be considered the omics data pool that is closest to the phenotype because it integrates genetic influences as well as nongenetic factors. Metabolic traits can be related to genetic polymorphisms in genome-wide association studies, enabling the identification of underlying genetic factors, as well as to specific phenotypes, resulting in the identification of metabolome signatures primarily caused by nongenetic factors. Similarly, correlation of metabolome data with transcriptional and/or proteome profiles of blood cells also produces valuable data by revealing associations between metabolic changes and mRNA and protein levels. In recent years, progress in correlating genetic variation with metabolome profiles has been most impressive. This review therefore summarizes the most important of these studies and gives an outlook on future developments.


2018 ◽  
Vol 39 (2) ◽  
pp. 875 ◽  
Author(s):  
Herica Makino ◽  
Daphine Ariadne Jesus de Paula ◽  
Valéria Regia Franco Sousa ◽  
Adriane Jorge Mendonça ◽  
Valéria Dutra ◽  
...  

The aim of this research was to investigate natural hemoplasma infection in cats treated at the Veterinary Hospital of the Federal University of Mato Grosso, and the factors associated with infection. Blood samples from 151 cats of different sexes, breeds, and ages were analyzed by PCR and blood count. The overall occurrence of hemoplasma was 25.8%. Mycoplasma haemofelis (Mhf), ‘Candidatus Mycoplasma haemominutum’ (CMhm), and ‘Candidatus Mycoplasma turicensis’ (CMt) were observed in 15.2%, 14.6% and 2.6% of cats, respectively. Co-infection was observed in 6.6% of cases. Male sex and mixed breed were associated with infection by CMhm (P = 0.02 and 0.04, respectively). The data obtained demonstrated an occurrence of 25.8% for hemoplasma infection in cats presented for clinical care in the city of Cuiabá, where males were at higher risk of acquiring infection by these agents, and cats of no specific breed were at higher risk for CMhm.


2017 ◽  
Vol 7 (1) ◽  
Author(s):  
Jiamei Liu ◽  
Cheng Xu ◽  
Weifeng Yang ◽  
Yayun Shu ◽  
Weiwei Zheng ◽  
...  

Binary classification is widely used to support decisions on various biomedical big data questions, such as clinical drug trials comparing treated participants with controls, and genome-wide association studies (GWASs) comparing participants with or without a phenotype. A machine learning model is trained for this purpose by optimizing its power to discriminate samples from the two groups. However, most classification algorithms tend to generate one locally optimal solution according to the input dataset and the mathematical presumptions about the dataset. Here we demonstrated, from the aspects of both disease classification and feature selection, that multiple different solutions may have similar classification performance. Existing machine learning algorithms may therefore have ignored a whole school of fish by catching only one good one. Since most existing machine learning algorithms generate a solution by optimizing a mathematical goal, understanding the biological mechanisms behind the investigated classification question may require considering both the generated solution and the ignored ones.


2020 ◽  
Author(s):  
Yu Xu ◽  
Dragana Vuckovic ◽  
Scott C Ritchie ◽  
Parsa Akbari ◽  
Tao Jiang ◽  
...  

Polygenic scores (PGSs) for blood cell traits can be constructed using summary statistics from genome-wide association studies. Because the selection of variants and the modelling of their interactions in PGSs may be limited by univariate analysis, such a conventional method may yield suboptimal performance. This study evaluated the relative effectiveness of four machine learning and deep learning methods, as well as a univariate method, in the construction of PGSs for 26 blood cell traits, using data from UK Biobank (n=~400,000) and INTERVAL (n=~40,000). Our results showed that learning methods can improve PGS construction for nearly every blood cell trait considered, with this superiority explained by the ability of machine learning methods to capture interactions among variants. This study also demonstrated that populations can be well stratified by the PGSs of these blood cell traits, even for traits that exhibit large differences between ages and sexes, suggesting potential for disease prevention. As our study found genetic correlations between the PGSs for blood cell traits and PGSs for several common human diseases (recapitulating well-known associations between the blood cell traits themselves and certain diseases), it suggests that blood cell traits may be indicators and/or mediators for a variety of common disorders via shared genetic variants and functional pathways.
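For orientation, a conventional polygenic score is usually computed as a weighted sum of allele dosages, with weights taken from GWAS effect sizes. The sketch below illustrates only that additive baseline, not the machine learning or deep learning methods evaluated in the study. The variant identifiers, weights and function name are hypothetical.

```python
def polygenic_score(dosages, weights):
    """Additive polygenic score: sum of effect weight times allele dosage.

    dosages: variant ID -> effect-allele count (0, 1 or 2, or an imputed value)
    weights: variant ID -> per-allele effect size from GWAS summary statistics
    Variants without a weight are skipped.
    """
    return sum(weights[variant] * dosage
               for variant, dosage in dosages.items()
               if variant in weights)


# Hypothetical individual and weights (not values from the study).
individual = {"rsA": 2, "rsB": 1, "rsC": 0}
effect_sizes = {"rsA": 0.10, "rsB": -0.05}
score = polygenic_score(individual, effect_sizes)
```

Because each variant contributes independently, a score of this form cannot represent interactions among variants, which is the limitation the learning-based methods are reported to address.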


2019 ◽  
Vol 35 (24) ◽  
pp. 5182-5190 ◽  
Author(s):  
Luis G Leal ◽  
Alessia David ◽  
Marjo-Riita Jarvelin ◽  
Sylvain Sebert ◽  
Minna Männikkö ◽  
...  

Motivation: Integration of different omics data could markedly help to identify biological signatures, understand the missing heritability of complex diseases and ultimately achieve personalized medicine. Standard regression models used in Genome-Wide Association Studies (GWAS) identify loci with a strong effect size, whereas GWAS meta-analyses are often needed to capture weak loci contributing to the missing heritability. Development of novel machine learning algorithms for merging genotype data with other omics data is highly needed as it could enhance the prioritization of weak loci.

Results: We developed cNMTF (corrected non-negative matrix tri-factorization), an integrative algorithm based on clustering techniques of biological data. This method assesses the inter-relatedness between genotypes, phenotypes, the damaging effect of the variants and gene networks in order to identify loci-trait associations. cNMTF was used to prioritize genes associated with lipid traits in two population cohorts. We replicated 129 genes reported in GWAS world-wide and provided evidence that supports 85% of our findings (226 out of 265 genes), including recent associations in literature (NLGN1), regulators of lipid metabolism (DAB1) and pleiotropic genes for lipid traits (CARM1). Moreover, cNMTF performed efficiently against strong population structures by accounting for the individuals’ ancestry. As the method is flexible in the incorporation of diverse omics data sources, it can be easily adapted to the user’s research needs.

Availability and implementation: An R package (cnmtf) is available at https://lgl15.github.io/cnmtf_web/index.html.

Supplementary information: Supplementary data are available at Bioinformatics online.

