Machine Learning Prediction of Biomarkers from SNPs and of Disease Risk from Biomarkers in the UK Biobank

Genes ◽  
2021 ◽  
Vol 12 (7) ◽  
pp. 991
Author(s):  
Erik Widen ◽  
Timothy G. Raben ◽  
Louis Lello ◽  
Stephen D. H. Hsu

We use UK Biobank data to train predictors for 65 blood and urine markers such as HDL, LDL, lipoprotein A, glycated haemoglobin, etc. from SNP genotype. For example, our Polygenic Score (PGS) predictor correlates ∼0.76 with lipoprotein A level, which is highly heritable and an independent risk factor for heart disease. This may be the most accurate genomic prediction of a quantitative trait that has yet been produced (specifically, for European ancestry groups). We also train predictors of common disease risk using blood and urine biomarkers alone (no DNA information); we call these predictors biomarker risk scores, BMRS. Individuals who are at high risk (e.g., odds ratio of >5× population average) can be identified for conditions such as coronary artery disease (AUC∼0.75), diabetes (AUC∼0.95), hypertension, liver and kidney problems, and cancer using biomarkers alone. Our atherosclerotic cardiovascular disease (ASCVD) predictor uses ∼10 biomarkers and performs in UKB evaluation as well as or better than the American College of Cardiology ASCVD Risk Estimator, which uses quite different inputs (age, diagnostic history, BMI, smoking status, statin usage, etc.). We compare polygenic risk scores (risk conditional on genotype: PRS) for common diseases to the risk predictors which result from the concatenation of learned functions BMRS and PGS, i.e., applying the BMRS predictors to the PGS output.
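The concatenation of learned functions described above (applying the BMRS predictor to the PGS output) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the linear form of the PGS, the logistic form of the BMRS, the two biomarker names used as keys, and all weights are hypothetical and are not taken from the paper.

```python
import math

# Hypothetical toy weights for illustration only; not values from the paper.
PGS_WEIGHTS = {
    "lipoprotein_a": [0.8, -0.2, 0.5],  # one weight per SNP
    "hdl": [-0.3, 0.6, 0.1],
}
BMRS_WEIGHTS = {"lipoprotein_a": 0.9, "hdl": -0.7}
BMRS_INTERCEPT = -1.0


def pgs(genotype):
    """Predict each biomarker (arbitrary standardized units) from SNP allele counts."""
    return {
        name: sum(w * g for w, g in zip(weights, genotype))
        for name, weights in PGS_WEIGHTS.items()
    }


def bmrs(biomarkers):
    """Map biomarker values to a risk probability (assumed logistic model)."""
    z = BMRS_INTERCEPT + sum(BMRS_WEIGHTS[k] * v for k, v in biomarkers.items())
    return 1.0 / (1.0 + math.exp(-z))


def composed_risk(genotype):
    """Risk from genotype alone via the composition BMRS(PGS(SNPs))."""
    return bmrs(pgs(genotype))


# Allele counts (0, 1, or 2) at three hypothetical SNPs.
risk = composed_risk([2, 0, 1])
```

In this sketch the composed predictor needs only genotype as input, which is what makes it comparable to a PRS trained directly on disease status.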

2021 ◽  
Author(s):  
Erik Widen ◽  
Timothy G. Raben ◽  
Louis Lello ◽  
Stephen D.H. Hsu

We use UK Biobank data to train predictors for 48 blood and urine markers such as HDL, LDL, lipoprotein A, glycated haemoglobin, ... from SNP genotype. For example, our predictor correlates ∼ 0.76 with lipoprotein A level, which is highly heritable and an independent risk factor for heart disease. This may be the most accurate genomic prediction of a quantitative trait that has yet been produced (specifically, for European ancestry groups). We also train predictors of common disease risk using blood and urine biomarkers alone (no DNA information). Individuals who are at high risk (e.g., odds ratio of > 5x population average) can be identified for conditions such as coronary artery disease (AUC ∼ 0.75), diabetes (AUC ∼ 0.95), hypertension, liver and kidney problems, and cancer using biomarkers alone. Our atherosclerotic cardiovascular disease (ASCVD) predictor uses ∼ 10 biomarkers and performs in UKB evaluation as well as or better than the American College of Cardiology ASCVD Risk Estimator, which uses quite different inputs (age, diagnostic history, BMI, smoking status, statin usage, etc.). We compare polygenic risk scores (risk conditional on genotype: (risk score | SNPs)) for common diseases to the risk predictors which result from the concatenation of learned functions (risk score | biomarkers) and (biomarker | SNPs).


2021 ◽  
Author(s):  
Melis Anatürk ◽  
Raihaan Patel ◽  
Georgios Georgiopoulos ◽  
Danielle Newby ◽  
Anya Topiwala ◽  
...  

INTRODUCTION: Current prognostic models of dementia have had limited success in consistently identifying at-risk individuals. We aimed to develop and validate a novel dementia risk score (DRS) using the UK Biobank cohort. METHODS: After randomly dividing the sample into a training set (n=166,487, 80%) and a test set (n=41,621, 20%), logistic LASSO regression and standard logistic regression were used to develop the UKB-DRS. RESULTS: The score consisted of age, sex, education, apolipoprotein E4 genotype, a history of diabetes, stroke, and depression, and a family history of dementia. The UKB-DRS had good-to-strong discrimination accuracy in the UKB hold-out sample (AUC [95%CI]=0.79 [0.77, 0.82]) and in an external dataset (Whitehall II cohort, AUC [95%CI]=0.83 [0.79, 0.87]). The UKB-DRS also significantly outperformed four published risk scores (i.e., Australian National University Alzheimer’s Disease Risk Index (ANU-ADRI), Cardiovascular Risk Factors, Aging, and Dementia score (CAIDE), Dementia Risk Score (DRS), and the Framingham Cardiovascular Risk Score (FRS)) across both test sets. CONCLUSION: The UKB-DRS represents a novel easy-to-use tool that could be used for routine care or targeted selection of at-risk individuals into clinical trials.


Author(s):  
Haoyu Wu ◽  
Jian’an Luan ◽  
Vincenzo Forgetta ◽  
James C. Engert ◽  
George Thanassoulis ◽  
...  

Background: Current lipid guidelines suggest measurement of Lp(a) (lipoprotein[a]) and ApoB (apolipoprotein B) for atherosclerotic cardiovascular disease risk assessment. Polygenic risk scores (PRSs) for Lp(a) and ApoB may identify individuals unlikely to have elevated Lp(a) or ApoB and thus reduce such suggested testing. Methods: PRSs were developed using LASSO regression among 273 222 and 356 958 UK Biobank participants of white British ancestry for Lp(a) and ApoB, respectively, and validated in separate sets of 60 771 UK Biobank and 15 050 European Prospective Investigation into Cancer and Nutrition-Norfolk participants. We then assessed the proportion of participants who, based on these PRSs, were unlikely to benefit from Lp(a) or ApoB measurements, according to current lipid guidelines. Results: In the UK Biobank and European Prospective Investigation into Cancer and Nutrition-Norfolk cohorts, the area under the receiver operating curve for the PRS-predicted Lp(a) and ApoB to identify individuals with elevated Lp(a) and ApoB was at least 0.91 (95% CI, 0.90–0.92) and 0.74 (95% CI, 0.73–0.75), respectively. The Lp(a) PRS and measured Lp(a) showed comparable association with atherosclerotic cardiovascular disease incidence, whereas the ApoB PRS was in general less predictive of atherosclerotic cardiovascular disease risk than measured ApoB. In the context of the ESC/EAS lipid guidelines, at a 95% sensitivity to identify individuals with elevated Lp(a) and ApoB levels, at least 54% of Lp(a) and 24% of ApoB testing could be reduced by prescreening with a PRS while maintaining a low false-negative rate. Conclusions: A substantial proportion of suggested testing for elevated Lp(a) and a modest proportion of testing for elevated ApoB could potentially be reduced by prescreening individuals with PRSs.
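The prescreening idea in this abstract (choose a cutoff on the PRS-predicted value that keeps a fixed sensitivity, then skip testing for everyone below it) can be sketched as follows. This is a rough illustration under stated assumptions, not the authors' procedure: the function name `prescreen`, the way the cutoff is chosen, and the toy numbers are all hypothetical.

```python
import math


def prescreen(predicted, elevated, sensitivity=0.95):
    """Return (threshold, fraction of tests avoided).

    predicted: PRS-predicted biomarker values, one per person.
    elevated:  1 if the measured biomarker is elevated, else 0.
    The threshold is the highest cutoff at which at least `sensitivity`
    of the truly elevated individuals still score at or above it.
    Assumes at least one elevated individual is present.
    """
    positives = sorted(
        (p for p, e in zip(predicted, elevated) if e), reverse=True
    )
    needed = math.ceil(sensitivity * len(positives))
    threshold = positives[needed - 1]
    # People scoring below the threshold would not be sent for measurement.
    avoided = sum(p < threshold for p in predicted) / len(predicted)
    return threshold, avoided


# Hypothetical toy cohort of ten people; the last four have elevated levels.
predicted = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
elevated = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
threshold, avoided = prescreen(predicted, elevated)
```

With real data the avoided fraction depends on how well the PRS separates elevated from non-elevated individuals, which is what the reported AUC values summarize.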


Stroke ◽  
2021 ◽  
Vol 52 (Suppl_1) ◽  
Author(s):  
Julian N Acosta ◽  
Cameron Both ◽  
Natalia Szejko ◽  
Stacy Brown ◽  
Nils H Petersen ◽  
...  

Introduction: Blood pressure (BP) is a highly heritable trait with numerous related genetic risk variants identified. While prior studies showed that polygenic susceptibility to hypertension (PSH) is associated with elevated BP, uncontrolled hypertension (UHTN), resistant hypertension (RHTN), and risk of stroke, its role after a cerebrovascular event remains unknown. We tested the hypothesis that PSH leads to higher BP and increased risk of UHTN and RHTN in stroke survivors. Methods: We conducted a nested study within the UK Biobank, including individuals of European ancestry with a prevalent ischemic or hemorrhagic stroke. To model PSH, we created polygenic risk scores (PRS) for systolic, diastolic, and pulse BP using 732 previously discovered loci. We divided the PRS into quintiles and used linear and logistic regression to test whether higher PSH led to higher observed BP as well as increased risk of UHTN (SBP >140 mmHg or DBP >90 mmHg) and RHTN (UHTN despite being on >=3 antihypertensive drugs) in stroke survivors. Results: Of the 502,536 participants enrolled in the UK Biobank, 5,815 (1.2%) with a prevalent stroke at enrollment were included. We found the following results across quintiles 1 through 5 of the systolic BP-based PRS: mean systolic BP 138.4, 140.6, 141.8, 142.9 and 145.8 mmHg (unadjusted p<0.0001, Figure’s left panel); risk of UHTN 46%, 51%, 52%, 56% and 59% (unadjusted p<0.0001, Figure’s center panel); and risk of RHTN 1.9%, 3.8%, 4.7%, 5.8% and 6.7% (unadjusted p<0.0001, Figure’s right panel). We obtained similar results when both evaluating diastolic and pulse BP-based PRSs and using adjusted multivariable models (all p<0.0001). Conclusion: PSH is associated with observed BP and the risk of UHTN and RHTN in stroke survivors. Follow up research should evaluate whether precision medicine strategies based on BP-related genetic information can help identify patients that could benefit from aggressive diagnostic and/or therapeutic interventions.
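The quintile analysis described in the Methods can be illustrated with a short sketch: rank participants by PRS, split them into five groups of (near) equal size, and compute the outcome rate in each group. The helper names and the toy data are hypothetical, and the rank-based split is one of several reasonable ways to form quintiles; the abstract does not specify which was used.

```python
def quintile_labels(scores):
    """Assign each score a quintile from 1 (lowest) to 5 (highest) by rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    labels = [0] * len(scores)
    for rank, i in enumerate(order):
        labels[i] = rank * 5 // len(scores) + 1
    return labels


def rate_by_quintile(scores, outcomes):
    """Fraction of participants with the outcome (1/0) in each PRS quintile.

    Assumes at least five participants so that no quintile is empty.
    """
    labels = quintile_labels(scores)
    rates = {}
    for q in range(1, 6):
        group = [o for label, o in zip(labels, outcomes) if label == q]
        rates[q] = sum(group) / len(group)
    return rates


# Hypothetical toy data: ten PRS values and a binary outcome such as UHTN.
prs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
uhtn = [0, 0, 0, 1, 0, 1, 1, 1, 1, 1]
rates = rate_by_quintile(prs, uhtn)
```

A rising rate from quintile 1 to quintile 5, as in the reported results, is the pattern that the trend tests in the abstract formalize.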


2020 ◽  
Author(s):  
Michael D.E. Sewell ◽  
Xueyi Shen ◽  
Lorena Jiménez-Sánchez ◽  
Amelia J. Edmondson-Stait ◽  
Claire Green ◽  
...  

Background: Major depressive disorder (MDD), schizophrenia (SCZ), and bipolar disorder (BD) have both shared and discrete genetic risk factors and abnormalities in blood-based measures of inflammation and blood-brain barrier (BBB) permeability. The relationships between such genetic architectures and blood-based markers are however unclear. We investigated relationships between polygenic risk scores for these disorders and peripheral biomarkers in the UK Biobank cohort. Methods: We calculated polygenic risk scores (PRS) for samples of n = 367,329 (MDD PRS), n = 366,465 (SCZ PRS), and n = 366,383 (BD PRS) individuals from the UK Biobank cohort. We examined associations between each disorder PRS and 62 blood markers, using two generalized linear regression models: ‘minimally adjusted’ controlling for variables including age and sex, and ‘fully adjusted’ including additional lifestyle covariates such as alcohol and smoking status. Results: 12/62, 13/62 and 9/62 peripheral markers were significantly associated with MDD, SCZ and BD PRS respectively for both models. Most associations were disorder PRS-specific, including several immune-related markers for MDD and SCZ. We also identified several BBB-permeable marker associations, including vitamin D for all three disorder PRS, IGF-1 and triglycerides for MDD PRS, testosterone for SCZ PRS, and HDL cholesterol for BD PRS. Conclusions: This study suggests that MDD, SCZ and BD have shared and distinct peripheral markers associated with disorder-specific genetic risk. The results implicate BBB permeability disruptions in all three disorders and inflammatory dysfunction in MDD and SCZ, and enrich our understanding of potential underlying pathophysiological mechanisms in major psychiatric disorders.


2020 ◽  
Author(s):  
Kenneth E. Westerman ◽  
Jenkai Miao ◽  
Daniel I. Chasman ◽  
Jose C. Florez ◽  
Han Chen ◽  
...  

Diet is a significant modifiable risk factor for type 2 diabetes (T2D), and its effect on disease risk is under partial genetic control. Identification of specific gene-diet interactions (GDIs) influencing risk biomarkers such as glycated hemoglobin (HbA1c) is a critical step towards developing precision nutrition for T2D prevention, but progress has been slow due to limitations in sample size and accuracy of dietary exposure measurement. We leveraged the large sample size of the UK Biobank (UKB) cohort and a diverse group of dietary exposures, including 30 individual dietary traits and 8 empirical dietary patterns, to conduct genome-wide interaction studies in ∼340,000 European-ancestry participants to identify novel GDIs influencing HbA1c. We identified five variant-dietary trait pairs reaching genome-wide significance (p < 5×10⁻⁸): two involved dietary patterns (meat pattern with rs147678157 and a fruit & vegetable-based pattern with rs3010439) and three involved individual dietary traits (bread consumption with rs62218803, dried fruit consumption with rs140270534, and milk type [dairy vs. other] with 4:131148078_TAGAA_T). All of these were affected minimally by adjustment for geographical and lifestyle-related confounders, and four of the five variants lacked any genetic main effect that would have allowed their detection in a traditional genome-wide association study for HbA1c. Notably, multiple loci near transient receptor potential subfamily M genes (TRPM2 and TRPM3) were identified as interacting with carbohydrate-containing food groups. Some of these interactions showed nominal replication in non-European ancestry UKB subsets, as well as association using alternative measures of glycemia (fasting glucose and follow-up HbA1c measurements). Our results highlight relevant GDIs influencing HbA1c for future investigation, while reinforcing known challenges in detecting and replicating GDIs.


2020 ◽  
Author(s):  
John E. McGeary ◽  
Chelsie Benca-Bachman ◽  
Victoria Risner ◽  
Christopher G Beevers ◽  
Brandon Gibb ◽  
...  

Twin studies indicate that 30-40% of the disease liability for depression can be attributed to genetic differences. Here, we assess the explanatory ability of polygenic scores (PGS) based on broad- (PGSBD) and clinical- (PGSMDD) depression summary statistics from the UK Biobank using independent cohorts of adults (N=210; 100% European Ancestry) and children (N=728; 70% European Ancestry) who have been extensively phenotyped for depression and related neurocognitive phenotypes. PGS associations with depression severity and diagnosis were generally modest, and larger in adults than children. Polygenic prediction of depression-related phenotypes was mixed and varied by PGS. Higher PGSBD, in adults, was associated with a higher likelihood of having suicidal ideation, increased brooding and anhedonia, and lower levels of cognitive reappraisal; PGSMDD was positively associated with brooding and negatively related to cognitive reappraisal. Overall, PGS based on both broad and clinical depression phenotypes have modest utility in adult and child samples of depression.


2020 ◽  
Vol 41 (Supplement_2) ◽  
Author(s):  
D Radenkovic ◽  
S.C Chawla ◽  
G Botta ◽  
A Boli ◽  
M.B Banach ◽  
...  

The two leading causes of mortality worldwide are cardiovascular disease (CVD) and cancer. The annual total cost of CVD and cancer is an estimated $844.4 billion in the US and is projected to double by 2030. Thus, there has been an increased shift to preventive medicine to improve health outcomes and development of risk scores, which allow early identification of individuals at risk to target personalised interventions and prevent disease. Our aim was to define a Risk Score R(x) which, given the baseline characteristics of a given individual, outputs the relative risk for composite CVD, cancer incidence and all-cause mortality. A non-linear model was used to calculate risk scores based on the participants of the UK Biobank (n = 502,548). The model used parameters including patient characteristics (age, sex, ethnicity), baseline conditions, lifestyle factors of diet and physical activity, blood pressure, metabolic markers and advanced lipid variables, including ApoA and ApoB and lipoprotein(a), as input. The risk score was defined by normalising the risk function by a fixed value, the average risk of the training set. To fit the non-linear model, >400,000 participants were used as the training set and >45,000 participants were used as the test set for validation. The exponent of the risk function was represented as a multilayer neural network. This allowed capturing interdependent behaviour of covariates, training a single model for all outcomes, and preserving heterogeneity of the groups, in contrast to the CoxPH models that are traditionally used in risk scores and require homogeneous groups. The model was trained over 60 epochs and predictive performance was determined by the C-index, with standard errors and confidence intervals estimated with bootstrap sampling. By inputting the variables described, one can obtain personalised hazard ratios for the 3 major outcomes of CVD, cancer and all-cause mortality. An individual with a risk score of, e.g., 1.5 therefore has, at any time, a 50% higher chance than average of experiencing the corresponding event. The proposed model showed the following discrimination on the validation set: risk of CVD (C-index = 0.8006), cancer incidence (C-index = 0.6907), and all-cause mortality (C-index = 0.7770). The CVD model is particularly strong (C-index >0.8) and is an improvement on a previous CVD risk prediction model, also based on classical risk factors with total cholesterol and HDL-c on the UK Biobank data (C-index = 0.7444), published last year (Welsh et al. 2019). Unlike classically-used CoxPH models, our model considers correlation of variables as shown by the table of the values of correlation in Figure 1. This is an accurate model that is based on the most comprehensive set of patient characteristics and biomarkers, allowing clinicians to identify multiple targets for improvement and practice active preventive cardiology in the era of precision medicine. Figure 1. Correlation of variables in the R(x). Funding Acknowledgement: Type of funding source: None.
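The normalisation step in this abstract (dividing the risk function by the average risk of the training set) can be sketched as below. This is only an illustration under stated assumptions: `log_risk` is a hypothetical linear stand-in for the multilayer network that the abstract says represents the exponent of the risk function, and the two covariates and their coefficients are invented.

```python
import math


def log_risk(x):
    """Hypothetical stand-in for the network output (the exponent of the risk)."""
    age, systolic_bp = x
    return 0.05 * age + 0.01 * systolic_bp


def make_risk_score(training_set):
    """Build R(x) = exp(log_risk(x)) / mean of exp(log_risk) over the training set."""
    baseline = sum(math.exp(log_risk(x)) for x in training_set) / len(training_set)

    def risk_score(x):
        return math.exp(log_risk(x)) / baseline

    return risk_score


# Hypothetical toy training set of (age, systolic blood pressure) pairs.
training = [(50, 120), (60, 140), (70, 160)]
R = make_risk_score(training)
mean_score = sum(R(x) for x in training) / len(training)
```

By construction the score averages to 1 over the training set, so a value of 1.5 reads directly as 50% above the average risk.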


2021 ◽  
pp. 1-9
Author(s):  
Janice L. Atkins ◽  
Luke C. Pilling ◽  
Christine J. Heales ◽  
Sharon Savage ◽  
Chia-Ling Kuo ◽  
...  

Background: Brain iron deposition occurs in dementia. In European ancestry populations, the HFE p.C282Y variant can cause iron overload and hemochromatosis, mostly in homozygous males. Objective: To estimate p.C282Y associations with brain MRI features plus incident dementia diagnoses during follow-up in a large community cohort. Methods: UK Biobank participants with follow-up hospitalization records (mean 10.5 years). MRI in 206 p.C282Y homozygotes versus 23,349 without variants, including T2* measures (lower values indicating more iron). Results: European ancestry participants included 2,890 p.C282Y homozygotes. Male p.C282Y homozygotes had lower T2* measures in areas including the putamen, thalamus, and hippocampus, compared to no HFE mutations. Incident dementia was more common in p.C282Y homozygous men (Hazard Ratio HR = 1.83; 95% CI 1.23 to 2.72, p = 0.003), as was delirium. There were no associations in homozygote women or in heterozygotes. Conclusion: Studies are needed of whether early iron reduction prevents or slows related brain pathologies in male HFE p.C282Y homozygotes.

