Benchmarking the Accuracy of Polygenic Risk Scores and their Generative Methods

Author(s):  
Scott Kulm ◽  
Jason Mezey ◽  
Olivier Elemento

ABSTRACT: The estimate of an individual’s genetic susceptibility to a disease can provide critical information when setting screening schedules, prescribing medication and making lifestyle change recommendations. The polygenic risk score is the predominant susceptibility metric, with many methods available to describe its construction. However, these methods have never been comprehensively compared, nor has the predictive value of their outputs been systematically assessed, leaving the clinical utility of polygenic risk scores uncertain. This study aims to resolve this uncertainty by comparing in depth 15 polygenic risk scoring methods, the maximum number currently available, across 25 well-powered, UK Biobank-derived disease phenotypes. Our results show that simpler methods, which employ heuristics, outperformed complex methods, which predominantly model linkage disequilibrium. Accuracy was assessed with AUC improvement, the difference in area under the receiver operating characteristic curve generated by two logistic regression models, both of which share the covariates of age, sex, and principal components, while the second model also contains the polygenic risk score. To better determine the maximal utility of polygenic risk scores, straightforward score ensembles, which outperformed all methods across all traits in the training data set, were evaluated in the withheld data set. The score ensembles revealed that the accuracy gained by considering a polygenic risk score varied greatly, with AUC improvement greater than 0.05 for 9 traits. Many additional analyses revealed widespread pleiotropy across scores, large variations between assessment statistics, peculiar patterns amongst phenotype definitions, and wide ranges in the optimal number of variants used for scoring. If these many variable aspects of score creation can be well controlled and documented, simple methods can easily generate polygenic risk scores that predict well an individual’s future liability to certain diseases.
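The AUC improvement statistic described in this abstract can be illustrated with a short sketch. This is not the authors' pipeline: it assumes the two logistic regression models (covariates only, and covariates plus the polygenic risk score) have already been fitted elsewhere, and the names `auc`, `auc_improvement`, `base_pred` and `full_pred` are hypothetical. The toy values are made up for illustration.

```python
def auc(labels, scores):
    """Area under the ROC curve by pairwise comparison of cases and controls.

    Equals the probability that a randomly chosen case has a higher score
    than a randomly chosen control; tied scores count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def auc_improvement(labels, base_pred, full_pred):
    """AUC of the model with the score minus AUC of the covariate-only model."""
    return auc(labels, full_pred) - auc(labels, base_pred)


# Made-up predicted probabilities for four individuals (1 = case, 0 = control).
labels = [0, 0, 1, 1]
base_pred = [0.10, 0.40, 0.35, 0.80]   # covariates only (hypothetical)
full_pred = [0.10, 0.30, 0.35, 0.80]   # covariates + risk score (hypothetical)
improvement = auc_improvement(labels, base_pred, full_pred)
```

The pairwise loop is quadratic in sample size, so for biobank-scale data a rank-based AUC routine from a statistics library would be the practical choice.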

2020 ◽  
pp. jmedgenet-2020-107286
Author(s):  
Jun Wei ◽  
Zhuqing Shi ◽  
Rong Na ◽  
W Kyle Resurreccion ◽  
Chi-Hsiung Wang ◽  
...  

Background: SNP-based polygenic risk scores have recently been adopted in the clinic for risk assessment of some common diseases. Their validity is supported by a consistent trend between their percentile rank and disease risk in populations. However, for clinical use at the individual level, the reliability of score values is necessary considering they are directly used to calculate remaining lifetime risk. Objectives: We assessed the reliability of polygenic score values to estimate prostate cancer (PCa), breast cancer (BCa) and colorectal cancer (CRC) risk in three incident cohorts from the UK Biobank (n>500 000). Methods: Cancer-specific Genetic Risk Score (GRS), a well-established population-standardised polygenic risk score, was calculated. Results: A systematic bias was found between estimated risks (GRS values) and observed risks; β (95% CI) was 0.67 (0.58–0.76), 0.74 (0.65–0.84) and 0.82 (0.75–0.89), respectively, for PCa, BCa and CRC, all significantly lower than 1.00 (perfect calibration), p<0.001. After applying a correction factor derived from a training data set, the β for corrected GRS values in an independent testing data set were 1.09 (1.05–1.13), 1.00 (0.88–1.12) and 1.08 (0.96–1.21), respectively, for PCa, BCa and CRC. Conclusion: Assessing the calibration of polygenic risk scores is necessary and feasible to ensure their reliability prior to clinical implementation.
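The idea of a calibration slope β, where 1.00 means perfect calibration, can be sketched as follows. The abstract does not say how β was estimated, so this sketch assumes an ordinary least-squares slope of observed risk on estimated risk across risk groups; the function name `calibration_slope` and the numbers are hypothetical.

```python
def calibration_slope(estimated, observed):
    """Least-squares slope of observed risk regressed on estimated risk.

    A slope of 1.0 means estimated and observed risks rise in step;
    a slope below 1.0 means the estimates are more spread out than
    the risks actually observed.
    """
    n = len(estimated)
    if n != len(observed) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(estimated) / n
    mean_y = sum(observed) / n
    sxx = sum((x - mean_x) ** 2 for x in estimated)
    if sxx == 0:
        raise ValueError("estimated risks must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(estimated, observed))
    return sxy / sxx


# Made-up relative risks for four risk groups (assumption, not study data).
estimated = [1.0, 2.0, 3.0, 4.0]
observed = [0.7, 1.4, 2.1, 2.8]
beta = calibration_slope(estimated, observed)  # about 0.7 for these toy values
```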


2020 ◽  
Vol 38 (15_suppl) ◽  
pp. 1528-1528
Author(s):  
Heena Desai ◽  
Anh Le ◽  
Ryan Hausler ◽  
Shefali Verma ◽  
Anurag Verma ◽  
...  

Background: The discovery of rare genetic variants associated with cancer has a tremendous impact on reducing cancer morbidity and mortality when identified; however, rare variants are found in less than 5% of cancer patients. Genome-wide association studies (GWAS) have identified hundreds of common genetic variants significantly associated with a number of cancers, but the clinical utility of individual variants or a polygenic risk score (PRS) derived from multiple variants is still unclear. Methods: We tested the ability of PRS models developed from genome-wide significant variants to differentiate cases versus controls in the Penn Medicine Biobank. Cases for 15 different cancers and cancer-free controls were identified using electronic health record billing codes for 11,524 European American and 5,994 African American individuals from the Penn Medicine Biobank. Results: The discriminatory ability of the 15 PRS models to distinguish their respective cancer cases versus controls ranged from 0.68-0.79 in European Americans and 0.74-0.93 in African Americans. Seven of the 15 cancer PRS trended towards an association with their cancer at p<0.05 (Table), and PRS for prostate, thyroid and melanoma were significantly associated with their cancers at a Bonferroni-corrected p<0.003 with OR 1.3-1.6 in European Americans. Conclusions: Our data demonstrate that common variants with significant associations from GWAS can distinguish cancer cases versus controls for some cancers in an unselected biobank population. Given the small effects, future studies are needed to determine how best to incorporate PRS with other risk factors in the precision prediction of cancer risk. [Table: see text]


2021 ◽  
Author(s):  
Yixuan He ◽  
Chirag M Lakhani ◽  
Danielle Rasooly ◽  
Arjun K Manrai ◽  
Ioanna Tzoulaki ◽  
...  

OBJECTIVE: Establish a polyexposure score for T2D incorporating 12 non-genetic exposures and examine whether a polyexposure and/or a polygenic risk score improves diabetes prediction beyond traditional clinical risk factors. RESEARCH DESIGN AND METHODS: We identified 356,621 unrelated individuals from the UK Biobank of white British ancestry with no prior diagnosis of T2D and normal HbA1c levels. Using self-reported and hospital admission information, we deployed a machine learning procedure to select the most predictive and robust factors out of 111 non-genetically ascertained exposure and lifestyle variables for the polyexposure risk score (PXS) in prospective T2D. We computed the clinical risk score (CRS) and polygenic risk score (PGS) by taking a weighted sum of eight established clinical risk factors and over six million SNPs, respectively. RESULTS: In the study population, 7,513 had incident T2D. The C-statistics for the PGS, PXS, and CRS models were 0.709, 0.762, and 0.839, respectively. Hazard ratios (HR) associated with risk score values in the top 10% versus the remaining population were 2.00, 5.90, and 9.97 for PGS, PXS, and CRS, respectively. Addition of PGS and PXS to CRS improves T2D classification accuracy with a continuous net reclassification index of 15.2% and 30.1% for cases, respectively, and 7.3% and 16.9% for controls, respectively. CONCLUSIONS: For T2D, the PXS provides modest incremental predictive value over established clinical risk factors. The concept of PXS merits further consideration in T2D risk stratification and is likely to be useful in other chronic disease risk prediction models.
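The weighted-sum construction mentioned for the CRS and PGS can be sketched in a few lines. This is an illustration only: the function name `weighted_sum_score`, the allele dosages and the per-variant weights are all hypothetical, and a real PGS would sum over millions of variants with weights taken from an association study.

```python
def weighted_sum_score(values, weights):
    """Return sum_j weights[j] * values[j] for one individual.

    For a polygenic score, values[j] is the count (0, 1 or 2) of the
    effect allele at variant j and weights[j] is its effect size.
    """
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(v * w for v, w in zip(values, weights))


# Made-up example: three variants for one individual.
dosages = [2, 1, 0]              # effect-allele counts (hypothetical)
weights = [0.10, 0.25, -0.30]    # per-allele effect sizes (hypothetical)
score = weighted_sum_score(dosages, weights)  # 2*0.10 + 1*0.25 + 0*(-0.30)
```

The same function applies to a clinical risk score if `values` holds the clinical risk factor measurements and `weights` their model coefficients.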


2018 ◽  
Author(s):  
Alexandra C. Gillett ◽  
Evangelos Vassos ◽  
Cathryn M. Lewis

Abstract. Objective: Stratified medicine requires models of disease risk incorporating genetic and environmental factors. These may combine estimates from different studies, and models must be easily updatable when new estimates become available. The logit scale is often used in genetic and environmental association studies, whereas the liability scale is used for polygenic risk scores and measures of heritability; combining parameters across studies requires a common scale for the estimates. Methods: We present equations to approximate the relationship between univariate effect size estimates on the logit scale and the liability scale, allowing model parameters to be translated between scales. Results: These equations are used to build a risk score on the liability scale, using effect size estimates originally estimated on the logit scale. Such a score can then be used in a joint effects model to estimate the risk of disease, and this is demonstrated for schizophrenia using a polygenic risk score and environmental risk factors. Conclusion: This straightforward method allows conversion of model parameters between the logit and liability scales, and may be a key tool to integrate risk estimates into a comprehensive risk model, particularly for joint models with environmental and genetic risk factors.


2020 ◽  
Vol 117 (11) ◽  
pp. 5997-6002 ◽  
Author(s):  
Sandya Liyanarachchi ◽  
Julius Gudmundsson ◽  
Egil Ferkingstad ◽  
Huiling He ◽  
Jon G. Jonasson ◽  
...  

Genome-wide association studies (GWASs) have identified at least 10 single-nucleotide polymorphisms (SNPs) associated with papillary thyroid cancer (PTC) risk. Most of these SNPs are common variants with small to moderate effect sizes. Here we assessed the combined genetic effects of these variants on PTC risk by using summarized GWAS results to build polygenic risk score (PRS) models in three PTC study groups from Ohio (1,544 patients and 1,593 controls), Iceland (723 patients and 129,556 controls), and the United Kingdom (534 patients and 407,945 controls). A PRS based on the 10 established PTC SNPs showed a stronger predictive power compared with the clinical factors model, with a minimum increase of area under the receiver-operating curve of 5.4 percentage points (P ≤ 1.0 × 10⁻⁹). Adding an extended PRS based on 592,475 common variants did not significantly improve the prediction power compared with the 10-SNP model, suggesting that most of the remaining undiscovered genetic risk in thyroid cancer is due to rare, moderate- to high-penetrance variants rather than to common low-penetrance variants. Based on the 10-SNP PRS, individuals in the top decile group of PRSs have a close to sevenfold greater risk (95% CI, 5.4–8.8) compared with the bottom decile group. In conclusion, PRSs based on a small number of common germline variants emphasize the importance of heritable low-penetrance markers in PTC.


Genes ◽  
2021 ◽  
Vol 12 (12) ◽  
pp. 1859
Author(s):  
Sebastian Koch ◽  
Björn-Hergen Laabs ◽  
Meike Kasten ◽  
Eva-Juliane Vollstedt ◽  
Jos Becktepe ◽  
...  

Idiopathic Parkinson’s disease (PD) is a complex multifactorial disorder caused by the interplay of both genetic and non-genetic risk factors. Polygenic risk scores (PRSs) are one way to aggregate the effects of a large number of genetic variants upon the risk for a disease like PD in a single quantity. However, reassessment of the performance of a given PRS in independent data sets is a precondition for establishing the PRS as a valid tool to this end. We studied a previously proposed PRS for PD in a separate genetic data set, comprising 1914 PD cases and 4464 controls, and were able to replicate its ability to differentiate between cases and controls. We also assessed theoretically the prognostic value of the PD-PRS, i.e., its ability to predict the development of PD in later life for healthy individuals. As it turned out, the PD-PRS alone can be expected to perform poorly in this regard. Therefore, we conclude that the PD-PRS could serve as an important research tool, but that meaningful PRS-based prognosis of PD at an individual level is not feasible.


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Itziar de Rojas ◽  
Sonia Moreno-Grau ◽  
Niccolo Tesi ◽  
Benjamin Grenier-Boley ◽  
Victor Andrade ◽  
...  

Abstract: Genetic discoveries of Alzheimer’s disease are the drivers of our understanding, and together with polygenetic risk stratification can contribute towards planning of feasible and efficient preventive and curative clinical trials. We first perform a large genetic association study by merging all available case-control datasets and by-proxy study results (discovery n = 409,435 and validation size n = 58,190). Here, we add six variants associated with Alzheimer’s disease risk (near APP, CHRNE, PRKD3/NDUFAF7, PLCG2 and two exonic variants in the SHARPIN gene). Assessment of the polygenic risk score and stratifying by APOE reveal a 4 to 5.5 years difference in median age at onset of Alzheimer’s disease patients in APOE ɛ4 carriers. Because of this study, the underlying mechanisms of APP can be studied to refine the amyloid cascade and the polygenic risk score provides a tool to select individuals at high risk of Alzheimer’s disease.


2021 ◽  
Author(s):  
Madeline Page ◽  
Elizabeth Vance ◽  
Matthew Cloward ◽  
Ed Ringger ◽  
Louisa Dayton ◽  
...  

Abstract. Introduction: Genome-wide association (GWA) studies identify correlation between genetic variants and phenotypes. GWA findings can be used to calculate polygenic risk scores, which represent the aggregate genetic risk across all associated loci. Methods: We developed a centralized polygenic risk score calculator containing over 2,300 GWA studies from the NHGRI-EBI GWAS Catalog. Polygenic risk scores are calculated from user-uploaded data using various user-defined parameters across any disease(s) or studies. Results: The Polygenic Risk Score Knowledge Base (https://prs.byu.edu) and command-line interface facilitate user-specific polygenic risk score calculations. We report study-specific polygenic risk scores across the U.K. Biobank, 1000 Genomes, and the Alzheimer’s Disease Neuroimaging Initiative (ADNI) and identify potentially confounding genetic risk factors in ADNI. Discussion: We introduce the first streamlined analysis tool and web interface to calculate and contextualize polygenic risk scores across various studies. We anticipate that the PRSKB will facilitate a wider adaptation and innovative use of polygenic risk scores in disease research. Data Availability: This project is documented online at https://polyriskscore.readthedocs.io/en/latest/, and all programs are publicly available at https://github.com/kauwelab/PolyRiskScore. A web interface is also available at https://prs.byu.edu/.


Author(s):  
Louis Lello ◽  
Timothy G. Raben ◽  
Stephen D.H. Hsu

Abstract: We test a variety of polygenic predictors using tens of thousands of genetic siblings for whom we have SNP genotypes, health status, and phenotype information in late adulthood. Siblings have typically experienced similar environments during childhood, and exhibit negligible population stratification relative to each other. Therefore, the ability to predict differences in disease risk or complex trait values between siblings is a strong test of genomic prediction in humans. We compare validation results obtained using non-sibling subjects to those obtained among siblings and find that typically most of the predictive power persists in within-family designs. In the case of disease risk we test the extent to which higher polygenic risk score (PRS) identifies the affected sibling, and also compute Relative Risk Reduction as a function of risk score threshold. For quantitative traits we examine between-sibling differences in trait values as a function of predicted differences, and compare to performance in non-sibling pairs. Example results: Given 1 sibling with normal-range PRS score (<84th percentile) and 1 sibling with high PRS score (top few percentiles), the predictors identify the affected sibling about 70-90% of the time across a variety of disease conditions, including Breast Cancer, Heart Attack, Diabetes, etc. For height, the predictor correctly identifies the taller sibling roughly 80 percent of the time when the (male) height difference is 2 inches or more.
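The within-family test described in this abstract, asking how often the higher score identifies the affected sibling, can be sketched as below. This is a simplified illustration under stated assumptions: it uses every discordant pair rather than only pairs with one normal-range and one high-score sibling, counts tied scores as half a hit, and the name `affected_sibling_hit_rate` and the scores are hypothetical.

```python
def affected_sibling_hit_rate(pairs):
    """Fraction of discordant sibling pairs in which the affected sibling
    has the higher score.

    Each pair is (score_of_affected_sibling, score_of_unaffected_sibling).
    Tied scores count as half a hit (an assumption of this sketch).
    """
    if not pairs:
        raise ValueError("need at least one sibling pair")
    hits = 0.0
    for affected_score, unaffected_score in pairs:
        if affected_score > unaffected_score:
            hits += 1.0
        elif affected_score == unaffected_score:
            hits += 0.5
    return hits / len(pairs)


# Made-up standardized scores for four discordant sibling pairs.
pairs = [(1.2, 0.3), (0.1, 0.5), (2.0, 1.0), (0.7, 0.7)]
rate = affected_sibling_hit_rate(pairs)  # 2 hits + 1 tie out of 4 pairs
```

A rate of 0.5 would mean the score does no better than chance within families.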

