Multifactorial disorders and polygenic risk scores: predicting common diseases and the possibility of adverse selection in life and protection insurance

Abstract: During the past decade, genetics research has allowed scientists and clinicians to explore the human genome in detail and reveal many thousands of common genetic variants associated with disease. Genetic risk scores, known as polygenic risk scores (PRSs), aggregate risk information from the most important genetic variants into a single score that describes an individual’s genetic predisposition to a given disease. This article reviews recent developments in the predictive utility of PRSs in relation to a person’s susceptibility to breast cancer and coronary artery disease. Prognostic models for these disorders are built using data from the UK Biobank, controlling for typical clinical and underwriting risk factors. Furthermore, we explore the possibility of adverse selection where genetic information about multifactorial disorders is available for insurance purchasers but not for underwriters. We demonstrate that prediction of multifactorial diseases, using PRSs, provides population risk information additional to that captured by normal underwriting risk factors. This research using the UK Biobank is in the public interest as it contributes to our understanding of predicting risk of disease in the population. Further research is imperative to understand how PRSs could cause adverse selection if consumers use this information to alter their insurance purchasing behaviour.
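As a rough illustration of how a PRS aggregates variant-level information into a single number, the sketch below computes a score as a weighted sum of risk-allele dosages. The variant IDs, weights, and dosages are hypothetical, and treating a missing genotype as zero dosage is a simplifying assumption, not something the abstract specifies.

```python
from typing import Mapping


def polygenic_risk_score(dosages: Mapping[str, float],
                         weights: Mapping[str, float]) -> float:
    """Weighted sum of risk-allele dosages (0, 1 or 2 copies) over the scored variants.

    Variants missing from `dosages` are counted as 0 copies (an assumption).
    """
    return sum(weight * dosages.get(variant, 0.0)
               for variant, weight in weights.items())


# Hypothetical per-allele log-odds weights and one individual's dosages.
weights = {"rs_a": 0.10, "rs_b": -0.05, "rs_c": 0.20}
dosages = {"rs_a": 2, "rs_b": 1, "rs_c": 0}

score = polygenic_risk_score(dosages, weights)
print(f"PRS = {score:.2f}")
```

In practice the raw score is usually standardised against a reference population before being added to a prognostic model alongside clinical risk factors.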


Integration of rare large-effect expression variants improves polygenic risk prediction

10.1101/2020.12.02.20242990 ◽

2020 ◽

Author(s):

Craig Smail ◽

Nicole M. Ferraro ◽

Matthew G. Durrant ◽

Abhiram S. Rao ◽

Matthew Aguirre ◽

...

Keyword(s):

Genetic Variants ◽

Rare Variants ◽

Complex Trait ◽

Risk Scores ◽

Multiple Traits ◽

Polygenic Risk ◽

Common Genetic Variants ◽

Using Data ◽

The Uk ◽

The Impact

Summary: Polygenic risk scores (PRS) aim to quantify the contribution of multiple genetic loci to an individual’s likelihood of a complex trait or disease. However, existing PRS estimate genetic liability using common genetic variants, excluding the impact of rare variants. We identified rare, large-effect variants in individuals with outlier gene expression from the GTEx project and then assessed their impact on PRS predictions in the UK Biobank (UKB). We observed large deviations from the PRS-predicted phenotypes for carriers of multiple outlier rare variants; for example, individuals classified as “low-risk” but in the top 1% of outlier rare variant burden had a 6-fold higher rate of severe obesity. We replicated these findings using data from the NHLBI Trans-Omics for Precision Medicine (TOPMed) biobank and the Million Veteran Program, and demonstrated that PRS across multiple traits will significantly benefit from the inclusion of rare genetic variants.


Significant Sparse Polygenic Risk Scores across 428 traits in UK Biobank

10.1101/2021.09.02.21262942 ◽

2021 ◽

Author(s):

Yosuke Tanigawa ◽

Junyang Qian ◽

Guhan Ram Venkataraman ◽

Johanne M. Justesen ◽

Ruilin Li ◽

...

Keyword(s):

Genetic Variants ◽

Quantitative Traits ◽

Predictive Performance ◽

Risk Scores ◽

Polygenic Risk Score ◽

Uk Biobank ◽

Polygenic Risk ◽

Systematic Assessment ◽

Phenotype Data ◽

The Uk

We present a systematic assessment of polygenic risk score (PRS) prediction across more than 1,600 traits using genetic and phenotype data in the UK Biobank. We report 428 sparse PRS models with significant (p < 2.5e-5) incremental predictive performance when compared against the covariate-only model that considers age, sex, and the genotype principal components. We report a significant correlation between the number of genetic variants selected in the sparse PRS model and the incremental predictive performance in quantitative traits (Spearman's ρ = 0.54, p = 1.4e-15), but not in binary traits (ρ = 0.059, p = 0.35). The sparse PRS models trained on European individuals showed limited transferability when evaluated on non-European individuals in the UK Biobank. We provide the PRS model weights on the Global Biobank Engine (https://biobankengine.stanford.edu/prs).
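The correlation reported above is a Spearman rank correlation between model size and incremental predictive performance. A minimal sketch of that statistic, using only the standard library, is given below; the five (number of variants, incremental R²) pairs are made-up toy values, not data from the study.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman_rho(x, y):
    """Spearman's rho is the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy values (hypothetical): variants selected per model, incremental R^2.
n_variants = [10, 50, 200, 1000, 5000]
incremental_r2 = [0.001, 0.004, 0.003, 0.020, 0.050]

rho = spearman_rho(n_variants, incremental_r2)
print(f"Spearman rho = {rho:.2f}")
```

Because only ranks enter the calculation, the statistic is insensitive to the very different scales of variant counts and R² values.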


Common genetic variants and health outcomes appear geographically structured in the UK Biobank sample: Old concerns returning and their implications

10.1101/294876 ◽

2018 ◽

Cited By ~ 12

Author(s):

Simon Haworth ◽

Ruth Mitchell ◽

Laura Corbin ◽

Kaitlin H Wade ◽

Tom Dudding ◽

...

Keyword(s):

Genetic Variants ◽

Complex Traits ◽

Large Scale ◽

Genetic Data ◽

Population Based ◽

Risk Scores ◽

Phenotypic Variance ◽

Uk Biobank ◽

Common Genetic Variants ◽

The Uk

Introductory paragraph: The inclusion of genetic data in large studies has enabled the discovery of genetic contributions to complex traits and their application in applied analyses including those using genetic risk scores (GRS) for the prediction of phenotypic variance. If genotypes show structure by location and coincident structure exists for the trait of interest, analyses can be biased. Having illustrated structure in an apparently homogeneous collection, we aimed to a) test for geographical stratification of genotypes in UK Biobank and b) assess whether stratification might induce bias in genetic association analysis. We found that single genetic variants are associated with birth location within UK Biobank and that geographic structure in genetic data could not be accounted for using routine adjustment for study centre and principal components (PCs) derived from genotype data. We found that GRS for complex traits do appear geographically structured and analysis using GRS can yield biased associations. We discuss the likely origins of these observations and potential implications for analysis within large-scale population based genetic studies.


Association of accelerometer-derived sleep measures with lifetime psychiatric diagnoses: A cross-sectional study of 89,205 participants from the UK Biobank

PLoS Medicine ◽

10.1371/journal.pmed.1003782 ◽

2021 ◽

Vol 18 (10) ◽

pp. e1003782

Author(s):

Michael Wainberg ◽

Samuel E. Jones ◽

Lindsay Melhuish Beaupre ◽

Sean L. Hill ◽

Daniel Felsky ◽

...

Keyword(s):

Bipolar Disorder ◽

Sleep Duration ◽

Sleep Efficiency ◽

Risk Scores ◽

Psychiatric Diagnoses ◽

Uk Biobank ◽

Major Depressive ◽

Cross Sectional ◽

Polygenic Risk ◽

The Uk

Background: Sleep problems are both symptoms of and modifiable risk factors for many psychiatric disorders. Wrist-worn accelerometers enable objective measurement of sleep at scale. Here, we aimed to examine the association of accelerometer-derived sleep measures with psychiatric diagnoses and polygenic risk scores in a large community-based cohort. Methods and findings: In this post hoc cross-sectional analysis of the UK Biobank cohort, 10 interpretable sleep measures (bedtime, wake-up time, sleep duration, wake after sleep onset, sleep efficiency, number of awakenings, duration of longest sleep bout, number of naps, and variability in bedtime and sleep duration) were derived from 7-day accelerometry recordings across 89,205 participants (aged 43 to 79, 56% female, 97% self-reported white) taken between 2013 and 2015. These measures were examined for association with lifetime inpatient diagnoses of major depressive disorder, anxiety disorders, bipolar disorder/mania, and schizophrenia spectrum disorders from any time before the date of accelerometry, as well as polygenic risk scores for major depression, bipolar disorder, and schizophrenia. Covariates consisted of age and season at the time of the accelerometry recording, sex, Townsend deprivation index (an indicator of socioeconomic status), and the top 10 genotype principal components. We found that sleep pattern differences were ubiquitous across diagnoses: each diagnosis was associated with a median of 8.5 of the 10 accelerometer-derived sleep measures, with measures of sleep quality (for instance, sleep efficiency) generally more affected than mere sleep duration. Effect sizes were generally small: for instance, the largest magnitude effect size across the 4 diagnoses was β = −0.11 (95% confidence interval −0.13 to −0.10, p = 3 × 10^-56, FDR = 6 × 10^-55) for the association between lifetime inpatient major depressive disorder diagnosis and sleep efficiency. Associations largely replicated across ancestries and sexes, and accelerometry-derived measures were concordant with self-reported sleep properties. Limitations include the use of accelerometer-based sleep measurement and the time lag between psychiatric diagnoses and accelerometry. Conclusions: In this study, we observed that sleep pattern differences are a transdiagnostic feature of individuals with lifetime mental illness, suggesting that they should be considered regardless of diagnosis. Accelerometry provides a scalable way to objectively measure sleep properties in psychiatric clinical research and practice, even across tens of thousands of individuals.


Exploring various polygenic risk scores for skin cancer in the phenomes of the Michigan genomics initiative and the UK Biobank with a visual catalog: PRSWeb

PLoS Genetics ◽

10.1371/journal.pgen.1008202 ◽

2019 ◽

Vol 15 (6) ◽

pp. e1008202 ◽

Cited By ~ 11

Author(s):

Lars G. Fritsche ◽

Lauren J. Beesley ◽

Peter VandeHaar ◽

Robert B. Peng ◽

Maxwell Salvatore ◽

...

Keyword(s):

Skin Cancer ◽

Risk Scores ◽

Uk Biobank ◽

Polygenic Risk ◽

The Uk


Performance of polygenic risk scores for cancer prediction in an academic biobank.

Journal of Clinical Oncology ◽

10.1200/jco.2020.38.15_suppl.1528 ◽

2020 ◽

Vol 38 (15_suppl) ◽

pp. 1528-1528

Author(s):

Heena Desai ◽

Anh Le ◽

Ryan Hausler ◽

Shefali Verma ◽

Anurag Verma ◽

...

Keyword(s):

Risk Score ◽

Genetic Variants ◽

Association Studies ◽

Risk Scores ◽

Polygenic Risk Score ◽

Genome Wide Association Studies ◽

Polygenic Risk ◽

European Americans ◽

Genome Wide ◽

Common Genetic Variants

1528 Background: The discovery of rare genetic variants associated with cancer has a tremendous impact on reducing cancer morbidity and mortality when such variants are identified; however, rare variants are found in less than 5% of cancer patients. Genome-wide association studies (GWAS) have identified hundreds of common genetic variants significantly associated with a number of cancers, but the clinical utility of individual variants or a polygenic risk score (PRS) derived from multiple variants is still unclear. Methods: We tested the ability of PRS models developed from genome-wide significant variants to differentiate cases versus controls in the Penn Medicine Biobank. Cases for 15 different cancers and cancer-free controls were identified using electronic health record billing codes for 11,524 European American and 5,994 African American individuals from the Penn Medicine Biobank. Results: The discriminatory ability of the 15 PRS models to distinguish their respective cancer cases versus controls ranged from 0.68 to 0.79 in European Americans and from 0.74 to 0.93 in African Americans. Seven of the 15 cancer PRS trended towards an association with their cancer at p<0.05 (Table), and PRS for prostate, thyroid and melanoma were significantly associated with their cancers at a Bonferroni-corrected p<0.003 with OR 1.3-1.6 in European Americans. Conclusions: Our data demonstrate that common variants with significant associations from GWAS can distinguish cancer cases versus controls for some cancers in an unselected biobank population. Given the small effects, future studies are needed to determine how best to incorporate PRS with other risk factors in the precision prediction of cancer risk. [Table: see text]
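The Bonferroni-corrected threshold quoted above is consistent with dividing a nominal 0.05 level by the 15 PRS models tested. The abstract does not state the divisor, so the sketch below is an assumption that simply reproduces the arithmetic.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


# Assumed inputs: nominal alpha of 0.05 and one test per cancer PRS model.
threshold = bonferroni_threshold(0.05, 15)
print(f"Per-test threshold = {threshold:.4f}")
```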


A machine-learning heuristic to improve gene score prediction of polygenic traits

10.1101/107409 ◽

2017 ◽

Author(s):

Guillaume Paré ◽

Shihong Mao ◽

Wei Q. Deng

Keyword(s):

Machine Learning ◽

Machine Learning Techniques ◽

Risk Scores ◽

Uk Biobank ◽

Polygenic Risk ◽

Learning Techniques ◽

Diabetes Status ◽

Polygenic Traits ◽

The Uk ◽

Prediction Problems

Abstract: Machine-learning techniques have helped solve a broad range of prediction problems, yet are not widely used to build polygenic risk scores for the prediction of complex traits. We propose a novel heuristic based on machine-learning techniques (GraBLD) to boost the predictive performance of polygenic risk scores. Gradient boosted regression trees were first used to optimize the weights of SNPs included in the score, followed by a novel regional adjustment for linkage disequilibrium. A calibration set with sample size of ~200 individuals was sufficient for optimal performance. GraBLD yielded prediction R² of 0.239 and 0.082 using GIANT summary association statistics for height and BMI in the UK Biobank study (N=130K; 1.98M SNPs), explaining 46.9% and 32.7% of the overall polygenic variance, respectively. For diabetes status, the area under the receiver operating characteristic curve was 0.602 in the UK Biobank study using summary-level association statistics from the DIAGRAM consortium. GraBLD outperformed other polygenic score heuristics for the prediction of height (p < 2.2 × 10^-16) and BMI (p < 1.57 × 10^-4), and was equivalent to LDpred for diabetes. Results were independently validated in the Health and Retirement Study (N=8,292; 688,398 SNPs). Our report demonstrates the use of machine-learning techniques, coupled with summary-level data from large genome-wide meta-analyses, to improve the prediction of polygenic traits.
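As a back-of-envelope check on the height and BMI figures, the sketch below divides each prediction R² by the stated share of polygenic variance explained to recover the total polygenic variance that those two numbers jointly imply. It assumes the percentage is defined as R² divided by total polygenic variance, which the abstract does not spell out.

```python
def implied_polygenic_variance(r2: float, share_explained: float) -> float:
    """Total polygenic variance implied by R^2 and the fraction of it explained.

    Assumes share_explained = r2 / total_polygenic_variance.
    """
    return r2 / share_explained


# Figures quoted in the abstract.
height_total = implied_polygenic_variance(0.239, 0.469)
bmi_total = implied_polygenic_variance(0.082, 0.327)

print(f"Height: implied polygenic variance = {height_total:.3f}")
print(f"BMI:    implied polygenic variance = {bmi_total:.3f}")
```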


Combined Utility of 25 Disease and Risk Factor Polygenic Risk Scores for Stratifying Risk of All-Cause Mortality

10.1101/2020.03.13.20035527 ◽

2020 ◽

Author(s):

Allison Meisner ◽

Prosenjit Kundu ◽

Yan Dora Zhang ◽

Lauren V. Lan ◽

Sungwon Kim ◽

...

Keyword(s):

Risk Factors ◽

Mortality Risk ◽

Association Studies ◽

Risk Scores ◽

Genome Wide Association Studies ◽

Polygenic Risk ◽

Hazard Ratios ◽

Risk Of Mortality ◽

All Cause Mortality ◽

The Uk

Abstract: While genome-wide association studies have identified susceptibility variants for numerous traits, their combined utility for predicting broad measures of health, such as mortality, remains poorly understood. We used data from the UK Biobank to combine polygenic risk scores (PRS) for 13 diseases and 12 mortality risk factors into sex-specific composite PRS (cPRS). These cPRS were moderately associated with all-cause mortality in independent data: the estimated hazard ratios per standard deviation were 1.10 (95% confidence interval: 1.05, 1.16) and 1.15 (1.10, 1.19) for women and men, respectively. Differences in life expectancy between the top and bottom 5% of the cPRS were estimated to be 4.79 (1.76, 7.81) years and 6.75 (4.16, 9.35) years for women and men, respectively. These associations were substantially attenuated after adjusting for non-genetic mortality risk factors measured at study entry. The cPRS may be useful in counseling younger individuals at higher genetic risk of mortality on modification of non-genetic factors.


Combining Clinical and Polygenic Risk Improves Stroke Prediction Among Individuals with Atrial Fibrillation

Circulation Genomic and Precision Medicine ◽

10.1161/circgen.120.003168 ◽

2021 ◽

Author(s):

Jack W. O'Sullivan ◽

Anna Shcherbina ◽

Johanne M. Justesen ◽

Mintu Turakhia ◽

Marco Perez ◽

...

Keyword(s):

Risk Factors ◽

Atrial Fibrillation ◽

Ischemic Stroke ◽

Predictive Ability ◽

Clinical Risk Factors ◽

Risk Scores ◽

Polygenic Risk ◽

Clinical Risk ◽

Increased Risk ◽

The Uk

Background - Atrial fibrillation (AF) is associated with a five-fold increased risk of ischemic stroke. A portion of this risk is heritable; however, current risk stratification tools (CHA2DS2-VASc) do not include family history or genetic risk. We hypothesized that we could improve ischemic stroke prediction in patients with AF by incorporating polygenic risk scores (PRS). Methods - Using data from the largest available GWAS in Europeans, we combined over half a million genetic variants to construct a PRS to predict ischemic stroke in patients with AF. We externally validated this PRS in independent data from the UK Biobank, both independently and integrated with clinical risk factors. The integrated PRS and clinical risk factors risk tool had the greatest predictive ability. Results - Compared with the currently recommended risk tool (CHA2DS2-VASc), the integrated tool significantly improved net reclassification (NRI: 2.3% (95%CI: 1.3% to 3.0%)) and fit (χ² P = 0.002). Using this improved tool, >115,000 people with AF would have improved risk classification in the US. Independently, PRS was a significant predictor of ischemic stroke in patients with AF prospectively (Hazard Ratio: 1.13 per 1 SD (95%CI: 1.06 to 1.23)). Lastly, polygenic risk scores were uncorrelated with clinical risk factors (Pearson's correlation coefficient: -0.018). Conclusions - In patients with AF, there appears to be a significant association between PRS and risk of ischemic stroke. The greatest predictive ability was found with the integration of PRS and clinical risk factors; however, the prediction of stroke remains challenging.
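For readers unfamiliar with net reclassification improvement (NRI), the sketch below shows the standard categorical form: net upward movement among people who had the event plus net downward movement among those who did not. The abstract does not say which NRI variant was used, and all counts here are hypothetical.

```python
def net_reclassification_improvement(events_up: int, events_down: int, n_events: int,
                                     nonevents_up: int, nonevents_down: int,
                                     n_nonevents: int) -> float:
    """Categorical NRI when moving from an old risk tool to a new one.

    'up' and 'down' count people whose risk category rose or fell under the new tool.
    """
    nri_events = (events_up - events_down) / n_events
    nri_nonevents = (nonevents_down - nonevents_up) / n_nonevents
    return nri_events + nri_nonevents


# Hypothetical reclassification counts, for illustration only.
nri = net_reclassification_improvement(
    events_up=12, events_down=8, n_events=100,
    nonevents_up=45, nonevents_down=63, n_nonevents=900,
)
print(f"NRI = {nri:.1%}")
```

A positive value means the new tool moves people in the correct direction more often than in the wrong one.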


Abstract P879: Differences in Statistical Performance of Polygenic Risk Scores for Cardiovascular Disease Across Different Race/Ethnicities

Stroke ◽

10.1161/str.52.suppl_1.p879 ◽

2021 ◽

Vol 52 (Suppl_1) ◽

Author(s):

Julian N Acosta ◽

Cameron Both ◽

Natalia Szejko ◽

Stacy Brown ◽

Kevin N Sheth ◽

...

Keyword(s):

Cardiovascular Disease ◽

Logistic Regression ◽

Genetic Risk ◽

Regression Models ◽

Risk Scores ◽

Uk Biobank ◽

Polygenic Risk ◽

Logistic Regression Models ◽

The Uk ◽

Significant Health

Introduction: Genome-wide association studies have identified numerous genetic risk variants for stroke and myocardial infarction (MI) in Europeans. However, the limited applicability of these results to non-Europeans due to racial/ethnic differences in the genetic architecture of cardiovascular disease (CVD), coupled with the limited availability of genomic data in non-Europeans, may create significant health disparities now that genomic-based precision medicine is a reality. We tested the hypothesis that the performance of polygenic risk scores (PRS) for CVD differs in Europeans versus non-Europeans. Methods: We conducted a nested study within the UK Biobank, a prospective, population-based study that enrolled ~500,000 participants across the UK. For this study, we identified self-reported black participants and randomly matched them 1:1 by age and sex with white participants. We created a PRS using previously discovered loci for stroke and MI. We then tested whether this PRS representing the aggregate polygenic susceptibility to CVD yielded similar precision in black versus white participants in logistic regression models. Results: Of the 502,536 participants enrolled in the UK Biobank, 8,061 were self-reported blacks, with 7,644 having available data for our analyses. We randomly matched these participants with white individuals, leading to a total sample size of 15,288 (mean age 51.9 [SD 8.1], female 8,722 [57%]). The total number of events was 741 overall, with 363 happening in blacks and 378 happening in whites. In logistic regression models including age, sex, and 5 principal components, the statistical precision (i.e., narrower confidence intervals) for the PRS was substantially higher for whites (OR 1.22, 95%CI 1.08-1.37; p<0.0001) compared to blacks (OR 1.24, 95%CI 1.05-1.47; p=0.01). Secondary analyses using genetically-determined ancestry yielded similar results.
Conclusion: Because CVD-related PRSs are derived mainly using genetic risk factors identified in populations of European ancestry, their statistical performance is lower in non-European populations. This asymmetry can lead to significant health disparities now that these tools are being evaluated in multiple precision medicine approaches.
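The precision comparison in the results can be made concrete by converting each reported 95% confidence interval into a standard error on the log odds ratio scale. The sketch below does this for the two intervals quoted in the abstract, assuming they are symmetric Wald intervals on the log scale, which the abstract does not confirm.

```python
import math


def log_or_standard_error(ci_lower: float, ci_upper: float, z: float = 1.959964) -> float:
    """Standard error of log(OR) recovered from a Wald confidence interval.

    Assumes the interval is log(OR) +/- z * SE, exponentiated.
    """
    return (math.log(ci_upper) - math.log(ci_lower)) / (2 * z)


# 95% confidence intervals quoted in the abstract.
se_white = log_or_standard_error(1.08, 1.37)
se_black = log_or_standard_error(1.05, 1.47)

print(f"SE of log(OR), white participants: {se_white:.3f}")
print(f"SE of log(OR), black participants: {se_black:.3f}")
```

A smaller standard error corresponds to the narrower interval, which is what the abstract means by higher statistical precision even though the two point estimates are nearly identical.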
