Abstract P879: Differences in Statistical Performance of Polygenic Risk Scores for Cardiovascular Disease Across Different Race/Ethnicities

Stroke ◽  
2021 ◽  
Vol 52 (Suppl_1) ◽  
Author(s):  
Julian N Acosta ◽  
Cameron Both ◽  
Natalia Szejko ◽  
Stacy Brown ◽  
Kevin N Sheth ◽  
...  

Introduction: Genome-wide association studies have identified numerous genetic risk variants for stroke and myocardial infarction (MI) in Europeans. However, these results have limited applicability to non-Europeans because of racial/ethnic differences in the genetic architecture of cardiovascular disease (CVD). Genomic data in non-Europeans are also scarce. Together, these gaps may create significant health disparities now that genomic-based precision medicine is a reality. We tested the hypothesis that the performance of polygenic risk scores (PRS) for CVD differs between Europeans and non-Europeans. Methods: We conducted a nested study within the UK Biobank, a prospective, population-based study that enrolled ~500,000 participants across the UK. For this study, we identified self-reported black participants and randomly matched them 1:1 by age and sex with white participants. We created a PRS using previously discovered loci for stroke and MI. We then used logistic regression models to test whether this PRS, representing the aggregate polygenic susceptibility to CVD, yielded similar precision in black and white participants. Results: Of the 502,536 participants enrolled in the UK Biobank, 8,061 self-reported as black, and 7,644 of these had data available for our analyses. We randomly matched these participants with white individuals, giving a total sample size of 15,288 (mean age 51.9 [SD 8.1] years; 8,722 [57%] female). There were 741 events overall: 363 in black participants and 378 in white participants. In logistic regression models including age, sex, and 5 principal components, the statistical precision of the PRS estimate (i.e., the narrowness of its confidence interval) was substantially higher for whites (OR 1.22, 95% CI 1.08-1.37; p<0.0001) than for blacks (OR 1.24, 95% CI 1.05-1.47; p=0.01). Secondary analyses using genetically determined ancestry yielded similar results.
Conclusion: Because CVD-related PRSs are derived mainly using genetic risk factors identified in populations of European ancestry, their statistical performance is lower in non-European populations. This asymmetry can lead to significant health disparities now that these tools are being evaluated in multiple precision medicine approaches.
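You can make the precision comparison concrete by back-calculating the standard error of each log odds ratio from its reported 95% CI. This is a minimal sketch. It assumes both intervals are symmetric Wald intervals on the log-odds scale, which the abstract does not state. The helper name `se_from_ci` is hypothetical.

```python
import math

Z_95 = 1.959964  # two-sided 95% standard normal quantile


def se_from_ci(lower: float, upper: float, z: float = Z_95) -> float:
    """Standard error of log(OR) implied by a symmetric Wald CI on the log scale."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


# 95% CIs as reported in the Results
se_white = se_from_ci(1.08, 1.37)
se_black = se_from_ci(1.05, 1.47)

# Ratio of sampling variances of the two log-OR estimates
variance_ratio = (se_black / se_white) ** 2

print(f"SE(log OR), white participants: {se_white:.4f}")
print(f"SE(log OR), black participants: {se_black:.4f}")
print(f"variance ratio (black / white): {variance_ratio:.2f}")
```

Under this assumption, the variance of the log-OR estimate in black participants is roughly twice that in white participants, even though the two point estimates are nearly identical.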

2018 ◽  
Author(s):  
Lars G. Fritsche ◽  
Lauren J. Beesley ◽  
Peter VandeHaar ◽  
Robert B. Peng ◽  
Maxwell Salvatore ◽  
...  

Abstract Polygenic risk scores (PRS) are designed to serve as a single summary measure, condensing information from a large number of genetic variants associated with a disease. They have been used for stratification and prediction of disease risk. The construction of a PRS often depends on the purpose of the study, the available data/summary estimates, and the underlying genetic architecture of a disease. In this paper, we consider several choices for constructing a PRS using summary data obtained from various publicly available sources, including the UK Biobank. We evaluate their ability to predict outcomes derived from electronic health records (EHR). We examine the three most common skin cancer subtypes in the USA: basal cell carcinoma, cutaneous squamous cell carcinoma, and melanoma. The genetic risk profiles of these subtypes may consist of both shared and unique elements, and we construct PRS to understand their common versus distinct etiology. This study uses data from 30,702 unrelated, genotyped patients of recent European descent from the Michigan Genomics Initiative (MGI), a longitudinal biorepository effort within Michigan Medicine. Using these PRS for the various skin cancer subtypes, we conduct a phenome-wide association study (PheWAS) within the MGI data to evaluate their association with secondary traits. PheWAS results are then replicated using population-based UK Biobank data. We develop an accompanying visual catalog called PRSweb that provides detailed PheWAS results and allows users to directly compare different PRS construction methods. The results of this study can provide guidance on PRS construction in future PRS-PheWAS studies using EHR data involving disease subtypes.

Author summary: In the study of genetically complex diseases, polygenic risk scores synthesize information from multiple genetic risk factors to provide insight into a patient's risk of developing a disease based on his/her genetic profile. These risk scores can be explored in conjunction with health and disease information available in electronic medical records. They may be associated with diseases that are related to, or precursors of, the underlying disease of interest. Limited work is available to guide risk score construction when the goal is to identify associations across the medical phenome. In this paper, we compare different polygenic risk score construction methods in terms of their relationships with the medical phenome. We further propose methods that use these risk scores to decouple the shared and unique genetic profiles of related diseases and to explore these diseases' shared and unique secondary associations. We leveraged the rich data resources of the Michigan Genomics Initiative, a biorepository effort at Michigan Medicine, and the larger population-based UK Biobank study. Using these resources, we investigated the performance of genetic risk profiling methods for the three most common types of skin cancer: melanoma, basal cell carcinoma, and squamous cell carcinoma.
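In its simplest form, the "single summary measure" described above is a weighted sum of risk-allele dosages. The per-variant weights are typically log odds ratios taken from GWAS summary statistics. The sketch below shows only this basic form. It does not reproduce any of the specific construction methods compared in the paper. The SNP identifiers, weights, and function name are hypothetical.

```python
from typing import Mapping


def polygenic_risk_score(dosages: Mapping[str, float],
                         weights: Mapping[str, float]) -> float:
    """Weighted sum of risk-allele dosages (0, 1 or 2; imputed dosages may be fractional).

    `weights` maps a variant ID to its effect size (e.g. a log odds ratio
    from GWAS summary statistics). Every weighted variant must have a dosage.
    """
    missing = [snp for snp in weights if snp not in dosages]
    if missing:
        raise KeyError(f"no dosage for variants: {missing}")
    return sum(w * dosages[snp] for snp, w in weights.items())


# Hypothetical weights and one individual's dosages
weights = {"rs0001": 0.20, "rs0002": -0.10, "rs0003": 0.05}
person = {"rs0001": 2, "rs0002": 1, "rs0003": 0}
print(round(polygenic_risk_score(person, weights), 3))  # 0.3
```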


2020 ◽  
Author(s):  
Michael D.E. Sewell ◽  
Xueyi Shen ◽  
Lorena Jiménez-Sánchez ◽  
Amelia J. Edmondson-Stait ◽  
Claire Green ◽  
...  

Abstract Background: Major depressive disorder (MDD), schizophrenia (SCZ), and bipolar disorder (BD) have both shared and discrete genetic risk factors. They also show abnormalities in blood-based measures of inflammation and blood-brain barrier (BBB) permeability. However, the relationships between these genetic architectures and blood-based markers are unclear. We investigated relationships between polygenic risk scores for these disorders and peripheral biomarkers in the UK Biobank cohort. Methods: We calculated polygenic risk scores (PRS) for samples of n = 367,329 (MDD PRS), n = 366,465 (SCZ PRS), and n = 366,383 (BD PRS) individuals from the UK Biobank cohort. We examined associations between each disorder PRS and 62 blood markers using two generalized linear regression models. The ‘minimally adjusted’ model controlled for variables including age and sex. The ‘fully adjusted’ model included additional lifestyle covariates such as alcohol and smoking status. Results: 12/62, 13/62, and 9/62 peripheral markers were significantly associated with MDD, SCZ, and BD PRS, respectively, in both models. Most associations were disorder PRS-specific, including several immune-related markers for MDD and SCZ. We also identified several BBB-permeable marker associations: vitamin D for all three disorder PRS, IGF-1 and triglycerides for MDD PRS, testosterone for SCZ PRS, and HDL cholesterol for BD PRS. Conclusions: This study suggests that MDD, SCZ, and BD have shared and distinct peripheral markers associated with disorder-specific genetic risk. The results implicate BBB permeability disruptions in all three disorders and inflammatory dysfunction in MDD and SCZ. They also enrich our understanding of potential underlying pathophysiological mechanisms in major psychiatric disorders.
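The Results count a marker only if it is significant in both the minimally and the fully adjusted model. Below is a minimal sketch of that filter. It assumes a per-marker p-value is available from each model, already corrected for multiple testing if applicable. The abstract does not state which correction or threshold was used, so the `alpha` default, marker names, and p-values are hypothetical.

```python
from typing import Dict, List


def significant_in_both(p_minimal: Dict[str, float],
                        p_full: Dict[str, float],
                        alpha: float = 0.05) -> List[str]:
    """Markers whose PRS association passes `alpha` in both adjustment models."""
    return sorted(
        marker
        for marker, p in p_minimal.items()
        # a marker absent from the fully adjusted results cannot pass
        if p < alpha and p_full.get(marker, 1.0) < alpha
    )


# Hypothetical p-values for three blood markers
p_min = {"vitamin_D": 0.001, "CRP": 0.01, "HbA1c": 0.20}
p_full = {"vitamin_D": 0.004, "CRP": 0.09, "HbA1c": 0.03}
print(significant_in_both(p_min, p_full))  # ['vitamin_D']
```

Requiring agreement across both models means that an association which disappears once lifestyle covariates are added (like the hypothetical CRP entry) is not counted.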


2020 ◽  
Vol 41 (Supplement_2) ◽  
Author(s):  
D Radenkovic ◽  
S.C Chawla ◽  
G Botta ◽  
A Boli ◽  
M.B Banach ◽  
...  

Abstract The two leading causes of mortality worldwide are cardiovascular disease (CVD) and cancer. The annual total cost of CVD and cancer in the US is an estimated $844.4 billion and is projected to double by 2030. Thus, there has been a shift toward preventive medicine to improve health outcomes. There has also been a shift toward developing risk scores, which identify at-risk individuals early so that personalised interventions can be targeted and disease prevented. Our aim was to define a Risk Score R(x) which, given the baseline characteristics of an individual, outputs the relative risk of composite CVD, cancer incidence, and all-cause mortality. A non-linear model was used to calculate risk scores based on the participants of the UK Biobank (n = 502,548). The model's inputs included patient characteristics (age, sex, ethnicity), baseline conditions, lifestyle factors of diet and physical activity, blood pressure, metabolic markers, and advanced lipid variables, including ApoA, ApoB, and lipoprotein(a). The risk score was defined by normalising the risk function by a fixed value, the average risk of the training set. To fit the non-linear model, >400,000 participants were used as the training set and >45,000 participants as the test set for validation. The exponent of the risk function was represented as a multilayer neural network. This allowed the model to capture interdependent behaviour of covariates, train a single model for all outcomes, and preserve heterogeneity of the groups. By contrast, the CoxPH models traditionally used in risk scores require homogeneous groups. The model was trained over 60 epochs. Predictive performance was determined by the C-index, with standard errors and confidence intervals estimated by bootstrap sampling. By inputting the variables described, one can obtain personalised hazard ratios for the 3 major outcomes of CVD, cancer, and all-cause mortality. Therefore, an individual with a risk score of, e.g., 1.5 has at any time a 50% higher risk than average of experiencing the corresponding event. The proposed model showed the following discrimination on the validation set: CVD (C-index = 0.8006), cancer incidence (C-index = 0.6907), and all-cause mortality (C-index = 0.7770). The CVD model is particularly strong (C-index >0.8). It improves on a previous CVD risk prediction model published last year (Welsh et al. 2019), which was also based on classical risk factors with total cholesterol and HDL-c in the UK Biobank data (C-index = 0.7444). Unlike classically used CoxPH models, our model accounts for correlation between variables, as shown by the correlation values tabulated in Figure 1. This is an accurate model based on the most comprehensive set of patient characteristics and biomarkers. It allows clinicians to identify multiple targets for improvement and practice active preventive cardiology in the era of precision medicine.
Figure 1. Correlation of variables in the R(x).
Funding Acknowledgement: Type of funding source: None
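The normalisation step can be illustrated with a proportional-hazards form h(t | x) = h₀(t)·exp(g(x)), where g stands in for the neural network. In this sketch g is a hypothetical linear function of a single covariate. Dividing exp(g(x)) by its average over the training set makes the mean score over that set exactly 1, so R(x) = 1.5 reads as a 50% higher hazard than the training-set average. The names `make_risk_score` and `g` are illustrative only, and the proportional-hazards form is an assumption rather than the authors' stated specification.

```python
import math
from typing import Callable, Sequence


def make_risk_score(g: Callable[[float], float],
                    training_inputs: Sequence[float]) -> Callable[[float], float]:
    """Return R(x) = exp(g(x)) / mean_k exp(g(x_k)) over the training set."""
    # The fixed normaliser: average relative risk over the training set
    baseline = sum(math.exp(g(x)) for x in training_inputs) / len(training_inputs)

    def risk_score(x: float) -> float:
        return math.exp(g(x)) / baseline

    return risk_score


# Hypothetical one-covariate log-risk function standing in for the network
def g(x: float) -> float:
    return 0.5 * x


train = [0.0, 1.0, 2.0, 3.0]
R = make_risk_score(g, train)
print(sum(R(x) for x in train) / len(train))  # mean score over the training set: 1 up to rounding
print(R(3.0) / R(1.0))                         # equals exp(g(3.0) - g(1.0)), i.e. e
```

Because the normaliser is a single constant, ratios between two individuals' scores do not depend on it. Only the interpretation "relative to the average participant" does.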
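The C-index used to report discrimination is the fraction of comparable pairs of individuals whose predicted risks are ordered consistently with their observed event times. Below is a minimal sketch of Harrell's estimator for right-censored data. The abstract does not specify which C-index estimator was used, and the toy data are hypothetical. The O(n²) double loop is for clarity only; large cohorts would call for a library routine or a sort-based algorithm.

```python
from typing import Sequence


def harrell_c_index(times: Sequence[float],
                    events: Sequence[bool],
                    risks: Sequence[float]) -> float:
    """Harrell's concordance index for right-censored survival data.

    A pair (i, j) is comparable when subject i had the event and a shorter
    observed time than subject j. It is concordant when i also has the higher
    predicted risk; tied risks count as half-concordant.
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored subject cannot be the earlier member of a pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data: follow-up time, event indicator, predicted risk score
times = [2, 4, 6, 8]
events = [True, True, False, True]
print(harrell_c_index(times, events, [0.9, 0.7, 0.3, 0.1]))  # 1.0 (perfect ordering)
print(harrell_c_index(times, events, [0.9, 0.2, 0.5, 0.1]))  # 0.8
```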


Circulation ◽  
2018 ◽  
Vol 138 (Suppl_1) ◽  
Author(s):  
Punag Divanji ◽  
Gregory Nah ◽  
Ian Harris ◽  
Anu Agarwal ◽  
Nisha I Parikh

Introduction: Peripartum cardiomyopathy (PPCM), characterized by significant left ventricular (LV) dysfunction and clinical heart failure (HF), has an incidence of approximately 1 in 2,200 live births (0.04%). Prior studies estimate that approximately 25% of women with recovered LV function will have recurrent clinical PPCM during subsequent pregnancies, compared with 50% of those without recovered LV function. Specific predictors of recurrent PPCM have not been studied in large cohorts. Methods: From 2005-2011, we identified 1,872,227 pregnancies by International Classification of Diseases, 9th Revision (ICD-9) codes in the California Healthcare Cost and Utilization Project (HCUP) database, which captures over 95% of the California hospitalized population. Excluding 15,765 women with prior cardiovascular disease (myocardial infarction, coronary artery disease, stroke, HF, valve disease, or congenital heart disease) yielded n=1,856,462 women. Among women without prior cardiovascular disease, we identified index and subsequent pregnancies with PPCM to determine episodes of recurrent PPCM. We considered the following potential predictors of PPCM recurrence in both univariate and age-adjusted logistic regression models: age, race, hypertension, diabetes, smoking, obesity, chronic kidney disease, family history, pre-eclampsia, ectopic pregnancy, income, and insurance status. Results: In HCUP, n=783 women had pregnancies complicated by PPCM (mean age=30.8 years). Among these women, n=133 (17%; mean age=28.1 years) had a subsequent pregnancy, with a mean follow-up of 4.34 years (±1.71 years). Of these 133 subsequent pregnancies, n=14 (10.5%) were complicated by recurrent PPCM, with a mean time-to-event of 2.2 years (±1.89 years). Among the risk factors studied, the only univariate predictor of recurrent PPCM was grand multiparity, defined as ≥5 previous deliveries (odds ratio: 22; 95% confidence interval 4.43-118.22).
The other predictors we studied were not significantly associated with recurrent PPCM in either univariate or multivariable models. Conclusion: In a large population database in California with 783 cases of PPCM over a 6-year period, 17% of women had a subsequent pregnancy, of which 10.5% had recurrent PPCM. In age-adjusted logistic regression models, grand multiparity was the only statistically significant predictor of recurrent PPCM.
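The very wide interval around the grand-multiparity odds ratio (4.43-118.22) is what one expects when some cells of the underlying 2×2 table are small. Below is a minimal sketch of an unadjusted odds ratio with a Woolf (log-scale) 95% CI. The counts are hypothetical and are not the study's data. The authors' regression-based intervals may also have been computed differently.

```python
import math
from typing import Tuple


def odds_ratio_woolf(a: int, b: int, c: int, d: int,
                     z: float = 1.959964) -> Tuple[float, float, float]:
    """Odds ratio and Woolf 95% CI from a 2x2 table.

    a = exposed with outcome,   b = exposed without outcome,
    c = unexposed with outcome, d = unexposed without outcome.
    """
    if min(a, b, c, d) == 0:
        raise ValueError("zero cell: Woolf interval undefined without a correction")
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    lower = math.exp(math.log(or_) - z * se_log)
    upper = math.exp(math.log(or_) + z * se_log)
    return or_, lower, upper


# Hypothetical tables with the same OR (24): small cells give a far wider interval
print(odds_ratio_woolf(4, 2, 10, 120))
print(odds_ratio_woolf(40, 20, 100, 1200))
```

The standard error of log(OR) is dominated by its smallest cell, so a handful of exposed cases is enough to stretch the upper limit by an order of magnitude.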


PLoS Medicine ◽  
2021 ◽  
Vol 18 (10) ◽  
pp. e1003782
Author(s):  
Michael Wainberg ◽  
Samuel E. Jones ◽  
Lindsay Melhuish Beaupre ◽  
Sean L. Hill ◽  
Daniel Felsky ◽  
...  

Background: Sleep problems are both symptoms of and modifiable risk factors for many psychiatric disorders. Wrist-worn accelerometers enable objective measurement of sleep at scale. Here, we aimed to examine the association of accelerometer-derived sleep measures with psychiatric diagnoses and polygenic risk scores in a large community-based cohort. Methods and findings: In this post hoc cross-sectional analysis of the UK Biobank cohort, we derived 10 interpretable sleep measures from 7-day accelerometry recordings taken between 2013 and 2015. The measures were bedtime, wake-up time, sleep duration, wake after sleep onset, sleep efficiency, number of awakenings, duration of longest sleep bout, number of naps, and variability in bedtime and sleep duration. Recordings covered 89,205 participants (aged 43 to 79, 56% female, 97% self-reported white). These measures were examined for association with lifetime inpatient diagnoses, from any time before the date of accelerometry, of major depressive disorder, anxiety disorders, bipolar disorder/mania, and schizophrenia spectrum disorders. They were also examined for association with polygenic risk scores for major depression, bipolar disorder, and schizophrenia. Covariates consisted of age and season at the time of the accelerometry recording, sex, Townsend deprivation index (an indicator of socioeconomic status), and the top 10 genotype principal components. We found that sleep pattern differences were ubiquitous across diagnoses: each diagnosis was associated with a median of 8.5 of the 10 accelerometer-derived sleep measures. Measures of sleep quality (for instance, sleep efficiency) were generally more affected than sleep duration alone. Effect sizes were generally small. For instance, the largest effect size in magnitude across the 4 diagnoses was β = −0.11 (95% confidence interval −0.13 to −0.10, p = 3 × 10⁻⁵⁶, FDR = 6 × 10⁻⁵⁵), for the association between lifetime inpatient major depressive disorder diagnosis and sleep efficiency.
Associations largely replicated across ancestries and sexes, and accelerometry-derived measures were concordant with self-reported sleep properties. Limitations include the use of accelerometer-based sleep measurement and the time lag between psychiatric diagnoses and accelerometry. Conclusions: In this study, we observed that sleep pattern differences are a transdiagnostic feature of individuals with lifetime mental illness, suggesting that they should be considered regardless of diagnosis. Accelerometry provides a scalable way to objectively measure sleep properties in psychiatric clinical research and practice, even across tens of thousands of individuals.
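The abstract reports FDR values alongside p-values. If these are Benjamini-Hochberg adjusted p-values (a common choice, though the abstract does not name the procedure), the adjustment can be sketched as follows. The function name and example p-values are hypothetical.

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (FDR q-values), in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0  # also caps every q-value at 1
    # Walk from the largest p-value down, keeping q-values monotone in rank
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

Each q-value is p × m / rank, made non-decreasing in rank. This is why an FDR value can be larger than its raw p-value but never smaller.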


PLoS Genetics ◽  
2019 ◽  
Vol 15 (6) ◽  
pp. e1008202 ◽  
Author(s):  
Lars G. Fritsche ◽  
Lauren J. Beesley ◽  
Peter VandeHaar ◽  
Robert B. Peng ◽  
Maxwell Salvatore ◽  
...  

2017 ◽  
Author(s):  
Guillaume Paré ◽  
Shihong Mao ◽  
Wei Q. Deng

Abstract Machine-learning techniques have helped solve a broad range of prediction problems, yet are not widely used to build polygenic risk scores for the prediction of complex traits. We propose a novel heuristic based on machine-learning techniques (GraBLD) to boost the predictive performance of polygenic risk scores. Gradient boosted regression trees were first used to optimize the weights of SNPs included in the score, followed by a novel regional adjustment for linkage disequilibrium. A calibration set with a sample size of ~200 individuals was sufficient for optimal performance. GraBLD yielded prediction R² of 0.239 and 0.082 for height and BMI in the UK Biobank study (N=130K; 1.98M SNPs), using GIANT summary association statistics. These values explain 46.9% and 32.7% of the overall polygenic variance, respectively. For diabetes status, the area under the receiver operating characteristic curve was 0.602 in the UK Biobank study using summary-level association statistics from the DIAGRAM consortium. GraBLD outperformed other polygenic score heuristics for the prediction of height (p < 2.2 × 10⁻¹⁶) and BMI (p < 1.57 × 10⁻⁴), and was equivalent to LDpred for diabetes. Results were independently validated in the Health and Retirement Study (N=8,292; 688,398 SNPs). Our report demonstrates the use of machine-learning techniques, coupled with summary-level data from large genome-wide meta-analyses, to improve the prediction of polygenic traits.
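The two ways the height and BMI results are reported are linked by simple arithmetic. If a prediction R² of 0.239 accounts for 46.9% of the overall polygenic variance, the implied polygenic variance is about 0.239 / 0.469 ≈ 0.51 for height; the same calculation gives about 0.25 for BMI. The snippet below only reworks the reported figures. It assumes that "overall polygenic variance" means the total phenotypic variance explainable by the SNPs, which the abstract does not define further.

```python
# Reported prediction R^2 and the share of polygenic variance it represents
reported = {
    "height": (0.239, 0.469),
    "BMI": (0.082, 0.327),
}

# R^2 = share * total polygenic variance  =>  total = R^2 / share
implied = {trait: r2 / share for trait, (r2, share) in reported.items()}

for trait, value in implied.items():
    print(f"{trait}: implied polygenic variance ~ {value:.2f}")
```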


2018 ◽  
Author(s):  
Florian Privé ◽  
Hugues Aschard ◽  
Michael G.B. Blum

Abstract Polygenic Risk Scores (PRS) combine information across many single-nucleotide polymorphisms (SNPs) into a score reflecting the genetic risk of developing a disease. PRS might have a major impact on public health, possibly allowing for screening campaigns to identify individuals at high genetic risk for a given disease. The “Clumping+Thresholding” (C+T) approach is the most common method to derive PRS. C+T uses only univariate genome-wide association study (GWAS) summary statistics, which makes it fast and easy to use. However, previous work showed that jointly estimating SNP effects for computing PRS has the potential to significantly improve the predictive performance of PRS compared with C+T. In this paper, we present an efficient method to jointly estimate SNP effects, allowing for practical application of penalized logistic regression (PLR) to modern datasets including hundreds of thousands of individuals. Moreover, our implementation of PLR directly includes automatic choices for hyper-parameters. The choice of hyper-parameters for a predictive model is very important because it can dramatically affect predictive performance. For example, in a model with 30 causal SNPs, AUC values range from less than 60% to 90% depending on the p-value threshold used in C+T. We compare the performance of PLR, C+T, and a derivation of random forests using both real and simulated data. PLR consistently achieves higher predictive performance than the two other methods while being as fast as C+T. We find that the improvement in predictive performance is more pronounced when there are few effects located in nearby genomic regions with correlated SNPs. For instance, AUC values increase from 83% with the best prediction of C+T to 92.5% with PLR.
We confirm these results in a data analysis of a case-control study for celiac disease, where PLR and the standard C+T method achieve AUCs of 89% and 82.5%, respectively. In conclusion, our study demonstrates that penalized logistic regression can achieve more discriminative polygenic risk scores. It is also applicable to large-scale individual-level data, thanks to the implementation we provide in the R package bigstatsr.
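The C+T baseline that PLR is compared against can be sketched in a few lines. SNPs are visited from most to least significant, and any SNP in strong LD with an already retained SNP is dropped (clumping). The survivors below a p-value threshold are then kept with their GWAS effect sizes (thresholding). This is a simplified illustration, not the bigstatsr or PLINK implementation. It uses a precomputed table of pairwise r² instead of a genomic distance window. The SNP IDs, r² values, r² cutoff, and p-value threshold are all hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

Snp = Tuple[str, float, float]  # (variant ID, GWAS p-value, effect size)


def clump_and_threshold(snps: List[Snp],
                        ld_r2: Dict[FrozenSet[str], float],
                        r2_max: float = 0.2,
                        p_max: float = 5e-8) -> List[Tuple[str, float]]:
    """Greedy LD clumping followed by p-value thresholding.

    `ld_r2` maps an unordered pair of variant IDs to their r^2; pairs that
    are not listed are treated as uncorrelated.
    """
    index_snps: List[Snp] = []
    for snp_id, p, beta in sorted(snps, key=lambda s: s[1]):
        # Drop a variant correlated with a more significant index SNP
        if any(ld_r2.get(frozenset((snp_id, kept_id)), 0.0) > r2_max
               for kept_id, _, _ in index_snps):
            continue
        index_snps.append((snp_id, p, beta))
    # Thresholding: keep only index SNPs passing the p-value cutoff
    return [(snp_id, beta) for snp_id, p, beta in index_snps if p < p_max]


gwas = [("rs1", 1e-10, 0.20), ("rs2", 1e-9, 0.15),
        ("rs3", 1e-3, 0.05), ("rs4", 1e-12, -0.10)]
ld = {frozenset(("rs1", "rs2")): 0.8}  # rs2 is a proxy of rs1

print(clump_and_threshold(gwas, ld))              # [('rs4', -0.1), ('rs1', 0.2)]
print(clump_and_threshold(gwas, ld, p_max=0.01))  # rs3 now passes as well
```

In practice C+T is rerun over a grid of p-value thresholds. This sensitivity to `p_max` is what produces the wide AUC range described for C+T.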


2021 ◽  
Vol 50 (Supplement_1) ◽  
Author(s):  
Yingying Wang

Abstract Background Obesity and homocysteine (Hcy) are two important risk factors for cardiovascular disease (CVD); however, previous results on the relationship between them have been conflicting. Our study aimed to explore the associations of general and central obesity with hyperhomocysteinemia (HHcy) in middle-aged women. Methods The current analysis was based on data from 11,007 women aged 40-60 years. Height, weight, and waist circumference (WC) were measured, and serum homocysteine was determined. Multiple logistic regression models were used to assess the associations of BMI and WC with the risk of hyperhomocysteinemia (HHcy, Hcy > 15 μmol/L). Results 13.71% of women had HHcy. The prevalences of BMI-based general obesity and WC-based central obesity were 11.17% and 22.88%, respectively. Compared with non-obese women, the mean serum Hcy concentration was significantly higher in women with WC-based central obesity (P = 0.002), but not in women with BMI-based general obesity (P > 0.05). In the multiple logistic regression models, central obesity was positively related to the risk of HHcy (OR = 1.30, 95% CI = 1.10 to 1.52). General obesity was inversely related to the risk of HHcy (OR = 0.82, 95% CI = 0.72 to 0.93 and OR = 0.71, 95% CI = 0.57 to 0.89). Conclusions Central obesity was positively related, and general obesity negatively related, to the risk of HHcy. Menopause showed no effect modification on these associations. Key messages Homocysteine; Central obesity; Menopause; Cardiovascular Disease


Circulation ◽  
2020 ◽  
Vol 142 (Suppl_3) ◽  
Author(s):  
Michael C Honigberg ◽  
Amy Sarma ◽  
Nandita Scott ◽  
Malissa J Wood ◽  
Pradeep Natarajan

Introduction: Depression is associated with an increased risk of coronary artery disease (CAD). Whether depression modifies genetic risk of cardiovascular and cardiometabolic disease is unknown. Methods: We included genotyped, unrelated individuals of European ancestry in the UK Biobank. Using genome-wide significant single nucleotide polymorphisms (SNPs) from studies external to the UK Biobank, we generated polygenic risk scores (PRS) for the following conditions:
- coronary artery disease (CAD, 74 SNPs)
- hypertension (75 SNPs)
- type 2 diabetes (T2D, 64 SNPs)
- atrial fibrillation (25 SNPs)
- ischemic stroke (11 SNPs)

Participants were stratified by PRS for each condition as low (quintile 1), intermediate (quintiles 2-4), or high (quintile 5) genetic risk. Cox models tested the association of depression frequency with each incident condition among individuals with high PRS. These models adjusted for age, sex, the first 20 principal components, genotyping array, and Townsend deprivation index. Additional models further adjusted for health behaviors (exercise, tobacco and alcohol use, vegetable and fresh fruit intake) and tested associations across the PRS spectrum. Results: Among 348,083 individuals, 78,664 (22.6%) reported depression in the past 2 weeks, including 14,776 (4.2%) with depression on more than half of days. Depression burden modified the risk of incident CAD across the spectrum of CAD polygenic risk (Figure 1A). Among individuals with high PRS, lack of depression was associated with lower risk of incident CAD (HR 0.70, 95% CI 0.58-0.86), hypertension (HR 0.58, 95% CI 0.50-0.67), T2D (HR 0.48, 95% CI 0.41-0.55), and atrial fibrillation (HR 0.74, 95% CI 0.62-0.89), compared with a high burden of depression. These risk reductions were minimally attenuated after further adjustment for health behaviors (Figure 1B). Conclusions: A lower burden of depression was associated with decreased risks of cardiovascular disease among individuals at high genetic cardiovascular risk.
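The low / intermediate / high grouping by PRS quintile can be sketched as a rank-based assignment. This minimal version breaks ties arbitrarily by sort order. The abstract does not say how ties or quintile boundaries were handled, and the function name and example scores are hypothetical.

```python
from typing import List, Sequence


def prs_risk_group(prs: Sequence[float]) -> List[str]:
    """Label each individual 'low' (quintile 1), 'intermediate' (2-4) or 'high' (5)."""
    n = len(prs)
    order = sorted(range(n), key=lambda i: prs[i])  # indices by ascending PRS
    groups = [""] * n
    for rank, i in enumerate(order):
        quintile = rank * 5 // n + 1  # 1..5, about n/5 individuals each
        if quintile == 1:
            groups[i] = "low"
        elif quintile == 5:
            groups[i] = "high"
        else:
            groups[i] = "intermediate"
    return groups


# Hypothetical PRS values for ten individuals
scores = [0.5, 0.1, 0.9, 0.3, 0.7, 0.2, 0.8, 0.4, 0.6, 1.0]
print(prs_risk_group(scores))
```

Cutting at ranks rather than at fixed PRS values ensures each group holds its intended share of the cohort, whatever the score distribution.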

