scholarly journals A Smoothed Version of the Lassosum Penalty for Fitting Integrated Risk Models Using Summary Statistics or Individual-Level Data

Genes ◽  
2022 ◽  
Vol 13 (1) ◽  
pp. 112
Author(s):  
Georg Hahn ◽  
Dmitry Prokopenko ◽  
Sharon Lutz ◽  
Kristina Mullin ◽  
Rudolph Tanzi ◽  
...  

Polygenic risk scores are a popular means to predict the disease risk or disease susceptibility of an individual based on its genotype information. When adding other important epidemiological covariates such as age or sex, we speak of an integrated risk model. Methodological advances for fitting more accurate integrated risk models are of immediate importance to improve the precision of risk prediction, thereby potentially identifying patients at high risk early on when they are still able to benefit from preventive steps/interventions targeted at increasing their odds of survival, or at reducing their chance of getting a disease in the first place. This article proposes a smoothed version of the “Lassosum” penalty used to fit polygenic risk scores and integrated risk models using either summary statistics or raw data. The smoothing allows one to obtain explicit gradients everywhere for efficient minimization of the Lassosum objective function while guaranteeing bounds on the accuracy of the fit. An experimental section on both Alzheimer’s disease and COPD (chronic obstructive pulmonary disease) demonstrates the increased accuracy of the proposed smoothed Lassosum penalty compared to the original Lassosum algorithm (for the datasets under consideration), allowing it to draw equal with state-of-the-art methodology such as LDpred2 when evaluated via the AUC (area under the ROC curve) metric.

2021 ◽  
Author(s):  
Georg Hahn ◽  
Dmitry Prokopenko ◽  
Sharon M. Lutz ◽  
Kristina Mullin ◽  
Rudolph E. Tanzi ◽  
...  

AbstractPolygenic risk scores are a popular means to predict the disease risk or disease susceptibility of an individual based on its genotype information. When adding other important epidemiological covariates such as age or sex, we speak of an integrated risk model. Methodological advances for fitting more accurate integrated risk models are of immediate importance to improve the precision of risk prediction, thereby potentially identifying patients at high risk early on when they are still able to benefit from preventive steps/interventions targeted at increasing their odds of survival, or at reducing their chance of getting a disease in the first place. This article proposes a smoothed version of the “Lassosum” penalty used to fit polygenic risk scores and integrated risk models. The smoothing allows one to obtain explicit gradients everywhere for efficient minimization of the Lassosum objective function while guaranteeing bounds on the accuracy of the fit. An experimental section demonstrates the increased accuracy of the proposed smoothed Lassosum penalty compared to the original Lassosum algorithm, allowing it to draw equal with state-of-the-art methodology such as LDpred when evaluated via the AUC (area under the ROC curve) metric.


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Jiangming Sun ◽  
Yunpeng Wang ◽  
Lasse Folkersen ◽  
Yan Borné ◽  
Inge Amlien ◽  
...  

AbstractA promise of genomics in precision medicine is to provide individualized genetic risk predictions. Polygenic risk scores (PRS), computed by aggregating effects from many genomic variants, have been developed as a useful tool in complex disease research. However, the application of PRS as a tool for predicting an individual’s disease susceptibility in a clinical setting is challenging because PRS typically provide a relative measure of risk evaluated at the level of a group of people but not at individual level. Here, we introduce a machine-learning technique, Mondrian Cross-Conformal Prediction (MCCP), to estimate the confidence bounds of PRS-to-disease-risk prediction. MCCP can report disease status conditional probability value for each individual and give a prediction at a desired error level. Moreover, with a user-defined prediction error rate, MCCP can estimate the proportion of sample (coverage) with a correct prediction.


2021 ◽  
Author(s):  
Ying Wang ◽  
Shinichi Namba ◽  
Esteban Lopera ◽  
Sini Kerminen ◽  
Kristin Tsuo ◽  
...  

SummaryWith the increasing availability of biobank-scale datasets that incorporate both genomic data and electronic health records, many associations between genetic variants and phenotypes of interest have been discovered. Polygenic risk scores (PRS), which are being widely explored in precision medicine, use the results of association studies to predict the genetic component of disease risk by accumulating risk alleles weighted by their effect sizes. However, limited studies have thoroughly investigated best practices for PRS in global populations across different diseases. In this study, we utilize data from the Global-Biobank Meta-analysis Initiative (GBMI), which consists of individuals from diverse ancestries and across continents, to explore methodological considerations and PRS prediction performance in 9 different biobanks for 14 disease endpoints. Specifically, we constructed PRS using heuristic (pruning and thresholding, P+T) and Bayesian (PRS-CS) methods. We found that the genetic architecture, such as SNP-based heritability and polygenicity, varied greatly among endpoints. For both PRS construction methods, using a European ancestry LD reference panel resulted in comparable or higher prediction accuracy compared to several other non-European based panels; this is largely attributable to European descent populations still comprising the majority of GBMI participants. PRS-CS overall outperformed the classic P+T method, especially for endpoints with higher SNP-based heritability. For example, substantial improvements are observed in East-Asian ancestry (EAS) using PRS-CS compared to P+T for heart failure (HF) and chronic obstructive pulmonary disease (COPD). Notably, prediction accuracy is heterogeneous across endpoints, biobanks, and ancestries, especially for asthma which has known variation in disease prevalence across global populations. Overall, we provide lessons for PRS construction, evaluation, and interpretation using the GBMI and highlight the importance of best practices for PRS in the biobank-scale genomics era.


2021 ◽  
Author(s):  
Sijia Huang ◽  
Xiao Ji ◽  
Michael Cho ◽  
Jaehyun Joo ◽  
Jason Moore

Abstract Background COPD is a complex heterogeneous disease influenced by both environmental and genetic risk factors. Traditional genome wide association studies (GWAS) have been successful in identifying many reproducible risk variants of moderate to small effect. Polygenic risk scores (PRS) were developed as way to aggregate risk alleles weighted by their effect size to produce a score which could be used in clinical practice to identify individuals at high risk of disease. A limitation of both GWAS and PRS is that they make the important assumption that the effect of each allele is independent and not modified by other genetic or environmental factors. Machine learning methods such as deep learning (DL) neural networks complement the GWAS and PRS paradigm by making fewer assumptions about the nature of the genetic effects being modeled. For example, the hidden layers of a DL model have the potential to model gene-gene interactions with non-additive effects on disease risk. The goal of the present study was to develop a DL neural network approach to GWAS and PRS and to compare it to the prevailing paradigm based on modeling independent effects. We applied our DL-PRS method to genetic association data from several GWAS studies of chronic obstructive pulmonary disease (COPD).Results We developed a DL learning algorithm for modeling the relationship between genetic variation from GWAS and risk of COPD in several population-based studies. We then developed a DL-PRS based on nodes and associated weights from the first and second layer of the DL neural network. Our DL-PRS framework has overall satisfactory performance in the prediction of COPD and provides significant contribution to prediction in addition to the current PRS methods. Moreover, regarding the clinical relevance of COPD, our DL-PRS has a consistent and closer relationship regarding individual deciles and lung functions such as FEV1/FVC and predicted FEV1%. Conclusions Not only does DL-PRS show favorable predictive performance with current benchmark PRS methods, but it also extends the ranges of PRS deciles in predicting different stages of COPD. Moreover, our DL-PRS results were replicated in an independent cohort. This study opens the door to the use of machine learning for developing risk scores from models developed using fewer assumptions about the nature of the genetic effects.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Zijie Zhao ◽  
Yanyao Yi ◽  
Jie Song ◽  
Yuchang Wu ◽  
Xiaoyuan Zhong ◽  
...  

AbstractPolygenic risk scores (PRSs) have wide applications in human genetics research, but often include tuning parameters which are difficult to optimize in practice due to limited access to individual-level data. Here, we introduce PUMAS, a novel method to fine-tune PRS models using summary statistics from genome-wide association studies (GWASs). Through extensive simulations, external validations, and analysis of 65 traits, we demonstrate that PUMAS can perform various model-tuning procedures using GWAS summary statistics and effectively benchmark and optimize PRS models under diverse genetic architecture. Furthermore, we show that fine-tuned PRSs will significantly improve statistical power in downstream association analysis.


2019 ◽  
Author(s):  
Zijie Zhao ◽  
Yanyao Yi ◽  
Yuchang Wu ◽  
Xiaoyuan Zhong ◽  
Yupei Lin ◽  
...  

AbstractPolygenic risk scores (PRSs) have wide applications in human genetics research. Notably, most PRS models include tuning parameters which improve predictive performance when properly selected. However, existing model-tuning methods require individual-level genetic data as the training dataset or as a validation dataset independent from both training and testing samples. These data rarely exist in practice, creating a significant gap between PRS methodology and applications. Here, we introduce PUMAS (Parameter-tuning Using Marginal Association Statistics), a novel method to fine-tune PRS models using summary statistics from genome-wide association studies (GWASs). Through extensive simulations, external validations, and analysis of 65 traits, we demonstrate that PUMAS can perform a variety of model-tuning procedures (e.g. cross-validation) using GWAS summary statistics and can effectively benchmark and optimize PRS models under diverse genetic architecture. On average, PUMAS improves the predictive R2 by 205.6% and 62.5% compared to PRSs with arbitrary p-value cutoffs of 0.01 and 1, respectively. Applied to 211 neuroimaging traits and Alzheimer’s disease, we show that fine-tuned PRSs will significantly improve statistical power in downstream association analysis. We believe our method resolves a fundamental problem without a current solution and will greatly benefit genetic prediction applications.


Sign in / Sign up

Export Citation Format

Share Document