Leveraging Functional Annotations in Genetic Risk Prediction for Human Complex Diseases

2016 ◽  
Author(s):  
Yiming Hu ◽  
Qiongshi Lu ◽  
Ryan Powles ◽  
Xinwei Yao ◽  
Fang Fang ◽  
...  

Abstract: Genome-wide association studies have identified numerous regions in the genome associated with hundreds of human diseases. Building accurate genetic risk prediction models from these data will have a great impact on disease prevention and treatment strategies. However, prediction accuracy remains moderate for most diseases, largely because of the challenges in identifying all the disease-associated variants and accurately estimating their effect sizes. We introduce AnnoPred, a principled framework that incorporates diverse functional annotation data to improve risk prediction accuracy, and demonstrate its performance on multiple human complex diseases.

2017 ◽  
Vol 13 (6) ◽  
pp. e1005589 ◽  
Author(s):  
Yiming Hu ◽  
Qiongshi Lu ◽  
Ryan Powles ◽  
Xinwei Yao ◽  
Can Yang ◽  
...  

2019 ◽  
Vol 29 (1) ◽  
pp. 44-56
Author(s):  
Changshuai Wei ◽  
Ming Li ◽  
Yalu Wen ◽  
Chengyin Ye ◽  
Qing Lu

Genetic association studies using high-throughput genotyping and sequencing technologies have identified a large number of genetic variants associated with complex human diseases. These findings have provided an unprecedented opportunity to identify individuals in the population at high risk for disease who carry causal genetic mutations and hold great promise for early intervention and individualized medicine. While interest is high in building risk prediction models based on recent genetic findings, it is crucial to have appropriate statistical measurements to assess the performance of a genetic risk prediction model. Predictiveness curves were recently proposed as a graphic tool for evaluating a risk prediction model on the basis of a single continuous biomarker. The curve evaluates a risk prediction model for classification performance as well as its usefulness when applied to a population. In this article, we extend the predictiveness curve to measure the collective contribution of multiple genetic variants. We further propose a nonparametric, U-statistics-based measurement, referred to as the U-Index, to quantify the performance of a multi-locus predictiveness curve. In particular, a global U-Index and a partial U-Index can be used in the general population and a subpopulation of particular clinical interest, respectively. Through simulation studies, we demonstrate that the proposed U-Index has advantages over several existing summary statistics under various disease models. We also show that the partial U-Index can have its own uniqueness when rare variants have a substantial contribution to disease risk. Finally, we use the proposed predictiveness curve and its corresponding U-Index to evaluate the performance of a genetic risk prediction model for nicotine dependence.
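As a rough illustration of the predictiveness curve described above, the sketch below pairs each population quantile v with the predicted risk R(v) at that quantile. It assumes the predicted risks have already been produced by some model; the function names and the toy risks are hypothetical, and the U-Index itself is not implemented because the abstract does not give its formula.

```python
def predictiveness_curve(risks):
    """Return (v, R(v)) pairs, where v is the fraction of the population
    whose predicted risk is at or below R(v)."""
    ordered = sorted(risks)
    n = len(ordered)
    return [((i + 1) / n, r) for i, r in enumerate(ordered)]


def fraction_above(risks, threshold):
    """Fraction of the population whose predicted risk exceeds a threshold."""
    return sum(r > threshold for r in risks) / len(risks)


# Hypothetical predicted risks for six individuals (illustration only).
risks = [0.02, 0.05, 0.05, 0.10, 0.40, 0.80]
curve = predictiveness_curve(risks)
high_risk_share = fraction_above(risks, 0.2)
```

A curve that stays flat near the population prevalence indicates a model that separates individuals poorly, while a steep curve indicates that the model assigns clearly different risks to different parts of the population.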


2021 ◽  
Author(s):  
Helgi Hilmarsson ◽  
Arvind S. Kumar ◽  
Richa Rastogi ◽  
Carlos D. Bustamante ◽  
Daniel Mas Montserrat ◽  
...  

Abstract: As genome-wide association studies and genetic risk prediction models are extended to globally diverse and admixed cohorts, ancestry deconvolution has become an increasingly important tool. Also known as local ancestry inference (LAI), this technique identifies the ancestry of each region of an individual’s genome, thus permitting downstream analyses to account for genetic effects that vary between ancestries. Since existing LAI methods were developed before the rise of massive, whole-genome biobanks, they are computationally burdened by these large next-generation datasets. Current LAI algorithms also fail to harness the potential of whole-genome sequences, falling well short of the accuracy that such high variant densities can enable. Here we introduce Gnomix, a set of algorithms that address each of these points, achieving higher accuracy and swifter computational performance than any existing LAI method, while also enabling portable models that are particularly useful when training data are not shareable due to privacy or other restrictions. We demonstrate Gnomix (and its swift phase correction counterpart Gnofix) on worldwide whole-genome data from both humans and canids and utilize its high-resolution accuracy to identify the location of ancient New World haplotypes in the Xoloitzcuintle, dating back over 100 generations. Code is available at https://github.com/AI-sandbox/gnomix.


2020 ◽  
Vol 38 (12) ◽  
pp. 1312-1321
Author(s):  
Noha Sharafeldin ◽  
Joshua Richman ◽  
Alysia Bosworth ◽  
Yanjun Chen ◽  
Purnima Singh ◽  
...  

PURPOSE Using a candidate gene approach, we tested the hypothesis that individual single nucleotide polymorphisms (SNPs) and gene-level variants are associated with cognitive impairment in patients with hematologic malignancies treated with blood or marrow transplantation (BMT) and that inclusion of these SNPs improves risk prediction beyond that offered by clinical and demographic characteristics.
PATIENTS AND METHODS In the discovery cohort, BMT recipients underwent a standardized battery of neuropsychological tests pre-BMT and at 6 months, 1 year, 2 years, and 3 years post-BMT. Associations between 68 candidate genes and cognitive impairment were assessed using generalized estimating equation models. Elastic-Net regression was used to build Base (sociodemographic), Clinical, and Combined (Base plus Clinical plus genetic) risk prediction models of post-BMT impairment. An independent nonoverlapping cohort from the BMT Survivor Study with self-report of learning/memory problems (as identified by their health care provider) was used for model replication.
RESULTS The discovery cohort included 277 participants (58.5% males; 68.6% non-Hispanic whites; and 46.6% allogeneic BMT recipients). Adjusting for BMT type, age at BMT, sex, race/ethnicity, and cognitive reserve, SNPs in the blood-brain barrier, telomere homeostasis, and DNA repair genes were significantly associated with cognitive impairment. Compared with the Clinical Model, the Combined Model had higher predictive power in both the discovery cohort (mean area under the receiver operating characteristic curve [AUC], 0.89; 95% CI, 0.85 to 0.93 v 0.77; 95% CI, 0.71 to 0.83; P = 1.24 × 10^−9) and the replication cohort (AUC, 0.71; 95% CI, 0.66 to 0.76 v 0.63; 95% CI, 0.57 to 0.68; P = .004).
CONCLUSION Inclusion of candidate genetic variants enhanced the prediction of risk of post-BMT cognitive impairment beyond that offered by demographic/clinical characteristics and represents a step toward a personalized approach to managing patients at high risk for cognitive impairment after BMT.
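The AUC values reported above can be read as the probability that a randomly chosen impaired patient receives a higher predicted risk than a randomly chosen unimpaired patient. The sketch below computes that quantity directly from pairwise comparisons. It is a generic illustration only: the function name and the toy scores are hypothetical, and the study's own procedure for obtaining the mean AUC and its confidence intervals is not described in the abstract.

```python
def auc(case_scores, control_scores):
    """Area under the ROC curve as a concordance probability.

    Counts case/control pairs in which the case has the higher score;
    ties contribute one half.
    """
    concordant = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                concordant += 1.0
            elif case == control:
                concordant += 0.5
    return concordant / (len(case_scores) * len(control_scores))


# Hypothetical predicted risks (illustration only).
cases = [0.9, 0.8, 0.4]
controls = [0.5, 0.3, 0.4]
example_auc = auc(cases, controls)
```

An AUC of 0.5 corresponds to a model that ranks cases and controls no better than chance, and 1.0 to a model that ranks every case above every control.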


2012 ◽  
Vol 32 (suppl_1) ◽  
Author(s):  
Themistocles L Assimes ◽  
Benjamin Goldstein ◽  

Genome-wide association studies (GWAS) to date have identified 30 CAD susceptibility loci, but the ability to use this information to improve risk prediction remains limited. A meta-analysis of the GWAS and Cardio Metabochip data produced by the CARDIoGRAM+C4D consortium representing 63,253 cases and 126,820 controls has identified 1885 SNPs passing a False Discovery Rate (FDR) threshold of 0.5%. We hypothesized that an expanded multi-locus genetic risk score (GRS) incorporating genotype information at all loci below an FDR of 0.5% would perform better than a GRS restricted to 42 loci reaching genome-wide significance and tested this hypothesis in subjects of European ancestry participating in the Atherosclerosis Risk in Communities (ARIC) study. Models testing the GRS were either minimally (age and sex) or fully adjusted for traditional risk factors (TRFs). The Figure shows the hazard ratio (HR) and 95% CI for incident events comparing each quintile of GRS to the middle quintile. The GRS including genotype information at all loci with an FDR of 0.5% noticeably improves risk prediction over the GRS restricted to genome-wide significant loci in both the minimally and fully adjusted models based on several metrics including i) the HR per GRS quintile, ii) the HR per SD of the GRS, iii) the logistic regression pseudo R^2, and iv) the c statistic. The HR per GRS quintile and per SD of GRS were all lower in the fully adjusted models compared to the respective minimally adjusted models, but the reduction of the HR was more striking for the models that tested the more expansive GRS. These findings suggest that a larger proportion of novel GWAS CAD loci are mediating their effects through TRFs. While these findings demonstrate some progress in risk prediction using GWAS loci, both the limited and the expanded GRS continue to explain a relatively small proportion of the overall variance compared to TRFs. Thus, the clinical utility of a CAD GRS remains to be determined.
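A multi-locus GRS of the kind discussed above is commonly computed as a weighted sum of risk-allele counts, after which individuals are grouped into quintiles for comparison. The sketch below shows both steps under stated assumptions: the per-SNP weights, genotypes, and function names are hypothetical, and the abstract does not say whether the study's GRS was weighted or how quintile boundaries were set.

```python
def genetic_risk_score(allele_counts, weights):
    """Weighted sum of risk-allele counts (0, 1, or 2 per SNP)."""
    return sum(w * g for g, w in zip(allele_counts, weights))


def quintile_labels(scores):
    """Assign each score a quintile from 1 (lowest) to 5 (highest) by rank."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    labels = [0] * n
    for rank, i in enumerate(order):
        labels[i] = rank * 5 // n + 1
    return labels


# Hypothetical per-SNP weights (e.g. log hazard ratios) and genotypes.
weights = [0.1, 0.2, 0.3]
genotypes = [[0, 1, 2], [2, 2, 2], [0, 0, 0], [1, 0, 1], [1, 1, 1]]
scores = [genetic_risk_score(g, weights) for g in genotypes]
quintiles = quintile_labels(scores)
```

Comparing each quintile with the middle one (label 3), as the abstract describes, would then require a survival model fitted to follow-up data, which is beyond this sketch.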


2020 ◽  
Vol 3 (1) ◽  
pp. 265-288
Author(s):  
Ning Sun ◽  
Hongyu Zhao

Since the initial success of genome-wide association studies (GWAS) in 2005, tens of thousands of genetic variants have been identified for hundreds of human diseases and traits. In a GWAS, genotype information at up to millions of genetic markers is collected from up to hundreds of thousands of individuals, together with their phenotype information. Several scientific goals can be accomplished through the analysis of GWAS data, including the identification of variants, genes, and pathways associated with diseases and traits of interest; the inference of the genetic architecture of these traits; and the development of genetic risk prediction models. In this review, we provide an overview of the statistical challenges in achieving these goals and recent progress in statistical methodology to address these challenges.


2014 ◽  
Vol 7 (2) ◽  
pp. 110-115 ◽  
Author(s):  
Jacqueline N. Milton ◽  
Victor R. Gordeuk ◽  
James G. Taylor ◽  
Mark T. Gladwin ◽  
Martin H. Steinberg ◽  
...  

2019 ◽  
Author(s):  
Arun Durvasula ◽  
Kirk E. Lohmueller

Accurate genetic risk prediction is a key goal for medical genetics, and great progress has been made toward identifying individuals with extreme risk across several traits and diseases (Collins and Varmus, 2015). However, many of these studies are done in predominantly European populations (Bustamante et al., 2011; Popejoy and Fullerton, 2016). Although GWAS effect sizes correlate across ancestries (Wojcik et al., 2019), risk scores show substantial reductions in accuracy when applied to non-European populations (Kim et al., 2018; Martin et al., 2019; Scutari et al., 2016). We use simulations to show that human demographic history and negative selection on complex traits result in population-specific genetic architectures. For traits under moderate negative selection, ~50% of the heritability can be accounted for by variants in Europe that are absent from Africa. We show that this directly leads to poor performance in risk prediction when using variants discovered in Europe to predict risk in African populations, especially in the tails of the risk distribution. To evaluate the impact of this effect in genomic data, we built a Bayesian model to stratify heritability between European-specific and shared variants and applied it to 43 traits and diseases in the UK Biobank. Across these phenotypes, we find ~50% of the heritability comes from European-specific variants, setting an upper bound on the accuracy of genetic risk prediction in non-European populations using effect sizes discovered in European populations. We conclude that genetic association studies need to include more diverse populations to enable the utility of genetic risk prediction in all populations.


2019 ◽  
Author(s):  
R.L. Kember ◽  
A. Verma ◽  
S. Verma ◽  
A. Lucas ◽  
R. Judy ◽  
...  

Abstract: Cardio-renal-metabolic (CaReMe) conditions are common and the leading cause of mortality around the world. Genome-wide association studies have shown that these diseases are polygenic and share many genetic risk factors. Identifying individuals at high genetic risk will allow us to target prevention and treatment strategies. Polygenic risk scores (PRS) are aggregate weighted counts that can demonstrate an individual’s genetic liability for disease. However, current PRS are often based on European ancestry individuals, limiting the implementation of precision medicine efforts in diverse populations. In this study, we develop PRS for six diseases and traits related to cardio-renal-metabolic disease in the Penn Medicine Biobank. We investigate their performance in both European and African ancestry individuals, and identify genetic and phenotypic overlap within these conditions. We find that genetic risk is associated with the primary phenotype in both ancestries, but this does not translate into a model of predictive value in African ancestry individuals. We conclude that future research should prioritize genetic studies in diverse ancestries in order to address this disparity.

