On PRS for complex polygenic trait prediction

2018 ◽  
Author(s):  
Bingxin Zhao ◽  
Fei Zou

The polygenic risk score (PRS) is the state-of-the-art prediction method for complex traits using summary-level data from discovery genome-wide association studies (GWAS). The PRS, as its name suggests, is designed for polygenic traits: it aggregates small genetic effects from a large number of causal SNPs, and the genetics community therefore views it as a powerful method for predicting complex polygenic traits. However, one concern is that the prediction accuracy of PRS in practice remains low, with little clinical utility, even for highly heritable traits. Another practical concern is whether genome-wide SNPs should be used in constructing PRS. To address these two concerns, we investigate PRS both empirically and theoretically. We show how the performance of PRS is influenced by the triplet (n, p, m), where n, p, and m are the sample size, the number of SNPs studied, and the number of true causal SNPs, respectively. For a given heritability, we find that i) when PRS is constructed with all p SNPs (referred to as GWAS-PRS), its prediction accuracy is controlled by the p/n ratio; while ii) when PRS is built with a set of top-ranked SNPs that pass a pre-specified threshold (referred to as threshold-PRS), its accuracy varies depending on how sparse the true genetic signals are. Only when m is of a smaller order of magnitude than n, that is, when genetic signals are sparse, can threshold-PRS perform well and outperform GWAS-PRS. Our results demystify the low performance of PRS in predicting highly polygenic traits, which will greatly increase researchers’ awareness of the power and limitations of PRS and clear up some confusion about the clinical application of PRS.
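The distinction between GWAS-PRS and threshold-PRS can be illustrated with a small simulation. The sketch below is not the authors' code. It assumes independent, standardized SNPs (no linkage disequilibrium), normally distributed causal effects, and a top-k ranking by absolute marginal effect as a stand-in for a p-value threshold. All function names (`simulate_effects`, `keep_top`, `compare`, and so on) are hypothetical.

```python
import random

def pearson(a, b):
    """Pearson correlation between two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

def simulate_effects(p, m, h2, rng):
    """True effects: m causal SNPs out of p, total genetic variance h2."""
    beta = [0.0] * p
    for j in rng.sample(range(p), m):
        beta[j] = rng.gauss(0.0, (h2 / m) ** 0.5)
    return beta

def simulate_data(n, beta, h2, rng):
    """Assumption: SNPs are independent standard normals (no LD)."""
    X = [[rng.gauss(0.0, 1.0) for _ in beta] for _ in range(n)]
    noise_sd = (1.0 - h2) ** 0.5
    y = [sum(b * x for b, x in zip(beta, row)) + rng.gauss(0.0, noise_sd)
         for row in X]
    return X, y

def marginal_effects(X, y):
    """Per-SNP marginal estimates, the GWAS summary statistics."""
    n = len(X)
    return [sum(row[j] * yi for row, yi in zip(X, y)) / n
            for j in range(len(X[0]))]

def prs(X, weights):
    """Score each individual as the weighted sum of their genotypes."""
    return [sum(w * x for w, x in zip(weights, row)) for row in X]

def keep_top(weights, k):
    """Threshold-PRS weights: keep the k largest |effects|, zero the rest."""
    cutoff = sorted((abs(w) for w in weights), reverse=True)[k - 1]
    return [w if abs(w) >= cutoff else 0.0 for w in weights]

def compare(n, p, m, h2=0.5, k=None, n_test=200, seed=1):
    """Return test-set correlations (GWAS-PRS, threshold-PRS)."""
    rng = random.Random(seed)
    beta = simulate_effects(p, m, h2, rng)
    X_train, y_train = simulate_data(n, beta, h2, rng)
    X_test, y_test = simulate_data(n_test, beta, h2, rng)
    w = marginal_effects(X_train, y_train)
    k = m if k is None else k  # oracle choice of k, for illustration only
    gwas = pearson(prs(X_test, w), y_test)
    thresh = pearson(prs(X_test, keep_top(w, k)), y_test)
    return gwas, thresh
```

For example, `compare(n=200, p=1000, m=10)` returns both accuracies for one simulated sparse setting. Varying m relative to n gives a rough feel for the regimes the abstract describes, though a single simulated replicate is noisy.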

Author(s):  
Cesar A Medina ◽  
Harpreet Kaur ◽  
Ian Ray ◽  
Long-Xi Yu

Agronomic traits such as biomass yield and abiotic stress tolerance are genetically complex and challenging to improve by conventional breeding strategies. Genomic selection (GS) is an alternative approach in which genome-wide markers are used to determine the genomic estimated breeding value (GEBV) of individuals in a population. In alfalfa, previous results indicated that low to moderate prediction accuracy values (<70%) were obtained for complex traits such as yield and abiotic stress resistance. Prediction accuracy needs to increase before GS can be employed in breeding programs. In this paper, we reviewed different statistical models and their applications in polyploid crops, including alfalfa. Specifically, we used empirical data on alfalfa yield under salt stress to investigate approaches that use, in weighted GBLUP analyses, DNA marker importance values derived from machine learning models and marker-trait association scores from genome-wide association studies (GWAS) based on different GWASpoly models. This approach increased prediction accuracies from 50% to more than 80% for alfalfa yield under salt stress. This is the first report in alfalfa to use variable importance and GWAS-assisted approaches to increase the prediction accuracy of GS, thus helping to select superior alfalfa lines based on their GEBVs.


Animals ◽  
2021 ◽  
Vol 11 (2) ◽  
pp. 541
Author(s):  
Long Chen ◽  
Jennie E. Pryce ◽  
Ben J. Hayes ◽  
Hans D. Daetwyler

Structural variations (SVs) are large DNA segments of deletions, duplications, copy number variations, inversions and translocations in a re-sequenced genome compared to a reference genome. They have been found to be associated with several complex traits in dairy cattle and could potentially help to improve genomic prediction accuracy of dairy traits. Imputation of SVs was performed in individuals genotyped with single-nucleotide polymorphism (SNP) panels without the expense of sequencing them. In this study, we generated 24,908 high-quality SVs in a total of 478 whole-genome sequenced Holstein and Jersey cattle. We imputed 4489 SVs with R² > 0.5 into 35,568 Holstein and Jersey dairy cattle with 578,999 SNPs using two pipelines, FImpute and Eagle2.3-Minimac3. Genome-wide association studies for production, fertility and overall type with these 4489 SVs revealed four significant SVs, of which two were highly linked to significant SNPs. We also estimated the variance components for SNP and SV models for these traits using genomic best linear unbiased prediction (GBLUP). Furthermore, we assessed the effect on genomic prediction accuracy of adding SVs to GBLUP models. The estimated percentage of genetic variance captured by SVs for production traits was up to 4.57% for milk yield in bulls and 3.53% for protein yield in cows. Finally, no consistent increase in genomic prediction accuracy was observed when including SVs in GBLUP.


2021 ◽  
Vol 19 (4) ◽  
pp. e36
Author(s):  
Wonil Chung

Predicting individual traits and diseases from genetic variants is critical to fulfilling the promise of personalized medicine. The genetic variants from genome-wide association studies (GWAS), including variants well below GWAS significance, can be aggregated into highly significant predictions across a wide range of complex traits and diseases. The recent arrival of large-sample public biobanks enables highly accurate polygenic predictions based on genetic variants across the whole genome. Various statistical methodologies and diverse computational tools have been introduced and developed to compute the polygenic risk score (PRS) more accurately. However, many researchers use PRS tools without a thorough understanding of the underlying model or of how to specify the parameters for the best performance. It is advantageous to study the statistical models implemented in computational tools for PRS estimation and the formulas of the parameters to be specified. Here, we review a variety of recent statistical methodologies and computational tools for PRS computation.


2019 ◽  
Author(s):  
Ehud Karavani ◽  
Or Zuk ◽  
Danny Zeevi ◽  
Gil Atzmon ◽  
Nir Barzilai ◽  
...  

Genome-wide association studies have led to the development of polygenic score (PS) predictors that explain increasing proportions of the variance in human complex traits. In parallel, progress in preimplantation genetic testing now allows genome-wide genotyping of embryos generated via in vitro fertilization (IVF). Jointly, these developments suggest the possibility of screening embryos for polygenic traits such as height or cognitive function. There are clear ethical, legal, and societal concerns regarding such a procedure, but these cannot be properly discussed in the absence of data on the expected outcomes of screening. Here, we use theory, simulations, and real data to evaluate the potential gain of PS-based embryo selection, defined as the expected difference in trait value between the top-scoring embryo and an average, unselected embryo. We observe that the gain increases very slowly with the number of embryos, but more rapidly with increased variance explained by the PS. Given currently available polygenic predictors and typical IVF yields, the average gain due to selection would be ≈2.5 cm if selecting for height, and ≈2.5 IQ (intelligence quotient) points if selecting for cognitive function. These mean values are accompanied by wide confidence intervals; in real data drawn from nuclear families with up to 20 offspring each, we observe that the offspring with the highest PS for height was the tallest only in 25% of the families. We discuss prospects and limitations of PS-based embryo selection for the foreseeable future.
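The definition of gain used above (top-scoring embryo versus an average embryo) lends itself to a short Monte Carlo sketch. This is an illustration under stated assumptions, not the authors' model: embryo scores within a family are taken to be independent normal deviates around the family mean, with the within-family score variance assumed to be half of the population score variance, and the score is assumed to be calibrated in trait units. The function name `expected_gain` and its parameters are hypothetical.

```python
import random

def expected_gain(n_embryos, r2, trait_sd, trials=20000, seed=0):
    """Monte Carlo estimate of the mean trait gain from picking the
    top-scoring embryo out of n_embryos, relative to an average embryo.

    r2       -- proportion of trait variance explained by the score
    trait_sd -- population standard deviation of the trait
    """
    rng = random.Random(seed)
    # Assumption: within-family score variance is half the population
    # score variance, so the within-family SD is trait_sd * sqrt(r2 / 2).
    within_family_sd = trait_sd * (r2 / 2) ** 0.5
    total = 0.0
    for _ in range(trials):
        # Scores are deviations from the family mean, so the average
        # embryo sits at 0 and the maximum is the gain for this family.
        total += max(rng.gauss(0.0, within_family_sd)
                     for _ in range(n_embryos))
    return total / trials
```

Calling `expected_gain` for a range of `n_embryos` at fixed `r2`, and then for a range of `r2` at fixed `n_embryos`, shows the qualitative pattern described in the abstract: the gain grows slowly with the number of embryos and faster with the variance explained.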


2020 ◽  
Author(s):  
Olivier Naret ◽  
David AA Baranger ◽  
Sharada Prasanna Mohanty ◽  
Bastian Greshake Tzovaras ◽  
Marcel Salathé ◽  
...  

Background: The increasing statistical power of genome-wide association studies is fostering the development of precision medicine through genomic predictions of complex traits. Nevertheless, it has been shown that the results remain relatively modest. A reason might be the nature of the methods typically used to construct genomic predictions. Recent machine learning techniques have properties that could help to capture the architecture of complex traits better and improve genomic prediction accuracy. Methods: We relied on crowd-sourcing to efficiently compare multiple genomic prediction methods. This represents an innovative approach in the genomic field because of the privacy concerns linked to human genetic data. Our study builds on two crowd-sourcing elements. First, we constructed a dataset from openSNP (opensnp.org), an open repository where people voluntarily share their genotyping data and phenotypic information in an effort to participate in open science. To leverage this resource, we release the ‘openSNP Cohort Maker’, a tool that builds a homogeneous and up-to-date cohort based on the data available on opensnp.org. Second, we organized an open online challenge on the CrowdAI platform (crowdai.org) aimed at predicting height from genome-wide genotyping data. Results: The ‘openSNP Height Prediction’ challenge lasted for three months. A total of 138 challengers contributed 1275 submissions. The winner computed a polygenic risk score using the publicly available summary statistics of the GIANT study to achieve the best result (r² = 0.53 versus r² = 0.49 for the second-best). Conclusion: We report here the first crowd-sourced challenge on publicly available genome-wide genotyping data. We also deliver the ‘openSNP Cohort Maker’, which will allow people to make use of the data available on opensnp.org.


Diagnostics ◽  
2021 ◽  
Vol 11 (6) ◽  
pp. 1055
Author(s):  
Kaloyan Stoychev ◽  
Dancho Dilkov ◽  
Elahe Naghavi ◽  
Zornitsa Kamburova

(1) Background: Comorbidity between Alcohol Use Disorders (AUD), mood, and anxiety disorders represents a significant health burden, yet its neurobiological underpinnings are still elusive. The current paper reviews all genome-wide association studies conducted in the past ten years, sampling those on AUD and mood or anxiety disorders. (2) Methods: In keeping with PRISMA guidelines, we searched EMBASE, Medline/PUBMED, and PsycINFO databases (January 2010 to December 2020), including references of enrolled studies. Study selection was based on predefined criteria and data underwent a multistep revision process. (3) Results: 15 studies were included. Some of them explored dual-diagnosis phenotypes directly, while others employed correlational analysis based on a polygenic risk score approach. Their results support a significant overlap of genetic factors involved in AUD and in mood and anxiety disorders. Comorbidity risk seems to be conveyed by genes engaged in neuronal development, connectivity, and signaling, although the precise neuronal pathways and mechanisms remain unclear. (4) Conclusion: Given that genes associated with complex traits, including comorbid clinical presentations, are of small effect and individually responsible for a very low proportion of total variance, larger samples consisting of multiple refined comorbid combinations confirmed by re-sequencing approaches will be necessary to disentangle the genetic architecture of dual diagnosis.


2016 ◽  
Vol 283 (1835) ◽  
pp. 20160569 ◽  
Author(s):  
M. E. Goddard ◽  
K. E. Kemper ◽  
I. M. MacLeod ◽  
A. J. Chamberlain ◽  
B. J. Hayes

Complex or quantitative traits are important in medicine, agriculture and evolution, yet, until recently, few of the polymorphisms that cause variation in these traits were known. Genome-wide association studies (GWAS), based on the ability to assay thousands of single nucleotide polymorphisms (SNPs), have revolutionized our understanding of the genetics of complex traits. We advocate the analysis of GWAS data by a statistical method that fits all SNP effects simultaneously, assuming that these effects are drawn from a prior distribution. We illustrate how this method can be used to predict future phenotypes, to map and identify the causal mutations, and to study the genetic architecture of complex traits. The genetic architecture of complex traits is even more complex than previously thought: in almost every trait studied there are thousands of polymorphisms that explain genetic variation. Methods of predicting future phenotypes, collectively known as genomic selection or genomic prediction, have been widely adopted in livestock and crop breeding, leading to increased rates of genetic improvement.
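Fitting all SNP effects simultaneously under a prior can be sketched in its simplest form. The example below swaps in a single normal prior on every SNP effect, which reduces to ridge regression (often called SNP-BLUP); the abstract does not state which prior the authors use, so this is a stand-in rather than their method. The solver is a plain coordinate-descent loop, and the function name `ridge_snp_effects` and the penalty `lam` (interpretable as the ratio of residual variance to per-SNP effect variance) are hypothetical.

```python
def ridge_snp_effects(X, y, lam, sweeps=50):
    """Jointly estimate all SNP effects by minimizing
    sum((y - X b)^2) + lam * sum(b^2) with coordinate descent.

    X   -- list of rows (individuals) of genotype values
    y   -- list of phenotypes, assumed centered
    lam -- ridge penalty; under a normal prior this corresponds to
           residual variance divided by per-SNP effect variance
    """
    n, p = len(X), len(X[0])
    cols = [[X[i][j] for i in range(n)] for j in range(p)]
    col_ss = [sum(v * v for v in col) for col in cols]
    b = [0.0] * p
    resid = list(y)  # residual y - X b, kept up to date as b changes
    for _ in range(sweeps):
        for j in range(p):
            col = cols[j]
            # Correlation of SNP j with the residual, adding back its
            # own current contribution.
            rho = sum(c * r for c, r in zip(col, resid)) + col_ss[j] * b[j]
            new_bj = rho / (col_ss[j] + lam)
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= col[i] * delta
                b[j] = new_bj
    return b

def predict(X, b):
    """Predicted phenotype: sum of genotype times estimated effect."""
    return [sum(bj * x for bj, x in zip(b, row)) for row in X]
```

Unlike single-SNP GWAS estimates, each effect here is estimated conditional on all the others and shrunk toward zero, which is what makes the fitted effects usable for predicting future phenotypes via `predict`.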


2021 ◽  
Vol 42 (1) ◽  
Author(s):  
Dinesh K. Saini ◽  
Yuvraj Chopra ◽  
Jagmohan Singh ◽  
Karansher S. Sandhu ◽  
Anand Kumar ◽  
...  
