Strategies for Developing Prediction Models From Genome-Wide Association Studies

2013 ◽  
Vol 37 (8) ◽  
pp. 768-777 ◽  
Author(s):  
Jincao Wu ◽  
Ruth M. Pfeiffer ◽  
Mitchell H. Gail
2020 ◽  
Vol 3 (1) ◽  
pp. 265-288
Author(s):  
Ning Sun ◽  
Hongyu Zhao

Since the initial success of genome-wide association studies (GWAS) in 2005, tens of thousands of genetic variants have been identified for hundreds of human diseases and traits. In a GWAS, genotype information at up to millions of genetic markers is collected from up to hundreds of thousands of individuals, together with their phenotype information. Several scientific goals can be accomplished through the analysis of GWAS data, including the identification of variants, genes, and pathways associated with diseases and traits of interest; the inference of the genetic architecture of these traits; and the development of genetic risk prediction models. In this review, we provide an overview of the statistical challenges in achieving these goals and recent progress in statistical methodology to address these challenges.


2008 ◽  
Vol 93 (12) ◽  
pp. 4633-4642 ◽  
Author(s):  
Jose C. Florez

Context: Over the last few months, genome-wide association studies have contributed significantly to our understanding of the genetic architecture of type 2 diabetes. If and how this information will impact clinical practice is not yet clear. Evidence Acquisition: Primary papers reporting genome-wide association studies in type 2 diabetes or establishing a reproducible association for specific candidate genes were compiled. Further information was obtained from background articles, authoritative reviews, and relevant meeting conferences and abstracts. Evidence Synthesis: As many as 17 genetic loci have been convincingly associated with type 2 diabetes; 14 of these were not previously known, and most of them were unsuspected. The associated polymorphisms are common in populations of European descent but have modest effects on risk. These loci highlight new areas for biological exploration and allow the initiation of experiments designed to develop prediction models and test possible pharmacogenetic and other applications. Conclusions: Although substantial progress in our knowledge of the genetic basis of type 2 diabetes is taking place, these new discoveries represent but a small proportion of the genetic variation underlying the susceptibility to this disorder. Major work is still required to identify the causal variants, test their role in disease prediction and ascertain their therapeutic implications.


Author(s):  
Greg Dyson ◽  
Charles F. Sing

Abstract We have developed a modified Patient Rule-Induction Method (PRIM) as an alternative strategy for analyzing representative samples of non-experimental human data to estimate and test the role of genomic variations as predictors of disease risk in etiologically heterogeneous sub-samples. A computational limit of the proposed strategy is encountered when the number of genomic variations (predictor variables) under study is large (>500) because permutations are used to generate a null distribution to test the significance of a term (defined by values of particular variables) that characterizes a sub-sample of individuals through the peeling and pasting processes. As an alternative, in this paper we introduce a theoretical strategy that facilitates the quick calculation of Type I and Type II errors when evaluating terms in the peeling and pasting processes of a PRIM analysis; with a permutation-based hypothesis test, these errors are under-estimated and non-existent, respectively. The resultant savings in computational time make possible the consideration of larger numbers of genomic variations (an example genome-wide association study is given) in the selection of statistically significant terms in the formulation of PRIM prediction models.
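
The permutation step that the abstract identifies as the computational bottleneck can be illustrated with a minimal sketch. This is not the authors' PRIM implementation: the function name `permutation_p_value`, the way a term is represented (a boolean membership list), and the choice of the in-term mean outcome as the test statistic are all assumptions made for illustration.

```python
import random


def permutation_p_value(outcomes, in_term, n_perm=2000, seed=0):
    """One-sided permutation p-value for the mean outcome inside a term.

    outcomes: list of 0/1 disease indicators, one per individual.
    in_term:  list of booleans marking individuals selected by the term
              (hypothetical representation of a PRIM "box").
    """
    rng = random.Random(seed)
    k = sum(in_term)
    if k == 0:
        raise ValueError("term selects no individuals")

    # Observed statistic: disease rate among individuals inside the term.
    observed = sum(y for y, m in zip(outcomes, in_term) if m) / k

    shuffled = list(outcomes)
    hits = 0
    for _ in range(n_perm):
        # Membership stays fixed while outcomes are permuted, which
        # simulates the null hypothesis of no association.
        rng.shuffle(shuffled)
        perm_rate = sum(y for y, m in zip(shuffled, in_term) if m) / k
        if perm_rate >= observed:
            hits += 1

    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)


# Toy data (invented): 10 individuals in the term, 8 of them affected,
# against 10 affected among the remaining 40.
toy_outcomes = [1] * 8 + [0] * 2 + [1] * 10 + [0] * 30
toy_in_term = [True] * 10 + [False] * 40
toy_p = permutation_p_value(toy_outcomes, toy_in_term)
```

Each candidate term needs its own loop of `n_perm` reshuffles, so the cost grows with both the number of permutations and the number of terms examined, which is consistent with the limit the abstract reports for more than 500 predictor variables.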


2021 ◽  
Author(s):  
Chenggen Chu ◽  
Shichen Wang ◽  
Jackie C. Rudd ◽  
Amir M.H. Ibrahim ◽  
Qingwu Xue ◽  
...  

Abstract Using imbalanced historical yield data to predict performance and select new lines is an arduous breeding task. Genome-wide association studies (GWAS) and high throughput genotyping based on sequencing techniques can increase prediction accuracy. An association mapping panel of 227 Texas elite (TXE) wheat breeding lines was used for GWAS and a training population to develop prediction models for grain yield selection. An imbalanced set of yield data collected from 102 environments (year-by-location) over ten years, through testing yield in 40–66 lines each year at 6–14 locations with 38–41 lines repeated in the test in any two consecutive years, was used. Based on correlations among data from different environments within two adjacent years and heritability estimated in each environment, yield data from 87 environments were selected and assigned to two correlation-based groups. The yield best linear unbiased estimation (BLUE) from each group, along with reaction to greenbug and Hessian fly in each line, were used for GWAS to reveal genomic regions associated with yield and insect resistance. A total of 74 genomic regions were associated with grain yield and two of them were commonly detected in both correlation-based groups. Greenbug resistance in TXE lines was mainly controlled by Gb3 on chromosome 7DL in addition to two novel regions on 3DL and 6DS, and Hessian fly resistance was conferred by the region on 1AS. Genomic prediction models developed in two correlation-based groups were validated using a set of 105 new advanced breeding lines and the model from correlation-based group G2 was more reliable for prediction. This research not only identified genomic regions associated with yield and insect resistance but also established the method of using historical imbalanced breeding data to develop a genomic prediction model for crop improvement.


2021 ◽  
Vol 5 (1) ◽  
Author(s):  
Feng Guo ◽  
Xuechen Chen ◽  
Jenny Chang-Claude ◽  
Michael Hoffmeister ◽  
Hermann Brenner

Abstract Background Polygenic risk scores (PRS), which are derived from results of large genome-wide association studies, are increasingly propagated for colorectal cancer (CRC) risk stratification. The majority of studies included in the large genome-wide association studies consortia were conducted in the United States and Germany, where colonoscopy with detection and removal of polyps has been widely practiced over the last decades. We aimed to assess if and to what extent the history of colonoscopy with polypectomy may alter metrics of the predictive ability of PRS for CRC risk. Methods A PRS based on 140 single nucleotide polymorphisms was compared between 4939 CRC patients and 3797 control persons of the Darmkrebs: Chancen der Verhütung durch Screening (DACHS) study, a population-based case-control study conducted in Germany. Risk discrimination was quantified according to the history of colonoscopy and polypectomy by areas under the curves (AUCs) and their 95% confidence intervals (CIs). All statistical tests were 2-sided. Results AUCs and 95% CIs were higher among subjects without previous colonoscopy (AUC = 0.622, 95% CI = 0.606 to 0.639) than among those with previous colonoscopy and polypectomy (AUC = 0.568, 95% CI = 0.536 to 0.601; difference [Δ AUC] = 0.054, P = .004). Such differences were consistently seen in sex-specific groups (women: Δ AUC = 0.073, P = .02; men: Δ AUC = 0.046, P = .048) and age-specific groups (younger than 70 years: Δ AUC = 0.052, P = .07; 70 years or older: Δ AUC = 0.049, P = .045). Conclusions Predictive performance of PRS may be underestimated in populations with widespread use of colonoscopy. Future studies using PRS to develop CRC prediction models should carefully consider colonoscopy history to provide more accurate estimates.
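
The stratified comparison of risk discrimination described above can be sketched as follows. This is an illustration under stated assumptions, not the study's analysis code: the function names `auc` and `auc_by_stratum`, the record layout, the stratum labels, and the toy scores are hypothetical, and the AUC is computed with the rank-based (Mann-Whitney) definition rather than any particular statistics package.

```python
from collections import defaultdict


def auc(case_scores, control_scores):
    """AUC as the probability that a random case has a higher score
    than a random control, with ties counted as one half."""
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


def auc_by_stratum(records):
    """Compute one AUC per stratum.

    records: iterable of (stratum, is_case, prs) tuples, where stratum is
    a label such as colonoscopy history (hypothetical layout).
    """
    cases = defaultdict(list)
    controls = defaultdict(list)
    for stratum, is_case, prs in records:
        (cases if is_case else controls)[stratum].append(prs)
    return {s: auc(cases[s], controls[s]) for s in cases if s in controls}


# Invented toy records; real PRS values and sample sizes would differ.
toy_records = [
    ("no_colonoscopy", True, 2.0),
    ("no_colonoscopy", False, 1.0),
    ("polypectomy", True, 1.0),
    ("polypectomy", False, 1.0),
]
toy_aucs = auc_by_stratum(toy_records)
```

The sketch omits the confidence intervals and the test for a difference in AUC between strata that the abstract reports; those would require a variance estimate or a resampling step on top of this calculation.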


2021 ◽  
Vol 13 (1) ◽  
Author(s):  
Hang-Rai Kim ◽  
Sang-Hyuk Jung ◽  
Jaeho Kim ◽  
Hyemin Jang ◽  
Sung Hoon Kang ◽  
...  

Abstract Background Genome-wide association studies (GWAS) have identified a number of genetic variants for Alzheimer’s disease (AD). However, most GWAS were conducted in individuals of European ancestry, and non-European populations are still underrepresented in genetic discovery efforts. Here, we performed GWAS to identify single nucleotide polymorphisms (SNPs) associated with amyloid β (Aβ) positivity using a large sample of the Korean population. Methods One thousand four hundred seventy-four participants of Korean ancestry were recruited from multiple centers in South Korea. The discovery dataset consisted of 1190 participants (383 cognitively unimpaired [CU], 330 with amnestic mild cognitive impairment [aMCI], and 477 with AD dementia [ADD]) and the replication dataset consisted of 284 participants (46 CU, 167 with aMCI, and 71 with ADD). GWAS was conducted to identify SNPs associated with Aβ positivity (measured by amyloid positron emission tomography). Aβ prediction models were developed using the identified SNPs. Furthermore, bioinformatics analysis was conducted for the identified SNPs. Results In addition to APOE, we identified nine SNPs on chromosome 7, which were associated with a decreased risk of Aβ positivity at a genome-wide suggestive level. Of these nine SNPs, four novel SNPs (rs73375428, rs2903923, rs3828947, and rs11983537) were associated with a decreased risk of Aβ positivity (p < 0.05) in the replication dataset. In a meta-analysis, two SNPs (rs7337542 and rs2903923) reached a genome-wide significant level (p < 5.0 × 10⁻⁸). Prediction performance for Aβ positivity increased when rs73375428 was incorporated (area under curve = 0.75; 95% CI = 0.74–0.76) in addition to clinical factors and APOE genotype. Cis-eQTL analysis demonstrated that rs73375428 was associated with decreased expression levels of FGL2 in the brain. Conclusion The novel genetic variants associated with FGL2 were linked to a decreased risk of Aβ positivity in the Korean population. This finding may provide a candidate therapeutic target for AD, highlighting the importance of genetic studies in diverse populations.


2021 ◽  
Author(s):  
Samuel Pattillo Smith ◽  
Sahar Shahamatdar ◽  
Wei Cheng ◽  
Selena Zhang ◽  
Joseph Paik ◽  
...  

Abstract Since 2005, genome-wide association (GWA) datasets have been largely biased toward sampling European ancestry individuals, and recent studies have shown that GWA results estimated from European ancestry individuals apply heterogeneously in non-European ancestry individuals. Here, we argue that enrichment analyses which aggregate SNP-level association statistics at multiple genomic scales (to genes and pathways) have been overlooked and can generate biologically interpretable hypotheses regarding the genetic basis of complex trait architecture. We illustrate examples of the insights generated by enrichment analyses while studying 25 continuous traits assayed in 566,786 individuals from seven self-identified human ancestries in the UK Biobank and the Biobank Japan, as well as 44,348 admixed individuals from the PAGE consortium including cohorts of African-American, Hispanic and Latin American, Native Hawaiian, and American Indian/Alaska Native individuals. By testing for statistical associations at multiple genomic scales, enrichment analyses also illustrate the importance of reconciling contrasting results from association tests, heritability estimation, and prediction models in order to make personalized medicine a reality for all.

