Demographic history mediates the effect of stratification on polygenic scores

eLife ◽  
2020 ◽  
Vol 9 ◽  
Author(s):  
Arslan A Zaidi ◽  
Iain Mathieson

Population stratification continues to bias the results of genome-wide association studies (GWAS). When these results are used to construct polygenic scores, even subtle biases can cumulatively lead to large errors. To study the effect of residual stratification, we simulated GWAS under realistic models of demographic history. We show that when population structure is recent, it cannot be corrected using principal components of common variants because they are uninformative about recent history. Consequently, polygenic scores are biased in that they recapitulate environmental structure. Principal components calculated from rare variants or identity-by-descent segments can correct this stratification for some types of environmental effects. While family-based studies are immune to stratification, the hybrid approach of ascertaining variants in GWAS but reestimating effect sizes in siblings reduces but does not eliminate stratification. We show that the effect of population stratification depends not only on allele frequencies and environmental structure but also on demographic history.
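The standard correction discussed here adds principal components as covariates in each per-variant regression. Below is a minimal pure-Python sketch of that mechanic. It assumes a single PC-like covariate has already been computed, and the toy data and function names are hypothetical. By the Frisch-Waugh-Lovell theorem, the PC-adjusted effect equals the slope obtained after residualizing both genotype and phenotype on the covariate. In this toy case the covariate captures the structure perfectly. The abstract's point is that common-variant PCs fail to do so when structure is recent.

```python
def mean(xs):
    return sum(xs) / len(xs)


def slope(x, y):
    """OLS slope of y on x (model includes an intercept)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def residualize(v, covariate):
    """Residuals of v after OLS regression on one covariate plus intercept."""
    b = slope(covariate, v)
    a = mean(v) - b * mean(covariate)
    return [vi - (a + b * ci) for vi, ci in zip(v, covariate)]


def pc_adjusted_effect(genotype, phenotype, pc):
    """Per-variant effect adjusted for one PC (Frisch-Waugh-Lovell)."""
    return slope(residualize(genotype, pc), residualize(phenotype, pc))


# Hypothetical toy data: two subpopulations (pc = -1 / +1) differ in allele
# frequency; the phenotype is purely environmental (it equals the pc value).
pc = [-1, -1, -1, -1, 1, 1, 1, 1]
genotype = [0, 0, 1, 0, 1, 2, 1, 2]
phenotype = [float(c) for c in pc]

naive = slope(genotype, phenotype)                      # confounded, nonzero
adjusted = pc_adjusted_effect(genotype, phenotype, pc)  # environment removed
```

The unadjusted slope is spuriously positive only because allele frequency and environment covary across the two groups.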

Author(s):  
Arslan A. Zaidi ◽  
Iain Mathieson

Abstract Large genome-wide association studies (GWAS) have identified many loci exhibiting small but statistically significant associations with complex traits and disease risk. However, control of population stratification continues to be a limiting factor, particularly when calculating polygenic scores, where subtle biases can cumulatively lead to large errors. We simulated GWAS under realistic models of demographic history to study the effect of residual stratification in large GWAS. We show that when population structure is recent, it cannot be fully corrected using principal components based on common variants (the standard approach), because common variants are uninformative about recent demographic history. Consequently, polygenic scores calculated from such GWAS results are biased in that they recapitulate non-genetic environmental structure. Principal components calculated from rare variants or identity-by-descent segments largely correct for this structure if environmental effects are smooth. However, even these corrections are not effective for local or batch effects. While sibling-based association tests are immune to stratification, the hybrid approach of ascertaining variants in a standard GWAS and then re-estimating effect sizes in siblings reduces but does not eliminate bias. Finally, we show that rare variant burden tests are relatively robust to stratification. Our results demonstrate that the effect of population stratification on GWAS and polygenic scores depends not only on the frequencies of tested variants and the distribution of environmental effects but also on the demographic history of the population.
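Sibling-based tests resist stratification because siblings share ancestry and, to a first approximation, family environment. Below is a minimal sketch of one common within-family estimator: it regresses sibling differences in phenotype on sibling differences in genotype. The toy families and helper names are hypothetical, and the phenotype is driven only by a family-level environment. The hybrid ascertain-then-re-estimate scheme from the abstract is not implemented here.

```python
def sibling_difference_effect(sib_pairs):
    """Within-family effect estimate from (g1, y1, g2, y2) sibling pairs.

    Regresses the phenotype difference on the genotype difference through the
    origin; anything both siblings share (ancestry, family environment)
    cancels in the differences.
    """
    dg = [g1 - g2 for g1, _, g2, _ in sib_pairs]
    dy = [y1 - y2 for _, y1, _, y2 in sib_pairs]
    return sum(a * b for a, b in zip(dg, dy)) / sum(a * a for a in dg)


def population_effect(sib_pairs):
    """Naive OLS slope pooling all individuals, ignoring family structure."""
    g = [x for g1, _, g2, _ in sib_pairs for x in (g1, g2)]
    y = [x for _, y1, _, y2 in sib_pairs for x in (y1, y2)]
    mg, my = sum(g) / len(g), sum(y) / len(y)
    sxy = sum((a - mg) * (b - my) for a, b in zip(g, y))
    sxx = sum((a - mg) ** 2 for a in g)
    return sxy / sxx


# Hypothetical families: the phenotype equals a family-level environment that
# is correlated with allele frequency; the genotype has no causal effect.
families = [
    # (family environment, sib 1 genotype, sib 2 genotype)
    (0.0, 0, 1),
    (0.0, 1, 0),
    (2.0, 1, 2),
    (2.0, 2, 1),
]
pairs = [(g1, env, g2, env) for env, g1, g2 in families]

within = sibling_difference_effect(pairs)  # 0.0: shared environment cancels
naive = population_effect(pairs)           # 1.0: confounded by environment
```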


Author(s):  
Huaqing Zhao ◽  
Nandita Mitra ◽  
Peter A. Kanetsky ◽  
Katherine L. Nathanson ◽  
Timothy R. Rebbeck

Abstract Genome-wide association studies (GWAS) are susceptible to bias due to population stratification (PS). The most widely used method to correct bias due to PS is principal components (PCs) analysis (PCA), but there is no objective method to guide which PCs to include as covariates. Often, the ten PCs with the highest eigenvalues are included to adjust for PS. This selection is arbitrary, and patterns of local linkage disequilibrium may affect PCA corrections. To address these limitations, we estimate genomic propensity scores based on all statistically significant PCs selected by the Tracy-Widom (TW) statistic. We compare a principal components and propensity scores (PCAPS) approach to PCA and EMMAX using simulated GWAS data under no, moderate, and severe PS. PCAPS reduced spurious genetic associations regardless of the degree of PS, resulting in odds ratio (OR) estimates closer to the true OR. We illustrate our PCAPS method using GWAS data from a study of testicular germ cell tumors. PCAPS provided a more conservative adjustment than PCA. Advantages of the PCAPS approach include reduction of bias compared to PCA, consistent selection of propensity scores to adjust for PS, the potential ability to handle outliers, and ease of implementation using existing software packages.
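The kind of bias PCAPS targets can be illustrated in a simpler setting. The sketch below does not implement PCAPS, propensity scores, or Tracy-Widom selection. Instead, it swaps in the classical Mantel-Haenszel pooled odds ratio over discrete ancestry strata. It shows how a crude OR is inflated when allele frequency and disease risk both differ between subpopulations, and how stratum-wise adjustment recovers the null. All counts are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Table2x2:
    """Allele carriers vs non-carriers among cases and controls in one stratum."""
    a: int  # carrier cases
    b: int  # carrier controls
    c: int  # non-carrier cases
    d: int  # non-carrier controls

    @property
    def n(self) -> int:
        return self.a + self.b + self.c + self.d


def odds_ratio(t: Table2x2) -> float:
    return (t.a * t.d) / (t.b * t.c)


def mantel_haenszel_or(strata: Sequence[Table2x2]) -> float:
    """Mantel-Haenszel pooled OR: sum(a*d/n) / sum(b*c/n) across strata."""
    num = sum(t.a * t.d / t.n for t in strata)
    den = sum(t.b * t.c / t.n for t in strata)
    return num / den


# Hypothetical strata: the allele has no effect within either subpopulation
# (stratum OR = 1), but subpopulation A has both more carriers and more cases.
strata = [Table2x2(40, 20, 20, 10), Table2x2(5, 20, 10, 40)]
crude = Table2x2(*(sum(getattr(t, f) for t in strata) for f in "abcd"))

crude_or = odds_ratio(crude)             # 1.875, spuriously elevated
pooled_or = mantel_haenszel_or(strata)   # 1.0
```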


2020 ◽  
Vol 29 (5) ◽  
pp. 859-863 ◽  
Author(s):  
Genevieve H L Roberts ◽  
Stephanie A Santorico ◽  
Richard A Spritz

Abstract Autoimmune vitiligo is a complex disease involving polygenic risk from at least 50 loci previously identified by genome-wide association studies. The objectives of this study were to estimate and compare vitiligo heritability in European-derived patients using both family-based and 'deep imputation' genotype-based approaches. We estimated family-based heritability (h²FAM) from vitiligo recurrence among a total of 8034 first-degree relatives (3776 siblings, 4258 parents or offspring) of 2122 unrelated vitiligo probands. We estimated genotype-based heritability (h²SNP) by deep imputation to Haplotype Reference Consortium and 1000 Genomes Project data in 2812 unrelated vitiligo cases and 37 079 controls genotyped genome-wide, achieving high-quality imputation of markers with minor allele frequency (MAF) as low as 0.0001. Heritability estimated by both approaches was exceedingly high: h²FAM = 0.75–0.83 and h²SNP = 0.78. These estimates are statistically indistinguishable, indicating that there is essentially no remaining 'missing heritability' for vitiligo. Overall, ~70% of h²SNP is represented by common variants (MAF > 0.01) and 30% by rare variants. These results demonstrate that essentially all heritable vitiligo risk is captured by array-based genotyping and deep imputation. These findings suggest that vitiligo may provide a particularly tractable model for investigating complex disease genetic architecture and predictive aspects of personalized medicine.
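Family-based heritability is classically derived from recurrence in relatives with Falconer's liability-threshold formula, h² = (x_g - x_r) / (r * a_g). Here x_g and x_r are the liability thresholds implied by the population prevalence K and the relatives' recurrence rate K_R. The term a_g = phi(x_g) / K is the mean liability of affected individuals, and r is the coefficient of relationship. The sketch below implements that textbook formula and is not necessarily the estimator the authors used. The example inputs are hypothetical placeholders, not values from the study.

```python
from statistics import NormalDist


def falconer_h2(prevalence: float, recurrence_rate: float, relatedness: float) -> float:
    """Falconer (1965) liability-threshold heritability from recurrence in relatives.

    prevalence:      K, disease prevalence in the general population
    recurrence_rate: K_R, disease rate among relatives of affected probands
    relatedness:     r, coefficient of relationship (0.5 for first-degree relatives)
    """
    std_normal = NormalDist()
    x_g = std_normal.inv_cdf(1 - prevalence)       # threshold in the general population
    x_r = std_normal.inv_cdf(1 - recurrence_rate)  # threshold among relatives
    a_g = std_normal.pdf(x_g) / prevalence         # mean liability of affected individuals
    return (x_g - x_r) / (relatedness * a_g)


# Hypothetical inputs: 1% prevalence, 5% recurrence in first-degree relatives.
h2_example = falconer_h2(0.01, 0.05, 0.5)
```

When recurrence in relatives equals the population prevalence, the thresholds coincide and the estimate is zero. Higher recurrence pushes it upward.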


2018 ◽  
Author(s):  
Suhas Ganesh ◽  
Ahmed P Husayn ◽  
Ravi Kumar Nadella ◽  
Ravi Prabhakar More ◽  
Manasa Sheshadri ◽  
...  

Abstract Introduction: Severe mental illnesses (SMI), such as bipolar disorder and schizophrenia, are highly heritable and have a complex pattern of inheritance. Genome-wide association studies detect part of this heritability, which can be attributed to common genetic variation. Examination of rare variants with next-generation sequencing (NGS) may add to the understanding of the genetic architecture of SMI. Methods: We analyzed 32 affected individuals (bipolar disorder, n=26; schizophrenia, n=4; schizoaffective disorder, n=1; schizophrenia-like psychosis, n=1) from 8 multiplex families, and 33 healthy individuals, by whole-exome sequencing. Variants were prioritized by a 4-step filtering process: deleteriousness according to 5 in silico algorithms, sharing within families, absence in the controls, and rarity in the South Asian sample of the Exome Aggregation Consortium. Results: We identified a total of 42 unique rare, non-synonymous, deleterious variants, an average of 5 variants per family. None of the variants were shared across families, indicating a 'private' mutational profile. Twenty (47.6%) of the variant-harboring genes identified in this sample have previously been reported to contribute to the risk of neuropsychiatric syndromes. These include genes related to neurodevelopmental processes or implicated in monogenic syndromes with a severe neurodevelopmental phenotype. Conclusion: NGS approaches in family-based studies are useful for identifying novel and rare variants in genes for complex disorders such as SMI. The study further validates the phenotypic burden of rare variants in Mendelian disease genes, indicating pleiotropic effects in the etiology of severe mental illnesses.
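The four-step filter in Methods can be written as a sequence of predicates over annotated variants. Below is a minimal sketch. The field names, the number of in silico tools required to call a variant deleterious, and the frequency cutoff are all hypothetical, because the abstract does not state them. "Shared within families" is assumed to mean carried by every affected member of the family.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    variant_id: str
    deleterious_calls: int          # number of the 5 in silico tools calling it deleterious
    carriers_affected: frozenset    # affected family members carrying the variant
    affected_in_family: frozenset   # all affected members of that family
    seen_in_controls: bool
    south_asian_af: float           # allele frequency in the ExAC South Asian sample


def prioritize(variants, min_tools=5, max_af=0.01):
    """Apply the four filters in order; both thresholds are hypothetical defaults."""
    kept = []
    for v in variants:
        if v.deleterious_calls < min_tools:              # 1. predicted deleterious
            continue
        if v.carriers_affected != v.affected_in_family:  # 2. shared by all affected
            continue
        if v.seen_in_controls:                           # 3. absent in controls
            continue
        if v.south_asian_af >= max_af:                   # 4. rare in South Asians
            continue
        kept.append(v)
    return kept


# Hypothetical family and candidate variants.
fam = frozenset({"II-1", "II-2", "III-1"})
variants = [
    Variant("v1", 5, fam, fam, False, 0.0005),                   # passes all filters
    Variant("v2", 3, fam, fam, False, 0.0005),                   # too few deleterious calls
    Variant("v3", 5, frozenset({"II-1"}), fam, False, 0.0005),   # not shared
    Variant("v4", 5, fam, fam, True, 0.0005),                    # present in a control
    Variant("v5", 5, fam, fam, False, 0.02),                     # too common
]
kept_ids = [v.variant_id for v in prioritize(variants)]  # ['v1']
```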


2018 ◽  
Vol 20 (6) ◽  
pp. 2200-2216 ◽  
Author(s):  
Fentaw Abegaz ◽  
Kridsadakorn Chaichoompu ◽  
Emmanuelle Génin ◽  
David W Fardo ◽  
Inke R König ◽  
...  

Abstract Principal components (PCs) are widely used in statistics and refer to a relatively small number of uncorrelated variables derived from an initial pool of variables while explaining as much of the total variance as possible. Principal component analysis (PCA) is also a popular technique in statistical genetics. To achieve optimal results, a thorough understanding is required of the different implementations of PCA and of their impact on study results compared with alternative approaches. In this review, we focus on the possibilities, limitations and role of PCs in ancestry prediction, genome-wide association studies, rare variant analyses, imputation strategies, meta-analysis and epistasis detection. We also describe several variations of classic PCA that deserve increased attention in statistical genetics applications.
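The basic genetic PCA computation can be sketched in pure Python. The steps are: center each variant's genotype column, form the sample-by-sample covariance matrix, and extract its leading eigenvector by power iteration. This is a minimal illustration on hypothetical toy genotypes. Practical tools use optimized linear algebra and usually also scale each variant by its allele frequency, which this sketch omits.

```python
import math


def top_pc(genotypes, iters=200):
    """First principal component scores (one per sample) via power iteration.

    genotypes: list of samples, each a list of 0/1/2 allele counts per variant.
    """
    n, m = len(genotypes), len(genotypes[0])
    means = [sum(row[j] for row in genotypes) / n for j in range(m)]
    x = [[row[j] - means[j] for j in range(m)] for row in genotypes]  # center each variant
    # Sample-by-sample covariance (a simple genetic relationship matrix).
    k = [[sum(a * b for a, b in zip(x[p], x[q])) / m for q in range(n)] for p in range(n)]
    v = [1.0 + p for p in range(n)]  # arbitrary starting vector
    for _ in range(iters):
        w = [sum(k[p][q] * v[q] for q in range(n)) for p in range(n)]
        norm = math.sqrt(sum(t * t for t in w))
        v = [t / norm for t in w]
    return v


# Hypothetical genotypes: samples 0-2 and 3-5 come from two diverged groups.
genotypes = [
    [0, 0, 1, 2],
    [0, 1, 0, 2],
    [1, 0, 0, 2],
    [2, 2, 1, 0],
    [2, 1, 2, 0],
    [1, 2, 2, 0],
]
pc1 = top_pc(genotypes)
```

The sign of a principal component is arbitrary. Only the separation of the two groups along PC1 is meaningful.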


2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Shan Jiang ◽  
Daizhan Zhou ◽  
Yin-Ying Wang ◽  
Peilin Jia ◽  
Chunling Wan ◽  
...  

Abstract Schizophrenia (SCZ) is a severe psychiatric disorder with a strong genetic component. The high heritability of SCZ suggests a major role for transmitted genetic variants. Furthermore, SCZ is associated with a marked reduction in fecundity, leading to the hypothesis that alleles with large effects on risk might often arise de novo. In this study, we conducted whole-genome sequencing of 23 families from two cohorts, including unaffected siblings and parents. Two nonsense de novo mutations (DNMs), in GJC1 and HIST1H2AD, were identified in SCZ patients. A burden test found that ten genes (DPYSL2, NBPF1, SDK1, ZNF595, ZNF718, GCNT2, SNX9, AACS, KCNQ1, and MSI2) carry more DNMs in SCZ patients than in their unaffected siblings. Expression analyses indicated that these DNM-implicated genes show significantly higher expression in the prefrontal cortex at the prenatal stage. The DNM in GJC1 is highly likely to be a loss-of-function mutation (pLI = 0.94), leading to dysregulation of ion channels in glutamatergic excitatory neurons. Analysis of rare variants in an independent exome sequencing dataset indicates that GJC1 carries significantly more rare variants in SCZ patients than in unaffected controls. Data from genome-wide association studies suggest that common variants in GJC1 may be associated with SCZ and SCZ-related traits. Genes co-expressed with GJC1 are involved in SCZ, SCZ-associated pathways, and drug targets. This evidence suggests that GJC1 may be a risk gene for SCZ and that its function may be involved in prenatal and early neurodevelopment, a vulnerable period for developmental disorders such as SCZ.
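A burden test collapses all qualifying variants in a gene into a single per-person score and compares the groups on that score. The abstract does not specify which burden test was used. The sketch below swaps in one common collapsing approach: each person becomes a 0/1 carrier, and carrier counts are compared with a one-sided Fisher exact test computed from the hypergeometric distribution. The counts and helper names are hypothetical.

```python
from math import comb


def carrier_status(variant_counts):
    """Collapse per-person counts of qualifying variants in a gene to 0/1."""
    return [1 if c > 0 else 0 for c in variant_counts]


def fisher_one_sided(case_carriers, n_cases, control_carriers, n_controls):
    """P(X >= case_carriers) under the hypergeometric null of no enrichment."""
    total = n_cases + n_controls
    carriers = case_carriers + control_carriers
    upper = min(carriers, n_cases)
    hits = sum(
        comb(carriers, x) * comb(total - carriers, n_cases - x)
        for x in range(case_carriers, upper + 1)
    )
    return hits / comb(total, n_cases)


# Hypothetical per-person counts of qualifying variants in one gene.
cases = carrier_status([1, 0, 2, 1, 0, 0, 1, 3, 0, 0])  # 5 of 10 carry at least one
controls = carrier_status([0] * 10)                     # 0 of 10
p_value = fisher_one_sided(sum(cases), len(cases), sum(controls), len(controls))
```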


eLife ◽  
2019 ◽  
Vol 8 ◽  
Author(s):  
Mashaal Sohail ◽  
Robert M Maier ◽  
Andrea Ganna ◽  
Alex Bloemendal ◽  
Alicia R Martin ◽  
...  

Genetic predictions of height differ among human populations, and these differences have been interpreted as evidence of polygenic adaptation. These differences were first detected using SNPs associated with height at genome-wide significance and were shown to grow stronger when large numbers of sub-significant SNPs were included, leading to excitement about the prospect of analyzing large fractions of the genome to detect polygenic adaptation for multiple traits. Previous studies of height have been based on SNP effect sizes estimated in the GIANT Consortium meta-analysis. Here we repeat the analyses in the UK Biobank, a much more homogeneously designed study. We show that polygenic adaptation signals based on large numbers of SNPs below genome-wide significance are extremely sensitive to biases due to uncorrected population stratification. More generally, our results imply that typical constructions of polygenic scores are sensitive to population stratification and that population-level differences should be interpreted with caution.
Editorial note: This article has been through an editorial process in which the authors decide how to respond to the issues raised during peer review. The Reviewing Editor's assessment is that all the issues have been addressed (see decision letter).
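The expected polygenic score shows why including many sub-significant SNPs amplifies stratification. That expectation is E[PGS] = sum over SNPs j of 2 * p_j * beta_j. The sketch below uses hypothetical numbers. All true effects are zero, but every estimated effect carries the same tiny bias delta toward the allele that is more common in population A, and allele frequencies differ by the same amount delta_p at every SNP. The gap in mean score then equals 2 * M * delta_p * delta, which grows linearly with the number of SNPs M even though each per-SNP bias is negligible.

```python
def expected_pgs(effects, allele_freqs):
    """Expected polygenic score: sum over SNPs of beta_j * 2 * p_j."""
    return sum(b * 2 * p for b, p in zip(effects, allele_freqs))


def mean_score_gap(n_snps, bias, freq_a, freq_b):
    """Gap in mean PGS between two populations when every true effect is zero
    and each estimated effect carries the same small stratification bias."""
    effects = [bias] * n_snps
    return (expected_pgs(effects, [freq_a] * n_snps)
            - expected_pgs(effects, [freq_b] * n_snps))


# Hypothetical scenario: bias of 0.001 per SNP, allele frequencies 0.55 vs 0.45.
gaps = {m: mean_score_gap(m, bias=0.001, freq_a=0.55, freq_b=0.45)
        for m in (100, 10_000, 100_000)}
```

Real biases vary in sign and size. A systematic gap accumulates only to the extent that the bias correlates with frequency differences between the populations being compared.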


2019 ◽  
Author(s):  
Shan Jiang ◽  
Daizhan Zhou ◽  
Yin-Ying Wang ◽  
Peilin Jia ◽  
Chunling Wan ◽  
...  

Abstract Schizophrenia (SCZ) is a severe psychiatric disorder with a strong genetic component. The high heritability of SCZ suggests a major role for transmitted genetic variants. Furthermore, SCZ is associated with a marked reduction in fecundity, leading to the hypothesis that alleles with large effects on risk might often arise de novo. In this study, we conducted whole-genome sequencing of 23 families from two cohorts, with matched unaffected siblings and parents. Two nonsense de novo mutations (DNMs), in GJC1 and HIST1H2AD, were identified in SCZ patients. A burden test found that ten genes (DPYSL2, NBPF1, SDK1, ZNF595, ZNF718, GCNT2, SNX9, AACS, KCNQ1 and MSI2) carry more DNMs in SCZ patients than in their unaffected siblings. Expression analyses indicated that these DNM-implicated genes show significantly higher expression in the prefrontal cortex at the prenatal stage. The DNM in GJC1 is highly likely to be a loss-of-function mutation (pLI = 0.94), leading to dysregulation of ion channels in glutamatergic excitatory neurons. Analysis of rare variants in an independent exome sequencing dataset indicates that GJC1 carries significantly more rare variants in SCZ patients than in unaffected controls. Data from genome-wide association studies suggest that common variants in GJC1 may be associated with SCZ and SCZ-related traits. Genes co-expressed with GJC1 are involved in SCZ, SCZ-associated pathways and drug targets. This evidence suggests that GJC1 may be a risk gene for SCZ and that its function may be involved in prenatal and early neurodevelopment, a vulnerable period for developmental disorders such as SCZ.


Genome ◽  
2013 ◽  
Vol 56 (10) ◽  
pp. 634-640 ◽  
Author(s):  
Cristiana Cruceanu ◽  
Amirthagowri Ambalavanan ◽  
Dan Spiegelman ◽  
Julie Gauthier ◽  
Ronald G. Lafrenière ◽  
...  

Bipolar disorder (BD) is a psychiatric condition characterized by the occurrence of at least two episodes of clinically disturbed mood, including mania and depression. A vast literature suggests that a strong genetic contribution likely underlies this condition; heritability is estimated to be as high as 80%. Many studies have identified BD susceptibility loci, but because of the genetic and phenotypic heterogeneity observed across individuals, very few loci have subsequently been replicated. Research in BD genetics to date has consisted of classical linkage or genome-wide association studies, which have identified candidate genes hypothesized to harbor common susceptibility variants. Although such common variants are informative, they can explain only a small fraction of the predicted BD heritability, suggesting that a considerable contribution may come from rare, highly penetrant variants. We are seeking to identify such rare variants, and to increase the likelihood of success, we aimed to reduce phenotypic heterogeneity by focusing on a well-defined subphenotype of BD: excellent response to lithium monotherapy. Our group has previously shown that positive response to lithium therapy clusters in families and shows a consistent clinical presentation with minimal comorbidity. To identify such rare variants, we are using targeted exome capture and high-throughput DNA sequencing to analyze the entire coding sequences of BD-affected individuals from multigenerational families. We are prioritizing rare variants with a population frequency below 1% that segregate with affected status within each family and that are either potentially highly penetrant (e.g., protein-truncating, missense, or frameshift) or functionally relevant (e.g., 3′UTR, 5′UTR, or splicing).
By focusing on rare variants in a familial cohort, we hope to explain a significant portion of the missing heritability in BD and to refine our current understanding of the key biochemical pathways implicated in this complex disorder.
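The prioritization criteria described here can be expressed as a single filter. The three criteria are frequency below 1%, segregation with affected status within a family, and a high-impact or functionally relevant consequence. The data layout, the consequence labels, and the strict segregation rule are assumptions made for illustration. Under that rule, every affected member must carry the variant and no unaffected member may. Incomplete penetrance may justify a looser rule in practice.

```python
HIGH_IMPACT = {"protein_truncating", "missense", "frameshift"}  # hypothetical labels
FUNCTIONAL = {"3_prime_UTR", "5_prime_UTR", "splicing"}         # hypothetical labels


def segregates(carriers, affected, unaffected):
    """Strict segregation: every affected member carries it, no unaffected one does."""
    return affected <= carriers and not (unaffected & carriers)


def prioritize_segregating(variants, affected, unaffected, max_freq=0.01):
    """Keep rare, segregating variants with a high-impact or functional consequence."""
    return [
        v["id"]
        for v in variants
        if v["population_freq"] < max_freq
        and v["consequence"] in HIGH_IMPACT | FUNCTIONAL
        and segregates(v["carriers"], affected, unaffected)
    ]


# Hypothetical multigenerational family and candidate variants.
affected = {"I-2", "II-1", "II-3"}
unaffected = {"I-1", "II-2"}
variants = [
    {"id": "var1", "population_freq": 0.002, "consequence": "frameshift",
     "carriers": {"I-2", "II-1", "II-3"}},           # kept
    {"id": "var2", "population_freq": 0.002, "consequence": "missense",
     "carriers": {"I-2", "II-1", "II-2", "II-3"}},   # carried by an unaffected member
    {"id": "var3", "population_freq": 0.05, "consequence": "missense",
     "carriers": {"I-2", "II-1", "II-3"}},           # too common
    {"id": "var4", "population_freq": 0.001, "consequence": "synonymous",
     "carriers": {"I-2", "II-1", "II-3"}},           # consequence not prioritized
]
kept = prioritize_segregating(variants, affected, unaffected)
```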

