Increasing the resolution and precision of psychiatric GWAS by re-imputing summary statistics using a large, diverse reference panel

Mapping Intimacies ◽

10.1101/496570 ◽

2018 ◽

Author(s):

Chris Chatzinakos ◽

Donghyung Lee ◽

Na Cai ◽

Vladimir I. Vladimirov ◽

Bradley T. Webb ◽

...

Keyword(s):

Large Scale ◽

Association Studies ◽

Genome Project ◽

Genotype Imputation ◽

Reference Panel ◽

Genome Wide Association Studies ◽

Summary Statistics ◽

Common Variants ◽

Post Traumatic Stress ◽

Mixed Ancestry

Abstract Genotype imputation across populations of mixed ancestry is critical for optimal discovery in large-scale genome-wide association studies (GWAS). Methods for direct imputation of GWAS summary statistics were previously shown to be practically as accurate as summary statistics produced after raw genotype imputation, while incurring a computational burden that is orders of magnitude lower. Because direct imputation needs a precise estimate of linkage disequilibrium (LD), and most methods use a small reference panel (e.g., ~2,500 subjects from the 1000 Genomes Project), there is a great need for much larger and more diverse reference panels. To accurately estimate the LD needed for an exhaustive analysis of any cosmopolitan cohort, we developed DISTMIX2. DISTMIX2 (i) uses a much larger and more diverse reference panel and (ii) estimates the weights of the ethnic mixture based solely on Z-scores when allele frequencies (AFs) are not available. We applied DISTMIX2 to GWAS summary statistics from the Psychiatric Genomics Consortium (PGC). DISTMIX2 uncovered signals in numerous new regions, with most of these findings coming from rarer variants. Rarer variants localize signals much more sharply than common variants, because LD for rare variants extends over a shorter distance than for common ones. For example, while the original PGC post-traumatic stress disorder (PTSD) study found only 3 marginal signals for common variants, we now uncover a very strong signal for a rare variant in PKN2, a gene associated with neuronal and hippocampal development. Thus, DISTMIX2 provides a robust and fast (re)imputation approach for most psychiatric GWAS.
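The idea of imputing summary statistics directly from LD can be illustrated with a small sketch. It is not the DISTMIX2 implementation. It assumes the common conditional-mean form, in which the Z-score of an untyped variant is predicted from typed Z-scores through the LD (correlation) matrix, and it assumes that LD for a mixed cohort can be approximated by a weighted average of per-population LD matrices. The names `solve`, `mix_ld`, `impute_z` and the `ridge` parameter are hypothetical, and the way DISTMIX2 estimates mixture weights from Z-scores is not shown.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def mix_ld(panels, weights):
    """Weighted average of per-population LD matrices (assumed mixture model)."""
    n = len(panels[0])
    return [
        [sum(w * p[i][j] for p, w in zip(panels, weights)) for j in range(n)]
        for i in range(n)
    ]


def impute_z(ld_tt, ld_ut, z_typed, ridge=0.1):
    """Conditional-mean imputation: z_u = ld_ut (ld_tt + ridge * I)^-1 z_typed.

    ld_tt: LD among typed variants; ld_ut: LD between the untyped variant
    and each typed variant; ridge: illustrative regularisation constant.
    """
    n = len(z_typed)
    reg = [
        [ld_tt[i][j] + (ridge if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]
    coef = solve(reg, z_typed)
    return sum(r * c for r, c in zip(ld_ut, coef))
```

In this sketch a larger, better-matched reference panel matters only through the quality of `ld_tt` and `ld_ut`: noisy or ancestry-mismatched LD estimates translate directly into biased imputed Z-scores.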

Beyond SNP Heritability: Polygenicity and Discoverability of Phenotypes Estimated with a Univariate Gaussian Mixture Model

10.1101/133132 ◽

2017 ◽

Cited By ~ 8

Author(s):

Dominic Holland ◽

Oleksandr Frei ◽

Rahul Desikan ◽

Chun-Chieh Fan ◽

Alexey A. Shadrin ◽

...

Keyword(s):

Association Studies ◽

Causal Snps ◽

Reference Panel ◽

Causal Effects ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Summary Statistics ◽

Common Variants ◽

Genome Wide ◽

Causal Variants

Abstract Estimating the polygenicity (proportion of causally associated single nucleotide polymorphisms (SNPs)) and discoverability (effect size variance) of causal SNPs for human traits is currently of considerable interest. SNP-heritability is proportional to the product of these quantities. We present a basic model, using detailed linkage disequilibrium structure from an extensive reference panel, to estimate these quantities from genome-wide association studies (GWAS) summary statistics. We apply the model to diverse phenotypes and validate the implementation with simulations. We find model polygenicities ranging from ≃ 2 × 10⁻⁵ to ≃ 4 × 10⁻³, with discoverabilities similarly ranging over two orders of magnitude. A power analysis allows us to estimate the proportions of phenotypic variance explained additively by causal SNPs reaching genome-wide significance at current sample sizes, and map out sample sizes required to explain larger portions of additive SNP heritability. The model also allows for estimating residual inflation (or deflation from over-correcting of z-scores), and assessing compatibility of replication and discovery GWAS summary statistics.

Author Summary: There are ~10 million common variants in the genome of humans with European ancestry. For any particular phenotype a number of these variants will have some causal effect. It is of great interest to be able to quantify the number of these causal variants and the strength of their effect on the phenotype. Genome wide association studies (GWAS) produce very noisy summary statistics for the association between subsets of common variants and phenotypes. For any phenotype, these statistics collectively are difficult to interpret, but buried within them is the true landscape of causal effects. In this work, we posit a probability distribution for the causal effects, and assess its validity using simulations. Using a detailed reference panel of ~11 million common variants – among which only a small fraction are likely to be causal, but allowing for non-causal variants to show an association with the phenotype due to correlation with causal variants – we implement an exact procedure for estimating the number of causal variants and their mean strength of association with the phenotype. We find that, across different phenotypes, both these quantities – whose product allows for lower bound estimates of heritability – vary by orders of magnitude.
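The statement that SNP heritability is proportional to the product of polygenicity and discoverability can be made concrete with a short sketch. The proportionality constant used here (number of reference SNPs times mean heterozygosity) is an assumption made for illustration, as are the function name `snp_heritability` and all numeric inputs; none of them are taken from the abstract.

```python
def snp_heritability(polygenicity, discoverability, n_snps, mean_heterozygosity):
    """Illustrative heritability under a point-normal model of causal effects.

    polygenicity: proportion of SNPs that are causal
    discoverability: variance of causal effect sizes
    n_snps, mean_heterozygosity: assumed proportionality factors
    """
    n_causal = polygenicity * n_snps
    return n_causal * mean_heterozygosity * discoverability


# Hypothetical inputs: two traits with the same product of polygenicity and
# discoverability get the same heritability despite different architectures.
few_strong = snp_heritability(1e-4, 1e-3, 11_000_000, 0.2)
many_weak = snp_heritability(1e-3, 1e-4, 11_000_000, 0.2)
```

The example shows why heritability alone cannot distinguish a trait driven by few strong variants from one driven by many weak variants, which is the motivation for estimating the two quantities separately.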

Leveraging TOPMed Imputation Server and Constructing a Cohort-Specific Imputation Reference Panel to Enhance Genotype Imputation among Cystic Fibrosis Patients

10.1101/2021.12.20.473535 ◽

2021 ◽

Author(s):

Quan Sun ◽

Weifang Liu ◽

Jonathan D Rosen ◽

Le Huang ◽

Rhonda G Pace ◽

...

Keyword(s):

Cystic Fibrosis ◽

Sample Size ◽

Association Studies ◽

Genetic Disorder ◽

Genome Project ◽

Genotype Imputation ◽

Reference Panel ◽

Effective Sample Size ◽

Polygenic Risk Score ◽

Genome Wide Association Studies

Cystic fibrosis (CF) is a severe genetic disorder that can cause multiple comorbidities affecting the lungs, the pancreas, the luminal digestive system and beyond. In our previous genome-wide association studies (GWAS), we genotyped ~8,000 CF samples using a mixture of different genotyping platforms. More recently, the Cystic Fibrosis Genome Project (CFGP) performed deep (~30x) whole genome sequencing (WGS) of 5,095 samples to better understand the genetic mechanisms underlying clinical heterogeneity among CF patients. For mixtures of GWAS array and WGS data, genotype imputation has proven effective in increasing effective sample size. Therefore, we first performed imputation for the ~8,000 CF samples with GWAS array genotype using the TOPMed freeze 8 reference panel. Our results demonstrate that TOPMed can provide high-quality imputation for CF patients, boosting genomic coverage from ~0.3 - 4.2 million genotyped markers to ~11 - 43 million well-imputed markers, and significantly improving Polygenic Risk Score (PRS) prediction accuracy. Furthermore, we built a CF-specific CFGP reference panel based on WGS data of CF patients. We demonstrate that despite having ~3% the sample size of TOPMed, our CFGP reference panel can still outperform TOPMed when imputing some CF disease-causing variants, likely due to allele and haplotype differences between CF patients and general populations. We anticipate our imputed data for 4,656 samples without WGS data will benefit our subsequent genetic association studies, and the CFGP reference panel built from CF WGS samples will benefit other investigators studying CF.

Better estimation of SNP heritability from summary statistics provides a new understanding of the genetic architecture of complex traits

10.1101/284976 ◽

2018 ◽

Cited By ~ 6

Author(s):

Doug Speed ◽

David J Balding

Keyword(s):

Complex Traits ◽

Genetic Architecture ◽

Large Scale ◽

Association Studies ◽

Genome Wide Association Studies ◽

Summary Statistics ◽

Confounding Bias ◽

Conserved Regions ◽

Genome Wide ◽

Variation Explained

LD Score Regression (LDSC) has been widely applied to the results of genome-wide association studies. However, its estimates of SNP heritability are derived from an unrealistic model in which each SNP is expected to contribute equal heritability. As a consequence, LDSC tends to over-estimate confounding bias, under-estimate the total phenotypic variation explained by SNPs, and provide misleading estimates of the heritability enrichment of SNP categories. Therefore, we present SumHer, software for estimating SNP heritability from summary statistics using more realistic heritability models. After demonstrating its superiority over LDSC, we apply SumHer to the results of 24 large-scale association studies (average sample size 121 000). First we show that these studies have tended to substantially over-correct for confounding, and as a result the number of genome-wide significant loci has been under-reported by about 20%. Next we estimate enrichment for 24 categories of SNPs defined by functional annotations. A previous study using LDSC reported that conserved regions were 13-fold enriched, and found a further twelve categories with above 2-fold enrichment. By contrast, our analysis using SumHer finds that conserved regions are only 1.6-fold (SD 0.06) enriched, and that no category has enrichment above 1.7-fold. SumHer provides an improved understanding of the genetic architecture of complex traits, which enables more efficient analysis of future genetic data.
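For orientation, the equal-heritability model that this abstract criticises can be sketched as a simple regression. The sketch assumes the usual LDSC expectation, E[chi2_j] = intercept + (N * h2 / M) * l_j, where l_j is the LD score of SNP j, N the sample size and M the number of SNPs, and it fits that line by ordinary least squares. The real LDSC software uses weighted regression with a block jackknife, and SumHer's alternative heritability models are not shown. The function name `fit_ldsc` is hypothetical.

```python
def fit_ldsc(ld_scores, chi2, n_samples, n_snps):
    """Ordinary least-squares fit of chi2 on LD score.

    Returns (h2, intercept). An intercept above 1 is read as confounding
    under this model; the slope is rescaled to a heritability estimate.
    """
    k = len(ld_scores)
    mean_l = sum(ld_scores) / k
    mean_c = sum(chi2) / k
    sxy = sum((l - mean_l) * (c - mean_c) for l, c in zip(ld_scores, chi2))
    sxx = sum((l - mean_l) ** 2 for l in ld_scores)
    slope = sxy / sxx
    intercept = mean_c - slope * mean_l
    h2 = slope * n_snps / n_samples  # slope = N * h2 / M
    return h2, intercept
```

Because every SNP is assumed to contribute the same expected heritability, any signal that the slope cannot absorb ends up in the intercept, which is one way to read the abstract's claim that LDSC over-estimates confounding.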

Abstract 367: Extreme High-Density Lipoprotein Cholesterol Genetics: An Assortment of Large and Small Polygenic Effects

Arteriosclerosis Thrombosis and Vascular Biology ◽

10.1161/atvb.37.suppl_1.367 ◽

2017 ◽

Vol 37 (suppl_1) ◽

Author(s):

Jacqueline S Dron ◽

Jian Wang ◽

Cécile Low-Kam ◽

Sumeet A Khetarpal ◽

John F Robinson ◽

...

Keyword(s):

Large Scale ◽

Genetic Basis ◽

Rare Variants ◽

Association Studies ◽

Density Lipoprotein ◽

Copy Number Variations ◽

Genome Wide Association Studies ◽

Common Variants ◽

Targeted Next Generation Sequencing ◽

Common Genetic Variants

Rationale: Although HDL-C levels are known to have a complex genetic basis, most studies have focused solely on identifying rare variants with large phenotypic effects to explain extreme HDL-C phenotypes. Objective: Here we concurrently evaluate the contribution of both rare and common genetic variants, as well as large-scale copy number variations (CNVs), towards extreme HDL-C concentrations. Methods: In clinically ascertained patients with low (N = 136) and high (N = 119) HDL-C profiles, we applied our targeted next-generation sequencing panel (LipidSeq™) to sequence genes involved in HDL metabolism, which were subsequently screened for rare variants and CNVs. We also developed a novel polygenic trait score (PTS) to assess patients’ genetic accumulations of common variants that have been shown by genome-wide association studies to associate primarily with HDL-C levels. Two additional cohorts of patients with extremely low and high HDL-C (total N = 1,746 and N = 1,139, respectively) were used for PTS validation. Results: In the discovery cohort, 32.4% of low HDL-C patients carried rare variants or CNVs in primary (ABCA1, APOA1, LCAT) and secondary (LPL, LMF1, GPD1, APOE) HDL-C–altering genes. Additionally, 13.4% of high HDL-C patients carried rare variants or CNVs in primary (SCARB1, CETP, LIPC, LIPG) and secondary (APOC3, ANGPTL4) HDL-C–altering genes. For polygenic effects, patients with abnormal HDL-C profiles but without rare variants or CNVs were ~2-fold more likely to have an extreme PTS compared to normolipidemic individuals, indicating an increased frequency of common HDL-C–associated variants in these patients. Similar results in the two validation cohorts demonstrate that this novel PTS successfully quantifies common variant accumulation, further characterizing the polygenic basis for extreme HDL-C phenotypes.
Conclusions: Patients with extreme HDL-C levels have various combinations of rare variants, common variants, or CNVs driving their phenotypes. Fully characterizing the genetic basis of HDL-C levels must extend to encompass multiple types of genetic determinants (not just rare variants) to further our understanding of this complex, controversial quantitative trait.

Common variants at 5q33.1 predispose to migraine in African-American children

Journal of Medical Genetics ◽

10.1136/jmedgenet-2018-105359 ◽

2018 ◽

Vol 55 (12) ◽

pp. 831-836 ◽

Cited By ~ 2

Author(s):

Xiao Chang ◽

Renata Pellegrino ◽

James Garifallou ◽

Michael March ◽

James Snyder ◽

...

Keyword(s):

African American ◽

Large Scale ◽

Association Studies ◽

Independent Study ◽

P Value ◽

African American Children ◽

Genome Wide Association Studies ◽

Common Variants ◽

Primary Analysis ◽

American Children

Background: Genome-wide association studies (GWASs) have identified multiple susceptibility loci for migraine in European adults. However, no large-scale genetic studies have been performed in children or African Americans with migraine. Methods: We conducted a GWAS of 380 African-American children and 2129 ancestry-matched controls to identify variants associated with migraine. We then attempted to replicate our primary analysis in an independent cohort of 233 African-American patients and 4038 non-migraine control subjects. Results: The results of this study indicate that common variants at 5q33.1 are associated with migraine risk in African-American children (rs72793414, p=1.94×10⁻⁹). The association was validated in an independent study (p=3.87×10⁻³) for an overall meta-analysis p value of 3.81×10⁻¹⁰. Expression quantitative trait loci (eQTL) analysis of the Genotype-Tissue Expression data also shows that the genotypes of rs72793414 were strongly correlated with the mRNA expression levels of NMUR2 at 5q33.1. NMUR2 encodes a G protein-coupled receptor of neuromedin-U (NMU). NMU, a highly conserved neuropeptide, participates in diverse physiological processes of the central nervous system. Conclusions: This study provides new insights into the genetic basis of childhood migraine and allows for precision therapeutic development strategies targeting migraine patients of African-American ancestry.

Animal-ImputeDB: a comprehensive database with multiple animal reference panels for genotype imputation

Nucleic Acids Research ◽

10.1093/nar/gkz854 ◽

2019 ◽

Vol 48 (D1) ◽

pp. D659-D667 ◽

Cited By ~ 2

Author(s):

Wenqian Yang ◽

Yanbo Yang ◽

Cecheng Zhao ◽

Kun Yang ◽

Dongyang Wang ◽

...

Keyword(s):

Large Scale ◽

Association Studies ◽

Genotype Imputation ◽

Genome Wide Association Studies ◽

Nucleotide Polymorphisms ◽

High Quality ◽

Single Nucleotide ◽

Genome Wide ◽

Whole Genome Resequencing ◽

Missing Genotypes

Abstract Animal-ImputeDB (http://gong_lab.hzau.edu.cn/Animal_ImputeDB/) is a public database with genomic reference panels of 13 animal species for online genotype imputation, genetic variant search, and free download. Genotype imputation is a process of estimating missing genotypes in terms of the haplotypes and genotypes in a reference panel. It can effectively increase the density of single nucleotide polymorphisms (SNPs) and thus can be widely used in large-scale genome-wide association studies (GWASs) using relatively inexpensive and low-density SNP arrays. However, most animals except humans lack high-quality reference panels, which greatly limits the application of genotype imputation in animals. To overcome this limitation, we developed Animal-ImputeDB, which is dedicated to collecting genotype data and whole-genome resequencing data of nonhuman animals from various studies and databases. A computational pipeline was developed to process different types of raw data to construct reference panels. Finally, 13 high-quality reference panels including ∼400 million SNPs from 2265 samples were constructed. In Animal-ImputeDB, an easy-to-use online tool consisting of two popular imputation tools was designed for the purpose of genotype imputation. Collectively, Animal-ImputeDB serves as an important resource for animal genotype imputation and will greatly facilitate research on animal genomic selection and genetic improvement.

Human demographic history impacts genetic risk prediction across diverse populations

10.1101/070797 ◽

2016 ◽

Cited By ~ 7

Author(s):

Alicia R. Martin ◽

Christopher R. Gignoux ◽

Raymond K. Walters ◽

Genevieve L. Wojcik ◽

Benjamin M. Neale ◽

...

Keyword(s):

Risk Prediction ◽

Large Scale ◽

Disease Risk ◽

Association Studies ◽

Demographic History ◽

Population History ◽

Risk Scores ◽

Genome Wide Association Studies ◽

Summary Statistics ◽

Medical Genomics

Abstract The vast majority of genome-wide association studies are performed in Europeans, and their transferability to other populations is dependent on many factors (e.g. linkage disequilibrium, allele frequencies, genetic architecture). As medical genomics studies become increasingly large and diverse, gaining insights into population history and consequently the transferability of disease risk measurement is critical. Here, we disentangle recent population history in the widely-used 1000 Genomes Project reference panel, with an emphasis on populations underrepresented in medical studies. To examine the transferability of single-ancestry GWAS, we used published summary statistics to calculate polygenic risk scores for six well-studied traits and diseases. We identified directional inconsistencies in all scores; for example, height is predicted to decrease with genetic distance from Europeans, despite robust anthropological evidence that West Africans are as tall as Europeans on average. To gain deeper quantitative insights into GWAS transferability, we developed a complex trait coalescent-based simulation framework considering effects of polygenicity, causal allele frequency divergence, and heritability. As expected, correlations between true and inferred risk were typically highest in the population from which summary statistics were derived. We demonstrated that scores inferred from European GWAS were biased by genetic drift in other populations even when choosing the same causal variants, and that biases in any direction were possible and unpredictable. This work cautions that summarizing findings from large-scale GWAS may have limited portability to other populations using standard approaches, and highlights the need for generalized risk prediction methods and the inclusion of more diverse individuals in medical genomics.
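A short sketch may help show why allele frequency differences alone can shift polygenic risk scores between populations. It assumes the standard additive score (sum of effect size times allele dosage) and uses the fact that the mean dosage of an allele with frequency p is 2p. The function names and all numbers are hypothetical, and this is not the simulation framework described in the abstract.

```python
def polygenic_score(effect_sizes, dosages):
    """Additive score for one individual: sum of beta_j * dosage_j (dosage in 0..2)."""
    if len(effect_sizes) != len(dosages):
        raise ValueError("effect_sizes and dosages must have the same length")
    return sum(b * g for b, g in zip(effect_sizes, dosages))


def expected_score(effect_sizes, allele_freqs):
    """Population mean score, using mean dosage = 2 * allele frequency."""
    if len(effect_sizes) != len(allele_freqs):
        raise ValueError("effect_sizes and allele_freqs must have the same length")
    return sum(2.0 * p * b for b, p in zip(effect_sizes, allele_freqs))


# Hypothetical effect sizes from one GWAS applied to two populations whose
# allele frequencies have drifted apart; the mean scores differ even though
# the effect sizes are identical.
betas = [0.1, -0.2]
mean_pop_a = expected_score(betas, [0.5, 0.1])
mean_pop_b = expected_score(betas, [0.3, 0.4])
```

The difference between `mean_pop_a` and `mean_pop_b` arises without any difference in true trait values, which is the kind of drift-induced shift the abstract warns about.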

Pleiotropy of Alzheimer’s Disease and Educational Attainment: Insights from the Summary Statistics

Innovation in Aging ◽

10.1093/geroni/igab046.3513 ◽

2021 ◽

Vol 5 (Supplement_1) ◽

pp. 986-986

Author(s):

Yury Loika ◽

Elena Loiko ◽

Irina Culminskaya ◽

Alexander Kulminski

Keyword(s):

Alzheimer's Disease ◽

Educational Attainment ◽

Large Scale ◽

Association Studies ◽

Epidemiological Studies ◽

Genome Wide Association Studies ◽

Summary Statistics ◽

Nucleotide Polymorphisms ◽

Omnibus Test

Abstract Epidemiological studies report beneficial associations of higher educational attainment (EDU) with Alzheimer’s disease (AD). Prior genome-wide association studies (GWAS) also reported variants associated with AD and EDU separately. The analysis of pleiotropic predisposition to these phenotypes may shed light on EDU-related protection against AD. We examined pleiotropic predisposition to AD and EDU using Fisher’s method and omnibus test applied to summary statistics for single nucleotide polymorphisms (SNPs) associated with AD and EDU in large-scale univariate GWAS at suggestive-effect (5×10-8
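Fisher's method, mentioned in the abstract above, combines p-values into a single test. A minimal sketch follows, assuming independent p-values; that assumption may not hold for two GWAS with overlapping samples, and the abstract's omnibus test is not shown. The statistic is -2 times the sum of log p-values, referred to a chi-square distribution with 2k degrees of freedom, for which the tail probability has a closed form when the degrees of freedom are even. The function name `fisher_combined_p` is hypothetical.

```python
import math


def fisher_combined_p(p_values):
    """Combine independent p-values with Fisher's method.

    Returns (statistic, combined_p). Uses the closed-form chi-square survival
    function for even degrees of freedom (2k):
        sf(x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    """
    if not p_values or any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(p_values)
    stat = -2.0 * sum(math.log(p) for p in p_values)
    half = stat / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return stat, math.exp(-half) * total
```

For a pleiotropy analysis, the inputs would be the per-SNP p-values from the two univariate GWAS, one call per SNP.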

Deep genotype imputation captures virtually all heritability of autoimmune vitiligo

Human Molecular Genetics ◽

10.1093/hmg/ddaa005 ◽

2020 ◽

Vol 29 (5) ◽

pp. 859-863 ◽

Cited By ~ 3

Author(s):

Genevieve H L Roberts ◽

Stephanie A Santorico ◽

Richard A Spritz

Keyword(s):

Complex Disease ◽

Rare Variants ◽

Association Studies ◽

Genotype Imputation ◽

Genome Wide Association Studies ◽

Common Variants ◽

Genome Wide ◽

Autoimmune Vitiligo ◽

Family Based ◽

Project Data

Abstract Autoimmune vitiligo is a complex disease involving polygenic risk from at least 50 loci previously identified by genome-wide association studies. The objectives of this study were to estimate and compare vitiligo heritability in European-derived patients using both family-based and ‘deep imputation’ genotype-based approaches. We estimated family-based heritability (h²FAM) by vitiligo recurrence among a total of 8034 first-degree relatives (3776 siblings, 4258 parents or offspring) of 2122 unrelated vitiligo probands. We estimated genotype-based heritability (h²SNP) by deep imputation to the Haplotype Reference Consortium and the 1000 Genomes Project data in 2812 unrelated vitiligo cases and 37 079 controls genotyped genome wide, achieving high-quality imputation from markers with minor allele frequency (MAF) as low as 0.0001. Heritability estimated by both approaches was exceedingly high; h²FAM = 0.75–0.83 and h²SNP = 0.78. These estimates are statistically identical, indicating there is essentially no remaining ‘missing heritability’ for vitiligo. Overall, ~70% of h²SNP is represented by common variants (MAF > 0.01) and 30% by rare variants. These results demonstrate that essentially all vitiligo heritable risk is captured by array-based genotyping and deep imputation. These findings suggest that vitiligo may provide a particularly tractable model for investigation of complex disease genetic architecture and predictive aspects of personalized medicine.

Increasing the resolution and precision of psychiatric genome‐wide association studies by re‐imputing summary statistics using a large, diverse reference panel

American Journal of Medical Genetics Part B Neuropsychiatric Genetics ◽

10.1002/ajmg.b.32834 ◽

2021 ◽

Author(s):

Chris Chatzinakos ◽

Donghyung Lee ◽

Na Cai ◽

Vladimir I. Vladimirov ◽

Bradley T. Webb ◽

...

Keyword(s):

Association Studies ◽

Reference Panel ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Summary Statistics ◽

Genome Wide