Deep genotype imputation captures virtually all heritability of autoimmune vitiligo

Abstract:

Autoimmune vitiligo is a complex disease involving polygenic risk from at least 50 loci previously identified by genome-wide association studies. The objectives of this study were to estimate and compare vitiligo heritability in European-derived patients using both family-based and ‘deep imputation’ genotype-based approaches. We estimated family-based heritability (h²FAM) from vitiligo recurrence among a total of 8034 first-degree relatives (3776 siblings, 4258 parents or offspring) of 2122 unrelated vitiligo probands. We estimated genotype-based heritability (h²SNP) by deep imputation to Haplotype Reference Consortium and 1000 Genomes Project data in 2812 unrelated vitiligo cases and 37 079 controls genotyped genome-wide, achieving high-quality imputation of markers with minor allele frequency (MAF) as low as 0.0001. Heritability estimated by both approaches was exceedingly high: h²FAM = 0.75–0.83 and h²SNP = 0.78. These estimates are statistically indistinguishable, indicating that essentially no ‘missing heritability’ remains for vitiligo. Overall, ~70% of h²SNP is attributable to common variants (MAF > 0.01) and 30% to rare variants. These results demonstrate that essentially all heritable vitiligo risk is captured by array-based genotyping and deep imputation. These findings suggest that vitiligo may provide a particularly tractable model for investigating complex disease genetic architecture and the predictive aspects of personalized medicine.
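
The family-based estimate (h²FAM) comes from how often first-degree relatives of probands are also affected. The abstract does not spell out the estimator, so the sketch below uses one classical option, Falconer's (1965) liability-threshold method, as an assumption: liability is taken to be standard normal with a single disease threshold, and the relatives' liability distribution is treated as unit-variance. The function name `falconer_h2` and the example prevalences are hypothetical illustrations, not values from the study.

```python
from statistics import NormalDist

# Standard normal liability distribution (mean 0, variance 1).
STD_NORMAL = NormalDist()


def falconer_h2(pop_prevalence: float, relative_prevalence: float,
                relatedness: float = 0.5) -> float:
    """Liability-threshold heritability from recurrence in relatives.

    Falconer (1965): b = (x_g - x_r) / a_g and h2 = b / r, where x_g and x_r
    are the liability thresholds in the general population and in relatives
    of probands, a_g is the mean liability of affected individuals in the
    population, and r is the coefficient of relationship (0.5 for
    first-degree relatives).
    """
    if not 0.0 < pop_prevalence < relative_prevalence < 1.0:
        raise ValueError("expected 0 < population prevalence < relative prevalence < 1")
    x_g = STD_NORMAL.inv_cdf(1.0 - pop_prevalence)       # population threshold
    x_r = STD_NORMAL.inv_cdf(1.0 - relative_prevalence)  # threshold in relatives
    a_g = STD_NORMAL.pdf(x_g) / pop_prevalence           # mean liability of affected
    b = (x_g - x_r) / a_g                                # regression of relative on proband
    return b / relatedness


# Hypothetical inputs for illustration only (not the study's data):
# 0.5% population prevalence, 6% recurrence among first-degree relatives.
h2 = falconer_h2(0.005, 0.06)
print(f"h2 = {h2:.2f}")
```

Because the relatives' threshold moves toward the mean as recurrence rises, higher recurrence relative to population prevalence yields a higher estimate. The SNP-based estimate (h²SNP) needs a genotype relationship model and is not covered by this sketch.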

GIGI2: A Fast Approach for Parallel Genotype Imputation in Large Pedigrees

10.1101/533687 ◽

2019 ◽

Author(s):

Ehsan Ullah ◽

Khalid Kunji ◽

Ellen M. Wijsman ◽

Mohamad Saad

Keyword(s):

Complex Traits ◽

Rare Variants ◽

Association Studies ◽

Large Family ◽

Genotype Imputation ◽

Computational Time ◽

Genome Wide Association Studies ◽

Link Type ◽

Genome Wide ◽

Family Based

Abstract:

Motivation: Imputation of untyped SNPs has become important in genome-wide association studies (GWAS). There has also been a trend towards analyzing rare variants, driven by the decreasing cost of genome sequencing. Rare variants are enriched in pedigrees that have many cases or extreme phenotypes. This is especially the case for large pedigrees, which makes family-based designs ideal for detecting rare variants associated with complex traits. The cost of relatively large family-based GWAS can be reduced substantially by fully sequencing only a fraction of the pedigree and imputing genotypes in the remaining subjects. The program GIGI can efficiently perform imputation in large pedigrees but can be time-consuming. Here, we implement GIGI’s imputation approach in a new program, GIGI2, which reduces computational time by at least 25x on one thread and 120x on eight threads, and reduces memory usage by at least 30x. These reductions are achieved by an improved memory layout and a better algorithm for solving the identity-by-descent (IBD) graphs, together with additional features, including multithreading. We also make GIGI2 available as a webserver based on the same framework as the Michigan Imputation Server.

Availability: GIGI2 is freely available online at https://cse-git.qcri.org/eullah/GIGI2 and the webserver is at https://imputation.qcri.org/
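
GIGI's approach works from identity-by-descent (IBD) information in the pedigree: once IBD configurations are available, an untyped person's genotype probabilities follow from which founder genes they inherited and what is known about those founder genes from sequenced relatives. The Python sketch below shows only that final averaging step, under simplifying assumptions (founder genes treated as independent; unknown founder genes assigned the population allele frequency). The `Config` representation and all names are hypothetical, and this is not GIGI2's implementation.

```python
from typing import Dict, Optional, Sequence, Tuple

# One sampled IBD configuration: the two founder-gene labels that the
# untyped person inherited at this marker (hypothetical representation).
Config = Tuple[str, str]


def genotype_probabilities(
    configs: Sequence[Config],
    founder_alt: Dict[str, Optional[int]],
    alt_freq: float,
) -> Tuple[float, float, float]:
    """Average P(0, 1, 2 ALT copies) over sampled IBD configurations.

    founder_alt maps a founder-gene label to its allele (0 = REF, 1 = ALT)
    when known from sequenced relatives, or None when unknown; unknown
    founder genes fall back to the population ALT frequency.
    """
    if not configs:
        raise ValueError("need at least one IBD configuration")

    def p_alt(label: str) -> float:
        allele = founder_alt.get(label)
        return alt_freq if allele is None else float(allele)

    totals = [0.0, 0.0, 0.0]
    for g1, g2 in configs:
        if g1 == g2:
            # Both copies are IBD to the same founder gene: homozygous.
            q = p_alt(g1)
            probs = (1.0 - q, 0.0, q)
        else:
            a, b = p_alt(g1), p_alt(g2)
            probs = ((1 - a) * (1 - b), a * (1 - b) + (1 - a) * b, a * b)
        for k in range(3):
            totals[k] += probs[k]
    n = len(configs)
    return (totals[0] / n, totals[1] / n, totals[2] / n)


# Two hypothetical IBD samples; founder gene f3 was not sequenced.
configs = [("f1", "f2"), ("f1", "f3")]
founders = {"f1": 1, "f2": 0, "f3": None}
p0, p1, p2 = genotype_probabilities(configs, founders, alt_freq=0.1)
dosage = p1 + 2 * p2  # expected ALT allele count for the untyped person
```

Averaging over many sampled configurations lets uncertainty about inheritance flow into the genotype probabilities, rather than committing to a single most likely configuration.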

Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program

Nature ◽

10.1038/s41586-021-03205-y ◽

2021 ◽

Vol 590 (7845) ◽

pp. 290-299 ◽

Cited By ~ 22

Author(s):

Daniel Taliun ◽

Daniel N. Harris ◽

Michael D. Kessler ◽

Jedidiah Carlson ◽

...

Keyword(s):

Rare Variants ◽

Sequence Data ◽

Association Studies ◽

Genotype Imputation ◽

Genome Wide Association Studies ◽

Phenotypic Data ◽

Treatment And Prevention ◽

Genome Wide ◽

Diverse Backgrounds ◽

Unmapped Reads

Abstract:

The Trans-Omics for Precision Medicine (TOPMed) programme seeks to elucidate the genetic architecture and biology of heart, lung, blood and sleep disorders, with the ultimate goal of improving diagnosis, treatment and prevention of these diseases. The initial phases of the programme focused on whole-genome sequencing of individuals with rich phenotypic data and diverse backgrounds. Here we describe the TOPMed goals and design as well as the available resources and early insights obtained from the sequence data. The resources include a variant browser, a genotype imputation server, and genomic and phenotypic data that are available through dbGaP (Database of Genotypes and Phenotypes)¹. In the first 53,831 TOPMed samples, we detected more than 400 million single-nucleotide and insertion or deletion variants after alignment with the reference genome. Additional previously undescribed variants were detected through assembly of unmapped reads and customized analysis in highly variable loci. Among the more than 400 million detected variants, 97% have frequencies of less than 1% and 46% are singletons that are present in only one individual (53% among unrelated individuals). These rare variants provide insights into mutational processes and recent human evolutionary history. The extensive catalogue of genetic variation in TOPMed studies provides unique opportunities for exploring the contributions of rare and noncoding sequence variants to phenotypic variation. Furthermore, combining TOPMed haplotypes with modern imputation methods improves the power and reach of genome-wide association studies to include variants down to a frequency of approximately 0.01%.
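
The frequency figures above follow from allele counts: 53,831 diploid genomes contain 107,662 chromosomes, so a singleton has a frequency of about 9.3 × 10⁻⁶, and a variant at ~0.01% frequency is carried on roughly 11 chromosomes. The sketch below computes the two summary fractions (minor allele frequency below 1%, and singletons) from per-variant allele counts; the function name and the toy input are hypothetical, and the input is assumed to contain only polymorphic sites.

```python
from typing import Sequence, Tuple


def frequency_summary(allele_counts: Sequence[int], n_samples: int) -> Tuple[float, float]:
    """Return (fraction with MAF < 1%, fraction of singletons).

    allele_counts[v] is the ALT allele count of variant v across
    n_samples diploid individuals (2 * n_samples chromosomes).
    """
    if not allele_counts:
        raise ValueError("no variants given")
    n_alleles = 2 * n_samples
    # Minor allele count: the rarer of ALT and REF at each site.
    macs = [min(ac, n_alleles - ac) for ac in allele_counts]
    below_1pct = sum(1 for m in macs if m / n_alleles < 0.01)
    singletons = sum(1 for m in macs if m == 1)
    total = len(macs)
    return below_1pct / total, singletons / total


# Toy example: 100 individuals (200 chromosomes), five variants.
rare_frac, singleton_frac = frequency_summary([1, 1, 3, 50, 199], n_samples=100)
```

An ALT count of 199 out of 200 still counts as a singleton, because the minor (REF) allele appears once; working with the minor allele count keeps the classification symmetric.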

SMARCA2 common variant association and rare variant excess in Schizophrenia patients from an Algerian Trio Cohort

European Psychiatry ◽

10.1016/s0924-9338(11)73051-6 ◽

2011 ◽

Vol 26 (S2) ◽

pp. 1346-1346

Author(s):

D. Benmessaoud ◽

A.-M. Lepagnol-Bestel ◽

M. Delepine ◽

J. Hager ◽

J.-M. Moalic ◽

...

Keyword(s):

Rare Variants ◽

Association Studies ◽

Common Variant ◽

Genome Wide Association Studies ◽

Common Variants ◽

Fisher Test ◽

Coding Regions ◽

Genome Wide ◽

Whole Exome ◽

Positive Evolution

Genome-wide association studies (GWAS) of schizophrenia (SZ) patients have identified common variants in ten genes, including SMARCA2 (Koga et al., HMG, 2009). We found that the SZ-GWAS genes are part of an interacting network centered on SMARCA2 (Loe-Mie et al., HMG, 2010). Furthermore, SMARCA2 was found to be disrupted in SZ (Walsh et al., Science, 2008). SMARCA2 encodes the ATPase (BRM) of the SWI/SNF chromatin remodeling complex, which sits at the interface between the genome and environmental adaptation.

Taking advantage of an Algerian trio cohort of one hundred SZ patients (Benmessaoud et al., BMC Psychiatry, 2008), we replicated the association of SNP rs2296212, which is located in exon 33, was previously reported as associated in the Koga et al. study, and results in a D1546E amino acid change in the SMARCA2 protein. We studied SMARCA2 codons and found that exon 33 displays a signature of positive evolution in the primate lineage.

Our working hypothesis is that coding regions displaying positive selection are targets of novel rare variants. To address this question, we sequenced two exons displaying positive evolution and one exon without evidence of positive evolution.

We found (i) that rare variants are significantly in excess in SZ patients compared with their parents (p = 0.038, Fisher test) and (ii) a higher proportion of rare variants in the primate-accelerated exons than in the non-evolutionary exon in SZ patients (p = 0.032, Fisher test).

SMARCA2 exon sequencing and whole-exome sequencing of patients harboring the common variant SNP rs2296212 are in progress. Altogether, these results are expected to give new insights into the genetic architecture of SZ.
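
Both comparisons above rely on Fisher's exact test on a 2×2 table. As a reminder of what that test computes, the sketch below evaluates the hypergeometric probability of every table with the observed margins, giving a one-sided p-value for an excess in the first cell and a two-sided p-value (summing over tables no more probable than the observed one). The table in the example is hypothetical, not the study's counts; in practice `scipy.stats.fisher_exact` provides the same test.

```python
from math import comb
from typing import Tuple


def fisher_exact_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Fisher's exact test for the table [[a, b], [c, d]].

    Returns (one-sided p for an excess in cell a, two-sided p), using the
    hypergeometric distribution of cell a given fixed row and column totals.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    denom = comb(n, col1)
    probs = {x: comb(row1, x) * comb(n - row1, col1 - x) / denom
             for x in range(lo, hi + 1)}
    p_obs = probs[a]
    one_sided = sum(p for x, p in probs.items() if x >= a)
    # Two-sided: all tables at most as probable as the observed one
    # (small relative tolerance guards against floating-point ties).
    two_sided = sum(p for p in probs.values() if p <= p_obs * (1 + 1e-7))
    return min(1.0, one_sided), min(1.0, two_sided)


# Hypothetical table: rows = patients / parents,
# columns = carry a rare variant / do not.
one_p, two_p = fisher_exact_2x2(3, 1, 1, 3)
```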

Detecting association of rare and common variants by adaptive combination of P-values

Genetics Research ◽

10.1017/s0016672315000208 ◽

2015 ◽

Vol 97 ◽

Cited By ~ 2

Author(s):

YAJING ZHOU ◽

YONG WANG

Keyword(s):

Rare Variants ◽

Association Studies ◽

Genome Wide Association Studies ◽

Common Variants ◽

Next Generation Sequencing Technology ◽

Adaptive Combination ◽

Genome Wide ◽

Wide Range ◽

Causal Variants ◽

Burden Tests

Summary:

Genome-wide association studies (GWAS) can detect common variants associated with diseases. Next-generation sequencing technology has made it possible to detect rare variants. Most association tests, including burden tests and non-burden tests, mainly target rare variants by upweighting rare-variant effects and downweighting common-variant effects. However, there is increasing evidence that complex diseases are caused by both common and rare variants. In this paper, we extend the ADA method (adaptive combination of P-values; Lin et al., 2014), which was designed for rare variants only, and propose the RC-ADA method (common and rare variants by adaptive combination of P-values). Our proposed method combines the per-site P-values with weights based on minor allele frequencies (MAFs). RC-ADA is robust to the directions of effects of causal variants and to the inclusion of a high proportion of neutral variants. The performance of RC-ADA is compared with that of several other association methods. Extensive simulation studies show that RC-ADA is more powerful than the other association methods over a wide range of models.
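
The general idea of an adaptive combination of per-site P-values can be sketched as follows: for each candidate truncation threshold θ, sum weighted −log(p) over variants with p < θ; take, for each data set, the best (smallest) permutation p-value across thresholds; and calibrate that minimum against the same quantity in phenotype-permuted data. This is a simplified sketch in the spirit of ADA, not the published RC-ADA procedure. The weight function `maf_weight` is a hypothetical choice, the thresholds are assumptions, and the per-site p-values for permuted data sets are assumed to be computed upstream (all must be strictly positive).

```python
import math
from typing import List, Sequence


def maf_weight(maf: float) -> float:
    # Hypothetical MAF-based weight that upweights rarer variants;
    # RC-ADA's actual weighting scheme may differ.
    return 1.0 / math.sqrt(maf * (1.0 - maf))


def truncated_stat(pvals: Sequence[float], weights: Sequence[float], theta: float) -> float:
    """Weighted sum of -log(p) over variants with p below threshold theta."""
    return sum(-w * math.log(p) for p, w in zip(pvals, weights) if p < theta)


def adaptive_pvalue(obs: Sequence[float], perms: Sequence[Sequence[float]],
                    weights: Sequence[float], thetas: Sequence[float]) -> float:
    """Min-p over thresholds, calibrated by permutation.

    obs: per-variant p-values from the real data.
    perms: per-variant p-values from phenotype-permuted data sets.
    """
    reps: List[Sequence[float]] = [obs, *perms]  # replicate 0 is the observed data
    n = len(reps)
    stats = [[truncated_stat(r, weights, t) for t in thetas] for r in reps]
    min_p = []
    for b in range(n):
        # Empirical p-value of replicate b at each threshold, then the smallest.
        per_theta = [sum(stats[c][t] >= stats[b][t] for c in range(n)) / n
                     for t in range(len(thetas))]
        min_p.append(min(per_theta))
    # Final p-value: share of replicates whose best threshold is at least as extreme.
    return sum(m <= min_p[0] for m in min_p) / n


# Toy example: three variants, nine hypothetical permuted data sets.
obs = [1e-6, 1e-5, 0.5]
perms = [[0.1 + 0.08 * i, 0.2 + 0.07 * i, 0.15 + 0.05 * i] for i in range(9)]
p = adaptive_pvalue(obs, perms, weights=[1.0, 1.0, 1.0], thetas=[0.05, 1.0])
```

Recomputing the min-p for every permuted replicate, rather than reusing the observed min-p directly, is what keeps the final p-value honest about having searched over several thresholds.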

Unique roles of rare variants in the genetics of complex diseases in humans

Journal of Human Genetics ◽

10.1038/s10038-020-00845-2 ◽

2020 ◽

Vol 66 (1) ◽

pp. 11-23

Author(s):

Yukihide Momozawa ◽

Keijiro Mizukami

Keyword(s):

Rare Variants ◽

Disease Risk ◽

Association Studies ◽

Complex Diseases ◽

Genome Wide Association Studies ◽

Whole Genome ◽

Sequencing Analysis ◽

Common Variants ◽

Distinctive Features ◽

Genome Wide

Abstract:

Genome-wide association studies have identified >10,000 genetic variants associated with various phenotypes and diseases. Although the majority are common variants, rare variants with a minor allele frequency >0.1% have been investigated by imputation and by using disease-specific custom SNP arrays. Sequencing analyses of rare variants have revealed that they play unique roles in the genetics of complex diseases in humans because of features that distinguish them from common variants. These roles are to provide hypothesis-free evidence for gene causality and to serve as a precise target of functional analysis for understanding disease mechanisms, a new favorable target for drug development, and a genetic marker of high disease risk for personalized medicine. As whole-genome sequencing continues to identify more rare variants, the roles associated with rare variants will also increase. However, better estimation of the functional impact of rare variants across the whole genome is needed to enhance their contribution to improvements in human health.

Advantages of genotype imputation with ethnically matched reference panel for rare variant association analyses

10.1101/579201 ◽

2019 ◽

Cited By ~ 4

Author(s):

Mart Kals ◽

Tiit Nikopensius ◽

Kristi Läll ◽

Kalle Pärn ◽

Timo Tõnis Sikka ◽

...

Keyword(s):

Rare Variant ◽

Rare Variants ◽

Association Studies ◽

Low Frequency ◽

Genotype Imputation ◽

Reference Panel ◽

Genome Wide Association Studies ◽

Genome Wide ◽

Variant Analysis ◽

Coding Variants

Abstract:

Genotype imputation has become a standard procedure prior to genome-wide association studies (GWASs). For common and low-frequency variants, genotype imputation can be performed with sufficient accuracy using publicly available, ethnically heterogeneous reference datasets such as the 1000 Genomes Project (1000G) and Haplotype Reference Consortium panels. However, the imputation of rare variants has been shown to be significantly more accurate when an ethnically matched reference panel is used. Moreover, greater genetic similarity between the reference panel and target samples facilitates the detection of rare (or even population-specific) causal variants. Nevertheless, the genome-wide downstream consequences of, and differences between, using ethnically mixed and ethnically matched reference panels have not yet been comprehensively explored.

We determined and quantified these differences by performing several comparative evaluations of discovery-driven analysis scenarios. A variant-wise GWAS was performed on seven complex diseases and body mass index using genome-wide genotype data of ∼37,000 Estonians imputed with the ethnically mixed 1000G panel and with ethnically matched imputation reference panels. Several previously reported common (minor allele frequency, MAF > 5%) variant associations were replicated in both resulting imputed datasets, and no major differences were observed among the genome-wide significant findings or in the fine-mapping effort. In the analysis of rare (MAF < 1%) coding variants, 46 significantly associated genes were identified in the data imputed with the ethnically matched panel, compared with four genes in the data imputed with the 1000G panel. All resulting genes were subsequently studied in the UK Biobank data.

These associations provide a solid example of how rare variants can be efficiently analysed to discover novel, potentially functional genetic variants in relevant phenotypes. Furthermore, our work serves as proof of a cost-efficient study design, demonstrating that the use of ethnically matched imputation reference panels can substantially improve the imputation of rare variants, facilitating novel high-confidence findings in rare-variant GWAS scans.

Author summary:

Over the last decade, genome-wide association studies (GWASs) have been widely used for detecting genetic biomarkers in a wide range of traits. Typically, GWASs are carried out using chip-based genotyping data, which are then combined with a more densely genotyped reference panel to infer untyped genetic variants in chip-typed individuals. This method is called genotype imputation, and its accuracy depends on multiple factors. Publicly available, ethnically heterogeneous imputation reference panels (IRPs) such as the 1000 Genomes Project (1000G) are sufficiently accurate for imputing common and low-frequency variants, but custom ethnically matched IRPs outperform them for rare variants. In this work, we systematically compare the downstream effects on association analyses of eight complex traits in ∼37,000 Estonians imputed with ethnically mixed and ethnically matched IRPs. We do not observe major differences in the single-variant analysis, where both imputed datasets replicate previously reported significant loci. However, in the gene-based analysis of rare protein-coding variants, we show that the ethnically matched panel clearly outperforms 1000G panel-based imputation, providing a 10-fold increase in significant gene–trait associations. Our study demonstrates empirically that data imputed with an ethnically matched panel are very promising for rare variant analysis: they capture more population-specific variants and make it possible to identify novel findings efficiently.
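
The abstract does not name the gene-based test used for rare coding variants. One common building block for such analyses is a collapsing (burden) score: for each gene, sum each individual's imputed dosages over rare variants, then test the per-gene score against the phenotype. The Python sketch below builds only the score, assuming variants have already been restricted to protein-coding consequences upstream; the function name and the toy data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def gene_burden(dosages: Sequence[Sequence[float]], genes: Sequence[str],
                mafs: Sequence[float], max_maf: float = 0.01) -> Dict[str, List[float]]:
    """Per-individual burden score: summed imputed dosages of rare variants per gene.

    dosages[v][i] is the imputed ALT dosage (0 to 2) of variant v in individual i;
    genes[v] and mafs[v] give the gene and minor allele frequency of variant v.
    """
    n_ind = len(dosages[0]) if dosages else 0
    burden: Dict[str, List[float]] = defaultdict(lambda: [0.0] * n_ind)
    for row, gene, maf in zip(dosages, genes, mafs):
        if maf < max_maf:  # keep only rare variants (MAF < 1%)
            scores = burden[gene]
            for i, d in enumerate(row):
                scores[i] += d
    return dict(burden)


# Toy data: three variants in two genes, three individuals.
scores = gene_burden(
    dosages=[[0.0, 1.0, 0.0], [0.2, 0.0, 0.9], [1.0, 1.0, 2.0]],
    genes=["GENE_A", "GENE_A", "GENE_B"],
    mafs=[0.005, 0.002, 0.3],
)
```

Using imputed dosages rather than hard genotype calls lets the score carry imputation uncertainty, which is exactly where a better-matched reference panel would be expected to help.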

Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program

10.1101/563866 ◽

2019 ◽

Cited By ~ 49

Author(s):

Daniel Taliun ◽

Daniel N. Harris ◽

Michael D. Kessler ◽

Jedidiah Carlson ◽

Zachary A. Szpiech ◽

...

Keyword(s):

Rare Variants ◽

Sequence Data ◽

Association Studies ◽

Genotype Imputation ◽

Genome Wide Association Studies ◽

Phenotypic Data ◽

Treatment And Prevention ◽

Disease Biology ◽

Genome Wide ◽

Novel Variants

Summary:

The Trans-Omics for Precision Medicine (TOPMed) program seeks to elucidate the genetic architecture and disease biology of heart, lung, blood, and sleep disorders, with the ultimate goal of improving diagnosis, treatment, and prevention. The initial phases of the program focus on whole genome sequencing of individuals with rich phenotypic data and diverse backgrounds. Here, we describe TOPMed goals and design as well as resources and early insights from the sequence data. The resources include a variant browser, a genotype imputation panel, and sharing of genomic and phenotypic data via dbGaP. In 53,581 TOPMed samples, >400 million single-nucleotide and insertion/deletion variants were detected by alignment with the reference genome. Additional novel variants are detectable through assembly of unmapped reads and customized analysis in highly variable loci. Among the >400 million variants detected, 97% have frequency <1% and 46% are singletons. These rare variants provide insights into mutational processes and recent human evolutionary history. The nearly complete catalog of genetic variation in TOPMed studies provides unique opportunities for exploring the contributions of rare and non-coding sequence variants to phenotypic variation. Furthermore, combining TOPMed haplotypes with modern imputation methods improves the power and extends the reach of nearly all genome-wide association studies to include variants down to ~0.01% in frequency.

Family-based exome-sequencing approach identifies rare susceptibility variants for lithium-responsive bipolar disorder

Genome ◽

10.1139/gen-2013-0081 ◽

2013 ◽

Vol 56 (10) ◽

pp. 634-640 ◽

Cited By ~ 32

Author(s):

Cristiana Cruceanu ◽

Amirthagowri Ambalavanan ◽

Dan Spiegelman ◽

Julie Gauthier ◽

Ronald G. Lafrenière ◽

...

Keyword(s):

Bipolar Disorder ◽

Rare Variants ◽

Association Studies ◽

Phenotypic Heterogeneity ◽

Exome Capture ◽

Genome Wide Association Studies ◽

Complex Disorder ◽

Genome Wide ◽

High Throughput Dna Sequencing ◽

Family Based

Bipolar disorder (BD) is a psychiatric condition characterized by the occurrence of at least two episodes of clinically disturbed mood, including mania and depression. A vast literature on BD suggests that a strong genetic contribution likely underlies this condition; heritability is estimated to be as high as 80%. Many studies have identified BD susceptibility loci, but because of the genetic and phenotypic heterogeneity observed across individuals, very few loci have subsequently been replicated. Research in BD genetics to date has consisted of classical linkage or genome-wide association studies, which have identified candidate genes hypothesized to harbor common susceptibility variants. Although the observation of such common variants is informative, they can explain only a small fraction of the predicted BD heritability, suggesting that a considerable contribution may come from rare and highly penetrant variants. We are seeking to identify such rare variants, and to increase the likelihood of success, we aimed to reduce phenotypic heterogeneity by focusing on a well-defined subphenotype of BD: excellent response to lithium monotherapy. Our group has previously shown that positive response to lithium therapy clusters in families and is associated with a consistent clinical presentation with minimal comorbidity. To identify such rare variants, we are using a targeted exome capture and high-throughput DNA sequencing approach, analyzing the entire coding sequences of BD-affected individuals from multigenerational families. We are prioritizing rare variants with a population frequency of less than 1% that segregate with affected status within each family and that are either potentially highly penetrant (e.g., protein-truncating, missense, or frameshift) or functionally relevant (e.g., 3′UTR, 5′UTR, or splicing). By focusing on rare variants in a familial cohort, we hope to explain a significant portion of the missing heritability in BD and to sharpen our current insight into the key biochemical pathways implicated in this complex disorder.
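
The prioritization criteria above (population frequency below 1%, segregation with affected status, and a penetrant or functional consequence class) can be expressed as a simple per-family filter. The Python sketch below is a hypothetical illustration: the `Variant` record, the consequence labels, and the strict segregation rule (every affected member carries the variant and no unaffected member does, which assumes full penetrance) are assumptions, not the authors' pipeline.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set

# Consequence classes named in the text (label spellings are hypothetical).
PENETRANT = {"protein_truncating", "missense", "frameshift"}
FUNCTIONAL = {"3_prime_UTR", "5_prime_UTR", "splicing"}


@dataclass
class Variant:
    vid: str
    pop_freq: float    # population allele frequency
    consequence: str   # predicted consequence class
    carriers: Set[str] # family members carrying the variant


def prioritize(variants: Iterable[Variant], affected: Set[str],
               unaffected: Set[str], max_freq: float = 0.01) -> List[Variant]:
    """Keep rare, segregating variants with a penetrant or functional consequence."""
    keep = []
    for v in variants:
        if v.pop_freq >= max_freq:
            continue  # not rare enough
        # Strict segregation: all affected members carry it, no unaffected member does.
        if not affected <= v.carriers or v.carriers & unaffected:
            continue
        if v.consequence in PENETRANT or v.consequence in FUNCTIONAL:
            keep.append(v)
    return keep


# Hypothetical family: two affected members, one unaffected.
family_affected, family_unaffected = {"p1", "p2"}, {"u1"}
candidates = [
    Variant("v1", 0.001, "missense", {"p1", "p2"}),
    Variant("v2", 0.05, "missense", {"p1", "p2"}),          # too common
    Variant("v3", 0.001, "frameshift", {"p1"}),             # does not segregate
    Variant("v4", 0.001, "missense", {"p1", "p2", "u1"}),   # unaffected carrier
    Variant("v5", 0.001, "synonymous", {"p1", "p2"}),       # consequence not prioritized
]
hits = prioritize(candidates, family_affected, family_unaffected)
```

With reduced penetrance, unaffected carriers are expected, so a real analysis might relax the segregation rule rather than discard such variants outright.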

Genetic Determinants of Paget’s Disease of Bone

Current Osteoporosis Reports ◽

10.1007/s11914-021-00676-w ◽

2021 ◽

Author(s):

Navnit S. Makaram ◽

Stuart H. Ralston

Keyword(s):

Genetic Factors ◽

Association Studies ◽

Paget’S Disease ◽

Paget's Disease ◽

Paget’S Disease Of Bone ◽

Genome Wide Association Studies ◽

Paget's Disease Of Bone ◽

Genome Wide ◽

Family Based

Abstract:

Purpose of Review: To provide an overview of the role of genes and loci that predispose to Paget’s disease of bone and related disorders.

Recent Findings: The past ten years have seen major advances in knowledge of the role of genetic factors in Paget’s disease of bone (PDB). Genome-wide association studies have identified six loci that predispose to the disease, whereas family-based studies have identified a further eight genes that cause PDB. This brings the total number of genes and loci implicated in PDB to fourteen. Emerging evidence has shown that a number of these genes also predispose to multisystem proteinopathy syndromes, in which PDB is accompanied by neurodegeneration and myopathy due to the accumulation of abnormal protein aggregates, emphasising the importance of defects in autophagy in the pathogenesis of PDB.

Summary: Genetic factors play a key role in the pathogenesis of PDB, and studies in this area have identified several genes not previously suspected to play a role in bone metabolism. Genetic testing coupled with targeted therapeutic intervention is being explored as a way of halting disease progression and improving outcome before irreversible skeletal damage has occurred.

Impact of Pre and Post Variant Filtration Strategies on Imputation

10.21203/rs.3.rs-128366/v1 ◽

2020 ◽

Author(s):

Celine Charon ◽

Rodrigue Allodji ◽

Vincent Meyer ◽

Jean-François Deleuze

Keyword(s):

Quality Control ◽

Rare Variants ◽

Association Studies ◽

Genome Wide Association Studies ◽

Nucleotide Polymorphisms ◽

Direct Effects ◽

Single Nucleotide Variants ◽

Single Nucleotide ◽

Genome Wide ◽

Conservative Post

Abstract:

Quality control (QC) methods for genome-wide association studies and fine mapping are commonly applied to data before imputation; however, they result in the loss of many single nucleotide polymorphisms (SNPs). To investigate the consequences of filtration on imputation, we studied its direct effects on the number of markers, their allele frequencies, imputation quality scores and post-filtration events. We pre-phased 1,031 genotyped individuals from diverse ethnicities and compared the imputed variants with data from 1,089 individuals recorded at NCBI for additional validation.

Without QC-based variant pre-filtration, we observed no impairment in the imputation of SNPs that failed QC, whereas with pre-filtration there was an overall loss of information. Significant differences between frequencies with and without pre-filtration were found only for very rare (MAF 5E-04 to 1E-03) and rare (MAF 1E-03 to 5E-03) variants (p < 1E-04). Raising the post-filtration imputation quality score threshold from 0.3 to 0.8 reduced the number of single nucleotide variants (SNVs) with MAF < 0.001 2.5-fold, with or without QC pre-filtration, and halved the number of very rare variants (5E-04). Therefore, to maintain confidence while retaining enough SNVs, we propose a 2-step post-filtration approach that increases the number of very rare and rare variants compared with conservative post-filtration methods.
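
The effect of the post-filtration threshold can be illustrated by counting imputed variants per MAF bin after discarding those below an imputation quality cutoff, and comparing the 0.3 and 0.8 cutoffs mentioned above. The abstract does not detail the proposed 2-step procedure, so this Python sketch covers only the single-threshold comparison; the bin labels, the `(maf, r2)` input format, and the toy variants are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple


def maf_bin(maf: float) -> str:
    """Assign a MAF to the bins used in the text (half-open intervals)."""
    if maf < 5e-4:
        return "<5E-04"
    if maf < 1e-3:
        return "very rare"  # 5E-04 to 1E-03
    if maf < 5e-3:
        return "rare"       # 1E-03 to 5E-03
    return ">=5E-03"


def count_after_filter(variants: Iterable[Tuple[float, float]], min_quality: float) -> Counter:
    """Count imputed variants per MAF bin, keeping those with quality >= min_quality.

    variants: (maf, imputation_quality) pairs for each imputed variant.
    """
    return Counter(maf_bin(maf) for maf, quality in variants if quality >= min_quality)


# Toy imputed variants as (MAF, imputation quality score) pairs.
toy = [(2e-4, 0.5), (7e-4, 0.9), (7e-4, 0.4), (2e-3, 0.85), (0.1, 0.99)]
lenient = count_after_filter(toy, min_quality=0.3)
strict = count_after_filter(toy, min_quality=0.8)
```

Comparing the two counters bin by bin shows where a stricter cutoff removes variants; in this toy input the losses fall entirely in the lowest-frequency bins, which mirrors the pattern the abstract reports.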