Unique roles of rare variants in the genetics of complex diseases in humans

Abstract: Genome-wide association studies have identified >10,000 genetic variants associated with various phenotypes and diseases. Although the majority are common variants, rare variants with a minor allele frequency >0.1% have been investigated by imputation and with disease-specific custom SNP arrays. Rare variants, revealed mainly by sequencing analysis, have played unique roles in the genetics of complex diseases in humans because their features are distinct from those of common variants. These unique roles include providing hypothesis-free evidence for gene causality, a precise target of functional analysis for understanding disease mechanisms, a new favorable target for drug development, and a genetic marker of high disease risk for personalized medicine. As whole-genome sequencing continues to identify more rare variants, the roles associated with rare variants will also increase. However, a better estimation of the functional impact of rare variants across the whole genome is needed to enhance their contribution to improvements in human health.


SMARCA2 common variant association and rare variant excess in Schizophrenia patients from an Algerian Trio Cohort

European Psychiatry ◽

10.1016/s0924-9338(11)73051-6 ◽

2011 ◽

Vol 26 (S2) ◽

pp. 1346-1346

Author(s):

D. Benmessaoud ◽

A.-M. Lepagnol-Bestel ◽

M. Delepine ◽

J. Hager ◽

J.-M. Moalic ◽

...

Keyword(s):

Rare Variants ◽

Association Studies ◽

Common Variant ◽

Genome Wide Association Studies ◽

Common Variants ◽

Fisher Test ◽

Coding Regions ◽

Genome Wide ◽

Whole Exome ◽

Positive Evolution

Genome-wide association studies (GWAS) of schizophrenia (SZ) patients have identified common variants in ten genes, including SMARCA2 (Koga et al., HMG, 2009). We found that the SZ-GWAS genes are part of an interacting network centered on SMARCA2 (Loe-Mie et al., HMG, 2010). Furthermore, SMARCA2 was found to be disrupted in SZ (Walsh et al., Science, 2008). SMARCA2 encodes the ATPase (BRM) of the SWI/SNF chromatin remodeling complex, which is at the interface of genome and environmental adaptation. Taking advantage of an Algerian trio cohort of one hundred SZ patients (Benmessaoud et al., BMC Psychiatry, 2008), we replicated the association of SNP rs2296212, which is located in exon 33, was already shown to be associated in the Koga study, and results in a D1546E amino acid change in the SMARCA2 protein. We studied SMARCA2 codons and found that exon 33 displays a signature of positive evolution in the primate lineage. Our working hypothesis is that coding regions displaying positive selection are targets of novel rare variants. To address this question, we sequenced two exons displaying positive evolution and one exon without evidence of positive evolution. We found (i) that rare variants are significantly in excess in SZ patients compared to their parents (p = 0.038, Fisher test) and (ii) a higher proportion of rare variants in the primate-accelerated exons compared with the non-evolutionary exon in SZ patients (p = 0.032, Fisher test). SMARCA2 exon sequencing and whole exome sequencing of patients harboring the SNP rs2296212 common variant are in progress. Altogether, these results are expected to give new insights into the genetic architecture of SZ.
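
The Fisher test used for the patient-versus-parent comparison can be sketched as follows. This is a minimal illustration, not the authors' analysis: the abstract does not report the underlying counts, so the 2x2 table below is hypothetical, and the function name `fisher_exact_two_sided` is introduced here for illustration only.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of every table no more likely than the observed one
    # (small relative tolerance guards against floating-point ties).
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))

# HYPOTHETICAL counts, not taken from the study:
# rows = patients, parents; columns = rare-variant carriers, non-carriers.
p_value = fisher_exact_two_sided(9, 91, 6, 194)
print(f"two-sided Fisher p-value: {p_value:.4f}")
```

An exact test is the usual choice when, as with rare variants, some cells of the table hold only a handful of observations.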


Sequencing of over 100,000 individuals identifies multiple genes and rare variants associated with Crohn's disease susceptibility

10.1101/2021.06.15.21258641 ◽

2021 ◽

Author(s):

Aleksejs Sazonovs ◽

Christine R Stevens ◽

Guhan R Venkataraman ◽

Kai Yuan ◽

Brandon Avila ◽

...

Keyword(s):

Rare Variants ◽

Disease Risk ◽

Sequence Data ◽

Association Studies ◽

Genome Wide Association Studies ◽

Crohn's Disease ◽

Biological Targets ◽

Genome Wide ◽

Coding Variants ◽

First Time

Genome-wide association studies (GWAS) have identified hundreds of loci associated with Crohn's disease (CD); however, as with all complex diseases, deriving pathogenic mechanisms from these non-coding GWAS discoveries has been challenging. To complement GWAS and better define actionable biological targets, we analysed sequence data from more than 30,000 CD cases and 80,000 population controls. We observe rare coding variants in established CD susceptibility genes as well as ten genes where coding variation directly implicates the gene in disease risk for the first time.


Disease association with frequented regions of genotype graphs

10.1101/2020.09.25.20201640 ◽

2020 ◽

Author(s):

Samuel Hokin ◽

Alan Cleary ◽

Joann Mudge

Keyword(s):

Rare Variants ◽

Disease Risk ◽

Association Studies ◽

Disease Status ◽

Disease Association ◽

Genome Wide Association Studies ◽

Entire Genome ◽

Machine Learning Classification ◽

Complementary Method ◽

Genome Wide

Complex diseases, with many associated genetic and environmental factors, are a challenging target for genomic risk assessment. Genome-wide association studies (GWAS) associate disease status with, and compute risk from, individual common variants, which can be problematic for diseases with many interacting or rare variants. In addition, GWAS typically employ a reference genome which is not built from the subjects of the study, whose genetic background may differ from the reference and whose genetic characterization may be limited. We present a complementary method based on disease association with collections of genotypes, called frequented regions, on a pangenomic graph built from subjects' genomes. We introduce the pangenomic genotype graph, which is better suited than sequence graphs to human disease studies. Our method draws out collections of features, across multiple genomic segments, which are associated with disease status. We show that the frequented regions method consistently improves machine-learning classification of disease status over GWAS classification, allowing incorporation of rare or interacting variants. Notably, genomic segments that have few or no variants of genome-wide significance (p < 5 × 10⁻⁸) provide much-improved classification with frequented regions, encouraging their application across the entire genome. Frequented regions may also be utilized for purposes such as choice of treatment in addition to prediction of disease risk.


Deep genotype imputation captures virtually all heritability of autoimmune vitiligo

Human Molecular Genetics ◽

10.1093/hmg/ddaa005 ◽

2020 ◽

Vol 29 (5) ◽

pp. 859-863 ◽

Cited By ~ 3

Author(s):

Genevieve H L Roberts ◽

Stephanie A Santorico ◽

Richard A Spritz

Keyword(s):

Complex Disease ◽

Rare Variants ◽

Association Studies ◽

Genotype Imputation ◽

Genome Wide Association Studies ◽

Common Variants ◽

Genome Wide ◽

Autoimmune Vitiligo ◽

Family Based ◽

Project Data

Abstract: Autoimmune vitiligo is a complex disease involving polygenic risk from at least 50 loci previously identified by genome-wide association studies. The objectives of this study were to estimate and compare vitiligo heritability in European-derived patients using both family-based and ‘deep imputation’ genotype-based approaches. We estimated family-based heritability (h²FAM) by vitiligo recurrence among a total of 8034 first-degree relatives (3776 siblings, 4258 parents or offspring) of 2122 unrelated vitiligo probands. We estimated genotype-based heritability (h²SNP) by deep imputation to Haplotype Reference Consortium and 1000 Genomes Project data in 2812 unrelated vitiligo cases and 37 079 controls genotyped genome wide, achieving high-quality imputation for markers with minor allele frequency (MAF) as low as 0.0001. Heritability estimated by both approaches was exceedingly high: h²FAM = 0.75–0.83 and h²SNP = 0.78. These estimates are statistically identical, indicating there is essentially no remaining ‘missing heritability’ for vitiligo. Overall, ~70% of h²SNP is represented by common variants (MAF > 0.01) and ~30% by rare variants. These results demonstrate that essentially all vitiligo heritable risk is captured by array-based genotyping and deep imputation. These findings suggest that vitiligo may provide a particularly tractable model for investigation of complex disease genetic architecture and predictive aspects of personalized medicine.
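
The split of genotype-based heritability between common and rare variants can be worked through numerically. The sketch below uses only the figures quoted in the abstract (0.78 total, roughly 70% from common variants); the function name `partition_heritability` is hypothetical, and the resulting values inherit the rounding of the "~70%" figure.

```python
def partition_heritability(h2_total, common_share):
    """Split a heritability estimate into common- and rare-variant components."""
    if not 0.0 <= common_share <= 1.0:
        raise ValueError("common_share must lie between 0 and 1")
    h2_common = h2_total * common_share
    h2_rare = h2_total * (1.0 - common_share)
    return h2_common, h2_rare

# Figures quoted in the abstract: genotype-based heritability of 0.78,
# with about 70% attributed to common variants (MAF > 0.01).
h2_common, h2_rare = partition_heritability(0.78, 0.70)
print(f"common-variant component: {h2_common:.3f}")
print(f"rare-variant component:   {h2_rare:.3f}")
```

Under these inputs the common-variant component is about 0.55 and the rare-variant component about 0.23 on the heritability scale.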


Detecting association of rare and common variants by adaptive combination of P-values

Genetics Research ◽

10.1017/s0016672315000208 ◽

2015 ◽

Vol 97 ◽

Cited By ~ 2

Author(s):

YAJING ZHOU ◽

YONG WANG

Keyword(s):

Rare Variants ◽

Association Studies ◽

Genome Wide Association Studies ◽

Common Variants ◽

Next Generation Sequencing Technology ◽

Adaptive Combination ◽

Genome Wide ◽

Wide Range ◽

Causal Variants ◽

Burden Tests

Summary: Genome-wide association studies (GWAS) can detect common variants associated with diseases. Next generation sequencing technology has made it possible to detect rare variants. Most association tests, including burden tests and nonburden tests, mainly target rare variants by upweighting rare variant effects and downweighting common variant effects. But there is increasing evidence that complex diseases are caused by both common and rare variants. In this paper, we extend the ADA method (adaptive combination of P-values; Lin et al., 2014), which was developed for rare variants only, and propose an RC-ADA method (common and rare variants by adaptive combination of P-values). Our proposed method combines the per-site P-values with weights based on minor allele frequencies (MAFs). The RC-ADA is robust to the directions of effects of causal variants and to the inclusion of a high proportion of neutral variants. The performance of the RC-ADA method is compared with several other association methods. Extensive simulation studies show that the RC-ADA method is more powerful than other association methods over a wide range of models.
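
The general idea of combining per-site P-values with MAF-based weights can be illustrated with a weighted truncated combination statistic. This is not the RC-ADA method itself: the abstract does not give its formula, so the weight 1/sqrt(MAF(1 - MAF)), the truncation thresholds, and all input values below are assumptions chosen for illustration. An adaptive procedure would additionally assess each threshold by permutation, which is omitted here.

```python
from math import log, sqrt

def weighted_truncated_statistic(p_values, mafs, threshold):
    """Sum of -w_i * ln(p_i) over sites whose P-value falls below the threshold.

    The weight w_i = 1 / sqrt(maf_i * (1 - maf_i)) is an ASSUMED choice that
    upweights rarer variants; it is not taken from the paper.
    """
    total = 0.0
    for p, maf in zip(p_values, mafs):
        if p < threshold:
            weight = 1.0 / sqrt(maf * (1.0 - maf))
            total += -weight * log(p)
    return total

# HYPOTHETICAL per-site P-values and minor allele frequencies.
p_values = [0.002, 0.04, 0.30, 0.75]
mafs = [0.005, 0.20, 0.01, 0.35]

# Evaluate the statistic over a grid of candidate truncation thresholds.
for threshold in (0.01, 0.05, 0.50):
    stat = weighted_truncated_statistic(p_values, mafs, threshold)
    print(f"threshold {threshold}: statistic {stat:.3f}")
```

Truncation discards sites with large P-values, which is one way a combination test can stay robust when many neutral variants are included.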


Integrating DNA sequencing and transcriptomic data for association analyses of low-frequency variants and lipid traits

Human Molecular Genetics ◽

10.1093/hmg/ddz314 ◽

2020 ◽

Vol 29 (3) ◽

pp. 515-526 ◽

Cited By ~ 4

Author(s):

Tianzhong Yang ◽

Chong Wu ◽

Peng Wei ◽

Wei Pan

Keyword(s):

Gene Expression ◽

Whole Genome Sequencing ◽

Genome Sequencing ◽

Association Studies ◽

Low Frequency ◽

Genome Wide Association Studies ◽

Whole Genome ◽

Common Variants ◽

Transcriptomic Data ◽

Genome Wide

Abstract: Transcriptome-wide association studies (TWAS) integrate genome-wide association studies (GWAS) and transcriptomic data and have shown improved statistical power for identifying gene–trait associations while, importantly, offering further biological insights. TWAS have thus far focused on common variants as available from GWAS. Compared with common variants, findings for, and even applications to, low-frequency variants are limited, and their underlying role in regulating gene expression is less clear. To fill this gap, we extend TWAS to integrate whole genome sequencing data with transcriptomic data for low-frequency variants. Using data from the Framingham Heart Study, we demonstrate that low-frequency variants play an important and universal role in predicting gene expression, which is not completely due to linkage disequilibrium with nearby common variants. By including low-frequency variants in addition to common variants, we increase the predictivity of gene expression for 79% of the examined genes. Incorporating this piece of functional genomic information, we perform association testing for five lipid traits in two UK10K whole genome sequencing cohorts, hypothesizing that cis-expression quantitative trait loci, including low-frequency variants, are more likely to be trait-associated. We discover that two genes, LDLR and TTC22, are genome-wide significantly associated with low-density lipoprotein cholesterol based on 3203 subjects and that the association signals are largely independent of common variants. We further demonstrate that a joint analysis of both common and low-frequency variants identifies association signals that would be missed by testing either common variants or low-frequency variants alone.


Rare ABCA7 variants in 2 German families with Alzheimer disease

Neurology Genetics ◽

10.1212/nxg.0000000000000224 ◽

2018 ◽

Vol 4 (2) ◽

pp. e224 ◽

Cited By ~ 4

Author(s):

Patrick May ◽

Sabrina Pichler ◽

Daniela Hartl ◽

Dheeraj R. Bobbili ◽

Manuel Mayhaus ◽

...

Keyword(s):

Alzheimer Disease ◽

Whole Genome Sequencing ◽

Genome Sequencing ◽

Rare Variants ◽

Late Onset ◽

Association Studies ◽

Genome Wide Association Studies ◽

Whole Genome ◽

Pathogenic Variants ◽

Genome Wide

Objective: The aim of this study was to identify variants associated with familial late-onset Alzheimer disease (AD) using whole-genome sequencing. Methods: Several families with an autosomal dominant inheritance pattern of AD were analyzed by whole-genome sequencing. Variants were prioritized for rare, likely pathogenic variants in genes already known to be associated with AD and confirmed by Sanger sequencing using standard protocols. Results: We identified 2 rare ABCA7 variants (rs143718918 and rs538591288) with varying penetrance in 2 independent German AD families, respectively. The single nucleotide variant (SNV) rs143718918 causes a missense mutation, and the deletion rs538591288 causes a frameshift mutation of ABCA7. Both variants have previously been reported in larger cohorts but with incomplete segregation information. ABCA7 is one of more than 20 AD risk loci that have so far been identified by genome-wide association studies, and both common and rare variants of ABCA7 have previously been described in different populations with higher frequencies in AD cases than in controls and varying penetrance. Furthermore, ABCA7 is known to be involved in several AD-relevant pathways. Conclusions: We conclude that both SNVs might contribute to the development of AD in the examined family members. Together with previous findings, our data confirm ABCA7 as one of the most relevant AD risk genes.


The multiple testing burden in sequencing-based disease studies of global populations

10.1101/053264 ◽

2016 ◽

Cited By ~ 1

Author(s):

Sara L. Pulit ◽

Sera A.J. de With ◽

Paul I.W. de Bakker

Keyword(s):

Disease Risk ◽

Association Studies ◽

Statistical Tests ◽

Whole Genome Sequence ◽

Common Disease ◽

Genome Wide Association Studies ◽

Sequencing Analysis ◽

Genome Wide ◽

A Genome ◽

Genome Wide Significance

Abstract: Genome-wide association studies (GWAS) of common disease have been hugely successful in implicating loci that modify disease risk. The bulk of these associations have proven robust and reproducible, in part due to community adoption of statistical criteria for claiming significant genotype-phenotype associations. Currently, studies of common disease are rapidly shifting towards the use of sequencing technologies. As the cost of sequencing drops, assembling large samples in global populations is becoming increasingly feasible. Sequencing studies interrogate not only common variants, as was true for genotyping-based GWAS, but variation across the full allele frequency spectrum, yielding many more (independent) statistical tests. We sought to empirically determine genome-wide significance for various analysis scenarios. Using whole-genome sequence data, we simulated sequencing-based disease studies of varying sample size and ancestry. We determined that future sequencing efforts in >2,000 samples should practically employ a genome-wide significance threshold of p < 5 × 10⁻⁹, though the threshold does vary with ancestry. Studies of European or East Asian ancestry should set genome-wide significance at approximately p < 5 × 10⁻⁹, but similar studies of African or South Asian samples should be more stringent (p < 1 × 10⁻⁹). Because sequencing analysis brings with it many challenges (especially for rare variants), appropriate adoption of a revised multiple test correction will be crucial to avoid irreproducible claims of association.
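
One way to read these thresholds is through a Bonferroni-style correction, which divides a family-wise error rate by the number of independent tests. The abstract describes an empirical, simulation-based determination and does not state that a Bonferroni formula was used, so the sketch below is an interpretive aid under that assumption, with an assumed family-wise error rate of 0.05 and hypothetical function names.

```python
def bonferroni_threshold(n_independent_tests, alpha=0.05):
    """Per-test significance threshold for a given number of independent tests."""
    return alpha / n_independent_tests

def implied_independent_tests(threshold, alpha=0.05):
    """Number of independent tests that a per-test threshold corresponds to."""
    return alpha / threshold

# Thresholds mentioned in the abstracts on this page, read as Bonferroni-style
# corrections at an ASSUMED family-wise error rate of 0.05.
for label, threshold in [
    ("conventional GWAS", 5e-8),
    ("sequencing, European or East Asian ancestry", 5e-9),
    ("sequencing, African or South Asian ancestry", 1e-9),
]:
    n_tests = implied_independent_tests(threshold)
    print(f"{label}: p < {threshold:g} ~ {n_tests:,.0f} independent tests")
```

Read this way, moving from 5e-8 to 5e-9 corresponds to a tenfold increase in the effective number of independent tests, consistent with sequencing covering the full allele frequency spectrum.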


Rare risk variants associate with epigenetic dysregulation in migraine

10.1101/2021.12.20.21268001 ◽

2021 ◽

Author(s):

Tanya Ramdal Techlo ◽

Mona Ameri Chalmer ◽

Peter Loof Møller ◽

Lisette Johanna Antonia Kogelman ◽

Isa Amalie Olofsson ◽

...

Keyword(s):

Large Scale ◽

Rare Variants ◽

Association Studies ◽

Genome Wide Association Studies ◽

Whole Genome ◽

Risk Variant ◽

Polycomb Response Element ◽

Risk Variants ◽

Genome Wide ◽

Independent Case

Migraine has a heritability of up to 65%. Genome-wide association studies (GWAS) on migraine have identified 123 risk loci, explaining only 10.6% of migraine heritability. Thus, there is a considerable genetic component not identified with GWAS. Further, the causality of the identified risk loci remains inconclusive. Rare variants contribute to the risk of migraine, but GWAS are often underpowered to detect them. Whole genome sequencing is reliable for analyzing rare variants but is not frequently used at large scale. We assessed whether rare variants in the migraine risk loci are associated with migraine. We used a large cohort of whole genome sequenced migraine patients (1,040 individuals from 155 families). The findings were replicated in an independent case-control cohort (2,027 migraine patients, 1,650 controls). We found rare variants (minor allele frequency < 0.1%) associated with migraine in a Polycomb Response Element in the ASTN2 locus. The association was independent of the GWAS lead risk variant in the locus. The findings place rare variants as risk factors for migraine. We propose a biological mechanism by which epigenetic regulation by Polycomb Response Elements plays a crucial role in migraine etiology.


Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program

Nature ◽

10.1038/s41586-021-03205-y ◽

2021 ◽

Vol 590 (7845) ◽

pp. 290-299 ◽

Cited By ~ 22

Author(s):

Daniel Taliun ◽

Daniel N. Harris ◽

Michael D. Kessler ◽

Jedidiah Carlson ◽

...

Keyword(s):

Rare Variants ◽

Sequence Data ◽

Association Studies ◽

Genotype Imputation ◽

Genome Wide Association Studies ◽

Phenotypic Data ◽

Treatment And Prevention ◽

Genome Wide ◽

Diverse Backgrounds ◽

Unmapped Reads

Abstract: The Trans-Omics for Precision Medicine (TOPMed) programme seeks to elucidate the genetic architecture and biology of heart, lung, blood and sleep disorders, with the ultimate goal of improving diagnosis, treatment and prevention of these diseases. The initial phases of the programme focused on whole-genome sequencing of individuals with rich phenotypic data and diverse backgrounds. Here we describe the TOPMed goals and design as well as the available resources and early insights obtained from the sequence data. The resources include a variant browser, a genotype imputation server, and genomic and phenotypic data that are available through dbGaP (Database of Genotypes and Phenotypes). In the first 53,831 TOPMed samples, we detected more than 400 million single-nucleotide and insertion or deletion variants after alignment with the reference genome. Additional previously undescribed variants were detected through assembly of unmapped reads and customized analysis in highly variable loci. Among the more than 400 million detected variants, 97% have frequencies of less than 1% and 46% are singletons that are present in only one individual (53% among unrelated individuals). These rare variants provide insights into mutational processes and recent human evolutionary history. The extensive catalogue of genetic variation in TOPMed studies provides unique opportunities for exploring the contributions of rare and noncoding sequence variants to phenotypic variation. Furthermore, combining TOPMed haplotypes with modern imputation methods improves the power and reach of genome-wide association studies to include variants down to a frequency of approximately 0.01%.
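
Summary figures of this kind (the share of variants below 1% frequency and the share that are singletons) can be computed from per-variant allele counts. The sketch below is a toy illustration, not the TOPMed pipeline: the function name `frequency_summary`, the allele counts, and the sample size are hypothetical, and a singleton is assumed here to mean an allele count of exactly 1.

```python
def frequency_summary(allele_counts, n_chromosomes, rare_cutoff=0.01):
    """Return (fraction of variants below rare_cutoff, fraction that are singletons).

    A singleton is ASSUMED to be a variant with an allele count of exactly 1.
    """
    if not allele_counts:
        raise ValueError("allele_counts must not be empty")
    n_variants = len(allele_counts)
    n_rare = sum(1 for ac in allele_counts if ac / n_chromosomes < rare_cutoff)
    n_singletons = sum(1 for ac in allele_counts if ac == 1)
    return n_rare / n_variants, n_singletons / n_variants

# HYPOTHETICAL toy data: alternate-allele counts for five variants observed
# in 500 diploid individuals (1,000 chromosomes).
toy_counts = [1, 1, 5, 300, 2]
rare_fraction, singleton_fraction = frequency_summary(toy_counts, n_chromosomes=1000)
print(f"below 1% frequency: {rare_fraction:.0%}, singletons: {singleton_fraction:.0%}")
```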
