Genome-wide rare variant analysis for thousands of phenotypes in 54,000 exomes

2019 ◽  
Author(s):  
Elizabeth T. Cirulli ◽  
Simon White ◽  
Robert W. Read ◽  
Gai Elhanan ◽  
William J Metcalf ◽  
...  

Defining the effects that rare variants can have on human phenotypes is essential to advancing our understanding of human health and disease. Large-scale human genetic analyses have thus far focused on common variants, but the development of large cohorts of deeply phenotyped individuals with exome sequence data has now made comprehensive analyses of rare variants possible. We analyzed the effects of rare (MAF<0.1%) variants on 3,166 phenotypes in 40,468 exome-sequenced individuals from the UK Biobank and performed replication as well as meta-analyses with 1,067 phenotypes in 13,470 members of the Healthy Nevada Project (HNP) cohort who underwent Exome+ sequencing at Helix. Our analyses of non-benign coding and loss-of-function (LoF) variants identified 78 gene-based associations that passed our statistical significance threshold (p<5×10⁻⁹). These are associations in which carrying any rare coding or LoF variant in the gene is associated with an enrichment for a specific phenotype, as opposed to GWAS-based associations of strictly single variants. Importantly, our results do not suffer from the test statistic inflation that is often seen with rare variant analyses of biobank-scale data because of our rare variant-tailored methodology, which includes a step that optimizes the carrier frequency threshold for each phenotype based on prevalence. Of the 47 discovery associations whose phenotypes were represented in the replication cohort, 98% showed effects in the expected direction, and 45% attained formal replication significance (p<0.001). Six additional significant associations were identified in our meta-analysis of both cohorts.
Among the results, we confirm known associations of PCSK9 and APOB variation with LDL levels; we extend knowledge of variation in the TYRP1 gene, previously associated with blonde hair color only in Solomon Islanders, to blonde hair color in individuals of European ancestry; we show that PAPPA, a gene in which common variants had previously been associated with height via GWAS, contains rare variants that decrease height; and we make the novel discovery that STAB1 variation is associated with blood flow in the brain. Our results are available for download and interactive browsing in an app (https://ukb.research.helix.com). This comprehensive analysis of the effects of rare variants on human phenotypes marks one of the first steps in the next big phase of human genetics, where large, deeply phenotyped cohorts with next-generation sequence data will elucidate the effects of rare variants.
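
The gene-based test described in this abstract collapses all qualifying rare variants in a gene into one carrier indicator per person and asks whether carriers are enriched for a phenotype. The sketch below illustrates that idea on toy data. The function names, the genotype layout, and the use of a one-sided Fisher exact test are assumptions made for illustration. They are not the authors' pipeline, which also tunes a carrier frequency threshold per phenotype.

```python
from math import comb

def carrier_flags(genotypes):
    """Collapse per-variant allele counts into one carrier flag per person.

    genotypes: one list per individual holding alternate-allele counts at the
    qualifying rare variants of a single gene (hypothetical layout).
    """
    return [any(count > 0 for count in person) for person in genotypes]

def enrichment_p(carriers, cases):
    """One-sided Fisher exact p-value for carriers being enriched among cases.

    Sums the hypergeometric probabilities of seeing at least the observed
    number of case carriers, with all table margins held fixed.
    """
    n = len(carriers)                      # cohort size
    k = sum(carriers)                      # total carriers
    m = sum(cases)                         # total cases
    observed = sum(1 for c, y in zip(carriers, cases) if c and y)
    total = comb(n, k)
    tail = sum(comb(m, i) * comb(n - m, k - i)
               for i in range(observed, min(k, m) + 1))
    return tail / total

# Toy cohort: six people, two variants in the gene, two cases.
genotypes = [[0, 1], [0, 0], [1, 0], [0, 0], [0, 0], [0, 0]]
cases = [True, False, True, False, False, False]
carriers = carrier_flags(genotypes)
p_value = enrichment_p(carriers, cases)
print(carriers, p_value)
```

With biobank-scale data a regression model with covariates would normally replace the exact test; the collapsing step is the part this sketch is meant to show.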

Author(s):  
Seung Hoan Choi ◽  
Sean J. Jurgens ◽  
Christopher M. Haggerty ◽  
Amelia W. Hall ◽  
Jennifer L. Halford ◽  
...  

Background - Alterations in electrocardiographic (ECG) intervals are well-known markers for arrhythmia and sudden cardiac death (SCD) risk. While the genetics of arrhythmia syndromes have been studied, relations between ECG intervals and rare genetic variation at a population level are poorly understood. Methods - Using a discovery sample of 29,000 individuals with whole-genome sequencing from TOPMed and replication in nearly 100,000 with whole-exome sequencing from the UK Biobank and MyCode, we examined associations of low-frequency and rare coding variants with 5 routinely measured ECG traits (RR, P-wave, PR, and QRS intervals and corrected QT interval [QTc]). Results - We found that rare variants associated with population-based ECG intervals identify established monogenic SCD genes (KCNQ1, KCNH2, SCN5A), a controversial monogenic SCD gene (KCNE1), and novel genes (PAM, MFGE8) involved in cardiac conduction. Loss-of-function and pathogenic SCN5A variants, carried by 0.1% of individuals, were associated with a nearly 6-fold increased odds of first-degree atrioventricular block (P = 8.4×10⁻⁵). Similar variants in KCNQ1 and KCNH2 (0.2% of individuals) were associated with a 23-fold increased odds of marked QTc prolongation (P = 4×10⁻²⁵), a marker of SCD risk. Incomplete penetrance of such deleterious variation was common, as over 70% of carriers had normal ECG intervals. Conclusions - Our findings indicate that large-scale high-depth sequence data and ECG analysis identify monogenic arrhythmia susceptibility genes and rare variants with large effects. Known pathogenic variation in conventional arrhythmia and SCD genes exhibited incomplete penetrance and accounted for only a small fraction of marked ECG interval prolongation.


2020 ◽  
Author(s):  
Amol C. Shetty ◽  
Jeffrey O’Connell ◽  
Braxton D. Mitchell ◽  
Timothy D. O’Connor ◽  
...  

Motivation: The global human population has experienced explosive growth from a few million to roughly 7 billion people in the last 10,000 years. Accompanying this growth has been the accumulation of rare variants that can inform our understanding of human evolutionary history. Common variants have primarily been used to infer the structure of the human population and relatedness between two individuals. However, with the increasing abundance of rare variants observed in large-scale projects, such as Trans-Omics for Precision Medicine (TOPMed), the use of rare variants to decipher cryptic relatedness and fine-scale population structure can be beneficial to the study of population demographics and association studies. Identity-by-descent (IBD) is an important framework used for identifying these relationships. IBD segments are broken down by recombination over time, such that longer shared haplotypes give strong evidence of recent relatedness while shorter shared haplotypes are indicative of more distant relationships. Current methods to identify IBD accurately detect only long segments (>2 cM) found in related individuals. Algorithm: We describe a metric that leverages rare variants shared between individuals to improve the detection of short IBD segments. We computed IBD segments using existing methods implemented in Refined IBD, and we enrich the signal using our metric, which facilitates the detection of short IBD segments (<2 cM) by explicitly incorporating rare variants. Results: To test our new metric, we simulated datasets involving populations with varying divergence time-scales. We show that rare-variant IBD identifies shorter segments with greater confidence and enables the detection of older divergence between populations. As an example, we applied our metric to the Old-Order Amish cohort with known genealogies dating 14 generations back to validate its ability to detect genetic relatedness between distant relatives.
This analysis shows that our method increases the accuracy of identifying shorter segments that in turn capture distant relationships. Conclusions: We describe a method to enrich the detection of short IBD segments using rare-variant sharing within IBD segments. Leveraging rare-variant sharing improves the information content of short IBD segments beyond what common variants alone provide. We validated the method in both simulated and empirical datasets. This method can benefit association analyses, IBD mapping analyses, and demographic inferences.
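
The claim that shorter shared haplotypes indicate more distant relationships can be made concrete with the standard textbook approximation, which is an assumption here and not a detail taken from this abstract: two haplotypes whose common ancestor lived g generations ago are separated by 2g meioses, so shared segment lengths are roughly exponential with mean 100/(2g) cM. The sketch below uses hypothetical function names and shows why a 2 cM detection floor misses much of the sharing from older ancestors.

```python
from math import exp

def mean_ibd_length_cm(generations):
    """Expected IBD segment length in cM for a common ancestor `generations`
    back, assuming 2 * generations meioses and one crossover per Morgan."""
    return 100.0 / (2 * generations)

def prob_longer_than(threshold_cm, generations):
    """Probability that a segment exceeds `threshold_cm` under the
    exponential length model above."""
    return exp(-threshold_cm / mean_ibd_length_cm(generations))

# Compare recent and distant ancestry against a 2 cM detection floor.
for g in (2, 5, 14, 50):
    print(g, round(mean_ibd_length_cm(g), 2), round(prob_longer_than(2.0, g), 3))
```

Under this model the fraction of segments longer than 2 cM falls steadily as g grows, which is the regime where extra evidence such as rare-variant sharing would be most useful.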


Author(s):  
Doris Škorić-Milosavljević ◽  
Najim Lahrouchi ◽  
Fernanda M. Bosada ◽  
Gregor Dombrowsky ◽  
Simon G. Williams ◽  
...  

Purpose: Rare genetic variants in KDR, encoding the vascular endothelial growth factor receptor 2 (VEGFR2), have been reported in patients with tetralogy of Fallot (TOF). However, their role in disease causality and pathogenesis remains unclear. Methods: We conducted exome sequencing in a familial case of TOF and large-scale genetic studies, including burden testing, in >1,500 patients with TOF. We studied gene-targeted mice and conducted cell-based assays to explore the role of KDR genetic variation in the etiology of TOF. Results: Exome sequencing in a family with two siblings affected by TOF revealed biallelic missense variants in KDR. Studies in knock-in mice and in HEK 293T cells identified embryonic lethality for one variant when occurring in the homozygous state, and significantly reduced VEGFR2 phosphorylation for both variants. Rare variant burden analysis conducted in a set of 1,569 patients of European descent with TOF identified a 46-fold enrichment of protein-truncating variants (PTVs) in TOF cases compared to controls (P = 7×10⁻¹¹). Conclusion: Rare KDR variants, in particular PTVs, strongly associate with TOF, likely in the setting of different inheritance patterns. Supported by genetic, in vivo, and in vitro functional analyses, we propose loss of function of VEGFR2 as one of the mechanisms involved in the pathogenesis of TOF.


2018 ◽  
Author(s):  
Ridge Dershem ◽  
Raghu P.R. Metpally ◽  
Kirk Jeffreys ◽  
Sarathbabu Krishnamurthy ◽  
Diane T. Smelser ◽  
...  

Many G protein-coupled receptors (GPCRs) lack common variants that lead to reproducible genome-wide disease associations. Here we used rare variant approaches to assess the disease associations of 85 orphan or understudied GPCRs in an unselected cohort of 51,289 individuals. Rare loss-of-function variants, missense variants predicted to be pathogenic or likely pathogenic, and a subset of rare synonymous variants were used as independent data sets for sequence kernel association testing (SKAT). Strong, phenome-wide disease associations shared by two or more variant categories were found for 39% of the GPCRs. Validating the bioinformatics and SKAT analyses, functional characterization of rare missense and synonymous variants of GPR39, a Family A GPCR, showed altered expression and/or Zn²⁺-mediated signaling for members of both variant classes. Results support the utility of rare variant analyses for identifying disease associations for genes that lack common variants, while also highlighting the functional importance of rare synonymous variants. Author summary: Rare variant approaches have emerged as a viable way to identify disease associations for genes without clinically important common variants. Rare synonymous variants are generally considered benign. We demonstrate that rare synonymous variants represent a potentially important dataset for deriving disease associations, here applied to analysis of a set of orphan or understudied GPCRs. Synonymous variants yielded disease associations in common with loss-of-function or missense variants in the same gene. We rationalize their associations with disease by confirming their impact on expression and agonist activation of a representative example, GPR39. This study highlights the importance of rare synonymous variants in human physiology, and argues for their routine inclusion in any comprehensive analysis of genomic variants as potential causes of disease.


2017 ◽  
Vol 37 (suppl_1) ◽  
Author(s):  
Jacqueline S Dron ◽  
Jian Wang ◽  
Cécile Low-Kam ◽  
Sumeet A Khetarpal ◽  
John F Robinson ◽  
...  

Rationale: Although HDL-C levels are known to have a complex genetic basis, most studies have focused solely on identifying rare variants with large phenotypic effects to explain extreme HDL-C phenotypes. Objective: Here we concurrently evaluate the contribution of both rare and common genetic variants, as well as large-scale copy number variations (CNVs), towards extreme HDL-C concentrations. Methods: In clinically ascertained patients with low (N=136) and high (N=119) HDL-C profiles, we applied our targeted next-generation sequencing panel (LipidSeq™) to sequence genes involved in HDL metabolism, which were subsequently screened for rare variants and CNVs. We also developed a novel polygenic trait score (PTS) to assess patients’ genetic accumulations of common variants that have been shown by genome-wide association studies to associate primarily with HDL-C levels. Two additional cohorts of patients with extremely low and high HDL-C (total N=1,746 and N=1,139, respectively) were used for PTS validation. Results: In the discovery cohort, 32.4% of low HDL-C patients carried rare variants or CNVs in primary (ABCA1, APOA1, LCAT) and secondary (LPL, LMF1, GPD1, APOE) HDL-C–altering genes. Additionally, 13.4% of high HDL-C patients carried rare variants or CNVs in primary (SCARB1, CETP, LIPC, LIPG) and secondary (APOC3, ANGPTL4) HDL-C–altering genes. For polygenic effects, patients with abnormal HDL-C profiles but without rare variants or CNVs were ~2-fold more likely to have an extreme PTS compared to normolipidemic individuals, indicating an increased frequency of common HDL-C–associated variants in these patients. Similar results in the two validation cohorts demonstrate that this novel PTS successfully quantifies common variant accumulation, further characterizing the polygenic basis for extreme HDL-C phenotypes.
Conclusions: Patients with extreme HDL-C levels have various combinations of rare variants, common variants, or CNVs driving their phenotypes. Fully characterizing the genetic basis of HDL-C levels must extend to encompass multiple types of genetic determinants, not just rare variants, to further our understanding of this complex, controversial quantitative trait.
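
A polygenic trait score of the kind mentioned in this abstract is typically a weighted sum of trait-associated allele counts. The sketch below shows that general form only. The variant identifiers, weights, and function name are hypothetical, and the abstract does not state how the authors' PTS is weighted or scaled.

```python
def polygenic_trait_score(allele_counts, weights):
    """Weighted sum of trait-raising allele counts (0, 1, or 2 per variant).

    allele_counts: mapping of variant ID to the number of trait-raising alleles.
    weights: mapping of variant ID to a per-allele effect size (e.g. from GWAS).
    """
    missing = set(weights) - set(allele_counts)
    if missing:
        raise ValueError(f"missing genotypes for: {sorted(missing)}")
    return sum(weights[v] * allele_counts[v] for v in weights)

# Hypothetical variant IDs and effect sizes, for illustration only.
weights = {"var_A": 0.5, "var_B": 0.25, "var_C": 1.0}
patient = {"var_A": 2, "var_B": 0, "var_C": 1}
score = polygenic_trait_score(patient, weights)
print(score)
```

An "extreme" score would then be defined relative to a reference distribution, for example the top or bottom decile among normolipidemic individuals; that cutoff is also an assumption here.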


2020 ◽  
Vol 31 (2) ◽  
pp. 365-373 ◽  
Author(s):  
Adam P. Levine ◽  
Melanie M.Y. Chan ◽  
Omid Sadeghi-Alavijeh ◽  
Edwin K.S. Wong ◽  
H. Terence Cook ◽  
...  

Background: Primary membranoproliferative GN, including complement 3 (C3) glomerulopathy, is a rare, untreatable kidney disease characterized by glomerular complement deposition. Complement gene mutations can cause familial C3 glomerulopathy, and studies have reported rare variants in complement genes in nonfamilial primary membranoproliferative GN. Methods: We analyzed whole-genome sequence data from 165 primary membranoproliferative GN cases and 10,250 individuals without the condition (controls) as part of the National Institutes of Health Research BioResource–Rare Diseases Study. We examined copy number, rare, and common variants. Results: Our analysis included 146 primary membranoproliferative GN cases and 6,442 controls who were unrelated and of European ancestry. We observed no significant enrichment of rare variants in candidate genes (genes encoding components of the complement alternative pathway and other genes associated with the related disease atypical hemolytic uremic syndrome; 6.8% in cases versus 5.9% in controls) or exome-wide. However, a significant common variant locus was identified at 6p21.32 (rs35406322) (P=3.29×10⁻⁸; odds ratio [OR], 1.93; 95% confidence interval [95% CI], 1.53 to 2.44), overlapping the HLA locus. Imputation of HLA types mapped this signal to a haplotype incorporating DQA1*05:01, DQB1*02:01, and DRB1*03:01 (P=1.21×10⁻⁸; OR, 2.19; 95% CI, 1.66 to 2.89). This finding was replicated by analysis of HLA serotypes in 338 individuals with membranoproliferative GN and 15,614 individuals with nonimmune renal failure. Conclusions: We found that HLA type, but not rare complement gene variation, is associated with primary membranoproliferative GN. These findings challenge the paradigm of complement gene mutations typically causing primary membranoproliferative GN and implicate an underlying autoimmune mechanism in most cases.
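
Odds ratios with 95% confidence intervals, as reported in this abstract, can be illustrated with the simplest textbook case: a 2×2 table with a Wald interval on the log scale. This is a sketch under that assumption. The counts are invented, the function name is hypothetical, and the study's own estimates most likely come from a regression model, not from a raw table.

```python
from math import exp, log, sqrt

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval from a 2x2 table.

    a: cases with the allele      b: cases without it
    c: controls with the allele   d: controls without it
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = exp(log(odds_ratio) - z * se_log_or)
    upper = exp(log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Invented counts, for illustration only.
or_, lower, upper = odds_ratio_ci(a=40, b=60, c=20, d=80)
print(round(or_, 2), round(lower, 2), round(upper, 2))
```

An interval that excludes 1, as both intervals in the abstract do, corresponds to a two-sided test rejecting no association at roughly the 5% level.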


Biostatistics ◽  
2019 ◽  
Author(s):  
Jingchunzi Shi ◽  
Michael Boehnke ◽  
Seunggeun Lee

Summary: Trans-ethnic meta-analysis is a powerful tool for detecting novel loci in genetic association studies. However, in the presence of heterogeneity among different populations, existing gene-/region-based rare-variant meta-analysis methods may be unsatisfactory because they do not consider genetic similarity or dissimilarity among different populations. In response, we propose a score test under the modified random effects model for gene-/region-based rare-variant associations. We adapt the kernel regression framework to construct the model and incorporate genetic similarities across populations into modeling the heterogeneity structure of the genetic effect coefficients. We use a resampling-based copula method to approximate the asymptotic distribution of the test statistic, enabling efficient estimation of p-values. Simulation studies show that our proposed method controls type I error rates and increases power over existing approaches in the presence of heterogeneity. We illustrate our method by analyzing T2D-GENES consortium exome sequence data to explore rare variant associations with several traits.


2016 ◽  
Author(s):  
Antonio F Pardiñas ◽  
Peter Holmans ◽  
Andrew J Pocklington ◽  
Valentina Escott-Price ◽  
Stephan Ripke ◽  
...  

Schizophrenia is a debilitating psychiatric condition often associated with poor quality of life and decreased life expectancy. Lack of progress in improving treatment outcomes has been attributed to limited knowledge of the underlying biology, although large-scale genomic studies have begun to provide such insight. We report the largest single-cohort genome-wide association study of schizophrenia (11,260 cases and 24,542 controls), and through meta-analysis with existing data we identify 50 novel GWAS loci. Using gene-wide association statistics, we implicate an additional set of 22 novel associations that map onto a single gene. We show for the first time that the common variant association signal is highly enriched among genes that are intolerant to loss-of-function mutations and that variants in these genes persist in the population, despite the low fecundity associated with the disorder, through the process of background selection. Associations point to novel areas of biology (e.g. metabotropic GABA-B signalling and acetylcholinesterase), reinforce those implicated in earlier GWAS (e.g. calcium channel function), converge with earlier rare-variant studies (e.g. NRXN1, GABAergic signalling), identify novel overlaps with autism (e.g. RBFOX1, FOXP1, FOXG1), and support early controversial candidate gene hypotheses (e.g. ERBB4 implicating neuregulin signalling). We also demonstrate the involvement of six independent central nervous system functional gene sets in schizophrenia pathophysiology. These findings provide novel insights into the biology and genetic architecture of schizophrenia, highlight the importance of mutation-intolerant genes, and suggest a mechanism by which common risk variants are maintained in the population.


2020 ◽  
Author(s):  
David Curtis

Background: Depression is moderately heritable but there is no common genetic variant which has a major effect on susceptibility. It is possible that some very rare variants could have substantial effect sizes and these could be identified from exome sequence data. Methods: Data from 50,000 exome-sequenced UK Biobank participants were analysed. Subjects were treated as cases if they had reported having seen a psychiatrist for "nerves, anxiety, tension or depression". Gene-wise weighted burden analysis was performed to see if there were any genes or sets of genes for which there was an excess of rare, functional variants in cases. Results: There were 5,872 cases and 43,862 controls. There were 22,028 informative genes but none produced a statistically significant result after correction for multiple testing. Of the 25 genes individually significant at p<0.001, none appeared to be a biologically plausible candidate. No set of genes achieved statistical significance after correction for multiple testing, and those with the lowest p values again did not appear to be biologically plausible candidates. Limitations: The phenotype is based on self-report and the cases are likely to be somewhat heterogeneous. The number of cases is on the low side for a study of exome sequence data. Conclusions: The results conform exactly with the expectation under the null hypothesis. It seems unlikely that depression genetics research will produce findings that might have a substantial clinical impact until far larger samples become available.
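
The statement that these results match the null expectation can be checked with simple arithmetic on the figures quoted in the abstract (22,028 genes, 25 genes with p<0.001). The sketch below assumes independent tests with uniformly distributed null p-values, a 0.05 family-wise error rate, and a Poisson approximation for the count of small p-values; those are illustrative assumptions, not details given by the author.

```python
from math import exp, factorial

n_genes = 22028          # informative genes tested (from the abstract)
observed_hits = 25       # genes with p < 0.001 (from the abstract)
nominal_cutoff = 0.001
alpha = 0.05             # assumed family-wise error rate

# Bonferroni-corrected per-gene significance threshold.
bonferroni_threshold = alpha / n_genes

# Number of genes expected below the nominal cutoff if no gene is associated.
expected_hits = n_genes * nominal_cutoff

def poisson_tail(k, mean):
    """P(X >= k) for a Poisson variable with the given mean."""
    below = sum(exp(-mean) * mean ** i / factorial(i) for i in range(k))
    return 1.0 - below

tail_probability = poisson_tail(observed_hits, expected_hits)
print(bonferroni_threshold, expected_hits, tail_probability)
```

About 22 genes are expected below p<0.001 by chance alone, so observing 25 is unremarkable under these assumptions.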


2021 ◽  
Author(s):  
Tony Zeng ◽  
Yang I Li

Recent progress in deep learning approaches has greatly improved the prediction of RNA splicing from DNA sequence. Here, we present Pangolin, a deep learning model to predict splice site strength in multiple tissues that has been trained on RNA splicing and sequence data from four species. Pangolin outperforms state-of-the-art methods for predicting RNA splicing on a variety of prediction tasks. We use Pangolin to study the impact of genetic variants on RNA splicing, including lineage-specific variants and rare variants of uncertain significance. Pangolin predicts loss-of-function mutations with high accuracy and recall, particularly for mutations that are not missense or nonsense (AUPRC = 0.93), demonstrating remarkable potential for identifying pathogenic variants.

