Whole exome sequencing and characterization of coding variation in 49,960 individuals in the UK Biobank

2019 ◽  
Author(s):  
Cristopher V. Van Hout ◽  
Ioanna Tachmazidou ◽  
Joshua D. Backman ◽  
Joshua D. Hoffman ◽
Bin Ye ◽  
...  

SUMMARY The UK Biobank is a prospective study of 502,543 individuals, combining extensive phenotypic and genotypic data with streamlined access for researchers around the world. Here we describe the first tranche of large-scale exome sequence data for 49,960 study participants, revealing approximately 4 million coding variants (of which ~98.4% have frequency < 1%). The data includes 231,631 predicted loss of function variants, a >10-fold increase compared to imputed sequence for the same participants. Nearly all genes (>97%) had ≥1 predicted loss of function carrier, and most genes (>69%) had ≥10 loss of function carriers. We illustrate the power of characterizing loss of function variation in this large population through association analyses across 1,741 phenotypes. In addition to replicating a range of established associations, we discover novel loss of function variants with large effects on disease traits, including PIEZO1 on varicose veins, COL6A1 on corneal resistance, MEPE on bone density, and IQGAP2 and GMPR on blood cell traits. We further demonstrate the value of exome sequencing by surveying the prevalence of pathogenic variants of clinical significance in this population, finding that 2% of the population has a medically actionable variant. Additionally, we leverage the phenotypic data to characterize the relationship between rare BRCA1 and BRCA2 pathogenic variants and cancer risk. Exomes from the first 49,960 participants are now made accessible to the scientific community and highlight the promise offered by genomic sequencing in large-scale population-based studies.

Nature ◽  
2020 ◽  
Vol 586 (7831) ◽  
pp. 749-756 ◽  
Author(s):  
Cristopher V. Van Hout ◽  
Ioanna Tachmazidou ◽  
Joshua D. Backman ◽  
Joshua D. Hoffman ◽  
...  

Abstract The UK Biobank is a prospective study of 502,543 individuals, combining extensive phenotypic and genotypic data with streamlined access for researchers around the world1. Here we describe the release of exome-sequence data for the first 49,960 study participants, revealing approximately 4 million coding variants (of which around 98.6% have a frequency of less than 1%). The data include 198,269 autosomal predicted loss-of-function (LOF) variants, a more than 14-fold increase compared to the imputed sequence. Nearly all genes (more than 97%) had at least one carrier with a LOF variant, and most genes (more than 69%) had at least ten carriers with a LOF variant. We illustrate the power of characterizing LOF variants in this population through association analyses across 1,730 phenotypes. In addition to replicating established associations, we found novel LOF variants with large effects on disease traits, including PIEZO1 on varicose veins, COL6A1 on corneal resistance, MEPE on bone density, and IQGAP2 and GMPR on blood cell traits. We further demonstrate the value of exome sequencing by surveying the prevalence of pathogenic variants of clinical importance, and show that 2% of this population has a medically actionable variant. Furthermore, we characterize the penetrance of cancer in carriers of pathogenic BRCA1 and BRCA2 variants. Exome sequences from the first 49,960 participants highlight the promise of genome sequencing in large population-based studies and are now accessible to the scientific community.


2020 ◽  
Author(s):  
Quanli Wang ◽  
Ryan S. Dhindsa ◽  
Keren Carss ◽  
Andrew R Harper ◽  
Abhishek Nag ◽  
...  

The UK Biobank (UKB) represents an unprecedented population-based study of 502,543 participants with detailed phenotypic data and linkage to medical records. While the release of genotyping array data for this cohort has bolstered genomic discovery for common variants, the contribution of rare variants to this broad phenotype collection remains relatively unknown. Here, we use exome sequencing data from 177,882 UKB participants to evaluate the association between rare protein-coding variants with 10,533 binary and 1,419 quantitative phenotypes. We performed both a variant-level phenome-wide association study (PheWAS) and a gene-level collapsing analysis-based PheWAS tailored to detecting the aggregate contribution of rare variants. The latter revealed 911 statistically significant gene-phenotype relationships, with a median odds ratio of 15.7 for binary traits. Among the binary trait associations identified using collapsing analysis, 83% were undetectable using single variant association tests, emphasizing the power of collapsing analysis to detect signal in the setting of high allelic heterogeneity. As a whole, these genotype-phenotype associations were significantly enriched for loss-of-function mediated traits and currently approved drug targets. Using these results, we summarise the contribution of rare variants to common diseases in the context of the UKB phenome and provide an example of how novel gene-phenotype associations can aid in therapeutic target prioritisation.
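The gene-level collapsing idea described above can be illustrated with a minimal sketch. It assumes a simple carrier rule (a person is a carrier if they hold at least one qualifying variant in the gene) and a two-sided Fisher's exact test on the resulting 2x2 table; the function names, the toy genotypes, and the choice of test are illustrative assumptions, not the authors' pipeline.

```python
from math import comb

def collapse_carriers(genotypes, qualifying):
    """Return one 0/1 flag per person: 1 if the person carries at least
    one qualifying variant in the gene (hypothetical collapsing rule)."""
    n = len(next(iter(genotypes.values())))
    flags = [0] * n
    for variant_id in qualifying:
        for i, alt_count in enumerate(genotypes[variant_id]):
            if alt_count > 0:
                flags[i] = 1
    return flags

def two_by_two(flags, is_case):
    """Count (carrier cases, carrier controls, non-carrier cases, non-carrier controls)."""
    a = sum(1 for f, y in zip(flags, is_case) if f and y)
    b = sum(1 for f, y in zip(flags, is_case) if f and not y)
    c = sum(1 for f, y in zip(flags, is_case) if not f and y)
    d = sum(1 for f, y in zip(flags, is_case) if not f and not y)
    return a, b, c, d

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x carrier cases given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more probable than the observed one.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))

# Toy data (made up): three different rare variants, one carrier each, all in cases.
toy_genotypes = {
    "v1": [1, 0, 0, 0, 0, 0, 0, 0],
    "v2": [0, 1, 0, 0, 0, 0, 0, 0],
    "v3": [0, 0, 1, 0, 0, 0, 0, 0],
}
toy_is_case = [1, 1, 1, 0, 0, 0, 0, 0]

gene_flags = collapse_carriers(toy_genotypes, ["v1", "v2", "v3"])
p_gene = fisher_exact_two_sided(*two_by_two(gene_flags, toy_is_case))
p_single = fisher_exact_two_sided(*two_by_two(toy_genotypes["v1"], toy_is_case))
```

In this toy setting the collapsed test gives a smaller p-value than the test on any single variant, which is the intuition behind using collapsing when a signal is spread over many distinct rare alleles.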


Circulation ◽  
2020 ◽  
Vol 142 (Suppl_3) ◽  
Author(s):  
Alex Gyftopoulos ◽  
Yi-Ju Chen ◽  
Libin Wang ◽  
Charles H Williams ◽  
Young Wook Chun ◽  
...  

Introduction: Hypertrophic cardiomyopathy (HCM) is the most common inherited cardiac disease, affecting 1:500 to 1:200 individuals worldwide. HCM has a heterogeneous genetic profile and phenotypic expression. More than 1400 known pathogenic variants have been identified in 11 sarcomere genes. In about 40% of HCM patients, the genetic cause may not be identified. The same mutation may lead to different phenotypes and severity in different individuals. Identification of novel HCM genes and modifiers will expand our understanding of the signaling pathways that are responsible for phenotypic expression of HCM. Methods: The UK Biobank comprises clinical and genetic data for more than 500,000 individuals. To analyze these data we used OASIS, an information system for analyzing, searching, and visualizing associations between phenotype and genotype data. We compared control individuals to HCM individuals identified by ICD-10 code (I42.1 and I42.2) in a 20-to-1 fashion. Related individuals and those with confounding diagnoses were excluded. Results: The analysis was performed with PLINK's GLM option, and we identified 84 variants with a minor allele frequency of 0.5% or greater in 65 genes associated with HCM with p < 1×10⁻⁶, including 4 with p < 5×10⁻⁸. The identified genes encode lncRNAs, miRNAs, and membrane proteins. Variants with high significance were identified in the genes encoding putative ciliary components DNAL4 (dynein axonemal light chain 4; p = 2.9×10⁻⁸), MYO1D (unconventional myosin 1D; p = 3.1×10⁻⁸), ITFAP (intraflagellar transport associated protein; p = 9.5×10⁻⁸), CABCOCO1 (ciliary associated calcium binding coiled-coil 1; p = 3.7×10⁻⁷), EVL (Enah-Vasp-like; p = 4.4×10⁻⁷) and IFT122 (intraflagellar transport 122; p = 8.0×10⁻⁷). Conclusion: While none of these have previously been associated with HCM, our findings suggest ciliary structure and function may play a role in disease manifestation. 
Our method is unique in pooling individuals from a large population set to identify potential causative or contributing mutations. Bioinformatic tools, such as OASIS, allow for the identification of previously unrecognized variants that may play a role in the development of HCM. This approach has identified numerous novel genes as possible risk loci.


2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Julie A. Fitzpatrick ◽  
Nicolas Basty ◽  
Madeleine Cule ◽  
Yi Liu ◽  
Jimmy D. Bell ◽  
...  

Abstract Psoas muscle measurements are frequently used as markers of sarcopenia and predictors of health. Manually measured cross-sectional areas are most commonly used, but there is a lack of consistency regarding the position of the measurement and manual annotations are not practical for large population studies. We have developed a fully automated method to measure iliopsoas muscle volume (comprised of the psoas and iliacus muscles) using a convolutional neural network. Magnetic resonance images were obtained from the UK Biobank for 5000 participants, balanced for age, gender and BMI. Ninety manual annotations were available for model training and validation. The model showed excellent performance against out-of-sample data (average Dice score coefficient of 0.9046 ± 0.0058 for six-fold cross-validation). Iliopsoas muscle volumes were successfully measured in all 5000 participants. Iliopsoas volume was greater in male compared with female subjects. There was a small but significant asymmetry between left and right iliopsoas muscle volumes. We also found that iliopsoas volume was significantly related to height, BMI and age, and that there was an acceleration in muscle volume decrease in men with age. Our method provides a robust technique for measuring iliopsoas muscle volume that can be applied to large cohorts.
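For readers unfamiliar with the Dice score reported above, the following minimal sketch shows the standard definition, 2|A∩B| / (|A| + |B|), applied to two binary masks. The flat 0/1 list representation and the toy masks are assumptions for illustration; the authors' evaluation code is not shown in the abstract.

```python
def dice_coefficient(pred, truth):
    """Dice overlap between two binary masks given as flat 0/1 sequences."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Two empty masks agree perfectly by convention.
    return 1.0 if total == 0 else 2.0 * intersection / total

# Toy masks (made up): 3 predicted voxels, 3 true voxels, 2 shared.
predicted_mask = [0, 1, 1, 1, 0, 0]
manual_mask = [0, 1, 1, 0, 0, 1]
toy_dice = dice_coefficient(predicted_mask, manual_mask)  # 2*2 / (3+3)
```

A value of 1 means the automated and manual segmentations coincide exactly, and 0 means they share no voxels.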


2021 ◽  
Author(s):  
Lei Zhang ◽  
Shan-Shan Yan ◽  
Jing-Jing Ni ◽  
Yu-Fang Pei

The large-scale open-access whole-exome sequencing (WES) data of ~200,000 UK Biobank participants are accelerating a new wave of genetic association studies aiming to identify rare and functional loss-of-function (LoF) variants associated with a broad range of complex traits and diseases; however, the community is short of stringent replication of new associations. In this study, we proposed to merge the WES genotypes and the genome-wide genotyping (GWAS) genotypes of 167,000 UKB Caucasian participants into a combined reference panel, and then to impute 241,911 UKB Caucasian participants who had the GWAS genotypes only. We then proposed to use the imputed data to replicate associations identified in the discovery WES sample. Using a leave-100-out imputation strategy in the reference panel, we showed that the average imputation accuracy measure r² is modest to high at LoF variants of all minor allele frequency (MAF) intervals including ultra-rare ones: 0.942 at MAF interval [1%, 50%], 0.807 at [0.1%, 1.0%), 0.805 at [0.01%, 0.1%), 0.664 at [0.001%, 0.01%) and 0.410 at (0, 0.001%). As applications, we studied single-variant-level and gene-level associations of LoF variants with estimated heel BMD (eBMD) and 4 lipid traits: high-density-lipoprotein cholesterol (HDL-C), low-density-lipoprotein cholesterol (LDL-C), triglycerides (TG) and total cholesterol (TC). In addition to replicating dozens of previously reported genes such as MEPE for eBMD and PCSK9 for more than one lipid trait, the results also identified 2 novel gene-level associations: PLIN1 (cumulative MAF=0.10%, discovery BETA=0.38, P=1.20×10⁻¹³; replication BETA=0.25, P=1.03×10⁻⁶) and ANGPTL3 (cumulative MAF=0.10%, discovery BETA=−0.36, P=4.70×10⁻¹¹; replication BETA=−0.30, P=6.60×10⁻¹¹) for HDL-C, as well as one novel single-variant-level association (11:14843853:C:T, MAF=0.11%, discovery BETA=−0.31, P=2.70×10⁻⁹; replication BETA=−0.31, P=8.80×10⁻¹⁴, PDE3B) for TG. 
Our results highlighted the strength of WES-based genotype imputation and provided useful imputed data within the UKB cohort.
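As a rough illustration of the imputation accuracy measure mentioned in this abstract, the sketch below computes a per-variant r² as the squared Pearson correlation between true genotypes (0, 1 or 2 alternate alleles) and imputed dosages. That definition is a common convention and is assumed here; the abstract does not spell out the exact formula the authors used, and the toy values are made up.

```python
def imputation_r2(true_genotypes, imputed_dosages):
    """Squared Pearson correlation between true genotypes and imputed dosages.

    Returns NaN when either input has zero variance (r² is undefined there).
    """
    n = len(true_genotypes)
    if n == 0 or n != len(imputed_dosages):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_t = sum(true_genotypes) / n
    mean_d = sum(imputed_dosages) / n
    cov = sum((t - mean_t) * (d - mean_d)
              for t, d in zip(true_genotypes, imputed_dosages))
    var_t = sum((t - mean_t) ** 2 for t in true_genotypes)
    var_d = sum((d - mean_d) ** 2 for d in imputed_dosages)
    if var_t == 0 or var_d == 0:
        return float("nan")
    return cov * cov / (var_t * var_d)

# Toy example (made up): genotypes held out from a reference panel
# and the dosages an imputation tool might return for them.
held_out = [0, 0, 1, 0, 2, 0]
dosages = [0.1, 0.0, 0.8, 0.2, 1.7, 0.0]
toy_r2 = imputation_r2(held_out, dosages)
```

In a leave-out design such as the one described, this quantity would be computed per variant on the held-out samples and then averaged within each MAF interval.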


2020 ◽  
Author(s):  
Sean J. Jurgens ◽  
Seung Hoan Choi ◽  
Valerie N. Morrill ◽  
Mark Chaffin ◽  
James P. Pirruccello ◽  
...  

Abstract Background Many human diseases are known to have a genetic contribution. While genome-wide studies have identified many disease-associated loci, it remains challenging to elucidate causal genes. In contrast, exome sequencing provides an opportunity to identify new disease genes and large-effect variants of clinical relevance. We therefore sought to determine the contribution of rare genetic variation in a curated set of human diseases and traits using a unique resource of 200,000 individuals with exome sequencing data from the UK Biobank. Methods and Results We included 199,832 participants with a mean age of 68 at follow-up. Exome-wide gene-based tests were performed for 64 diseases and 23 quantitative traits using a mixed-effects model, testing rare loss-of-function and damaging missense variants. We identified 51 known and 23 novel associations with 26 diseases and traits at a false-discovery-rate of 1%. There was a striking risk associated with many Mendelian disease genes, including MYBPC3 with over a 100-fold increased odds of hypertrophic cardiomyopathy, PKD1 with a greater than 25-fold increased odds of chronic kidney disease, and BRCA2, BRCA1, ATM and PALB2 with 3 to 10-fold increased odds of breast cancer. Notable novel findings included an association between GIGYF1 and type 2 diabetes (OR 5.6, P=5.35×10⁻⁸), elevated blood glucose, and lower insulin-like-growth-factor-1 levels. Rare variants in CCAR2 were also associated with diabetes risk (OR 13, P=8.5×10⁻⁸), while COL9A3 was associated with cataract (OR 3.4, P=6.7×10⁻⁸). Notable associations for blood lipids and hypercholesterolemia included NR1H3, RRBP1, GIGYF1, SCGN, APH1A, PDE3B and ANGPTL8. A number of novel genes were associated with height, including DTL, PIEZO1, SCUBE3, PAPPA and ADAMTS6, while BSN was associated with body-mass-index. 
We further assessed putatively pathogenic variants in known Mendelian cardiovascular disease genes and found that between 1.3 and 2.3% of the population carried likely pathogenic variants in known cardiomyopathy, arrhythmia or hypercholesterolemia genes. Conclusions Large-scale population sequencing identifies known and novel genes harboring high-impact variation for human traits and diseases. A number of novel findings, including GIGYF1, represent interesting potential therapeutic targets. Exome sequencing at scale can identify a meaningful proportion of the population that carries a pathogenic variant underlying cardiovascular disease.


2021 ◽  
Author(s):  
Iain S. Forrest ◽  
Kumardeep Chaudhary ◽  
Ha My T. Vy ◽  
Shantanu Bafna ◽  
Daniel M. Jordan ◽  
...  

ABSTRACT A major goal of genomic medicine is to quantify the disease risk of genetic variants. Here, we report the penetrance of 37,772 clinically relevant variants (including those reported in ClinVar1 and of loss-of-function consequence) for 197 diseases in an analysis of exome sequence data for 72,434 individuals over five ancestries and six decades of ages from two large-scale population-based biobanks (BioMe Biobank and UK Biobank). With a high-quality set of 5,359 clinically impactful variants, we evaluate disease prevalence in carriers and non-carriers to interrogate major determinants and implications of penetrance. First, we associate biomarker levels with penetrance of variants in known disease-predisposition genes and illustrate their clear biological link to disease. We then systematically uncover large numbers of ClinVar pathogenic variants that confer low risk of disease, even among those reviewed by experts, while delineating stark differences in variant penetrance by molecular consequence. Furthermore, we ascertain numerous variants present in non-European ancestries and reveal how increasing carrier age modifies penetrance estimates. Lastly, we examine substantial heterogeneity of penetrance among variants in known disease-predisposition genes for conditions such as familial hypercholesterolemia and breast cancer. These data indicate that existing categorical systems for variant classification do not adequately capture disease risk and warrant consideration of a more quantitative system based on population-based penetrance to evaluate clinical impact.


BMJ ◽  
2021 ◽  
pp. n214
Author(s):  
Weedon MN ◽  
Jackson L ◽  
Harrison JW ◽  
Ruth KS ◽  
Tyrrell J ◽  
...  

Abstract Objective To determine whether the sensitivity and specificity of SNP chips are adequate for detecting rare pathogenic variants in a clinically unselected population. Design Retrospective, population based diagnostic evaluation. Participants 49 908 people recruited to the UK Biobank with SNP chip and next generation sequencing data, and an additional 21 people who purchased consumer genetic tests and shared their data online via the Personal Genome Project. Main outcome measures Genotyping (that is, identification of the correct DNA base at a specific genomic location) using SNP chips versus sequencing, with results split by frequency of that genotype in the population. Rare pathogenic variants in the BRCA1 and BRCA2 genes were selected as an exemplar for detailed analysis of clinically actionable variants in the UK Biobank, and BRCA related cancers (breast, ovarian, prostate, and pancreatic) were assessed in participants through use of cancer registry data. Results Overall, genotyping using SNP chips performed well compared with sequencing; sensitivity, specificity, positive predictive value, and negative predictive value were all above 99% for 108 574 common variants directly genotyped on the SNP chips and sequenced in the UK Biobank. However, the likelihood of a true positive result decreased dramatically with decreasing variant frequency; for variants that are very rare in the population, with a frequency below 0.001% in UK Biobank, the positive predictive value was very low and only 16% of 4757 heterozygous genotypes from the SNP chips were confirmed with sequencing data. Results were similar for SNP chip data from the Personal Genome Project, and 20/21 individuals analysed had at least one false positive rare pathogenic variant that had been incorrectly genotyped. 
For pathogenic variants in the BRCA1 and BRCA2 genes, which are individually very rare, the overall performance metrics for the SNP chips versus sequencing in the UK Biobank were: sensitivity 34.6%, specificity 98.3%, positive predictive value 4.2%, and negative predictive value 99.9%. Rates of BRCA related cancers in UK Biobank participants with a positive SNP chip result were similar to those for age matched controls (odds ratio 1.31, 95% confidence interval 0.99 to 1.71) because the vast majority of variants were false positives, whereas sequence positive participants had a significantly increased risk (odds ratio 4.05, 2.72 to 6.03). Conclusions SNP chips are extremely unreliable for genotyping very rare pathogenic variants and should not be used to guide health decisions without validation.
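The pattern reported above (high specificity yet very low positive predictive value for rare variants) follows from the standard definitions of the four metrics. The sketch below uses hypothetical round-number counts chosen only to illustrate the effect; they are not the study's data.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Standard 2x2 diagnostic accuracy metrics, with sequencing as the reference."""
    return {
        "sensitivity": tp / (tp + fn),  # true carriers the chip detects
        "specificity": tn / (tn + fp),  # non-carriers the chip clears
        "ppv": tp / (tp + fp),          # chip-positive results that are real
        "npv": tn / (tn + fn),          # chip-negative results that are real
    }

# Hypothetical cohort: 100,000 people, 100 true carriers (0.1% prevalence),
# a chip with 35% sensitivity and 98% specificity.
toy_metrics = diagnostic_metrics(tp=35, fp=1998, fn=65, tn=97902)
```

With these made-up counts the positive predictive value comes out below 2% even though specificity is 98%, because the few true carriers are swamped by false positives drawn from the much larger non-carrier group.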


Author(s):  
Emily Breidbart ◽  
Liyong Deng ◽  
Patricia Lanzano ◽  
Xiao Fan ◽  
Jiancheng Guo ◽  
...  

Abstract Objectives There have been few large-scale studies utilizing exome sequencing for genetically undiagnosed maturity onset diabetes of the young (MODY), a monogenic form of diabetes that is under-recognized. We describe a cohort of 160 individuals with suspected monogenic diabetes who were genetically assessed for mutations in genes known to cause MODY. Methods We used a tiered testing approach focusing initially on GCK and HNF1A and then expanding to exome sequencing for those individuals without identified mutations in GCK or HNF1A. The average age of onset of hyperglycemia or diabetes diagnosis was 19 years (median 14 years) with an average HbA1C of 7.1%. Results Sixty (37.5%) probands had heterozygous likely pathogenic/pathogenic variants in one of the MODY genes, 90% of which were in GCK or HNF1A. Less frequently, mutations were identified in PDX1, HNF4A, HNF1B, and KCNJ11. For those probands with available family members, 100% of the variants segregated with diabetes in the family. Cascade genetic testing in families identified 75 additional family members with a familial MODY mutation. Conclusions Our study is one of the largest and most ethnically diverse studies using exome sequencing to assess MODY genes. Tiered testing is an effective strategy to genetically diagnose atypical diabetes, and familial cascade genetic testing identified on average one additional family member with monogenic diabetes for each mutation identified in a proband.


2021 ◽  
Vol 7 (2) ◽  
pp. 105
Author(s):  
Vinodhini Thiyagaraja ◽  
Robert Lücking ◽  
Damien Ertz ◽  
Samantha C. Karunarathna ◽  
Dhanushka N. Wanasinghe ◽  
...  

Ostropales sensu lato is a large group comprising both lichenized and non-lichenized fungi, with several lineages expressing optional lichenization where individuals of the same fungal species exhibit either saprotrophic or lichenized lifestyles depending on the substrate (bark or wood). Greatly variable phenotypic characteristics and large-scale phylogenies have led to frequent changes in the taxonomic circumscription of this order. Ostropales sensu lato is currently split into Graphidales, Gyalectales, Odontotrematales, Ostropales sensu stricto, and Thelenellales. Ostropales sensu stricto is now confined to the family Stictidaceae, which includes a large number of species that are poorly known, since they usually have small fruiting bodies that are rarely collected, and thus, their taxonomy remains partly unresolved. Here, we introduce a new genus Ostropomyces to accommodate a novel lineage related to Ostropa, which is composed of two new species, as well as a new species of Sphaeropezia, S. shangrilaensis. Maximum likelihood and Bayesian inference analyses of mitochondrial small subunit spacers (mtSSU), large subunit nuclear rDNA (LSU), and internal transcribed spacers (ITS) sequence data, together with phenotypic data documented by detailed morphological and anatomical analyses, support the taxonomic affinity of the new taxa in Stictidaceae. Ancestral character state analysis did not resolve the ancestral nutritional status of Stictidaceae with confidence using Bayes traits, but a saprotrophic ancestor was indicated as most likely in a Bayesian binary Markov Chain Monte Carlo sampling (MCMC) approach. Frequent switching in nutritional modes between lineages suggests that lifestyle transition played an important role in the evolution of this family.

