Rare Genetic Variation Underlying Human Diseases and Traits: Results from 200,000 Individuals in the UK Biobank

Abstract Background Many human diseases are known to have a genetic contribution. While genome-wide studies have identified many disease-associated loci, it remains challenging to elucidate causal genes. In contrast, exome sequencing provides an opportunity to identify new disease genes and large-effect variants of clinical relevance. We therefore sought to determine the contribution of rare genetic variation in a curated set of human diseases and traits using a unique resource of 200,000 individuals with exome sequencing data from the UK Biobank. Methods and Results We included 199,832 participants with a mean age of 68 at follow-up. Exome-wide gene-based tests were performed for 64 diseases and 23 quantitative traits using a mixed-effects model, testing rare loss-of-function and damaging missense variants. We identified 51 known and 23 novel associations with 26 diseases and traits at a false discovery rate of 1%. There was a striking risk associated with many Mendelian disease genes, including MYBPC3 with over a 100-fold increased odds of hypertrophic cardiomyopathy, PKD1 with a greater than 25-fold increased odds of chronic kidney disease, and BRCA2, BRCA1, ATM and PALB2 with 3- to 10-fold increased odds of breast cancer. Notable novel findings included an association between GIGYF1 and type 2 diabetes (OR 5.6, P=5.35×10⁻⁸), elevated blood glucose, and lower insulin-like growth factor 1 levels. Rare variants in CCAR2 were also associated with diabetes risk (OR 13, P=8.5×10⁻⁸), while COL9A3 was associated with cataract (OR 3.4, P=6.7×10⁻⁸). Notable associations for blood lipids and hypercholesterolemia included NR1H3, RRBP1, GIGYF1, SCGN, APH1A, PDE3B and ANGPTL8. A number of novel genes were associated with height, including DTL, PIEZO1, SCUBE3, PAPPA and ADAMTS6, while BSN was associated with body mass index. We further assessed putatively pathogenic variants in known Mendelian cardiovascular disease genes and found that between 1.3 and 2.3% of the population carried likely pathogenic variants in known cardiomyopathy, arrhythmia or hypercholesterolemia genes. Conclusions Large-scale population sequencing identifies known and novel genes harboring high-impact variation for human traits and diseases. A number of novel findings, including GIGYF1, represent interesting potential therapeutic targets. Exome sequencing at scale can identify a meaningful proportion of the population that carries a pathogenic variant underlying cardiovascular disease.
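The abstract reports associations "at a false discovery rate of 1%" but does not say which procedure was used to control it. As a minimal sketch, assuming the widely used Benjamini-Hochberg step-up procedure (an assumption, not something the abstract confirms), the selection of significant gene-level tests could look like this. The function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values, fdr=0.01):
    """Return a list of booleans: True where a test is declared significant.

    Assumes the Benjamini-Hochberg step-up procedure; the abstract does not
    name the FDR method that was actually used.
    """
    m = len(p_values)
    # Rank tests from smallest to largest p-value, remembering original positions.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= (k / m) * fdr.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            cutoff_rank = rank
    # Every test ranked at or below the cutoff is declared significant.
    significant = [False] * m
    for idx in order[:cutoff_rank]:
        significant[idx] = True
    return significant


# Hypothetical gene-level p-values, for illustration only.
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
flags = benjamini_hochberg(example_p, fdr=0.05)
```

With these made-up inputs and a 5% threshold, only the two smallest p-values pass. A stricter threshold such as the 1% used in the study would shrink the per-rank cutoffs proportionally.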


Resilience to dominant genetic disease in the healthy elderly

10.1101/19006932 ◽ 2019

Author(s): Paul Lacaze ◽ Robert Sebra ◽ Moeen Riaz ◽ Jane Tiller ◽ Jerico Revote ◽ ...

Keyword(s): Genetic Disease ◽ The Elderly ◽ Prior History ◽ Disease Genes ◽ UK Biobank ◽ Healthy Elderly ◽ Pathogenic Variants ◽ High Genetic Risk ◽ Dominant Disease ◽ The UK

ABSTRACT Here we describe genomic screening of the healthy elderly to identify those resilient to adult-onset genetic disease, despite being at exceptionally high genetic risk. We sequenced 13,131 individuals aged 70 or older (mean age 75 years) from the ASPirin in Reducing Events in the Elderly (ASPREE) trial. Participants had no prior history of cardiovascular disease, life-threatening cancer, persistent physical disability or dementia. We compared the prevalence of pathogenic variants in medically actionable autosomal dominant disease genes with that from the UK Biobank population, and assessed their clinical impact using personal medical history and adjudicated study outcomes during 4.5 years of follow-up. The frequency of pathogenic variants was less than reported among the younger UK Biobank population, suggesting these variants confer a survival disadvantage during the middle years of life. Yet we identified 141 individuals with pathogenic variants free of any associated disease up to average age 79.5 years. Further study of these elderly resilient individuals might help uncover genetic mechanisms that protect against the development of disease.


Whole exome sequencing and characterization of coding variation in 49,960 individuals in the UK Biobank

10.1101/572347 ◽ 2019 ◽ Cited By ~ 50

Author(s): Cristopher V. Van Hout ◽ Ioanna Tachmazidou ◽ Joshua D. Backman ◽ Joshua X. Hoffman ◽ Bin Ye ◽ ...

Keyword(s): Exome Sequencing ◽ Large Scale ◽ Sequence Data ◽ Varicose Veins ◽ Large Population ◽ UK Biobank ◽ Loss Of Function ◽ Phenotypic Data ◽ Pathogenic Variants ◽ The UK

SUMMARY The UK Biobank is a prospective study of 502,543 individuals, combining extensive phenotypic and genotypic data with streamlined access for researchers around the world. Here we describe the first tranche of large-scale exome sequence data for 49,960 study participants, revealing approximately 4 million coding variants (of which ~98.4% have frequency < 1%). The data includes 231,631 predicted loss of function variants, a >10-fold increase compared to imputed sequence for the same participants. Nearly all genes (>97%) had ≥1 predicted loss of function carrier, and most genes (>69%) had ≥10 loss of function carriers. We illustrate the power of characterizing loss of function variation in this large population through association analyses across 1,741 phenotypes. In addition to replicating a range of established associations, we discover novel loss of function variants with large effects on disease traits, including PIEZO1 on varicose veins, COL6A1 on corneal resistance, MEPE on bone density, and IQGAP2 and GMPR on blood cell traits. We further demonstrate the value of exome sequencing by surveying the prevalence of pathogenic variants of clinical significance in this population, finding that 2% of the population has a medically actionable variant. Additionally, we leverage the phenotypic data to characterize the relationship between rare BRCA1 and BRCA2 pathogenic variants and cancer risk. Exomes from the first 49,960 participants are now made accessible to the scientific community and highlight the promise offered by genomic sequencing in large-scale population-based studies.


Use of SNP chips to detect rare pathogenic variants: retrospective, population based diagnostic evaluation

BMJ ◽ 10.1136/bmj.n214 ◽ 2021 ◽ pp. n214

Author(s): Weedon MN ◽ Jackson L ◽ Harrison JW ◽ Ruth KS ◽ Tyrrell J ◽ ...

Keyword(s): Positive Predictive Value ◽ Predictive Value ◽ Population Based ◽ Genome Project ◽ Personal Genome ◽ UK Biobank ◽ Sequencing Data ◽ SNP Chip ◽ Pathogenic Variants ◽ The UK

Abstract Objective To determine whether the sensitivity and specificity of SNP chips are adequate for detecting rare pathogenic variants in a clinically unselected population. Design Retrospective, population based diagnostic evaluation. Participants 49 908 people recruited to the UK Biobank with SNP chip and next generation sequencing data, and an additional 21 people who purchased consumer genetic tests and shared their data online via the Personal Genome Project. Main outcome measures Genotyping (that is, identification of the correct DNA base at a specific genomic location) using SNP chips versus sequencing, with results split by frequency of that genotype in the population. Rare pathogenic variants in the BRCA1 and BRCA2 genes were selected as an exemplar for detailed analysis of clinically actionable variants in the UK Biobank, and BRCA related cancers (breast, ovarian, prostate, and pancreatic) were assessed in participants through use of cancer registry data. Results Overall, genotyping using SNP chips performed well compared with sequencing; sensitivity, specificity, positive predictive value, and negative predictive value were all above 99% for 108 574 common variants directly genotyped on the SNP chips and sequenced in the UK Biobank. However, the likelihood of a true positive result decreased dramatically with decreasing variant frequency; for variants that are very rare in the population, with a frequency below 0.001% in UK Biobank, the positive predictive value was very low and only 16% of 4757 heterozygous genotypes from the SNP chips were confirmed with sequencing data. Results were similar for SNP chip data from the Personal Genome Project, and 20/21 individuals analysed had at least one false positive rare pathogenic variant that had been incorrectly genotyped. 
For pathogenic variants in the BRCA1 and BRCA2 genes, which are individually very rare, the overall performance metrics for the SNP chips versus sequencing in the UK Biobank were: sensitivity 34.6%, specificity 98.3%, positive predictive value 4.2%, and negative predictive value 99.9%. Rates of BRCA related cancers in UK Biobank participants with a positive SNP chip result were similar to those for age matched controls (odds ratio 1.31, 95% confidence interval 0.99 to 1.71) because the vast majority of variants were false positives, whereas sequence positive participants had a significantly increased risk (odds ratio 4.05, 2.72 to 6.03). Conclusions SNP chips are extremely unreliable for genotyping very rare pathogenic variants and should not be used to guide health decisions without validation.
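The contrast between a specificity of 98.3% and a positive predictive value of only 4.2% follows from how rare the variants are. The sketch below applies the standard Bayes relationship between sensitivity, specificity, prevalence and predictive values. The sensitivity and specificity are the figures quoted in the abstract; the prevalence values are hypothetical inputs chosen for illustration and are not reported in the abstract.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied to a population with the given prevalence."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Sensitivity and specificity as quoted for BRCA1/BRCA2 on SNP chips.
# The prevalence values are assumptions, not figures from the study.
for prevalence in (0.1, 0.01, 0.002):
    ppv, npv = predictive_values(0.346, 0.983, prevalence)
    print(f"prevalence={prevalence:.3f}  PPV={ppv:.3f}  NPV={npv:.4f}")
```

As the assumed prevalence falls, false positives from the much larger non-carrier group swamp the true positives, so the PPV drops sharply even though specificity is unchanged.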


COMPARISON OF ATHEROSCLEROTIC CARDIOVASCULAR DISEASE RISK PREDICTION BY LIPOPROTEIN(A) LEVELS BETWEEN PERSONS WITH AND WITHOUT PRIOR CARDIOVASCULAR DISEASE: THE UK BIOBANK

Journal of the American College of Cardiology ◽ 10.1016/s0735-1097(21)02842-4 ◽ 2021 ◽ Vol 77 (18) ◽ pp. 1484

Author(s): Nathan D. Wong ◽ Yanglu Zhao ◽ Ailin Barseghian El-Farra ◽ Michael Wilkinson

Keyword(s): Cardiovascular Disease ◽ Risk Prediction ◽ Disease Risk ◽ Cardiovascular Disease Risk ◽ Atherosclerotic Cardiovascular Disease ◽ UK Biobank ◽ Lipoprotein A ◽ Atherosclerotic Cardiovascular Disease Risk ◽ The UK


Advanced cardiometabolic & inflammatory markers for prediction of cardiovascular disease and cancer

European Heart Journal ◽ 10.1093/ehjci/ehaa946.2921 ◽ 2020 ◽ Vol 41 (Supplement_2)

Author(s): D Radenkovic ◽ S.C Chawla ◽ G Botta ◽ A Boli ◽ M.B Banach ◽ ...

Keyword(s): Cardiovascular Disease ◽ Risk Score ◽ Cancer Incidence ◽ Risk Function ◽ Patient Characteristics ◽ Risk Scores ◽ UK Biobank ◽ Training Set ◽ All Cause Mortality ◽ The UK

Abstract The two leading causes of mortality worldwide are cardiovascular disease (CVD) and cancer. The annual total cost of CVD and cancer is an estimated $844.4 billion in the US and is projected to double by 2030. Thus, there has been an increased shift to preventive medicine to improve health outcomes and to the development of risk scores, which allow early identification of individuals at risk in order to target personalised interventions and prevent disease. Our aim was to define a Risk Score R(x) which, given the baseline characteristics of an individual, outputs the relative risk for composite CVD, cancer incidence and all-cause mortality. A non-linear model was used to calculate risk scores based on the participants of the UK Biobank (n = 502,548). The model used parameters including patient characteristics (age, sex, ethnicity), baseline conditions, lifestyle factors of diet and physical activity, blood pressure, metabolic markers and advanced lipid variables, including ApoA, ApoB and lipoprotein(a), as input. The risk score was defined by normalising the risk function by a fixed value, the average risk of the training set. To fit the non-linear model, >400,000 participants were used as the training set and >45,000 participants were used as the test set for validation. The exponent of the risk function was represented as a multilayer neural network. This allowed capturing interdependent behaviour of covariates, training a single model for all outcomes, and preserving heterogeneity of the groups, in contrast to the CoxPH models traditionally used in risk scores, which require homogeneous groups. The model was trained over 60 epochs and predictive performance was determined by the C-index, with standard errors and confidence intervals estimated by bootstrap sampling. By inputting the variables described, one can obtain personalised hazard ratios for the 3 major outcomes of CVD, cancer and all-cause mortality. Therefore, an individual with a risk score of, e.g., 1.5 has at any time a 50% higher chance than average of experiencing the corresponding event. The proposed model showed the following discrimination on the validation set: risk of CVD (C-index = 0.8006), cancer incidence (C-index = 0.6907), and all-cause mortality (C-index = 0.7770). The CVD model is particularly strong (C-index >0.8) and is an improvement on a previous CVD risk prediction model, also based on classical risk factors with total cholesterol and HDL-c on the UK Biobank data (C-index = 0.7444), published last year (Welsh et al. 2019). Unlike classically used CoxPH models, our model considers correlation of variables, as shown by the table of correlation values in Figure 1. This is an accurate model that is based on the most comprehensive set of patient characteristics and biomarkers, allowing clinicians to identify multiple targets for improvement and practice active preventive cardiology in the era of precision medicine. Figure 1. Correlation of variables in the R(x). Funding Acknowledgement: Type of funding source: None
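The normalisation step described in the abstract (risk function divided by the average risk of the training set) can be sketched as follows. This is only an illustration under stated assumptions: the abstract says the exponent of the risk function is produced by a multilayer neural network, whereas here the exponent is simply passed in as a number, and the names `make_risk_score` and `training_exponents` are hypothetical.

```python
import math


def make_risk_score(training_exponents):
    """Build a risk-score function normalised by the training-set average risk.

    Each exponent stands in for the output of the model for one training
    participant; in the abstract this comes from a neural network, which is
    not reproduced here.
    """
    # Risk function is the exponential of the model output.
    average_risk = sum(math.exp(e) for e in training_exponents) / len(training_exponents)

    def risk_score(exponent):
        # A score of 1.0 means average risk; 1.5 means 50% above average.
        return math.exp(exponent) / average_risk

    return risk_score


# Hypothetical training-set exponents, for illustration only.
score = make_risk_score([0.0, 0.0, 0.0])
example = score(math.log(1.5))
```

Because the divisor is a fixed value computed once on the training set, scores for new individuals are directly comparable and can be read as multiples of the average risk.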


Independent and joint associations of grip strength and adiposity with all-cause and cardiovascular disease mortality in 403,199 adults: the UK Biobank study

American Journal of Clinical Nutrition ◽ 10.3945/ajcn.117.156851 ◽ 2017 ◽ pp. ajcn156851 ◽ Cited By ~ 11

Author(s): Youngwon Kim ◽ Katrien Wijndaele ◽ Duck-chul Lee ◽ Stephen J Sharp ◽ Nick Wareham ◽ ...

Keyword(s): Cardiovascular Disease ◽ Grip Strength ◽ UK Biobank ◽ Cardiovascular Disease Mortality ◽ Disease Mortality ◽ The UK ◽ Joint Associations


Coffee Consumption and Cardiovascular Diseases: A Mendelian Randomization Study

Nutrients ◽ 10.3390/nu13072218 ◽ 2021 ◽ Vol 13 (7) ◽ pp. 2218

Author(s): Shuai Yuan ◽ Paul Carter ◽ Amy M. Mason ◽ Stephen Burgess ◽ Susanna C. Larsson

Keyword(s): Cardiovascular Disease ◽ Cardiovascular Diseases ◽ Intracerebral Hemorrhage ◽ Genetic Variants ◽ Odds Ratio ◽ Observational Studies ◽ Mendelian Randomization ◽ Coffee Consumption ◽ UK Biobank ◽ The UK

Coffee consumption has been linked to a lower risk of cardiovascular disease in observational studies, but whether the associations are causal is not known. We conducted a Mendelian randomization investigation to assess the potential causal role of coffee consumption in cardiovascular disease. Twelve independent genetic variants were used to proxy coffee consumption. Summary-level data for the relations between the 12 genetic variants and cardiovascular diseases were taken from the UK Biobank with up to 35,979 cases and the FinnGen consortium with up to 17,325 cases. Genetic predisposition to higher coffee consumption was not associated with any of the 15 studied cardiovascular outcomes in univariable MR analysis. The odds ratio per 50% increase in genetically predicted coffee consumption ranged from 0.97 (95% confidence interval (CI), 0.63, 1.50) for intracerebral hemorrhage to 1.26 (95% CI, 1.00, 1.58) for deep vein thrombosis in the UK Biobank and from 0.86 (95% CI, 0.50, 1.49) for subarachnoid hemorrhage to 1.34 (95% CI, 0.81, 2.22) for intracerebral hemorrhage in FinnGen. The null findings remained in multivariable Mendelian randomization analyses adjusted for genetically predicted body mass index and smoking initiation, except for a suggestive positive association for intracerebral hemorrhage (odds ratio 1.91; 95% CI, 1.03, 3.54) in FinnGen. This Mendelian randomization study showed limited evidence that coffee consumption affects the risk of developing cardiovascular disease, suggesting that previous observational studies may have been confounded.
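The abstract does not state which estimator was used to combine the 12 variants in the univariable analysis. One common choice for summary-level Mendelian randomization is the inverse-variance weighted (IVW) estimator, sketched below as an assumption rather than as the authors' method. All input values are made up for illustration, and the function name is hypothetical.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance weighted MR estimate and its standard error.

    beta_exposure: per-variant associations with the exposure
    beta_outcome:  per-variant associations with the outcome (log odds)
    se_outcome:    standard errors of the outcome associations
    """
    # Per-variant Wald ratios: outcome effect per unit of exposure effect.
    ratios = [by / bx for bx, by in zip(beta_exposure, beta_outcome)]
    # First-order weights: inverse variance of each Wald ratio.
    weights = [bx ** 2 / se ** 2 for bx, se in zip(beta_exposure, se_outcome)]
    total_weight = sum(weights)
    estimate = sum(w * r for w, r in zip(weights, ratios)) / total_weight
    std_error = math.sqrt(1.0 / total_weight)
    return estimate, std_error


# Hypothetical summary statistics for three variants, for illustration only.
estimate, std_error = ivw_estimate(
    beta_exposure=[0.1, 0.2, 0.3],
    beta_outcome=[0.05, 0.10, 0.15],
    se_outcome=[0.01, 0.02, 0.03],
)
# Convert the log-odds estimate to an odds ratio with a 95% confidence interval.
odds_ratio = math.exp(estimate)
ci_low = math.exp(estimate - 1.96 * std_error)
ci_high = math.exp(estimate + 1.96 * std_error)
```

A confidence interval that spans 1, as in most of the results reported above, is what a null finding looks like on the odds ratio scale.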


Association of Lichen Planus with Cardiovascular Disease: A Combined Analysis in the UK Biobank and All of Us Study

Journal of the American Academy of Dermatology ◽ 10.1016/j.jaad.2021.09.030 ◽ 2021

Author(s): Audrey C. Leasure ◽ Julian N. Acosta ◽ Lauren H. Sansing ◽ Kevin N. Sheth ◽ Jeffrey M. Cohen ◽ ...

Keyword(s): Cardiovascular Disease ◽ Lichen Planus ◽ UK Biobank ◽ Combined Analysis ◽ The UK


Caffeinated Coffee and Tea Consumption, Genetic Variation and Cognitive Function in the UK Biobank

Journal of Nutrition ◽ 10.1093/jn/nxaa147 ◽ 2020 ◽ Vol 150 (8) ◽ pp. 2164-2174

Author(s): Marilyn C Cornelis ◽ Sandra Weintraub ◽ Martha Clare Morris

Keyword(s): Genetic Variation ◽ Cognitive Function ◽ Poor Performance ◽ Sociodemographic Factors ◽ Caffeine Intake ◽ Tea Consumption ◽ UK Biobank ◽ Trail Making ◽ Caffeine Metabolism ◽ The UK

ABSTRACT Background Coffee and tea are the major contributors of caffeine in the diet. Evidence points to the premise that caffeine may benefit cognition. Objective We examined the associations of habitual regular coffee or tea and caffeine intake with cognitive function whilst additionally accounting for genetic variation in caffeine metabolism. Methods We included white participants aged 37–73 y from the UK Biobank who provided biological samples and completed touchscreen questionnaires regarding sociodemographic factors, medical history, lifestyle, and diet. Habitual caffeine-containing coffee and tea intake was self-reported in cups/day and used to estimate caffeine intake. Between 97,369 and 445,786 participants with data also completed ≥1 of 7 self-administered cognitive functioning tests using a touchscreen system (2006–2010) or on home computers (2014). Multivariable regressions were used to examine the association between coffee, tea, or caffeine intake and cognition test scores. We also tested interactions between coffee, tea, or caffeine intake and a genetic-based caffeine-metabolism score (CMS) on cognitive function. Results After multivariable adjustment, performance on the reaction time, Pairs Matching, Trail Making test B, and symbol digit substitution tests decreased significantly with consumption of 1 or more cups of coffee (all tests P-trend < 0.0001). Tea consumption was associated with poor performance on all tests (P-trend < 0.0001). No statistically significant CMS × tea, CMS × coffee, or CMS × caffeine interactions were observed. Conclusions Our findings, based on the participants of the UK Biobank, provide little support for habitual consumption of regular coffee or tea and caffeine in improving cognitive function. On the contrary, we observed decrements in performance with intakes of these beverages, which may be a result of confounding. Whether habitual caffeine intake affects cognitive function therefore remains to be tested.


Whole-Exome Sequencing Identifies Novel Variants for Tooth Agenesis

Journal of Dental Research ◽ 10.1177/0022034517724149 ◽ 2017 ◽ Vol 97 (1) ◽ pp. 49-59 ◽ Cited By ~ 16

Author(s): N. Dinckan ◽ R. Du ◽ L.E. Petty ◽ Z. Coban-Akdemir ◽ S.N. Jhangiani ◽ ...

Keyword(s): Exome Sequencing ◽ Whole Exome Sequencing ◽ Permanent Teeth ◽ Tooth Agenesis ◽ Disease Genes ◽ Nonsense Mediated Decay ◽ Pathogenic Variants ◽ Whole Exome ◽ Novel Variants ◽ Complex Inheritance

Tooth agenesis is a common craniofacial abnormality in humans and represents failure to develop 1 or more permanent teeth. Tooth agenesis is complex, and variations in about a dozen genes have been reported as contributing to the etiology. Here, we combined whole-exome sequencing, array-based genotyping, and linkage analysis to identify putative pathogenic variants in candidate disease genes for tooth agenesis in 10 multiplex Turkish families. Novel homozygous and heterozygous variants in the LRP6, DKK1, LAMA3, and COL17A1 genes, as well as known variants in WNT10A, were identified as likely pathogenic in isolated tooth agenesis. Novel variants in KREMEN1 were identified as likely pathogenic in 2 families with suspected syndromic tooth agenesis. Variants in more than 1 gene were identified segregating with tooth agenesis in 2 families, suggesting oligogenic inheritance. Structural modeling of missense variants suggests deleterious effects on the encoded proteins. Functional analysis of an indel variant (c.3607+3_6del) in LRP6 suggested that the predicted resulting mRNA is subject to nonsense-mediated decay. Our results support a major role for WNT pathway genes in the etiology of tooth agenesis while revealing new candidate genes. Moreover, oligogenic cosegregation was suggestive of complex inheritance and potentially complex gene product interactions during development, contributing to improved understanding of the genetic etiology of familial tooth agenesis.
