Efficient identification of trait-associated loss-of-function variants in the UK Biobank cohort by exome-sequencing based genotype imputation

The large-scale, open-access whole-exome sequencing (WES) data from ~200,000 UK Biobank (UKB) participants are accelerating a new wave of genetic association studies that aim to identify rare, functional loss-of-function (LoF) variants associated with a broad range of complex traits and diseases. However, the community lacks stringent replication of new associations. In this study, we proposed merging the WES genotypes and the genome-wide genotyping (GWAS) genotypes of 167,000 UKB Caucasian participants into a combined reference panel, and then imputing 241,911 UKB Caucasian participants who had GWAS genotypes only. We then proposed using the imputed data to replicate associations identified in the discovery WES sample. Using a leave-100-out imputation strategy in the reference panel, we showed that the average imputation accuracy (r2) is modest to high for LoF variants in all minor allele frequency (MAF) intervals, including ultra-rare ones: 0.942 at MAF interval [1%, 50%], 0.807 at [0.1%, 1.0%), 0.805 at [0.01%, 0.1%), 0.664 at [0.001%, 0.01%) and 0.410 at (0, 0.001%). As applications, we studied single-variant-level and gene-level associations of LoF variants with estimated heel BMD (eBMD) and 4 lipid traits: high-density-lipoprotein cholesterol (HDL-C), low-density-lipoprotein cholesterol (LDL-C), triglycerides (TG) and total cholesterol (TC). In addition to replicating dozens of previously reported genes, such as MEPE for eBMD and PCSK9 for more than one lipid trait, the results identified 2 novel gene-level associations for HDL-C: PLIN1 (cumulative MAF=0.10%, discovery BETA=0.38, P=1.20×10−13; replication BETA=0.25, P=1.03×10−6) and ANGPTL3 (cumulative MAF=0.10%, discovery BETA=−0.36, P=4.70×10−11; replication BETA=−0.30, P=6.60×10−11), as well as one novel single-variant-level association for TG (11:14843853:C:T, PDE3B, MAF=0.11%, discovery BETA=−0.31, P=2.70×10−9; replication BETA=−0.31, P=8.80×10−14).
Our results highlight the strength of WES-based genotype imputation and provide useful imputed data within the UKB cohort.
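The abstract reports imputation accuracy as r2 but does not define it. A minimal sketch, assuming r2 means the squared Pearson correlation between masked true genotypes (0/1/2 alternate-allele counts) and imputed dosages; the function name and inputs are hypothetical and are not taken from the study:

```python
def imputation_r2(true_genotypes, imputed_dosages):
    """Squared Pearson correlation between true genotypes and imputed dosages.

    Assumption: genotypes are alt-allele counts (0, 1, 2) and dosages are
    expected alt-allele counts in [0, 2] for the same samples, in the same order.
    """
    n = len(true_genotypes)
    if n != len(imputed_dosages) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 samples")
    mean_t = sum(true_genotypes) / n
    mean_d = sum(imputed_dosages) / n
    cov = sum((t - mean_t) * (d - mean_d)
              for t, d in zip(true_genotypes, imputed_dosages))
    var_t = sum((t - mean_t) ** 2 for t in true_genotypes)
    var_d = sum((d - mean_d) ** 2 for d in imputed_dosages)
    if var_t == 0 or var_d == 0:
        # r2 is undefined when either vector is constant (e.g. no carriers)
        return float("nan")
    return (cov * cov) / (var_t * var_d)
```

For ultra-rare variants the true-genotype vector is almost entirely zeros, so a single mis-imputed carrier can shift this quantity substantially, which is in line with the lower accuracy reported for the rarest MAF bin.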

Baseline cardiometabolic profiles and SARS-CoV-2 infection in the UK Biobank

PLoS ONE ◽

10.1371/journal.pone.0248602 ◽

2021 ◽

Vol 16 (4) ◽

pp. e0248602 ◽

Cited By ~ 1

Author(s):

Ryan J. Scalsky ◽

Yi-Ju Chen ◽

Karan Desai ◽

Jeffery R. O’Connell ◽

James A. Perry ◽

...

Keyword(s):

Risk Factors ◽

Type Ii Diabetes ◽

Cardiometabolic Risk ◽

Low Density Lipoprotein ◽

Density Lipoprotein ◽

Lipoprotein Cholesterol ◽

Type Ii ◽

Uk Biobank ◽

The Uk ◽

The Impact

Background: SARS-CoV-2 is a rapidly spreading coronavirus responsible for the Covid-19 pandemic, which is characterized by severe respiratory infection. Many risk factors for SARS-CoV-2 infection have been identified, with much early attention being paid to body mass index (BMI), a well-known cardiometabolic risk factor. Objective: This study seeks to examine the impact of additional baseline cardiometabolic risk factors, including high density lipoprotein-cholesterol (HDL-C), low density lipoprotein-cholesterol (LDL-C), Apolipoprotein A-I (ApoA-I), Apolipoprotein B (ApoB), triglycerides, hemoglobin A1c (HbA1c) and diabetes, on the odds of testing positive for SARS-CoV-2 in UK Biobank (UKB) study participants. Methods: We examined the effect of BMI, lipid profiles, diabetes and alcohol intake on the odds of testing positive for SARS-CoV-2 among 9,005 UKB participants tested for SARS-CoV-2 from March 16 through July 14, 2020. Odds ratios and 95% confidence intervals were computed using logistic regression adjusted for age, sex and ancestry. Results: Higher BMI, Type II diabetes and HbA1c were associated with increased SARS-CoV-2 odds (p < 0.05), while HDL-C and ApoA-I were associated with decreased odds (p < 0.001). Although the effects of BMI, Type II diabetes and HbA1c were eliminated when HDL-C was controlled for, the effect of HDL-C remained significant when BMI was controlled for. LDL-C, ApoB and triglyceride levels were not found to be significantly associated with increased odds. Conclusion: Elevated HDL-C and ApoA-I levels were associated with reduced odds of testing positive for SARS-CoV-2, while higher BMI, type II diabetes and HbA1c were associated with increased odds. The effects of BMI, type II diabetes and HbA1c levels were no longer significant after controlling for HDL-C, suggesting that these effects may be mediated in part through regulation of HDL-C levels.
In summary, our study suggests that baseline HDL-C level may be useful for stratifying SARS-CoV-2 infection risk and corroborates the emerging picture that HDL-C may confer protection against sepsis in general and SARS-CoV-2 in particular.
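The odds ratios and 95% confidence intervals mentioned in the Methods are conventionally obtained by exponentiating a logistic regression coefficient and its Wald interval. A minimal sketch of that conversion, assuming a normal approximation; the function name is hypothetical, and no coefficient values from the study are used:

```python
from math import exp

def odds_ratio_ci(beta, se, z=1.96):
    """Convert a log-odds coefficient and its standard error to (OR, lower, upper).

    Assumption: Wald-type interval, beta +/- z * se on the log-odds scale,
    with z = 1.96 for an approximate 95% interval.
    """
    if se < 0:
        raise ValueError("standard error must be non-negative")
    return exp(beta), exp(beta - z * se), exp(beta + z * se)
```

An interval that excludes 1 corresponds to a two-sided p-value below roughly 0.05 under the same approximation.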

Association of pre-pandemic high-density lipoprotein cholesterol with risk of COVID-19 hospitalisation and death: the UK Biobank cohort study

Preventive Medicine Reports ◽

10.1016/j.pmedr.2021.101461 ◽

2021 ◽

pp. 101461

Author(s):

Camille Lassale ◽

Mark Hamer ◽

Álvaro Hernáez ◽

Catharine R. Gale ◽

G. David Batty

Keyword(s):

Cohort Study ◽

High Density Lipoprotein ◽

High Density Lipoprotein Cholesterol ◽

Density Lipoprotein ◽

High Density ◽

Lipoprotein Cholesterol ◽

Uk Biobank ◽

The Uk

Surveying the contribution of rare variants to the genetic architecture of human disease through exome sequencing of 177,882 UK Biobank participants

10.1101/2020.12.13.422582 ◽

2020 ◽

Author(s):

Quanli Wang ◽

Ryan S. Dhindsa ◽

Keren Carss ◽

Andrew R Harper ◽

Abhishek Nag ◽

...

Keyword(s):

Exome Sequencing ◽

Drug Targets ◽

Rare Variants ◽

Population Based ◽

Uk Biobank ◽

Loss Of Function ◽

Sequencing Data ◽

Phenotypic Data ◽

Protein Coding ◽

The Uk

The UK Biobank (UKB) represents an unprecedented population-based study of 502,543 participants with detailed phenotypic data and linkage to medical records. While the release of genotyping array data for this cohort has bolstered genomic discovery for common variants, the contribution of rare variants to this broad phenotype collection remains relatively unknown. Here, we use exome sequencing data from 177,882 UKB participants to evaluate the association between rare protein-coding variants and 10,533 binary and 1,419 quantitative phenotypes. We performed both a variant-level phenome-wide association study (PheWAS) and a gene-level collapsing analysis-based PheWAS tailored to detecting the aggregate contribution of rare variants. The latter revealed 911 statistically significant gene-phenotype relationships, with a median odds ratio of 15.7 for binary traits. Among the binary trait associations identified using collapsing analysis, 83% were undetectable using single variant association tests, emphasizing the power of collapsing analysis to detect signal in the setting of high allelic heterogeneity. As a whole, these genotype-phenotype associations were significantly enriched for loss-of-function mediated traits and currently approved drug targets. Using these results, we summarise the contribution of rare variants to common diseases in the context of the UKB phenome and provide an example of how novel gene-phenotype associations can aid in therapeutic target prioritisation.
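The idea behind a gene-level collapsing analysis can be sketched as follows: each participant is reduced to a single carrier/non-carrier flag for qualifying variants in a gene, and carrier status is then tested against a binary phenotype. The abstract does not specify the test used, so the two-sided Fisher's exact test below is an assumption, and all function names are hypothetical:

```python
from math import comb

def collapse_carriers(genotypes_by_variant):
    """Per-sample carrier flag: True if the sample carries any qualifying variant.

    genotypes_by_variant: one list per variant, each holding alt-allele
    counts for the same samples in the same order.
    """
    n_samples = len(genotypes_by_variant[0])
    return [any(variant[i] > 0 for variant in genotypes_by_variant)
            for i in range(n_samples)]

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    a: carrier cases, b: carrier controls,
    c: non-carrier cases, d: non-carrier controls.
    """
    carriers, cases, n = a + b, a + c, a + b + c + d

    def table_prob(x):
        # Hypergeometric probability of x carrier cases given fixed margins
        return comb(cases, x) * comb(n - cases, carriers - x) / comb(n, carriers)

    p_obs = table_prob(a)
    lo, hi = max(0, carriers - (n - cases)), min(carriers, cases)
    total = 0.0
    for x in range(lo, hi + 1):
        p = table_prob(x)
        if p <= p_obs * (1 + 1e-9):  # tables at least as extreme as observed
            total += p
    return min(1.0, total)
```

Because many distinct rare variants in the same gene all contribute to one carrier flag, this kind of test can detect a signal that no single variant would reach on its own, which is the allelic-heterogeneity point the abstract makes.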

Exome sequencing and characterization of 49,960 individuals in the UK Biobank

Nature ◽

10.1038/s41586-020-2853-0 ◽

2020 ◽

Vol 586 (7831) ◽

pp. 749-756 ◽

Cited By ~ 5

Author(s):

Cristopher V. Van Hout ◽

Ioanna Tachmazidou ◽

Joshua D. Backman ◽

Joshua D. Hoffman ◽

...

Keyword(s):

Exome Sequencing ◽

Sequence Data ◽

Varicose Veins ◽

Large Population ◽

Clinical Importance ◽

Fold Increase ◽

Uk Biobank ◽

Loss Of Function ◽

Association Analyses ◽

The Uk

The UK Biobank is a prospective study of 502,543 individuals, combining extensive phenotypic and genotypic data with streamlined access for researchers around the world. Here we describe the release of exome-sequence data for the first 49,960 study participants, revealing approximately 4 million coding variants (of which around 98.6% have a frequency of less than 1%). The data include 198,269 autosomal predicted loss-of-function (LOF) variants, a more than 14-fold increase compared to the imputed sequence. Nearly all genes (more than 97%) had at least one carrier with a LOF variant, and most genes (more than 69%) had at least ten carriers with a LOF variant. We illustrate the power of characterizing LOF variants in this population through association analyses across 1,730 phenotypes. In addition to replicating established associations, we found novel LOF variants with large effects on disease traits, including PIEZO1 on varicose veins, COL6A1 on corneal resistance, MEPE on bone density, and IQGAP2 and GMPR on blood cell traits. We further demonstrate the value of exome sequencing by surveying the prevalence of pathogenic variants of clinical importance, and show that 2% of this population has a medically actionable variant. Furthermore, we characterize the penetrance of cancer in carriers of pathogenic BRCA1 and BRCA2 variants. Exome sequences from the first 49,960 participants highlight the promise of genome sequencing in large population-based studies and are now accessible to the scientific community.

Whole exome sequencing and characterization of coding variation in 49,960 individuals in the UK Biobank

10.1101/572347 ◽

2019 ◽

Cited By ~ 50

Author(s):

Cristopher V. Van Hout ◽

Ioanna Tachmazidou ◽

Joshua D. Backman ◽

Joshua X. Hoffman ◽

Bin Ye ◽

...

Keyword(s):

Exome Sequencing ◽

Large Scale ◽

Sequence Data ◽

Varicose Veins ◽

Large Population ◽

Uk Biobank ◽

Loss Of Function ◽

Phenotypic Data ◽

Pathogenic Variants ◽

The Uk

The UK Biobank is a prospective study of 502,543 individuals, combining extensive phenotypic and genotypic data with streamlined access for researchers around the world. Here we describe the first tranche of large-scale exome sequence data for 49,960 study participants, revealing approximately 4 million coding variants (of which ~98.4% have frequency < 1%). The data include 231,631 predicted loss of function variants, a >10-fold increase compared to imputed sequence for the same participants. Nearly all genes (>97%) had ≥1 predicted loss of function carrier, and most genes (>69%) had ≥10 loss of function carriers. We illustrate the power of characterizing loss of function variation in this large population through association analyses across 1,741 phenotypes. In addition to replicating a range of established associations, we discover novel loss of function variants with large effects on disease traits, including PIEZO1 on varicose veins, COL6A1 on corneal resistance, MEPE on bone density, and IQGAP2 and GMPR on blood cell traits. We further demonstrate the value of exome sequencing by surveying the prevalence of pathogenic variants of clinical significance in this population, finding that 2% of the population has a medically actionable variant. Additionally, we leverage the phenotypic data to characterize the relationship between rare BRCA1 and BRCA2 pathogenic variants and cancer risk. Exomes from the first 49,960 participants are now made accessible to the scientific community and highlight the promise offered by genomic sequencing in large-scale population-based studies.

Predicting the effect of statins on cancer risk using genetic variants from a Mendelian randomization study in the UK Biobank

eLife ◽

10.7554/elife.57191 ◽

2020 ◽

Vol 9 ◽

Author(s):

Paul Carter ◽

Mathew Vithayathil ◽

Siddhartha Kar ◽

Rahul Potluri ◽

Amy M Mason ◽

...

Keyword(s):

Standard Deviation ◽

Cancer Risk ◽

Genetic Variants ◽

Ldl Cholesterol ◽

Human Genetics ◽

Density Lipoprotein ◽

Lipid Lowering ◽

Uk Biobank ◽

Standard Deviation Increase ◽

The Uk

Laboratory studies have suggested oncogenic roles of lipids, as well as anticarcinogenic effects of statins. Here we assess the potential effect of statin therapy on cancer risk using evidence from human genetics. We obtained associations of lipid-related genetic variants with the risk of overall and 22 site-specific cancers for 367,703 individuals in the UK Biobank. In total, 75,037 individuals had a cancer event. Variants in the HMGCR gene region, which represent proxies for statin treatment, were associated with overall cancer risk (odds ratio [OR] per one standard deviation decrease in low-density lipoprotein [LDL] cholesterol 0.76, 95% confidence interval [CI] 0.65–0.88, p=0.0003), but variants in gene regions representing alternative lipid-lowering treatment targets (PCSK9, LDLR, NPC1L1, APOC3, LPL) were not. Genetically predicted LDL-cholesterol was not associated with overall cancer risk (OR per standard deviation increase 1.01, 95% CI 0.98–1.05, p=0.50). Our results predict that statins reduce cancer risk but other lipid-lowering treatments do not. This suggests that statins reduce cancer risk through a cholesterol-independent pathway.

Prevalence and cardiometabolic correlates of ketohexokinase gene variants among UK Biobank participants

PLoS ONE ◽

10.1371/journal.pone.0247683 ◽

2021 ◽

Vol 16 (2) ◽

pp. e0247683

Author(s):

Joseph A. Johnston ◽

David R. Nelson ◽

Pallav Bhatnagar ◽

Sarah E. Curtis ◽

Yu Chen ◽

...

Keyword(s):

Autosomal Recessive ◽

Compound Heterozygous ◽

Gene Variants ◽

Uk Biobank ◽

Loss Of Function ◽

Recessive Condition ◽

Clinical Consequences ◽

Autosomal Recessive Condition ◽

Clinically Significant ◽

The Uk

Essential fructosuria (EF) is a benign, asymptomatic, autosomal recessive condition caused by loss-of-function variants in the ketohexokinase gene and characterized by intermittent appearance of fructose in the urine. Despite a basic understanding of the genetic and molecular basis of EF, relatively little is known about the long-term clinical consequences of ketohexokinase gene variants. We examined the frequency of ketohexokinase variants in the UK Biobank sample and compared the cardiometabolic profiles of groups of individuals with and without these variants alone or in combination. Study cohorts consisted of groups of participants defined based on the presence of one or more of the five ketohexokinase gene variants tested for in the Affymetrix assays used by the UK Biobank. The rs2304681:G>A (p.Val49Ile) variant was present on more than one-third (36.8%) of chromosomes; other variant alleles were rare (<1%). No participants with the compound heterozygous genotype present in subjects exhibiting the EF phenotype in the literature (Gly40Arg/Ala43Thr) were identified. The rs2304681:G>A (p.Val49Ile), rs41288797 (p.Val188Met), and rs114353144 (p.Val264Ile) variants were more common in white versus non-white participants. Otherwise, few statistically or clinically significant differences were observed after adjustment for multiple comparisons. These findings reinforce the current understanding of EF as a rare, benign, autosomal recessive condition.

Causal Associations Between Blood Lipids and COVID-19 Risk: A Two-Sample Mendelian Randomization Study

Arteriosclerosis Thrombosis and Vascular Biology ◽

10.1161/atvbaha.121.316324 ◽

2021 ◽

Author(s):

Kun Zhang ◽

Shan-Shan Dong ◽

Yan Guo ◽

Shi-Hao Tang ◽

Hao Wu ◽

...

Keyword(s):

Total Cholesterol ◽

Blood Lipids ◽

Mendelian Randomization ◽

Low Density Lipoprotein ◽

Density Lipoprotein ◽

Outcome Data ◽

Causal Effects ◽

Lipoprotein Cholesterol ◽

Global Pandemic ◽

The Uk

Objective: Coronavirus disease 2019 (COVID-19) is a global pandemic caused by the severe acute respiratory syndrome coronavirus 2. It has been reported that dyslipidemia is correlated with COVID-19, and that blood lipid levels, including total cholesterol, HDL-C (high-density lipoprotein cholesterol), and LDL-C (low-density lipoprotein cholesterol) levels, are significantly associated with disease severity. However, the causal effects of blood lipids on COVID-19 are not clear. Approach and Results: We performed 2-sample Mendelian randomization (MR) analyses to explore the causal effects of blood lipids on COVID-19 susceptibility and severity. Using the outcome data from the UK Biobank (1221 cases and 4117 controls), we observed potential positive causal effects of dyslipidemia (odds ratio [OR], 1.27 [95% CI, 1.08–1.49], P=3.18×10−3), total cholesterol (OR, 1.19 [95% CI, 1.07–1.32], P=8.54×10−4), and ApoB (apolipoprotein B; OR, 1.18 [95% CI, 1.07–1.29], P=1.01×10−3) on COVID-19 susceptibility after Bonferroni correction. In addition, the effects of total cholesterol (OR, 1.01 [95% CI, 1.00–1.02], P=2.29×10−2) and ApoB (OR, 1.01 [95% CI, 1.00–1.02], P=2.22×10−2) on COVID-19 susceptibility were also identified using outcome data from the host genetics initiative (14 134 cases and 1 284 876 controls). Conclusions: We found that higher total cholesterol and ApoB levels might increase the risk of COVID-19 infection.
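The abstract does not say which 2-sample MR estimator was used. As an illustration only, the sketch below assumes the commonly used fixed-effect inverse-variance weighted (IVW) estimator, which combines per-variant associations with the exposure (a lipid trait) and with the outcome (COVID-19) taken from separate samples; the function name and inputs are hypothetical:

```python
from math import sqrt

def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW causal estimate from per-variant summary statistics.

    beta_exposure: variant effects on the exposure
    beta_outcome:  variant effects on the outcome (log-odds for a binary outcome)
    se_outcome:    standard errors of beta_outcome
    Returns (beta_ivw, se_ivw); exp(beta_ivw) is the odds ratio per unit exposure.
    """
    if not beta_exposure or not (
        len(beta_exposure) == len(beta_outcome) == len(se_outcome)
    ):
        raise ValueError("inputs must be non-empty and of equal length")
    weights = [1.0 / (se * se) for se in se_outcome]
    numerator = sum(w * bx * by
                    for w, bx, by in zip(weights, beta_exposure, beta_outcome))
    denominator = sum(w * bx * bx for w, bx in zip(weights, beta_exposure))
    if denominator == 0:
        raise ValueError("instruments have no effect on the exposure")
    return numerator / denominator, sqrt(1.0 / denominator)
```

With a single instrument this reduces to the Wald ratio (outcome effect divided by exposure effect). The estimate is only interpretable as causal under the usual instrumental-variable assumptions, which this sketch does not check.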

LPA and APOE are associated with statin selection in the UK Biobank

10.1101/2020.08.28.272765 ◽

2020 ◽

Author(s):

Adam Lavertu ◽

Gregory McInnes ◽

Yosuke Tanigawa ◽

Russ B Altman ◽

Manuel A. Rivas

Keyword(s):

Statin Therapy ◽

High Intensity ◽

Drug Response ◽

Association Studies ◽

Low Density Lipoprotein ◽

Density Lipoprotein ◽

Treatment Decisions ◽

Genome Wide Association Studies ◽

Uk Biobank ◽

The Uk

Genetics plays a key role in drug response, affecting efficacy and toxicity. Pharmacogenomics aims to understand how genetic variation influences drug response and develop clinical guidelines to aid clinicians in personalized treatment decisions informed by genetics. Although pharmacogenomics has not been broadly adopted into clinical practice, genetics influences treatment decisions regardless. Physicians adjust patient care based on observed response to medication, which may occur as a result of genetic variants harbored by the patient. Here we seek to understand the genetics of drug selection in statin therapy, a class of drugs widely used for high cholesterol treatment. Genetics are known to play an important role in statin efficacy and toxicity, leading to significant changes in patient outcome. We performed genome-wide association studies (GWAS) on statin selection among 59,198 participants in the UK Biobank and found that variants known to influence statin efficacy are significantly associated with statin selection. Specifically, we find that carriers of variants in APOE and LPA that are known to decrease efficacy of treatment are more likely to be on atorvastatin, a stronger statin. Additionally, carriers of the APOE and LPA variants are more likely to be on a higher intensity dose (a dose that reduces low-density lipoprotein cholesterol by greater than 40%) of atorvastatin than non-carriers (APOE: p(high intensity) = 0.16, OR = 1.7, P = 1.64 × 10−4, LPA: p(high intensity) = 0.17, OR = 1.4, P = 1.14 × 10−2). These findings represent the largest genetic association study of statin selection and statin dose association to date and provide evidence for the role of LPA and APOE in statin response, furthering the possibility of personalized statin therapy.

Identity-by-descent detection across 487,409 British samples reveals fine-scale population structure, evolutionary history, and trait associations

10.1101/2020.04.20.029819 ◽

2020 ◽

Cited By ~ 3

Author(s):

Juba Nait Saada ◽

Georgios Kalantzis ◽

Derek Shyr ◽

Martin Robinson ◽

Alexander Gusev ◽

...

Keyword(s):

Population Structure ◽

Exome Sequencing ◽

Evolutionary History ◽

Genetic Relatedness ◽

Uk Biobank ◽

The Past ◽

Wide Range ◽

Shared Ancestry ◽

The Uk ◽

Common Ancestors

Detection of Identical-By-Descent (IBD) segments provides a fundamental measure of genetic relatedness and plays a key role in a wide range of genomic analyses. We developed a new method, called FastSMC, that enables accurate biobank-scale detection of IBD segments transmitted by common ancestors living up to several hundred generations in the past. FastSMC combines a fast heuristic search for IBD segments with accurate coalescent-based likelihood calculations and enables estimating the age of common ancestors transmitting IBD regions. We applied FastSMC to 487,409 phased samples from the UK Biobank and detected the presence of ∼214 billion IBD segments transmitted by shared ancestors within the past 1,500 years. We quantified time-dependent shared ancestry within and across 120 postcodes, obtaining a fine-grained picture of genetic relatedness within the past two millennia in the UK. Sharing of common ancestors strongly correlates with geographic distance, enabling the localization of a sample’s birth coordinates from genomic data. We sought evidence of recent positive selection by identifying loci with unusually strong shared ancestry within recent millennia and we detected 12 genome-wide significant signals, including 7 novel loci. We found IBD sharing to be highly predictive of the sharing of ultra-rare variants in exome sequencing samples from the UK Biobank. Focusing on loss-of-function variation discovered using exome sequencing, we devised an IBD-based association test and detected 29 associations with 7 blood-related traits, 20 of which were not detected in the exome sequencing study. These results underscore the importance of modelling distant relatedness to reveal subtle population structure, recent evolutionary history, and rare pathogenic variation.
