Rare coding variants in MYH6 are associated with atrial fibrillation: results from 45,596 exomes representing the general population

Abstract. Background: Atrial fibrillation (AF) is the most common cardiac arrhythmia and is associated with serious complications, including an increased risk of stroke, heart failure, and death. It affects around 5% of the population above 65 years of age, and an estimated 2% of healthcare expenses are related to AF. The causes of AF are complex and include structural heart disease, hypertension, diabetes, and genetic risk factors. To date, 166 unique genetic loci have been associated with AF. While AF has traditionally been regarded as an electrical disease, structural genes, including the sarcomere gene titin (TTN), have been associated with the disease. Recently, a large genome-wide association study associated common variants in the gene MYH6 with AF. The gene encodes the protein alpha myosin heavy chain and has previously been associated with sick sinus syndrome and structural heart disease. Purpose: We hypothesized that genetic variants in the sarcomere gene MYH6 are more prevalent in AF patients than in non-AF patients, which would support a role for this gene in the development of AF. Methods: We analysed publicly available data from the UK Biobank, combining exome-sequencing data and health-related information on 45,596 participants. Using next-generation sequencing, we then examined the genetic variation in MYH6 in a cohort of 383 Danish early-onset AF patients. The patients had onset of AF before age 40, a normal echocardiogram, and no other cardiovascular disease at onset of AF. Genetic variants were filtered by minor allele frequency (MAF) in the Genome Aggregation Database (gnomAD), and only rare variants with MAF<1% were included. We then predicted the potential deleteriousness of the variants using the combined annotation dependent depletion (CADD) score. Results: Rare coding variants in MYH6 were significantly associated with AF in exome-sequencing data on 45,596 participants from the UK Biobank (p=0.038). In our cohort of 383 Danish early-onset AF patients with no other cardiovascular disease, we identified 12 rare missense variants in MYH6. Of these variants, three were novel, and 11 had CADD scores >20, suggesting that they are among the top 1% of likely deleterious variants. Conclusion: Rare genetic variants in MYH6 were significantly associated with AF in a large population-based cohort. We also identified 12 rare coding variants in a highly selected cohort of early-onset AF patients, most of which were predicted to be deleterious. Our results indicate that rare variants in MYH6 may increase susceptibility to AF, adding to the understanding of the pathophysiological mechanisms of AF and of the role of structural genes in its development. Funding Acknowledgement: Type of funding source: Foundation. Main funding source(s): Novo Nordisk Foundation Pre-Graduate Scholarships
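
The two filtering steps described in the abstract (MAF below 1%, then a CADD score above 20) can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the `Variant` class, its field names, and the example values are all hypothetical, and the CADD score is assumed to be the scaled (Phred-like) score.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    name: str          # hypothetical label, not a real variant identifier
    gnomad_maf: float  # minor allele frequency as a fraction (0.01 == 1%)
    cadd_phred: float  # scaled CADD score (assumed Phred-like)


def rare_variants(variants, maf_cutoff=0.01):
    """Keep variants with MAF strictly below the cutoff (MAF < 1%)."""
    return [v for v in variants if v.gnomad_maf < maf_cutoff]


def likely_deleterious(variants, cadd_cutoff=20.0):
    """Keep variants whose scaled CADD score exceeds the cutoff (> 20)."""
    return [v for v in variants if v.cadd_phred > cadd_cutoff]


# Made-up example values, for illustration only.
example = [
    Variant("var_a", 0.0002, 27.5),
    Variant("var_b", 0.0300, 31.0),  # too common: dropped by the MAF filter
    Variant("var_c", 0.0050, 12.1),  # rare, but CADD score not above 20
]
rare = rare_variants(example)
flagged = likely_deleterious(rare)
```

Applying the frequency filter first mirrors the order given in the abstract; since the two filters are independent, the final set would be the same in either order.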

Assessing the analytical validity of SNP-chips for detecting very rare pathogenic variants: implications for direct-to-consumer genetic testing

10.1101/696799 ◽

2019 ◽

Cited By ~ 14

Author(s):

Michael N Weedon ◽

Leigh Jackson ◽

James W Harrison ◽

Kate S Ruth ◽

Jessica Tyrrell ◽

...

Keyword(s):

Genetic Testing ◽

Genetic Variants ◽

Rare Variants ◽

Uk Biobank ◽

Sequencing Data ◽

Snp Chip ◽

Direct To Consumer ◽

Pathogenic Variants ◽

Variant Frequency ◽

The Uk

Abstract. Objectives: To determine the analytical validity of SNP-chips for genotyping very rare genetic variants. Design: Retrospective study using data from two publicly available resources, the UK Biobank and the Personal Genome Project. Setting: Research biobanks and direct-to-consumer genetic testing in the UK and USA. Participants: 49,908 individuals recruited to UK Biobank, and 21 individuals who purchased consumer genetic tests and shared their data online via the Personal Genome Project. Main outcome measures: We assessed the analytical validity of genotypes from SNP-chips (index test) against sequencing data (reference standard). We evaluated the genotyping accuracy of the SNP-chips and split the results by variant frequency. We went on to select rare pathogenic variants in the BRCA1 and BRCA2 genes as an exemplar for detailed analysis of clinically actionable variants in UK Biobank, and assessed BRCA-related cancers (breast, ovarian, prostate and pancreatic) in participants using cancer registry data. Results: SNP-chip genotype accuracy is high overall; sensitivity, specificity and precision are all >99% for 108,574 common variants directly genotyped by the UK Biobank SNP-chips. However, the likelihood of a true positive result falls dramatically with decreasing variant frequency; for variants with a frequency <0.001% in UK Biobank the precision is very low, and only 16% of 4,711 variants from the SNP-chips are confirmed by sequencing data. Results are similar for SNP-chip data from the Personal Genome Project, and 20/21 individuals have at least one rare pathogenic variant that has been incorrectly genotyped. For pathogenic variants in the BRCA1 and BRCA2 genes, the overall performance metrics of the SNP-chips in UK Biobank are sensitivity 34.6%, specificity 98.3% and precision 4.2%. Rates of BRCA-related cancers in individuals in UK Biobank with a positive SNP-chip result are similar to age-matched controls (OR 1.28, P=0.07, 95% CI: 0.98 to 1.67), while sequence-positive individuals have a significantly increased risk (OR 3.73, P=3.5×10⁻¹², 95% CI: 2.57 to 5.40). Conclusion: SNP-chips are extremely unreliable for genotyping very rare pathogenic variants and should not be used to guide health decisions without validation. Summary box. Section 1: What is already known on this topic: SNP-chips are an accurate and affordable method for genotyping common genetic variants across the genome. They are often used by direct-to-consumer (DTC) genetic testing companies and research studies, but there are several case reports suggesting they perform poorly for genotyping rare genetic variants when compared with sequencing. Section 2: What this study adds: Our study confirms that SNP-chips are highly inaccurate for genotyping rare, clinically actionable variants. Using large-scale SNP-chip and sequencing data from UK Biobank, we show that SNP-chips have a very low precision of <16% for detecting very rare variants (i.e. the majority of variants with population frequency of <0.001% are false positives). We observed a similar performance in a small sample of raw SNP-chip data from DTC genetic tests. Very rare variants assayed using SNP-chips should not be used to guide health decisions without validation.
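
The three performance metrics reported above follow directly from a 2×2 comparison of SNP-chip calls (index test) against sequencing (reference standard). The sketch below shows the standard definitions; the function name and the example counts are hypothetical and are not taken from the study.

```python
def chip_metrics(tp, fp, fn, tn):
    """Return (sensitivity, specificity, precision) from confusion counts.

    tp: chip-positive, sequencing-positive
    fp: chip-positive, sequencing-negative
    fn: chip-negative, sequencing-positive
    tn: chip-negative, sequencing-negative
    """
    def ratio(num, den):
        # Undefined when the denominator is zero; report NaN rather than fail.
        return num / den if den else float("nan")

    sensitivity = ratio(tp, tp + fn)  # share of true carriers the chip detects
    specificity = ratio(tn, tn + fp)  # share of non-carriers called negative
    precision = ratio(tp, tp + fp)    # share of chip-positive calls confirmed
    return sensitivity, specificity, precision


# Made-up counts: a rare variant class where most positive calls are false.
sens, spec, prec = chip_metrics(tp=5, fp=95, fn=5, tn=9895)
```

With these illustrative counts, specificity stays above 99% while precision is only 5%, which is the pattern the abstract describes: when true carriers are very rare, even a small false-positive rate swamps the true positives.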

Novel genotyping algorithms for rare variants significantly improve the accuracy of Applied Biosystems™ Axiom™ array genotyping calls

10.1101/2021.09.13.459984 ◽

2021 ◽

Author(s):

Orna Mizrahi Man ◽

Marcos H Woehrmann ◽

Teresa A Webster ◽

Jeremy Gollub ◽

Adrian Bivol ◽

...

Keyword(s):

Positive Predictive Value ◽

Exome Sequencing ◽

Predictive Value ◽

Rare Variants ◽

Uk Biobank ◽

Sequencing Data ◽

Data Set ◽

Array Data ◽

Exome Sequencing Data ◽

The Uk

Objective: To significantly improve the positive predictive value (PPV) and sensitivity of Applied Biosystems™ Axiom™ array variant calling, by means of novel improvements to genotyping algorithms and careful quality control of array probesets. The improvement makes array genotyping more suitable for very rare variants. Design: Retrospective evaluation of UK Biobank array data re-genotyped with improved algorithms for rare variants. Participants: 488,359 people recruited to the UK Biobank with Axiom array genotyping data, including 200,630 with exome sequencing data. Main Outcome Measures: A comparison of genotyping calls from array data to genotyping calls on a subset of variants with exome sequencing data. Results: Axiom genotyping [18] performed well, based on comparison to sequencing data, for over 100,000 common variants directly genotyped on the Axiom UK Biobank array and also exome sequenced by the UK Biobank Exome Sequencing Consortium. However, in a comparison to the initial exome sequencing results of the first 50K individuals, Weedon et al. [1] observed that when grouping these variants by the minor allele frequency (MAF) observed in UK Biobank, the concordance with sequencing and resulting positive predictive value (PPV) decreased with the number of heterozygous (Het) array calls per variant. An improved genotyping algorithm, Rare Heterozygous Adjustment (RHA) [16], released mid-2020 for genotyping on Axiom arrays, significantly improves PPV in all MAF ranges for the 50K data as well as when compared to the exome sequencing of 200K individuals, released after Weedon et al. [1] performed their comparison. The RHA algorithm improved PPVs in the 200K data in the lowest three frequency groups [0, 0.001%), [0.001%, 0.005%) and [0.005%, 0.01%) to 83%, 82% and 88%, respectively. PPV was above 95% for higher MAF ranges without algorithm improvement. PPVs are somewhat higher in the 200K dataset, due to a different "truth set" from exome sequencing and because monomorphic exome loci are not included in the joint genotyping calls for the 200K data set, as explained in the methods section. Sensitivity was also higher in the 200K data set than in the original 50K data, especially for low MAF ranges. This increase is in part due to the larger data set over which sensitivity could be computed and in part due to the different WES algorithms used for the 200K data [7]. Filtering of a relatively small number of non-performing probesets (determined without reference to the exome sequencing data) significantly improved sensitivities for all MAF ranges, resulting in 70%, 88% and 94%, respectively, in the three lowest MAF ranges and greater than 98% and 99.9% for the two higher MAF ranges ([0.01%, 1%), [1%, 50%]). Conclusions: Improved algorithms for genotyping, along with enhanced quality control of array probesets, significantly improve the positive predictive value and the sensitivity of array data, making it suitable for the detection of very rare variants. The probeset filtering methods developed have resulted in better probe designs for arrays and the new genotyping algorithm is part of the standard algorithm for all Axiom arrays since early 2020.
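
The MAF ranges quoted above are half-open intervals, so a variant sitting exactly on a boundary belongs to the higher range. A minimal sketch of that binning, and of PPV computed per bin, is given below. The function names and the input layout (one tuple per heterozygous array call) are assumptions made for illustration, not the authors' code.

```python
from bisect import bisect_right
from collections import defaultdict

# Upper edges (in percent) of the first four ranges named in the abstract.
EDGES_PCT = [0.001, 0.005, 0.01, 1.0]
LABELS = [
    "[0, 0.001%)",
    "[0.001%, 0.005%)",
    "[0.005%, 0.01%)",
    "[0.01%, 1%)",
    "[1%, 50%]",
]


def maf_bin(maf_pct):
    """Map a MAF given in percent to its half-open range label."""
    if not 0.0 <= maf_pct <= 50.0:
        raise ValueError("MAF must lie between 0% and 50%")
    # bisect_right places a value equal to an edge in the range above it.
    return LABELS[bisect_right(EDGES_PCT, maf_pct)]


def ppv_by_bin(calls):
    """calls: iterable of (maf_pct, confirmed_by_sequencing) per array call."""
    called = defaultdict(int)
    confirmed = defaultdict(int)
    for maf_pct, ok in calls:
        label = maf_bin(maf_pct)
        called[label] += 1
        confirmed[label] += bool(ok)
    return {label: confirmed[label] / called[label] for label in called}


# Made-up calls, for illustration only.
example_ppv = ppv_by_bin([(0.0005, True), (0.0005, False), (2.0, True)])
```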

Assessing the contribution of rare-to-common protein-coding variants to circulating metabolic biomarker levels via 412,394 UK Biobank exome sequences

10.1101/2021.12.24.21268381 ◽

2021 ◽

Author(s):

Abhishek Nag ◽

Lawrence Middleton ◽

Ryan S Dhindsa ◽

Dimitrios Vitsios ◽

Eleanor M Wigmore ◽

...

Keyword(s):

Gene Networks ◽

Rare Variants ◽

Association Studies ◽

Low Frequency ◽

Genome Wide Association Studies ◽

Uk Biobank ◽

Protein Coding ◽

The Uk ◽

Metabolic Biomarkers ◽

Coding Variants

Genome-wide association studies have established the contribution of common and low-frequency variants to metabolic biomarkers in the UK Biobank (UKB); however, the role of rare variants remains to be assessed systematically. We evaluated rare coding variants for 198 metabolic biomarkers, including metabolites assayed by Nightingale Health, using exome sequencing in participants from four genetically diverse ancestries in the UKB (N=412,394). Gene-level collapsing analysis, which evaluated a range of genetic architectures, identified a total of 1,303 significant relationships between genes and metabolic biomarkers (p<1×10⁻⁸), encompassing 207 distinct genes. These include associations between rare non-synonymous variants in GIGYF1 and glucose and lipid biomarkers, SYT7 and creatinine, and others, which may provide insights into novel disease biology. Compared with a previous microarray-based genotyping study in the same cohort, 40% of the gene-biomarker relationships identified in the collapsing analysis were novel. Finally, we applied Gene-SCOUT, a novel tool that utilises the gene-biomarker association statistics from the collapsing analysis to identify genes having similar biomarker fingerprints and thus expand our understanding of gene networks.

Dnmt3a-mutated clonal hematopoiesis promotes osteoporosis

Journal of Experimental Medicine ◽

10.1084/jem.20211872 ◽

2021 ◽

Vol 218 (12) ◽

Author(s):

Peter Geon Kim ◽

Abhishek Niroula ◽

Veronica Shkolnik ◽

Marie McConkey ◽

Amy E. Lin ◽

...

Keyword(s):

Blood Cells ◽

Somatic Mutations ◽

Murine Models ◽

Uk Biobank ◽

Sequencing Data ◽

Mineral Density ◽

Clonal Hematopoiesis ◽

Exome Sequencing Data ◽

Close Proximity ◽

The Uk

Osteoporosis is caused by an imbalance of osteoclasts and osteoblasts, occurring in close proximity to hematopoietic cells in the bone marrow. The presence of recurrent somatic mutations that lead to an expanded population of mutant blood cells is termed clonal hematopoiesis of indeterminate potential (CHIP). Analyzing exome sequencing data from the UK Biobank, we found CHIP to be associated with increased incident osteoporosis diagnoses and decreased bone mineral density. In murine models, hematopoietic-specific mutations in Dnmt3a, the most commonly mutated gene in CHIP, decreased bone mass via increased osteoclastogenesis. Dnmt3a−/− demethylation opened chromatin and altered activity of inflammatory transcription factors. Bone loss was driven by proinflammatory cytokines, including Irf3-NF-κB–mediated IL-20 expression from Dnmt3a mutant macrophages. Increased osteoclastogenesis due to the Dnmt3a mutations was ameliorated by alendronate or IL-20 neutralization. These results demonstrate a novel source of osteoporosis-inducing inflammation.

Surveying the contribution of rare variants to the genetic architecture of human disease through exome sequencing of 177,882 UK Biobank participants

10.1101/2020.12.13.422582 ◽

2020 ◽

Author(s):

Quanli Wang ◽

Ryan S. Dhindsa ◽

Keren Carss ◽

Andrew R Harper ◽

Abhishek Nag ◽

...

Keyword(s):

Exome Sequencing ◽

Drug Targets ◽

Rare Variants ◽

Population Based ◽

Uk Biobank ◽

Loss Of Function ◽

Sequencing Data ◽

Phenotypic Data ◽

Protein Coding ◽

The Uk

The UK Biobank (UKB) represents an unprecedented population-based study of 502,543 participants with detailed phenotypic data and linkage to medical records. While the release of genotyping array data for this cohort has bolstered genomic discovery for common variants, the contribution of rare variants to this broad phenotype collection remains relatively unknown. Here, we use exome sequencing data from 177,882 UKB participants to evaluate the association of rare protein-coding variants with 10,533 binary and 1,419 quantitative phenotypes. We performed both a variant-level phenome-wide association study (PheWAS) and a gene-level collapsing analysis-based PheWAS tailored to detecting the aggregate contribution of rare variants. The latter revealed 911 statistically significant gene-phenotype relationships, with a median odds ratio of 15.7 for binary traits. Among the binary trait associations identified using collapsing analysis, 83% were undetectable using single-variant association tests, emphasizing the power of collapsing analysis to detect signal in the setting of high allelic heterogeneity. As a whole, these genotype-phenotype associations were significantly enriched for loss-of-function mediated traits and currently approved drug targets. Using these results, we summarise the contribution of rare variants to common diseases in the context of the UKB phenome and provide an example of how novel gene-phenotype associations can aid in therapeutic target prioritisation.
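
The idea behind a gene-level collapsing analysis for a binary trait can be sketched as follows: each participant is reduced to a carrier or non-carrier of at least one qualifying variant in the gene, and carrier counts are compared between cases and controls. The abstract does not state which test was used; a two-sided Fisher's exact test is assumed here, and the function names and sample identifiers are hypothetical.

```python
from math import comb


def collapse(carrier_ids, case_ids, control_ids):
    """Build the 2x2 table (a, b, c, d) for one gene.

    a: cases carrying a qualifying variant      b: non-carrier cases
    c: controls carrying a qualifying variant   d: non-carrier controls
    """
    carriers = set(carrier_ids)
    a = len(carriers & set(case_ids))
    c = len(carriers & set(control_ids))
    return a, len(set(case_ids)) - a, c, len(set(control_ids)) - c


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x carriers among the cases.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more likely than the observed one.
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


# Made-up sample identifiers, for illustration only.
table = collapse(["s1", "s2", "s3", "s5"],
                 case_ids={"s1", "s2", "s3", "s4"},
                 control_ids={"s5", "s6", "s7", "s8"})
p_value = fisher_exact_two_sided(*table)
```

Because carriers of different rare variants in the same gene are pooled, this kind of test can detect a signal that no single variant would reach on its own, which is the point the abstract makes about allelic heterogeneity.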

Multiple linear regression allows weighted burden analysis of rare coding variants in an ethnically heterogeneous population

10.1101/2020.06.11.145938 ◽

2020 ◽

Cited By ~ 1

Author(s):

David Curtis

Keyword(s):

Linear Regression ◽

Principal Components ◽

Rare Variants ◽

Linear Regression Analysis ◽

Uk Biobank ◽

Case Control Studies ◽

Test Statistic ◽

Functional Variants ◽

The Uk ◽

Coding Variants

Abstract Weighted burden analysis has been used in exome-sequenced case-control studies to identify genes in which there is an excess of rare and/or functional variants associated with phenotype. Implementation in a ridge regression framework allows simultaneous analysis of all variants along with relevant covariates such as population principal components. In order to apply the approach to a quantitative phenotype, a weighted burden score is derived for each subject and included in a linear regression analysis. The weighting scheme is adjusted in order to apply differential weights to rare and very rare variants, and a score is derived based on both the frequency and predicted effect of each variant. When applied to an ethnically heterogeneous dataset consisting of 49,790 exome-sequenced UK Biobank subjects and using BMI as the phenotype, the method produces a very inflated test statistic. However, this is almost completely corrected by including 20 population principal components as covariates. When this is done, the top 30 genes include a few which are quite plausibly associated with the phenotype, including LYPLAL1 and NSDHL. This approach offers a way to carry out gene-based analyses of rare variants identified by exome sequencing in heterogeneous datasets without requiring that data from ethnic minority subjects be discarded. This research has been conducted using the UK Biobank Resource.
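
A per-subject weighted burden score of the kind described above can be sketched as a sum over the variants a subject carries, with each variant weighted by both its rarity and its predicted effect. The abstract does not give the weighting formula, so the linear frequency weight below, the `max_weight` parameter, and all names and values are purely hypothetical stand-ins.

```python
def frequency_weight(maf, max_weight=10.0):
    """Hypothetical weight: max_weight at MAF 0, falling linearly to 1 at MAF 0.5."""
    if not 0.0 <= maf <= 0.5:
        raise ValueError("MAF must lie between 0 and 0.5")
    return 1.0 + (max_weight - 1.0) * (1.0 - 2.0 * maf)


def burden_score(genotypes, variant_info):
    """Sum of allele count x frequency weight x effect weight over a gene.

    genotypes:    {variant_id: alt-allele count (0, 1 or 2)} for one subject
    variant_info: {variant_id: (maf, effect_weight)}
    """
    score = 0.0
    for variant_id, allele_count in genotypes.items():
        maf, effect_weight = variant_info[variant_id]
        score += allele_count * frequency_weight(maf) * effect_weight
    return score


# Made-up variants: one very rare and damaging, one common and benign.
info = {"v1": (0.0, 5.0), "v2": (0.5, 1.0)}
subject_score = burden_score({"v1": 1, "v2": 2}, info)
```

In the approach the abstract outlines, a score like this would then enter a linear regression on the phenotype alongside covariates such as the population principal components; that regression step is not shown here.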

TH16. A PIPELINE TO PROCESS AND ANALYSE EXOME SEQUENCING DATA FROM 200,000 INDIVIDUALS IN THE UK BIOBANK

European Neuropsychopharmacology ◽

10.1016/j.euroneuro.2021.08.190 ◽

2021 ◽

Vol 51 ◽

pp. e202-e203

Author(s):

Eilidh Fenner ◽

James Walters ◽

Elliott Rees

Keyword(s):

Exome Sequencing ◽

Uk Biobank ◽

Sequencing Data ◽

Exome Sequencing Data ◽

The Uk

Machine Learning Algorithms and Whole Exome Sequencing Data from Breast Cancer Patients in the UK Biobank Predict Survival

10.21203/rs.3.rs-115867/v2 ◽

2020 ◽

Keyword(s):

Breast Cancer ◽

Machine Learning ◽

Machine Learning Algorithms ◽

Uk Biobank ◽

Breast Cancer Patients ◽

Sequencing Data ◽

Exome Sequencing Data ◽

Whole Exome ◽

Whole Exome Sequencing Data ◽

The Uk

Abstract The authors have requested that this preprint be withdrawn due to erroneous posting.

The impact of rare variation on gene expression across tissues

Nature ◽

10.1038/nature24267 ◽

2017 ◽

Vol 550 (7675) ◽

pp. 239-243 ◽

Cited By ~ 112

Author(s):

Xin Li ◽

Yungil Kim ◽

Emily K. Tsang ◽

Joe R. Davis ◽

...

Keyword(s):

Gene Expression ◽

Genetic Variants ◽

Rare Variants ◽

Disease Risk ◽

Association Studies ◽

Genetic Association Studies ◽

Sequencing Data ◽

Common Genetic Variants ◽

Coding Variants ◽

The Impact

Abstract Rare genetic variants are abundant in humans and are expected to contribute to individual disease risk [1-4]. While genetic association studies have successfully identified common genetic variants associated with susceptibility, these studies are not practical for identifying rare variants [1, 5]. Efforts to distinguish pathogenic variants from benign rare variants have leveraged the genetic code to identify deleterious protein-coding alleles [1, 6, 7], but no analogous code exists for non-coding variants. Therefore, ascertaining which rare variants have phenotypic effects remains a major challenge. Rare non-coding variants have been associated with extreme gene expression in studies using single tissues [8-11], but their effects across tissues are unknown. Here we identify gene expression outliers, or individuals showing extreme expression levels for a particular gene, across 44 human tissues by using combined analyses of whole genomes and multi-tissue RNA-sequencing data from the Genotype-Tissue Expression (GTEx) project v6p release [12]. We find that 58% of underexpression and 28% of overexpression outliers have nearby conserved rare variants compared to 8% of non-outliers. Additionally, we developed RIVER (RNA-informed variant effect on regulation), a Bayesian statistical model that incorporates expression data to predict a regulatory effect for rare variants with higher accuracy than models using genomic annotations alone. Overall, we demonstrate that rare variants contribute to large gene expression changes across tissues and provide an integrative method for interpretation of rare variants in individual genomes.

Rare GATA6 variants associated with risk of congenital heart disease phenotypes in 200,000 UK Biobank exomes

10.1101/2021.05.04.21256616 ◽

2021 ◽

Author(s):

Simon G Williams ◽

Dominic Byrne ◽

Bernard Keavney

Keyword(s):

Congenital Heart Disease ◽

Heart Disease ◽

Congenital Heart ◽

Rare Variants ◽

Uk Biobank ◽

Sequencing Data ◽

Chd Risk ◽

Increased Risk ◽

Sequencing Studies ◽

The Uk

Several genes have been associated with congenital heart disease (CHD) risk in previous GWAS and sequencing studies, but studies involving larger numbers of case samples are still needed to improve understanding of what remains a complex and largely uncharacterised genetic etiology. Here we use whole exome sequencing data from 200,000 samples in the UK Biobank to assess ultra-rare and potentially pathogenic variation associated with increased risk of CHD. Our findings indicate that rare variants in GATA6, presumably with a lesser effect on gene function than those causing severe CHD phenotypes, or buffered by other genetic and environmental effects during development, are also associated with minor CHD conditions, specifically bicuspid aortic valve, the most common CHD condition.
