Novel genotyping algorithms for rare variants significantly improve the accuracy of Applied Biosystems™ Axiom™ array genotyping calls

2021 ◽  
Author(s):  
Orna Mizrahi Man ◽  
Marcos H Woehrmann ◽  
Teresa A Webster ◽  
Jeremy Gollub ◽  
Adrian Bivol ◽  
...  

Objective: To significantly improve the positive predictive value (PPV) and sensitivity of Applied Biosystems™ Axiom™ array variant calling by means of novel improvements to genotyping algorithms and careful quality control of array probesets. The improvements make array genotyping more suitable for very rare variants. Design: Retrospective evaluation of UK Biobank array data re-genotyped with improved algorithms for rare variants. Participants: 488,359 people recruited to the UK Biobank with Axiom array genotyping data, including 200,630 with exome sequencing data. Main Outcome Measures: A comparison of array genotyping calls with exome sequencing calls on the subset of variants that have exome sequencing data. Results: Axiom genotyping [18] performed well, based on comparison to sequencing data, for over 100,000 common variants directly genotyped on the Axiom UK Biobank array and also exome sequenced by the UK Biobank Exome Sequencing Consortium. However, in a comparison to the initial exome sequencing results of the first 50K individuals, Weedon et al. [1] observed that when grouping these variants by the minor allele frequency (MAF) observed in UK Biobank, the concordance with sequencing and resulting positive predictive value (PPV) decreased with the number of heterozygous (Het) array calls per variant. An improved genotyping algorithm, Rare Heterozygous Adjustment (RHA) [16], released mid-2020 for genotyping on Axiom arrays, significantly improves PPV in all MAF ranges for the 50K data as well as when compared to the exome sequencing of 200K individuals, released after Weedon et al. [1] performed their comparison. The RHA algorithm improved PPVs in the 200K data in the lowest three frequency groups, [0, 0.001%), [0.001%, 0.005%) and [0.005%, 0.01%), to 83%, 82% and 88%, respectively. PPV was above 95% for higher MAF ranges without algorithm improvement. PPVs are somewhat higher in the 200K dataset, due to a different "truth set" from exome sequencing and because monomorphic exome loci are not included in the joint genotyping calls for the 200K data set, as explained in the methods section. Sensitivity was higher in the 200K data set than in the original 50K data as well, especially for low MAF ranges. This increase is in part due to the larger data set over which sensitivity could be computed and in part due to the different WES algorithms used for the 200K data [7]. Filtering of a relatively small number of non-performing probesets (determined without reference to the exome sequencing data) significantly improved sensitivities for all MAF ranges, resulting in 70%, 88% and 94%, respectively, in the three lowest MAF ranges and greater than 98% and 99.9% for the two higher MAF ranges ([0.01%, 1%), [1%, 50%]). Conclusions: Improved algorithms for genotyping, along with enhanced quality control of array probesets, significantly improve the positive predictive value and the sensitivity of array data, making it suitable for the detection of very rare variants. The probeset filtering methods developed have resulted in better probe designs for arrays, and the new genotyping algorithm has been part of the standard algorithm for all Axiom arrays since early 2020.
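The PPV and sensitivity figures in this abstract come from comparing heterozygous array calls with exome sequencing calls within MAF ranges. A minimal Python sketch of that bookkeeping follows. The function names, the representation of calls as (sample, variant) pairs, and the idea of treating sequencing as the truth set in exactly this way are assumptions for illustration; this is not the authors' pipeline.

```python
from bisect import bisect_right

# Upper edges (in percent) of the MAF ranges quoted in the abstract.
EDGES = [0.001, 0.005, 0.01, 1.0]
LABELS = [
    "[0, 0.001%)",
    "[0.001%, 0.005%)",
    "[0.005%, 0.01%)",
    "[0.01%, 1%)",
    "[1%, 50%]",
]


def maf_bin(maf_percent):
    """Return the label of the MAF range that contains maf_percent."""
    return LABELS[bisect_right(EDGES, maf_percent)]


def ppv_and_sensitivity(array_hets, seq_hets):
    """Compare two sets of (sample, variant) heterozygous calls.

    Sequencing calls are treated as the truth set (an assumption).
    PPV = confirmed array calls / all array calls.
    Sensitivity = confirmed array calls / all sequencing calls.
    """
    true_positives = len(array_hets & seq_hets)
    ppv = true_positives / len(array_hets) if array_hets else float("nan")
    sens = true_positives / len(seq_hets) if seq_hets else float("nan")
    return ppv, sens
```

In practice the calls would first be grouped with `maf_bin` and `ppv_and_sensitivity` applied per group, which is how a single low-frequency bin can show a much lower PPV than the overall figure.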

BMJ ◽  
2021 ◽  
pp. n214
Author(s):  
Weedon MN ◽  
Jackson L ◽  
Harrison JW ◽  
Ruth KS ◽  
Tyrrell J ◽  
...  

Objective: To determine whether the sensitivity and specificity of SNP chips are adequate for detecting rare pathogenic variants in a clinically unselected population. Design: Retrospective, population based diagnostic evaluation. Participants: 49 908 people recruited to the UK Biobank with SNP chip and next generation sequencing data, and an additional 21 people who purchased consumer genetic tests and shared their data online via the Personal Genome Project. Main outcome measures: Genotyping (that is, identification of the correct DNA base at a specific genomic location) using SNP chips versus sequencing, with results split by frequency of that genotype in the population. Rare pathogenic variants in the BRCA1 and BRCA2 genes were selected as an exemplar for detailed analysis of clinically actionable variants in the UK Biobank, and BRCA related cancers (breast, ovarian, prostate, and pancreatic) were assessed in participants through use of cancer registry data. Results: Overall, genotyping using SNP chips performed well compared with sequencing; sensitivity, specificity, positive predictive value, and negative predictive value were all above 99% for 108 574 common variants directly genotyped on the SNP chips and sequenced in the UK Biobank. However, the likelihood of a true positive result decreased dramatically with decreasing variant frequency; for variants that are very rare in the population, with a frequency below 0.001% in UK Biobank, the positive predictive value was very low and only 16% of 4757 heterozygous genotypes from the SNP chips were confirmed with sequencing data. Results were similar for SNP chip data from the Personal Genome Project, and 20/21 individuals analysed had at least one false positive rare pathogenic variant that had been incorrectly genotyped. For pathogenic variants in the BRCA1 and BRCA2 genes, which are individually very rare, the overall performance metrics for the SNP chips versus sequencing in the UK Biobank were: sensitivity 34.6%, specificity 98.3%, positive predictive value 4.2%, and negative predictive value 99.9%. Rates of BRCA related cancers in UK Biobank participants with a positive SNP chip result were similar to those for age matched controls (odds ratio 1.31, 95% confidence interval 0.99 to 1.71) because the vast majority of variants were false positives, whereas sequence positive participants had a significantly increased risk (odds ratio 4.05, 2.72 to 6.03). Conclusions: SNP chips are extremely unreliable for genotyping very rare pathogenic variants and should not be used to guide health decisions without validation.
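The combination of high specificity and very low positive predictive value reported here follows from Bayes' rule when the variant is rare. The sketch below shows the relationship in Python. The sensitivity and specificity are the BRCA figures quoted in the abstract; the function name and the prevalence values are assumptions chosen only for illustration and are not reported in the source.

```python
def ppv_from_prevalence(sensitivity, specificity, prevalence):
    """Positive predictive value via Bayes' rule.

    PPV = P(true carrier | positive call)
        = sens * prev / (sens * prev + (1 - spec) * (1 - prev))
    """
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Sensitivity and specificity from the abstract (BRCA1/BRCA2, SNP chip vs sequencing).
SENS, SPEC = 0.346, 0.983

# Hypothetical carrier prevalences, for illustration only.
for prev in (0.1, 0.01, 0.001):
    print(f"prevalence={prev:.3f}  PPV={ppv_from_prevalence(SENS, SPEC, prev):.3f}")
```

With sensitivity and specificity held fixed, PPV falls as prevalence falls, because false positives from the large non-carrier population outnumber true positives.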


2020 ◽  
Author(s):  
Quanli Wang ◽  
Ryan S. Dhindsa ◽  
Keren Carss ◽  
Andrew R Harper ◽  
Abhishek Nag ◽  
...  

The UK Biobank (UKB) represents an unprecedented population-based study of 502,543 participants with detailed phenotypic data and linkage to medical records. While the release of genotyping array data for this cohort has bolstered genomic discovery for common variants, the contribution of rare variants to this broad phenotype collection remains relatively unknown. Here, we use exome sequencing data from 177,882 UKB participants to evaluate the association between rare protein-coding variants and 10,533 binary and 1,419 quantitative phenotypes. We performed both a variant-level phenome-wide association study (PheWAS) and a gene-level collapsing analysis-based PheWAS tailored to detecting the aggregate contribution of rare variants. The latter revealed 911 statistically significant gene-phenotype relationships, with a median odds ratio of 15.7 for binary traits. Among the binary trait associations identified using collapsing analysis, 83% were undetectable using single variant association tests, emphasizing the power of collapsing analysis to detect signal in the setting of high allelic heterogeneity. As a whole, these genotype-phenotype associations were significantly enriched for loss-of-function mediated traits and currently approved drug targets. Using these results, we summarise the contribution of rare variants to common diseases in the context of the UKB phenome and provide an example of how novel gene-phenotype associations can aid in therapeutic target prioritisation.
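A gene-level collapsing analysis treats each participant as a carrier or non-carrier of any qualifying variant in a gene, then tests carrier status against the phenotype. The Python sketch below illustrates the idea under stated assumptions: the abstract does not say which statistical test was used, so a one-sided Fisher's exact test is swapped in here, and the function names and data layout are hypothetical.

```python
from math import comb


def collapse_carriers(genotypes, qualifying_variants):
    """Return the samples carrying at least one qualifying variant.

    genotypes: dict mapping sample ID -> set of variant IDs carried.
    qualifying_variants: set of variant IDs in the gene of interest.
    """
    return {s for s, carried in genotypes.items() if carried & qualifying_variants}


def fisher_one_sided(a, b, c, d):
    """One-sided Fisher's exact p-value for enrichment of carriers in cases.

    Table layout (an assumption of this sketch):
        a = carrier cases        b = carrier controls
        c = non-carrier cases    d = non-carrier controls
    Returns P(X >= a) under the hypergeometric null.
    """
    n = a + b + c + d
    carriers = a + b
    cases = a + c
    total = comb(n, cases)
    upper = min(carriers, cases)
    tail = sum(
        comb(carriers, k) * comb(n - carriers, cases - k)
        for k in range(a, upper + 1)
    )
    return tail / total
```

Collapsing pools many individually rare variants into one carrier count per gene, which is why it can detect signal that single-variant tests miss when allelic heterogeneity is high.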


2020 ◽  
Vol 41 (Supplement_2) ◽  
Author(s):  
O.B Vad ◽  
C Paludan-Muller ◽  
G Ahlberg ◽  
L Andreasen ◽  
L Refsgaard ◽  
...  

Background: Atrial fibrillation (AF) is the most common cardiac arrhythmia, and it is associated with serious complications, including an increased risk of stroke, heart failure, and death. It affects around 5% of the population above 65 years of age, and it is estimated that 2% of healthcare expenses are related to AF. The causes of AF are complex and include structural heart disease, hypertension, diabetes and genetic risk factors. To date, 166 unique genetic loci have been identified as associated with AF. While AF has traditionally been regarded as an electrical disease, structural genes, including the sarcomere gene titin (TTN), have been associated with the disease. Recently, a large genome-wide association study associated common variants in the gene MYH6 with AF. The gene encodes the protein alpha myosin heavy chain, and has previously been associated with sick sinus syndrome and structural heart disease. Purpose: We hypothesized that genetic variants in the sarcomere gene MYH6 were more prevalent in AF patients than in non-AF patients, supporting the view that this gene is important for the development of AF. Methods: We analysed publicly available data from the UK Biobank, combining exome-sequencing data and health-related information on 45,596 participants. Using next-generation sequencing, we then examined the genetic variation in MYH6 in a cohort of 383 Danish, early-onset AF patients. The patients had onset of AF before age 40, had a normal echocardiogram, and had no other cardiovascular disease at onset of AF. Genetic variants were filtered by minor allele frequency (MAF) in the Genome Aggregation Database (gnomAD), and only rare variants with MAF<1% were included. We then predicted the potential deleteriousness of the variants using the combined annotation dependent depletion (CADD) score. Results: We found rare coding variants in MYH6 to be significantly associated with AF in exome-sequencing data on 45,596 participants from the UK Biobank (p=0.038). In our cohort of 383 Danish, early-onset AF patients with no other cardiovascular disease, we identified 12 rare missense variants in MYH6. Of these variants, three were novel, and 11 had CADD scores >20, placing them among the top 1% of likely deleterious variants. Conclusion: We identified rare genetic variants in MYH6 to be significantly associated with AF in a large population-based cohort. We also identified 12 rare coding variants in a highly selected cohort of early-onset AF patients. Most of these variants were predicted to be deleterious. Our results indicate that rare variants in MYH6 may increase susceptibility to AF, thus adding to the understanding of the pathophysiological mechanisms of AF and the role of structural genes in the development of AF. Funding Acknowledgement: Type of funding source: Foundation. Main funding source(s): Novo Nordisk Foundation Pre-Graduate Scholarships
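The filtering described in the methods (keep variants with gnomAD MAF below 1%, then assess deleteriousness by CADD score) can be sketched in Python as follows. The record fields `gnomad_maf` and `cadd`, the function names, and the toy records are hypothetical; the thresholds are the ones quoted in the abstract. The conversion from a PHRED-scaled CADD score to a top fraction uses the standard PHRED relationship, under which a score of 20 corresponds to the top 1%.

```python
def filter_rare(variants, maf_threshold=0.01):
    """Keep variants whose gnomAD MAF (as a fraction) is below the threshold."""
    return [v for v in variants if v["gnomad_maf"] < maf_threshold]


def is_likely_deleterious(variant, cadd_threshold=20.0):
    """Flag variants whose PHRED-scaled CADD score exceeds the threshold."""
    return variant["cadd"] > cadd_threshold


def cadd_top_fraction(phred_score):
    """Fraction of ranked variants at or above a PHRED-scaled CADD score.

    PHRED scaling: score = -10 * log10(fraction), so fraction = 10^(-score/10).
    """
    return 10 ** (-phred_score / 10)


# Toy records, for illustration only (not data from the study).
example_variants = [
    {"id": "v1", "gnomad_maf": 0.0002, "cadd": 27.1},
    {"id": "v2", "gnomad_maf": 0.0500, "cadd": 31.0},
    {"id": "v3", "gnomad_maf": 0.0040, "cadd": 12.3},
]
```

Here `filter_rare(example_variants)` keeps `v1` and `v3`, and only `v1` is then flagged by `is_likely_deleterious`.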


2021 ◽  
Vol 218 (12) ◽  
Author(s):  
Peter Geon Kim ◽  
Abhishek Niroula ◽  
Veronica Shkolnik ◽  
Marie McConkey ◽  
Amy E. Lin ◽  
...  

Osteoporosis is caused by an imbalance of osteoclasts and osteoblasts, occurring in close proximity to hematopoietic cells in the bone marrow. The presence of recurrent somatic mutations that lead to an expanded population of mutant blood cells is termed clonal hematopoiesis of indeterminate potential (CHIP). Analyzing exome sequencing data from the UK Biobank, we found CHIP to be associated with increased incident osteoporosis diagnoses and decreased bone mineral density. In murine models, hematopoietic-specific mutations in Dnmt3a, the most commonly mutated gene in CHIP, decreased bone mass via increased osteoclastogenesis. Dnmt3a−/− demethylation opened chromatin and altered activity of inflammatory transcription factors. Bone loss was driven by proinflammatory cytokines, including Irf3-NF-κB–mediated IL-20 expression from Dnmt3a mutant macrophages. Increased osteoclastogenesis due to the Dnmt3a mutations was ameliorated by alendronate or IL-20 neutralization. These results demonstrate a novel source of osteoporosis-inducing inflammation.


2019 ◽  
Author(s):  
Michael N Weedon ◽  
Leigh Jackson ◽  
James W Harrison ◽  
Kate S Ruth ◽  
Jessica Tyrrell ◽  
...  

Objectives: To determine the analytical validity of SNP-chips for genotyping very rare genetic variants. Design: Retrospective study using data from two publicly available resources, the UK Biobank and the Personal Genome Project. Setting: Research biobanks and direct-to-consumer genetic testing in the UK and USA. Participants: 49,908 individuals recruited to UK Biobank, and 21 individuals who purchased consumer genetic tests and shared their data online via the Personal Genome Project. Main outcome measures: We assessed the analytical validity of genotypes from SNP-chips (index test) with sequencing data (reference standard). We evaluated the genotyping accuracy of the SNP-chips and split the results by variant frequency. We went on to select rare pathogenic variants in the BRCA1 and BRCA2 genes as an exemplar for detailed analysis of clinically-actionable variants in UK Biobank, and assessed BRCA-related cancers (breast, ovarian, prostate and pancreatic) in participants using cancer registry data. Results: SNP-chip genotype accuracy is high overall; sensitivity, specificity and precision are all >99% for 108,574 common variants directly genotyped by the UK Biobank SNP-chips. However, the likelihood of a true positive result reduces dramatically with decreasing variant frequency; for variants with a frequency <0.001% in UK Biobank the precision is very low and only 16% of 4,711 variants from the SNP-chips confirm with sequencing data. Results are similar for SNP-chip data from the Personal Genome Project, and 20/21 individuals have at least one rare pathogenic variant that has been incorrectly genotyped. For pathogenic variants in the BRCA1 and BRCA2 genes, the overall performance metrics of the SNP-chips in UK Biobank are sensitivity 34.6%, specificity 98.3% and precision 4.2%. Rates of BRCA-related cancers in individuals in UK Biobank with a positive SNP-chip result are similar to age-matched controls (OR 1.28, P=0.07, 95% CI: 0.98 to 1.67), while sequence-positive individuals have a significantly increased risk (OR 3.73, P=3.5×10⁻¹², 95% CI: 2.57 to 5.40). Conclusion: SNP-chips are extremely unreliable for genotyping very rare pathogenic variants and should not be used to guide health decisions without validation. Summary box: Section 1 (what is already known on this topic): SNP-chips are an accurate and affordable method for genotyping common genetic variants across the genome. They are often used by direct-to-consumer (DTC) genetic testing companies and research studies, but there are several case reports suggesting they perform poorly for genotyping rare genetic variants when compared with sequencing. Section 2 (what this study adds): Our study confirms that SNP-chips are highly inaccurate for genotyping rare, clinically-actionable variants. Using large-scale SNP-chip and sequencing data from UK Biobank, we show that SNP-chips have a very low precision of <16% for detecting very rare variants (i.e. the majority of variants with population frequency of <0.001% are false positives). We observed a similar performance in a small sample of raw SNP-chip data from DTC genetic tests. Very rare variants assayed using SNP-chips should not be used to guide health decisions without validation.


2021 ◽  
Author(s):  
Simon G Williams ◽  
Dominic Byrne ◽  
Bernard Keavney

Several genes have been associated with congenital heart disease (CHD) risk in previous GWAS and sequencing studies, but studies involving larger numbers of case samples remain needed to facilitate further understanding of what remains a complex and largely uncharacterised genetic etiology. Here we use whole exome sequencing data from 200,000 samples in the UK Biobank to assess ultra-rare and potentially pathogenic variation associated with increased risk of CHD. Our findings indicate that rare variants in GATA6, presumably with a lesser effect on gene function than those causing severe CHD phenotypes, or buffered by other genetic and environmental effects during development, are also associated with minor CHD conditions, specifically bicuspid aortic valve, the most common CHD condition.


Nature ◽  
2021 ◽  
Author(s):  
Quanli Wang ◽  
Ryan S. Dhindsa ◽  
Keren Carss ◽  
Andrew R. Harper ◽  
Abhishek Nag ◽  
...  

Genome-wide association studies have uncovered thousands of common variants associated with human disease, but the contribution of rare variation to common disease remains relatively unexplored. The UK Biobank (UKB) contains detailed phenotypic data linked to medical records for approximately 500,000 participants, offering an unprecedented opportunity to evaluate the impact of rare variation on a broad collection of traits [1,2]. Here, we studied the relationships between rare protein-coding variants and 17,361 binary and 1,419 quantitative phenotypes using exome sequencing data from 269,171 UKB participants of European ancestry. Gene-based collapsing analyses revealed 1,703 statistically significant gene-phenotype associations for binary traits, with a median odds ratio of 12.4. Furthermore, 83% of these associations were undetectable via single variant association tests, emphasizing the power of gene-based collapsing analysis in the setting of high allelic heterogeneity. Gene-phenotype associations were also significantly enriched for loss-of-function-mediated traits and approved drug targets. Finally, we performed ancestry-specific and pan-ancestry collapsing analyses using exome sequencing data from 11,933 UKB participants of African, East Asian, or South Asian ancestry. Together, our results highlight a significant contribution of rare variants to common disease. Summary statistics are publicly available through an interactive portal (http://azphewas.com/).

