Imputation-Based Genomic Coverage Assessments of Current Genotyping Arrays: Illumina HumanCore, OmniExpress, Multi-Ethnic global array and sub-arrays, Global Screening Array, Omni2.5M, Omni5M, and Affymetrix UK Biobank

2017 ◽  
Author(s):  
Sarah C. Nelson ◽  
Jane M. Romm ◽  
Kimberly F. Doheny ◽  
Elizabeth W. Pugh ◽  
Cathy C. Laurie

Genotyping arrays have been widely adopted as an efficient means to interrogate variation across the human genome. Genetic variants may be observed either directly, via genotyping, or indirectly, through linkage disequilibrium with a genotyped variant. The total proportion of genomic variation captured by an array, either directly or indirectly, is referred to as “genomic coverage.” Here we use genotype imputation and Phase 3 of the 1000 Genomes Project to assess genomic coverage of several modern genotyping arrays. We find that in general, coverage increases with increasing array density. However, arrays designed to cover specific populations may yield better coverage in those populations compared to denser arrays not tailored to the given population. Ultimately, array choice involves trade-offs between cost, density, and coverage, and our work helps inform investigators weighing these choices and trade-offs.
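One common way to summarise coverage from imputation output is the fraction of reference-panel variants whose imputation r² reaches a chosen cutoff. The sketch below assumes that definition and a cutoff of 0.8; the abstract states neither, and the function name, variant IDs, and r² values are hypothetical.

```python
def genomic_coverage(r2_by_variant, threshold=0.8):
    """Fraction of variants whose imputation r^2 meets the threshold.

    r2_by_variant maps a variant ID to the squared correlation between
    imputed and true genotypes (1.0 for directly genotyped variants).
    The 0.8 default is an assumed cutoff, not one taken from the abstract.
    """
    if not r2_by_variant:
        raise ValueError("no variants supplied")
    covered = sum(1 for r2 in r2_by_variant.values() if r2 >= threshold)
    return covered / len(r2_by_variant)


# Hypothetical per-variant r^2 values for one array in one population.
toy_r2 = {"rs_a": 1.0, "rs_b": 0.92, "rs_c": 0.41, "rs_d": 0.80}
coverage = genomic_coverage(toy_r2)
print(f"coverage at r2 >= 0.8: {coverage:.2f}")
```

Computing this separately per population and per allele-frequency bin would allow the kind of array-by-population comparison the abstract describes.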

2021 ◽  
Author(s):  
Filip Ruzicka ◽  
Luke Holman ◽  
Tim Connallon

Abstract: Mutations that increase fitness in one sex may decrease fitness in the other. Such “sexually antagonistic” (SA) genetic variants can constrain adaptation and increase variability for fitness components (e.g., survival, fertility, and disease susceptibility). However, detecting SA selection in genomes is immensely challenging, as it requires prohibitively large datasets that combine genomic sequences with individual fitness measurements. Here, we use genotypic and reproductive success data from ∼250,000 UK Biobank individuals to comprehensively assess the extent of SA genetic variation in humans. We first develop new theoretical models for signals of SA selection spanning a full generational life cycle, including SA polymorphisms affecting survival, reproductive success and overall fitness. Comparing our models with UK Biobank data, we uncover multiple empirical signals of polygenic SA selection, including sex-differential effects of genetic variants on each fitness component, and positive correlations between sex-differential effects and minor allele frequencies. We show that these signals cannot be explained by simple models of sex differences in purifying selection, or by potential confounders such as population structure and sequence mapping errors. We further show that candidate SA sites disproportionately affect functional genomic regions, including polymorphisms associated with quantitative traits and disease. Finally, we examine historical evolutionary processes affecting candidate SA sites, which are consistent with the drift-dominated dynamics predicted by previous theory. Overall, our results support SA genomic variation in humans and highlight its broader functional and evolutionary consequences.


BMJ ◽  
2019 ◽  
pp. l476 ◽  
Author(s):  
Shan Luo ◽  
Shiu Lun Au Yeung ◽  
Jie V Zhao ◽  
Stephen Burgess ◽  
C Mary Schooling

Objective: To determine whether endogenous testosterone has a causal role in thromboembolism, heart failure, and myocardial infarction. Design: Two sample mendelian randomisation study using genetic variants as instrumental variables, randomly allocated at conception, to infer causality as additional randomised evidence. Setting: Reduction by Dutasteride of Prostate Cancer Events (REDUCE) randomised controlled trial, UK Biobank, and CARDIoGRAMplusC4D 1000 Genomes based genome wide association study. Participants: 3225 men of European ancestry aged 50-75 in REDUCE; 392 038 white British men and women aged 40-69 from the UK Biobank; and 171 875 participants of about 77% European descent, from CARDIoGRAMplusC4D 1000 Genomes based study for validation. Main outcome measures: Thromboembolism, heart failure, and myocardial infarction based on self reports, hospital episodes, and death. Results: Of the UK Biobank participants, 13 691 had thromboembolism (6208 men, 7483 women), 1688 had heart failure (1186, 502), and 12 882 had myocardial infarction (10 136, 2746). In men, endogenous testosterone genetically predicted by variants in the JMJD1C gene region was positively associated with thromboembolism (odds ratio per unit increase in log transformed testosterone (nmol/L) 2.09, 95% confidence interval 1.27 to 3.46) and heart failure (7.81, 2.56 to 23.8), but not myocardial infarction (1.17, 0.78 to 1.75). Associations were less obvious in women. In the validation study, genetically predicted testosterone (based on JMJD1C gene region variants) was positively associated with myocardial infarction (1.37, 1.03 to 1.82). No excess heterogeneity was observed among genetic variants in their associations with the outcomes. However, testosterone genetically predicted by potentially pleiotropic variants in the SHBG gene region had no association with the outcomes. Conclusions: Endogenous testosterone was positively associated with thromboembolism, heart failure, and myocardial infarction in men. Rates of these conditions are higher in men than women. Endogenous testosterone can be controlled with existing treatments and could be a modifiable risk factor for thromboembolism and heart failure.


2017 ◽  
Author(s):  
Sina Rüeger ◽  
Aaron McDaid ◽  
Zoltán Kutalik

Abstract: As most of the heritability of complex traits is attributed to common and low frequency genetic variants, imputing them by combining genotyping chips and large sequenced reference panels is the most cost-effective approach to discover the genetic basis of these traits. Association summary statistics from genome-wide meta-analyses are available for hundreds of traits. Updating these to ever-increasing reference panels is very cumbersome as it requires reimputation of the genetic data, rerunning the association scan, and meta-analysing the results. A much more efficient method is to directly impute the summary statistics, termed summary statistics imputation. Its performance relative to genotype imputation and its practical utility have not yet been fully investigated. To this end, we compared the two approaches on real (genotyped and imputed) data from 120K samples from the UK Biobank and show that, while genotype imputation boasts a 2- to 5-fold lower root-mean-square error, summary statistics imputation better distinguishes true associations from null ones. We observed the largest differences in power for variants with low minor allele frequency and low imputation quality. For fixed false positive rates of 0.001, 0.01, 0.05, using summary statistics imputation yielded an increase in statistical power by 15, 10 and 3%, respectively. To test its capacity to discover novel associations, we applied summary statistics imputation to the GIANT height meta-analysis summary statistics covering HapMap variants, and identified 34 novel loci, 19 of which replicated using data in the UK Biobank. Additionally, we successfully replicated 55 out of the 111 variants published in an exome chip study. Our study demonstrates that summary statistics imputation is a very efficient and cost-effective way to identify and fine-map trait-associated loci. Moreover, the ability to impute summary statistics is important for follow-up analyses, such as Mendelian randomisation or LD-score regression.

Author summary: Genome-wide association studies (GWASs) quantify the effect of genetic variants on traits, such as height. Such estimates are called association summary statistics and are typically publicly shared through publication. Typically, GWASs are carried out by genotyping ~500,000 SNVs for each individual, which are then combined with sequenced reference panels to infer untyped SNVs in each individual's genome. This process of genotype imputation is resource intensive and can therefore be a limitation when combining many GWASs. An alternative approach is to bypass the use of individual data and directly impute summary statistics. In our work we compare the performance of summary statistics imputation to genotype imputation. Although we observe a 2- to 5-fold lower RMSE for genotype imputation compared to summary statistics imputation, summary statistics imputation better distinguishes true associations from null results. Furthermore, we demonstrate the potential of summary statistics imputation by presenting 34 novel height-associated loci, 19 of which were confirmed in UK Biobank. Our study demonstrates that given current reference panels, summary statistics imputation is a very efficient and cost-effective way to identify common or low-frequency trait-associated loci.
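Summary statistics imputation is often described as a conditional expectation: the z-score at an untyped variant is predicted from the z-scores at nearby typed variants, weighted by LD estimated from a reference panel, roughly z_u ≈ c' (C + λI)⁻¹ z_t. The sketch below assumes this generic form with a ridge-style λ on the diagonal. It is not necessarily the exact estimator or regularisation used in the study, and every name and number is illustrative.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular matrix")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def impute_z(z_typed, ld_typed, ld_untyped_typed, lam=0.1):
    """Impute one untyped z-score as c' (C + lam*I)^-1 z_t.

    z_typed: z-scores at typed variants.
    ld_typed: LD (correlation) matrix C among typed variants.
    ld_untyped_typed: LD vector c between the untyped and typed variants.
    lam: assumed ridge penalty that stabilises the inversion.
    """
    n = len(z_typed)
    regularised = [
        [ld_typed[i][j] + (lam if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]
    weights = solve(regularised, z_typed)
    return sum(c * w for c, w in zip(ld_untyped_typed, weights))


# Hypothetical inputs: two typed variants and one untyped variant.
z_hat = impute_z([4.0, 1.0], [[1.0, 0.3], [0.3, 1.0]], [0.8, 0.2])
print(round(z_hat, 3))
```

Because only summary statistics and reference-panel LD enter the calculation, no individual-level genotypes need to be re-imputed, which is the efficiency argument made in the abstract.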


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Jessica Tyrrell ◽  
Jie Zheng ◽  
Robin Beaumont ◽  
Kathryn Hinton ◽  
Tom G. Richardson ◽  
...  

Abstract: Large studies such as UK Biobank are increasingly used for GWAS and Mendelian randomization (MR) studies. However, selection into and dropout from studies may bias genetic and phenotypic associations. We examine genetic factors affecting participation in four optional components in up to 451,306 UK Biobank participants. We used GWAS to identify genetic variants associated with participation, MR to estimate effects of phenotypes on participation, and genetic correlations to compare participation bias across different studies. 32 variants were associated with participation in one of the optional components (P < 6 × 10⁻⁹), including loci with links to intelligence and Alzheimer’s disease. Genetic correlations demonstrated that participation bias was common across studies. MR showed that longer educational duration, older menarche and taller stature increased participation, whilst higher levels of adiposity, dyslipidaemia, neuroticism, Alzheimer’s and schizophrenia reduced participation. Our effect estimates can be used for sensitivity analysis to account for selective participation biases in genetic or non-genetic analyses.


Nutrients ◽  
2021 ◽  
Vol 13 (7) ◽  
pp. 2218
Author(s):  
Shuai Yuan ◽  
Paul Carter ◽  
Amy M. Mason ◽  
Stephen Burgess ◽  
Susanna C. Larsson

Coffee consumption has been linked to a lower risk of cardiovascular disease in observational studies, but whether the associations are causal is not known. We conducted a Mendelian randomization investigation to assess the potential causal role of coffee consumption in cardiovascular disease. Twelve independent genetic variants were used to proxy coffee consumption. Summary-level data for the relations between the 12 genetic variants and cardiovascular diseases were taken from the UK Biobank with up to 35,979 cases and the FinnGen consortium with up to 17,325 cases. Genetic predisposition to higher coffee consumption was not associated with any of the 15 studied cardiovascular outcomes in univariable MR analysis. The odds ratio per 50% increase in genetically predicted coffee consumption ranged from 0.97 (95% confidence interval (CI), 0.63, 1.50) for intracerebral hemorrhage to 1.26 (95% CI, 1.00, 1.58) for deep vein thrombosis in the UK Biobank and from 0.86 (95% CI, 0.50, 1.49) for subarachnoid hemorrhage to 1.34 (95% CI, 0.81, 2.22) for intracerebral hemorrhage in FinnGen. The null findings remained in multivariable Mendelian randomization analyses adjusted for genetically predicted body mass index and smoking initiation, except for a suggestive positive association for intracerebral hemorrhage (odds ratio 1.91; 95% CI, 1.03, 3.54) in FinnGen. This Mendelian randomization study showed limited evidence that coffee consumption affects the risk of developing cardiovascular disease, suggesting that previous observational studies may have been confounded.
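With summary-level data, a univariable MR estimate is commonly obtained by combining per-variant ratio estimates with inverse-variance weights (IVW). The abstract does not say which estimator was used, so the sketch below assumes the fixed-effect IVW form; the function name and all effect sizes are hypothetical.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance weighted MR estimate.

    beta_exposure: variant-exposure associations.
    beta_outcome: variant-outcome associations (log odds for a binary outcome).
    se_outcome: standard errors of the variant-outcome associations.
    Returns (estimate, standard error) on the log-odds scale.
    """
    num = sum(bx * by / se**2
              for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome))
    den = sum(bx * bx / se**2
              for bx, se in zip(beta_exposure, se_outcome))
    return num / den, math.sqrt(1.0 / den)


# Hypothetical associations for three variants.
beta, se = ivw_estimate([0.10, 0.20, 0.15], [0.02, 0.03, 0.04], [0.01, 0.02, 0.02])
odds_ratio = math.exp(beta)
ci = (math.exp(beta - 1.96 * se), math.exp(beta + 1.96 * se))
print(f"OR {odds_ratio:.2f} (95% CI {ci[0]:.2f}, {ci[1]:.2f})")
```

A confidence interval that spans 1, as in most of the intervals the abstract reports, is read as no clear evidence of a causal effect.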


eLife ◽  
2020 ◽  
Vol 9 ◽  
Author(s):  
Paul Carter ◽  
Mathew Vithayathil ◽  
Siddhartha Kar ◽  
Rahul Potluri ◽  
Amy M Mason ◽  
...  

Laboratory studies have suggested oncogenic roles of lipids, as well as anticarcinogenic effects of statins. Here we assess the potential effect of statin therapy on cancer risk using evidence from human genetics. We obtained associations of lipid-related genetic variants with the risk of overall and 22 site-specific cancers for 367,703 individuals in the UK Biobank. In total, 75,037 individuals had a cancer event. Variants in the HMGCR gene region, which represent proxies for statin treatment, were associated with overall cancer risk (odds ratio [OR] per one standard deviation decrease in low-density lipoprotein [LDL] cholesterol 0.76, 95% confidence interval [CI] 0.65–0.88, p=0.0003) but variants in gene regions representing alternative lipid-lowering treatment targets (PCSK9, LDLR, NPC1L1, APOC3, LPL) were not. Genetically predicted LDL-cholesterol was not associated with overall cancer risk (OR per standard deviation increase 1.01, 95% CI 0.98–1.05, p=0.50). Our results predict that statins reduce cancer risk but other lipid-lowering treatments do not. This suggests that statins reduce cancer risk through a cholesterol independent pathway.


2022 ◽  
Author(s):  
Lars Wienbrandt ◽  
David Ellinghaus

Background: Reference-based phasing and genotype imputation algorithms have been developed with sublinear theoretical runtime behaviour, but runtimes are still high in practice when large genome-wide reference datasets are used. Methods: We developed EagleImp, a software tool with algorithmic and technical improvements and new features for accurate and accelerated phasing and imputation in a single tool. Results: We compared accuracy and runtime of EagleImp with Eagle2, PBWT and prominent imputation servers using whole-genome sequencing data from the 1000 Genomes Project, the Haplotype Reference Consortium and simulated data with more than 1 million reference genomes. EagleImp is 2 to 10 times faster (depending on the single or multiprocessor configuration selected) than Eagle2/PBWT, with the same or better phasing and imputation quality in all tested scenarios. For common variants investigated in typical GWAS studies, EagleImp provides the same or higher imputation accuracy than the Sanger Imputation Service, Michigan Imputation Server and the newly developed TOPMed Imputation Server, despite their larger (not publicly available) reference panels. It has many new features, including automated chromosome splitting and memory management at runtime to avoid job aborts, fast reading and writing of large files, and various user-configurable algorithm and output options. Conclusions: Due to the technical optimisations, EagleImp can perform fast and accurate reference-based phasing and imputation for future very large reference panels with more than 1 million genomes. EagleImp is freely available for download from https://github.com/ikmb/eagleimp.


2019 ◽  
Vol 21 (1) ◽  
Author(s):  
Ravi K. Narang ◽  
Ruth Topless ◽  
Murray Cadzow ◽  
Greg Gamble ◽  
Lisa K. Stamp ◽  
...  

2019 ◽  
Vol 49 (4) ◽  
pp. 1147-1158 ◽  
Author(s):  
Jessica M B Rees ◽  
Christopher N Foley ◽  
Stephen Burgess

Background: Factorial Mendelian randomization is the use of genetic variants to answer questions about interactions. Although the approach has been used in applied investigations, little methodological advice is available on how to design or perform a factorial Mendelian randomization analysis. Previous analyses have employed a 2 × 2 approach, using dichotomized genetic scores to divide the population into four subgroups as in a factorial randomized trial. Methods: We describe two distinct contexts for factorial Mendelian randomization: investigating interactions between risk factors, and investigating interactions between pharmacological interventions on risk factors. We propose two-stage least squares methods using all available genetic variants and their interactions as instrumental variables, and using continuous genetic scores as instrumental variables rather than dichotomized scores. We illustrate our methods using data from UK Biobank to investigate the interaction between body mass index and alcohol consumption on systolic blood pressure. Results: Simulated and real data show that efficiency is maximized using the full set of interactions between genetic variants as instruments. In the applied example, between 4- and 10-fold improvement in efficiency is demonstrated over the 2 × 2 approach. Analyses using continuous genetic scores are more efficient than those using dichotomized scores. Efficiency is improved by finding genetic variants that divide the population at a natural break in the distribution of the risk factor, or else divide the population into more equal-sized groups. Conclusions: Previous factorial Mendelian randomization analyses may have been underpowered. Efficiency can be improved by using all genetic variants and their interactions as instrumental variables, rather than the 2 × 2 approach.
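The 2 × 2 approach described in the Background can be sketched as follows: each genetic score is dichotomized (here at its median, an assumed cut point), individuals are sorted into four subgroups, and the interaction is read off as a difference of differences in mean outcome. This illustrates only the baseline approach, not the two-stage least squares estimator the authors propose, and the function name and data are hypothetical.

```python
from statistics import mean, median


def two_by_two_interaction(score_a, score_b, outcome):
    """Difference-of-differences in mean outcome across four score subgroups.

    Each genetic score is split into low/high at its median, giving four
    groups as in a factorial trial. Returns
    (mean[high,high] - mean[low,high]) - (mean[high,low] - mean[low,low]),
    where each pair is (score_a level, score_b level).
    """
    cut_a, cut_b = median(score_a), median(score_b)
    groups = {(a, b): [] for a in (False, True) for b in (False, True)}
    for a, b, y in zip(score_a, score_b, outcome):
        groups[(a > cut_a, b > cut_b)].append(y)
    if any(not members for members in groups.values()):
        raise ValueError("a subgroup is empty; the 2 x 2 contrast is undefined")
    m = {key: mean(values) for key, values in groups.items()}
    return (m[(True, True)] - m[(False, True)]) - (m[(True, False)] - m[(False, False)])


# Hypothetical scores and outcome with a built-in interaction of 3 units.
score_a = [0, 0, 1, 1, 0, 0, 1, 1]
score_b = [0, 1, 0, 1, 0, 1, 0, 1]
outcome = [0, 2, 1, 6, 0, 2, 1, 6]
interaction = two_by_two_interaction(score_a, score_b, outcome)
print(interaction)
```

Dichotomizing discards the within-group variation in the scores, which is consistent with the abstract's finding that continuous scores and the full set of variant interactions are more efficient.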


PLoS ONE ◽  
2012 ◽  
Vol 7 (11) ◽  
pp. e50610 ◽  
Author(s):  
Dana B. Hancock ◽  
Joshua L. Levy ◽  
Nathan C. Gaddis ◽  
Laura J. Bierut ◽  
Nancy L. Saccone ◽  
...  
