Inferring the mode and strength of ongoing selection

2021 ◽  
Author(s):  
Gustavo Valadares Barroso ◽  
Kirk Lohmueller

Genome sequence data is no longer scarce. The UK Biobank alone comprises 200,000 individual genomes, with more on the way, leading the field of human genetics towards sequencing entire populations. Over the coming decades, other model organisms will follow suit, especially domesticated species such as crops and livestock. Having sequences from most individuals in a population will present new challenges for using these data to improve health and agriculture in the pursuit of a sustainable future. Existing population genetic methods are designed to model hundreds of randomly sampled sequences and are not optimized to extract the information contained in the larger and richer datasets now emerging, which include thousands of closely related individuals. Here we develop a new method called TIDES (Trio-based Inference of Dominance and Selection) that uses data from tens of thousands of family trios to make inferences about natural selection acting in a single generation. TIDES further improves on the state of the art by making no assumptions regarding demography, linkage or dominance. We discuss how our method paves the way for studying natural selection from new angles.
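The abstract does not spell out how TIDES works. As a much simpler illustration of why trios are informative, the following hypothetical sketch (not TIDES) uses Mendelian transmission as a demography-free null: in trios where one parent is AA and the other Aa, half of the offspring should be Aa. It assumes a single locus, viability selection acting only on offspring genotype with relative fitness 1 + s for Aa, and a normal approximation to the binomial test; the function name and parameters are illustrative.

```python
import math


def transmission_test(n_het_offspring: int, n_trios: int) -> tuple[float, float, float]:
    """Compare offspring genotypes in AA x Aa trios with the Mendelian expectation of 1/2.

    Returns (observed fraction of Aa offspring, estimated s, two-sided p-value).
    Hypothetical illustration only; not the TIDES model.
    """
    f = n_het_offspring / n_trios
    if f >= 1.0:
        raise ValueError("s is unbounded when every offspring is Aa")
    # If Aa offspring survive with relative fitness (1 + s) versus 1 for AA,
    # the observed fraction is f = (1 + s) / (2 + s), so s = (2f - 1) / (1 - f).
    s_hat = (2 * f - 1) / (1 - f)
    # Normal approximation to Binomial(n_trios, 1/2) for the count of Aa offspring.
    z = (n_het_offspring - n_trios / 2) / math.sqrt(n_trios / 4)
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return f, s_hat, p_value
```

For example, 6,000 Aa offspring among 10,000 such trios gives f = 0.6 and an estimated s of 0.5, with a p-value far below any conventional threshold.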

eLife ◽  
2020 ◽  
Vol 9 ◽  
Author(s):  
Paul Carter ◽  
Mathew Vithayathil ◽  
Siddhartha Kar ◽  
Rahul Potluri ◽  
Amy M Mason ◽  
...  

Laboratory studies have suggested oncogenic roles of lipids, as well as anticarcinogenic effects of statins. Here we assess the potential effect of statin therapy on cancer risk using evidence from human genetics. We obtained associations of lipid-related genetic variants with the risk of overall and 22 site-specific cancers for 367,703 individuals in the UK Biobank. In total, 75,037 individuals had a cancer event. Variants in the HMGCR gene region, which represent proxies for statin treatment, were associated with overall cancer risk (odds ratio [OR] per one standard deviation decrease in low-density lipoprotein [LDL] cholesterol 0.76, 95% confidence interval [CI] 0.65–0.88, p=0.0003), but variants in gene regions representing alternative lipid-lowering treatment targets (PCSK9, LDLR, NPC1L1, APOC3, LPL) were not. Genetically predicted LDL-cholesterol was not associated with overall cancer risk (OR per standard deviation increase 1.01, 95% CI 0.98–1.05, p=0.50). Our results predict that statins reduce cancer risk but other lipid-lowering treatments do not. This suggests that statins reduce cancer risk through a cholesterol-independent pathway.
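The abstract does not state which Mendelian randomization estimator was used. A common choice for combining several variants is the fixed-effect inverse-variance weighted (IVW) estimate, sketched below under the assumption of independent variants and no pleiotropy; all inputs are hypothetical and the function names are illustrative.

```python
import math


def ivw_estimate(beta_x, beta_y, se_y):
    """Fixed-effect inverse-variance weighted (IVW) Mendelian randomization estimate.

    beta_x: variant associations with the exposure (e.g. LDL cholesterol, SD units)
    beta_y: variant associations with the outcome (log odds ratios)
    se_y:   standard errors of beta_y
    Returns (causal log-OR per unit of exposure, its standard error).
    """
    num = sum(bx * by / s**2 for bx, by, s in zip(beta_x, beta_y, se_y))
    den = sum(bx**2 / s**2 for bx, s in zip(beta_x, se_y))
    return num / den, math.sqrt(1.0 / den)


def to_odds_ratio(beta, se, z=1.96):
    """Convert a log-OR and its SE into (OR, lower 95% limit, upper 95% limit)."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)
```

Because the HMGCR result is reported per standard deviation decrease in LDL cholesterol, an estimate expressed per SD increase would be converted by negating `beta` before exponentiating.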


2021 ◽  
Author(s):  
Duncan S Palmer ◽  
Wei Zhou ◽  
Liam Abbott ◽  
Nik Baya ◽  
Claire Churchhouse ◽  
...  

In classical statistical genetic theory, a dominance effect is defined as the deviation from a purely additive genetic effect for a biallelic variant. Dominance effects are well documented in model organisms. However, evidence in humans is limited to a handful of traits, particularly those with strong single-locus effects such as hair color. We carried out the largest systematic evaluation of dominance effects on phenotypic variance in the UK Biobank. We curated and tested over 1,000 phenotypes for dominance effects through GWAS scans, identifying 175 loci at genome-wide significance after correcting for multiple testing (P < 4.7 × 10⁻¹¹). Power to detect non-additive loci is much lower than power to detect additive effects for complex traits: based on the relative effect sizes at genome-wide significant additive loci, we estimate that a 20- to 30-fold increase in sample size will be necessary to obtain evidence of dominance as clear as that currently observed for additive effects. However, these localised dominance hits do not add up to a significant aggregate contribution to phenotypic variance genome-wide. By deriving a version of LD-score regression to detect dominance effects tagged by common variation genome-wide (minor allele frequency > 0.05), we found no strong evidence of a contribution to phenotypic variance when accounting for multiple testing. Across the 267 continuous and 793 binary traits, the median contribution was 5.73 × 10⁻⁴, with unbiased point estimates ranging from -0.261 to 0.131. Finally, we introduce dominance fine-mapping to explore whether the more rapid decay of dominance LD can be leveraged to find causal variants. These results provide the most comprehensive assessment of dominance trait variation in humans to date.
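The classical definition above can be made concrete at a single locus. The sketch below is an illustration, not the study's GWAS pipeline (which may use a different dominance coding and includes covariates). It fits y = b0 + a·g + d·[g = 1], where g is the allele count. Because the model has one parameter per genotype class, the least-squares fit reduces to genotype means, and d is the heterozygote's deviation from the midpoint of the two homozygotes. The function name is hypothetical.

```python
from statistics import mean


def additive_dominance_effects(genotypes, phenotypes):
    """Estimate additive (a) and dominance (d) effects at one biallelic locus.

    genotypes: allele counts (0, 1 or 2); phenotypes: trait values.
    Model: y = b0 + a*g + d*[g == 1]. With three genotype classes the model is
    saturated, so least squares reproduces the three genotype means exactly.
    Each genotype class must be observed at least once.
    """
    groups = {0: [], 1: [], 2: []}
    for g, y in zip(genotypes, phenotypes):
        groups[g].append(y)
    mu0, mu1, mu2 = (mean(groups[k]) for k in (0, 1, 2))
    a = (mu2 - mu0) / 2          # half the difference between the homozygotes
    d = mu1 - (mu0 + mu2) / 2    # heterozygote deviation from the homozygote midpoint
    return a, d
```

A purely additive locus gives d = 0; complete dominance of the allele being counted gives d = a.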


2020 ◽  
Author(s):  
David Curtis

Rare genetic variants in LDLR, APOB and PCSK9 are known causes of familial hypercholesterolaemia, and rare variants in other genes are expected also to affect hyperlipidaemia risk, although such genes remain to be identified. The UK Biobank consists of a sample of 500,000 volunteers, and exome sequence data is available for 50,000 of them. Of these, 11,490 were classified as hyperlipidaemia cases on the basis of having a relevant diagnosis recorded and/or taking lipid-lowering medication, while the remaining 38,463 were treated as controls. Variants in each gene were assigned weights according to rarity and predicted impact, and overall weighted burden scores were compared between cases and controls, with population principal components included as covariates. One biologically plausible gene, HUWE1, produced statistically significant evidence for association after correction for testing 22,028 genes, with a signed log10 p value (SLP) of -6.15, suggesting a protective effect of variants in this gene. Other genes with uncorrected p < 0.001 are arguably also of interest, including LDLR (SLP = 3.67), RBP2 (SLP = 3.14), NPFFR1 (SLP = 3.02) and ACOT9 (SLP = -3.19). Gene set analysis indicated that rare variants in genes involved in metabolism and energy can influence hyperlipidaemia risk. Overall, the results provide some leads that might be followed up with functional studies and tested in additional data sets as these become available. This research has been conducted using the UK Biobank Resource.
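The weighting scheme is described only qualitatively: weights increase with rarity and predicted impact. The sketch below shows the general shape of a per-subject weighted burden score. The Madsen-Browning-style rarity weight and the values in `IMPACT_WEIGHTS` are assumptions for illustration, not the weights used by the author's software.

```python
import math

# Hypothetical impact weights per functional category (not the study's values).
IMPACT_WEIGHTS = {"synonymous": 0.0, "missense": 5.0, "damaging_missense": 10.0, "lof": 20.0}


def rarity_weight(maf: float) -> float:
    """Madsen-Browning-style weight: rarer variants receive larger weights."""
    return 1.0 / math.sqrt(maf * (1.0 - maf))


def burden_score(genotypes, variants):
    """Weighted burden score for one subject in one gene.

    genotypes: minor-allele counts (0/1/2), aligned with `variants`.
    variants:  list of (minor allele frequency, functional category) tuples.
    """
    return sum(
        g * rarity_weight(maf) * IMPACT_WEIGHTS[category]
        for g, (maf, category) in zip(genotypes, variants)
    )
```

In the study, scores like these were compared between cases and controls with population principal components as covariates; that regression step is not shown here.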


2021 ◽  
Author(s):  
David Curtis

Aims: The study aimed to identify specific genes and functional genetic variants affecting susceptibility to two alcohol-related phenotypes: heavy drinking and problem drinking.

Methods: Phenotypic and exome sequence data was downloaded from the UK Biobank. The number of drinks reported in the last 24 hours was used to define heavy drinking, while responses to a mental health questionnaire defined problem drinking. Gene-wise weighted burden analysis was applied, with genetic variants that were rarer and/or had a more severe functional effect being weighted more highly. Additionally, previously reported variants of interest were analysed individually.

Results: Among exome-sequenced subjects, there were 8,166 cases and 84,461 controls for heavy drinking and 7,811 cases and 59,606 controls for problem drinking. No gene was formally significant after correction for multiple testing, but three genes possibly related to autism (FOXP1, ARHGAP33 and CDH9) were significant at p < 0.001, along with VGF, which may also be of psychiatric interest. Well-established associations with rs1229984 in ADH1B and rs671 in ALDH2 were confirmed, but previously reported variants in ALDH1B1 and GRM3 were not associated with either phenotype.

Conclusions: This large study fails to conclusively implicate any novel genes or variants. More definitive results may be obtained when sequence data for the remaining UK Biobank participants becomes available and/or if data can be obtained for a more extreme phenotype such as alcohol dependence disorder. This research has been conducted using the UK Biobank Resource.

Short summary: Tests for association of rare, functional genetic variants with heavy drinking and problem drinking confirm the known effects of variants in ADH1B and ALDH2 but fail to implicate novel variants or genes. Results for three genes potentially related to autism suggest they might exert a protective effect.
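Single-variant checks such as the confirmation of rs1229984 and rs671 can be illustrated with a basic allelic test. The sketch below computes a Pearson chi-square test and an allelic odds ratio from a 2×2 table of allele counts. It is a generic example with hypothetical counts, ignores covariates and population structure, and is not necessarily the test used in the study.

```python
import math


def allelic_chi2(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square test (1 df) on a 2x2 table of allele counts.

    Returns (chi-square statistic, p-value, allelic odds ratio).
    """
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    n = case_alt + case_ref + ctrl_alt + ctrl_ref
    row_totals = [sum(row) for row in table]
    col_totals = [case_alt + ctrl_alt, case_ref + ctrl_ref]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # For 1 df, P(X > chi2) = P(|Z| > sqrt(chi2)) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    odds_ratio = (case_alt * ctrl_ref) / (case_ref * ctrl_alt)
    return chi2, p_value, odds_ratio
```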


2020 ◽  
Vol 36 (16) ◽  
pp. 4519-4520
Author(s):  
Ying Zhou ◽  
Sharon R Browning ◽  
Brian L Browning

Motivation: Estimation of pairwise kinship coefficients in large datasets is computationally challenging because the number of related pairs increases quadratically with sample size.

Results: We present IBDkin, a software package written in C for estimating kinship coefficients from identity by descent (IBD) segments. We use IBDkin to estimate kinship coefficients for 7.95 billion pairs of individuals in the UK Biobank who share at least one detected IBD segment with length ≥ 4 cM.

Availability and implementation: https://github.com/YingZhou001/IBDkin.

Supplementary information: Supplementary data are available at Bioinformatics online.
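IBDkin's internal algorithm is not described in the abstract, but the basic relationship between IBD segments and kinship can be sketched under stated assumptions. The kinship coefficient is φ = k1/4 + k2/2, where k1 and k2 are the genome fractions shared IBD on one or two haplotypes. Summing haplotype-level segment lengths counts IBD2 regions twice, so φ ≈ (total segment length)/(4G) for a genetic map of length G cM. The function name, the `min_cm` segment filter (an assumption loosely echoing the 4 cM threshold above, which the abstract applies to pairs rather than segments) and the map length are illustrative.

```python
def kinship_from_segments(segment_lengths_cm, genome_length_cm, min_cm=0.0):
    """Approximate kinship for one pair from haplotype-level IBD segments (in cM).

    Summing all haplotype segments counts IBD2 regions twice, so the sum equals
    L_IBD1 + 2 * L_IBD2, and phi = (L_IBD1 + 2 * L_IBD2) / (4 * G) = total / (4 * G).
    Segments shorter than `min_cm` are ignored (hypothetical filter).
    """
    total = sum(length for length in segment_lengths_cm if length >= min_cm)
    return total / (4.0 * genome_length_cm)
```

As a sanity check, a parent and offspring share one haplotype IBD across the whole genome, so the total equals G and the function returns 0.25.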


Author(s):  
David Curtis

It is plausible that variants in the ACE2 and TMPRSS2 genes might contribute to variation in COVID-19 severity and that these could explain why some people become very unwell whereas most do not. Exome sequence data was obtained for 49,953 UK Biobank subjects, of whom 74 had tested positive for SARS-CoV-2 and could be presumed to have severe disease. A weighted burden analysis was carried out using SCOREASSOC to determine whether the overall burden of rare, damaging variants in ACE2 or TMPRSS2 differed between these cases and the other sequenced subjects. There were no statistically significant differences in weighted burden scores between cases and controls for either gene, and no individual DNA sequence variant had a markedly different frequency between cases and controls. Determining whether there are small effects on severity, or whether there might be rare variants with major effect sizes, would require studies in much larger samples. Genetic variants affecting the structure and function of the ACE2 and TMPRSS2 proteins are not a major determinant of whether infection with SARS-CoV-2 results in severe symptoms. This research has been conducted using the UK Biobank Resource.
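With only 74 cases, one way to check a difference in mean weighted burden scores without relying on large-sample approximations is a permutation test. This is an illustrative alternative, not the SCOREASSOC procedure; the function name and defaults are assumptions.

```python
import random
from statistics import mean


def permutation_test(case_scores, control_scores, n_perm=10_000, seed=1):
    """Two-sided permutation test for a difference in mean weighted burden score.

    Returns (observed case-minus-control difference, permutation p-value).
    """
    rng = random.Random(seed)
    observed = mean(case_scores) - mean(control_scores)
    pooled = list(case_scores) + list(control_scores)
    n_cases = len(case_scores)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomly reassign case/control labels
        diff = mean(pooled[:n_cases]) - mean(pooled[n_cases:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # The add-one correction keeps the p-value from being exactly zero.
    return observed, (extreme + 1) / (n_perm + 1)
```

In pure Python each permutation costs time proportional to the total sample size, so with tens of thousands of controls the number of permutations would need to be chosen with care.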


2021 ◽  
Author(s):  
David Curtis

Introduction: A number of genes have been identified in which rare variants can cause obesity. Here we analyse a sample of exome sequenced subjects from UK Biobank using BMI as a phenotype.

Methods: There were 199,807 exome sequenced subjects for whom BMI was recorded. Weighted burden analysis of rare, functional variants was carried out, incorporating population principal components and sex as covariates. For selected genes, additional analyses were carried out to clarify the contribution of different categories of variant. Statistical significance was summarised as the signed log10 of the p value (SLP), given a positive sign if the weighted burden score was positively correlated with BMI.

Results: Two genes were exome-wide significant, MC4R (SLP = 15.79) and PCSK1 (SLP = 6.61). In MC4R, disruptive variants were associated with an increase in BMI of 2.72 units and probably damaging nonsynonymous variants with an increase of 2.02 units. In PCSK1, disruptive variants were associated with a BMI increase of 2.29 and protein-altering variants with an increase of 0.34. Results for other genes were not formally significant after correction for multiple testing, although SIRT1, ZBED6 and NPC2 were noted to be of potential interest.

Conclusion: Because the UK Biobank consists of a self-selected sample of relatively healthy volunteers, the effect sizes noted may be underestimates. The results demonstrate the effects of very rare variants on BMI and suggest that other genes and variants will be definitively implicated when the sequence data for additional subjects becomes available. This research has been conducted using the UK Biobank Resource.
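The SLP convention defined above is simple to compute. The sketch below converts a p value and an effect direction into an SLP and checks it against a Bonferroni threshold. The gene count of 20,000 used in the example is a hypothetical round number, since the abstract does not state how many genes were tested; the function names are illustrative.

```python
import math


def slp(p_value: float, direction: float) -> float:
    """Signed log10 p value: -log10(p), positive when the burden score is positively
    associated with the phenotype (direction > 0) and negative otherwise."""
    magnitude = -math.log10(p_value)
    return magnitude if direction > 0 else -magnitude


def is_exome_wide(slp_value: float, n_genes: int, alpha: float = 0.05) -> bool:
    """Bonferroni correction across genes, expressed on the SLP scale."""
    return abs(slp_value) > -math.log10(alpha / n_genes)
```

With 20,000 genes the threshold is an SLP magnitude of about 5.6, which both MC4R (15.79) and PCSK1 (6.61) exceed.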


2019 ◽  
Author(s):  
Hakhamanesh Mostafavi ◽  
Arbel Harpak ◽  
Dalton Conley ◽  
Jonathan K Pritchard ◽  
Molly Przeworski

Fields as diverse as human genetics and sociology are increasingly using polygenic scores based on genome-wide association studies (GWAS) for phenotypic prediction. However, recent work has shown that polygenic scores have limited portability across groups of different genetic ancestries, restricting the contexts in which they can be used reliably and potentially creating serious inequities in future clinical applications. Using the UK Biobank data, we demonstrate that even within a single ancestry group, the prediction accuracy of polygenic scores depends on characteristics such as the age or sex composition of the individuals in which the GWAS and the prediction were conducted, and on the GWAS study design. Our findings highlight both the complexities of interpreting polygenic scores and underappreciated obstacles to their broad use.
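The comparisons above rest on computing a polygenic score and measuring its prediction accuracy within subgroups. The sketch below does this with plain R² (squared Pearson correlation) per group. Published analyses typically use incremental R² over a covariate-only model, and the functions and inputs here are hypothetical.

```python
from statistics import mean


def polygenic_score(dosages, weights):
    """Sum of allele dosages weighted by per-variant GWAS effect sizes."""
    return sum(d * w for d, w in zip(dosages, weights))


def r_squared(x, y):
    """Squared Pearson correlation; both inputs must vary."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def accuracy_by_group(scores, phenotypes, groups):
    """Prediction accuracy (R^2) computed separately within each subgroup label,
    e.g. age bands or sex."""
    result = {}
    for label in set(groups):
        idx = [i for i, g in enumerate(groups) if g == label]
        result[label] = r_squared([scores[i] for i in idx], [phenotypes[i] for i in idx])
    return result
```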


2021 ◽  
Author(s):  
Konrad Karczewski ◽  
Matthew Solomonson ◽  
Katherine R Chao ◽  
Julia K Goodrich ◽  
Grace Tiao ◽  
...  

Genome-wide association studies have successfully discovered thousands of common variants associated with human diseases and traits, but the landscape of rare variation in human disease has not been explored at scale. Exome sequencing studies of population biobanks provide an opportunity to systematically evaluate the impact of rare coding variation across a wide range of phenotypes to discover genes and allelic series relevant to human health and disease. Here, we present results from systematic association analyses of 3,700 phenotypes using single-variant and gene-based tests in 281,850 individuals in the UK Biobank with exome sequence data. We find that the discovery of genetic associations is tightly linked to frequency as well as correlated with metrics of deleteriousness and natural selection. We highlight biological findings elucidated by these data and release the dataset as a public resource alongside a browser framework for rapidly exploring rare variant association results.


Author(s):  
Joseph D. Szustakowski ◽  
Suganthi Balasubramanian ◽  
Ariella Sasson ◽  
Shareef Khalid ◽  
Paola G. Bronson ◽  
...  

The UK Biobank Exome Sequencing Consortium (UKB-ESC) is a unique private/public partnership between the UK Biobank and eight biopharma companies that will sequence the exomes of all ∼500,000 UK Biobank participants. Here we describe early results from the exome sequence data generated by this consortium for the first ∼200,000 UKB subjects and the key features of this project that enabled the UKB-ESC to come together and generate this data.

Exome sequencing data from the first 200,643 UKB enrollees are now accessible to the research community. Approximately 10M variants were observed within the targeted regions, including 8,086,176 SNPs, 370,958 indels and 1,596,984 multi-allelic variants. Of the ∼8.5M SNPs and indels observed, 84.5% are coding variants, including 2,139,318 (25.3%) synonymous, 4,549,694 (53.8%) missense and 453,733 (5.4%) predicted loss-of-function (LOF) variants (initiation codon loss, premature stop codons, stop codon loss, splicing and frameshift variants) affecting at least one coding transcript. This open-access data provides a rich resource of coding variants for rare variant genetic studies and is particularly valuable for drug discovery efforts that use rare, functionally consequential variants.

Over the past decade, the biopharma industry has increasingly leveraged human genetics as part of its drug discovery and development strategies. This shift was motivated by technical advances that enabled cost-effective human genetics research at scale, the emergence of electronic health records and biobanks, and a maturing understanding of how human genetics can increase the probability of successful drug development. Recognizing the need for large-scale human genetics data to drive drug discovery, and the unique value of the open data access policies and contribution terms of the UK Biobank, the UKB-ESC was formed. This precompetitive collaboration has further strengthened the ties between academia and industry and provided teams an unprecedented opportunity to interact with and learn from the wider research community.
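The percentages reported in this abstract appear to use SNPs plus indels (8,457,134 variants) as the denominator; this is an inference from the arithmetic rather than something the abstract states. The check below recomputes them from the reported counts.

```python
# Variant counts as reported in the abstract.
counts = {"synonymous": 2_139_318, "missense": 4_549_694, "predicted LOF": 453_733}
snps, indels, multiallelic = 8_086_176, 370_958, 1_596_984

denominator = snps + indels            # 8,457,134 SNPs plus indels (inferred denominator)
total = snps + indels + multiallelic   # 10,054,118, the "approximately 10M" variants

percentages = {k: round(100 * v / denominator, 1) for k, v in counts.items()}
coding_share = round(100 * sum(counts.values()) / denominator, 1)

print(percentages, coding_share)
# {'synonymous': 25.3, 'missense': 53.8, 'predicted LOF': 5.4} 84.5
```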

