Weighted burden analysis in 200 000 exome-sequenced UK Biobank subjects characterises effects of rare genetic variants on BMI

Mapping Intimacies ◽

10.1101/2021.01.20.21250151 ◽

2021 ◽

Author(s):

David Curtis

Keyword(s):

Multiple Testing ◽

Rare Variants ◽

Sequence Data ◽

Statistical Significance ◽

Positive Sign ◽

P Value ◽

Uk Biobank ◽

Functional Variants ◽

Rare Genetic Variants ◽

The Uk

Introduction: A number of genes have been identified in which rare variants can cause obesity. Here we analyse a sample of exome sequenced subjects from UK Biobank using BMI as a phenotype. Methods: There were 199,807 exome sequenced subjects for whom BMI was recorded. Weighted burden analysis of rare, functional variants was carried out, incorporating population principal components and sex as covariates. For selected genes, additional analyses were carried out to clarify the contribution of different categories of variant. Statistical significance was summarised as the signed log 10 of the p value (SLP), given a positive sign if the weighted burden score was positively correlated with BMI. Results: Two genes were exome-wide significant, MC4R (SLP = 15.79) and PCSK1 (SLP = 6.61). In MC4R, disruptive variants were associated with an increase in BMI of 2.72 units and probably damaging nonsynonymous variants with an increase of 2.02 units. In PCSK1, disruptive variants were associated with a BMI increase of 2.29 and protein-altering variants with an increase of 0.34. Results for other genes were not formally significant after correction for multiple testing, although SIRT1, ZBED6 and NPC2 were noted to be of potential interest. Conclusion: Because the UK Biobank consists of a self-selected sample of relatively healthy volunteers, the effect sizes noted may be underestimates. The results demonstrate the effects of very rare variants on BMI and suggest that other genes and variants will be definitively implicated when the sequence data for additional subjects becomes available. This research has been conducted using the UK Biobank Resource.
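
The SLP statistic described in the Methods can be sketched in a few lines. This is a minimal illustration, assuming the SLP is -log10(p) carrying the sign of the correlation between burden score and phenotype, as the abstract states; the function names are hypothetical and are not taken from the author's software.

```python
import math


def signed_log_p(p_value: float, effect_is_positive: bool) -> float:
    """Return the signed log10 p value (SLP).

    The magnitude is -log10(p); the sign is positive when the weighted
    burden score is positively correlated with the phenotype.
    """
    if not 0.0 < p_value <= 1.0:
        raise ValueError("p_value must lie in (0, 1]")
    magnitude = -math.log10(p_value)
    return magnitude if effect_is_positive else -magnitude


def p_from_slp(slp: float) -> float:
    """Recover the (unsigned) p value implied by an SLP."""
    return 10.0 ** (-abs(slp))
```

Under this reading, the reported MC4R result (SLP = 15.79) corresponds to a p value of roughly 1.6e-16, with the positive sign indicating that a higher burden score goes with higher BMI.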


Analysis of exome-sequenced UK Biobank subjects implicates genes affecting risk of hyperlipidaemia

10.1101/2020.07.09.20150334 ◽

2020 ◽

Author(s):

David Curtis

Keyword(s):

Rare Variants ◽

Sequence Data ◽

Lipid Lowering ◽

P Value ◽

Data Sets ◽

Uk Biobank ◽

Functional Studies ◽

Exome Sequence Data ◽

Rare Genetic Variants ◽

The Uk

Rare genetic variants in LDLR, APOB and PCSK9 are known causes of familial hypercholesterolaemia and it is expected that rare variants in other genes will also have effects on hyperlipidaemia risk although such genes remain to be identified. The UK Biobank consists of a sample of 500,000 volunteers and exome sequence data is available for 50,000 of them. 11,490 of these were classified as hyperlipidaemia cases on the basis of having a relevant diagnosis recorded and/or taking lipid-lowering medication while the remaining 38,463 were treated as controls. Variants in each gene were assigned weights according to rarity and predicted impact and overall weighted burden scores were compared between cases and controls, including population principal components as covariates. One biologically plausible gene, HUWE1, produced statistically significant evidence for association after correction for testing 22,028 genes with a signed log10 p value (SLP) of -6.15, suggesting a protective effect of variants in this gene. Other genes with uncorrected p<0.001 are arguably also of interest, including LDLR (SLP=3.67), RBP2 (SLP=3.14), NPFFR1 (SLP=3.02) and ACOT9 (SLP=-3.19). Gene set analysis indicated that rare variants in genes involved in metabolism and energy can influence hyperlipidaemia risk. Overall, the results provide some leads which might be followed up with functional studies and which could be tested in additional data sets as these become available. This research has been conducted using the UK Biobank Resource.
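
The multiple-testing step can be made concrete with a short sketch. It assumes a Bonferroni correction at a family-wise alpha of 0.05 across the 22,028 genes; the abstract says only that results were corrected for testing 22,028 genes, so both the method and the alpha level are assumptions here, and the helper names are hypothetical.

```python
import math

N_GENES = 22_028   # number of genes tested, from the abstract
ALPHA = 0.05       # assumed family-wise error rate


def bonferroni_slp_threshold(n_tests: int, alpha: float = ALPHA) -> float:
    """Absolute SLP needed to pass a Bonferroni-corrected threshold."""
    return -math.log10(alpha / n_tests)


def is_exome_wide_significant(slp: float, n_tests: int = N_GENES,
                              alpha: float = ALPHA) -> bool:
    """Compare the magnitude of an SLP against the corrected threshold."""
    return abs(slp) > bonferroni_slp_threshold(n_tests, alpha)


# SLP values as reported in the abstract.
reported = {"HUWE1": -6.15, "LDLR": 3.67, "RBP2": 3.14,
            "NPFFR1": 3.02, "ACOT9": -3.19}
significant = {gene: is_exome_wide_significant(slp)
               for gene, slp in reported.items()}
```

With these assumptions the threshold is an absolute SLP of about 5.64, which only HUWE1 exceeds, matching the abstract's statement that it was the one gene significant after correction.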


Analysis of 50,000 exome-sequenced UK Biobank subjects fails to identify genes influencing probability of psychiatric referral

10.1101/2020.07.16.20155267 ◽

2020 ◽

Author(s):

David Curtis

Keyword(s):

Multiple Testing ◽

Rare Variants ◽

Sequence Data ◽

Statistical Significance ◽

Major Effect ◽

Self Report ◽

Uk Biobank ◽

Functional Variants ◽

Exome Sequence Data ◽

Exome Sequence

Background: Depression is moderately heritable but there is no common genetic variant which has a major effect on susceptibility. It is possible that some very rare variants could have substantial effect sizes and these could be identified from exome sequence data. Methods: Data from 50,000 exome-sequenced UK Biobank participants was analysed. Subjects were treated as cases if they had reported having seen a psychiatrist for "nerves, anxiety, tension or depression". Gene-wise weighted burden analysis was performed to see if there were any genes or sets of genes for which there was an excess of rare, functional variants in cases. Results: There were 5,872 cases and 43,862 controls. There were 22,028 informative genes but none produced a statistically significant result after correction for multiple testing. Of the 25 genes individually significant at p<0.001 none appeared to be a biologically plausible candidate. No set of genes achieved statistical significance after correction for multiple testing and those with the lowest p values again did not appear to be biologically plausible candidates. Limitations: The phenotype is based on self-report and the cases are likely to be somewhat heterogeneous. The number of cases is on the low side for a study of exome sequence data. Conclusions: The results conform exactly with the expectation under the null hypothesis. It seems unlikely that depression genetics research will produce findings that might have a substantial clinical impact until far larger samples become available.


Analysis of 200 000 exome-sequenced UK Biobank subjects illustrates the contribution of rare genetic variants to hyperlipidaemia

Journal of Medical Genetics ◽

10.1136/jmedgenet-2021-107752 ◽

2021 ◽

pp. jmedgenet-2021-107752

Author(s):

David Curtis

Keyword(s):

X Chromosome ◽

Principal Components ◽

Rare Variants ◽

P Value ◽

Uk Biobank ◽

Lipid Levels ◽

Large Samples ◽

Rare Genetic Variants ◽

The Uk ◽

Strength Of Association

Background: A few genes have previously been identified in which very rare variants can have major effects on lipid levels. Methods: Weighted burden analysis of rare variants was applied to exome sequenced UK Biobank subjects with hyperlipidaemia as the phenotype, of whom 44 054 were designated cases and 156 578 controls, with the strength of association characterised by the signed log 10 p value (SLP). Results: With principal components included as covariates there was a tendency for genes on the X chromosome to produce strongly negative SLPs, and this was found to be due to the fact that rare X chromosome variants were identified less frequently in men than women. The test performed well when both principal components and sex were included as covariates and strongly implicated LDLR (SLP=50.08) and PCSK9 (SLP=−10.42) while also highlighting other genes previously found to be associated with lipid levels. Variants classified by SIFT as deleterious have on average a twofold effect and their cumulative frequency is such that they are present in approximately 1.5% of the population. Conclusion: These analyses shed further light on the way that genetic variation contributes to risk of hyperlipidaemia and in particular that there are very many protein-altering variants which have on average moderate effects and whose effects can be detected when large samples of exome-sequenced subjects are available. This research has been conducted using the UK Biobank Resource.


Investigation of association of rare, functional genetic variants with heavy drinking and problem drinking in exome sequenced UK Biobank participants

10.1101/2021.02.04.21251145 ◽

2021 ◽

Author(s):

David Curtis

Keyword(s):

Genetic Variants ◽

Multiple Testing ◽

Sequence Data ◽

Problem Drinking ◽

Heavy Drinking ◽

Uk Biobank ◽

Exome Sequence Data ◽

The Uk ◽

Or Genes ◽

Mental Health Questionnaire

Aims: The study aimed to identify specific genes and functional genetic variants affecting susceptibility to two alcohol related phenotypes: heavy drinking and problem drinking. Methods: Phenotypic and exome sequence data was downloaded from the UK Biobank. Reported drinks in the last 24 hours was used to define heavy drinking while responses to a mental health questionnaire defined problem drinking. Gene-wise weighted burden analysis was applied, with genetic variants which were rarer and/or had a more severe functional effect being weighted more highly. Additionally, previously reported variants of interest were analysed individually. Results: Of exome sequenced subjects, for heavy drinking there were 8,166 cases and 84,461 controls while for problem drinking there were 7,811 cases and 59,606 controls. No gene was formally significant after correction for multiple testing but three genes possibly related to autism were significant at p < 0.001, FOXP1, ARHGAP33 and CDH9, along with VGF which may also be of psychiatric interest. Well established associations with rs1229984 in ADH1B and rs671 in ALDH2 were confirmed but previously reported variants in ALDH1B1 and GRM3 were not associated with either phenotype. Conclusions: This large study fails to conclusively implicate any novel genes or variants. It is possible that more definitive results will be obtained when sequence data for the remaining UK Biobank participants becomes available and/or if data can be obtained for a more extreme phenotype such as alcohol dependence disorder. This research has been conducted using the UK Biobank Resource. Short summary: Tests for association of rare, functional genetic variants with heavy drinking and problem drinking confirm the known effects of variants in ADH1B and ALDH2 but fail to implicate novel variants or genes. Results for three genes potentially related to autism suggest they might exert a protective effect.


Variants in ACE2 and TMPRSS2 genes are not major determinants of COVID-19 severity in UK Biobank subjects

10.1101/2020.05.01.20085860 ◽

2020 ◽

Cited By ~ 4

Author(s):

David Curtis

Keyword(s):

Rare Variants ◽

Sequence Data ◽

Severe Disease ◽

Major Effect ◽

Uk Biobank ◽

Exome Sequence Data ◽

Dna Sequence Variants ◽

The Uk ◽

And Function ◽

Exome Sequence

It is plausible that variants in the ACE2 and TMPRSS2 genes might contribute to variation in COVID-19 severity and that these could explain why some people become very unwell whereas most do not. Exome sequence data was obtained for 49,953 UK Biobank subjects of whom 74 had tested positive for SARS-CoV-2 and could be presumed to have severe disease. A weighted burden analysis was carried out using SCOREASSOC to determine whether there were differences between these cases and the other sequenced subjects in the overall burden of rare, damaging variants in ACE2 or TMPRSS2. There were no statistically significant differences in weighted burden scores between cases and controls for either gene. There were no individual DNA sequence variants with a markedly different frequency between cases and controls. Whether there are small effects on severity, or whether there might be rare variants with major effect sizes, would require studies in much larger samples. Genetic variants affecting the structure and function of the ACE2 and TMPRSS2 proteins are not a major determinant of whether infection with SARS-CoV-2 results in severe symptoms. This research has been conducted using the UK Biobank Resource.


Multiple linear regression allows weighted burden analysis of rare coding variants in an ethnically heterogeneous population

10.1101/2020.06.11.145938 ◽

2020 ◽

Cited By ~ 1

Author(s):

David Curtis

Keyword(s):

Linear Regression ◽

Principal Components ◽

Rare Variants ◽

Linear Regression Analysis ◽

Uk Biobank ◽

Case Control Studies ◽

Test Statistic ◽

Functional Variants ◽

The Uk ◽

Coding Variants

Weighted burden analysis has been used in exome-sequenced case-control studies to identify genes in which there is an excess of rare and/or functional variants associated with phenotype. Implementation in a ridge regression framework allows simultaneous analysis of all variants along with relevant covariates such as population principal components. In order to apply the approach to a quantitative phenotype, a weighted burden score is derived for each subject and included in a linear regression analysis. The weighting scheme is adjusted in order to apply differential weights to rare and very rare variants and a score is derived based on both the frequency and predicted effect of each variant. When applied to an ethnically heterogeneous dataset consisting of 49,790 exome-sequenced UK Biobank subjects and using BMI as the phenotype, the method produces a very inflated test statistic. However, this is almost completely corrected by including 20 population principal components as covariates. When this is done, the top 30 genes include a few which are quite plausibly associated with the phenotype, including LYPLAL1 and NSDHL. This approach offers a way to carry out gene-based analyses of rare variants identified by exome sequencing in heterogeneous datasets without requiring that data from ethnic minority subjects be discarded. This research has been conducted using the UK Biobank Resource.
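
The idea of a per-subject weighted burden score can be illustrated with a small sketch. The abstract does not give the actual weighting scheme, so the step function for frequency weights, the thresholds and the effect weights below are all hypothetical placeholders chosen only to show the shape of the calculation: each variant's weight combines a frequency component (larger for rare and very rare variants) with a predicted-effect component, and a subject's score is the weighted sum over the alleles they carry.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Variant:
    maf: float            # minor allele frequency observed in the sample
    effect_weight: float  # hypothetical weight for predicted functional impact


def frequency_weight(maf: float, rare: float = 0.01,
                     very_rare: float = 0.001) -> float:
    """Hypothetical step scheme: rarer variants receive larger weights."""
    if maf < very_rare:
        return 10.0
    if maf < rare:
        return 5.0
    return 1.0


def burden_score(variants: Sequence[Variant],
                 allele_counts: Sequence[int]) -> float:
    """Weighted sum of the alleles one subject carries in one gene."""
    if len(variants) != len(allele_counts):
        raise ValueError("one allele count is required per variant")
    return sum(frequency_weight(v.maf) * v.effect_weight * count
               for v, count in zip(variants, allele_counts))
```

In the approach the abstract describes, a score of this kind would then enter a linear regression on the quantitative phenotype alongside the population principal components as covariates.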


Variants in ACE2 and TMPRSS2 Genes Are Not Major Determinants of COVID-19 Severity in UK Biobank Subjects

Human Heredity ◽

10.1159/000515200 ◽

2021 ◽

pp. 1-3

Author(s):

David Curtis

Keyword(s):

Rare Variants ◽

Sequence Data ◽

Severe Disease ◽

Major Effect ◽

Uk Biobank ◽

Exome Sequence Data ◽

Dna Sequence Variants ◽

The Uk ◽

And Function ◽

Exome Sequence

It is plausible that variants in the ACE2 and TMPRSS2 genes might contribute to variation in COVID-19 severity and that these could explain why some people become very unwell whereas most do not. Exome sequence data was obtained for 49,953 UK Biobank subjects, of whom 82 had tested positive for SARS-CoV-2 and could be presumed to have severe disease. A weighted burden analysis was carried out using SCOREASSOC to determine whether there were differences between these cases and the other sequenced subjects in the overall burden of rare, damaging variants in ACE2 or TMPRSS2. There were no statistically significant differences in weighted burden scores between cases and controls for either gene. There were no individual DNA sequence variants with a markedly different frequency between cases and controls. Whether there are small effects on severity, or whether there might be rare variants with major effect sizes, would require studies in much larger samples. Genetic variants affecting the structure and function of the ACE2 and TMPRSS2 proteins are not the main explanation for why some people develop severe symptoms in response to infection with SARS-CoV-2. This research was conducted using the UK Biobank Resource.


Analysis of 200,000 exome-sequenced UK Biobank subjects illustrates the contribution of rare genetic variants to hyperlipidaemia

10.1101/2021.01.05.20249090 ◽

2021 ◽

Author(s):

David Curtis

Keyword(s):

X Chromosome ◽

Principal Components ◽

Rare Variants ◽

P Value ◽

Uk Biobank ◽

Loss Of Function ◽

Lipid Levels ◽

Large Samples ◽

Rare Genetic Variants ◽

Strength Of Association

A few genes have previously been identified in which very rare variants can have major effects on lipid levels. Weighted burden analysis of rare variants was applied to 200,000 exome sequenced UK Biobank subjects with hyperlipidaemia as the phenotype, with the strength of association characterised by the signed log 10 p value (SLP). With principal components included as covariates there was a tendency for genes on the X chromosome to produce strongly negative SLPs, and this was found to be due to the fact that rare X chromosome variants were identified less frequently in males than females. The test performed well when both principal components and sex were included as covariates and strongly implicated LDLR (SLP = 50.08) and PCSK9 (SLP = −10.42) while also highlighting other genes previously found to be associated with lipid levels. Category-specific analyses of variants in these two genes revealed that, while there were loss of function variants with major effects, there were much larger numbers of protein-altering variants which had moderate effects on risk of hyperlipidaemia. Variants classified by SIFT as deleterious have on average a two-fold effect and their cumulative frequency is such that they are present in approximately 1.5% of the population. There was no evidence for association of HUWE1, which had produced statistically significant results in an earlier analysis of a subset of 50,000 exomes, and with hindsight this result had been caused by the excess of X gene variants in females. These analyses shed further light on the way that genetic variation contributes to risk of hyperlipidaemia and in particular that there are very many protein-altering variants which have on average moderate effects and whose effects can be detected when large samples of exome-sequenced subjects are available.


Intake of B vitamins in UK dwelling South Asian and White Caucasian women: Results from the D-FINES study

Proceedings of The Nutrition Society ◽

10.1017/s0029665120004127 ◽

2020 ◽

Vol 79 (OCE2) ◽

Author(s):

Andrea Darling ◽

Kourosh Ahmadi ◽

Susan Lanham-New

Keyword(s):

South Asian ◽

Multiple Testing ◽

Statistical Significance ◽

B Vitamins ◽

P Value ◽

Vitamin Intake ◽

Food Standards ◽

Caucasian Women ◽

The Uk ◽

B Vitamin

Adequate intakes of the B vitamins are essential for health; however there is a lack of data concerning B vitamin intakes in UK dwelling South Asian (SA) groups. We aimed to investigate whether UK SA women meet the LRNI for B vitamins, and whether their intake differs from same-age White Caucasian (WC) women. We used summer 2006 dietary intake data from the Food Standards Agency (FSA) funded D-FINES study (Vitamin D, Food Intake, Nutrition and Exposure to Sunlight in Southern England, project N05064). After removal of over- and under-reporters (energy: BMR ratio < 1 or > 1.6) there were n = 29 SA and n = 146 WC subjects. The two groups did not differ significantly in age and BMI. Overall mean (SD) for age was 50.6 (13.6) years and for BMI was 26.8 (4.8). In SA, 41% were Bangladeshi or Pakistani, 28% were Indian and 31% were of other ethnicity. Independent T-tests, using log transformed data, showed no statistically significant differences for any B vitamin (Bonferroni revised p value: < 0.008). Results were as follows, giving median (IQR): Thiamine (mg) 1.5 (0.5) SA vs 1.4 (0.5) WC (P = 0.8); Riboflavin (mg) 1.3 (0.5) SA vs 1.5 (0.6) WC (P = 0.08); Niacin (mg) 30.7 (13.7) SA vs 33.3 (9.8) WC (P = 0.4); B6 (mg) 1.7 (0.5) SA vs 1.9 (0.7) WC (P = 0.2); B12 (micrograms) 2.8 (0.05) SA vs 3.6 (2.5) WC (P = 0.02); Folate (micrograms) 213 (93) SA vs 231 (82) WC (P = 0.8). In terms of percentages below the LRNI: Thiamine 0% SA and 0.7% (n = 1) WC; Riboflavin 0% SA and 1.4% (n = 2) WC; B12 10% (n = 3) SA and 0% WC. For Niacin, B6 and Folate no women in either group were below the LRNI. Overall, there were no differences in B vitamin intake by ethnicity. There was a trend for a slightly lower B12 intake in SA but this did not reach statistical significance after adjustment for multiple testing. It is of concern that 10% of SA did not meet the LRNI for B12. Of this 10%, the majority were not vegetarian or vegan. The sample size for SA was very small and further research is now required in a larger sample to confirm this finding. The D-FINES study was funded by the UK FSA (N05064). The views expressed are those of the authors alone.
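
The "Bonferroni revised p value" quoted above is consistent with dividing a conventional alpha of 0.05 by the six vitamins tested (0.05 / 6 ≈ 0.0083). The abstract does not spell out this arithmetic, so the alpha level is an assumption. The sketch below applies that threshold to the P values as reported.

```python
ALPHA = 0.05  # assumed nominal significance level

# P values for the SA vs WC comparison, as reported in the abstract.
reported_p = {
    "Thiamine": 0.8,
    "Riboflavin": 0.08,
    "Niacin": 0.4,
    "B6": 0.2,
    "B12": 0.02,
    "Folate": 0.8,
}

# Bonferroni: divide alpha by the number of comparisons made.
threshold = ALPHA / len(reported_p)

nominally_significant = [v for v, p in reported_p.items() if p < ALPHA]
significant_after_correction = [v for v, p in reported_p.items()
                                if p < threshold]
```

On these figures only B12 falls below the uncorrected 0.05 level and nothing falls below the corrected threshold, which matches the abstract's description of a B12 trend that did not survive adjustment for multiple testing.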


Multiple Linear Regression Allows Weighted Burden Analysis of Rare Coding Variants in an Ethnically Heterogeneous Population

Human Heredity ◽

10.1159/000512576 ◽

2021 ◽

pp. 1-10

Author(s):

David Curtis

Keyword(s):

Linear Regression ◽

Principal Components ◽

Rare Variants ◽

Linear Regression Analysis ◽

Uk Biobank ◽

Case Control Studies ◽

Test Statistic ◽

Functional Variants ◽

The Uk ◽

Coding Variants

Weighted burden analysis has been used in exome-sequenced case-control studies to identify genes in which there is an excess of rare and/or functional variants associated with phenotype. Implementation in a ridge regression framework allows simultaneous analysis of all variants along with relevant covariates, such as population principal components. In order to apply the approach to a quantitative phenotype, a weighted burden score is derived for each subject and included in a linear regression analysis. The weighting scheme is adjusted in order to apply differential weights to rare and very rare variants and a score is derived based on both the frequency and predicted effect of each variant. When applied to an ethnically heterogeneous dataset consisting of 49,790 exome-sequenced UK Biobank subjects and using body mass index as the phenotype, the method produces a very inflated test statistic. However, this is almost completely corrected by including 20 population principal components as covariates. When this is done, the top 30 genes include a few which are quite plausibly associated with the phenotype, including LYPLAL1 and NSDHL. This approach offers a way to carry out gene-based analyses of rare variants identified by exome sequencing in heterogeneous datasets without requiring that data from ethnic minority subjects be discarded. This research has been conducted using the UK Biobank Resource.
