Multiple linear regression allows weighted burden analysis of rare coding variants in an ethnically heterogeneous population

Author(s):  
David Curtis

Abstract: Weighted burden analysis has been used in exome-sequenced case-control studies to identify genes in which there is an excess of rare and/or functional variants associated with phenotype. Implementation in a ridge regression framework allows simultaneous analysis of all variants along with relevant covariates such as population principal components. In order to apply the approach to a quantitative phenotype, a weighted burden score is derived for each subject and included in a linear regression analysis. The weighting scheme is adjusted in order to apply differential weights to rare and very rare variants and a score is derived based on both the frequency and predicted effect of each variant. When applied to an ethnically heterogeneous dataset consisting of 49,790 exome-sequenced UK Biobank subjects and using BMI as the phenotype, the method produces a very inflated test statistic. However, this is almost completely corrected by including 20 population principal components as covariates. When this is done, the top 30 genes include a few which are quite plausibly associated with the phenotype, including LYPLAL1 and NSDHL. This approach offers a way to carry out gene-based analyses of rare variants identified by exome sequencing in heterogeneous datasets without requiring that data from ethnic minority subjects be discarded. This research has been conducted using the UK Biobank Resource.
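The procedure described in the abstract (a per-subject weighted burden score entered into a linear regression alongside principal components) can be sketched roughly as follows. This is a minimal illustration, not the author's software: the function names, the idea of multiplying a frequency weight by an effect weight, and the use of ordinary least squares via NumPy are all assumptions made for the example.

```python
import numpy as np

def burden_scores(genotypes, freq_weights, effect_weights):
    """Per-subject score: sum over variants of allele count x combined weight.

    genotypes: (subjects x variants) allele counts.
    The combined weight (frequency weight x effect weight) is an assumed scheme.
    """
    weights = np.asarray(freq_weights, dtype=float) * np.asarray(effect_weights, dtype=float)
    return np.asarray(genotypes, dtype=float) @ weights

def score_t_statistic(phenotype, scores, covariates):
    """Regress phenotype on intercept + burden score + covariates (e.g. PCs).

    Returns the t statistic of the burden-score coefficient.
    """
    y = np.asarray(phenotype, dtype=float)
    n = y.shape[0]
    X = np.column_stack([np.ones(n), scores, covariates])
    beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = n - X.shape[1]                      # residual degrees of freedom
    sigma2 = resid @ resid / dof              # residual variance estimate
    cov_beta = sigma2 * np.linalg.inv(X.T @ X)
    return beta[1] / np.sqrt(cov_beta[1, 1])  # column 1 is the burden score
```

In this sketch the principal components are passed as `covariates`, so any phenotype variation they explain is not attributed to the burden score.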

2021 ◽  
pp. 1-10
Author(s):  
David Curtis

Weighted burden analysis has been used in exome-sequenced case-control studies to identify genes in which there is an excess of rare and/or functional variants associated with phenotype. Implementation in a ridge regression framework allows simultaneous analysis of all variants along with relevant covariates, such as population principal components. In order to apply the approach to a quantitative phenotype, a weighted burden score is derived for each subject and included in a linear regression analysis. The weighting scheme is adjusted in order to apply differential weights to rare and very rare variants and a score is derived based on both the frequency and predicted effect of each variant. When applied to an ethnically heterogeneous dataset consisting of 49,790 exome-sequenced UK Biobank subjects and using body mass index as the phenotype, the method produces a very inflated test statistic. However, this is almost completely corrected by including 20 population principal components as covariates. When this is done, the top 30 genes include a few which are quite plausibly associated with the phenotype, including <i>LYPLAL1</i> and <i>NSDHL</i>. This approach offers a way to carry out gene-based analyses of rare variants identified by exome sequencing in heterogeneous datasets without requiring that data from ethnic minority subjects be discarded. This research has been conducted using the UK Biobank Resource.


2021 ◽  
Author(s):  
Abhishek Nag ◽  
Lawrence Middleton ◽  
Ryan S Dhindsa ◽  
Dimitrios Vitsios ◽  
Eleanor M Wigmore ◽  
...  

Genome-wide association studies have established the contribution of common and low frequency variants to metabolic biomarkers in the UK Biobank (UKB); however, the role of rare variants remains to be assessed systematically. We evaluated rare coding variants for 198 metabolic biomarkers, including metabolites assayed by Nightingale Health, using exome sequencing in participants from four genetically diverse ancestries in the UKB (N=412,394). Gene-level collapsing analysis, which evaluated a range of genetic architectures, identified a total of 1,303 significant relationships between genes and metabolic biomarkers (p<1×10⁻⁸), encompassing 207 distinct genes. These include associations between rare non-synonymous variants in GIGYF1 and glucose and lipid biomarkers, SYT7 and creatinine, and others, which may provide insights into novel disease biology. Compared with a previous microarray-based genotyping study in the same cohort, 40% of gene-biomarker relationships identified in the collapsing analysis were novel. Finally, we applied Gene-SCOUT, a novel tool that utilises the gene-biomarker association statistics from the collapsing analysis to identify genes having similar biomarker fingerprints and thus expand our understanding of gene networks.


2021 ◽  
Author(s):  
David Curtis

Abstract. Introduction: A number of genes have been identified in which rare variants can cause obesity. Here we analyse a sample of exome sequenced subjects from UK Biobank using BMI as a phenotype. Methods: There were 199,807 exome sequenced subjects for whom BMI was recorded. Weighted burden analysis of rare, functional variants was carried out, incorporating population principal components and sex as covariates. For selected genes, additional analyses were carried out to clarify the contribution of different categories of variant. Statistical significance was summarised as the signed log 10 of the p value (SLP), given a positive sign if the weighted burden score was positively correlated with BMI. Results: Two genes were exome-wide significant, MC4R (SLP = 15.79) and PCSK1 (SLP = 6.61). In MC4R, disruptive variants were associated with an increase in BMI of 2.72 units and probably damaging nonsynonymous variants with an increase of 2.02 units. In PCSK1, disruptive variants were associated with a BMI increase of 2.29 and protein-altering variants with an increase of 0.34. Results for other genes were not formally significant after correction for multiple testing, although SIRT1, ZBED6 and NPC2 were noted to be of potential interest. Conclusion: Because the UK Biobank consists of a self-selected sample of relatively healthy volunteers, the effect sizes noted may be underestimates. The results demonstrate the effects of very rare variants on BMI and suggest that other genes and variants will be definitively implicated when the sequence data for additional subjects becomes available. This research has been conducted using the UK Biobank Resource.
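The SLP summary used above can be illustrated with a short sketch. The function names are hypothetical, and the only thing taken from the abstract is the definition: the base-10 log of the p value, reported as a positive magnitude and given a positive sign when the burden score is positively correlated with the phenotype.

```python
import math

def signed_log_p(p_value, effect_direction):
    """Return -log10(p), negated when the effect direction is negative."""
    if not 0.0 < p_value <= 1.0:
        raise ValueError("p_value must lie in (0, 1]")
    magnitude = -math.log10(p_value)
    return magnitude if effect_direction >= 0 else -magnitude

def p_from_slp(slp):
    """Recover the (unsigned) p value from an SLP."""
    return 10.0 ** (-abs(slp))
```

Under this definition, an SLP of 15.79 corresponds to a p value of about 1.6 × 10⁻¹⁶.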


2020 ◽  
Vol 41 (Supplement_2) ◽  
Author(s):  
O.B Vad ◽  
C Paludan-Muller ◽  
G Ahlberg ◽  
L Andreasen ◽  
L Refsgaard ◽  
...  

Abstract. Background: Atrial fibrillation (AF) is the most common cardiac arrhythmia, and it is associated with serious complications, including an increased risk of stroke, heart failure, and death. It affects around 5% of the population above 65 years of age, and it is estimated that 2% of healthcare expenses are related to AF. The causes of AF are complex and include structural heart disease, hypertension, diabetes and genetic risk factors. To date, 166 unique genetic loci have been identified as associated with AF. While AF has traditionally been regarded as an electrical disease, structural genes, including the sarcomere gene titin (TTN), have been associated with the disease. Recently, a large genome-wide association study associated common variants in the gene MYH6 with AF. The gene encodes the protein alpha myosin heavy chain, and has previously been associated with sick-sinus syndrome and structural heart disease. Purpose: We hypothesized that genetic variants in the sarcomere gene MYH6 were more prevalent in AF patients than in non-AF patients, supporting that this gene is important for the development of AF. Methods: We analysed publicly available data from the UK Biobank, combining exome-sequencing data and health-related information on 45,596 participants. Using next-generation sequencing, we then examined the genetic variation in MYH6 in a cohort of 383 Danish, early-onset AF patients. The patients had onset of AF before age 40, had a normal echocardiogram, and had no other cardiovascular disease at onset of AF. Genetic variants were filtered by minor allele frequency (MAF) in the Genome Aggregation Database (GnomAD), and only rare variants with MAF<1% were included. We then predicted the potential deleteriousness of the variants using the combined annotation dependent depletion (CADD) score. Results: We found rare coding variants in MYH6 to be significantly associated with AF in exome-sequencing data on 45,596 participants from the UK Biobank (p=0.038). In our cohort of 383 Danish, early-onset AF patients with no other cardiovascular disease, we identified 12 rare, missense variants in MYH6. Of these variants, three were novel, and 11 had CADD scores >20, suggesting them to be in the top 1% of likely deleterious variants. Conclusion: We identified rare genetic variants in MYH6 to be significantly associated with AF in a large population-based cohort. We also identified 12 rare coding variants in a highly selected cohort of early-onset AF patients. Most of these variants were predicted to be deleterious. Our results indicate that rare variants in MYH6 may increase susceptibility to AF, thus elaborating on the understanding of the pathophysiological mechanisms of AF, and the role of structural genes in the development of AF. Funding Acknowledgement: Type of funding source: Foundation. Main funding source(s): Novo Nordisk Foundation Pre-Graduate Scholarships
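The two filters described in the Methods (MAF below 1%, then a CADD score above 20) could be sketched as below. The `Variant` record and its field names are hypothetical, chosen only for illustration. The last helper relies on CADD scaled scores being PHRED-like, so a score of 20 corresponds to the top 1% of ranked variants.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Variant:
    name: str          # hypothetical identifier for the example
    maf: float         # minor allele frequency in a reference database
    cadd_phred: float  # PHRED-scaled CADD score

def rare_variants(variants, maf_cutoff=0.01):
    """Keep variants whose MAF is strictly below the cutoff (1% by default)."""
    return [v for v in variants if v.maf < maf_cutoff]

def likely_deleterious(variants, cadd_cutoff=20.0):
    """Keep variants whose CADD score is strictly above the cutoff."""
    return [v for v in variants if v.cadd_phred > cadd_cutoff]

def cadd_top_fraction(phred):
    """Fraction of ranked variants at or above a PHRED-scaled score."""
    return 10.0 ** (-phred / 10.0)
```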


2021 ◽  
pp. jmedgenet-2021-107752
Author(s):  
David Curtis

Background: A few genes have previously been identified in which very rare variants can have major effects on lipid levels. Methods: Weighted burden analysis of rare variants was applied to exome sequenced UK Biobank subjects with hyperlipidaemia as the phenotype, of whom 44 054 were designated cases and 156 578 controls, with the strength of association characterised by the signed log 10 p value (SLP). Results: With principal components included as covariates there was a tendency for genes on the X chromosome to produce strongly negative SLPs, and this was found to be due to the fact that rare X chromosome variants were identified less frequently in men than women. The test performed well when both principal components and sex were included as covariates and strongly implicated LDLR (SLP=50.08) and PCSK9 (SLP=−10.42) while also highlighting other genes previously found to be associated with lipid levels. Variants classified by SIFT as deleterious have on average a twofold effect and their cumulative frequency is such that they are present in approximately 1.5% of the population. Conclusion: These analyses shed further light on the way that genetic variation contributes to risk of hyperlipidaemia and in particular that there are very many protein-altering variants which have on average moderate effects and whose effects can be detected when large samples of exome-sequenced subjects are available. This research has been conducted using the UK Biobank Resource.


2019 ◽  
Vol 6 (2) ◽  
pp. 197
Author(s):  
Sri Winarsih ◽  
Ahmad Alim Bachri ◽  
Akhid Yulianto

The multiple linear regression analysis in this study produces a constant of 0.354, meaning that in the absence of work motivation (x1) and job satisfaction (x2), employee performance equals 0.354. The regression coefficient for work motivation (x1), 0.396, means that each additional point of work motivation increases employee performance by 0.396, assuming job satisfaction (x2) is held fixed. The regression coefficient for job satisfaction (x2), 0.688, means that each additional point of job satisfaction increases employee performance by 0.688, assuming work motivation (x1) is held fixed. The simultaneous significance test (F statistic) gives a calculated F value of 78 145, so work motivation (x1) and job satisfaction (x2) jointly affect the performance of employees at Bank Syariah Kandangan, South Kalimantan (Kalsel). The t-test indicates that motivation has a significant effect on the performance of employees at Bank Syariah Kandangan Kalsel: the null hypothesis (H0) is rejected and Ha is accepted, so this hypothesis is supported empirically.


2020 ◽  
Author(s):  
David Curtis

Rare genetic variants in LDLR, APOB and PCSK9 are known causes of familial hypercholesterolaemia and it is expected that rare variants in other genes will also have effects on hyperlipidaemia risk although such genes remain to be identified. The UK Biobank consists of a sample of 500,000 volunteers and exome sequence data is available for 50,000 of them. 11,490 of these were classified as hyperlipidaemia cases on the basis of having a relevant diagnosis recorded and/or taking lipid-lowering medication while the remaining 38,463 were treated as controls. Variants in each gene were assigned weights according to rarity and predicted impact and overall weighted burden scores were compared between cases and controls, including population principal components as covariates. One biologically plausible gene, HUWE1, produced statistically significant evidence for association after correction for testing 22,028 genes with a signed log10 p value (SLP) of -6.15, suggesting a protective effect of variants in this gene. Other genes with uncorrected p<0.001 are arguably also of interest, including LDLR (SLP=3.67), RBP2 (SLP=3.14), NPFFR1 (SLP=3.02) and ACOT9 (SLP=-3.19). Gene set analysis indicated that rare variants in genes involved in metabolism and energy can influence hyperlipidaemia risk. Overall, the results provide some leads which might be followed up with functional studies and which could be tested in additional data sets as these become available. This research has been conducted using the UK Biobank Resource.
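The correction for testing 22,028 genes can be made concrete with a small worked example. The abstract does not state which correction was used, so the Bonferroni adjustment at a family-wise alpha of 0.05 below is an assumption, as are the function names.

```python
import math

def bonferroni_slp_threshold(n_tests, alpha=0.05):
    """Magnitude an SLP must exceed to pass a Bonferroni-corrected alpha."""
    return -math.log10(alpha / n_tests)

def is_exome_wide_significant(slp, n_tests, alpha=0.05):
    """Compare the absolute SLP with the corrected threshold."""
    return abs(slp) > bonferroni_slp_threshold(n_tests, alpha)
```

With 22,028 tests this threshold is about 5.64, so under these assumptions an SLP of -6.15 passes while an SLP of 3.67 does not, which matches the distinction the abstract draws between HUWE1 and the genes reported at uncorrected p<0.001.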


2020 ◽  
Author(s):  
Roni Rasnic ◽  
Nathan Linial ◽  
Michal Linial

Abstract: It is estimated that up to 10% of cancer incidents are attributed to inherited genetic alterations. Despite extensive research, there are still gaps in our understanding of genetic predisposition to cancer. It was theorized that ultra-rare variants partially account for the missing heritable component. We harness the UK BioBank dataset of ∼500,000 individuals, 14% of whom were diagnosed with cancer, to detect ultra-rare, possibly high-penetrance cancer predisposition variants. We report on 115 cancer-exclusive ultra-rare variations (CUVs) and nominate 26 variants with additional independent evidence as cancer predisposition variants. We conclude that population cohorts are a valuable source for expanding the collection of novel cancer predisposition genes.


2019 ◽  
Author(s):  
Christopher DeBoever ◽  
AJ Venkatakrishnan ◽  
Joseph M Paggi ◽  
Franziska M. Heydenreich ◽  
Suli-Anne Laurin ◽  
...  

Abstract: G protein-coupled receptors (GPCRs) drive an array of critical physiological functions and are an important class of drug targets, though a map of which GPCR genetic variants are associated with phenotypic variation is lacking. We performed a phenome-wide association analysis for 269 common protein-altering variants in 156 GPCRs and 275 phenotypes, including disease outcomes and diverse quantitative measurements, using 337,205 UK Biobank participants and identified 138 associations. We discovered novel associations between GPCR variants and migraine risk, hypothyroidism, and dietary consumption. We also demonstrated experimentally that variants in the β2 adrenergic receptor (ADRB2) associated with immune cell counts and pulmonary function and variants in the gastric inhibitory polypeptide receptor (GIPR) associated with food intake and body size affect downstream signaling pathways. Overall, this study provides a map of genetic associations for GPCR coding variants across a wide variety of phenotypes, which can inform future drug discovery efforts targeting GPCRs.


2020 ◽  
Author(s):  
Hai Yang ◽  
Rui Chen ◽  
Quan Wang ◽  
Qiang Wei ◽  
Ying Ji ◽  
...  

Abstract: Analysis of whole-genome sequencing (WGS) for genetics of disease is still a challenge due to the lack of accurate functional annotation of noncoding variants, especially the rare ones. As eQTLs have been extensively implicated in genetics of human diseases, we hypothesize that noncoding rare variants discovered in WGS play a regulatory role in predisposing disease risk. With thousands of tissue- and cell type-specific epigenomic features, we propose TVAR, a multi-label learning based deep neural network that predicts the functionality of noncoding variants in the genome based on eQTLs across 49 human tissues in GTEx. TVAR learns the relationships between high-dimensional epigenomics and eQTLs across tissues, taking the correlation among tissues into account to learn shared and tissue-specific eQTL effects. As a result, TVAR outputs tissue-specific annotations, with an average of 0.77 across these tissues. We evaluate TVAR’s performance on four complex diseases (coronary artery disease, breast cancer, Type 2 diabetes, and Schizophrenia), using TVAR’s tissue-specific annotations, and observe its superior performance in predicting functional variants for both common and rare variants, compared to five existing state-of-the-art tools. We further evaluate TVAR’s G-score, a scoring scheme across all tissues, on ClinVar, fine-mapped GWAS loci, and Massively Parallel Reporter Assay (MPRA) validated variants, and observe consistently better performance of TVAR compared to other competing tools.

