UKB.COVID19: an R package for UK Biobank COVID-19 data processing and analysis

F1000Research ◽  
2021 ◽  
Vol 10 ◽  
pp. 830
Author(s):  
Longfei Wang ◽  
Victoria E Jackson ◽  
Liam G Fearnley ◽  
Melanie Bahlo

COVID-19, caused by SARS-CoV-2, has resulted in a global pandemic with a rapidly developing global health and economic crisis. Variations in the disease have been observed and have been associated with the genomic sequence of either the human host or the pathogen. Scientists worldwide initially scrambled to recruit patient cohorts to identify risk factors. A resource that presented itself early on was the UK Biobank (UKBB), which is investigating the respective contributions of genetic predisposition and environmental exposure to the development of disease. To enable COVID-19 studies, UKBB is now receiving COVID-19 test data for its participants every two weeks. In addition, UKBB is delivering more frequent updates of death and hospital inpatient data (including critical care admissions) on the UKBB Data Portal. This frequently changing dataset requires a tool that can rapidly process and analyse up-to-date data. We developed an R package specifically for the UKBB COVID-19 data, which summarises COVID-19 test results, performs association tests between COVID-19 susceptibility/severity and potential risk factors such as age, sex, blood type and comorbidities, and generates input files for genome-wide association studies (GWAS). By applying the R package to data released in April 2021, we found that age, body mass index, socioeconomic status and smoking are positively associated with COVID-19 susceptibility, severity, and mortality. Males are at a higher risk of COVID-19 infection than females. People staying in aged care homes have a higher chance of being exposed to SARS-CoV-2. By performing GWAS, we replicated the 3p21.31 genetic finding for COVID-19 susceptibility and severity. The ability to iteratively perform such analyses is highly relevant since the UKBB data is updated frequently. As a caveat, users must arrange their own access to the UKBB data to use the R package.
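As a rough illustration of the kind of association test the abstract describes, the sketch below computes an unadjusted odds ratio with a Wald confidence interval from a 2x2 table. It is not the package's own code or API (the package is written in R, and the abstract does not say which model it fits); the function name `odds_ratio_wald` and the counts are hypothetical and chosen only for illustration.

```python
import math


def odds_ratio_wald(exposed_cases, exposed_controls,
                    unexposed_cases, unexposed_controls, z=1.96):
    """Unadjusted odds ratio with a Wald confidence interval from a 2x2 table.

    z=1.96 gives an approximate 95% interval. All four cells must be positive,
    otherwise the log odds ratio or its standard error is undefined.
    """
    a, b, c, d = exposed_cases, exposed_controls, unexposed_cases, unexposed_controls
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) for a 2x2 table.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts (not from the paper): test-positive vs test-negative
# participants, split by a binary risk factor such as current smoking.
estimate, lower, upper = odds_ratio_wald(120, 880, 90, 910)
print(f"OR = {estimate:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

An unadjusted table like this ignores confounders such as age and sex; an analysis of real UKBB data would more plausibly use a regression model with covariates.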

2020 ◽  
Vol 2 (4) ◽  
Author(s):  
Gerard A Bouland ◽  
Joline W J Beulens ◽  
Joey Nap ◽  
Arno R van der Slik ◽  
Arnaud Zaldumbide ◽  
...  

Abstract: Numerous large genome-wide association studies have been performed to understand the influence of genetics on traits. Many identified risk loci are in non-coding and intergenic regions, which complicates understanding how genes and their downstream pathways are influenced. An integrative data approach is required to understand the mechanism and consequences of identified risk loci. Here, we developed the R package CONQUER. Data for SNPs of interest are acquired from static and dynamic repositories (build GRCh38/hg38), including GTExPortal, Epigenomics Project, 4D genome database and genome browsers. All visualizations are fully interactive so that the user can immediately access the underlying data. CONQUER is a user-friendly tool to perform an integrative approach on multiple SNPs where risk loci are not seen as individual risk factors but rather as a network of risk factors.


Author(s):  
Sayoni Das ◽  
Krystyna Taylor ◽  
Matthew Pearson ◽  
James Kozubek ◽  
Marcin Pawlowski ◽  
...  

Abstract. Background: Coronavirus disease 2019 (COVID-19) is a novel coronavirus strain disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). The disease is highly transmissible and severe disease including viral sepsis has been reported in up to 16% of hospitalized cases. The admission characteristics associated with increased odds of hospital mortality among confirmed cases of COVID-19 include severe hypoxia, low platelet count, elevated bilirubin, hypoalbuminemia and reduced glomerular filtration rate. These symptoms correlate highly with severe sepsis cases. The diseases also share similar comorbidity risks including dementia, type 2 diabetes mellitus, coronary heart disease, hypertension and chronic renal failure. Sepsis has been observed in up to 59% of hospitalized COVID-19 patients. It is highly desirable to identify risk factors and novel therapy/drug repurposing avenues for late-stage severe COVID-19 patients. This would enable better protection of at-risk populations and clinical stratification of COVID-19 patients according to their risk for developing life-threatening disease. Methods: As there is currently insufficient data available for confirmed COVID-19 patients correlating their genomic profile, disease severity and outcome, co-morbidities and treatments as well as epidemiological risk factors (such as ethnicity, blood group, smoking, BMI etc.), a direct study of the impact of host genomics on disease severity and outcomes is not yet possible. We therefore ran a study on the UK Biobank sepsis cohort as a surrogate to identify sepsis-associated signatures and genes, and correlated these with COVID-19 patients. Sepsis is itself a life-threatening inflammatory health condition with a mortality rate of approximately 20%. Like the initial studies for COVID-19 patients, standard genome-wide association studies (GWAS) have previously failed to identify more than a handful of genetic variants that predispose individuals to developing sepsis. Results: We used a combinatorial association approach to analyze a sepsis population derived from UK Biobank. We identified 70 sepsis risk-associated genes, which provide insights into the disease mechanisms underlying sepsis pathogenesis. Many of these targets can be grouped by common mechanisms of action such as endothelial cell dysfunction, PI3K/mTOR pathway signaling, immune response regulation, aberrant GABA and neurogenic signaling. Conclusion: This study has identified 70 sepsis-related genes, many of them for the first time, that can reasonably be considered to be potentially relevant to severe COVID-19 patients. We have further identified 59 drug repurposing candidates for 13 of these targets that can be used for the development of novel therapeutic strategies to increase the survival rate of patients who develop sepsis and potentially severe COVID-19.


Gut ◽  
2021 ◽  
pp. gutjnl-2020-323906
Author(s):  
Jue-Sheng Ong ◽  
Jiyuan An ◽  
Xikun Han ◽  
Matthew H Law ◽  
Priyanka Nandakumar ◽  
...  

Objective: Gastro-oesophageal reflux disease (GERD) has heterogeneous aetiology primarily attributable to its symptom-based definitions. GERD genome-wide association studies (GWASs) have shown strong genetic overlaps with established risk factors such as obesity and depression. We hypothesised that the shared genetic architecture between GERD and these risk factors can be leveraged to (1) identify new GERD and Barrett’s oesophagus (BE) risk loci and (2) explore potentially heterogeneous pathways leading to GERD and oesophageal complications. Design: We applied multitrait GWAS models combining GERD (78 707 cases; 288 734 controls) and genetically correlated traits including education attainment, depression and body mass index. We also used multitrait analysis to identify BE risk loci. Top hits were replicated in 23andMe (462 753 GERD cases, 24 099 BE cases, 1 484 025 controls). We additionally dissected the GERD loci into obesity-driven and depression-driven subgroups. These subgroups were investigated to determine how they relate to tissue-specific gene expression and to risk of serious oesophageal disease (BE and/or oesophageal adenocarcinoma, EA). Results: We identified 88 loci associated with GERD, with 59 replicating in 23andMe after multiple testing corrections. Our BE analysis identified seven novel loci. Additionally, we showed that only the obesity-driven GERD loci (but not the depression-driven loci) were associated with genes enriched in oesophageal tissues and successfully predicted BE/EA. Conclusion: Our multitrait model identified many novel risk loci for GERD and BE. We present strong evidence for a genetic underpinning of disease heterogeneity in GERD and show that GERD loci associated with depressive symptoms are not strong predictors of BE/EA relative to obesity-driven GERD loci.


2021 ◽  
Vol 10 ◽  
pp. 204800402110236
Author(s):  
Julia Ramírez ◽  
Stefan van Duijvenboden ◽  
William J Young ◽  
Michele Orini ◽  
Aled R Jones ◽  
...  

The electrocardiogram (ECG) is a commonly used clinical tool that reflects cardiac excitability and disease. Many parameters can be measured and, with improvements in methodology, can now be quantified in an automated fashion, with accuracy and at scale. Furthermore, these measurements can be heritable, and thus genome-wide association studies can inform the underpinning biological mechanisms. In this review we describe how we have used the resources in UK Biobank to undertake such work. In particular, we focus on a substudy uniquely describing the response to exercise performed at scale with accompanying genetic information.


2021 ◽  
Author(s):  
Abhishek Nag ◽  
Lawrence Middleton ◽  
Ryan S Dhindsa ◽  
Dimitrios Vitsios ◽  
Eleanor M Wigmore ◽  
...  

Genome-wide association studies have established the contribution of common and low frequency variants to metabolic biomarkers in the UK Biobank (UKB); however, the role of rare variants remains to be assessed systematically. We evaluated rare coding variants for 198 metabolic biomarkers, including metabolites assayed by Nightingale Health, using exome sequencing in participants from four genetically diverse ancestries in the UKB (N=412,394). Gene-level collapsing analysis, which evaluated a range of genetic architectures, identified a total of 1,303 significant relationships between genes and metabolic biomarkers (p<1×10⁻⁸), encompassing 207 distinct genes. These include associations between rare non-synonymous variants in GIGYF1 and glucose and lipid biomarkers, SYT7 and creatinine, and others, which may provide insights into novel disease biology. Compared with a previous microarray-based genotyping study in the same cohort, we observed that 40% of gene-biomarker relationships identified in the collapsing analysis were novel. Finally, we applied Gene-SCOUT, a novel tool that utilises the gene-biomarker association statistics from the collapsing analysis to identify genes having similar biomarker fingerprints and thus expand our understanding of gene networks.


Author(s):  
Nasa Sinnott-Armstrong ◽  
Sahin Naqvi ◽  
Manuel Rivas ◽  
Jonathan K Pritchard

Summary: Genome-wide association studies (GWAS) have been used to study the genetic basis of a wide variety of complex diseases and other traits. However, for most traits it remains difficult to interpret what genes and biological processes are impacted by the top hits. Here, as a contrast, we describe UK Biobank GWAS results for three molecular traits (urate, IGF-1, and testosterone) that are biologically simpler than most diseases, and for which we know a great deal in advance about the core genes and pathways. Unlike most GWAS of complex traits, for all three traits we find that most top hits are readily interpretable. We observe huge enrichment of significant signals near genes involved in the relevant biosynthesis, transport, or signaling pathways. We show how GWAS data illuminate the biology of variation in each trait, including insights into differences in testosterone regulation between females and males. Meanwhile, in other respects the results are reminiscent of GWAS for more-complex traits. In particular, even these molecular traits are highly polygenic, with most of the variance coming not from core genes, but from thousands to tens of thousands of variants spread across most of the genome. Given that diseases are often impacted by many distinct biological processes, including these three, our results help to illustrate why so many variants can affect risk for any given disease.


2019 ◽  
Author(s):  
Daniel B. Rosoff ◽  
George Davey Smith ◽  
Nehal Mehta ◽  
Toni-Kim Clarke ◽  
Falk W. Lohoff

Abstract: Alcohol and tobacco use, two major modifiable risk factors for cardiovascular disease (CVD), are often consumed together. Using large publicly available genome-wide association studies (results from > 940,000 participants), we conducted two-sample multivariable Mendelian randomization (MR) to simultaneously assess the independent effects of alcohol and tobacco use on CVD risk factors and events. We found genetic instruments associated with increased alcohol use, controlling for tobacco use, associated with increased high-density-lipoprotein-cholesterol (HDL-C), decreased triglycerides, but not with coronary heart disease (CHD), myocardial infarction (MI), or stroke; and instruments for increased tobacco use, controlling for alcohol use, associated with decreased HDL-C, increased triglycerides, and increased risk of CHD and MI. Exploratory analysis found associations with HDL-C, LDL-C, and intermediate-density-lipoprotein metabolites. Consistency of results across complementary methods accommodating different MR assumptions strengthened causal inference, providing strong genetic evidence for the causal effects of modifiable lifestyle risk factors on CVD risk.


Author(s):  
Jack W. O’Sullivan ◽  
John P. A. Ioannidis

Abstract: With the establishment of large biobanks, discovery of single nucleotide polymorphisms (SNPs) that are associated with various phenotypes has been accelerated. An open question is whether SNPs identified with genome-wide significance in earlier genome-wide association studies (GWAS) are also replicated in later GWAS conducted in biobanks. To address this question, the authors examined a publicly available GWAS database and identified two independent GWAS on the same phenotype (an earlier "discovery" GWAS and a later replication GWAS done in the UK Biobank). The analysis evaluated 136,318,924 SNPs (of which 6,289 had reached p<5e-8 in the discovery GWAS) from 4,397,962 participants across nine phenotypes. The overall replication rate was 85.0% and it was lower for binary than for quantitative phenotypes (58.1% versus 94.8% respectively). There was an 18.0% decrease in SNP effect size for binary phenotypes, but a 12.0% increase for quantitative phenotypes. Using the discovery SNP effect size, phenotype trait (binary or quantitative), and discovery p-value, we built and validated a model that predicted SNP replication with an area under the receiver operating characteristic curve of 0.90. While non-replication may often reflect lack of power rather than genuine false-positive findings, these results provide insights about which discovered associations are likely to be seen again across subsequent GWAS.
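The replication rate reported above can be pictured with a small sketch. The abstract gives the discovery threshold (p<5e-8) but does not state the exact replication criterion, so the rule used here (same direction of effect and p<0.05 in the replication GWAS) is an assumption, as are the `Snp` record, the `replication_rate` function, and the toy values.

```python
from dataclasses import dataclass

GENOME_WIDE_P = 5e-8  # discovery threshold quoted in the abstract


@dataclass
class Snp:
    rsid: str
    discovery_p: float
    discovery_beta: float
    replication_p: float
    replication_beta: float


def replication_rate(snps, replication_alpha=0.05):
    """Fraction of genome-wide-significant discovery SNPs that replicate.

    Assumed criterion (not taken from the paper): a SNP replicates if its
    replication p-value is below `replication_alpha` and its effect has the
    same sign in both studies. Returns None if no SNP passes discovery.
    """
    hits = [s for s in snps if s.discovery_p < GENOME_WIDE_P]
    if not hits:
        return None
    replicated = [
        s for s in hits
        if s.replication_p < replication_alpha
        and s.discovery_beta * s.replication_beta > 0
    ]
    return len(replicated) / len(hits)


# Toy records with placeholder identifiers; none of these values are real data.
toy_snps = [
    Snp("snp_a", 1e-9, 0.10, 1e-4, 0.08),    # significant, same direction: replicates
    Snp("snp_b", 2e-10, -0.20, 0.30, -0.05),  # significant, but replication p too large
    Snp("snp_c", 4e-8, 0.15, 0.01, 0.12),     # significant, same direction: replicates
    Snp("snp_d", 1e-5, 0.05, 0.001, 0.04),    # not genome-wide significant in discovery
]
rate = replication_rate(toy_snps)
print(f"replication rate: {rate:.1%}")
```

Changing `replication_alpha` or dropping the sign check changes the rate, which is one reason the criterion needs to be stated alongside any reported figure.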

