UKBCC: a cohort curation package for UK Biobank

2020 ◽  
Author(s):  
Isabell Kiral ◽  
Nathalie Willems ◽  
Benjamin Goudey

Summary: The UK Biobank (UKB) has quickly become a critical resource for researchers conducting a wide range of biomedical studies (Bycroft et al., 2018). The database is constructed from heterogeneous data sources, employs several different encoding schemes, and is scattered across UKB servers. Consequently, querying these data remains complicated, making it difficult to quickly identify participants who meet a given set of criteria. We have developed UK Biobank Cohort Curator (UKBCC), a Python tool that allows researchers to rapidly construct cohorts based on a set of search terms. Here, we describe the UKBCC implementation, its critical sub-modules and functions, and outline its usage through an example use case for replicable cohort creation. Availability: UKBCC is available through PyPI (https://pypi.org/project/ukbcc) and as open source code on GitHub (https://github.com/tool-bin/ukbcc).
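To make the idea of criteria-based cohort construction concrete, here is a minimal sketch in plain Python. It does not use or reproduce the UKBCC API: the `Participant` record, the `build_cohort` function, its `all_of` / `any_of` / `none_of` parameters, and the example codes are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Participant:
    """Hypothetical, simplified participant record (not the UKB schema)."""
    eid: int
    codes: set = field(default_factory=set)


def build_cohort(participants, all_of=(), any_of=(), none_of=()):
    """Return sorted participant IDs that satisfy all three criteria groups."""
    all_of, any_of, none_of = set(all_of), set(any_of), set(none_of)
    cohort = []
    for p in participants:
        if not all_of <= p.codes:
            continue  # missing at least one mandatory code
        if any_of and not (any_of & p.codes):
            continue  # none of the optional codes is present
        if none_of & p.codes:
            continue  # carries an excluded code
        cohort.append(p.eid)
    return sorted(cohort)


# Illustrative records; the code strings are placeholders.
participants = [
    Participant(1, {"I10", "E11"}),
    Participant(2, {"I10"}),
    Participant(3, {"I10", "E11", "C50"}),
]
cohort = build_cohort(participants, all_of={"I10"}, any_of={"E11"}, none_of={"C50"})
print(cohort)
```

Keeping the criteria as explicit sets means the same query can be stored and rerun later, which is one simple way to make a cohort definition replicable.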

2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Piril Hepsomali ◽  
John A. Groeger

Abstract: Accumulating evidence suggests that dietary interventions might have potential as a strategy to protect against age-related cognitive decline and neurodegeneration, as there are associations between some nutrients, food groups, dietary patterns, and some domains of cognition. In this study, we aimed to conduct the largest investigation of diet and cognition to date, by systematically examining the UK Biobank (UKB) data to find out whether dietary quality and food groups play a role in general cognitive ability. This cross-sectional population-based study involved 48,749 participants. UKB data from the food frequency questionnaire and on cognitive function were used. In addition, healthy diet, partial fibre intake, and milk intake scores were calculated. Adjusted models included age, sex, and BMI. We observed associations between better general cognitive ability and higher intakes of fish and unprocessed red meat, and moderate intakes of fibre and milk. Surprisingly, we found that diet quality, vegetable intake, and high and low fibre and milk intake were inversely associated with general cognitive ability. Our results suggest that fish and unprocessed red meat, and/or nutrients found in them, might be beneficial for general cognitive ability. However, the results should be interpreted with caution, as the same food groups may affect other domains of cognition or mental health differently. These discrepancies in the current state of evidence invite further research to examine domain-specific effects of dietary patterns and food groups on a wide range of cognitive and affective outcomes, with a special focus on potential covariates that may affect the relationship between diet and cognition.


2020 ◽  
Author(s):  
Quan Do ◽  
Ho Bich Hai ◽  
Pierre Larmande

Summary: Currently, gene information available for Oryza sativa species is located in various heterogeneous online data sources. Moreover, methods of access are also diverse, mostly web-based and sometimes query APIs, which might not always be straightforward for domain experts. The challenge is to collect information quickly from these applications and combine it logically, to facilitate scientific research. We developed a Python package named PyRice, a unified programming API to access all supported databases at the same time with consistent output. The PyRice design is modular and implements a smart query system that adapts to the available computing resources to optimize query speed. As a result, PyRice is easy to use and produces intuitive results. Availability and implementation: https://github.com/SouthGreenPlatform/PyRice. Documentation: https://[email protected] information: MIT. Supplementary information: Supplementary data are available online.


2016 ◽  
Author(s):  
Brent S. Pedersen ◽  
Ryan M. Layer ◽  
Aaron R. Quinlan

Background: The integration of genome annotations and reference databases is critical to the identification of genetic variants that may be of interest in studies of disease or other traits. However, comprehensive variant annotation with diverse file formats is difficult with existing methods. Results: We have developed vcfanno as a flexible toolset that simplifies the annotation of genetic variants in VCF format. Vcfanno can extract and summarize multiple attributes from one or more annotation files and append the resulting annotations to the INFO field of the original VCF file. Vcfanno also integrates the Lua scripting language so that users can easily develop custom annotations and metrics. By leveraging a new parallel “chromosome sweeping” algorithm, it enables rapid annotation of both whole-exome and whole-genome datasets. We demonstrate this performance by annotating over 85.3 million variants in less than 17 minutes (>85,000 variants per second) with 50 attributes from 17 commonly used genome annotation resources. Conclusions: Vcfanno is a flexible software package that provides researchers with the ability to annotate genetic variation with a wide range of datasets and reference databases in diverse genomic formats. Availability: The vcfanno source code is available at https://github.com/brentp/vcfanno under the MIT license, and platform-specific binaries are available at https://github.com/brentp/vcfanno/releases. Detailed documentation is available at http://brentp.github.io/vcfanno/, and the code underlying the analyses presented can be found at https://github.com/brentp/vcfanno/tree/master/scripts/paper.
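The general idea behind sweeping along a chromosome can be illustrated with a small single-threaded sketch. This is not vcfanno's implementation (which is parallel and handles real file formats); it assumes one chromosome, variants given as sorted positions, and annotation intervals given as sorted half-open `(start, end, label)` tuples. The function name `sweep_annotate` and the example labels are hypothetical.

```python
def sweep_annotate(variants, intervals):
    """Annotate sorted variant positions with labels of overlapping intervals.

    variants:  positions sorted ascending (single chromosome).
    intervals: (start, end, label) tuples sorted by start, half-open [start, end).
    """
    annotated = []
    active = []  # intervals that started at or before the current variant
    nxt = 0      # index of the next interval not yet activated
    for pos in variants:
        # Activate every interval that starts at or before this position.
        while nxt < len(intervals) and intervals[nxt][0] <= pos:
            active.append(intervals[nxt])
            nxt += 1
        # Retire intervals that end at or before this position; because the
        # variants are sorted, they cannot overlap any later variant either.
        active = [iv for iv in active if iv[1] > pos]
        annotated.append((pos, [label for _, _, label in active]))
    return annotated


# Illustrative input, not real genomic coordinates.
variants = [5, 12, 20, 31]
intervals = [(0, 10, "geneA"), (8, 25, "enhancer1"), (30, 40, "geneB")]
result = sweep_annotate(variants, intervals)
for pos, labels in result:
    print(pos, labels)
```

Because both inputs are consumed in sorted order, each interval is activated once and never revisited after it is retired, which is what makes a sweep attractive for large sorted files.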


2021 ◽  
Author(s):  
Phuoc Truong Nguyen ◽  
Ilya Plyusnin ◽  
Tarja Sironen ◽  
Olli Vapalahti ◽  
Ravi Kant ◽  
...  

Background: SARS-CoV-2 related research has increased in importance worldwide since December 2019. Several new variants of SARS-CoV-2 have emerged globally, of which the most notable and concerning currently are the UK variant B.1.1.7, the South African variant B.1.351 and the Brazilian variant P.1. Detecting and monitoring novel variants is essential in SARS-CoV-2 surveillance. While there are several tools for assembling virus genomes and performing lineage analyses to investigate SARS-CoV-2, each is limited to performing one or a few functions separately. Results: Because no publicly available pipeline could both perform fast reference-based assemblies on raw SARS-CoV-2 sequences and identify lineages to detect variants of concern, we have developed an open source bioinformatic pipeline called HaVoC (Helsinki university Analyzer for Variants Of Concern). HaVoC can reference-assemble raw sequence reads and assign the corresponding lineages to SARS-CoV-2 sequences. Conclusions: HaVoC is a pipeline that uses several bioinformatic tools to perform multiple analyses needed for investigating genetic variance among SARS-CoV-2 samples. The pipeline is particularly useful for those who need a more accessible and fast tool to detect and monitor the spread of SARS-CoV-2 variants of concern during local outbreaks. HaVoC is currently being used in Finland for monitoring the spread of SARS-CoV-2 variants. The HaVoC user manual and source code are available at https://www.helsinki.fi/en/projects/havoc and https://bitbucket.org/auto_cov_pipeline/havoc, respectively.


Author(s):  
Juba Nait Saada ◽  
Georgios Kalantzis ◽  
Derek Shyr ◽  
Martin Robinson ◽  
Alexander Gusev ◽  
...  

Abstract: Detection of Identical-By-Descent (IBD) segments provides a fundamental measure of genetic relatedness and plays a key role in a wide range of genomic analyses. We developed a new method, called FastSMC, that enables accurate biobank-scale detection of IBD segments transmitted by common ancestors living up to several hundreds of generations in the past. FastSMC combines a fast heuristic search for IBD segments with accurate coalescent-based likelihood calculations and enables estimating the age of common ancestors transmitting IBD regions. We applied FastSMC to 487,409 phased samples from the UK Biobank and detected the presence of ∼214 billion IBD segments transmitted by shared ancestors within the past 1,500 years. We quantified time-dependent shared ancestry within and across 120 postcodes, obtaining a fine-grained picture of genetic relatedness within the past two millennia in the UK. Sharing of common ancestors strongly correlates with geographic distance, enabling the localization of a sample’s birth coordinates from genomic data. We sought evidence of recent positive selection by identifying loci with unusually strong shared ancestry within recent millennia and we detected 12 genome-wide significant signals, including 7 novel loci. We found IBD sharing to be highly predictive of the sharing of ultra-rare variants in exome sequencing samples from the UK Biobank. Focusing on loss-of-function variation discovered using exome sequencing, we devised an IBD-based association test and detected 29 associations with 7 blood-related traits, 20 of which were not detected in the exome sequencing study. These results underscore the importance of modelling distant relatedness to reveal subtle population structure, recent evolutionary history, and rare pathogenic variation.


2018 ◽  
Vol 34 (4) ◽  
pp. 961-979
Author(s):  
Rain Opik ◽  
Toomas Kirt ◽  
Innar Liiv

Abstract: This article presents a visual method for representing the complex internal structure of the labor market from the perspective of similar occupations based on shared skills, and a prototype tool for interacting with the visualization, together with an extended description of graph construction and the data processing needed to link multiple heterogeneous data sources. Since the labor market is not an isolated phenomenon and is constantly affected by external trends and interventions, the presented method is designed to allow extra layers of external information to be added. For instance, what is the impact of a megatrend or an intervention on the labor market? Which parts of the labor market are the most vulnerable to an approaching megatrend or planned intervention? A case study analyzing the labor market together with the megatrend of job automation and computerization is presented. The source code of the prototype is released as open source for repeatability.


Author(s):  
Fatemeh Safizadeh ◽  
Thi Ngoc Mai Nguyen ◽  
Hermann Brenner ◽  
Ben Schöttker

Aim: The risk-benefit profile of angiotensin-converting enzyme inhibitors (ACEIs) and angiotensin receptor blockers (ARBs) in coronavirus disease 2019 (Covid-19) is still a matter of debate. With growing evidence on the protective effect of this group of commonly used antihypertensives in Covid-19, we aimed to thoroughly investigate the association between the use of major classes of antihypertensive medications and Covid-19 outcomes in comparison with the use of ACEIs and ARBs. Methods: We conducted a population-based study in patients with pre-existing hypertension in the UK Biobank. Multivariable logistic regression analysis was performed adjusting for a wide range of confounders. Results: The use of either beta-blockers (BBs), calcium-channel blockers (CCBs), or diuretics was associated with a higher risk of Covid-19 hospitalization compared to ACEI use (adjusted OR, 1.63; 95% CI, 1.40 to 1.90) and ARB use (adjusted OR, 1.50; 95% CI, 1.27 to 1.77). The risk of 28-day mortality among Covid-19 patients was also increased among users of BBs, CCBs or diuretics when compared to ACEI users (adjusted OR, 1.64; 95% CI, 1.23 to 2.19) but not when compared to ARB users (adjusted OR, 1.18; 95% CI, 0.87 to 1.59). However, no associations were observed when the same analysis was conducted among hospitalized Covid-19 patients only. Conclusion: Our results suggest protective effects of blocking of the renin-angiotensin-aldosterone system on Covid-19 hospitalization and mortality among patients with pharmaceutically treated hypertension, which should be addressed by randomized controlled trials. If confirmed, this finding could have high clinical relevance for treating hypertension during the SARS-CoV-2 pandemic.
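For readers less familiar with how an odds ratio and its 95% confidence interval are obtained, the following sketch computes the unadjusted version from a 2x2 table using the standard Wald interval on the log scale. It is only an illustration: the study reports adjusted odds ratios from multivariable logistic regression, which this does not reproduce, and the counts and the function name `odds_ratio_ci` are hypothetical.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio with a Wald confidence interval.

    a: exposed with outcome,    b: exposed without outcome
    c: unexposed with outcome,  d: unexposed without outcome
    """
    or_ = (a * d) / (b * c)
    # Standard error of log(OR) for a 2x2 table.
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_) - z * se_log)
    upper = math.exp(math.log(or_) + z * se_log)
    return or_, lower, upper


# Hypothetical counts, not taken from the study.
or_, lower, upper = odds_ratio_ci(a=30, b=70, c=20, d=80)
print(f"OR = {or_:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

An interval that includes 1, as in the comparison with ARB users for 28-day mortality (0.87 to 1.59), is read as no statistically significant association at the 5% level.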


2017 ◽  
Author(s):  
Adrian Cortes ◽  
Calliope A. Dendrou ◽  
Allan Motyer ◽  
Luke Jostins ◽  
Damjan Vukcevic ◽  
...  

Genetic discovery from the multitude of phenotypes extractable from routine healthcare data has the ability to radically transform our understanding of the human phenome, thereby accelerating progress towards precision medicine. However, a critical question when analysing high-dimensional and heterogeneous data is how to interrogate increasingly specific subphenotypes whilst retaining statistical power to detect genetic associations. Here we develop and employ a novel Bayesian analysis framework that exploits the hierarchical structure of diagnosis classifications to jointly analyse genetic variants against UK Biobank healthcare phenotypes. Our method displays a more than 20% increase in power to detect genetic effects over other approaches, such that we uncover the broader burden of genetic variation: we identify associations with over 2,000 diagnostic terms. We find novel associations with common immune-mediated diseases (IMD), we reveal the extent of genetic sharing between specific IMDs, and we expose differences in disease perception or diagnosis with potential clinical implications.


2017 ◽  
Author(s):  
Jeremy J. Berg ◽  
Xinjun Zhang ◽  
Graham Coop

Abstract: Our understanding of the genetic basis of human adaptation is biased toward loci of large phenotypic effect. Genome-wide association studies (GWAS) now enable the study of genetic adaptation in polygenic phenotypes. We test for polygenic adaptation among 187 worldwide human populations using polygenic scores constructed from GWAS of 34 complex traits. We identify signals of polygenic adaptation for anthropometric traits including height, infant head circumference (IHC), hip circumference and waist-to-hip ratio (WHR). Analysis of ancient DNA samples indicates that a north-south cline of height within Europe and a west-east cline across Eurasia can be traced to selection for increased height in two late Pleistocene hunter-gatherer populations living in western and west-central Eurasia. Our observation that IHC and WHR follow a latitudinal cline in Western Eurasia supports the role of natural selection driving Bergmann’s Rule in humans, consistent with thermoregulatory adaptation in response to latitudinal temperature variation. Author’s Note on Failure to Replicate: After this preprint was posted, the UK Biobank dataset was released, providing a new and open GWAS resource. When attempting to replicate the height selection results from this preprint using GWAS data from the UK Biobank, we discovered that we could not. In subsequent analyses, we determined that both the GIANT consortium height GWAS data, as well as another dataset that was used for replication, were impacted by stratification issues that created or at a minimum substantially inflated the height selection signals reported here. The results of this second investigation, written together with additional coauthors, have now been published (https://elifesciences.org/articles/39725, along with another paper by a separate group of authors showing similar issues, https://elifesciences.org/articles/39702).
A preliminary investigation shows that the other non-height based results may suffer from similar issues. We stand by the theory and statistical methods reported in this paper, and the paper can be cited for these results. However, we have shown that the data on which the major empirical results were based are not sound, and so should be treated with caution until replicated.
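As background on the polygenic scores mentioned in the abstract, a score is commonly computed as a sum of effect-allele counts weighted by GWAS effect sizes. The sketch below shows that calculation for a single individual. It is a generic illustration, not the authors' population-level method; the effect sizes, dosages, and the function name `polygenic_score` are hypothetical.

```python
def polygenic_score(effect_sizes, dosages):
    """Weighted sum of effect-allele dosages (0, 1 or 2 copies per variant)."""
    if len(effect_sizes) != len(dosages):
        raise ValueError("effect_sizes and dosages must have the same length")
    return sum(beta * g for beta, g in zip(effect_sizes, dosages))


# Hypothetical per-variant effect sizes and one individual's allele counts.
effect_sizes = [0.5, -0.25, 1.0]
dosages = [2, 1, 0]
score = polygenic_score(effect_sizes, dosages)
print(score)
```

Because the score is a direct function of the estimated effect sizes, any bias in the underlying GWAS, such as the stratification described in the note above, carries straight through to the scores.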


2021 ◽  
Author(s):  
Konrad Karczewski ◽  
Matthew Solomonson ◽  
Katherine R Chao ◽  
Julia K Goodrich ◽  
Grace Tiao ◽  
...  

Genome-wide association studies have successfully discovered thousands of common variants associated with human diseases and traits, but the landscape of rare variation in human disease has not been explored at scale. Exome sequencing studies of population biobanks provide an opportunity to systematically evaluate the impact of rare coding variation across a wide range of phenotypes to discover genes and allelic series relevant to human health and disease. Here, we present results from systematic association analyses of 3,700 phenotypes using single-variant and gene tests of 281,850 individuals in the UK Biobank with exome sequence data. We find that the discovery of genetic associations is tightly linked to frequency as well as correlated with metrics of deleteriousness and natural selection. We highlight biological findings elucidated by these data and release the dataset as a public resource alongside a browser framework for rapidly exploring rare variant association results.

