Incident disease associations with mosaic chromosomal alterations on autosomes, X and Y chromosomes: insights from a phenome-wide association study in the UK Biobank

Background: Mosaic chromosomal alterations (mCAs) are large chromosomal gains, losses and copy-neutral losses of heterozygosity (LOH) in peripheral leukocytes. While many individuals with detectable mCAs have no notable adverse outcomes, mCA-associated gene dosage alterations as well as clonal expansion of mutated leukocyte clones could increase susceptibility to disease. Results: We performed a phenome-wide association study (PheWAS) using existing data from 482,396 UK Biobank (UKBB) participants to investigate potential associations between mCAs and incident disease. Of the 1290 ICD codes we examined, our adjusted analysis identified a total of 50 incident disease outcomes associated with mCAs at PheWAS significance levels. We observed striking differences in the diseases associated with each type of alteration, with autosomal mCAs most associated with increased hematologic malignancies, incident infections and possibly cancer therapy-related conditions. Alterations of chromosome X were associated with increased lymphoid leukemia risk, and mCAs of chromosome Y were linked to potentially reduced metabolic disease risk. Conclusions: Our findings demonstrate that a wide range of diseases are potential sequelae of mCAs and highlight the critical importance of careful covariate adjustment in mCA disease association studies.


Identifying the potential role of insomnia on multimorbidity: A Mendelian randomization phenome-wide association study in UK Biobank

10.1101/2022.01.11.22269005 ◽

2022 ◽

Author(s):

Mark J Gibson ◽

Deborah A Lawlor ◽

Louise AC Millard

Keyword(s):

Health Outcomes ◽

Association Study ◽

Genetic Risk ◽

Association Studies ◽

Causal Effects ◽

Mendelian Randomisation ◽

Genome Wide Association Studies ◽

Uk Biobank ◽

Wide Range

Objectives: To identify the breadth of potential causal effects of insomnia on health outcomes and hence its possible role in multimorbidity. Design: Mendelian randomisation (MR) phenome-wide association study (MR-PheWAS) with two-sample Mendelian randomisation follow-up. Setting: Individual data from UK Biobank and summary data from a number of genome-wide association studies. Participants: 336,975 unrelated white-British UK Biobank participants. Exposures: Standardised genetic risk of insomnia for the MR-PheWAS and genetically predicted insomnia for the two-sample MR follow-up, with insomnia instrumented by a genetic risk score (GRS) created from 129 single-nucleotide polymorphisms (SNPs). Main outcome measures: 11,409 outcomes from UK Biobank extracted and processed by an automated pipeline (PHESANT). Potential causal effects (i.e., those passing a Bonferroni-corrected significance threshold) were followed up with two-sample MR in MR-Base, where possible. Results: 437 potential causal effects of insomnia were observed for a number of traits, including anxiety, stress, depression, mania, addiction, pain, body composition, immune, respiratory, endocrine, dental, musculoskeletal, cardiovascular and reproductive traits, as well as socioeconomic and behavioural traits. We were able to undertake two-sample MR for 71 of these 437 and found evidence of causal effects (with directionally concordant effect estimates across all analyses) for 25 of these. These included, for example, risk of anxiety disorders (OR=1.55 [95% confidence interval (CI): 1.30, 1.86] per category increase in insomnia), diseases of the oesophagus/stomach/duodenum (OR=1.32 [95% CI: 1.14, 1.53]) and spondylosis (OR=1.57 [95% CI: 1.22, 2.01]). Conclusion: Insomnia potentially causes a wide range of adverse health outcomes and behaviours. This has implications for developing interventions to prevent and treat a number of diseases in order to reduce multimorbidity and associated polypharmacy.
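As a rough illustration of the Bonferroni-corrected threshold mentioned in this abstract, the sketch below divides a family-wise error rate by the number of tests. The alpha of 0.05 is an assumption for illustration; the abstract does not state the exact level or the effective number of tests used, and the function names are hypothetical.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def passes_bonferroni(p_values, alpha=0.05):
    """Flag each p-value that falls below the Bonferroni threshold."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [p < threshold for p in p_values]


# Assumed alpha = 0.05 applied to the 11,409 outcomes named in the abstract.
threshold = bonferroni_threshold(0.05, 11409)
print(f"per-test threshold: {threshold:.3e}")
```

Bonferroni correction is conservative when outcomes are correlated, which is one reason a follow-up analysis of the flagged outcomes is a sensible second step.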


Genome-wide genetic data on ~500,000 UK Biobank participants

10.1101/166298 ◽

2017 ◽

Cited By ~ 303

Author(s):

Clare Bycroft ◽

Colin Freeman ◽

Desislava Petkova ◽

Gavin Band ◽

Lloyd T. Elliott ◽

...

Keyword(s):

Quality Control ◽

Allelic Variation ◽

Association Studies ◽

Genetic Data ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Genotype Data ◽

Uk Biobank ◽

Genome Wide ◽

Wide Range

The UK Biobank project is a large prospective cohort study of ~500,000 individuals from across the United Kingdom, aged 40-69 at recruitment. A rich variety of phenotypic and health-related information is available on each participant, making the resource unprecedented in its size and scope. Here we describe the genome-wide genotype data (~805,000 markers) collected on all individuals in the cohort and its quality control procedures. Genotype data on this scale offers novel opportunities for assessing quality issues, although the wide range of ancestries of the individuals in the cohort also creates particular challenges. We also conducted a set of analyses that reveal properties of the genetic data – such as population structure and relatedness – that can be important for downstream analyses. In addition, we phased and imputed genotypes into the dataset, using computationally efficient methods combined with the Haplotype Reference Consortium (HRC) and UK10K haplotype resource. This increases the number of testable variants by over 100-fold to ~96 million variants. We also imputed classical allelic variation at 11 human leukocyte antigen (HLA) genes, and as a quality control check of this imputation, we replicate signals of known associations between HLA alleles and many common diseases. We describe tools that allow efficient genome-wide association studies (GWAS) of multiple traits and fast phenome-wide association studies (PheWAS), which work together with a new compressed file format that has been used to distribute the dataset. As a further check of the genotyped and imputed datasets, we performed a test-case genome-wide association scan on a well-studied human trait, standing height.


Assessment of Polygenic Architecture and Risk Prediction based on Common Variants Across Fourteen Cancers

10.1101/723825 ◽

2019 ◽

Cited By ~ 1

Author(s):

Yan Zhang ◽

Amber N. Wilcox ◽

Haoyu Zhang ◽

Parichoy Pal Choudhury ◽

Douglas F. Easton ◽

...

Keyword(s):

Association Studies ◽

Disease Incidence ◽

Effect Sizes ◽

European Ancestry ◽

Risk Scores ◽

Average Risk ◽

Genome Wide Association Studies ◽

Lymphoid Leukemia ◽

Polygenic Risk ◽

Wide Range

We analyzed summary-level data from genome-wide association studies (GWAS) of European ancestry across fourteen cancer sites to estimate the number of common susceptibility variants (polygenicity) contributing to risk, as well as the distribution of their associated effect sizes. All cancers evaluated showed polygenicity, involving at a minimum thousands of independent susceptibility variants. For some malignancies, particularly chronic lymphoid leukemia (CLL) and testicular cancer, there is a larger proportion of variants with larger effect sizes than for other cancers. In contrast, most variants for lung and breast cancers have very small associated effect sizes. For different cancer sites, we estimate a wide range of GWAS sample sizes required to explain 80% of GWAS heritability, varying from 60,000 cases for CLL to over 1,000,000 cases for lung cancer. The maximum relative risk achievable for subjects at the 99th risk percentile of underlying polygenic risk scores, compared to average risk, ranges from 12 for testicular to 2.5 for ovarian cancer. We show that polygenic risk scores have substantial potential for risk stratification for relatively common cancers such as breast, prostate and colon, but limited potential for other cancer sites because of modest heritability and lower disease incidence.


EraSOR: Erase Sample Overlap in polygenic score analyses

10.1101/2021.12.10.472164 ◽

2021 ◽

Author(s):

Shing Wan Choi ◽

Timothy Shin Heng Mak ◽

Clive J. Hoggart ◽

Paul F. O'Reilly

Keyword(s):

Association Studies ◽

Polygenic Risk Score ◽

Genome Wide Association Studies ◽

Summary Statistics ◽

Uk Biobank ◽

Type 1 Error ◽

Wide Range ◽

Close Relatedness ◽

Target Data

Background: Polygenic risk score (PRS) analyses are now routinely applied in biomedical research, with great hope that they will aid in our understanding of disease aetiology and contribute to personalized medicine. The continued growth of multi-cohort genome-wide association studies (GWASs) and large-scale biobank projects has provided researchers with a wealth of GWAS summary statistics and individual-level data suitable for performing PRS analyses. However, as the size of these studies increases, the risk of inter-cohort sample overlap and close relatedness increases. Ideally sample overlap would be identified and removed directly, but this is typically not possible due to privacy laws or consent agreements. This sample overlap, whether known or not, is a major problem in PRS analyses because it can lead to inflation of type 1 error and, thus, erroneous conclusions in published work. Results: Here, for the first time, we report the scale of the sample overlap problem for PRS analyses by generating known sample overlap across sub-samples of the UK Biobank data, which we then use to produce GWAS and target data to mimic the effects of inter-cohort sample overlap. We demonstrate that inter-cohort overlap results in a significant and often substantial inflation in the observed PRS-trait association, coefficient of determination (R²) and false-positive rate. This inflation can be high even when the absolute number of overlapping individuals is small if this makes up a notable fraction of the target sample. We develop and introduce EraSOR (Erase Sample Overlap and Relatedness), software for adjusting inflation in PRS prediction and association statistics in the presence of sample overlap or close relatedness between the GWAS and target samples. 
A key component of the EraSOR approach is inference of the degree of sample overlap from the intercept of a bivariate LD score regression applied to the GWAS and target data, making it powered in settings where both have sample sizes over 1,000 individuals. Through extensive benchmarking using UK Biobank and HapGen2 simulated genotype-phenotype data, we demonstrate that PRSs calculated using EraSOR-adjusted GWAS summary statistics are robust to inter-cohort overlap in a wide range of realistic scenarios and are even robust to high levels of residual genetic and environmental stratification. Conclusion: The results of all PRS analyses for which sample overlap cannot be definitively ruled out should be considered with caution given high type 1 error observed in the presence of even low overlap between base and target cohorts. Given the strong performance of EraSOR in eliminating inflation caused by sample overlap in PRS studies with large (>5k) target samples, we recommend that EraSOR be used in all future such PRS studies to mitigate the potential effects of inter-cohort overlap and close relatedness.


Systematic single-variant and gene-based association testing of 3,700 phenotypes in 281,850 UK Biobank exomes

10.1101/2021.06.19.21259117 ◽

2021 ◽

Author(s):

Konrad Karczewski ◽

Matthew Solomonson ◽

Katherine R Chao ◽

Julia K Goodrich ◽

Grace Tiao ◽

...

Keyword(s):

Sequence Data ◽

Association Studies ◽

Genome Wide Association Studies ◽

Uk Biobank ◽

Genetic Associations ◽

Allelic Series ◽

Association Analyses ◽

Wide Range ◽

The Uk ◽

The Impact

Genome-wide association studies have successfully discovered thousands of common variants associated with human diseases and traits, but the landscape of rare variation in human disease has not been explored at scale. Exome sequencing studies of population biobanks provide an opportunity to systematically evaluate the impact of rare coding variation across a wide range of phenotypes to discover genes and allelic series relevant to human health and disease. Here, we present results from systematic association analyses of 3,700 phenotypes using single-variant and gene tests of 281,850 individuals in the UK Biobank with exome sequence data. We find that the discovery of genetic associations is tightly linked to frequency as well as correlated with metrics of deleteriousness and natural selection. We highlight biological findings elucidated by these data and release the dataset as a public resource alongside a browser framework for rapidly exploring rare variant association results.


An integrated genome and phenome-wide association study approach to understanding Alzheimer's disease predisposition

10.1101/2022.01.03.22268705 ◽

2022 ◽

Author(s):

Archita Khaire ◽

Courtney E Wimberly ◽

Eleanor C Semmes ◽

Jillian H Hurst ◽

Kyle M Walsh

Keyword(s):

Alzheimer's Disease ◽

Association Study ◽

Disease Risk ◽

Association Studies ◽

Candidate Snps ◽

Genome Wide Association Studies ◽

Distribution Width ◽

Amyloid Toxicity ◽

Load Risk

Background: Genome-wide association studies (GWAS) have identified common, heritable alleles that increase late-onset Alzheimer's disease (LOAD) risk. We recently published an analytic approach to integrate GWAS and phenome-wide association study (PheWAS) data, enabling identification of candidate traits and trait-associated variants impacting disease risk, and apply it here to LOAD. Methods: PheWAS was performed for 23 known LOAD-associated single nucleotide polymorphisms (SNPs) and 4:1 matched control SNPs using UK Biobank data. Traits enriched for association with LOAD SNPs were ascertained and used to identify trait-associated candidate SNPs to be tested for association with LOAD risk (17,008 cases; 37,154 controls). Results: LOAD-associated SNPs were significantly enriched for associations with 6/778 queried traits, including three platelet traits. The strongest enrichment was for platelet distribution width (PDW) (P=1.2×10⁻⁵), but no consistent direction of effect was observed between increased PDW and LOAD susceptibility across variants or in Mendelian randomization analysis. Of 384 PDW-associated SNPs identified by prior GWAS, 36 were nominally associated with LOAD risk and 5 survived false-discovery rate correction for multiple testing. Associations confirmed known LOAD risk loci near PICALM, CD2AP, SPI1, and NDUFAF6, and identified a novel risk locus in the epidermal growth factor receptor (EGFR) gene. Conclusions: Through integration of GWAS and PheWAS data, we identify substantial pleiotropy between genetic determinants of LOAD and of platelet morphology, and for the first time implicate EGFR - a mediator of beta-amyloid toxicity - in Alzheimer's disease susceptibility.
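To make the false-discovery rate step more concrete, here is a minimal sketch of the Benjamini-Hochberg procedure, one common way of applying FDR correction. The abstract does not say which FDR method or level the authors used, so both the choice of procedure and the q of 0.05 are assumptions, and the function name is hypothetical.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans marking which hypotheses are rejected
    by the Benjamini-Hochberg step-up procedure at FDR level q."""
    m = len(p_values)
    # Indices of the p-values in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest rank k such that p_(k) <= k * q / m.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            max_rank = rank
    # Reject every hypothesis ranked at or below max_rank.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[idx] = True
    return rejected


# Illustrative p-values only; these are not taken from the study.
example = [0.5, 0.001, 0.008]
print(benjamini_hochberg(example))
```

Unlike a Bonferroni threshold, the cut-off here adapts to the observed p-value distribution, which is why a subset of nominally significant results can survive correction.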


Global Biobank Engine: enabling genotype-phenotype browsing for biobank summary statistics

Bioinformatics ◽

10.1093/bioinformatics/bty999 ◽

2018 ◽

Vol 35 (14) ◽

pp. 2495-2497 ◽

Cited By ~ 27

Author(s):

Gregory McInnes ◽

Yosuke Tanigawa ◽

Chris DeBoever ◽

Adam Lavertu ◽

Julia Eve Olivieri ◽

...

Keyword(s):

Association Studies ◽

Genetic Association Studies ◽

Genome Wide Association Studies ◽

Summary Statistics ◽

Uk Biobank ◽

Patient Privacy ◽

Web Based ◽

Genome Wide ◽

Wide Range ◽

The Uk

Summary: Large biobanks linking phenotype to genotype have led to an explosion of genetic association studies across a wide range of phenotypes. Sharing the knowledge generated by these resources with the scientific community remains a challenge due to patient privacy and the vast amount of data. Here, we present Global Biobank Engine (GBE), a web-based tool that enables exploration of the relationship between genotype and phenotype in biobank cohorts, such as the UK Biobank. GBE supports browsing for results from genome-wide association studies, phenome-wide association studies, gene-based tests and genetic correlation between phenotypes. We envision GBE as a platform that facilitates the dissemination of summary statistics from biobanks to the scientific and clinical communities. Availability and implementation: GBE currently hosts data from the UK Biobank and is freely available at biobankengine.stanford.edu.


Calcification of abdominal aorta is an underappreciated cardiovascular disease risk factor

10.1101/2020.05.07.20094706 ◽

2020 ◽

Cited By ~ 1

Author(s):

Anurag Sethi ◽

Leland Taylor ◽

J Graham Ruby ◽

Jagadish Venkataraman ◽

Madeleine Cule ◽

...

Keyword(s):

Risk Factors ◽

Cardiovascular Disease ◽

Risk Factor ◽

Disease Risk ◽

Cardiovascular Outcomes ◽

Whole Body ◽

Uk Biobank ◽

Clinical Biomarkers ◽

Wide Range ◽

The Uk

Background: Calcification of the abdominal artery is an important contributor to cardiovascular disease in diabetic and chronic kidney disease (CKD) populations. However, prevalence of the pathology, risk factors, and long-term disease outcomes in a general population have not been systematically analyzed. Method: We developed machine learning models to quantify levels of abdominal aortic calcification (AAC) in 29,957 whole body dual-energy X-ray absorptiometry (DEXA) scans from the UK Biobank cohort. Using regression techniques we associated severity of calcification across a wide range of physiological parameters, clinical biomarkers, and environmental risk factors (406 in total). We performed a common variant genetic association study spanning 9,572,557 single-nucleotide polymorphisms to identify genetic loci relevant to AAC. We evaluated the prognostic value of AAC across 151 disease classes using Cox proportional hazard models. We further examined an epidemiological model of calcification on cardiovascular morbidity with and without LDL interactions. Findings: We find evidence for AAC in >10.4% of the cohort despite low prevalence of diabetes (2.5%) and CKD (0.5%). Increased level of AAC is a strong prognostic indicator of cardiovascular outcomes for stenosis of precerebral arteries (HR~1.5), myocardial infarction (HR~1.5), and ischemic heart disease (HR~1.33). We find that AAC is genetically correlated with cardiovascular-related traits and that the genetic signals are enriched in vascular and adipose tissue. We report three loci associated with AAC, with the strongest association occurring at the TWIST1/HDAC9 locus (beta=0.078, p-value=1.4e-11) in a region also associated with coronary artery disease. Surprisingly, we find that elevated but still within clinically normal levels of serum phosphate and glycated hemoglobin are linked to increased vascular calcification. Furthermore, we show AAC arises in the absence of hypercholesterolemia. By our estimate, AAC is an LDL-independent risk factor for cardiovascular outcomes, with risk similar to elevated LDL. Data: This research has been conducted using the UK Biobank Resource.


Phenome-wide association study (PheWAS) of colorectal cancer risk SNP effects on health outcomes in UK Biobank

British Journal of Cancer ◽

10.1038/s41416-021-01655-9 ◽

2021 ◽

Author(s):

Xiaomeng Zhang ◽

Xue Li ◽

Yazhou He ◽

Philip J. Law ◽

Susan M. Farrington ◽

...

Keyword(s):

Colorectal Cancer ◽

Health Outcomes ◽

Association Study ◽

Diverticular Disease ◽

Genetic Predisposition ◽

Association Studies ◽

Sensitivity Analyses ◽

Polygenic Risk Score ◽

Genome Wide Association Studies ◽

Uk Biobank

Background: Associations between colorectal cancer (CRC) and other health outcomes have been reported, but these may be subject to biases, or due to limitations of observational studies. Methods: We set out to determine whether genetic predisposition to CRC is also associated with the risk of other phenotypes. Under the phenome-wide association study (PheWAS) and tree-structured phenotypic model (TreeWAS), we studied 334,385 unrelated White British individuals (excluding CRC patients) from the UK Biobank cohort. We generated a polygenic risk score (PRS) from CRC genome-wide association studies as a measure of CRC risk. We performed sensitivity analyses to test the robustness of the results and searched the Danish Disease Trajectory Browser (DTB) to replicate the observed associations. Results: Eight PheWAS phenotypes and 21 TreeWAS nodes were associated with CRC genetic predisposition by PheWAS and TreeWAS, respectively. The PheWAS-detected associations were from the neoplasms and digestive system disease groups (e.g. benign neoplasm of colon, anal and rectal polyp and diverticular disease). The results from the TreeWAS corroborated the results from the PheWAS. These results were replicated in the observational data within the DTB. Conclusions: We show that benign colorectal neoplasms share genetic aetiology with CRC using PheWAS and TreeWAS methods. Additionally, CRC genetic predisposition is associated with diverticular disease.
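A polygenic risk score of the kind described in this abstract is commonly built as a weighted sum of risk-allele dosages. The sketch below shows that generic construction and a z-score standardisation across individuals; it is not the authors' pipeline, and the dosages, weights and function names are hypothetical placeholders.

```python
def polygenic_score(dosages, weights):
    """Weighted sum of risk-allele dosages (0, 1 or 2 per SNP).

    `weights` stands in for per-SNP effect sizes, e.g. log odds ratios
    taken from GWAS summary statistics (assumed, not from the study).
    """
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


def standardise(scores):
    """Convert raw scores to z-scores using the population standard deviation."""
    n = len(scores)
    mean = sum(scores) / n
    sd = (sum((s - mean) ** 2 for s in scores) / n) ** 0.5
    if sd == 0:
        raise ValueError("scores have zero variance")
    return [(s - mean) / sd for s in scores]


# Three hypothetical individuals genotyped at three hypothetical SNPs.
weights = [0.1, 0.2, 0.3]
cohort = [[0, 1, 2], [1, 1, 0], [2, 0, 1]]
raw = [polygenic_score(person, weights) for person in cohort]
z = standardise(raw)
```

Standardising the score means an association estimate can be read per standard deviation of genetic risk, which makes effects comparable across outcomes in a PheWAS.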


Exploring the Role of Contactins across Psychological, Psychiatric and Cardiometabolic Traits within UK Biobank

Genes ◽

10.3390/genes11111326 ◽

2020 ◽

Vol 11 (11) ◽

pp. 1326

Author(s):

Julia Morris ◽

Soddy Sau Yu Leung ◽

Mark E.S. Bailey ◽

Breda Cullen ◽

Amy Ferguson ◽

...

Keyword(s):

Mental Illness ◽

Genetic Variation ◽

Association Studies ◽

Genome Wide Association Studies ◽

Uk Biobank ◽

Psychological Traits ◽

Medication Effects ◽

Increased Risk ◽

Wide Range ◽

Shared Risk

Individuals with severe mental illness have an increased risk of cardiometabolic diseases compared to the general population. Shared risk factors and medication effects explain part of this excess risk; however, there is growing evidence to suggest that shared biology (including genetic variation) is likely to contribute to comorbidity between mental and physical illness. Contactins are a family of genes involved in development of the nervous system and implicated, through genome-wide association studies, in a wide range of psychological, psychiatric and cardiometabolic conditions. Contactins are plausible candidates for shared pathology between mental and physical health. We used data from UK Biobank to systematically assess how genetic variation in contactin genes was associated with a wide range of psychological, psychiatric and cardiometabolic conditions. We also investigated whether associations for cardiometabolic and psychological traits represented the same or distinct signals and how the genetic variation might influence the measured traits. We identified a novel genetic association between variation in CNTN1 and current smoking and two independent signals in CNTN4 for BMI, and demonstrated that associations between CNTN5 and neuroticism were distinct from those between CNTN5 and blood pressure/HbA1c. There was no evidence that the contactin genes contributed to shared aetiology between physical and mental illness.
