Collider bias undermines our understanding of COVID-19 disease risk and severity

Standfirst: Observational data on COVID-19, including hypothesised risk factors for infection and progression, are accruing rapidly. Here, we highlight the challenge of interpreting observational evidence from non-random samples of the population, which may be affected by collider bias. We illustrate these issues using data from the UK Biobank, in which individuals tested for COVID-19 are highly selected for a wide range of genetic, behavioural, cardiovascular, demographic, and anthropometric traits. We discuss the sampling mechanisms that leave aetiological studies of COVID-19 infection and progression particularly susceptible to collider bias. We also describe several tools and strategies that could help mitigate the effects of collider bias in extant studies of COVID-19, and make available a web app for performing sensitivity analyses. While bias due to non-random sampling should be explored in existing studies, the optimal way to mitigate the problem is to use appropriate sampling strategies at the study design stage.
Key messages:
- Collider bias can occur in studies that non-randomly sample people from the population of interest. This bias can distort associations between variables or induce spurious associations.
- It may be possible to estimate the underlying selection model or run sensitivity analyses to examine the credibility of the threat of collider bias, but it is difficult to prove that bias has been reduced or eliminated.
- Tested samples in the UK Biobank cohort are highly selected for a range of traits.
- Sampling strategies that are resilient to collider bias should be used at the design stage of data collection where possible.
- Where this is not possible, linkage or collection of data on the target population can help in sensitivity and validation analyses.
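To make the mechanism concrete, here is a minimal simulation sketch, assuming two traits that are independent in the population and a hypothetical rule under which a person is sampled (for example, tested) only if the sum of the two traits exceeds a threshold. Sampling is then a collider (a common effect of both traits), and restricting the analysis to sampled people induces an association that does not exist in the population. The threshold, sample size, seed and function names are illustrative assumptions, not values from the study.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_collider(n=100_000, threshold=1.0, seed=1):
    rng = random.Random(seed)
    # Two traits drawn independently: no association in the full population.
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [rng.gauss(0, 1) for _ in range(n)]
    # Hypothetical selection rule: only people whose traits jointly exceed
    # the threshold enter the sample, so selection depends on both traits.
    selected = [(xi, yi) for xi, yi in zip(x, y) if xi + yi > threshold]
    xs, ys = zip(*selected)
    return pearson(x, y), pearson(xs, ys), len(selected)


full_r, sample_r, n_sel = simulate_collider()
print(f"population r = {full_r:.3f}; selected-sample r = {sample_r:.3f} (n = {n_sel})")
```

In the full population the correlation is close to zero, while in the selected sample it is clearly negative: a spurious association produced entirely by the sampling rule.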

Collider bias undermines our understanding of COVID-19 disease risk and severity

Nature Communications ◽

10.1038/s41467-020-19478-2 ◽

2020 ◽

Vol 11 (1) ◽

Cited By ~ 19

Author(s):

Gareth J. Griffith ◽

Tim T. Morris ◽

Matthew J. Tudball ◽

Annie Herbert ◽

Giulia Mancano ◽

...

Keyword(s):

Observational Studies ◽

Disease Risk ◽

Observational Evidence ◽

Design Stage ◽

Sampling Strategies ◽

UK Biobank ◽

Active Infection ◽

Disease Outcomes ◽

Representative Samples ◽

Collider Bias

Abstract: Numerous observational studies have attempted to identify risk factors for infection with SARS-CoV-2 and COVID-19 disease outcomes. Studies have used datasets sampled from patients admitted to hospital, people tested for active infection, or people who volunteered to participate. Here, we highlight the challenge of interpreting observational evidence from such non-representative samples. Collider bias can induce associations between two or more variables that affect the likelihood of an individual being sampled, distorting associations between these variables in the sample. Analysing UK Biobank data, we find that, compared with the wider cohort, participants tested for COVID-19 were highly selected for a range of genetic, behavioural, cardiovascular, demographic, and anthropometric traits. We discuss the mechanisms inducing these problems and approaches that could help mitigate them. While collider bias should be explored in existing studies, the optimal way to mitigate the problem is to use appropriate sampling strategies at the study design stage.
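One widely used way to act on an estimated selection model is inverse probability weighting (IPW), named here as an illustrative technique rather than as the specific method of this paper: each sampled person is weighted by 1/P(sampled), so the weighted sample mimics the target population. The sketch below assumes the selection model is known exactly; in practice it has to be estimated, and the correction is only as good as that model. The logistic coefficients, seed and function names are hypothetical.

```python
import math
import random


def weighted_corr(xs, ys, ws):
    """Weighted Pearson correlation; with all weights equal to 1 it is the ordinary one."""
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    syy = sum(w * (y - my) ** 2 for w, y in zip(ws, ys))
    return sxy / math.sqrt(sxx * syy)


def selection_prob(x, y, b0=-0.5, bx=1.0, by=1.0):
    """Hypothetical logistic selection model: P(sampled | X, Y)."""
    return 1.0 / (1.0 + math.exp(-(b0 + bx * x + by * y)))


def ipw_demo(n=200_000, seed=7):
    rng = random.Random(seed)
    xs, ys, ws = [], [], []
    for _ in range(n):
        x, y = rng.gauss(0, 1), rng.gauss(0, 1)  # independent in the population
        p = selection_prob(x, y)
        if rng.random() < p:                     # only sampled people are observed
            xs.append(x)
            ys.append(y)
            ws.append(1.0 / p)                   # inverse-probability weight
    naive = weighted_corr(xs, ys, [1.0] * len(xs))
    weighted = weighted_corr(xs, ys, ws)
    return naive, weighted, len(xs)


naive_r, ipw_r, n_obs = ipw_demo()
print(f"naive r = {naive_r:.3f}; IPW r = {ipw_r:.3f} (n sampled = {n_obs})")
```

The unweighted correlation among sampled people is negative (collider bias), whereas the weighted estimate is close to the population value of zero. If the assumed selection model were wrong, the weights would not remove the bias, which is why such corrections are best treated as sensitivity analyses.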

Integrating large-scale neuroimaging research datasets: harmonisation of white matter hyperintensity measurements across Whitehall and UK Biobank datasets

10.1101/2020.07.28.208579 ◽

2020 ◽

Author(s):

Valentina Bordin ◽

Ilaria Bertani ◽

Irene Mattioli ◽

Vaanathi Sundaresan ◽

Paul McCarthy ◽

...

Keyword(s):

White Matter ◽

Large Scale ◽

White Matter Hyperintensity ◽

List Type ◽

UK Biobank ◽

Physiological Variables ◽

Healthy Elderly ◽

Processing Strategies ◽

Wide Range ◽

The UK

Abstract: Large-scale neuroimaging datasets could provide normative distributions for a wide variety of neuroimaging markers, which would greatly improve the clinical utility of these measures. However, a major challenge is our currently limited ability to integrate measures across different large-scale datasets, owing to inconsistencies in imaging and non-imaging measures across protocols and populations. Here we explore the harmonisation of white matter hyperintensity (WMH) measures across two major studies of healthy elderly populations, the Whitehall II imaging sub-study and the UK Biobank. We identify pre-processing strategies that maximise consistency across datasets and use multivariate regression to characterise the sample differences that contribute to study-level differences in WMH variation. We also present a parser to harmonise WMH-relevant non-imaging variables across the two datasets. We show that we can provide highly calibrated WMH measures from these datasets with (1) the inclusion of a number of specific standardised processing steps and (2) appropriate modelling of sample differences through the alignment of demographic, cognitive and physiological variables. These results open up a wide range of applications for the study of WMHs and other neuroimaging markers across extensive databases of clinical data.
Highlights:
- We harmonised measures of WMHs across two studies of healthy ageing.
- Specific pre-processing strategies can increase comparability across studies.
- Modelling of biological differences is crucial to provide calibrated measures.

Calcification of abdominal aorta is an underappreciated cardiovascular disease risk factor

10.1101/2020.05.07.20094706 ◽

2020 ◽

Cited By ~ 1

Author(s):

Anurag Sethi ◽

Leland Taylor ◽

J Graham Ruby ◽

Jagadish Venkataraman ◽

Madeleine Cule ◽

...

Keyword(s):

Risk Factors ◽

Cardiovascular Disease ◽

Risk Factor ◽

Disease Risk ◽

Cardiovascular Outcomes ◽

Whole Body ◽

UK Biobank ◽

Clinical Biomarkers ◽

Wide Range ◽

The UK

Abstract
Background: Calcification of the abdominal aorta is an important contributor to cardiovascular disease in diabetic and chronic kidney disease (CKD) populations. However, the prevalence of the pathology, its risk factors, and long-term disease outcomes in a general population have not been systematically analyzed.
Method: We developed machine learning models to quantify levels of abdominal aortic calcification (AAC) in 29,957 whole-body dual-energy X-ray absorptiometry (DEXA) scans from the UK Biobank cohort. Using regression techniques, we tested associations between calcification severity and a wide range of physiological parameters, clinical biomarkers, and environmental risk factors (406 in total). We performed a common-variant genetic association study spanning 9,572,557 single-nucleotide polymorphisms to identify genetic loci relevant to AAC. We evaluated the prognostic value of AAC across 151 disease classes using Cox proportional hazards models. We further examined an epidemiological model of the effect of calcification on cardiovascular morbidity, with and without LDL interactions.
Findings: We find evidence for AAC in >10.4% of the cohort despite a low prevalence of diabetes (2.5%) and CKD (0.5%). Increased AAC is a strong prognostic indicator of cardiovascular outcomes, including stenosis of precerebral arteries (HR ~1.5), myocardial infarction (HR ~1.5), and ischemic heart disease (HR ~1.33). We find that AAC is genetically correlated with cardiovascular-related traits and that the genetic signals are enriched in vascular and adipose tissue. We report three loci associated with AAC, with the strongest association occurring at the TWIST1/HDAC9 locus (beta=0.078, p-value=1.4e-11), in a region also associated with coronary artery disease. Surprisingly, we find that levels of serum phosphate and glycated hemoglobin that are elevated but still within the clinically normal range are linked to increased vascular calcification. Furthermore, we show that AAC arises in the absence of hypercholesterolemia. By our estimate, AAC is an LDL-independent risk factor for cardiovascular outcomes, with risk similar to that of elevated LDL.
Data: This research has been conducted using the UK Biobank Resource.

Comparison of Atherosclerotic Cardiovascular Disease Risk Prediction by Lipoprotein(a) Levels Between Persons With and Without Prior Cardiovascular Disease: The UK Biobank

Journal of the American College of Cardiology ◽

10.1016/s0735-1097(21)02842-4 ◽

2021 ◽

Vol 77 (18) ◽

pp. 1484

Author(s):

Nathan D. Wong ◽

Yanglu Zhao ◽

Ailin Barseghian El-Farra ◽

Michael Wilkinson

Keyword(s):

Cardiovascular Disease ◽

Risk Prediction ◽

Disease Risk ◽

Cardiovascular Disease Risk ◽

Atherosclerotic Cardiovascular Disease ◽

UK Biobank ◽

Lipoprotein A ◽

Atherosclerotic Cardiovascular Disease Risk ◽

The UK

Machine Learning Prediction of Biomarkers from SNPs and of Disease Risk from Biomarkers in the UK Biobank

Genes ◽

10.3390/genes12070991 ◽

2021 ◽

Vol 12 (7) ◽

pp. 991

Author(s):

Erik Widen ◽

Timothy G. Raben ◽

Louis Lello ◽

Stephen D. H. Hsu

Keyword(s):

Disease Risk ◽

Smoking Status ◽

Glycated Haemoglobin ◽

European Ancestry ◽

Risk Scores ◽

Common Disease ◽

Atherosclerotic Cardiovascular Disease ◽

UK Biobank ◽

Lipoprotein A ◽

The UK

We use UK Biobank data to train predictors of 65 blood and urine markers, such as HDL, LDL, lipoprotein A and glycated haemoglobin, from SNP genotype. For example, our Polygenic Score (PGS) predictor correlates ∼0.76 with lipoprotein A level, which is highly heritable and an independent risk factor for heart disease. This may be the most accurate genomic prediction of a quantitative trait that has yet been produced (specifically, for European ancestry groups). We also train predictors of common disease risk using blood and urine biomarkers alone (no DNA information); we call these predictors biomarker risk scores (BMRS). Individuals who are at high risk (e.g., odds ratio of >5× population average) can be identified for conditions such as coronary artery disease (AUC∼0.75), diabetes (AUC∼0.95), hypertension, liver and kidney problems, and cancer using biomarkers alone. Our atherosclerotic cardiovascular disease (ASCVD) predictor uses ∼10 biomarkers and performs in UKB evaluation as well as or better than the American College of Cardiology ASCVD Risk Estimator, which uses quite different inputs (age, diagnostic history, BMI, smoking status, statin usage, etc.). We compare polygenic risk scores (risk conditional on genotype: PRS) for common diseases to the risk predictors that result from the concatenation of the learned functions BMRS and PGS, i.e., applying the BMRS predictors to the PGS output.
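As a rough aid to interpreting the ∼0.76 figure, and assuming a linear relation between the PGS and measured lipoprotein A, the squared correlation gives the share of variance in measured levels that the predictor accounts for in the data on which the correlation was computed:

```latex
R^2 = r^2 \approx 0.76^2 \approx 0.58
```

That is, a little under 60% of the variance, which is high for a polygenic predictor but still leaves substantial unexplained variation between individuals.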

Diet and general cognitive ability in the UK Biobank dataset

Scientific Reports ◽

10.1038/s41598-021-91259-3 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Piril Hepsomali ◽

John A. Groeger

Keyword(s):

Cognitive Ability ◽

Dietary Patterns ◽

Red Meat ◽

Milk Intake ◽

Special Focus ◽

General Cognitive Ability ◽

Food Groups ◽

UK Biobank ◽

Wide Range ◽

The UK

Abstract: Accumulating evidence suggests that dietary interventions might be used as a strategy to protect against age-related cognitive decline and neurodegeneration, as associations have been reported between some nutrients, food groups and dietary patterns and some domains of cognition. In this study, we aimed to conduct the largest investigation of diet and cognition to date by systematically examining UK Biobank (UKB) data to determine whether dietary quality and food groups play a role in general cognitive ability. This cross-sectional population-based study involved 48,749 participants. UKB data on food frequency questionnaires and cognitive function were used. Healthy diet, partial fibre intake, and milk intake scores were also calculated. Adjusted models included age, sex, and BMI. We observed associations between better general cognitive ability and higher intakes of fish and unprocessed red meat, and moderate intakes of fibre and milk. Surprisingly, we found that diet quality, vegetable intake, and high and low fibre and milk intake were inversely associated with general cognitive ability. Our results suggest that fish and unprocessed red meat, and/or nutrients found in them, might be beneficial for general cognitive ability. However, results should be interpreted with caution, as the same food groups may affect other domains of cognition or mental health differently. These discrepancies in the current evidence invite further research into domain-specific effects of dietary patterns and food groups on a wide range of cognitive and affective outcomes, with a special focus on potential covariates that may affect the relationship between diet and cognition.

Development and Validation of a Novel Dementia Risk Score in the UK Biobank Cohort

10.31234/osf.io/5yvjr ◽

2021 ◽

Author(s):

Melis Anatürk ◽

Raihaan Patel ◽

Georgios Georgiopoulos ◽

Danielle Newby ◽

Anya Topiwala ◽

...

Keyword(s):

At Risk ◽

Cardiovascular Risk ◽

Risk Score ◽

Disease Risk ◽

Risk Index ◽

Risk Scores ◽

UK Biobank ◽

Dementia Risk ◽

History Of ◽

The UK

INTRODUCTION: Current prognostic models of dementia have had limited success in consistently identifying at-risk individuals. We aimed to develop and validate a novel dementia risk score (DRS) using the UK Biobank cohort.
METHODS: After randomly dividing the sample into a training set (n=166,487, 80%) and a test set (n=41,621, 20%), logistic LASSO regression and standard logistic regression were used to develop the UKB-DRS.
RESULTS: The score consisted of age, sex, education, apolipoprotein E4 genotype, a history of diabetes, stroke, and depression, and a family history of dementia. The UKB-DRS had good-to-strong discrimination accuracy in the UKB hold-out sample (AUC [95% CI] = 0.79 [0.77, 0.82]) and in an external dataset (Whitehall II cohort, AUC [95% CI] = 0.83 [0.79, 0.87]). The UKB-DRS also significantly outperformed four published risk scores (the Australian National University Alzheimer’s Disease Risk Index (ANU-ADRI), the Cardiovascular Risk Factors, Aging, and Dementia score (CAIDE), the Dementia Risk Score (DRS), and the Framingham Cardiovascular Risk Score (FRS)) across both test sets.
CONCLUSION: The UKB-DRS is a novel, easy-to-use tool that could be used in routine care or for targeted selection of at-risk individuals into clinical trials.
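The discrimination results above are reported as areas under the ROC curve (AUC). A minimal sketch of how an AUC can be computed from risk scores uses its equivalence with the probability that a randomly chosen case scores higher than a randomly chosen non-case, with ties counted as one half. The function name and the toy scores are illustrative assumptions, not data or code from the UKB-DRS study.

```python
from itertools import product


def auc(case_scores, control_scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    Each case/control pair contributes 1 if the case scores higher,
    0.5 for a tie and 0 otherwise; the AUC is the mean over all pairs.
    This is O(n_cases * n_controls), which is fine for illustration.
    """
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")
    total = 0.0
    for c, k in product(case_scores, control_scores):
        if c > k:
            total += 1.0
        elif c == k:
            total += 0.5
    return total / (len(case_scores) * len(control_scores))


# Toy predicted risks (hypothetical numbers, not study data).
cases = [0.9, 0.8, 0.4]
controls = [0.7, 0.3, 0.2]
print(auc(cases, controls))  # 8 of 9 case/control pairs are correctly ordered
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation; a value such as 0.79 means a randomly chosen person who develops dementia receives a higher score than a randomly chosen person who does not in roughly 79% of pairs.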

Polygenic Hyperlipidemia Increases Coronary Artery Disease Risk in the UK Biobank

Atherosclerosis ◽

10.1016/j.atherosclerosis.2019.06.266 ◽

2019 ◽

Vol 287 ◽

pp. e92

Author(s):

P. Ripatti ◽

J.T. Rämö ◽

S. Söderlund ◽

I. Surakka ◽

A.S. Havulinna ◽

...

Keyword(s):

Coronary Artery Disease ◽

Coronary Artery ◽

Disease Risk ◽

Coronary Artery Disease Risk ◽

UK Biobank ◽

Artery Disease ◽

The UK

Altered Cortical Brain Structure and Increased Risk for Disease Seen Decades After Perinatal Exposure to Maternal Smoking: A Study of 9000 Adults in the UK Biobank

Cerebral Cortex ◽

10.1093/cercor/bhz060 ◽

2019 ◽

Vol 29 (12) ◽

pp. 5217-5233 ◽

Cited By ~ 4

Author(s):

Lauren E Salminen ◽

Rand R Wilcox ◽

Alyssa H Zhu ◽

Brandalyn C Riedel ◽

Christopher R K Ching ◽

...

Keyword(s):

Brain Structure ◽

Brain Mri ◽

Population Based ◽

Smoke Exposure ◽

Sensitivity Analyses ◽

UK Biobank ◽

Increased Risk ◽

Increase Risk ◽

Sensory Cortices ◽

The UK

Abstract: Secondhand smoke exposure is a major public health risk that is especially harmful to the developing brain, but it is unclear whether early exposure affects brain structure during middle age and older adulthood. Here we analyzed brain MRI data from the UK Biobank in a population-based sample of individuals (ages 44–80) who were exposed (n = 2510) or unexposed (n = 6079) to smoking around birth. We used robust statistical models, including quantile regression, to test the effect of perinatal smoke exposure (PSE) on cortical surface area (SA), cortical thickness, and subcortical volumes. We hypothesized that PSE would be associated with cortical disruption in primary sensory areas compared to unexposed (PSE−) adults. After adjusting for multiple comparisons, SA was significantly lower in the pericalcarine (PCAL), inferior parietal (IPL), and regions of the temporal and frontal cortex of PSE+ adults; these abnormalities were associated with increased risk for several diseases, including circulatory and endocrine conditions. Sensitivity analyses conducted in a hold-out group of healthy participants (exposed, n = 109; unexposed, n = 315) replicated the effect of PSE on SA in the PCAL and IPL. Collectively, our results show a negative, long-term effect of PSE on sensory cortices that may increase risk for disease later in life.
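Quantile regression, one of the robust models mentioned above, minimises the pinball (check) loss instead of squared error. The sketch below is a hypothetical, intercept-only illustration of that idea, not the authors' analysis: with no covariates, the value that minimises total pinball loss at level q is a sample q-quantile, which is why such estimates are far less sensitive to outliers than a least-squares mean. Restricting the search to observed values is valid because the total loss is piecewise linear with kinks at the data points.

```python
def pinball_loss(residual, q):
    """Check (pinball) loss used by quantile regression at quantile level q."""
    return residual * (q - (1.0 if residual < 0 else 0.0))


def sample_quantile_by_loss(values, q):
    """Return the observed value that minimises total pinball loss.

    This is quantile regression with an intercept only; the minimiser is a
    sample q-quantile of the data.
    """
    return min(values, key=lambda c: sum(pinball_loss(v - c, q) for v in values))


data = [1.0, 2.0, 3.0, 4.0, 100.0]          # toy data with one extreme value
print(sample_quantile_by_loss(data, 0.5))   # median: 3.0
print(sum(data) / len(data))                # mean: 22.0, pulled up by the outlier
```

With covariates, the same loss is minimised over regression coefficients rather than a single constant, giving effects on the chosen quantile of the outcome distribution rather than on its mean.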

Identity-by-descent detection across 487,409 British samples reveals fine-scale population structure, evolutionary history, and trait associations

10.1101/2020.04.20.029819 ◽

2020 ◽

Cited By ~ 3

Author(s):

Juba Nait Saada ◽

Georgios Kalantzis ◽

Derek Shyr ◽

Martin Robinson ◽

Alexander Gusev ◽

...

Keyword(s):

Population Structure ◽

Exome Sequencing ◽

Evolutionary History ◽

Genetic Relatedness ◽

UK Biobank ◽

The Past ◽

Wide Range ◽

Shared Ancestry ◽

The UK ◽

Common Ancestors

Abstract: Detection of identical-by-descent (IBD) segments provides a fundamental measure of genetic relatedness and plays a key role in a wide range of genomic analyses. We developed a new method, called FastSMC, that enables accurate biobank-scale detection of IBD segments transmitted by common ancestors living up to several hundreds of generations in the past. FastSMC combines a fast heuristic search for IBD segments with accurate coalescent-based likelihood calculations and enables estimating the age of common ancestors transmitting IBD regions. We applied FastSMC to 487,409 phased samples from the UK Biobank and detected the presence of ∼214 billion IBD segments transmitted by shared ancestors within the past 1,500 years. We quantified time-dependent shared ancestry within and across 120 postcodes, obtaining a fine-grained picture of genetic relatedness within the past two millennia in the UK. Sharing of common ancestors strongly correlates with geographic distance, enabling the localization of a sample’s birth coordinates from genomic data. We sought evidence of recent positive selection by identifying loci with unusually strong shared ancestry within recent millennia, and we detected 12 genome-wide significant signals, including 7 novel loci. We found IBD sharing to be highly predictive of the sharing of ultra-rare variants in exome sequencing samples from the UK Biobank. Focusing on loss-of-function variation discovered using exome sequencing, we devised an IBD-based association test and detected 29 associations with 7 blood-related traits, 20 of which were not detected in the exome sequencing study. These results underscore the importance of modelling distant relatedness to reveal subtle population structure, recent evolutionary history, and rare pathogenic variation.
