Collider bias undermines our understanding of COVID-19 disease risk and severity
Standfirst

Observational data on COVID-19 including hypothesised risk factors for infection and progression are accruing rapidly. Here, we highlight the challenge of interpreting observational evidence from non-random samples of the population, which may be affected by collider bias. We illustrate these issues using data from the UK Biobank in which individuals tested for COVID-19 are highly selected for a wide range of genetic, behavioural, cardiovascular, demographic, and anthropometric traits. We discuss the sampling mechanisms that leave aetiological studies of COVID-19 infection and progression particularly susceptible to collider bias. We also describe several tools and strategies that could help mitigate the effects of collider bias in extant studies of COVID-19 and make available a web app for performing sensitivity analyses. While bias due to non-random sampling should be explored in existing studies, the optimal way to mitigate the problem is to use appropriate sampling strategies at the study design stage.

Key messages

Collider bias can occur in studies that non-randomly sample people from the population of interest. This bias can distort associations between variables or induce spurious associations.

It may be possible to estimate the underlying selection model or run sensitivity analyses to examine the credibility of the threat of collider bias, but it is difficult to prove that bias has been reduced or eliminated.

Tested samples in the UK Biobank cohort are highly selected for a range of traits.

Sampling strategies that are resilient to collider bias issues should be used at the design stage of data collection where possible.

Where this is not possible, linkage or collection of data on the target population can help in sensitivity and validation analyses.
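
The first key message, that non-random sampling can induce a spurious association, can be illustrated with a small simulation. The sketch below is purely illustrative and does not use UK Biobank data or reproduce the authors' web app. The variable names (`risk`, `outcome`), the selection rule, and the `threshold` parameter are all assumptions chosen for the example. A hypothetical risk factor and a hypothetical outcome are generated independently, and each person is then "tested" (selected into the sample) with a chance that rises with both. The correlation is computed once in the whole simulated population and once among the selected people only.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def simulate(n=20000, seed=1, threshold=1.0):
    """Return (correlation in full population, correlation in selected sample).

    Assumed toy model: `risk` and `outcome` are independent standard normal
    variables, so their true association is zero. A person is selected when
    risk + outcome + random noise exceeds `threshold`, which makes selection
    a collider (a common effect) of the two variables.
    """
    rng = random.Random(seed)
    risk = [rng.gauss(0, 1) for _ in range(n)]
    outcome = [rng.gauss(0, 1) for _ in range(n)]
    selected = [
        risk[i] + outcome[i] + rng.gauss(0, 1) > threshold for i in range(n)
    ]

    r_full = pearson(risk, outcome)
    r_selected = pearson(
        [r for r, s in zip(risk, selected) if s],
        [o for o, s in zip(outcome, selected) if s],
    )
    return r_full, r_selected


if __name__ == "__main__":
    r_full, r_selected = simulate()
    print(f"Correlation in full population: {r_full:.3f}")
    print(f"Correlation in selected sample: {r_selected:.3f}")
```

Under these assumptions the correlation in the full population is close to zero, while the correlation among the selected individuals is clearly negative, even though the two variables were generated independently. Changing `threshold` or the selection rule changes the size of the induced association, which is why the strength of the bias in a real study depends on a selection mechanism that is usually unknown.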