Fairness in Cardiac Magnetic Resonance Imaging: Assessing sex and racial bias in deep learning-based segmentation
Background: Artificial intelligence (AI) techniques have been proposed to automate cine CMR segmentation for functional quantification. However, in other applications, AI models have been shown to exhibit sex and/or racial bias.

Objectives: To perform the first analysis of sex and racial bias in AI-based cine CMR segmentation using a large-scale database.

Methods: A state-of-the-art deep learning (DL) model was used for automatic segmentation of both ventricles and the myocardium from cine short-axis CMR. The dataset consisted of end-diastole and end-systole short-axis cine CMR images of 5,903 subjects from the UK Biobank database (61.5±7.1 years, 52% male, 81% white). To assess sex and racial bias, we compared Dice scores and errors in measurements of biventricular volumes and function between subjects grouped by race and sex. To investigate whether segmentation bias could be explained by potential confounders, a multivariate linear regression and an ANCOVA were performed.

Results: We found statistically significant differences in Dice scores (white ~94% vs minority ethnic groups 86-89%), as well as in absolute and relative errors in volumetric and functional measures, showing that the AI model was biased against minority racial groups, even after correction for possible confounders.

Conclusions: We have shown that racial bias can exist in DL-based cine CMR segmentation models. We believe that this bias is due to the unbalanced nature of the training data, combined with physiological differences. This is supported by the results, which show racial bias but not sex bias when the model is trained using the UK Biobank database, which is sex-balanced but not race-balanced.
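The group comparisons above rely on the Dice similarity coefficient between predicted and ground-truth segmentation masks. A minimal sketch of how Dice is computed for binary masks (the function name and the toy masks below are illustrative, not taken from the study):

```python
import numpy as np

def dice_score(pred: np.ndarray, gt: np.ndarray) -> float:
    """Dice similarity coefficient: 2|A∩B| / (|A| + |B|) for binary masks."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    intersection = np.logical_and(pred, gt).sum()
    denom = pred.sum() + gt.sum()
    # Convention: two empty masks are a perfect match
    return 2.0 * intersection / denom if denom > 0 else 1.0

# Toy example: a 4x4 "ground-truth" square and a prediction shifted by one column
gt = np.zeros((8, 8), dtype=bool)
gt[2:6, 2:6] = True
pred = np.zeros((8, 8), dtype=bool)
pred[2:6, 3:7] = True

print(dice_score(pred, gt))  # 0.75 (overlap of 12 pixels, 16 pixels per mask)
```

In a bias analysis such as this one, per-subject Dice scores would be aggregated by demographic group (e.g. by race or sex) and the group means compared statistically.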