Fairness in Cardiac Magnetic Resonance Imaging: Assessing sex and racial bias in deep learning-based segmentation
Background: Artificial intelligence (AI) techniques have been proposed to automate cine CMR segmentation for functional quantification. However, in other applications, AI models have been shown to exhibit sex and/or racial bias.

Objectives: To perform the first analysis of sex and racial bias in AI-based cine CMR segmentation using a large-scale database.

Methods: A state-of-the-art deep learning (DL) model was used for automatic segmentation of both ventricles and the myocardium from cine short-axis CMR. The dataset consisted of end-diastole and end-systole short-axis cine CMR images of 5,903 subjects from the UK Biobank database (61.5±7.1 years, 52% male, 81% white). To assess sex and racial bias, we compared Dice scores and errors in measurements of biventricular volumes and function between subjects grouped by race and sex. To investigate whether segmentation bias could be explained by potential confounders, a multivariate linear regression and an ANCOVA were performed.

Results: We found statistically significant differences in Dice scores (white ~94% vs minority ethnic groups 86-89%), as well as in absolute and relative errors in volumetric and functional measures, showing that the AI model was biased against minority racial groups, even after correction for possible confounders.

Conclusions: We have shown that racial bias can exist in DL-based cine CMR segmentation models. We believe that this bias is due to the unbalanced nature of the training data, combined with physiological differences. This is supported by the results, which show racial bias but not sex bias when the model is trained using the UK Biobank database, which is sex-balanced but not race-balanced.
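The group comparisons above rely on the Dice similarity coefficient between predicted and ground-truth segmentation masks. A minimal sketch of how Dice is computed for binary masks (the function name and the toy masks below are illustrative, not taken from the study):

```python
import numpy as np

def dice_score(pred: np.ndarray, gt: np.ndarray) -> float:
    """Dice similarity coefficient: 2|A∩B| / (|A| + |B|) for binary masks."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    intersection = np.logical_and(pred, gt).sum()
    denom = pred.sum() + gt.sum()
    # Convention: two empty masks are a perfect match
    return 2.0 * intersection / denom if denom > 0 else 1.0

# Toy example: a 4x4 "ground-truth" square and a prediction shifted by one column
gt = np.zeros((8, 8), dtype=bool)
gt[2:6, 2:6] = True
pred = np.zeros((8, 8), dtype=bool)
pred[2:6, 3:7] = True

print(dice_score(pred, gt))  # 0.75 (overlap of 12 pixels, 16 pixels per mask)
```

In a bias analysis such as this one, per-subject Dice scores would be aggregated by demographic group (e.g. by race or sex) and the group means compared statistically.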