Algorithmic Fairness and Bias Mitigation for Clinical Machine Learning: Insights from Rapid COVID-19 Diagnosis by Adversarial Learning

Author(s):  
Jenny Yang ◽  
Andrew AS Soltan ◽  
Yang Yang ◽  
David A Clifton

Machine learning is becoming increasingly prominent in healthcare. Although its benefits are clear, growing attention is being given to how machine learning may exacerbate existing biases and disparities. In this study, we introduce an adversarial training framework that is capable of mitigating biases that may have been acquired through data collection or magnified during model development. For example, if one class is over-represented, or if errors or inconsistencies in practice are reflected in the training data, then a model can be biased by these. To evaluate our adversarial training framework, we used the statistical definition of equalized odds. We evaluated our model on the task of rapidly predicting COVID-19 for patients presenting to hospital emergency departments, and aimed to mitigate the regional (hospital) and ethnic biases present. We trained our framework on a large, real-world COVID-19 dataset and demonstrated that adversarial training improves outcome fairness (with respect to equalized odds), while still achieving clinically effective screening performance (NPV > 0.98). We compared our method to the benchmark set by related previous work, and performed prospective and external validation on four independent hospital cohorts. Our method can be generalized to any outcomes, models, and definitions of fairness.
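As an illustration of the evaluation criterion named in this abstract, equalized odds asks that true-positive and false-positive rates match across groups. The following is a minimal Python sketch, not the authors' code: the function names and the toy data are hypothetical, and the max-minus-min gap is only one common way of summarising the disparity, not necessarily the measure reported in the study.

```python
from collections import defaultdict

def rates_by_group(y_true, y_pred, groups):
    """Return {group: (TPR, FPR)} from binary labels and binary predictions."""
    counts = defaultdict(lambda: {"tp": 0, "fn": 0, "fp": 0, "tn": 0})
    for t, p, g in zip(y_true, y_pred, groups):
        key = ("tp" if p else "fn") if t else ("fp" if p else "tn")
        counts[g][key] += 1
    rates = {}
    for g, c in counts.items():
        pos = c["tp"] + c["fn"]
        neg = c["fp"] + c["tn"]
        tpr = c["tp"] / pos if pos else float("nan")
        fpr = c["fp"] / neg if neg else float("nan")
        rates[g] = (tpr, fpr)
    return rates

def equalized_odds_gaps(y_true, y_pred, groups):
    """Largest between-group difference in TPR and in FPR (0 means parity)."""
    rates = rates_by_group(y_true, y_pred, groups)
    tprs = [r[0] for r in rates.values()]
    fprs = [r[1] for r in rates.values()]
    return max(tprs) - min(tprs), max(fprs) - min(fprs)

def npv(y_true, y_pred):
    """Negative predictive value: TN / (TN + FN)."""
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    return tn / (tn + fn) if (tn + fn) else float("nan")

# Hypothetical toy data: two groups (e.g. two hospitals), four patients each.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 0, 1, 1, 1, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

print(equalized_odds_gaps(y_true, y_pred, groups))  # (0.5, 0.5)
print(npv(y_true, y_pred))                          # 0.75
```

On this toy data, group A has a TPR of 0.5 and an FPR of 0, while group B has a TPR of 1.0 and an FPR of 0.5, so both gaps are 0.5; a model satisfying equalized odds would drive both gaps toward zero.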

2021 ◽  
Vol 9 ◽  
Author(s):  
Fu-Sheng Chou ◽  
Laxmi V. Ghimire

Background: Pediatric myocarditis is a rare disease with multiple etiologies. Mortality associated with the disease is 5–8%. Prognostic factors have been identified using national hospitalization databases, but applying these risk factors to mortality prediction has not been reported. Methods: We used the Kids' Inpatient Database for this project. We manually curated fourteen variables as predictors of mortality based on current knowledge of the disease, and compared the performance of mortality prediction between linear regression models and a machine learning (ML) model. For ML, the random forest algorithm was chosen because of the categorical nature of the variables. Based on variable importance scores, a reduced model was also developed for comparison. Results: We identified 4,144 patients from the database for randomization into the primary (for model development) and testing (for external validation) datasets. We found that the conventional logistic regression model had low sensitivity (~50%) despite high specificity (>95%) or overall accuracy. On the other hand, the ML model struck a good balance between sensitivity (89.9%) and specificity (85.8%). The reduced ML model with the top five variables (mechanical ventilation, cardiac arrest, ECMO, acute kidney injury, ventricular fibrillation) was sufficient to approximate the prediction performance of the full model. Conclusions: The ML algorithm outperforms the linear regression model for mortality prediction in pediatric myocarditis in this retrospective dataset. Prospective studies are warranted to further validate the applicability of our model in clinical settings.


Cancers ◽  
2020 ◽  
Vol 12 (12) ◽  
pp. 3817
Author(s):  
Shi-Jer Lou ◽  
Ming-Feng Hou ◽  
Hong-Tai Chang ◽  
Chong-Chi Chiu ◽  
Hao-Hsien Lee ◽  
...  

No studies have discussed machine learning algorithms to predict recurrence within 10 years after breast cancer surgery. This study aimed to compare the accuracy of forecasting models in predicting recurrence within 10 years after breast cancer surgery and to identify significant predictors of recurrence. Registry data for breast cancer surgery patients were allocated to a training dataset (n = 798) for model development, a testing dataset (n = 171) for internal validation, and a validating dataset (n = 171) for external validation. Global sensitivity analysis was then performed to evaluate the significance of the selected predictors. Demographic characteristics, clinical characteristics, quality of care, and preoperative quality of life were significantly associated with recurrence within 10 years after breast cancer surgery (p < 0.05). Artificial neural networks had the highest prediction performance indices. Additionally, surgeon volume was the best predictor of recurrence within 10 years after breast cancer surgery, followed by hospital volume and tumor stage. Accurate prediction of 10-year recurrence by machine learning algorithms may improve precision in managing patients after breast cancer surgery and improve understanding of the risk factors for recurrence.


2014 ◽  
Vol 14 (S2) ◽  
Author(s):  
Francesca Mataloni ◽  
Mariangela D’Ovidio ◽  
Mirko Di Martino ◽  
Paolo Sciattella ◽  
Marina Davoli ◽  
...  

2020 ◽  
Author(s):  
Govinda KC ◽  
Giovanni Bocci ◽  
Srijan Verma ◽  
Mahmudulla Hassan ◽  
Jayme Holmes ◽  
...  

Strategies for drug discovery and repositioning are an urgent need with respect to COVID-19. We developed "REDIAL-2020", a suite of machine learning models for estimating small molecule activity from molecular structure, for a range of SARS-CoV-2 related assays. Each classifier is based on three distinct types of descriptors (fingerprint, physicochemical, and pharmacophore) for parallel model development. These models were trained using high throughput screening data from the NCATS COVID19 portal (https://opendata.ncats.nih.gov/covid19/index.html), with multiple categorical machine learning algorithms. The “best models” are combined in an ensemble consensus predictor that outperforms single models where external validation is available. This suite of machine learning models is available through the DrugCentral web portal (http://drugcentral.org/Redial). Acceptable input formats are: drug name, PubChem CID, or SMILES; the output is an estimate of anti-SARS-CoV-2 activities. The web application reports estimated activity across three areas (viral entry, viral replication, and live virus infectivity) spanning six independent models, followed by a similarity search that displays the most similar molecules to the query among experimentally determined data. The ML models have 60% to 74% external predictivity, based on three separate datasets. Complementing the NCATS COVID19 portal, REDIAL-2020 can serve as a rapid online tool for identifying active molecules for COVID-19 treatment. The source code and specific models are available through GitHub (https://github.com/sirimullalab/redial-2020), or via Docker Hub (https://hub.docker.com/r/sirimullalab/redial-2020) for users preferring a containerized version.
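The abstract does not say how the "best models" are combined into a consensus. As a hedged illustration only, the sketch below assumes a simple majority vote over one binary classifier per descriptor type; the function name `consensus_vote` and the example predictions are hypothetical and are not taken from the REDIAL-2020 code.

```python
def consensus_vote(predictions):
    """Majority vote over binary (0 = inactive, 1 = active) model outputs.

    `predictions` maps a model name to its 0/1 prediction. The compound is
    called active only if strictly more than half of the models say so.
    This voting rule is an assumption made for illustration.
    """
    votes = list(predictions.values())
    if not votes:
        raise ValueError("at least one model prediction is required")
    return int(sum(votes) * 2 > len(votes))

# Hypothetical outputs of three classifiers, one per descriptor type.
example = {"fingerprint": 1, "physicochemical": 0, "pharmacophore": 1}
print(consensus_vote(example))  # 1
```

With an odd number of models a strict majority always exists; with an even number, this sketch resolves ties toward "inactive", which is a conservative choice rather than a documented one.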


PeerJ ◽  
2021 ◽  
Vol 9 ◽  
pp. e11988
Author(s):  
Kuan-Han Wu ◽  
Fu-Jen Cheng ◽  
Hsiang-Ling Tai ◽  
Jui-Cheng Wang ◽  
Yii-Ting Huang ◽  
...  

Background: A feasible and accurate risk prediction system for emergency department (ED) patients is urgently required. The Modified Early Warning Score (MEWS) is a widely used tool for predicting clinical outcomes in the ED. The literature shows that machine learning (ML) has better predictive ability in specific patient populations than traditional scoring systems. By analyzing a large multicenter dataset, we aimed to develop an ML model to predict in-hospital mortality of adult non-traumatic ED patients at different time stages, and to compare its performance with other ML models and MEWS. Methods: A retrospective observational cohort study was conducted in five Taiwan EDs, including two tertiary medical centers and three regional hospitals. All consecutive adult (>17 years old) non-traumatic patients admitted to the ED during a 9-year period (January 1, 2008 to December 31, 2016) were included. Exclusion criteria were (1) out-of-hospital cardiac arrest, (2) discharge against medical advice and transfer to another hospital, and (3) missing collected variables. The primary outcome was in-hospital mortality, categorized into 6-, 24-, 72-, and 168-hour mortality. MEWS was calculated from systolic blood pressure, pulse rate, respiratory rate, body temperature, and level of consciousness. An ensemble supervised stacking ML model was developed and compared with sensitive and unsensitive XGBoost, Random Forest, and AdaBoost. We conducted a performance test and examined both the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC) as comparative measures. Results: After excluding 182,001 visits (7.46%), the study group consisted of 2,437,326 ED visits. The dataset was split into 67% training data and 33% test data for ML model development. No statistically significant difference was found in the characteristics of the two groups. For the prediction of 6-, 24-, 72-, and 168-hour in-hospital mortality, the AUROC of MEWS was 0.897, 0.865, 0.841, and 0.816, and that of the ML model was 0.939, 0.928, 0.913, and 0.902, respectively. The stacking ML model also outperformed the other ML models. For the prediction of in-hospital mortality beyond 48 hours, the AUPRC of MEWS dropped below 0.1, while the AUPRC of the ML model was 0.317 at 6 hours and 0.2150 at 168 hours. For each time frame, the ML model achieved statistically significantly higher AUROC and AUPRC than MEWS (all P < 0.001). Both models showed decreasing predictive ability as time elapsed, but the gap in AUROC between the two models tended to increase gradually (P < 0.001). With three MEWS thresholds (score >3, >4, and >5) used as baselines for comparison, the ML model consistently showed improved or equal performance in sensitivity, PPV, and NPV, but not in specificity. Conclusion: Stacking ML methods predict in-hospital mortality better than MEWS in adult non-traumatic ED patients, especially for delayed mortality.
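To make the two comparative measures concrete, the sketch below computes AUROC as the probability that a randomly chosen positive case outscores a randomly chosen negative one, and approximates AUPRC by average precision. This is a minimal pure-Python illustration with hypothetical function names and toy data; the abstract does not state which estimator of AUPRC the study used, and the average-precision function here assumes no tied scores and at least one positive case.

```python
def auroc(y_true, scores):
    """AUROC via pairwise comparison; tied scores count as half a win."""
    pos = [s for t, s in zip(y_true, scores) if t]
    neg = [s for t, s in zip(y_true, scores) if not t]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def average_precision(y_true, scores):
    """Mean of the precision values measured at the rank of each positive."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            total += hits / rank
    return total / hits

# Hypothetical toy data: 1 = died in hospital, 0 = survived.
y_true = [0, 0, 1, 0, 1]
scores = [0.10, 0.40, 0.35, 0.80, 0.90]

print(round(auroc(y_true, scores), 3))    # 0.667
print(average_precision(y_true, scores))  # 0.75
```

The pairwise AUROC is quadratic in the number of cases, so it suits small examples only; for a dataset of the size described here, a rank-based implementation from an established library would be the practical choice.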


2017 ◽  
Vol 54 (2) ◽  
pp. 193-214 ◽  
Author(s):  
Michael Colaresi ◽  
Zuhaib Mahmood

Increasingly, scholars interested in understanding conflict processes have turned to evaluating out-of-sample forecasts to judge and compare the usefulness of their models. Research in this vein has made significant progress in identifying and avoiding the problem of overfitting sample data. Yet there has been less research providing strategies and tools to practically improve the out-of-sample performance of existing models and connect forecasting improvement to the goal of theory development in conflict studies. In this article, we fill this void by building on lessons from machine learning research. We highlight a set of iterative tasks, which David Blei terms ‘Box’s loop’, that can be summarized as build, compute, critique, and think. While the initial steps of Box’s loop will be familiar to researchers, the underutilized process of model criticism allows researchers to iteratively learn more useful representations of the data generation process from the discrepancies between the trained model and held-out data. To benefit from iterative model criticism, we advise researchers not only to split their available data into separate training and test sets, but also sample from their training data to allow for iterative model development, as is common in machine learning applications. Since practical tools for model criticism in particular are underdeveloped, we also provide software for new visualizations that build upon already existing tools. We use models of civil war onset to provide an illustration of how our machine learning-inspired research design can simultaneously improve out-of-sample forecasting performance and identify useful theoretical contributions. We believe these research strategies can complement existing designs to accelerate innovations across conflict processes.
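The splitting advice in this abstract can be sketched as a three-way partition: a test set held back until the end, plus a validation subset carved out of the training data for iterative model criticism. The Python below is a minimal sketch under assumptions; the function name, the fractions, and the random shuffling are illustrative and are not the software the authors provide. For time-ordered conflict data, a split by period may be more appropriate than a random one.

```python
import random

def three_way_split(items, val_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle `items` and return (train, validation, test) lists.

    The test set is touched once, at the end; the validation set is
    reused across rounds of building, critiquing, and revising a model.
    """
    rng = random.Random(seed)          # fixed seed for reproducibility
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    n_val = int(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test

# Hypothetical example: 100 observation identifiers.
train, val, test = three_way_split(range(100))
print(len(train), len(val), len(test))  # 60 20 20
```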


2021 ◽  
Vol 11 (12) ◽  
pp. 1271
Author(s):  
Jaehyeong Cho ◽  
Jimyung Park ◽  
Eugene Jeong ◽  
Jihye Shin ◽  
Sangjeong Ahn ◽  
...  

Background: Several prediction models have been proposed for preoperative risk stratification for mortality. However, few studies have investigated postoperative risk factors, which have a significant influence on survival after surgery. This study aimed to develop prediction models using routine immediate postoperative laboratory values for predicting postoperative mortality. Methods: Two tertiary hospital databases were used in this research: one for model development and another for external validation of the resulting models. The following algorithms were utilized for model development: LASSO logistic regression, random forest, deep neural network, and XGBoost. We built the models on the lab values from immediate postoperative blood tests and compared them with the SASA scoring system to demonstrate their efficacy. Results: There were 3817 patients who had immediate postoperative blood test values. All models trained on immediate postoperative lab values outperformed the SASA model. Furthermore, the developed random forest model had the best AUROC of 0.82 and AUPRC of 0.13, and the phosphorus level contributed the most to the random forest model. Conclusions: Machine learning models trained on routine immediate postoperative laboratory values outperformed previously published approaches in predicting 30-day postoperative mortality, indicating that they may be beneficial in identifying patients at increased risk of postoperative death.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Yae Won Park ◽  
Dongmin Choi ◽  
Ji Eun Park ◽  
Sung Soo Ahn ◽  
Hwiyoung Kim ◽  
...  

Abstract: The purpose of this study was to establish a high-performing radiomics strategy with machine learning from conventional and diffusion MRI to differentiate recurrent glioblastoma (GBM) from radiation necrosis (RN) after concurrent chemoradiotherapy (CCRT) or radiotherapy. Eighty-six patients with GBM were enrolled in the training set after they underwent CCRT or radiotherapy and presented with new or enlarging contrast enhancement within the radiation field on follow-up MRI. A diagnosis was established either pathologically or clinicoradiologically (63 recurrent GBM and 23 RN). Another 41 patients (23 recurrent GBM and 18 RN) from a different institution were enrolled in the test set. Conventional MRI sequences (T2-weighted and postcontrast T1-weighted images) and ADC were analyzed to extract 263 radiomic features. After feature selection, various machine learning models with oversampling methods were trained with combinations of MRI sequences and subsequently validated in the test set. In the independent test set, the model using the ADC sequence showed the best diagnostic performance, with an AUC, accuracy, sensitivity, and specificity of 0.80, 78%, 66.7%, and 87%, respectively. The radiomics models using other MRI sequences showed AUCs ranging from 0.65 to 0.66 in the test set. In conclusion, diffusion radiomics may be helpful in differentiating recurrent GBM from RN.

