Non-empirical problems in fair machine learning

Author(s):  
Teresa Scantamburlo

Abstract: The problem of fair machine learning has drawn much attention over the last few years and the bulk of offered solutions are, in principle, empirical. However, algorithmic fairness also raises important conceptual issues that would fail to be addressed if one relies entirely on empirical considerations. Herein, I will argue that the current debate has developed an empirical framework that has brought important contributions to the development of algorithmic decision-making, such as new techniques to discover and prevent discrimination, additional assessment criteria, and analyses of the interaction between fairness and predictive accuracy. However, the same framework has also suggested higher-order issues regarding the translation of fairness into metrics and quantifiable trade-offs. Although the (empirical) tools which have been developed so far are essential to address discrimination encoded in data and algorithms, their integration into society elicits key (conceptual) questions such as: What kind of assumptions and decisions underlies the empirical framework? How do the results of the empirical approach penetrate public debate? What kind of reflection and deliberation should stakeholders have over available fairness metrics? I will outline the empirical approach to fair machine learning, i.e. how the problem is framed and addressed, and suggest that there are important non-empirical issues that should be tackled. While this work will focus on the problem of algorithmic fairness, the lesson can extend to other conceptual problems in the analysis of algorithmic decision-making such as privacy and explainability.

2019 ◽  
Vol 46 (3) ◽  
pp. 205-211 ◽  
Author(s):  
Thomas Grote ◽  
Philipp Berens

In recent years, a plethora of high-profile scientific publications has reported machine learning algorithms outperforming clinicians in medical diagnosis or treatment recommendations. This has sparked interest in deploying relevant algorithms with the aim of enhancing decision-making in healthcare. In this paper, we argue that instead of straightforwardly enhancing the decision-making capabilities of clinicians and healthcare institutions, deploying machine learning algorithms entails trade-offs at the epistemic and the normative level. Whereas involving machine learning might improve the accuracy of medical diagnosis, it comes at the expense of opacity when trying to assess the reliability of a given diagnosis. Drawing on literature in social epistemology and moral responsibility, we argue that the uncertainty in question potentially undermines the epistemic authority of clinicians. Furthermore, we elucidate potential pitfalls of involving machine learning in healthcare with respect to paternalism, moral responsibility and fairness. Finally, we discuss how the deployment of machine learning algorithms might shift the evidentiary norms of medical diagnosis. In this regard, we hope to lay the grounds for further ethical reflection on the opportunities and pitfalls of machine learning for enhancing decision-making in healthcare.


2020 ◽  
pp. 002224292095734
Author(s):  
Chiara Longoni ◽  
Luca Cian

Rapid development and adoption of AI, machine learning, and natural language processing applications challenge managers and policy makers to harness these transformative technologies. In this context, the authors provide evidence of a novel “word-of-machine” effect, the phenomenon by which utilitarian/hedonic attribute trade-offs determine preference for, or resistance to, AI-based recommendations compared with traditional word of mouth, or human-based recommendations. The word-of-machine effect stems from a lay belief that AI recommenders are more competent than human recommenders in the utilitarian realm and less competent than human recommenders in the hedonic realm. As a consequence, importance or salience of utilitarian attributes determines preference for AI recommenders over human ones, and importance or salience of hedonic attributes determines resistance to AI recommenders over human ones (Studies 1–4). The word-of-machine effect is robust to attribute complexity, number of options considered, and transaction costs. The word-of-machine effect reverses for utilitarian goals if a recommendation needs matching to a person’s unique preferences (Study 5) and is eliminated in the case of human–AI hybrid decision making (i.e., augmented rather than artificial intelligence; Study 6). An intervention based on the consider-the-opposite protocol attenuates the word-of-machine effect (Studies 7a–b).


2017 ◽  
Author(s):  
Juan Mateos-Garcia

Algorithmic decision-making systems based on artificial intelligence and machine learning are enabling unprecedented levels of personalisation, recommendation and matching. Unfortunately, these systems are fallible, and their failures have costs. I develop a formal model of algorithmic decision-making and its supervision to explore the trade-offs between more (algorithm-facilitated) beneficial decisions and more (algorithm-caused) costly errors. The model highlights the importance of algorithm accuracy and human supervision in high-stakes environments where the costs of error are high, and shows how decreasing returns to scale in algorithmic accuracy, increasing incentives to ‘game’ popular algorithms, and cost inflation in human supervision might constrain optimal levels of algorithmic decision-making.


Diabetes ◽  
2020 ◽  
Vol 69 (Supplement 1) ◽  
pp. 1552-P
Author(s):  
KAZUYA FUJIHARA ◽  
MAYUKO H. YAMADA ◽  
YASUHIRO MATSUBAYASHI ◽  
MASAHIKO YAMAMOTO ◽  
TOSHIHIRO IIZUKA ◽  
...  

2018 ◽  
Author(s):  
David John Stracuzzi ◽  
Michael Christopher Darling ◽  
Matthew Gregor Peterson ◽  
Maximillian Gene Chen

2020 ◽  
Author(s):  
Uzair Bhatti

BACKGROUND In the era of health informatics, the exponential growth of information generated by health information systems and healthcare organizations demands expert and intelligent recommendation systems. Such systems have become one of the most valuable tools because they reduce problems such as information overload when selecting and suggesting doctors, hospitals, medicine, diagnoses, etc. according to patients’ interests. OBJECTIVE Hybrid filtering is one of the most popular approaches to recommendation, but its major limitations are selectivity and data integrity issues. Most existing recommendation systems and risk prediction algorithms focus on a single domain, whereas cross-domain hybrid filtering can alleviate selectivity and data integrity problems to a greater extent. METHODS We propose a novel recommendation and prediction algorithm, pr-KNN, that uses the KNN algorithm together with machine learning algorithms and artificial intelligence (AI). We identify the factors that directly affect diseases and propose an approach for predicting the correct diagnosis of different diseases. We have constructed a series of models with good reliability for predicting different surgery complications and identified several novel clinical associations. RESULTS We compared the performance of our algorithm with other machine learning algorithms and found that ours performed better, with predictive accuracy improving by 3.61%. CONCLUSIONS The potential to directly integrate these predictive tools into EHRs may enable personalized medicine and decision-making at the point of care for patient counseling and as a teaching tool. CLINICALTRIAL dataset for the trials of patient attached
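
The abstract above does not specify how pr-KNN works, so the following is only a minimal sketch of plain k-nearest-neighbour classification, the building block it names. The function name `knn_predict`, the toy records (temperature, heart rate) and the risk labels are all hypothetical and are not taken from the study.

```python
from collections import Counter
import math


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k training points closest to x (Euclidean distance)."""
    by_distance = sorted(zip(train_X, train_y),
                         key=lambda pair: math.dist(pair[0], x))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy records: [body temperature in C, heart rate in bpm].
toy_records = [[36.8, 70], [37.0, 72], [36.9, 68],
               [38.9, 105], [39.2, 110], [38.7, 98]]
toy_labels = ["low_risk", "low_risk", "low_risk",
              "high_risk", "high_risk", "high_risk"]

print(knn_predict(toy_records, toy_labels, [39.0, 102]))
print(knn_predict(toy_records, toy_labels, [36.9, 71]))
```

In practice the features would usually be rescaled before computing distances, since the feature with the largest numeric range otherwise dominates the vote.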


2019 ◽  
Author(s):  
Kasper Van Mens ◽  
Joran Lokkerbol ◽  
Richard Janssen ◽  
Robert de Lange ◽  
Bea Tiemens

BACKGROUND It remains a challenge to predict which treatment will work for which patient in mental healthcare. OBJECTIVE In this study we compare machine learning algorithms to predict during treatment which patients will not benefit from brief mental health treatment and present trade-offs that must be considered before an algorithm can be used in clinical practice. METHODS Using an anonymized dataset containing routine outcome monitoring data from a mental healthcare organization in the Netherlands (n = 2,655), we applied three machine learning algorithms to predict treatment outcome. The algorithms were internally validated with cross-validation on a training sample (n = 1,860) and externally validated on an unseen test sample (n = 795). RESULTS The performance of the three algorithms did not significantly differ on the test set. With a default classification cut-off at 0.5 predicted probability, the extreme gradient boosting algorithm showed the highest positive predictive value (ppv) of 0.71 (0.61 – 0.77) with a sensitivity of 0.35 (0.29 – 0.41) and area under the curve of 0.78. A trade-off can be made between ppv and sensitivity by choosing different cut-off probabilities. With a cut-off at 0.63, the ppv increased to 0.87 and the sensitivity dropped to 0.17. With a cut-off at 0.38, the ppv decreased to 0.61 and the sensitivity increased to 0.57. CONCLUSIONS Machine learning can be used to predict treatment outcomes based on routine monitoring data. This allows practitioners to choose their own trade-off between being selective and more certain versus inclusive and less certain.
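
The trade-off between ppv and sensitivity described above can be illustrated with a short sketch. The predicted probabilities and outcome labels below are hypothetical toy values, not the study's data, and `ppv_and_sensitivity` is an illustrative helper name; only the three cut-off values are borrowed from the abstract.

```python
def ppv_and_sensitivity(probs, labels, cutoff):
    """Return (ppv, sensitivity) when predicting positive for prob >= cutoff."""
    pairs = list(zip(probs, labels))
    tp = sum(1 for p, y in pairs if p >= cutoff and y == 1)
    fp = sum(1 for p, y in pairs if p >= cutoff and y == 0)
    fn = sum(1 for p, y in pairs if p < cutoff and y == 1)
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    return ppv, sensitivity


# Hypothetical predicted probabilities and true outcomes (1 = positive class).
toy_probs = [0.95, 0.80, 0.70, 0.60, 0.55, 0.45, 0.40, 0.30, 0.20, 0.10]
toy_labels = [1, 1, 1, 0, 1, 0, 1, 0, 0, 0]

for cutoff in (0.38, 0.50, 0.63):
    ppv, sens = ppv_and_sensitivity(toy_probs, toy_labels, cutoff)
    print(f"cutoff={cutoff:.2f}  ppv={ppv:.2f}  sensitivity={sens:.2f}")
```

On this toy data a higher cut-off yields fewer but more reliable positive predictions, which mirrors the direction of the effect reported in the abstract.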


2020 ◽  
Author(s):  
Hsiao-Ko Chang ◽  
Hui-Chih Wang ◽  
Chih-Fen Huang ◽  
Feipei Lai

BACKGROUND In most of Taiwan’s medical institutions, congestion is a serious problem for emergency departments. Due to a lack of beds, patients spend more time in emergency retention zones, which makes it difficult to detect cardiac arrest (CA). OBJECTIVE We seek to develop a Drug Early Warning System Model (DEWSM) that uses drug injections and vital signs as its key features. We use it to predict cardiac arrest in emergency departments via drug classification and medical expert suggestions. METHODS We propose this new model for detecting cardiac arrest via drug classification and a sliding window; we apply learning-based algorithms to time-series data for a DEWSM. By treating drug features as a dynamic time-series factor for cardiopulmonary resuscitation (CPR) patients, we increase sensitivity, reduce false alarm rates and mortality, and increase the model’s accuracy. To evaluate the proposed model, we use the area under the receiver operating characteristic curve (AUROC). RESULTS Four important findings are as follows: (1) We identify the most important drug predictors: bits (intravenous therapy), and replenishers and regulators of water and electrolytes (fluid and electrolyte supplement). The best AUROC of bits is 85%: with this expert-suggested drug feature, which affects the vital signs, the model classifies patients with CPR at an AUROC of 85%; that of replenishers and regulators of water and electrolytes is 86%. These two features are the most influential of the drug features in the task. (2) We verify feature selection, in which accounting for drugs improves the accuracy: In Task 1, the best AUROC of vital signs is 77%, and that of all features is 86%. In Task 2, the best AUROC of all features is 85%, which demonstrates that accounting for the drugs significantly affects prediction.
(3) We use a better model: In addition to traditional machine learning, this study adds a newer AI technique, the long short-term memory (LSTM) model, whose best time-series accuracy is comparable to that of the traditional random forest (RF) model; both AUROC values are 85%. This suggests that the newer technique currently matches the accuracy of a commonly used traditional RF, and the LSTM model can be adjusted in the future to obtain better results. (4) We determine whether the event can be predicted beforehand: The best classifier is still an RF model, in which the observational starting time is 4 hours before the CPR event. Although accuracy is reduced, it still reaches 70%. Therefore, we believe that CPR events can be predicted four hours before the event. CONCLUSIONS This paper uses a sliding window to account for dynamic time-series data consisting of the patient’s vital signs and drug injections. The National Early Warning Score (NEWS) only focuses on the score of vital signs, and does not include factors related to drug injections. In this study, the experimental results obtained by adding drug injections are better than those obtained with vital signs alone. In a comparison with NEWS, we improve predictive accuracy via feature selection, which includes drugs as features. In addition, we use traditional machine learning methods and deep learning (with LSTM as the main method for processing time-series data) as the basis for comparison in this research. The proposed DEWSM, which offers 4-hour predictions, is better than the NEWS in the literature. This also confirms that the doctor’s heuristic rules are consistent with the results found by machine learning algorithms.
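
The sliding-window idea mentioned in this abstract can be sketched as follows. This is only an assumption about what such a step might look like: the window length, the three summary features and the toy heart-rate and drug-injection series are hypothetical and are not described in the source.

```python
def sliding_windows(series, window, step=1):
    """Yield consecutive slices of length `window` from `series`."""
    for start in range(0, len(series) - window + 1, step):
        yield series[start:start + window]


def window_features(vitals, drug_flags, window):
    """Build one feature row per window from a vital sign and a drug indicator."""
    rows = []
    for v_win, d_win in zip(sliding_windows(vitals, window),
                            sliding_windows(drug_flags, window)):
        rows.append({
            "vital_mean": sum(v_win) / window,  # average vital sign in the window
            "vital_last": v_win[-1],            # most recent reading
            "drug_count": sum(d_win),           # injections given in the window
        })
    return rows


# Hypothetical hourly readings: heart rate, and 1 if a drug was injected that hour.
heart_rate = [82, 85, 90, 97, 110, 124]
drug_given = [0, 0, 1, 0, 1, 1]

rows = window_features(heart_rate, drug_given, window=3)
for row in rows:
    print(row)
```

Each row could then be passed to a classifier such as a random forest, or the raw windows could be fed to a sequence model; which features the authors actually derived is not stated in the abstract.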


2019 ◽  
Vol 3 (s1) ◽  
pp. 60-61
Author(s):  
Kadie Clancy ◽  
Esmaeel Dadashzadeh ◽  
Christof Kaltenmeier ◽  
JB Moses ◽  
Shandong Wu

OBJECTIVES/SPECIFIC AIMS: This retrospective study aims to create and train machine learning models using a radiomic-based feature extraction method for two classification tasks: benign vs. pathologic pneumatosis intestinalis (PI) and operation of benefit vs. operation not needed. The long-term goal of our study is to build a computerized model that incorporates both radiomic features and critical non-imaging clinical factors to improve current surgical decision-making when managing PI patients. METHODS/STUDY POPULATION: We searched radiology reports from 2010-2012 via the UPMC MARS Database for reports containing the term “pneumatosis” (subsequently accounting for negations and age restrictions). Our inclusion criteria were: patient age 18 or older, clinical data available at time of CT diagnosis, and PI visualized on manual review of imaging. Cases with intra-abdominal free air were excluded. We collected CT imaging data and an additional 149 clinical data elements per patient for a total of 75 PI cases. Data collection of an additional 225 patients is ongoing. We trained models for two clinically relevant prediction tasks. The first (referred to as prediction task 1) classifies between benign and pathologic PI. Benign PI is defined as either lack of intraoperative visualization of transmural intestinal necrosis or successful non-operative management until discharge. Pathologic PI is defined as either intraoperative visualization of transmural PI or withdrawal of care and subsequent death during hospitalization. The distribution of data samples for prediction task 1 is 47 benign cases and 38 pathologic cases. The second (referred to as prediction task 2) classifies whether or not the patient benefited from an operation. “Operation of benefit” is defined as patients with PI, be it transmural or simply mucosal, who benefited from an operation.
“Operation not needed” is defined as patients who were safely discharged without an operation or patients who had an operation in which nothing was found. The distribution of data samples for prediction task 2 is 37 operation not needed cases and 38 operation of benefit cases. An experienced surgical resident from UPMC manually segmented 3D PI ROIs from the CT scans (5 mm axial cut) for each case. The ~10-15 cm segment of bowel most concerning for necrosis, with a 1 cm margin, was selected. A total of 7 slices per patient were segmented for consistency. For both prediction task 1 and prediction task 2, we independently completed the following procedure for testing and training: (1) We extracted radiomic features from the 3D PI ROIs, which resulted in 99 total features. (2) We used LASSO feature selection to determine the subset of the original 99 features that are most significant for performance of the prediction task. (3) We used leave-one-out cross-validation for testing and training to account for the small dataset size in our preliminary analysis, and implemented and trained several machine learning models (AdaBoost, SVM, and Naive Bayes). (4) We evaluated the trained models in terms of AUC and accuracy and determined the ideal model structure based on these performance metrics. RESULTS/ANTICIPATED RESULTS: Prediction Task 1: The top-performing model for this task was an SVM model trained using 19 features. This model had an AUC of 0.79 and an accuracy of 75%. Prediction Task 2: The top-performing model for this task was an SVM model trained using 28 features. This model had an AUC of 0.74 and an accuracy of 64%. DISCUSSION/SIGNIFICANCE OF IMPACT: To the best of our knowledge, this is the first study to use radiomic-based machine learning models for the prediction of tissue ischemia, specifically intestinal ischemia in the setting of PI.
In this preliminary study, which serves as a proof of concept, the performance of our models has demonstrated the potential of machine learning based only on radiomic imaging features to have discriminative power for surgical decision-making problems. While many non-imaging-related clinical factors play a role in the gestalt of clinical decision making when PI presents, we have presented radiomic-based models that may augment this decision-making process, especially for more difficult cases when clinical features indicating acute abdomen are absent. It should be noted that prediction task 2, whether or not a patient presenting with PI would benefit from an operation, has lower performance than prediction task 1 and is also a more challenging task for physicians in real clinical environments. While our results are promising and demonstrate potential, we are currently working to increase our dataset to 300 patients to further train and assess our models.
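
The leave-one-out cross-validation step in this abstract can be sketched in a few lines. To stay dependency-free, the sketch swaps in a simple nearest-centroid classifier as a stand-in; it is not the LASSO, AdaBoost, SVM or Naive Bayes models the authors report, and the two-feature toy data and function names are hypothetical.

```python
def nearest_centroid_predict(train_X, train_y, x):
    """Predict the label whose class centroid is closest to x (squared distance)."""
    centroids = {}
    for label in set(train_y):
        members = [row for row, y in zip(train_X, train_y) if y == label]
        centroids[label] = [sum(col) / len(members) for col in zip(*members)]

    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    return min(centroids, key=lambda label: sq_dist(centroids[label], x))


def leave_one_out_accuracy(X, y, predict):
    """Hold out each sample once, fit on the rest, and report the hit rate."""
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if predict(train_X, train_y, X[i]) == y[i]:
            correct += 1
    return correct / len(X)


# Hypothetical two-feature samples for two classes (0 and 1).
toy_X = [[1.0, 1.2], [0.8, 1.0], [1.1, 0.9],
         [3.0, 3.2], [2.9, 3.1], [3.2, 2.8]]
toy_y = [0, 0, 0, 1, 1, 1]

print(leave_one_out_accuracy(toy_X, toy_y, nearest_centroid_predict))
```

With n samples this trains n models, which is affordable for a dataset of the size described. Any data-dependent step such as feature selection would normally be refit inside each fold so that the held-out case does not influence it; the abstract does not say how the authors handled this.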

