Confederated learning in healthcare: training machine learning models using disconnected data separated by individual, data type and identity for Large-Scale Health System Intelligence (Preprint)

2020 ◽  
Author(s):  
Dianbo Liu ◽  
Kathe Fox ◽  
Griffin Weber ◽  
Tim Miller

BACKGROUND A patient’s health information is generally fragmented across silos because it follows how care is delivered: multiple providers in multiple settings. Though it is technically feasible to reunite data for analysis in a manner that underpins a rapid learning healthcare system, privacy concerns and regulatory barriers limit data centralization for this purpose. OBJECTIVE Machine learning can be conducted in a federated manner on patient datasets that share the same set of variables but are separated across storage. However, federated learning cannot handle situations where different data types for a given patient are separated vertically across different organizations, or where patient ID matching across institutions is difficult. We call methods that enable machine learning model training on data separated by two or more dimensions “confederated machine learning.” We propose and evaluate confederated learning for training machine learning models to stratify the risk of several diseases across silos when data are horizontally separated by individual, vertically separated by data type, and separated by identity without patient ID matching. METHODS The confederated learning method can be intuitively understood as a distributed learning method combining representation learning, generative modeling, imputation, and data augmentation elements. The confederated learning method we developed consists of three steps: Step 1) Conditional generative adversarial networks with matching loss (cGANs) were trained using data from the central analyzer to infer one data type from another, for example, inferring medications from diagnoses. Generative (cGAN) models were used in this study because a considerable percentage of individuals do not have paired data types. For instance, a patient may have only his or her diagnoses in the database but no medication information, due to insurance enrollment. 
cGANs can utilize data with paired information by minimizing the matching loss, and data without paired information by minimizing the adversarial loss. Step 2) Missing data types in each silo were inferred using the model trained in step 1. Step 3) Task-specific models, such as a model to predict diagnoses of diabetes, were trained in a federated manner across all silos simultaneously. RESULTS We conducted experiments to train disease prediction models using confederated learning on a large nationwide health insurance dataset from the US that was split into 99 silos. The models stratify individuals by their risk of diabetes, psychological disorders, or ischemic heart disease in the next two years, using patients’ diagnoses, medication claims, and clinical lab test records (see the Methods section for details). The goal of these experiments was to test whether a confederated learning approach can simultaneously address the two types of separation mentioned above. CONCLUSIONS We demonstrated that health data distributed across silos separated by individual and by data type can be used to train machine learning models without moving or aggregating data. Our method obtains predictive accuracy competitive with a centralized upper bound in predicting risks of diabetes, psychological disorders, or ischemic heart disease using previous diagnoses, medications, and lab tests as inputs. We compared the performance of the confederated learning approach with models trained on centralized data, on only the data held by the central analyzer, or on a single data type across silos. The experimental results suggest that confederated learning trains predictive models efficiently across disconnected silos. CLINICALTRIAL NA
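The three-step pipeline above can be sketched end to end on toy data. Everything here is a hypothetical stand-in for the paper's setup: a one-variable least-squares fit replaces the cGAN imputer, the silos and features are synthetic, and the federated step is plain federated averaging of a logistic-regression risk model.

```python
import math, random

random.seed(0)

# ---- Synthetic stand-in data (the paper uses real insurance claims) ----
def make_patient():
    d = random.random()                      # "diagnosis" feature
    m = 0.8 * d + 0.1 * random.random()      # "medication" feature, correlated
    y = 1 if d + m > 0.9 else 0              # disease-risk label
    return d, m, y

central = [make_patient() for _ in range(400)]             # paired data at the analyzer
silos = [[make_patient() for _ in range(100)] for _ in range(5)]
for s in silos[2:]:                                        # some silos lack medications
    for i, (d, m, y) in enumerate(s):
        s[i] = (d, None, y)

# Step 1: learn to infer medications from diagnoses on the central paired data
# (least squares standing in for the paper's cGAN with matching loss).
n = len(central)
mx = sum(d for d, _, _ in central) / n
my = sum(m for _, m, _ in central) / n
a = (sum((d - mx) * (m - my) for d, m, _ in central)
     / sum((d - mx) ** 2 for d, _, _ in central))
b = my - a * mx

# Step 2: impute the missing data type in each silo.
for s in silos:
    for i, (d, m, y) in enumerate(s):
        if m is None:
            s[i] = (d, a * d + b, y)

# Step 3: train a task-specific risk model across silos by federated averaging.
w = [0.0, 0.0, 0.0]                                        # bias, diagnosis, medication
for _ in range(200):
    updates = []
    for s in silos:                                        # each silo trains locally
        lw = list(w)
        for d, m, y in s:
            p = 1 / (1 + math.exp(-(lw[0] + lw[1] * d + lw[2] * m)))
            g = p - y
            lw[0] -= 0.1 * g; lw[1] -= 0.1 * g * d; lw[2] -= 0.1 * g * m
        updates.append(lw)
    w = [sum(u[j] for u in updates) / len(updates) for j in range(3)]  # average

correct = sum((1 / (1 + math.exp(-(w[0] + w[1] * d + w[2] * m))) > 0.5) == (y == 1)
              for s in silos for d, m, y in s)
accuracy = correct / sum(len(s) for s in silos)
print(round(accuracy, 2))
```

No raw records leave a silo in step 3; only locally updated weights are averaged, which is the property the abstract's "without moving or aggregating data" claim rests on.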

2021 ◽  
Vol 10 (1) ◽  
pp. 99
Author(s):  
Sajad Yousefi

Introduction: Heart disease is often associated with conditions such as arteries clogged by sediment accumulation, which cause chest pain and heart attacks. Many people die of heart disease annually. Most countries have a shortage of cardiovascular specialists, and thus a significant percentage of misdiagnoses occur. Hence, predicting this disease is a serious issue. Using machine learning models applied to a multidimensional dataset, this article aims to find the most efficient and accurate machine learning models for disease prediction. Material and Methods: Several algorithms were utilized to predict heart disease, notably the Decision Tree, Random Forest, and KNN supervised machine learning algorithms. The algorithms were applied to a dataset of 294 samples taken from the UCI repository, which includes heart disease features. To enhance algorithm performance, these features were analyzed, and feature importance scores and cross-validation were considered. Results: The algorithms were compared with each other; performance was evaluated for each model based on the ROC curve and criteria such as accuracy, precision, sensitivity, and F1 score. As a result of the evaluation, the Decision Tree algorithm achieved an accuracy of 83% and an AUC ROC of 99%. The Logistic Regression algorithm, with an accuracy of 88% and an AUC ROC of 91%, performed better than the other algorithms. Therefore, these techniques can be useful for physicians to identify heart disease patients and treat them correctly. Conclusion: Machine learning techniques can be used in medicine to analyze data collections related to a disease and predict it. The area under the ROC curve and related evaluation criteria for a number of machine learning classification algorithms were compared to determine the most appropriate classifier for heart disease prediction. 
As a result of evaluation, better performance was observed in both Decision Tree and Logistic Regression models.
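Of the supervised algorithms this abstract compares, KNN is the simplest to illustrate from scratch. The sketch below uses hypothetical feature rows (age, resting blood pressure, cholesterol), not the UCI heart-disease data itself; it shows only the majority-vote mechanism.

```python
import math

# Toy stand-in rows (age, resting blood pressure, cholesterol); label 1 = disease.
# These values are illustrative, not taken from the UCI dataset.
train = [
    ((63, 145, 233), 1), ((67, 160, 286), 1), ((67, 120, 229), 1),
    ((37, 130, 250), 0), ((41, 130, 204), 0), ((56, 120, 236), 0),
    ((62, 140, 268), 1), ((57, 120, 354), 0), ((44, 120, 263), 0),
    ((52, 172, 199), 1),
]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(query, train, k=3):
    """Majority vote among the k nearest training samples."""
    nearest = sorted(train, key=lambda row: euclidean(query, row[0]))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if votes > k / 2 else 0

print(knn_predict((60, 150, 240), train))  # → 1 (nearest rows are disease cases)
```

In practice the abstract's preprocessing matters: features on different scales (blood pressure vs. cholesterol) should be normalized before the distance computation, or the largest-magnitude feature dominates the vote.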


Author(s):  
Hao Li ◽  
Zhijian Liu

Measuring the performance of solar energy and heat transfer systems requires substantial time, economic cost, and manpower. Meanwhile, directly predicting their performance is challenging due to their complicated internal structures. Fortunately, knowledge-based machine learning methods can provide a promising prediction and optimization strategy for the performance of energy systems. In this chapter, the authors show how they utilize machine learning models trained on a large experimental database to perform precise prediction and optimization of a solar water heater (SWH) system. A new energy system optimization strategy based on a high-throughput screening (HTS) process is proposed. This chapter consists of: 1) comparative studies on a variety of machine learning models (artificial neural networks [ANNs], support vector machine [SVM], and extreme learning machine [ELM]) for predicting the performance of SWHs; 2) development of ANN-based software to assist quick prediction; and 3) introduction of a computational HTS method for designing a high-performance SWH system.
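The HTS idea in point 3 is simply exhaustive evaluation of candidate designs against a fast surrogate model. The sketch below substitutes a made-up analytic function for the chapter's trained ANN; the parameter names and ranges are illustrative assumptions.

```python
import itertools

# Hypothetical surrogate standing in for the chapter's trained ANN: maps SWH
# design parameters to a predicted heat-collection rate (arbitrary units).
def surrogate(tube_length_m, tilt_deg, tank_volume_l):
    return (2.0 * tube_length_m
            - 0.002 * (tilt_deg - 35) ** 2              # assumed best tilt near 35°
            + 0.01 * tank_volume_l
            - 0.005 * tube_length_m * tank_volume_l)    # diminishing returns

# High-throughput screening: evaluate every candidate design, keep the best.
lengths = [1.5, 1.8, 2.1]
tilts = [20, 35, 50]
volumes = [150, 200, 250]
best = max(itertools.product(lengths, tilts, volumes), key=lambda d: surrogate(*d))
print(best)  # → (2.1, 35, 150)
```

The appeal of HTS over experiment is exactly this loop: once the surrogate is trained, thousands of designs can be scored in seconds instead of built and measured.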


2022 ◽  
pp. 181-194
Author(s):  
Bala Krishna Priya G. ◽  
Jabeen Sultana ◽  
Usha Rani M.

Mining Telugu news data and categorizing it based on public sentiment is quite important, since a great deal of fake news has emerged with the rise of social media. This research work covers identifying whether news text is positive, negative, or neutral, and then classifying the data into the areas it falls under, such as business, editorial, entertainment, nation, and sports. It proposes an efficient model that adopts machine learning classifiers to perform classification on Telugu news data. The results obtained by various machine learning models are compared, and an efficient model is found; it is observed that the proposed model outperforms the others with respect to accuracy, precision, recall, and F1-score.
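A common baseline for this kind of news categorization is a bag-of-words naive Bayes classifier. The sketch below uses toy English snippets as stand-ins for labelled Telugu headlines (the abstract does not specify its classifiers, so this is illustrative only).

```python
import math
from collections import Counter, defaultdict

# Toy English stand-ins for labelled Telugu news snippets (hypothetical data).
train = [
    ("team wins the championship match", "sports"),
    ("player scores a stunning goal", "sports"),
    ("stocks rise as markets rally", "business"),
    ("company reports record quarterly profit", "business"),
    ("new film delights audiences nationwide", "entertainment"),
    ("singer announces a concert tour", "entertainment"),
]

# Multinomial naive Bayes with add-one (Laplace) smoothing.
class_words = defaultdict(list)
for text, label in train:
    class_words[label].extend(text.split())

vocab = {w for words in class_words.values() for w in words}
priors = {c: 1 / len(class_words) for c in class_words}   # balanced toy classes
counts = {c: Counter(ws) for c, ws in class_words.items()}

def classify(text):
    scores = {}
    for c in class_words:
        total = sum(counts[c].values())
        score = math.log(priors[c])
        for w in text.split():                            # smoothed log-likelihoods
            score += math.log((counts[c][w] + 1) / (total + len(vocab)))
        scores[c] = score
    return max(scores, key=scores.get)

print(classify("markets rally after profit report"))  # → business
```

For real Telugu text the tokenizer would need to handle Telugu script and morphology; `text.split()` is the toy-data shortcut here.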


2021 ◽  
Vol 2021 (3) ◽  
pp. 453-473
Author(s):  
Nathan Reitinger ◽  
Michelle L. Mazurek

Abstract With the aim of increasing online privacy, we present a novel, machine-learning based approach to blocking one of the three main ways website visitors are tracked online—canvas fingerprinting. Because the act of canvas fingerprinting uses, at its core, a JavaScript program, and because many of these programs are reused across the web, we are able to fit several machine learning models around a semantic representation of a potentially offending program, achieving accurate and robust classifiers. Our supervised learning approach is trained on a dataset we created by scraping roughly half a million websites using a custom Google Chrome extension storing information related to the canvas. Classification leverages our key insight that the images drawn by canvas fingerprinting programs have a facially distinct appearance, allowing us to manually classify files based on the images drawn; we take this approach one step further and train our classifiers not on the malleable images themselves, but on the more-difficult-to-change, underlying source code generating the images. As a result, ML-CB allows for more accurate tracker blocking.
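The core insight above, classifying the source code that draws the canvas rather than the rendered image, can be illustrated with hand-picked features over JavaScript text. The features, weights, and threshold below are illustrative stand-ins, not the trained ML-CB model.

```python
import re

# Simple features over JavaScript source that tend to co-occur in canvas
# fingerprinting scripts (weights are illustrative assumptions).
FEATURES = {
    "todataurl":  (re.compile(r"toDataURL", re.I), 2.0),   # reads pixels back out
    "filltext":   (re.compile(r"fillText", re.I), 1.0),    # draws a probe string
    "getcontext": (re.compile(r"getContext\(\s*['\"]2d", re.I), 0.5),
    "offscreen":  (re.compile(r"display\s*:\s*none", re.I), 1.0),  # hidden canvas
}

def score(js_source, threshold=2.5):
    """Return (score, is_fingerprinting) for a JavaScript snippet."""
    s = sum(w for rx, w in FEATURES.values() if rx.search(js_source))
    return s, s >= threshold

tracker = """
var c = document.createElement('canvas'); c.style = 'display:none';
var ctx = c.getContext('2d'); ctx.fillText('Cwm fjordbank', 2, 15);
var fp = c.toDataURL();
"""
benign = "var ctx = chart.getContext('2d'); ctx.fillRect(0, 0, 10, 10);"

print(score(tracker)[1], score(benign)[1])  # → True False
```

A learned classifier over a semantic representation, as the paper describes, replaces these hand-set weights with ones fit to labelled scripts, which is what makes it robust to cosmetic changes in the probe images.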


Author(s):  
M Preethi ◽  
J Selvakumar

This paper describes various data mining, big data, and machine learning models for predicting heart disease. Data mining and machine learning play an important role in building models that medical systems can use to predict heart disease or cardiovascular disease, allowing medical experts to help patients by detecting cardiovascular disease before it occurs. Nowadays, heart disease is one of the most significant causes of fatality, and its prediction is a critical challenge in the clinical area. From time to time, new techniques for predicting heart disease with data mining are introduced. In this survey paper, many such techniques are described.


Author(s):  
Hitesh Shrivastava

The project aims to help users get an idea of whether they may be suffering from heart disease. Web development is the work involved in developing a website for the web (World Wide Web) or an intranet (a private network). It can range from developing a simple single static page of plain text to complex web applications, electronic businesses, and social networking services. The main goal of this website (SYMPTOMATIC ASSISTANCE) is to predict the possibility of having heart disease. For this, the user needs to provide some information regarding their health, such as blood pressure, glucose, and cigarettes per day, based on which the website will respond. This will make people aware of their condition and help them improve their health. Machine learning is a method of data analysis that automates analytical model building. It is a branch of AI based on the idea that systems can learn from data, identify patterns, and make decisions with minimal human intervention. For this project, we chose a suitable dataset from Kaggle and used it to predict the output with high accuracy. To be able to predict the correct output, we applied a few machine learning models and chose the best-fitting algorithm according to accuracy. To connect the machine learning models with the web pages, we used Flask. Flask is a micro framework written in Python. It is classified as a microframework because it does not require particular tools or libraries: it has no database abstraction layer, form validation, or other components for which pre-existing third-party libraries provide common functions. However, Flask supports extensions that add application features as if they were implemented in Flask itself. Extensions exist for object-relational mappers, form validation, upload handling, various open authentication technologies, and several common framework-related tools. At the end, we deploy our project using Heroku, a cloud platform as a service (PaaS) supporting several programming languages and one of the first cloud platforms. This project will make it easy for users to monitor their health closely.
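The described flow (form inputs into a fitted model, served through Flask) can be sketched as follows. The coefficient values are hypothetical placeholders, not the project's fitted model, and the Flask wiring is guarded so the prediction function works even where Flask is not installed.

```python
import math

# Hypothetical logistic-regression coefficients for illustration only; the
# project fits its model to a Kaggle dataset, so real values would differ.
WEIGHTS = {"bias": -7.0, "sys_bp": 0.03, "glucose": 0.01, "cigs_per_day": 0.06}

def predict_risk(sys_bp, glucose, cigs_per_day):
    """Probability of heart disease from the user's form inputs."""
    z = (WEIGHTS["bias"] + WEIGHTS["sys_bp"] * sys_bp
         + WEIGHTS["glucose"] * glucose
         + WEIGHTS["cigs_per_day"] * cigs_per_day)
    return 1 / (1 + math.exp(-z))

# Minimal Flask wiring, as the text describes; skipped if Flask is absent.
try:
    from flask import Flask, request, jsonify
    app = Flask(__name__)

    @app.route("/predict", methods=["POST"])
    def predict():
        f = request.get_json()
        p = predict_risk(f["sys_bp"], f["glucose"], f["cigs_per_day"])
        return jsonify({"risk": round(p, 3)})
except ImportError:
    pass

print(round(predict_risk(160, 120, 10), 3))
```

Deploying this on Heroku then amounts to adding a `Procfile` that starts the app under a WSGI server, exactly the PaaS convenience the text points to.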


10.2196/24572 ◽  
2021 ◽  
Vol 9 (2) ◽  
pp. e24572
Author(s):  
Juan Carlos Quiroz ◽  
You-Zhen Feng ◽  
Zhong-Yuan Cheng ◽  
Dana Rezazadegan ◽  
Ping-Kang Chen ◽  
...  

Background COVID-19 has overwhelmed health systems worldwide. It is important to identify severe cases as early as possible, such that resources can be mobilized and treatment can be escalated. Objective This study aims to develop a machine learning approach for automated severity assessment of COVID-19 based on clinical and imaging data. Methods Clinical data—including demographics, signs, symptoms, comorbidities, and blood test results—and chest computed tomography scans of 346 patients from 2 hospitals in the Hubei Province, China, were used to develop machine learning models for automated severity assessment in diagnosed COVID-19 cases. We compared the predictive power of the clinical and imaging data from multiple machine learning models and further explored the use of four oversampling methods to address the imbalanced classification issue. Features with the highest predictive power were identified using the Shapley Additive Explanations framework. Results Imaging features had the strongest impact on the model output, while a combination of clinical and imaging features yielded the best performance overall. The identified predictive features were consistent with those reported previously. Although oversampling yielded mixed results, it achieved the best model performance in our study. Logistic regression models differentiating between mild and severe cases achieved the best performance for clinical features (area under the curve [AUC] 0.848; sensitivity 0.455; specificity 0.906), imaging features (AUC 0.926; sensitivity 0.818; specificity 0.901), and a combination of clinical and imaging features (AUC 0.950; sensitivity 0.764; specificity 0.919). The synthetic minority oversampling method further improved the performance of the model using combined features (AUC 0.960; sensitivity 0.845; specificity 0.929). 
Conclusions Clinical and imaging features can be used for automated severity assessment of COVID-19 and can potentially help triage patients with COVID-19 and prioritize care delivery to those at a higher risk of severe disease.
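The oversampling step that lifted the combined-feature model's performance can be sketched in miniature. This is a simplified SMOTE (interpolating minority samples toward nearest minority neighbours) on made-up two-feature points, not the study's data or its exact implementation.

```python
import random

random.seed(1)

def smote(minority, n_new, k=2):
    """Generate n_new synthetic samples by interpolating each minority sample
    toward one of its k nearest minority neighbours (simplified SMOTE)."""
    synthetic = []
    for _ in range(n_new):
        base = random.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(p, base)),
        )[:k]
        nb = random.choice(neighbours)
        gap = random.random()                      # position along the segment
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, nb)))
    return synthetic

# Imbalanced toy set: few severe cases (features are illustrative stand-ins
# for the study's clinical/imaging variables).
severe = [(70.0, 8.1), (74.0, 7.9), (68.0, 8.4)]
new_severe = smote(severe, 5)
print(len(severe) + len(new_severe))  # → 8
```

Because each synthetic point lies on a segment between two real severe cases, the oversampled class stays inside the region the minority already occupies, unlike naive duplication, which only reweights existing points.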


Diagnostics ◽  
2021 ◽  
Vol 11 (12) ◽  
pp. 2288
Author(s):  
Kaixiang Su ◽  
Jiao Wu ◽  
Dongxiao Gu ◽  
Shanlin Yang ◽  
Shuyuan Deng ◽  
...  

Increasingly, machine learning methods have been applied to aid in diagnosis, with good results. However, some complex models can confuse physicians because they are difficult to understand, while data differences across diagnostic tasks and institutions can cause model performance to fluctuate. To address this challenge, we combined the Deep Ensemble Model (DEM) and the tree-structured Parzen estimator (TPE) and propose an adaptive deep ensemble learning method (TPE-DEM) for dynamically evolving diagnostic task scenarios. Unlike previous research that focuses on achieving better performance with a fixed-structure model, our proposed model uses TPE to efficiently aggregate simple models that are more easily understood by physicians and require less training data. In addition, our proposed model can choose the optimal number of layers and the type and number of base learners to achieve the best performance in different diagnostic task scenarios, based on the data distribution and characteristics of the current diagnostic task. We tested our model on one dataset constructed with a partner hospital and five UCI public datasets with different characteristics and volumes, covering various diagnostic tasks. Our performance evaluation results show that our proposed model outperforms other baseline models on different datasets. Our study provides a novel approach to building simple and understandable machine learning models for tasks with variable datasets and feature sets, and the findings have important implications for the application of machine learning models in computer-aided diagnosis.
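The adaptive part of the abstract, choosing the ensemble configuration per task from validation performance, can be sketched as a search over a configuration space. For brevity this uses plain random search and a made-up score surface; a real TPE instead models good and bad configurations with Parzen estimators to propose the next candidate.

```python
import random

random.seed(2)

# Hypothetical validation-score surface: accuracy saturates with ensemble
# size and degrades with excessive depth (illustrative assumption).
def validation_score(n_learners, depth):
    return 0.7 + 0.05 * min(n_learners, 5) - 0.01 * max(depth - 3, 0)

# Configuration space: how many base learners, and how deep each one is.
space = {"n_learners": range(1, 11), "depth": range(1, 8)}

best_cfg, best_score = None, -1.0
for _ in range(30):                          # random search standing in for TPE
    cfg = {k: random.choice(list(v)) for k, v in space.items()}
    s = validation_score(**cfg)
    if s > best_score:
        best_cfg, best_score = cfg, s

print(best_cfg, round(best_score, 2))
```

The point of the per-task search is visible even in this toy: the selected configuration depends entirely on the score surface, so a different diagnostic task (a different surface) yields a different ensemble, which is what "adaptive" means here.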

