Validating the Early Prediction Model for COPD Patients Care through a Federated Machine Learning Architecture on FAIR Data (Preprint)

2021 ◽  
Author(s):  
Celia ALVAREZ-ROMERO ◽  
Alicia MARTÍNEZ-GARCÍA ◽  
Jara Eloisa TERNERO-VEGA ◽  
Pablo DÍAZ-JIMÉNEZ ◽  
Carlos JIMÉNEZ-DE-JUAN ◽  
...  

BACKGROUND Due to the nature of health data, its sharing and reuse for research are limited by legal, technical and ethical implications. To address this challenge and to facilitate and promote the discovery of scientific knowledge, the FAIR (Findable, Accessible, Interoperable, and Reusable) principles help organizations to share research data in a secure, appropriate and useful way for other researchers. OBJECTIVE The objective of this study was the FAIRification of existing health research datasets and the application of a federated machine learning architecture on top of the FAIRified datasets of different health research performing organizations. The whole FAIR4Health solution was validated through the assessment of the generated model for real-time prediction of 30-day readmission risk in patients with Chronic Obstructive Pulmonary Disease (COPD). METHODS The application of the FAIR principles to health research datasets in three different health care settings enabled a retrospective multicenter study for the generation of federated machine learning models, aiming to develop the early prediction model for 30-day readmission risk in COPD patients. This prediction model was implemented on the FAIR4Health platform and, finally, an observational prospective study with 30-day follow-up was carried out in two health care centers from different countries. The same inclusion and exclusion criteria were used in both the retrospective and prospective parts of the study. RESULTS The prediction model for the 30-day hospital readmission risk was trained using the retrospective data of 4,944 COPD patients. The assessment of the prediction model was performed using the data of 100 recruited patients (22 from Spain and 78 from Serbia) out of 2070 observed (records viewed) patients in total for the observational prospective study from April 2021 to September 2021.
The prediction model generated on the FAIR4Health platform achieved an accuracy of 0.98 and a precision of 0.25, and the generated prediction of 30-day readmission risk was confirmed in 87% of the cases. CONCLUSIONS A clinical validation was demonstrated through the implementation of federated machine learning models on top of the FAIRified datasets from different health research performing organizations, providing an assessment for predicting 30-day readmission risk in COPD patients. This demonstration supports the relevance of, and the need for, implementing a FAIR data policy to facilitate data sharing and reuse in health research.
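A high accuracy next to a low precision can look contradictory, but the two can coexist when the predicted event is rare. The following is a minimal sketch of how both metrics are computed from a binary confusion matrix. The counts are invented for illustration and are not the study's data; the function names are hypothetical.

```python
def accuracy(tp: int, fp: int, fn: int, tn: int) -> float:
    """Share of all predictions that are correct."""
    return (tp + tn) / (tp + fp + fn + tn)


def precision(tp: int, fp: int) -> float:
    """Share of predicted positives that are true positives."""
    return tp / (tp + fp)


# Hypothetical counts (NOT from the study): 200 patients, 2 readmitted.
tp, fp, fn, tn = 1, 3, 1, 195

acc = accuracy(tp, fp, fn, tn)
prec = precision(tp, fp)
print(f"accuracy={acc:.2f} precision={prec:.2f}")
```

With so few positive cases, the many true negatives dominate accuracy, while precision depends only on the handful of positive predictions.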

Water ◽  
2019 ◽  
Vol 11 (12) ◽  
pp. 2516 ◽  
Author(s):  
Changhyun Choi ◽  
Jeonghwan Kim ◽  
Jungwook Kim ◽  
Hung Soo Kim

Adequate forecasting and preparation for heavy rain can minimize life and property damage. Some studies have been conducted on the heavy rain damage prediction model (HDPM); however, most of their models are limited to linear regression, which explains only the linear relation between rainfall data and damage. This study develops the combined heavy rain damage prediction model (CHDPM), in which a residual prediction model (RPM) is added to the HDPM. The predictive performance of the CHDPM is found to be 4–14% higher than that of the HDPM. This confirms that predictive performance improves when a machine learning RPM is combined with the HDPM to compensate for its linearity. The results of this study can be used as basic data for natural disaster management.
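The idea of adding a residual model to a linear base model can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the data are synthetic, a one-variable least-squares line stands in for the HDPM, and a simple k-nearest-neighbour average (the hypothetical helper `knn_predict`) stands in for the machine learning RPM.

```python
import numpy as np


def knn_predict(x_train, r_train, x_query, k=3):
    """Average the residuals of the k nearest training points."""
    dist = np.abs(x_query[:, None] - x_train[None, :])
    nearest = np.argsort(dist, axis=1)[:, :k]
    return r_train[nearest].mean(axis=1)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Synthetic data (hypothetical): a linear trend plus a nonlinear term.
x = np.linspace(0.0, 10.0, 101)
y = 2.0 * x + 3.0 * np.sin(x)
x_train, y_train = x[::2], y[::2]
x_test, y_test = x[1::2], y[1::2]

# Step 1: linear base model (stands in for the HDPM).
slope, intercept = np.polyfit(x_train, y_train, deg=1)
base_train = slope * x_train + intercept
base_test = slope * x_test + intercept

# Step 2: residual model (stands in for the RPM), fitted to what the line misses.
residual_test = knn_predict(x_train, y_train - base_train, x_test)

# Step 3: combined prediction = linear prediction + predicted residual.
combined_test = base_test + residual_test

rmse_linear = rmse(y_test, base_test)
rmse_combined = rmse(y_test, combined_test)
print(f"linear RMSE={rmse_linear:.3f} combined RMSE={rmse_combined:.3f}")
```

On held-out points the combined prediction has a lower error than the line alone, because the residual model captures the nonlinear part that the line cannot represent.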


2018 ◽  
Author(s):  
Jaram Park ◽  
Jeong-Whun Kim ◽  
Borim Ryu ◽  
Eunyoung Heo ◽  
Se Young Jung ◽  
...  

BACKGROUND Prevention and management of chronic diseases are the main goals of national health maintenance programs. Previously widely used screening tools, such as Health Risk Appraisal, are restricted in achieving this goal by limitations such as their static characteristics, accessibility, and generalizability. Hypertension is one of the most important chronic diseases requiring management via the nationwide health maintenance program, and health care providers should inform patients about their risks of a complication caused by hypertension. OBJECTIVE Our goal was to develop and compare machine learning models predicting high-risk vascular diseases for hypertensive patients so that they can manage their blood pressure based on their risk level. METHODS We used a 12-year longitudinal dataset of the nationwide sample cohort, which contains the data of 514,866 patients and allows tracking of patients’ medical history across all health care providers in Korea (N=51,920). To ensure the generalizability of our models, we conducted an external validation using another national sample cohort dataset, comprising one million different patients, published by the National Health Insurance Service. From each dataset, we obtained the data of 74,535 and 59,738 patients with essential hypertension and developed machine learning models for predicting cardiovascular and cerebrovascular events. Six machine learning models were developed and their performance was compared using validation metrics. RESULTS Machine learning algorithms enabled us to detect high-risk patients based on their medical history. The long short-term memory-based algorithm performed best in the within test (within test F1-score=.772, external test F1-score=.613), and the random forest-based risk prediction algorithm generalized better than the other machine learning algorithms (within test F1-score=.757, external test F1-score=.705).
In the within test, the long short-term memory-based algorithm performed best regardless of the number of features. In the external test, however, the random forest-based algorithm was the best, irrespective of the number of features. CONCLUSIONS We developed and compared machine learning models predicting high-risk vascular diseases in hypertensive patients so that they may manage their blood pressure based on their risk level. By relying on the prediction model, a government can predict high-risk patients at the nationwide level and establish health care policies in advance.
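The F1-score is the harmonic mean of precision and recall, and the difference between the within-test and external-test scores gives a rough measure of how well a model generalizes. The sketch below defines the metric and computes that difference from the four F1-scores reported in the abstract; the helper `f1_score` and the dictionary layout are illustrative assumptions.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# F1-scores as reported in the abstract: (within test, external test).
reported = {
    "LSTM": (0.772, 0.613),
    "random forest": (0.757, 0.705),
}

# Generalization gap: how much F1 drops from the within to the external test.
gaps = {name: within - external for name, (within, external) in reported.items()}
for name, gap in gaps.items():
    print(f"{name}: F1 drop = {gap:.3f}")
```

The smaller drop for the random forest is what the abstract describes as better generalization.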


2021 ◽  
Author(s):  
Xurui Jin ◽  
Yiyang Sun ◽  
Tinglong Zhu ◽  
Yu Leng ◽  
Shuyi Guan ◽  
...  

Abstract Background and aim Mortality risk stratification is vital for targeted intervention. This study aimed to build a prediction model of all-cause mortality among Chinese dwelling elderly using different methods, including regression models and machine learning models, and to compare the performance of the machine learning models with that of the regression models in predicting mortality. Additionally, this study aimed to rank the predictors of mortality within the different models and to compare the predictive value of different groups of predictors using the best-performing model. Method We used data from a sub-study of the Chinese Longitudinal Healthy Longevity Survey (CLHLS), the Healthy Ageing and Biomarkers Cohort Study (HABCS). The baseline survey of HABCS was conducted in 2008; it covered domains similar to those investigated by CLHLS and shared its sampling strategy. The follow-up of HABCS was conducted every 2-3 years until 2018. The analysis sample included 2,448 participants from HABCS. We used a total of 117 predictors to build the prediction model for survival using the HABCS cohort, including 61 questionnaire, 41 biomarker and 15 genetic predictors. Four models were built (XG-Boost, random survival forest [RSF], Cox regression with all variables and Cox-backward). We used the C-index and integrated Brier score (Brier score for the two years’ mortality prediction model) to evaluate the performance of those models. Results The XG-Boost and RSF models show slightly better predictive performance than the Cox and Cox-backward models in predicting survival, based on the C-index and integrated Brier score. Age, activity of daily living and Mini-Mental State Examination score were identified as the top 3 predictors in the XG-Boost and RSF models.
Biomarker and questionnaire predictors have a similar predictive value, while genetic predictors have no additive predictive value when combined with questionnaire or biomarker predictors. Conclusion This work shows that machine learning techniques can be a useful tool for prediction, and that their performance slightly exceeded that of the regression models in predicting survival.
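The C-index used to compare the survival models measures how often a model ranks two subjects in the correct order of risk. The following is a minimal sketch of Harrell's concordance index, assuming right-censored data and skipping pairs with tied follow-up times; the abstract does not state which variant the authors used, and the toy cohort is hypothetical.

```python
from itertools import combinations


def concordance_index(times, events, risks):
    """Harrell's C: share of comparable pairs ranked correctly by risk.

    times  -- follow-up time per subject
    events -- 1 if death was observed, 0 if censored
    risks  -- model risk score (higher = expected to die sooner)
    """
    concordant, comparable = 0.0, 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that subject a has the shorter follow-up.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Comparable only if the earlier subject had an observed event.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0
        elif risks[a] == risks[b]:
            concordant += 0.5  # tied risk scores count as half
    return concordant / comparable


# Hypothetical toy cohort: times in years, last subject censored.
times = [1.0, 2.0, 3.0, 4.0]
events = [1, 1, 1, 0]
print(concordance_index(times, events, risks=[0.9, 0.7, 0.4, 0.1]))
```

A value of 1.0 means every comparable pair is ranked correctly, 0.5 corresponds to random ranking, and 0.0 means every pair is ranked in reverse.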


2020 ◽  
Author(s):  
Mahdieh Montazeri ◽  
Roxana ZahediNasab ◽  
Ali Farahani ◽  
Hadis Mohseni ◽  
Fahimeh Ghasemian

BACKGROUND Accurate and timely diagnosis and effective prognosis of the disease are important to provide the best possible care for patients with COVID-19 and reduce the burden on the health care system. Machine learning methods can play a vital role in the diagnosis of COVID-19 by processing chest x-ray images. OBJECTIVE The aim of this study is to summarize information on the use of intelligent models for the diagnosis and prognosis of COVID-19 to help with early and timely diagnosis, minimize prolonged diagnosis, and improve overall health care. METHODS A systematic search of databases, including PubMed, Web of Science, IEEE, ProQuest, Scopus, bioRxiv, and medRxiv, was performed for COVID-19–related studies published up to May 24, 2020. This study was performed in accordance with the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-analyses) guidelines. All original research articles describing the application of image processing for the prediction and diagnosis of COVID-19 were considered in the analysis. Two reviewers independently assessed the published papers to determine eligibility for inclusion in the analysis. Risk of bias was evaluated using the Prediction Model Risk of Bias Assessment Tool. RESULTS Of the 629 articles retrieved, 44 articles were included. We identified 4 prognosis models for calculating prediction of disease severity and estimation of confinement time for individual patients, and 40 diagnostic models for detecting COVID-19 from normal or other pneumonias. Most included studies used deep learning methods based on convolutional neural networks, which have been widely used as a classification algorithm. The most frequently reported predictors of prognosis in patients with COVID-19 included age, computed tomography data, gender, comorbidities, symptoms, and laboratory findings. Deep convolutional neural networks obtained better results compared with non–neural network–based methods.
Moreover, all of the models were found to be at high risk of bias due to the lack of information about the study population, intended groups, and inappropriate reporting. CONCLUSIONS Machine learning models used for the diagnosis and prognosis of COVID-19 showed excellent discriminative performance. However, these models were at high risk of bias for various reasons, such as inadequate information about study participants, the randomization process, and the lack of external validation, which may have resulted in the optimistic reporting of these models. Hence, based on our findings, we do not recommend any of the current models for use in practice for the diagnosis and prognosis of COVID-19.


2020 ◽  
Vol 19 (05) ◽  
pp. 1177-1187
Author(s):  
Fuad Aleskerov ◽  
Sergey Demin ◽  
Michael B. Richman ◽  
Sergey Shvydun ◽  
Theodore B. Trafalis ◽  
...  

Tornado prediction variables are analyzed using machine learning and decision analysis techniques. A model based on several choice procedures and the superposition principle is applied for different methods of data analysis. The constructed model has been tested on a database of tornadic events. It is shown that the tornado prediction model developed herein is more efficient than a previous set of machine learning models, opening the way to more accurate decisions.


JAMA ◽  
2020 ◽  
Vol 323 (4) ◽  
pp. 305 ◽  
Author(s):  
Andrew L. Beam ◽  
Arjun K. Manrai ◽  
Marzyeh Ghassemi

Author(s):  
Jiancheng Ye ◽  
Liang Yao ◽  
Jiahong Shen ◽  
Rethavathi Janarthanam ◽  
Yuan Luo

Abstract Background Diabetes mellitus is a prevalent metabolic disease characterized by chronic hyperglycemia. The avalanche of healthcare data is accelerating precision and personalized medicine. Artificial intelligence and algorithm-based approaches are becoming more and more vital to support clinical decision-making. These methods are able to augment health care providers by taking away some of their routine work and enabling them to focus on critical issues. However, few studies have used predictive modeling to uncover associations between comorbidities in ICU patients and diabetes. This study aimed to use Unified Medical Language System (UMLS) resources, involving machine learning and natural language processing (NLP) approaches to predict the risk of mortality. Methods We conducted a secondary analysis of Medical Information Mart for Intensive Care III (MIMIC-III) data. Different machine learning modeling and NLP approaches were applied. Domain knowledge in health care is built on the dictionaries created by experts who defined the clinical terminologies such as medications or clinical symptoms. This knowledge is valuable to identify information from text notes that assert a certain disease. Knowledge-guided models can automatically extract knowledge from clinical notes or biomedical literature that contains conceptual entities and relationships among these various concepts. Mortality classification was based on the combination of knowledge-guided features and rules. UMLS entity embedding and convolutional neural network (CNN) with word embeddings were applied. Concept Unique Identifiers (CUIs) with entity embeddings were utilized to build clinical text representations. Results The best configuration of the employed machine learning models yielded a competitive AUC of 0.97. Machine learning models along with NLP of clinical notes are promising to assist health care providers to predict the risk of mortality of critically ill patients. 
Conclusion UMLS resources and clinical notes are powerful and important tools to predict mortality in diabetic patients in the critical care setting. The knowledge-guided CNN model is effective (AUC = 0.97) for learning hidden features.


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Jing Tao ◽  
Zhenming Yuan ◽  
Li Sun ◽  
Kai Yu ◽  
Zhifen Zhang

Abstract Background Birthweight is an important indicator during the fetal development process for protecting maternal and infant safety. However, birthweight is difficult to measure directly, and in clinical practice it is usually roughly estimated by empirical formulas according to the experience of the doctors. Methods This study attempts to combine multiple electronic medical records with the B-ultrasonic examination of pregnant women to construct a hybrid birth weight predicting classifier based on long short-term memory (LSTM) networks. The clinical data were collected from 5,759 Chinese pregnant women who have given birth, with more than 57,000 obstetric electronic medical records. We evaluated the prediction by the mean relative error (MRE) and the accuracy rate of different machine learning classifiers at different predicting periods for first delivery and multiple deliveries. Additionally, we evaluated the classification accuracies of the different classifiers for the Small-for-Gestational-Age (SGA), Large-for-Gestational-Age (LGA) and Appropriate-for-Gestational-Age (AGA) groups. Results The results show that the accuracy rates of the prediction model using Convolutional Neural Networks (CNN), Random Forest (RF), Linear-Regression, Support Vector Regression (SVR), Back Propagation Neural Network (BPNN), and the proposed hybrid-LSTM at the 40th pregnancy week for first delivery were 0.498, 0.662, 0.670, 0.680, 0.705 and 0.793, respectively. Among the groups of less than 39th pregnancy week, the 39th pregnancy week and more than 40th week, the hybrid-LSTM model obtained the best accuracy and almost the least MRE compared with those of the other machine learning models. Not surprisingly, all the machine learning models performed better than the empirical formula.
In the SGA, LGA and AGA group experiments, the average accuracies of the empirical formula, logistic regression (LR), BPNN, CNN, RF and Hybrid-LSTM were 0.780, 0.855, 0.890, 0.906, 0.916 and 0.933, respectively. Conclusions The results of this study are helpful for birthweight prediction and the development of guidelines for clinical delivery treatments. They are also useful for the implementation of a decision support system using the temporal machine learning prediction model, as it can help clinicians make correct decisions during obstetric examinations and remind pregnant women to manage their weight.
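The mean relative error (MRE) used to evaluate the birthweight predictions can be sketched as below. The abstract does not give the exact formula, so this assumes the common definition, the mean of |predicted − actual| / actual; the weights are hypothetical values in grams, not study data.

```python
def mean_relative_error(y_true, y_pred):
    """Mean of |prediction - truth| / truth over all samples."""
    return sum(abs(p - t) / t for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical birthweights in grams (not study data).
actual = [3000.0, 3500.0, 4000.0]
predicted = [3300.0, 3150.0, 4000.0]
mre = mean_relative_error(actual, predicted)
print(f"MRE = {mre:.3f}")
```

Because each error is divided by the true weight, a 300 g miss counts more for a small newborn than for a large one.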


2021 ◽  
Vol 309 ◽  
pp. 01034
Author(s):  
G. Karuna ◽  
K. Pravallika ◽  
Karanam Madhavi ◽  
V. Srilakshmi ◽  
K. Swaraja ◽  
...  

Covid-19, a disease caused by a novel virus, is currently affecting people across the world and is among the most harmful diseases; it falls mainly within the domain of health care research. A health care system attends to the health state of a population or an individual, and health care plays a vital role in promoting the physical and mental health and well-being of people around the world. An efficient health care system contributes to a country’s economy, industrialization and development. Coronaviruses are dangerous animal and human pathogens that threaten people by spreading all over the world. Clinical studies have shown that coronavirus patients mostly suffer from lung infection. We propose a detailed analysis of how to predict the expected deaths, recovered cases and confirmed cases from the data available across the world using various machine learning models. Specifically, we constructed a linear regression model (LRM), a support vector machine model (SVMM) and polynomial regression models (PRM), and predicted the expected cases over the next 15 days. The error between the model predictions and the official data curve is quite small when modeling the transmission process. Compared with the other models, the polynomial regression model gives the best prediction of coronavirus-positive cases. Forward prediction and backward inference of the epidemic help in deciding on the necessary actions during Covid-19 propagation.
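The comparison between a linear and a polynomial regression over a 15-day forecast horizon can be sketched with NumPy. This is an illustration on synthetic, noise-free data with quadratic growth, not the authors' data or code; the degree 2 and the 30-day training window are assumptions.

```python
import numpy as np

# Synthetic cumulative case counts (hypothetical) with quadratic growth.
days = np.arange(30)
cases = 5.0 + 2.0 * days + 0.5 * days**2

# Fit a linear and a degree-2 polynomial regression to the observed days.
linear = np.poly1d(np.polyfit(days, cases, deg=1))
quadratic = np.poly1d(np.polyfit(days, cases, deg=2))

# Forecast the next 15 days with both models.
future_days = np.arange(30, 45)
true_future = 5.0 + 2.0 * future_days + 0.5 * future_days**2
linear_forecast = linear(future_days)
quadratic_forecast = quadratic(future_days)

# Mean absolute error of each forecast against the known synthetic curve.
linear_error = float(np.mean(np.abs(linear_forecast - true_future)))
quadratic_error = float(np.mean(np.abs(quadratic_forecast - true_future)))
print(f"linear MAE={linear_error:.1f} quadratic MAE={quadratic_error:.6f}")
```

The polynomial wins here only because the synthetic curve is itself quadratic. On real epidemic data, polynomial extrapolation can diverge quickly beyond the observed range, so forecasts of this kind are best checked against held-out days.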

