Predicting Hospital Readmission of Diabetic Patients Using Machine Learning

2021 ◽  
Vol 10 (1) ◽  
pp. 74
Author(s):  
Boshra Farajollahi ◽  
Maysam Mehmannavaz ◽  
Hafez Mehrjoo ◽  
Fateme Moghbeli ◽  
Mohammad Javad Sayadi

Introduction: Diabetes is a chronic disease associated with abnormally high levels of glucose in the blood. Diabetes causes many kinds of complications, which in turn lead to a high rate of repeated admission of patients with diabetes. The goal of this study is to predict hospital readmission of diabetic patients with machine learning techniques. Material and Methods: The data used in the study were obtained from the UCI Machine Learning Repository and concern diabetic patients. The dataset contains 100,000 instances and includes 55 features from 130 hospitals in the United States over 10 years. Results: This article reports results from the final stages of evaluation. In this evaluation process, the performance of decision tree, random forest, XGBoost, k-neighbors, AdaBoost, and deep neural network models was compared on accuracy. Conclusion: The number of features selected by the PCA-based feature selection method improves the predictive performance, based on accuracy, of deep learning and most machine learning models for predicting readmission. The improvement of machine learning models depended on the specific choice of the prediction model, the number of selected features, and “k” for k-fold validation.
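The role of “k” in k-fold validation can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy one-feature data, the midpoint-threshold classifier, and the helper names `k_fold_indices`, `cross_val_accuracy`, `fit_threshold`, and `predict_threshold` are hypothetical and are not the models or the pipeline used in the study.

```python
def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds whose sizes differ by at most one."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_val_accuracy(xs, ys, k, fit, predict):
    """Average held-out accuracy over k folds (no shuffling, for simplicity)."""
    scores = []
    for test_idx in k_fold_indices(len(xs), k):
        held_out = set(test_idx)
        train_x = [x for i, x in enumerate(xs) if i not in held_out]
        train_y = [y for i, y in enumerate(ys) if i not in held_out]
        model = fit(train_x, train_y)
        correct = sum(1 for i in test_idx if predict(model, xs[i]) == ys[i])
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)


def fit_threshold(xs, ys):
    """Hypothetical toy model: threshold at the midpoint of the two class means.

    Assumes every training fold contains both classes.
    """
    zeros = [x for x, y in zip(xs, ys) if y == 0]
    ones = [x for x, y in zip(xs, ys) if y == 1]
    return (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2


def predict_threshold(threshold, x):
    return 1 if x > threshold else 0


# Invented toy data: one feature, binary label (1 = readmitted).
xs = [0.1, 0.9, 0.2, 0.8, 0.3, 0.7]
ys = [0, 1, 0, 1, 0, 1]
mean_accuracy = cross_val_accuracy(xs, ys, 3, fit_threshold, predict_threshold)
```

Changing `k` changes how much data each model sees during training and how many held-out estimates are averaged, which is one reason reported accuracy can depend on it.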

2020 ◽  
Vol 28 (2) ◽  
pp. 253-265 ◽  
Author(s):  
Gabriela Bitencourt-Ferreira ◽  
Amauri Duarte da Silva ◽  
Walter Filgueira de Azevedo

Background: The elucidation of the structure of cyclin-dependent kinase 2 (CDK2) made it possible to develop targeted scoring functions for virtual screening aimed at identifying new inhibitors for this enzyme. CDK2 is a protein target for the development of drugs intended to modulate cell-cycle progression and control. Such drugs have potential anticancer activities. Objective: Our goal here is to review recent applications of machine learning methods to predict ligand-binding affinity for protein targets. To assess the predictive performance of classical scoring functions and targeted scoring functions, we focused our analysis on CDK2 structures. Methods: We have experimental structural data for hundreds of binary complexes of CDK2 with different ligands, many of them with inhibition constant information. We investigate here computational methods to calculate the binding affinity of CDK2 through classical scoring functions and machine-learning models. Results: Analysis of the predictive performance of classical scoring functions available in docking programs such as Molegro Virtual Docker, AutoDock4, and AutoDock Vina indicated that these methods failed to predict binding affinity with significant correlation with experimental data. Targeted scoring functions developed through supervised machine learning techniques showed a significant correlation with experimental data. Conclusion: Here, we described the application of supervised machine learning techniques to generate a scoring function to predict binding affinity. Machine learning models showed superior predictive performance when compared with classical scoring functions. Analysis of the computational models obtained through machine learning could capture essential structural features responsible for binding affinity against CDK2.


2021 ◽  
Vol 2021 ◽  
pp. 1-8
Author(s):  
Khalid Twarish Alhamazani ◽  
Jalawi Alshudukhi ◽  
Saud Aljaloud ◽  
Solomon Abebaw

Chronic kidney disease (CKD) is a global health issue with a high rate of morbidity and mortality and a high rate of disease progression. Because there are no visible symptoms in the early stages of CKD, the disease frequently goes unnoticed in patients. The early detection of CKD allows patients to receive timely treatment, slowing the disease’s progression. Due to their rapid recognition performance and accuracy, machine learning models can effectively assist physicians in achieving this goal. We propose a machine learning methodology for CKD diagnosis in this paper. The data used were completely anonymized. As a reference, the CRISP-DM® model (Cross industry standard process for data mining) was used. The data were processed in their entirety in the cloud on the Azure platform, where the sample data were unbalanced. Then the processes for exploration and analysis were carried out. Based on what was learned, the data were balanced using the SMOTE technique. After the data balancing was completed successfully, four artificial intelligence (AI) algorithms were used: logistic regression, decision forest, neural network, and decision jungle. The decision forest outperformed the other machine learning models with a score of 92%, indicating that the approach used in this study provides a good baseline for production solutions.
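The balancing step can be sketched as follows. This is a simplified, pure-Python illustration of the SMOTE idea (interpolating between a minority-class sample and one of its nearest minority-class neighbors), not the Azure component used in the paper; the function name `smote_sketch`, its parameters, and the toy points are assumptions.

```python
import random


def smote_sketch(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority samples by neighbor interpolation.

    minority: list of equal-length numeric tuples (needs at least two points).
    k: number of nearest minority neighbors considered for each base point.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance.
        neighbors = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # position along the segment base -> neighbor
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(base, neighbor))
        )
    return synthetic


# Invented toy minority class with two features.
minority_points = [(1.0, 1.0), (2.0, 1.5), (1.5, 2.0), (2.5, 2.5)]
new_points = smote_sketch(minority_points, n_new=6)
```

Because each synthetic point lies on a segment between two real minority samples, oversampling stays inside the region the minority class already occupies. Oversampling should be applied to the training split only, so that evaluation data remain untouched.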


2021 ◽  
Vol 13 (18) ◽  
pp. 3790
Author(s):  
Khang Chau ◽  
Meredith Franklin ◽  
Huikyo Lee ◽  
Michael Garay ◽  
Olga Kalashnikova

Exposure to fine particulate matter (PM2.5) air pollution has been shown in numerous studies to be associated with detrimental health effects. However, the ability to conduct epidemiological assessments can be limited due to challenges in generating reliable PM2.5 estimates, particularly in parts of the world such as the Middle East where measurements are scarce and extreme meteorological events such as sandstorms are frequent. In order to supplement exposure modeling efforts under such conditions, satellite-retrieved aerosol optical depth (AOD) has proven to be useful due to its global coverage. By using AODs from the Multiangle Implementation of Atmospheric Correction (MAIAC) of the MODerate Resolution Imaging Spectroradiometer (MODIS) and the Multiangle Imaging Spectroradiometer (MISR) combined with meteorological and assimilated aerosol information from the Modern-Era Retrospective analysis for Research and Applications, Version 2 (MERRA-2), we constructed machine learning models to predict PM2.5 in the area surrounding the Persian Gulf, including Kuwait, Bahrain, and the United Arab Emirates (U.A.E.). Our models showed regional differences in predictive performance, with better results in the U.A.E. (median test R² = 0.66) than Kuwait (median test R² = 0.51). Variable importance also differed by region, where satellite-retrieved AOD variables were more important for predicting PM2.5 in Kuwait than in the U.A.E. Divergent trends in the temporal and spatial autocorrelations of PM2.5 and AOD in the two regions offered possible explanations for differences in predictive performance and variable importance. In a test of model transferability, we found that models trained in one region and applied to another did not predict PM2.5 well, even if the transferred model had better performance.
Overall the results of our study suggest that models developed over large geographic areas could generate PM2.5 estimates with greater uncertainty than could be obtained by taking a regional modeling approach. Furthermore, development of methods to better incorporate spatial and temporal autocorrelations in machine learning models warrants further examination.


Water ◽  
2019 ◽  
Vol 11 (12) ◽  
pp. 2516 ◽  
Author(s):  
Changhyun Choi ◽  
Jeonghwan Kim ◽  
Jungwook Kim ◽  
Hung Soo Kim

Adequate forecasting and preparation for heavy rain can minimize life and property damage. Some studies have been conducted on the heavy rain damage prediction model (HDPM); however, most of their models are limited to the linear regression model that simply explains the linear relation between rainfall data and damage. This study develops the combined heavy rain damage prediction model (CHDPM), in which the residual prediction model (RPM) is added to the HDPM. The predictive performance of the CHDPM was found to be 4–14% higher than that of the HDPM. Through this, we confirmed that the predictive performance of the model is improved by combining the RPM of the machine learning models to complement the linearity of the HDPM. The results of this study can be used as basic data beneficial for natural disaster management.
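The idea of adding a residual model to a linear model can be sketched in a few lines. The abstract does not say which learner serves as the RPM, so the k-nearest-neighbor residual model below is a stand-in chosen for brevity; the function names and the toy rainfall and damage values are likewise hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def knn_residual(train_x, residuals, x, k):
    """Stand-in residual model: mean residual of the k nearest training points."""
    nearest = sorted(zip(train_x, residuals), key=lambda pair: abs(pair[0] - x))[:k]
    return sum(r for _, r in nearest) / len(nearest)


def combined_predict(train_x, train_y, x, k=2):
    """Linear prediction plus a learned correction for what the line misses."""
    slope, intercept = fit_line(train_x, train_y)
    residuals = [y - (slope * xi + intercept) for xi, y in zip(train_x, train_y)]
    linear_part = slope * x + intercept
    return linear_part + knn_residual(train_x, residuals, x, k)


# Invented toy data with a nonlinear relation (damage grows as rainfall squared).
rainfall = [0.0, 1.0, 2.0, 3.0, 4.0]
damage = [0.0, 1.0, 4.0, 9.0, 16.0]
slope, intercept = fit_line(rainfall, damage)
linear_only = slope * 2.0 + intercept
corrected = combined_predict(rainfall, damage, 2.0, k=1)
```

On this toy data the line alone predicts 6.0 at a rainfall of 2.0 where the true value is 4.0, and the residual term removes that gap. On real data the residual model should be evaluated on held-out events, since a flexible residual learner can overfit.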


2021 ◽  
Vol 23 (08) ◽  
pp. 148-160
Author(s):  
Dr. V.Vasudha Rani ◽  
Dr. G. Vasavi ◽  
Dr. K.R.N Kiran Kumar ◽  
...  

Diabetes is one of the chronic diseases in the world. Every year, millions of people suffer from several other health issues caused by diabetes. Diabetes has three stages: type 2, type 1, and insulin. Curing diabetes at later stages is practically difficult. In this paper, we propose a DNN model and compare its performance with some machine learning models to predict the disease at an early stage based on the current health condition of the patient. An artificial neural network (ANN) is a predictive model designed to work the same way a human brain does, and it works better with larger datasets. Through the concept of hidden layers, neural networks work better at predictive analytics and can make predictions with more accuracy. The novelty of this work lies in the integration of a feature selection method used to optimize the Multilayer Perceptron (MLP) and reduce the number of required input attributes. The results achieved using this method and several conventional machine learning approaches, such as Logistic Regression and Random Forest Classifier (RFC), are compared. The proposed DNN method is shown to have better accuracy than machine learning models for early-stage detection of diabetes. This work is applicable to clinical support as a tool for doctors and physicians making pre-decisions.


2021 ◽  
Author(s):  
Chang H Kim ◽  
Sadeer Al-Kindi ◽  
Yasir Tarabichi ◽  
Suril Gohel ◽  
Riddhi Vyas ◽  
...  

Background: The value of the electrocardiogram (ECG) for predicting long-term cardiovascular outcomes is not well defined. Machine learning methods are well suited for analysis of highly correlated data such as that from the ECG. Methods: Using demographic, clinical, and 12-lead ECG data from the Third National Health and Nutrition Examination Survey (NHANES III), machine learning models were trained to predict 10-year cardiovascular mortality in ambulatory U.S. adults. Predictive performance of each model was assessed using area under receiver operating characteristic curve (AUROC), area under precision-recall curve (AUPRC), sensitivity, and specificity. These were compared to the 2013 American College of Cardiology/American Heart Association Pooled Cohort Equations (PCE). Results: 7,067 study participants (mean age: 59.2 +/- 13.4 years, female: 52.5%, white: 73.9%, black: 23.3%) were included. At 10 years of follow up, 338 (4.8%) had died from cardiac causes. Compared to the PCE (AUROC: 0.668, AUPRC: 0.125, sensitivity: 0.492, specificity: 0.859), machine learning models only required demographic and ECG data to achieve comparable performance: logistic regression (AUROC: 0.754, AUPRC: 0.141, sensitivity: 0.747, specificity: 0.759), neural network (AUROC: 0.764, AUPRC: 0.149, sensitivity: 0.722, specificity: 0.787), and ensemble model (AUROC: 0.695, AUPRC: 0.166, sensitivity: 0.468, specificity: 0.912). Additional clinical data did not improve the predictive performance of machine learning models. In variable importance analysis, important ECG features clustered in inferior and lateral leads. Conclusions: Machine learning can be applied to demographic and ECG data to predict 10-year cardiovascular mortality in ambulatory adults, with potentially important implications for primary prevention.
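The evaluation metrics named above can be computed directly from predicted scores and true labels. The sketch below is illustrative only: the function names and the four-sample toy data are invented, and the pairwise AUROC shown is the standard rank-based definition, not necessarily the exact procedure used in the study.

```python
def auroc(y_true, scores):
    """Probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def sensitivity_specificity(y_true, scores, threshold=0.5):
    """Sensitivity (true positive rate) and specificity (true negative rate)."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for t, p in zip(y_true, preds) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, preds) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, preds) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, preds) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


# Invented toy example: 1 = event occurred, scores are model outputs.
labels = [0, 0, 1, 1]
model_scores = [0.1, 0.4, 0.35, 0.8]
area = auroc(labels, model_scores)
sens, spec = sensitivity_specificity(labels, model_scores, threshold=0.5)
```

AUROC summarizes ranking quality across all thresholds, while sensitivity and specificity describe a single operating point, which is why the abstract reports them together. With an event rate near 5%, as reported here, AUPRC is a useful complement because it is sensitive to how many flagged cases are true events.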


2021 ◽  
Vol 12 ◽  
Author(s):  
Brandon N. S. Ooi ◽  
Raechell ◽  
Ariel F. Ying ◽  
Yong Zher Koh ◽  
Yu Jin ◽  
...  

Background: Statins can cause muscle symptoms resulting in poor adherence to therapy and increased cardiovascular risk. We hypothesize that combinations of potentially functional SNPs (pfSNPs), rather than individual SNPs, better predict myalgia in patients on atorvastatin. This study assesses the value of potentially functional single nucleotide polymorphisms (pfSNPs) and employs six machine learning algorithms to identify the combination of SNPs that best predicts myalgia. Methods: Whole genome sequencing of 183 Chinese, Malay and Indian patients from Singapore was conducted to identify genetic variants associated with atorvastatin-induced myalgia. To adjust for confounding factors, demographic and clinical characteristics were also examined for their association with myalgia. The top factor, sex, was then used as a covariate in the whole genome association analyses. Variants that were highly associated with myalgia from this and previous studies were extracted, assessed for potential functionality (pfSNPs) and incorporated into six machine learning models. The predictive performance of combinations of different models and inputs was compared using the average cross-validation area under the ROC curve (AUC). The minimum combination of SNPs that achieves maximum sensitivity and specificity, as determined by AUC, and that predicts atorvastatin-induced myalgia in most, if not all, of the six machine learning models was determined. Results: Through whole genome association analyses using sex as a covariate, a larger proportion of pfSNPs compared to non-pf SNPs were found to be highly associated with myalgia. Although none of the individual SNPs achieved genome-wide significance in univariate analyses, machine learning models identified a combination of 15 SNPs that predict myalgia with good predictive performance (AUC >0.9). SNPs within genes identified in this study significantly outperformed SNPs within genes previously reported to be associated with myalgia. pfSNPs were found to be more robust in predicting myalgia, outperforming non-pf SNPs in the majority of machine learning models tested. Conclusion: Combinations of pfSNPs that were consistently identified by different machine learning models to have high predictive performance have good potential to be clinically useful for predicting atorvastatin-induced myalgia once validated against an independent cohort of patients.


2021 ◽  
Author(s):  
Lukasz S Wylezinski ◽  
Coleman R Harris ◽  
Cody N Heiser ◽  
Jamieson D Gray ◽  
Charles F Spurlock

The SARS-CoV-2 (COVID-19) pandemic has exposed health disparities throughout the United States, particularly among racial and ethnic minorities. As a result, there is a need for data-driven approaches to pinpoint the unique constellation of clinical and social determinants of health (SDOH) risk factors that give rise to poor patient outcomes following infection in US communities. We combined county-level COVID-19 testing data, COVID-19 vaccination rates, and SDOH information in Tennessee. Between February and May 2021, we trained machine learning models on a semi-monthly basis using these datasets to predict COVID-19 incidence in Tennessee counties. We then analyzed SDOH data features at each time point to rank the impact of each feature on model performance. Our results indicate that COVID-19 vaccination rates play a crucial role in determining future COVID-19 disease risk. Beginning in mid-March 2021, higher vaccination rates significantly correlated with lower COVID-19 case growth predictions. Further, as the relative importance of COVID-19 vaccination data features grew, the importance of demographic SDOH features such as age, race, and ethnicity decreased, while the impact of socioeconomic and environmental factors, including access to healthcare and transportation, increased. Incorporating a data framework to track the evolving patterns of community-level SDOH risk factors could provide policymakers with additional data resources to improve health equity and resilience to future public health emergencies.

