Machine Learning Approaches to Predict Peak Demand Days of Cardiovascular Admissions Considering Environmental Exposure

Mapping Intimacies ◽

10.21203/rs.2.19636/v3 ◽

2020 ◽

Author(s):

Hang Qiu ◽

Lin Luo ◽

Ziqi Su ◽

Li Zhou ◽

Liya Wang ◽

...

Keyword(s):

Machine Learning ◽

Environmental Exposure ◽

Loss Function ◽

Ambient Air ◽

Quality Data ◽

Gradient Boosting ◽

Learning Approaches ◽

Learning Models ◽

Peak Demand ◽

Logarithmic Loss

Abstract Background: Accumulating evidence has linked environmental exposure, such as ambient air pollution and meteorological factors, to the development and severity of cardiovascular diseases (CVDs), resulting in increased healthcare demand. Effective prediction of demand for healthcare services, particularly those associated with peak events of CVDs, can be useful in optimizing the allocation of medical resources. However, few studies have attempted to adopt machine learning approaches with excellent predictive abilities to forecast the healthcare demand for CVDs. This study aims to develop and compare several machine learning models in predicting the peak demand days of CVDs admissions using the hospital admissions data, air quality data and meteorological data in Chengdu, China from 2015 to 2017.Methods: Six machine learning algorithms, including logistic regression (LR), support vector machine (SVM), artificial neural network (ANN), random forest (RF), extreme gradient boosting (XGBoost), and light gradient boosting machine (LightGBM) were applied to build the predictive models with a unique feature set. The area under a receiver operating characteristic curve (AUC), logarithmic loss function, accuracy, sensitivity, specificity, precision, and F1 score were used to evaluate the predictive performances of the six models.Results: The LightGBM model exhibited the highest AUC (0.940, 95% CI: 0.900-0.980), which was significantly higher than that of LR (0.842, 95% CI: 0.783-0.901), SVM (0.834, 95% CI: 0.774-0.894) and ANN (0.890, 95% CI: 0.836-0.944), but did not differ significantly from that of RF (0.926, 95% CI: 0.879-0.974) and XGBoost (0.930, 95% CI: 0.878-0.982). In addition, the LightGBM has the optimal logarithmic loss function (0.218), accuracy (91.3%), specificity (94.1%), precision (0.695), and F1 score (0.725). Feature importance identification indicated that the contribution rate of meteorological conditions and air pollutants for the prediction was 32% and 43%, respectively.Conclusion: This study suggests that ensemble learning models, especially the LightGBM model, can be used to effectively predict the peak events of CVDs admissions, and therefore could be a very useful decision-making tool for medical resource management.

Download Full-text

Machine Learning Approaches to Predict Peak Demand Days of Cardiovascular Admissions Considering Environmental Exposure

10.21203/rs.2.19636/v2 ◽

2020 ◽

Author(s):

Hang Qiu ◽

Lin Luo ◽

Ziqi Su ◽

Li Zhou ◽

Liya Wang ◽

...

Keyword(s):

Machine Learning ◽

Loss Function ◽

Ambient Air ◽

Quality Data ◽

Gradient Boosting ◽

Support Vector ◽

Learning Approaches ◽

Learning Models ◽

Peak Demand ◽

Logarithmic Loss

Abstract Background: Accumulating evidence has linked environmental exposures, such as ambient air pollution and meteorological factors to the development and severity of cardiovascular diseases (CVDs), resulting in increased healthcare demand. Effective prediction of demand for healthcare services, particularly those associated with peak events of CVDs, can be useful in optimizing the allocation of medical resources. However, few studies have attempted to adopt machine learning approaches with excellent predictive abilities to forecast the healthcare demand for CVDs. This study aims to develop and compare several machine learning models in predicting the peak demand days of CVDs admissions using the hospital admissions data, air quality data and meteorological data in Chengdu, China from 2015 to 2017. Methods: Six machine learning algorithms, including logistic regression (LR), support vector machine (SVM), artificial neural network (ANN), random forest (RF), extreme gradient boosting (XGBoost), and light gradient boosting machine (LightGBM) were applied to build the predictive models with a unique feature set. The area under a receiver operating characteristic curve (AUC), logarithmic loss function, accuracy, sensitivity, specificity, precision, and F1 score were used to evaluate the predictive performances between the six models. Results: The LightGBM model exhibited the highest AUC (0.940, 95% CI: 0.900-0.980), which was significantly higher than that of LR (0.842, 95% CI: 0.783-0.901), SVM (0.834, 95% CI: 0.774-0.894) and ANN (0.890, 95% CI: 0.836-0.944), but did not differ significantly from that of RF (0.926, 95% CI: 0.879-0.974) and XGBoost (0.930, 95% CI: 0.878-0.982). In addition, the LightGBM has the optimal logarithmic loss function (0.218), accuracy (91.3%), specificity (94.1%), precision (0.695), and F1 score (0.725). Feature importance identification indicated that the contribution rate of meteorological conditions and air pollutants for the prediction was 32% and 43%, respectively. Conclusion: This study suggests that ensemble learning models, especially the LightGBM model, can be used to effectively predict the peak events of CVDs admissions, and therefore could be a very useful decision making tool for medical resource management.

Download Full-text

Machine Learning Approaches to Predict Peak Demand Days of Cardiovascular Admissions Considering Environmental Exposure

10.21203/rs.2.19636/v1 ◽

2019 ◽

Author(s):

Hang Qiu ◽

Lin Luo ◽

Ziqi Su ◽

Li Zhou ◽

Liya Wang ◽

...

Keyword(s):

Machine Learning ◽

Loss Function ◽

Ambient Air ◽

Quality Data ◽

Gradient Boosting ◽

Support Vector ◽

Learning Approaches ◽

Learning Models ◽

Peak Demand ◽

Logarithmic Loss

Abstract Background Accumulating evidence has linked environmental exposures, such as ambient air pollution and meteorological factors to the development and severity of cardiovascular diseases (CVDs), resulting in increased healthcare demand. Effective prediction of situations of demand for healthcare services particularly those associated with peak events of CVDs can be useful in optimizing the allocation of medical resources. However, few studies have attempted to adopt machine learning approaches with excellent predictive abilities to forecast the healthcare demand for CVDs. This study aims to develop machine learning models to predict the peak demand days of CVDs admissions using the hospital admissions data, air quality data and meteorological data in Chengdu, China from 2015 to 2017.Methods Six machine learning algorithms, including logistic regression (LR), support vector machine (SVM), random forest (RF), extreme gradient boosting (XGBoost), light gradient boosting machine (LightGBM) and artificial neural network (ANN), were applied to build the predictive models. The area under a receiver operating characteristic curve (AUC), logarithmic loss function, accuracy, sensitivity, specificity and F1 score were used to evaluate the predictive performances among the six models.Results The LightGBM model exhibited the highest AUC (0.940, 95% CI: 0.900-0.980), which was significantly higher than that of LR (0.842, 95% CI: 0.783-0.901), SVM (0.834, 95% CI: 0.774-0.894) and ANN (0.890, 95% CI: 0.836-0.944), but did not differ significantly from that of RF (0.926, 95% CI: 0.879-0.974) and XGBoost (0.930, 95% CI: 0.878-0.982). In addition, the LightGBM has the optimal logarithmic loss function (0.218), accuracy (91.3%), specificity (94.1%) and F1 score (0.725). Feature importance identification based on LightGBM indicated that the contribution rate of meteorological conditions and air pollutants for the prediction was 32% and 43%, respectively.Conclusion This study suggests that ensemble learning models especially the LightGBM model can be used to effectively predict the peak events of CVDs, which provide decision making for medical resource management.

Download Full-text

Machine learning approaches to predict peak demand days of cardiovascular admissions considering environmental exposure

BMC Medical Informatics and Decision Making ◽

10.1186/s12911-020-1101-8 ◽

2020 ◽

Vol 20 (1) ◽

Cited By ~ 1

Author(s):

Hang Qiu ◽

Lin Luo ◽

Ziqi Su ◽

Li Zhou ◽

Liya Wang ◽

...

Keyword(s):

Machine Learning ◽

Environmental Exposure ◽

Learning Approaches ◽

Peak Demand

Download Full-text

Performance of Statistical and Machine Learning-Based Methods for Predicting Biogeographical Patterns of Fungal Productivity in Forest Ecosystems

10.21203/rs.3.rs-122045/v1 ◽

2020 ◽

Author(s):

Albert Morera ◽

Juan Martínez de Aragón ◽

José Antonio Bonet ◽

Jingjing Liang ◽

Sergio de-Miguel

Keyword(s):

Machine Learning ◽

Random Forest ◽

Machine Learning Algorithms ◽

Gradient Boosting ◽

Support Vector ◽

Learning Approaches ◽

Learning Models ◽

Extreme Gradient Boosting ◽

Machine Learning Models ◽

Modelling Approaches

Abstract BackgroundThe prediction of biogeographical patterns from a large number of driving factors with complex interactions, correlations and non-linear dependences require advanced analytical methods and modelling tools. This study compares different statistical and machine learning models for predicting fungal productivity biogeographical patterns as a case study for the thorough assessment of the performance of alternative modelling approaches to provide accurate and ecologically-consistent predictions.MethodsWe evaluated and compared the performance of two statistical modelling techniques, namely, generalized linear mixed models and geographically weighted regression, and four machine learning models, namely, random forest, extreme gradient boosting, support vector machine and deep learning to predict fungal productivity. We used a systematic methodology based on substitution, random, spatial and climatic blocking combined with principal component analysis, together with an evaluation of the ecological consistency of spatially-explicit model predictions.ResultsFungal productivity predictions were sensitive to the modelling approach and complexity. Moreover, the importance assigned to different predictors varied between machine learning modelling approaches. Decision tree-based models increased prediction accuracy by ~7% compared to other machine learning approaches and by more than 25% compared to statistical ones, and resulted in higher ecological consistence at the landscape level.ConclusionsWhereas a large number of predictors are often used in machine learning algorithms, in this study we show that proper variable selection is crucial to create robust models for extrapolation in biophysically differentiated areas. When dealing with spatial-temporal data in the analysis of biogeographical patterns, climatic blocking is postulated as a highly informative technique to be used in cross-validation to assess the prediction error over larger scales. Random forest was the best approach for prediction both in sampling-like environments as well as in extrapolation beyond the spatial and climatic range of the modelling data.

Download Full-text

Machine learning approaches to understand and predict rate constants for organic processes in mixtures containing ionic liquids

Physical Chemistry Chemical Physics ◽

10.1039/d0cp04227g ◽

2021 ◽

Vol 23 (4) ◽

pp. 2742-2752

Author(s):

Tamar L. Greaves ◽

Karin S. Schaffarczyk McHale ◽

Raphael F. Burkart-Radke ◽

Jason B. Harper ◽

Tu C. Le

Keyword(s):

Machine Learning ◽

Ionic Liquids ◽

Rate Constants ◽

Learning Approaches ◽

Learning Models ◽

Organic Reaction ◽

Machine Learning Models ◽

Selection Of

Machine learning models were developed for an organic reaction in ionic liquids and validated on a selection of ionic liquids.

Download Full-text

Machine learning models to identify low adherence to influenza vaccination among Korean adults with cardiovascular disease

BMC Cardiovascular Disorders ◽

10.1186/s12872-021-01925-7 ◽

2021 ◽

Vol 21 (1) ◽

Author(s):

Moojung Kim ◽

Young Jae Kim ◽

Sung Jin Park ◽

Kwang Gi Kim ◽

Pyung Chun Oh ◽

...

Keyword(s):

Machine Learning ◽

Cardiovascular Disease ◽

Influenza Vaccination ◽

Machine Learning Techniques ◽

Gradient Boosting ◽

Support Vector ◽

Age Group ◽

Learning Models ◽

Extreme Gradient Boosting ◽

Machine Learning Models

Abstract Background Annual influenza vaccination is an important public health measure to prevent influenza infections and is strongly recommended for cardiovascular disease (CVD) patients, especially in the current coronavirus disease 2019 (COVID-19) pandemic. The aim of this study is to develop a machine learning model to identify Korean adult CVD patients with low adherence to influenza vaccination Methods Adults with CVD (n = 815) from a nationally representative dataset of the Fifth Korea National Health and Nutrition Examination Survey (KNHANES V) were analyzed. Among these adults, 500 (61.4%) had answered "yes" to whether they had received seasonal influenza vaccinations in the past 12 months. The classification process was performed using the logistic regression (LR), random forest (RF), support vector machine (SVM), and extreme gradient boosting (XGB) machine learning techniques. Because the Ministry of Health and Welfare in Korea offers free influenza immunization for the elderly, separate models were developed for the < 65 and ≥ 65 age groups. Results The accuracy of machine learning models using 16 variables as predictors of low influenza vaccination adherence was compared; for the ≥ 65 age group, XGB (84.7%) and RF (84.7%) have the best accuracies, followed by LR (82.7%) and SVM (77.6%). For the < 65 age group, SVM has the best accuracy (68.4%), followed by RF (64.9%), LR (63.2%), and XGB (61.4%). Conclusions The machine leaning models show comparable performance in classifying adult CVD patients with low adherence to influenza vaccination.

Download Full-text

Meta-analysis cum machine learning approaches address the structure and biogeochemical potential of marine copepod associated bacteriobiomes

Scientific Reports ◽

10.1038/s41598-021-82482-z ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Balamurugan Sadaiappan ◽

Chinnamani PrasannaKumar ◽

V. Uthara Nambiar ◽

Mahendran Subramanian ◽

Manguesh U. Gauns

Keyword(s):

Machine Learning ◽

Iron Transport ◽

Meta Analysis ◽

Zooplankton Community ◽

Gradient Boosting ◽

Learning Approaches ◽

16S Rdna Gene ◽

Marine Copepod ◽

Form Of Life ◽

First Time

AbstractCopepods are the dominant members of the zooplankton community and the most abundant form of life. It is imperative to obtain insights into the copepod-associated bacteriobiomes (CAB) in order to identify specific bacterial taxa associated within a copepod, and to understand how they vary between different copepods. Analysing the potential genes within the CAB may reveal their intrinsic role in biogeochemical cycles. For this, machine-learning models and PICRUSt2 analysis were deployed to analyse 16S rDNA gene sequences (approximately 16 million reads) of CAB belonging to five different copepod genera viz., Acartia spp., Calanus spp., Centropages sp., Pleuromamma spp., and Temora spp.. Overall, we predict 50 sub-OTUs (s-OTUs) (gradient boosting classifiers) to be important in five copepod genera. Among these, 15 s-OTUs were predicted to be important in Calanus spp. and 20 s-OTUs as important in Pleuromamma spp.. Four bacterial s-OTUs Acinetobacter johnsonii, Phaeobacter, Vibrio shilonii and Piscirickettsiaceae were identified as important s-OTUs in Calanus spp., and the s-OTUs Marinobacter, Alteromonas, Desulfovibrio, Limnobacter, Sphingomonas, Methyloversatilis, Enhydrobacter and Coriobacteriaceae were predicted as important s-OTUs in Pleuromamma spp., for the first time. Our meta-analysis revealed that the CAB of Pleuromamma spp. had a high proportion of potential genes responsible for methanogenesis and nitrogen fixation, whereas the CAB of Temora spp. had a high proportion of potential genes involved in assimilatory sulphate reduction, and cyanocobalamin synthesis. The CAB of Pleuromamma spp. and Temora spp. have potential genes accountable for iron transport.

Download Full-text

PlotMI: visualization of pairwise interactions and positional preferences learned by a deep learning model from sequence data

10.1101/2021.03.14.435285 ◽

2021 ◽

Author(s):

Tuomo Hartonen ◽

Teemu Kivioja ◽

Jussi Taipale

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Sequence Data ◽

Predictive Performance ◽

Learning Model ◽

Biological Research ◽

Learning Approaches ◽

Learning Models ◽

Model Interpretation ◽

Pairwise Interactions

Deep learning models have in recent years gained success in various tasks related to understanding information coded in the DNA sequence. Rapidly developing genome-wide measurement technologies provide large quantities of data ideally suited for modeling using deep learning or other powerful machine learning approaches. Although offering state-of-the art predictive performance, the predictions made by deep learning models can be difficult to understand. In virtually all biological research, the understanding of how a predictive model works is as important as the raw predictive performance. Thus interpretation of deep learning models is an emerging hot topic especially in context of biological research. Here we describe plotMI, a mutual information based model interpretation strategy that can intuitively visualize positional preferences and pairwise interactions learned by any machine learning model trained on sequence data with a defined alphabet as input. PlotMI is freely available at https://github.com/hartonen/plotMI.

Download Full-text

Predicting Electric Vehicle Charging Station Availability Using Ensemble Machine Learning

Energies ◽

10.3390/en14237834 ◽

2021 ◽

Vol 14 (23) ◽

pp. 7834

Author(s):

Christopher Hecht ◽

Jan Figgener ◽

Dirk Uwe Sauer

Keyword(s):

Machine Learning ◽

Binary Data ◽

Training Data ◽

Gradient Boosting ◽

Traffic Density ◽

Learning Models ◽

Charging Infrastructure ◽

Ensemble Models ◽

Charging Station ◽

Machine Learning Models

Electric vehicles may reduce greenhouse gas emissions from individual mobility. Due to the long charging times, accurate planning is necessary, for which the availability of charging infrastructure must be known. In this paper, we show how the occupation status of charging infrastructure can be predicted for the next day using machine learning models— Gradient Boosting Classifier and Random Forest Classifier. Since both are ensemble models, binary training data (occupied vs. available) can be used to provide a certainty measure for predictions. The prediction may be used to adapt prices in a high-load scenario, predict grid stress, or forecast available power for smart or bidirectional charging. The models were chosen based on an evaluation of 13 different, typically used machine learning models. We show that it is necessary to know past charging station usage in order to predict future usage. Other features such as traffic density or weather have a limited effect. We show that a Gradient Boosting Classifier achieves 94.8% accuracy and a Matthews correlation coefficient of 0.838, making ensemble models a suitable tool. We further demonstrate how a model trained on binary data can perform non-binary predictions to give predictions in the categories “low likelihood” to “high likelihood”.

Download Full-text

Estimation of Chlorophyll-a Concentrations in Small Water Bodies: Comparison of Fused Gaofen-6 and Sentinel-2 Sensors

Remote Sensing ◽

10.3390/rs14010229 ◽

2022 ◽

Vol 14 (1) ◽

pp. 229

Author(s):

Jiarui Shi ◽

Qian Shen ◽

Yue Yao ◽

Junsheng Li ◽

Fu Chen ◽

...

Keyword(s):

Machine Learning ◽

Chlorophyll A ◽

Water Bodies ◽

Gradient Boosting ◽

Learning Models ◽

Small Water ◽

Extreme Gradient Boosting ◽

Machine Learning Models ◽

Sentinel 2 ◽

Small Water Bodies

Chlorophyll-a concentrations in water bodies are one of the most important environmental evaluation indicators in monitoring the water environment. Small water bodies include headwater streams, springs, ditches, flushes, small lakes, and ponds, which represent important freshwater resources. However, the relatively narrow and fragmented nature of small water bodies makes it difficult to monitor chlorophyll-a via medium-resolution remote sensing. In the present study, we first fused Gaofen-6 (a new Chinese satellite) images to obtain 2 m resolution images with 8 bands, which was approved as a good data source for Chlorophyll-a monitoring in small water bodies as Sentinel-2. Further, we compared five semi-empirical and four machine learning models to estimate chlorophyll-a concentrations via simulated reflectance using fused Gaofen-6 and Sentinel-2 spectral response function. The results showed that the extreme gradient boosting tree model (one of the machine learning models) is the most accurate. The mean relative error (MRE) was 9.03%, and the root-mean-square error (RMSE) was 4.5 mg/m3 for the Sentinel-2 sensor, while for the fused Gaofen-6 image, MRE was 6.73%, and RMSE was 3.26 mg/m3. Thus, both fused Gaofen-6 and Sentinel-2 could estimate the chlorophyll-a concentrations in small water bodies. Since the fused Gaofen-6 exhibited a higher spatial resolution and Sentinel-2 exhibited a higher temporal resolution.

Download Full-text