Machine learning models to predict myocardial infarctions from past climatic and environmental conditions

2022 ◽  
Author(s):  
Lennart Marien ◽  
Mahyar Valizadeh ◽  
Wolfgang zu Castell ◽  
Christine Nam ◽  
Diana Rechid ◽  
...  

Abstract. Myocardial infarctions (MI) are a major cause of death worldwide, and temperature extremes, e.g., during heat waves and cold winters, may increase the risk of MI. The relationship between health impacts and climate is complex and is influenced by a multitude of climatic, environmental, socio-demographic, and behavioral factors. Here, we present a Machine Learning (ML) approach for predicting MI events based on multiple environmental and demographic variables. We derived data on MI events from the KORA MI registry dataset for Augsburg, Germany, between 1998 and 2015. Multivariable predictors include weather and climate, air pollution (PM10, NO, NO2, SO2, and O3), surrounding vegetation, as well as demographic data. We tested the following ML regression algorithms: Decision Tree, Random Forest, Multi-layer Perceptron, Gradient Boosting and Ridge Regression. The models are able to predict the total annual number of MI reasonably well (adjusted R2 = 0.59–0.71). Inter-annual variations and long-term trends are captured. Across models, the most important predictors are air pollution and daily temperatures. Variables not related to environmental conditions, such as demographics, need to be considered as well. This ML approach provides a promising basis to model future MI under changing environmental conditions, as projected by scenarios for climate and other environmental changes.
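
The adjusted R2 quoted above corrects the ordinary R2 for the number of predictors in the model. As a minimal Python sketch of the standard formula (the function name and the input values are hypothetical and are not taken from the study):

```python
def adjusted_r2(r2: float, n_samples: int, n_predictors: int) -> float:
    """Standard adjusted R^2: 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    if n_samples - n_predictors - 1 <= 0:
        raise ValueError("need n_samples > n_predictors + 1")
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_predictors - 1)


# Hypothetical values for illustration only (not from the paper).
example = adjusted_r2(r2=0.75, n_samples=101, n_predictors=10)
```

Because the penalty grows with the number of predictors, the adjusted value is always at or below the raw R2 when at least one predictor is used, which makes it a fairer basis for comparing models with different numbers of inputs.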

2022 ◽  
Vol 17 (1) ◽  
pp. 165-198
Author(s):  
Kamil Matuszelański ◽  
Katarzyna Kopczewska

This study presents a comprehensive and modern approach to predicting customer churn, using the example of an e-commerce retail store operating in Brazil. Our approach consists of three stages in which we combine and use three different datasets: numerical data on orders, textual after-purchase reviews and socio-geo-demographic data from the census. At the pre-processing stage, we find topics from text reviews using Latent Dirichlet Allocation, Dirichlet Multinomial Mixture and Gibbs sampling. In the spatial analysis, we apply DBSCAN to get rural/urban locations and analyse neighbourhoods of customers located by zip code. At the modelling stage, we apply extreme gradient boosting and logistic regression. The quality of models is verified with area-under-curve and lift metrics. Explainable artificial intelligence methods, namely permutation-based variable importance and partial dependence profiles, help to discover the determinants of churn. We show that customers’ propensity to churn depends on: (i) payment value for the first order, number of items bought and shipping cost; (ii) categories of the products bought; (iii) demographic environment of the customer; and (iv) customer location. At the same time, customers’ propensity to churn is not influenced by: (i) population density in the customer’s area and division into rural and urban areas; (ii) quantitative review of the first purchase; and (iii) qualitative review summarised as a topic.
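
Permutation-based variable importance, mentioned above, measures how much a model's error grows when one input column is shuffled. The Python sketch below illustrates the idea on toy data; the model, the data, and the use of mean squared error as the score are assumptions for illustration and do not reproduce the study's churn models or metrics.

```python
import random


def mse(model, rows, targets):
    """Mean squared error of model predictions against targets."""
    return sum((model(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)


def permutation_importance(model, rows, targets, n_features, seed=0):
    """Increase in MSE after shuffling each feature column in turn."""
    rng = random.Random(seed)
    baseline = mse(model, rows, targets)
    importances = []
    for j in range(n_features):
        column = [r[j] for r in rows]
        rng.shuffle(column)
        # Rebuild the rows with only column j replaced by shuffled values.
        permuted = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
        importances.append(mse(model, permuted, targets) - baseline)
    return importances


def toy_model(row):
    """Hypothetical stand-in for a fitted model; it ignores feature 1."""
    return 3.0 * row[0]


# Hypothetical toy data: the target depends on feature 0 only.
rows = [[float(i), float((i * 7) % 5)] for i in range(30)]
targets = [3.0 * r[0] for r in rows]

importances = permutation_importance(toy_model, rows, targets, n_features=2)
```

Shuffling the feature the model relies on raises the error, while shuffling the ignored feature leaves it unchanged, which is the pattern used to separate determinants from non-determinants.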


2020 ◽  
Author(s):  
Juan David Gutiérrez

Abstract. Background: Previous authors have evidenced the relationship between air pollution-aerosols and meteorological variables with the occurrence of pneumonia. Forecasting the number of attentions of pneumonia cases may be useful to optimize the allocation of healthcare resources and support public health authorities to implement emergency plans to face an increase in patients. The purpose of this study is to implement four machine-learning methods to forecast the number of attentions of pneumonia cases in the five largest cities of Colombia by using air pollution-aerosols, and meteorological and admission data. Methods: The number of attentions of pneumonia cases in the five most populated Colombian cities was provided by public health authorities between January 2009 and December 2019. Air pollution-aerosols and meteorological data were obtained from remote sensors. Four machine-learning methods were implemented for each city. We selected the machine-learning methods with the best performance in each city and implemented two techniques to identify the most relevant variables in the forecasting developed by the best-performing machine-learning models. Results: According to the R2 metric, random forest was the machine-learning method with the best performance for Bogotá, Medellín and Cali; whereas for Barranquilla, the best performance was obtained from the Bayesian adaptive regression trees, and for Cartagena, extreme gradient boosting had the best performance. The most important variables for the forecasting were related to the admission data. Conclusions: The results obtained from this study suggest that machine learning can be used to efficiently forecast the number of attentions of pneumonia cases, and therefore, it can be a useful decision-making tool for public health authorities.


2021 ◽  
Vol 15 (5) ◽  
pp. 1-16
Author(s):  
Bo Liu ◽  
Xi He ◽  
Mingdong Song ◽  
Jiangqiang Li ◽  
Guangzhi Qu ◽  
...  

Atmospheric visibility is an indicator of atmospheric transparency, and its range directly reflects the quality of the atmospheric environment. With the acceleration of industrialization and urbanization, the natural environment has suffered damage. In recent decades, atmospheric visibility has shown an overall downward trend. A decrease in atmospheric visibility leads to a higher frequency of haze, which seriously affects people's normal life and has a significant negative economic impact. Mining the causal relationships of atmospheric visibility can reveal the potential relation between visibility and other influencing factors, which is very important in environmental management, air pollution control and haze control. However, causality mining based on statistical methods and traditional machine learning techniques usually yields qualitative results, which makes it hard to measure the degree of causality accurately. This article proposes the seq2seq-LSTM Granger causality analysis method for mining the causal relationship between atmospheric visibility and its influencing factors. In the experimental part, a comparison with methods such as linear regression, random forest, gradient boosting decision tree, light gradient boosting machine, and extreme gradient boosting shows that the visibility prediction accuracy of the seq2seq-LSTM model is about 10% higher than that of traditional machine learning methods. Therefore, causal relationship mining based on this method can reveal the implicit relationship between visibility and its influencing factors and provide theoretical support for air pollution control.
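
Granger causality asks whether the past of one series improves the prediction of another beyond what the target's own past already provides. The Python sketch below shows that comparison with a lag-1 linear regression, which is swapped in here for simplicity and is not the article's seq2seq-LSTM predictor; the series and all function names are hypothetical.

```python
import random


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def ols_rss(design, targets):
    """Residual sum of squares of an ordinary least-squares fit."""
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * t for row, t in zip(design, targets)) for i in range(k)]
    beta = solve_linear_system(xtx, xty)
    return sum((t - sum(b * v for b, v in zip(beta, row))) ** 2
               for row, t in zip(design, targets))


def granger_rss(driver, target):
    """RSS of lag-1 models without and with the driver's past value."""
    y_now = target[1:]
    restricted = [[1.0, target[t - 1]] for t in range(1, len(target))]
    unrestricted = [[1.0, target[t - 1], driver[t - 1]] for t in range(1, len(target))]
    return ols_rss(restricted, y_now), ols_rss(unrestricted, y_now)


# Hypothetical series: the target follows the driver with a one-step delay.
rng = random.Random(1)
driver = [rng.gauss(0.0, 1.0) for _ in range(200)]
target = [0.0]
for t in range(1, 200):
    target.append(0.5 * target[t - 1] + 0.8 * driver[t - 1] + rng.gauss(0.0, 0.1))

rss_restricted, rss_unrestricted = granger_rss(driver, target)
```

A large drop in residual error when the driver's past is added suggests Granger causality in this linear setting; a formal test would compare the two errors with an F-statistic, and a nonlinear predictor such as an LSTM would replace the least-squares fit.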


Magnetic Resonance Imaging (MRI) data, which are a prevalent source of insight into the inner functioning of the human body, are one of the most preliminary mechanisms in the analysis of the human brain, including but not limited to detecting the presence of dementia. In this article, seven machine learning models are proposed for the analysis and detection of dementia in the subjects of Open Access Series of Imaging Studies (OASIS) Brains 1, using OASIS 2 MRI and demographic data. The article also compares the performance of the machine learning models in terms of accuracy and prediction duration. Among the proposed models, the eXtreme Gradient Boosting (XGB) algorithm achieves the highest accuracy of 97.87% and the fastest prediction duration of 0.031 s/sample.


2021 ◽  
Author(s):  
Lennart Marien ◽  
Mahyar Valizadeh ◽  
Wolfgang zu Castell ◽  
Alexandra Schneider ◽  
Kathrin Wolf ◽  
...  

Myocardial infarctions (MI) are a major cause of death worldwide. In addition to well-known individual risk factors, studies have shown that temperature extremes, such as encountered during heat waves, lead to increases in MI. The relationship between health impacts and climate is complex, depending on a multitude of climatic, environmental, sociodemographic and behavioral factors. Machine Learning (ML) is a powerful tool for investigating complex and unknown relationships between extreme environmental conditions and their adverse impacts that has already been applied to other climate extremes, such as in the prediction of flood damages. By combining heterogeneous health, climatic, environmental and socio-economic datasets, this study is a first step in developing an ML model for predicting past and future MI risk due to heat waves. Here, we present first results of our ML approach for modelling heat-related health effects in Augsburg based on the KORA MI and environmental data. The basis of our data-driven approach is the KORA cohort study and the MI Registry in the Augsburg region of Bavaria, Germany, comprising detailed information on MI and underlying health conditions. Additionally, weather and climate data, air pollution data (e.g., PM10, PM2.5, nitrous oxides, and ozone), as well as socio-economic data (household income, education) are used for this study. One of the key challenges is to assemble and integrate heterogeneous data from various sources and prepare them for the appropriate spatial scales. We outline major challenges in combining these data and deriving quantitative models from them. Moreover, we present initial results based on both regression and classification models, discussing model performance for the period between 2000 and 2015, with a focus on two major heat wave events in Germany during 2003 and 2006. Ultimately, this research may be useful in better understanding heat-related MI risks, supporting possible adaptation options in urban areas and in identifying high-risk groups within society.


2020 ◽  
Vol 39 (5) ◽  
pp. 6579-6590
Author(s):  
Sandy Çağlıyor ◽  
Başar Öztayşi ◽  
Selime Sezgin

The motion picture industry is one of the largest industries worldwide and has significant importance in the global economy. Considering the high stakes and high risks in the industry, forecast models and decision support systems are gaining importance. Several attempts have been made to estimate the theatrical performance of a movie before or at the early stages of its release. Nevertheless, these models are mostly used for predicting domestic performances and the industry still struggles to predict box office performances in overseas markets. In this study, the aim is to design a forecast model using different machine learning algorithms to estimate the theatrical success of US movies in Turkey. From various sources, a dataset of 1559 movies is constructed. First, independent variables are grouped as pre-release, distributor type, and international distribution based on their characteristics. The number of attendances is discretized into three classes. Four popular machine learning algorithms (artificial neural networks, decision tree regression, gradient boosting tree, and random forest) are employed, and the impact of each variable group is assessed by comparing the performance of the resulting models. Then the number of target classes is increased to five and eight, and the results are compared with previously developed models in the literature.


2019 ◽  
Vol 28 (1) ◽  
pp. 349-354 ◽  
Author(s):  
Ahmed Samy Abd El Aziz Moursi ◽  
Marwa Shouman ◽  
Ezz El-din Hemdan ◽  
Nawal El-Fishawy

2019 ◽  
Vol 21 (9) ◽  
pp. 662-669 ◽  
Author(s):  
Junnan Zhao ◽  
Lu Zhu ◽  
Weineng Zhou ◽  
Lingfeng Yin ◽  
Yuchen Wang ◽  
...  

Background: Thrombin is the central protease of the vertebrate blood coagulation cascade, which is closely related to cardiovascular diseases. The inhibitory constant Ki is the most significant property of thrombin inhibitors. Method: This study was carried out to predict Ki values of thrombin inhibitors based on a large data set by using machine learning methods. Because it can find non-intuitive regularities in high-dimensional datasets, machine learning can be used to build effective predictive models. A total of 6554 descriptors for each compound were collected and an efficient descriptor selection method was chosen to find the appropriate descriptors. Four different methods, including multiple linear regression (MLR), K Nearest Neighbors (KNN), Gradient Boosting Regression Tree (GBRT) and Support Vector Machine (SVM), were implemented to build prediction models with these selected descriptors. Results: The SVM model was the best one among these methods, with R2=0.84, MSE=0.55 for the training set and R2=0.83, MSE=0.56 for the test set. Several validation methods, such as the y-randomization test and applicability domain evaluation, were adopted to assess the robustness and generalization ability of the model. The final model shows excellent stability and predictive ability and can be employed for rapid estimation of the inhibitory constant, which is helpful for designing novel thrombin inhibitors.
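
The y-randomization test named above checks that a model's fit is not a chance artifact: the targets are shuffled, the model is refitted, and the resulting scores should collapse. The Python sketch below illustrates this with a one-feature least-squares line as a stand-in for the study's SVM model; the data and function names are hypothetical.

```python
import random


def r_squared(xs, ys):
    """R^2 of a one-feature least-squares line fitted to (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    syy = sum((y - mean_y) ** 2 for y in ys)
    return (sxy * sxy) / (sxx * syy)


def y_randomization(xs, ys, n_rounds=20, seed=0):
    """Refit on shuffled targets; return R^2 for each shuffled round."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_rounds):
        shuffled = ys[:]
        rng.shuffle(shuffled)
        scores.append(r_squared(xs, shuffled))
    return scores


# Hypothetical data: a linear signal plus Gaussian noise.
data_rng = random.Random(42)
xs = [i / 5 for i in range(50)]
ys = [2.0 * x + data_rng.gauss(0.0, 1.0) for x in xs]

real_r2 = r_squared(xs, ys)
shuffled_r2 = y_randomization(xs, ys)
```

If the scores on shuffled targets stay far below the score on the real targets, the original fit is unlikely to be due to chance correlation among the descriptors.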


2019 ◽  
Author(s):  
Kasper Van Mens ◽  
Joran Lokkerbol ◽  
Richard Janssen ◽  
Robert de Lange ◽  
Bea Tiemens

BACKGROUND It remains a challenge to predict which treatment will work for which patient in mental healthcare. OBJECTIVE In this study we compare machine learning algorithms to predict during treatment which patients will not benefit from brief mental health treatment and present trade-offs that must be considered before an algorithm can be used in clinical practice. METHODS Using an anonymized dataset containing routine outcome monitoring data from a mental healthcare organization in the Netherlands (n = 2,655), we applied three machine learning algorithms to predict treatment outcome. The algorithms were internally validated with cross-validation on a training sample (n = 1,860) and externally validated on an unseen test sample (n = 795). RESULTS The performance of the three algorithms did not significantly differ on the test set. With a default classification cut-off at 0.5 predicted probability, the extreme gradient boosting algorithm showed the highest positive predictive value (ppv) of 0.71 (0.61 – 0.77) with a sensitivity of 0.35 (0.29 – 0.41) and area under the curve of 0.78. A trade-off can be made between ppv and sensitivity by choosing different cut-off probabilities. With a cut-off at 0.63, the ppv increased to 0.87 and the sensitivity dropped to 0.17. With a cut-off at 0.38, the ppv decreased to 0.61 and the sensitivity increased to 0.57. CONCLUSIONS Machine learning can be used to predict treatment outcomes based on routine monitoring data. This allows practitioners to choose their own trade-off between being selective and more certain versus inclusive and less certain.
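
The trade-off between ppv and sensitivity described above follows directly from how the cut-off turns predicted probabilities into class labels. The Python sketch below computes both metrics at the three cut-offs mentioned in the abstract; the probabilities and labels are hypothetical toy values, not the study's data.

```python
def ppv_and_sensitivity(probabilities, labels, cutoff):
    """Return (PPV, sensitivity) when predicting positive at prob >= cutoff."""
    tp = sum(1 for p, y in zip(probabilities, labels) if p >= cutoff and y == 1)
    fp = sum(1 for p, y in zip(probabilities, labels) if p >= cutoff and y == 0)
    fn = sum(1 for p, y in zip(probabilities, labels) if p < cutoff and y == 1)
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    return ppv, sensitivity


# Hypothetical predicted probabilities and true labels (1 = positive class).
probabilities = [0.95, 0.85, 0.75, 0.66, 0.60, 0.55, 0.52, 0.45, 0.40, 0.30, 0.20, 0.10]
labels = [1, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 0]

results = {c: ppv_and_sensitivity(probabilities, labels, c) for c in (0.38, 0.50, 0.63)}
```

On this toy data, raising the cut-off makes the positive predictions more reliable but catches fewer of the true positives, which mirrors the direction of the trade-off reported in the results.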


2021 ◽  
Vol 13 (5) ◽  
pp. 1021
Author(s):  
Hu Ding ◽  
Jiaming Na ◽  
Shangjing Jiang ◽  
Jie Zhu ◽  
Kai Liu ◽  
...  

Artificial terraces are of great importance for agricultural production and soil and water conservation. Automatic high-accuracy mapping of artificial terraces is the basis of monitoring and related studies. Previous research achieved artificial terrace mapping based on high-resolution digital elevation models (DEMs) or imagery. Because contextual information is important for terrace mapping, object-based image analysis (OBIA) combined with machine learning (ML) technologies is widely used. However, the selection of an appropriate classifier is of great importance for the terrace mapping task. In this study, the performance of an integrated framework using OBIA and ML for terrace mapping was tested. A catchment, Zhifanggou, in the Loess Plateau, China, was used as the study area. First, optimized image segmentation was conducted. Then, features from the DEMs and imagery were extracted, and the correlations between the features were analyzed and ranked for classification. Finally, three commonly used ML classifiers, namely, extreme gradient boosting (XGBoost), random forest (RF), and k-nearest neighbor (KNN), were used for terrace mapping. The comparison with the ground truth, as delineated by field survey, indicated that random forest performed best, with a 95.60% overall accuracy (followed by 94.16% and 92.33% for XGBoost and KNN, respectively). The influence of class imbalance and feature selection is discussed. This work provides a credible framework for mapping artificial terraces.

