Machine learning in predicting immediate and long-term outcomes of myocardial revascularization: a systematic review

Machine learning (ML) is among the main tools of artificial intelligence and is increasingly used in population and clinical cardiology to stratify cardiovascular risk. This systematic review analyzes the literature on using various ML methods (artificial neural networks, random forest, stochastic gradient boosting, support vector machines, etc.) to develop predictive models that estimate the immediate and long-term risk of adverse events after coronary artery bypass grafting and percutaneous coronary intervention. Most research on this issue focuses on creating novel forecast models with higher predictive value. The review emphasizes that improving modeling technologies and developing clinical decision support systems are among the most promising areas of healthcare digitalization and are in demand in everyday professional practice.

Machine learning as a successful approach for predicting complex spatio–temporal patterns in animal species abundance

Animal Biodiversity and Conservation ◽

10.32800/abc.2021.44.0289 ◽

2021 ◽

pp. 289-301

Author(s):

B. Martín ◽

J. González–Arias ◽

J. A. Vicente–Vírseda

Keyword(s):

Machine Learning ◽

Random Forest ◽

Animal Species ◽

Temporal Patterns ◽

Additive Models ◽

Gradient Boosting ◽

Support Vector ◽

Stochastic Gradient Boosting ◽

Extreme Gradient Boosting ◽

Spatio Temporal

Our aim was to identify an optimal analytical approach for accurately predicting complex spatio–temporal patterns in animal species distribution. We compared the performance of eight modelling techniques (generalized additive models, regression trees, bagged CART, k–nearest neighbors, stochastic gradient boosting, support vector machines, neural network, and random forest, an enhanced form of bootstrap aggregation). We also applied extreme gradient boosting, an enhanced form of gradient boosting, to predict spatial patterns in the abundance of migrating Balearic shearwaters based on data gathered within eBird. Proxies of frontal systems and ocean productivity domains, which have previously been used to characterize the oceanographic habitats of seabirds, were derived from open–source datasets, quantified, and used as predictors in the models. The random forest model showed the best performance according to the parameters assessed (RMSE and R2). The correlation between observed and predicted abundance with this model was also considerably high. This study shows that combining machine learning techniques with the massive data provided by open data sources is a useful approach for identifying the long–term spatial–temporal distribution of species at regional spatial scales.
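The two metrics named in this abstract can be sketched in a few lines of Python. This is only an illustration: the abstract does not say which definition of R2 the authors used, so the coefficient-of-determination form (1 minus residual over total sum of squares) is assumed here, and the abundance values are invented.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error between paired observations and predictions."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical abundance counts, invented purely for illustration.
observed = [12.0, 30.0, 7.0, 55.0, 21.0]
predicted = [10.0, 33.0, 9.0, 50.0, 24.0]
print(rmse(observed, predicted), r_squared(observed, predicted))
```

A lower RMSE and a higher R2 both indicate a closer fit, which is how a comparison across modelling techniques would typically be ranked.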

Mapping of the Canopy Openings in Mixed Beech–Fir Forest at Sentinel-2 Subpixel Level Using UAV and Machine Learning Approach

Remote Sensing ◽

10.3390/rs12233925 ◽

2020 ◽

Vol 12 (23) ◽

pp. 3925

Author(s):

Ivan Pilaš ◽

Mateo Gašparović ◽

Alan Novkinić ◽

Damir Klobučar

Keyword(s):

Machine Learning ◽

Forest Canopy ◽

Vegetation Index ◽

Predictive Performance ◽

Spatial Extent ◽

Gradient Boosting ◽

Support Vector ◽

Stochastic Gradient Boosting ◽

Extreme Gradient Boosting ◽

Sentinel 2

The presented study demonstrates a bi-sensor approach suitable for rapid, precise, and up-to-date mapping of forest canopy gaps over a larger spatial extent. The approach uses Unmanned Aerial Vehicle (UAV) red, green and blue (RGB) images of smaller areas to create a highly precise forest canopy mask. Sentinel-2 (S-2) was used as a scaling platform for transferring information from the UAV to a wider spatial extent. Various approaches to improving the predictive performance were examined: (I) the highest R2 of a single satellite index was 0.57, (II) the highest R2 using multiple features obtained from a single-date S-2 image was 0.624, and (III) the highest R2 on the multitemporal set of S-2 images was 0.697. Satellite indices such as the Atmospherically Resistant Vegetation Index (ARVI), Infrared Percentage Vegetation Index (IPVI), Normalized Difference Index (NDI45), Pigment-Specific Simple Ratio Index (PSSRa), Modified Chlorophyll Absorption Ratio Index (MCARI), Color Index (CI), Redness Index (RI), and Normalized Difference Turbidity Index (NDTI) were the dominant predictors in most of the Machine Learning (ML) algorithms. The more complex ML algorithms, such as Support Vector Machines (SVM), Random Forest (RF), Stochastic Gradient Boosting (GBM), Extreme Gradient Boosting (XGBoost), and Catboost, provided the best performance on the training set but exhibited weaker generalization capabilities. Therefore, the simpler and more robust Elastic Net (ENET) algorithm was chosen for the final map creation.

Reliable photometric membership (RPM) of galaxies in clusters – I. A machine learning method and its performance in the local universe

Monthly Notices of the Royal Astronomical Society ◽

10.1093/mnras/staa486 ◽

2020 ◽

Vol 493 (3) ◽

pp. 3429-3441

Author(s):

Paulo A A Lopes ◽

André L B Ribeiro

Keyword(s):

Machine Learning ◽

Galaxy Evolution ◽

Large Scale ◽

Machine Learning Techniques ◽

Gradient Boosting ◽

Support Vector ◽

Validation Data ◽

Membership Probability ◽

Cluster Membership ◽

Stochastic Gradient Boosting

We introduce a new method to determine galaxy cluster membership based solely on photometric properties. We adopt a machine learning approach to recover a cluster membership probability from galaxy photometric parameters and finally derive a membership classification. After testing several machine learning techniques (such as stochastic gradient boosting, model averaged neural network and k-nearest neighbours), we found the support vector machine algorithm to perform best when applied to our data. Our training and validation data are from the Sloan Digital Sky Survey main sample. Hence, to be complete to $M_r^* + 3$, we limit our work to 30 clusters with $z_{\rm phot-cl} \le 0.045$. Masses ($M_{200}$) are larger than $\sim 0.6\times 10^{14}\,\mathrm{M}_{\odot}$ (most above $3\times 10^{14}\,\mathrm{M}_{\odot}$). Our results are derived taking into account all galaxies in the line of sight of each cluster, with no photometric redshift cuts or background corrections. Our method is non-parametric, making no assumptions on the number density or luminosity profiles of galaxies in clusters. Our approach delivers extremely accurate results (completeness, $C \sim 92$ per cent and purity, $P \sim 87$ per cent) within $R_{200}$, so we named our code reliable photometric membership. We discuss possible dependencies on magnitude, colour, and cluster mass. Finally, we present some applications of our method, stressing its impact on galaxy evolution and cosmological studies based on future large-scale surveys, such as eROSITA, EUCLID, and LSST.
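The completeness and purity figures quoted above can be illustrated with a short Python sketch. It assumes the conventional definitions, which the abstract itself does not spell out: completeness is the fraction of true members that the classifier selects, and purity is the fraction of selected galaxies that are true members. The function name and the toy labels are hypothetical.

```python
def completeness_and_purity(is_member, selected):
    """Return (completeness, purity) for paired boolean labels.

    is_member: True where a galaxy is a true cluster member.
    selected:  True where the classifier assigns membership.
    """
    true_selected = sum(1 for m, s in zip(is_member, selected) if m and s)
    n_members = sum(1 for m in is_member if m)
    n_selected = sum(1 for s in selected if s)
    completeness = true_selected / n_members  # recovered share of true members
    purity = true_selected / n_selected       # share of selections that are real
    return completeness, purity


# Toy labels, invented for illustration only.
is_member = [True, True, True, False, False]
selected = [True, True, False, True, False]
print(completeness_and_purity(is_member, selected))
```

Under these definitions the two quantities correspond to what classification work usually calls recall and precision.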

Machine learning-based patient classification system for adults with stroke: A systematic review

Chronic Illness ◽

10.1177/17423953211067435 ◽

2021 ◽

pp. 174239532110674

Author(s):

Suebsarn Ruksakulpiwat ◽

Witchuda Thongking ◽

Wendie Zhou ◽

Chitchanok Benjasirisan ◽

Lalipat Phianhasin ◽

...

Keyword(s):

Machine Learning ◽

Systematic Review ◽

Classification System ◽

Nearest Neighbor ◽

Gradient Boosting ◽

Support Vector ◽

K Nearest Neighbor ◽

Optimal Outcomes ◽

And Gender ◽

Meta Analyses

Objective: To evaluate the existing evidence on machine learning-based classification systems that stratify patients with stroke. Methods: The authors carried out a systematic review following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) recommendations. PubMed, MEDLINE, Web of Science, and CINAHL Plus Full Text were searched from January 2015 to February 2021. Results: Twelve studies were included in this systematic review, and fifteen algorithms were used across them. The most common form of machine learning (ML) used to classify stroke patients was the support vector machine (SVM) (n = 8 studies), followed by random forest (RF) (n = 7 studies), decision tree (DT) (n = 4 studies), gradient boosting (GB) (n = 4 studies), neural networks (NNs) (n = 3 studies), deep learning (n = 2 studies), and k-nearest neighbor (k-NN) (n = 2 studies). Forty-four input features were used in the included studies, with age and gender being the most common features in the ML models. Discussion: No single algorithm performed better or worse than all others at classifying patients with stroke, in part because different input data require different algorithms to achieve optimal outcomes.

Prediction of Long-Term Stroke Recurrence Using Machine Learning Models

Journal of Clinical Medicine ◽

10.3390/jcm10061286 ◽

2021 ◽

Vol 10 (6) ◽

pp. 1286

Author(s):

Vida Abedi ◽

Venkatesh Avula ◽

Durgesh Chaudhary ◽

Shima Shahjouei ◽

Ayesha Khan ◽

...

Keyword(s):

Machine Learning ◽

Ischemic Stroke ◽

Performance Metrics ◽

Gradient Boosting ◽

Stroke Recurrence ◽

Support Vector ◽

Sampling Strategies ◽

Specificity And Sensitivity ◽

Extreme Gradient Boosting

Background: The long-term risk of recurrent ischemic stroke, estimated to be between 17% and 30%, cannot be reliably assessed at an individual level. Our goal was to study whether machine learning models can be trained to predict stroke recurrence, to identify key clinical variables, and to assess whether performance metrics can be optimized. Methods: We used patient-level data from electronic health records, six interpretable algorithms (Logistic Regression, Extreme Gradient Boosting, Gradient Boosting Machine, Random Forest, Support Vector Machine, Decision Tree), four feature selection strategies, five prediction windows, and two sampling strategies to develop 288 models for up to 5-year stroke recurrence prediction. We further identified important clinical features and different optimization strategies. Results: We included 2091 ischemic stroke patients. Model area under the receiver operating characteristic curve (AUROC) was stable for prediction windows of 1, 2, 3, 4, and 5 years, with the highest score for the 1-year (0.79) and the lowest score for the 5-year prediction window (0.69). A total of 21 (7%) models reached an AUROC above 0.73, while 110 (38%) models reached an AUROC greater than 0.7. Among the 53 features analyzed, age, body mass index, and laboratory-based features (such as high-density lipoprotein, hemoglobin A1c, and creatinine) had the highest overall importance scores. The balance between specificity and sensitivity improved through sampling strategies. Conclusion: All six selected algorithms could be trained to predict long-term stroke recurrence, and laboratory-based variables were highly associated with stroke recurrence. The latter could be targeted for personalized interventions. Model performance metrics could be optimized, and models can be implemented in the same healthcare system as intelligent decision support for targeted intervention.
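For readers unfamiliar with the AUROC values reported above, the metric can be sketched directly from its probabilistic interpretation: the chance that a randomly chosen positive case receives a higher score than a randomly chosen negative one, with ties counted as half. This is an illustrative pairwise computation on made-up scores, not the authors' evaluation pipeline.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Hypothetical recurrence labels (1 = recurrence) and model scores.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(auroc(labels, scores))  # 0.75
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, which puts the reported range of 0.69 to 0.79 in context.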

Predict Health Insurance Cost by using Machine Learning and DNN Regression Models

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.c8364.0110321 ◽

2021 ◽

Vol 10 (2) ◽

pp. 137-143

Author(s):

Mohamed Hanafy ◽

Omar M. A. Mahmoud

Keyword(s):

Machine Learning ◽

Insurance Industry ◽

Additive Model ◽

Policy Formulation ◽

Stochastic Gradient ◽

Gradient Boosting ◽

Support Vector ◽

K Nearest Neighbors ◽

Stochastic Gradient Boosting ◽

Insurance Cost

Insurance is a policy that eliminates or decreases the cost of losses incurred from various risks. Various factors influence the cost of insurance, and these considerations contribute to insurance policy formulation. Machine learning (ML) can make the wording of insurance policies in the insurance industry more efficient. This study demonstrates how different regression models can forecast insurance costs. We compare the results of several models: Multiple Linear Regression, Generalized Additive Model, Support Vector Machine, Random Forest Regressor, CART, XGBoost, k-Nearest Neighbors, Stochastic Gradient Boosting, and Deep Neural Network. The Stochastic Gradient Boosting model gave the best results, with an MAE value of 0.17448, RMSE value of 0.38018, and R-squared value of 85.8295.

Machine learning in perioperative medicine: a systematic review

Journal of Anesthesia, Analgesia and Critical Care ◽

10.1186/s44158-022-00033-y ◽

2022 ◽

Vol 2 (1) ◽

Author(s):

Valentina Bellini ◽

Marina Valente ◽

Giorgia Bertorelli ◽

Barbara Pifferi ◽

Michelangelo Craca ◽

...

Keyword(s):

Machine Learning ◽

Systematic Review ◽

Health Care ◽

Random Forest ◽

Risk Stratification ◽

Prediction Models ◽

Cochrane Library ◽

Gradient Boosting ◽

Support Vector ◽

Systemic Complications

Background: Risk stratification plays a central role in anesthetic evaluation. The use of Big Data and machine learning (ML) offers considerable advantages for the collection and evaluation of large amounts of complex health-care data. We conducted a systematic review to understand the role of ML in the development of predictive post-surgical outcome models and risk stratification. Methods: Following the Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA) guidelines, we searched for studies from 1 January 2015 to 30 March 2021. A systematic search in Scopus, CINAHL, the Cochrane Library, PubMed, and MeSH databases was performed; the search strings included different combinations of keywords: “risk prediction,” “surgery,” “machine learning,” “intensive care unit (ICU),” “anesthesia,” and “perioperative.” We identified 36 eligible studies. We evaluated the quality of reporting of prediction models using the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) checklist. Results: The most frequently considered outcomes were mortality risk, systemic complications (pulmonary, cardiovascular, acute kidney injury (AKI), etc.), ICU admission, anesthesiologic risk, and prolonged length of hospital stay. Not all studies fully followed the TRIPOD checklist, but the quality was acceptable overall, with 75% of studies showing an adherence rate to TRIPOD of more than 60%. The most frequently used algorithms were gradient boosting (n = 13), random forest (n = 10), logistic regression (LR; n = 7), artificial neural networks (ANNs; n = 6), and support vector machines (SVM; n = 6). The models with the best performance were random forest and gradient boosting, with AUC > 0.90. Conclusions: The application of ML in medicine appears to have great potential. From our analysis, depending on the input features considered and on the specific prediction task, ML algorithms appear to predict outcomes more accurately than validated prognostic scores and traditional statistics. Thus, our review encourages the healthcare domain and artificial intelligence (AI) developers to adopt an interdisciplinary and systemic approach to evaluating the overall impact of AI on perioperative risk assessment and on other health care settings as well.

Predicting Safe Parking Spaces: A Machine Learning Approach to Geospatial Urban and Crime Data

Sustainability ◽

10.3390/su11102848 ◽

2019 ◽

Vol 11 (10) ◽

pp. 2848 ◽

Cited By ~ 1

Author(s):

Irina Matijosaitiene ◽

Anthony McDowald ◽

Vishal Juneja

Keyword(s):

Machine Learning ◽

Linear Regression ◽

Prediction Model ◽

Linear Models ◽

Hot Spot ◽

Elastic Net ◽

Motor Vehicles ◽

Gradient Boosting ◽

Support Vector ◽

Stochastic Gradient Boosting

This research aims to identify spatial and temporal patterns of theft in Manhattan, NY, to reveal urban factors that contribute to thefts from motor vehicles, and to build a prediction model for thefts. Methods include time series and hot spot analysis, linear regression, elastic-net, support vector machines (SVM) with radial and linear kernels, decision tree, bagged CART, random forest, and stochastic gradient boosting. The machine learning methods reveal that linear models (linear regression, elastic-net) perform better on our data, indicating that a higher number of subway entrances, graffiti, and restaurants on streets contributes to higher rates of theft from motor vehicles. Although the prediction model for thefts meets almost all assumptions (five of six), its accuracy is 77%, suggesting that other undiscovered factors contribute to the generation of thefts. To demonstrate the final results, an application prototype for finding safer parking in Manhattan, NY, based on the prediction model, has been developed.

Prediction of E. coli Concentrations in Agricultural Pond Waters: Application and Comparison of Machine Learning Algorithms

Frontiers in Artificial Intelligence ◽

10.3389/frai.2021.768650 ◽

2022 ◽

Vol 4 ◽

Author(s):

Matthew D. Stocker ◽

Yakov A. Pachepsky ◽

Robert L. Hill

Keyword(s):

Machine Learning ◽

Water Quality ◽

Quality Parameters ◽

Machine Learning Algorithms ◽

Water Quality Parameters ◽

Gradient Boosting ◽

Support Vector ◽

E Coli ◽

Stochastic Gradient Boosting ◽

Significant Difference

The microbial quality of irrigation water is an important issue as the use of contaminated waters has been linked to several foodborne outbreaks. To expedite microbial water quality determinations, many researchers estimate concentrations of the microbial contamination indicator Escherichia coli (E. coli) from the concentrations of physicochemical water quality parameters. However, these relationships are often non-linear and exhibit changes above or below certain threshold values. Machine learning (ML) algorithms have been shown to make accurate predictions in datasets with complex relationships. The purpose of this work was to evaluate several ML models for the prediction of E. coli in agricultural pond waters. Two ponds in Maryland were monitored from 2016 to 2018 during the irrigation season. E. coli concentrations along with 12 other water quality parameters were measured in water samples. The resulting datasets were used to predict E. coli using stochastic gradient boosting (SGB) machines, random forest (RF), support vector machines (SVM), and k-nearest neighbor (kNN) algorithms. The RF model provided the lowest RMSE value for predicted E. coli concentrations in both ponds in individual years and over consecutive years in almost all cases. For individual years, the RMSE of the predicted E. coli concentrations (log10 CFU 100 ml−1) ranged from 0.244 to 0.346 and 0.304 to 0.418 for Pond 1 and 2, respectively. For the 3-year datasets, these values were 0.334 and 0.381 for Pond 1 and 2, respectively. In most cases there was no significant difference (P > 0.05) between the RMSE of RF and other ML models when these RMSE were treated as statistics derived from 10-fold cross-validation performed with five repeats. Important E. coli predictors were turbidity, dissolved organic matter content, specific conductance, chlorophyll concentration, and temperature. Model predictive performance did not significantly differ when 5 predictors were used vs. 8 or 12, indicating that more tedious and costly measurements provide no substantial improvement in the predictive accuracy of the evaluated algorithms.
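The resampling scheme mentioned above, 10-fold cross-validation with five repeats, can be sketched as an index generator. This is a minimal stdlib-only illustration; the function name, the seed, and the sample count are assumptions, and the abstract does not describe how the authors actually implemented their splits.

```python
import random


def repeated_kfold_indices(n_samples, n_folds=10, n_repeats=5, seed=0):
    """Yield (train_idx, test_idx) pairs for repeated k-fold cross-validation.

    Each repeat reshuffles the sample indices, then deals them into
    n_folds disjoint test sets; the remaining samples form the training set.
    """
    rng = random.Random(seed)
    for _ in range(n_repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)
        for fold in range(n_folds):
            test_idx = indices[fold::n_folds]  # every n_folds-th shuffled index
            test_set = set(test_idx)
            train_idx = [i for i in indices if i not in test_set]
            yield train_idx, test_idx


# Hypothetical dataset of 100 water samples.
splits = list(repeated_kfold_indices(n_samples=100))
print(len(splits))  # 50 train/test splits: 10 folds x 5 repeats
```

Fitting a model on each training set and scoring it on the matching test set would give 50 RMSE values per model, which is the kind of sample a significance comparison between models can be run on.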

Machine learning techniques for mortality prediction in emergency departments: a systematic review

BMJ Open ◽

10.1136/bmjopen-2021-052663 ◽

2021 ◽

Vol 11 (11) ◽

pp. e052663

Author(s):

Amin Naemi ◽

Thomas Schmidt ◽

Marjan Mansourvar ◽

Mohammad Naghavi-Behzad ◽

Ali Ebrahimi ◽

...

Keyword(s):

Machine Learning ◽

Systematic Review ◽

Neural Networks ◽

Hospital Mortality ◽

Emergency Departments ◽

Missing Values ◽

Data Extraction ◽

Risk Of Bias ◽

Gradient Boosting ◽

Support Vector

Objectives: This systematic review aimed to assess the performance and clinical feasibility of machine learning (ML) algorithms in prediction of in-hospital mortality for medical patients using vital signs at emergency departments (EDs). Design: A systematic review was performed. Setting: The databases Medline (PubMed), Scopus and Embase (Ovid) were searched between 2010 and 2021 to extract published articles in English describing ML-based models utilising vital sign variables to predict in-hospital mortality for patients admitted at EDs. The critical appraisal and data extraction for systematic reviews of prediction modelling studies checklist was used for study planning and data extraction. The risk of bias for included papers was assessed using the prediction risk of bias assessment tool. Participants: Patients admitted to the ED. Main outcome measure: In-hospital mortality. Results: Fifteen articles were included in the final review. We found that eight models, including logistic regression, decision tree, K-nearest neighbours, support vector machine, gradient boosting, random forest, artificial neural networks and deep neural networks, have been applied in this domain. Most studies failed to report essential main analysis steps such as data preprocessing and handling missing values. Fourteen included studies had a high risk of bias in the statistical analysis part, which could lead to poor performance in practice. Although the main aim of all studies was developing a predictive model for mortality, nine articles did not provide a time horizon for the prediction. Conclusion: This review provided an updated overview of the state-of-the-art and revealed research gaps; based on these, we provide eight recommendations for future studies to make the use of ML more feasible in practice. By following these recommendations, we expect to see more robust ML models applied in the future to help clinicians identify patient deterioration earlier.
