What can machines learn about heart failure? A systematic literature review

Author(s): A. Jasinska-Piadlo, R. Bond, P. Biglarbeigi, R. Brisk, P. Campbell, ...

Abstract: This paper presents a systematic literature review of the application of data science and machine learning (ML) to heart failure (HF) datasets, with the intention of generating both a synthesis of relevant findings and a critical evaluation of approaches, applicability and accuracy in order to inform future work in this field. Particular attention is given to ways in which the low uptake of ML techniques within clinical practice could be addressed. Literature searches were performed on the Scopus, ProQuest and Ovid MEDLINE databases (2014-2021). Search terms included ‘heart failure’ or ‘cardiomyopathy’ combined with ‘machine learning’, ‘data analytics’, ‘data mining’ or ‘data science’. Of 1,688 articles retrieved, 81 were included in the review. The majority of studies were retrospective cohort studies. The median size of the patient cohort across all studies was 1,944 (min. 46, max. 93,260). The largest patient samples were used in readmission prediction models, with a median sample size of 5,676 (min. 380, max. 93,260). Machine learning methods focused on common HF problems: detection of HF from available datasets, prediction of hospital readmission following index hospitalization, mortality prediction, classification and clustering of HF cohorts into subgroups with distinctive features, and response to HF treatment. The most common ML methods used were logistic regression, decision trees, random forests and support vector machines. Information on validation of models was scarce. Based on the authors’ affiliations, there was a median 3:1 ratio of IT specialists to clinicians. Over half of the studies were co-authored by a collaboration of medical and IT specialists; approximately 25% of papers were authored solely by IT specialists who did not seek clinical input in data interpretation. The application of ML to datasets, in particular clustering methods, enabled the development of classification models that assist in assessing the outcomes of patients with HF. There is, however, a tendency to over-claim the potential usefulness of ML models for clinical practice. The next body of work required in this research discipline is the design of randomised controlled trials (RCTs) with ML used in an intervention arm, in order to prospectively validate these algorithms for real-world clinical utility.
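To make the validation gap the review highlights concrete, below is a minimal sketch of a reproducible evaluation for a readmission model: cross-validated discrimination for a logistic regression classifier. The cohort and the column names (`ejection_fraction`, `nt_probnp`, `readmitted_30d`) are synthetic, illustrative assumptions, not data from any reviewed study.

```python
# A hedged sketch, assuming a tabular HF cohort with a binary 30-day
# readmission label; all data below is synthetic stand-in data.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "age": rng.normal(70, 10, 500),
    "ejection_fraction": rng.normal(40, 12, 500),
    "nt_probnp": rng.lognormal(7, 1, 500),
})
# Synthetic label: readmission risk rises with age and falls with EF.
logit = 0.05 * (df["age"] - 70) - 0.06 * (df["ejection_fraction"] - 40)
df["readmitted_30d"] = (rng.random(500) < 1 / (1 + np.exp(-logit))).astype(int)

X, y = df.drop(columns="readmitted_30d"), df["readmitted_30d"]
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
aucs = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
print(f"5-fold cross-validated AUC: {aucs.mean():.3f} +/- {aucs.std():.3f}")
```

Reporting a cross-validated (or, better, externally validated) AUC rather than a single training-set figure is exactly the kind of validation information the review found scarce.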

PLoS ONE, 2020, Vol 15 (1), pp. e0224135
Author(s): Gian Luca Di Tanna, Heidi Wirtz, Karen L. Burrows, Gary Globe

Blood, 2019, Vol 134 (Supplement_1), pp. 5799-5799
Author(s): Benjamin Djulbegovic, Jennifer Berano Teh, Lennie Wong, Iztok Hozo, Saro H. Armenian

Background: Therapy-related heart failure (HF) is a leading cause of morbidity and mortality in patients who undergo successful hematopoietic cell transplantation (HCT) for hematological malignancies. To eventually improve the management of HF, timely and accurate diagnosis is crucial; currently, no established method for the diagnosis of HF exists. One approach to improving diagnosis and management is predictive modeling: assessing the likelihood of HF from key predictors known to be associated with it. Such models can, in turn, be used for bedside management, including the implementation of early screening. Many techniques for predictive modeling exist, however, and it is not known whether artificial intelligence/machine learning (ML) approaches are superior to standard statistical techniques for developing predictive models for clinical practice. Here we present a comparative analysis of traditional multivariable models against ML predictive models, in an attempt to identify the best predictive model for the diagnosis of HF after HCT.

Methods: At City of Hope, we have established a large prospective cohort (>12,000 patients) of HCT survivors (the HCT survivorship registry). This registry is dynamic and interfaces with other registries and databases (e.g., electronically indexed inpatient and outpatient medical records, the national death index [NDI]). We used natural language processing (NLP) to extract 13 key demographic and clinical variables known to be associated with HF. For this project, we extracted data from 1,834 patients (~15% sample) who underwent HCT between 1994 and 2004, allowing adequate follow-up for the development of HF. We fit and compared six models: standard logistic regression (glm), a fast-and-frugal tree (FFT) decision model, and four ML models: classification and regression trees (CART), support vector machine (SVM), neural network (NN) and random forest (RF). Data were randomly split (50:50) into training and validation samples; the ultimate assessment of the best algorithm was based on its performance, in terms of calibration and discrimination, in the validation sample. The DeLong test was used to test for statistical differences in discrimination [i.e., area under the curve (AUC)] among the models.

Results: The accuracy of NLP was consistently >95%. Only the standard logistic regression (glm) model was well calibrated (Hosmer-Lemeshow goodness-of-fit test: p=0.104); all other models were miscalibrated. The glm model also had the best discrimination (AUC=0.704 in the training set and 0.619 in the validation set). CART performed worst (AUC=0.5). The other ML models (RF, NN and SVM) also showed modest discriminatory characteristics (AUCs of 0.547, 0.573 and 0.619, respectively). The DeLong test indicated that all models outperformed the CART model (at nominal p<0.05) but were statistically indistinguishable from each other (see Figure). Statistical power was borderline sufficient for the glm model and very limited for the ML models.

Conclusions: None of the tested models showed performance characteristics adequate for use in clinical practice. The ML models performed even worse than the standard logistic model; given the increasing use of ML models in medicine, we caution against the use of these models without adequate comparative testing.

Disclosures: No relevant conflicts of interest to declare.
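A hedged sketch of the model-comparison workflow described above: a 50:50 train/validation split and validation-set AUC for a glm baseline against the four ML model families (the FFT model, the 13 NLP-extracted predictors, the Hosmer-Lemeshow calibration test and the DeLong test are not reproduced here; the data are synthetic).

```python
# Sketch: compare standard logistic regression against ML classifiers
# on a synthetic, imbalanced dataset with a 50:50 train/validation split.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# 1,834 records and 13 predictors to mirror the study's dimensions only.
X, y = make_classification(n_samples=1834, n_features=13,
                           weights=[0.9], random_state=1)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.5,
                                          stratify=y, random_state=1)

models = {
    "glm":  LogisticRegression(max_iter=1000),
    "CART": DecisionTreeClassifier(max_depth=4),
    "SVM":  SVC(probability=True),
    "NN":   MLPClassifier(max_iter=500),
    "RF":   RandomForestClassifier(n_estimators=300),
}
for name, m in models.items():
    m.fit(X_tr, y_tr)
    auc = roc_auc_score(y_va, m.predict_proba(X_va)[:, 1])
    print(f"{name}: validation AUC = {auc:.3f}")
```

On tabular clinical data with a modest number of predictors, a well-specified logistic model is often competitive with more flexible learners, which is consistent with the abstract's conclusion.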


Author(s): Kerim Koc, Asli Pelin Gurgun

Despite significant improvements in safety management practices, the construction industry remains among the most hazardous industries, so there is an essential need to reduce the number of construction accidents through prediction models. In this context, machine learning (ML) methods are used extensively in the construction safety literature to predict several outcomes of construction accidents. This study provides a review of ML applications in the construction safety literature to illustrate directions for future research. Based on the literature review, 43 journal articles were investigated in depth, and the articles were classified according to six features: journal, year, adopted machine learning methods, model development approach, utilized dataset, and sub-topic. The findings show that prediction models in construction safety have attracted considerable attention recently. Linear regression and logistic regression were typically used as benchmark models, while support vector machine and decision tree were the most frequently implemented ML methods. The number of publications addressing classification problems is twice that of publications adopting regression models. The data used were mainly captured from national databases or construction companies. Severity evaluation of construction accidents was the most widely investigated sub-topic, while there are gaps in the literature on the effects of culture on accident outcomes and on conflicts, claims and nonconformance. The findings of this study can provide researchers with valuable information on trends in the construction safety literature.
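As a concrete illustration of the benchmarking pattern the review reports, here is a small sketch comparing a logistic regression baseline against the two most frequently used ML methods (SVM and decision tree) on an accident-severity classification task. The four severity classes and the feature matrix are synthetic placeholders, not data from any reviewed study.

```python
# Sketch: benchmark-vs-candidates pattern for accident severity prediction.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Stand-in for accident records (e.g., trade, work height, weather)
# mapped to four hypothetical severity classes.
X, y = make_classification(n_samples=1000, n_features=10, n_informative=6,
                           n_classes=4, random_state=0)

benchmark = LogisticRegression(max_iter=1000)  # common baseline in this literature
candidates = {"SVM": SVC(), "DecisionTree": DecisionTreeClassifier(max_depth=5)}

print("baseline accuracy:", cross_val_score(benchmark, X, y, cv=5).mean().round(3))
for name, model in candidates.items():
    print(f"{name} accuracy:", cross_val_score(model, X, y, cv=5).mean().round(3))
```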


PLoS ONE, 2020, Vol 15 (7), pp. e0235970
Author(s): Gian Luca Di Tanna, Heidi Wirtz, Karen L. Burrows, Gary Globe

Author(s): Wen-Yu Ou Yang, Cheng-Chien Lai, Meng-Ting Tsou, Lee-Ching Hwang

Osteoporosis is treatable but often overlooked in clinical practice. We aimed to construct prediction models with machine learning algorithms to serve as screening tools for osteoporosis in adults over fifty years old, and to compare the performance of the newly developed models with traditional prediction models. Data were acquired from community-dwelling participants enrolled in health checkup programs at a medical center in Taiwan. A total of 3,053 men and 2,929 women were included. Models were constructed separately for men and women with an artificial neural network (ANN), support vector machine (SVM), random forest (RF), k-nearest neighbors (KNN), and logistic regression (LoR) to predict the presence of osteoporosis. The area under the receiver operating characteristic curve (AUROC) was used to compare the performance of the models. We achieved AUROCs of 0.837, 0.840, 0.843, 0.821, and 0.827 in men, and 0.781, 0.807, 0.811, 0.767, and 0.772 in women, for the ANN, SVM, RF, KNN, and LoR models, respectively. The ANN, SVM, RF, and LoR models in men, and the ANN, SVM, and RF models in women, performed significantly better than the traditional Osteoporosis Self-Assessment Tool for Asians (OSTA) model. We have demonstrated that machine learning algorithms improve the performance of screening for osteoporosis. By incorporating these models into clinical practice, patients could potentially benefit from earlier diagnosis and treatment of osteoporosis.
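The ML-versus-OSTA comparison can be sketched as follows. The commonly cited OSTA formula, 0.2 × (weight in kg − age in years), is stated here from general knowledge rather than from the paper, and the cohort below is synthetic; a real evaluation would use measured bone-density labels.

```python
# Sketch: AUROC of the OSTA index vs. a random forest on synthetic data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 2000
age = rng.uniform(50, 90, n)
weight = rng.normal(62, 10, n)
# Synthetic label: osteoporosis risk rises with age and falls with weight.
p = 1 / (1 + np.exp(-(0.08 * (age - 70) - 0.05 * (weight - 62))))
y = rng.binomial(1, p)

X = np.column_stack([age, weight])
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Assumed OSTA formula; lower scores indicate higher risk, hence the sign flip.
osta = 0.2 * (X_te[:, 1] - X_te[:, 0])
print("OSTA AUROC:", roc_auc_score(y_te, -osta).round(3))

rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_tr, y_tr)
print("RF   AUROC:", roc_auc_score(y_te, rf.predict_proba(X_te)[:, 1]).round(3))
```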


2019, Vol 21 (9), pp. 662-669
Author(s): Junnan Zhao, Lu Zhu, Weineng Zhou, Lingfeng Yin, Yuchen Wang, ...

Background: Thrombin is the central protease of the vertebrate blood coagulation cascade and is closely related to cardiovascular diseases. The inhibitory constant Ki is the most significant property of thrombin inhibitors. Method: This study was carried out to predict the Ki values of thrombin inhibitors from a large dataset using machine learning methods. Because machine learning can find non-intuitive regularities in high-dimensional datasets, it can be used to build effective predictive models. A total of 6,554 descriptors were collected for each compound, and an efficient descriptor selection method was chosen to find the appropriate descriptors. Four methods, multiple linear regression (MLR), k-nearest neighbors (KNN), gradient boosting regression tree (GBRT) and support vector machine (SVM), were implemented to build prediction models with the selected descriptors. Results: The SVM model was the best of these methods, with R2=0.84 and MSE=0.55 for the training set and R2=0.83 and MSE=0.56 for the test set. Several validation methods, such as the y-randomization test and applicability domain evaluation, were adopted to assess the robustness and generalization ability of the model. The final model shows excellent stability and predictive ability and can be employed for rapid estimation of the inhibitory constant, which is helpful for designing novel thrombin inhibitors.
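The descriptor-selection-plus-SVM workflow can be sketched as below. The descriptor matrix is random stand-in data (not the study's 6,554 descriptors), and univariate F-score selection is a simple stand-in for whatever selection method the authors actually used.

```python
# Sketch of a QSAR-style pipeline: select descriptors, scale, fit SVR,
# and report R2/MSE on training and test sets. All data is synthetic.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(7)
X = rng.normal(size=(1000, 500))  # compounds x molecular descriptors
# Synthetic target standing in for Ki: driven by the first 10 descriptors.
ki = X[:, :10].sum(axis=1) + rng.normal(scale=0.5, size=1000)

X_tr, X_te, y_tr, y_te = train_test_split(X, ki, random_state=0)
model = make_pipeline(
    SelectKBest(f_regression, k=50),  # crude stand-in for descriptor selection
    StandardScaler(),
    SVR(C=10.0),
)
model.fit(X_tr, y_tr)
for name, Xs, ys in [("train", X_tr, y_tr), ("test", X_te, y_te)]:
    pred = model.predict(Xs)
    print(f"{name}: R2={r2_score(ys, pred):.2f}, "
          f"MSE={mean_squared_error(ys, pred):.2f}")
```

Fitting the selector inside the pipeline (rather than before the split) keeps the test set out of descriptor selection, which is essential for an honest R2.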


2021, Vol 21 (2), pp. 1-31
Author(s): Bjarne Pfitzner, Nico Steckhan, Bert Arnrich

Data privacy is a critical concern, especially in fields like medicine, where it is paramount to abide by existing privacy regulations and preserve patients’ anonymity. However, data is required for research and for training machine learning models that could help uncover complex correlations or personalised treatments that might otherwise stay undiscovered. Such models generally scale with the amount of data available, but current constraints often prohibit building large databases across sites, so it would be beneficial to combine similar or related data from different sites all over the world while still preserving data privacy. Federated learning has been proposed as a solution, because it relies on sharing machine learning models instead of the raw data itself; private data therefore never leaves the site or device on which it was collected. Federated learning is an emerging research area, and many domains have been identified for the application of its methods. This systematic literature review provides an extensive look at the concept of, and research into, federated learning and its applicability to confidential healthcare datasets.
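To make the core idea concrete, here is a minimal federated-averaging (FedAvg-style) sketch in plain NumPy, under the assumption of three sites with equally sized synthetic datasets: each site trains a logistic model locally, and only the model weights, never the data, are shared and averaged.

```python
# Minimal federated averaging sketch: private data stays at each site;
# only weight vectors travel to the server for averaging.
import numpy as np

rng = np.random.default_rng(0)

def local_sgd(w, X, y, lr=0.1, epochs=5):
    """A few epochs of logistic-regression gradient descent on one
    site's private data, starting from the current global weights."""
    for _ in range(epochs):
        p = 1 / (1 + np.exp(-X @ w))
        w = w - lr * X.T @ (p - y) / len(y)
    return w

# Three hospitals, each holding private data that never leaves the site.
sites = []
for _ in range(3):
    X = rng.normal(size=(200, 5))
    y = (X @ np.array([1.0, -2.0, 0.5, 0.0, 1.0])
         + rng.normal(size=200) > 0).astype(float)
    sites.append((X, y))

w_global = np.zeros(5)
for _ in range(20):  # communication rounds
    # Each site trains locally from the current global model...
    local_weights = [local_sgd(w_global.copy(), X, y) for X, y in sites]
    # ...and the server averages the weights (plain mean: equal site sizes).
    w_global = np.mean(local_weights, axis=0)

print("global model weights:", np.round(w_global, 2))
```

Real deployments weight the average by site sample size and add protections such as secure aggregation or differential privacy, since raw model updates can still leak information.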


2021, pp. 097215092098485
Author(s): Sonika Gupta, Sushil Kumar Mehta

Data mining techniques have proven quite effective not only in detecting financial statement fraud but also in discovering other financial crimes, such as credit card fraud, loan and security fraud, corporate fraud, and bank and insurance fraud. In recent years, classification using data mining techniques has been accepted as one of the most credible methodologies for detecting symptoms of financial statement fraud by scanning the published financial statements of companies. The retrieved literature that has used data mining classification techniques can be broadly categorized, on the basis of the type of technique applied, into statistical techniques and machine learning techniques. The biggest challenge in executing the classification process with data mining techniques lies in collecting a data sample of fraudulent companies and mapping that sample against non-fraudulent companies. In this article, a systematic literature review (SLR) of studies in the area of financial statement fraud detection has been conducted, covering research articles published between 1995 and 2020. Further, a meta-analysis has been performed to establish the effect of mapping fraudulent companies against non-fraudulent companies on the classification methods, by comparing the overall classification accuracy reported in the literature. The retrieved literature indicates that a fraudulent sample can either be paired equally with a non-fraudulent sample (1:1 data mapping) or be mapped unequally using a 1:many ratio to increase the sample size proportionally. Based on the meta-analysis of the research articles, it can be concluded that machine learning approaches, in comparison to statistical approaches, can achieve better classification accuracy, particularly when the availability of sample data is low. High classification accuracy can be obtained even with a 1:1 mapped data set using machine learning classification approaches.
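The 1:1 data-mapping idea amounts to undersampling the majority class so that each fraudulent firm is paired with one non-fraudulent firm before training. A hedged sketch, with synthetic data standing in for real financial statement features:

```python
# Sketch: 1:1 sample mapping (majority-class undersampling) before
# training a classifier on a fraud-vs-non-fraud task. Data is synthetic.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import train_test_split

# Imbalanced population: ~5% "fraudulent" firms (class 1).
X, y = make_classification(n_samples=5000, weights=[0.95], random_state=3)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=3)

# 1:1 mapping: keep every fraud case, sample an equal number of non-fraud.
rng = np.random.default_rng(3)
fraud = np.flatnonzero(y_tr == 1)
nonfraud = rng.choice(np.flatnonzero(y_tr == 0), size=len(fraud), replace=False)
idx = np.concatenate([fraud, nonfraud])

clf = RandomForestClassifier(random_state=3).fit(X_tr[idx], y_tr[idx])
print("balanced accuracy (1:1-trained model):",
      round(balanced_accuracy_score(y_te, clf.predict(X_te)), 3))
```

Evaluating on the untouched, still-imbalanced test set (not on the 1:1 sample) is what keeps the reported accuracy honest.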


2021, Vol 10 (4), pp. 199
Author(s): Francisco M. Bellas Aláez, Jesus M. Torres Palenzuela, Evangelos Spyrakos, Luis González Vilas

This work presents new prediction models based on recent developments in machine learning methods, such as Random Forest (RF) and AdaBoost, and compares them with more classical approaches, i.e., support vector machines (SVMs) and neural networks (NNs). The models predict Pseudo-nitzschia spp. blooms in the Galician Rias Baixas. This work builds on a previous study by the authors (doi.org/10.1016/j.pocean.2014.03.003) but uses an extended database (from 2002 to 2012) and new algorithms. Our results show that RF and AdaBoost provide better prediction results than SVMs and NNs, with improved performance metrics and a better balance between sensitivity and specificity. The classical machine learning approaches show higher sensitivities, but at the cost of lower specificity and a higher percentage of false alarms (lower precision). These results suggest that the newer algorithms (RF and AdaBoost) adapt better to unbalanced datasets. Our models could be operationally implemented to establish a short-term prediction system.
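The sensitivity/specificity trade-off on an unbalanced bloom dataset can be illustrated with a short sketch; the oceanographic predictors are replaced by synthetic data, and the default hyperparameters below are an assumption rather than the study's settings.

```python
# Sketch: RF and AdaBoost vs. SVM on an imbalanced binary problem
# (bloom / no bloom), reporting sensitivity and specificity per model.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Imbalanced synthetic data: blooms (class 1) are the rare event.
X, y = make_classification(n_samples=3000, weights=[0.92], random_state=5)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=5)

for name, clf in [("RF", RandomForestClassifier(random_state=5)),
                  ("AdaBoost", AdaBoostClassifier(random_state=5)),
                  ("SVM", SVC())]:
    clf.fit(X_tr, y_tr)
    tn, fp, fn, tp = confusion_matrix(y_te, clf.predict(X_te)).ravel()
    print(f"{name}: sensitivity={tp / (tp + fn):.2f}, "
          f"specificity={tn / (tn + fp):.2f}")
```

Reporting both metrics side by side, rather than accuracy alone, is what reveals the balance the authors attribute to the ensemble methods.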

