Diagnostic Predictive Model for Diagnosis of Heart Failure after Hematopoietic Cell Transplantation (HCT): Comparison of Traditional Statistical with Machine Learning Modeling

Blood ◽  
2019 ◽  
Vol 134 (Supplement_1) ◽  
pp. 5799-5799
Author(s):  
Benjamin Djulbegovic ◽  
Jennifer Berano Teh ◽  
Lennie Wong ◽  
Iztok Hozo ◽  
Saro H. Armenian

Background: Therapy-related heart failure (HF) is a leading cause of morbidity and mortality in patients who undergo successful HCT for hematological malignancies. Timely and accurate diagnosis of HF is crucial to improving its management, yet no established method for the diagnosis of HF currently exists. One approach to improving the diagnosis and management of HF is predictive modeling: assessing the likelihood of HF from key predictors known to be associated with it. Such models can, in turn, be used for bedside management, including the implementation of early screening. That said, many techniques for predictive modeling exist, and it is not known whether machine learning (ML) approaches are superior to standard statistical techniques for developing predictive models for clinical practice. Here we present a comparative analysis of traditional multivariable models and ML predictive models in an attempt to identify the best predictive model for the diagnosis of HF after HCT. Methods: At City of Hope, we have established a large prospective cohort (>12,000 patients) of HCT survivors (HCT survivorship registry). This registry is dynamic and interfaces with other registries and databases (e.g., electronically indexed inpatient and outpatient medical records, the National Death Index [NDI]). We used natural language processing (NLP) to extract 13 key demographic and clinical variables known to be associated with HF. For this project, we extracted data from 1,834 patients (~15% sample) who underwent HCT between 1994 and 2004, allowing adequate follow-up for the development of HF. We fit and compared 6 models [standard logistic regression (glm), an FFT (fast-and-frugal tree) decision model, and four ML models: CART (classification and regression trees), SVM (support vector machine), NN (neural network) and RF (random forest)].
Data were randomly split (50:50) into training and validation samples; the final assessment of the best algorithm was based on its performance (in terms of calibration and discrimination) in the validation sample. The DeLong test was used to test for statistical differences in discrimination [i.e., area under the curve (AUC)] among the models. Results: The accuracy of NLP was consistently >95%. Only the standard logistic regression (glm) model was well calibrated (Hosmer-Lemeshow goodness-of-fit test: p=0.104); all other models were miscalibrated. The standard glm model also had the best discrimination (AUC=0.704 in the training set and 0.619 in the validation set). CART performed the worst (AUC=0.5). The other ML models (RF, NN, and SVM) also showed only modest discrimination (AUCs of 0.547, 0.573, and 0.619, respectively). The DeLong test indicated that all models outperformed the CART model (at nominal p<0.05) but were statistically indistinguishable from one another (see Figure). Statistical power was borderline sufficient for the glm model and very limited for the ML models. Conclusions: None of the tested models showed performance characteristics adequate for use in clinical practice. The ML models performed even worse than the standard logistic model; given the increasing use of ML models in medicine, we caution against using these models without adequate comparative testing. Disclosures: No relevant conflicts of interest to declare.
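The train/validation comparison described above can be sketched as follows. This is a minimal illustration with synthetic data standing in for the registry cohort and only two of the six models, not the authors' code.

```python
# Sketch of the study's model-comparison workflow (illustrative assumptions:
# synthetic data, two models only; not the authors' implementation).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# 1,834 "patients", 13 predictors, imbalanced outcome
X, y = make_classification(n_samples=1834, n_features=13,
                           weights=[0.9, 0.1], random_state=0)

# 50:50 split into training and validation samples, as in the study design
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.5,
                                          stratify=y, random_state=0)

models = {
    "glm": LogisticRegression(max_iter=1000),
    "RF": RandomForestClassifier(random_state=0),
}
aucs = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    # Discrimination is assessed by AUC on the held-out validation sample
    aucs[name] = roc_auc_score(y_va, model.predict_proba(X_va)[:, 1])
print(aucs)
```

A full reproduction would also check calibration (e.g., a Hosmer-Lemeshow-style comparison of predicted vs. observed event rates by decile) before comparing AUCs.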

2021 ◽  
Vol 11 (19) ◽  
pp. 9292
Author(s):  
Noman Islam ◽  
Asadullah Shaikh ◽  
Asma Qaiser ◽  
Yousef Asiri ◽  
Sultan Almakdi ◽  
...  

In recent years, consuming social media content to keep up with global news, and verifying its authenticity, has become a considerable challenge. Social media enables us to access news easily anywhere, anytime, but it also gives rise to the spread of fake news, delivering false information that has a negative impact on society. It is therefore necessary to determine whether news spreading over social media is real; doing so helps social media users avoid confusion and supports positive social development. This paper proposes a novel solution that detects the authenticity of news through natural language processing techniques. Specifically, it proposes a scheme comprising three steps, namely, stance detection, author credibility verification, and machine learning-based classification, to verify the authenticity of news. In the last stage of the proposed pipeline, several machine learning techniques are applied, such as decision tree, random forest, logistic regression, and support vector machine (SVM) algorithms. For this study, the fake news dataset was taken from Kaggle. The experimental results show an accuracy of 93.15%, precision of 92.65%, recall of 95.71%, and F1-score of 94.15% for the support vector machine algorithm, which outperformed the second-best classifier, logistic regression, by 6.82%.
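As a rough illustration of the final, ML-based classification stage, the sketch below trains an SVM on a TF-IDF representation of toy headlines. The vectorizer, labels, and texts are assumptions for demonstration, not the paper's pipeline or dataset.

```python
# Toy sketch of the SVM classification stage (TF-IDF features and the
# two-line "dataset" are illustrative assumptions, not the paper's setup).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

train_texts = [
    "official sources confirm the report",   # labeled real
    "shocking secret they don't want you to know",  # labeled fake
]
train_labels = [1, 0]  # 1 = real, 0 = fake

clf = make_pipeline(TfidfVectorizer(), LinearSVC())
clf.fit(train_texts, train_labels)
print(clf.predict(["sources confirm the story"]))
```

In the paper's full scheme this classifier would only run after the stance-detection and author-credibility steps have produced their features.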


Author(s):  
A. Jasinska-Piadlo ◽  
R. Bond ◽  
P. Biglarbeigi ◽  
R. Brisk ◽  
P. Campbell ◽  
...  

This paper presents a systematic literature review of the application of data science and machine learning (ML) to heart failure (HF) datasets, with the intention of generating both a synthesis of relevant findings and a critical evaluation of approaches, applicability and accuracy in order to inform future work within this field. The paper pays particular attention to ways in which the low uptake of ML techniques within clinical practice could be resolved. Literature searches were performed on Scopus (2014-2021), ProQuest and Ovid MEDLINE databases (2014-2021). Search terms included ‘heart failure’ or ‘cardiomyopathy’ and ‘machine learning’, ‘data analytics’, ‘data mining’ or ‘data science’. 81 out of 1688 articles were included in the review. The majority of studies were retrospective cohort studies. The median size of the patient cohort across all studies was 1944 (min 46, max 93260). The largest patient samples were used in readmission prediction models, with a median sample size of 5676 (min. 380, max. 93260). Machine learning methods focused on common HF problems: detection of HF from available datasets, prediction of hospital readmission following index hospitalization, mortality prediction, and classification and clustering of HF cohorts into subgroups with distinctive features and responses to HF treatment. The most common ML methods used were logistic regression, decision trees, random forest and support vector machines. Information on validation of models was scarce. Based on the authors’ affiliations, there was a median 3:1 ratio of IT specialists to clinicians. Over half of the studies were co-authored by a collaboration of medical and IT specialists. Approximately 25% of papers were authored solely by IT specialists who did not seek clinical input in data interpretation.
The application of ML to datasets, in particular clustering methods, enabled the development of classification models that assist in assessing the outcomes of patients with HF. There is, however, a tendency to over-claim the potential usefulness of ML models for clinical practice. The next body of work required in this research discipline is the design of randomised controlled trials (RCTs) with ML used in an intervention arm, in order to prospectively validate these algorithms for real-world clinical utility.


2020 ◽  
pp. 016555152093091
Author(s):  
Saeed-Ul Hassan ◽  
Aneela Saleem ◽  
Saira Hanif Soroya ◽  
Iqra Safder ◽  
Sehrish Iqbal ◽  
...  

The purpose of the study is to (a) contribute an annotated Altmetrics dataset across five disciplines, (b) undertake sentiment analysis using various machine learning and natural language processing-based algorithms, (c) identify the best-performing model and (d) provide a Python library for sentiment analysis of an Altmetrics dataset. First, the researchers gave a set of guidelines to two human annotators familiar with the task of annotating tweets related to scientific literature. They labelled the sentiments, achieving an inter-annotator agreement (IAA) of 0.80 (Cohen’s kappa). Then, the same experiments were run on two versions of the dataset: one with tweets in English and the other with tweets in 23 languages, including English. Using 6388 tweets about 300 papers indexed in the Web of Science, the effectiveness of the employed machine learning and natural language processing models was measured against well-known sentiment analysis models, SentiStrength and Sentiment140, as baselines. The results showed that a Support Vector Machine with uni-grams outperformed all the other classifiers and baseline methods employed, with an accuracy of over 85%, followed by Logistic Regression at 83% accuracy and Naïve Bayes at 80%. The precision, recall and F1 scores for Support Vector Machine, Logistic Regression and Naïve Bayes were (0.89, 0.86, 0.86), (0.86, 0.83, 0.80) and (0.85, 0.81, 0.76), respectively.
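The inter-annotator agreement reported above is Cohen's kappa, which corrects raw agreement for agreement expected by chance. A minimal sketch of the computation, on toy labels from two hypothetical annotators:

```python
# Cohen's kappa on toy annotations (the labels below are invented for
# illustration, not drawn from the paper's Altmetrics dataset).
from sklearn.metrics import cohen_kappa_score

annotator_a = ["pos", "neg", "neu", "pos", "neg", "pos"]
annotator_b = ["pos", "neg", "pos", "pos", "neg", "pos"]

# Raw agreement here is 5/6; kappa discounts chance agreement
kappa = cohen_kappa_score(annotator_a, annotator_b)
print(round(kappa, 2))  # → 0.7
```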


2021 ◽  
Vol 11 (12) ◽  
pp. 5727
Author(s):  
Sifat Muin ◽  
Khalid M. Mosalam

Machine learning (ML)-aided structural health monitoring (SHM) can rapidly evaluate the safety and integrity of aging infrastructure following an earthquake. The conventional damage features used in ML-based SHM methodologies face the curse of dimensionality. This paper introduces low-dimensional, cumulative absolute velocity (CAV)-based features to enable the use of ML for rapid damage assessment. A computer experiment is performed to identify the appropriate features and ML algorithm using data from a simulated single-degree-of-freedom system. A comparative analysis of five ML models (logistic regression (LR), ordinal logistic regression (OLR), artificial neural networks with 10 and 100 neurons (ANN10 and ANN100), and support vector machines (SVM)) is performed. Two test sets were used: Set-1 originated from the same distribution as the training set, while Set-2 came from a different distribution. The results showed that the combination of the CAV and the relative CAV with respect to the linear response, i.e., RCAV, performed best among the different feature combinations. Among the ML models, OLR showed good generalization capabilities compared to the SVM and ANN models. Subsequently, OLR was successfully applied to assess the damage of two numerical multi-degree-of-freedom (MDOF) models and an instrumented building, with CAV and RCAV as features. For the MDOF models, the damage state was identified with accuracy ranging from 84% to 97%, and the damage location with accuracy ranging from 93% to 97.5%. The features and the OLR models also successfully captured the damage information for the instrumented structure. The proposed methodology is capable of ensuring rapid decision-making and improving community resiliency.
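As background, CAV is conventionally defined as the time integral of the absolute acceleration over the record. A minimal sketch on a toy acceleration history (the paper's exact preprocessing and units are not reproduced here):

```python
# CAV for a toy sinusoidal acceleration record (illustrative only).
import numpy as np

dt = 0.01                                    # sampling interval [s]
t = np.arange(0.0, 10.0, dt)
accel = 0.5 * np.sin(2 * np.pi * 1.0 * t)    # toy acceleration history [g]

# CAV = integral of |a(t)| dt, approximated here by a Riemann sum
cav = float(np.sum(np.abs(accel)) * dt)
print(round(cav, 3))                         # ~3.18 for this toy signal
```

Being a scalar per record, CAV (and RCAV, its ratio to the linear-response counterpart) gives the low-dimensional feature vector that the paper feeds to the ML classifiers.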


2021 ◽  
pp. 219256822110193
Author(s):  
Kevin Y. Wang ◽  
Ijezie Ikwuezunma ◽  
Varun Puvanesarajah ◽  
Jacob Babu ◽  
Adam Margalit ◽  
...  

Study Design: Retrospective review. Objective: To use predictive modeling and machine learning to identify patients at risk of venous thromboembolism (VTE) following posterior lumbar fusion (PLF) for degenerative spinal pathology. Methods: Patients undergoing single-level PLF in the inpatient setting were identified in the National Surgical Quality Improvement Program database. Our outcome measure of VTE included all patients who experienced a pulmonary embolism and/or deep venous thrombosis within 30 days of surgery. Two different methodologies were used to identify VTE risk: 1) a novel predictive model derived from multivariable logistic regression of significant risk factors, and 2) a tree-based extreme gradient boosting (XGBoost) algorithm using preoperative variables. The methods were compared against legacy risk-stratification measures, ASA and the Charlson Comorbidity Index (CCI), using the area-under-the-curve (AUC) statistic. Results: 13,500 patients who underwent single-level PLF met the study criteria. Of these, 0.95% had a VTE within 30 days of surgery. The 5 clinical variables found to be significant in the multivariable predictive model were: age > 65, obesity grade II or above, coronary artery disease, functional status, and prolonged operative time. The predictive model exhibited an AUC of 0.716, which was significantly higher than the AUCs of ASA and CCI (all P < 0.001) and comparable to that of the XGBoost algorithm (P > 0.05). Conclusion: Predictive analytics and machine learning can be leveraged to aid in the identification of patients at risk of VTE following PLF. Surgeons and perioperative teams may find these tools useful to augment clinical decision making and risk stratification.
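The two modeling approaches being compared can be sketched as follows, with scikit-learn's gradient boosting standing in for XGBoost and synthetic data (with a rare outcome mimicking the 0.95% VTE rate) standing in for the NSQIP cohort; both substitutions are assumptions.

```python
# Logistic regression vs. a boosted-tree model on a rare binary outcome
# (synthetic data and sklearn's GradientBoostingClassifier are stand-ins).
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, n_features=5,
                           weights=[0.99, 0.01],  # ~1% positives
                           random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=1)

lr = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
gb = GradientBoostingClassifier(random_state=1).fit(X_tr, y_tr)

# Compare discrimination of the two approaches by held-out AUC
auc_lr = roc_auc_score(y_te, lr.predict_proba(X_te)[:, 1])
auc_gb = roc_auc_score(y_te, gb.predict_proba(X_te)[:, 1])
print(auc_lr, auc_gb)
```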


Online discussion forums and blogs are vibrant platforms for cancer patients to express their views in the form of stories. These stories sometimes become a source of inspiration for patients who are anxiously searching for similar cases. This paper proposes a method using natural language processing and machine learning to analyze unstructured texts accumulated from patients’ reviews and stories. The proposed methodology aims to identify behavior, emotions, side effects, decisions and demographics associated with cancer victims. The pre-processing phase of our work involves extraction of web text, followed by text cleaning in which special characters and symbols are omitted, and finally tagging the texts using NLTK’s (Natural Language Toolkit) POS (part-of-speech) tagger. The post-processing phase trains seven machine learning classifiers (refer Table 6). The decision tree classifier shows the highest precision (0.83) among the classifiers, while the area under the receiver operating characteristic curve (AUC) is highest for the support vector machine (SVM) classifier (0.98).
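The text-cleaning step of the pre-processing phase might look like the following; the regex rules and example text are illustrative assumptions, not the authors' exact rules.

```python
# Toy text-cleaning pass for scraped forum text: unescape a common HTML
# entity, drop special characters/symbols, and collapse whitespace.
import re

raw = "Chemo was tough!!! But my family &amp; friends helped... #survivor"

clean = re.sub(r"&amp;", "and", raw)              # unescape HTML entity
clean = re.sub(r"[^A-Za-z0-9\s.,']", " ", clean)  # drop symbols like ! and #
clean = re.sub(r"\s+", " ", clean).strip()        # collapse extra whitespace
print(clean)
```

The cleaned string would then be tokenized and passed to NLTK's POS tagger before classifier training.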


10.2196/15347 ◽  
2020 ◽  
Vol 22 (11) ◽  
pp. e15347
Author(s):  
Christopher Michael Homan ◽  
J Nicolas Schrading ◽  
Raymond W Ptucha ◽  
Catherine Cerulli ◽  
Cecilia Ovesdotter Alm

Background: Social media is a rich, virtually untapped source of data on the dynamics of intimate partner violence, one that is both global in scale and intimate in detail. Objective: The aim of this study is to use machine learning and other computational methods to analyze social media data for the reasons victims give for staying in or leaving abusive relationships. Methods: Human annotation, part-of-speech tagging, and machine learning predictive models, including support vector machines, were used on a Twitter data set of 8767 #WhyIStayed and #WhyILeft tweets each. Results: Our methods explored whether we can analyze micronarratives that include details about victims, abusers, and other stakeholders; the actions that constitute abuse; and how the stakeholders respond. Conclusions: Our findings are consistent across various machine learning methods, correspond to observations in the clinical literature, and affirm the relevance of natural language processing and machine learning for exploring issues of societal importance in social media.


2021 ◽  
Vol 75 (3) ◽  
pp. 94-99
Author(s):  
A.M. Yelenov ◽  
◽  
A.B. Jaxylykova ◽  

This research presents a comparative study of the Named Entity Recognition (NER) task for scientific article texts. Natural language processing (NLP) can be considered one of the cornerstones of machine learning, devoted to problems connected with understanding and analyzing natural languages. Current deep learning techniques have already shown good performance and accuracy in areas such as image recognition, pattern recognition, and computer vision, which suggests that this technology is likely to succeed in NLP as well; this prospect has led to a dramatic increase in research interest in the topic. For a long time, fairly simple algorithms were used in this area, such as support vector machines or various types of regression over basic encodings of text data, which did not provide strong results. The experiments were run on the Dataset Scientific Entity Relation Core. The algorithms used were long short-term memory (LSTM), a random forest classifier with conditional random fields (CRF), and named entity recognition with Bidirectional Encoder Representations from Transformers (BERT). The metric scores of all models were then compared against one another. This research focuses on processing scientific articles in the machine learning area because the subject has not yet been investigated thoroughly enough. Better handling of this task can help machines understand natural language better, enabling them to solve other NLP tasks more effectively.

