Abstract WP312: Identifying Acute Ischemic Stroke by Analyzing ICD-10 Claims Data Using Machine Learning Models

Stroke ◽  
2017 ◽  
Vol 48 (suppl_1) ◽  
Author(s):  
Charles Esenwa ◽  
Jorge Luna ◽  
Benjamin Kummer ◽  
Hojjat Salmasian ◽  
David Vawdrey ◽  
...  

Introduction: Retrospective identification of patients hospitalized with a new diagnosis of acute ischemic stroke is important for administrative quality assurance, post-discharge clinical management, and stroke research. Administrative claims data have the advantage of being widely available, but the disadvantage that they cannot always accurately and consistently identify the clinical diagnosis of interest. Hypothesis: We hypothesized that decision tree and logistic regression models could be applied to administrative claims data coded using the International Classification of Diseases, 10th Revision (ICD-10), to create algorithms that accurately identify patients with acute ischemic stroke. Methods: We used hospital records from our institution to develop a gold standard list of 243 patients consecutively hospitalized with a new diagnosis of stroke from 10/1/2015 to 3/31/2016. We used 1,393 neurological patients without a diagnosis of stroke as negative controls. This list was used to train and test two machine learning methods for analyzing diagnosis and procedure codes to identify ischemic stroke: one using a classification and regression tree (CART) and another using regularized logistic regression. We trained the models on 75% of the data and evaluated them on the remaining 25%. Results: The CART model had κ=0.78, a sensitivity of 96%, a specificity of 90%, and a positive predictive value of 99%. The regularized logistic regression model had κ=0.73, a sensitivity of 97%, a specificity of 81%, and a positive predictive value of 98%. Conclusion: Both the decision tree and the logistic regression machine learning models showed very high accuracy in identifying patients with a new diagnosis of ischemic stroke from ICD-10 claims data, compared with our gold standard.
Applying these machine learning models to identify patients with ischemic stroke has widespread applications, especially now that national billing data have transitioned from ICD-9 to ICD-10 codes.
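The Results report κ, sensitivity, specificity, and positive predictive value for each model on the 25% held-out set. As a minimal sketch, assuming a binary stroke / no-stroke label per patient, these four figures can be computed from a 2x2 confusion matrix as below. The `Confusion` class, the function names, and the counts in the example are hypothetical illustrations, not the authors' code or data, and the CART and regularized logistic regression models themselves are not reproduced.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """2x2 confusion matrix of a classifier against the gold standard list."""
    tp: int  # gold-standard stroke patients flagged as stroke
    fp: int  # negative controls flagged as stroke
    fn: int  # gold-standard stroke patients missed
    tn: int  # negative controls correctly not flagged

    @property
    def n(self) -> int:
        return self.tp + self.fp + self.fn + self.tn


def sensitivity(c: Confusion) -> float:
    return c.tp / (c.tp + c.fn)


def specificity(c: Confusion) -> float:
    return c.tn / (c.tn + c.fp)


def ppv(c: Confusion) -> float:
    """Positive predictive value: share of flagged patients who truly had a stroke."""
    return c.tp / (c.tp + c.fp)


def cohens_kappa(c: Confusion) -> float:
    """Agreement between classifier and gold standard, corrected for chance."""
    p_observed = (c.tp + c.tn) / c.n
    # Chance agreement from the marginals: both say "stroke" or both say "no stroke".
    p_both_yes = ((c.tp + c.fp) / c.n) * ((c.tp + c.fn) / c.n)
    p_both_no = ((c.fn + c.tn) / c.n) * ((c.fp + c.tn) / c.n)
    p_expected = p_both_yes + p_both_no
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical test-set counts, not the study's data.
example = Confusion(tp=45, fp=5, fn=5, tn=45)
print(sensitivity(example), specificity(example), ppv(example), cohens_kappa(example))
```

Unlike sensitivity and specificity, PPV and κ depend on the proportion of stroke cases to controls in the evaluation set, so they are not directly comparable across cohorts with different case mixes.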

Stroke ◽  
2017 ◽  
Vol 48 (suppl_1) ◽  
Author(s):  
Charles Esenwa ◽  
Jorge Luna ◽  
Benjamin Kummer ◽  
Hojjat Salmasian ◽  
Hooman Kamel ◽  
...  

Introduction: Stroke research using widely available institutional, statewide, and national retrospective data depends on accurate identification of stroke subtypes in claims data. Despite the abundance of such data and advances in clinical informatics, there is limited published work on applying machine learning models to improve on previously reported administrative stroke identification algorithms. Hypothesis: We hypothesized that machine learning models could be applied to claims data coded using the International Classification of Diseases, Ninth Revision (ICD-9), to accurately identify patients with ischemic stroke (IS), intracerebral hemorrhage (ICH), and subarachnoid hemorrhage (SAH), and that these models would outperform previously published algorithms in our patient cohort. Methods: Using an internal stroke database, we developed a gold standard list of 427 stroke patients consecutively admitted to our institution from 1/1/2015 to 9/30/2015, and we used 75% of it to train and 25% to test two machine learning models: one using a classification and regression tree (CART) and another using regularized logistic regression. There were 2,241 negative controls. For comparison, we also applied a previously reported stroke detection algorithm by Tirschwell and Longstreth to our cohort. Results: The CART model had κ values of 0.72, 0.82, and 0.59; sensitivities of 95%, 99%, and 99%; and specificities of 88%, 78%, and 75% for IS, ICH, and SAH, respectively. The regularized logistic regression model had κ values of 0.73, 0.80, and 0.59; sensitivities of 95%, 99%, and 99%; and specificities of 89%, 78%, and 75% for IS, ICH, and SAH, respectively. The previously reported algorithm by Tirschwell et al. had κ values of 0.71, 0.56, and 0.64; sensitivities of 98%, 99%, and 99%; and specificities of 64%, 52%, and 50% for IS, ICH, and SAH, respectively.
Conclusion: Compared with the previously reported ICD-9-based detection algorithm, the machine learning models had a higher κ for IS and ICH, similar sensitivity for all subtypes, and higher specificity for all stroke subtypes in our cohort. Applying machine learning models to administrative data sets can yield highly accurate stroke subtype identification for health services researchers.
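This study reports κ, sensitivity, and specificity separately for IS, ICH, and SAH. One plausible way to obtain per-subtype figures from a multiclass prediction is to binarize the labels one subtype at a time (one-vs-rest); the abstract does not say how the authors computed them, so the sketch below is hedged accordingly. The label strings, including `"none"` for negative controls, and the names `one_vs_rest` and `report` are hypothetical.

```python
from collections import Counter

SUBTYPES = ("IS", "ICH", "SAH")  # hypothetical labels; "none" marks a negative control


def one_vs_rest(gold: list[str], pred: list[str], subtype: str) -> dict[str, float]:
    """Treat `subtype` as positive and every other label as negative."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    counts = Counter((g == subtype, p == subtype) for g, p in zip(gold, pred))
    tp, fn = counts[(True, True)], counts[(True, False)]
    fp, tn = counts[(False, True)], counts[(False, False)]
    n = tp + fn + fp + tn
    p_obs = (tp + tn) / n
    # Chance agreement computed from the binarized marginals.
    p_exp = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2
    nan = float("nan")
    return {
        "kappa": (p_obs - p_exp) / (1 - p_exp) if p_exp < 1 else nan,
        "sensitivity": tp / (tp + fn) if tp + fn else nan,
        "specificity": tn / (tn + fp) if tn + fp else nan,
    }


def report(gold: list[str], pred: list[str]) -> dict[str, dict[str, float]]:
    """Per-subtype metrics for every stroke subtype."""
    return {s: one_vs_rest(gold, pred, s) for s in SUBTYPES}


# Toy labels for eight patients (not study data).
gold = ["IS", "IS", "ICH", "SAH", "none", "none", "IS", "ICH"]
pred = ["IS", "ICH", "ICH", "SAH", "none", "IS", "IS", "ICH"]
for subtype, metrics in report(gold, pred).items():
    print(subtype, metrics)
```

The same function can score any baseline that emits one label per patient, such as a fixed code-list rule, so model and rule-based results are computed on identical terms.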


Diagnostics ◽  
2021 ◽  
Vol 11 (1) ◽  
pp. 80
Author(s):  
I-Min Chiu ◽  
Wun-Huei Zeng ◽  
Chi-Yung Cheng ◽  
Shih-Hsuan Chen ◽  
Chun-Hung Richard Lin

Prediction of functional outcome in ischemic stroke patients is useful for clinical decisions. Previous studies have mostly focused on predicting favorable outcomes. Miserable outcomes, usually defined as a modified Rankin Scale (mRS) score of 5–6, should also be considered before further invasive intervention. Using machine learning, we aimed to develop a multiclass classification model for outcome prediction in acute ischemic stroke patients requiring reperfusion therapy. This was a retrospective study performed at a stroke medical center in Taiwan. Patients with acute ischemic stroke who presented between January 2016 and December 2019 and who were candidates for reperfusion therapy were included. Clinical outcomes were classified as favorable, intermediate, or miserable. We developed four multiclass machine learning models (Logistic Regression, Support Vector Machine, Random Forest, and Extreme Gradient Boosting) to predict clinical outcomes and compared their performance with that of the DRAGON score. A total of 590 patients were included in this study. Of them, 180 (30.5%) had favorable outcomes and 152 (25.8%) had miserable outcomes. All selected machine learning models outperformed the DRAGON score in accuracy of outcome prediction (Logistic Regression: 0.70, Support Vector Machine: 0.67, Random Forest: 0.69, and Extreme Gradient Boosting: 0.67, vs. DRAGON: 0.51, p < 0.001). Among all selected models, Logistic Regression also outperformed the DRAGON score in positive predictive value, sensitivity, and specificity. Compared with the DRAGON score, the multiclass machine learning approach performed better in predicting the 3-month functional outcome of acute ischemic stroke patients requiring reperfusion therapy.
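The three outcome classes are defined on the 3-month modified Rankin Scale, and the models are compared by multiclass accuracy plus per-class PPV, sensitivity, and specificity. A minimal sketch of the labeling and scoring follows. Only the mRS 5–6 cut-off for a miserable outcome comes from the abstract; the 0–2 (favorable) and 3–4 (intermediate) boundaries are assumptions, and `outcome_class`, `accuracy`, and `per_class_metrics` are hypothetical names.

```python
def outcome_class(mrs: int) -> str:
    """Map a 3-month mRS score (0-6) to one of three outcome classes."""
    if not 0 <= mrs <= 6:
        raise ValueError(f"mRS must be between 0 and 6, got {mrs}")
    if mrs <= 2:  # assumed boundary for "favorable"
        return "favorable"
    if mrs <= 4:  # assumed boundary for "intermediate"
        return "intermediate"
    return "miserable"  # mRS 5-6, as defined in the abstract


def accuracy(gold: list[str], pred: list[str]) -> float:
    """Share of patients whose predicted class matches the observed class."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def per_class_metrics(gold: list[str], pred: list[str]) -> dict[str, dict[str, float]]:
    """One-vs-rest PPV, sensitivity, and specificity for every class."""
    nan = float("nan")
    out = {}
    for c in sorted(set(gold) | set(pred)):
        tp = sum(g == c and p == c for g, p in zip(gold, pred))
        fp = sum(g != c and p == c for g, p in zip(gold, pred))
        fn = sum(g == c and p != c for g, p in zip(gold, pred))
        tn = len(gold) - tp - fp - fn
        out[c] = {
            "ppv": tp / (tp + fp) if tp + fp else nan,
            "sensitivity": tp / (tp + fn) if tp + fn else nan,
            "specificity": tn / (tn + fp) if tn + fp else nan,
        }
    return out


# Toy example (not study data): observed mRS scores and a model's predicted classes.
observed = [outcome_class(m) for m in [0, 1, 3, 4, 5, 6]]
predicted = ["favorable", "intermediate", "intermediate", "intermediate", "miserable", "favorable"]
print(accuracy(observed, predicted))
print(per_class_metrics(observed, predicted))
```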


PLoS ONE ◽  
2020 ◽  
Vol 15 (11) ◽  
pp. e0241917
Author(s):  
Malte Grosser ◽  
Susanne Gellißen ◽  
Patrick Borchert ◽  
Jan Sedlacik ◽  
Jawed Nawabi ◽  
...  

Background Accurate prediction of tissue outcome in acute ischemic stroke patients is of great interest for treatment decision-making. To date, various machine learning models have been proposed that combine multi-parametric imaging data for this purpose. However, most of these models were trained using voxel information extracted from the whole brain, without accounting for regional differences in susceptibility to ischemia. The aim of this study was to develop and evaluate a local tissue outcome prediction approach, which makes predictions using locally trained machine learning models and thus accounts for regional differences. Material and methods Multi-parametric MRI data from 99 acute ischemic stroke patients were used to develop and evaluate the local tissue outcome prediction approach. Diffusion (ADC) and perfusion parameter maps (CBF, CBV, MTT, Tmax) and the corresponding follow-up lesion masks for each patient were registered to the MNI brain atlas. Logistic regression (LR) and random forest (RF) models were trained using a local approach, which makes predictions using models trained individually for each voxel position on the corresponding local data. A global approach, which uses a single model trained on all voxels of the brain, was used for comparison. Tissue outcome predictions from the global and local RF and LR models, as well as from a combined (hybrid) approach, were quantitatively evaluated and compared using the area under the receiver operating characteristic curve (ROC AUC), the Dice coefficient, and sensitivity and specificity. Results Statistical analysis revealed the highest ROC AUC and Dice values for the hybrid approach. At 0.872 (ROC AUC; LR) and 0.353 (Dice; RF), these values were significantly higher (p < 0.01) than those of the two other approaches. In addition, the local approach achieved the highest sensitivity, 0.448 (LR).
Overall, the hybrid approach was outperformed only in sensitivity (LR), by the local approach, and in specificity, by both other approaches. However, in these cases the effect sizes were comparatively small. Conclusion The results of this study suggest that locally trained machine learning models can yield better lesion outcome predictions than a single global machine learning model trained on all voxel information regardless of location in the brain.
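The local approach trains one model per registered voxel position, whereas the global approach pools all voxels into a single model. The sketch below contrasts the two using a plain, unregularized gradient-descent logistic regression and scores the thresholded masks with the Dice coefficient, 2|A ∩ B| / (|A| + |B|). It assumes the parameter maps are already registered so that index `v` refers to the same brain location in every patient; the helper names, learning rate, epoch count, and 0.5 cutoff are hypothetical, and the study's random forest and hybrid models are not reproduced.

```python
import math

Patient = list[list[float]]  # one feature vector (e.g. ADC, CBF, CBV, MTT, Tmax) per voxel


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1 / (1 + math.exp(-z))
    e = math.exp(z)
    return e / (1 + e)


def predict(w: list[float], x: list[float]) -> float:
    """Lesion probability for one voxel; the last weight is the bias."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + w[-1])


def fit_logreg(X: list[list[float]], y: list[int], lr: float = 0.5, epochs: int = 500) -> list[float]:
    """Batch gradient descent on the log loss."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict(w, xi) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj
            grad[-1] += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w


def fit_local(patients: list[Patient], masks: list[list[int]]) -> list[list[float]]:
    """One model per voxel position, trained only on that position's data."""
    n_vox = len(patients[0])
    return [fit_logreg([p[v] for p in patients], [m[v] for m in masks]) for v in range(n_vox)]


def fit_global(patients: list[Patient], masks: list[list[int]]) -> list[float]:
    """A single model trained on all voxels of all patients."""
    X = [vox for p in patients for vox in p]
    y = [label for m in masks for label in m]
    return fit_logreg(X, y)


def dice(pred: list[bool], truth: list[bool]) -> float:
    overlap = sum(p and t for p, t in zip(pred, truth))
    total = sum(pred) + sum(truth)
    return 2 * overlap / total if total else 1.0


# Toy data: two voxel positions whose single feature relates to the lesion in opposite directions.
train = [[[1.0], [1.0]], [[-1.0], [-1.0]], [[2.0], [2.0]], [[-2.0], [-2.0]]]
train_masks = [[1, 0], [0, 1], [1, 0], [0, 1]]
new_patient, new_truth = [[1.5], [1.5]], [True, False]

local = fit_local(train, train_masks)
glob = fit_global(train, train_masks)
local_mask = [predict(w, vox) >= 0.5 for w, vox in zip(local, new_patient)]
global_mask = [predict(glob, vox) >= 0.5 for vox in new_patient]
print(dice(local_mask, new_truth), dice(global_mask, new_truth))
```

On this toy data the pooled global model cannot separate the two positions, while the per-voxel models can; the trade-off is that each local model sees only one sample per training patient.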


2021 ◽  
Vol 14 (1) ◽  
Author(s):  
Martine De Cock ◽  
Rafael Dowsley ◽  
Anderson C. A. Nascimento ◽  
Davis Railsback ◽  
Jianwei Shen ◽  
...  

Abstract Background In biomedical applications, valuable data are often split between owners who cannot openly share them because of privacy regulations and concerns. Training machine learning models on the joint data without violating privacy is a major technological challenge that can be addressed by combining techniques from machine learning and cryptography. When machine learning models are trained collaboratively with the cryptographic technique known as secure multi-party computation, the price paid for keeping the owners' data private is an increase in computational cost and runtime. A careful choice of machine learning techniques, together with algorithmic and implementation optimizations, is necessary to enable practical secure machine learning over distributed data sets. Such optimizations can be tailored to the kind of data and machine learning problem at hand. Methods Our setup involves secure two-party computation protocols, along with a trusted initializer that distributes correlated randomness to the two computing parties. We use a gradient-descent-based algorithm to train a logistic-regression-like model with a clipped ReLU activation function, and we break the algorithm down into corresponding cryptographic protocols. Our main contributions are a new protocol for computing the activation function that requires neither secure comparison protocols nor Yao’s garbled circuits, and a series of cryptographic engineering optimizations to improve performance. Results For our largest gene expression data set, we train a model that requires over 7 billion secure multiplications; training completes in about 26.90 s on a local area network. The implementation in this work is a further optimized version of the implementation with which we won first place in Track 4 of the iDASH 2019 secure genome analysis competition.
Conclusions In this paper, we present a secure logistic regression training protocol and its implementation, with a new subprotocol to securely compute the activation function. To the best of our knowledge, we present the fastest existing secure multi-party computation implementation for training logistic regression models on high-dimensional genome data distributed across a local area network.
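The Methods rely on secure two-party computation with a trusted initializer that hands out correlated randomness. A standard textbook instance of this setup is additive secret sharing with Beaver multiplication triples, sketched below in plain Python. It illustrates the kind of secure multiplication counted in the Results but is not the authors' protocol: it omits their activation-function subprotocol, fixed-point encoding, and networking. The ring size 2^64 and all function names are assumptions, and both parties and the initializer are simulated in a single process.

```python
import secrets

MOD = 2**64  # shares are integers modulo 2^64 (assumed ring size)
Share = tuple[int, int]  # (party 0's share, party 1's share)


def share(x: int) -> Share:
    """Split x into two random-looking additive shares that sum to x mod MOD."""
    r = secrets.randbelow(MOD)
    return r, (x - r) % MOD


def reveal(sh: Share) -> int:
    return (sh[0] + sh[1]) % MOD


def add(a: Share, b: Share) -> Share:
    # Addition needs no interaction: each party adds its own shares.
    return (a[0] + b[0]) % MOD, (a[1] + b[1]) % MOD


def trusted_initializer() -> tuple[Share, Share, Share]:
    """Deal shares of a random Beaver triple (a, b, c) with c = a*b mod MOD."""
    a, b = secrets.randbelow(MOD), secrets.randbelow(MOD)
    return share(a), share(b), share(a * b % MOD)


def secure_mul(x: Share, y: Share) -> Share:
    """Multiply two shared values using one pre-distributed triple."""
    a, b, c = trusted_initializer()
    # The parties open d = x - a and e = y - b; a and b act as one-time masks.
    d = reveal(((x[0] - a[0]) % MOD, (x[1] - a[1]) % MOD))
    e = reveal(((y[0] - b[0]) % MOD, (y[1] - b[1]) % MOD))
    # x*y = c + d*b + e*a + d*e; only party 0 adds the public term d*e.
    z0 = (c[0] + d * b[0] + e * a[0] + d * e) % MOD
    z1 = (c[1] + d * b[1] + e * a[1]) % MOD
    return z0, z1


def secure_dot(xs: list[Share], ws: list[Share]) -> Share:
    """Shared inner product, the core of a logistic-regression forward pass."""
    acc: Share = (0, 0)
    for x, w in zip(xs, ws):
        acc = add(acc, secure_mul(x, w))
    return acc


print(reveal(secure_mul(share(6), share(7))))
print(reveal(secure_dot([share(1), share(2), share(3)], [share(4), share(5), share(6)])))
```

Each multiplication consumes one triple and one round of opening d and e, which is why the number of secure multiplications is a natural cost measure for this kind of training.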


2021 ◽  
Vol 10 (1) ◽  
pp. 99
Author(s):  
Sajad Yousefi

Introduction: Heart disease is often associated with conditions such as arteries clogged by accumulated deposits, which cause chest pain and heart attack. Many people die of heart disease every year. Most countries have a shortage of cardiovascular specialists, and consequently a significant percentage of cases are misdiagnosed. Hence, predicting this disease is an important problem. Using machine learning models applied to a multidimensional dataset, this article aims to find the most efficient and accurate machine learning models for disease prediction. Material and Methods: Several algorithms were used to predict heart disease, most notably the Decision Tree, Random Forest, and KNN supervised machine learning algorithms. The algorithms were applied to a dataset of 294 samples taken from the UCI repository, which includes heart disease features. To improve algorithm performance, these features were analyzed, and feature importance scores and cross-validation were considered. Results: The algorithms were compared with one another using the ROC curve and criteria such as accuracy, precision, sensitivity, and F1 score, which were evaluated for each model. The Decision Tree algorithm achieved an accuracy of 83% and a ROC AUC of 99%. The Logistic Regression algorithm, with an accuracy of 88% and a ROC AUC of 91%, performed better than the other algorithms. Therefore, these techniques can help physicians identify patients with heart disease and prescribe treatment correctly. Conclusion: Machine learning techniques can be used in medicine to analyze data collections related to a disease and to predict it. The ROC AUC and other evaluation criteria of several machine learning classification algorithms for heart disease prediction were compared to determine the most appropriate classifier.
Overall, the evaluation showed better performance for the Decision Tree and Logistic Regression models.
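The models are compared by accuracy, precision, sensitivity, F1 score, and ROC AUC. A dependency-free sketch of these metrics for binary heart-disease labels follows. AUC is computed here as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one (ties count one half), which equals the area under the empirical ROC curve. The 0.5 cutoff, the function names, and the toy scores are hypothetical; the article does not say which implementation was used.

```python
def roc_auc(scores: list[float], labels: list[int]) -> float:
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def threshold_metrics(scores: list[float], labels: list[int], cutoff: float = 0.5) -> dict[str, float]:
    """Accuracy, precision, sensitivity (recall), and F1 at a fixed score cutoff."""
    pred = [s >= cutoff for s in scores]
    tp = sum(p and y == 1 for p, y in zip(pred, labels))
    fp = sum(p and y == 0 for p, y in zip(pred, labels))
    fn = sum(not p and y == 1 for p, y in zip(pred, labels))
    tn = len(labels) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(labels), "precision": precision,
            "sensitivity": recall, "f1": f1}


# Toy model scores and true labels (1 = heart disease); not data from the article.
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2]
labels = [1, 1, 1, 0, 0, 0]
print(roc_auc(scores, labels))
print(threshold_metrics(scores, labels))
```

Because AUC summarizes ranking quality across all cutoffs while accuracy and F1 depend on a single cutoff, a model can rank best by one criterion and not by another.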

