Early Diagnostics Model for Dengue Disease Using Decision Tree-Based Approaches

Author(s):  
Shalini Gambhir ◽  
Yugal Kumar ◽  
Sanjay Malik ◽  
Geeta Yadav ◽  
Amita Malik

Classification schemes have been applied in the medical arena to explore patients' data and extract a predictive model. This model helps doctors to improve their prognosis, diagnosis, or treatment planning processes. The aim of this work is to utilize and compare different decision tree classifiers for early diagnosis of Dengue. Six approaches, namely J48 tree, random tree, REP tree, SOM, logistic regression, and naïve Bayes, have been utilized to study real-world Dengue data collected from different hospitals in the Delhi, India region during 2015-2016. Standard statistical metrics are used to assess the efficiency of the proposed Dengue disease diagnostic system, and the outcomes showed that REP tree is the best among these classifiers, with 82.7% efficiency in supplying an exact diagnosis.
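The tree learners compared above choose their splits with entropy-based criteria. The following is a minimal sketch of plain information gain, assuming categorical features; it is not the authors' code. J48 (Weka's C4.5 implementation) uses the related gain ratio, and the toy `fever` and `diagnosis` values are hypothetical, not taken from the study.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy reduction obtained by splitting `labels` on a categorical feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted average entropy of the child nodes after the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy data (illustrative only, not from the Dengue dataset).
fever = ["high", "high", "low", "low"]
diagnosis = ["dengue", "dengue", "healthy", "healthy"]
print(information_gain(fever, diagnosis))  # a perfect split of a balanced sample gains 1 bit
```

A tree builder would evaluate this quantity for every candidate feature and split on the one with the largest gain.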

2021 ◽  
Vol 8 (Supplement_1) ◽  
pp. S41-S41
Author(s):  
Courtney Moc ◽  
William Shropshire ◽  
Patrick McDaneld ◽  
Samuel A Shelburne ◽  
Samuel L Aitken ◽  
...  

Abstract Background Several clinical tools have been developed to predict the likelihood of extended-spectrum β-lactamase producing Enterobacterales; however, the creation of these tools included few patients who had cancer or were otherwise immunosuppressed. The objectives of this retrospective cohort study were to develop a decision tree and a traditional risk score to predict ceftriaxone resistance in cancer patients with Escherichia coli (E. coli) bacteremia, and to compare the predictive accuracy of the two tools. Methods Adults aged ≥ 18 years with E. coli bacteremia at The University of Texas MD Anderson Cancer Center from 1/2018 to 12/2019 were included. Isolates recovered within 1 week from the same patient were excluded. The decision tree was constructed using classification and regression tree analysis, with a minimum node size of 10. The risk score was created using a multivariable logistic regression model derived by stepwise variable selection with backward elimination at level 0.2. The statistical metrics of the decision tree and risk score were compared. Results A total of 629 E. coli isolates were screened, of which 580 isolates met criteria. Ceftriaxone-resistant (CRO-R) E. coli accounted for 36% of isolates. The machine learning-derived decision tree included 5 predictors whereas the logistic regression-derived risk score included 7 predictors. A risk score cutoff point of ≥ 5 points demonstrated the best overall classification accuracy. The positive predictive value of the decision tree was higher than that of the risk score (88% vs 74%, respectively), but the area under the receiver operating characteristic curve and model accuracy of the risk score were higher than those of the decision tree (0.85 vs 0.73 and 82% vs 74%, respectively).
[Figure 1: Clinical Decision Tree. Table 1: Regression Model and Assigned Points for Clinical Risk Score. Table 2: Statistical Metrics of Clinical Decision Tree and Clinical Risk Score.]
Conclusion The decision tree and risk score can be used to determine the likelihood that a cancer patient with E. coli bacteremia has a CRO-R infection. In both clinical tools, the strongest predictor was a history of CRO-R E. coli colonization or infection in the last 6 months. The decision tree is more user-friendly, has fewer variables, and has a better positive predictive value than the risk score. However, the risk score has significantly better discrimination and model accuracy than the decision tree. Disclosures Samuel L. Aitken, PharmD, MPH, BCIDP, Melinta Therapeutics (Individual(s) Involved: Self): Consultant, Grant/Research Support
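The metrics compared in this abstract (positive predictive value, accuracy) all derive from a 2x2 confusion matrix. Below is a minimal sketch of those definitions, plus a cutoff rule of the "≥ 5 points" form. The confusion counts and the function names are hypothetical; the study's actual point weights (its Table 1) are not reproduced here.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard metrics from confusion-matrix counts (positive = CRO-R)."""
    total = tp + fp + tn + fn
    return {
        "ppv": tp / (tp + fp),          # positive predictive value
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "accuracy": (tp + tn) / total,
    }


def predict_from_score(points, cutoff=5):
    """Classify as resistant when the summed risk points reach the cutoff."""
    return points >= cutoff


# Hypothetical counts, chosen only to illustrate the arithmetic.
metrics = classification_metrics(tp=40, fp=10, tn=35, fn=15)
print(metrics["ppv"], metrics["accuracy"])  # 0.8 0.75
```

A tool can have the higher positive predictive value while having the lower overall accuracy, as reported above, because the two metrics weight the four cells of the matrix differently.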


Author(s):  
Dhilsath Fathima.M ◽  
S. Justin Samuel ◽  
R. Hari Haran

Aim: This work develops an improved and robust machine learning model for predicting myocardial infarction (MI), which could have substantial clinical impact. Objectives: This paper explains how to build a machine learning based computer-aided analysis system for early and accurate prediction of MI, using the Framingham Heart Study dataset for validation and evaluation. The proposed computer-aided analysis model will support medical professionals in predicting myocardial infarction proficiently. Methods: The proposed model uses mean imputation to fill in missing values in the dataset, then applies principal component analysis (PCA) to extract the optimal features and enhance the performance of the classifiers. After PCA, the reduced features are partitioned into a training dataset and a testing dataset: 70% of the data is given as input to four popular classifiers (support vector machine, k-nearest neighbor, logistic regression and decision tree) for training, and the remaining 30% is used to evaluate the output of the machine learning model with the following performance metrics: confusion matrix, classifier accuracy, precision, sensitivity, F1-score and AUC-ROC curve. Results: The outputs of the classifiers are evaluated using these performance measures. We observed that logistic regression provides higher accuracy than the k-NN, SVM and decision tree classifiers, and that PCA performs well as a feature extraction method for enhancing the performance of the proposed model. From these analyses, we conclude that logistic regression has a good mean accuracy and standard deviation of accuracy compared with the other three algorithms. The AUC-ROC curves of the classifiers, shown in Figures 4 and 5, indicate that logistic regression exhibits a good AUC-ROC score, around 70%, compared to the k-NN and decision tree algorithms.
Conclusion: From the result analysis, we infer that the proposed machine learning model will act as an optimal decision-making system to predict acute myocardial infarction at an earlier stage than existing machine learning based prediction models. It is capable of predicting the presence of acute myocardial infarction in humans from heart disease risk factors, in order to decide when to start lifestyle modification and medical treatment to prevent heart disease.
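The first and last preprocessing steps described in the Methods (mean imputation, then a 70/30 split) can be sketched with the standard library alone. This is an illustrative sketch, not the authors' pipeline: the helper names, the use of `None` for missing values, and the fixed seed are assumptions, and the PCA step is omitted.

```python
import random
from statistics import mean


def mean_impute(rows):
    """Replace None in each column with the mean of that column's observed values."""
    columns = list(zip(*rows))
    # Raises statistics.StatisticsError if a column has no observed values at all.
    col_means = [mean(v for v in col if v is not None) for col in columns]
    return [
        [col_means[j] if v is None else v for j, v in enumerate(row)]
        for row in rows
    ]


def train_test_split(rows, train_fraction=0.7, seed=0):
    """Shuffle a copy of the rows and cut it into training and testing parts."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical measurements with gaps (not Framingham data).
data = [[1.0, None], [3.0, 4.0], [None, 8.0]]
print(mean_impute(data))  # [[1.0, 6.0], [3.0, 4.0], [2.0, 8.0]]
```

One design point worth noting: computing the column means on the full dataset, as here, lets test-set values influence the imputation; computing them on the training part only is the stricter alternative.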


2019 ◽  
Author(s):  
Joseph Tassone ◽  
Peizhi Yan ◽  
Mackenzie Simpson ◽  
Chetan Mendhe ◽  
Vijay Mago ◽  
...  

BACKGROUND The collection and examination of social media has become a useful mechanism for studying the mental activity and behavior tendencies of users. OBJECTIVE Through the analysis of a collected set of Twitter data, a model will be developed for predicting positively referenced, drug-related tweets. From this, trends and correlations can be determined. METHODS Twitter social media tweets and attribute data were collected and processed using topic-related keywords, such as drug slang and use-conditions (methods of drug consumption). Potential candidates were preprocessed, resulting in a dataset of 3,696,150 rows. The predictive classification power of multiple methods was compared, including regression, decision trees, and CNN-based classifiers. For the latter, a deep learning approach was implemented to screen and analyze the semantic meaning of the tweets. RESULTS The logistic regression and decision tree models utilized 12,142 data points for training and 1,041 data points for testing. The logistic regression models displayed accuracies of 54.56% and 57.44%, respectively, and an AUC of 0.58. The decision tree was an improvement, with an accuracy of 63.40% and an AUC of 0.68. All these values implied a low predictive capability with little to no discrimination. Conversely, both CNN-based classifiers tested presented a marked improvement. The first was trained with 2,661 manually labeled samples, while the other included synthetically generated tweets, culminating in 12,142 samples. The accuracy scores were 76.35% and 82.31%, with AUCs of 0.90 and 0.91. Using association rule mining in conjunction with the CNN-based classifier showed a high likelihood for keywords such as “smoke”, “cocaine”, and “marijuana” triggering a drug-positive classification. CONCLUSIONS Predictive analysis without a CNN is limited and possibly fruitless.
Attribute-based models presented little predictive capability and were not suitable for analyzing this type of data. The semantic meaning of the tweets needed to be utilized, giving the CNN-based classifier an advantage over other solutions. Additionally, commonly mentioned drugs had a level of correspondence with frequently used illicit substances, proving the practical usefulness of this system. Lastly, the synthetically generated set provided increased scores, improving the predictive capability. CLINICALTRIAL None


2021 ◽  
Vol 14 (1) ◽  
Author(s):  
Ashwath Radhachandran ◽  
Anurag Garikipati ◽  
Nicole S. Zelin ◽  
Emily Pellegrini ◽  
Sina Ghandian ◽  
...  

Abstract Background Acute heart failure (AHF) is associated with significant morbidity and mortality. Effective patient risk stratification is essential to guiding hospitalization decisions and the clinical management of AHF. Clinical decision support systems can be used to improve predictions of mortality made in emergency care settings for the purpose of AHF risk stratification. In this study, several models for the prediction of seven-day mortality among AHF patients were developed by applying machine learning techniques to retrospective patient data from 236,275 total emergency department (ED) encounters, 1881 of which were considered positive for AHF and were used for model training and testing. The models used varying subsets of age, sex, vital signs, and laboratory values. Model performance was compared to the Emergency Heart Failure Mortality Risk Grade (EHMRG) model, a commonly used system for prediction of seven-day mortality in the ED with similar (or, in some cases, more extensive) inputs. Model performance was assessed in terms of area under the receiver operating characteristic curve (AUROC), sensitivity, and specificity. Results When trained and tested on a large academic dataset, the best-performing model and EHMRG demonstrated test set AUROCs of 0.84 and 0.78, respectively, for prediction of seven-day mortality. Given only measurements of respiratory rate, temperature, mean arterial pressure, and FiO2, one model produced a test set AUROC of 0.83. Neither a logistic regression comparator nor a simple decision tree outperformed EHMRG. Conclusions A model using only the measurements of four clinical variables outperforms EHMRG in the prediction of seven-day mortality in AHF. With these inputs, the model could not be replaced by logistic regression or reduced to a simple decision tree without significant performance loss. 
In ED settings, this minimal-input risk stratification tool may assist clinicians in making critical decisions about patient disposition by providing early and accurate insights into individual patients’ risk profiles.
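The AUROC used to compare the models with EHMRG can be read as the probability that a randomly chosen positive case receives a higher risk score than a randomly chosen negative case. The sketch below computes it by direct pairwise comparison; this is an illustrative definition, not the study's evaluation code, and the labels and scores are hypothetical.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count as half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical outcomes (1 = died within seven days) and model risk scores.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(auroc(labels, scores))  # 0.75
```

The pairwise form is quadratic in the number of cases, which is fine for illustration; rank-based formulas give the same value more efficiently on large datasets.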


Author(s):  
Hyerim Kim ◽  
Dong Hoon Lim ◽  
Yoona Kim

Few studies have been conducted to classify and predict the influence of nutritional intake on overweight/obesity, dyslipidemia, hypertension and type 2 diabetes mellitus (T2DM) based on deep learning such as a deep neural network (DNN). The present study aims to classify and predict associations between nutritional intake and the risk of overweight/obesity, dyslipidemia, hypertension and T2DM by developing a DNN model, and to compare the DNN model with the most popular machine learning models, such as logistic regression and decision tree. Subjects aged from 40 to 69 years in the 4–7th (from 2007 through 2018) Korea National Health and Nutrition Examination Survey (KNHANES) were included. Diagnostic criteria of dyslipidemia (n = 10,731), hypertension (n = 10,991), T2DM (n = 3889) and overweight/obesity (n = 10,980) were set as dependent variables. Nutritional intakes were set as independent variables. A DNN model comprising one input layer with 7 nodes, three hidden layers with 30 nodes, 12 nodes and 8 nodes, respectively, and one output layer with one node was implemented in the Python programming language using Keras with a TensorFlow backend. In the DNN, a binary cross-entropy loss function for binary classification was used with the Adam optimizer. To avoid overfitting, dropout was applied to each hidden layer. Structural equation modelling (SEM) was also performed to simultaneously estimate multivariate causal associations between nutritional intake and overweight/obesity, dyslipidemia, hypertension and T2DM. With five-fold cross-validation, the DNN model showed higher prediction accuracy than the two other machine learning models: 0.58654 for dyslipidemia, 0.79958 for hypertension, 0.80896 for T2DM and 0.62496 for overweight/obesity.
Prediction accuracies for dyslipidemia, hypertension, T2DM and overweight/obesity were 0.58448, 0.79929, 0.80818 and 0.62486, respectively, when analyzed by logistic regression, and 0.52148, 0.66773, 0.71587 and 0.54026, respectively, when analyzed by a decision tree. This study observed that a DNN model with three hidden layers of 30 nodes, 12 nodes and 8 nodes had better prediction accuracy than two conventional machine learning models, logistic regression and decision tree.
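To give a sense of how small the described network is, the sketch below counts its trainable parameters and evaluates the binary cross-entropy loss it is trained with. It assumes fully connected layers with bias terms (dropout adds no parameters); the abstract does not state these details, so treat the count as an illustration rather than a reported figure.

```python
import math

# Layer widths from the abstract: 7 inputs, hidden layers of 30, 12 and 8, 1 output.
LAYER_SIZES = [7, 30, 12, 8, 1]


def count_parameters(layer_sizes):
    """Weights plus biases of a fully connected network (assumed architecture)."""
    return sum(
        n_in * n_out + n_out
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )


def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


print(count_parameters(LAYER_SIZES))  # 725
print(binary_cross_entropy([1, 0], [0.9, 0.2]))
```

Under these assumptions the model has 240 + 372 + 104 + 9 = 725 parameters, which may help explain why its accuracy is close to that of logistic regression on the same seven inputs.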


2021 ◽  
Vol 39 (28_suppl) ◽  
pp. 292-292
Author(s):  
Shiru Liu ◽  
Wing Chan ◽  
Genevieve Bouchard-Fortier ◽  
Stephanie Lheureux ◽  
Sarah Ferguson ◽  
...  

292 Background: Initial treatment of epithelial ovarian cancer (EOC) consists of a combination of cytoreductive surgery (CRS) and/or chemotherapy. Targeted therapies such as bevacizumab have been shown to improve outcomes in a subset population with high-risk features. Real-world patterns of systemic therapy delivery in EOC in the modern era are not well understood. Our objective is to evaluate the patterns of first-line systemic treatment of advanced EOC in Ontario, focusing on adoption of bevacizumab, which was approved for use in 2016. Methods: We conducted a retrospective, population cohort study using administrative databases held at the ICES in Ontario, Canada. Patients diagnosed with non-mucinous EOC between 2014 and 2018 were identified from the Ontario Cancer Registry; early-stage disease was excluded. Information on systemic therapy was obtained from Activity Level Reporting and New Drug Funding Program databases. Provider of care (gynecologic oncologist vs medical oncologist) information was obtained from billing codes. Academic cancer centers were identified using validated systemic facility codes from Cancer Care Ontario. Statistical analyses include descriptive statistics, t-tests, and multivariable logistic regression using SAS. Results: Out of 4,680 cases diagnosed with EOC during the study period, 3,632 (77.6%) were considered advanced stage. The median age of the cohort was between 65 and 70, and the majority had a Charlson score of 1-2 (97%) and were urban (91.8%). A total of 3,181 (87.6%) patients underwent CRS and 2,722 (74.9%) patients underwent chemotherapy. Of those who received chemotherapy, 1,259 (46.2%) received neoadjuvant chemotherapy, 1,012 (37.2%) received upfront CRS, and 451 (16.5%) received chemotherapy only. The majority of chemotherapy was delivered by gynecologic oncologists (60.6%) and in academic cancer centres (61.7%).
There was no significant difference in the use of neoadjuvant chemotherapy between medical oncologists and gynecologic oncologists (p = 0.67). Only 53 chemotherapy patients (1.9%) received a bevacizumab-containing regimen in the first-line setting. Medical oncologists were 4 times more likely to administer a bevacizumab-containing regimen than gynecologic oncologists (OR 4.03, 95% CI.29 – 7.36) after adjusting for age, stage, Charlson score and rurality score on logistic regression. Delivery of bevacizumab is relatively higher in non-academic cancer centres (OR 2.61, 95% CI 2.32 – 2.94), while 83% of intraperitoneal chemotherapy is delivered in academic cancer centres. Conclusions: Patterns of care of EOC in Ontario remain heterogeneous between care providers and institutions, while uptake of bevacizumab for first-line treatment of EOC remains low. Factors leading to low uptake and real-world outcomes should be explored.


2005 ◽  
Vol 13 (4) ◽  
pp. 431-436 ◽  
Author(s):  
Daniela Gamba Garib ◽  
Nildiceli Leite Melo Zanella ◽  
Sheldon Peck

Certain human dental anomalies frequently occur together, supporting the accumulated evidence of the shared genetic control of dental developmental disturbances. The present study reports a rare and interesting case of a 12-year-old girl with an association of multiple dental abnormalities, including agenesis, tooth malposition and delayed development. The etiology and treatment planning are discussed with reference to the literature. The clinical implications of genetically controlled patterns of dental anomalies are important in the establishment of early diagnosis and appropriate orthodontic intervention.


2015 ◽  
Vol 54 (06) ◽  
pp. 560-567 ◽  
Author(s):  
K. Zhu ◽  
Z. Lou ◽  
J. Zhou ◽  
N. Ballester ◽  
P. Parikh ◽  
...  

Summary Introduction: This article is part of the Focus Theme of Methods of Information in Medicine on “Big Data and Analytics in Healthcare”. Background: Hospital readmissions raise healthcare costs and cause significant distress to providers and patients. It is, therefore, of great interest to healthcare organizations to predict which patients are at risk of being readmitted to their hospitals. However, current logistic regression based risk prediction models have limited prediction power when applied to hospital administrative data. Meanwhile, although decision trees and random forests have been applied, they tend to be too complex for hospital practitioners to understand. Objectives: Explore the use of conditional logistic regression to increase the prediction accuracy. Methods: We analyzed an HCUP statewide in-patient discharge record dataset, which includes patient demographics, clinical and care utilization data from California. We extracted records of heart failure Medicare beneficiaries who had inpatient experience during an 11-month period. We corrected the data imbalance issue with under-sampling. In our study, we first applied standard logistic regression and decision tree to obtain influential variables and derive practically meaningful decision rules. We then stratified the original data set accordingly and applied logistic regression on each data stratum. We further explored the effect of interacting variables in the logistic regression modeling. We conducted cross validation to assess the overall prediction performance of conditional logistic regression (CLR) and compared it with standard classification models. Results: The developed CLR models outperformed several standard classification models (e.g., straightforward logistic regression, stepwise logistic regression, random forest, support vector machine). For example, the best CLR model improved the classification accuracy by nearly 20% over the straightforward logistic regression model.
Furthermore, the developed CLR models tend to achieve better sensitivity of more than 10% over the standard classification models, which can be translated to correct labeling of an additional 400 – 500 readmissions for heart failure patients in the state of California over a year. Lastly, several key predictors identified from the HCUP data include the disposition location from discharge, the number of chronic conditions, and the number of acute procedures. Conclusions: It would be beneficial to apply simple decision rules obtained from the decision tree in an ad-hoc manner to guide the cohort stratification. It could be potentially beneficial to explore the effect of pairwise interactions between influential predictors when building the logistic regression models for different data strata. Judicious use of the ad-hoc CLR models developed offers insights into future development of prediction models for hospital readmissions, which can lead to better intuition in identifying high-risk patients and developing effective post-discharge care strategies. Lastly, this paper is expected to raise the awareness of collecting data on additional markers and developing necessary database infrastructure for larger-scale exploratory studies on readmission risk prediction.
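The under-sampling step mentioned in the Methods can be sketched as random down-sampling of every class to the size of the rarest one. The abstract does not say which under-sampling variant was used, so this is one plausible reading; the record layout, the `readmitted` key and the seed are hypothetical.

```python
import random


def undersample(records, label_key="readmitted", seed=0):
    """Randomly down-sample each class to the size of the smallest class."""
    rng = random.Random(seed)
    by_class = {}
    for record in records:
        by_class.setdefault(record[label_key], []).append(record)
    smallest = min(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        # rng.sample draws without replacement, so no record is duplicated.
        balanced.extend(rng.sample(group, smallest))
    rng.shuffle(balanced)
    return balanced


# Hypothetical discharge records: 8 not readmitted, 2 readmitted.
records = [{"id": i, "readmitted": i >= 8} for i in range(10)]
balanced = undersample(records)
print(len(balanced))  # 4
```

Balancing should be applied to training data only, so that evaluation still reflects the true readmission rate.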

