A Comparison of Text Classification Techniques Applied to Indonesian Text Dataset

Author(s):  
Umniy Salamah

Within an organization, statements containing opinions and complaints about a service or program it provides can be processed using machine learning, and the results can be used by the organization to improve its quality. This research attempted to classify reports from social media as complaint or non-complaint using two machine learning algorithms, Logistic Regression (LR) and eXtreme Gradient Boosting (XGBoost). The Logistic Regression model used CountVectorizer and TfidfVectorizer feature extraction. The XGBoost algorithm has multiple parameters, so it can be improved by tuning them, i.e. eta (learning rate), gamma, max_depth, min_child_weight, subsample, colsample_bytree and alpha. The best parameter values found for XGBoost were 'reg_alpha': 0.01, 'colsample_bytree': 0.9, 'learning_rate': 0.5, 'min_child_weight': 1, 'subsample': 0.8, 'max_depth': 3, 'gamma': 0.0, for which the computational time was 13870.012468 and the best accuracy achieved was 0.927943760984. Furthermore, the performance evaluation results for Logistic Regression using TfidfVectorizer and CountVectorizer feature extraction were 0.9181 and 0.9356.
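To make the difference between the two feature extractions concrete, the following is a minimal pure-Python sketch, not the authors' pipeline. It assumes whitespace tokenization, and the smoothed IDF and L2 normalization that scikit-learn documents as the TfidfVectorizer defaults. The toy reports are hypothetical and are not taken from the paper's dataset.

```python
import math
from collections import Counter


def count_vectors(docs):
    """Bag-of-words counts: one {term: count} mapping per document."""
    return [Counter(doc.lower().split()) for doc in docs]


def tfidf_vectors(docs):
    """TF-IDF weights with smoothed IDF and L2 normalisation per document."""
    counts = count_vectors(docs)
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for c in counts for term in c)
    idf = {t: math.log((1 + n_docs) / (1 + d)) + 1.0 for t, d in df.items()}
    vectors = []
    for c in counts:
        weighted = {t: tf * idf[t] for t, tf in c.items()}
        norm = math.sqrt(sum(w * w for w in weighted.values())) or 1.0
        vectors.append({t: w / norm for t, w in weighted.items()})
    return vectors


# Hypothetical toy reports (Indonesian), for illustration only.
reports = [
    "layanan lambat sekali",       # "the service is very slow"
    "layanan bagus",               # "good service"
    "terima kasih layanan cepat",  # "thank you, fast service"
]
counts = count_vectors(reports)
tfidf = tfidf_vectors(reports)
```

In this toy corpus the word "layanan" occurs in every report, so TF-IDF weights it below the rarer "bagus", while the raw counts treat both equally.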

2021 ◽  
Vol 9 ◽  
Author(s):  
Wenle Li ◽  
Jiaming Wang ◽  
Wencai Liu ◽  
Chan Xu ◽  
Wanying Li ◽  
...  

Background: Bone cement leakage is a common complication of percutaneous vertebroplasty and can be life-threatening in some cases. The aim of this study was to develop a machine learning model for predicting the risk of cement leakage in patients with osteoporotic vertebral compression fractures undergoing percutaneous vertebroplasty. Furthermore, we developed an online calculator for clinical application. Methods: This was a retrospective study of 385 patients who had osteoporotic vertebral compression fractures and underwent surgery at the Department of Spine Surgery, Liuzhou People's Hospital, from June 2016 to June 2018. Combining the patients' clinical characteristics, we applied six machine learning (ML) algorithms to develop models predicting the risk of bone cement leakage: logistic regression (LR), gradient boosting machine (GBM), extreme gradient boosting (XGB), random forest (RF), decision tree (DT) and multilayer perceptron (MLP). We evaluated the models with ten-fold cross-validation, calculated the area under the curve (AUC) of each of the six models, and selected the model with the highest AUC as the best-performing model to build the web calculator. Results: Multivariate logistic regression analysis of the 385 subjects showed that injection volume of bone cement, surgery time and multiple vertebral fracture were all independent predictors of bone cement leakage. Furthermore, a heatmap revealed the relative proportions of the 15 clinical variables. In bone cement leakage prediction, the AUC of the six ML algorithms ranged from 0.633 to 0.898; the RF model had the highest AUC (0.898) and was selected as the best-performing model. An ML web calculator (https://share.streamlit.io/liuwencai0/pvp_leakage/main/pvp_leakage) was developed to estimate the risk of bone cement leakage for each patient undergoing vertebroplasty. Conclusion: Our ML model achieved good prediction of the occurrence of bone cement leakage. The web calculator based on the RF model can help orthopedists make more individualized and rational clinical strategies.
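The evaluation procedure described above (ten-fold cross-validation, then choosing the model with the highest AUC) can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's code: the folds are not stratified, folds containing only one class are skipped, and `fit_predict` is a hypothetical caller-supplied function that trains on the training indices and returns scores for the test indices.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle the sample indices once and deal them into k near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def auc(labels, scores):
    """AUC as the chance that a positive outranks a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def cross_validated_auc(fit_predict, labels, k=10, seed=0):
    """Mean AUC over the folds that contain both classes."""
    folds = k_fold_indices(len(labels), k, seed)
    fold_aucs = []
    for i, test_idx in enumerate(folds):
        test_labels = [labels[j] for j in test_idx]
        if len(set(test_labels)) < 2:
            continue  # AUC is undefined when a fold has a single class
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        fold_aucs.append(auc(test_labels, fit_predict(train_idx, test_idx)))
    if not fold_aucs:
        raise ValueError("no fold contained both classes")
    return sum(fold_aucs) / len(fold_aucs)


def select_best_model(mean_auc_by_model):
    """Return the name of the model with the highest mean AUC."""
    return max(mean_auc_by_model, key=mean_auc_by_model.get)
```

With 385 patients and k=10, the splitter yields five folds of 39 and five folds of 38 samples.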


PLoS ONE ◽  
2021 ◽  
Vol 16 (3) ◽  
pp. e0247866
Author(s):  
Arman Kilic ◽  
Daniel Dochtermann ◽  
Rema Padman ◽  
James K. Miller ◽  
Artur Dubrawski

Risk models have historically displayed only moderate predictive performance in estimating mortality risk in left ventricular assist device therapy. This study evaluated whether machine learning can improve risk prediction for left ventricular assist devices. Primary durable left ventricular assist devices reported in the Interagency Registry for Mechanically Assisted Circulatory Support between March 1, 2006 and December 31, 2016 were included. The study cohort was randomly divided 3:1 into training and testing sets. Logistic regression and machine learning models (extreme gradient boosting) were created in the training set for 90-day and 1-year mortality and their performance was evaluated after bootstrapping with 1000 replications in the testing set. Differences in model performance were also evaluated in cases of concordance versus discordance in predicted risk between logistic regression and extreme gradient boosting as defined by equal size patient tertiles. A total of 16,120 patients were included. Calibration metrics were comparable between logistic regression and extreme gradient boosting. C-index was improved with extreme gradient boosting (90-day: 0.707 [0.683–0.730] versus 0.740 [0.717–0.762] and 1-year: 0.691 [0.673–0.710] versus 0.714 [0.695–0.734]; each p<0.001). Net reclassification index analysis similarly demonstrated an improvement of 48.8% and 36.9% for 90-day and 1-year mortality, respectively, with extreme gradient boosting (each p<0.001). Concordance in predicted risk between logistic regression and extreme gradient boosting resulted in substantially improved c-index for both logistic regression and extreme gradient boosting (90-day logistic regression 0.536 versus 0.752, 1-year logistic regression 0.555 versus 0.726, 90-day extreme gradient boosting 0.623 versus 0.772, 1-year extreme gradient boosting 0.613 versus 0.742, each p<0.001). 
These results demonstrate that machine learning can improve risk model performance for durable left ventricular assist devices, both independently and as an adjunct to logistic regression.
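As an illustration of how a bracketed interval around a c-index can be obtained by bootstrapping, here is a minimal sketch. It assumes mortality at a fixed horizon is treated as a binary outcome, in which case the c-index equals the AUC, and it uses a percentile interval. The abstract does not state either choice, and the toy risks are hypothetical.

```python
import random


def c_index(labels, risks):
    """Concordance for a binary outcome: share of (event, non-event) pairs
    in which the event case has the higher predicted risk (ties count 0.5)."""
    pos = [r for y, r in zip(labels, risks) if y == 1]
    neg = [r for y, r in zip(labels, risks) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_c_index(labels, risks, n_replications=1000, seed=0):
    """Point estimate plus a 95% percentile bootstrap interval."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_replications:
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # keep only resamples that contain both classes
            stats.append(c_index(ys, [risks[i] for i in idx]))
    stats.sort()
    lower = stats[int(0.025 * n_replications)]
    upper = stats[int(0.975 * n_replications) - 1]
    return c_index(labels, risks), (lower, upper)


# Hypothetical predicted risks for 10 survivors (0) and 10 deaths (1).
toy_labels = [0] * 10 + [1] * 10
toy_risks = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45, 0.60,
             0.50, 0.55, 0.65, 0.70, 0.75, 0.80, 0.85, 0.90, 0.95, 0.42]
estimate, (ci_low, ci_high) = bootstrap_c_index(toy_labels, toy_risks)
```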


Diagnostics ◽  
2021 ◽  
Vol 11 (10) ◽  
pp. 1909
Author(s):  
Dougho Park ◽  
Eunhwan Jeong ◽  
Haejong Kim ◽  
Hae Wook Pyun ◽  
Haemin Kim ◽  
...  

Background: Functional outcomes after acute ischemic stroke are of great concern to patients and their families, as well as physicians and surgeons who make the clinical decisions. We developed machine learning (ML)-based functional outcome prediction models in acute ischemic stroke. Methods: This retrospective study used a prospective cohort database. A total of 1066 patients with acute ischemic stroke between January 2019 and March 2021 were included. Variables such as demographic factors, stroke-related factors, laboratory findings, and comorbidities were utilized at the time of admission. Five ML algorithms were applied to predict a favorable functional outcome (modified Rankin Scale 0 or 1) at 3 months after stroke onset. Results: Regularized logistic regression showed the best performance with an area under the receiver operating characteristic curve (AUC) of 0.86. Support vector machines represented the second-highest AUC of 0.85 with the highest F1-score of 0.86, and finally, all ML models applied achieved an AUC > 0.8. The National Institute of Health Stroke Scale at admission and age were consistently the top two important variables for generalized logistic regression, random forest, and extreme gradient boosting models. Conclusions: ML-based functional outcome prediction models for acute ischemic stroke were validated and proven to be readily applicable and useful.
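For readers unfamiliar with regularized logistic regression, the sketch below fits one by plain gradient descent. It is an illustration only: the abstract does not say which penalty was used, so an L2 (ridge) penalty is assumed, and the single standardized feature and its values are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_regularized_lr(X, y, l2=1.0, lr=0.1, n_iter=500):
    """Logistic regression trained by gradient descent with an L2 penalty."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        # The L2 term pulls weights toward zero; the intercept is not penalised.
        w = [wj - lr * (gj + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


# Hypothetical standardised feature; 1 = favourable outcome (mRS 0 or 1).
X = [[-2.0], [-1.0], [1.0], [2.0]]
y = [0, 0, 1, 1]
w_plain, b_plain = fit_regularized_lr(X, y, l2=0.0)
w_reg, b_reg = fit_regularized_lr(X, y, l2=1.0)
```

On this separable toy data the unpenalized weight keeps growing with more iterations, while the penalized weight settles at a smaller value, which is the shrinkage that regularization is meant to provide.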


2021 ◽  
Author(s):  
Lifan Zhang ◽  
Canzheng Wei ◽  
Yunxia Feng ◽  
Aijia Ma ◽  
Yan Kang

Background: Acute kidney injury (AKI) is a severe and harmful syndrome in the intensive care unit. The purpose of this study is to develop a prediction model that predicts whether patients with AKI stage 1/2 will progress to AKI stage 3. Methods: Patients in the Medical Information Mart for Intensive Care (MIMIC-III) database who had AKI stage 1/2 at their first AKI diagnosis were included. We excluded patients who had undergone renal replacement therapy (RRT) or progressed to AKI stage 3 within 72 hours of the first AKI diagnosis. We also excluded patients with chronic kidney disease (CKD). We used logistic regression and the machine learning method extreme gradient boosting (XGBoost) to build two models that predict which patients will progress to AKI stage 3. The models were evaluated by cross-validation, receiver operating characteristic (ROC) curves, and precision-recall curves (PRC). Results: We included 25711 patients, of whom 2130 (8.3%) progressed to AKI stage 3. Creatinine, multiple organ failure syndromes (MODS), blood urea nitrogen (BUN), sepsis, and respiratory failure were the most important predictors of AKI progression. The XGBoost model performed better than the logistic regression model in predicting progression to AKI stage 3 (AU-ROC, 0.926; 95% CI, 0.917 to 0.931 vs. 0.784; 95% CI, 0.771 to 0.796, respectively). Conclusions: The XGBoost model can identify patients with AKI progression better than the logistic regression model. Machine learning techniques may improve predictive modeling in medical research. Keywords: Acute kidney injury; Critical care; Logistic Models; Extreme gradient boosting
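Because only a small share of patients progressed to stage 3, precision-recall curves are a useful complement to ROC curves here. The following minimal sketch shows how the points of such a curve are computed; the toy labels and scores are hypothetical. The prevalence computed from the counts in the abstract is the precision that a classifier ranking patients at random would be expected to achieve.

```python
def precision_recall_points(labels, scores):
    """Return (threshold, precision, recall) for every distinct score,
    predicting 'positive' whenever score >= threshold."""
    n_pos = sum(labels)
    points = []
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 0)
        points.append((t, tp / (tp + fp), tp / n_pos))
    return points


# Hypothetical toy data: 1 = progressed to AKI stage 3.
toy_labels = [1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.3, 0.1]
points = precision_recall_points(toy_labels, toy_scores)

# Prevalence from the counts reported in the abstract.
prevalence = 2130 / 25711
```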


Author(s):  
He Yang ◽  
Emma Li ◽  
Yi Fang Cai ◽  
Jiapei Li ◽  
George X. Yuan

The purpose of this paper is to establish a framework for extracting early warning risk features for predicting financial distress, based on the XGBoost model and SHAP. How early warning risk features are constructed is well known to be very important for predicting the financial distress of companies. Compared with traditional statistical methods, data-driven machine learning for financial early warning achieves better prediction accuracy, but it also brings difficulties, such as models that may not be easy to explain. Recently, eXtreme Gradient Boosting (XGBoost), an ensemble learning algorithm based on gradient boosting, has become a hot topic in machine learning research due to its strong ability to recognize nonlinear information and its high prediction accuracy in practice. In this study, the XGBoost algorithm is used to extract early warning features for predicting the financial distress of listed companies; 76 financial risk features from seven categories and 14 non-financial risk features from four categories are collected to establish an early warning system for the prediction of financial distress. In applications, we conduct empirical testing with respect to AUC, KS and Kappa; the numerical results show that, compared with the Logistic model, the XGBoost-based method established in this paper has a much better ability to predict the financial distress risk of listed companies. Moreover, under the framework of SHAP (SHapley Additive exPlanations), we are able to give a reasonable, visual explanation of the important risk features and the ways in which they affect financial distress. The results of this paper show that the XGBoost approach to modeling early warning features for financial distress not only achieves better prediction accuracy but is also explainable, which is significant for identifying early warnings of financial distress risk for listed companies in practice.
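Two of the evaluation measures named above are less familiar than AUC, so a minimal sketch of each follows. It assumes that KS denotes the Kolmogorov-Smirnov statistic (the largest gap between the true positive rate and the false positive rate over all score thresholds) and that Kappa denotes Cohen's kappa. The abstract does not define either term, so both readings are assumptions.

```python
def ks_statistic(labels, scores):
    """Largest |TPR - FPR| over all thresholds (predict 1 when score >= t)."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = 0.0
    for t in set(scores):
        tpr = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= t) / n_pos
        fpr = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= t) / n_neg
        best = max(best, abs(tpr - fpr))
    return best


def cohen_kappa(labels, predictions):
    """Agreement between labels and predictions, corrected for chance.

    Undefined (division by zero) when chance agreement is already 1,
    e.g. when both lists contain a single identical class.
    """
    n = len(labels)
    observed = sum(1 for y, p in zip(labels, predictions) if y == p) / n
    expected = sum(
        (labels.count(c) / n) * (predictions.count(c) / n)
        for c in set(labels) | set(predictions)
    )
    return (observed - expected) / (1 - expected)
```

For example, with labels [1, 1, 0, 0] and predictions [1, 0, 0, 0], the observed agreement is 0.75 and the chance agreement is 0.5, so kappa is 0.5.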


2021 ◽  
Vol 10 (9) ◽  
pp. 1875
Author(s):  
I-Min Chiu ◽  
Chi-Yung Cheng ◽  
Wun-Huei Zeng ◽  
Ying-Hsien Huang ◽  
Chun-Hung Richard Lin

Background: The aim of this study was to develop and evaluate a machine learning (ML) model to predict invasive bacterial infections (IBIs) in young febrile infants visiting the emergency department (ED). Methods: This retrospective study was conducted in the EDs of three medical centers across Taiwan from 2011 to 2018. We included patients aged 0–60 days who visited the ED with clinical symptoms of fever. We developed models with three different ML algorithms, logistic regression (LR), support vector machine (SVM), and extreme gradient boosting (XGBoost), and compared their performance at predicting IBIs with that of a previously validated scoring system (IBI score). Results: During the study period, 4211 patients were included, of whom 126 (3.1%) had an IBI. Through the feature selection process, eight, five, and seven features were used in the LR, SVM, and XGBoost models, respectively. The ML models achieved a better AUROC value when predicting IBIs in young infants compared with the IBI score (LR: 0.85 vs. SVM: 0.84 vs. XGBoost: 0.85 vs. IBI score: 0.70, p-value < 0.001). Using a cost-sensitive learning algorithm, all ML models showed better specificity in predicting IBIs at a 90% sensitivity level compared with an IBI score > 2 (LR: 0.59 vs. SVM: 0.60 vs. XGBoost: 0.57 vs. IBI score > 2: 0.43, p-value < 0.001). Conclusions: All ML models developed in this study outperformed the traditional scoring system in stratifying low-risk febrile infants after the sensitivity level was standardized.
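Comparing models at a fixed sensitivity means choosing, for each model, the score threshold at which sensitivity first reaches the target and then reading off the specificity at that threshold. The sketch below illustrates this threshold selection only. It does not reproduce the cost-sensitive learning step mentioned in the abstract, and the toy scores are hypothetical.

```python
def specificity_at_sensitivity(labels, scores, target_sensitivity=0.90):
    """Return (threshold, specificity) at the highest threshold whose
    sensitivity reaches the target, predicting 1 when score >= threshold."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    # Lowering the threshold raises sensitivity and lowers specificity, so the
    # first qualifying threshold (scanning downward) has the best specificity.
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= t)
        if tp / n_pos >= target_sensitivity:
            tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < t)
            return t, tn / n_neg
    raise ValueError("no scores supplied")


# Hypothetical scores: ten infants with IBI (1) and ten without (0).
toy_labels = [1] * 10 + [0] * 10
toy_scores = [0.95, 0.90, 0.85, 0.80, 0.75, 0.70, 0.65, 0.60, 0.55, 0.15,
              0.50, 0.45, 0.40, 0.35, 0.30, 0.25, 0.20, 0.10, 0.05, 0.58]
threshold, specificity = specificity_at_sensitivity(toy_labels, toy_scores)
```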


2020 ◽  
Vol 2020 ◽  
pp. 1-8
Author(s):  
Cheng Qu ◽  
Lin Gao ◽  
Xian-qiang Yu ◽  
Mei Wei ◽  
Guo-quan Fang ◽  
...  

Background. Acute kidney injury (AKI) has long been recognized as a common and important complication of acute pancreatitis (AP). In this study, machine learning (ML) techniques were used to establish predictive models for AKI in AP patients during hospitalization. This is a retrospective review of prospectively collected data from AP patients admitted to our department within one week after the onset of abdominal pain, from January 2014 to January 2019. Eighty patients developed AKI after admission (AKI group) and 254 patients did not (non-AKI group). With the provision of additional information such as demographic characteristics or laboratory data, support vector machine (SVM), random forest (RF), classification and regression tree (CART), and extreme gradient boosting (XGBoost) were used to build AKI prediction models, whose predictive performance was compared with that of the classic model using logistic regression (LR). Among the machine learning models, XGBoost performed best in predicting AKI, with an AUC of 91.93%. The AUC of logistic regression analysis was 87.28%. The present findings suggest that, compared with the classical logistic regression model, machine learning models using features that can be easily obtained at admission performed better in predicting AKI in AP patients.


2021 ◽  
Author(s):  
Michał Kruczkowski ◽  
Anna Drabik-Kruczkowska ◽  
Anna Marciniak ◽  
Martyna Tarczewska ◽  
Monika Kosowska ◽  
...  

Cervical cancer is one of the most common cancers, and its early diagnosis is of the greatest importance. Unfortunately, many diagnoses are based on the subjective opinions of doctors; to date, there is no general measurement method with a calibrated standard. The problem can be solved with a measurement system that fuses an optoelectronic sensor with a machine learning algorithm to provide reliable assistance for doctors at the early diagnosis stage of cervical cancer. We present preliminary research on cervical cancer assessment using an optical sensor and a prediction algorithm. Since every material is characterized by a refractive index, measuring its value and detecting changes gives information about the state of the tissue. The optical measurements provided datasets for training and validating the analysis software. We present data preprocessing, machine learning results from three algorithms (Random Forest, eXtreme Gradient Boosting, Naïve Bayes), and an assessment of their performance in classifying tissue as healthy or sick. All three achieved high values (>89%) on the performance measures used. Our solution allows rapid sample measurement and automatic classification of the results, making it a potential support tool for doctors.


Author(s):  
Ren-qi Yao ◽  
Xin Jin ◽  
Guo-wei Wang ◽  
Yue Yu ◽  
Guo-sheng Wu ◽  
...  

Background: The incidence of postoperative sepsis is continually increasing, yet few studies have specifically focused on the risk factors and clinical outcomes associated with the development of sepsis after surgical procedures. The present study aimed to develop a mathematical model for predicting in-hospital mortality among patients with postoperative sepsis. Methods: Surgical patients in the Medical Information Mart for Intensive Care (MIMIC-III) database who simultaneously fulfilled the Sepsis 3.0 and Agency for Healthcare Research and Quality (AHRQ) criteria at ICU admission were included. We employed both extreme gradient boosting (XGBoost) and a stepwise logistic regression model to predict in-hospital mortality among the included patients with postoperative sepsis. Model performance was then assessed in terms of discrimination and calibration. Results: We included 3713 patients who fulfilled our inclusion criteria, of whom 397 (10.7%) died during hospitalization and 3316 (89.3%) survived to discharge. Fluid-electrolyte disturbance, coagulopathy, renal replacement therapy (RRT), urine output, and cardiovascular surgery were important features related to in-hospital mortality. The XGBoost model performed better than the stepwise logistic regression model in both discriminatory ability (c-statistic, 0.835 [95% CI, 0.786 to 0.877] vs. c-statistic, 0.737 [95% CI, 0.688 to 0.786]) and goodness of fit (visualized by calibration curve). Conclusion: The XGBoost model appears to perform better in predicting hospital mortality among postoperative septic patients than the conventional stepwise logistic regression model. Machine learning-based algorithms might have significant application in the development of early warning systems for septic patients following major operations.

