A gradient boosted decision tree-based sentiment classification of Twitter data

Author(s):  
S. Neelakandan ◽  
D. Paulraj

People share their views, arguments and emotions about everyday life on social media (SM) platforms such as Twitter and Facebook. Twitter is an international micro-blogging service built around brief messages called tweets. Freestyle writing, incorrect grammar, typographical errors and abbreviations are among the sources of noise in this text. Sentiment analysis (SA) of users’ tweets and opinion mining (OM) of customer reviews are popular research topics. Texts are gathered from users’ tweets by means of OM, and automatic SA is performed with ternary classification into positive, neutral and negative. Ascertaining sentiment in Twitter data is challenging for researchers because of its limited size, misspellings, unstructured nature, abbreviations and slang. This paper proposes an efficient SA and sentiment classification (SC) of Twitter data using the Gradient Boosted Decision Tree (GBDT) classifier. First, the Twitter data undergoes pre-processing. Next, the pre-processed data is processed using HDFS MapReduce. Features are then extracted from the processed data, and efficient features are selected using the Improved Elephant Herd Optimization (I-EHO) technique. Score values are calculated for each of the selected features and passed to the classifier. Finally, the GBDT classifier classifies the data as negative, positive, or neutral. Experimental results are analyzed and compared with other conventional techniques to show the superior performance of the proposed method.
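The abstract does not say which pre-processing steps are applied. As a minimal sketch only, assuming a typical tweet clean-up (URL and mention removal, lowercasing, punctuation stripping), a hypothetical `normalize_tweet` helper might look like the following. The function name and the specific rules are assumptions for illustration, not the authors' method.

```python
import re

def normalize_tweet(text):
    """Return a list of lowercase word tokens from a raw tweet.

    Assumed clean-up rules (illustrative, not from the paper):
    drop URLs and @mentions, keep the word part of hashtags,
    lowercase, and strip everything except letters and apostrophes.
    """
    text = re.sub(r"https?://\S+", " ", text)   # remove URLs
    text = re.sub(r"@\w+", " ", text)           # remove @mentions
    text = text.replace("#", " ")               # keep the hashtag word itself
    text = re.sub(r"[^a-z\s']", " ", text.lower())  # drop digits and punctuation
    return text.split()

tokens = normalize_tweet("Loving the NEW phone!! http://t.co/abc @bob #happy")
print(tokens)
```

The resulting tokens would then feed whatever feature extraction and selection stage follows; the I-EHO selection and GBDT classification steps are not reproduced here.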

Water ◽  
2020 ◽  
Vol 12 (8) ◽  
pp. 2249
Author(s):  
Ghorban Mahtabi ◽  
Barkha Chaplot ◽  
Hazi Mohammad Azamathulla ◽  
Mahesh Pal

This paper presents a decision tree classification of hydraulic jumps over rough beds based on the approach Froude number, Fr1. Specifically, 581 data sets from the literature were analyzed: 280 for natural rough beds and 301 for artificial rough beds. The data were divided into four classes based on energy loss. A multi-layer neural network (NN) was used as a baseline for the decision tree classifier (J48). The results suggest that the J48 algorithm achieves better classification accuracy than the NN classifier. The classifier model had only four leaves and achieved an accuracy of 91.56%. The classification results also showed that the first class (A) of hydraulic jump over rough beds is approximately similar to that for the smooth bed. In the next three classes (B, C, and D), the upper values of Fr1 decreased relative to the smooth bed classes. Lastly, in class D, the upper value of Fr1 reduced to 7.45, which indicates that the shear stress (i.e., the energy loss) grows sharply with increasing Fr1. Put simply, bed roughness effectively increases the energy dissipation as Fr1 increases.
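For readers unfamiliar with the quantity, the approach Froude number can be computed from the upstream velocity and depth. The sketch below assumes a rectangular channel and uses the classical Bélanger relation, which holds for smooth horizontal beds only, as a reference point; the function and parameter names are hypothetical, and the paper's rough-bed class boundaries are not reproduced.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2

def froude_number(velocity_m_s, depth_m):
    """Approach Froude number Fr1 = V1 / sqrt(g * y1) for a rectangular channel."""
    return velocity_m_s / math.sqrt(G * depth_m)

def smooth_bed_depth_ratio(fr1):
    """Sequent depth ratio y2/y1 from the Belanger equation (smooth bed reference)."""
    return 0.5 * (math.sqrt(1.0 + 8.0 * fr1 ** 2) - 1.0)

fr1 = froude_number(6.0, 0.1)  # e.g. 6 m/s flow at 0.1 m depth
print(round(fr1, 3), round(smooth_bed_depth_ratio(fr1), 3))
```

A value of Fr1 above 1 indicates supercritical approach flow, which is the condition under which a hydraulic jump can form.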


2020 ◽  
Vol 2020 ◽  
pp. 1-13 ◽  
Author(s):  
Majid Nour ◽  
Kemal Polat

Hypertension (high blood pressure) is a common disease, and its early detection is important for early treatment. Hypertension is defined as systolic blood pressure higher than 140 mmHg or diastolic blood pressure higher than 90 mmHg. In this paper, to detect hypertension types from personal information and features, four machine learning (ML) methods, namely the C4.5 decision tree classifier (DTC), random forest, linear discriminant analysis (LDA), and linear support vector machine (LSVM), were used and compared with each other. To our knowledge, this is the first work in the literature to classify hypertension types using classification algorithms based on personal data. To examine how the choice of classifier affects the results, four different classifier algorithms were selected for this problem. The hypertension dataset has eight features, namely sex, age, height (cm), weight (kg), systolic blood pressure (mmHg), diastolic blood pressure (mmHg), heart rate (bpm), and BMI (kg/m²), to explain the hypertension status, and four classes: normal (healthy), prehypertension, stage-1 hypertension, and stage-2 hypertension. On the hypertension dataset, the classification accuracies obtained are 99.5%, 99.5%, 96.3%, and 92.7% using the C4.5 decision tree classifier, random forest, LDA, and LSVM, respectively. The results show that ML methods could be confidently used in the automatic determination of hypertension types.
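The threshold definition and the BMI feature mentioned above can be written down directly. This is a minimal sketch with hypothetical function names; it encodes only the binary definition quoted in the abstract (above 140 mmHg systolic or above 90 mmHg diastolic) and the standard BMI formula. The cut-offs for the four classes are not given in the abstract and are not guessed here.

```python
def bmi(weight_kg, height_cm):
    """Body mass index in kg/m^2, from weight in kg and height in cm."""
    height_m = height_cm / 100.0
    return weight_kg / (height_m ** 2)

def is_hypertensive(systolic_mmhg, diastolic_mmhg):
    """Binary rule quoted in the abstract: systolic > 140 or diastolic > 90."""
    return systolic_mmhg > 140 or diastolic_mmhg > 90

print(round(bmi(70, 175), 2))    # BMI for 70 kg at 175 cm
print(is_hypertensive(150, 85))  # elevated systolic reading
print(is_hypertensive(120, 80))  # both readings below the thresholds
```

A learned classifier such as C4.5 would derive its own split points from the data rather than use fixed rules like these.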


Oncogene ◽  
2021 ◽  
Author(s):  
Dvir Netanely ◽  
Stav Leibou ◽  
Roma Parikh ◽  
Neta Stern ◽  
Hananya Vaknine ◽  
...  

Cutaneous melanoma tumors are heterogeneous and show diverse responses to treatment. Identification of robust molecular biomarkers for classifying melanoma tumors into clinically distinct and homogenous subtypes is crucial for improving the diagnosis and treatment of the disease. In this study, we present a classification of melanoma tumors into four subtypes with different survival profiles based on three distinct gene expression signatures: keratin, immune, and melanogenesis. The melanogenesis expression pattern includes several genes that are characteristic of the melanosome organelle and correlates with worse survival, suggesting the involvement of melanosomes in melanoma aggression. We experimentally validated the secretion of melanosomes into surrounding tissues by melanoma tumors, which potentially affects the lethality of metastasis. We propose a simple molecular decision tree classifier for predicting a tumor’s subtype based on representative genes from the three identified signatures. Key predictor genes were experimentally validated on melanoma samples taken from patients with varying survival outcomes. Our three-pattern approach for classifying melanoma tumors can contribute to advancing the understanding of melanoma variability and promote accurate diagnosis, prognostication, and treatment.


Modelling sentiment with context is one of the most important parts of sentiment analysis. Various classifiers help in detecting and classifying it. Taking sarcasm into account makes sentiment detection more accurate. However, detecting sarcasm in people's reviews is a challenging task, and undetected sarcasm may lead to wrong decisions or misclassification. This paper uses decision tree and random forest classifiers and compares their performance. Here we treat the random forest as a hybrid decision tree classifier. We argue, with appropriate reasoning, that the random forest classifier performs better than an ordinary decision tree classifier.


Loan default prediction for social lending is an emerging area of research in predictive analytics. The need for large amounts of data and the few available studies on current loan default prediction models for social lending suggest that other viable and easily implementable models should be investigated and developed. In view of this, this study developed a data mining model for predicting loan default among social lending patrons, specifically small business owners, using a boosted decision tree model. The United States Small Business Administration (USBA) publicly available loan administration dataset of 27 features and 899164 data instances was used in an 80:20 ratio for training and testing the model. Sixteen data features were finally used as predictors after data cleaning and feature engineering. The gradient boosting decision tree classifier recorded 99% accuracy, compared with 98% for the basic decision tree classifier. The model was further evaluated with (a) receiver operating characteristics (ROC) and area under curve (AUC), (b) cumulative accuracy profile (CAP), and (c) cumulative accuracy profile (CAP) under AUC. Each of these model performance evaluation metrics, especially ROC-AUC, showed the relationship between the true positives and false positives, which implies that the model is a good fit.
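To make the evaluation setup concrete, the sketch below shows an 80:20 split and a ROC-AUC computation in plain Python. It is an illustration under stated assumptions, not the study's code: the helper names, the fixed shuffle seed, and the toy labels and scores are all hypothetical. AUC is computed here as the fraction of (positive, negative) pairs in which the positive example receives the higher score, counting ties as half.

```python
import random

def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

train, test = train_test_split(list(range(10)))
print(len(train), len(test))
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

The pairwise form is quadratic in the number of examples, so for a dataset of the size reported here a rank-based implementation from a standard library would be the practical choice.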

