A gradient boosted decision tree-based sentiment classification of twitter data

International Journal of Wavelets Multiresolution and Information Processing ◽

10.1142/s0219691320500277 ◽

2020 ◽

Vol 18 (04) ◽

pp. 2050027

Author(s):

S. Neelakandan ◽

D. Paulraj

Keyword(s):

Decision Tree ◽

Opinion Mining ◽

Research Topic ◽

Sentiment Classification ◽

Decision Tree Classifier ◽

Twitter Data ◽

Tree Classifier ◽

Boosted Decision Tree ◽

Text Sentiment Analysis

People share their views, arguments and emotions about everyday life on social media (SM) platforms such as Twitter and Facebook. Twitter is an international micro-blogging service built around brief messages called tweets. Freestyle writing, incorrect grammar, typographical errors and abbreviations are common sources of noise in this text. Sentiment analysis (SA) of users’ tweets and opinion mining (OM) of customer reviews are both well-known research topics. Texts are gathered from users’ tweets for OM and automatic SA based on a ternary classification: positive, neutral and negative. Ascertaining sentiment in Twitter data is challenging because of the limited message length, misspellings, unstructured nature, abbreviations and slang. This paper proposes an efficient SA and sentiment classification (SC) of Twitter data using a Gradient Boosted Decision Tree (GBDT) classifier. First, the Twitter data is pre-processed. Next, the pre-processed data is processed using HDFS MapReduce. Features are then extracted from the processed data, and the most useful features are selected using the Improved Elephant Herd Optimization (I-EHO) technique. Score values are calculated for each selected feature and passed to the classifier. Finally, the GBDT classifier labels the data as negative, positive or neutral. Experimental results are analyzed and compared with conventional techniques to show the superior performance of the proposed method.
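
The boosting step in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single numeric feature score per tweet, squared-error loss, one-split trees (stumps), and hypothetical cut-offs of -0.5 and 0.5 for mapping the boosted score to the three labels. The toy data and all function names are illustrative only.

```python
from typing import List, Tuple

# A stump is (threshold, value_if_x_le_threshold, value_if_x_gt_threshold).
Stump = Tuple[float, float, float]


def fit_stump(xs: List[float], residuals: List[float]) -> Stump:
    """Fit a one-split regression tree to the residuals by least squares."""
    best, best_err = None, float("inf")
    for t in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        if not left or not right:
            continue  # a split must leave samples on both sides
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if err < best_err:
            best, best_err = (t, lv, rv), err
    if best is None:  # all feature values identical: fall back to a constant
        mean = sum(residuals) / len(residuals)
        best = (xs[0], mean, mean)
    return best


def stump_value(stump: Stump, x: float) -> float:
    t, lv, rv = stump
    return lv if x <= t else rv


def fit_boosted(xs: List[float], ys: List[float], rounds: int = 50, lr: float = 0.5):
    """Gradient boosting for squared loss: each stump fits the current residuals."""
    base = sum(ys) / len(ys)
    preds = [base] * len(xs)
    stumps = []
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, preds)]  # negative gradient
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + lr * stump_value(stump, x) for p, x in zip(preds, xs)]
    return base, lr, stumps


def score(model, x: float) -> float:
    base, lr, stumps = model
    return base + lr * sum(stump_value(s, x) for s in stumps)


def label(model, x: float) -> str:
    """Map the boosted score to a ternary label (cut-offs are assumptions)."""
    s = score(model, x)
    if s < -0.5:
        return "negative"
    if s > 0.5:
        return "positive"
    return "neutral"


# Toy data: hypothetical feature scores with targets -1, 0, +1.
XS = [-0.9, -0.7, -0.6, -0.1, 0.0, 0.1, 0.6, 0.8, 0.9]
YS = [-1.0, -1.0, -1.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
MODEL = fit_boosted(XS, YS)
print([label(MODEL, x) for x in (-0.8, 0.05, 0.7)])
```

A production system would more likely use a multi-class loss and a library implementation; the sketch only shows how each new tree corrects the residual error of the ensemble built so far.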

Classification of Hydraulic Jump in Rough Beds

Water ◽

10.3390/w12082249 ◽

2020 ◽

Vol 12 (8) ◽

pp. 2249

Author(s):

Ghorban Mahtabi ◽

Barkha Chaplot ◽

Hazi Mohammad Azamathulla ◽

Mahesh Pal

Keyword(s):

Decision Tree ◽

Hydraulic Jump ◽

Decision Tree Classifier ◽

Class A ◽

Class D ◽

Tree Classifier ◽

Improved Performance ◽

Bed Roughness ◽

Rough Beds

This paper presents a decision tree classification of hydraulic jumps over rough beds based on the approach Froude number, Fr1. Specifically, 581 datasets from the literature were analyzed. Of these, 280 were for natural rough beds and 301 were for artificial rough beds. The data were divided into four classes based on the energy losses. To compare the performance of the decision tree classifier (J48), a multi-layer neural network (NN) was used. The results suggest that the J48 algorithm achieves better classification accuracy than the NN classifier. Furthermore, the classifier model had only four leaves and achieved an accuracy of 91.56%. The classification results also showed that the first class (A) of hydraulic jump over rough beds is approximately similar to that for the smooth bed. Moreover, in the next three classes (B, C, and D), the upper values of Fr1 decreased with respect to the smooth bed classes. Lastly, in class D, the upper value of Fr1 reduced to 7.45, which indicates that the shear stress (i.e., the energy loss) grows sharply with increasing Fr1. Put simply, bed roughness effectively increases the energy dissipation with an increase in Fr1.
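
A four-leaf tree on a single feature amounts to three threshold tests, which the following sketch illustrates. The approach Froude number uses the standard definition Fr1 = v1 / sqrt(g * y1). The three split values are hypothetical placeholders, since the abstract does not report them; only the 7.45 upper value for class D comes from the text, and it is noted in a comment rather than used as a split.

```python
import math
from bisect import bisect_left

G = 9.81  # gravitational acceleration, m/s^2


def approach_froude(velocity_m_s: float, depth_m: float) -> float:
    """Approach Froude number Fr1 = v1 / sqrt(g * y1)."""
    return velocity_m_s / math.sqrt(G * depth_m)


# HYPOTHETICAL split points between classes A|B, B|C and C|D.
# The abstract gives no split values; it only states that the upper
# value of Fr1 in class D is 7.45 for rough beds.
SPLITS = [2.5, 4.5, 6.0]
CLASSES = ["A", "B", "C", "D"]


def classify_jump(fr1: float, splits=SPLITS) -> str:
    """Four-leaf threshold tree: the leaf index is the number of splits below fr1."""
    return CLASSES[bisect_left(splits, fr1)]


fr1 = approach_froude(velocity_m_s=3.0, depth_m=0.1)
print(round(fr1, 2), classify_jump(fr1))
```

Passing a different `splits` list lets the same function represent thresholds learned from data.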

An Efficient Detection of HCC-recurrence in Clinical Data Processing using Boosted Decision Tree Classifier

Procedia Computer Science ◽

10.1016/j.procs.2020.03.196 ◽

2020 ◽

Vol 167 ◽

pp. 193-204

Author(s):

P. Radha ◽

R. Divya

Keyword(s):

Decision Tree ◽

Data Processing ◽

Clinical Data ◽

Decision Tree Classifier ◽

Efficient Detection ◽

Tree Classifier ◽

Boosted Decision Tree ◽

Hcc Recurrence

Speaker recognition using adaptively boosted decision tree classifier

IEEE International Conference on Acoustics Speech and Signal Processing ◽

10.1109/icassp.2002.5743678 ◽

2002 ◽

Cited By ~ 3

Author(s):

Say Wei Foo ◽

Eng Guan Lim

Keyword(s):

Decision Tree ◽

Speaker Recognition ◽

Decision Tree Classifier ◽

Tree Classifier ◽

Boosted Decision Tree

Automatic Classification of Hypertension Types Based on Personal Features by Machine Learning Algorithms

Mathematical Problems in Engineering ◽

10.1155/2020/2742781 ◽

2020 ◽

Vol 2020 ◽

pp. 1-13 ◽

Cited By ~ 4

Author(s):

Majid Nour ◽

Kemal Polat

Keyword(s):

Machine Learning ◽

Blood Pressure ◽

Random Forest ◽

Decision Tree ◽

Systolic Blood Pressure ◽

Diastolic Blood Pressure ◽

Decision Tree Classifier ◽

Tree Classifier ◽

C4.5 Decision Tree

Hypertension (high blood pressure) is a common and serious disease, and its early detection is important for early treatment. Hypertension is defined as systolic blood pressure higher than 140 mmHg or diastolic blood pressure higher than 90 mmHg. In this paper, four machine learning (ML) methods, namely the C4.5 decision tree classifier (DTC), random forest, linear discriminant analysis (LDA), and linear support vector machine (LSVM), are used to detect hypertension types from personal information and features, and are then compared with each other. We are the first in the literature to classify hypertension types using classification algorithms based on personal data. To examine how the results vary with classifier type, four different classifier algorithms were selected for this problem. The hypertension dataset has eight features describing hypertension status: sex, age, height (cm), weight (kg), systolic blood pressure (mmHg), diastolic blood pressure (mmHg), heart rate (bpm), and BMI (kg/m²). It has four classes: normal (healthy), prehypertension, stage-1 hypertension, and stage-2 hypertension. On this dataset, the classification accuracies obtained are 99.5%, 99.5%, 96.3%, and 92.7% for the C4.5 decision tree classifier, random forest, LDA, and LSVM, respectively. These results show that ML methods can be used with confidence for the automatic determination of hypertension types.
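
As a small illustration of the inputs described above, the sketch below holds the eight features in a record, derives BMI from height and weight, and applies the 140/90 mmHg definition quoted in the abstract. The class and field names are assumptions for illustration. The cut-offs separating prehypertension, stage-1 and stage-2 are not given in the abstract, so they are not modelled here.

```python
from dataclasses import dataclass


def bmi(weight_kg: float, height_cm: float) -> float:
    """Body mass index in kg/m^2 from weight in kg and height in cm."""
    height_m = height_cm / 100.0
    return weight_kg / (height_m * height_m)


@dataclass
class PatientRecord:
    """The eight features listed in the abstract (field names are illustrative)."""
    sex: str
    age: int
    height_cm: float
    weight_kg: float
    systolic_mmhg: float
    diastolic_mmhg: float
    heart_rate_bpm: float

    @property
    def bmi(self) -> float:
        return bmi(self.weight_kg, self.height_cm)

    def is_hypertensive(self) -> bool:
        # Definition quoted in the abstract: SBP > 140 mmHg or DBP > 90 mmHg.
        return self.systolic_mmhg > 140 or self.diastolic_mmhg > 90


patient = PatientRecord("F", 52, 175.0, 70.0, 150.0, 80.0, 72.0)
print(round(patient.bmi, 2), patient.is_hypertensive())
```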

Decision Tree Classifier for Classification of Plant and Animal Micro RNA’s

Communications in Computer and Information Science - Computational Intelligence and Intelligent Systems ◽

10.1007/978-3-642-04962-0_51 ◽

2009 ◽

pp. 443-451 ◽

Cited By ~ 4

Author(s):

Bhasker Pant ◽

Kumud Pant ◽

K. R. Pardasani

Keyword(s):

Decision Tree ◽

Decision Tree Classifier ◽

Tree Classifier

Classification of epileptiform EEG using a hybrid system based on decision tree classifier and fast Fourier transform

Applied Mathematics and Computation ◽

10.1016/j.amc.2006.09.022 ◽

2007 ◽

Vol 187 (2) ◽

pp. 1017-1026 ◽

Cited By ~ 264

Author(s):

Kemal Polat ◽

Salih Güneş

Keyword(s):

Fourier Transform ◽

Decision Tree ◽

Fast Fourier Transform ◽

Hybrid System ◽

Decision Tree Classifier ◽

Tree Classifier

Classification of Sleep Apneas using Decision Tree Classifier

2021 5th International Conference on Intelligent Computing and Control Systems (ICICCS) ◽

10.1109/iciccs51141.2021.9432197 ◽

2021 ◽

Author(s):

Remalli Rohan ◽

L. V. Rajani Kumari

Keyword(s):

Decision Tree ◽

Decision Tree Classifier ◽

Tree Classifier

Classification of node-positive melanomas into prognostic subgroups using keratin, immune, and melanogenesis expression patterns

Oncogene ◽

10.1038/s41388-021-01665-0 ◽

2021 ◽

Author(s):

Dvir Netanely ◽

Stav Leibou ◽

Roma Parikh ◽

Neta Stern ◽

Hananya Vaknine ◽

...

Keyword(s):

Gene Expression ◽

Decision Tree ◽

Cutaneous Melanoma ◽

Expression Patterns ◽

Accurate Diagnosis ◽

Survival Outcomes ◽

Decision Tree Classifier ◽

Pattern Approach ◽

Tree Classifier

Cutaneous melanoma tumors are heterogeneous and show diverse responses to treatment. Identification of robust molecular biomarkers for classifying melanoma tumors into clinically distinct and homogenous subtypes is crucial for improving the diagnosis and treatment of the disease. In this study, we present a classification of melanoma tumors into four subtypes with different survival profiles based on three distinct gene expression signatures: keratin, immune, and melanogenesis. The melanogenesis expression pattern includes several genes that are characteristic of the melanosome organelle and correlates with worse survival, suggesting the involvement of melanosomes in melanoma aggression. We experimentally validated the secretion of melanosomes into surrounding tissues by melanoma tumors, which potentially affects the lethality of metastasis. We propose a simple molecular decision tree classifier for predicting a tumor’s subtype based on representative genes from the three identified signatures. Key predictor genes were experimentally validated on melanoma samples taken from patients with varying survival outcomes. Our three-pattern approach for classifying melanoma tumors can contribute to advancing the understanding of melanoma variability and promote accurate diagnosis, prognostication, and treatment.

Random Forest: A Hybrid Implementation for Sarcasm Detection in Public Opinion Mining

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.l3758.1081219 ◽

2019 ◽

Vol 8 (12) ◽

pp. 5022-5025

Keyword(s):

Decision Making ◽

Public Opinion ◽

Random Forest ◽

Decision Tree ◽

Opinion Mining ◽

Random Forest Classifier ◽

Decision Tree Classifier ◽

Wrong Decision ◽

Tree Classifier ◽

Better Than

Modelling sentiment with context is one of the most important parts of sentiment analysis. Various classifiers help in detecting and classifying it. Taking sarcasm into account makes sentiment detection more accurate. However, detecting sarcasm in people’s reviews is a challenging task, and undetected sarcasm may lead to wrong decision making or classification. This paper uses decision tree and random forest classifiers and compares the performance of both. Here we treat the random forest as a hybrid decision tree classifier. We propose, with appropriate reasoning, that the random forest classifier performs better than an ordinary decision tree classifier.
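
One way to see why a forest can outperform a single tree is majority voting: an individual tree's mistake is outvoted when most other trees disagree with it. The sketch below shows only that voting step, under stated assumptions. The three "trees" are hand-written rules over hypothetical features (`positive_words`, `negative_context`, `exclamations`), not trees learned from data, and none of them comes from the paper.

```python
from collections import Counter
from typing import Callable, Dict, List

Sample = Dict[str, object]
Tree = Callable[[Sample], str]


def majority_vote(predictions: List[str]) -> str:
    """Return the label predicted by the largest number of trees."""
    return Counter(predictions).most_common(1)[0][0]


def forest_predict(trees: List[Tree], sample: Sample) -> str:
    """Random-forest style prediction: every tree votes, the majority wins."""
    return majority_vote([tree(sample) for tree in trees])


# Three hypothetical trees, each looking at different features.
def tree_a(s: Sample) -> str:
    return "sarcastic" if s["positive_words"] > 2 and s["negative_context"] else "literal"


def tree_b(s: Sample) -> str:
    return "sarcastic" if s["exclamations"] >= 2 else "literal"


def tree_c(s: Sample) -> str:
    return "sarcastic" if s["negative_context"] else "literal"


TREES = [tree_a, tree_b, tree_c]
sample = {"positive_words": 3, "negative_context": True, "exclamations": 0}
print([t(sample) for t in TREES], forest_predict(TREES, sample))
```

An odd number of trees avoids ties for a two-label problem. A real random forest would also train each tree on a bootstrap sample with a random feature subset, which this sketch omits.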

A Boosted Decision Tree Model for Predicting Loan Default in P2P Lending Communities

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.a9626.109119 ◽

2019 ◽

Vol 9 (1) ◽

pp. 1257-1261

Keyword(s):

Small Business ◽

Decision Tree ◽

Decision Tree Classifier ◽

Tree Model ◽

Loan Default ◽

Accuracy Profile ◽

Default Prediction ◽

Tree Classifier ◽

Social Lending ◽

Boosted Decision Tree

Loan default prediction for social lending is an emerging area of research in predictive analytics. The need for large amounts of data and the few available studies on current loan default prediction models for social lending suggest that other viable and easily implementable models should be investigated and developed. In view of this, this study developed a data mining model for predicting loan default among social lending patrons, specifically small business owners, using a boosted decision tree model. The United States Small Business Administration (USBA) publicly available loan administration dataset of 27 features and 899164 data instances was used in an 80:20 ratio for training and testing the model. Sixteen data features were finally used as predictors after data cleaning and feature engineering. The gradient boosting decision tree classifier recorded 99% accuracy, compared with 98% for the basic decision tree classifier. The model is further evaluated with (a) receiver operating characteristics (ROC) and area under curve (AUC), (b) cumulative accuracy profile (CAP), and (c) cumulative accuracy profile (CAP) under AUC. Each of these model performance evaluation metrics, especially ROC-AUC, showed the relationship between the true positives and false positives, which implies the model is a good fit.
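
The 80:20 split and the ROC-AUC metric mentioned above can be sketched in a few lines of plain Python. This is an illustrative sketch, not the study's code: the shuffle seed, the function names, and the toy labels and scores are assumptions. AUC is computed here as the fraction of (positive, negative) pairs in which the positive instance receives the higher score, with ties counted as half.

```python
import random
from typing import List, Sequence, Tuple


def split_80_20(items: Sequence, seed: int = 0) -> Tuple[list, list]:
    """Shuffle reproducibly, then cut at 80% for training and 20% for testing."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 4 // 5
    return shuffled[:cut], shuffled[cut:]


def roc_auc(labels: List[int], scores: List[float]) -> float:
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


train, test = split_80_20(range(10))
print(len(train), len(test))

# Toy labels (1 = default) and predicted default probabilities.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

Applied to the 899164 instances reported in the abstract, this cut would give 719331 training rows and 179833 test rows. The pairwise AUC is quadratic in the number of rows, so a rank-based formula or a library routine would be preferable at that scale.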
