Python NLTK Sentiment Inspection using Naïve Bayes Classifier

The Web is one of the richest sources of consumer reviews and opinions. Many websites contain customer opinions in the form of reviews, blogs, discussion groups, and forums. This project focuses on customer reviews of restaurants. It predicts whether a given comment is positive or negative using supervised machine learning techniques. The project uses a dataset from the Kaggle website, which consists of comments and the type of each comment (i.e., positive or negative). The project studies classification algorithms and text mining approaches for identifying the type of a comment. First, duplicates are removed from the data set. This is followed by text pre-processing, which involves removing punctuation marks and stop words and then converting the whole text into vector format. The conversion from text to vectors is an essential step because English text cannot be used directly for the analysis, which relies on linear algebra. To work with this data, we convert it to vector format using CountVectorizer. Finally comes the classification step, for which we use the Naive Bayes algorithm. The classifier divides the data set into the two classes mentioned above. We take 70 percent of the data as the training set and 30 percent as the test set.
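The steps described in this abstract (duplicate removal, punctuation and stop word removal, word counts, Naive Bayes, 70/30 split) can be sketched roughly as follows. This is a minimal standard-library illustration, not the authors' code: the toy reviews, the tiny stop word list, and the function names `tokenize`, `train`, and `predict` are assumptions made for the example, and plain word counting stands in for CountVectorizer.

```python
import math
import random
import string
from collections import Counter

# Hypothetical toy reviews standing in for the Kaggle data (not from the source).
REVIEWS = [
    ("The food was great and the staff was friendly", "positive"),
    ("Great pasta, lovely service", "positive"),
    ("Loved the dessert, great place", "positive"),
    ("Friendly staff and tasty food", "positive"),
    ("Tasty pizza and lovely atmosphere", "positive"),
    ("The food was cold and the service was slow", "negative"),
    ("Terrible pasta, rude staff", "negative"),
    ("Slow service and bland food", "negative"),
    ("Rude waiter, terrible experience", "negative"),
    ("Bland pizza and cold dessert", "negative"),
    ("Great pasta, lovely service", "positive"),  # exact duplicate, removed below
]
STOP_WORDS = {"the", "and", "was", "a", "an", "is", "of"}  # tiny illustrative list


def tokenize(text):
    """Lower-case the text, strip punctuation, and drop stop words."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    return [word for word in cleaned.split() if word not in STOP_WORDS]


def train(rows):
    """Collect per-class word counts, class frequencies, and the vocabulary."""
    word_counts, class_counts, vocab = {}, Counter(), set()
    for text, label in rows:
        class_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for word in tokenize(text):
            counts[word] += 1
            vocab.add(word)
    return word_counts, class_counts, vocab


def predict(model, text):
    """Return the class with the highest Laplace-smoothed log posterior."""
    word_counts, class_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        counts = word_counts[label]
        denom = sum(counts.values()) + len(vocab)
        score = math.log(class_counts[label] / total_docs)  # class prior
        for word in tokenize(text):
            if word in vocab:  # ignore words never seen in training
                score += math.log((counts[word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


unique_rows = list(dict.fromkeys(REVIEWS))  # remove duplicates, keep order
random.Random(0).shuffle(unique_rows)       # fixed seed for repeatability
split = int(0.7 * len(unique_rows))         # 70 percent train, 30 percent test
train_rows, test_rows = unique_rows[:split], unique_rows[split:]
model = train(train_rows)
accuracy = sum(predict(model, t) == y for t, y in test_rows) / len(test_rows)
print(f"test accuracy on {len(test_rows)} held-out reviews: {accuracy:.2f}")
```

With so few reviews the reported accuracy is not meaningful; the sketch only shows how the stages connect.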

Sentiment Analysis using various Machine Learning and Deep Learning Techniques

Journal of the Nigerian Society of Physical Sciences ◽

10.46481/jnsps.2021.308 ◽

2021 ◽

pp. 385-394

Author(s):

V Umarani ◽

A Julian ◽

J Deepa

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Sentiment Analysis ◽

Naive Bayes ◽

Naïve Bayes ◽

Supervised Machine Learning ◽

Machine Learning Techniques ◽

Support Vector ◽

Analysis Process ◽

Learning Techniques

Sentiment analysis has gained a lot of attention from researchers in the last year because it has been widely applied to a variety of application domains such as business, government, education, sports, tourism, biomedicine, and telecommunication services. Sentiment analysis is an automated computational method for studying or evaluating sentiments, feelings, and emotions expressed as comments, feedback, or critiques. The sentiment analysis process can be automated using machine learning techniques, which analyse text patterns faster. Supervised machine learning is the most widely used approach to sentiment analysis. The proposed work discusses the flow of the sentiment analysis process and investigates common supervised machine learning techniques such as multinomial naive Bayes, Bernoulli naive Bayes, logistic regression, support vector machine, random forest, K-nearest neighbor, and decision tree, as well as deep learning techniques such as Long Short-Term Memory and Convolutional Neural Network. The work examines these learning methods on a standard data set. The experimental results demonstrate the performance of the classifiers in terms of precision, recall, F1-score, ROC curve, accuracy, running time, and k-fold cross-validation. They help in appreciating the novelty of the several deep learning techniques and give the user an overview for choosing the right technique for their application.
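As an illustration of the k-fold cross-validation mentioned in this abstract, the sketch below splits sample indices into k contiguous folds and averages a caller-supplied score over them. The function names `k_fold_indices` and `cross_validate` are hypothetical, and the abstract does not say how the authors shuffled or stratified their folds, so none of that is modelled here.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of k contiguous folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)  # spread the remainder
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


def cross_validate(score_fn, n_samples, k):
    """Average score_fn(train_indices, test_indices) over the k folds."""
    scores = [score_fn(tr, te) for tr, te in k_fold_indices(n_samples, k)]
    return sum(scores) / len(scores)


# Example: 10 samples in 3 folds; every sample is used for testing exactly once.
folds = list(k_fold_indices(10, 3))
for train_idx, test_idx in folds:
    print("train:", train_idx, "test:", test_idx)
```

In practice `score_fn` would fit a classifier on the training indices and return, for example, its accuracy on the test indices.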

Doc2Vec & Naïve Bayes: Learners’ Cognitive Presence Assessment through Asynchronous Online Discussion TQ Transcripts

International Journal of Emerging Technologies in Learning (iJET) ◽

10.3991/ijet.v14i08.9964 ◽

2019 ◽

Vol 14 (08) ◽

pp. 70 ◽

Cited By ~ 3

Author(s):

Hind Hayati ◽

Abdessamad Chanaa ◽

Mohammed Khalidi Idrissi ◽

Samir Bennani

Keyword(s):

Machine Learning ◽

Naive Bayes ◽

Naïve Bayes ◽

Online Discussions ◽

Cognitive Presence ◽

Machine Learning Techniques ◽

Face To Face ◽

Learning Techniques ◽

Bayes Algorithm ◽

Context Features

Due to the lack of face-to-face interaction in online learning environments, this article aims to give tutors the opportunity to understand and analyze learners’ cognitive behavior. To this end, we propose an automatic system to assess learners’ cognitive presence based on their social interactions within asynchronous online discussions. The system combines Natural Language Preprocessing, the Doc2Vec document embedding method, and machine learning techniques: we first apply transformations and preprocessing to the given transcripts, then apply Doc2Vec to represent each message as a vector, which is concatenated with LIWC and context features. These vectors are the input to the Naïve Bayes algorithm, a machine learning method that classifies the transcripts according to cognitive presence categories.

LARGE SCALE EXPERIMENTS WITH NAIVE BAYES AND DECISION TREES FOR FUNCTION TAGGING

International Journal of Artificial Intelligence Tools ◽

10.1142/s0218213008004011 ◽

2008 ◽

Vol 17 (03) ◽

pp. 483-499

Author(s):

MIHAI LINTEAN ◽

VASILE RUS

Keyword(s):

Decision Trees ◽

Large Scale ◽

Naive Bayes ◽

Classification Problem ◽

Naïve Bayes ◽

Machine Learning Techniques ◽

Parse Tree ◽

Data Set ◽

Learning Techniques ◽

Logical Subject

This paper describes the use of two machine learning techniques, naive Bayes and decision trees, to address the task of assigning function tags to nodes in a syntactic parse tree. Function tags are extra functional information, such as logical subject or predicate, that can be added to certain nodes in syntactic parse trees. We model the function tag assignment problem as a classification problem. Each function tag is regarded as a class, and the task is to find which class/tag a given node in a parse tree belongs to from a set of predefined classes/tags. The paper offers the first systematic comparison of the two techniques, naive Bayes and decision trees, for the task of function tag assignment. The comparison is based on a standardized data set, the Penn Treebank, a collection of sentences annotated with syntactic information including function tags. We found that decision trees generally outperform naive Bayes for the task of function tagging. Furthermore, this is the first large-scale evaluation of decision-tree-based solutions to the task of function tagging.

Borderline and Depression: A Thin EEG Line

Clinical EEG and Neuroscience ◽

10.1177/15500594211060830 ◽

2021 ◽

pp. 155005942110608

Author(s):

Jakša Vukojević ◽

Damir Mulc ◽

Ivana Kinder ◽

Eda Jovičić ◽

Krešimir Friganović ◽

...

Keyword(s):

Machine Learning ◽

Borderline Personality ◽

Machine Learning Techniques ◽

Major Depressive ◽

Everyday Clinical Practice ◽

Data Set ◽

Learning Techniques ◽

Eeg Recordings ◽

The Given ◽

Close Interrelationship

In everyday clinical practice, there is an ongoing debate about the nature of major depressive disorder (MDD) in patients with borderline personality disorder (BPD). The underlying research does not give us a clear distinction between those 2 entities, although depression is among the most frequent comorbid diagnoses in borderline personality patients. The notion that depression can be a distinct disorder but also a symptom in other psychopathologies led our team to try to delineate those 2 entities using 146 EEG recordings and machine learning. The utilized algorithms, developed solely for this purpose, could not differentiate those 2 entities, meaning that, given the data and methods used, the EEG of patients suffering from MDD did not differ significantly from that of patients diagnosed with both MDD and BPD. By increasing the data set and the spatiotemporal specificity, one could have a more sensitive diagnostic approach when using EEG recordings. To our knowledge, this is the first study that used EEG recordings and advanced machine learning techniques and further confirmed the close interrelationship between those 2 entities.

Sentiment analysis on Twitter Data-set using Naive Bayes algorithm

2016 2nd International Conference on Applied and Theoretical Computing and Communication Technology (iCATccT) ◽

10.1109/icatcct.2016.7912034 ◽

2016 ◽

Cited By ~ 20

Author(s):

Huma Parveen ◽

Shikha Pandey

Keyword(s):

Sentiment Analysis ◽

Naive Bayes ◽

Naïve Bayes ◽

Data Set ◽

Twitter Data ◽

Bayes Algorithm

An Intrusion Detection Model based on Hybrid Classification algorithm

MATEC Web of Conferences ◽

10.1051/matecconf/201824603027 ◽

2018 ◽

Vol 246 ◽

pp. 03027

Author(s):

Manfu Ma ◽

Wei Deng ◽

Hongtong Liu ◽

Xinmiao Yun

Keyword(s):

Intrusion Detection ◽

Detection Rate ◽

Naive Bayes ◽

Naïve Bayes ◽

Classification Algorithm ◽

Data Set ◽

Detection Model ◽

Performance Requirements ◽

Hybrid Classification ◽

Bayes Algorithm

Because a single classification algorithm cannot meet the performance requirements of intrusion detection, an intrusion detection model, KNN-NB, is proposed that is based on a hybrid of the KNN and naive Bayes classification algorithms, combining the strength of KNN on numerical values with the advantage of naive Bayes in the structure of data. The model first preprocesses the NSL-KDD intrusion detection data set. Then, exploiting the advantages of the KNN algorithm on data values, the model calculates the distance between the samples according to the feature items and selects the K samples with the smallest distance. Finally, naive Bayes is applied to obtain the final result. The experimental results on the NSL-KDD dataset show that the KNN-NB algorithm achieves more balanced performance than the traditional KNN and naive Bayes algorithms in terms of accuracy, sensitivity, false detection rate, specificity, and missed detection rate.
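One possible reading of the two-stage procedure in this abstract is sketched below: pick the K training samples nearest to the query, then classify the query with a naive Bayes model fitted on those neighbours only. The abstract does not state which naive Bayes variant is used or how the stages are coupled, so the Gaussian likelihood, the variance floor `var_floor`, the function names, and the toy data are all assumptions.

```python
import math
from collections import defaultdict


def k_nearest(train, query, k):
    """Return the k (features, label) pairs closest to query (Euclidean)."""
    return sorted(train, key=lambda row: math.dist(row[0], query))[:k]


def gaussian_nb_predict(rows, query, var_floor=1e-3):
    """Classify query with a Gaussian naive Bayes model fitted on rows only."""
    by_class = defaultdict(list)
    for features, label in rows:
        by_class[label].append(features)
    best_label, best_score = None, -math.inf
    for label in sorted(by_class):
        samples = by_class[label]
        score = math.log(len(samples) / len(rows))  # class prior
        for j, x in enumerate(query):
            column = [s[j] for s in samples]
            mean = sum(column) / len(column)
            var = sum((v - mean) ** 2 for v in column) / len(column)
            var = max(var, var_floor)  # avoid zero variance on tiny samples
            score += -0.5 * math.log(2 * math.pi * var)
            score -= (x - mean) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def knn_nb_predict(train, query, k=5):
    """Stage 1: keep the k nearest samples. Stage 2: naive Bayes on them."""
    return gaussian_nb_predict(k_nearest(train, query, k), query)


# Hypothetical two-feature records (not NSL-KDD data).
TRAIN = [
    ((0.0, 0.1), "normal"), ((0.2, 0.0), "normal"), ((0.1, 0.2), "normal"),
    ((5.0, 5.1), "attack"), ((5.2, 4.9), "attack"), ((4.9, 5.0), "attack"),
]
print(knn_nb_predict(TRAIN, (0.1, 0.1), k=4))
print(knn_nb_predict(TRAIN, (5.0, 5.0), k=4))
```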

Comparison of Various Classification Techniques for Prediction of the Agriculture Production Based on Different Parameters Rainfall, Temperature

Journal of University of Shanghai for Science and Technology ◽

10.51201/jusst/21/04247 ◽

2021 ◽

Vol 23 (04) ◽

pp. 356-372

Author(s):

Manpreet Kaur ◽

Dr. Dinesh Kumar

Keyword(s):

Naive Bayes ◽

Confusion Matrix ◽

Big Data Analysis ◽

Naïve Bayes ◽

The Other ◽

Machine Learning Techniques ◽

True Positive ◽

True Negative ◽

Classification Techniques ◽

Learning Techniques

Classification techniques based on various machine learning methods are used for Big Data analysis. They are useful for classification and, ultimately, for prediction, which helps decision makers make quality decisions. Various supervised and unsupervised learning techniques are capable of driving such analysis, which helps identify the relationships between the attributes needed to devise it. The supervised learning techniques considered here are SVM, logistic regression, KNN, Naïve Bayes, Tree, and neural network. These techniques are compared on several measures, for example AUC, CA, F1, recall, and precision. Naïve Bayes achieves the highest AUC and CA, which shows that it has a higher proportion of true positives and true negatives. The proposed technique has an accuracy of 81%, which is far above that of all the remaining techniques. The confusion matrix for Naïve Bayes has a true positive count of 729 and a true negative count of 103, showing that its true positive and true negative counts are far above those of the other techniques.
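To make the relationship between the confusion matrix and the reported measures concrete, the sketch below tallies true/false positives and negatives for a binary task and derives accuracy (CA), precision, recall, and F1 from them. The labels are made-up illustrative values, not the paper's data, and the function names are hypothetical.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Tally the four cells of a binary confusion matrix."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for truth, guess in zip(y_true, y_pred):
        if guess == positive:
            counts["tp" if truth == positive else "fp"] += 1
        else:
            counts["fn" if truth == positive else "tn"] += 1
    return counts


def summarize(counts):
    """Derive accuracy, precision, recall, and F1 from the four cells."""
    tp, fp, tn, fn = (counts[key] for key in ("tp", "fp", "tn", "fn"))
    total = tp + fp + tn + fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Made-up labels for illustration only.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1, 0, 1]
cells = confusion_counts(y_true, y_pred)
print(cells)
print(summarize(cells))
```

Accuracy depends on all four cells, so true positive and true negative counts alone do not determine it without the false positive and false negative counts.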

The Comparison of Data Mining Methods Using C4.5 Algorithm and Naive Bayes in Predicting Heart Disease

Tech-E ◽

10.31253/te.v4i2.543 ◽

2021 ◽

Vol 4 (2) ◽

pp. 44

Author(s):

Rino Rino

Keyword(s):

Data Mining ◽

Heart Disease ◽

Naive Bayes ◽

Naïve Bayes ◽

Data Set ◽

A Value ◽

C4.5 Algorithm ◽

Calculation Results ◽

Mining Methods ◽

Bayes Algorithm

Heart disease is a condition in which fatty deposits in the coronary arteries change the function and shape of the arteries so that blood flow to the heart is obstructed. Data mining methods can predict this disease; two methods often used in research are the C4.5 algorithm and Naive Bayes. The data set in this research was obtained from the UCI Machine Learning Repository and has 3546 records and 13 attributes. The Naïve Bayes algorithm has an accuracy of 81.40%, higher than the C4.5 algorithm, which has an accuracy of only 79.07%. Based on the calculation results, it can be concluded that the Naïve Bayes algorithm is a very good classifier because it has a value between 0.709 and 1.00. Because the Naïve Bayes algorithm has a higher accuracy than the C4.5 algorithm, the researchers decided to use it to predict heart disease.

Detecting spam e-mails using stop word TF-IDF and stemming algorithm with Naïve Bayes classifier on the multicore GPU

International Journal of Electrical and Computer Engineering (IJECE) ◽

10.11591/ijece.v11i4.pp3168-3175 ◽

2021 ◽

Vol 11 (4) ◽

pp. 3168

Author(s):

Manjit Jaiswal ◽

Sukriti Das ◽

Khushboo Khushboo

Keyword(s):

Naive Bayes ◽

Naïve Bayes ◽

Testing Time ◽

Training Time ◽

Spam Filter ◽

Time Period ◽

Stop Word ◽

Working Principle ◽

Testing Accuracy ◽

Bayes Algorithm

A spam filter is a program that identifies unwanted e-mails and prevents those messages from reaching a user's mailbox. The study focuses on how the algorithms can be applied to a set of e-mails consisting of both ham and spam. First, the working principle and the steps followed to implement stop word removal, TF-IDF, and the stemming algorithm on NVIDIA’s Tesla P100 GPU are discussed, and the findings are verified by executing the Naïve Bayes algorithm. After complete training and testing on the spam e-mail dataset taken from Kaggle using the proposed method, we obtained a training accuracy of 99.67% and a testing accuracy of about 99.03% on the multicore GPU, which also sped up training and testing. This is an improvement in training and testing accuracy of around 0.22% and 0.18%, respectively, over the conventional method of applying only Naïve Bayes to the same dataset, for which training and testing accuracy were 99.45% and 98.85%, respectively. The training time on the GPU was 1.361 seconds, about 1.49X faster than the 2.029 seconds taken on the CPU, and the testing time on the GPU was 1.978 seconds, about 1.15X faster than the 2.280 seconds taken on the CPU.
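The speedups and accuracy gains quoted in this abstract follow directly from the reported timings and accuracies, as the short check below shows. It assumes "X faster" means CPU time divided by GPU time; the variable names are illustrative.

```python
# Timings (seconds) and accuracies (percent) as reported in the abstract.
cpu_train, gpu_train = 2.029, 1.361
cpu_test, gpu_test = 2.280, 1.978

train_speedup = cpu_train / gpu_train  # about 1.49
test_speedup = cpu_test / gpu_test     # about 1.15

train_gain = 99.67 - 99.45  # proposed minus conventional, about 0.22
test_gain = 99.03 - 98.85   # proposed minus conventional, about 0.18

print(f"training speedup: {train_speedup:.2f}x")
print(f"testing speedup:  {test_speedup:.2f}x")
print(f"accuracy gain: train {train_gain:.2f}, test {test_gain:.2f} points")
```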

Twitter Sentiment Analysis using Machine Learning Techniques

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.c6281.029320 ◽

2020 ◽

Vol 9 (3) ◽

pp. 4205-4209

Keyword(s):

Logistic Regression ◽

Sentiment Analysis ◽

Naive Bayes ◽

Naïve Bayes ◽

Machine Learning Techniques ◽

Naive Bayes Classifier ◽

Bayes Classifier ◽

Naïve Bayes Classifier ◽

Learning Techniques ◽

Social Media Platforms

Nowadays people share their views and opinions on Twitter and other social media platforms; Twitter Sentiment Analysis is the task of recognizing sentiments and opinions in tweets. Determining the sentiment of the tweets and then sorting them into positive, negative, and neutral tweets is the main classification step in this process. A key issue in sentiment analysis is choosing a suitable sentiment classifier algorithm to categorize the tweets. Basic classification techniques such as logistic regression, the Naive Bayes classifier, Random Forest, and SVMs are commonly used. In this paper, the Naïve Bayes classifier and logistic regression are used to perform sentiment analysis, and the better technique is chosen based on classification accuracy. The results show that the Naive Bayes classifier works better for this approach. Data pre-processing and feature extraction are carried out as part of the task.
