Support Vector Machine based Word Embedding and Feature Reduction for Sentiment Analysis-A Study

Sentiment analysis (SA), also called opinion mining, is the technique of extracting opinions about a specific entity or feature from a dataset of reviews. The opinions of other users help people make decisions. This paper studies different methods aimed at SA. These approaches include semantic-based methods, machine learning, neural networks, and syntactic methods, each with its own strengths. Hybrid approaches also exist, in which the idea is to combine the strengths of two or more methods to increase accuracy. A framework in which sentiment analysis is performed using word embedding and feature reduction techniques is also proposed. Word embedding is a technique that provides a low-dimensional vector representation of words. The feature reduction method is used with a Support Vector Machine (SVM) classifier. The framework performs sentiment analysis of user opinions using a machine learning approach and provides a recommendation system to ease decision making for users. The authors report that the proposed system addresses the scalability problem and improves accuracy.
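To make the idea of a low-dimensional word representation concrete, the sketch below averages word vectors into a single document vector that a classifier such as an SVM could consume. The three-dimensional `EMBEDDINGS` table and the `document_vector` helper are hypothetical illustrations, not the framework proposed in the paper; real embeddings are learned from a corpus.

```python
from typing import Dict, List

# Hypothetical toy embeddings (3 dimensions); learned models use far more.
EMBEDDINGS: Dict[str, List[float]] = {
    "good": [1.0, 0.0, 0.5],
    "great": [0.8, 0.2, 0.6],
    "bad": [-1.0, 0.0, 0.5],
}


def document_vector(tokens: List[str],
                    embeddings: Dict[str, List[float]],
                    dim: int = 3) -> List[float]:
    """Average the vectors of known tokens; unknown tokens are skipped."""
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        # No known word: fall back to the zero vector.
        return [0.0] * dim
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


review_vector = document_vector(["good", "bad", "plot"], EMBEDDINGS)
```

Averaging is only one way to pool word vectors; it discards word order, which is often acceptable for coarse polarity classification.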

Cyberbullying Sentiment Analysis with Word2Vec and One-Against-All Support Vector Machine

International Journal of New Media Technology ◽

10.31937/ijnmt.v8i1.2047 ◽

2021 ◽

Vol 8 (1) ◽

pp. 57-64

Author(s):

Lionel Reinhart Halim ◽

Alethea Suryadibrata

Keyword(s):

Support Vector Machine ◽

Sentiment Analysis ◽

Prediction Accuracy ◽

Word Embedding ◽

Support Vector ◽

SVM Model ◽

Index Terms ◽

Tuning Process ◽

Negative Impacts ◽

Randomized Search

Depression and social anxiety are the two main negative impacts of cyberbullying. Unfortunately, a survey conducted by UNICEF on 3 September 2019 showed that 1 in 3 young people in 30 countries had been victims of cyberbullying. This research applies sentiment analysis to detect comments that contain cyberbullying. The cyberbullying dataset is obtained from the Toxic Comment Classification Challenge on the Kaggle website. Pre-processing consists of four stages: comment generalization (converting text to lowercase and removing punctuation), tokenization, stop-word removal, and lemmatization. Word embedding is implemented with Word2Vec. The One-Against-All (OAA) method with a Support Vector Machine (SVM) model is then used to make multi-label predictions. The SVM model goes through hyperparameter tuning using Randomized Search CV. Evaluation uses the Micro-Averaged F1 Score to assess prediction accuracy and Hamming Loss to assess the proportion of sample-label pairs that are incorrectly classified. The Word2Vec and OAA SVM model gives its best result on data pre-processed with comment generalization, tokenization, stop-word removal, and lemmatization, and stored as 100 features in the Word2Vec model. The tuned model produces a Micro-Averaged F1 of 83.40% and a Hamming Loss of 15.13%. Index Terms: Sentiment Analysis; Word Embedding; Word2Vec; One-Against-All; Support Vector Machine; Toxic Comment Classification Challenge; Multi Labelling
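The two evaluation measures named in this abstract can be sketched in a few lines. The code below is a minimal illustration using the standard definitions of micro-averaged F1 and Hamming loss for multi-label output, where each row is one comment and each column is one label (1 = label applies). The function names and the toy labels are assumptions made for this example, not the authors' code.

```python
from typing import List

Matrix = List[List[int]]


def micro_f1(y_true: Matrix, y_pred: Matrix) -> float:
    """Micro-averaged F1: pool TP/FP/FN over every sample-label pair."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 1:
                fp += 1
            elif t == 1 and p == 0:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def hamming_loss(y_true: Matrix, y_pred: Matrix) -> float:
    """Fraction of sample-label pairs that are predicted incorrectly."""
    total = sum(len(row) for row in y_true)
    wrong = sum(
        t != p
        for true_row, pred_row in zip(y_true, y_pred)
        for t, p in zip(true_row, pred_row)
    )
    return wrong / total if total else 0.0


# Toy data: 2 comments, 3 hypothetical labels each.
y_true = [[1, 0, 1], [0, 1, 0]]
y_pred = [[1, 0, 0], [0, 1, 1]]
f1 = micro_f1(y_true, y_pred)
loss = hamming_loss(y_true, y_pred)
```

Because both measures pool over individual sample-label pairs, a model that gets most labels right on every comment can score well even if few comments are labelled perfectly.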

Analysis of Feature Reduction Techniques for Online News Popularity Prediction

SMART MOVES JOURNAL IJOSCIENCE ◽

10.24113/ijo-science.v4i10.165 ◽

2018 ◽

Vol 4 (10) ◽

pp. 6

Author(s):

Shivangi Bhargava ◽

Dr. Shivnath Ghosh

Keyword(s):

Neural Network ◽

Support Vector Machine ◽

Particle Swarm Optimization ◽

Naive Bayes ◽

Particle Swarm ◽

Naïve Bayes ◽

Online News ◽

Feature Reduction ◽

Support Vector ◽

Swarm Optimization

News popularity is the maximum growth in attention given to a particular news article. The popularity of online news depends on various factors, such as the number of social media shares, the number of visitor comments, and the number of likes. It is therefore necessary to build an automatic decision support system to predict the popularity of news, which will also help in business intelligence. The work presented in this study aims to find the best model for predicting the popularity of online news using machine learning methods. Three feature reduction techniques are analyzed: a correlation algorithm, particle swarm optimization, and principal component analysis. For performance evaluation, support vector machine, naïve Bayes, k-nearest neighbor (k-NN), and neural network classifiers are used to classify articles as popular or unpopular. The experimental results show that support vector machine and naïve Bayes perform better with the correlation algorithm, while k-NN and the neural network perform better with particle swarm optimization.
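The abstract does not specify its correlation algorithm, so the following is only one plausible reading: rank each feature by the absolute Pearson correlation between its values and the class label, then keep the top-ranked features. The helper names `pearson` and `rank_features` and the toy data are assumptions for illustration.

```python
import math
from typing import List, Sequence, Tuple


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 for a constant input."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0
    return cov / (spread_x * spread_y)


def rank_features(rows: List[List[float]],
                  labels: List[int]) -> List[Tuple[int, float]]:
    """Return (feature_index, |correlation|) pairs, strongest first."""
    scores = []
    for j in range(len(rows[0])):
        column = [row[j] for row in rows]
        scores.append((j, abs(pearson(column, labels))))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy data: 4 articles, 3 features; label 1 = popular, 0 = unpopular.
rows = [[1, 5, 0], [2, 5, 1], [3, 5, 0], [4, 5, 1]]
labels = [0, 0, 1, 1]
ranked = rank_features(rows, labels)
```

A filter of this kind scores each feature independently, which makes it cheap but blind to features that are only informative in combination.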

Algorithm Comparation of Naive Bayes and Support Vector Machine based on Particle Swarm Optimization in Sentiment Analysis of Freight Forwarding Services

Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) ◽

10.29207/resti.v4i2.1840 ◽

2020 ◽

Vol 4 (2) ◽

pp. 362-369

Author(s):

Sharazita Dyah Anggita ◽

Ikmah

Keyword(s):

Support Vector Machine ◽

Sentiment Analysis ◽

Naive Bayes ◽

Naïve Bayes ◽

Support Vector ◽

The Public ◽

SVM Algorithm ◽

Bayes Algorithm ◽

Freight Forwarding ◽

Improved Accuracy

Community demand for freight forwarding is increasing with the growth of online marketplaces. The public expresses opinions about freight forwarding services through many channels, one of which is the social media platform Twitter. Sentiment analysis shows whether an opinion tends to be positive or negative. Methods that can be applied to sentiment analysis include the Naive Bayes algorithm and the Support Vector Machine (SVM). This research implements both algorithms, optimized using the Particle Swarm Optimization (PSO) algorithm, for sentiment analysis. Testing is done by setting the PSO parameters for each classifier. The results show an accuracy increase of 15.11% for the PSO-based Naive Bayes algorithm and an accuracy improvement of 1.74% for the PSO-based SVM algorithm with the sigmoid kernel.
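For readers unfamiliar with PSO, the following is a minimal sketch of the basic algorithm, not the configuration used in the study. It minimizes an arbitrary objective; in a classifier-tuning setting the objective would typically be something like one minus validation accuracy, which is replaced here by a simple quadratic stand-in. The function name `pso_minimize` and all parameter values (swarm size, inertia, acceleration coefficients) are assumptions chosen for illustration.

```python
import random
from typing import Callable, List, Sequence, Tuple


def pso_minimize(objective: Callable[[List[float]], float],
                 bounds: Sequence[Tuple[float, float]],
                 n_particles: int = 20,
                 n_iters: int = 100,
                 inertia: float = 0.7,
                 c1: float = 1.5,
                 c2: float = 1.5,
                 seed: int = 0) -> Tuple[List[float], float]:
    """Basic global-best PSO; returns (best_position, best_value)."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]            # each particle's best position
    best_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: best_val[i])
    g_pos, g_val = best_pos[g][:], best_val[g]  # swarm-wide best

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Pull toward the particle's own best and the swarm's best.
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (g_pos[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
                if val < g_val:
                    g_val, g_pos = val, pos[i][:]
    return g_pos, g_val


def stand_in_objective(x: List[float]) -> float:
    """Hypothetical stand-in for a validation error; minimum at (1, -2)."""
    return (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2


best_position, best_value = pso_minimize(stand_in_objective,
                                         [(-5.0, 5.0), (-5.0, 5.0)])
```

Each dimension of a particle would correspond to one tunable quantity, for example a kernel parameter of the SVM; evaluating the objective then means training and validating the classifier once per particle per iteration, which is where most of the cost lies.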

An Improved Intelligent Approach to Enhance the Sentiment Classifier for Knowledge Discovery Using Machine Learning

International Journal of Sensors Wireless Communications and Control ◽

10.2174/2210327910999200528114552 ◽

2020 ◽

Vol 10 (4) ◽

pp. 582-593

Author(s):

Midde Venkateswarlu Naik ◽

D. Vasumathi ◽

A.P. Siva Kumar

Keyword(s):

Support Vector Machine ◽

Feature Selection ◽

Global Warming ◽

Particle Swarm Optimization ◽

Sentiment Analysis ◽

Optimization Technique ◽

Particle Swarm ◽

Sentiment Classification ◽

Support Vector ◽

Swarm Optimization

Aims: The proposed research concerns an evolutionary enhanced method for sentiment or emotion classification of unstructured review text in the big data field. Sentiment analysis plays a vital role for the current generation in extracting valid decision points about any subject, such as movie ratings, educational institution ratings, or political ratings. The proposed hybrid approach combines optimal feature selection using Particle Swarm Optimization (PSO) with sentiment classification using a Support Vector Machine (SVM). Its performance is evaluated with statistical measures such as precision, recall, sensitivity, and specificity, and compared with existing approaches. Earlier authors have achieved sentiment classifier accuracy of up to 94% on English text. In the proposed scheme, the sentiment classifier reaches an average accuracy of 99% across different datasets by tuning SVM parameters, such as the constant C and the kernel gamma, in association with the PSO optimization technique. The method uses three publicly available datasets: airline sentiment, weather, and global warming. The models are trained and tested using 10-fold cross-validation (FCV), with a confusion matrix used to assess classifier accuracy.

Background: Sentiment Analysis (SA), or opinion mining, has become an attractive research domain. A key area is sentiment classification of semi-structured or unstructured data in different languages, which has become a major research topic. User-Generated Content (UGC) from different sources has grown significantly with the rapid growth of the web. The huge volume of user-generated data on social media provides substantial value for discovering hidden knowledge, correlations, patterns, and trends, or for extracting sentiment about any specific entity. SA is a computational analysis that determines the actual opinion about an entity as expressed in text. It is also described as the computation of emotional polarity expressed on social media as natural text in various languages. Usually, the best automatic sentiment classifier model depends on its feature selection and classification algorithms.

Methods: The proposed work uses a Support Vector Machine as the classification technique and Particle Swarm Optimization for feature selection. In this methodology, various permutations and combinations of parameters are tuned, with and without a kernel, to obtain the desired results for sentiment classification on three datasets that are freely hosted for research: airline, global warming, and weather sentiment.

Results: The proposed method achieved 99.2% average accuracy in classifying sentiment on the different datasets, outperforming other machine learning techniques. The authors take this high accuracy on review text as evidence of superior effectiveness over existing sentiment classifiers.

Conclusion: Sentiment classifier accuracy was raised with the help of a kernel-based Support Vector Machine (SVM) with parameter optimization. The optimal feature selection for classifying sentiment or opinion in review documents was determined with a particle swarm optimization approach. The three freely available datasets (airline sentiment, weather sentiment, and global warming) were used to simulate the results.
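The evaluation protocol described here, 10-fold cross-validation combined with confusion-matrix measures, can be sketched as follows. This is a simplified illustration under stated assumptions: folds are contiguous and unshuffled, the task is binary, and the helper names `k_fold_indices` and `binary_metrics` are hypothetical. It is not the authors' experimental code.

```python
from typing import Dict, Iterator, List, Tuple


def k_fold_indices(n_samples: int,
                   k: int = 10) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) pairs; fold sizes differ by at most one.

    Folds are contiguous; shuffle the data beforehand if order matters.
    """
    sizes = [n_samples // k + (1 if i < n_samples % k else 0)
             for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


def _safe_div(num: float, den: float) -> float:
    return num / den if den else 0.0


def binary_metrics(y_true: List[int], y_pred: List[int]) -> Dict[str, float]:
    """Confusion-matrix measures for a binary task (1 = positive class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "precision": _safe_div(tp, tp + fp),
        "recall": _safe_div(tp, tp + fn),        # also called sensitivity
        "specificity": _safe_div(tn, tn + fp),
        "accuracy": _safe_div(tp + tn, len(pairs)),
    }


folds = list(k_fold_indices(25, k=10))
metrics = binary_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```

In a full experiment the classifier would be trained on each `train_idx` set, scored on the matching `test_idx` set, and the per-fold measures averaged to give figures like the average accuracy quoted above.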

Combining support vector machine with radial basis function kernel and information gain for sentiment analysis of movie reviews

Journal of Physics Conference Series ◽

10.1088/1742-6596/1918/4/042157 ◽

2021 ◽

Vol 1918 (4) ◽

pp. 042157

Author(s):

Z Abidin ◽

W Destian ◽

R Umer

Keyword(s):

Support Vector Machine ◽

Radial Basis Function ◽

Sentiment Analysis ◽

Basis Function ◽

Information Gain ◽

Support Vector ◽

Radial Basis Function Kernel ◽

Radial Basis

Multi-Class Sentiment Analysis Comparison Using Support Vector Machine (SVM) and BAGGING Technique-An Ensemble Method

2018 International Conference on Smart Computing and Electronic Enterprise (ICSCEE) ◽

10.1109/icscee.2018.8538397 ◽

2018 ◽

Author(s):

Shashank Sharma ◽

Sumit Srivastava ◽

Ashish Kumar ◽

Abhilasha Dangi

Keyword(s):

Support Vector Machine ◽

Sentiment Analysis ◽

Ensemble Method ◽

Support Vector

A Feature Based Approach for Sentiment Analysis by Using Support Vector Machine

2016 IEEE 6th International Conference on Advanced Computing (IACC) ◽

10.1109/iacc.2016.11 ◽

2016 ◽

Cited By ~ 13

Author(s):

D.V. Nagarjuna Devi ◽

Chinta Kishore Kumar ◽

Siriki Prasad

Keyword(s):

Support Vector Machine ◽

Sentiment Analysis ◽

Support Vector ◽

Feature Based

Aspect Term Extraction for Aspect Based Opinion Mining

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.k2050.0981119 ◽

2019 ◽

Vol 8 (11) ◽

pp. 2228-2233

Keyword(s):

Support Vector Machine ◽

Sentiment Analysis ◽

Random Fields ◽

Opinion Mining ◽

Nearest Neighbor ◽

Conditional Random Fields ◽

International Workshop ◽

Support Vector ◽

K Nearest Neighbor ◽

Term Extraction

Opinion Mining (OM) is also called Sentiment Analysis (SA), and Aspect Based Opinion Mining (ABOM) is also called Aspect Based Sentiment Analysis (ABSA). In this paper, three new features are proposed for extracting aspect terms in ABSA. The influence of the proposed features is evaluated on five classifiers: Decision Tree (DT), Naive Bayes (NB), K-Nearest Neighbor (KNN), Support Vector Machine (SVM), and Conditional Random Fields (CRF). The features are evaluated on the two datasets, in the Restaurant and Laptop domains, from the International Workshop on Semantic Evaluation 2014 (SemEval 2014), using Precision, Recall, and F1 measures. The proposed features strongly influence aspect term extraction across the classifiers. With the proposed features, the SVM and CRF classifiers benefit more in aspect term extraction than the NB, DT, and KNN classifiers.

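Analisis Sentimen Data Twitter Tentang Pasangan Capres-Cawapres Pemilu 2019 Dengan Metode Lexicon Based Dan Support Vector Machine [Sentiment Analysis of Twitter Data on the Presidential and Vice-Presidential Candidate Pairs in the 2019 Election Using the Lexicon Based and Support Vector Machine Methods]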

Jurnal Ilmiah FIFO ◽

10.22441/fifo.2019.v11i2.004 ◽

2019 ◽

Vol 11 (2) ◽

pp. 144

Author(s):

Danar Wido Seno ◽

Arief Wibowo

Keyword(s):

Social Media ◽

Support Vector Machine ◽

Sentiment Analysis ◽

Vice President ◽

Training Data ◽

Support Vector ◽

New Words ◽

Textual Data ◽

Data Content ◽

Combination Of Methods

The growth of written content on social media produces many new words and abbreviations on Twitter, which makes it increasingly difficult for sentiment analysis to achieve high accuracy on Twitter text. In this study, the authors analyze sentiment toward the pairs of candidates for President and Vice President of Indonesia in the 2019 elections. To obtain higher accuracy and to accommodate the evolution of textual data on Twitter, the authors combine an unsupervised method, namely Lexicon Based, with a supervised method, the Support Vector Machine. The study uses Twitter data from October 2018, collected with search keywords containing the names of each pair of candidates, totaling 800 data items. On these 800 items, the best accuracy obtained was 92.5%, with 80% of the data used for training and 20% for testing, a per-class Precision between 85.7% and 97.2%, and a per-class Recall between 78.2% and 93.5%. With the Lexicon Based method used to label the dataset, labeling for the Support Vector Machine is no longer done manually, and the lexicon dictionary can be extended as data content on Twitter evolves.
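The labeling step described above can be illustrated with a small sketch: a lexicon assigns a polarity score to each known word, the summed score decides the label, and the labeled items are then split 80/20 for a supervised classifier. The English toy `LEXICON`, the three label names, and the helper functions are hypothetical; the study works with Indonesian tweets and its own dictionary.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical toy lexicon; a real one holds many scored terms and can be
# extended as new words and abbreviations appear.
LEXICON: Dict[str, int] = {"good": 1, "great": 2, "bad": -1, "terrible": -2}


def lexicon_label(text: str, lexicon: Dict[str, int] = LEXICON) -> str:
    """Label a text by the sign of its summed word scores."""
    score = sum(lexicon.get(token, 0) for token in text.lower().split())
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def train_test_split(items: List, train_fraction: float = 0.8,
                     seed: int = 0) -> Tuple[List, List]:
    """Shuffle a copy of the items and cut it into train and test parts."""
    shuffled = items[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


tweets = ["Great speech but bad timing", "terrible debate", "the debate was held"]
labeled = [(tweet, lexicon_label(tweet)) for tweet in tweets]
train_items, test_items = train_test_split(list(range(800)))
```

Adding an entry to the lexicon changes the labels produced for every later tweet containing that word, which is how the dictionary can keep pace with new vocabulary without manual relabeling.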
