scholarly journals Effects of kernels and the proportion of training data on the accuracy of SVM sentiment analysis in lecturer evaluation

Author(s):  
Daniel Febrian Sengkey ◽  
Agustinus Jacobus ◽  
Fabian Johanes Manoppo

Support vector machine (SVM) is a known method for supervised learning in sentiment analysis and there are many studies about the use of SVM in classifying the sentiments in lecturer evaluation. SVM has various parameters that can be tuned and kernels that can be chosen to improve the classifier accuracy. However, not all options have been explored. Therefore, in this study we compared the four SVM kernels: radial, linear, polynomial, and sigmoid, to discover how each kernel influences the accuracy of the classifier. To make a proper assessment, we used our labeled dataset of students’ evaluations toward the lecturer. The dataset was split, one for training the classifier, and another one for testing the model. As an addition, we also used several different ratios of the training:testing dataset. The split ratios are 0.5 to 0.95, with the increment factor of 0.05. The dataset was split randomly, hence the splitting-training-testing processes were repeated 1,000 times for each kernel and splitting ratio. Therefore, at the end of the experiment, we got 40,000 accuracy data. Later, we applied statistical methods to see whether the differences are significant. Based on the statistical test, we found that in this particular case, the linear kernel significantly has higher accuracy compared to the other kernels. However, there is a tradeoff, where the results are getting more varied with a higher proportion of data used for training.

2019 ◽  
Vol 11 (2) ◽  
pp. 144
Author(s):  
Danar Wido Seno ◽  
Arief Wibowo

Social media writing content growing make a lot of new words that appear on Twitter in the form of words and abbreviations that appear so that sentiment analysis is increasingly difficult to get high accuracy of textual data on Twitter social media. In this study, the authors conducted research on sentiment analysis of the pairs of candidates for President and Vice President of Indonesia in the 2019 Elections. To obtain higher accuracy results and accommodate the problem of textual data development on Twitter, the authors conducted a combination of methods to conduct the sentiment analysis with unsupervised and supervised methods. namely Lexicon Based. This study used Twitter data in October 2018 using the search keywords with the names of each pair of candidates for President and Vice President of the 2019 Elections totaling 800 datasets. From the study with 800 datasets the best accuracy was obtained with a value of 92.5% with 80% training data composition and 20% testing data with a Precision value in each class between 85.7% - 97.2% and Recall value for each class among 78, 2% - 93.5%. With the Lexicon Based method as a labeling dataset, the process of labeling the Support Vector Machine dataset is no longer done manually but is processed by the Lexicon Based method and the dictionary on the lexicon can be added along with the development of data content on Twitter social media.


Author(s):  
Jalel Akaichi

In this work, we focus on the application of text mining and sentiment analysis techniques for analyzing Tunisian users' statuses updates on Facebook. We aim to extract useful information, about their sentiment and behavior, especially during the “Arabic spring” era. To achieve this task, we describe a method for sentiment analysis using Support Vector Machine and Naïve Bayes algorithms, and applying a combination of more than two features. The output of this work consists, on one hand, on the construction of a sentiment lexicon based on the Emoticons and Acronyms' lexicons that we developed based on the extracted statuses updates; and on the other hand, it consists on the realization of detailed comparative experiments between the above algorithms by creating a training model for sentiment classification.


2021 ◽  
Vol 13 (3) ◽  
pp. 128-133
Author(s):  
Attala Rafid Abelard ◽  
Yuliant Sibaroni

Among many film streaming platforms that have sprung up, Netflix is ​​the platform that has the most subscribers compared to the other platforms. However, not all reviews provided by the Netflix users are good reviews. These reviews will later be analyzed to determine what aspects are reviewed by the users based on reviews written on the Google Play Store, using the Latent Dirichlet Allocation (LDA) method. Then, the classification process using the Support Vector Machine (SVM) method will be carried out to determine whether each of these reviews is included in the positive or negative class (Sentiment Analysis). There are 2 scenarios that were carried out in this study. The first scenario resulted that the best number of LDA topics to be used is 40, and the second scenario resulted that the use of filtering process in the preprocessing stage reduces the score of the f1-score. Thus, this study resulted in the best performance score on LDA and SVM testing with 40 topics, and without running the filtering process with the score of 78.15%.


SINERGI ◽  
2020 ◽  
Vol 24 (2) ◽  
pp. 87
Author(s):  
Mona Cindo ◽  
Dian Palupi Rini ◽  
Ermatita Ermatita

With the advancement of social media and its growth, there is a lot of data that can be presented for research in social mining. Twitter is a microblogging that can be used. In this event, a lot of companies used the data on Twitter to analyze the satisfaction of their customer about product quality. On the other hand, a lot of users use social media to express their daily emotions. The case can be developed into a research study that can be used both to improve product quality, as well as to analyze the opinion on certain events. The research is often called sentiment analysis or opinion mining. While The previous research does a particularly useful feature for sentiment analysis, but it is still a lack of performance. Furthermore, they used Support Vector Machine as a classification method. On the other hand, most researchers found another classification method, which is considered more efficient such as Maximum Entropy. So, this research used two types of a dataset, the general opinion data, and the airline's opinion data. For feature extraction, we employ four feature extraction, such as pragmatic, lexical-grams, pos-grams, and sentiment lexical. For the classification, we use both of Support Vector Machine and Maximum Entropy to find the best result. In the end, the best result is performed by Maximum Entropy with 85,8% accuracy on general opinion data, and 92,6% accuracy on airlines opinion data.


2020 ◽  
pp. 016555152093091
Author(s):  
Saeed-Ul Hassan ◽  
Aneela Saleem ◽  
Saira Hanif Soroya ◽  
Iqra Safder ◽  
Sehrish Iqbal ◽  
...  

The purpose of the study is to (a) contribute to annotating an Altmetrics dataset across five disciplines, (b) undertake sentiment analysis using various machine learning and natural language processing–based algorithms, (c) identify the best-performing model and (d) provide a Python library for sentiment analysis of an Altmetrics dataset. First, the researchers gave a set of guidelines to two human annotators familiar with the task of related tweet annotation of scientific literature. They duly labelled the sentiments, achieving an inter-annotator agreement (IAA) of 0.80 (Cohen’s Kappa). Then, the same experiments were run on two versions of the dataset: one with tweets in English and the other with tweets in 23 languages, including English. Using 6388 tweets about 300 papers indexed in Web of Science, the effectiveness of employed machine learning and natural language processing models was measured by comparing with well-known sentiment analysis models, that is, SentiStrength and Sentiment140, as the baseline. It was proved that Support Vector Machine with uni-gram outperformed all the other classifiers and baseline methods employed, with an accuracy of over 85%, followed by Logistic Regression at 83% accuracy and Naïve Bayes at 80%. The precision, recall and F1 scores for Support Vector Machine, Logistic Regression and Naïve Bayes were (0.89, 0.86, 0.86), (0.86, 0.83, 0.80) and (0.85, 0.81, 0.76), respectively.


2021 ◽  
Vol 13 (2) ◽  
pp. 168-174
Author(s):  
Rifqatul Mukarramah ◽  
Dedy Atmajaya ◽  
Lutfi Budi Ilmawan

Sentiment analysis is a technique to extract information of one’s perception, called sentiment, on an issue or event. This study employs sentiment analysis to classify society’s response on covid-19 virus posted at twitter into 4 polars, namely happy, sad, angry, and scared. Classification technique used is support vector machine (SVM) method which compares the classification performance figure of 2 linear kernel functions, linear and polynomial. There were 400 tweet data used where each sentiment class consists of 100 data. Using the testing method of k-fold cross validation, the result shows the accuracy value of linear kernel function is 0.28 for unigram feature and 0.36 for trigram feature. These figures are lower compared to accuracy value of kernel polynomial with 0.34 and 0.48 for unigram and trigram feature respectively. On the other hand, testing method of confusion matrix suggests the highest performance is obtained by using kernel polynomial with accuracy value of 0.51, precision of 0.43, recall of 0.45, and f-measure of 0.51.


2020 ◽  
Vol 4 (2) ◽  
pp. 362-369
Author(s):  
Sharazita Dyah Anggita ◽  
Ikmah

The needs of the community for freight forwarding are now starting to increase with the marketplace. User opinion about freight forwarding services is currently carried out by the public through many things one of them is social media Twitter. By sentiment analysis, the tendency of an opinion will be able to be seen whether it has a positive or negative tendency. The methods that can be applied to sentiment analysis are the Naive Bayes Algorithm and Support Vector Machine (SVM). This research will implement the two algorithms that are optimized using the PSO algorithms in sentiment analysis. Testing will be done by setting parameters on the PSO in each classifier algorithm. The results of the research that have been done can produce an increase in the accreditation of 15.11% on the optimization of the PSO-based Naive Bayes algorithm. Improved accuracy on the PSO-based SVM algorithm worth 1.74% in the sigmoid kernel.


Author(s):  
Midde Venkateswarlu Naik ◽  
D. Vasumathi ◽  
A.P. Siva Kumar

Aims: The proposed research work is on an evolutionary enhanced method for sentiment or emotion classification on unstructured review text in the big data field. The sentiment analysis plays a vital role for current generation of people for extracting valid decision points about any aspect such as movie ratings, education institute or politics ratings, etc. The proposed hybrid approach combined the optimal feature selection using Particle Swarm Optimization (PSO) and sentiment classification through Support Vector Machine (SVM). The current approach performance is evaluated with statistical measures, such as precision, recall, sensitivity, specificity, and was compared with the existing approaches. The earlier authors have achieved an accuracy of sentiment classifier in the English text up to 94% as of now. In the proposed scheme, an average accuracy of sentiment classifier on distinguishing datasets outperformed as 99% by tuning various parameters of SVM, such as constant c value and kernel gamma value in association with PSO optimization technique. The proposed method utilized three datasets, such as airline sentiment data, weather, and global warming datasets, that are publically available. The current experiment produced results that are trained and tested based on 10- Fold Cross-Validations (FCV) and confusion matrix for predicting sentiment classifier accuracy. Background: The sentiment analysis plays a vital role for current generation people for extracting valid decisions about any aspect such as movie rating, education institute or even politics ratings, etc. Sentiment Analysis (SA) or opinion mining has become fascinated scientifically as a research domain for the present environment. The key area is sentiment classification on semi-structured or unstructured data in distinguish languages, which has become a major research aspect. User-Generated Content [UGC] from distinguishing sources has been hiked significantly with rapid growth in a web environment. The huge user-generated data over social media provides substantial value for discovering hidden knowledge or correlations, patterns, and trends or sentiment extraction about any specific entity. SA is a computational analysis to determine the actual opinion of an entity which is expressed in terms of text. SA is also called as computation of emotional polarity expressed over social media as natural text in miscellaneous languages. Usually, the automatic superlative sentiment classifier model depends on feature selection and classification algorithms. Methods: The proposed work used Support vector machine as classification technique and particle swarm optimization technique as feature selection purpose. In this methodology, we tune various permutations and combination parameters in order to obtain expected desired results with kernel and without kernel technique for sentiment classification on three datasets, including airline, global warming, weather sentiment datasets, that are freely hosted for research practices. Results: In the proposed scheme, The proposed method has outperformed with 99.2% of average accuracy to classify the sentiment on different datasets, among other machine learning techniques. The attained high accuracy in classifying sentiment or opinion about review text proves superior effectiveness over existing sentiment classifiers. The current experiment produced results that are trained and tested based on 10- Fold Cross-Validations (FCV) and confusion matrix for predicting sentiment classifier accuracy. Conclusion: The objective of the research issue sentiment classifier accuracy has been hiked with the help of Kernel-based Support Vector Machine (SVM) based on parameter optimization. The optimal feature selection to classify sentiment or opinion towards review documents has been determined with the help of a particle swarm optimization approach. The proposed method utilized three datasets to simulate the results, such as airline sentiment data, weather sentiment data, and global warming data that are freely available datasets.


Sign in / Sign up

Export Citation Format

Share Document