Improving the Sentiment Analysis Process of Spanish Tweets with BM25

Author(s):  
Juan Sixto ◽  
Aitor Almeida ◽  
Diego López-de-Ipiña
Author(s):  
V Umarani ◽  
A Julian ◽  
J Deepa

Sentiment analysis has gained a lot of attention from researchers in the last year because it has been widely applied to a variety of application domains such as business, government, education, sports, tourism, biomedicine, and telecommunication services. Sentiment analysis is an automated computational method for studying or evaluating sentiments, feelings, and emotions expressed as comments, feedback, or critiques. The sentiment analysis process can be automated using machine learning techniques, which analyze text patterns faster. Supervised machine learning is the most widely used mechanism for sentiment analysis. The proposed work discusses the flow of the sentiment analysis process and investigates common supervised machine learning techniques such as multinomial naive Bayes, Bernoulli naive Bayes, logistic regression, support vector machine, random forest, K-nearest neighbor, and decision tree, as well as deep learning techniques such as Long Short-Term Memory and Convolutional Neural Network. The work examines these learning methods on a standard data set. The experimental results demonstrate the performance of the various classifiers in terms of precision, recall, F1-score, ROC curve, accuracy, running time, and k-fold cross-validation. They help in appreciating the novelty of the several deep learning techniques and give the user an overview for choosing the right technique for their application.
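The abstract above names precision, recall, and F1-score as evaluation metrics. As a minimal sketch of how these are commonly computed for a binary sentiment task, and not the authors' own evaluation code, the following uses only the Python standard library. The function name `precision_recall_f1`, the label strings, and the toy data are illustrative assumptions.

```python
from typing import Sequence, Tuple


def precision_recall_f1(
    y_true: Sequence[str],
    y_pred: Sequence[str],
    positive: str = "positive",  # hypothetical label name, chosen for illustration
) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for one class treated as the positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    # True positives: predicted positive and actually positive.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    # False positives: predicted positive but actually something else.
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    # False negatives: actually positive but predicted something else.
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return precision, recall, f1


# Toy labels, invented purely for illustration.
y_true = ["positive", "positive", "negative", "negative", "positive"]
y_pred = ["positive", "negative", "negative", "positive", "positive"]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
```

In a k-fold cross-validation setting, the same function would typically be applied to each held-out fold and the per-fold scores averaged.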


Author(s):  
ThippaReddy Gadekallu ◽  
Akshat Soni ◽  
Deeptanu Sarkar ◽  
Lakshmanna Kuruva

Sentiment analysis is a sub-domain of opinion mining in which the analysis focuses on extracting people's emotions and opinions about a particular topic from structured, semi-structured, or unstructured textual data. In this chapter, the authors focus the task of sentiment analysis on the IMDB movie review database. The chapter presents experimental work on a new kind of domain-specific, feature-based heuristic for aspect-level sentiment analysis of movie reviews. The authors have devised an aspect-oriented scheme that analyzes the textual reviews of a movie and assigns it a sentiment label on each aspect. Finally, the authors conclude that incorporating syntactic information in the models is vital to the sentiment analysis process. They also conclude that the proposed approach to sentiment classification supplements the existing movie rating systems used across the web and will serve as a base for future research in this domain.


2019 ◽  
Vol 21 (3) ◽  
pp. 347-367
Author(s):  
Thara Angskun ◽  
Jitimon Angskun

Purpose: This paper aims to introduce a hierarchical fuzzy system for online review analysis named FLORA. FLORA enables tourists to decide on their destination without reading numerous reviews from experienced tourists. It summarizes reviews and visualizes them through a hierarchical structure. The visualization presents not only the overall quality of an accommodation but also the condition of the bed, the hospitality of the front desk receptionist and much more at a glance. Design/methodology/approach: FLORA is a complete system which acquires online reviews, analyzes sentiments, computes feature scores and summarizes results in a hierarchical view. FLORA is designed to use an overall score, rated by real tourists, as a baseline for accuracy comparison. The accuracy of FLORA is achieved by a novel sentiment analysis process (as part of a knowledge acquisition engine) based on semantic analysis and a novel rating technique, called hierarchical fuzzy calculation, in the knowledge inference engine. Findings: The performance of FLORA against related work has been assessed in two aspects. The first aspect focuses on review analysis with binary format representation. The results reveal that FLORA's hierarchical fuzzy method with probability weighting achieves the highest values in precision, recall and F-measure. The second aspect looks at review analysis with a five-point rating scale by comparing against one of the most advanced research methods, called fuzzy domain ontology. The results reveal that FLORA's hierarchical fuzzy method with probability weighting returns the closest results to the tourist-defined rating. Research limitations/implications: This research advances knowledge of online review analysis by contributing a novel sentiment analysis process and a novel rating technique. The FLORA system has two limitations. First, the reviews are based on individual expression, which is arbitrary and not always grammatically correct. Consequently, some opinions may not be extracted because the context-free grammar rules are insufficient. Second, natural languages evolve and diversify all the time. Many emerging words or phrases, including idioms, proverbs and slang, are often used in online reviews. Thus, those words or phrases need to be manually updated in the knowledge base. Practical implications: This research contributes to the tourism business and assists travelers by presenting comprehensive and easy-to-understand information about each accommodation. Although the FLORA system was originally designed and tested with accommodation reviews, it can also be used with reviews of any products or services by updating data in the knowledge base. Thus, businesses that have online reviews for their products or services can benefit from the FLORA system. Originality/value: This research proposes the FLORA system, which analyzes sentiments from online reviews, computes feature scores and summarizes results in a hierarchical view. Moreover, this work is able to use the overall score, rated by real tourists, as a baseline for accuracy comparison. The main theoretical implication is a novel sentiment analysis process based on semantic analysis and a novel rating technique called hierarchical fuzzy calculation.


Author(s):  
Debby Alita ◽  
Sigit Priyanta ◽  
Nur Rokhman

Background: Indonesia is ranked as the country with the largest number of active Twitter users in the world. Tweets written by Twitter users vary, from tweets containing positive responses to tweets containing negative ones. These responses can be used by the parties concerned for evaluation. Objective: Public comments contain emoticons and sarcasm, both of which influence the sentiment analysis process. Emoticons are considered to make it easier for people to express their feelings, but quite a few researchers take the opposite view and ignore emoticons on the grounds that they can interfere with the sentiment analysis process, while sarcasm is considered to affect the results of the sentiment analysis of the text that contains it. Methods: The emoticon and no-emoticon categories are tested with the same testing data using two classification methods, the Naïve Bayes Classifier and the Support Vector Machine. Sarcasm data are processed using the Random Forest Classifier, Naïve Bayes Classifier and Support Vector Machine methods. Results: The use of emoticons together with sarcasm detection can increase the accuracy of the sentiment analysis process when using the Naïve Bayes Classifier method. Conclusion: Based on the results, the amount of data greatly affects the accuracy value. The use of emoticons is highly beneficial in the sentiment analysis process. Sarcasm detection is superior only with the Naïve Bayes Classifier method, owing to the difference between the amounts of sarcastic and non-sarcastic data in the research process. Keywords: Emoticon, Naïve Bayes Classifier, Random Forest Classifier, Sarcasm, Support Vector Machine


2020 ◽  
Vol 16 (4) ◽  
pp. 285-295
Author(s):  
Fatima Zohra Ennaji ◽  
Abdelaziz El Fazziki ◽  
Hasna El Alaoui El Abdallaoui ◽  
Hamada El Kabtane

As social networking has spread, people have started sharing their personal opinions and thoughts widely via these online platforms. The resulting vast and valuable data represent a rich source from which companies can deduce their products’ reputation, drawing on both social media and crowds’ judgments. To exploit this wealth of data, a framework was proposed that collects opinions from social media and rating scores from a crowdsourcing platform in order to perform sentiment analysis, provide insights about a product and reveal consumers’ tendencies. During the analysis process, a consumer category (strict) is excluded from the process of reaching a majority consensus. To overcome this, fuzzy clustering is used to compute consumers’ credibility. The key novelty of our approach is the new layer of validity check using a crowdsourcing component, which ensures that the results obtained from social media are supported by opinions extracted directly from real-life consumers. Finally, experiments are carried out to validate this model (Twitter and Facebook were used as data sources). The obtained results show that this approach is more efficient and accurate than existing solutions thanks to our two-layer validity check design.


2020 ◽  
Vol 2 (1) ◽  
pp. 126-138
Author(s):  
Flaviu Bogdan Dan ◽  
Monica Maer-Matei ◽  
Stelian Stancu

Abstract: This article aims to use text mining methods and sentiment analysis to determine the stock market evolution of companies as well as virtual currencies such as Bitcoin. The source of the text is the social media channel Twitter, and the text is composed of individual messages sent by users. Although previous papers proved with a degree of certainty that this paper's hypothesis is true, as we will see below, that research focused only on the professional environment or known opinion makers and did not take into account a large mass of the population. To ensure that a high level of information is maintained after the sentiment analysis process, we will use multiple algorithms based on different calculation methods and different word dictionaries. In addition, indicators such as the number of assessments, the number of replies, etc. will be added to the methodology. By the end of the paper we will be able to identify a working methodology for analyzing text for the purposes of stock market prediction, and we will also touch on the limitations faced when creating it and the ways in which we can expand and improve its reliability. The implementation of all these methods and of the multiple dictionaries helped us simulate human behavior and the differences of opinion that arise when a group wants to analyze a text. The algorithm thus becomes a way to balance the different “opinions” that result from the sentiment analysis.


Author(s):  
Nurul Husna Mahadzir et al.

In recent times, sentiment analysis has become one of the most active and increasingly popular research areas in information retrieval and text mining. To date, sentiment analysis has been applied in various domains such as product, movie, sport and political reviews. Most of the previous work in this field has focused on analyzing only a single language, especially English. However, with the demands of globalization and the increasing number of Internet users worldwide, it is common to see posts written in multiple languages. Moreover, in unstructured content like Twitter posts, people tend to mix languages in one sentence, which makes the sentiment analysis process even harder and more challenging. This paper reviews the state of the art in sentiment analysis for code-mixed text, including detailed discussions of each focus area, a qualitative comparison and the limitations of current approaches. This paper also highlights challenges along this line of research and suggests several recommendations for future work.


2019 ◽  
Author(s):  
Murilo C. Medeiros ◽  
Vinicius R. P. Borges

This paper describes a methodology for analyzing sentiments and for knowledge discovery in tweets regarding the Brazilian stock market. The proposed methodology starts by preprocessing and characterizing tweets to obtain an associated vector-space model. After that, dimensionality reduction is employed using Principal Component Analysis and t-distributed Stochastic Neighbor Embedding. Sentiment analysis of stock market tweets is performed by considering the tasks of sentiment classification, topic modeling and clustering, along with a visual analysis process. Experimental results showed satisfactory performance in single-label and multi-label sentiment classification scenarios. The visual analysis process also revealed interesting relationships among topics and clusters.


2021 ◽  
Vol 5 (1) ◽  
pp. 24 ◽  
Author(s):  
Chairullah Naury ◽  
Dhomas Hatta Fudholi ◽  
Ahmad Fathan Hidayatullah

Online mass media is the source of the fastest and most up-to-date information. A model that can provide mapping will help in sorting out information more precisely. In this study, the authors applied topic modeling to the results of sentiment analysis on online news headlines in Indonesian. The data in this study were obtained from online mass media in Indonesian. The collected data were analyzed for sentiment using the Long Short-Term Memory (LSTM) method in order to obtain news headlines with positive, negative, and neutral sentiments. The classification obtained from the sentiment analysis process is followed by a topic modeling process using the Latent Dirichlet Allocation (LDA) method, and the results are visualized as a word cloud and an intertopic distance map (pyLDAvis) to determine the relationship between one topic and another. The result of the sentiment analysis is a model with an accuracy of 71.13%, and the topic modeling yields a set of topics that are easy to interpret.


2018 ◽  
Vol 3 (335) ◽  
pp. 123-138
Author(s):  
Piotr Młodzianowski

The article presents the results of a study on the influence of online information originating from financial websites on changes in the Warsaw Stock Exchange indexes. The first part is theoretical. It describes the issue of text mining and sentiment analysis and their use in the text analysis process. The next part of the article describes the characteristics of the study. A selection was made of Polish financial websites that may trigger reactions from investors on the Warsaw Stock Exchange. Words occurring on the analysed websites were selected and put into classes. Then the relation between changes in WSE indexes and the frequency of appearance of individual words within the classes was analysed. The last part of the article presents the study results, discusses the possibilities of using them and indicates further areas for research.

