An Empirical Study of Skip-Gram Features and Regularization for Learning on Sentiment Analysis

Author(s):  
Cheng Li ◽  
Bingyu Wang ◽  
Virgil Pavlu ◽  
Javed A. Aslam
2021 ◽  
Vol 9 (2) ◽  
pp. 1051-1052
Author(s):  
K. Kavitha et al.

Sentiment refers to the opinions or views that people express about a topic through some channel of communication. Social media is now an effective platform for communication, and it generates a huge amount of unstructured data every day. Business organizations therefore need to process and analyse these sentiments using machine learning and Natural Language Processing (NLP) strategies, and in recent times deep learning strategies have become more popular because of their higher performance. This paper presents an empirical study of the application of deep learning techniques to Sentiment Analysis (SA) of sarcastic messages and of their increasing scope in real time. A taxonomy of recent sentiment analysis work and its key terms is also highlighted. The survey summarizes the recent datasets considered, their key contributions, and the performance of the deep learning models applied to primary tasks such as sarcasm detection, in order to describe the efficiency of deep learning frameworks in the domain of sentiment analysis.


2016 ◽  
Vol 23 (4) ◽  
pp. 855-869 ◽  
Author(s):  
Jianqiang Hao ◽  
Hongying Dai

Purpose: Security breaches are a growing problem that causes large financial losses and social harm to society and individuals. Little is known about how social media could be used as a surveillance tool to track messages related to security breaches. This paper aims to fill that gap by proposing a framework for social media surveillance of security breaches, along with an empirical study that sheds light on public attitudes and concerns.
Design/methodology/approach: The authors propose a framework for real-time monitoring of public perception of security breach events using social media metadata. An empirical study was then conducted on a sample of 113,340 related tweets collected on Twitter in August 2015. By text mining this large volume of unstructured, real-time information, the authors extracted topics, opinions and knowledge about security breaches from the general public. The time series analysis suggests significant trends for multiple topics, and the sentiment analysis results show a significant difference among topics.
Findings: The study confirms that social media monitoring provides a supplementary tool for tracking security breaches alongside traditional surveys, which are costly and time-consuming. Sentiment score and impact factors are good predictors of real-time public opinions and attitudes toward security breaches. Unusual patterns or events related to security breaches can be detected at an early stage, which could prevent further damage by raising public awareness.
Research limitations/implications: The sample data were collected over a short period of time on Twitter. Future studies could extend the research to a longer period or expand the keyword search to observe the sentiment trend, especially before and after large security breaches, and to track various topics across time.
Practical implications: The findings could inform public policy and guide companies in shaping public perception when responding to consumer security breaches.
Originality/value: This study is the first of its kind to analyse social media (Twitter) content and sentiment on public perception of security breaches.


2021 ◽  
Author(s):  
Samsiah Ahmad ◽  
Puteri Nur Sarah Nabila Mahdi ◽  
Zalikha Zulkifli ◽  
Lily Marlia Abdul Latif ◽  
Mohamed Imran Mohamed Ariff

Author(s):  
Samatcha Thanangthanakij ◽  
Eakasit Pacharawongsakda ◽  
Nattapong Tongtep ◽  
Pakinee Aimmanee ◽  
Thanaruk Theeramunkong

2021 ◽  
Vol 17 (1) ◽  
pp. 45-53
Author(s):  
Le Hong Trang ◽  
Tran Duong Huy ◽  
Anh Ngoc Le

Purpose: Pricing on online booking systems is a difficult task for hosts. The systems usually set prices lower than the general premises and quality warrant, which benefits only the system by making it easier to attract customers. The price of a new accommodation is often set based on location, number of beds, type of house and so on. The main problem is to predict the most reasonable price for the host. This paper studies the use of machine learning and sentiment analysis for predicting prices on online booking systems.
Design/methodology/approach: An empirical study is first performed on some well-known classification models for the problem. The authors then propose applying k-means, a clustering technique, together with Gradient Boost and XGBoost models to improve prediction performance. Experiments are conducted on real Airbnb data sets collected in London.
Findings: Experimental results are reported and compared, showing that the authors' method outperforms an updated method.
Originality/value: The authors use k-means and sampling together with Gradient Boost and XGBoost models to improve prediction performance.
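
The abstract does not say how k-means is combined with the boosted models, so the following is only a minimal sketch of one possible arrangement: cluster the listings first, then fit a separate predictor per cluster. To stay dependency-free, a per-cluster mean price stands in for the Gradient Boost and XGBoost models; it is a placeholder, not the authors' method. The feature tuples (beds, distance to centre in km), the prices, and all function names are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(point, centroids):
    """Index of the centroid closest to the point."""
    return min(range(len(centroids)),
               key=lambda i: squared_distance(point, centroids[i]))

def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means with a deterministic start (first k points)."""
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        new_centroids = []
        for i, group in enumerate(groups):
            if group:
                # Mean of each feature over the cluster members.
                new_centroids.append(
                    tuple(sum(col) / len(group) for col in zip(*group)))
            else:
                # Keep the old centroid if a cluster ends up empty.
                new_centroids.append(centroids[i])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids

def fit_cluster_means(points, prices, centroids):
    """Placeholder model: mean price per cluster (stand-in for a boosted regressor)."""
    sums = [0.0] * len(centroids)
    counts = [0] * len(centroids)
    for p, price in zip(points, prices):
        i = nearest(p, centroids)
        sums[i] += price
        counts[i] += 1
    overall = sum(prices) / len(prices)
    return [sums[i] / counts[i] if counts[i] else overall
            for i in range(len(centroids))]

def predict(point, centroids, cluster_means):
    """Route the listing to its cluster and return that cluster's prediction."""
    return cluster_means[nearest(point, centroids)]

# Hypothetical listings: (number of beds, distance to centre in km).
listings = [(1, 8.0), (4, 1.0), (1, 9.0), (2, 8.5), (4, 1.5), (5, 0.5)]
prices = [40.0, 200.0, 45.0, 50.0, 210.0, 250.0]

centroids = kmeans(listings, k=2)
cluster_means = fit_cluster_means(listings, prices, centroids)
print(predict((2, 9.0), centroids, cluster_means))
print(predict((5, 1.0), centroids, cluster_means))
```

In a fuller version the features would be scaled before clustering, and each cluster's mean would be replaced by a regressor trained on that cluster's listings.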


Electronics ◽  
2021 ◽  
Vol 10 (7) ◽  
pp. 845
Author(s):  
Danbi Cho ◽  
Hyunyoung Lee ◽  
Seungshik Kang

How the token unit is defined in a sentence matters in natural language processing tasks such as text classification, machine translation, and generation. Many recent studies have used subword tokenization in language models such as BERT, KoBERT, and ALBERT. Although these language models have achieved state-of-the-art results in various NLP tasks, it is not clear whether the subword is the best token unit for Korean sentence embedding. Thus, we carried out sentence embedding based on word, morpheme, subword, and submorpheme units on Korean sentiment analysis. We explored two sentence representation methods for sentence embedding: one that considers the order of tokens in a sentence and one that does not. By feeding a sentence, decomposed by token unit, to the two sentence representation methods, we construct sentence embeddings with various tokenizations to find the most effective token unit for Korean sentence embedding. In our work, we confirmed the robustness of the subword unit to out-of-vocabulary (OOV) problems compared to other token units, the disadvantage of replacing whitespace with a particular symbol in the sentiment analysis task, and that the optimal vocabulary size is 16K in subword and submorpheme tokenization. We found empirically that the subword unit, tokenized with a vocabulary size of 16K and without replacing whitespace, was the most effective for sentence embedding on the Korean sentiment analysis task.
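
The two ideas in this abstract, token unit and order sensitivity, can be illustrated with a small sketch. It rests on loud assumptions: character bigrams are used as a crude stand-in for a learned subword vocabulary (the paper's actual tokenizers are not reproduced here), and the English toy sentences and all function names are hypothetical.

```python
from collections import Counter

def word_tokens(sentence):
    """Word unit: split on whitespace."""
    return sentence.split()

def char_ngram_tokens(sentence, n=2):
    """Character n-grams within each word (a rough stand-in for subwords)."""
    tokens = []
    for word in sentence.split():
        if len(word) <= n:
            tokens.append(word)
        else:
            tokens.extend(word[i:i + n] for i in range(len(word) - n + 1))
    return tokens

def oov_rate(tokens, vocab):
    """Fraction of tokens that never appeared in the training vocabulary."""
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t not in vocab) / len(tokens)

def bag_representation(tokens):
    """Order-insensitive view of a sentence: token counts only."""
    return Counter(tokens)

def sequence_representation(tokens):
    """Order-sensitive view of a sentence: the token sequence itself."""
    return tuple(tokens)

# Hypothetical training and test sentences.
train = ["the movie was great", "the acting was bad"]
test = "the actor was greater"

word_vocab = {t for s in train for t in word_tokens(s)}
sub_vocab = {t for s in train for t in char_ngram_tokens(s)}

word_oov = oov_rate(word_tokens(test), word_vocab)
sub_oov = oov_rate(char_ngram_tokens(test), sub_vocab)
print(word_oov, sub_oov)

# Two sentences with the same tokens in a different order.
a = word_tokens("not good but bad")
b = word_tokens("not bad but good")
print(bag_representation(a) == bag_representation(b))
print(sequence_representation(a) == sequence_representation(b))
```

On this toy data the smaller unit leaves fewer unseen tokens than the word unit, and the bag view cannot tell the two reordered sentences apart while the sequence view can. Neither observation says anything about the paper's reported results on Korean.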


2008 ◽  
Vol 34 (4) ◽  
pp. 2622-2629 ◽  
Author(s):  
S TAN ◽  
J ZHANG
