Toward Integrated CNN-based Sentiment Analysis of Tweets for Scarce-resource Language—Hindi

Author(s):  
Vedika Gupta ◽  
Nikita Jain ◽  
Shubham Shubham ◽  
Agam Madan ◽  
Ankit Chaudhary ◽  
...  

Linguistic resources for commonly used languages such as English and Mandarin Chinese are available in abundance, hence the existing research in these languages. However, there are languages for which linguistic resources are scarcely available. One of these languages is the Hindi language. Hindi, being the fourth-most popular language, still lacks in richly populated linguistic resources, owing to the challenges involved in dealing with the Hindi language. This article first explores the machine learning-based approaches—Naïve Bayes, Support Vector Machine, Decision Tree, and Logistic Regression—to analyze the sentiment contained in Hindi language text derived from Twitter. Further, the article presents lexicon-based approaches (Hindi Senti-WordNet, NRC Emotion Lexicon) for sentiment analysis in Hindi while also proposing a Domain-specific Sentiment Dictionary. Finally, an integrated convolutional neural network (CNN)—Recurrent Neural Network and Long Short-term Memory—is proposed to analyze sentiment from Hindi language tweets, a total of 23,767 tweets classified into positive, negative, and neutral. The proposed CNN approach gives an accuracy of 85%.

2018 ◽  
Vol 28 (11n12) ◽  
pp. 1719-1737
Author(s):  
Hao Wang ◽  
Xiaofang Zhang ◽  
Bin Liang ◽  
Qian Zhou ◽  
Baowen Xu

In the field of target-based sentiment analysis, the deep neural model combining attention mechanism is a remarkable success. In current research, it is commonly seen that attention mechanism is combined with Long Short-Term Memory (LSTM) networks. However, such neural network-based architectures generally rely on complex computation and only focus on single target. In this paper, we propose a gated hierarchical LSTM (GH-LSTMs) model which combines regional LSTM and sentence-level LSTM via a gated operation for the task of target-based sentiment analysis. This approach can distinguish different polarities of sentiment of different targets in the same sentence through a regional LSTM. Furthermore, it is able to concentrate on the long-distance dependency of target in the whole sentence via a sentence-level LSTM. The final results of our experiments on multi-domain datasets of two languages from SemEval 2016 indicate that our approach yields better performance than Support Vector Machine (SVM) and several typical neural network models. A case study of some typical examples also makes a supplement to this conclusion.


2021 ◽  
Vol 10 (4) ◽  
pp. 2181-2191
Author(s):  
Devi Munandar ◽  
Andri Fachrur Rozie ◽  
Andria Arisal

Sentiment analysis of short texts is challenging because of its limited context of information. It becomes more challenging to be done on limited resource language like Bahasa Indonesia. However, with various deep learning techniques, it can give pretty good accuracy. This paper explores several deep learning methods, such as multilayer perceptron (MLP), convolutional neural network (CNN), long short-term memory (LSTM), and builds combinations of those three architectures. The combinations of those three architectures are intended to get the best of those architecture models. The MLP accommodates the use of the previous model to obtain classification output. The CNN layer extracts the word feature vector from text sequences. Subsequently, the LSTM repetitively selects or discards feature sequences based on their context. Those advantages are useful for different domain datasets. The experiments on sentiment analysis of short text in Bahasa Indonesia show that hybrid models can obtain better performance, and the same architecture can be directly used in another domain-specific dataset.


2020 ◽  
Vol 13 (4) ◽  
pp. 627-640 ◽  
Author(s):  
Avinash Chandra Pandey ◽  
Dharmveer Singh Rajpoot

Background: Sentiment analysis is a contextual mining of text which determines viewpoint of users with respect to some sentimental topics commonly present at social networking websites. Twitter is one of the social sites where people express their opinion about any topic in the form of tweets. These tweets can be examined using various sentiment classification methods to find the opinion of users. Traditional sentiment analysis methods use manually extracted features for opinion classification. The manual feature extraction process is a complicated task since it requires predefined sentiment lexicons. On the other hand, deep learning methods automatically extract relevant features from data hence; they provide better performance and richer representation competency than the traditional methods. Objective: The main aim of this paper is to enhance the sentiment classification accuracy and to reduce the computational cost. Method: To achieve the objective, a hybrid deep learning model, based on convolution neural network and bi-directional long-short term memory neural network has been introduced. Results: The proposed sentiment classification method achieves the highest accuracy for the most of the datasets. Further, from the statistical analysis efficacy of the proposed method has been validated. Conclusion: Sentiment classification accuracy can be improved by creating veracious hybrid models. Moreover, performance can also be enhanced by tuning the hyper parameters of deep leaning models.


2021 ◽  
Vol 11 (10) ◽  
pp. 4443
Author(s):  
Rokas Štrimaitis ◽  
Pavel Stefanovič ◽  
Simona Ramanauskaitė ◽  
Asta Slotkienė

Financial area analysis is not limited to enterprise performance analysis. It is worth analyzing as wide an area as possible to obtain the full impression of a specific enterprise. News website content is a datum source that expresses the public’s opinion on enterprise operations, status, etc. Therefore, it is worth analyzing the news portal article text. Sentiment analysis in English texts and financial area texts exist, and are accurate, the complexity of Lithuanian language is mostly concentrated on sentiment analysis of comment texts, and does not provide high accuracy. Therefore in this paper, the supervised machine learning model was implemented to assign sentiment analysis on financial context news, gathered from Lithuanian language websites. The analysis was made using three commonly used classification algorithms in the field of sentiment analysis. The hyperparameters optimization using the grid search was performed to discover the best parameters of each classifier. All experimental investigations were made using the newly collected datasets from four Lithuanian news websites. The results of the applied machine learning algorithms show that the highest accuracy is obtained using a non-balanced dataset, via the multinomial Naive Bayes algorithm (71.1%). The other algorithm accuracies were slightly lower: a long short-term memory (71%), and a support vector machine (70.4%).


Sensors ◽  
2021 ◽  
Vol 21 (13) ◽  
pp. 4520
Author(s):  
Luis Lopes Chambino ◽  
José Silvestre Silva ◽  
Alexandre Bernardino

Facial recognition is a method of identifying or authenticating the identity of people through their faces. Nowadays, facial recognition systems that use multispectral images achieve better results than those that use only visible spectral band images. In this work, a novel architecture for facial recognition that uses multiple deep convolutional neural networks and multispectral images is proposed. A domain-specific transfer-learning methodology applied to a deep neural network pre-trained in RGB images is shown to generalize well to the multispectral domain. We also propose a skin detector module for forgery detection. Several experiments were planned to assess the performance of our methods. First, we evaluate the performance of the forgery detection module using face masks and coverings of different materials. A second study was carried out with the objective of tuning the parameters of our domain-specific transfer-learning methodology, in particular which layers of the pre-trained network should be retrained to obtain good adaptation to multispectral images. A third study was conducted to evaluate the performance of support vector machines (SVM) and k-nearest neighbor classifiers using the embeddings obtained from the trained neural network. Finally, we compare the proposed method with other state-of-the-art approaches. The experimental results show performance improvements in the Tufts and CASIA NIR-VIS 2.0 multispectral databases, with a rank-1 score of 99.7% and 99.8%, respectively.


Electronics ◽  
2020 ◽  
Vol 9 (2) ◽  
pp. 374 ◽  
Author(s):  
Sudhanshu Kumar ◽  
Monika Gahalawat ◽  
Partha Pratim Roy ◽  
Debi Prosad Dogra ◽  
Byung-Gyu Kim

Sentiment analysis is a rapidly growing field of research due to the explosive growth in digital information. In the modern world of artificial intelligence, sentiment analysis is one of the essential tools to extract emotion information from massive data. Sentiment analysis is applied to a variety of user data from customer reviews to social network posts. To the best of our knowledge, there is less work on sentiment analysis based on the categorization of users by demographics. Demographics play an important role in deciding the marketing strategies for different products. In this study, we explore the impact of age and gender in sentiment analysis, as this can help e-commerce retailers to market their products based on specific demographics. The dataset is created by collecting reviews on books from Facebook users by asking them to answer a questionnaire containing questions about their preferences in books, along with their age groups and gender information. Next, the paper analyzes the segmented data for sentiments based on each age group and gender. Finally, sentiment analysis is done using different Machine Learning (ML) approaches including maximum entropy, support vector machine, convolutional neural network, and long short term memory to study the impact of age and gender on user reviews. Experiments have been conducted to identify new insights into the effect of age and gender for sentiment analysis.


2021 ◽  
pp. 016555152110065
Author(s):  
Rahma Alahmary ◽  
Hmood Al-Dossari

Sentiment analysis (SA) aims to extract users’ opinions automatically from their posts and comments. Almost all prior works have used machine learning algorithms. Recently, SA research has shown promising performance in using the deep learning approach. However, deep learning is greedy and requires large datasets to learn, so it takes more time for data annotation. In this research, we proposed a semiautomatic approach using Naïve Bayes (NB) to annotate a new dataset in order to reduce the human effort and time spent on the annotation process. We created a dataset for the purpose of training and testing the classifier by collecting Saudi dialect tweets. The dataset produced from the semiautomatic model was then used to train and test deep learning classifiers to perform Saudi dialect SA. The accuracy achieved by the NB classifier was 83%. The trained semiautomatic model was used to annotate the new dataset before it was fed into the deep learning classifiers. The three deep learning classifiers tested in this research were convolutional neural network (CNN), long short-term memory (LSTM) and bidirectional long short-term memory (Bi-LSTM). Support vector machine (SVM) was used as the baseline for comparison. Overall, the performance of the deep learning classifiers exceeded that of SVM. The results showed that CNN reported the highest performance. On one hand, the performance of Bi-LSTM was higher than that of LSTM and SVM, and, on the other hand, the performance of LSTM was higher than that of SVM. The proposed semiautomatic annotation approach is usable and promising to increase speed and save time and effort in the annotation process.


Sensors ◽  
2021 ◽  
Vol 21 (23) ◽  
pp. 7853
Author(s):  
Aleksej Logacjov ◽  
Kerstin Bach ◽  
Atle Kongsvold ◽  
Hilde Bremseth Bårdstu ◽  
Paul Jarle Mork

Existing accelerometer-based human activity recognition (HAR) benchmark datasets that were recorded during free living suffer from non-fixed sensor placement, the usage of only one sensor, and unreliable annotations. We make two contributions in this work. First, we present the publicly available Human Activity Recognition Trondheim dataset (HARTH). Twenty-two participants were recorded for 90 to 120 min during their regular working hours using two three-axial accelerometers, attached to the thigh and lower back, and a chest-mounted camera. Experts annotated the data independently using the camera’s video signal and achieved high inter-rater agreement (Fleiss’ Kappa =0.96). They labeled twelve activities. The second contribution of this paper is the training of seven different baseline machine learning models for HAR on our dataset. We used a support vector machine, k-nearest neighbor, random forest, extreme gradient boost, convolutional neural network, bidirectional long short-term memory, and convolutional neural network with multi-resolution blocks. The support vector machine achieved the best results with an F1-score of 0.81 (standard deviation: ±0.18), recall of 0.85±0.13, and precision of 0.79±0.22 in a leave-one-subject-out cross-validation. Our highly professional recordings and annotations provide a promising benchmark dataset for researchers to develop innovative machine learning approaches for precise HAR in free living.


Author(s):  
Ralph Sherwin A. Corpuz ◽  

Analyzing natural language-based Customer Satisfaction (CS) is a tedious process. This issue is practically true if one is to manually categorize large datasets. Fortunately, the advent of supervised machine learning techniques has paved the way toward the design of efficient categorization systems used for CS. This paper presents the feasibility of designing a text categorization model using two popular and robust algorithms – the Support Vector Machine (SVM) and Long Short-Term Memory (LSTM) Neural Network, in order to automatically categorize complaints, suggestions, feedbacks, and commendations. The study found that, in terms of training accuracy, SVM has best rating of 98.63% while LSTM has best rating of 99.32%. Such results mean that both SVM and LSTM algorithms are at par with each other in terms of training accuracy, but SVM is significantly faster than LSTM by approximately 35.47s. The training performance results of both algorithms are attributed on the limitations of the dataset size, high-dimensionality of both English and Tagalog languages, and applicability of the feature engineering techniques used. Interestingly, based on the results of actual implementation, both algorithms are found to be 100% effective in accurately predicting the correct CS categories. Hence, the extent of preference between the two algorithms boils down on the available dataset and the skill in optimizing these algorithms through feature engineering techniques and in implementing them toward actual text categorization applications.


Sign in / Sign up

Export Citation Format

Share Document