Assessing Regression-Based Sentiment Analysis Techniques in Financial Texts

Sentiment analysis (SA) is growing in importance because of the enormous amount of opinionated textual data available today. Most research has investigated different models, feature representations, and hyperparameters in SA classification tasks. However, few studies have evaluated the impact of these choices on regression SA tasks. In this paper, we conduct such an assessment on a financial-domain data set by investigating different feature representations and hyperparameters in two important models: Support Vector Regression (SVR) and Convolutional Neural Networks (CNN). We conclude by presenting the most relevant feature representations and hyperparameters and how they affect outcomes on a regression SA task.
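To make the regression setting concrete, the sketch below computes the epsilon-insensitive loss that standard SVR formulations minimize, where `epsilon` is one of the hyperparameters such a study would tune. The function name, the default value, and the sample scores are illustrative assumptions and do not come from the paper.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Mean epsilon-insensitive loss.

    Prediction errors smaller than `epsilon` cost nothing; larger errors
    are penalized linearly by the amount they exceed `epsilon`.
    """
    losses = [max(0.0, abs(t - p) - epsilon) for t, p in zip(y_true, y_pred)]
    return sum(losses) / len(losses)


# Hypothetical sentiment scores in [-1, 1] for three financial headlines.
gold = [0.5, -0.2, 0.9]
predicted = [0.45, 0.1, 0.4]
print(epsilon_insensitive_loss(gold, predicted, epsilon=0.1))
```

A larger `epsilon` tolerates more deviation from the gold sentiment score before any penalty applies, which is why it interacts with the choice of feature representation.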


A Hybrid Approach for Sarcasm Detection

Technical Journal ◽

10.3126/tj.v1i1.27581 ◽

2019 ◽

Vol 1 (1) ◽

pp. 1-9

Author(s):

S. Luintel ◽

R.K. Sah ◽

B.R. Lamichhane

Keyword(s):

Random Forest ◽

Sentiment Analysis ◽

Hybrid Approach ◽

Weighted Average ◽

Text Summarization ◽

Support Vector ◽

Markup Language ◽

Textual Data ◽

Enormous Amount ◽

Hypertext Markup Language

User-generated textual data is growing rapidly as the number of internet and social media users increases, and it includes an enormous amount of sarcastic words, emoji, and sentences. Sarcasm is a nuanced form of communication in which a person states the opposite of what is implied, in order to insult someone, to show irritation, or to be funny. It is considered one of the most difficult problems in sentiment analysis because of its ambiguous nature. Recognizing sarcasm in text can benefit many sentiment analysis and text summarization applications. To address this problem, several steps have been adopted for sarcasm detection. Preprocessing techniques such as Hypertext Markup Language (HTML) removal and stop-word removal have been applied. Similarly, emoji and smileys have been converted into their textual equivalents. The most frequent features have been selected, and two hybrid approaches, cascade and weighted average, which combine the random forest, naïve Bayes, and support vector machine algorithms, have been used for sarcasm detection. A comparison of the two approaches on several criteria shows that the cascade approach outperformed the weighted-average approach. Moreover, a comparison of cascade approaches in terms of algorithm placement has also been performed, in which random forest proved to be the best.
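The two ways of combining classifiers can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: each classifier is modelled as a function returning the probability that a text is sarcastic, the cascade is read as "stop at the first classifier that is confident enough", and the three toy classifiers, the weights, and the `confidence` threshold are all hypothetical.

```python
from typing import Callable, Sequence

# A classifier is any function mapping a text to P(sarcastic) in [0, 1].
Classifier = Callable[[str], float]


def weighted_average(classifiers: Sequence[Classifier],
                     weights: Sequence[float], text: str) -> float:
    """Combine all classifier outputs as a weighted mean."""
    total = sum(weights)
    return sum(w * clf(text) for clf, w in zip(classifiers, weights)) / total


def cascade(classifiers: Sequence[Classifier], text: str,
            confidence: float = 0.8) -> float:
    """Query classifiers in order and stop at the first confident one.

    If no classifier is confident, the last classifier's output is returned.
    """
    p = 0.5
    for clf in classifiers:
        p = clf(text)
        if p >= confidence or p <= 1.0 - confidence:
            return p
    return p


# Toy stand-ins for trained random forest, naive Bayes, and SVM models.
def toy_rf(text: str) -> float:
    return 0.9 if "yeah right" in text.lower() else 0.4


def toy_nb(text: str) -> float:
    return 0.7 if "!" in text else 0.3


def toy_svm(text: str) -> float:
    return 0.6 if "great" in text.lower() else 0.2


models = [toy_rf, toy_nb, toy_svm]
print(weighted_average(models, [2.0, 1.0, 1.0], "Oh great, another Monday!"))
print(cascade(models, "Yeah right, sure"))
```

In the cascade, the order of `models` matters because an early confident classifier decides the outcome alone, which is consistent with the abstract's interest in algorithm placement.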


Meta Learning for Few-Shot One-Class Classification

AI ◽

10.3390/ai2020012 ◽

2021 ◽

Vol 2 (2) ◽

pp. 195-208

Author(s):

Gabriel Dahia ◽

Maurício Pamplona Segundo

Keyword(s):

Feature Representation ◽

Support Vector ◽

Support Vector Data Description ◽

Target Class ◽

Feature Representations ◽

Shot Classification ◽

Training Stage ◽

Comparable Performance ◽

Meta Learning ◽

One Class Classification

We propose a method that can perform one-class classification given only a small number of examples from the target class and none from the others. We formulate the learning of meaningful features for one-class classification as a meta-learning problem in which the meta-training stage repeatedly simulates one-class classification, using the classification loss of the chosen algorithm to learn a feature representation. To learn these representations, we require only multiclass data from similar tasks. We show how the Support Vector Data Description method can be used with our method, and also propose a simpler variant based on Prototypical Networks that obtains comparable performance, indicating that learning feature representations directly from data may be more important than which one-class algorithm we choose. We validate our approach by adapting few-shot classification datasets to the few-shot one-class classification scenario, obtaining results similar to the state of the art in traditional one-class classification and improving upon those of one-class classification baselines employed in the few-shot setting.
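As a rough illustration of prototype-based one-class scoring, the sketch below averages a few target-class embeddings into a prototype and accepts a query if it lies close enough. It is a simplification and not the authors' exact variant: the embeddings are assumed to come from an already trained feature extractor, and the `threshold` value is hypothetical.

```python
def prototype(support):
    """Mean of the support embeddings (each a list of floats)."""
    n = len(support)
    dim = len(support[0])
    return [sum(vec[i] for vec in support) / n for i in range(dim)]


def squared_distance(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def is_target(query, support, threshold):
    """Accept the query if it is within `threshold` of the prototype."""
    return squared_distance(query, prototype(support)) <= threshold


# Two hypothetical support embeddings from the target class.
support_set = [[1.0, 1.0], [3.0, 1.0]]
print(is_target([2.0, 2.0], support_set, threshold=1.5))
print(is_target([5.0, 5.0], support_set, threshold=1.5))
```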


Classification of standing and sitting phases based on in-socket piezoelectric sensors in a transfemoral amputee

Biomedical Engineering / Biomedizinische Technik ◽

10.1515/bmt-2018-0249 ◽

2020 ◽

Vol 0 (0) ◽

Author(s):

Tawfik Yahya ◽

Nur Azah Hamzaid ◽

Sadeeq Ali ◽

Farahiyah Jasni ◽

Hanie Nadia Shasmin

Keyword(s):

Time Domain ◽

Classification Accuracy ◽

Sensory System ◽

Metabolic Energy ◽

Support Vector ◽

Data Set ◽

Transfemoral Amputee ◽

Sit To Stand ◽

Optimal Classifier ◽

The Impact

A transfemoral prosthesis is required to assist amputees in performing activities of daily living (ADL). The passive prosthesis has some drawbacks, such as high metabolic energy use. In contrast, the active prosthesis consumes less metabolic energy and offers better performance. However, recent active prostheses use surface electromyography as their sensory system, which has weak signals with microvolt-level intensity and requires a lot of computation to extract features. This paper focuses on recognizing different phases of sitting and standing of a transfemoral amputee using in-socket piezoelectric-based sensors. Fifteen piezoelectric film sensors were embedded in the inner socket wall adjacent to the most active regions of the agonist and antagonist knee extensor and flexor muscles, i.e., the regions with the highest level of muscle contraction of the quadriceps and hamstring. A male transfemoral amputee wore the instrumented socket and was instructed to perform several sitting and standing phases using an armless chair. Data was collected from the 15 embedded sensors and passed through signal conditioning circuits. The overlapping analysis window technique was used to segment the data using different window lengths. Fifteen time-domain and frequency-domain features were extracted, and new feature sets were obtained based on feature performance. Eight common multiclass pattern recognition classifiers were evaluated and compared. Regression analysis was used to investigate the impact of the number of features and the window lengths on the classifiers’ accuracies, and Analysis of Variance (ANOVA) was used to test for significant differences in the classifiers’ performances. The classification accuracy was calculated using the k-fold cross-validation method, and 20% of the data set was held out for testing the optimal classifier.
The results showed that the feature set (FS-5) consisting of the root mean square (RMS) and the number of peaks (NP) achieved the highest classification accuracy in five classifiers. Support vector machine (SVM) with a cubic kernel proved to be the optimal classifier, achieving a classification accuracy of 98.33% on the test data set. Obtaining high classification accuracy using only two time-domain features would significantly reduce the processing time of controlling a prosthesis and eliminate substantial delay. The proposed in-socket sensors used to detect sit-to-stand and stand-to-sit movements could be further integrated with an active knee joint actuation system to produce powered assistance during energy-demanding activities such as sit-to-stand and stair climbing. In the future, the system could also be used to accurately predict the intended movement based on the muscle and mechanical behaviour of the residual limb as detected by the in-socket sensory system.
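The segmentation and the two features in FS-5 can be sketched as below. This is an illustration under assumptions, not the paper's pipeline: a peak is taken to be a strict local maximum, and the window length, step, and sample signal are made up for the example.

```python
import math


def sliding_windows(signal, length, step):
    """Split a signal into windows; a step smaller than length gives overlap."""
    return [signal[i:i + length]
            for i in range(0, len(signal) - length + 1, step)]


def rms(window):
    """Root mean square of the samples in one window."""
    return math.sqrt(sum(x * x for x in window) / len(window))


def number_of_peaks(window):
    """Count strict local maxima (an assumed definition of a peak)."""
    return sum(1 for i in range(1, len(window) - 1)
               if window[i - 1] < window[i] > window[i + 1])


def extract_features(signal, length, step):
    """Return one (RMS, number of peaks) pair per window."""
    return [(rms(w), number_of_peaks(w))
            for w in sliding_windows(signal, length, step)]


# Hypothetical samples from one sensor channel; windows overlap by 50%.
samples = [0, 1, 0, 2, 0, 3, 0, 1]
for features in extract_features(samples, length=4, step=2):
    print(features)
```

Each resulting pair would form part of the feature vector passed to a classifier such as the SVM named in the abstract.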


Exploring Impact of Age and Gender on Sentiment Analysis Using Machine Learning

Electronics ◽

10.3390/electronics9020374 ◽

2020 ◽

Vol 9 (2) ◽

pp. 374 ◽

Cited By ~ 2

Author(s):

Sudhanshu Kumar ◽

Monika Gahalawat ◽

Partha Pratim Roy ◽

Debi Prosad Dogra ◽

Byung-Gyu Kim

Keyword(s):

Machine Learning ◽

Sentiment Analysis ◽

Short Term Memory ◽

Age Groups ◽

Modern World ◽

Support Vector ◽

Digital Information ◽

Age And Gender ◽

And Gender ◽

The Impact

Sentiment analysis is a rapidly growing field of research due to the explosive growth in digital information. In the modern world of artificial intelligence, sentiment analysis is one of the essential tools for extracting emotional information from massive data. Sentiment analysis is applied to a variety of user data, from customer reviews to social network posts. To the best of our knowledge, there has been little work on sentiment analysis based on the categorization of users by demographics. Demographics play an important role in deciding the marketing strategies for different products. In this study, we explore the impact of age and gender in sentiment analysis, as this can help e-commerce retailers to market their products based on specific demographics. The dataset is created by collecting book reviews from Facebook users, who were asked to answer a questionnaire about their preferences in books along with their age group and gender. Next, the paper analyzes the segmented data for sentiment by age group and gender. Finally, sentiment analysis is done using different Machine Learning (ML) approaches, including maximum entropy, support vector machine, convolutional neural network, and long short-term memory, to study the impact of age and gender on user reviews. Experiments have been conducted to identify new insights into the effect of age and gender on sentiment analysis.


A sentiment analysis system for social media using machine learning techniques: Social enablement

Digital Scholarship in the Humanities ◽

10.1093/llc/fqy037 ◽

2018 ◽

Vol 34 (3) ◽

pp. 569-581 ◽

Cited By ~ 1

Author(s):

Sujata Rani ◽

Parteek Kumar

Keyword(s):

Machine Learning ◽

Social Media ◽

Sentiment Analysis ◽

Media Analysis ◽

Training Data ◽

Machine Learning Techniques ◽

Support Vector ◽

Analysis Tool ◽

Data Set ◽

Learning Techniques

In this article, an innovative approach to sentiment analysis (SA) is presented. The proposed system handles Romanized or abbreviated text and spelling variations in order to perform the sentiment analysis. The training data set of 3,000 movie reviews and tweets has been manually labeled by native speakers of Hindi in three classes, i.e. positive, negative, and neutral. The system uses the WEKA (Waikato Environment for Knowledge Analysis) tool to convert these string data into numerical matrices and applies three machine learning techniques, i.e. Naive Bayes (NB), J48, and support vector machine (SVM). The proposed system has been tested on 100 movie reviews and tweets, and SVM performed best among the classifiers, with an accuracy of 68% for movie reviews and 82% for tweets. The results of the proposed system are very promising, and the system can be used in emerging applications such as SA of product reviews and social media analysis. Additionally, it can be used for other cultural and social benefits, such as predicting and countering riots.
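The step of converting strings into numerical matrices can be illustrated with a plain bag-of-words count matrix. The sketch below is written in Python rather than WEKA, and the whitespace tokenization and sample texts are assumptions made for the example; it is not the configuration used in the article.

```python
def build_vocabulary(texts):
    """Assign a column index to every distinct lowercase token."""
    vocab = {}
    for text in texts:
        for token in text.lower().split():
            vocab.setdefault(token, len(vocab))
    return vocab


def to_count_matrix(texts, vocab):
    """Return one row of token counts per text, with columns ordered by vocab."""
    matrix = []
    for text in texts:
        row = [0] * len(vocab)
        for token in text.lower().split():
            if token in vocab:
                row[vocab[token]] += 1
        matrix.append(row)
    return matrix


# Made-up reviews standing in for the labeled training texts.
reviews = ["good movie", "bad bad movie"]
vocabulary = build_vocabulary(reviews)
print(vocabulary)
print(to_count_matrix(reviews, vocabulary))
```

Rows like these, paired with the manual labels, are the kind of input that classifiers such as NB, J48, or SVM consume.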


Texts of “internet confessions” as a source for training data set for the research on the sentiment-analysis field

Vestnik NSU Series Linguistics and Intercultural Communication ◽

10.25205/1818-7935-2019-17-3-71-82 ◽

2019 ◽

Vol 17 (3) ◽

pp. 71-82

Author(s):

Anastasia V. Kolmogorova

Keyword(s):

Sentiment Analysis ◽

Narrative Structure ◽

Training Data ◽

Data Set ◽

Financial Reports ◽

Technological Basis ◽

Self Image ◽

Textual Data ◽

Primary Advantage ◽

Multiclass Classifier

The article analyzes the validity of Internet confession texts as a source of training data for designing a computer classifier of Russian-language Internet texts according to their emotional tonality. The classifier, backed by Lövheim’s emotional cube model, is expected to detect eight classes of emotions represented in a text or to assign the text to the emotionally neutral class. The first and one of the most important stages of creating the classifier is the selection of the training data set, that is, the data actually used to train the model. The Internet text genres traditionally used in sentiment analysis to train classifiers with two or three tonalities are tweets, film and market reviews, blogs, and financial reports. The novelty of our project consists in designing a multiclass classifier, which requires new, non-trivial training data. For this purpose, we have chosen texts from the public group Overheard in the Russian social network VKontakte. As all the texts show similarities, we united them under the genre name “Internet confession”. To characterize the genre, we applied the method of narrative semiotics, describing six positions that form the deep narrative structure of “Internet confession”: Addresser – a person aware of her/his separateness from society; Addressee – society / public opinion; Subject – a narrator describing his / her emotional state; Object – the person’s self-image; Helper – the person’s frankness; Adversary – the person’s shame. These genre features determine its primary, qualitative advantage: it is focused specifically on emotionality, while more traditional sources of textual data are based on such categories as expressivity (tweets) or axiological estimations (all sorts of reviews).
The structural analysis of the texts under discussion has also demonstrated several advantages arising from the technological basis of the Overheard project: hashtagging of the texts spares the researcher from having to submit the whole collection to crowdsourced assessment; the collection’s size is optimal for assessment by experts; and, despite their hyperbolized emotionality, the texts of the Internet confession genre share the stylistic features typical of different types of personal Internet discourse. However, the narrative character of all Internet confession texts implies some restrictions on their use within a sentiment analysis project.


Analisis Sentimen Data Twitter Tentang Pasangan Capres-Cawapres Pemilu 2019 Dengan Metode Lexicon Based Dan Support Vector Machine

Jurnal Ilmiah FIFO ◽

10.22441/fifo.2019.v11i2.004 ◽

2019 ◽

Vol 11 (2) ◽

pp. 144

Author(s):

Danar Wido Seno ◽

Arief Wibowo

Keyword(s):

Social Media ◽

Support Vector Machine ◽

Sentiment Analysis ◽

Vice President ◽

Training Data ◽

Support Vector ◽

New Words ◽

Textual Data ◽

Data Content ◽

Combination Of Methods

The growth of social media content produces many new words and abbreviations on Twitter, which makes it increasingly difficult for sentiment analysis to achieve high accuracy on Twitter textual data. In this study, the authors analyze sentiment toward the pairs of candidates for President and Vice President of Indonesia in the 2019 elections. To obtain higher accuracy and to accommodate the evolution of textual data on Twitter, the authors combine an unsupervised method and a supervised method, namely Lexicon Based and Support Vector Machine. The study used Twitter data from October 2018, collected with search keywords containing the names of each pair of candidates, totaling 800 data items. The best accuracy obtained on these 800 items was 92.5%, with 80% of the data used for training and 20% for testing; precision for each class ranged from 85.7% to 97.2%, and recall for each class ranged from 78.2% to 93.5%. With the Lexicon Based method used to label the dataset, the labeling of the Support Vector Machine training data is no longer done manually, and the lexicon dictionary can be extended as data content on Twitter develops.
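The idea of letting a lexicon label the training data, and of reporting precision and recall per class, can be sketched as follows. The tiny English lexicon, the three-way label set, and the sample labels are assumptions for illustration; the study itself works with Indonesian tweets and its own dictionary.

```python
def lexicon_label(text, lexicon):
    """Label a text by summing the polarity scores of its known tokens."""
    score = sum(lexicon.get(token, 0) for token in text.lower().split())
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def precision_recall(y_true, y_pred, label):
    """Precision and recall for one class, as a (precision, recall) pair."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical lexicon; new words can be added as vocabulary evolves.
toy_lexicon = {"good": 1, "great": 1, "bad": -1}
print(lexicon_label("good great", toy_lexicon))

gold = ["positive", "positive", "negative", "negative"]
predicted = ["positive", "negative", "negative", "negative"]
print(precision_recall(gold, predicted, "positive"))
```

In a pipeline like the one described, the labels from `lexicon_label` would replace manual annotation before a supervised classifier is trained on an 80/20 split.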


Emotion-Semantic-Enhanced Bidirectional LSTM with Multi-Head Attention Mechanism for Microblog Sentiment Analysis

Information ◽

10.3390/info11050280 ◽

2020 ◽

Vol 11 (5) ◽

pp. 280

Author(s):

Shaoxiu Wang ◽

Yonghua Zhu ◽

Wenjing Gao ◽

Meng Cao ◽

Mengyao Li

Keyword(s):

Sentiment Analysis ◽

Short Term Memory ◽

Syntactic Structure ◽

Contextual Information ◽

Research Field ◽

Attention Mechanism ◽

Feature Representation ◽

Mechanism Model ◽

Hidden Layer ◽

The Impact

The sentiment analysis of microblog text has always been a challenging research field due to the limited and complex contextual information. However, most existing sentiment analysis methods for microblogs focus on classifying the polarity of emotional keywords while ignoring the transitional or progressive impact of words in different positions in the Chinese syntactic structure on global sentiment, as well as the use of emojis. To this end, we propose the emotion-semantic-enhanced bidirectional long short-term memory (BiLSTM) network with a multi-head attention mechanism (EBILSTM-MH) for sentiment analysis. This model uses BiLSTM to learn a feature representation of input texts, given the word embeddings. Subsequently, the attention mechanism is used to assign attention weights to each word based on the impact of emojis. The attention weights are combined with the output of the hidden layer to obtain the feature representation of posts. Finally, the sentiment polarity of a microblog is obtained through the dense connection layer. The experimental results show the feasibility of our proposed model on microblog sentiment analysis when compared with other baseline models.
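The step of combining attention weights with hidden-layer outputs can be illustrated with a single attention head. The sketch below is a generic simplification, not the EBILSTM-MH model: the relevance scores are assumed to be given (in the paper they depend on emojis), and the hidden states are tiny made-up vectors.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(hidden_states, scores):
    """Weighted sum of per-word hidden states using softmax attention weights."""
    weights = softmax(scores)
    dim = len(hidden_states[0])
    return [sum(w * h[i] for w, h in zip(weights, hidden_states))
            for i in range(dim)]


# Two hypothetical word-level hidden states with equal relevance scores.
states = [[1.0, 0.0], [0.0, 1.0]]
print(attend(states, [0.0, 0.0]))
```

A multi-head version would repeat this with several score sets and concatenate the resulting vectors before the dense layer.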


Ooredoo Rayek

International Journal of Technology Diffusion ◽

10.4018/ijtd.2020040105 ◽

2020 ◽

Vol 11 (2) ◽

pp. 66-81

Author(s):

Badia Klouche ◽

Sidi Mohamed Benslimane ◽

Sakina Rim Bennabi

Keyword(s):

Social Media ◽

Support Vector Machine ◽

Text Mining ◽

Sentiment Analysis ◽

Experimental Results ◽

Support Vector ◽

Textual Data ◽

New Strategy ◽

Set Up

Sentiment analysis is one of the recent areas of emerging research in the classification of sentiment polarity and text mining, particularly given the considerable number of opinions available on social media. The Algerian telephone operator Ooredoo, like other operators, aims in its new strategy to win new customers by exploiting their opinions through sentiment analysis. The purpose of this work is to set up a system called “Ooredoo Rayek”, whose objective is to collect, transliterate, translate, and classify the textual data expressed by the Ooredoo operator's customers. This article develops a set of rules for transliterating Algerian Arabizi into the Algerian dialect. Furthermore, the authors use Naïve Bayes (NB) and Support Vector Machine (SVM) classifiers to assign polarity tags to Facebook comments from the official Ooredoo pages, which are written in a multilingual and multi-dialect context. Experimental results show that the system performs well, with an accuracy of 83%.


Sentiment Analysis of Student’s Opinion on Programming Assessment: Evaluation of Naïve Bayes over Support Vector Machines

International Journal of Innovative Computing ◽

10.11113/ijic.v10n2.278 ◽

2020 ◽

Vol 10 (2) ◽

Author(s):

Mahmood Umar ◽

Nor Bahiah Ahmad ◽

Anazida Zainal

Keyword(s):

Support Vector Machines ◽

Sentiment Analysis ◽

Naive Bayes ◽

Naïve Bayes ◽

Machine Learning Algorithms ◽

Experimental Result ◽

Support Vector ◽

Small Data ◽

Data Set ◽

Vector Machines

This study investigates the performance of machine learning algorithms for sentiment analysis of students’ opinions on programming assessment. Previous research shows that Support Vector Machines (SVM) perform best among all techniques in sentiment analysis, followed by Naïve Bayes (NB). This study proposes a framework for classifying sentiments as positive or negative using the NB algorithm and a Lexicon-based approach on a small data set. The performance of the NB algorithm was evaluated against SVM. NB and SVM outperform the Lexicon-based approach in terms of accuracy in the specific area for which they are trained. The Lexicon-based technique, on the other hand, avoids the difficult steps needed to train a classifier. Data was analyzed from 75 first-year undergraduate students in the School of Computing, Universiti Teknologi Malaysia, taking a programming subject. The students’ sentiments were gathered from their opinions on the zero-score policy for unsuccessful compilation of a program during a skill-based test. The study reveals that the students tend to have negative sentiments about programming assessment because it evokes fear. Applying the NB algorithm yields a prediction accuracy of 85%, which outperforms both SVM with 70% and the Lexicon-based approach with 60% accuracy. The result shows that NB works better than SVM and the Lexicon-based approach on a small dataset.
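A multinomial Naïve Bayes classifier of the kind compared here can be sketched in a few lines. This is a minimal illustration with add-one (Laplace) smoothing, which is an assumed choice, and the four training sentences are invented rather than taken from the students' responses.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for doc, label in zip(docs, labels):
        tokens = doc.lower().split()
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict_nb(model, doc):
    """Return the class with the highest log posterior for the document."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for token in doc.lower().split():
            if token in vocab:  # ignore words never seen in training
                # Add-one smoothing avoids zero probabilities.
                score += math.log((word_counts[label][token] + 1)
                                  / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented opinions about a zero-score policy, two per class.
train_docs = ["test was unfair", "policy is scary",
              "test was fine", "policy is fair"]
train_labels = ["negative", "negative", "positive", "positive"]
nb_model = train_nb(train_docs, train_labels)
print(predict_nb(nb_model, "scary unfair"))
```

Because the model only needs word counts per class, it can be estimated from very few examples, which is one common explanation for NB holding up on small data sets.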
