Comparison of Accuracy between Long Short-Term Memory-Deep Learning and Multinomial Logistic Regression-Machine Learning in Sentiment Analysis on Twitter

Everyone has opinions about products, public figures, or government policies, and these opinions are spread across social media. Processing this opinion data is called sentiment analysis. Machine learning alone is not sufficient for processing such large volumes of opinion data; deep learning combined with NLP (Natural Language Processing) techniques can also be used. This study compares several deep learning models, such as CNN (Convolutional Neural Network), RNN (Recurrent Neural Networks), LSTM (Long Short-Term Memory) and several of their variants, for processing sentiment analysis data from Amazon and Yelp product reviews.

A semiautomatic annotation approach for sentiment analysis

Journal of Information Science ◽

10.1177/01655515211006594 ◽

2021 ◽

pp. 016555152110065

Author(s):

Rahma Alahmary ◽

Hmood Al-Dossari

Keyword(s):

Deep Learning ◽

Sentiment Analysis ◽

Short Term Memory ◽

Machine Learning Algorithms ◽

Support Vector ◽

Short Term ◽

Term Memory ◽

Annotation Process ◽

Learning Classifiers ◽

Long Short Term Memory

Sentiment analysis (SA) aims to extract users’ opinions automatically from their posts and comments. Almost all prior work has used machine learning algorithms. Recently, SA research has shown promising performance with deep learning. However, deep learning is data-hungry and requires large datasets, which increases the time needed for data annotation. In this research, we propose a semiautomatic approach that uses Naïve Bayes (NB) to annotate a new dataset, reducing the human effort and time spent on annotation. We created a dataset for training and testing the classifier by collecting Saudi dialect tweets. The NB classifier achieved an accuracy of 83%. The trained semiautomatic model was then used to annotate a new dataset, which was used to train and test deep learning classifiers for Saudi dialect SA. The three deep learning classifiers tested were convolutional neural network (CNN), long short-term memory (LSTM) and bidirectional long short-term memory (Bi-LSTM), with support vector machine (SVM) as the baseline. All three deep learning classifiers outperformed SVM: CNN reported the highest performance, followed by Bi-LSTM and then LSTM. The proposed semiautomatic annotation approach is usable and promising for saving time and effort in the annotation process.
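
The semiautomatic annotation idea described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`train_nb`, `predict_proba`, `annotate`), the confidence threshold of 0.8, the whitespace tokenizer and the English toy data are all assumptions made for illustration. A small hand-labelled seed set trains a multinomial Naive Bayes model, which then labels new texts; low-confidence texts are set aside for a human annotator.

```python
import math
from collections import Counter, defaultdict

def train_nb(labeled):
    """Fit a multinomial Naive Bayes model on (text, label) pairs."""
    class_docs = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in labeled:
        class_docs[label] += 1
        for tok in text.lower().split():
            word_counts[label][tok] += 1
            vocab.add(tok)
    return class_docs, word_counts, vocab

def predict_proba(model, text):
    """Return {label: posterior probability} with Laplace smoothing."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    log_scores = {}
    for label, n_docs in class_docs.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)
        for tok in text.lower().split():
            if tok in vocab:  # words never seen in training are ignored
                count = word_counts[label][tok]
                score += math.log((count + 1) / (total_words + len(vocab)))
        log_scores[label] = score
    # Normalise in log space for numerical stability.
    top = max(log_scores.values())
    exp = {k: math.exp(v - top) for k, v in log_scores.items()}
    z = sum(exp.values())
    return {k: v / z for k, v in exp.items()}

def annotate(model, texts, threshold=0.8):
    """Auto-label confident texts; return the rest for manual review."""
    auto, manual = [], []
    for text in texts:
        probs = predict_proba(model, text)
        label = max(probs, key=probs.get)
        if probs[label] >= threshold:
            auto.append((text, label))
        else:
            manual.append(text)
    return auto, manual

# Hypothetical seed set standing in for hand-labelled tweets.
seed = [
    ("great service loved it", "pos"),
    ("loved the fast delivery", "pos"),
    ("terrible service hated it", "neg"),
    ("hated the slow delivery", "neg"),
]
model = train_nb(seed)
auto, manual = annotate(
    model, ["loved it great", "slow terrible hated", "delivery service"]
)
```

With this toy data, the first two texts are labelled automatically and the ambiguous third one is left in `manual`. Raising the threshold trades annotation speed for label quality.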

COMPARATIVE ANALYSIS AND EVALUATION OF THE APPLICATION OF DEEP LEARNING TECHNIQUES TO CYBERSECURITY DATASETS

DYNA INGENIERIA E INDUSTRIA ◽

10.6036/10007 ◽

2021 ◽

Vol 96 (5) ◽

pp. 528-533

Author(s):

XAVIER LARRIVA NOVO ◽

MARIO VEGA BARBAS ◽

VICTOR VILLAGRA ◽

JULIO BERROCAL

Keyword(s):

Machine Learning ◽

Deep Learning ◽

High Performance ◽

New Technologies ◽

Short Term Memory ◽

Machine Learning Techniques ◽

Short Term ◽

Term Memory ◽

Learning Techniques ◽

Long Short Term Memory

Cybersecurity has gained prominence in recent years as a means of protecting information systems. A variety of methods, techniques and tools have been used to exploit the existing vulnerabilities in these systems. It is therefore essential to develop and improve new technologies, including intrusion detection systems that can detect possible threats. However, using these technologies requires highly qualified cybersecurity personnel to analyze the results and reduce the large number of false positives that they produce. This creates a need to research and develop new high-performance cybersecurity systems that allow these results to be analyzed and resolved efficiently. This research applies machine learning techniques to classify real traffic in order to identify possible attacks. The study uses machine learning tools to apply deep learning algorithms, namely multi-layer perceptron and long short-term memory. It also compares the results of these algorithms with those of non-deep-learning algorithms such as random forest and decision tree. The results show that the long short-term memory algorithm provides the best results in terms of precision and logarithmic loss.
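
The two evaluation criteria named in this abstract, precision and logarithmic loss, can be sketched as follows. This is a minimal illustration for the binary case (attack = 1, normal = 0); the abstract does not say whether the study's task was binary or multi-class, and the function names, the 0.5 decision threshold and the sample values are assumptions.

```python
import math

def log_loss(y_true, y_prob, eps=1e-15):
    """Mean negative log-likelihood of the true labels (binary case)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

def precision(y_true, y_pred):
    """Fraction of predicted positives that are true positives."""
    tp = sum(1 for y, yhat in zip(y_true, y_pred) if y == 1 and yhat == 1)
    fp = sum(1 for y, yhat in zip(y_true, y_pred) if y == 0 and yhat == 1)
    return tp / (tp + fp) if (tp + fp) else 0.0

# Hypothetical labels and predicted attack probabilities.
y_true = [1, 0, 1, 0]
y_prob = [0.9, 0.2, 0.6, 0.7]
y_pred = [1 if p >= 0.5 else 0 for p in y_prob]

print(precision(y_true, y_pred))
print(log_loss(y_true, y_prob))
```

Precision penalises false alarms, which is the false-positive problem the abstract highlights, while logarithmic loss also penalises over-confident wrong probabilities. Lower log loss is better; higher precision is better.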

ClickbaitTR: Dataset for clickbait detection from Turkish news sites and social media with a comparative analysis via machine learning algorithms

Journal of Information Science ◽

10.1177/01655515211007746 ◽

2021 ◽

pp. 016555152110077

Author(s):

Şura Genç ◽

Elif Surer

Keyword(s):

Machine Learning ◽

Social Media ◽

Logistic Regression ◽

Random Forest ◽

Short Term Memory ◽

Ensemble Classifier ◽

Machine Learning Algorithms ◽

Short Term ◽

Term Memory ◽

Long Short Term Memory

Clickbait is a strategy that aims to attract people’s attention and direct them to specific content. Clickbait titles, which rely on information that is not included in the main content or on intriguing expressions with various text-related features, have become very popular, especially on social media. This study expands the Turkish clickbait dataset that we constructed for clickbait detection in our proof-of-concept study. We reach a sample size of 48,060 by adding 8,859 tweets and release the resulting dataset, ClickbaitTR, publicly together with its open-source data analysis library. We apply machine learning algorithms, namely Artificial Neural Network (ANN), Logistic Regression, Random Forest, Long Short-Term Memory Network (LSTM), Bidirectional Long Short-Term Memory (BiLSTM) and an Ensemble Classifier, to 48,060 news headlines extracted from Twitter. The results show accuracies of 85% for Logistic Regression, 86% for Random Forest, 93% for LSTM, 93% for ANN, 93% for the Ensemble Classifier and 97% for BiLSTM. The psychological aspects of the clickbait strategy are discussed thoroughly, with a focus on curiosity and interest arousal. In addition to successful clickbait detection performance and a detailed analysis of clickbait sentences in terms of language and psychology, this study contributes the largest clickbait dataset in Turkish to clickbait detection research.

Previsão de casos de dengue através de Machine Learning e Deep Learning: uma revisão sistemática [Predicting dengue cases with Machine Learning and Deep Learning: a systematic review]

Research Society and Development ◽

10.33448/rsd-v10i11.19347 ◽

2021 ◽

Vol 10 (11) ◽

pp. e33101119347

Author(s):

Ewethon Dyego de Araujo Batista ◽

Wellington Candeia de Araújo ◽

Romeryto Vieira Lira ◽

Laryssa Izabel de Araujo Batista

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Short Term Memory ◽

Mean Absolute Error ◽

Absolute Error ◽

Support Vector ◽

Short Term ◽

Term Memory ◽

Long Short Term Memory ◽

De Medicina

Introduction: dengue is an arboviral disease caused by the DENV virus and transmitted to humans by the Aedes aegypti mosquito. There is currently no vaccine that is effective against all serotypes of the virus. Efforts to combat the disease therefore focus on preventive measures against the proliferation of the mosquito. Researchers are using Machine Learning (ML) and Deep Learning (DL) as tools to predict dengue cases and to support governments in this effort. Objective: to identify which ML and DL techniques and approaches are being used in dengue prediction. Methods: a systematic review of databases in Medicine and Computing, intended to answer the following research questions: is it possible to predict dengue cases using ML and DL techniques, which techniques are used, where are the studies being conducted, and how and which data are being used? Results: after running the searches, applying the inclusion and exclusion criteria, and reading in depth, 14 articles were accepted. The techniques Random Forest (RF), Support Vector Regression (SVR), and Long Short-Term Memory (LSTM) appear in 85% of the studies. Regarding data, most studies used 10 years of historical disease data together with climate information. Finally, Root Mean Absolute Error (RMSE) was the preferred measure of error. Conclusion: the review demonstrated the feasibility of using ML and DL techniques to predict dengue cases, with a low error rate and validated through statistical techniques.
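
This entry mentions both RMSE and mean absolute error as ways of measuring forecast error. As a minimal sketch, the standard definitions of the two metrics (mean absolute error, and root mean squared error for RMSE) can be computed as below. The function names and the weekly case counts are hypothetical and are not taken from any of the reviewed studies.

```python
import math

def mae(actual, predicted):
    """Mean absolute error: average size of the forecast errors."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error: penalises large errors more than MAE."""
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    return math.sqrt(mse)

# Hypothetical weekly dengue case counts and model forecasts.
actual = [10, 20, 30, 40]
predicted = [12, 18, 33, 36]

print(mae(actual, predicted))   # 2.75
print(rmse(actual, predicted))  # sqrt(8.25), about 2.87
```

RMSE is never smaller than MAE on the same data; a large gap between the two suggests that a few weeks have much larger errors than the rest, which matters when forecasting outbreaks.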

Algoritma LSTM-CNN untuk Binary Klasifikasi dengan Word2vec pada Media Online [LSTM-CNN Algorithm for Binary Classification with Word2vec on Online Media]

Creative Information Technology Journal ◽

10.24076/citec.2021v8i1.264 ◽

2021 ◽

Vol 8 (1) ◽

pp. 64

Author(s):

Dedi Tri Hermanto ◽

Arief Setyanto ◽

Emha Taufiq Luthfi

Keyword(s):

Deep Learning ◽

Sentiment Analysis ◽

Short Term Memory ◽

Direct Impact ◽

Test Results ◽

Short Term ◽

Term Memory ◽

Traditional Markets ◽

Interesting Topic ◽

Long Short Term Memory

Online media produce a wide variety of news, covering economics, politics, health, sports and science. Among these, economics is one of the more interesting topics to discuss. The economy has a direct impact on citizens and companies, and even traditional markets depend on a country's economic conditions. The sentiment contained in the news can influence people's views on an issue or a government policy. Economics is an interesting research topic because it directly affects Indonesian society. However, few studies have applied deep learning methods, namely Long Short-Term Memory and CNN, to sentiment analysis of finance articles in Indonesia. This study aims to classify Indonesian-language news headlines by positive or negative sentiment using the LSTM, LSTM-CNN and CNN-LSTM methods. The dataset consists of Indonesian-language article titles taken from the Detik Finance website. The test results show that the LSTM, LSTM-CNN and CNN-LSTM methods achieve accuracies of 62%, 65% and 74%, respectively.

Keywords: LSTM, sentiment analysis, CNN

Evaluation of Sentiment Analysis via Word Embedding and RNN Variants for Amazon Online Reviews

Mathematical Problems in Engineering ◽

10.1155/2021/5536560 ◽

2021 ◽

Vol 2021 ◽

pp. 1-10

Author(s):

Najla M. Alharbi ◽

Norah S. Alghamdi ◽

Eman H. Alkhammash ◽

Jehad F. Al Amri

Keyword(s):

Feature Extraction ◽

Deep Learning ◽

Sentiment Analysis ◽

Short Term Memory ◽

Online Reviews ◽

Word Embedding ◽

Learning Approaches ◽

Short Term ◽

Term Memory ◽

Long Short Term Memory

Consumer feedback is highly valuable to businesses for assessing their performance and is also beneficial to customers, as it gives them an idea of what to expect from new products. This research evaluates different deep learning approaches for accurately predicting customer opinion from mobile phone reviews obtained from Amazon.com. The prediction is based on analysing these reviews and categorizing them as positive, negative, or neutral. The deep learning algorithms implemented and evaluated are simple RNN and four of its variants, namely Long Short-Term Memory Networks (LRNN), Group Long Short-Term Memory Networks (GLRNN), gated recurrent unit (GRNN), and update recurrent unit (UGRNN). All evaluated algorithms are combined with word embedding as the feature extraction approach for sentiment analysis, including Glove, word2vec, and FastText by Skip-grams. The five algorithms with the three feature extraction methods are evaluated on accuracy, recall, precision, and F1-score for both balanced and unbalanced datasets. For the unbalanced dataset, the GLRNN algorithm with FastText feature extraction scored the highest accuracy, 93.75%, which is also the highest accuracy on this dataset when compared with other methods in the literature. For the balanced dataset, the highest accuracy was 88.39%, achieved by the LRNN algorithm.
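
The four evaluation measures listed in this abstract (accuracy, precision, recall and F1-score) can be sketched for a three-class task as follows. The function name `evaluate` and the toy labels are assumptions for illustration; they do not reproduce the paper's data or results.

```python
def evaluate(y_true, y_pred, labels):
    """Return overall accuracy and per-class precision, recall and F1."""
    pairs = list(zip(y_true, y_pred))
    report = {}
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[label] = {"precision": precision, "recall": recall, "f1": f1}
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    return accuracy, report

# Hypothetical review labels: an unbalanced set dominated by "pos".
y_true = ["pos", "pos", "pos", "neg", "neg", "neu"]
y_pred = ["pos", "pos", "neg", "neg", "neu", "neu"]

accuracy, report = evaluate(y_true, y_pred, ["pos", "neg", "neu"])
macro_f1 = sum(m["f1"] for m in report.values()) / len(report)
```

On unbalanced data, accuracy can look strong while a minority class is classified poorly, which is one reason to report per-class precision, recall and F1 alongside it.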

Development of a Model for Predicting the Direction of Daily Price Changes in the Forex Market Using Long Short-Term Memory

International Journal of Machine Learning and Computing ◽

10.18178/ijmlc.2021.11.1.1015 ◽

2021 ◽

Vol 11 (1) ◽

pp. 61-67

Author(s):

Watthana Pongsena ◽

Prakaidoy Sitsayabut ◽

Nittaya Kerdprasop ◽

Kittisak Kerdprasop ◽

...

Keyword(s):

Machine Learning ◽

Time Series ◽

Deep Learning ◽

Short Term Memory ◽

Series Data ◽

Short Term ◽

Price Changes ◽

Term Memory ◽

The Future ◽

Long Short Term Memory

Forex is the largest financial market in the world. Traditionally, Forex traders have relied on fundamental and technical analysis. Nowadays, advanced computational technology, in particular Artificial Intelligence (AI), plays a significant role in the financial domain, and applications based on AI technologies, particularly machine learning and deep learning, are constantly being developed. Historical Forex data are time-series data, in which past values affect the values that will appear in the future. Several existing works in other application domains have shown that Long Short-Term Memory (LSTM), a kind of deep learning model that can be applied to time series, performs better than traditional machine learning algorithms. In this paper, we aim to develop a predictive model that uses LSTM to predict the daily price changes of currency pairs in the Forex market. We also conduct extensive experiments to demonstrate the effect of various factors on the model's performance. The experimental results show that the optimized LSTM model predicts the direction of the future price with an accuracy of up to 61.25 percent.
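
The prediction target in this abstract, the direction of the daily price change, can be made concrete with a small sketch. It shows only how direction labels and directional accuracy might be derived from a series of closing prices; it does not implement the LSTM model. The function names, the up/not-up encoding (1 if the next close is higher, else 0) and the prices are assumptions.

```python
def direction_labels(prices):
    """Label each day 1 if the next close is higher, otherwise 0."""
    return [1 if nxt > cur else 0 for cur, nxt in zip(prices, prices[1:])]

def directional_accuracy(actual, predicted):
    """Share of days on which the predicted direction matched."""
    hits = sum(1 for a, p in zip(actual, predicted) if a == p)
    return hits / len(actual)

# Hypothetical daily closing prices for one currency pair.
closes = [1.10, 1.12, 1.11, 1.11, 1.15]
actual = direction_labels(closes)

# Hypothetical model output for the same four days.
predicted = [1, 1, 0, 1]
print(directional_accuracy(actual, predicted))  # 0.75
```

Because a two-way direction guess has a baseline near 50% on many series, a figure such as the reported 61.25 percent is best read against that baseline rather than against 100%.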

Long Short Term Memory (LSTM) based Deep Learning for Sentiment Analysis of English and Spanish Data

2020 International Conference on Computational Performance Evaluation (ComPE) ◽

10.1109/compe49325.2020.9200054 ◽

2020 ◽

Author(s):

Baidya Nath Saha ◽

Apurbalal Senapati

Keyword(s):

Deep Learning ◽

Sentiment Analysis ◽

Short Term Memory ◽

Short Term ◽

Term Memory ◽

Long Short Term Memory

Sentiment Analysis on Twitter Data by Using Convolutional Neural Network (CNN) and Long Short Term Memory (LSTM)

10.21203/rs.3.rs-247154/v1 ◽

2021 ◽

Author(s):

Usha Devi G ◽

Priyan M K ◽

Gokulnath Chandra Babu ◽

Gayathri Karthick

Keyword(s):

Neural Network ◽

Deep Learning ◽

Sentiment Analysis ◽

Short Term Memory ◽

Short Term ◽

Term Memory ◽

Twitter Data ◽

Learning Techniques ◽

Stop Word ◽

Long Short Term Memory

Twitter sentiment analysis is an automated process of analyzing text data to determine the opinion or feeling expressed in public tweets from various fields. In marketing and politics, for example, a huge number of tweets with hashtags are posted every moment. Sentiment analysis is a challenging task for researchers, mainly because context must be interpreted correctly: for certain tweet words it is difficult to judge what is truly a negative or a positive statement within a huge corpus of tweet data. This problem undermines the integrity of the system and can significantly reduce user reliability. In this paper, we identify each tweet word and assign a meaning to it. The features combine tweet words, word2vec and stop words and are integrated into two deep learning techniques, a Convolutional Neural Network model and Long Short-Term Memory; these algorithms can identify the pattern of stop word counts with their own strategy. The two models are trained and applied to the IMDB dataset, which contains 50,000 movie reviews, and a large amount of Twitter data is processed to classify tweets by sentiment. With the proposed methodology, samples collected from the real-time environment can be discriminated well and the efficacy of the system is improved. The deep learning algorithms are used to rate review tweets and can also identify movie reviews, with testing accuracies of 87.74% and 88.02%.