Emotion Classification on Twitter Data Using Word Embedding and Lexicon Based Approach

Emotion classification on youtube comments using word embedding

2017 International Conference on Advanced Informatics, Concepts, Theory, and Applications (ICAICTA), 2017
DOI: 10.1109/icaicta.2017.8090986
Cited By ~ 7
Author(s): Julio Savigny, Ayu Purwarianti
Keyword(s): Word Embedding, Emotion Classification

Emotion Classification of Twitter Data Using an Approach Based on Ranking

Research in Computing Science, 2018, Vol 147 (11), pp. 45-52
DOI: 10.13053/rcs-147-11-4
Author(s): Cecilia Reyes-Peña, David Pinto-Avendaño, Darnes Vilariño-Ayala
Keyword(s): Emotion Classification, Twitter Data

Real-Time Streaming Data Analysis Using a Three-Way Classification Method for Sentimental Analysis

Cognitive Analytics, 2020, pp. 1377-1390
DOI: 10.4018/978-1-7998-2460-2.ch069
Author(s): Srinidhi Hiriyannaiah, G.M. Siddesh, K.G. Srinivasa
Keyword(s): Social Media, Opinion Mining, Streaming Data, Classification Method, Emotion Classification, Knowledge Based, Twitter Data, Internet Users, Word Classification, Three Stages

This article describes how recent advances in computing have increased the volume of data generated in fields such as social media, the medical sector, and the power sector. With the rapid increase in internet users, social media has become an important source for sentiment analysis, or opinion mining. Storing, querying, and analyzing such data is highly challenging. This article aims to provide a solution for storing, querying, and analyzing streaming data, using Apache Kafka as the platform and Twitter data as the example for analysis. A three-way classification method is proposed for sentiment analysis of Twitter data; it combines knowledge-based and machine-learning approaches in three stages: emotion classification, word classification, and sentiment classification. The hybrid three-way classification approach was evaluated on a sample of five Twitter query strings and compared with an existing emotion classifier, a polarity classifier, and a Naïve Bayes classifier for sentiment analysis. The proposed approach achieved higher accuracy than the existing approaches.
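
The abstract names three stages but does not say how they interact. The sketch below assumes one plausible arrangement, a cascade in which each stage only decides when the previous one abstains. The emoticon table, the word list, and the `fallback` callable (standing in for a trained classifier such as Naïve Bayes) are hypothetical placeholders, not the chapter's actual resources, and the Kafka ingestion step is left out.

```python
from typing import Callable, List, Optional

# Hypothetical toy resources; the chapter's real lexicons are not given in the abstract.
EMOTICONS = {":)": "positive", ":D": "positive", ":(": "negative", ":'(": "negative"}
WORD_POLARITY = {"good": 1, "great": 1, "love": 1, "bad": -1, "awful": -1, "hate": -1}


def emotion_stage(tokens: List[str]) -> Optional[str]:
    """Stage 1 (assumed): majority vote over emoticons; None means 'abstain'."""
    votes = [EMOTICONS[t] for t in tokens if t in EMOTICONS]
    pos, neg = votes.count("positive"), votes.count("negative")
    if pos == neg:
        return None
    return "positive" if pos > neg else "negative"


def word_stage(tokens: List[str]) -> Optional[str]:
    """Stage 2 (assumed): sum lexicon polarities of the words; None on a zero score."""
    score = sum(WORD_POLARITY.get(t.lower(), 0) for t in tokens)
    if score == 0:
        return None
    return "positive" if score > 0 else "negative"


def classify(tweet: str, fallback: Callable[[str], str]) -> str:
    """Stage 3 (assumed): defer to a learned classifier when both lexicon stages abstain."""
    tokens = tweet.split()  # naive whitespace tokenization; punctuation stays attached
    return emotion_stage(tokens) or word_stage(tokens) or fallback(tweet)
```

In this arrangement the knowledge-based stages handle tweets with explicit cues, and the machine-learning stage is only consulted for the remainder; other orderings or a weighted vote would be equally consistent with the abstract.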

Sentiment-Aware Word Embedding for Emotion Classification

Applied Sciences, 2019, Vol 9 (7), pp. 1334
DOI: 10.3390/app9071334
Cited By ~ 4
Author(s): Xingliang Mao, Shuai Chang, Jinjing Shi, Fangfang Li, Ronghua Shi
Keyword(s): Language Processing, Word Embedding, Emotional Word, Word Embeddings, Emotion Classification, Emotional Information, Input Text, Classification Tasks, Emotional Knowledge, Emotional Lexicon

Word embeddings are effective intermediate representations for capturing semantic regularities between words in natural language processing (NLP) tasks. We propose a sentiment-aware word embedding for emotion classification, which integrates sentiment evidence into the emotional embedding component of a term vector. We draw on multiple types of emotional knowledge, such as existing emotional lexicons, to build emotional word vectors that represent emotional information. The emotional word vector is then combined with the traditional word embedding to construct a hybrid representation that contains both semantic and emotional information and serves as the input to the emotion classification experiments. Our method maintains the interpretability of word embeddings and leverages external emotional information in addition to the input text sequences. Extensive results on several machine learning models show that the proposed methods can improve the accuracy of emotion classification tasks.
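
As a rough illustration of the hybrid representation described above, the sketch below appends a lexicon-derived emotional vector to a semantic embedding. Concatenation is an assumption made here for simplicity; the abstract only says the two vectors are "combined". The embedding values, the emotion categories, and the lexicon entries are invented toy data.

```python
from typing import Dict, List

# Toy semantic embeddings (3 dimensions); real ones would come from a trained model.
SEMANTIC: Dict[str, List[float]] = {
    "happy": [0.2, 0.7, 0.1],
    "storm": [0.9, 0.1, 0.4],
}
SEMANTIC_DIM = 3

# Hypothetical emotion categories and lexicon intensities.
EMOTIONS = ("joy", "anger", "fear", "sadness")
EMOTION_LEXICON: Dict[str, Dict[str, float]] = {
    "happy": {"joy": 1.0},
    "storm": {"fear": 0.6, "sadness": 0.2},
}


def emotional_vector(word: str) -> List[float]:
    """One slot per emotion category; words missing from the lexicon get all zeros."""
    entry = EMOTION_LEXICON.get(word, {})
    return [entry.get(emotion, 0.0) for emotion in EMOTIONS]


def hybrid_vector(word: str) -> List[float]:
    """Semantic embedding followed by the emotional vector (assumed: concatenation)."""
    semantic = SEMANTIC.get(word, [0.0] * SEMANTIC_DIM)
    return semantic + emotional_vector(word)
```

Because the emotional slots keep a fixed meaning (one per category), the last four components of each hybrid vector remain directly readable, which is one way the interpretability mentioned in the abstract could be preserved.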

Using Word Embedding to Evaluate the Coherence of Topics from Twitter Data

Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval - SIGIR '16, 2016
DOI: 10.1145/2911451.2914729
Cited By ~ 16
Author(s): Anjie Fang, Craig Macdonald, Iadh Ounis, Philip Habel
Keyword(s): Word Embedding, Twitter Data

Word Embedding Composition for Data Imbalances in Sentiment and Emotion Classification

Cognitive Computation, 2015, Vol 7 (2), pp. 226-240
DOI: 10.1007/s12559-015-9319-y
Cited By ~ 32
Author(s): Ruifeng Xu, Tao Chen, Yunqing Xia, Qin Lu, Bin Liu, ...
Keyword(s): Word Embedding, Emotion Classification

Effective Word Embedding for Twitter Data

The Journal of Korean Institute of Communications and Information Sciences, 2018, Vol 43 (11), pp. 1903-1910
DOI: 10.7840/kics.2018.43.11.1903
Author(s): Inhwan Kim, Beakcheol Jang
Keyword(s): Word Embedding, Twitter Data

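Eliminasi Non-Topic Menggunakan Pemodelan Topik untuk Peringkasan Otomatis Data Tweet dengan Konteks Covid-19 [Non-Topic Elimination Using Topic Modeling for Automatic Summarization of Tweet Data in the COVID-19 Context]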

Jurnal Teknologi Informasi dan Ilmu Komputer, 2021, Vol 8 (1), pp. 199
DOI: 10.25126/jtiik.0814324
Author(s): Putri Damayanti, Diana Purwitasari, Nanik Suciati
Keyword(s): Topic Modeling, Modeling Method, Text Summarization, Word Embedding, Test Results, Automatic Summarization, The Public, Twitter Data, Processing Data, Embedding Methods

Twitter accounts such as Suara Surabaya can help spread information about COVID-19, even though they also cover other topics such as accidents and traffic jams. Because of the large number of tweets available, text summarization can be applied to Twitter data to make it easier to obtain the latest important information about COVID-19. However, the wide variety of topics in the tweet text leads to poor summaries, so tweets unrelated to the context need to be eliminated before summarization. The contribution of this research is a topic modeling method used as one stage in the data elimination process. Topic modeling as a data elimination technique can be used in various cases, but this research focuses on COVID-19. The aim is to make it easier for the public to obtain current information in concise form. The steps are pre-processing, data elimination using topic modeling, and automatic summarization. The study compares combinations of several word embedding, topic modeling, and automatic summarization methods. The summary produced by each combination is evaluated with ROUGE to find the best combination. The test results show that the combination of Word2Vec, LSI, and TextRank has the best ROUGE value, 0.67, while the combination of TFIDF, LDA, and Okapi BM25 has the lowest, 0.35.
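
The abstract reports ROUGE values without stating which ROUGE variant was used. As a minimal sketch of the idea, the function below computes ROUGE-1 recall, assumed here purely for illustration: the share of reference-summary unigrams that also appear in the candidate summary, with repeated words clipped to their count in the candidate. Whitespace tokenization and lowercasing are simplifications.

```python
from collections import Counter


def rouge1_recall(candidate: str, reference: str) -> float:
    """Clipped unigram overlap divided by the number of unigrams in the reference."""
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())
    total = sum(ref_counts.values())
    if total == 0:
        return 0.0  # empty reference: define recall as zero
    overlap = sum(min(count, cand_counts[token]) for token, count in ref_counts.items())
    return overlap / total


# Example: 5 of the 7 reference words are covered by the candidate summary.
score = rouge1_recall(
    "covid cases rise in surabaya",
    "covid cases rise sharply in surabaya today",
)
```

A score near 1 means the candidate covers most of the reference wording; scores such as the 0.67 and 0.35 reported above would be read on the same 0 to 1 scale, whichever variant the authors used.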
