Empirical Evaluation of the BCOC Method on Multi-Domain Sentiment Analysis Data Sets

Topic features for machine learning-based sentiment analysis in Indonesian tweets

International Journal of Intelligent Computing and Cybernetics ◽

10.1108/ijicc-04-2018-0057 ◽

2019 ◽

Vol 12 (1) ◽

pp. 70-81 ◽

Cited By ~ 1

Author(s):

Hendri Murfi ◽

Furida Lusi Siagian ◽

Yudi Satria

Keyword(s):

Machine Learning ◽

Sentiment Analysis ◽

Language Processing ◽

Analysis Data ◽

Data Representation ◽

Support Vector ◽

Data Sets ◽

Content Type ◽

Textual Data ◽

Standard Word

Purpose: This paper analyzes topics as alternative features for sentiment analysis of Indonesian tweets. Design/methodology/approach: Sentiment analysis starts by extracting features, either words or topics, from the tweets. The authors use non-negative matrix factorization to extract the topics and apply a support vector machine to classify each tweet into its sentiment class. Findings: The authors measure accuracy on two-class and three-class sentiment analysis data sets, both of which concern sentiment toward candidates in the Indonesian presidential election. The experiments show that the standard word features give better accuracy than the topic features for two-class sentiment analysis. Moreover, the topic features can slightly improve the accuracy of the standard word features. The topic features can also improve the accuracy of the standard word features for three-class sentiment analysis. Originality/value: The standard textual data representation for machine learning-based sentiment analysis is the bag of words and its extensions, mainly created by natural language processing. This paper applies topics as novel features for machine learning-based sentiment analysis of Indonesian tweets.
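The topic-extraction step can be illustrated with a minimal sketch of non-negative matrix factorization using multiplicative updates. This is not the authors' implementation: the toy term-count matrix, the update rule, the iteration count, and the helper names (`nmf`, `matmul`, `frobenius_error`) are all assumptions made for illustration, and the support vector machine step is omitted.

```python
import random

def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    Bt = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def frobenius_error(V, W, H):
    """Squared Frobenius distance between V and the product W H."""
    WH = matmul(W, H)
    return sum((v - wh) ** 2 for rv, rwh in zip(V, WH) for v, wh in zip(rv, rwh))

def nmf(V, k, iters=200, seed=0, eps=1e-9):
    """Factor a non-negative (documents x terms) matrix V as W H.

    Each row of W holds the k topic features of one document;
    each row of H holds the term weights of one topic.
    """
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), element-wise
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(Wt, matmul(W, H))
        H = [[H[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)]
             for a in range(k)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(matmul(W, H), Ht)
        W = [[W[i][a] * num[i][a] / (den[i][a] + eps) for a in range(k)]
             for i in range(n)]
    return W, H

# Hypothetical term counts: 4 tweets x 4 vocabulary terms.
V = [[2, 1, 0, 0], [1, 2, 0, 0], [0, 0, 1, 2], [0, 0, 2, 1]]
W, H = nmf(V, k=2)
```

The rows of `W` could then be passed to any classifier in place of, or alongside, the word features.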


Learning emotional word embeddings for sentiment analysis

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-201993 ◽

2021 ◽

pp. 1-13

Author(s):

Qingtian Zeng ◽

Xishi Zhao ◽

Xiaohui Hu ◽

Hua Duan ◽

Zhongying Zhao ◽

...

Keyword(s):

Sentiment Analysis ◽

Language Processing ◽

State Of The Art ◽

Research Problem ◽

Emotional Word ◽

Classification Model ◽

Data Sets ◽

Word Embeddings ◽

Real World Data ◽

Text Documents

Word embeddings have been successfully applied in many natural language processing tasks due to their effectiveness. However, the state-of-the-art algorithms for learning word representations from large amounts of text documents ignore emotional information, which is a significant research problem that must be addressed. To solve this problem, we propose an emotional word embedding (EWE) model for sentiment analysis. The method first applies pre-trained word vectors to represent document features using two different linear weighting methods. The resulting document vectors are then input to a classification model and used to train a text sentiment classifier based on a neural network. In this way, the emotional polarity of the text is propagated into the word vectors. The experimental results on three kinds of real-world data sets demonstrate that the proposed EWE model achieves superior performance on text sentiment prediction, text similarity calculation, and word emotional expression tasks compared to other state-of-the-art models.
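The abstract does not say which two linear weighting methods are used. As a hedged illustration of the general idea, the sketch below builds a document vector as a weighted average of word vectors, once with uniform weights and once with smoothed inverse-document-frequency weights. Both weighting choices, the toy vectors, and the function names are assumptions, not the EWE model itself.

```python
import math
from collections import Counter

def doc_vector(tokens, vectors, weights=None):
    """Weighted average of the word vectors of `tokens`.

    `vectors` maps a word to its pre-trained vector; words without a
    vector are skipped. With `weights=None` every word counts equally.
    """
    dim = len(next(iter(vectors.values())))
    total = [0.0] * dim
    norm = 0.0
    for tok in tokens:
        if tok not in vectors:
            continue
        w = 1.0 if weights is None else weights.get(tok, 0.0)
        for i, x in enumerate(vectors[tok]):
            total[i] += w * x
        norm += w
    return [x / norm for x in total] if norm > 0 else total

def idf_weights(corpus):
    """Smoothed inverse document frequency for each word in a tokenized corpus."""
    n = len(corpus)
    df = Counter(tok for doc in corpus for tok in set(doc))
    return {tok: math.log((1 + n) / (1 + c)) + 1.0 for tok, c in df.items()}

# Hypothetical two-dimensional "pre-trained" vectors.
vectors = {"good": [1.0, 0.0], "bad": [0.0, 1.0]}
uniform = doc_vector(["good", "bad"], vectors)
weighted = doc_vector(["good", "bad"], vectors, {"good": 3.0, "bad": 1.0})
```

Either document vector could then be fed to a classifier; how sentiment labels are propagated back into the word vectors is specific to the paper and is not shown here.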


How important is syntactic parsing accuracy? An empirical evaluation on rule-based sentiment analysis

Artificial Intelligence Review ◽

10.1007/s10462-017-9584-0 ◽

2017 ◽

Vol 52 (3) ◽

pp. 2081-2097 ◽

Cited By ~ 4

Author(s):

Carlos Gómez-Rodríguez ◽

Iago Alonso-Alonso ◽

David Vilares

Keyword(s):

Sentiment Analysis ◽

Empirical Evaluation ◽

Syntactic Parsing ◽

Rule Based


A data-driven neural network architecture for sentiment analysis

Data Technologies and Applications ◽

10.1108/dta-03-2018-0017 ◽

2019 ◽

Vol 53 (1) ◽

pp. 2-19 ◽

Cited By ~ 1

Author(s):

Erion Çano ◽

Maurizio Morisio

Keyword(s):

Neural Network ◽

Sentiment Analysis ◽

Network Architecture ◽

Network Models ◽

Data Sets ◽

Feature Maps ◽

Neural Network Architecture ◽

Neural Network Models ◽

Content Type ◽

Max Pooling

Purpose: The strong results of convolutional neural networks in image-related tasks have attracted the attention of researchers in text mining, sentiment analysis, and other text analysis fields. It is, however, difficult to find enough data to feed such networks, optimize their parameters, and make the right design choices when constructing network architectures. This paper presents the steps taken to create two big data sets of song emotions. The authors also explore the use of convolution and max-pooling neural layers on song lyrics, product review, and movie review text data sets. Three variants of a simple and flexible neural network architecture are also compared. Design/methodology/approach: The intention was to spot any important patterns that can serve as guidelines for parameter optimization of similar models. The authors also wanted to identify architecture design choices that lead to high-performing sentiment analysis models. To this end, the authors conducted a series of experiments with neural architectures of various configurations. Findings: The results indicate that parallel convolutions of filter lengths up to 3 are usually enough to capture relevant text features. Also, the max-pooling region size should be adapted to the length of the text documents to produce the best feature maps. Originality/value: The top results the authors obtained came from feature maps of lengths 6–18. A possible improvement for future neural network models for sentiment analysis is to predict the sentiment polarity of a document by aggregating predictions on smaller excerpts of the entire text.
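To make the terms concrete, here is a minimal sketch of parallel one-dimensional convolutions over a token sequence followed by regional max-pooling. It is not the authors' architecture: the ReLU activation, the fixed filter weights, and the function names are assumptions chosen to keep the example small.

```python
def conv1d(seq, kernel):
    """Valid 1-D convolution of a token sequence with a single filter.

    seq    : list of d-dimensional token vectors
    kernel : list of h d-dimensional weight vectors (filter length h)
    Returns a feature map of length len(seq) - h + 1, after ReLU.
    """
    h = len(kernel)
    out = []
    for start in range(len(seq) - h + 1):
        s = 0.0
        for k in range(h):
            s += sum(a * b for a, b in zip(seq[start + k], kernel[k]))
        out.append(max(0.0, s))  # ReLU activation (an assumption)
    return out

def max_pool(feature_map, region):
    """Keep the maximum of each consecutive block of `region` values."""
    return [max(feature_map[i:i + region])
            for i in range(0, len(feature_map), region)]

def parallel_conv_features(seq, kernels, region):
    """Apply filters of different lengths in parallel and concatenate
    their pooled feature maps."""
    feats = []
    for kernel in kernels:
        feats.extend(max_pool(conv1d(seq, kernel), region))
    return feats

# Hypothetical 4-token document with 1-dimensional embeddings.
seq = [[1.0], [2.0], [3.0], [4.0]]
kernels = [[[1.0]], [[1.0], [1.0]]]  # filter lengths 1 and 2
features = parallel_conv_features(seq, kernels, region=2)
```

Raising `region` for longer documents shortens the pooled feature map, which is the knob the findings refer to when they say the pooling region size should follow document length.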


Diurnal Variations in Lower-Tropospheric Wind over Japan Part II: Analysis of Japan Meteorological Agency Mesoscale Analysis Data and Four Global Reanalysis Data Sets

Journal of the Meteorological Society of Japan Ser II ◽

10.2151/jmsj.2010-306 ◽

2010 ◽

Vol 88 (3) ◽

pp. 349-372 ◽

Cited By ~ 3

Author(s):

Takatoshi SAKAZAKI ◽

Masatomo FUJIWARA

Keyword(s):

Analysis Data ◽

Reanalysis Data ◽

Diurnal Variations ◽

Japan Meteorological Agency ◽

Data Sets ◽

Mesoscale Analysis ◽

Global Reanalysis ◽

Tropospheric Wind


ASPECT BASED SENTIMENT ANALYSIS DATA KUESIONER DI RUMAH SAKIT MUHAMMADIYAH LAMONGAN MENGGUNAKAN ALGORITMA K-NN.

JOUTICA ◽

10.30736/jti.v6i2.677 ◽

2021 ◽

Vol 6 (2) ◽

pp. 506

Author(s):

Mustain Mustain Mustain

Keyword(s):

Vector Space ◽

Sentiment Analysis ◽

Nearest Neighbor ◽

Vector Space Model ◽

Analysis Data ◽

Cosine Similarity ◽

K Nearest Neighbor ◽

Space Model

The difficulty of organizing conventional, paper-based questionnaire data motivated this study. A system was therefore built to group questionnaire data automatically, together with the sentiment it contains. The dataset used in this study is questionnaire data from Muhammadiyah Lamongan Hospital. The study handles only questionnaires in text form. Paper-based data were recapitulated and then entered into a database, complete with work-unit category and sentiment. The dataset was then preprocessed, including negation handling, case folding, tokenizing, filtering, and stemming. As test data, comments from the questionnaires were preprocessed, and their document similarity was then computed using the K-Nearest Neighbor method and the Vector Space Model. The amount of data handled affects system performance, particularly accuracy and speed during classification. The system outputs a ranking of the documents most similar to the dataset, ordered by cosine similarity. Classification tests by category class yielded an accuracy of 91%. Tests by sentiment class yielded 94%. For the combination of the two, the system achieved an accuracy of 86%.
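The ranking-and-voting step can be sketched as follows: represent each comment as a term-frequency vector, rank labelled comments by cosine similarity to the query, and take a majority vote among the top k. This is a simplified illustration under assumed choices (raw term counts rather than any particular weighting, English toy comments, hypothetical function names), and it skips the preprocessing steps the abstract lists.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def knn_classify(query_tokens, labelled_docs, k=3):
    """Return (predicted label, ranking) for a tokenized query.

    labelled_docs : list of (tokens, label) pairs
    ranking       : (similarity, label) pairs, most similar first
    """
    q = Counter(query_tokens)
    ranking = sorted(
        ((cosine(q, Counter(tokens)), label) for tokens, label in labelled_docs),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = Counter(label for _, label in ranking[:k])
    return votes.most_common(1)[0][0], ranking

# Hypothetical, already-preprocessed questionnaire comments.
docs = [
    (["service", "good", "friendly"], "positive"),
    (["nurse", "friendly", "good"], "positive"),
    (["waiting", "slow", "bad"], "negative"),
    (["room", "dirty", "bad"], "negative"),
]
label, ranking = knn_classify(["doctor", "friendly", "good"], docs, k=3)
```

The same function can be run once with work-unit categories as labels and once with sentiment labels, which mirrors the two classification tests reported above.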


SKT

Proceedings of the VLDB Endowment ◽

10.14778/3476249.3476287 ◽

2021 ◽

Vol 14 (11) ◽

pp. 2369-2382

Author(s):

Monica Chiosa ◽

Thomas B. Preußer ◽

Gustavo Alonso

Keyword(s):

Frequency Distribution ◽

Empirical Evaluation ◽

Large Data ◽

Cloud Service ◽

Data Sets ◽

Data Set ◽

Single Pass ◽

Trade Offs ◽

Significant Performance ◽

Spatial Architecture

Data analysts often need to characterize a data stream as a first step to its further processing. Some of the initial insights to be gained include, e.g., the cardinality of the data set and its frequency distribution. Such information is typically extracted by using sketch algorithms, now widely employed to process very large data sets in manageable space and in a single pass over the data. Often, analysts need more than one parameter to characterize the stream. However, computing multiple sketches becomes expensive even when using high-end CPUs. Exploiting the increasing adoption of hardware accelerators, this paper proposes SKT, an FPGA-based accelerator that can compute several sketches along with basic statistics (average, max, min, etc.) in a single pass over the data. SKT has been designed to characterize a data set by calculating its cardinality, its second frequency moment, and its frequency distribution. The design processes data streams coming either from PCIe or TCP/IP, and it is built to fit emerging cloud service architectures, such as Microsoft's Catapult or Amazon's AQUA. The paper explores the trade-offs of designing sketch algorithms on a spatial architecture and how to combine several sketch algorithms into a single design. The empirical evaluation shows how SKT on an FPGA offers a significant performance gain over high-end, server-class CPUs.
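As a software-only illustration of the single-pass idea, the sketch below updates a Count-Min sketch (an approximate frequency table) and running basic statistics in the same loop over a stream. It makes no claim about SKT's hardware design or the specific sketches it implements: the Count-Min choice, the table sizes, the hashing scheme, and all names are assumptions.

```python
import hashlib

class CountMinSketch:
    """Approximate frequency counter in fixed space.

    Estimates never undercount; collisions can only inflate them.
    """

    def __init__(self, width=256, depth=4):
        self.width, self.depth = width, depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item, row):
        # One salted hash per row gives `depth` different hash functions.
        digest = hashlib.blake2b(
            repr(item).encode(), digest_size=8, salt=row.to_bytes(8, "little")
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, item):
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += 1

    def estimate(self, item):
        return min(self.table[row][self._index(item, row)]
                   for row in range(self.depth))

def summarize(stream, width=256, depth=4):
    """Consume a numeric stream once, updating the sketch and basic stats together."""
    sketch = CountMinSketch(width, depth)
    count, total, lo, hi = 0, 0.0, None, None
    for x in stream:
        sketch.add(x)
        count += 1
        total += x
        lo = x if lo is None or x < lo else lo
        hi = x if hi is None or x > hi else hi
    mean = total / count if count else None
    return {"count": count, "min": lo, "max": hi, "mean": mean, "sketch": sketch}

# The iterator can only be traversed once, so everything is computed in one pass.
summary = summarize(iter([5, 1, 5, 3, 5, 1]))
```

On a CPU each extra sketch adds work to every element of the loop, which is the cost the abstract says grows quickly when several sketches are needed.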


An Efficient Framework for Vietnamese Sentiment Classification

Knowledge Innovation Through Intelligent Software Methodologies, Tools and Techniques - Frontiers in Artificial Intelligence and Applications ◽

10.3233/faia200579 ◽

2020 ◽

Author(s):

Cuong V. Nguyen ◽

Khiem H. Le ◽

Anh M. Tran ◽

Binh T. Nguyen

Keyword(s):

Product Quality ◽

Sentiment Analysis ◽

New Products ◽

Classification Problem ◽

Research Community ◽

Sentiment Classification ◽

Experimental Results ◽

Data Sets ◽

Online Retailers

With the booming development of e-commerce platforms in many countries, there is a massive amount of customer review data on different products and services. Understanding customers' feedback on both current and new products can give online retailers the possibility to improve product quality, meet customers' expectations, and increase the corresponding revenue. In this paper, we investigate the Vietnamese sentiment classification problem on two datasets containing Vietnamese customers' reviews. We propose eight different approaches, including Bi-LSTM, Bi-LSTM + Attention, Bi-GRU, Bi-GRU + Attention, Recurrent CNN, Residual CNN, Transformer, and PhoBERT, and conduct all experiments on two datasets: AIVIVN 2019 and our own dataset, collected from multiple Vietnamese e-commerce websites. The experimental results show that all our proposed methods outperform the winning solution of the competition "AIVIVN 2019 Sentiment Champion" by a significant margin. In particular, Recurrent CNN performs best among the algorithms in terms of both AUC (98.48%) and F1-score (93.42%) on the competition dataset and also surpasses the other techniques on our collected dataset. Finally, we aim to publish our code and these two datasets later to contribute to the research community in the field of sentiment analysis.


Anomaly Detection for Inferring Social Structure

Social Computing ◽

10.4018/978-1-60566-984-7.ch118 ◽

2010 ◽

pp. 1797-1803

Author(s):

Lisa Friedland

Keyword(s):

Data Analysis ◽

Anomaly Detection ◽

Social Structure ◽

Small Groups ◽

Analysis Data ◽

Data Sets ◽

Complex Data ◽

Detection Approach ◽

Complex Data Sets ◽

Data Points

In traditional data analysis, data points lie in a Cartesian space, and an analyst asks certain questions: (1) What distribution can I fit to the data? (2) Which points are outliers? (3) Are there distinct clusters or substructure? Today, data mining treats richer and richer types of data. Social networks encode information about people and their communities; relational data sets incorporate multiple types of entities and links; and temporal information describes the dynamics of these systems. With such semantically complex data sets, a greater variety of patterns can be described and views constructed of the data. This article describes a specific social structure that may be present in such data sources and presents a framework for detecting it. The goal is to identify tribes, or small groups of individuals that intentionally coordinate their behavior—individuals with enough in common that they are unlikely to be acting independently. While this task can only be conceived of in a domain of interacting entities, the solution techniques return to the traditional data analysis questions. In order to find hidden structure (3), we use an anomaly detection approach: develop a model to describe the data (1), then identify outliers (2).


Defying the Footprint Oracle: Implications of Country Resource Trends

Sustainability ◽

10.3390/su11072164 ◽

2019 ◽

Vol 11 (7) ◽

pp. 2164 ◽

Cited By ~ 13

Author(s):

Mathis Wackernagel ◽

David Lin ◽

Mikel Evans ◽

Laurel Hanscom ◽

Peter Raven

Keyword(s):

International Development ◽

Low Income ◽

Economic Performance ◽

Analysis Data ◽

Low Income Countries ◽

Data Sets ◽

Economic Policies ◽

Economic Implications ◽

Financial Assessment ◽

Resource Dynamics

Mainstream competitiveness and international development analyses pay little attention to the significance of a country's resource security for its economic performance. This paper challenges this neglect, examining the economic implications of countries' resource dynamics, particularly for low-income countries. It explores typologies of resource patterns in the context of those countries' economic prospects. To begin, the paper explains why it uses Ecological Footprint and biocapacity accounting for its analysis. Data used for the analysis stem from Global Footprint Network's 2018 edition of its National Footprint and Biocapacity Accounts. Ranging from 1961 to 2014, these accounts are computed from UN data sets. The accounts track, year by year, how much biologically productive space is occupied by people's consumption and compare this with how much productive space is available. Both demand and availability are expressed in productivity-adjusted hectares, called global hectares. Using this biophysical accounting perspective, the paper predicts countries' future socio-economic performance. This analysis is then contrasted with a financial assessment of those countries. The juxtaposition reveals a paradox: financial assessments seem to contradict assessments based on biophysical trends. The paper offers a way to reconcile this paradox, which also elevates the significance of biophysical country assessments for shaping successful economic policies.
