BigFeel—A Distributed Processing Environment for the Integration of Sentiment Analysis Methods

Abstract: Sentiment analysis has been the focus of many research efforts, justified in particular by its commercial significance for both consumers and businesses. Many methods have therefore been proposed, and the most prominent have been compared in terms of effectiveness. Nonetheless, the literature is deficient when it comes to assessing the efficiency of these methods for processing large volumes of data. In this study, we performed an experimental assessment of the efficiency of 22 methods whose implementations were available. We also proposed and assessed BigFeel, an environment built on the Apache Spark platform for the distributed processing of sentiment analysis methods. In this environment, existing methods designed to run in a non-distributed way can be adapted, without altering their source code, to run in a distributed manner. The experimental results reveal that (i) few methods are efficient in their native form, (ii) the methods improve their efficiency after being integrated into BigFeel, (iii) some methods that could not feasibly process a large dataset became viable when deployed in a computer cluster, and (iv) some methods can only handle small datasets, even in a distributed manner.
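
The idea of running an unmodified, non-distributed method over partitions of a dataset can be illustrated with a minimal sketch. It does not use Spark and is not BigFeel's code: a local thread pool stands in for the cluster, and `toy_polarity`, `run_partitioned`, and the partition count are hypothetical names chosen only for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def toy_polarity(texts: Sequence[str]) -> List[int]:
    """Hypothetical stand-in for an existing single-machine method.

    Returns 1 (positive), -1 (negative), or 0 (neutral) per text,
    based on a tiny hard-coded word list.
    """
    positive, negative = {"good", "great"}, {"bad", "awful"}
    scores = []
    for text in texts:
        words = text.lower().split()
        balance = sum(w in positive for w in words) - sum(w in negative for w in words)
        scores.append((balance > 0) - (balance < 0))  # sign of the balance
    return scores


def run_partitioned(
    method: Callable[[Sequence[str]], List[int]],
    texts: Sequence[str],
    num_partitions: int = 4,
) -> List[int]:
    """Split the input into contiguous partitions and apply `method` to each.

    The method itself is treated as a black box and is never modified;
    only the way its input is fed to it changes.
    """
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    size = max(1, -(-len(texts) // num_partitions))  # ceiling division
    partitions = [texts[i:i + size] for i in range(0, len(texts), size)]
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        results = list(pool.map(method, partitions))  # map keeps partition order
    return [score for part in results for score in part]
```

Because the partitions are contiguous and `map` preserves their order, the combined output lines up with the input texts. In a real cluster setting the same shape would be expressed with the platform's own per-partition operation rather than a thread pool.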


DISTRIBUTED PROCESSING OF LARGE VOLUMES OF TRANSACTIONAL DATA

Naukovyi visnyk Donetskoho natsionalnoho tekhnichnoho universytetu ◽

10.31474/2415-7902-2020-1(4)-2(5)-27-36 ◽

2020 ◽

pp. 27-36

Author(s):

O. Dmytriieva ◽

D. Nikulin

Keyword(s):

Distributed Processing ◽

Apache Spark ◽

Hadoop Mapreduce ◽

Transactional Data

This work addresses the distributed processing of transactions when analysing large volumes of data in order to mine association rules. Building on AIS and Apriori, two well-known data mining algorithms for finding frequent itemsets, possible parallelization variants were identified that avoid iterative scanning of the database and heavy memory consumption. The possibility of moving the computations to different platforms that support parallel data processing was investigated. The computing platforms chosen were MapReduce, a powerful framework for processing large, distributed datasets on a Hadoop cluster, and Apache Spark, a software tool for processing extremely large amounts of data. A comparative analysis of the performance of the considered methods was carried out, recommendations for the effective use of parallel computing platforms were obtained, and modifications of the association rule mining algorithms were proposed. The main tasks carried out in the work were: studying modern tools for distributed processing of structured and unstructured data; deploying a test cluster in a cloud service; developing scripts to automate cluster deployment; modifying distributed algorithms to adapt them to the required distributed computing frameworks; obtaining data processing performance figures in sequential and distributed modes using Hadoop MapReduce and Apache Spark; comparing the results of the performance test measurements; obtaining and substantiating the dependence between the amount of data processed and the time spent on processing; optimizing distributed association rule mining algorithms for processing large volumes of transactional data; and obtaining performance figures for distributed processing with existing software tools. Keywords: distributed processing, transactional data, association rules, computing cluster, Hadoop, MapReduce, Apache Spark
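
The support-counting step that Apriori repeats for each itemset size fits a map/reduce pattern, which a minimal single-process sketch can illustrate. This is not the authors' modified algorithm and uses neither Hadoop nor Spark. It also omits Apriori's candidate pruning and simply enumerates every k-item subset of each transaction. The function names and the sample data are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Set


def map_phase(transaction: Set[str], k: int) -> List[FrozenSet[str]]:
    """Emit every k-item subset of one transaction (the 'map' step)."""
    return [frozenset(c) for c in combinations(sorted(transaction), k)]


def reduce_phase(emitted: Iterable[FrozenSet[str]]) -> Counter:
    """Sum the occurrences of each emitted itemset (the 'reduce' step)."""
    return Counter(emitted)


def frequent_itemsets(
    transactions: Iterable[Set[str]], k: int, min_support: int
) -> Dict[FrozenSet[str], int]:
    """Return k-itemsets that occur in at least `min_support` transactions."""
    emitted = [itemset for t in transactions for itemset in map_phase(t, k)]
    counts = reduce_phase(emitted)
    return {itemset: n for itemset, n in counts.items() if n >= min_support}


# Hypothetical market-basket transactions.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "milk"},
]
pairs = frequent_itemsets(baskets, k=2, min_support=2)
```

Since each transaction is mapped independently, the map step can be spread across workers, and the reduce step only has to add up per-itemset counts.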


A Multilayer CARU Framework to Obtain Probability Distribution for Paragraph-Based Sentiment Analysis

Applied Sciences ◽

10.3390/app112311344 ◽

2021 ◽

Vol 11 (23) ◽

pp. 11344

Author(s):

Wei Ke ◽

Ka-Hou Chan

Keyword(s):

Probability Distribution ◽

Information Extraction ◽

Sentiment Analysis ◽

State Of The Art ◽

Final Analysis ◽

The State ◽

Experimental Results ◽

Content Adaptive

Paragraph-based datasets are hard to analyze with a simple RNN because long sequences suffer from long-term dependency problems. In this work, we propose a Multilayer Content-Adaptive Recurrent Unit (CARU) network for paragraph information extraction. In addition, we present a CNN-based model as an extractor to explore and capture useful features in the hidden state, which represent the content of the entire paragraph. In particular, we introduce Chebyshev pooling at the end of the CNN-based extractor in place of maximum pooling. This projects the features into a probability distribution so as to provide an interpretable evaluation for the final analysis. Experimental results demonstrate the superiority of the proposed approach over state-of-the-art models.


SentiProdBR: Building Domain-Specific Sentiment Lexicons for the Portuguese Language

10.5753/sbbd.2021.17897 ◽

2021 ◽

Author(s):

Tiago de Melo

Keyword(s):

Decision Making ◽

Sentiment Analysis ◽

Online Reviews ◽

Bayes Theorem ◽

Experimental Results ◽

Product Categories ◽

Domain Specific ◽

Alternative Approaches ◽

The Web

Online reviews are readily available on the Web and widely used for decision-making. However, only a few studies on Portuguese sentiment analysis have been reported, owing to the lack of resources such as domain-specific sentiment lexicons. In this paper, we present an effective methodology that uses probabilities derived from Bayes' theorem to build a set of lexicons, called SentiProdBR, for 10 different product categories in the Portuguese language. Experimental results indicate that our methodology significantly outperforms several alternative approaches to building domain-specific sentiment lexicons.
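
To make the role of Bayes' theorem concrete, the following sketch estimates P(positive | word) from labelled reviews, a quantity that could serve as a word's polarity score in a lexicon. This is an assumption about how such a score might be computed, not the SentiProdBR procedure. The function name, the whitespace tokenization, and the sample reviews are hypothetical.

```python
from typing import List, Tuple


def positive_given_word(word: str, reviews: List[Tuple[str, bool]]) -> float:
    """Estimate P(positive | word) with Bayes' theorem.

    `reviews` holds (text, is_positive) pairs. A review "contains" a word
    if the word is among its lowercased, whitespace-separated tokens.
    """
    pos = [set(text.lower().split()) for text, label in reviews if label]
    neg = [set(text.lower().split()) for text, label in reviews if not label]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative review")

    prior_pos = len(pos) / len(reviews)             # P(positive)
    prior_neg = len(neg) / len(reviews)             # P(negative)
    like_pos = sum(word in t for t in pos) / len(pos)   # P(word | positive)
    like_neg = sum(word in t for t in neg) / len(neg)   # P(word | negative)

    evidence = like_pos * prior_pos + like_neg * prior_neg  # P(word)
    if evidence == 0:
        raise ValueError(f"word {word!r} does not occur in any review")
    return like_pos * prior_pos / evidence


# Hypothetical labelled reviews for a single product category.
sample = [
    ("produto ótimo", True),
    ("ótimo preço", True),
    ("produto ruim", False),
    ("entrega ruim", False),
]
```

A word that appears equally often in both classes scores 0.5, while words seen only in positive or only in negative reviews score 1.0 or 0.0. Computing the score separately per product category is what would make such a lexicon domain-specific.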


ATLaS: Assistant Software for Life Scientists to Use in Calculations of Buffer Solutions

Tehnički glasnik ◽

10.31803/tg-20210326070107 ◽

2021 ◽

Vol 15 (4) ◽

pp. 541-545

Author(s):

Ugur Comlekcioglu ◽

Nazan Comlekcioglu

Keyword(s):

Programming Language ◽

Life Science ◽

Source Code ◽

Experimental Results ◽

Acid Base ◽

Buffer Solutions ◽

Python Programming Language ◽

Science Laboratories ◽

Executable File ◽

Python Programming

Many solutions, such as percentage, molar and buffer solutions, are used in experiments conducted in life science laboratories. Although preparing these solutions is not difficult, miscalculations made during intensive laboratory work negatively affect the experimental results. For the experiments to work correctly, the solutions must be prepared correctly. In this project, a software tool, ATLaS (Assistant Toolkit for Laboratory Solutions), has been developed to eliminate solution errors arising from calculations. ATLaS was developed in the Python programming language using the Tkinter and Pandas libraries. ATLaS contains five main modules: (1) Percent Solutions, (2) Molar Solutions, (3) Acid-Base Solutions, (4) Buffer Solutions and (5) Unit Converter. The main modules have sub-functions within themselves. With PyInstaller, the software was converted into a stand-alone executable file. The source code of ATLaS is available at https://github.com/cugur1978/ATLaS.
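
The kind of calculation such a tool automates can be sketched with two standard formulas: the mass of solute for a molar solution (mass = molarity × volume × molar mass) and the Henderson-Hasselbalch equation for buffer pH. The code below is an illustrative sketch only and is not taken from the ATLaS source; the function names are hypothetical.

```python
import math


def grams_for_molar_solution(
    molarity_mol_per_l: float, volume_l: float, molar_mass_g_per_mol: float
) -> float:
    """Mass of solute (g) needed for a solution of the given molarity and volume."""
    if min(molarity_mol_per_l, volume_l, molar_mass_g_per_mol) <= 0:
        raise ValueError("all inputs must be positive")
    return molarity_mol_per_l * volume_l * molar_mass_g_per_mol


def buffer_ph(pka: float, conc_base: float, conc_acid: float) -> float:
    """Henderson-Hasselbalch estimate: pH = pKa + log10([A-] / [HA])."""
    if conc_base <= 0 or conc_acid <= 0:
        raise ValueError("concentrations must be positive")
    return pka + math.log10(conc_base / conc_acid)
```

For example, `grams_for_molar_solution(1.0, 0.5, 58.44)` gives the mass of NaCl (molar mass 58.44 g/mol) for 0.5 L of a 1 M solution, and `buffer_ph(4.76, 0.1, 0.1)` returns the pKa itself because the conjugate base and acid concentrations are equal.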


TEXT SENTIMENT ANALYSIS BASED ON CNNS AND SVM

International Journal of Research -GRANTHAALAYAH ◽

10.29121/granthaalayah.v7.i6.2019.761 ◽

2019 ◽

Vol 7 (6) ◽

pp. 77-83 ◽

Cited By ~ 1

Author(s):

Dr. C. Arunabala ◽

P. Jwalitha ◽

Soniya Nuthalapati

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Sentiment Analysis ◽

Expressive Power ◽

Sentiment Classification ◽

Experimental Results ◽

Analysis Method ◽

Mapping Functions ◽

Generalization Ability ◽

Text Sentiment Analysis

The traditional text sentiment analysis method is mainly based on machine learning. However, its dependence on emotion dictionary construction and on manually designed and extracted features limits its generalization ability. In contrast, deep models have greater expressive power and can better learn complex mapping functions from data to affective semantics. In this paper, a text sentiment analysis model that combines Convolutional Neural Networks (CNNs) with an SVM is proposed. The experimental results show that the proposed method effectively improves the accuracy of text sentiment classification compared with a traditional CNN, confirming the effectiveness of sentiment analysis based on CNNs and SVM.


Experimental Assessment of the Performance of CECO Wave Energy Converter in Irregular Waves

Volume 10: Ocean Renewable Energy ◽

10.1115/omae2018-77686 ◽

2018 ◽

Cited By ~ 1

Author(s):

Claudio A. Rodríguez ◽

F. Taveira-Pinto ◽

P. Rosa-Santos

Keyword(s):

Wave Energy ◽

Transfer Functions ◽

Nonlinear Effects ◽

Model Tests ◽

Experimental Results ◽

Irregular Waves ◽

Scale Model ◽

Experimental Assessment ◽

Simplified Approach ◽

Wave Basin

A new concept of wave energy device (CECO) has been proposed and developed at the Hydraulics, Water Resources and Environment Division of the Faculty of Engineering of the University of Porto (FEUP). In the first stage, the proof of concept was performed through physical model tests at the wave basin (Rosa-Santos et al., 2015). These experimental results demonstrated the feasibility of the concept to harness wave energy and provided a preliminary assessment of its performance. Later, an extensive experimental campaign was conducted with an enhanced 1:20 scale model of CECO under regular and irregular long and short-crested waves (Marinheiro et al., 2015). An electric PTO system with adjustable damping levels was also installed on CECO as a means of quantifying the WEC power. The results of the regular wave tests have been used to validate a numerical model to gain insight into different potential configurations of CECO and its performance (López et al., 2017a,b). This paper presents the results and analyses of the model tests in irregular waves. A simplified approach based on spectral analyses of the WEC motions is presented as a means of experimentally assessing the damping level of the PTO mechanism and its effect on the WEC power absorption. Transfer functions are also computed to identify nonlinear effects associated with higher waves and to characterize the range of periods where wave absorption is maximized. Furthermore, based on a comparison of the present experimental results with those of a linear numerical potential model, viscous and other nonlinear effects on CECO performance are discussed.


An Efficient Framework for Vietnamese Sentiment Classification

Knowledge Innovation Through Intelligent Software Methodologies, Tools and Techniques - Frontiers in Artificial Intelligence and Applications ◽

10.3233/faia200579 ◽

2020 ◽

Author(s):

Cuong V. Nguyen ◽

Khiem H. Le ◽

Anh M. Tran ◽

Binh T. Nguyen

Keyword(s):

Product Quality ◽

Sentiment Analysis ◽

New Products ◽

Classification Problem ◽

Research Community ◽

Sentiment Classification ◽

Experimental Results ◽

Data Sets ◽

Online Retailers

With the booming development of e-commerce platforms in many countries, there is a massive amount of customer review data on different products and services. Understanding customers' feedback on both current and new products gives online retailers the possibility to improve product quality, meet customers' expectations, and increase the corresponding revenue. In this paper, we investigate the Vietnamese sentiment classification problem on two datasets containing Vietnamese customers' reviews. We propose eight different approaches, including Bi-LSTM, Bi-LSTM + Attention, Bi-GRU, Bi-GRU + Attention, Recurrent CNN, Residual CNN, Transformer, and PhoBERT, and conduct all experiments on two datasets: AIVIVN 2019 and our own dataset collected from multiple Vietnamese e-commerce websites. The experimental results show that all our proposed methods outperform the winning solution of the competition "AIVIVN 2019 Sentiment Champion" by a significant margin. In particular, Recurrent CNN has the best performance among the algorithms in terms of both AUC (98.48%) and F1-score (93.42%) on the competition dataset, and it also surpasses the other techniques on our collected dataset. Finally, we aim to publish our code and these two datasets later to contribute to the research community in the field of sentiment analysis.


Real time Sentiment Analysis of Tweets using Apache Spark and Scala

ACS Journal for Science and Engineering ◽

10.34293/acsjse.v1i2.9 ◽

2021 ◽

Vol 1 (2) ◽

pp. 9-15

Author(s):

V Mareeswari ◽

Sunita S Patil ◽

Ramanan G

Keyword(s):

Sentiment Analysis ◽

Real Time ◽

Ad Hoc ◽

Apache Spark ◽

Data Streaming ◽

Real Time Processing ◽

Open Source Data ◽

Textual Data ◽

Bayes Algorithm ◽

Processing Platform

Sentiment analysis is becoming an increasingly important field, since user experience matters greatly both for business growth and for research. Sentimental expressions refer to the emotions or feelings of a person about a particular topic or issue. In this project, with the help of the Apache Spark framework, an open-source data streaming and processing platform, sentiment evaluation is performed on tweets from Twitter through real-time processing as well as an ad hoc run. The textual data is preprocessed for better feature extraction, resulting in greater accuracy. The approach is validated by comparing it against other processes when the Naive Bayes algorithm is used.
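
For readers unfamiliar with the Naive Bayes classifier mentioned above, here is a minimal multinomial Naive Bayes sketch with add-one (Laplace) smoothing, written in plain Python. It is not the paper's Spark/Scala pipeline and does no streaming; the class name, the whitespace tokenization, and the sample tweets are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Iterable, Optional, Tuple


class NaiveBayesSentiment:
    """Multinomial Naive Bayes over whitespace tokens with add-one smoothing."""

    def __init__(self) -> None:
        self.word_counts = defaultdict(Counter)  # label -> token frequencies
        self.doc_counts = Counter()              # label -> number of documents
        self.vocabulary = set()

    def fit(self, samples: Iterable[Tuple[str, str]]) -> "NaiveBayesSentiment":
        """Accumulate counts from (text, label) pairs."""
        for text, label in samples:
            tokens = text.lower().split()
            self.doc_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocabulary.update(tokens)
        return self

    def predict(self, text: str) -> Optional[str]:
        """Return the label with the highest log-posterior, or None if untrained."""
        tokens = text.lower().split()
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label in sorted(self.doc_counts):
            score = math.log(self.doc_counts[label] / total_docs)  # log prior
            total_words = sum(self.word_counts[label].values())
            denom = total_words + len(self.vocabulary)
            for token in tokens:
                # Add-one smoothing keeps unseen tokens from zeroing the product.
                score += math.log((self.word_counts[label][token] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training tweets.
model = NaiveBayesSentiment().fit([
    ("love this phone", "pos"),
    ("great service love it", "pos"),
    ("hate the delay", "neg"),
    ("terrible service hate it", "neg"),
])
```

Working in log space avoids numerical underflow on longer texts. In a streaming setting the same counts would be trained offline and the `predict` step applied to each incoming tweet.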


A Virtual Laboratory for Digital Signal Processing

Virtual Technologies ◽

10.4018/978-1-59904-955-7.ch029 ◽

2008 ◽

pp. 474-487

Author(s):

Chyi-Ren Dow ◽

Yi-Hsung Li ◽

Jin-Yu Bai

Keyword(s):

Signal Processing ◽

Digital Signal Processing ◽

Mobile Agent ◽

Source Code ◽

Digital Signal ◽

Experimental Results ◽

Positive Feedbacks ◽

Network Bandwidth ◽

Network Capability ◽

Java Native Interface

This work designs and implements a virtual digital signal processing laboratory, VDSPL. VDSPL consists of four parts: mobile agent execution environments, mobile agents, DSP development software, and DSP experimental platforms. The network capability of VDSPL is created by using mobile agent and wrapper techniques without modifying the source code of the original programs. VDSPL provides human-human and human-computer interaction for students and teachers, and it can also lighten teachers' workload, improve students' learning outcomes, and make better use of network bandwidth. A prototype of VDSPL has been implemented using the IBM Aglet system and the Java Native Interface for DSP experimental platforms. Experimental results also show that the system has received much positive feedback from both students and teachers.


Smart City Services Monitoring Framework using Fuzzy Logic Based Sentiment Analysis and Apache Spark

2019 1st International Conference on Smart Systems and Data Science (ICSSD) ◽

10.1109/icssd47982.2019.9002687 ◽

2019 ◽

Author(s):

Bahra Mohamed ◽

Fennan Abdelhadi ◽

Bouktaib Adil ◽

Hmami Haytam

Keyword(s):

Fuzzy Logic ◽

Sentiment Analysis ◽

Smart City ◽

Apache Spark ◽

Monitoring Framework
