Real time Sentiment Analysis of Tweets using Apache Spark and Scala

V Mareeswari; Sunita S Patil; Ramanan G

doi:10.34293/acsjse.v1i2.9

ACS Journal for Science and Engineering, 2021, Vol 1 (2), pp. 9-15

Keyword(s): Sentiment Analysis, Real Time, Ad Hoc, Apache Spark, Data Streaming, Real Time Processing, Open Source Data, Textual Data, Bayes Algorithm, Processing Platform

Sentiment analysis is increasingly becoming a field of focus, because user experience weighs heavily both for business growth and for research. Sentiment expressions are a person's emotions or feelings about a particular topic or issue. In this project, tweets from Twitter are evaluated for sentiment with the Apache Spark framework, an open-source data streaming and processing platform. The evaluation is run both in real time and as an ad hoc run. The textual data are preprocessed to extract better features, which results in greater accuracy. The results are validated against other approaches when the Naive Bayes algorithm is used.
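The abstract does not include the authors' Spark/Scala code. The following is a minimal standard-library Python sketch of the general technique it names: preprocess tweet text, then classify it with multinomial Naive Bayes using add-one (Laplace) smoothing. It is not the paper's implementation. The preprocessing rules (lower-casing, dropping URLs and @mentions, keeping hashtag words) are assumptions, as are the class name `NaiveBayes` and the toy training tweets, which are invented for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical preprocessing rules; the paper's actual steps are not specified.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
TOKEN_RE = re.compile(r"[a-z']+")

def preprocess(tweet):
    """Lower-case, drop URLs and @mentions, strip '#', then tokenize."""
    text = tweet.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = text.replace("#", " ")
    return TOKEN_RE.findall(text)

class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.class_counts}
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.totals = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def predict(self, tokens):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, float("-inf")
        for c, count in self.class_counts.items():
            # log P(c) + sum of log P(token | c) over the tweet's tokens
            score = math.log(count / n_docs)
            for t in tokens:
                if t in self.vocab:  # words never seen in training are ignored
                    score += math.log((self.word_counts[c][t] + 1) / (self.totals[c] + v))
            if score > best_score:
                best_label, best_score = c, score
        return best_label

train = [
    ("I love this phone, great battery #happy", "positive"),
    ("What a great day with friends", "positive"),
    ("This update is terrible and slow @vendor", "negative"),
    ("I hate waiting, worst service ever http://x.co/1", "negative"),
]
model = NaiveBayes().fit([preprocess(t) for t, _ in train], [l for _, l in train])
print(model.predict(preprocess("great phone, love it")))         # positive
print(model.predict(preprocess("terrible service, I hate it")))  # negative
```

Log-probabilities are summed rather than multiplying raw probabilities, which avoids numeric underflow on longer texts. In a Spark pipeline the same statistics would be computed over distributed data rather than a Python list.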

An Embedded Real-Time Processing Platform for Optogenetic Neuroprosthetic Applications

IEEE Transactions on Neural Systems and Rehabilitation Engineering, 2018, Vol 26 (1), pp. 233-243
doi:10.1109/tnsre.2017.2763130
Cited by ~3

Author(s): Boyuan Yan, Sheila Nirenberg

Keyword(s): Real Time, Real Time Processing, Time Processing, Processing Platform

Real-time processing of IoT events with historic data using Apache Kafka and Apache Spark with dashing framework

2017 2nd IEEE International Conference on Recent Trends in Electronics, Information & Communication Technology (RTEICT), 2017
doi:10.1109/rteict.2017.8256910
Cited by ~7

Author(s): Godson Michael D'silva, Azharuddin Khan, Gaurav, Siddhesh Bari

Keyword(s): Real Time, Apache Spark, Real Time Processing, Time Processing, Historic Data

Visualization of Real-time Twitter Data based on Sentiment Classification

International Journal of Innovative Technology and Exploring Engineering - Special Issue, 2020, Vol 9 (4), pp. 1868-1872
doi:10.35940/ijitee.f4533.049620

Keyword(s): Social Media, Sentiment Analysis, Real Time, Social Networking Sites, Political Issues, The Public, Public Sentiment, Challenges And Opportunities, Product Personality, Bayes Algorithm

Analyzing information from social media sites could bring great challenges and opportunities for solving many real-time problems. Such data reveal public opinion about almost every product, personality, or service. Data from social networking sites are accurate and useful for analyzing public sentiment about trending topics. Analyzing the opinions, sentiments, and subjectivity expressed in such data is called sentiment analysis. Tweepy, an easy-to-use Python library, is used to extract source data from Twitter. Features are extracted from these tweets, which are then classified with the Naïve Bayes algorithm to identify their sentiment. The aim is an interactive, automatic system, written in Python, that predicts the sentiment of tweets posted on social media in real time. The applications of sentiment analysis are broad and highly useful in everyday life. Using hashtags, keywords, and similar cues, the system evaluates people's sentiment about trends, entertainment, political issues, and products. This can help improve marketing strategies.
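The abstract mentions using hashtags and keywords to gauge sentiment about trends. One plausible way to do this, sketched here as an assumption rather than the authors' method, is to tally already-classified tweets per hashtag. The `(text, label)` input format and the function names `sentiment_by_hashtag` and `positive_share` are hypothetical. The labels would come from any upstream classifier, such as Naive Bayes.

```python
import re
from collections import Counter, defaultdict

HASHTAG_RE = re.compile(r"#(\w+)")

def sentiment_by_hashtag(classified_tweets):
    """Count sentiment labels per lower-cased hashtag.

    classified_tweets: iterable of (tweet_text, sentiment_label) pairs,
    where the label was produced by some upstream classifier.
    """
    tally = defaultdict(Counter)
    for text, label in classified_tweets:
        # A tweet with several distinct hashtags counts once toward each of them.
        for tag in {t.lower() for t in HASHTAG_RE.findall(text)}:
            tally[tag][label] += 1
    return tally

def positive_share(counts):
    """Fraction of labelled tweets for one hashtag that are positive."""
    total = sum(counts.values())
    return counts["positive"] / total if total else 0.0

tweets = [
    ("Loving the new #Phone #Launch", "positive"),
    ("#phone battery died again", "negative"),
    ("Great keynote #launch", "positive"),
]
tally = sentiment_by_hashtag(tweets)
for tag in sorted(tally):
    print(tag, dict(tally[tag]), round(positive_share(tally[tag]), 2))
```

Lower-casing hashtags merges variants such as `#Phone` and `#phone` into one trend. Whether that is desirable depends on the analysis.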

CereBridge: An Efficient, FPGA-based Real-Time Processing Platform for True Mobile Brain-Computer Interfaces

2020 42nd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), 2020
doi:10.1109/embc44109.2020.9175623

Author(s): Marc-Nils Wahalla, Guillermo Paya Vaya, Holger Blume

Keyword(s): Real Time, Brain Computer Interfaces, Real Time Processing, Time Processing, Computer Interfaces, Processing Platform

Deterministic, elastic and real-time processing in the big data era

2018
doi:10.12681/eadd/44613

Author(s): Νικόλαος Ζαχείλας

Keyword(s): Big Data, Real Time, Apache Spark, Trade Off, Real Time Processing, Time Processing, Apache Storm

In recent years there has been a rapid increase in the amount of data that must be analyzed in real time by many kinds of applications, including traffic-congestion analysis, healthcare, and financial applications. To process such large volumes of data efficiently, novel distributed big data processing systems such as Apache Storm and Apache Spark have been proposed. These systems are scalable and provide low response times by distributing data processing across multiple computing resources that execute in parallel. Nevertheless, several important research challenges must be addressed before these systems can be used to their full potential. These challenges include, but are not limited to:
- providing deterministic data processing;
- determining which computing resources to use;
- handling skew in the data that must be processed in parallel;
- efficiently scheduling the multiple applications running on the system when those applications have real-time response requirements.
The aim of this doctoral thesis is to propose practical methods for addressing these problems.

The first part of the thesis proposes ways to improve the performance of big data processing systems by providing deterministic data processing while also meeting real-time response requirements. For data to be processed deterministically, mechanisms are needed that fix the order in which the computing resources process the data. On the other hand, meeting real-time constraints in such systems requires effectively managing the trade-off between deterministic processing and low response time. To address these challenges and the trade-off imposed by real-time constraints, the thesis proposes a set of systems and methodologies. These allow application users and system administrators to relax determinism constraints dynamically when necessary, so that the applications' response-time constraints are met.

The second part of the thesis focuses on scheduling applications with real-time requirements on large-scale data processing systems that use the MapReduce programming model. It also addresses the problem of building high-accuracy models that predict application execution time. These problems are difficult for several reasons:
- applications often run in heterogeneous environments;
- data skew causes the computing resources to process non-uniform amounts of data;
- the applications have real-time requirements;
- only a limited number of executions are available for building high-accuracy prediction models.
To address these problems, the thesis proposes a set of scheduling algorithms and a novel application-profiling system.

Finally, the third part of the thesis contributes to the problem of effective resource management in distributed data stream processing systems. It proposes a novel elasticity mechanism that allows the degree of parallelism of the system's processing resources to be determined in advance. It also examines the use of well-known algorithms for distributing the workload among computing resources, in order to further improve system throughput.

Throughout the thesis, the proposed methodologies have been evaluated on real data and scenarios. The experimental results show that the proposed algorithms consistently outperform existing approaches. They are also practical techniques that can be used in popular distributed data stream processing systems such as Apache Storm and Apache Spark.
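The first part of this thesis concerns mechanisms that fix the order in which data are processed. The sketch below is a simplified illustration of that idea only, not the thesis's mechanism. It merges several timestamp-sorted input partitions into one stream under a total order on (timestamp, source id, position in source), so the output is identical regardless of the order in which the partitions are supplied. The `(timestamp, payload)` tuple layout and the function name `deterministic_merge` are assumptions.

```python
import heapq

def deterministic_merge(partitions):
    """Merge per-source event streams into one deterministically ordered stream.

    partitions: dict mapping source_id -> list of (timestamp, payload), each list
    already sorted by timestamp. Ties on timestamp are broken by source id and
    then by position within the source, so payloads are never compared and the
    output never depends on arrival order or dict iteration order.
    """
    keyed = [
        [(ts, src, seq, payload) for seq, (ts, payload) in enumerate(events)]
        for src, events in sorted(partitions.items())
    ]
    for ts, src, seq, payload in heapq.merge(*keyed):
        yield ts, src, payload

a = {"s1": [(1, "x"), (3, "y")], "s2": [(1, "p"), (2, "q")]}
b = {"s2": [(1, "p"), (2, "q")], "s1": [(1, "x"), (3, "y")]}  # same data, other order
print(list(deterministic_merge(a)))
print(list(deterministic_merge(a)) == list(deterministic_merge(b)))  # True
```

A real system must also decide how long to wait for slow sources before emitting an event. That waiting is exactly where the trade-off between determinism and response time described above arises.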

A real-time big data sentiment analysis for Iraqi tweets using Spark Streaming

Bulletin of Electrical Engineering and Informatics, 2020, Vol 9 (4), pp. 1411-1419
doi:10.11591/eei.v9i4.1897

Author(s): Nashwan Dheyaa Zaki, Nada Yousif Hashim, Yasmin Makki Mohialden, Mostafa Abdulghafoor Mohammed, Tole Sutikno, ...

Keyword(s): Machine Learning, Big Data, Sentiment Analysis, Real Time, Data Science, Opinion Mining, Data Streaming, Machine Learning Applications, Learning Research

The scale of data streaming in social networks such as Twitter is increasing exponentially. Twitter is one of the most important and suitable big data sources for machine learning research on analysis, prediction, knowledge extraction, and opinion mining. People use the Twitter platform daily to express their opinions, which fundamentally influence their behavior. In recent years, the volume of Iraqi-dialect content has increased, especially on Twitter. Sentiment analysis of different dialects and opinion mining have become hot topics in data science research. In this paper, we develop a real-time analytic model for sentiment analysis and opinion mining of Iraqi tweets using Spark Streaming. We also create a dataset for researchers in this field. The Twitter handle of Bassam AlRawi serves as the case study. The new method is better suited to present-day machine learning applications and fast online prediction.
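Spark Streaming processes a live stream as a sequence of small batches, each collected over a fixed batch interval. The pure-Python sketch below imitates that micro-batch idea for timestamped tweets so the flow is easy to follow. It is not the Spark API and not the authors' pipeline. The function names are hypothetical, and `classify` is a keyword-matching stand-in for a trained model.

```python
from collections import Counter
from itertools import groupby

def micro_batches(events, interval):
    """Group (timestamp_seconds, text) events into fixed-length time windows.

    Events are assumed to arrive sorted by timestamp, as in a replayed log;
    late or out-of-order events are not handled in this sketch.
    Yields (window_start, [texts]) for each non-empty window.
    """
    for window, group in groupby(events, key=lambda e: int(e[0] // interval)):
        yield window * interval, [text for _, text in group]

def classify(text):
    # Hypothetical stand-in for a trained sentiment model.
    return "positive" if "good" in text.lower() else "negative"

stream = [(0.5, "good morning"), (1.2, "bad traffic"), (2.7, "good news"), (3.1, "Good vibes")]
for start, batch in micro_batches(stream, interval=2):
    # Each batch is classified as a unit, then summarized before the next one.
    print(start, Counter(classify(t) for t in batch))
```

Shorter intervals lower latency but produce smaller batches with more per-batch overhead. Choosing the batch interval is the main tuning decision in this style of processing.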

Distributed Supervised Sentiment Analysis of Tweets: Integrating Machine Learning and Streaming Analytics for Big Data Challenges in Communication and Audience Research

Empiria Revista de metodología de ciencias sociales, 2019, pp. 113
doi:10.5944/empiria.42.2019.23254
Cited by ~2

Author(s): Carlos Arcila Calderón, Félix Ortega Mohedano, Mateo Álvarez, Miguel Vicente Mariño

Keyword(s): Machine Learning, Big Data, Sentiment Analysis, Real Time, Large Scale, Apache Spark, Audience Research, Distributed Environment, Cross Sectional, Large Scale Analysis

The large-scale analysis of tweets in real time using supervised sentiment analysis presents a unique opportunity for communication and audience research. Combining machine learning and streaming analytics in a distributed environment might help scholars obtain valuable data from Twitter and immediately classify messages by context. This is possible with no restrictions of time or storage, and it enriches cross-sectional, longitudinal, and experimental designs with new inputs. Even though communication and audience researchers have begun to use computational methods, most remain unfamiliar with the distributed technologies needed to face big data challenges. This paper describes the implementation of parallelized machine learning methods in Apache Spark to predict sentiment in real-time tweets. It also explains how this process can be scaled up with academic or commercial distributed computing when personal computers cannot support the required computation and storage. We discuss the limitations of these methods and their implications for communication, audience, and media studies.
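One reason a supervised model such as Naive Bayes parallelizes well in a distributed setting like the one this paper describes is that its sufficient statistics are plain counts. Each partition can count documents and words per class independently, and the partial counts simply add up. The sketch below shows that map/reduce structure with the standard library only. The partitions are simulated as Python lists, whereas on a cluster each `count_partition` call would run on a different worker. The function names and toy data are hypothetical, and this is an illustration of the principle, not the paper's Spark code.

```python
from collections import Counter
from functools import reduce

def count_partition(partition):
    """Map step: per-class document counts and (class, word) counts for one partition."""
    doc_counts, word_counts = Counter(), Counter()
    for tokens, label in partition:
        doc_counts[label] += 1
        word_counts.update((label, t) for t in tokens)
    return doc_counts, word_counts

def merge_counts(a, b):
    """Reduce step: partial counts from two partitions simply add up."""
    return a[0] + b[0], a[1] + b[1]

partitions = [
    [(["great", "phone"], "pos"), (["slow", "update"], "neg")],
    [(["great", "day"], "pos")],
    [(["worst", "service"], "neg"), (["slow", "service"], "neg")],
]
doc_counts, word_counts = reduce(merge_counts, map(count_partition, partitions))
print(doc_counts)
print(word_counts[("pos", "great")], word_counts[("neg", "service")])
```

Because addition is associative and commutative, the merged totals match those from a single pass over all the data. Class priors then follow from `doc_counts`, and per-class word likelihoods follow from `word_counts`.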

Real-time Data Streaming using Apache Spark on Fully Configured Hadoop Cluster

Journal of Mechanics of Continua and Mathematical Sciences, 2018, Vol 13 (5)
doi:10.26782/jmcms.2018.12.00013

Author(s): Kashi Sai Prasad

Keyword(s): Real Time, Apache Spark, Data Streaming, Time Data, Real Time Data, Hadoop Cluster

Sentiment Analysis - An Assessment of Online Public Opinion: A Conceptual Review

Turkish Journal of Computer and Mathematics Education (TURCOMAT), 2021, Vol 12 (5), pp. 1881-1887
doi:10.17762/turcomat.v12i5.2266

Author(s): Dr. Garima Sainger

Keyword(s): Sentiment Analysis, Real Time, Large Data, Tool Support, Time Data, Data Collection And Analysis, Textual Data, Conceptual Paper, Textual Content, Learning Language

This conceptual paper discusses sentiment analysis as a research technique. It is a decision-support tool for collecting and analyzing textual data available on the internet, and it is also considered a data mining technique. It uses machine learning to evaluate textual content. As a research method, it is computational by nature: it identifies and categorizes opinions expressed in text. It can process large volumes of data without delays or obstacles, and it facilitates both data collection and analysis. It helps domain leaders collect real-time data on emotions, opinions, and attitudes without compromising validity, reliability, or generalizability. The paper also presents it as a way to divide quantitative and qualitative data through innovative real-time methods of collecting and analyzing data. Finally, the paper discusses the limitations researchers may encounter when applying sentiment analysis in their own research domains.

Design of High Reliable Spaceborne SAR Real-time Processing Platform

2021
doi:10.23919/ciss51089.2021.9652290

Author(s): Qingbo Liu, Fengjiao Wang, Huan Yu, Hongzhi Li, Jianhua Zhao

Keyword(s): Real Time, Real Time Processing, Time Processing, Processing Platform
