Topic Modeling and Sentiment Analysis of Electric Vehicles of Twitter Data

Twitter is a widely used social media platform where people share their thoughts and feelings about products and services. In this project, I collect user tweets related to electric vehicles through the Twitter API and analyze public perceptions of and feelings about electric vehicles. In the first step, I built a data pre-processing model based on natural language processing (NLP) methods to select tweets. In the second step, I use topic modeling, word clouds, and exploratory data analysis (EDA) to examine several aspects of electric vehicles. Topic modeling with latent Dirichlet allocation (LDA) is used to infer the topics discussed in relation to electric vehicles. I compared latent semantic analysis (LSA) with LDA and found that LDA provides better insight into topics, as well as better accuracy, than LSA. In the third step, the Valence Aware Dictionary and sEntiment Reasoner (VADER) is used to classify tweets related to electric vehicles as positive, negative, or neutral. In total, I collected 45,000 tweets through the Twitter API, together with related hashtags, user locations, and topics concerning electric vehicles. Tesla is the top hashtag used in tweets related to electric vehicles. Ekero, Sweden is the most common user location for these tweets. Tesla is also the most common word in the tweets, and "Elon Musk" is a common bigram. According to VADER, 47.1% of tweets are positive, 42.4% are neutral, and 10.5% are negative. Finally, I deploy this project as a fully functional web app.
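The sentiment shares reported above come from mapping each tweet's VADER compound score to a label and counting. The sketch below shows that step only, in plain Python. It assumes the conventional cutoffs of ±0.05 given in the VADER documentation (the abstract does not state which thresholds were used), and the scores in the example are hypothetical, not data from the study.

```python
from collections import Counter

# Conventional cutoffs from the VADER documentation. The abstract does not say
# which thresholds the author used, so these values are an assumption.
POS_THRESHOLD = 0.05
NEG_THRESHOLD = -0.05


def label_from_compound(compound: float) -> str:
    """Map a VADER-style compound score in [-1, 1] to a sentiment label."""
    if compound >= POS_THRESHOLD:
        return "positive"
    if compound <= NEG_THRESHOLD:
        return "negative"
    return "neutral"


def sentiment_shares(compounds):
    """Return the percentage of each label, rounded to one decimal place."""
    labels = ("positive", "neutral", "negative")
    total = len(compounds)
    if total == 0:
        return {label: 0.0 for label in labels}
    counts = Counter(label_from_compound(c) for c in compounds)
    return {label: round(100 * counts[label] / total, 1) for label in labels}


# Hypothetical compound scores for five tweets (not data from the study).
example_scores = [0.62, 0.0, -0.41, 0.05, 0.01]
shares = sentiment_shares(example_scores)
print(shares)
```

In practice the compound scores would come from a VADER implementation applied to the cleaned tweet text; only the labeling and aggregation are sketched here.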

Public Perception of the COVID-19 Pandemic on Twitter: Sentiment Analysis and Topic Modeling Study (Preprint)

10.2196/preprints.21978 ◽

2020 ◽

Author(s):

Sakun Boon-Itt ◽

Yukolpat Skunkan

Keyword(s):

Sentiment Analysis ◽

Language Processing ◽

Topic Modeling ◽

Public Perception ◽

English Language ◽

Latent Dirichlet Allocation ◽

Public Awareness ◽

Good Communication ◽

Three Stages ◽

Twitter Users

BACKGROUND: COVID-19 is a scientifically and medically novel disease that is not fully understood because it has yet to be consistently and deeply studied. Among the gaps in research on the COVID-19 outbreak, there is a lack of sufficient infoveillance data. OBJECTIVE: The aim of this study was to increase understanding of public awareness of COVID-19 pandemic trends and uncover meaningful themes of concern posted by Twitter users in the English language during the pandemic. METHODS: Data mining was conducted on Twitter to collect a total of 107,990 tweets related to COVID-19 between December 13 and March 9, 2020. The analyses included frequency of keywords, sentiment analysis, and topic modeling to identify and explore discussion topics over time. A natural language processing approach and the latent Dirichlet allocation algorithm were used to identify the most common tweet topics as well as to categorize clusters and identify themes based on the keyword analysis. RESULTS: The results indicate three main aspects of public awareness and concern regarding the COVID-19 pandemic. First, the trend of the spread and symptoms of COVID-19 can be divided into three stages. Second, the results of the sentiment analysis showed that people have a negative outlook toward COVID-19. Third, based on topic modeling, the themes relating to COVID-19 and the outbreak were divided into three categories: the COVID-19 pandemic emergency, how to control COVID-19, and reports on COVID-19. CONCLUSIONS: Sentiment analysis and topic modeling can produce useful information about the trends in the discussion of the COVID-19 pandemic on social media as well as alternative perspectives to investigate the COVID-19 crisis, which has created considerable public awareness. This study shows that Twitter is a good communication channel for understanding both public concern and public awareness about COVID-19. These findings can help health departments communicate information to alleviate specific public concerns about the disease.

Public Perception of the COVID-19 Pandemic on Twitter: Sentiment Analysis and Topic Modeling Study

JMIR Public Health and Surveillance ◽

10.2196/21978 ◽

2020 ◽

Vol 6 (4) ◽

pp. e21978

Author(s):

Sakun Boon-Itt ◽

Yukolpat Skunkan

Keyword(s):

Sentiment Analysis ◽

Language Processing ◽

Topic Modeling ◽

Public Perception ◽

English Language ◽

Latent Dirichlet Allocation ◽

Public Awareness ◽

Good Communication ◽

Three Stages ◽

Twitter Users

Background: COVID-19 is a scientifically and medically novel disease that is not fully understood because it has yet to be consistently and deeply studied. Among the gaps in research on the COVID-19 outbreak, there is a lack of sufficient infoveillance data. Objective: The aim of this study was to increase understanding of public awareness of COVID-19 pandemic trends and uncover meaningful themes of concern posted by Twitter users in the English language during the pandemic. Methods: Data mining was conducted on Twitter to collect a total of 107,990 tweets related to COVID-19 between December 13 and March 9, 2020. The analyses included frequency of keywords, sentiment analysis, and topic modeling to identify and explore discussion topics over time. A natural language processing approach and the latent Dirichlet allocation algorithm were used to identify the most common tweet topics as well as to categorize clusters and identify themes based on the keyword analysis. Results: The results indicate three main aspects of public awareness and concern regarding the COVID-19 pandemic. First, the trend of the spread and symptoms of COVID-19 can be divided into three stages. Second, the results of the sentiment analysis showed that people have a negative outlook toward COVID-19. Third, based on topic modeling, the themes relating to COVID-19 and the outbreak were divided into three categories: the COVID-19 pandemic emergency, how to control COVID-19, and reports on COVID-19. Conclusions: Sentiment analysis and topic modeling can produce useful information about the trends in the discussion of the COVID-19 pandemic on social media as well as alternative perspectives to investigate the COVID-19 crisis, which has created considerable public awareness. This study shows that Twitter is a good communication channel for understanding both public concern and public awareness about COVID-19. These findings can help health departments communicate information to alleviate specific public concerns about the disease.
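To make the latent Dirichlet allocation step more concrete, here is a minimal sketch of LDA fitted by collapsed Gibbs sampling in plain Python. It is illustrative only: the abstract does not say which implementation or inference method the authors used, and the function name `lda_gibbs`, the hyperparameter defaults, and the toy documents are all assumptions made for this example.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling (illustrative sketch).

    docs: list of token lists. Returns (vocab, theta, phi), where theta[d][k]
    is the share of topic k in document d and phi[k][v] is the probability of
    word vocab[v] under topic k.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]               # n(d, k)
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # n(k, v)
    topic_total = [0] * num_topics                              # n(k)
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k_old = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k_old] -= 1
                topic_word[k_old][v] -= 1
                topic_total[k_old] -= 1
                # Unnormalized conditional probability of each topic.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][v] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                k_new = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k_new
                doc_topic[d][k_new] += 1
                topic_word[k_new][v] += 1
                topic_total[k_new] += 1

    theta = [
        [(n + alpha) / (len(doc) + num_topics * alpha) for n in row]
        for row, doc in zip(doc_topic, docs)
    ]
    phi = [
        [(n + beta) / (topic_total[k] + vocab_size * beta) for n in topic_word[k]]
        for k in range(num_topics)
    ]
    return vocab, theta, phi


# Hypothetical, already tokenized tweets (not data from the study).
docs = [
    ["vaccine", "trial", "dose"],
    ["lockdown", "travel", "border"],
    ["vaccine", "dose", "trial", "dose"],
    ["travel", "border", "lockdown", "travel"],
]
vocab, theta, phi = lda_gibbs(docs, num_topics=2)
for k, row in enumerate(phi):
    top_words = [w for _, w in sorted(zip(row, vocab), reverse=True)[:3]]
    print(k, top_words)
```

Each row of `phi` is a topic expressed as a distribution over words, which is what an analyst reads and names; each row of `theta` shows how strongly a document draws on each topic.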

Topic Modeling for Keyword Extraction: using Natural Language Processing methods for keyword extraction in Portal Min@s

Revista de Estudos da Linguagem ◽

10.17851/2237-2083.23.3.695-726 ◽

2015 ◽

Vol 23 (3) ◽

pp. 695 ◽

Cited By ~ 1

Author(s):

Arnaldo Candido Junior ◽

Célia Magalhães ◽

Helena Caseli ◽

Régis Zangirolami

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Topic Modeling ◽

Latent Dirichlet Allocation ◽

Keyword Extraction ◽

Processing Methods ◽

Dirichlet Allocation

This article aims to evaluate the application of two efficient automatic keyword extraction methods, used by the Corpus Linguistics and Natural Language Processing communities to generate keywords from literary texts: WordSmith Tools and Latent Dirichlet Allocation (LDA). The two tools chosen for this work have their own specificities and different extraction techniques, which led us to an analysis oriented toward their performance. We therefore aim to understand how each method works and to evaluate its application to literary texts. To this end, we used human analysis, with knowledge of the domain of the texts used. The LDA method was used to extract keywords through its integration with Portal Min@s: Corpora de Fala e Escrita, a general corpus processing system designed for different kinds of Corpus Linguistics research. The results of the experiment confirm the effectiveness of WordSmith Tools and LDA in extracting keywords from a literary corpus, and also show that human analysis of the lists is needed at a stage prior to the experiments to complement the automatically generated list, cross-checking the results of WordSmith Tools and LDA. They also indicate that the human analyst's linguistic intuition about the lists generated separately by the two methods used in this study favored the WordSmith Tools keyword list.

Hidden Stories in Hydrologic Literature: An Interactive Topic-Based Ontology

10.5194/egusphere-egu2020-882 ◽

2020 ◽

Author(s):

Mashrekur Rahman ◽

Grey Nearing ◽

Jonathan Frame

Keyword(s):

Water Resources ◽

Computational Linguistics ◽

Language Processing ◽

Topic Modeling ◽

Science Communication ◽

Latent Dirichlet Allocation ◽

Relevant Literature ◽

Optimal Number ◽

Primary Objective ◽

Climate Change Research

Hydrologic research generates massive volumes of peer-reviewed literature across a plethora of evolving topics and sub-topics. It's becoming increasingly difficult for scientists and practitioners to synthesize and leverage the full body of scientific literature. Recent advances in computational linguistics and machine learning, including a variety of toolboxes for Natural Language Processing (NLP), help facilitate analysis of vast electronic corpora for a multitude of objectives. Research papers published as electronic text files in different journals offer windows into trending topics and developments, and NLP allows us to extract information and insight about these trends. This project applies Latent Dirichlet Allocation (LDA) Topic Modeling for bibliometric analyses of all peer-reviewed articles in selected high-impact (Impact Factor > 0.9) journals in hydrology (Water Resources Research, Hydrology and Earth System Sciences, Journal of Hydrology, Hydrological Processes, Advances in Water Resources, Hydrological Sciences Journal, Journal of Hydrometeorology). Topic modeling uses statistical algorithms to extract semantic information from a collection of texts and has become an emerging quantitative method to assess substantial textual data. After acquiring all the papers published in the aforementioned journals and applying multiple pre-processing routines, including removing punctuation, nonsensical text, and stopwords, and tokenizing, stemming, and lemmatization, the resultant corpus was fed to the LDA model for 'learning' latent intellectual topics. We achieved this using Gensim, an open-source Python library widely used for unsupervised semantic modeling with LDA. The optimal number of topics (k) and model hyperparameters were decided using coherence and perplexity values for multiple LDA models with varying k. The resulting generated topics are interpretable based on our prior knowledge of hydrology and related sub-disciplines. Comparative topic trend, term, and document level cluster analyses based on different time periods, journals and authors were performed. These analyses revealed topics such as climate change research gaining popularity in Hydrology over the last decade. We aim to use these results combined with probability distribution between topics, journals and authors to create an interactive ontology map that is useful for research scientists and environmental consultants for exploring relevant literature based on topics and topic relationships. The primary objective of this work is to allow science practitioners to explore new branches and connections in the Hydrology literature, and to facilitate comprehensive and inclusive literature reviews. Second-order beneficiaries are decision and policy makers: the proposed project will provide insights into current research trends and help identify transitions and argumentative viewpoints in hydrologic research. The outcomes of this project will also serve as tools to facilitate effective science communication and aid in bridging gaps between scientists and stakeholders of their research.
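Choosing the number of topics by coherence means scoring the top words of each candidate model and preferring the model whose topics score best. The sketch below implements one common measure, UMass coherence, in plain Python. The abstract does not say which coherence measure was used (Gensim offers several), so the choice of UMass is an assumption, and the documents and candidate topics are hypothetical stand-ins for fitted LDA models.

```python
import math


def umass_coherence(top_words, documents):
    """UMass coherence of one topic.

    top_words: the topic's words, ordered from most to least probable.
    documents: list of token lists.
    Sums log((D(w_m, w_l) + 1) / D(w_l)) over all pairs with l < m, where D
    counts the documents that contain the given word or words.
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        return sum(1 for d in doc_sets if all(w in d for w in words))

    score = 0.0
    for m in range(1, len(top_words)):
        for l in range(m):
            d_l = doc_freq(top_words[l])
            if d_l == 0:
                continue  # the word never occurs, so the pair is skipped
            d_ml = doc_freq(top_words[m], top_words[l])
            score += math.log((d_ml + 1) / d_l)
    return score


def mean_coherence(topics, documents):
    """Average coherence over all topics of one candidate model."""
    return sum(umass_coherence(t, documents) for t in topics) / len(topics)


# Hypothetical tokenized abstracts (not data from the study).
documents = [
    ["rain", "flood", "river"],
    ["rain", "river"],
    ["climate", "warming"],
    ["flood", "river", "rain"],
]

# Hypothetical top words per topic for two candidate values of k. In practice
# these would be read off LDA models fitted with each k.
candidates = {
    1: [["rain", "river", "climate"]],
    2: [["rain", "river", "flood"], ["climate", "warming"]],
}

best_k = max(candidates, key=lambda k: mean_coherence(candidates[k], documents))
print(best_k)
```

Higher (less negative) UMass values indicate that a topic's top words tend to occur in the same documents. The abstract also mentions perplexity, which is not covered by this sketch.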

Social Media Activism and Convergence in Tweet Topics After the Initial #MeToo Movement for Two Distinct Groups of Twitter Users

Journal of Interpersonal Violence ◽

10.1177/08862605211001481 ◽

2021 ◽

pp. 088626052110014

Author(s):

Jason M. Baik ◽

Thet H. Nyein ◽

Sepideh Modrek

Keyword(s):

Social Media ◽

Sexual Assault ◽

Frequency Analysis ◽

Topic Modeling ◽

Latent Dirichlet Allocation ◽

Bigram Frequency ◽

Media Activism ◽

Online Social Media ◽

Before And After ◽

Twitter Users

Online social media movements are now common and support cultural discussions on difficult health and social topics. The #MeToo movement, focusing on the pervasiveness of sexual assault and harassment, has been one of the largest and most influential online movements. Our study examines topics of conversation on Twitter by supporters of the #MeToo movement and by Twitter users who were uninvolved in the movement to explore the extent to which tweet topics for these two groups converge over time. We identify and collect one year's worth of tweets for supporters of the #MeToo movement (N = 168 users; N = 105,538 tweets) and users not involved in the movement (N = 147 users; N = 112,301 tweets, referred to as the Neutral Sample). We conduct topic frequency analysis and implement an unsupervised machine learning topic modeling algorithm, latent Dirichlet allocation, to explore topics of discussion on Twitter for these two groups of users before and after the initial #MeToo movement. Our results suggest that, before #MeToo, supporters of #MeToo discussed different topics from the Neutral Sample of Twitter users, with some overlap on politics. The supporters were already discussing sexual assault and harassment issues six months before #MeToo, and discussion on this topic increased 13.7-fold in the six months after. For the Neutral Sample, sexual assault and harassment was not a key topic of discussion on Twitter before #MeToo, but there was some limited increase afterward. Results of bigram frequency analysis and topic modeling showed a clear increase in topics related to gender for the supporters of #MeToo but gave mixed results for the Neutral Sample comparison group. Our results suggest limited shifts in the conversation on Twitter for the Neutral Sample. Our methods and results have implications for measuring the extent to which online social media movements, like #MeToo, reach a broad audience.
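Bigram frequency analysis of the kind mentioned above can be sketched with the standard library: tokenize each tweet, count adjacent word pairs, and compare the counts between two periods. The tokenizer pattern and the example tweets below are hypothetical, and the authors' actual preprocessing is not described in the abstract.

```python
import re
from collections import Counter

# Keeps hashtags and mentions attached to their word. This pattern is an
# assumption for illustration, not the study's tokenizer.
TOKEN_PATTERN = re.compile(r"[#@]?\w+")


def tokenize(text):
    """Lowercase a tweet and split it into word-like tokens."""
    return TOKEN_PATTERN.findall(text.lower())


def bigram_counts(tweets):
    """Count adjacent token pairs within each tweet (never across tweets)."""
    counts = Counter()
    for tweet in tweets:
        tokens = tokenize(tweet)
        counts.update(zip(tokens, tokens[1:]))
    return counts


# Hypothetical tweets for two periods (not data from the study).
before = ["New phone review today", "Phone review coming soon"]
after = [
    "Gender equality matters",
    "Talk about gender equality",
    "Gender equality now",
]

before_counts = bigram_counts(before)
after_counts = bigram_counts(after)
print(before_counts.most_common(1))
print(after_counts.most_common(1))
```

Comparing a bigram's count, or its share of all bigrams, before and after an event gives the kind of fold-change figure that such studies report.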

Mental Health Concerns Related to the COVID-19 Pandemic on Twitter in the UK

10.1101/2021.09.27.21264177 ◽

2021 ◽

Author(s):

Daiwei Zhang ◽

Yue Liu ◽

Senqi Zhang ◽

Li Sun ◽

Pin Li ◽

...

Keyword(s):

Mental Health ◽

Urban Areas ◽

Topic Modeling ◽

Latent Dirichlet Allocation ◽

Allocation Model ◽

Health Concerns ◽

Health Related ◽

Mental Health Concerns ◽

Twitter Users ◽

The UK

Background: Amid the COVID-19 pandemic, mental health-related symptoms (such as depression and anxiety) have been actively mentioned on social media. Objective: In this study, we aimed to monitor mental health concerns on Twitter during the COVID-19 pandemic in the United Kingdom (UK), and assess the potential impact of the COVID-19 pandemic on mental health concerns of Twitter users. Methods: We collected COVID-19 and mental health-related tweets from the UK between March 5, 2020 and January 31, 2021 through the Twitter Streaming API. We conducted topic modeling using the Latent Dirichlet Allocation model to examine discussions about mental health concerns. Deep learning algorithms including Face++ were used to infer the demographic characteristics (age and gender) of Twitter users who expressed mental health concerns related to the COVID-19 pandemic. Results: We showed a positive correlation between COVID-19-related mental health concerns on Twitter and the severity of the COVID-19 pandemic in the UK. Geographic analysis showed that populated urban areas have a higher proportion of Twitter users with mental health concerns compared to England as a whole. Topic modeling showed that general concerns, COVID-19 skeptics, and death toll were the top topics discussed in mental health-related tweets. Demographic analysis showed that middle-aged and older adults might be more likely to suffer from mental health issues or express their mental health concerns on Twitter during the COVID-19 pandemic. Conclusions: The COVID-19 pandemic has noticeable effects on mental health concerns on Twitter in the UK, which varied among demographic and geographic groups.

Exploring Occupation Differences in Reactions to COVID-19 Pandemic on Twitter

Data and Information Management ◽

10.2478/dim-2020-0032 ◽

2020 ◽

Vol 0 (0) ◽

Author(s):

Yi Zhao ◽

Haixu Xi ◽

Chengzhi Zhang

Keyword(s):

Social Media ◽

Sentiment Analysis ◽

Topic Modeling ◽

Latent Dirichlet Allocation ◽

Experimental Results ◽

Social Implications ◽

Income Levels ◽

Related Information ◽

The Social ◽

Twitter Users

Information related to the coronavirus disease 2019 (COVID-19) pandemic has flooded social media, and analyzing this information from an occupational perspective can help us understand the social implications of this unprecedented disruption. In this study, using a COVID-19-related dataset collected with Twitter IDs, we conduct topic and sentiment analysis from the perspective of occupation, leveraging Latent Dirichlet Allocation (LDA) topic modeling and the Valence Aware Dictionary and sEntiment Reasoner (VADER) model, respectively. The experimental results indicate that there are significant differences in topic preference between Twitter users with different occupations. However, occupation-linked affective differences are only partly demonstrated in our study; the income level of Twitter users shows no association with sentiment expression on COVID-19-related topics.

Customers' experience of purchasing event tickets: mining online reviews based on topic modeling and sentiment analysis

International Journal of Event and Festival Management ◽

10.1108/ijefm-06-2020-0034 ◽

2020 ◽

Vol ahead-of-print (ahead-of-print) ◽

Author(s):

Krzysztof Celuch

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Sentiment Analysis ◽

Language Processing ◽

Topic Modeling ◽

Data Science ◽

Latent Dirichlet Allocation ◽

Online Reviews ◽

Third Party ◽

Content Type

Purpose: In search of creating an extraordinary experience for customers, services have gone beyond the means of a transaction between buyers and sellers. In the event industry, where purchasing tickets online is a common procedure, it remains unclear how to enhance the multifaceted experience. This study aims to offer a snapshot of the aspects most valued by consumers and to uncover consumers' feelings toward their experience of purchasing event tickets on third-party ticketing platforms. Design/methodology/approach: This is a cross-disciplinary study that applies knowledge from both data science and services marketing. Under the umbrella of natural language processing, latent Dirichlet allocation topic modeling and sentiment analysis were used to interpret the embedded meanings based on online reviews. Findings: The findings conceptualized ten dimensions valued by eventgoers, including technical issues, value of core product and service, word-of-mouth, trustworthiness, professionalism and knowledgeability, customer support, information transparency, additional fee, prior experience and after-sales service. Among these aspects, consumers rated the value of the core product and service to be the most positive experience, whereas the additional fee was considered the least positive one. Originality/value: Drawing from the intersection of natural language processing and the status quo of the event industry, this study offers a better understanding of eventgoers' experiences in the case of purchasing online event tickets. It also provides a hands-on guide for marketers to stage memorable experiences in the era of digitalization.

Distributed Latent Dirichlet Allocation on Streams

ACM Transactions on Knowledge Discovery from Data ◽

10.1145/3451528 ◽

2021 ◽

Vol 16 (1) ◽

pp. 1-20

Author(s):

Yunyan Guo ◽

Jianzhong Li

Keyword(s):

Real Time ◽

Language Processing ◽

Real World ◽

Topic Modeling ◽

Latent Dirichlet Allocation ◽

Real Life ◽

Streaming Data ◽

Real World Datasets ◽

Dirichlet Allocation ◽

Online Inference

Latent Dirichlet Allocation (LDA) has been widely used for topic modeling, with applications spanning various areas such as natural language processing and information retrieval. While LDA on small and static datasets has been extensively studied, several real-world challenges are posed in practical scenarios where datasets are often huge and are gathered in a streaming fashion. As the state-of-the-art LDA algorithm on streams, Streaming Variational Bayes (SVB) introduced Bayesian updating to provide a streaming procedure. However, the utility of SVB is limited in applications since it ignores three challenges of processing real-world streams: topic evolution, data turbulence, and real-time inference. In this article, we propose a novel distributed LDA algorithm, referred to as StreamFed-LDA, to deal with these challenges on streams. For topic modeling of streaming data, the ability to capture evolving topics is essential for practical online inference. To achieve this goal, StreamFed-LDA is based on a specialized framework that supports lifelong (continual) learning of evolving topics. On the other hand, data turbulence is commonly present in streams due to real-life events. In that case, the design of StreamFed-LDA allows the model to learn new characteristics from the most recent data while maintaining the historical information. On massive streaming data, it is difficult and crucial to provide real-time inference results. To increase the throughput and reduce the latency, StreamFed-LDA introduces additional techniques that substantially reduce both computation and communication costs in distributed systems. Experiments on four real-world datasets show that the proposed framework achieves significantly better performance of online inference compared with the baselines. At the same time, StreamFed-LDA also reduces the latency by orders of magnitude in real-world datasets.

Topic Modeling for Twitter Users Regarding the "Ruangguru" Application

Jurnal ILMU DASAR ◽

10.19184/jid.v21i2.17112 ◽

2020 ◽

Vol 21 (2) ◽

pp. 149

Author(s):

Bagus Wicaksono Arianto ◽

Gangga Anuraga

Keyword(s):

Topic Modeling ◽

Public Perception ◽

Latent Dirichlet Allocation ◽

The Public ◽

Allocation Method ◽

Twitter Account ◽

Twitter Users ◽

A Company ◽

Dirichlet Allocation ◽

Expansion Strategies

PT Ruang Raya Indonesia ("Ruangguru") is the largest and most comprehensive technology company in Indonesia that focuses on education-based services. In 2019, Ruangguru had 15 million users and 300,000 teachers, and was present in 32 provinces in Indonesia. It prepared a number of expansion strategies to become a company valued at more than US$1 billion within the next year or two. The purpose of this research is to classify the opinions of Ruangguru users about the services provided, using the latent Dirichlet allocation method, so that the results can serve as evaluation material for improving those services. The data come from a collection of tweets by Twitter users in Indonesia, gathered using the Twitter API. The Twitter account used in this study is @ruangguru. The results of the analysis showed that the public perception of Twitter users, as modeled with latent Dirichlet allocation, formed 28 topics. Keywords: latent Dirichlet allocation, Ruangguru, Twitter.
