Topic-based content and sentiment analysis of Ebola virus on Twitter and in the news

2016 ◽  
Vol 42 (6) ◽  
pp. 763-781 ◽  
Author(s):  
Erin Hea-Jin Kim ◽  
Yoo Kyung Jeong ◽  
Yuyoung Kim ◽  
Keun Young Kang ◽  
Min Song

The present study investigates topic coverage and sentiment dynamics of two different media sources, Twitter and news publications, on the hot health issue of Ebola. We conduct content and sentiment analysis by: (1) applying vocabulary control to collected datasets; (2) employing the n-gram LDA topic modeling technique; (3) adopting entity extraction and entity network; and (4) introducing the concept of topic-based sentiment scores. With the query term ‘Ebola’ or ‘Ebola virus’, we collected 16,189 news articles from 1006 different publications and 7,106,297 tweets with the Twitter stream API. The experiments indicate that topic coverage of Twitter is narrower and more blurry than that of the news media. In terms of sentiment dynamics, sentiment on Twitter has a shorter life span and a smaller variance than sentiment in the news. In addition, we observe that news articles focus more on event-related entities such as person, organization and location, whereas Twitter covers more time-oriented entities. Based on the results, we report on the characteristics of Twitter and news media as two distinct news outlets in terms of content coverage and sentiment dynamics.
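
The abstract does not define how a topic-based sentiment score is computed. As a minimal sketch, assuming each document already has a topic distribution (for example from LDA) and a single sentiment value, one plausible reading is a topic-weighted mean of document sentiment. The function name, input shapes, and sample values below are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def topic_sentiment_scores(doc_topics, doc_sentiments):
    """Return the topic-weighted mean sentiment for each topic.

    doc_topics: one dict per document, mapping topic id -> probability
    doc_sentiments: one float per document (assumed to lie in [-1, 1])
    This weighting scheme is an assumption, not the paper's definition.
    """
    weighted_sum = defaultdict(float)
    weight_total = defaultdict(float)
    for topics, sentiment in zip(doc_topics, doc_sentiments):
        for topic, prob in topics.items():
            weighted_sum[topic] += prob * sentiment
            weight_total[topic] += prob
    return {
        topic: weighted_sum[topic] / weight_total[topic]
        for topic in weighted_sum
        if weight_total[topic] > 0
    }


# Toy data: three documents, two topics (all values are illustrative).
doc_topics = [{0: 1.0}, {0: 0.5, 1: 0.5}, {1: 1.0}]
doc_sentiments = [1.0, 0.0, -1.0]
scores = topic_sentiment_scores(doc_topics, doc_sentiments)
print(scores)
```

Computing the same scores per time window would give a sentiment series per topic, which is the kind of quantity whose life span and variance could then be compared across media.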

10.2196/24585 ◽  
2021 ◽  
Vol 7 (2) ◽  
pp. e24585
Author(s):  
Tiago de Melo ◽  
Carlos M S Figueiredo

Background: The COVID-19 pandemic is severely affecting people worldwide. Currently, an important approach to understand this phenomenon and its impact on the lives of people consists of monitoring social networks and news on the internet. Objective: The purpose of this study is to present a methodology to capture the main subjects and themes under discussion in news media and social media and to apply this methodology to analyze the impact of the COVID-19 pandemic in Brazil. Methods: This work proposes a methodology based on topic modeling, named entity recognition, and sentiment analysis of texts to compare Twitter posts and news, followed by visualization of the evolution and impact of the COVID-19 pandemic. We focused our analysis on Brazil, an important epicenter of the pandemic; therefore, we faced the challenge of addressing Brazilian Portuguese texts. Results: In this work, we collected and analyzed 18,413 articles from news media and 1,597,934 tweets posted by 1,299,084 users in Brazil. The results show that the proposed methodology improved the topic sentiment analysis over time, enabling better monitoring of internet media. Additionally, with this tool, we extracted some interesting insights about the evolution of the COVID-19 pandemic in Brazil. For instance, we found that Twitter presented similar topic coverage to news media; the main entities were similar, but they differed in theme distribution and entity diversity. Moreover, some aspects represented negative sentiment toward political themes in both media, and a high incidence of mentions of a specific drug denoted high political polarization during the pandemic. Conclusions: This study identified the main themes under discussion in both news and social media and how their sentiments evolved over time. It is possible to understand the major concerns of the public during the pandemic, and all the obtained information is thus useful for decision-making by authorities.


2018 ◽  
Vol 110 (1) ◽  
pp. 85-101 ◽  
Author(s):  
Ronald Cardenas ◽  
Kevin Bello ◽  
Alberto Coronado ◽  
Elizabeth Villota

Managing large collections of documents is an important problem for many areas of science, industry, and culture. Probabilistic topic modeling offers a promising solution. Topic modeling is an unsupervised machine learning method, and the evaluation of such models is an interesting problem in its own right. Topic interpretability measures have been developed in recent years as a more natural option for topic quality evaluation, emulating human perception of coherence with word-set correlation scores. In this paper, we show experimental evidence that the topic coherence score improves when the training corpus is restricted to the relevant information in each document, as obtained by entity recognition. We experiment with job advertisement data and find that with this approach topic models improve interpretability by about 40 percentage points on average. Our analysis also reveals that, when the extracted text chunks are used, some redundant topics are joined while others are split into more skill-specific topics. Fine-grained topics observed in models using the whole text are preserved.
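
The abstract does not say which coherence measure was used. As a sketch of what a word-set correlation score can look like, the block below implements the widely used UMass coherence, which sums log((D(w_i, w_j) + 1) / D(w_j)) over ordered pairs of a topic's top words, where D counts the documents containing the given words. The toy job-ad corpus and the function name are assumptions made for illustration.

```python
import math


def umass_coherence(top_words, documents):
    """UMass topic coherence for an ordered list of top words.

    documents: list of token lists. Higher (less negative) is more coherent.
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        # Number of documents that contain every word in `words`.
        return sum(1 for doc in doc_sets if all(w in doc for w in words))

    score = 0.0
    for i in range(1, len(top_words)):
        for j in range(i):
            w_i, w_j = top_words[i], top_words[j]
            df_j = doc_freq(w_j)
            if df_j == 0:
                continue  # skip words that never occur in the corpus
            score += math.log((doc_freq(w_i, w_j) + 1) / df_j)
    return score


# Hypothetical tokenized job advertisements.
docs = [["python", "sql"], ["python", "sql"], ["java", "coffee"]]
coherent = umass_coherence(["python", "sql"], docs)
incoherent = umass_coherence(["python", "coffee"], docs)
print(coherent, incoherent)
```

Words that co-occur in the same documents score higher than words that never appear together, which is the intuition behind comparing models trained on full text with models trained on extracted chunks.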


2021 ◽  
pp. 1-14
Author(s):  
Hamed Zargari ◽  
Morteza Zahedi ◽  
Marziea Rahimi

Words are among the most essential elements for expressing sentiment in context, but they are not the only ones: syntactic relationships between words, morphology, punctuation, and other linguistic phenomena are also influential. Treating words as isolated phenomena causes many mistakes in sentiment analysis systems. So far, a large amount of research has been conducted on generating sentiment dictionaries containing only sentiment words. A number of these dictionaries have addressed the role of combinations of sentiment words, negators, and intensifiers, but almost none has considered the heterogeneous effect of multiple linguistic phenomena occurring in the same sentiment compound. To address this weakness for the case of multiple intensifiers, this research presents a sentiment dictionary based on the analysis of sentiment compounds that include sentiment words, negators, and intensifiers. It considers the position of each intensifier relative to the sentiment word and assigns a location-based coefficient to the intensifier. This increases the number of sentiment phrases covered by the dictionary and enhances the efficiency of the proposed dictionary-based sentiment analysis methods by up to 7% compared with the latest methods.
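
The abstract does not give the form of the location-based coefficient. The sketch below shows one possible reading, in which each intensifier's contribution decays geometrically with its distance from the sentiment word and a negator flips the sign. The lexicon entries, the `decay` parameter, and the scoring rule are all hypothetical and chosen only for illustration.

```python
# Hypothetical toy lexicons; real dictionaries would be far larger.
SENTIMENT = {"good": 1.0, "bad": -1.0}
INTENSIFIERS = {"very": 0.5, "extremely": 1.0}
NEGATORS = {"not", "never"}


def compound_score(tokens, decay=0.5):
    """Score a sentiment compound such as ['not', 'very', 'good'].

    Intensifiers before the sentiment word add to a boost that is scaled
    by decay ** (distance - 1), so the adjacent intensifier counts fully.
    The decay rule is an assumption, not the paper's coefficient.
    """
    # Locate the last sentiment word in the compound.
    for idx in range(len(tokens) - 1, -1, -1):
        if tokens[idx] in SENTIMENT:
            break
    else:
        return 0.0  # no sentiment word found

    boost = 0.0
    negated = False
    for pos in range(idx - 1, -1, -1):
        distance = idx - pos  # 1 means directly before the sentiment word
        token = tokens[pos]
        if token in INTENSIFIERS:
            boost += INTENSIFIERS[token] * decay ** (distance - 1)
        elif token in NEGATORS:
            negated = not negated

    score = SENTIMENT[tokens[idx]] * (1 + boost)
    return -score if negated else score


print(compound_score(["very", "good"]))
print(compound_score(["very", "very", "good"]))
print(compound_score(["not", "very", "good"]))
```

Under this rule a second "very" adds less than the first, which is one simple way to model a position-dependent effect of multiple intensifiers.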


Author(s):  
Sardar Haider Waseem Ilyas ◽  
Zainab Tariq Soomro ◽  
Ahmed Anwar ◽  
Hamza Shahzad ◽  
Ussama Yaqub

2019 ◽  
Vol 27 (3) ◽  
pp. 449-456
Author(s):  
James R Rogers ◽  
Hollis Mills ◽  
Lisa V Grossman ◽  
Andrew Goldstein ◽  
Chunhua Weng

Scientific commentaries are expected to play an important role in evidence appraisal, but it is unknown whether this expectation has been fulfilled. This study aims to better understand the role of scientific commentary in evidence appraisal. We queried PubMed for all clinical research articles with accompanying comments and extracted corresponding metadata. Five percent of clinical research studies (N = 130 629) received postpublication comments (N = 171 556), resulting in 178 882 comment–article pairings, with 90% published in the same journal. We obtained 5197 full-text comments for topic modeling and exploratory sentiment analysis. Topics were generally disease specific with only a few topics relevant to the appraisal of studies, which were highly prevalent in letters. Of a random sample of 518 full-text comments, 67% had a supportive tone. Based on our results, published commentary, with the exception of letters, most often highlights or endorses previous publications rather than serving as a prominent mechanism for critical appraisal.


2018 ◽  
Vol 40 (8) ◽  
pp. 1270-1280
Author(s):  
Tokunbo Ojo

With its mixture of government-owned media outlets and private media establishments, the Nigerian news media industry is deemed one of the leading media industries in Africa. But, in spite of its leading status on the continent, the industry is plagued by a series of multi-faceted sustainability challenges that are rooted in the socio-economic and political contexts. Consequently, privately owned media outlets have short life spans in Nigeria. This article assesses the challenges of news media sustainability in Nigeria. The article underscores the adverse effects of the structural deficit in democratic norms and institutional capabilities on news media sustainability in Nigeria.


2021 ◽  
Author(s):  
Shimon Ohtani

The importance of biodiversity conservation is gradually being recognized worldwide, and 2020 was the final year of the Aichi Biodiversity Targets formulated at the 10th Conference of the Parties to the Convention on Biological Diversity (COP10) in 2010. Unfortunately, the majority of the targets were assessed as unachievable. While it is essential to measure public awareness of biodiversity when setting the post-2020 targets, it is also a difficult task to propose a method to do so. This study provides a diachronic exploration of the discourse on “biodiversity” from 2010 to 2020, using Twitter posts, in combination with sentiment analysis and topic modeling, which are commonly used in data science. Through the aggregation and comparison of n-grams, the visualization of eight types of emotional tendencies using the NRC emotion lexicon, the construction of topic models using Latent Dirichlet allocation (LDA), and the qualitative analysis of tweet texts based on these models, I was able to classify and analyze unstructured tweets in a meaningful way. The results revealed the evolution of words used with “biodiversity” on Twitter over the past decade, the emotional tendencies behind the contexts in which “biodiversity” has been used, and the approximate content of tweet texts that have constituted topics with distinctive characteristics. While the search for people's awareness through SNS analysis still has many limitations, it is undeniable that important suggestions can be obtained. In order to further refine the research method, it will be essential to improve the skills of analysts and accumulate research examples as well as to advance data science.
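
The n-gram aggregation step mentioned above can be sketched with the standard library alone. The whitespace tokenization, the function name, and the sample tweets are simplifying assumptions; the study's actual preprocessing is not described in the abstract.

```python
from collections import Counter


def ngram_counts(tweets, n=2):
    """Count word n-grams across a list of tweet texts.

    Tokenization is a naive lowercase whitespace split (an assumption).
    """
    counts = Counter()
    for text in tweets:
        tokens = text.lower().split()
        # zip over n shifted copies of the token list to form n-grams.
        counts.update(zip(*(tokens[i:] for i in range(n))))
    return counts


# Hypothetical tweets, for illustration only.
tweets = ["biodiversity loss matters", "Biodiversity loss accelerates"]
bigrams = ngram_counts(tweets, n=2)
print(bigrams.most_common(3))
```

Running the same count separately for each year's tweets and comparing the top n-grams would give a simple diachronic view of the words used with a keyword.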


2021 ◽  
Author(s):  
Lucas Rodrigues ◽  
Antonio Jacob Junior ◽  
Fábio Lobato

Posts with defamatory content or hate speech are constantly found on social media. The results for readers are numerous, not restricted only to the psychological impact, but also to the growth of this social phenomenon. With the General Law on the Protection of Personal Data and the Marco Civil da Internet, service providers became responsible for the content in their platforms. Considering the importance of this issue, this paper aims to analyze the content published (news and comments) on the G1 News Portal with techniques based on data visualization and Natural Language Processing, such as sentiment analysis and topic modeling. The results show that even with most of the comments being neutral or negative and classified or not as hate speech, the majority of them were accepted by the users.

