Tracking Social Media Discourse About the COVID-19 Pandemic: Development of a Public Coronavirus Twitter Data Set (Preprint)

2020 ◽  
Author(s):  
Emily Chen ◽  
Kristina Lerman ◽  
Emilio Ferrara

BACKGROUND At the time of this writing, the coronavirus disease (COVID-19) pandemic outbreak has already put tremendous strain on many countries' citizens, resources, and economies around the world. Social distancing measures, travel bans, self-quarantines, and business closures are changing the very fabric of societies worldwide. With people forced out of public spaces, much of the conversation about these phenomena now occurs online on social media platforms like Twitter. OBJECTIVE In this paper, we describe a multilingual COVID-19 Twitter data set that we are making available to the research community via our COVID-19-TweetIDs GitHub repository. METHODS We started this ongoing data collection on January 28, 2020, leveraging Twitter’s streaming application programming interface (API) and Tweepy to follow certain keywords and accounts that were trending at the time data collection began. We used Twitter’s search API to query for past tweets, resulting in the earliest tweets in our collection dating back to January 21, 2020. RESULTS Since the inception of our collection, we have actively maintained and updated our GitHub repository on a weekly basis. We have published over 123 million tweets, with over 60% of the tweets in English. This paper also presents basic statistics that show that Twitter activity responds and reacts to COVID-19-related events. CONCLUSIONS It is our hope that our contribution will enable the study of online conversation dynamics in the context of a planetary-scale epidemic outbreak of unprecedented proportions and implications. This data set could also help track COVID-19-related misinformation and unverified rumors or enable the understanding of fear and panic—and undoubtedly more.
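The keyword-following step described in the METHODS can be approximated offline with a small filter. This is a minimal sketch, not the authors' collection code: the keyword list, the `matches_tracked` and `filter_tweets` helpers, and the dictionary shape of each tweet are all hypothetical, and plain case-insensitive substring matching is a simplification of how a streaming endpoint matches tracked terms.

```python
# Hypothetical keywords for illustration only; the abstract does not list
# the terms that were actually tracked.
TRACKED_KEYWORDS = ["coronavirus", "covid-19", "wuhan"]


def matches_tracked(text, keywords):
    """Return True if any keyword occurs in the text, ignoring case."""
    lowered = text.lower()
    return any(keyword.lower() in lowered for keyword in keywords)


def filter_tweets(tweets, keywords):
    """Keep only the tweets whose 'text' field mentions a tracked keyword."""
    return [tweet for tweet in tweets if matches_tracked(tweet["text"], keywords)]


if __name__ == "__main__":
    sample = [
        {"id": 1, "text": "New Coronavirus cases reported today"},
        {"id": 2, "text": "Lovely weather this morning"},
        {"id": 3, "text": "Stay home: COVID-19 updates"},
    ]
    for tweet in filter_tweets(sample, TRACKED_KEYWORDS):
        print(tweet["id"], tweet["text"])
```

A filter of this kind is only useful for re-checking already collected tweets locally; live collection would go through the streaming API itself.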

10.2196/19273 ◽  
2020 ◽  
Vol 6 (2) ◽  
pp. e19273 ◽  
Author(s):  
Emily Chen ◽  
Kristina Lerman ◽  
Emilio Ferrara



2017 ◽  
Vol 36 (2) ◽  
pp. 195-211 ◽  
Author(s):  
Patrick Rafail

Twitter data are widely used in the social sciences. The Twitter Application Programming Interface (API) allows researchers to build large databases of user activity efficiently. Despite the potential of Twitter as a data source, less attention has been paid to issues of sampling, and in particular, the implications of different sampling strategies on overall data quality. This research proposes a set of conceptual distinctions between four types of populations that emerge when analyzing Twitter data and suggests sampling strategies that facilitate more comprehensive data collection from the Twitter API. Using three applications drawn from large databases of Twitter activity, this research also compares the results from the proposed sampling strategies, which provide defensible representations of the population of activity, to those collected with more frequently used hashtag samples. The results suggest that hashtag samples misrepresent important aspects of Twitter activity and may lead researchers to erroneous conclusions.


The rise of social media platforms like Twitter, and their increasing adoption by people who want to stay connected, provides a large source of data for analyzing trends, events, and personalities. Such analysis also provides insight into a person’s likes and inclinations in real time, independent of the data size. Several techniques have been created to retrieve such data; however, the most efficient technique is clustering. This paper provides an overview of the algorithms behind the various clustering methods and examines their efficiency in determining trending information. The clustered data may be further classified by topic for real-time analysis of a large dynamic data set. In this paper, data classification is performed and analyzed for flaws, followed by another classification on the same data set.


2021 ◽  
Vol 14 (1) ◽  
pp. 410-419
Author(s):  
Mohammed Jabardi ◽  
Asaad Hadi

One of the most popular social media platforms, Twitter is used by millions of people to share information, broadcast tweets, and follow other users. Twitter offers an open application programming interface and is thus vulnerable to attacks from fake accounts, which are primarily created for advertising and marketing, defaming individuals, acquiring consumer data, inflating blog or website traffic, sharing disinformation, online fraud, and control. Fake accounts are harmful to both users and service providers, and thus recognizing and filtering out such content on social media is essential. This study presents a new approach to detecting fake Twitter accounts using ontology and Semantic Web Rule Language (SWRL) rules. An SWRL rule-based reasoner is applied under predefined rules to infer whether a profile is trusted or fake. This approach achieves a high detection accuracy of 97%. Furthermore, the ontology classifier is an interpretable model that offers straightforward and human-interpretable decision rules.


2020 ◽  
Author(s):  
Yankun Gao ◽  
Zidian Xie ◽  
Dongmei Li

BACKGROUND Previous studies have shown that electronic cigarette (e-cigarette) users might be more vulnerable to COVID-19 infection and could develop more severe symptoms if they contract the disease owing to their impaired immune responses to viral infections. Social media platforms such as Twitter have been widely used by individuals worldwide to express their responses to the current COVID-19 pandemic. OBJECTIVE In this study, we aimed to examine the longitudinal changes in the attitudes of Twitter users who used e-cigarettes toward the COVID-19 pandemic, as well as compare differences in attitudes between e-cigarette users and nonusers based on Twitter data. METHODS The study dataset containing COVID-19–related Twitter posts (tweets) posted between March 5 and April 3, 2020, was collected using a Twitter streaming application programming interface with COVID-19–related keywords. Twitter users were classified into two groups: Ecig group, including users who did not have commercial accounts but posted e-cigarette–related tweets between May 2019 and August 2019, and non-Ecig group, including users who did not post any e-cigarette–related tweets. Sentiment analysis was performed to compare sentiment scores towards the COVID-19 pandemic between both groups and determine whether the sentiment expressed was positive, negative, or neutral. Topic modeling was performed to compare the main topics discussed between the groups. RESULTS The US COVID-19 dataset consisted of 4,500,248 COVID-19–related tweets collected from 187,399 unique Twitter users in the Ecig group and 11,479,773 COVID-19–related tweets collected from 2,511,659 unique Twitter users in the non-Ecig group. Sentiment analysis showed that Ecig group users had more negative sentiment scores than non-Ecig group users. 
Results from topic modeling indicated that Ecig group users had more concerns about deaths due to COVID-19, whereas non-Ecig group users cared more about the government’s responses to the COVID-19 pandemic. CONCLUSIONS Our findings show that Twitter users who tweeted about e-cigarettes had more concerns about the COVID-19 pandemic. These findings can inform public health practitioners to use social media platforms such as Twitter for timely monitoring of public responses to the COVID-19 pandemic and educating and encouraging current e-cigarette users to quit vaping to minimize the risks associated with COVID-19.


2019 ◽  
Vol 19 (4) ◽  
pp. 513-530
Author(s):  
Stuart Palmer ◽  
Nilupa Udawatta

Purpose Sustainable construction is widely considered to be the best practice in construction, helping to create a healthy built environment. Social media is identified as a valuable data source for research on sustainable construction, and Twitter is a popular social media platform in relation to construction. Green Building construction is identified as one of the methods that promotes sustainable construction. The purpose of this study is to characterise “Green Building” as a topic in Twitter. Design/methodology/approach Social network analysis methods were applied to a large set of Twitter data related to “green building”. Time sequence analysis and network visualisation were used to characterise Twitter activity and to identify influential users. Text analytics and visualisation methods were applied to the same data set to visualise the text content of Twitter posts relating to green building. Findings Peaks in Twitter activity were associated with physical “green building” events. The network visualisation of the Twitter data revealed a complex structure and a range of types of interactions. The most “influential” users depended on the ranking method used; however, a number of users had high influence in all measures used. The tweet text visualisation showed evidence of a global and interactive audience on Twitter engaged in conversations about green building. Also, it was found that external links, emoji and popular terms related to a particular topic can be used to increase the engagement of Twitter users on that topic. Originality/value Certain Green Building events were observed to be associated with high levels of Twitter activity. The virtual was found to be closely linked to the physical, and for the promotion of green building construction, their respective impact is potentially the most powerful when used in conjunction. The most influential Twitter accounts did not belong to one class of user, including both individuals and organisations. 
Twitter offers a platform for a range of stakeholders in the area of green building construction to reach a substantial audience and to be influential in the public sphere. The findings of this research provide a valuable reference for industry practitioners and researchers to deepen their understanding of the application of Twitter to green building construction, and the methods of using Twitter to promote important information related to sustainable construction.


Data ◽  
2020 ◽  
Vol 5 (1) ◽  
pp. 20
Author(s):  
Amir Haghighati ◽  
Kamran Sedig

Through social media platforms, massive amounts of data are being produced. As a microblogging social media platform, Twitter enables its users to post short updates as “tweets” on an unprecedented scale. Once analyzed using machine learning (ML) techniques and in aggregate, Twitter data can be an invaluable resource for gaining insight into different domains of discussion and public opinion. However, when applied to real-time data streams, due to covariate shifts in the data (i.e., changes in the distributions of the inputs of ML algorithms), existing ML approaches result in different types of biases and provide uncertain outputs. In this paper, we describe VARTTA (Visual Analytics for Real-Time Twitter datA), a visual analytics system that combines data visualizations, human-data interaction, and ML algorithms to help users monitor, analyze, and make sense of the streams of tweets in a real-time manner. As a case study, we demonstrate the use of VARTTA in political discussions. VARTTA not only provides users with powerful analytical tools, but also enables them to diagnose and to heuristically suggest fixes for the errors in the outcome, resulting in a more detailed understanding of the tweets. Finally, we outline several issues to be considered while designing other similar visual analytics systems.


2017 ◽  
Vol 14 ◽  
pp. 63-69
Author(s):  
Valentina Grasso ◽  
Alfonso Crisci ◽  
Marco Morabito ◽  
Paolo Nesi ◽  
Gianni Pantaleo ◽  
...  

Abstract. During emergencies, an increasing number of messages are shared through social media platforms, becoming a primary source of information for lay people and emergency managers. Weather services and institutions have started to employ social media to deliver weather warnings even if sometimes this communication lacks in strategy. In Twitter, for example, hashtagging is very important to associate messages with certain topics; in recent years, codified hashtagging is emerging as a practical way to coordinate Twitter conversations during emergencies and quickly retrieve relevant information. In 2014, a syntax for codified hashtags for weather warning was proposed in Italy: a list of 20 hashtags, realized by combining #allertameteo (weather warning) + XXX, where final letters code the regional identification. This contribution presents a monitoring of Twitter usage of weather warning codified hashtags in Italy (since July 2015) and an analysis of different contexts. Twitter messages were retrieved using TwitterVigilance, a multi-users platform to crawl Twitter data, collect and store messages and perform quantitative analytics, about users, hashtags, tweets/retweets volumes. The Codified Hashtags data set is presented and discussed with main analytics and evaluation of regional contexts where it was successfully employed.
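The codified hashtag syntax described above (#allertameteo followed by a regional code) lends itself to a simple pattern match. The following is a minimal sketch under stated assumptions: the regional code is taken to be exactly three letters, as the "XXX" placeholder suggests, the codes in the usage example are illustrative only, and `extract_region_codes` and `count_by_region` are hypothetical helpers rather than part of TwitterVigilance.

```python
import re
from collections import Counter

# Assumption: the regional suffix is exactly three letters, matched case-insensitively.
CODIFIED_HASHTAG = re.compile(r"#allertameteo([a-z]{3})\b", re.IGNORECASE)


def extract_region_codes(text):
    """Return the upper-cased regional codes of all codified hashtags in a message."""
    return [code.upper() for code in CODIFIED_HASHTAG.findall(text)]


def count_by_region(messages):
    """Count how often each regional code appears across a list of messages."""
    counts = Counter()
    for message in messages:
        counts.update(extract_region_codes(message))
    return counts


if __name__ == "__main__":
    # Illustrative messages; the regional codes here are examples, not a verified list.
    messages = [
        "Forti piogge in arrivo #allertameteoLIG",
        "Aggiornamento #AllertaMeteoTOS e #allertameteoLIG",
        "Solo #allertameteo senza codice regionale",
    ]
    print(count_by_region(messages))
```

The trailing word boundary keeps longer suffixes and the bare #allertameteo tag from being counted as regional codes.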


2021 ◽  
pp. postgradmedj-2021-140685
Author(s):  
Robert Marcec ◽  
Robert Likic

Introduction A worldwide vaccination campaign is underway to bring an end to the SARS-CoV-2 pandemic; however, its success relies heavily on the actual willingness of individuals to get vaccinated. Social media platforms such as Twitter may prove to be a valuable source of information on the attitudes and sentiment towards SARS-CoV-2 vaccination that can be tracked almost instantaneously. Materials and methods The Twitter academic Application Programming Interface was used to retrieve all English-language tweets mentioning AstraZeneca/Oxford, Pfizer/BioNTech and Moderna vaccines in 4 months from 1 December 2020 to 31 March 2021. Sentiment analysis was performed using the AFINN lexicon to calculate the daily average sentiment of tweets which was evaluated longitudinally and comparatively for each vaccine throughout the 4 months. Results A total of 701 891 tweets have been retrieved and included in the daily sentiment analysis. The sentiment regarding Pfizer and Moderna vaccines appeared positive and stable throughout the 4 months, with no significant differences in sentiment between the months. In contrast, the sentiment regarding the AstraZeneca/Oxford vaccine seems to be decreasing over time, with a significant decrease when comparing December with March (p<0.0000000001, mean difference=−0.746, 95% CI=−0.915 to −0.577). Conclusion Lexicon-based Twitter sentiment analysis is a valuable and easily implemented tool to track the sentiment regarding SARS-CoV-2 vaccines. It is worrisome that the sentiment regarding the AstraZeneca/Oxford vaccine appears to be turning negative over time, as this may boost hesitancy rates towards this specific SARS-CoV-2 vaccine.
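Lexicon-based daily sentiment of the kind described in this abstract can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' pipeline: the toy lexicon below stands in for the AFINN word list (its entries and scores are placeholders), a tweet's score is assumed to be the sum of its matched word scores, and `tweet_score` and `daily_average_sentiment` are hypothetical helper names.

```python
import re
from collections import defaultdict
from statistics import mean

# Toy stand-in for the AFINN lexicon; words and scores are placeholders.
TOY_LEXICON = {"good": 3, "safe": 1, "bad": -3, "worried": -3}


def tweet_score(text, lexicon):
    """Sum the lexicon scores of the words in a tweet; unknown words score 0."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(lexicon.get(word, 0) for word in words)


def daily_average_sentiment(tweets, lexicon):
    """Map each day to the mean score of its tweets.

    `tweets` is an iterable of (day, text) pairs, with day as an ISO date string.
    """
    scores_by_day = defaultdict(list)
    for day, text in tweets:
        scores_by_day[day].append(tweet_score(text, lexicon))
    return {day: mean(scores) for day, scores in scores_by_day.items()}


if __name__ == "__main__":
    tweets = [
        ("2020-12-01", "The vaccine looks good and safe"),
        ("2020-12-01", "Bad news again"),
        ("2021-03-01", "Worried about bad side effects"),
    ]
    for day, average in sorted(daily_average_sentiment(tweets, TOY_LEXICON).items()):
        print(day, average)
```

Comparing months, as the study does, would then amount to a statistical test over the per-tweet or per-day scores of each period.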


Author(s):  
Vishal R. Patel ◽  
Sofia Gereta ◽  
Christopher J. Blanton ◽  
Alexander L. Chu ◽  
Neha K. Reddy ◽  
...  

PURPOSE Colorectal cancer (CRC) is the second leading cause of cancer-related mortality worldwide. Social media platforms such as Twitter are extensively used to communicate about cancer care, yet little is known about the role of these online platforms in promoting early detection or sharing the lived experiences of patients with CRC. This study tracked Twitter discussions about CRC and characterized participating users to better understand public communication and perceptions of CRC during the COVID-19 pandemic. METHODS Tweets containing references to CRC were collected from January 2020 to April 2021 using Twitter's Application Programming Interface. Account metadata was used to predict user demographic information and classify users as either organizations, individuals, clinicians, or influencers. We compared the number of impressions across users and analyzed the content of tweets using natural language processing models to identify prominent topics of discussion. RESULTS There were 72,229 unique CRC-related tweets by 31,170 users. Most users were male (66%) and older than 40 years (57%). Individuals accounted for the largest share of users (44%), followed by organizations (35%), clinicians (19%), and influencers (2%). Influencers made the most median impressions (35,853). Organizations made the most overall impressions (1,067,189,613). Tweets contained the following topics: bereavement (20%), appeals for early detection (20%), research (17%), National Colorectal Cancer Awareness Month (15%), screening access (14%), and risk factors (14%). CONCLUSION Discussions about CRC largely focused on bereavement and early detection. Online coverage of National Colorectal Cancer Awareness Month and personal experiences with CRC effectively stimulated goal-oriented tweets about early detection. 
Our findings suggest that although Twitter is commonly used for communicating about CRC, partnering with influencers may be an effective strategy for improving communication of future public health recommendations related to CRC.

