A Chronological and Geographical Analysis of Personal Reports of COVID-19 on Twitter

Author(s):  
Ari Z. Klein ◽  
Arjun Magge ◽  
Karen O’Connor ◽  
Haitao Cai ◽  
Davy Weissenbacher ◽  
...  

ABSTRACT The rapidly evolving outbreak of COVID-19 presents challenges for actively monitoring its spread. In this study, we assessed a social media mining approach for automatically analyzing the chronological and geographical distribution of users in the United States reporting personal information related to COVID-19 on Twitter. The results suggest that our natural language processing and machine learning framework could help provide an early indication of the spread of COVID-19.

Author(s):  
S Golder ◽  
Ari Z. Klein ◽  
Arjun Magge ◽  
Karen O’Connor ◽  
Haitao Cai ◽  
...  

Abstract The rapidly evolving COVID-19 pandemic presents challenges for actively monitoring its transmission. In this study, we extend a social media mining approach used in the US to automatically identify personal reports of COVID-19 on Twitter in England, UK. The findings indicate that a natural language processing and machine learning framework could help provide an early indication of the chronological and geographical distribution of COVID-19 in England.


2015 ◽  
Vol 22 (3) ◽  
pp. 671-681 ◽  
Author(s):  
Azadeh Nikfarjam ◽  
Abeed Sarker ◽  
Karen O’Connor ◽  
Rachel Ginn ◽  
Graciela Gonzalez

Abstract Objective Social media is becoming increasingly popular as a platform for sharing personal health-related information. This information can be utilized for public health monitoring tasks, particularly for pharmacovigilance, via the use of natural language processing (NLP) techniques. However, the language in social media is highly informal, and user-expressed medical concepts are often nontechnical, descriptive, and challenging to extract. There has been limited progress in addressing these challenges, and thus far, advanced machine learning-based NLP techniques have been underutilized. Our objective is to design a machine learning-based approach to extract mentions of adverse drug reactions (ADRs) from highly informal text in social media. Methods We introduce ADRMine, a machine learning-based concept extraction system that uses conditional random fields (CRFs). ADRMine utilizes a variety of features, including a novel feature for modeling words’ semantic similarities. The similarities are modeled by clustering words based on unsupervised, pretrained word representation vectors (embeddings) generated from unlabeled user posts in social media using a deep learning technique. Results ADRMine outperforms several strong baseline systems in the ADR extraction task by achieving an F-measure of 0.82. Feature analysis demonstrates that the proposed word cluster features significantly improve extraction performance. Conclusion It is possible to extract complex medical concepts, with relatively high performance, from informal, user-generated content. Our approach is particularly scalable, suitable for social media mining, as it relies on large volumes of unlabeled data, thus diminishing the need for large, annotated training data sets.
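The word-cluster feature described in this abstract can be illustrated with a minimal sketch of the per-token feature dictionaries that a CRF tagger typically consumes. This is not ADRMine's actual feature set: the `WORD_CLUSTERS` map, its cluster IDs, and the function names are hypothetical, and in the setting described above the clusters would be derived from embeddings trained on unlabeled user posts.

```python
from typing import Dict, List

# Hypothetical word-to-cluster map (illustrative IDs only). In the abstract's
# setting, clusters come from embeddings learned on unlabeled social media posts.
WORD_CLUSTERS: Dict[str, int] = {
    "sleepy": 17, "drowsy": 17, "tired": 17,
    "headache": 42, "migraine": 42,
}


def cluster_id(word: str) -> str:
    """Return the cluster ID of a word as a string, or "-1" if it is unknown."""
    return str(WORD_CLUSTERS.get(word.lower(), -1))


def token_features(tokens: List[str], i: int) -> Dict[str, str]:
    """Build a feature dict for tokens[i], including neighbors' cluster IDs."""
    feats = {"word": tokens[i].lower(), "cluster": cluster_id(tokens[i])}
    if i > 0:
        feats["prev_cluster"] = cluster_id(tokens[i - 1])
    if i < len(tokens) - 1:
        feats["next_cluster"] = cluster_id(tokens[i + 1])
    return feats


def sentence_features(tokens: List[str]) -> List[Dict[str, str]]:
    """Feature dicts for every token in one post, in order."""
    return [token_features(tokens, i) for i in range(len(tokens))]


features = sentence_features("this med makes me so Drowsy".split())
```

Because "drowsy", "sleepy", and "tired" share a cluster ID in this toy map, a tagger trained on one of them can generalize to the others, which is the intuition behind using cluster features for informal text.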


10.2196/17196 ◽  
2020 ◽  
Vol 22 (6) ◽  
pp. e17196 ◽  
Author(s):  
Robin Stevens ◽  
Stephen Bonett ◽  
Jacqueline Bannon ◽  
Deepti Chittamuru ◽  
Barry Slaff ◽  
...  

Background Adolescents and young adults in the age range of 13-24 years are at the highest risk of developing HIV infections. As social media platforms are extremely popular among youths, researchers can utilize these platforms to curb the HIV epidemic by investigating the associations between the discourses on HIV infections and the epidemiological data of HIV infections. Objective The goal of this study was to examine how Twitter activity among young men is related to the incidence of HIV infection in the population. Methods We used integrated human-computer techniques to characterize the HIV-related tweets by male adolescents and young male adults (age range: 13-24 years). We identified tweets related to HIV risk and prevention by using natural language processing (NLP). Our NLP algorithm identified 89.1% (2243/2517) relevant tweets, which were manually coded by expert coders. We coded 1577 HIV-prevention tweets and 17.5% (940/5372) of general sex-related tweets (including emojis, gifs, and images), and we achieved reliability with intraclass correlation at 0.80 or higher on key constructs. Bivariate and multivariate analyses were performed to identify the spatial patterns in posting HIV-related tweets as well as the relationships between the tweets and local HIV infection rates. Results We analyzed 2517 tweets that were identified as relevant to HIV risk and prevention tags; these tweets were geolocated in 109 counties throughout the United States. After adjusting for region, HIV prevalence, and social disadvantage index, our findings indicated that every 100-tweet increase in HIV-specific tweets per capita from noninstitutional accounts was associated with a multiplicative effect of 0.97 (95% CI [0.94-1.00]; P=.04) on the incidence of HIV infections in the following year in a given county. 
Conclusions Twitter may serve as a proxy of public behavior related to HIV infections, and the association between the number of HIV-related tweets and HIV infection rates further supports the use of social media for HIV disease prevention.
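To make the reported multiplicative effect concrete, the sketch below converts it into a percent change in incidence. It assumes a log-linear model in which effects compound multiplicatively across successive 100-tweet increases; the abstract does not state the model form, so that assumption and the function names are illustrative only.

```python
RATE_RATIO_PER_100 = 0.97  # multiplicative effect per 100-tweet increase, from the abstract


def incidence_multiplier(tweet_increase: float) -> float:
    """Multiplier on incidence for a given increase, assuming a log-linear model."""
    return RATE_RATIO_PER_100 ** (tweet_increase / 100)


def percent_change(tweet_increase: float) -> float:
    """Percent change in incidence implied by the multiplier."""
    return (incidence_multiplier(tweet_increase) - 1) * 100


one_step = percent_change(100)   # about a 3% decrease
two_steps = percent_change(200)  # about a 5.9% decrease, since 0.97 ** 2 = 0.9409
```

Note that the upper bound of the reported confidence interval is 1.00, which corresponds to no change, so the point estimate should be read with that uncertainty in mind.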


2019 ◽  
Author(s):  
Robin Stevens ◽  
Stephen Bonett ◽  
Jacqueline Bannon ◽  
Deepti Chittamuru ◽  
Barry Slaff ◽  
...  

BACKGROUND Adolescents and young adults in the age range of 13-24 years are at the highest risk of developing HIV infections. As social media platforms are extremely popular among youths, researchers can utilize these platforms to curb the HIV epidemic by investigating the associations between the discourses on HIV infections and the epidemiological data of HIV infections. OBJECTIVE The goal of this study was to examine how Twitter activity among young men is related to the incidence of HIV infection in the population. METHODS We used integrated human-computer techniques to characterize the HIV-related tweets by male adolescents and young male adults (age range: 13-24 years). We identified tweets related to HIV risk and prevention by using natural language processing (NLP). Our NLP algorithm identified 89.1% (2243/2517) relevant tweets, which were manually coded by expert coders. We coded 1577 HIV-prevention tweets and 17.5% (940/5372) of general sex-related tweets (including emojis, gifs, and images), and we achieved reliability with intraclass correlation at 0.80 or higher on key constructs. Bivariate and multivariate analyses were performed to identify the spatial patterns in posting HIV-related tweets as well as the relationships between the tweets and local HIV infection rates. RESULTS We analyzed 2517 tweets that were identified as relevant to HIV risk and prevention tags; these tweets were geolocated in 109 counties throughout the United States. After adjusting for region, HIV prevalence, and social disadvantage index, our findings indicated that every 100-tweet increase in HIV-specific tweets per capita from noninstitutional accounts was associated with a multiplicative effect of 0.97 (95% CI [0.94-1.00]; P=.04) on the incidence of HIV infections in the following year in a given county.
CONCLUSIONS Twitter may serve as a proxy of public behavior related to HIV infections, and the association between the number of HIV-related tweets and HIV infection rates further supports the use of social media for HIV disease prevention.


Among the foremost challenges with big data is how to go about analyzing it. What new tools are needed to be able to properly investigate and model the large quantities of highly complex, often messy data? Chapter 4 addresses this question by introducing and briefly exploring the fields of Machine Learning, Natural Language Processing, and Social Network Analysis, focusing on how these methods and toolsets can be utilized to make sense of big data. The authors provide a broad overview of tools, ideas, and caveats for each of these fields. This chapter ends with a look at how one major public university in the United States, the University of Texas at Arlington, is beginning to address some of the questions surrounding big data in an institutional setting. A list of additional readings is provided.


2019 ◽  
Vol 6 (Supplement_2) ◽  
pp. S695-S695
Author(s):  
Timothy Sullivan

Abstract Background Outpatient antibiotic misuse is common, yet it is difficult to identify and prevent. Novel methods are needed to better identify unnecessary antibiotic use in the outpatient setting. Methods The Twitter developer platform was accessed to identify Tweets describing outpatient antibiotic use in the United States between November 2018 and March 2019. Unique English-language Tweets reporting recent antibiotic use were aggregated, reviewed, and labeled as describing possible misuse or not describing misuse. Possible misuse was defined as antibiotic use for a diagnosis or symptoms for which antibiotics are not indicated based on national guidelines, or the use of antibiotics without evaluation by a healthcare provider (Figure 1). Tweets were randomly divided into training and testing sets consisting of 80% and 20% of the data, respectively. Training set Tweets were preprocessed via a natural language processing pipeline, converted into numerical vectors, and used to train a logistic regression model to predict misuse in the testing set. Analyses were performed in Python using the scikit-learn and nltk libraries. Results A total of 4000 Tweets were included, of which 1028 were labeled as describing possible outpatient antibiotic misuse. The algorithm correctly identified Tweets describing possible antibiotic misuse in the testing set with specificity = 94%, sensitivity = 55%, PPV = 75%, NPV = 87%, and area under the ROC curve = 0.91 (Figure 2). Conclusion A machine learning algorithm using Twitter data identified episodes of self-reported antibiotic misuse with good test performance, as defined by the area under the ROC curve. Analysis of Twitter data captured some episodes of antibiotic misuse, such as the use of non-prescribed antibiotics, that are not easily identified by other methods.
This approach could be used to generate novel insights into the causes and extent of antibiotic misuse in the United States, and to monitor antibiotic misuse in real time. Disclosures All authors: No reported disclosures.
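The four performance measures quoted in this abstract all derive from a 2x2 confusion matrix. The sketch below shows the standard definitions using plain Python; the counts are hypothetical and chosen for readable arithmetic, not taken from the study, and the function name is illustrative.

```python
from typing import Dict


def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Compute standard binary-classification measures from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # share of true misuse Tweets that were flagged
        "specificity": tn / (tn + fp),  # share of non-misuse Tweets correctly left unflagged
        "ppv": tp / (tp + fp),          # share of flagged Tweets that were true misuse
        "npv": tn / (tn + fn),          # share of unflagged Tweets that were truly non-misuse
    }


# Hypothetical test-set counts (NOT the study's data).
metrics = diagnostic_metrics(tp=60, fp=20, tn=180, fn=40)
```

A classifier can pair high specificity with modest sensitivity, as reported above, when it flags conservatively: few false alarms, but a sizable share of true cases missed.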


2020 ◽  
Vol 6 (30) ◽  
pp. eabb5824 ◽  
Author(s):  
Meysam Alizadeh ◽  
Jacob N. Shapiro ◽  
Cody Buntain ◽  
Joshua A. Tucker

We study how easy it is to distinguish influence operations from organic social media activity by assessing the performance of a platform-agnostic machine learning approach. Our method uses public activity to detect content that is part of coordinated influence operations based on human-interpretable features derived solely from content. We test this method on publicly available Twitter data on Chinese, Russian, and Venezuelan troll activity targeting the United States, as well as the Reddit dataset of Russian influence efforts. To assess how well content-based features distinguish these influence operations from random samples of general and political American users, we train and test classifiers on a monthly basis for each campaign across five prediction tasks. Content-based features perform well across period, country, platform, and prediction task. Industrialized production of influence campaign content leaves a distinctive signal in user-generated content that allows tracking of campaigns from month to month and across different accounts.


2021 ◽  
Author(s):  
Emmanuel Odame ◽  
Oluwabunmi Dada ◽  
Jordan Nelson ◽  
Ayorinde Ogunyiola ◽  
Jessica Ashley Haley

BACKGROUND Vaccine hesitancy remains a major barrier to the success of vaccination programs, including those for COVID-19, globally. Understanding themes in perceptions among populations regarding vaccine science can aid in improving program implementation as well as potentially reduce socially induced vaccine hesitancy. Social media has been shown to be an increasingly useful tool for rapidly understanding public perceptions regarding public health concerns including vaccine adoption. However, specific themes regarding vaccine perceptions immediately after the COVID-19 vaccine release to the public have yet to be investigated. OBJECTIVE This study aimed to investigate the perceptions surrounding the COVID-19 vaccine among United States, Brazil, and India Twitter users within weeks post vaccine release. METHODS We collected Twitter data through Meltwater software using keywords including coronavirus, vaccines, Pfizer-BioNTech, Moderna, Johnson & Johnson/Janssen, AstraZeneca, Novavax, Sinovac-Biotech, Covaxin, Covishield, Sputnik, United States, Brazil, and India in our search query. These keywords were also combined in a Boolean search style (e.g., COVID-19 AND Pfizer) to retrieve relevant social media posts on COVID-19 vaccination and vaccines. We used R software to remove usernames, weblinks and other personal information and then the NVivo 12 statistical software to analyze tweets and draw meanings through a qualitative interpretative approach. RESULTS Three key themes related to vaccine perception among 2,858 Twitter posts in the United States, Brazil, and India emerged in our analysis. These themes were mistrust in vaccine science (91.5%), religious push backs (5.4%), and politics of vaccination (3.5%). Several subthemes also emerged from these Twitter data.
CONCLUSIONS Identifying the social implications of COVID-19 vaccine hesitancy is vital both for combating vaccine-related misinformation and for providing accurate public communications about vaccine acceptance among populations globally.
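The cleaning and Boolean-filtering steps mentioned in this abstract can be sketched as follows. The authors report using R and Meltwater; this Python version is a substitute for illustration, and the regular expressions, function names, and sample tweet are assumptions rather than the study's actual rules.

```python
import re

# Simple illustrative patterns; real pipelines usually need broader rules.
USERNAME_RE = re.compile(r"@\w+")
URL_RE = re.compile(r"https?://\S+|www\.\S+")


def scrub_tweet(text: str) -> str:
    """Remove @usernames and web links, then collapse leftover whitespace."""
    text = URL_RE.sub("", text)
    text = USERNAME_RE.sub("", text)
    return " ".join(text.split())


def matches_all(text: str, *keywords: str) -> bool:
    """Boolean AND search: True only if every keyword occurs (case-insensitive)."""
    lowered = text.lower()
    return all(keyword.lower() in lowered for keyword in keywords)


# Hypothetical tweet, invented for this example.
raw = "@someone Got my COVID-19 shot from Pfizer today https://example.com/x"
clean = scrub_tweet(raw)
is_relevant = matches_all(clean, "COVID-19", "Pfizer")
```

Scrubbing before filtering keeps keywords embedded in usernames or links from producing spurious matches.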


10.2196/31983 ◽  
2021 ◽  
Vol 1 (1) ◽  
pp. e31983
Author(s):  
Arriel Benis ◽  
Anat Chatsubi ◽  
Eugene Levner ◽  
Shai Ashkenazi

Background Discussions of health issues on social media are a crucial information source reflecting real-world responses regarding events and opinions. They are often important in public health care, since these are influencing pathways that affect vaccination decision-making by hesitant individuals. Artificial intelligence methodologies based on internet search engine queries have been suggested to detect disease outbreaks and population behavior. Among social media, Twitter is a common platform of choice to search and share opinions and (mis)information about health care issues, including vaccination and vaccines. Objective Our primary objective was to support the design and implementation of future eHealth strategies and interventions on social media to increase the quality of targeted communication campaigns and therefore increase influenza vaccination rates. Our goal was to define an artificial intelligence–based approach to elucidate how threads in Twitter on influenza vaccination changed during the COVID-19 pandemic. Such findings may support adapted vaccination campaigns and could be generalized to other health-related mass communications. Methods The study comprised the following 5 stages: (1) collecting tweets from Twitter related to influenza, vaccines, and vaccination in the United States; (2) data cleansing and storage using machine learning techniques; (3) identifying terms, hashtags, and topics related to influenza, vaccines, and vaccination; (4) building a dynamic folksonomy of the previously defined vocabulary (terms and topics) to support the understanding of its trends; and (5) labeling and evaluating the folksonomy. Results We collected and analyzed 2,782,720 tweets of 420,617 unique users between December 30, 2019, and April 30, 2021. 
These tweets were in English, were from the United States, and included at least one of the following terms: “flu,” “influenza,” “vaccination,” “vaccine,” and “vaxx.” We noticed that the prevalence of the terms vaccine and vaccination increased over 2020, and that “flu” and “covid” occurrences were inversely correlated as “flu” disappeared over time from the tweets. By combining word embedding and clustering, we then identified a folksonomy built around the following 3 topics dominating the content of the collected tweets: “health and medicine (biological and clinical aspects),” “protection and responsibility,” and “politics.” By analyzing terms frequently appearing together, we noticed that the tweets were related mainly to COVID-19 pandemic events. Conclusions This study focused initially on vaccination against influenza and moved to vaccination against COVID-19. Infoveillance supported by machine learning on Twitter and other social media about topics related to vaccines and vaccination against communicable diseases and their trends can lead to the design of personalized messages encouraging targeted subpopulations’ engagement in vaccination. A greater likelihood that a targeted population receives a personalized message is associated with higher response, engagement, and proactiveness of the target population for the vaccination process.
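The inverse relationship between "flu" and "covid" occurrences reported here can be illustrated by counting monthly term occurrences and computing a Pearson correlation. The tweets below are invented toy data, the helper names are hypothetical, and plain substring matching stands in for whatever term identification the authors actually used.

```python
from collections import Counter
from math import sqrt
from typing import Iterable, List, Sequence, Tuple


def monthly_term_counts(
    tweets: Iterable[Tuple[str, str]], term: str, months: List[str]
) -> List[int]:
    """Count tweets per month whose text contains the term (substring match)."""
    counts = Counter(month for month, text in tweets if term in text.lower())
    return [counts[month] for month in months]


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Hypothetical (month, text) pairs, NOT data from the study.
toy_tweets = [
    ("2020-01", "flu shot today"),
    ("2020-01", "flu season vaccine"),
    ("2020-01", "flu or covid vaccine?"),
    ("2020-02", "flu vaccine done"),
    ("2020-02", "covid vaccine news and flu"),
    ("2020-02", "covid vaccine trial"),
    ("2020-03", "covid vaccine update"),
    ("2020-03", "covid vaccination plans"),
    ("2020-03", "flu vs covid vaccine"),
]
months = ["2020-01", "2020-02", "2020-03"]
flu_counts = monthly_term_counts(toy_tweets, "flu", months)
covid_counts = monthly_term_counts(toy_tweets, "covid", months)
r = pearson(flu_counts, covid_counts)  # strongly negative for this toy data
```

Substring matching would also count "influenza" as a "flu" hit, so a real analysis would likely tokenize the text first.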


2020 ◽  
Author(s):  
Wallace Chipidza ◽  
Jie Yan

There is vigorous debate as to whether influential social media platforms like Twitter and Facebook should censor objectionable posts by government officials in the United States and elsewhere. Although these platforms have resisted pressure to censor such posts in the past, Twitter recently flagged five posts by the United States President Donald J. Trump on the rationale that the tweets contained inaccurate or inflammatory content. In this paper, we examine preliminary evidence as to whether these posts were retweeted less or more than expected. We employ 10 machine learning (ML) algorithms to estimate the expected number of retweets based on 8 features of each tweet from historical data since President Trump was elected: number of likes, word count, readability, polarity, subjectivity, presence of link or multimedia content, time of day of posting, and number of days since Trump’s election. Our results indicate agreement from all 10 ML algorithms that the three flagged tweets for which we had retweet data were retweeted at higher rates than expected. These results suggest that flagging tweets by government officials might be counterproductive to limiting the spread of content deemed objectionable by social media platforms.
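The idea of comparing observed retweets against a model's expectation can be sketched with a much simpler stand-in: ordinary least squares on a single feature (likes). The paper describes 10 ML algorithms and 8 features; the one-feature model, the numbers, and the names below are all hypothetical and serve only to show the comparison.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for one feature; returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return slope, mean_y - slope * mean_x


# Hypothetical historical tweets: likes and retweets (NOT data from the paper).
likes = [100, 200, 300, 400]
retweets = [30, 50, 70, 90]
slope, intercept = fit_line(likes, retweets)


def expected_retweets(like_count: float) -> float:
    """Retweets predicted by the fitted line for a given number of likes."""
    return slope * like_count + intercept


# A hypothetical flagged tweet: a positive excess means more retweets than expected.
flagged_likes, flagged_retweets = 250, 80
excess = flagged_retweets - expected_retweets(flagged_likes)
```

Agreement across several different model families, as the paper reports, gives more confidence in the sign of the excess than any single fitted model would.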

