Semantic Analysis of Open Source Data for Syndromic Surveillance

Author(s):  
Erica Briscoe ◽  
Scott Appling ◽  
Edward Clarkson ◽  
Nikolay Lipskiy ◽  
James Tyson ◽  
...  

Objective: The objective of this analysis is to leverage recent advances in natural language processing (NLP) to develop new methods and system capabilities for processing social media (Twitter messages) for situational awareness (SA), syndromic surveillance (SS), and event-based surveillance (EBS). Specifically, we evaluated the use of human-in-the-loop semantic analysis to assist public health (PH) SA stakeholders in SS and EBS using massive amounts of publicly available social media data.
Introduction: Social media messages are often short, informal, and ungrammatical. They frequently involve text, images, audio, or video, which makes the identification of useful information difficult. This complexity reduces the efficacy of standard information extraction techniques [1]. However, recent advances in NLP, especially methods tailored to social media [2], have shown promise in improving real-time PH surveillance and emergency response [3]. Surveillance data derived from semantic analysis, combined with traditional surveillance processes, has the potential to improve event detection and characterization. The CDC Office of Public Health Preparedness and Response (OPHPR), Division of Emergency Operations (DEO) and the Georgia Tech Research Institute have collaborated on the advancement of PH SA through the development of new approaches to using semantic analysis for social media.
Methods: To understand how computational methods may benefit SS and EBS, we studied an iterative refinement process in which the data user actively cultivated text-based topics ("semantic culling") in a semi-automated SS process. This "human-in-the-loop" process was critical for creating accurate and efficient extraction functions in large, dynamic volumes of data. The general process involved identifying a set of expert-supplied keywords, which were used to collect an initial set of social media messages. For purposes of this analysis, researchers applied topic modeling to categorize related messages into clusters. Topic modeling uses statistical techniques to semantically cluster messages and automatically determine salient aggregations. A user then semantically culled messages according to their PH relevance. In June 2016, researchers collected 7,489 worldwide English-language Twitter messages (tweets) and compared three sampling methods: a baseline random sample (C1, n=2700), a keyword-based sample (C2, n=2689), and one gathered after semantically culling irrelevant messages from the C2 topics (C3, n=2100). Researchers utilized a software tool, Luminoso Compass [4], to sample and perform topic modeling using its real-time modeling and Twitter integration features. For C2 and C3, researchers sampled tweets that the Luminoso service matched to both clinical and layman definitions of Rash, Gastro-Intestinal syndromes [5], and Zika-like symptoms. Layman terms were derived from clinical definitions using plain-language medical thesauri. ANOVA statistics were calculated using SPSS software, version. Post-hoc pairwise comparisons were completed using Tukey's honest significant difference (HSD) test.
Results: An ANOVA was conducted, finding the following mean relevance values: 3% (+/- 0.01%), 24% (+/- 6.6%) and 27% (+/- 9.4%) for C1, C2, and C3, respectively. Post-hoc pairwise comparison tests showed that the percentages of discovered messages related to the event tweets using the C2 and C3 methods were significantly higher than for the C1 method (random sampling) (p<0.05). This indicates that the human-in-the-loop approach provides benefits in filtering social media data for SS and EBS; notably, this increase is on the basis of a single iteration of semantic culling, and subsequent iterations could be expected to increase the benefits.
Conclusions: This work demonstrates the benefits of incorporating non-traditional data sources into SS and EBS. It was shown that an NLP-based extraction method in combination with human-in-the-loop semantic analysis may enhance the potential value of social media (Twitter) for SS and EBS. It also supports the claim that advanced analytical tools for processing non-traditional SA, SS, and EBS sources, including social media, have the potential to enhance disease detection, risk assessment, and decision support by reducing the time it takes to identify public health events.
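The keyword sampling and semantic culling steps described in the Methods can be sketched roughly as follows. This is a minimal Python illustration, not the Luminoso Compass workflow the researchers used: the keyword list, the messages, and the reviewer's decision are made up, and the `assign_topic` rule is a toy stand-in for statistical topic modeling.

```python
from collections import defaultdict

# Hypothetical expert-supplied keywords (clinical and layman terms).
KEYWORDS = {"rash", "hives", "vomiting", "diarrhea"}


def keyword_sample(messages, keywords):
    """Keep messages containing at least one keyword (a C2-style sample)."""
    return [m for m in messages if keywords & set(m.lower().split())]


def assign_topic(message, keywords):
    """Toy stand-in for topic modeling: label a message by its first keyword."""
    for word in message.lower().split():
        if word in keywords:
            return word
    return "other"


def cull(messages, keywords, irrelevant_topics):
    """Drop every message whose topic a reviewer marked irrelevant (C3-style)."""
    topics = defaultdict(list)
    for m in messages:
        topics[assign_topic(m, keywords)].append(m)
    return [m for t, msgs in topics.items()
            if t not in irrelevant_topics for m in msgs]


def relevance(messages, is_relevant):
    """Fraction of messages a reviewer judges relevant to public health."""
    return sum(map(is_relevant, messages)) / len(messages) if messages else 0.0


# Made-up messages, for illustration only.
messages = [
    "woke up with a rash on both arms",
    "this song gives me hives lol",
    "vomiting all night after the picnic",
    "great game last night",
]
c2 = keyword_sample(messages, KEYWORDS)
# Assumption: the reviewer inspects the "hives" topic and marks it irrelevant.
c3 = cull(c2, KEYWORDS, irrelevant_topics={"hives"})
```

In this toy run the culling step removes the figurative "hives" message, which is the kind of gain in relevance a single culling iteration is meant to provide.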

2019 ◽  
Vol 29 (Supplement_4) ◽  
Author(s):  

Abstract
Digital health has revolutionised healthcare, with implications for understanding public reaction to health emergencies and interventions. Social media provides a space where like-minded people can share interests and concerns in real time, regardless of their location. This can be a force for good, as platforms like Twitter can spread correct information about outbreaks, as in the 2009 swine flu pandemic. However, social media can also disseminate incorrect information or deliberately spread misinformation, leading to adverse public health sentiment and outcomes. The current issue of trust in vaccines is the best-known example. Vaccine hesitancy, traditionally linked to issues of trust, misinformation and prior beliefs, has been increasingly fueled by influential groups on social media and the Internet. Ultimately, anti-vaccination movements have the potential to lead to outbreaks of vaccine-preventable diseases, especially if refusal is concentrated locally, creating vulnerable populations. For example, 2018-19 saw a large increase in the incidence of measles in the US and Europe (where cases tripled from 2017), two regions where the disease had already been eliminated or almost eliminated. In 2019, the World Health Organisation listed anti-vaccination movements as one of the top 10 threats to global public health. HPV vaccination is another example of the impact of anti-vaccination movements. As viral videos originating on YouTube spread across social networks, uptake has tumbled in a number of countries, with Japan, Denmark, Colombia and Ireland being badly hit. In Japan, the government came under sufficient pressure that it de-recommended the HPV vaccine, and an 80% uptake rate fell below 1% in 2014. There have been reports of successful interventions by national governments.
A recent campaign run by the HPV Alliance (a coalition of some 35 private companies, charities and public institutions) in Ireland has brought rates that had fallen below 40% back up to a national average of 75%. A combination of hard-hitting personal testimonials, social media and traditional media promoted the HPV vaccine. Despite this, systematic engagement and supranational strategies are still in the early stages of being formulated. As misleading information spread through social media and digital networks has an undesirable impact on attitudes to vaccination (and on uptake rates), urgent action is required. Analysis and visualisation techniques that mine data streams from social media platforms such as Twitter and YouTube enable real-time understanding of vaccine sentiments and information flows. By identifying key influencers and flashpoints in articles about vaccination that go viral, targeted public health responses could be developed. This roundtable discussion will showcase different ways in which media and social networks, accessible in real time, provide an opportunity for detecting a change in public confidence in vaccines, for identifying users and rumors, and for assessing potential impact in order to know how best to respond.
Key messages: Social media has significantly enhanced our understanding of anti-vaccination movements and their potential impact on public health attitudes and behaviors regarding vaccination. Innovative methods of analysing social media data, from digital health, data science and computer science, have an important role in developing health promotions to counter anti-vaccination movements.


2020 ◽  
Vol 30 (Supplement_5) ◽  
Author(s):  
W De Caro

Abstract
Introduction: The COVID-19 epidemic led to heavy use of social media to comment on and spread information from a wide range of sources. An infodemic is an excessive amount of circulating information, which makes it difficult for communities to orient themselves on a given topic because reliable sources are hard to identify. Using text mining analysis, it is possible to identify what drives public conversation and the impact of COVID-19.
Methods: Public perception in emergencies is traditionally measured with surveys. However, to gain a global view of the pandemic, Twitter is a powerful tool that allows real-time monitoring of public perception. The study aimed to: 1) monitor the use of the terms "Covid-19" or "Coronavirus" over time; and 2) conduct a specific text and sentiment analysis.
Results: Between January 10 and May 8, 2020, over 600 million tweets were retrieved. Of those, 600,000 tweets were randomly selected, coded, and analyzed. About 10% of cases were identified as misinformation. Public figures, experts in public health, and virologists were more popular sources than official government and health agencies. There is a positive correlation between Twitter activity peaks and COVID-19 infection peaks. Text mining and content analysis were carried out to identify how emotions and sentiments changed over time. This analysis, particularly during the lockdown, clearly shows that participation on social media can potentially have an effect on building social capital and social support.
Conclusions: This study confirms that using social media to conduct infodemic studies is an important area of development in the public health arena. COVID-19 tweets were primarily used to disseminate information from credible sources, but were also a source of opinions, emotions and experiences.
Tweets can be used for real-time content analysis and knowledge translation research, allowing health authorities to respond to public concerns.
Key messages: Social media is crucial for health information. Infodemic studies offer a new way to study health.
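The first study aim, monitoring how often the tracked terms appear over time, can be illustrated with a short Python sketch. It is not the authors' pipeline: the tweets below are made up, and the substring matching rule and the `daily_term_counts` and `peak_day` helpers are assumptions introduced for illustration.

```python
from collections import Counter
from datetime import date

# Search terms named in the abstract, lowercased for matching.
TERMS = ("covid-19", "coronavirus")


def daily_term_counts(tweets, terms=TERMS):
    """Count, per day, the tweets whose text mentions any tracked term."""
    counts = Counter()
    for day, text in tweets:
        lowered = text.lower()
        if any(term in lowered for term in terms):
            counts[day] += 1
    return counts


def peak_day(counts):
    """Return the day with the most matching tweets, or None if there are none."""
    return max(counts, key=counts.get) if counts else None


# Made-up (date, text) pairs, for illustration only.
tweets = [
    (date(2020, 3, 9), "Coronavirus cases rising in my city"),
    (date(2020, 3, 10), "Lockdown starts today #COVID-19"),
    (date(2020, 3, 10), "Stay home, coronavirus is serious"),
    (date(2020, 3, 10), "Made pasta for dinner"),
]
counts = daily_term_counts(tweets)
```

A daily series like `counts` is the kind of input one would compare against infection counts to look for the activity peaks the Results describe.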


2015 ◽  
Vol 7 (1) ◽  
Author(s):  
Ryan M. Arnold ◽  
Wesley McNeely ◽  
Kasimu Muhetaer ◽  
Biru Yang ◽  
Raouf R. Arafat

Firearm-related injuries pose a substantial public health risk in the United States, and traditional means of studying this issue rely primarily on retrospective analyses. Syndromic surveillance data, collected in over 30 Houston area emergency departments, are well suited to characterizing and analyzing gunshot injuries in the area in near real time. Over the past two years, more than 900 gunshot-related injury visits were identified using this method, and ArcGIS effectively identified incident densities in ZIP codes throughout Houston. Most patients were male (86.3%), and most were between the ages of 18 and 34 (64.7%).


2018 ◽  
Vol 10 (1) ◽  
Author(s):  
Kayley Dotson ◽  
Mandy Billman

Objective: To identify surveillance coverage gaps in emergency department (ED) and urgent care facility data due to missing discharge diagnoses.
Introduction: Indiana utilizes the Electronic Surveillance System for the Early Notification of Community-Based Epidemics (ESSENCE) to collect and analyze data from participating hospital emergency departments. This real-time collection of health-related data is used to identify disease clusters and unusual disease occurrences. By Administrative Code, the Indiana State Department of Health (ISDH) requires electronic submission of chief complaints from patient visits to EDs. Submission of discharge diagnosis is not required by Indiana Administrative Code, leaving coverage gaps. Our goal was to identify which areas in the state may see underreporting or incomplete surveillance due to the lack of the discharge diagnosis field.
Methods: Emergency department data were collected from Indiana hospitals and urgent care clinics via ESSENCE. Discharge diagnosis was analyzed by submitting facility to determine the percent completeness of visits. A descriptive analysis was conducted to identify the distribution of facilities that provide discharge diagnosis. A random sample of 20 days of data was extracted from visits that occurred between January 1, 2017 and September 6, 2017.
Results: A random sample of 179,039 (8%) ED entries from a total of 2,220,021 was analyzed from 121 reporting facilities. Of the sample entries, 102,483 (57.24%) were missing the discharge diagnosis field. Over 40 (36%) facilities were missing more than 90% of discharge diagnosis data. Facilities are more likely to be missing >90% or <19% of discharge diagnoses than a share between those points. Comparing the percent of syndromic surveillance entries missing discharge diagnosis across facilities reveals large variability. For example, some facilities provide no discharge diagnoses while other facilities provide 100%. The number of facilities missing 100% of discharge diagnoses (n = 19) is 6.3 times that of the facilities that are missing 0% (n = 3). The largest coverage gap was identified in Public Health Preparedness District (PHPD) [1] three (93.16%), with districts five (64.97%), seven (61.94%), and four (61.34%) making up the rest of the lowest-reporting districts. See Table 2 and Figure 12 for percent missing by district and geographic distribution. PHPD three and five contain a large proportion (38%) of the sample population ED visits, which results in a coverage gap in the most populated areas of the state.
Conclusions: Querying ESSENCE via chief complaint data is useful for real-time surveillance, but is more informative when discharge diagnoses are available. Indiana does not require facilities to report discharge diagnosis, but regulatory changes are being proposed that would require submission of discharge diagnosis data to ISDH. The addition of discharge diagnoses is intended to improve the completeness of surveillance data on disease clusters and unusual disease occurrences.
References:
1. Preparedness Districts [Internet]. Indianapolis (IN): Indiana State Department of Health, Public Health Preparedness; 2017 [Cited 2017 Sept 20]. Available from: https://www.in.gov/isdh/17944.htm.
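The per-facility completeness measure described in the Methods can be sketched in Python as below. The record layout, facility names, and diagnosis values are made-up placeholders (ESSENCE exports are not reproduced here), and the helper names are assumptions; only the final line uses figures from the abstract.

```python
from collections import defaultdict


def percent_missing_by_facility(visits):
    """Map each facility to the percentage of its visits lacking a discharge diagnosis."""
    total = defaultdict(int)
    missing = defaultdict(int)
    for facility, diagnosis in visits:
        total[facility] += 1
        # Treat None, empty, and whitespace-only values as missing.
        if not (diagnosis or "").strip():
            missing[facility] += 1
    return {f: 100.0 * missing[f] / total[f] for f in total}


def facilities_above(percentages, threshold=90.0):
    """List facilities whose missing percentage exceeds the threshold."""
    return sorted(f for f, pct in percentages.items() if pct > threshold)


# Made-up (facility, discharge diagnosis) records, for illustration only.
visits = [
    ("Facility A", "J10.1"),
    ("Facility A", ""),
    ("Facility B", None),
    ("Facility B", ""),
    ("Facility C", "R11.2"),
]
percentages = percent_missing_by_facility(visits)

# Check of the overall figure reported in the Results (102,483 of 179,039).
overall = round(100.0 * 102_483 / 179_039, 2)
```

Grouping by facility rather than reporting only the statewide rate is what exposes the pattern noted in the Results, where facilities cluster at nearly complete or nearly absent reporting.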


2019 ◽  
Author(s):  
Abhisek Chowdhury

Social media feeds are rapidly emerging as a novel avenue for the contribution and dissemination of geographic information. Among these, Twitter, a popular micro-blogging service, has recently gained tremendous attention for its real-time nature. For instance, during floods people usually tweet, which enables prompt detection of flood events by observing Twitter feeds. In this paper, we propose a framework to investigate the real-time interplay between a catastrophic event, such as a flood, and people's reactions, such as tweets, in order to identify disaster zones. We demonstrate our approach using the tweets following a flood in the state of Bihar in India during 2017 as a case study. We construct a classifier for semantic analysis of the tweets in order to classify them into flood and non-flood categories. Subsequently, we apply natural language processing methods to extract information on flood-affected areas and use elevation maps to identify potential disaster zones.
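The abstract does not say which classifier was built, so the following Python sketch swaps in a small multinomial naive Bayes model with add-one smoothing purely to illustrate the flood versus non-flood classification step. The training tweets are made up, and the function names are assumptions.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenization (a deliberate simplification)."""
    return text.lower().split()


def train(examples):
    """Fit a multinomial naive Bayes model from (text, label) pairs."""
    word_counts = {}
    label_counts = Counter()
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for token in tokenize(text):
            counts[token] += 1
            vocab.add(token)
    return word_counts, label_counts, vocab


def classify(model, text):
    """Return the label with the highest smoothed log-probability."""
    word_counts, label_counts, vocab = model
    n = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        total = sum(word_counts[label].values())
        score = math.log(count / n)
        for token in tokenize(text):
            if token in vocab:  # words never seen in training are ignored
                score += math.log(
                    (word_counts[label][token] + 1) / (total + len(vocab))
                )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Made-up training tweets, for illustration only.
examples = [
    ("water entered our village homes flooded", "flood"),
    ("river overflowing roads flooded near riverbank", "flood"),
    ("rescue boats needed flood water rising", "flood"),
    ("watching cricket match tonight", "non-flood"),
    ("new phone launch tomorrow", "non-flood"),
    ("traffic jam near the market", "non-flood"),
]
model = train(examples)
```

Calling `classify(model, "roads flooded and water rising")` assigns the flood label on this toy data; a real system would need far more training data and would still require the location extraction step the abstract describes.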


2020 ◽  
Author(s):  
Hyeju Jang ◽  
Emily Rempel ◽  
David Roth ◽  
Giuseppe Carenini ◽  
Naveed Z. Janjua

BACKGROUND: Social media is a rich source from which we can learn about people's reactions to social issues. As COVID-19 has significantly affected people's lives, it is essential to capture how people react to public health interventions and to understand their concerns.
OBJECTIVE: We aim to investigate people's reactions and concerns about COVID-19 in North America, focusing especially on Canada.
METHODS: We analyze COVID-19 related tweets using topic modeling and aspect-based sentiment analysis (ABSA), and interpret the results with public health experts. To generate insights on the effectiveness of specific public health interventions for COVID-19, we compare timelines of the topics discussed with the timing of implementation of interventions, also including information on people's sentiment about COVID-19 related aspects in our analysis. In addition, to further investigate anti-Asian racism, we compare timelines of sentiments for Asians and Canadians.
RESULTS: Topic modeling identified 20 topics, and public health experts provided interpretations of the topics based on top-ranked words and representative tweets for each topic. The interpretation and timeline analysis showed that the discovered topics and their trends are highly related to public health promotions and interventions, such as physical distancing, border restrictions, hand washing, staying home, and face coverings. After training the ABSA model with a human in the loop, we obtained 545 aspect terms (e.g., "vaccines", "economy", and "masks") and 60 opinion terms (e.g., "infectious": negative, and "professional": positive), which were used to infer sentiments for 20 selected aspects. The results showed negative sentiments related to the overall outbreak, misinformation, and Asians, and positive sentiments related to physical distancing.
CONCLUSIONS: Analyses using Natural Language Processing (NLP) techniques with domain expert involvement can produce useful information for public health.
This study is the first to analyze COVID-19 related tweets in Canada in comparison with tweets in the United States by using topic modeling and human-in-the-loop domain-specific aspect-based sentiment analysis. This kind of information could help public health agencies understand public concerns and see which public health messages are resonating with the populations who use Twitter, which can inform the design of policies for new interventions.
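The idea of pairing aspect terms with opinion terms can be illustrated with a simple lexicon lookup in Python. This is a rough stand-in rather than the authors' trained ABSA model: the aspect and opinion terms are the examples quoted in the abstract, while the tweets, the token window, and the `aspect_sentiment` helper are assumptions made for illustration.

```python
from collections import defaultdict

# Example terms quoted in the abstract; polarity is -1 (negative) or +1 (positive).
ASPECTS = {"vaccines", "economy", "masks"}
OPINIONS = {"infectious": -1, "professional": 1}


def aspect_sentiment(tweets, aspects=ASPECTS, opinions=OPINIONS, window=3):
    """Sum opinion polarities found within `window` tokens of each aspect term."""
    scores = defaultdict(int)
    for text in tweets:
        # Strip surrounding punctuation so "masks!" matches "masks".
        tokens = [t.strip(".,!?\"'").lower() for t in text.split()]
        for i, token in enumerate(tokens):
            if token in aspects:
                nearby = tokens[max(0, i - window): i + window + 1]
                scores[token] += sum(opinions.get(t, 0) for t in nearby)
    return dict(scores)


# Made-up tweets, for illustration only.
tweets = [
    "Staff handing out masks were very professional.",
    "The economy is struggling this month.",
    "Is it still infectious after vaccines?",
]
scores = aspect_sentiment(tweets)
```

A fixed window is a crude proxy for the link between an aspect and its opinion word; the human-in-the-loop training described in the abstract is what would let a real model learn those links and the domain-specific term lists.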


2020 ◽  
Vol 27 (3) ◽  
Author(s):  
Anneliese Depoux ◽  
Sam Martin ◽  
Emilie Karafillakis ◽  
Raman Preet ◽  
Annelies Wilder-Smith ◽  
...  

We need to rapidly detect and respond to public rumours, perceptions, attitudes and behaviours around COVID-19 and control measures. The creation of an interactive platform and dashboard to provide real-time alerts of rumours and concerns about coronavirus spreading globally would enable public health officials and relevant stakeholders to respond rapidly with a proactive and engaging narrative that can mitigate misinformation.


2017 ◽  
Vol 48 (3) ◽  
pp. 588-607 ◽  
Author(s):  
James P. Houghton ◽  
Michael Siegel ◽  
Stuart Madnick ◽  
Nobuaki Tounaka ◽  
Kazutaka Nakamura ◽  
...  

The potential of social media to give insight into the dynamic evolution of public conversations, and into their reactive and constitutive role in political activities, has to date been underdeveloped. While topic modeling can give static insight into the structure of a conversation, and keyword volume tracking can show how engagement with a specific idea varies over time, there is need for a method of analysis able to understand how conversations about societal values evolve and react to events in the world by incorporating new ideas and relating them to existing themes. In this article, we propose a method for analyzing social media messages that formalizes the structure of public conversations and allows the sociologist to study the evolution of public discourse in a rigorous, replicable, and data-driven fashion. This approach may be useful to those studying the social construction of meaning, the origins of factionalism and internecine conflict, or boundary-setting and group-identification exercises and has potential implications.

