Spikes and Variance: Using Google Trends to Detect and Forecast Protests

2021 ◽  
pp. 1-18
Author(s):  
Joan C. Timoneda ◽  
Erik Wibbels

Abstract Google search is ubiquitous, and Google Trends (GT) is a potentially useful access point for big data on many topics the world over. We propose a new ‘variance-in-time’ method for forecasting events using GT. By collecting multiple and overlapping samples of GT data over time, our algorithm leverages variation in both the mean and the variance of a search term to accommodate some idiosyncrasies of the GT platform. To elucidate our approach, we use it to forecast protests in the United States. We use data from the Crowd Counting Consortium between 2017 and 2019 to build a sample of true protest events as well as a synthetic control group where no protests occurred. The model’s out-of-sample forecasts predict protests with higher accuracy than extant work using structural predictors, high-frequency event data, or other sources of big data such as Twitter. Our results provide new insights into work specifically on political protests, while providing a general approach to GT that should be useful to researchers of many important, if rare, phenomena.
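The abstract describes the variance-in-time idea only at a high level. The following Python sketch is a hypothetical illustration, not the authors' algorithm: it assumes several overlapping GT samples of the same term and window have already been downloaded as equal-length lists of daily values, computes the per-day mean and variance across samples, and converts each into a z-score against a trailing window so that both could feed a downstream classifier. The function names and the 7-day window are assumptions.

```python
from statistics import mean, pstdev, pvariance


def sample_moments(samples):
    """Per-day mean and population variance across overlapping GT samples.

    `samples` is a list of equal-length lists; each inner list is one
    download of the same term and time window (hypothetical input format).
    """
    days = list(zip(*samples))  # transpose: one tuple of sample values per day
    return [mean(d) for d in days], [pvariance(d) for d in days]


def trailing_z(series, window=7):
    """z-score of each value against the preceding `window` values.

    Returns 0.0 when fewer than two past values exist or they are flat.
    """
    out = []
    for t, x in enumerate(series):
        past = series[max(0, t - window):t]
        sd = pstdev(past) if len(past) >= 2 else 0.0
        out.append((x - mean(past)) / sd if sd > 0 else 0.0)
    return out


def spike_features(samples, window=7):
    """(mean z, variance z) per day, as inputs to a hypothetical classifier."""
    means, variances = sample_moments(samples)
    return list(zip(trailing_z(means, window), trailing_z(variances, window)))


# Three toy downloads that agree on the trend but disagree sharply on the last day.
samples = [
    [10, 12, 10, 12, 60],
    [11, 12, 9, 12, 40],
    [9, 12, 11, 12, 50],
]
print(spike_features(samples)[-1])
```

On the last toy day both the cross-sample mean and the cross-sample variance jump relative to their recent history, which is the kind of joint signal the abstract says the method exploits.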

Lupus ◽  
2017 ◽  
Vol 26 (8) ◽  
pp. 886-889 ◽  
Author(s):  
M Radin ◽  
S Sciascia

Objective People affected by chronic rheumatic conditions, such as systemic lupus erythematosus (SLE), frequently rely on the Internet and search engines to look up terms related to their disease and its possible causes, symptoms and treatments. ‘Infodemiology’ and ‘infoveillance’ are two recent terms coined to describe a new, developing approach to public health based on Big Data monitoring and data mining. In this study, we aimed to investigate trends in Internet searches linked to SLE and its associated symptoms, applying a Big Data monitoring approach. Methods We analysed the large amount of data generated by Google Trends for ‘lupus’, ‘relapse’ and ‘fatigue’ over a 10-year period. Google Trends automatically normalized the data to the overall number of searches and presented them as relative search volumes, allowing variations in different search terms to be compared across regions and periods. The Mann–Kendall test was used to evaluate the overall seasonal trend of each search term and possible correlations between search terms. Results We observed seasonality in Google search volumes for lupus-related terms. In the Northern hemisphere, relative search volumes for ‘lupus’ were correlated with ‘relapse’ (τ = 0.85; p = 0.019) and with ‘fatigue’ (τ = 0.82; p = 0.003), whereas in the Southern hemisphere we observed a significant correlation between ‘fatigue’ and ‘relapse’ (τ = 0.85; p = 0.018). A significant correlation between ‘fatigue’ and ‘relapse’ (τ = 0.70; p < 0.001) was also seen in the Northern hemisphere. Conclusion Despite the intrinsic limitations of this approach, Internet-acquired data might serve as a real-time surveillance tool and an alert for healthcare systems, helping them plan the most appropriate resources at specific times of higher disease burden.
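For readers unfamiliar with the Mann–Kendall test, this sketch computes its basic (non-seasonal) form: the S statistic counts increasing minus decreasing pairs over time, τ rescales S to the range −1 to 1, and a continuity-corrected normal approximation gives a two-sided p-value. It omits the tie correction to the variance, a simplification that matters for integer-valued GT series, and it does not implement the seasonal Kendall variant (which computes S within each season and sums the results). The toy monthly series is made up.

```python
import math


def mann_kendall(x):
    """Basic Mann-Kendall trend test without tie correction.

    Returns (S, tau, z, p): S = #increasing pairs - #decreasing pairs,
    tau = S / (n(n-1)/2), z = continuity-corrected normal score,
    p = two-sided p-value from the normal approximation.
    """
    n = len(x)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = x[j] - x[i]
            s += (diff > 0) - (diff < 0)  # +1, -1, or 0 for a tie
    tau = s / (n * (n - 1) / 2)
    var_s = n * (n - 1) * (2 * n + 5) / 18  # variance of S assuming no ties
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return s, tau, z, p


monthly_rsv = [40, 42, 41, 45, 47, 46, 50, 52, 51, 55, 57, 58]  # toy data
print(mann_kendall(monthly_rsv))
```

The between-term τ values reported above are Kendall rank correlations between two series, which count concordant versus discordant pairs in the same way this statistic counts rising versus falling pairs against time.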


2018 ◽  
Vol 33 (4) ◽  
pp. 611-615 ◽  
Author(s):  
Zachary H. Hopkins ◽  
Aaron M. Secrest

Purpose: Google Trends (GT) offers insights into public interests and behaviors and holds potential for guiding public health campaigns. We evaluated trends in US searches for sunscreen, sunburn, skin cancer, and melanoma and their relationships with melanoma outcomes. Design: Google Trends was queried for US search volumes from 2004 to 2017. Time-matched search term data were correlated with melanoma outcomes data from Surveillance Epidemiology and End Results Program and United States Cancer Statistics databases (2004-2014 and 2010-2014, respectively). Setting: Users of the Google search engine in the United States. Participants: Google search engine users in the United States. This represents approximately 65% of the population. Measures: Search volumes, melanoma outcomes. Analysis: Pearson correlations between search term volumes, time, and national melanoma outcomes. Spearman correlations between state-level search data and melanoma outcomes. Results: The terms “sunscreen,” “sunburn,” “skin cancer,” and “melanoma” were all highly correlated (P < .001), with sunscreen and sunburn having the greatest correlation (r = 0.95). Sunscreen/sunburn searches have increased over time, but skin cancer/melanoma searches have decreased (P < .05). Nationally, sunscreen, sunburn, and skin cancer were significantly correlated with melanoma incidence. At the state level, only sunscreen and melanoma searches were significantly correlated with melanoma incidence. Conclusions: We conclude that online skin cancer prevention campaigns should focus on the search terms “sunburn” and “sunscreen,” given the decreasing online searches for skin cancer and melanoma. This is reinforced by the finding that sunscreen searches are higher in areas with higher melanoma incidence.
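The study pairs Pearson correlations for national time series with Spearman correlations for state-level data. A stdlib-only sketch of both follows, assuming the two inputs are already aligned (by year or by state); Spearman's ρ is computed as Pearson's r on average ranks, which handles ties. P-values are omitted, and the toy numbers are invented.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def spearman(x, y):
    """Spearman's rho as Pearson's r on ranks."""
    return pearson(ranks(x), ranks(y))


state_sunscreen = [55, 62, 48, 70, 66]   # toy state-level search interest
state_incidence = [21, 25, 19, 30, 24]   # toy incidence per 100,000
print(pearson(state_sunscreen, state_incidence), spearman(state_sunscreen, state_incidence))
```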


2019 ◽  
Vol 40 (11) ◽  
pp. 1253-1262 ◽  
Author(s):  
Jonathan D Tijerina ◽  
Shane D Morrison ◽  
Ian T Nolan ◽  
Matthew J Parham ◽  
Rahim Nazerali

Abstract Background Google Trends (GT) provides cost-free, customizable analyses of search traffic for specified terms entered into Google’s search engine. GT may inform plastic surgery marketing decisions and resource allocation. Objectives The aim of this study was to determine GT’s utility in tracking and predicting public interest in nonsurgical cosmetic procedures and to examine trends over time of public interest in nonsurgical procedures. Methods GT search volume for terms in 6 ASPS and ASAPS nonsurgical procedure categories (Botox injections, chemical peel, laser hair removal, laser skin resurfacing, microdermabrasion and soft tissue fillers [subcategories: collagen, fat, and hyaluronic acid]) were compared with ASPS and ASAPS case volumes for available dates between January 2004 and March 2019 with the use of univariate linear regression, taking P < 0.01 as the cutoff for significance. Results Total search volume varied by search term within the United States and internationally. Significant positive correlations were demonstrated for 17 GT terms in all 6 ASPS and ASAPS categories: “Botox®,” “collagen injections,” “collagen lip injections” with both databases; and “chemical skin peel,” “skin peel,” “acne scar treatment,” “CO2 laser treatment,” “dermabrasion,” “collagen injections,” “collagen lip injections,” “fat transfer,” “hyaluronic acid fillers,” “hyaluronic acid injection,” “hyaluronic acid injections,” “Juvederm®,” and “fat transfer” with just 1 database. Many search terms were not significant, emphasizing the need for careful selection of search terms. Conclusions Our analysis further elaborates on recent characterization of GT as a powerful and intuitive data set for plastic surgeons, with the potential to accurately gauge global and national interest in topics and procedures related to nonsurgical cosmetic procedures.
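A univariate least-squares fit of case volume on GT interest can be sketched as follows. It assumes yearly pairs (for example, mean annual GT interest against annual society-reported case counts); that aggregation is an assumption, since the abstract does not say how the series were aligned. The function returns the t statistic for the slope; turning it into a p-value for the P < 0.01 cutoff needs the t distribution with n − 2 degrees of freedom (for example `scipy.stats.t.sf`), which is left out to keep the block dependency-free.

```python
from statistics import mean


def simple_ols(x, y):
    """Fit y = a + b*x by ordinary least squares.

    Returns (intercept, slope, r_squared, t_slope), where t_slope is the
    t statistic for H0: slope = 0 with n - 2 degrees of freedom.
    """
    n = len(x)
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    sse = sum(e ** 2 for e in residuals)
    sst = sum((yi - my) ** 2 for yi in y)
    r_squared = 1 - sse / sst
    se_slope = (sse / (n - 2) / sxx) ** 0.5
    t_slope = slope / se_slope if se_slope > 0 else float("inf")
    return intercept, slope, r_squared, t_slope


gt_interest = [20, 24, 27, 31, 35, 38]  # hypothetical yearly mean GT interest
case_volume = [1.1, 1.3, 1.4, 1.7, 1.8, 2.0]  # hypothetical cases (millions)
print(simple_ols(gt_interest, case_volume))
```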


2021 ◽  
Vol 5 (Supplement_2) ◽  
pp. 391-391
Author(s):  
Anthony Basile

Abstract Objectives When looking to follow or change a diet, Americans have many choices across numerous commercial (COM) and non-commercial (NCOM) diets. Since the Internet is a go-to source for information, this study sought to use Google Trends data (Google Search Interest; GSI) from 2010–2020 to examine the popularity of diets included in the 2021 U.S. News & World Report (USN) Best Diets report. There were four aims: 1) identify which COM and NCOM diets had the highest GSI, 2) identify which diets had a higher GSI than the healthiest COM and NCOM diet (Weight Watchers and the Mediterranean Diet, respectively; determined by USN), 3) determine whether any relationship exists between GSI and various USN healthfulness scores, and 4) determine which diet type (e.g., Balanced, Elimination, Low Calorie, etc.) has the highest GSI. Methods COM (n = 15) and NCOM (n = 24) diet names as well as their diet type (Balanced, Elimination, High protein, Low Calorie, Low Carbohydrate, or Low Fat) and scorings (Overall, Health, Managing or Preventing Diabetes, Heart-Health, Short-term and Long-term Weight Loss, Ease of Following, Nutrition, and Safety) were collected from USN, and popularity was measured using GSI data (GSI Range: 0–100) collected from Google Trends for the United States from 2010–2020. Spearman or Pearson correlation analyses were used as appropriate, with alpha set to 0.05. Results Weight Watchers (mean GSI: 29.80) and the Keto Diet (mean GSI: 15.05) had the highest mean GSI over the past ten years of all COM and NCOM diets, respectively. While Weight Watchers had the highest mean GSI and was the highest ranked COM diet by USN, four diets had a higher GSI than the Mediterranean Diet (in descending order: Keto Diet, Paleo Diet, Vegan Diet, and the Fast Diet). The diet type with the highest mean GSI was low carbohydrate for both COM and NCOM diets. Mean GSI was not correlated with any of the USN diet scores (P > 0.05 for all).
Conclusions These results suggest that numerous, less-healthful NCOM diets are more popular than the most healthful NCOM diet, the Mediterranean Diet. Understanding what attracts people to these less-healthful NCOM diets can provide insight into diet selection that can be used to better support diet choice. Funding Sources None.
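Because GSI is a 0–100 index rather than a count, it helps to see how Google describes the production of Trends values: each data point is divided by total searches for the same place and time, and the series is then rescaled so that its peak becomes 100. The sketch below mimics that on made-up raw counts (Google does not release them); rounding to integers is an assumption about presentation. One consequence is that each separately downloaded query is scaled to its own peak, so index means are most safely compared across terms that were retrieved together.

```python
def gt_style_index(term_counts, total_counts):
    """Mimic Trends-style scaling on hypothetical raw counts.

    Each point is divided by total searches for the same place and time,
    then the series is rescaled so its peak share equals 100 and rounded.
    """
    shares = [term / total for term, total in zip(term_counts, total_counts)]
    peak = max(shares)
    return [round(100 * share / peak) for share in shares]


# The third month has the most raw searches (30) yet scores below the peak,
# because total search volume also rose that month.
print(gt_style_index([5, 20, 30], [100, 100, 200]))  # [25, 100, 75]
```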


2021 ◽  
Author(s):  
Alex Wang ◽  
Robert McCarron ◽  
Daniel Azzam ◽  
Annamarie Stehli ◽  
Glen Xiong ◽  
...  

BACKGROUND The epidemiology of mental health disorders has important theoretical and practical implications for healthcare service and planning. The recent increase in big data storage and subsequent development of analytical tools suggests that mining search databases may yield important trends on mental health, which can be used to replace or support existing population health studies. OBJECTIVE This study aimed to map out depression search intent in the United States based on internet mental health queries. METHODS Weekly data on mental health searches were extracted from Google Trends for an 11-year period (2010-2021) and separated by US state for the following terms: “feeling sad,” “depressed,” “depression,” “empty,” “insomnia,” “fatigue,” “guilty,” “feeling guilty,” and “suicide”. Multivariable regression models were created based on geographic and environmental factors and normalized to control terms “sports,” “news,” “google,” “youtube,” “facebook,” and “netflix”. Heat maps of population depression were generated based on search intent. RESULTS Depression search intent grew 67% from January 2010 to March 2021. Depression search intent showed significant seasonal patterns with peak intensity during winter (adjusted P < 0.001) and early spring months (adjusted P < 0.001), relative to summer months. Geographic location correlated with depression search intent, with states in the Northeast (adjusted P = 0.01) having higher search intent than states in the South. CONCLUSIONS The trends extrapolated from Google Trends successfully correlate with known risk factors for depression, such as seasonality and increasing latitude. These findings suggest that Google Trends may be a valid novel epidemiological tool to map out depression prevalence in the United States.


2018 ◽  
Author(s):  
Lauren N Wood ◽  
Juzar Jamnagerwalla ◽  
Melissa A Markowitz ◽  
D Joseph Thum ◽  
Philip McCarty ◽  
...  

BACKGROUND Uterine power morcellation, in which the uterus is shredded into smaller pieces, is a widely used technique for removal of uterine specimens in patients undergoing minimally invasive abdominal hysterectomy or myomectomy. Complications related to power morcellation of uterine specimens led to US Food and Drug Administration (FDA) communications in 2014 ultimately recommending against the use of power morcellation for women undergoing minimally invasive hysterectomy. Subsequently, practitioners drastically decreased the use of morcellation. OBJECTIVE We aimed to determine the effect of increased patient awareness on the decrease in use of the morcellator. Google Trends is a public tool that provides data on temporal patterns of search terms, and we correlated these data with the timing of the FDA communication. METHODS Weekly relative search volume (RSV) was obtained from Google Trends using the term “morcellation.” Higher RSV corresponds to increases in weekly search volume. Search volumes were divided into 3 groups: the 2 years prior to the FDA communication, a 1-year period following, and thereafter, with the distribution of the weekly RSV over the 3 periods tested using 1-way analysis of variance. Additionally, we analyzed the total number of websites containing the term “morcellation” over this time. RESULTS The mean RSV prior to the FDA communication was 12.0 (SD 15.8), with the RSV being 60.3 (SD 24.7) in the 1-year after and 19.3 (SD 5.2) thereafter (P<.001). The mean number of webpages containing the term “morcellation” in 2011 was 10,800, rising to 18,800 during 2014 and 36,200 in 2017. CONCLUSIONS Google search activity about morcellation of uterine specimens increased significantly after the FDA communications. This trend indicates an increased public awareness regarding morcellation and its complications. More extensive preoperative counseling and alteration of surgical technique and clinician practice may be necessary.
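The three-period comparison can be sketched as a split of the weekly RSV series around the communication date followed by a one-way ANOVA F statistic. The event date below is hypothetical (the abstract gives only the year), years are approximated as 365 days, and the function names are illustrative; the p-value would come from the F distribution with (k − 1, n − k) degrees of freedom.

```python
from datetime import date, timedelta
from statistics import mean


def split_periods(weeks, values, event, pre_years=2):
    """Split a weekly series into (pre-event, first year after, thereafter).

    Weeks earlier than `pre_years` before the event are dropped; years are
    approximated as 365 days (an assumption for illustration).
    """
    pre, first_year, after = [], [], []
    one_year = timedelta(days=365)
    for week, value in zip(weeks, values):
        if event - pre_years * one_year <= week < event:
            pre.append(value)
        elif event <= week < event + one_year:
            first_year.append(value)
        elif week >= event + one_year:
            after.append(value)
    return pre, first_year, after


def one_way_anova_f(groups):
    """One-way ANOVA F statistic; returns (F, df_between, df_within)."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between, df_within = k - 1, n - k
    return (ss_between / df_between) / (ss_within / df_within), df_between, df_within


# Hypothetical event date and toy weekly RSV values.
event_date = date(2014, 1, 1)
offsets = range(-104, 156)
weeks = [event_date + timedelta(weeks=w) for w in offsets]
rsv = [(12 if w < 0 else 60 if w < 52 else 19) + w % 5 for w in offsets]
print(one_way_anova_f(split_periods(weeks, rsv, event_date)))
```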


2019 ◽  
Author(s):  
Anne Zepecki ◽  
Sylvia Guendelman ◽  
John DeNero ◽  
Ndola Prata

BACKGROUND Individuals are increasingly turning to search engines like Google to obtain health information and access resources. Analysis of Google search queries offers a novel approach, which is part of the methodological toolkit for infodemiology or infoveillance researchers, to understanding population health concerns and needs in real time or near-real time. While searches predominantly have been examined with the Google Trends website tool, newer application programming interfaces (APIs) are now available to academics to draw a richer landscape of searches. These APIs allow users to write code in languages like Python to retrieve sample data directly from Google servers. OBJECTIVE The purpose of this paper is to describe a novel protocol to determine the top queries, volume of queries, and the top sites reached by a population searching on the web for a specific health term. The protocol retrieves Google search data obtained from three Google APIs: Google Trends, Google Health Trends (also referred to as Flu Trends), and Google Custom Search. METHODS Our protocol consisted of four steps: (1) developing a master list of top search queries for an initial search term using Google Trends, (2) gathering information on relative search volume using Google Health Trends, (3) determining the most popular sites using Google Custom Search, and (4) calculating estimated total search volume. We tested the protocol following key procedures at each step and verified its usefulness by examining search traffic on “birth control” in 2017 in the United States. Two separate programmers working independently achieved similar results with insignificant variation due to sample variability. RESULTS We successfully tested the methodology on the initial search term “birth control”.
We identified top search queries for “birth control”, of which “birth control pill” was the most popular, and obtained the relative and estimated total search volume for the top queries: relative search volume was 0.54 for the pill, corresponding to an estimated 9.3-10.7 million searches. We used the estimates of the proportion of search activity for the top queries to arrive at a generated list of the most popular websites: for the pill, the Planned Parenthood website was the top site. CONCLUSIONS The proposed methodological framework demonstrates how to retrieve Google query data from multiple Google APIs and provides thorough documentation required to systematically identify search queries and websites, as well as estimate relative and total search volume of queries in real time or near-real time in specific locations and time periods. Although the protocol needs further testing, it allows researchers to replicate the steps and shows promise in advancing our understanding of population-level health concerns. INTERNATIONAL REGISTERED REPORT RR1-10.2196/16543


10.2196/22880 ◽  
2021 ◽  
Vol 7 (4) ◽  
pp. e22880
Author(s):  
Milad Asgari Mehrabadi ◽  
Nikil Dutt ◽  
Amir M Rahmani

Background The COVID-19 pandemic has affected virtually every region in the world. At the time of this study, the number of daily new cases in the United States was greater than that in any other country, and the trend was increasing in most states. Google Trends provides data regarding public interest in various topics during different periods. Analyzing these trends using data mining methods may provide useful insights and observations regarding the COVID-19 outbreak. Objective The objective of this study is to consider the predictive ability of different search terms not directly related to COVID-19 with regard to the increase of daily cases in the United States. In particular, we are concerned with searches related to dine-in restaurants and bars. Data were obtained from the Google Trends application programming interface and the COVID-19 Tracking Project. Methods To test whether past values of one time series help predict another, we used the Granger causality test. We considered the Granger causality of two different search query trends related to dine-in restaurants and bars for daily positive cases in the US states and territories with the 10 highest and 10 lowest numbers of daily new cases of COVID-19. In addition, we used Pearson correlations to measure the linear relationships between different trends. Results Our results showed that for states and territories with higher numbers of daily cases, the historical trends in search queries related to bars and restaurants, which mainly occurred after reopening, significantly affected the number of daily new cases on average. California, for example, showed the most searches for restaurants on June 7, 2020; this affected the number of new cases within two weeks after the peak, with a P value of .004 for the Granger causality test.
Conclusions Although a limited number of search queries were considered, Google search trends for restaurants and bars showed a significant effect on daily new cases in US states and territories with higher numbers of daily new cases. We showed that these influential search trends can be used to provide additional information for prediction tasks regarding new cases in each region. These predictions can help health care leaders manage and control the impact of the COVID-19 outbreak on society and prepare for its outcomes.
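Granger causality asks whether past values of one series improve a forecast of another beyond that series' own past; it establishes predictive precedence rather than causation in the everyday sense. This sketch fits the restricted and unrestricted lag regressions by ordinary least squares and returns the F statistic. The lag order of 2 and the toy data are assumptions, since the abstract does not report a lag. Library implementations such as statsmodels' `grangercausalitytests` also report p-values.

```python
import random


def _ols_sse(X, y):
    """Residual sum of squares of an OLS fit, solving the normal equations
    (X'X) beta = X'y by Gaussian elimination with partial pivoting."""
    k = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    b = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda i: abs(A[i][c]))  # pivot row
        A[c], A[p] = A[p], A[c]
        b[c], b[p] = b[p], b[c]
        for r in range(c + 1, k):
            f = A[r][c] / A[c][c]
            for j in range(c, k):
                A[r][j] -= f * A[c][j]
            b[r] -= f * b[c]
    beta = [0.0] * k
    for c in range(k - 1, -1, -1):  # back substitution
        beta[c] = (b[c] - sum(A[c][j] * beta[j] for j in range(c + 1, k))) / A[c][c]
    return sum((yi - sum(bj * xj for bj, xj in zip(beta, row))) ** 2
               for row, yi in zip(X, y))


def granger_f(y, x, lags=2):
    """F test of H0: lags of x add nothing to an AR(lags) model of y.

    Returns (F, df_num, df_den); compare F with the F(df_num, df_den)
    distribution to obtain a p-value.
    """
    restricted, unrestricted, target = [], [], []
    for t in range(lags, len(y)):
        y_lags = [y[t - i] for i in range(1, lags + 1)]
        x_lags = [x[t - i] for i in range(1, lags + 1)]
        restricted.append([1.0] + y_lags)
        unrestricted.append([1.0] + y_lags + x_lags)
        target.append(y[t])
    sse_r = _ols_sse(restricted, target)
    sse_u = _ols_sse(unrestricted, target)
    df_num, df_den = lags, len(target) - (2 * lags + 1)
    return ((sse_r - sse_u) / df_num) / (sse_u / df_den), df_num, df_den


# Toy data: "searches" lead "cases" by one step, plus small noise.
rng = random.Random(1)
searches = [rng.gauss(0, 1) for _ in range(200)]
cases = [0.0] + [0.8 * searches[t - 1] + 0.1 * rng.gauss(0, 1) for t in range(1, 200)]
print(granger_f(cases, searches, lags=2))
```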


10.2196/21340 ◽  
2020 ◽  
Vol 6 (4) ◽  
pp. e21340 ◽  
Author(s):  
Joseph Younis ◽  
Harvy Freitag ◽  
Jeremy S Ruthberg ◽  
Jonathan P Romanes ◽  
Craig Nielsen ◽  
...  

Background The magnitude and time course of the COVID-19 epidemic in the United States depend on early interventions to reduce the basic reproductive number to below 1. It is imperative, then, to develop methods to actively assess where quarantine measures such as social distancing may be deficient and suppress those potential resurgence nodes as early as possible. Objective We ask if social media is an early indicator of public social distancing measures in the United States by investigating its correlation with the time-varying reproduction number (Rt) as compared to social mobility estimates reported from Google and Apple Maps. Methods In this observational study, the estimated Rt was obtained for the period between March 5 and April 5, 2020, using the EpiEstim package. Social media activity was assessed using queries of “social distancing” or “#socialdistancing” on Google Trends, Instagram, and Twitter, with social mobility assessed using Apple and Google Maps data. Cross-correlations were performed between Rt and social media activity or mobility for the United States. We used Pearson correlations and the coefficient of determination (ρ) with significance set to P<.05. Results Negative correlations were found between Google search interest for “social distancing” and Rt in the United States (P<.001), and between search interest and state-specific Rt for 9 states with the highest COVID-19 cases (P<.001); most states experienced a delay varying between 3-8 days before reaching significance. A negative correlation was seen at a 4-day delay from the start of the Instagram hashtag “#socialdistancing” and at 6 days for Twitter (P<.001). Significant correlations between Rt and social media manifest earlier in time compared to social mobility measures from Google and Apple Maps, with peaks at –6 and –4 days. Meanwhile, changes in social mobility correlated best with Rt at –2 days and +1 day for workplace and grocery/pharmacy, respectively.
Conclusions Our study demonstrates the potential use of Google Trends, Instagram, and Twitter as epidemiological tools in the assessment of social distancing measures in the United States during the early course of the COVID-19 pandemic. Their correlation and earlier rise and peak in correlative strength with Rt when compared to social mobility may provide proactive insight into whether social distancing efforts are sufficiently enacted. Whether this proves valuable in the creation of more accurate assessments of the early epidemic course is uncertain due to limitations. These limitations include the use of a biased sample that is internet literate with internet access, which may covary with socioeconomic status, education, geography, and age, and the use of subtotal social media mentions of social distancing. Future studies should focus on investigating how social media reactions change during the course of the epidemic, as well as the conversion of social media behavior to actual physical behavior.
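The lagged cross-correlation described above can be sketched as computing Pearson's r between an activity series and Rt at each shift and picking the shift with the largest absolute r. Here a positive lag means the first series leads the second by that many days; whether this matches the sign convention behind the reported –6 and –4 day peaks is an assumption, because the abstract does not define it. The toy series are invented.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def lagged_correlations(signal, target, max_lag):
    """Pearson r between signal[t] and target[t + lag] for each lag.

    A positive lag means `signal` leads `target` by `lag` steps. Both
    inputs must have the same length and be aligned on the same days.
    """
    n = len(signal)
    out = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = signal[:n - lag], target[lag:]
        else:
            a, b = signal[-lag:], target[:n + lag]
        out[lag] = pearson(a, b)
    return out


# Toy series in which the target is the signal delayed by two days.
interest = [0, 1, 0, 0, 3, 0, 0, 2, 0, 1, 0, 0]
delayed = [0, 0] + interest[:-2]
corrs = lagged_correlations(interest, delayed, max_lag=3)
best = max(corrs, key=lambda lag: abs(corrs[lag]))
print(best)  # 2
```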
