Need of care in interpreting Google Trends-based COVID-19 infodemiological study results: potential risk of false-positivity
Introduction: Google Trends (GT) is being used as an epidemiological tool to study coronavirus disease (COVID-19) by identifying keywords in search trends that are predictive for the COVID-19 epidemiological burden. However, many of the earlier GT-based studies include potential statistical fallacies by measuring the correlation between non-stationary time sequences without adjusting for multiple comparisons or the confounding of media coverage, leading to concerns about the increased risk of obtaining false-positive results. In this study, we aimed to apply statistically more favorable methods to validate the earlier GT-based COVID-19 study results. Methods: We extracted the relative GT search volume for keywords associated with COVID-19 symptoms, and evaluated their Granger-causality to weekly COVID-19 positivity in eight English-speaking countries and Japan. In addition, the impact of media coverage on keywords with significant Granger-causality was further evaluated using Japanese regional data. Results: Our Granger causality-based approach largely decreased (by up to approximately one-third) the number of keywords identified as having a significant temporal relationship with the COVID-19 trend when compared to those identified by the Pearson correlation-based approach. 'Sense of smell' and 'loss of smell' were the most reliable GT keywords across all the evaluated countries; however, when adjusted with their media coverage, these keyword trends did not Granger-cause the COVID-19 positivity trends (in Japan). Conclusions: Our results suggest that some of the search keywords reported as candidate predictive measures in earlier GT-based COVID-19 studies may potentially be unreliable; therefore, caution is necessary when interpreting published GT-based study results.