BACKGROUND
Social media data can yield important insights into population-level health. However, search terms selected in an arbitrary or non-systematic manner may not yield results that are comprehensive or meaningful, or that reflect the full range of social responses to public health issues. To address this issue, we developed a conceptual model that incorporated multiple public health concepts and then used a multifaceted approach to generate search terms for each concept.
OBJECTIVE
The goal of this study was to develop a conceptually based approach to identifying search terms for social media mining that encompassed a broad array of perceptions and behavioral responses to wildfire smoke. The long-term goal of this program of research is to inform the development of salient public health messages during wildfire season.
METHODS
A methods study was conducted to assess the feasibility of mapping respiratory health-related social media messages to keywords and phrases. Five concepts were included: 1) ambient air quality conditions, 2) respiratory symptoms and exacerbations, 3) risk perception and self-efficacy, 4) behavioral responses and self-care management, and 5) quality of life and healthcare utilization. Keywords and phrases related to respiratory health were extracted from existing literature and public health instruments/tools and sorted into the five concept lists. The concept lists were then reviewed for applicability by an expert panel with clinical knowledge of respiratory illnesses and adapted to language appropriate to social media. The expert panel also added terms drawn from their clinical experience.
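The mapping of messages to concept lists described above can be illustrated with a minimal sketch. It assumes simple case-insensitive whole-phrase matching; the terms shown are hypothetical placeholders, not the study's actual search terms, and the names `CONCEPT_TERMS`, `compile_patterns`, and `tag_message` are introduced here for illustration only.

```python
import re

# Hypothetical example terms for each of the five concepts.
# These are illustrative placeholders, NOT the study's actual lists.
CONCEPT_TERMS = {
    "ambient air quality conditions": ["smoky", "air quality", "haze"],
    "respiratory symptoms and exacerbations": ["coughing", "wheezing", "can't breathe"],
    "risk perception and self-efficacy": ["worried", "is it safe"],
    "behavioral responses and self-care management": ["staying inside", "inhaler", "mask"],
    "quality of life and healthcare utilization": ["urgent care", "missed work"],
}


def compile_patterns(concept_terms):
    """Build one case-insensitive regex per concept from its term list."""
    patterns = {}
    for concept, terms in concept_terms.items():
        # Longest terms first so multi-word phrases are tried before substrings.
        ordered = sorted(terms, key=len, reverse=True)
        alternation = "|".join(re.escape(term) for term in ordered)
        # Lookarounds keep a term from matching inside a longer word
        # (for example, "mask" should not match "unmasked").
        patterns[concept] = re.compile(
            r"(?<!\w)(?:" + alternation + r")(?!\w)", re.IGNORECASE
        )
    return patterns


def tag_message(message, patterns):
    """Return the concepts whose terms appear in the message, in list order."""
    return [concept for concept, pattern in patterns.items() if pattern.search(message)]


PATTERNS = compile_patterns(CONCEPT_TERMS)

if __name__ == "__main__":
    example = "So smoky today, been coughing all morning and staying inside"
    for concept in tag_message(example, PATTERNS):
        print(concept)
```

A single message can be tagged with more than one concept, which fits a model in which air quality, symptoms, and behavioral responses often co-occur in the same post. Exact-phrase matching of this kind would miss misspellings and slang, which is one reason a panel review of social media language is useful.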
RESULTS
This process produced a multifaceted methodology for generating search terms for social media mining that capitalized on existing literature and respiratory health instruments/tools and incorporated a unique clinical perspective. The five overarching public health concepts did not change throughout the process. The final product was five lists containing 150 search terms and phrases that relate to each concept and can be used in the data mining process.
CONCLUSIONS
Our conceptually based approach yielded search terms that reflected nuanced ideas addressing risk, risk perception, risk appraisal, and risk reduction actions. Combining three distinct approaches to the isolation of terms for social media mining should retrieve messages that offer more comprehensive and thoughtful insight into the effects that wildfires have on population-level respiratory health. Preliminary data from this study suggest that there is added value in being thoughtful and systematic in the vetting, selection, and consideration of words for social media mining of respiratory health-related messages.