Contemporary English Pain Descriptors as Detected on Social Media Using Artificial Intelligence and Emotion Analytics Algorithms: Cross-sectional Study

DOI: 10.2196/31366
2021, Vol 5 (11), pp. e31366
Author(s): Ming Yi Tan, Charlene Enhui Goh, Hee Hon Tan

BACKGROUND Pain description is fundamental to health care. The McGill Pain Questionnaire (MPQ) has been validated as a tool for the multidimensional measurement of pain; however, its use relies heavily on language proficiency. Although the MPQ has remained unchanged since its inception, the English language has evolved significantly since then. The advent of the internet and social media has allowed for the generation of a staggering amount of publicly available data, allowing linguistic analysis at a scale never seen before.

OBJECTIVE The aim of this study is to use social media data to examine the relevance of pain descriptors from the existing MPQ, identify novel contemporary English descriptors for pain among users of social media, and suggest a modification for a new MPQ for future validation and testing.

METHODS All posts from social media platforms from January 1, 2019, to December 31, 2019, were extracted. Artificial intelligence and emotion analytics algorithms (Crystalace and CrystalFeel) were used to measure the emotional properties of the text, including sarcasm, anger, fear, sadness, joy, and valence. Word2Vec was used to identify new pain descriptors associated with the original descriptors from the MPQ. Analysis of count and pain intensity formed the basis for proposing new pain descriptors and determining the order of pain descriptors within each subclass.

RESULTS A total of 118 new associated words were found via Word2Vec. Of these 118 words, 49 (41.5%) had a count of at least 110, which corresponded to the count of the bottom 10% (8/78) of the original MPQ pain descriptors. The count and intensity of pain descriptors were used to formulate the inclusion criteria for a new pain questionnaire. For the suggested new pain questionnaire, 11 existing pain descriptors were removed, 13 new descriptors were added to existing subclasses, and a new Psychological subclass comprising 9 descriptors was added.

CONCLUSIONS This study presents a novel methodology that uses social media data to identify new pain descriptors and can be repeated at regular intervals to ensure the relevance of pain questionnaires. The original MPQ contains several potentially outdated pain descriptors and is inadequate for reporting the psychological aspects of pain. Further research is needed to examine the reliability and validity of the revised MPQ.
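The methods above name Word2Vec as the step that surfaces candidate contemporary descriptors. A minimal sketch of how that lookup might be run with the gensim library; the corpus file name, seed descriptor list, and hyperparameters are illustrative assumptions, not the authors' actual settings:

```python
# Sketch of the Word2Vec candidate-descriptor step, assuming a preprocessed
# corpus of social media posts, one whitespace-tokenized post per line
# (the file name "pain_posts_2019.txt" is hypothetical).
from gensim.models import Word2Vec

# A small sample of MPQ seed descriptors; the full questionnaire has 78.
seed_descriptors = ["throbbing", "shooting", "stabbing", "burning", "aching"]

# Each training sentence is a list of tokens.
sentences = [line.split() for line in open("pain_posts_2019.txt", encoding="utf-8")]

# Hyperparameters here are gensim defaults/common choices, not the study's.
model = Word2Vec(sentences, vector_size=100, window=5, min_count=10, workers=4)

# For each original MPQ descriptor, list candidate contemporary descriptors
# ranked by cosine similarity in the embedding space.
for seed in seed_descriptors:
    if seed in model.wv:
        candidates = model.wv.most_similar(seed, topn=5)
        print(seed, "->", [word for word, _ in candidates])
```

Ranking the vocabulary by cosine similarity to each MPQ seed term is one standard way to operationalize "associated words"; the study's count and intensity criteria would then filter these candidates.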




2021
Author(s): Su Golder, Robin Stevens, Karen O'Connor, Richard James, Graciela Gonzalez-Hernandez

BACKGROUND A growing amount of health research uses social media data. Critics of social media research often argue that such data may be unrepresentative of the population, but the suitability of social media data for digital epidemiology is more nuanced. Identifying the demographics of social media users can help establish representativeness.

OBJECTIVE We sought to identify the different approaches, or combinations of approaches, used to extract race or ethnicity from social media, and to report on the challenges of using these methods.

METHODS We present a scoping review of the methods used to extract race or ethnicity from Twitter datasets. We searched 17 electronic databases and carried out reference checking and handsearching to identify relevant articles. Each record was screened independently by at least two researchers, with any disagreements resolved through discussion. The included studies could be categorized by the methods the authors applied to extract race or ethnicity.

RESULTS From 1249 records, we identified 67 that met our inclusion criteria. The majority focused on US-based users and English-language tweets. A range of data types were used, including Twitter profile pictures, information from bios (such as names or self-declarations), and the location or content of the tweets themselves. Methodologies included manual inference, linkage to census data, commercial software, language or dialect recognition, and machine learning. Not all studies evaluated their methods; those that did found accuracy to vary from 45% to 93%, with significantly lower accuracy in identifying non-white race categories. The inference of race or ethnicity raises important ethical questions, which can be exacerbated by the data and methods used. The comparative accuracy of different methods is also largely unknown.

CONCLUSIONS There is no standard accepted approach and there are no current guidelines for extracting or inferring the race or ethnicity of Twitter users. Social media researchers must interpret race or ethnicity with care and not overpromise what can be achieved, as even manual screening is a subjective, imperfect method. Future research should establish the accuracy of methods to inform evidence-based best practice guidelines for social media researchers and should be guided by concerns of equity and social justice.
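One of the simpler approaches the review covers is reading explicit self-declarations from profile bios. A minimal illustrative sketch in Python; the patterns and the `classify_bio` helper are hypothetical examples, not taken from any reviewed study, and would carry the same accuracy and ethical caveats the review discusses:

```python
import re

# Hypothetical example patterns for explicit self-declarations in bios.
# Real studies use far richer lexicons, and manual validation remains necessary.
SELF_DECLARATION_PATTERNS = {
    "Black": re.compile(r"\b(black|african[- ]american)\b", re.IGNORECASE),
    "Latinx": re.compile(r"\b(latin[oax]|hispanic)\b", re.IGNORECASE),
    "Asian": re.compile(r"\b(asian[- ]american|desi)\b", re.IGNORECASE),
}

def classify_bio(bio: str) -> list[str]:
    """Return all categories whose self-declaration pattern matches the bio."""
    return [label for label, pattern in SELF_DECLARATION_PATTERNS.items()
            if pattern.search(bio)]

print(classify_bio("Proud Black woman. Nurse. Opinions my own."))  # ['Black']
```

Even this narrow rule-based variant illustrates the review's point: keyword matches conflate self-identification with topical mentions, which is one reason the reviewed methods required evaluation against manually screened samples.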


2020
Author(s): Mirjam Elisabeth Eiswirth

BACKGROUND Interviews and conversations about living with Type 1 Diabetes yield a large amount of valuable and rich qualitative data that could be used for qualitative analysis. However, particularly when such data were not collected to answer one specific research question, data saturation and representativeness need to be assessed. Social media data from the Diabetes Online Community, for example from Twitter, can easily be collected to create a parallel corpus against which the conversations can be compared. This social media data covers a large number of participants and localities and can thus be used to situate the conversational recordings within a larger context. The present study puts forward one way in which such a comparison can be implemented and discusses the findings.

OBJECTIVE The objective of this study is to show how a collection of tweets from the English-speaking Diabetes Online Community can be used to situate a smaller set of interviews and conversational recordings about living with Type 1 Diabetes within the broader discourse.

METHODS Two data sets were collected: one from Twitter, using hashtags common in the Diabetes Online Community, and one consisting of 17 hours of audio-recorded face-to-face conversations and interviews with people living with Type 1 Diabetes in Scotland. Both corpora contain about 200,000 words. They were analyzed in R using common metrics of word frequency and distinctiveness. The most frequent words were hand-coded for broader topics using a bottom-up, data-driven approach to coding.

RESULTS The conversations largely mirror the discourse of the global Diabetes Online Community. The small differences are accounted for by the nature of the medium or the geographical context of the conversations. Both sources of data corroborate findings from previous work on the experiences of people living with Type 1 Diabetes in terms of key topics and concerns.

CONCLUSIONS This strategy of comparing small conversational corpora to potentially very large online corpora is presented as a methodology for making non-purpose-built corpora accessible for different types of analysis, situating purpose-built corpora within a wider context, and developing new research questions based on such a textual analysis.

CLINICALTRIAL No trial registration was needed. Data collection was approved by the Linguistics and English Language Ethics Committee at the University of Edinburgh, and an ethics approval waiver was obtained from the Scottish National Health Service. Participant data were anonymized using pseudonyms selected by the participants.
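The abstract says the corpora were compared in R using word frequency and distinctiveness metrics, without naming the metrics. One standard distinctiveness measure for two-corpus comparison is Dunning's log-likelihood (G2) keyness; the sketch below computes it in Python rather than R, purely for illustration, and the input file names are hypothetical:

```python
import math
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    # Lowercase and keep letter/apostrophe runs; real work needs a proper tokenizer.
    return re.findall(r"[a-z']+", text.lower())

def log_likelihood(freq_a: int, freq_b: int, total_a: int, total_b: int) -> float:
    """Dunning's log-likelihood (G2) keyness score for one word across two corpora."""
    expected_a = total_a * (freq_a + freq_b) / (total_a + total_b)
    expected_b = total_b * (freq_a + freq_b) / (total_a + total_b)
    g2 = 0.0
    if freq_a > 0:
        g2 += freq_a * math.log(freq_a / expected_a)
    if freq_b > 0:
        g2 += freq_b * math.log(freq_b / expected_b)
    return 2 * g2

# Hypothetical file names for the two ~200,000-word corpora.
tweets = Counter(tokenize(open("tweets.txt", encoding="utf-8").read()))
conversations = Counter(tokenize(open("conversations.txt", encoding="utf-8").read()))
n_tweets, n_conv = sum(tweets.values()), sum(conversations.values())

# Rank words by how strongly their frequency differs between the two corpora.
keyness = {
    w: log_likelihood(tweets[w], conversations[w], n_tweets, n_conv)
    for w in set(tweets) | set(conversations)
}
for word, score in sorted(keyness.items(), key=lambda kv: -kv[1])[:20]:
    print(f"{word:20s} G2={score:.1f}")
```

Words with high G2 scores are the ones most distinctive of one corpus relative to the other, which is the kind of output that could then be hand-coded for topics as described above.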

