Evaluating the representativeness of a small qualitative sample with data from the Diabetes Online Community on Twitter: a mixed methods study (Preprint)

2020 ◽  
Author(s):  
Mirjam Elisabeth Eiswirth

BACKGROUND There is a large amount of rich, valuable qualitative data from interviews and conversations about living with Type 1 Diabetes that could be used for qualitative analysis. However, especially when these data were not collected to answer one specific research question, data saturation and representativeness need to be assessed. Social media data from the Diabetes Online Community, for example from Twitter, can easily be collected to create a parallel corpus against which the conversations can be compared. This social media data covers a large number of participants and localities and can therefore be used to situate the conversational recordings within a larger context. The present study puts forward one way in which such a comparison can be implemented and discusses the findings. OBJECTIVE The objective of this study is to show how a collection of tweets from the English-speaking online Diabetes Community can be used to situate a smaller set of interviews and conversational recordings about living with Type 1 Diabetes within the broader discourse. METHODS Two sets of data were collected: one from Twitter using hashtags common in the Diabetes Online Community, the other consisting of 17 hours of audio-recorded face-to-face conversations and interviews with people living with Type 1 Diabetes in Scotland. Both corpora contain approximately 200,000 words. They were analyzed in R using common metrics of word frequency and distinctiveness. The most frequent words were hand-coded for broader topics using a bottom-up, data-driven approach to coding. RESULTS The conversations largely mirror the global Diabetes Online Community's discourse. The small differences are accounted for by the nature of the medium or the geographical context of the conversations. Both sources of data corroborate findings from previous work on the experience of people living with Type 1 Diabetes in terms of key topics and concerns. CONCLUSIONS This strategy of comparing small conversational corpora to potentially very large online corpora is presented as a methodology for making non-purpose-built corpora accessible for different types of analysis, situating purpose-built corpora within a wider context, and developing new research questions based on such a textual analysis. CLINICALTRIAL No trial registration was needed. Data collection was approved by the Linguistics and English Language Ethics Committee at the University of Edinburgh, and an ethics approval waiver was obtained from the Scottish National Health Service. Participant data were anonymized using pseudonyms selected by the participants.
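The abstract does not include the R code used, but the frequency-and-distinctiveness comparison it describes can be sketched as follows (in Python rather than R). The file names, the tokenizer, and the choice of Dunning's log-likelihood as the distinctiveness measure are illustrative assumptions, not the author's actual pipeline.

```python
# Illustrative sketch only: compare word frequencies across two corpora and rank
# words by how distinctive they are of one corpus relative to the other.
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())

def log_likelihood(freq_a, freq_b, total_a, total_b):
    """Dunning's log-likelihood (G2) for one word across two corpora."""
    expected_a = total_a * (freq_a + freq_b) / (total_a + total_b)
    expected_b = total_b * (freq_a + freq_b) / (total_a + total_b)
    g2 = 0.0
    if freq_a > 0:
        g2 += freq_a * math.log(freq_a / expected_a)
    if freq_b > 0:
        g2 += freq_b * math.log(freq_b / expected_b)
    return 2 * g2

# Hypothetical input files: one plain-text dump per corpus.
twitter = Counter(tokenize(open("tweets.txt", encoding="utf-8").read()))
conversations = Counter(tokenize(open("conversations.txt", encoding="utf-8").read()))
n_tw, n_conv = sum(twitter.values()), sum(conversations.values())

keyness = {
    w: log_likelihood(twitter[w], conversations[w], n_tw, n_conv)
    for w in set(twitter) | set(conversations)
}
for word, g2 in sorted(keyness.items(), key=lambda kv: kv[1], reverse=True)[:20]:
    print(f"{word:15s} G2={g2:7.1f}  twitter={twitter[word]}  conversations={conversations[word]}")
```

The most distinctive words surfaced this way would then be hand-coded for broader topics, as described in the METHODS.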


2019 ◽  
Vol 13 (3) ◽  
pp. 493-497 ◽  
Author(s):  
Valerie Gavrila ◽  
Ashley Garrity ◽  
Emily Hirschfeld ◽  
Breann Edwards ◽  
Joyce M. Lee

Background: Caregivers and individuals living with type 1 diabetes (T1D) who are members of CGM in the Cloud, a Facebook group associated with the Nightscout Project, were interviewed to assess how the online community impacted peer support. Methods: Semistructured qualitative interviews were conducted with caregivers and patients who are part of CGM in the Cloud Facebook group. Interview transcripts were analyzed to identify various themes related to peer support in the online group. Results: Members of the CGM in the Cloud Facebook group identified peer support through giving and receiving technical, emotional, and medical support, as well as giving back to the larger community by paying it forward. Peer support also extended beyond the online forum, connecting people in person, whether they were local or across the country. Conclusions: An online community can provide many avenues for peer support through emotional and technical support, as well as serve as a tool of empowerment. The community as a whole also had a spirit of altruism that bolstered confidence in others as well as those who paid it forward.



2019 ◽  
pp. 089443931989330 ◽  
Author(s):  
Ashley Amaya ◽  
Ruben Bach ◽  
Florian Keusch ◽  
Frauke Kreuter

Social media are becoming more popular as a source of data for social science researchers. These data are plentiful and offer the potential to answer new research questions at smaller geographies and for rarer subpopulations. When deciding whether to use data from social media, it is useful to learn as much as possible about the data and its source. Social media data have properties quite different from those with which many social scientists are used to working, so the assumptions often used to plan and manage a project may no longer hold. For example, social media data are so large that they may not be able to be processed on a single machine; they are in file formats with which many researchers are unfamiliar, and they require a level of data transformation and processing that has rarely been required when using more traditional data sources (e.g., survey data). Unfortunately, this type of information is often not obvious ahead of time as much of this knowledge is gained through word-of-mouth and experience. In this article, we attempt to document several challenges and opportunities encountered when working with Reddit, the self-proclaimed “front page of the Internet” and popular social media site. Specifically, we provide descriptive information about the Reddit site and its users, tips for using organic data from Reddit for social science research, some ideas for conducting a survey on Reddit, and lessons learned in merging survey responses with Reddit posts. While this article is specific to Reddit, researchers may also view it as a list of the type of information one may seek to acquire prior to conducting a project that uses any type of social media data.
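The article's point about scale and unfamiliar file formats can be illustrated with a minimal sketch: Reddit dumps are typically distributed as newline-delimited JSON, which is best streamed record by record rather than loaded into memory at once. The file name and the assumption that each line carries a `subreddit` field follow the public Reddit comment schema and are illustrative only.

```python
# Minimal sketch, not from the article: stream a newline-delimited JSON dump of
# Reddit comments one record at a time and keep only the fields needed.
import json
from collections import Counter

subreddit_counts = Counter()

with open("reddit_comments.ndjson", encoding="utf-8") as fh:
    for line in fh:                      # one JSON object per line
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue                     # skip malformed lines rather than abort
        subreddit_counts[record.get("subreddit", "unknown")] += 1

print(subreddit_counts.most_common(10))
```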



2021 ◽  
Author(s):  
Su Golder ◽  
Robin Stevens ◽  
Karen O'Conor ◽  
Richard James ◽  
Graciela Gonzalez-Hernandez

BACKGROUND A growing amount of health research uses social media data. Critics of social media research often argue that it may be unrepresentative of the population, but the suitability of social media data for digital epidemiology is more nuanced. Identifying the demographics of social media users can help establish representativeness. OBJECTIVE We sought to identify the different approaches, or combinations of approaches, used to extract race or ethnicity from social media and to report on the challenges of using these methods. METHODS We present a scoping review to identify the methods used to extract race or ethnicity from Twitter datasets. We searched 17 electronic databases and carried out reference checking and hand-searching to identify relevant articles. Each record was sifted independently by at least two researchers, with any disagreement discussed. The included studies could be categorized by the methods the authors applied to extract race or ethnicity. RESULTS From 1249 records, we identified 67 that met our inclusion criteria. The majority focus on US-based users and English-language tweets. A range of data types were used, including Twitter profile pictures, information from bios (such as names or self-declarations), and the location and/or content of the tweets themselves. A range of methodologies were used, including manual inference, linkage to census data, commercial software, language/dialect recognition, and machine learning. Not all studies evaluated their methods; those that did found accuracy to vary from 45% to 93%, with significantly lower accuracy in identifying non-white race categories. The inference of race/ethnicity raises important ethical questions, which can be exacerbated by the data and methods used. The comparative accuracy of different methods is also largely unknown. CONCLUSIONS There is no standard accepted approach or current guideline for extracting or inferring the race or ethnicity of Twitter users. Social media researchers must interpret race or ethnicity carefully and not over-promise what can be achieved, as even manual screening is a subjective, imperfect method. Future research should establish the accuracy of methods to inform evidence-based best-practice guidelines for social media researchers and be guided by concerns of equity and social justice.
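As an illustration of one approach named in the review (linkage to census data), the sketch below looks up a user's surname in a census surname table. The file path, column layout, and the simplification of taking the last token of a display name are assumptions for illustration only; the review's caveats about accuracy and ethics apply in full.

```python
# Minimal sketch of surname-to-census linkage. The CSV extract and its columns
# are hypothetical; real inference of race/ethnicity is imperfect and ethically
# sensitive, as the review stresses.
import csv

def load_surname_table(path):
    """Map uppercase surname -> dict of demographic percentage columns."""
    table = {}
    with open(path, newline="", encoding="utf-8") as fh:
        for row in csv.DictReader(fh):
            table[row["name"].upper()] = {k: v for k, v in row.items() if k != "name"}
    return table

surnames = load_surname_table("census_surnames.csv")  # hypothetical extract

def infer_from_surname(display_name):
    """Return census percentages for the last token of a display name, if any."""
    last = display_name.strip().split()[-1].upper()
    return surnames.get(last)  # None when the surname is not in the table

print(infer_from_surname("Jane Smith"))
```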



10.2196/31366 ◽  
2021 ◽  
Vol 5 (11) ◽  
pp. e31366 ◽  
Author(s):  
Ming Yi Tan ◽  
Charlene Enhui Goh ◽  
Hee Hon Tan

Background Pain description is fundamental to health care. The McGill Pain Questionnaire (MPQ) has been validated as a tool for the multidimensional measurement of pain; however, its use relies heavily on language proficiency. Although the MPQ has remained unchanged since its inception, the English language has evolved significantly since then. The advent of the internet and social media has allowed for the generation of a staggering amount of publicly available data, allowing linguistic analysis at a scale never seen before. Objective The aim of this study is to use social media data to examine the relevance of pain descriptors from the existing MPQ, identify novel contemporary English descriptors for pain among users of social media, and suggest a modification for a new MPQ for future validation and testing. Methods All posts from social media platforms from January 1, 2019, to December 31, 2019, were extracted. Artificial intelligence and emotion analytics algorithms (Crystalace and CrystalFeel) were used to measure the emotional properties of the text, including sarcasm, anger, fear, sadness, joy, and valence. Word2Vec was used to identify new pain descriptors associated with the original descriptors from the MPQ. Analysis of count and pain intensity formed the basis for proposing new pain descriptors and determining the order of pain descriptors within each subclass. Results A total of 118 new associated words were found via Word2Vec. Of these 118 words, 49 (41.5%) words had a count of at least 110, which corresponded to the count of the bottom 10% (8/78) of the original MPQ pain descriptors. The count and intensity of pain descriptors were used to formulate the inclusion criteria for a new pain questionnaire. For the suggested new pain questionnaire, 11 existing pain descriptors were removed, 13 new descriptors were added to existing subclasses, and a new Psychological subclass comprising 9 descriptors was added. Conclusions This study presents a novel methodology using social media data to identify new pain descriptors and can be repeated at regular intervals to ensure the relevance of pain questionnaires. The original MPQ contains several potentially outdated pain descriptors and is inadequate for reporting the psychological aspects of pain. Further research is needed to examine the reliability and validity of the revised MPQ.
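The abstract does not specify the Word2Vec implementation or hyperparameters. The sketch below assumes a standard gensim workflow, with placeholder tokenized posts, illustrative parameters, and example seed descriptors from the MPQ.

```python
# Minimal sketch, assuming a gensim Word2Vec workflow; corpus, preprocessing,
# and hyperparameters are illustrative only, not the study's exact settings.
from gensim.models import Word2Vec

# Hypothetical corpus: each post already tokenized into lowercase words.
tokenized_posts = [
    ["my", "head", "is", "throbbing", "and", "pounding", "all", "day"],
    ["sharp", "stabbing", "pain", "in", "my", "lower", "back"],
    # ... millions more posts in the real dataset
]

model = Word2Vec(
    sentences=tokenized_posts,
    vector_size=100,   # dimensionality of the word embeddings
    window=5,          # context window size
    min_count=1,       # kept at 1 for this toy corpus; a real corpus would prune rare tokens
    workers=4,
)

# For each original MPQ descriptor, list candidate new descriptors that occur
# in similar contexts on social media.
for descriptor in ["throbbing", "stabbing", "burning"]:
    if descriptor in model.wv:
        print(descriptor, model.wv.most_similar(descriptor, topn=10))
```

In the study, candidates surfaced this way were then filtered by count and pain intensity before being proposed as new descriptors.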



PsycCRITIQUES ◽  
2016 ◽  
Vol 61 (51) ◽  
Author(s):  
Daniel Keyes


2020 ◽  
Vol 14 (2) ◽  
pp. 140-159
Author(s):  
Anthony-Paul Cooper ◽  
Emmanuel Awuni Kolog ◽  
Erkki Sutinen

This article builds on previous research exploring the content of church-related tweets. It does so by examining whether the qualitative thematic coding of such tweets can, in part, be automated through machine learning. It compares three supervised machine learning algorithms to understand how useful each is at a classification task, based on a dataset of human-coded church-related tweets. The study finds that one such algorithm, Naïve Bayes, performs better than the other algorithms considered, returning precision, recall, and F-measure values that each exceed an acceptable threshold of 70%. This has far-reaching consequences at a time when the high volume of social media data, in this case Twitter data, means that the resource intensity of manual coding approaches can act as a barrier to understanding how the online community interacts with, and talks about, church. The findings presented in this article offer a way forward for scholars of digital theology to better understand the content of online church discourse.
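The article does not publish its code; a minimal sketch of the kind of supervised set-up it describes, using a multinomial Naïve Bayes classifier over TF-IDF features in scikit-learn, might look as follows. The tweets, theme labels, and train/test split are placeholders.

```python
# Minimal sketch, not the authors' code: classify short tweet texts into themes
# with TF-IDF features and multinomial Naive Bayes, then report per-theme
# precision, recall, and F-measure.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

tweets = [
    "Wonderful sermon at church this morning",
    "Volunteering with the church food bank today",
    "Choir practice moved to Thursday evening",
    "Grateful for my church community this week",
    "Sunday service was live streamed again",
    "Helping set up chairs for the service",
]
themes = ["worship", "outreach", "music", "community", "worship", "outreach"]

X_train, X_test, y_train, y_test = train_test_split(
    tweets, themes, test_size=0.33, random_state=42
)

model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(X_train, y_train)

# Precision, recall, and F-measure per theme, the metrics reported in the article.
print(classification_report(y_test, model.predict(X_test), zero_division=0))
```

With a realistically sized human-coded dataset, the same pipeline could be compared against other supervised algorithms, as the article does.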





2014 ◽  
Author(s):  
Kathleen M. Carley ◽  
L. R. Carley ◽  
Jonathan Storrick

