Mining social media for prescription medication abuse monitoring: a review and proposal for a data-centric framework

2019 ◽  
Vol 27 (2) ◽  
pp. 315-329 ◽  
Author(s):  
Abeed Sarker ◽  
Annika DeRoos ◽  
Jeanmarie Perrone

Abstract Objective Prescription medication (PM) misuse and abuse is a major health problem globally, and a number of recent studies have focused on exploring social media as a resource for monitoring nonmedical PM use. Our objectives are to present a methodological review of social media–based PM abuse or misuse monitoring studies, and to propose a potential generalizable, data-centric processing pipeline for the curation of data from this resource. Materials and Methods We identified studies involving social media, PMs, and misuse or abuse (inclusion criteria) from Medline, Embase, Scopus, Web of Science, and Google Scholar. We categorized studies based on multiple characteristics including but not limited to data size; social media source(s); medications studied; and primary objectives, methods, and findings. Results A total of 39 studies met our inclusion criteria, with 31 (∼79.5%) published since 2015. Twitter has been the most popular resource, with Reddit and Instagram gaining popularity recently. Early studies focused mostly on manual, qualitative analyses, with a growing trend toward the use of data-centric methods involving natural language processing and machine learning. Discussion There is a paucity of standardized, data-centric frameworks for curating social media data for task-specific analyses and near real-time surveillance of nonmedical PM use. Many existing studies do not quantify human agreements for manual annotation tasks or take into account the presence of noise in data. Conclusion The development of reproducible and standardized data-centric frameworks that build on the current state-of-the-art methods in data and text mining may enable effective utilization of social media data for understanding and monitoring nonmedical PM use.

2021 ◽  
Vol 10 (7) ◽  
pp. 474
Author(s):  
Bingqing Wang ◽  
Bin Meng ◽  
Juan Wang ◽  
Siyu Chen ◽  
Jian Liu

Social media data contain information expressed in real time, including text and geographical location. As a new data source for crowd behavior research in the era of big data, they can reflect some aspects of the behavior of residents. In this study, a text classification model based on BERT and the Transformers framework was constructed and used to classify and extract more than 210,000 residents’ festival activities from 1.13 million Sina Weibo (Chinese “Twitter”) records collected in Beijing in 2019. On this basis, word frequency statistics, part-of-speech analysis, topic modeling, sentiment analysis, and other methods were used to perceive different types of festival activities and quantitatively analyze the spatial differences among different types of festivals. The results show that traditional culture significantly influences residents’ festivals, reflecting residents’ motivation to participate in festivals and how residents participate in festivals and express their emotions. There are apparent spatial differences among residents in participating in festival activities. The main festival activities are distributed in the central area within the Fifth Ring Road in Beijing. In contrast, expressions of feelings during festivals are mainly distributed outside the Fifth Ring Road. The research integrates natural language processing, topic model analysis, spatial statistical analysis, and other technologies. It can also broaden the application field of social media data, especially text data, providing a new research paradigm for studying residents’ festival activities and adding residents’ perception of festivals. The research results provide a basis for the design and management of the Chinese festival system.


2021 ◽  
Author(s):  
Vishal Dey ◽  
Peter Krasniak ◽  
Minh Nguyen ◽  
Clara Lee ◽  
Xia Ning

BACKGROUND A new illness can come to public attention through social media before it is medically defined, formally documented, or systematically studied. One example is a condition known as breast implant illness (BII), which has been extensively discussed on social media, although it is vaguely defined in the medical literature. OBJECTIVE The objective of this study is to construct a data analysis pipeline to understand emerging illnesses using social media data and to apply the pipeline to understand the key attributes of BII. METHODS We constructed a pipeline of social media data analysis using natural language processing and topic modeling. Mentions related to signs, symptoms, diseases, disorders, and medical procedures were extracted from social media data using the clinical Text Analysis and Knowledge Extraction System. We mapped the mentions to standard medical concepts and then summarized these mapped concepts as topics using latent Dirichlet allocation. Finally, we applied this pipeline to understand BII from several BII-dedicated social media sites. RESULTS Our pipeline identified topics related to toxicity, cancer, and mental health issues that were highly associated with BII. Our pipeline also showed that cancers, autoimmune disorders, and mental health problems were emerging concerns associated with breast implants, based on social media discussions. Furthermore, the pipeline identified mentions such as rupture, infection, pain, and fatigue as common self-reported issues among the public, as well as concerns about toxicity from silicone implants. CONCLUSIONS Our study could inspire future studies on the suggested symptoms and factors of BII. Our study provides the first analysis and derived knowledge of BII from social media using natural language processing techniques and demonstrates the potential of using social media information to better understand similar emerging illnesses.


Author(s):  
Mohamad Hasan

This paper presents a model to collect, save, geocode, and analyze social media data. The model is used to collect and process social media data concerning the ISIS terrorist group (the Islamic State in Iraq and Syria), and to map the areas in Syria most affected by ISIS according to those data. The mapping process is the automated compilation of a density map of the geocoded tweets. Data mined from social media (e.g., Twitter and Facebook) are recognized as a dynamic and easily accessible resource that can serve as a data source in spatial analysis and geographical information systems. Social media data can be represented as topic data and geocoding data based on the text mined from social media and processed using Natural Language Processing (NLP) methods. NLP is a subdomain of artificial intelligence concerned with programming computers to analyze natural human language and texts. NLP makes it possible to identify the words that the developed geocoding algorithm uses as its initial data. In this study, the needed words were identified using two corpora. The first corpus contained the names of populated places in Syria. The second corpus was compiled as a result of a statistical analysis of the number of tweets, picking the words that have a location meaning (e.g., schools, temples). After identifying the words, the algorithm used the Google Maps geocoding API to obtain the coordinates for posts.
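The geocode-then-map idea described in this abstract can be sketched in a few lines. This is only an illustration under stated assumptions: the `GAZETTEER` dictionary is a hypothetical stand-in for both the place-name corpus and the geocoding API lookup, its coordinates are approximate, and the function names and the 0.5-degree grid are invented for the example rather than taken from the paper.

```python
from collections import Counter

# Hypothetical mini-gazetteer: stands in for the place-name corpus and for
# the geocoding API lookup. Coordinates are approximate (lat, lon).
GAZETTEER = {
    "aleppo": (36.20, 37.16),
    "raqqa": (35.95, 39.01),
    "palmyra": (34.56, 38.28),
}


def geocode_post(text, gazetteer=GAZETTEER):
    """Return (lat, lon) of the first known place name in the text, else None."""
    for token in text.lower().split():
        word = token.strip(".,;:!?\"'()#")
        if word in gazetteer:
            return gazetteer[word]
    return None


def density_grid(posts, cell_deg=0.5):
    """Count geocoded posts per grid cell of size cell_deg degrees."""
    counts = Counter()
    for text in posts:
        coords = geocode_post(text)
        if coords is None:
            continue  # posts without a recognizable place are skipped
        lat, lon = coords
        cell = (int(lat // cell_deg), int(lon // cell_deg))
        counts[cell] += 1
    return counts


posts = [
    "Clashes reported near Raqqa today.",
    "Convoy seen leaving #Raqqa",
    "Aid arrives in Aleppo",
    "No location here",
]
grid = density_grid(posts)
```

The resulting cell counts are the raw material for a density map; a real pipeline would replace the dictionary lookup with a geocoding service and would need to handle multi-word and ambiguous place names.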


2020 ◽  
Vol 6 (1) ◽  
pp. 205630511989732
Author(s):  
Alireza Karduni ◽  
Eric Sauda

Black Lives Matter, like many modern movements in the age of information, makes significant use of social media as well as public space to demand justice. In this article, we study the protests in response to the shooting of Keith Lamont Scott by police in Charlotte, North Carolina, in September 2016. Our goal is to measure the significance of urban space within the virtual and physical network of protesters. Using a mixed-methods approach, we identify and study urban space and social media generated by these protests. We conducted interviews with protesters who were among the first to join the Keith Lamont Scott shooting demonstrations. From the interviews, we identify places that were significant in our interviewees’ narratives. Using a combination of natural language processing and social network analysis, we analyze social media data related to the Charlotte protests retrieved from Twitter. We found that social media, local community, and public space work together to organize and motivate protests and that public events such as protests cause a discernible increase in social media activity. Finally, we find that there are two distinct communities who engage with social media in different ways: one group involved with social media, local community, and urban space, and a second group connected almost exclusively through social media.


10.2196/15861 ◽  
2020 ◽  
Vol 22 (2) ◽  
pp. e15861 ◽  
Author(s):  
Karen O'Connor ◽  
Abeed Sarker ◽  
Jeanmarie Perrone ◽  
Graciela Gonzalez Hernandez

Background Social media data are being increasingly used for population-level health research because they provide near real-time access to large volumes of consumer-generated data. Recently, a number of studies have explored the possibility of using social media data, such as from Twitter, for monitoring prescription medication abuse. However, there is a paucity of annotated data or guidelines for data characterization that discuss how information related to abuse-prone medications is presented on Twitter. Objective This study discusses the creation of an annotated corpus suitable for training supervised classification algorithms for the automatic classification of medication abuse–related chatter. The annotation strategies used for improving interannotator agreement (IAA), a detailed annotation guideline, and machine learning experiments that illustrate the utility of the annotated corpus are also described. Methods We employed an iterative annotation strategy, with interannotator discussions held and updates made to the annotation guidelines at each iteration to improve IAA for the manual annotation task. Using the grounded theory approach, we first characterized tweets into fine-grained categories and then grouped them into 4 broad classes: abuse or misuse, personal consumption, mention, and unrelated. After the completion of manual annotations, we experimented with several machine learning algorithms to illustrate the utility of the corpus and generate baseline performance metrics for automatic classification on these data. Results Our final annotated set consisted of 16,443 tweets mentioning at least 20 abuse-prone medications including opioids, benzodiazepines, atypical antipsychotics, central nervous system stimulants, and gamma-aminobutyric acid analogs. Our final overall IAA was 0.86 (Cohen kappa), which represents high agreement.
The manual annotation process revealed the variety of ways in which prescription medication misuse or abuse is discussed on Twitter, including expressions indicating coingestion, nonmedical use, nonstandard route of intake, and consumption above the prescribed doses. Among machine learning classifiers, support vector machines obtained the highest automatic classification accuracy of 73.00% (95% CI 71.4-74.5) over the test set (n=3271). Conclusions Our manual analysis and annotations of a large number of tweets have revealed types of information posted on Twitter about a set of abuse-prone prescription medications and their distributions. In the interests of reproducible and community-driven research, we have made our detailed annotation guidelines and the training data for the classification experiments publicly available, and the test data will be used in future shared tasks.
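The two statistics quoted in this abstract, Cohen kappa for interannotator agreement and a 95% confidence interval around classification accuracy, can be illustrated with a short sketch. The annotator labels below are made up for the example, and the abstract does not say how its interval was computed; the normal (Wald) approximation used here is an assumption that happens to land close to the reported 71.4-74.5 range.

```python
import math
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen kappa: chance-corrected agreement between two annotators."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if each annotator labeled at random with their
    # own marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label
    return (observed - expected) / (1 - expected)


def accuracy_ci(accuracy, n, z=1.96):
    """Normal-approximation (Wald) confidence interval for an accuracy."""
    half_width = z * math.sqrt(accuracy * (1 - accuracy) / n)
    return accuracy - half_width, accuracy + half_width


# Made-up labels from two hypothetical annotators over six tweets.
annotator_1 = ["abuse", "abuse", "mention", "unrelated", "consumption", "mention"]
annotator_2 = ["abuse", "mention", "mention", "unrelated", "consumption", "mention"]
kappa = cohen_kappa(annotator_1, annotator_2)

# Accuracy and test-set size as reported in the abstract.
low, high = accuracy_ci(0.73, 3271)
```

Kappa subtracts the agreement expected by chance from the raw agreement, which is why it is preferred over simple percent agreement when class frequencies are uneven, as they are across the four classes described here.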


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Jinyan Chen ◽  
Susanne Becken ◽  
Bela Stantic

Purpose This paper aims to examine key parameters of scholarly context and geographic focus and provide an assessment of theoretical underpinnings of studies in the field of social media and visitor mobility. This review also summarises the characteristics of social media data, including how data are collected from different social media platforms and their advantages and limitations. The stocktake of research in this field was completed by examining technologies and applied methods that supported different research questions. Design/methodology/approach This literature review applied a mix of methods. It analysed 82 journal articles on using social media to track visitors’ movements between 2014 and November 2020. The review compared the different social media platforms, discussed currently applied theories and available technologies, analysed current trends and provided advice for future directions. Findings This review provides a state-of-the-art assessment of the research to date on tourist mobility analysed using social media data. The diversity of scales (with a dominant focus on the city-scale), platforms and methods highlights that this field is emerging, but it also reflects the complexity of the tourism phenomenon. This review identified a lack of theory in this field, and it points to ongoing challenges in ensuring appropriate use of data (e.g. differentiating travellers from residents) and the ethics surrounding them. Originality/value The findings guide researchers, especially those with no computer science background, on the different types of approaches, data sources and methods available for tracking tourist mobility by harnessing social media. Depending on the particular research interest, different tools for processing and visualization are available.


2020 ◽  
Author(s):  
Oladapo Oyebode ◽  
Chinenye Ndulue ◽  
Ashfaq Adib ◽  
Dinesh Mulchandani ◽  
Banuchitra Suruliraj ◽  
...  

BACKGROUND The COVID-19 pandemic has caused a global health crisis that affects many aspects of human lives. In the absence of vaccines and antivirals, several behavioural change and policy initiatives, such as physical distancing, have been implemented to control the spread of the coronavirus. Social media data can reveal public perceptions toward how governments and health agencies across the globe are handling the pandemic, as well as the impact of the disease on people regardless of their geographic locations in line with various factors that hinder or facilitate the efforts to control the spread of the pandemic globally. OBJECTIVE This paper aims to investigate the impact of the COVID-19 pandemic on people globally using social media data. METHODS We apply natural language processing (NLP) and thematic analysis to understand public opinions, experiences, and issues with respect to the COVID-19 pandemic using social media data. First, we collect over 47 million COVID-19-related comments from Twitter, Facebook, YouTube, and three online discussion forums. Second, we perform data preprocessing, which involves applying NLP techniques to clean and prepare the data for automated theme extraction. Third, we apply a context-aware NLP approach to extract meaningful keyphrases or themes from over 1 million randomly selected comments, compute sentiment scores for each theme, and assign sentiment polarity (i.e., positive, negative, or neutral) based on the scores using a lexicon-based technique. Fourth, we categorize related themes into broader themes. RESULTS A total of 34 negative themes emerged, out of which 15 are health-related issues, psychosocial issues, and social issues related to the COVID-19 pandemic from the public perspective.
Some of the health-related issues are increased mortality, health concerns, struggling health systems, and fitness issues; while some of the psychosocial issues include frustrations due to life disruptions, panic shopping, and expression of fear. Social issues include harassment, domestic violence, and wrong societal attitude. In addition, 20 positive themes emerged from our results. Some of the positive themes include public awareness, encouragement, gratitude, cleaner environment, online learning, charity, spiritual support, and innovative research. CONCLUSIONS We uncover various negative and positive themes representing public perceptions toward the COVID-19 pandemic and recommend interventions that can help address the health, psychosocial, and social issues based on the positive themes and other remedial ideas rooted in research. These interventions will help governments, health professionals and agencies, institutions, and individuals in their efforts to curb the spread of COVID-19 and minimize its impact, as well as in reacting to any future pandemics.
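The lexicon-based polarity step mentioned in the methods can be sketched as follows. Everything specific here is an assumption made for illustration: the tiny `LEXICON`, its scores, the averaging rule, and the 0.05 threshold are hypothetical and are not the lexicon or scoring scheme used in the study.

```python
# Hypothetical sentiment lexicon: word -> score in [-2, 2].
LEXICON = {
    "gratitude": 2.0,
    "encouragement": 1.5,
    "support": 1.0,
    "panic": -1.5,
    "fear": -2.0,
    "harassment": -2.0,
}


def theme_score(theme, lexicon=LEXICON):
    """Average the lexicon scores of the words in a theme; 0.0 if none match."""
    hits = [lexicon[w] for w in theme.lower().split() if w in lexicon]
    return sum(hits) / len(hits) if hits else 0.0


def polarity(score, threshold=0.05):
    """Map a numeric score to positive, negative, or neutral."""
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"


themes = ["panic shopping", "spiritual support", "daily update"]
labels = {t: polarity(theme_score(t)) for t in themes}
```

A theme with no lexicon words falls back to neutral, which shows the main limitation of a purely lexicon-based approach: coverage depends entirely on the word list.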


Author(s):  
Juan M. Banda ◽  
Gurdas Viguruji Singh ◽  
Osaid Alser ◽  
Daniel Prieto-Alhambra

As the COVID-19 virus continues to infect people across the globe, there is little understanding of the long-term implications for recovered patients. There have been reports of symptoms persisting in patients after confirmed infections, even three months after initial recovery. While some of these patients have documented follow-ups in clinical records or participate in longitudinal surveys, these datasets are usually not publicly available or standardized enough to support longitudinal analyses. Therefore, there is a need to use additional data sources for continued follow-up and identification of latent symptoms that might be underreported elsewhere. In this work, we present a preliminary characterization of post-COVID-19 symptoms using social media data from Twitter. We use a combination of natural language processing and clinician reviews to identify long-term self-reported symptoms in a set of Twitter users.


10.2196/18767 ◽  
2020 ◽  
Vol 22 (12) ◽  
pp. e18767
Author(s):  
Jooyun Lee ◽  
Hyeoun-Ae Park ◽  
Seul Ki Park ◽  
Tae-Min Song

Background Analysis of posts on social media is effective in investigating health information needs for disease management and identifying people’s emotional status related to disease. An ontology is needed for semantic analysis of social media data. Objective This study was performed to develop a cancer ontology with terminology containing consumer terms and to analyze social media data to identify health information needs and emotions related to cancer. Methods A cancer ontology was developed using social media data, collected with a crawler, from online communities and blogs between January 1, 2014 and June 30, 2017 in South Korea. The relative frequencies of posts containing ontology concepts were counted and compared by cancer type. Results The ontology had 9 superclasses, 213 class concepts, and 4061 synonyms. Ontology-driven natural language processing was performed on the text from 754,744 cancer-related posts. Colon, breast, stomach, cervical, lung, liver, pancreatic, and prostate cancer; brain tumors; and leukemia appeared most in these posts. At the superclass level, risk factor was the most frequent, followed by emotions, symptoms, treatments, and dealing with cancer. Conclusions Information needs and emotions differed according to cancer type. The observations of this study could be used to provide tailored information to consumers according to cancer type and care process. Attention should be paid to provision of cancer-related information to not only patients but also their families and the general public seeking information on cancer.
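The counting step in the methods, relative frequencies of posts containing ontology concepts compared by cancer type, can be sketched as below. The `SYNONYMS` table is a hypothetical, English, single-word stand-in for the ontology's consumer-term synonyms, and the example posts are invented; the study's actual ontology and matching rules are not reproduced here.

```python
from collections import defaultdict

# Hypothetical synonym table: consumer term -> ontology superclass.
SYNONYMS = {
    "chemo": "treatment",
    "chemotherapy": "treatment",
    "scared": "emotion",
    "anxious": "emotion",
    "smoking": "risk factor",
}


def concepts_in(text):
    """Return the set of ontology concepts mentioned in one post."""
    words = {w.strip(".,!?") for w in text.lower().split()}
    return {SYNONYMS[w] for w in words if w in SYNONYMS}


def relative_frequencies(posts):
    """posts: iterable of (cancer_type, text) pairs.

    Returns {cancer_type: {concept: share of that type's posts mentioning it}}.
    """
    totals = defaultdict(int)
    hits = defaultdict(lambda: defaultdict(int))
    for cancer_type, text in posts:
        totals[cancer_type] += 1
        for concept in concepts_in(text):  # each post counts once per concept
            hits[cancer_type][concept] += 1
    return {
        ct: {c: k / totals[ct] for c, k in hits[ct].items()}
        for ct in totals
    }


posts = [
    ("lung", "Is smoking the cause? I am scared."),
    ("lung", "Starting chemo next week, anxious."),
    ("breast", "Chemotherapy went well."),
]
freqs = relative_frequencies(posts)
```

Dividing by the number of posts for each cancer type, rather than by the overall total, is what makes the shares comparable across cancer types with very different post volumes.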

