Using Named Entity Recognition to Identify Substances Used in Self-Medication of Opioid Withdrawal: Natural Language Processing Study of Reddit Data (Preprint)

2021 ◽  
Author(s):  
Alexander Preiss ◽  
Peter Baumgartner ◽  
Mark J Edlund ◽  
Georgiy V Bobashev

BACKGROUND Abrupt cessation of opioid use can cause withdrawal symptoms, ranging from moderate to severe. People often continue opioid misuse to avoid these symptoms. Many people who use opioids self-treat withdrawal symptoms with a wide range of substances, some of which could help and some potentially harm. Little is known about the substances people use or their effects.

OBJECTIVE To validate a methodology for identifying substances used to treat symptoms of opioid withdrawal by a community of people who use opioids on the social media site Reddit.

METHODS We developed a named entity recognition model and used it to extract substances and effects from nearly 4 million comments from the r/opiates and r/OpiatesRecovery subreddits. We categorized effects as (1) DSM-5 symptoms of opioid withdrawal, e.g., body aches, (2) effects of opioid use, e.g., euphoria, (3) neither, or (4) other. In this analysis, we focused on those effects which are symptoms of opioid withdrawal and substances which are potential remedies for those withdrawal symptoms. To identify these subsets, we began by deduplicating substances and effects using a combination of clustering on word embeddings and manual review. We then built a bipartite network of substance and effect co-occurrence. For each of 16 effects identified as DSM-5 symptoms of opioid withdrawal, we identified the top 10 substances most strongly associated with the effect, based on a weighted average of edge count and positive pointwise mutual information. We classified these symptom and potential remedy pairs as (1) common treatments, (2) not accepted practice but potentially useful, (3) natural/home remedies, (4) causes, or (5) other. We developed the Withdrawal Remedy Explorer app to facilitate further exploration of the data.

RESULTS Our named entity recognition model achieved F1 scores of 92.1 (substances) and 81.7 (effects) on holdout data. After deduplication, we identified 458 unique substances and 253 unique effects. Of 130 potential remedies strongly associated with withdrawal symptoms, 41.54% were common, accepted treatments for the symptom; 13.08% were not accepted practice, but could be useful given their pharmacology; 10.00% were natural/home remedies; 5.38% were causes of the symptom; and 30.00% were other. We identified both potentially promising new remedies (e.g., gabapentin for body aches) and potentially common but harmful remedies (e.g., antihistamines for restless leg syndrome).

CONCLUSIONS Social media is a promising source of data on self-medication of opioid withdrawal. Many of the withdrawal remedies discussed by Reddit users are either clinically proven or potentially useful. These results suggest that this methodology is a valid way to study the self-treatment behavior of an online community of people who use opioids. Our Withdrawal Remedy Explorer app provides a platform to use this data for pharmacovigilance, identification of new treatments, and better understanding the needs of people undergoing opioid withdrawal. Furthermore, this approach could be applied to many other disease states where people self-manage their symptoms (to any degree) and discuss their experiences online.
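
The ranking step in METHODS (edge count combined with positive pointwise mutual information over a substance and effect co-occurrence network) can be illustrated with a minimal Python sketch. The abstract does not give the weights or any normalization, so the equal weighting (`alpha=0.5`), the per-effect max normalization, and the function names `ppmi_scores` and `top_substances` are assumptions made for illustration, not the authors' implementation. The counts in the usage note are toy values, not data from the study.

```python
import math
from collections import Counter


def ppmi_scores(pairs):
    """Return (edge counts, PPMI) for (substance, effect) co-occurrences.

    `pairs` holds one (substance, effect) tuple per co-mention, i.e. one
    unit of edge weight in the bipartite network.
    """
    pair_counts = Counter(pairs)
    total = sum(pair_counts.values())
    substance_counts = Counter()
    effect_counts = Counter()
    for (substance, effect), count in pair_counts.items():
        substance_counts[substance] += count
        effect_counts[effect] += count

    scores = {}
    for (substance, effect), count in pair_counts.items():
        # PMI = log2( p(s, e) / (p(s) * p(e)) ), clipped at zero for PPMI.
        pmi = math.log2(
            (count * total)
            / (substance_counts[substance] * effect_counts[effect])
        )
        scores[(substance, effect)] = max(pmi, 0.0)
    return pair_counts, scores


def top_substances(pairs, effect, k=10, alpha=0.5):
    """Rank substances for one effect by a weighted average of edge count
    and PPMI. ASSUMPTION: both terms are scaled to [0, 1] by their maximum
    for this effect, and alpha=0.5 weights them equally."""
    pair_counts, scores = ppmi_scores(pairs)
    candidates = [(s, c) for (s, e), c in pair_counts.items() if e == effect]
    if not candidates:
        return []
    max_count = max(c for _, c in candidates)
    # Fall back to 1.0 so an all-zero PPMI column does not divide by zero.
    max_ppmi = max(scores[(s, effect)] for s, _ in candidates) or 1.0

    def combined(item):
        substance, count = item
        return (alpha * count / max_count
                + (1 - alpha) * scores[(substance, effect)] / max_ppmi)

    ranked = sorted(candidates, key=combined, reverse=True)
    return [substance for substance, _ in ranked[:k]]
```

With toy input such as three co-mentions of ("gabapentin", "body aches"), two of ("antihistamines", "restless leg syndrome"), and one of ("gabapentin", "restless leg syndrome"), calling `top_substances(pairs, "restless leg syndrome")` ranks antihistamines ahead of gabapentin, because gabapentin's PMI with that effect is negative and is clipped to zero.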

2020 ◽  
Author(s):  
Shintaro Tsuji ◽  
Andrew Wen ◽  
Naoki Takahashi ◽  
Hongjian Zhang ◽  
Katsuhiko Ogasawara ◽  
...  

BACKGROUND Named entity recognition (NER) plays an important role in extracting the features of descriptions for mining free-text radiology reports. However, the performance of existing NER tools is limited because the number of entities they can recognize depends on dictionary lookup. Recognizing compound terms is especially difficult because they follow a variety of patterns.

OBJECTIVE The objective of the study is to develop and evaluate an NER tool that handles compound terms, using RadLex, for mining free-text radiology reports.

METHODS We leveraged the clinical Text Analysis and Knowledge Extraction System (cTAKES) to develop customized pipelines using both RadLex and SentiWordNet (a general-purpose dictionary, GPD). We manually annotated 400 radiology reports for compound terms (Cts) in noun phrases and used them as the gold standard for the performance evaluation (precision, recall, and F-measure). We also created a compound-term-enhanced dictionary (CtED) by analyzing false negatives (FNs) and false positives (FPs), and applied it to another 100 radiology reports for validation. Finally, we evaluated the stem terms of compound terms by defining two measures: an occurrence ratio (OR) and a matching ratio (MR).

RESULTS The F-measure of cTAKES+RadLex+GPD was 32.2% (Precision 92.1%, Recall 19.6%), and that of the pipeline combined with the CtED was 67.1% (Precision 98.1%, Recall 51.0%). The OR indicated that the stem terms "effusion", "node", "tube", and "disease" were used frequently, but the tool still fell short in capturing Cts. The MR showed that 71.9% of stem terms matched those of the ontologies, and RadLex improved the MR by about 22% over the cTAKES default dictionary. The OR and MR revealed that the characteristics of stem terms could help generate synonymous phrases using ontologies.

CONCLUSIONS We developed a RadLex-based customized pipeline for parsing radiology reports and demonstrated that the CtED and stem term analysis have the potential to improve dictionary-based NER performance toward expanding vocabularies.
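
The F-measure values reported in RESULTS can be checked against the stated precision and recall with a short Python sketch. It assumes the balanced F-measure (F1, the harmonic mean of precision and recall), which the abstract does not state explicitly; the function name `f_measure` is illustrative.

```python
def f_measure(precision, recall, beta=1.0):
    """F-beta score; beta=1.0 gives the harmonic mean of precision and recall.

    ASSUMPTION: the abstract's "F-measure" is the balanced F1 score.
    """
    if precision == 0.0 and recall == 0.0:
        return 0.0
    beta_sq = beta * beta
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


# Precision and recall as reported in the abstract.
baseline = f_measure(0.921, 0.196)   # cTAKES+RadLex+GPD
with_cted = f_measure(0.981, 0.510)  # combined with the CtED

print(f"baseline:  {baseline:.4f}")
print(f"with CtED: {with_cted:.4f}")
```

Under that assumption the second value comes out at about 0.671, matching the reported 67.1%, and the first at about 0.323, which agrees with the reported 32.2% to within the rounding of the published precision and recall.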


Author(s):  
Aditya Kiran Brahma ◽  
Prathyush Potluri ◽  
Meghana Kanapaneni ◽  
Sumanth Prabhu ◽  
Sundeep Teki

Data ◽ 2021 ◽ Vol 6 (7) ◽ pp. 71
Author(s):  
Gonçalo Carnaz ◽  
Mário Antunes ◽  
Vitor Beires Nogueira

Criminal investigations collect and analyze the facts related to a crime, from which the investigators can deduce evidence to be used in court. Criminal investigation is a multidisciplinary applied science that includes interviews, interrogations, evidence collection, preservation of the chain of custody, and other methods and techniques of investigation. These techniques produce both digital and paper documents that have to be carefully analyzed to identify correlations and interactions among suspects, places, license plates, and other entities that are mentioned in the investigation. The computerized processing of these documents supports the criminal investigation, as it allows the automatic identification of entities and their relations, some of which are difficult to identify manually. A wide set of dedicated tools exists, but they have a major limitation: they are unable to process criminal reports in the Portuguese language, as an annotated corpus for that purpose does not exist. This paper presents an annotated corpus, composed of a collection of anonymized crime-related documents, which were extracted from official and open sources. The dataset was produced as the result of an exploratory initiative to collect crime-related data from websites and conditioned-access police reports. The dataset was evaluated, and a mean precision of 0.808, recall of 0.722, and F1-score of 0.733 were obtained for the classification of the annotated named entities present in the crime-related documents. This corpus can be employed to benchmark Machine Learning (ML) and Natural Language Processing (NLP) methods and tools to detect and correlate entities in the documents. Some examples are sentence detection, named-entity recognition, and identification of terms related to the criminal domain.

