scholarly journals Forty-two Million Ways to Describe Pain: Topic Modeling of 200,000 PubMed Pain-Related Abstracts Using Natural Language Processing and Deep Learning–Based Text Generation

Pain Medicine ◽  
2020 ◽  
Vol 21 (11) ◽  
pp. 3133-3160
Author(s):  
Patrick J Tighe ◽  
Bharadwaj Sannapaneni ◽  
Roger B Fillingim ◽  
Charlie Doyle ◽  
Michael Kent ◽  
...  

Abstract Objective Recent efforts to update the definitions and taxonomic structure of concepts related to pain have revealed opportunities to better quantify topics of existing pain research subject areas. Methods Here, we apply basic natural language processing (NLP) analyses on a corpus of >200,000 abstracts published on PubMed under the medical subject heading (MeSH) of “pain” to quantify the topics, content, and themes on pain-related research dating back to the 1940s. Results The most common stemmed terms included “pain” (601,122 occurrences), “patient” (508,064 occurrences), and “studi-” (208,839 occurrences). Contrarily, terms with the highest term frequency–inverse document frequency included “tmd” (6.21), “qol” (6.01), and “endometriosis” (5.94). Using the vector-embedded model of term definitions available via the “word2vec” technique, the most similar terms to “pain” included “discomfort,” “symptom,” and “pain-related.” For the term “acute,” the most similar terms in the word2vec vector space included “nonspecific,” “vaso-occlusive,” and “subacute”; for the term “chronic,” the most similar terms included “persistent,” “longstanding,” and “long-standing.” Topic modeling via Latent Dirichlet analysis identified peak coherence (0.49) at 40 topics. Network analysis of these topic models identified three topics that were outliers from the core cluster, two of which pertained to women’s health and obstetrics and were closely connected to one another, yet considered distant from the third outlier pertaining to age. A deep learning–based gated recurrent units abstract generation model successfully synthesized several unique abstracts with varying levels of believability, with special attention and some confusion at lower temperatures to the roles of placebo in randomized controlled trials. Conclusions Quantitative NLP models of published abstracts pertaining to pain may point to trends and gaps within pain research communities.

Author(s):  
Sameerah Talafha ◽  
Banafsheh Rekabdar

Arabic poetry generation is a very challenging task since the linguistic structure of the Arabic language is considered a severe challenge for many researchers and developers in the Natural Language Processing (NLP) field. In this paper, we propose a poetry generation model with extended phonetic and semantic embeddings (Phonetic CNNsubword embeddings). We show that Phonetic CNNsubword embeddings have an effective contribution to the overall model performance compared to FastTextsubword embeddings. Our poetry generation model consists of a two-stage approach: (1.) generating the first verse which explicitly incorporates the theme related phrase, (2.) other verses generation with the proposed Hierarchy-Attention Sequence-to-Sequence model (HAS2S), which adequately capture word, phrase, and verse information between contexts. A comprehensive human evaluation confirms that the poems generated by our model outperform the base models in criteria such as Meaning, Coherence, Fluency, and Poeticness. Extensive quantitative experiments using Bi-Lingual Evaluation Understudy (BLEU) scores also demonstrate significant improvements over strong baselines.


Author(s):  
K.G.C.M Kooragama ◽  
L.R.W.D. Jayashanka ◽  
J.A. Munasinghe ◽  
K.W. Jayawardana ◽  
Muditha Tissera ◽  
...  

2021 ◽  
Author(s):  
Dilith Sasanka ◽  
H. K. N Malshani ◽  
Uchitha I. Wickramaratne ◽  
Yashmitha Kavindi ◽  
Muditha Tissera ◽  
...  

2015 ◽  
Vol 54 (04) ◽  
pp. 338-345 ◽  
Author(s):  
A. Fong ◽  
R. Ratwani

SummaryObjective: Patient safety event data repositories have the potential to dramatically improve safety if analyzed and leveraged appropriately. These safety event reports often consist of both structured data, such as general event type categories, and unstructured data, such as free text descriptions of the event. Analyzing these data, particularly the rich free text narratives, can be challenging, especially with tens of thousands of reports. To overcome the resource intensive manual review process of the free text descriptions, we demonstrate the effectiveness of using an unsupervised natural language processing approach.Methods: An unsupervised natural language processing technique, called topic modeling, was applied to a large repository of patient safety event data to identify topics, or themes, from the free text descriptions of the data. Entropy measures were used to evaluate and compare these topics to the general event type categories that were originally assigned by the event reporter.Results: Measures of entropy demonstrated that some topics generated from the un-supervised modeling approach aligned with the clinical general event type categories that were originally selected by the individual entering the report. Importantly, several new latent topics emerged that were not originally identified. The new topics provide additional insights into the patient safety event data that would not otherwise easily be detected.Conclusion: The topic modeling approach provides a method to identify topics or themes that may not be immediately apparent and has the potential to allow for automatic reclassification of events that are ambiguously classified by the event reporter.


10.2196/23230 ◽  
2021 ◽  
Vol 9 (8) ◽  
pp. e23230
Author(s):  
Pei-Fu Chen ◽  
Ssu-Ming Wang ◽  
Wei-Chih Liao ◽  
Lu-Cheng Kuo ◽  
Kuan-Chih Chen ◽  
...  

Background The International Classification of Diseases (ICD) code is widely used as the reference in medical system and billing purposes. However, classifying diseases into ICD codes still mainly relies on humans reading a large amount of written material as the basis for coding. Coding is both laborious and time-consuming. Since the conversion of ICD-9 to ICD-10, the coding task became much more complicated, and deep learning– and natural language processing–related approaches have been studied to assist disease coders. Objective This paper aims at constructing a deep learning model for ICD-10 coding, where the model is meant to automatically determine the corresponding diagnosis and procedure codes based solely on free-text medical notes to improve accuracy and reduce human effort. Methods We used diagnosis records of the National Taiwan University Hospital as resources and apply natural language processing techniques, including global vectors, word to vectors, embeddings from language models, bidirectional encoder representations from transformers, and single head attention recurrent neural network, on the deep neural network architecture to implement ICD-10 auto-coding. Besides, we introduced the attention mechanism into the classification model to extract the keywords from diagnoses and visualize the coding reference for training freshmen in ICD-10. Sixty discharge notes were randomly selected to examine the change in the F1-score and the coding time by coders before and after using our model. Results In experiments on the medical data set of National Taiwan University Hospital, our prediction results revealed F1-scores of 0.715 and 0.618 for the ICD-10 Clinical Modification code and Procedure Coding System code, respectively, with a bidirectional encoder representations from transformers embedding approach in the Gated Recurrent Unit classification model. The well-trained models were applied on the ICD-10 web service for coding and training to ICD-10 users. With this service, coders can code with the F1-score significantly increased from a median of 0.832 to 0.922 (P<.05), but not in a reduced interval. Conclusions The proposed model significantly improved the F1-score but did not decrease the time consumed in coding by disease coders.


Sign in / Sign up

Export Citation Format

Share Document