A deep database of medical abbreviations and acronyms for natural language processing

2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Lisa Grossman Liu ◽  
Raymond H. Grossman ◽  
Elliot G. Mitchell ◽  
Chunhua Weng ◽  
Karthik Natarajan ◽  
...  

Abstract The recognition, disambiguation, and expansion of medical abbreviations and acronyms are of utmost importance to prevent medically dangerous misinterpretation in natural language processing. To support recognition, disambiguation, and expansion, we present the Medical Abbreviation and Acronym Meta-Inventory, a deep database of medical abbreviations. A systematic harmonization of eight source inventories across multiple healthcare specialties and settings identified 104,057 abbreviations with 170,426 corresponding senses. Automated cross-mapping of synonymous records using state-of-the-art machine learning reduced redundancy, which simplifies future application. Additional features include semi-automated quality control to remove errors. The Meta-Inventory demonstrated high completeness or coverage of abbreviations and senses in new clinical text, a substantial improvement over the next largest repository (6–14% increase in abbreviation coverage; 28–52% increase in sense coverage). To our knowledge, the Meta-Inventory is the most complete compilation of medical abbreviations and acronyms in American English to date. The multiple sources and high coverage support application in varied specialties and settings. This allows for cross-institutional natural language processing, which previous inventories did not support. The Meta-Inventory is available at https://bit.ly/github-clinical-abbreviations.
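
The coverage figures reported in this abstract can be illustrated with a minimal sketch. It assumes coverage means the fraction of distinct abbreviations (or abbreviation–sense pairs) seen in new text that also appear in an inventory; the paper's exact definition may differ. The toy inventory, the annotated pairs, and the `coverage` helper are all hypothetical and are not taken from the Meta-Inventory itself.

```python
def coverage(observed, inventory):
    """Fraction of distinct observed items that also appear in the inventory."""
    observed_set = set(observed)
    if not observed_set:
        return 0.0
    return len(observed_set & set(inventory)) / len(observed_set)


# Hypothetical toy inventory: abbreviation -> set of known senses.
inventory = {
    "MI": {"myocardial infarction", "mitral insufficiency"},
    "CHF": {"congestive heart failure"},
    "PT": {"physical therapy", "prothrombin time"},
}

# Hypothetical (abbreviation, sense) pairs annotated in new clinical text.
observed_pairs = [
    ("MI", "myocardial infarction"),
    ("PT", "patient"),
    ("RA", "rheumatoid arthritis"),
    ("CHF", "congestive heart failure"),
]

# Abbreviation coverage: is the short form itself in the inventory?
abbrev_cov = coverage([abbr for abbr, _ in observed_pairs], inventory.keys())

# Sense coverage: is the exact (abbreviation, sense) pair in the inventory?
known_pairs = {(abbr, sense) for abbr, senses in inventory.items() for sense in senses}
sense_cov = coverage(observed_pairs, known_pairs)

print(f"abbreviation coverage: {abbrev_cov:.2f}")
print(f"sense coverage: {sense_cov:.2f}")
```

Sense coverage is never higher than abbreviation coverage under this definition, because a pair can only match when its abbreviation is already in the inventory.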

Heart ◽  
2021 ◽  
pp. heartjnl-2021-319769
Author(s):  
Meghan Reading Turchioe ◽  
Alexander Volodarskiy ◽  
Jyotishman Pathak ◽  
Drew N Wright ◽  
James Enlou Tcheng ◽  
...  

Natural language processing (NLP) is a set of automated methods to organise and evaluate the information contained in unstructured clinical notes, which are a rich source of real-world data from clinical care that may be used to improve outcomes and understanding of disease in cardiology. The purpose of this systematic review is to provide an understanding of NLP, review how it has been used to date within cardiology and illustrate the opportunities that this approach provides for both research and clinical care. We systematically searched six scholarly databases (ACM Digital Library, arXiv, Embase, IEEE Xplore, PubMed and Scopus) for studies published in 2015–2020 describing the development or application of NLP methods for clinical text focused on cardiac disease. Studies not published in English, lacking a description of NLP methods, non-cardiac focused and duplicates were excluded. Two independent reviewers extracted general study information, clinical details and NLP details and appraised quality using a checklist of quality indicators for NLP studies. We identified 37 studies developing and applying NLP in heart failure, imaging, coronary artery disease, electrophysiology, general cardiology and valvular heart disease. Most studies used NLP to identify patients with a specific diagnosis and extract disease severity using rule-based NLP methods. Some used NLP algorithms to predict clinical outcomes. A major limitation is the inability to aggregate findings across studies due to vastly different NLP methods, evaluation and reporting. This review reveals numerous opportunities for future NLP work in cardiology with more diverse patient samples, cardiac diseases, datasets, methods and applications.


2019 ◽  
Vol 26 (11) ◽  
pp. 1272-1278 ◽  
Author(s):  
Dmitriy Dligach ◽  
Majid Afshar ◽  
Timothy Miller

Abstract Objective Our objective is to develop algorithms for encoding clinical text into representations that can be used for a variety of phenotyping tasks. Materials and Methods Obtaining large datasets to take advantage of highly expressive deep learning methods is difficult in clinical natural language processing (NLP). We address this difficulty by pretraining a clinical text encoder on billing code data, which is typically available in abundance. We explore several neural encoder architectures and deploy the text representations obtained from these encoders in the context of clinical text classification tasks. While our ultimate goal is learning a universal clinical text encoder, we also experiment with training a phenotype-specific encoder. A universal encoder would be more practical, but a phenotype-specific encoder could perform better for a specific task. Results We successfully train several clinical text encoders, establish a new state-of-the-art on comorbidity data, and observe good performance gains on substance misuse data. Discussion We find that pretraining using billing codes is a promising research direction. The representations generated by this type of pretraining have universal properties, as they are highly beneficial for many phenotyping tasks. Phenotype-specific pretraining is a viable route for trading the generality of the pretrained encoder for better performance on a specific phenotyping task. Conclusions We successfully applied our approach to many phenotyping tasks. We conclude by discussing potential limitations of our approach.


2017 ◽  
Vol 25 (3) ◽  
pp. 331-336 ◽  
Author(s):  
Ergin Soysal ◽  
Jingqi Wang ◽  
Min Jiang ◽  
Yonghui Wu ◽  
Serguei Pakhomov ◽  
...  

Abstract Existing general clinical natural language processing (NLP) systems such as MetaMap and Clinical Text Analysis and Knowledge Extraction System have been successfully applied to information extraction from clinical text. However, end users often have to customize existing systems for their individual tasks, which can require substantial NLP skills. Here we present CLAMP (Clinical Language Annotation, Modeling, and Processing), a newly developed clinical NLP toolkit that provides not only state-of-the-art NLP components, but also a user-friendly graphic user interface that can help users quickly build customized NLP pipelines for their individual applications. Our evaluation shows that the CLAMP default pipeline achieved good performance on named entity recognition and concept encoding. We also demonstrate the efficiency of the CLAMP graphic user interface in building customized, high-performance NLP pipelines with 2 use cases, extracting smoking status and lab test values. CLAMP is publicly available for research use, and we believe it is a unique asset for the clinical NLP community.


Author(s):  
Yanshan Wang ◽  
Sunyang Fu ◽  
Feichen Shen ◽  
Sam Henry ◽  
Ozlem Uzuner ◽  
...  

BACKGROUND Semantic textual similarity is a common task in the general English domain to assess the degree to which the underlying semantics of 2 text segments are equivalent to each other. Clinical Semantic Textual Similarity (ClinicalSTS) is the semantic textual similarity task in the clinical domain that attempts to measure the degree of semantic equivalence between 2 snippets of clinical text. Due to the frequent use of templates in the Electronic Health Record system, a large amount of redundant text exists in clinical notes, making ClinicalSTS crucial for the secondary use of clinical text in downstream clinical natural language processing applications, such as clinical text summarization, clinical semantics extraction, and clinical information retrieval. OBJECTIVE Our objective was to release ClinicalSTS data sets and to motivate natural language processing and biomedical informatics communities to tackle semantic text similarity tasks in the clinical domain. METHODS We organized the first BioCreative/OHNLP ClinicalSTS shared task in 2018 by making available a real-world ClinicalSTS data set. We continued the shared task in 2019 in collaboration with National NLP Clinical Challenges (n2c2) and the Open Health Natural Language Processing (OHNLP) consortium and organized the 2019 n2c2/OHNLP ClinicalSTS track. We released a larger ClinicalSTS data set comprising 1642 clinical sentence pairs, including 1068 pairs from the 2018 shared task and 1006 new pairs from 2 electronic health record systems, GE and Epic. We released 80% (1642/2054) of the data to participating teams to develop and fine-tune the semantic textual similarity systems and used the remaining 20% (412/2054) as blind testing to evaluate their systems. The workshop was held in conjunction with the American Medical Informatics Association 2019 Annual Symposium. 
RESULTS Of the 78 international teams that signed on to the n2c2/OHNLP ClinicalSTS shared task, 33 produced a total of 87 valid system submissions. The top 3 systems were generated by IBM Research, the National Center for Biotechnology Information, and the University of Florida, with Pearson correlations of r=.9010, r=.8967, and r=.8864, respectively. Most top-performing systems used state-of-the-art neural language models, such as BERT and XLNet, and state-of-the-art training schemas in deep learning, such as pretraining and fine-tuning schema, and multitask learning. Overall, the participating systems performed better on the Epic sentence pairs than on the GE sentence pairs, despite a much larger portion of the training data being GE sentence pairs. CONCLUSIONS The 2019 n2c2/OHNLP ClinicalSTS shared task focused on computing semantic similarity for clinical text sentences generated from clinical notes in the real world. It attracted a large number of international teams. The ClinicalSTS shared task could continue to serve as a venue for researchers in natural language processing and medical informatics communities to develop and improve semantic textual similarity techniques for clinical text.
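
The systems in this shared task were ranked by Pearson correlation between predicted and gold similarity scores. The following is a minimal sketch of that metric in plain Python. The `pearson_r` helper and the score lists are hypothetical illustrations, not the organizers' evaluation script, and the 0 to 5 score range used here is an assumption.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of centered products and centered squares.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical gold and predicted similarity scores for five sentence pairs.
gold = [0.0, 1.5, 3.0, 4.5, 5.0]
predicted = [0.5, 1.0, 3.5, 4.0, 5.0]

print(f"r = {pearson_r(gold, predicted):.4f}")
```

Because Pearson correlation is invariant to linear rescaling, a system whose scores are consistently shifted or stretched relative to the gold scores is not penalized by this metric.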


2021 ◽  
Author(s):  
Marika Cusick ◽  
Sumithra Velupillai ◽  
Johnny Downs ◽  
Thomas Campion ◽  
Rina Dutta ◽  
...  

Abstract In the global effort to prevent death by suicide, many academic medical institutions are implementing natural language processing (NLP) approaches to detect suicidality from unstructured clinical text in electronic health records (EHRs), with the hope of targeting timely, preventative interventions to individuals most at risk of suicide. Despite the international need, the development of these NLP approaches in EHRs has been largely local and not shared across healthcare systems. In this study, we developed a process to share NLP approaches that were individually developed at King’s College London (KCL), UK and Weill Cornell Medicine (WCM), US - two academic medical centers based in different countries with vastly different healthcare systems. After a successful technical porting of the NLP approaches, our quantitative evaluation determined that independently developed NLP approaches can detect suicidality at another healthcare organization with a different EHR system, clinical documentation processes, and culture, yet do not achieve the same level of success as at the institution where the NLP algorithm was developed (KCL approach: F1-score 0.85 vs. 0.68, WCM approach: F1-score 0.87 vs. 0.72). Shared use of these NLP approaches is a critical step forward towards improving data-driven algorithms for early suicide risk identification and timely prevention.
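
The F1-scores quoted in this abstract combine precision and recall into a single number. As a minimal sketch, F1 can be computed from true positive, false positive, and false negative counts as shown here. The `f1_score` helper and the counts are hypothetical and do not reproduce the study's data.

```python
def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall, from raw counts."""
    if tp == 0:
        # No true positives: precision and recall are both zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for one classifier evaluated at two sites.
home_site = f1_score(tp=80, fp=20, fn=20)
other_site = f1_score(tp=60, fp=30, fn=40)

print(f"home site F1: {home_site:.2f}, other site F1: {other_site:.2f}")
```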


2019 ◽  
Vol 28 (8) ◽  
pp. 1143-1151 ◽  
Author(s):  
Brian Hazlehurst ◽  
Carla A. Green ◽  
Nancy A. Perrin ◽  
John Brandes ◽  
David S. Carrell ◽  
...  

2020 ◽  
Vol 34 (02) ◽  
pp. 1741-1748 ◽  
Author(s):  
Meng-Hsuan Yu ◽  
Juntao Li ◽  
Danyang Liu ◽  
Dongyan Zhao ◽  
Rui Yan ◽  
...  

Automatic storytelling has consistently been a challenging area in the field of natural language processing. Although considerable achievements have been made, the gap between automatically generated stories and human-written stories is still significant. Moreover, the limitations of existing automatic storytelling methods are obvious, e.g., in content consistency and wording diversity. In this paper, we propose a multi-pass hierarchical conditional variational autoencoder model to overcome the challenges and limitations in existing automatic storytelling models. While the conditional variational autoencoder (CVAE) model has been employed to generate diversified content, the hierarchical structure and multi-pass editing scheme allow the model to create more consistent content. We conduct extensive experiments on the ROCStories Dataset. The results verify the validity and effectiveness of our proposed model and show substantial improvement over the existing state-of-the-art approaches.


2017 ◽  
Vol 25 (1) ◽  
pp. 81-87 ◽  
Author(s):  
Gaurav Trivedi ◽  
Phuong Pham ◽  
Wendy W Chapman ◽  
Rebecca Hwa ◽  
Janyce Wiebe ◽  
...  

Abstract The gap between domain experts and natural language processing expertise is a barrier to extracting understanding from clinical text. We describe a prototype tool for interactive review and revision of natural language processing models of binary concepts extracted from clinical notes. We evaluated our prototype in a user study involving 9 physicians, who used our tool to build and revise models for 2 colonoscopy quality variables. We report changes in performance relative to the quantity of feedback. Using initial training sets as small as 10 documents, expert review led to final F1 scores for the “appendiceal-orifice” variable between 0.78 and 0.91 (with improvements ranging from 13.26% to 29.90%). F1 for “biopsy” ranged between 0.88 and 0.94 (−1.52% to 11.74% improvements). The average System Usability Scale score was 70.56. Subjective feedback also suggests possible design improvements.

