Biomedical Corpora and Natural Language Processing on Clinical Text in Languages Other Than English: A Systematic Review

Natural language processing (NLP) is a set of automated methods to organise and evaluate the information contained in unstructured clinical notes, which are a rich source of real-world data from clinical care that may be used to improve outcomes and understanding of disease in cardiology. The purpose of this systematic review is to provide an understanding of NLP, review how it has been used to date within cardiology and illustrate the opportunities that this approach provides for both research and clinical care. We systematically searched six scholarly databases (ACM Digital Library, Arxiv, Embase, IEEE Explore, PubMed and Scopus) for studies published in 2015–2020 describing the development or application of NLP methods for clinical text focused on cardiac disease. Studies not published in English, lacking a description of NLP methods, non-cardiac focused and duplicates were excluded. Two independent reviewers extracted general study information, clinical details and NLP details and appraised quality using a checklist of quality indicators for NLP studies. We identified 37 studies developing and applying NLP in heart failure, imaging, coronary artery disease, electrophysiology, general cardiology and valvular heart disease. Most studies used NLP to identify patients with a specific diagnosis and extract disease severity using rule-based NLP methods. Some used NLP algorithms to predict clinical outcomes. A major limitation is the inability to aggregate findings across studies due to vastly different NLP methods, evaluation and reporting. This review reveals numerous opportunities for future NLP work in cardiology with more diverse patient samples, cardiac diseases, datasets, methods and applications.

Download Full-text

Clinical Text Engineering Using Natural Language Processing Tools in Healthcare Domain: A Systematic Review

SSRN Electronic Journal ◽

10.2139/ssrn.3562946 ◽

2020 ◽

Author(s):

Vaishali M. Kumbhakarnaa ◽

Sonali Kulkarni ◽

Apurva D.Dhawaleb

Keyword(s):

Systematic Review ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Clinical Text

Download Full-text

A deep database of medical abbreviations and acronyms for natural language processing

Scientific Data ◽

10.1038/s41597-021-00929-4 ◽

2021 ◽

Vol 8 (1) ◽

Author(s):

Lisa Grossman Liu ◽

Raymond H. Grossman ◽

Elliot G. Mitchell ◽

Chunhua Weng ◽

Karthik Natarajan ◽

...

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

American English ◽

Substantial Improvement ◽

Future Application ◽

Multiple Sources ◽

High Coverage ◽

Clinical Text ◽

Automated Quality Control

AbstractThe recognition, disambiguation, and expansion of medical abbreviations and acronyms is of upmost importance to prevent medically-dangerous misinterpretation in natural language processing. To support recognition, disambiguation, and expansion, we present the Medical Abbreviation and Acronym Meta-Inventory, a deep database of medical abbreviations. A systematic harmonization of eight source inventories across multiple healthcare specialties and settings identified 104,057 abbreviations with 170,426 corresponding senses. Automated cross-mapping of synonymous records using state-of-the-art machine learning reduced redundancy, which simplifies future application. Additional features include semi-automated quality control to remove errors. The Meta-Inventory demonstrated high completeness or coverage of abbreviations and senses in new clinical text, a substantial improvement over the next largest repository (6–14% increase in abbreviation coverage; 28–52% increase in sense coverage). To our knowledge, the Meta-Inventory is the most complete compilation of medical abbreviations and acronyms in American English to-date. The multiple sources and high coverage support application in varied specialties and settings. This allows for cross-institutional natural language processing, which previous inventories did not support. The Meta-Inventory is available at https://bit.ly/github-clinical-abbreviations.

Download Full-text

Applying natural language processing and machine learning techniques to patient experience feedback: a systematic review

BMJ Health & Care Informatics ◽

10.1136/bmjhci-2020-100262 ◽

2021 ◽

Vol 28 (1) ◽

pp. e100262

Author(s):

Mustafa Khanbhai ◽

Patrick Anyadi ◽

Joshua Symons ◽

Kelsey Flott ◽

Ara Darzi ◽

...

Keyword(s):

Machine Learning ◽

Systematic Review ◽

Social Media ◽

Natural Language Processing ◽

Natural Language ◽

Patient Experience ◽

Language Processing ◽

Performance Metrics ◽

Free Text ◽

Patient Feedback

ObjectivesUnstructured free-text patient feedback contains rich information, and analysing these data manually would require a lot of personnel resources which are not available in most healthcare organisations.To undertake a systematic review of the literature on the use of natural language processing (NLP) and machine learning (ML) to process and analyse free-text patient experience data.MethodsDatabases were systematically searched to identify articles published between January 2000 and December 2019 examining NLP to analyse free-text patient feedback. Due to the heterogeneous nature of the studies, a narrative synthesis was deemed most appropriate. Data related to the study purpose, corpus, methodology, performance metrics and indicators of quality were recorded.ResultsNineteen articles were included. The majority (80%) of studies applied language analysis techniques on patient feedback from social media sites (unsolicited) followed by structured surveys (solicited). Supervised learning was frequently used (n=9), followed by unsupervised (n=6) and semisupervised (n=3). Comments extracted from social media were analysed using an unsupervised approach, and free-text comments held within structured surveys were analysed using a supervised approach. Reported performance metrics included the precision, recall and F-measure, with support vector machine and Naïve Bayes being the best performing ML classifiers.ConclusionNLP and ML have emerged as an important tool for processing unstructured free text. Both supervised and unsupervised approaches have their role depending on the data source. With the advancement of data analysis tools, these techniques may be useful to healthcare organisations to generate insight from the volumes of unstructured free-text data.

Download Full-text

Natural language processing for urban research: A systematic review

Heliyon ◽

10.1016/j.heliyon.2021.e06322 ◽

2021 ◽

Vol 7 (3) ◽

pp. e06322

Author(s):

Meng Cai

Keyword(s):

Systematic Review ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Urban Research

Download Full-text

A Strategy for Deploying Secure Cloud-Based Natural Language Processing Systems for Applied Research Involving Clinical Text

2011 44th Hawaii International Conference on System Sciences ◽

10.1109/hicss.2011.32 ◽

2011 ◽

Cited By ~ 4

Author(s):

D Carrell

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Applied Research ◽

Clinical Text

Download Full-text

Making Natural Language Processing More Accessible for Analysis of Clinical Text

SciVee ◽

10.4016/30705.01 ◽

2011 ◽

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Clinical Text

Download Full-text

Toward a clinical text encoder: pretraining for clinical natural language processing with applications to substance misuse

Journal of the American Medical Informatics Association ◽

10.1093/jamia/ocz072 ◽

2019 ◽

Vol 26 (11) ◽

pp. 1272-1278 ◽

Cited By ~ 2

Author(s):

Dmitriy Dligach ◽

Majid Afshar ◽

Timothy Miller

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Substance Misuse ◽

Research Direction ◽

Clinical Text ◽

Universal Properties ◽

Classification Tasks ◽

Performance Gains ◽

Clinical Natural Language Processing

Abstract Objective Our objective is to develop algorithms for encoding clinical text into representations that can be used for a variety of phenotyping tasks. Materials and Methods Obtaining large datasets to take advantage of highly expressive deep learning methods is difficult in clinical natural language processing (NLP). We address this difficulty by pretraining a clinical text encoder on billing code data, which is typically available in abundance. We explore several neural encoder architectures and deploy the text representations obtained from these encoders in the context of clinical text classification tasks. While our ultimate goal is learning a universal clinical text encoder, we also experiment with training a phenotype-specific encoder. A universal encoder would be more practical, but a phenotype-specific encoder could perform better for a specific task. Results We successfully train several clinical text encoders, establish a new state-of-the-art on comorbidity data, and observe good performance gains on substance misuse data. Discussion We find that pretraining using billing codes is a promising research direction. The representations generated by this type of pretraining have universal properties, as they are highly beneficial for many phenotyping tasks. Phenotype-specific pretraining is a viable route for trading the generality of the pretrained encoder for better performance on a specific phenotyping task. Conclusions We successfully applied our approach to many phenotyping tasks. We conclude by discussing potential limitations of our approach.

Download Full-text

CLAMP – a toolkit for efficiently building customized clinical natural language processing pipelines

Journal of the American Medical Informatics Association ◽

10.1093/jamia/ocx132 ◽

2017 ◽

Vol 25 (3) ◽

pp. 331-336 ◽

Cited By ~ 64

Author(s):

Ergin Soysal ◽

Jingqi Wang ◽

Min Jiang ◽

Yonghui Wu ◽

Serguei Pakhomov ◽

...

Keyword(s):

Natural Language Processing ◽

User Interface ◽

Natural Language ◽

Language Processing ◽

High Performance ◽

Smoking Status ◽

Entity Recognition ◽

Graphic User Interface ◽

Clinical Text ◽

Clinical Natural Language Processing

Abstract Existing general clinical natural language processing (NLP) systems such as MetaMap and Clinical Text Analysis and Knowledge Extraction System have been successfully applied to information extraction from clinical text. However, end users often have to customize existing systems for their individual tasks, which can require substantial NLP skills. Here we present CLAMP (Clinical Language Annotation, Modeling, and Processing), a newly developed clinical NLP toolkit that provides not only state-of-the-art NLP components, but also a user-friendly graphic user interface that can help users quickly build customized NLP pipelines for their individual applications. Our evaluation shows that the CLAMP default pipeline achieved good performance on named entity recognition and concept encoding. We also demonstrate the efficiency of the CLAMP graphic user interface in building customized, high-performance NLP pipelines with 2 use cases, extracting smoking status and lab test values. CLAMP is publicly available for research use, and we believe it is a unique asset for the clinical NLP community.

Download Full-text