Natural language processing for structuring clinical text data on depression using UK-CRIS

Background: Utilisation of routinely collected electronic health records from secondary care offers unprecedented possibilities for medical science research but can also present difficulties. One key issue is that medical information is presented as free-form text and therefore requires a time commitment from clinicians to manually extract salient information. Natural language processing (NLP) methods can be used to extract clinically relevant information automatically. Objective: Our aim is to use NLP to capture real-world data on individuals with depression from the Clinical Record Interactive Search (CRIS) clinical text to foster the use of electronic healthcare data in mental health research. Methods: We used a combination of methods to extract salient information from electronic health records. First, clinical experts defined the information of interest and subsequently built the training and testing corpora for statistical models. Second, we built and fine-tuned the statistical models using active learning procedures. Findings: Results show a high degree of accuracy in the extraction of drug-related information. In contrast, a much lower degree of accuracy is demonstrated in relation to auxiliary variables. In combination with state-of-the-art active learning paradigms, the performance of the model increases considerably. Conclusions: This study illustrates the feasibility of using natural language processing models and proposes a research pipeline for accurately extracting information from electronic health records. Clinical implications: Real-world, individual patient data are an invaluable source of information, which can be used to better personalise treatment.
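The abstract does not say which active learning strategy was used. As a rough illustration only, the sketch below assumes pool-based uncertainty sampling, one common strategy, in which the sentences the current model is least sure about are sent to clinicians for annotation first. The scoring function `toy_predict_proba` and the example sentences are hypothetical stand-ins and are not part of the study.

```python
from typing import Callable, List


def uncertainty(prob_positive: float) -> float:
    """Return 1.0 when the model is maximally unsure (p = 0.5), 0.0 at p = 0 or 1."""
    return 1.0 - abs(prob_positive - 0.5) * 2


def select_for_annotation(
    pool: List[str],
    predict_proba: Callable[[str], float],
    batch_size: int,
) -> List[str]:
    """Pick the unlabelled texts with the most uncertain predictions."""
    ranked = sorted(pool, key=lambda text: uncertainty(predict_proba(text)), reverse=True)
    return ranked[:batch_size]


def toy_predict_proba(text: str) -> float:
    """Hypothetical model: probability that a sentence records a current prescription."""
    lowered = text.lower()
    if "sertraline" in lowered and "mg" in lowered:
        return 0.95  # drug name plus dose: confident positive
    if "sertraline" in lowered:
        return 0.55  # drug name without dose: unsure
    return 0.05      # no drug mention: confident negative


# Hypothetical unlabelled pool.
pool = [
    "Started sertraline 50 mg daily.",
    "Mother took sertraline in the past.",
    "Sleep has improved.",
]
to_label = select_for_annotation(pool, toy_predict_proba, batch_size=1)
print(to_label)
```

In a full loop, the newly annotated sentences would be added to the training corpus, the model retrained, and the selection repeated until performance stops improving.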

The portability of natural language processing methods to detect suicidality from unstructured clinical text in US and UK electronic health records

10.21203/rs.3.rs-155430/v1 ◽

2021 ◽

Author(s):

Marika Cusick ◽

Sumithra Velupillai ◽

Johnny Downs ◽

Thomas Campion ◽

Rina Dutta ◽

...

Keyword(s):

Natural Language Processing ◽

Electronic Health Records ◽

Natural Language ◽

Language Processing ◽

Healthcare Systems ◽

Risk Identification ◽

Health Records ◽

Clinical Text ◽

Academic Medical ◽

Electronic Health

Abstract: In the global effort to prevent death by suicide, many academic medical institutions are implementing natural language processing (NLP) approaches to detect suicidality from unstructured clinical text in electronic health records (EHRs), with the hope of targeting timely, preventative interventions to individuals most at risk of suicide. Despite the international need, the development of these NLP approaches in EHRs has been largely local and not shared across healthcare systems. In this study, we developed a process to share NLP approaches that were individually developed at King’s College London (KCL), UK and Weill Cornell Medicine (WCM), US, two academic medical centers based in different countries with vastly different healthcare systems. After a successful technical porting of the NLP approaches, our quantitative evaluation determined that independently developed NLP approaches can detect suicidality at another healthcare organization with a different EHR system, clinical documentation processes, and culture, yet do not achieve the same level of success as at the institution where the NLP algorithm was developed (KCL approach: F1-score 0.85 vs. 0.68; WCM approach: F1-score 0.87 vs. 0.72). Shared use of these NLP approaches is a critical step forward towards improving data-driven algorithms for early suicide risk identification and timely prevention.
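For readers less familiar with the metric, the F1-score reported above is the harmonic mean of precision and recall. The following minimal sketch shows the arithmetic. The true-positive, false-positive, and false-negative counts are hypothetical and are not taken from the study, which does not report its confusion counts in this abstract.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Compute F1 as the harmonic mean of precision and recall."""
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for a detector evaluated at two sites.
f1_home_site = f1_score(tp=40, fp=10, fn=10)   # precision 0.80, recall 0.80
f1_other_site = f1_score(tp=30, fp=10, fn=20)  # precision 0.75, recall 0.60

print(f"home site F1:  {f1_home_site:.2f}")
print(f"other site F1: {f1_other_site:.2f}")
```

Because F1 penalises an imbalance between precision and recall, a drop in either one after porting to a new site lowers the score.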

Using natural language processing of clinical text to enhance identification of opioid‐related overdoses in electronic health records data

Pharmacoepidemiology and Drug Safety ◽

10.1002/pds.4810 ◽

2019 ◽

Vol 28 (8) ◽

pp. 1143-1151 ◽

Cited By ~ 1

Author(s):

Brian Hazlehurst ◽

Carla A. Green ◽

Nancy A. Perrin ◽

John Brandes ◽

David S. Carrell ◽

...

Keyword(s):

Natural Language Processing ◽

Electronic Health Records ◽

Natural Language ◽

Language Processing ◽

Health Records ◽

Clinical Text ◽

Electronic Health

IDENTIFYING REASONS FOR STATIN NONADHERENCE IN A DIVERSE, REAL-WORLD POPULATION USING ELECTRONIC HEALTH RECORDS AND NATURAL LANGUAGE PROCESSING

Journal of the American College of Cardiology ◽

10.1016/s0735-1097(21)03021-7 ◽

2021 ◽

Vol 77 (18) ◽

pp. 1665

Author(s):

Ashish Sarraju ◽

Jean Coquet ◽

Antonia Chan ◽

Summer Ngo ◽

Juan Antonio Lossio-Ventura ◽

...

Keyword(s):

Natural Language Processing ◽

Electronic Health Records ◽

Natural Language ◽

Language Processing ◽

Real World ◽

World Population ◽

Health Records ◽

Electronic Health

Development of algorithm for classification smoking status from unstructured bilingual electronic health records based on natural language processing (Preprint)

10.2196/preprints.26978 ◽

2021 ◽

Author(s):

Ye Seul Bae ◽

Kyung Hwan Kim ◽

Han Kyul Kim ◽

Sae Won Choi ◽

Taehoon Ko ◽

...

Keyword(s):

Natural Language Processing ◽

Electronic Health Records ◽

Natural Language ◽

Language Processing ◽

Smoking Status ◽

Svm Classifier ◽

Keyword Extraction ◽

Health Records ◽

Clinical Notes ◽

Electronic Health

BACKGROUND: Smoking is a major risk factor and an important variable for clinical research, but few studies have addressed automatic classification of smoking status from unstructured bilingual electronic health records (EHRs). OBJECTIVE: We aim to develop an algorithm to classify smoking status based on unstructured EHRs using natural language processing (NLP). METHODS: With acronym replacement and the Python package Soynlp, we normalize 4,711 bilingual clinical notes. Each EHR note was classified into 4 categories: current smokers, past smokers, never smokers, and unknown. Subsequently, SPPMI (Shifted Positive Pointwise Mutual Information) is used to vectorize words in the notes. By calculating cosine similarity between these word vectors, keywords denoting the same smoking status are identified. RESULTS: Compared to other keyword extraction methods (word co-occurrence-, PMI-, and NPMI-based methods), our proposed approach improves keyword extraction precision by as much as 20.0%. These extracted keywords are used in classifying 4 smoking statuses from our bilingual clinical notes. Given an identical SVM classifier, the extracted keywords improve the F1 score by as much as 1.8% compared to those of the unigram and bigram Bag of Words. CONCLUSIONS: Our study shows the potential of SPPMI in classifying smoking status from bilingual, unstructured EHRs. Our current findings show how smoking information can be easily acquired and used for clinical practice and research.
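The SPPMI step can be illustrated with a small sketch. It assumes the standard definition, SPPMI(w, c) = max(PMI(w, c) - log k, 0), with co-occurrence counted at the level of a whole note. The abstract does not state the context window or the shift k, so both are assumptions here, and the toy notes are hypothetical, not drawn from the 4,711 clinical notes.

```python
import math
from collections import Counter
from itertools import combinations
from typing import Dict, List


def sppmi_vectors(docs: List[List[str]], shift_k: float = 1.0) -> Dict[str, Dict[str, float]]:
    """Build sparse word vectors whose entries are SPPMI values.

    Two words co-occur when they appear in the same note (an assumption).
    """
    pair_counts: Counter = Counter()
    for tokens in docs:
        for a, b in combinations(sorted(set(tokens)), 2):
            pair_counts[(a, b)] += 1  # count both directions so the
            pair_counts[(b, a)] += 1  # co-occurrence matrix is symmetric
    total = sum(pair_counts.values())
    word_counts: Counter = Counter()
    for (word, _context), n in pair_counts.items():
        word_counts[word] += n
    vectors: Dict[str, Dict[str, float]] = {w: {} for w in word_counts}
    for (word, context), n in pair_counts.items():
        pmi = math.log(n * total / (word_counts[word] * word_counts[context]))
        value = pmi - math.log(shift_k)  # shift, then keep positives only
        if value > 0:
            vectors[word][context] = value
    return vectors


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[key] * v[key] for key in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical, already-normalized notes.
notes = [
    ["patient", "quit", "smoking", "ex-smoker"],
    ["ex-smoker", "quit", "years"],
    ["patient", "current", "smoker", "daily"],
    ["current", "smoker", "pack", "daily"],
    ["patient", "never", "smoked"],
]
vectors = sppmi_vectors(notes, shift_k=2.0)
sim_same_status = cosine(vectors["quit"], vectors["ex-smoker"])
sim_other_status = cosine(vectors["quit"], vectors["current"])
print(sim_same_status, sim_other_status)
```

In this toy corpus, "quit" and "ex-smoker" share contexts and so score higher than "quit" and "current", which is the intuition behind grouping keywords that denote the same smoking status.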

Abstract PO-050: Identifying de novo stage IV breast cancer (DNIV) cases in Electronic Health Records (EHR) using natural language processing

10.1158/1557-3265.adi21-po-050 ◽

2021 ◽

Author(s):

Liwei Wang ◽

Karthik Giridhar ◽

Kimberly Corbin ◽

Brenda Ernst ◽

Sadia Choudhery ◽

...

Keyword(s):

Breast Cancer ◽

Natural Language Processing ◽

Electronic Health Records ◽

Natural Language ◽

Language Processing ◽

De Novo ◽

Stage Iv ◽

Health Records ◽

Stage Iv Breast Cancer ◽

Electronic Health

Correction: Extracting Family History Information From Electronic Health Records: Natural Language Processing Analysis

JMIR Medical Informatics ◽

10.2196/30153 ◽

2021 ◽

Vol 9 (5) ◽

pp. e30153

Author(s):

Maciej Rybinski ◽

Xiang Dai ◽

Sonit Singh ◽

Sarvnaz Karimi ◽

Anthony Nguyen

Keyword(s):

Natural Language Processing ◽

Family History ◽

Electronic Health Records ◽

Natural Language ◽

Language Processing ◽

Family History Information ◽

Health Records ◽

History Information ◽

Electronic Health

Challenges of Developing a Natural Language Processing Method With Electronic Health Records to Identify Persons With Chronic Mobility Disability

Archives of Physical Medicine and Rehabilitation ◽

10.1016/j.apmr.2020.04.024 ◽

2020 ◽

Vol 101 (10) ◽

pp. 1739-1746 ◽

Cited By ~ 3

Author(s):

Nicole D. Agaronnik ◽

Charlotta Lindvall ◽

Areej El-Jawahri ◽

Wei He ◽

Lisa I. Iezzoni

Keyword(s):

Natural Language Processing ◽

Electronic Health Records ◽

Natural Language ◽

Language Processing ◽

Processing Method ◽

Mobility Disability ◽

Health Records ◽

Electronic Health

Tu1031 Natural Language Processing of Electronic Health Records Accurately Identifies Right Colon Hyperplastic Polyps for Potential Surveillance Reclassification

Gastroenterology ◽

10.1016/s0016-5085(14)62654-8 ◽

2014 ◽

Vol 146 (5) ◽

pp. S-732

Author(s):

Meena A. Prasad ◽

William Thompson ◽

Rajesh N. Keswani ◽

Abel Kho ◽

Ikuo Hirano ◽

...

Keyword(s):

Natural Language Processing ◽

Electronic Health Records ◽

Natural Language ◽

Language Processing ◽

Hyperplastic Polyps ◽

Right Colon ◽

Health Records ◽

Electronic Health

Realizing the full potential of electronic health records: the role of natural language processing

Journal of the American Medical Informatics Association ◽

10.1136/amiajnl-2011-000501 ◽

2011 ◽

Vol 18 (5) ◽

pp. 539-539 ◽

Cited By ~ 31

Author(s):

Lucila Ohno-Machado

Keyword(s):

Natural Language Processing ◽

Electronic Health Records ◽

Natural Language ◽

Language Processing ◽

Full Potential ◽

Health Records ◽

Electronic Health

Keyword Extraction Algorithm for Classifying Smoking Status from Unstructured Bilingual Electronic Health Records Based on Natural Language Processing

Applied Sciences ◽

10.3390/app11198812 ◽

2021 ◽

Vol 11 (19) ◽

pp. 8812

Author(s):

Ye Seul Bae ◽

Kyung Hwan Kim ◽

Han Kyul Kim ◽

Sae Won Choi ◽

Taehoon Ko ◽

...

Keyword(s):

Natural Language Processing ◽

Electronic Health Records ◽

Natural Language ◽

Language Processing ◽

Smoking Status ◽

Extraction Methods ◽

Svm Classifier ◽

Keyword Extraction ◽

Health Records ◽

Electronic Health

Smoking is an important variable for clinical research, but few studies have addressed automatic classification of smoking status from unstructured bilingual electronic health records (EHRs). We aim to develop an algorithm to classify smoking status based on unstructured EHRs using natural language processing (NLP). With acronym replacement and the Python package Soynlp, we normalize 4711 bilingual clinical notes. Each EHR note was classified into 4 categories: current smokers, past smokers, never smokers, and unknown. Subsequently, SPPMI (Shifted Positive Pointwise Mutual Information) is used to vectorize words in the notes. By calculating cosine similarity between these word vectors, keywords denoting the same smoking status are identified. Compared to other keyword extraction methods (word co-occurrence-, PMI-, and NPMI-based methods), our proposed approach improves keyword extraction precision by as much as 20.0%. These extracted keywords are used in classifying 4 smoking statuses from our bilingual EHRs. Given an identical SVM classifier, the F1 score is improved by as much as 1.8% compared to those of the unigram and bigram Bag of Words. Our study shows the potential of SPPMI in classifying smoking status from bilingual, unstructured EHRs. Our current findings show how smoking information can be easily acquired for clinical practice and research.
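To make the classification step concrete, the sketch below shows one way extracted keywords could be turned into per-note features for the four categories. The keyword lists are hypothetical, and a simple highest-count rule stands in for the SVM classifier used in the study, so this illustrates only the feature idea and not the authors' method.

```python
from collections import Counter
from typing import Dict, List, Set

# Hypothetical keyword lists; the study derives its keywords from SPPMI vectors.
STATUS_KEYWORDS: Dict[str, Set[str]] = {
    "current": {"smoker", "smokes", "daily"},
    "past": {"quit", "ex-smoker", "stopped"},
    "never": {"never", "non-smoker"},
}
STATUSES: List[str] = ["current", "past", "never"]


def keyword_features(tokens: List[str]) -> List[int]:
    """Count keyword hits per status; the result could feed any classifier."""
    counts = Counter(tokens)
    return [sum(counts[k] for k in STATUS_KEYWORDS[status]) for status in STATUSES]


def rule_baseline(tokens: List[str]) -> str:
    """Stand-in for the SVM: pick the status with the most keyword hits."""
    features = keyword_features(tokens)
    if max(features) == 0:
        return "unknown"  # no smoking keyword found in the note
    return STATUSES[features.index(max(features))]


example_note = ["patient", "quit", "smoking", "ex-smoker"]
print(keyword_features(example_note), rule_baseline(example_note))
print(rule_baseline(["blood", "pressure", "normal"]))
```

Compared with a unigram or bigram Bag of Words, a representation like this has only a few dimensions, each tied directly to a smoking status.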
