Psychiatric stressor recognition from clinical notes to reveal association with suicide

Suicide takes the lives of nearly a million people each year and it is a tremendous economic burden globally. One important type of suicide risk factor is psychiatric stress. Prior studies mainly use survey data to investigate the association between suicide and stressors. Very few studies have investigated stressor data in electronic health records, mostly due to the data being recorded in narrative text. This study takes the initiative to automatically extract and classify psychiatric stressors from clinical text using natural language processing–based methods. Suicidal behaviors were also identified by keywords. Then, a statistical association analysis between suicide ideations/attempts and stressors extracted from a clinical corpus is conducted. Experimental results show that our natural language processing method could recognize stressor entities with an F-measure of 89.01 percent. Mentions of suicidal behaviors were identified with an F-measure of 97.3 percent. The top three significant stressors associated with suicide are health, pressure, and death, which are similar to previous studies. This study demonstrates the feasibility of using natural language processing approaches to unlock information from psychiatric notes in electronic health record, to facilitate large-scale studies about associations between suicide and psychiatric stressors.

Download Full-text

Systematic review of current natural language processing methods and applications in cardiology

Heart ◽

10.1136/heartjnl-2021-319769 ◽

2021 ◽

pp. heartjnl-2021-319769

Author(s):

Meghan Reading Turchioe ◽

Alexander Volodarskiy ◽

Jyotishman Pathak ◽

Drew N Wright ◽

James Enlou Tcheng ◽

...

Keyword(s):

Systematic Review ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Clinical Care ◽

Real World Data ◽

Clinical Text ◽

Clinical Notes ◽

Artery Disease ◽

Automated Methods

Natural language processing (NLP) is a set of automated methods to organise and evaluate the information contained in unstructured clinical notes, which are a rich source of real-world data from clinical care that may be used to improve outcomes and understanding of disease in cardiology. The purpose of this systematic review is to provide an understanding of NLP, review how it has been used to date within cardiology and illustrate the opportunities that this approach provides for both research and clinical care. We systematically searched six scholarly databases (ACM Digital Library, Arxiv, Embase, IEEE Explore, PubMed and Scopus) for studies published in 2015–2020 describing the development or application of NLP methods for clinical text focused on cardiac disease. Studies not published in English, lacking a description of NLP methods, non-cardiac focused and duplicates were excluded. Two independent reviewers extracted general study information, clinical details and NLP details and appraised quality using a checklist of quality indicators for NLP studies. We identified 37 studies developing and applying NLP in heart failure, imaging, coronary artery disease, electrophysiology, general cardiology and valvular heart disease. Most studies used NLP to identify patients with a specific diagnosis and extract disease severity using rule-based NLP methods. Some used NLP algorithms to predict clinical outcomes. A major limitation is the inability to aggregate findings across studies due to vastly different NLP methods, evaluation and reporting. This review reveals numerous opportunities for future NLP work in cardiology with more diverse patient samples, cardiac diseases, datasets, methods and applications.

Download Full-text

Development of algorithm for classification smoking status from unstructured bilingual electronic health records based on natural language processing (Preprint)

10.2196/preprints.26978 ◽

2021 ◽

Author(s):

Ye Seul Bae ◽

Kyung Hwan Kim ◽

Han Kyul Kim ◽

Sae Won Choi ◽

Taehoon Ko ◽

...

Keyword(s):

Natural Language Processing ◽

Electronic Health Records ◽

Natural Language ◽

Language Processing ◽

Smoking Status ◽

Svm Classifier ◽

Keyword Extraction ◽

Health Records ◽

Clinical Notes ◽

Electronic Health

BACKGROUND Smoking is a major risk factor and important variable for clinical research, but there are few studies regarding automatic obtainment of smoking classification from unstructured bilingual electronic health records (EHR). OBJECTIVE We aim to develop an algorithm to classify smoking status based on unstructured EHRs using natural language processing (NLP). METHODS With acronym replacement and Python package Soynlp, we normalize 4,711 bilingual clinical notes. Each EHR notes was classified into 4 categories: current smokers, past smokers, never smokers, and unknown. Subsequently, SPPMI (Shifted Positive Point Mutual Information) is used to vectorize words in the notes. By calculating cosine similarity between these word vectors, keywords denoting the same smoking status are identified. RESULTS Compared to other keyword extraction methods (word co-occurrence-, PMI-, and NPMI-based methods), our proposed approach improves keyword extraction precision by as much as 20.0%. These extracted keywords are used in classifying 4 smoking statuses from our bilingual clinical notes. Given an identical SVM classifier, the extracted keywords improve the F1 score by as much as 1.8% compared to those of the unigram and bigram Bag of Words. CONCLUSIONS Our study shows the potential of SPPMI in classifying smoking status from bilingual, unstructured EHRs. Our current findings show how smoking information can be easily acquired and used for clinical practice and research.

Download Full-text

The 2019 n2c2/OHNLP Track on Clinical Semantic Textual Similarity: Overview (Preprint)

10.2196/preprints.23375 ◽

2020 ◽

Cited By ~ 1

Author(s):

Yanshan Wang ◽

Sunyang Fu ◽

Feichen Shen ◽

Sam Henry ◽

Ozlem Uzuner ◽

...

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

State Of The Art ◽

Shared Task ◽

Data Set ◽

Clinical Text ◽

Clinical Notes ◽

Clinical Domain ◽

Semantic Textual Similarity

BACKGROUND Semantic textual similarity is a common task in the general English domain to assess the degree to which the underlying semantics of 2 text segments are equivalent to each other. Clinical Semantic Textual Similarity (ClinicalSTS) is the semantic textual similarity task in the clinical domain that attempts to measure the degree of semantic equivalence between 2 snippets of clinical text. Due to the frequent use of templates in the Electronic Health Record system, a large amount of redundant text exists in clinical notes, making ClinicalSTS crucial for the secondary use of clinical text in downstream clinical natural language processing applications, such as clinical text summarization, clinical semantics extraction, and clinical information retrieval. OBJECTIVE Our objective was to release ClinicalSTS data sets and to motivate natural language processing and biomedical informatics communities to tackle semantic text similarity tasks in the clinical domain. METHODS We organized the first BioCreative/OHNLP ClinicalSTS shared task in 2018 by making available a real-world ClinicalSTS data set. We continued the shared task in 2019 in collaboration with National NLP Clinical Challenges (n2c2) and the Open Health Natural Language Processing (OHNLP) consortium and organized the 2019 n2c2/OHNLP ClinicalSTS track. We released a larger ClinicalSTS data set comprising 1642 clinical sentence pairs, including 1068 pairs from the 2018 shared task and 1006 new pairs from 2 electronic health record systems, GE and Epic. We released 80% (1642/2054) of the data to participating teams to develop and fine-tune the semantic textual similarity systems and used the remaining 20% (412/2054) as blind testing to evaluate their systems. The workshop was held in conjunction with the American Medical Informatics Association 2019 Annual Symposium. RESULTS Of the 78 international teams that signed on to the n2c2/OHNLP ClinicalSTS shared task, 33 produced a total of 87 valid system submissions. The top 3 systems were generated by IBM Research, the National Center for Biotechnology Information, and the University of Florida, with Pearson correlations of r=.9010, r=.8967, and r=.8864, respectively. Most top-performing systems used state-of-the-art neural language models, such as BERT and XLNet, and state-of-the-art training schemas in deep learning, such as pretraining and fine-tuning schema, and multitask learning. Overall, the participating systems performed better on the Epic sentence pairs than on the GE sentence pairs, despite a much larger portion of the training data being GE sentence pairs. CONCLUSIONS The 2019 n2c2/OHNLP ClinicalSTS shared task focused on computing semantic similarity for clinical text sentences generated from clinical notes in the real world. It attracted a large number of international teams. The ClinicalSTS shared task could continue to serve as a venue for researchers in natural language processing and medical informatics communities to develop and improve semantic textual similarity techniques for clinical text.

Download Full-text

The portability of natural language processing methods to detect suicidality from unstructured clinical text in US and UK electronic health records

10.21203/rs.3.rs-155430/v1 ◽

2021 ◽

Author(s):

Marika Cusick ◽

Sumithra Velupillai ◽

Johnny Downs ◽

Thomas Campion ◽

Rina Dutta ◽

...

Keyword(s):

Natural Language Processing ◽

Electronic Health Records ◽

Natural Language ◽

Language Processing ◽

Healthcare Systems ◽

Risk Identification ◽

Health Records ◽

Clinical Text ◽

Academic Medical ◽

Electronic Health

Abstract In the global effort to prevent death by suicide, many academic medical institutions are implementing natural language processing (NLP) approaches to detect suicidality from unstructured clinical text in electronic health records (EHRs), with the hope of targeting timely, preventative interventions to individuals most at risk of suicide. Despite the international need, the development of these NLP approaches in EHRs has been largely local and not shared across healthcare systems. In this study, we developed a process to share NLP approaches that were individually developed at King’s College London (KCL), UK and Weill Cornell Medicine (WCM), US - two academic medical centers based in different countries with vastly different healthcare systems. After a successful technical porting of the NLP approaches, our quantitative evaluation determined that independently developed NLP approaches can detect suicidality at another healthcare organization with a different EHR system, clinical documentation processes, and culture, yet do not achieve the same level of success as at the institution where the NLP algorithm was developed (KCL approach: F1-score 0.85 vs. 0.68, WCM approach: F1-score 0.87 vs. 0.72). Shared use of these NLP approaches is a critical step forward towards improving data-driven algorithms for early suicide risk identification and timely prevention.

Download Full-text

Using natural language processing of clinical text to enhance identification of opioid‐related overdoses in electronic health records data

Pharmacoepidemiology and Drug Safety ◽

10.1002/pds.4810 ◽

2019 ◽

Vol 28 (8) ◽

pp. 1143-1151 ◽

Cited By ~ 1

Author(s):

Brian Hazlehurst ◽

Carla A. Green ◽

Nancy A. Perrin ◽

John Brandes ◽

David S. Carrell ◽

...

Keyword(s):

Natural Language Processing ◽

Electronic Health Records ◽

Natural Language ◽

Language Processing ◽

Health Records ◽

Clinical Text ◽

Electronic Health

Download Full-text

Abstract P259: Using Natural Language Processing and Machine Learning to Identify Incident Stroke From Electronic Health Records

Circulation ◽

10.1161/circ.141.suppl_1.p259 ◽

2020 ◽

Vol 141 (Suppl_1) ◽

Author(s):

Yiqing Zhao ◽

Sunyang Fu ◽

Suzette J Bielinski ◽

Paul Decker ◽

Alanna M Chamberlain ◽

...

Keyword(s):

Machine Learning ◽

Atrial Fibrillation ◽

Natural Language Processing ◽

Random Forest ◽

Natural Language ◽

Language Processing ◽

Stroke Incidence ◽

Clinical Notes ◽

Feature Sets ◽

Electronic Health

Background: The focus of most existing phenotyping algorithms based on electronic health record (EHR) data has been to accurately identify cases and non-cases of specific diseases. However, a more challenging task is to accurately identify disease incidence, as identifying the first occurrence of disease is more important for efficient and valid clinical and epidemiological research. Moreover, stroke is a challenging phenotype due to diagnosis difficulty and common miscoding. This task generally requires utilization of multiple types of EHR data (e.g., diagnoses and procedure codes, unstructured clinical notes) and a more robust algorithm integrating both natural language processing and machine learning. In this study, we developed and validated an EHR-based classifier to accurately identify stroke incidence among a cohort of atrial fibrillation (AF) patients Methods: We developed a stroke phenotyping algorithm using International Classification of Diseases, Ninth Revision (ICD-9) codes, Current Procedural Terminology (CPT) codes, and expert-provided keywords as model features. Structured data was extracted from Rochester Epidemiology Project (REP) database. Natural Language Processing (NLP) was used to extract and validate keyword occurrence in clinical notes. A window of ±30 days was considered when including/excluding keywords/codes into the input vector. Frequencies of keywords/codes were used as input feature sets for model training. Multiple competing models were trained using various combinations of feature sets and two machine learning algorithms: logistic regression and random forest. Training data were provided by two nurse abstractors and included validated stroke incidences from a previously established atrial fibrillation cohort. Precision, recall, and F-score of the algorithm were calculated to assess and compare model performances. Results: Among 4,914 patients with atrial fibrillation, 1,773 patients were screened. 3,141 patients had no stroke-related codes or keywords and were presumed to be free of stroke during follow-up. Among the screened patients, 740 had validated strokes and 1,033 did not have a stroke based on review of the EHR by trained nurse abstractors. The best performing stroke incidence phenotyping classifier utilized Keywords+ICD-9+CPT features using a random forest classifier, achieving a precision of 0.942, recall of 0.943, and F-score of 0.943. Conclusion: In conclusion, we developed and validated a stroke algorithm that performed well for identifying stroke incidence in an enriched population (AF cohort), which extends beyond the typical binary case/non-case stroke identification problem. Future work will involve testing the generalizability of this algorithm in a general population.

Download Full-text

Large-Scale Identification of Aortic Stenosis and its Severity Using Natural Language Processing on Electronic Health Records

Cardiovascular Digital Health Journal ◽

10.1016/j.cvdhj.2021.03.003 ◽

2021 ◽

Author(s):

Matthew D. Solomon ◽

Grace Tabada ◽

Amanda Allen ◽

Sue Hee Sung ◽

Alan S. Go

Keyword(s):

Natural Language Processing ◽

Aortic Stenosis ◽

Electronic Health Records ◽

Natural Language ◽

Language Processing ◽

Large Scale ◽

Health Records ◽

Electronic Health

Download Full-text

Natural language processing for structuring clinical text data on depression using UK-CRIS

Evidence-Based Mental Health ◽

10.1136/ebmental-2019-300134 ◽

2020 ◽

Vol 23 (1) ◽

pp. 21-26 ◽

Cited By ~ 6

Author(s):

Nemanja Vaci ◽

Qiang Liu ◽

Andrey Kormilitzin ◽

Franco De Crescenzo ◽

Ayse Kurtulmus ◽

...

Keyword(s):

Natural Language Processing ◽

Active Learning ◽

Electronic Health Records ◽

Natural Language ◽

Language Processing ◽

Real World ◽

Statistical Models ◽

Health Records ◽

Clinical Text ◽

Electronic Health

BackgroundUtilisation of routinely collected electronic health records from secondary care offers unprecedented possibilities for medical science research but can also present difficulties. One key issue is that medical information is presented as free-form text and, therefore, requires time commitment from clinicians to manually extract salient information. Natural language processing (NLP) methods can be used to automatically extract clinically relevant information.ObjectiveOur aim is to use natural language processing (NLP) to capture real-world data on individuals with depression from the Clinical Record Interactive Search (CRIS) clinical text to foster the use of electronic healthcare data in mental health research.MethodsWe used a combination of methods to extract salient information from electronic health records. First, clinical experts define the information of interest and subsequently build the training and testing corpora for statistical models. Second, we built and fine-tuned the statistical models using active learning procedures.FindingsResults show a high degree of accuracy in the extraction of drug-related information. Contrastingly, a much lower degree of accuracy is demonstrated in relation to auxiliary variables. In combination with state-of-the-art active learning paradigms, the performance of the model increases considerably.ConclusionsThis study illustrates the feasibility of using the natural language processing models and proposes a research pipeline to be used for accurately extracting information from electronic health records.Clinical implicationsReal-world, individual patient data are an invaluable source of information, which can be used to better personalise treatment.

Download Full-text

Best Paper Selection

Yearbook of Medical Informatics ◽

10.1055/s-0038-1641129 ◽

2017 ◽

Vol 26 (01) ◽

pp. e21-e22

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Large Scale ◽

Clinical Evidence ◽

Scale Analysis ◽

Clinical Text ◽

Large Scale Analysis ◽

Open Source Framework

Althoff, T, Clark K, Leskovec, J. Large-scale Analysis of Counseling Conversations: An Application of Natural Language Processing to Mental Health. Trans Assoc Comput Linguist 2016(4):463-76 https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5361062/ Kilicoglu, H, Demner-Fushman, D. Bio-SCoRes: A Smorgasbord Architecture for Coreference Resolution in Biomedical Text. PLoS One. 2016 Mar 2;11(3):e0148538 http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0148538 Morid, MA, Fiszman, M, Raja, K, Jonnalagadda, SR, Del Fiol, G. Classification of clinically useful sentences in clinical evidence resources. J Biomed Inform. 2016 Apr;60:14-22 http://www.sciencedirect.com/science/article/pii/S1532046416000046?via%3Dihub Shivade C, de Marneffe MC, Fosler-Lussier E, Lai AM. Identification, characterization, and grounding of gradable terms in clinical text. Proceedings of the 15th Workshop on Biomedical Natural Language Processing. 2016:17-26 https://www.semanticscholar.org/paper/Identification-characterization-and-grounding-of-g-Shivade-Marneffe/c00ba120de1964b444807255030741d199ba6e04 Wu, Y, Denny, JC, Rosenbloom, ST, Miller, RA, Giuse, DA, Wang, L, Blanquicett, C, Soysal, E, Xu, J, Xu, H. A long journey to short abbreviations: developing an open-source framework for clinical abbreviation recognition and disambiguation (CARD). J Am Med Inform Assoc 2017 Apr 1;24(e1):e79-e86 https://academic.oup.com/jamia/article-abstract/24/e1/e79/2631496/A-long-journey-to-short-abbreviations-developing?redirectedFrom=fulltext

Download Full-text