Development and application of a high throughput natural language processing architecture to convert all clinical documents in a clinical data warehouse into standardized medical vocabularies

Best Paper Selection

Yearbook of Medical Informatics ◽

10.1055/s-0038-1641129 ◽

2017 ◽

Vol 26 (01) ◽

pp. e21-e22

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Large Scale ◽

Clinical Evidence ◽

Scale Analysis ◽

Clinical Text ◽

Large Scale Analysis ◽

Open Source Framework

Althoff T, Clark K, Leskovec J. Large-scale Analysis of Counseling Conversations: An Application of Natural Language Processing to Mental Health. Trans Assoc Comput Linguist 2016(4):463-76 https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5361062/

Kilicoglu H, Demner-Fushman D. Bio-SCoRes: A Smorgasbord Architecture for Coreference Resolution in Biomedical Text. PLoS One. 2016 Mar 2;11(3):e0148538 http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0148538

Morid MA, Fiszman M, Raja K, Jonnalagadda SR, Del Fiol G. Classification of clinically useful sentences in clinical evidence resources. J Biomed Inform. 2016 Apr;60:14-22 http://www.sciencedirect.com/science/article/pii/S1532046416000046?via%3Dihub

Shivade C, de Marneffe MC, Fosler-Lussier E, Lai AM. Identification, characterization, and grounding of gradable terms in clinical text. Proceedings of the 15th Workshop on Biomedical Natural Language Processing. 2016:17-26 https://www.semanticscholar.org/paper/Identification-characterization-and-grounding-of-g-Shivade-Marneffe/c00ba120de1964b444807255030741d199ba6e04

Wu Y, Denny JC, Rosenbloom ST, Miller RA, Giuse DA, Wang L, Blanquicett C, Soysal E, Xu J, Xu H. A long journey to short abbreviations: developing an open-source framework for clinical abbreviation recognition and disambiguation (CARD). J Am Med Inform Assoc 2017 Apr 1;24(e1):e79-e86 https://academic.oup.com/jamia/article-abstract/24/e1/e79/2631496/A-long-journey-to-short-abbreviations-developing?redirectedFrom=fulltext

Using natural language processing to extract structured epilepsy data from unstructured clinic letters: development and validation of the ExECT (extraction of epilepsy clinical text) system

BMJ Open ◽

10.1136/bmjopen-2018-023232 ◽

2019 ◽

Vol 9 (4) ◽

pp. e023232 ◽

Cited By ~ 7

Author(s):

Beata Fonferko-Shadrach ◽

Arron S Lacey ◽

Angus Roberts ◽

Ashley Akbari ◽

Simon Thompson ◽

...

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Seizure Frequency ◽

Extraction System ◽

Health Board ◽

Free Text ◽

Specific Information ◽

Clinical Text ◽

Routinely Collected Data

Objective: Routinely collected healthcare data are a powerful research resource but often lack detailed disease-specific information that is collected in clinical free text, for example, clinic letters. We aim to use natural language processing techniques to extract detailed clinical information from epilepsy clinic letters to enrich routinely collected data.

Design: We used the General Architecture for Text Engineering (GATE) framework to build an information extraction system, ExECT (extraction of epilepsy clinical text), combining rule-based and statistical techniques. We extracted nine categories of epilepsy information in addition to clinic date and date of birth across 200 clinic letters. We compared the results of our algorithm with a manual review of the letters by an epilepsy clinician.

Setting: De-identified and pseudonymised epilepsy clinic letters from a Health Board serving half a million residents in Wales, UK.

Results: We identified 1925 items of information with overall precision, recall and F1 score of 91.4%, 81.4% and 86.1%, respectively. Precision and recall for epilepsy-specific categories were: epilepsy diagnosis (88.1%, 89.0%), epilepsy type (89.8%, 79.8%), focal seizures (96.2%, 69.7%), generalised seizures (88.8%, 52.3%), seizure frequency (86.3%, 53.6%), medication (96.1%, 94.0%), CT (55.6%, 58.8%), MRI (82.4%, 68.8%) and electroencephalogram (81.5%, 75.3%).

Conclusions: We have built an automated clinical text extraction system that can accurately extract epilepsy information from free text in clinic letters. This can enhance routinely collected data for research in the UK. The information extracted with ExECT, such as epilepsy type, seizure frequency and neurological investigations, is often missing from routinely collected data. We propose that our algorithm can bridge this data gap, enabling further epilepsy research opportunities. While many of the rules in our pipeline were tailored to extract epilepsy-specific information, our methods can be applied to other diseases and can also be used in clinical practice to record patient information in a structured manner.
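
As a quick consistency check on the figures above, the F1 score is the harmonic mean of precision and recall. The sketch below is not part of ExECT; the function name `f1_score` is chosen here for illustration, and the inputs are the percentages quoted in the abstract written as fractions.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given as fractions in [0, 1])."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Overall precision and recall as reported in the abstract.
overall_f1 = f1_score(0.914, 0.814)
print(f"overall F1 = {overall_f1:.1%}")

# A few per-category (precision, recall) pairs from the abstract; the
# abstract does not report per-category F1, so none is asserted here.
reported = {
    "epilepsy diagnosis": (0.881, 0.890),
    "generalised seizures": (0.888, 0.523),
    "medication": (0.961, 0.940),
}
for category, (p, r) in reported.items():
    print(f"{category:22s} F1 = {f1_score(p, r):.1%}")
```

The overall value agrees with the 86.1% stated in the abstract. Categories with high precision but low recall, such as generalised seizures, score noticeably lower on F1 than their precision alone suggests.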

Agile Natural Language Processing Model for Pathology Knowledge Extraction and Integration with Clinical Enterprise Data Warehouse

2019 Sixth International Conference on Social Networks Analysis, Management and Security (SNAMS) ◽

10.1109/snams.2019.8931828 ◽

2019 ◽

Author(s):

Ahmad Baghal ◽

Shaymaa Al-Shukri ◽

Annu Kumari

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Data Warehouse ◽

Language Processing ◽

Knowledge Extraction ◽

Enterprise Data Warehouse

Efficient Large-Scale Stance Detection in Tweets

International Journal of Multimedia Data Engineering and Management ◽

10.4018/ijmdem.2018070101 ◽

2018 ◽

Vol 9 (3) ◽

pp. 1-16 ◽

Cited By ~ 1

Author(s):

Yilin Yan ◽

Jonathan Chen ◽

Mei-Ling Shyu

Keyword(s):

Deep Learning ◽

Language Processing ◽

Large Scale ◽

Research Direction ◽

Detection Methods ◽

Use Case ◽

Learning Techniques ◽

Test Use ◽

Presidential Election Campaign ◽

Important Research Direction

Stance detection is an important research direction that attempts to automatically determine the attitude (positive, negative, or neutral) of the author of a text (such as a tweet) towards a target. A number of frameworks based on deep learning techniques have been proposed and show promising results in application domains such as automatic speech recognition, computer vision, and natural language processing (NLP). This article presents a novel deep learning-based framework for fast stance detection in bipolar affinities on Twitter. Millions of tweets regarding Clinton and Trump were produced per day on Twitter during the 2016 United States presidential election campaign, so the campaign is used as a test use case because of its significant and unique counter-factual properties. In addition, stance detection can be used to infer the political tendency of the general public. Experimental results show that the proposed framework achieves high accuracy compared to several existing stance detection methods.

High-throughput Multimodal Automated Phenotyping (MAP) with Application to PheWAS

10.1101/587436 ◽

2019 ◽

Cited By ~ 2

Author(s):

Katherine P. Liao ◽

Jiehuan Sun ◽

Tianrun A. Cai ◽

Nicholas Link ◽

Chuan Hong ◽

...

Keyword(s):

Language Processing ◽

High Throughput ◽

Large Scale ◽

International Classification Of Diseases ◽

Mapping Method ◽

Map Algorithm ◽

Phenotype Definition ◽

Classification Of Diseases ◽

Icd Codes ◽

Map Approach

Objective: Electronic health records (EHR) linked with biorepositories are a powerful platform for translational studies. A major bottleneck exists in the ability to phenotype patients accurately and efficiently. The objective of this study was to develop an automated high-throughput phenotyping method integrating International Classification of Diseases (ICD) codes and narrative data extracted using natural language processing (NLP).

Method: We developed a mapping method for automatically identifying relevant ICD and NLP concepts for a specific phenotype leveraging the UMLS. Aggregated ICD and NLP counts along with healthcare utilization were jointly analyzed by fitting an ensemble of latent mixture models. The MAP algorithm yields a predicted probability of phenotype for each patient and a threshold for classifying subjects with phenotype yes/no. The algorithm was validated using labeled data for 16 phenotypes from a biorepository and further tested in an independent cohort PheWAS for two SNPs with known associations.

Results: The MAP algorithm achieved higher or similar AUC and F-scores compared to the ICD code across all 16 phenotypes. The features assembled via the automated approach had comparable accuracy to those assembled via manual curation (AUC_MAP 0.943, AUC_manual 0.941). The PheWAS results suggest that the MAP approach detected previously validated associations with higher power when compared to the standard PheWAS method based on ICD codes.

Conclusion: The MAP approach increased the accuracy of phenotype definition while maintaining scalability, facilitating use in studies requiring large scale phenotyping, such as PheWAS.
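
To make the idea of a latent mixture model yielding a per-patient phenotype probability more concrete, here is a heavily simplified sketch. It is not the MAP algorithm: it assumes a single feature (a combined ICD and NLP mention count), a single two-component Gaussian mixture on log(1 + count) fitted by EM, and a fixed 0.5 cut-off, whereas the abstract describes an ensemble of mixture models, an adjustment for healthcare utilization, and a threshold produced by the algorithm. The function names and the toy counts are invented for illustration.

```python
import math


def normal_pdf(x: float, mean: float, sd: float) -> float:
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def phenotype_probabilities(counts, n_iter=100):
    """Fit a two-component Gaussian mixture to log(1 + count) with EM.

    Returns, for each patient, the posterior probability of belonging to the
    component with the higher mean (read here as "has the phenotype").
    """
    xs = [math.log1p(c) for c in counts]
    means = [min(xs), max(xs)]                    # start at the extremes
    spread = (max(xs) - min(xs)) / 4.0 or 1.0     # fall back to 1.0 if all equal
    sds = [spread, spread]
    weights = [0.5, 0.5]
    resp = [0.5] * len(xs)

    for _ in range(n_iter):
        # E-step: posterior probability of the high-mean component.
        resp = []
        for x in xs:
            p_low = weights[0] * normal_pdf(x, means[0], sds[0])
            p_high = weights[1] * normal_pdf(x, means[1], sds[1])
            total = p_low + p_high
            resp.append(p_high / total if total > 0 else 0.5)

        # M-step: re-estimate weights, means, and standard deviations.
        n_high = sum(resp)
        n_low = len(xs) - n_high
        if n_low < 1e-9 or n_high < 1e-9:
            break  # one component has collapsed; keep the last estimates
        means = [
            sum((1 - r) * x for r, x in zip(resp, xs)) / n_low,
            sum(r * x for r, x in zip(resp, xs)) / n_high,
        ]
        var_low = sum((1 - r) * (x - means[0]) ** 2 for r, x in zip(resp, xs)) / n_low
        var_high = sum(r * (x - means[1]) ** 2 for r, x in zip(resp, xs)) / n_high
        sds = [max(math.sqrt(var_low), 1e-3), max(math.sqrt(var_high), 1e-3)]
        weights = [n_low / len(xs), n_high / len(xs)]

    return resp


# Toy data (invented): combined ICD + NLP mention counts for 14 patients.
toy_counts = [0, 0, 1, 0, 2, 1, 0, 15, 22, 18, 30, 1, 0, 25]
probs = phenotype_probabilities(toy_counts)
THRESHOLD = 0.5  # assumed cut-off, not the threshold MAP derives
for count, prob in zip(toy_counts, probs):
    label = "yes" if prob > THRESHOLD else "no"
    print(f"count={count:3d}  P(phenotype)={prob:.3f}  -> {label}")
```

The appeal of this kind of model is that it needs no labeled training data: the two groups are inferred from the shape of the count distribution, which is what makes an approach of this type scalable across many phenotypes.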

Natural language processing for abstraction of cancer treatment toxicities: accuracy versus human experts

JAMIA Open ◽

10.1093/jamiaopen/ooaa064 ◽

2020 ◽

Author(s):

Julian C Hong ◽

Andrew T Fairchild ◽

Jarred P Tanksley ◽

Manisha Palta ◽

Jessica D Tenenbaum

Keyword(s):

Radiation Therapy ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Text Analysis ◽

High Accuracy ◽

Radiation Dermatitis ◽

Clinical Text ◽

Clinical Notes ◽

Oncology Research

Objectives: Expert abstraction of acute toxicities is critical in oncology research but is labor-intensive and variable. We assessed the accuracy of a natural language processing (NLP) pipeline to extract symptoms from clinical notes compared to physicians.

Materials and Methods: Two independent reviewers identified present and negated National Cancer Institute Common Terminology Criteria for Adverse Events (CTCAE) v5.0 symptoms from 100 randomly selected notes for on-treatment visits during radiation therapy, with adjudication by a third reviewer. An NLP pipeline based on the Apache clinical Text Analysis and Knowledge Extraction System (cTAKES) was developed and used to extract CTCAE terms. Accuracy was assessed by precision, recall, and F1.

Results: The NLP pipeline demonstrated high accuracy for common physician-abstracted symptoms, such as radiation dermatitis (F1 0.88), fatigue (0.85), and nausea (0.88). NLP had poor sensitivity for negated symptoms.

Conclusion: NLP accurately detects a subset of documented present CTCAE symptoms, though it is limited for negated symptoms. It may facilitate strategies to more consistently identify toxicities during cancer therapy.
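
The comparison described above, pipeline output scored against adjudicated physician labels separately for present and negated symptoms, can be sketched as follows. This is an illustrative evaluation harness only, not the study's code: the tuple layout `(note_id, symptom, negated)`, the function name, and the toy annotations are assumptions.

```python
def score_by_polarity(gold, predicted):
    """Score predictions against gold labels, split by present vs negated.

    gold and predicted are sets of (note_id, symptom, negated) tuples.
    Returns {"present": (precision, recall, f1), "negated": (...)}.
    """
    scores = {}
    for name, flag in (("present", False), ("negated", True)):
        g = {item for item in gold if item[2] is flag}
        p = {item for item in predicted if item[2] is flag}
        tp = len(g & p)   # found by both reviewers and pipeline
        fp = len(p - g)   # pipeline only
        fn = len(g - p)   # reviewers only
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        scores[name] = (precision, recall, f1)
    return scores


# Toy annotations (invented for illustration).
gold = {
    (1, "fatigue", False),
    (2, "fatigue", False),
    (3, "nausea", True),
    (4, "nausea", True),
}
predicted = {
    (1, "fatigue", False),
    (2, "fatigue", False),
    (5, "fatigue", False),  # false positive for a present symptom
    (3, "nausea", True),    # the negated mention in note 4 is missed
}
scores = score_by_polarity(gold, predicted)
for polarity, (p, r, f1) in scores.items():
    print(f"{polarity:8s} precision={p:.2f} recall={r:.2f} F1={f1:.2f}")
```

Scoring the two polarities separately is what exposes the pattern reported in the results: a pipeline can look accurate on present symptoms while still missing a large share of negated ones.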

Best Paper Selection

Yearbook of Medical Informatics ◽

10.1055/s-0037-1606508 ◽

2017 ◽

Vol 26 (01) ◽

pp. 233-234

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Large Scale ◽

Clinical Evidence ◽

Scale Analysis ◽

Clinical Text ◽

Large Scale Analysis ◽

Open Source Framework

Althoff T, Clark K, Leskovec J. Large-scale Analysis of Counseling Conversations: An Application of Natural Language Processing to Mental Health. Trans Assoc Comput Linguist 2016(4):463-76 https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5361062/

Kilicoglu H, Demner-Fushman D. Bio-SCoRes: A Smorgasbord Architecture for Coreference Resolution in Biomedical Text. PLoS One. 2016 Mar 2;11(3):e0148538 http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0148538

Morid MA, Fiszman M, Raja K, Jonnalagadda SR, Del Fiol G. Classification of clinically useful sentences in clinical evidence resources. J Biomed Inform. 2016 Apr;60:14-22 http://www.sciencedirect.com/science/article/pii/S1532046416000046?via%3Dihub

Shivade C, de Marneffe MC, Fosler-Lussier E, Lai AM. Identification, characterization, and grounding of gradable terms in clinical text. Proceedings of the 15th Workshop on Biomedical Natural Language Processing. 2016:17-26 https://www.semanticscholar.org/paper/Identification-characterization-and-grounding-of-g-Shivade-Marneffe/c00ba120de1964b444807255030741d199ba6e04

Wu Y, Denny JC, Rosenbloom ST, Miller RA, Giuse DA, Wang L, Blanquicett C, Soysal E, Xu J, Xu H. A long journey to short abbreviations: developing an open-source framework for clinical abbreviation recognition and disambiguation (CARD). J Am Med Inform Assoc 2017 Apr 1;24(e1):e79-e86 https://academic.oup.com/jamia/article-abstract/24/e1/e79/2631496/A-long-journey-to-short-abbreviations-developing?redirectedFrom=fulltext

Psychiatric stressor recognition from clinical notes to reveal association with suicide

Health Informatics Journal ◽

10.1177/1460458218796598 ◽

2018 ◽

Vol 25 (4) ◽

pp. 1846-1862 ◽

Cited By ~ 3

Author(s):

Yaoyun Zhang ◽

Olivia R Zhang ◽

Rui Li ◽

Aaron Flores ◽

Salih Selek ◽

...

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Large Scale ◽

Suicidal Behaviors ◽

Statistical Association ◽

Clinical Text ◽

Clinical Notes ◽

Electronic Health ◽

F Measure

Suicide takes the lives of nearly a million people each year and is a tremendous economic burden globally. One important type of suicide risk factor is psychiatric stress. Prior studies mainly use survey data to investigate the association between suicide and stressors. Very few studies have investigated stressor data in electronic health records, mostly because the data are recorded in narrative text. This study takes the initiative to automatically extract and classify psychiatric stressors from clinical text using natural language processing-based methods. Suicidal behaviors were also identified by keywords. Then, a statistical association analysis between suicide ideations/attempts and stressors extracted from a clinical corpus was conducted. Experimental results show that our natural language processing method could recognize stressor entities with an F-measure of 89.01 percent. Mentions of suicidal behaviors were identified with an F-measure of 97.3 percent. The top three significant stressors associated with suicide are health, pressure, and death, which is consistent with previous studies. This study demonstrates the feasibility of using natural language processing approaches to unlock information from psychiatric notes in electronic health records, to facilitate large-scale studies of associations between suicide and psychiatric stressors.
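
The abstract does not say which statistical test was used for the association analysis. One common choice for this kind of question is a 2x2 contingency table per stressor, summarized by an odds ratio and a Pearson chi-square test; the sketch below assumes that choice. The function name and the counts are invented for illustration and do not come from the study.

```python
import math


def association_2x2(a: int, b: int, c: int, d: int):
    """Odds ratio, Pearson chi-square (1 df, no continuity correction), p-value.

    Table layout (counts of notes):
                         suicidal behavior   no suicidal behavior
        stressor                 a                    b
        no stressor              c                    d

    Assumes every cell is non-zero; zero cells would need a correction.
    """
    n = a + b + c + d
    odds_ratio = (a * d) / (b * c)
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, the chi-square tail equals the two-sided
    # normal tail at sqrt(chi2), which erfc gives directly.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return odds_ratio, chi2, p_value


# Toy counts (invented): notes mentioning a given stressor vs. not.
odds_ratio, chi2, p_value = association_2x2(a=30, b=70, c=10, d=90)
print(f"odds ratio = {odds_ratio:.2f}, chi2 = {chi2:.2f}, p = {p_value:.4f}")
```

Repeating a test like this for each stressor category and ranking by significance would give a list comparable in form to the "top three" stressors named above, though a real analysis over many categories would also need a multiple-comparison correction.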
