Evaluation of Smoking Status Identification Using Electronic Health Records and Open-Text Information in a Large Mental Health Case Register

PLoS ONE ◽  
2013 ◽  
Vol 8 (9) ◽  
pp. e74262 ◽  
Author(s):  
Chia-Yi Wu ◽  
Chin-Kuo Chang ◽  
Debbie Robson ◽  
Richard Jackson ◽  
Shaw-Ji Chen ◽  
...  
PLoS ONE ◽  
2017 ◽  
Vol 12 (2) ◽  
pp. e0171526 ◽  
Author(s):  
Yevgeniya Kovalchuk ◽  
Robert Stewart ◽  
Matthew Broadbent ◽  
Tim J. P. Hubbard ◽  
Richard J. B. Dobson

2021 ◽  
Author(s):  
Ye Seul Bae ◽  
Kyung Hwan Kim ◽  
Han Kyul Kim ◽  
Sae Won Choi ◽  
Taehoon Ko ◽  
...  

BACKGROUND: Smoking is a major risk factor and an important variable for clinical research, but few studies have addressed automatic extraction of smoking classification from unstructured bilingual electronic health records (EHRs).
OBJECTIVE: We aim to develop an algorithm that classifies smoking status from unstructured EHRs using natural language processing (NLP).
METHODS: Using acronym replacement and the Python package Soynlp, we normalize 4,711 bilingual clinical notes. Each EHR note was classified into one of 4 categories: current smoker, past smoker, never smoker, or unknown. Subsequently, SPPMI (Shifted Positive Pointwise Mutual Information) is used to vectorize the words in the notes. By calculating the cosine similarity between these word vectors, keywords denoting the same smoking status are identified.
RESULTS: Compared with other keyword extraction methods (word co-occurrence-, PMI-, and NPMI-based methods), our proposed approach improves keyword extraction precision by as much as 20.0%. The extracted keywords are then used to classify the 4 smoking statuses in our bilingual clinical notes. Given an identical SVM classifier, the extracted keywords improve the F1 score by as much as 1.8% compared with unigram and bigram Bag of Words features.
CONCLUSIONS: Our study shows the potential of SPPMI for classifying smoking status from bilingual, unstructured EHRs. Our findings show how smoking information can be readily acquired and used for clinical practice and research.
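The vectorization and similarity steps described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy notes, the document-level co-occurrence window, the shift parameter `shift_k`, and the function names are all assumptions made for illustration, and the Soynlp normalization step is omitted. SPPMI is computed here as max(PMI(w, c) - log k, 0).

```python
import math
from collections import Counter
from itertools import combinations


def sppmi_vectors(docs, shift_k=1.0):
    """Build SPPMI word vectors from tokenized notes.

    Assumption: two distinct words co-occur when they appear in the same note.
    """
    pair_counts = Counter()
    for tokens in docs:
        for a, b in combinations(sorted(set(tokens)), 2):
            pair_counts[(a, b)] += 1
            pair_counts[(b, a)] += 1

    # Marginal counts are derived from the co-occurrence table itself.
    word_counts = Counter()
    for (word, _), n in pair_counts.items():
        word_counts[word] += n
    total = sum(pair_counts.values())

    vocab = sorted(word_counts)
    index = {w: i for i, w in enumerate(vocab)}
    vectors = {w: [0.0] * len(vocab) for w in vocab}
    for (word, context), n in pair_counts.items():
        pmi = math.log(n * total / (word_counts[word] * word_counts[context]))
        # Shift by log(k), then clip negative values to zero.
        vectors[word][index[context]] = max(pmi - math.log(shift_k), 0.0)
    return vocab, vectors


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


# Hypothetical, already-normalized notes (not taken from the study data).
notes = [
    ["current", "smoker", "1ppd"],
    ["current", "smoker", "daily"],
    ["ex", "smoker", "quit"],
    ["former", "smoker", "quit"],
    ["never", "smoked"],
    ["non", "smoker", "never"],
]
vocab, vectors = sppmi_vectors(notes)
sim_same_status = cosine(vectors["ex"], vectors["former"])
sim_other_status = cosine(vectors["ex"], vectors["never"])
```

In this toy corpus, "ex" and "former" share the same contexts, so their similarity is higher than that of "ex" and "never"; a seed keyword could be expanded this way by ranking the vocabulary by cosine similarity to it.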


2020 ◽  
Vol 118 ◽  
pp. 100-106 ◽  
Author(s):  
T. Katrien J. Groenhof ◽  
Laurien R. Koers ◽  
Enja Blasse ◽  
Mark de Groot ◽  
Diederick E. Grobbee ◽  
...  

2020 ◽  
Vol 107 ◽  
pp. 103429
Author(s):  
S.M. Goodday ◽  
A. Kormilitzin ◽  
N. Vaci ◽  
Q. Liu ◽  
A. Cipriani ◽  
...  

2021 ◽  
Vol 11 (19) ◽  
pp. 8812
Author(s):  
Ye Seul Bae ◽  
Kyung Hwan Kim ◽  
Han Kyul Kim ◽  
Sae Won Choi ◽  
Taehoon Ko ◽  
...  

Smoking is an important variable for clinical research, but few studies have addressed automatic extraction of smoking classification from unstructured bilingual electronic health records (EHRs). We aim to develop an algorithm that classifies smoking status from unstructured EHRs using natural language processing (NLP). Using acronym replacement and the Python package Soynlp, we normalize 4711 bilingual clinical notes. Each EHR note was classified into one of 4 categories: current smoker, past smoker, never smoker, or unknown. Subsequently, SPPMI (Shifted Positive Pointwise Mutual Information) is used to vectorize the words in the notes. By calculating the cosine similarity between these word vectors, keywords denoting the same smoking status are identified. Compared with other keyword extraction methods (word co-occurrence-, PMI-, and NPMI-based methods), our proposed approach improves keyword extraction precision by as much as 20.0%. The extracted keywords are then used to classify the 4 smoking statuses in our bilingual EHRs. Given an identical SVM classifier, the F1 score improves by as much as 1.8% compared with unigram and bigram Bag of Words features. Our study shows the potential of SPPMI for classifying smoking status from bilingual, unstructured EHRs. Our findings show how smoking information can be readily acquired for clinical practice and research.
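As a rough illustration of how extracted keywords could serve as classifier features in place of a full Bag of Words vocabulary, the sketch below counts keyword hits per smoking status. The keyword lists, the note text, and the function name are hypothetical; the abstract does not list the actual keywords, and the SVM step itself is not shown.

```python
# Hypothetical keyword lists per status; the abstract does not give the real ones.
KEYWORDS = {
    "current": ["current smoker", "ppd", "smokes"],
    "past": ["ex-smoker", "quit", "former smoker"],
    "never": ["never smoked", "non-smoker"],
}


def keyword_features(note):
    """Return one keyword-hit count per status, in KEYWORDS order."""
    text = note.lower()
    return [sum(text.count(k) for k in kws) for kws in KEYWORDS.values()]


# A note mentioning past-smoker keywords only.
features = keyword_features("Pt quit in 2010, ex-smoker.")
# A note with no smoking keywords yields an all-zero vector.
empty_features = keyword_features("Follow-up for hypertension.")
```

Each note becomes a short, fixed-length vector that a classifier such as an SVM could consume; an all-zero vector is a natural candidate for the "unknown" category.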

