Data mining information from electronic health records produced high yield and accuracy for current smoking status

2020 ◽  
Vol 118 ◽  
pp. 100-106 ◽  
Author(s):  
T. Katrien J. Groenhof ◽  
Laurien R. Koers ◽  
Enja Blasse ◽  
Mark de Groot ◽  
Diederick E. Grobbee ◽  
...  
2017 ◽  
Vol 6 (4) ◽  
pp. 389-400 ◽  
Author(s):  
Jingfeng Chen ◽  
Wei Wei ◽  
Chonghui Guo ◽  
Lin Tang ◽  
Leilei Sun

2021 ◽  
Author(s):  
Ye Seul Bae ◽  
Kyung Hwan Kim ◽  
Han Kyul Kim ◽  
Sae Won Choi ◽  
Taehoon Ko ◽  
...  

BACKGROUND: Smoking is a major risk factor and an important variable for clinical research, but few studies have addressed the automatic extraction of smoking classification from unstructured bilingual electronic health records (EHRs). OBJECTIVE: We aim to develop an algorithm to classify smoking status from unstructured EHRs using natural language processing (NLP). METHODS: With acronym replacement and the Python package Soynlp, we normalize 4,711 bilingual clinical notes. Each EHR note was classified into 4 categories: current smokers, past smokers, never smokers, and unknown. Subsequently, SPPMI (Shifted Positive Pointwise Mutual Information) is used to vectorize words in the notes. By calculating cosine similarity between these word vectors, keywords denoting the same smoking status are identified. RESULTS: Compared to other keyword extraction methods (word co-occurrence-, PMI-, and NPMI-based methods), our proposed approach improves keyword extraction precision by as much as 20.0%. These extracted keywords are used in classifying 4 smoking statuses from our bilingual clinical notes. Given an identical SVM classifier, the extracted keywords improve the F1 score by as much as 1.8% compared to those of the unigram and bigram Bag of Words. CONCLUSIONS: Our study shows the potential of SPPMI in classifying smoking status from bilingual, unstructured EHRs. Our current findings show how smoking information can be easily acquired and used for clinical practice and research.
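For reference, the quantities named in the METHODS section can be written out as follows. This is the commonly used formulation of shifted positive PMI and cosine similarity, not one taken from the abstract itself. The symbols $w$ (word), $c$ (context word), $k$ (shift constant), and $\mathbf{u}, \mathbf{v}$ (word vectors) are notation assumed here.

```latex
\mathrm{PMI}(w, c) = \log \frac{P(w, c)}{P(w)\,P(c)}, \qquad
\mathrm{SPPMI}_k(w, c) = \max\bigl(\mathrm{PMI}(w, c) - \log k,\; 0\bigr), \qquad
\cos(\mathbf{u}, \mathbf{v}) = \frac{\mathbf{u} \cdot \mathbf{v}}{\lVert \mathbf{u} \rVert \, \lVert \mathbf{v} \rVert}
```

With $k = 1$ the shift vanishes and SPPMI reduces to ordinary positive PMI.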


PLoS ONE ◽  
2018 ◽  
Vol 13 (4) ◽  
pp. e0195901 ◽  
Author(s):  
Hyunyoung Baek ◽  
Minsu Cho ◽  
Seok Kim ◽  
Hee Hwang ◽  
Minseok Song ◽  
...  

PLoS ONE ◽  
2013 ◽  
Vol 8 (9) ◽  
pp. e74262 ◽  
Author(s):  
Chia-Yi Wu ◽  
Chin-Kuo Chang ◽  
Debbie Robson ◽  
Richard Jackson ◽  
Shaw-Ji Chen ◽  
...  

2014 ◽  
Vol 926-930 ◽  
pp. 1069-1072
Author(s):  
Liu Ning

The adoption and improvement of residents' electronic health records play a vital role in improving overall population health. After thorough investigation and research, this paper summarizes and analyzes the basic state of construction of residents' electronic health records and the implementation progress of such projects in a few representative countries in recent years. It points out some problems in the construction and application of residents' electronic health records, and identifies popularizing and strengthening the application of data mining as the development focus of these projects over the next few years.


2021 ◽  
Vol 11 (19) ◽  
pp. 8812
Author(s):  
Ye Seul Bae ◽  
Kyung Hwan Kim ◽  
Han Kyul Kim ◽  
Sae Won Choi ◽  
Taehoon Ko ◽  
...  

Smoking is an important variable for clinical research, but few studies have addressed the automatic extraction of smoking classification from unstructured bilingual electronic health records (EHRs). We aim to develop an algorithm to classify smoking status from unstructured EHRs using natural language processing (NLP). With acronym replacement and the Python package Soynlp, we normalize 4711 bilingual clinical notes. Each EHR note was classified into 4 categories: current smokers, past smokers, never smokers, and unknown. Subsequently, SPPMI (Shifted Positive Pointwise Mutual Information) is used to vectorize words in the notes. By calculating cosine similarity between these word vectors, keywords denoting the same smoking status are identified. Compared to other keyword extraction methods (word co-occurrence-, PMI-, and NPMI-based methods), our proposed approach improves keyword extraction precision by as much as 20.0%. These extracted keywords are used in classifying 4 smoking statuses from our bilingual EHRs. Given an identical SVM classifier, the F1 score is improved by as much as 1.8% compared to those of the unigram and bigram Bag of Words. Our study shows the potential of SPPMI in classifying smoking status from bilingual, unstructured EHRs. Our current findings show how smoking information can be easily acquired for clinical practice and research.
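The vectorization and similarity step described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names (`sppmi_vectors`, `cosine`), the note-level co-occurrence window, the shift parameter `shift_k`, and the toy English notes are all assumptions made for illustration; the study itself worked on bilingual notes normalized with Soynlp.

```python
import math
from collections import Counter
from itertools import combinations


def sppmi_vectors(docs, shift_k=1):
    """Build sparse SPPMI word vectors from tokenized notes.

    Assumption: two words co-occur when they appear in the same note;
    the abstract does not state the context window that was used.
    """
    word_counts = Counter()
    pair_counts = Counter()
    for tokens in docs:
        vocab = sorted(set(tokens))
        word_counts.update(vocab)
        for a, b in combinations(vocab, 2):
            pair_counts[(a, b)] += 1
            pair_counts[(b, a)] += 1

    n_docs = len(docs)
    vectors = {w: {} for w in word_counts}
    for (a, b), n_ab in pair_counts.items():
        # PMI from note-level probabilities: log( P(a,b) / (P(a) P(b)) )
        pmi = math.log(n_ab * n_docs / (word_counts[a] * word_counts[b]))
        # Shift by log(k), then clip negatives to zero (the "positive" part)
        value = max(pmi - math.log(shift_k), 0.0)
        if value > 0:
            vectors[a][b] = value
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(x * v[key] for key, x in u.items() if key in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy notes, already tokenized
notes = [
    ["current", "smoker", "1ppd"],
    ["smoker", "1ppd", "daily"],
    ["ex-smoker", "quit", "2010"],
    ["quit", "ex-smoker", "years"],
    ["never", "smoked"],
    ["never", "smoked", "denies"],
]
vecs = sppmi_vectors(notes)
sim_same_status = cosine(vecs["smoker"], vecs["1ppd"])
sim_other_status = cosine(vecs["smoker"], vecs["never"])
```

In this toy corpus, `sim_same_status` is larger than `sim_other_status`, which is the property the keyword-extraction step relies on: words that share contexts with a seed keyword are candidates for the same smoking status.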


2018 ◽  
Vol 25 (2) ◽  
pp. 105-108 ◽  
Author(s):  
Pablo Millares Martin

Background: Considerable interest exists in using general practice electronic health records (EHRs) for research and other purposes. There is also concern about their quality. Aim: We suggest a simple test to assess errors of commission, and subsequently overall EHR data quality, that can be run periodically. Method: Patient records with simultaneous entries of three different smoking stages were studied. The codes 'never smoked tobacco', 'smoker' and 'ex-smoker' should follow this chronological order. It should then be possible to extrapolate the overall level of errors of commission for the organisation. Results: The smoking test in our sample found errors in 169 patients, with 60 cases where dual errors were discovered. We express it as an estimated error of commission level of 2.6% related to the total population of the practice. Conclusions: Considering the constant and regular entries on smoking status (83.59% of the entries were made over the last month), we can conclude that smoking entry analysis can serve as a simple test to periodically assess overall EHR data quality and any trends.
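The chronological-order check described in the Method can be sketched in Python as below. This is one possible reading of the test, not the author's procedure. The two rules (a 'never smoked tobacco' entry after any smoking history, and an 'ex-smoker' entry before any 'smoker' entry), the error labels, the function name `commission_errors`, and the sample records are assumptions introduced for illustration.

```python
from datetime import date

NEVER = "never smoked tobacco"
SMOKER = "smoker"
EX = "ex-smoker"


def commission_errors(entries):
    """Return the set of ordering errors for one patient's smoking entries.

    entries: list of (entry_date, code) tuples.
    Assumed rules (hypothetical reading of the abstract):
      - 'never smoked tobacco' must not follow 'smoker' or 'ex-smoker'
      - 'ex-smoker' must not precede the first 'smoker' entry
    """
    errors = set()
    seen_smoker = False
    seen_history = False  # any 'smoker' or 'ex-smoker' entry so far
    for _, code in sorted(entries, key=lambda entry: entry[0]):
        if code == NEVER:
            if seen_history:
                errors.add("never_after_smoking")
        elif code == EX:
            if not seen_smoker:
                errors.add("ex_before_smoker")
            seen_history = True
        elif code == SMOKER:
            seen_smoker = True
            seen_history = True
    return errors


# Hypothetical records: patient id -> dated smoking codes
records = {
    "p1": [(date(2001, 1, 1), NEVER), (date(2005, 1, 1), SMOKER), (date(2010, 1, 1), EX)],
    "p2": [(date(2001, 1, 1), SMOKER), (date(2005, 1, 1), NEVER), (date(2010, 1, 1), EX)],
    "p3": [(date(2001, 1, 1), EX), (date(2005, 1, 1), NEVER), (date(2010, 1, 1), SMOKER)],
}
flagged = {pid: commission_errors(entries) for pid, entries in records.items()}
patients_with_errors = [pid for pid, errs in flagged.items() if errs]
```

Dividing the number of flagged patients by the practice population would give an error-of-commission estimate of the kind reported in the Results. A patient who relapses after quitting is not flagged under these rules, which is a deliberate simplification here.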


2014 ◽  
Vol 32 (15_suppl) ◽  
pp. 6612-6612
Author(s):  
Tina Hernandez-Boussard ◽  
Suzanne Tamang ◽  
James D. Brooks ◽  
Douglas W Blayney ◽  
Nigam Shah

2021 ◽  
pp. 00167-2021
Author(s):  
Shanya Sivakumaran ◽  
Mohammad A. Alsallakh ◽  
Ronan A. Lyons ◽  
Jennifer K. Quint ◽  
Gwyneth A. Davies

Although routinely collected electronic health records (EHR) are widely used to examine outcomes related to chronic obstructive pulmonary disease (COPD), consensus regarding the identification of cases from electronic healthcare databases is lacking. We systematically examine and summarise approaches from the recent literature. MEDLINE via EBSCOhost was searched for COPD-related studies using EHR published from January 1, 2018 to November 30, 2019. Data were extracted relating to the case definition of COPD and determination of COPD severity and phenotypes. From 185 eligible studies, we found widespread variation in the definitions used to identify people with COPD in terms of code sets used (with 20 different code sets in use based on the ICD-10 classification alone) and requirement of additional criteria (relating to age (n=139), medication (n=31), multiplicity of events (n=21), spirometry (n=19) and smoking status (n=9)). Only 7 studies used a case definition which had been validated against a reference standard in the same dataset. Various proxies of disease severity were used since spirometry results and patient-reported outcomes were not often available. To enable the research community to draw reliable insights from electronic health records and aid comparability between studies, clear reporting and greater consistency of the definitions used to identify COPD and related outcome measures are key.

