Improving sensitivity of machine learning methods for automated case identification from free-text electronic medical records

Author(s):  
Zubair Afzal ◽  
Martijn J Schuemie ◽  
Jan C van Blijderveen ◽  
Elif F Sen ◽  
Miriam CJM Sturkenboom ◽  
...  
2016 ◽  
Vol 23 (5) ◽  
pp. 1007-1015 ◽  
Author(s):  
Elizabeth Ford ◽  
John A Carroll ◽  
Helen E Smith ◽  
Donia Scott ◽  
Jackie A Cassell

Abstract
Background: Electronic medical records (EMRs) are revolutionizing health-related research. One key issue for study quality is the accurate identification of patients with the condition of interest. Information in EMRs can be entered as structured codes or unstructured free text. The majority of research studies have used only coded parts of EMRs for case-detection, which may bias findings, miss cases, and reduce study quality. This review examines whether incorporating information from text into case-detection algorithms can improve research quality.
Methods: A systematic search returned 9659 papers, 67 of which reported on the extraction of information from free text of EMRs with the stated purpose of detecting cases of a named clinical condition. Methods for extracting information from text and the technical accuracy of case-detection algorithms were reviewed.
Results: Studies mainly used US hospital-based EMRs, and extracted information from text for 41 conditions using keyword searches, rule-based algorithms, and machine learning methods. There was no clear difference in case-detection algorithm accuracy between rule-based and machine learning methods of extraction. Inclusion of information from text resulted in a significant improvement in algorithm sensitivity and area under the receiver operating characteristic in comparison to codes alone (median sensitivity 78% (codes + text) vs 62% (codes), P = .03; median area under the receiver operating characteristic 95% (codes + text) vs 88% (codes), P = .025).
Conclusions: Text in EMRs is accessible, especially with open source information extraction algorithms, and significantly improves case detection when combined with codes. More harmonization of reporting within EMR studies is needed, particularly standardized reporting of algorithm accuracy metrics like positive predictive value (precision) and sensitivity (recall).
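The two accuracy metrics named in the conclusions can be computed directly from a case-detection algorithm's confusion counts. The following is a minimal sketch for illustration only: the function names and the counts are hypothetical and are not taken from any of the reviewed studies.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Sensitivity (recall): share of true cases that the algorithm detected."""
    if tp + fn == 0:
        raise ValueError("sensitivity is undefined when there are no true cases")
    return tp / (tp + fn)


def positive_predictive_value(tp: int, fp: int) -> float:
    """PPV (precision): share of flagged patients who are true cases."""
    if tp + fp == 0:
        raise ValueError("PPV is undefined when no patients were flagged")
    return tp / (tp + fp)


# Hypothetical counts from comparing algorithm output with a reference standard.
tp, fp, fn = 78, 12, 22  # true positives, false positives, false negatives

sens = sensitivity(tp, fn)
ppv = positive_predictive_value(tp, fp)
print(f"sensitivity = {sens:.2f}, PPV = {ppv:.2f}")
```

Reporting both values together, as the review recommends, shows the trade-off between missed cases (low sensitivity) and false alarms (low PPV).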


2018 ◽  
Vol 21 ◽  
pp. S372
Author(s):  
K Okamoto ◽  
K Goka ◽  
M Hirose ◽  
T Yamamoto ◽  
S Hiragi ◽  
...  

Author(s):  
Mohamed Abdalla ◽  
Hong Lu ◽  
Bogdan Pinzaru ◽  
Liisa Jaakkimainen

Introduction: Reliable information about the time spent waiting for health care services is a critical metric for measuring health system performance. Wait times are a useful measure of access to various health care sectors. Alongside the increased adoption of electronic medical records (EMR) by Canadian family physicians (FP) has come the secondary use of FP EMR data for research. However, FP EMR data can be challenging to use because much of it is in an unstructured, free-text format.
Objectives and Approach: Our objective was to identify the target specialist physician type from the FP EMR referral note and then calculate wait times from an FP referral to a specialist physician visit. We used FP EMR data linked to Ontario, Canada health administrative data (called EMRPC). EMRPC collects the entire clinical record from patients, including the content of FP referral notes. We used machine learning (ML) methods to identify the type of specialist physician for which the referral was intended. Labels to test the ML methods were created from physicians’ claims data. Wait times were calculated from the FP EMR referral note date to the specialist physician claim date in administrative data.
Results: Our ML models classified 2016 FP EMR referral notes to selected medical and surgical specialists with sensitivity and positive predictive values ranging from the high 70s to low 80s. Compared to earlier analyses from 2008, we observed a similar relative ordering of wait times to see specific specialist physicians. Overall, the median wait times have increased by 14 days on average, with a maximum increase of 28 days to see a gastroenterologist.
Conclusion / Implications: The accuracy of ML on unstructured FP EMR data is high, thereby providing a mechanism for “codifying” information in a timely manner. This information can help inform decision makers and providers about which patients or FP practices are experiencing long wait times in seeing specialist physicians.
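The wait-time calculation described in the approach (referral note date to specialist claim date) could be sketched as follows. This is an illustrative assumption only: the dates, the `wait_days` helper, and the pairing of each referral with a single claim are hypothetical and do not reproduce the authors' linkage of EMR and administrative data.

```python
from datetime import date
from statistics import median


def wait_days(referral_date: date, claim_date: date) -> int:
    """Days from the FP referral note date to the specialist claim date."""
    return (claim_date - referral_date).days


# Hypothetical (referral date, specialist claim date) pairs for one specialty.
referrals = [
    (date(2016, 1, 4), date(2016, 2, 15)),
    (date(2016, 3, 1), date(2016, 3, 31)),
    (date(2016, 5, 10), date(2016, 8, 8)),
]

waits = [wait_days(referred, seen) for referred, seen in referrals]
median_wait = median(waits)
print(f"waits in days: {waits}, median: {median_wait}")
```

The median is used here because the abstract reports median wait times, which are less affected by a few very long waits than the mean.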

