Automated Extraction of VTE Events From Narrative Radiology Reports in Electronic Health Records

Abstract. Background: Medical texts such as radiology reports and electronic health records are a powerful source of data for researchers. Anonymization methods must be developed to de-identify documents containing personal information about both patients and medical staff. Although several anonymization strategies exist for the English language, they are language-dependent. Here, we introduce a named entity recognition strategy for Spanish medical texts that is translatable to other languages. Results: We tested 4 neural networks on our radiology reports dataset, achieving a recall of 97.18% of the identifying entities. In addition, we developed a randomization algorithm to substitute the detected entities with new ones from the same category, making it virtually impossible to differentiate real data from synthetic data. The three best architectures were tested with the MEDDOCAN challenge dataset of electronic health records as an external test, achieving a recall of 69.18%. Conclusions: The proposed strategy, combining named entity recognition with randomization of entities, is suitable for Spanish radiology reports. It does not require a large training corpus, so it could be easily extended to other languages and medical texts, such as electronic health records.


3 - Automated Extraction of Sudden Death Risk Phenotypes from Electronic Health Records of Patients With Hypertrophic Cardiomyopathy

10.26226/morressier.5d19cfb157558b317a10dd7b ◽

2019 ◽

Author(s):

Riea Moon ◽

Sijia Liu, PhD ◽

Sujith Samudrala, MD ◽

Jeffrey B. Geske, MD ◽

Peter Noseworthy ◽

...

Keyword(s):

Hypertrophic Cardiomyopathy ◽

Sudden Death ◽

Electronic Health Records ◽

Automated Extraction ◽

Health Records ◽

Death Risk ◽

Electronic Health


353 A new era of data extraction: Example of automated extraction PSA values from electronic health records

European Urology Supplements ◽

10.1016/s1569-9056(16)60355-x ◽

2016 ◽

Vol 15 (3) ◽

pp. e353

Author(s):

S-R. Leyh-Bannurah ◽

P. Dell'Oglio ◽

Z. Tian ◽

M. Graefen ◽

H. Huland ◽

...

Keyword(s):

Electronic Health Records ◽

Data Extraction ◽

Automated Extraction ◽

Health Records ◽

New Era ◽

Electronic Health


Preventing unnecessary imaging in patients suspect of coronary artery disease through machine learning of electronic health records

European Heart Journal - Digital Health ◽

10.1093/ehjdh/ztab103 ◽

2021 ◽

Author(s):

L Malin Overmars ◽

Bram van Es ◽

Floor Groepenhoff ◽

Mark C H De Groot ◽

Gerard Pasterkamp ◽

...

Keyword(s):

Coronary Artery Disease ◽

Coronary Artery ◽

Electronic Health Records ◽

Single Photon ◽

False Negative ◽

Photon Emission ◽

Health Records ◽

Radiology Reports ◽

Electronic Health ◽

Artery Disease

Abstract. Introduction: With the aging European population, the incidence of coronary artery disease (CAD) is expected to rise. This will likely result in increased use of imaging. Symptom recognition can be complicated, as symptoms caused by CAD can be atypical, particularly in women. Early CAD exclusion may help to optimize the use of diagnostic resources and thus improve the sustainability of the healthcare system. Objective: To develop sex-stratified algorithms, trained on routinely available electronic health records, raw electrocardiograms, and hematology data, to exclude CAD in patients upfront. Methods: We trained XGBoost algorithms on data from patients in the Utrecht Patient-Oriented Database who underwent coronary computed tomography angiography (CCTA), and/or stress cardiac magnetic resonance (CMR) imaging or stress single-photon emission computerized tomography (SPECT) at the UMC Utrecht. Outcomes were extracted from radiology reports. We aimed to maximize negative predictive value (NPV) to minimize the false negative risk with acceptable specificity. Results: Of 6,808 CCTA patients (31% female), 1,029 females (48%) and 1,908 males (45%) had no diagnosis of CAD. Of 3,053 CMR/SPECT patients (45% female), 650 females (47%) and 881 males (48%) had no diagnosis of CAD. On the train and test set, the CCTA models achieved NPVs and specificities of 0.95 and 0.19 (females) and 0.96 and 0.09 (males). The CMR/SPECT models achieved NPVs and specificities of 0.75 and 0.041 (females) and 0.92 and 0.026 (males). Conclusion: CAD can be excluded from EHRs with high NPV. Our study demonstrates new possibilities to reduce unnecessary imaging in women and men suspected of CAD.
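The two metrics reported in this abstract, negative predictive value and specificity, are both derived from confusion-matrix counts. The following is a minimal sketch of those standard definitions, not the authors' code: the function names are hypothetical, and the counts are invented for illustration and are not taken from the study.

```python
def npv(true_neg: int, false_neg: int) -> float:
    """Negative predictive value: share of negative predictions that are correct."""
    total = true_neg + false_neg
    if total == 0:
        raise ValueError("no negative predictions were made")
    return true_neg / total


def specificity(true_neg: int, false_pos: int) -> float:
    """Specificity: share of truly negative cases that are predicted negative."""
    total = true_neg + false_pos
    if total == 0:
        raise ValueError("no truly negative cases in the data")
    return true_neg / total


# Hypothetical counts (NOT study data): 100 patients without CAD, 100 with CAD.
tn, fp = 19, 81   # patients without CAD: ruled out vs. still sent to imaging
fn, tp = 1, 99    # patients with CAD: wrongly ruled out vs. correctly flagged

print(f"NPV={npv(tn, fn):.2f} specificity={specificity(tn, fp):.2f}")
# prints: NPV=0.95 specificity=0.19
```

With these made-up counts, a model that rules out only a small share of CAD-free patients can still have a high NPV, because almost everyone it does rule out is truly negative.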


State-of-the-art automated extraction of detailed pathological data from narratively written electronic health records

European Urology Supplements ◽

10.1016/s1569-9056(18)31684-1 ◽

2018 ◽

Vol 17 (2) ◽

pp. e1209

Author(s):

S.-R. Leyh-Bannurah ◽

Z. Tian ◽

P.I. Karakiewicz ◽

U. Wolffgang ◽

D. Pehrke ◽

...

Keyword(s):

Electronic Health Records ◽

State Of The Art ◽

Automated Extraction ◽

Health Records ◽

Electronic Health


Automated Extraction of Diagnostic Criteria From Electronic Health Records for Autism Spectrum Disorders: Development, Evaluation, and Application

Journal of Medical Internet Research ◽

10.2196/10497 ◽

2018 ◽

Vol 20 (11) ◽

pp. e10497 ◽

Cited By ~ 6

Author(s):

Gondy Leroy ◽

Yang Gu ◽

Sydney Pettygrove ◽

Maureen K Galindo ◽

Ananyaa Arora ◽

...

Keyword(s):

Autism Spectrum Disorders ◽

Electronic Health Records ◽

Diagnostic Criteria ◽

Autism Spectrum ◽

Automated Extraction ◽

Health Records ◽

Spectrum Disorders ◽

Electronic Health ◽

Development Evaluation


MP70-01 STATE-OF-THE-ART AUTOMATED EXTRACTION OF DETAILED PATHOLOGICAL DATA FROM NARRATIVELY WRITTEN ELECTRONIC HEALTH RECORDS

The Journal of Urology ◽

10.1016/j.juro.2018.02.2245 ◽

2018 ◽

Vol 199 (4S) ◽

Author(s):

Sami-Ramzi Leyh-Bannurah ◽

Tian Zhe ◽

Pierre Karakiewicz ◽

Ulrich Wolffgang ◽

Dirk Pehrke ◽

...

Keyword(s):

Electronic Health Records ◽

State Of The Art ◽

Automated Extraction ◽

Health Records ◽

Electronic Health


De-identifying Spanish medical texts - Named Entity Recognition applied to radiology reports

10.1101/2020.04.09.20058958 ◽

2020 ◽

Author(s):

Irene Pérez-Díez ◽

Raúl Pérez-Moraga ◽

Adolfo López-Cerdán ◽

Jose-Maria Salinas-Serrano ◽

María de la Iglesia-Vayá

Keyword(s):

Electronic Health Records ◽

English Language ◽

Personal Information ◽

Named Entity Recognition ◽

Entity Recognition ◽

Medical Texts ◽

Health Records ◽

Named Entity ◽

Radiology Reports ◽

Electronic Health

Medical texts such as radiology reports and electronic health records are a powerful source of data for researchers. Anonymization methods must be developed to de-identify documents containing personal information about both patients and medical staff. Although several anonymization strategies exist for the English language, they are language-dependent. Here, we introduce a named entity recognition strategy for Spanish medical texts that is translatable to other languages. We tested 4 neural networks on our radiology reports dataset, achieving a recall of 97.18% of the identifying entities. In addition, we developed a randomization algorithm to substitute the detected entities with new ones from the same category, making it virtually impossible to differentiate real data from synthetic data. The three best architectures were tested with the MEDDOCAN challenge dataset of electronic health records as an external test, achieving a recall of 69.18%. The proposed strategy, combining named entity recognition with randomization of entities, is suitable for Spanish radiology reports. It does not require a large training corpus, so it can be easily extended to other languages and medical texts, such as electronic health records.
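The substitution step described in this abstract, replacing each detected entity with a new one from the same category, can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' algorithm: the label names, the surrogate lists, the helper `span_of`, and the sample sentence are all hypothetical, and the entity spans are assumed to come from an upstream named entity recognition model and not to overlap.

```python
import random
from typing import Dict, List, Tuple

Span = Tuple[int, int, str]  # (start offset, end offset, entity label)

# Hypothetical surrogate pools, one per entity category.
SURROGATES: Dict[str, List[str]] = {
    "NAME": ["Lucía Fernández", "Carlos Romero", "Marta Gil"],
    "HOSPITAL": ["Hospital San Rafael", "Clínica del Norte"],
    "DATE": ["03/02/2015", "21/11/2017"],
}


def span_of(text: str, value: str, label: str) -> Span:
    """Stand-in for an NER model: locate a known entity string in the text."""
    start = text.index(value)
    return (start, start + len(value), label)


def randomize_entities(text: str, spans: List[Span], rng: random.Random) -> str:
    """Replace each span with a random surrogate of the same category."""
    out = text
    # Work from right to left so earlier offsets stay valid after each edit.
    for start, end, label in sorted(spans, key=lambda s: s[0], reverse=True):
        original = text[start:end]
        candidates = [c for c in SURROGATES[label] if c != original]
        out = out[:start] + rng.choice(candidates) + out[end:]
    return out


report = "Paciente Ana Ruiz atendida en Hospital La Fe el 12/05/2019."
spans = [
    span_of(report, "Ana Ruiz", "NAME"),
    span_of(report, "Hospital La Fe", "HOSPITAL"),
    span_of(report, "12/05/2019", "DATE"),
]
anonymized = randomize_entities(report, spans, random.Random(0))
print(anonymized)
```

Because each replacement is drawn from the same category as the original, the output stays a plausible report, which is what makes synthetic entities hard to tell apart from any real ones the detector missed.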
