HAI-Proactive: Development of an Automated Surveillance System for Healthcare-Associated Infections in Sweden

Pontus Naucler; Suzanne D. van der Werff; John Valik; Logan Ward; Anders Ternhag; Hideyuki Tanushi; Aikaterini Mougkou; Elda Sparrelid; Mads Mogensen; Aron Henriksson; Hercules Dalianis; Brian Pickering; Vitaly Herasevich; Anders Johansson; Emil Thiman

doi:10.1017/ice.2020.519

Infection Control and Hospital Epidemiology ◽

10.1017/ice.2020.519 ◽

2020 ◽

Vol 41 (S1) ◽

pp. s39-s39

Author(s):

Pontus Naucler ◽

Suzanne D. van der Werff ◽

John Valik ◽

Logan Ward ◽

Anders Ternhag ◽

...

Keyword(s):

Positive Predictive Value ◽

Electronic Health Record ◽

Predictive Value ◽

Surveillance System ◽

Free Text ◽

Health Record ◽

Electronic Health Record Data ◽

Record Data ◽

Electronic Health ◽

Healthcare Associated

Background: Healthcare-associated infection (HAI) surveillance is essential for most infection prevention programs, and continuous epidemiological data can be used to inform healthcare personnel, allocate resources, and evaluate interventions to prevent HAIs. Many HAI surveillance systems today are based on time-consuming and resource-intensive manual reviews of patient records. The objective of HAI-Proactive, a Swedish triple-helix innovation project, is to develop and implement a fully automated HAI surveillance system based on electronic health record data. Furthermore, the project aims to develop machine-learning–based screening algorithms for early prediction of HAI at the individual patient level.

Methods: The project is performed with support from Sweden’s Innovation Agency in collaboration among academic, health, and industry partners. Development of rule-based and machine-learning algorithms is performed within a research database, which consists of all electronic health record data from patients admitted to the Karolinska University Hospital. Natural language processing is used for processing free-text medical notes. To validate algorithm performance, manual annotation was performed based on international HAI definitions from the European Centre for Disease Prevention and Control, the Centers for Disease Control and Prevention, and the Sepsis-3 criteria. Currently, the project is building a platform for real-time data access to implement the algorithms within Region Stockholm.

Results: The project has developed a rule-based surveillance algorithm for sepsis that continuously monitors patients admitted to the hospital, with a sensitivity of 0.89 (95% CI, 0.85–0.93), a specificity of 0.99 (0.98–0.99), a positive predictive value of 0.88 (0.83–0.93), and a negative predictive value of 0.99 (0.98–0.99). The healthcare-associated urinary tract infection surveillance algorithm, which is based on free-text analysis and negations to define symptoms, had a sensitivity of 0.73 (0.66–0.80) and a positive predictive value of 0.68 (0.61–0.75). The sensitivity and positive predictive value of an algorithm based only on significant bacterial growth in urine culture were 0.99 (0.97–1.00) and 0.39 (0.34–0.44), respectively. The surveillance system detected differences in incidence between hospital wards and over time. Development of surveillance algorithms for pneumonia, catheter-related infections, and Clostridioides difficile infections, as well as machine-learning–based models for early prediction, is ongoing. We intend to present results from all algorithms.

Conclusions: With access to electronic health record data, we have shown that it is feasible to develop a fully automated HAI surveillance system based on algorithms using both structured data and free text for the main healthcare-associated infections.

Funding: Sweden’s Innovation Agency and Stockholm County Council

Disclosures: None
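
The four accuracy measures reported in this abstract all derive from a 2×2 comparison of algorithm output against manual annotation. A minimal Python sketch of that calculation follows. The function name and the counts are hypothetical, chosen only for illustration; they are not data from the study, and the abstract does not describe how the authors computed their figures or confidence intervals.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Return sensitivity, specificity, PPV and NPV from confusion-matrix counts.

    tp/fp/fn/tn are counts of true positives, false positives,
    false negatives and true negatives against the reference standard.
    """
    return {
        "sensitivity": tp / (tp + fn),  # share of true cases that were flagged
        "specificity": tn / (tn + fp),  # share of non-cases that were not flagged
        "ppv": tp / (tp + fp),          # share of flagged patients that are true cases
        "npv": tn / (tn + fn),          # share of unflagged patients that are non-cases
    }


# Hypothetical counts for illustration only (NOT taken from the abstract).
metrics = diagnostic_metrics(tp=89, fp=12, fn=11, tn=888)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Sensitivity and PPV answer different questions, which is why the culture-only urinary tract infection algorithm can reach a sensitivity of 0.99 while its PPV is only 0.39: it misses almost no cases, but most of the patients it flags are not true cases.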

Machine Learning Electronic Health Record Identification of Patients with Rheumatoid Arthritis: Algorithm Pipeline Development and Validation Study (Preprint)

10.2196/preprints.23930 ◽

2020 ◽

Author(s):

Tjardo D Maarseveen ◽

Timo Meinderink ◽

Marcel J T Reinders ◽

Johannes Knitza ◽

Tom W J Huizinga ◽

...

Keyword(s):

Rheumatoid Arthritis ◽

Machine Learning ◽

Electronic Health Record ◽

Support Vector ◽

Free Text ◽

Health Record ◽

Electronic Health Record Data ◽

Data Set ◽

Record Data ◽

Electronic Health

BACKGROUND Financial codes are often used to extract diagnoses from electronic health records. This approach is prone to false positives. Alternatively, queries are constructed, but these are highly center- and language-specific. A tantalizing alternative is the automatic identification of patients by employing machine learning on format-free text entries.

OBJECTIVE The aim of this study was to develop an easily implementable workflow that builds a machine learning algorithm capable of accurately identifying patients with rheumatoid arthritis from format-free text fields in electronic health records.

METHODS Two electronic health record data sets were employed: Leiden (n=3000) and Erlangen (n=4771). Using a portion of the Leiden data (n=2000), we compared 6 different machine learning methods and a naïve word-matching algorithm using 10-fold cross-validation. Performances were compared using the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC), and F1 score was used as the primary criterion for selecting the best method to build a classifying algorithm. We selected the optimal threshold of positive predictive value for case identification based on the output of the best method in the training data. This validation workflow was subsequently applied to a portion of the Erlangen data (n=4293). For testing, the best performing methods were applied to the remaining data (Leiden n=1000; Erlangen n=478) for an unbiased evaluation.

RESULTS For the Leiden data set, the word-matching algorithm demonstrated mixed performance (AUROC 0.90; AUPRC 0.33; F1 score 0.55), and 4 methods significantly outperformed word-matching, with support vector machines performing best (AUROC 0.98; AUPRC 0.88; F1 score 0.83). Applying this support vector machine classifier to the test data resulted in a similarly high performance (F1 score 0.81; positive predictive value [PPV] 0.94), and with this method, we could identify 2873 patients with rheumatoid arthritis in less than 7 seconds out of the complete collection of 23,300 patients in the Leiden electronic health record system. For the Erlangen data set, gradient boosting performed best in the training set (AUROC 0.94; AUPRC 0.85; F1 score 0.82) and, applied to the test data, again gave good results (F1 score 0.67; PPV 0.97).

CONCLUSIONS We demonstrate that machine learning methods can extract the records of patients with rheumatoid arthritis from electronic health record data with high precision, allowing research on very large populations at limited cost. Our approach is language and center independent and could be applied to any type of diagnosis. We have developed our pipeline into a universally applicable and easy-to-implement workflow to equip centers with their own high-performing algorithm. This allows the creation of observational studies of unprecedented size covering different countries at low cost from data already available in electronic health record systems.
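
To make the baseline concrete, the following Python sketch shows one way a naïve word-matching classifier could be scored with precision, recall, and F1. It is an assumption-laden illustration, not the authors' implementation: the keyword, the helper names, and the four toy notes are all hypothetical, and the abstract does not specify how its word-matching algorithm was built.

```python
def word_match(note, keywords=("rheumatoid arthritis",)):
    """Flag a note as a case if it contains any keyword (case-insensitive)."""
    text = note.lower()
    return any(keyword in text for keyword in keywords)


def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall and F1 for boolean labels and predictions."""
    tp = sum(t and p for t, p in zip(y_true, y_pred))
    fp = sum((not t) and p for t, p in zip(y_true, y_pred))
    fn = sum(t and (not p) for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1


# Hypothetical toy notes with manual labels (NOT study data).
labelled_notes = [
    ("Patient with rheumatoid arthritis, on methotrexate.", True),
    ("No evidence of rheumatoid arthritis; osteoarthritis of the knee.", False),
    ("Seropositive RA with erosions in both hands.", True),
    ("Gout flare, treated with colchicine.", False),
]

y_true = [label for _, label in labelled_notes]
y_pred = [word_match(note) for note, _ in labelled_notes]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

In this toy set the matcher flags the negated mention in the second note and misses the abbreviation in the third, the two kinds of error that plain string matching on free text tends to make.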

Machine Learning Electronic Health Record Identification of Patients with Rheumatoid Arthritis: Algorithm Pipeline Development and Validation Study

JMIR Medical Informatics ◽

10.2196/23930 ◽

2020 ◽

Vol 8 (11) ◽

pp. e23930

Author(s):

Tjardo D Maarseveen ◽

Timo Meinderink ◽

Marcel J T Reinders ◽

Johannes Knitza ◽

Tom W J Huizinga ◽

...

Keyword(s):

Rheumatoid Arthritis ◽

Machine Learning ◽

Electronic Health Record ◽

Support Vector ◽

Free Text ◽

Health Record ◽

Electronic Health Record Data ◽

Data Set ◽

Record Data ◽

Electronic Health

Background Financial codes are often used to extract diagnoses from electronic health records. This approach is prone to false positives. Alternatively, queries are constructed, but these are highly center- and language-specific. A tantalizing alternative is the automatic identification of patients by employing machine learning on format-free text entries.

Objective The aim of this study was to develop an easily implementable workflow that builds a machine learning algorithm capable of accurately identifying patients with rheumatoid arthritis from format-free text fields in electronic health records.

Methods Two electronic health record data sets were employed: Leiden (n=3000) and Erlangen (n=4771). Using a portion of the Leiden data (n=2000), we compared 6 different machine learning methods and a naïve word-matching algorithm using 10-fold cross-validation. Performances were compared using the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC), and F1 score was used as the primary criterion for selecting the best method to build a classifying algorithm. We selected the optimal threshold of positive predictive value for case identification based on the output of the best method in the training data. This validation workflow was subsequently applied to a portion of the Erlangen data (n=4293). For testing, the best performing methods were applied to the remaining data (Leiden n=1000; Erlangen n=478) for an unbiased evaluation.

Results For the Leiden data set, the word-matching algorithm demonstrated mixed performance (AUROC 0.90; AUPRC 0.33; F1 score 0.55), and 4 methods significantly outperformed word-matching, with support vector machines performing best (AUROC 0.98; AUPRC 0.88; F1 score 0.83). Applying this support vector machine classifier to the test data resulted in a similarly high performance (F1 score 0.81; positive predictive value [PPV] 0.94), and with this method, we could identify 2873 patients with rheumatoid arthritis in less than 7 seconds out of the complete collection of 23,300 patients in the Leiden electronic health record system. For the Erlangen data set, gradient boosting performed best in the training set (AUROC 0.94; AUPRC 0.85; F1 score 0.82) and, applied to the test data, again gave good results (F1 score 0.67; PPV 0.97).

Conclusions We demonstrate that machine learning methods can extract the records of patients with rheumatoid arthritis from electronic health record data with high precision, allowing research on very large populations at limited cost. Our approach is language and center independent and could be applied to any type of diagnosis. We have developed our pipeline into a universally applicable and easy-to-implement workflow to equip centers with their own high-performing algorithm. This allows the creation of observational studies of unprecedented size covering different countries at low cost from data already available in electronic health record systems.
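
The abstract reports F1 and PPV on the test sets but not recall. Because F1 is the harmonic mean of precision (PPV) and recall, the implied recall can be back-calculated as R = F1·P / (2P − F1). The Python sketch below is an illustration of that arithmetic only, not part of the authors' pipeline; the function name is hypothetical, and since the reported figures are rounded, the results are approximate.

```python
def recall_from_f1_and_precision(f1, precision):
    """Solve F1 = 2PR / (P + R) for recall R, given F1 and precision P."""
    denominator = 2 * precision - f1
    if denominator <= 0:
        raise ValueError("F1 and precision are inconsistent (need F1 < 2 * precision).")
    return f1 * precision / denominator


# Test-set figures as reported in the abstract (rounded to two decimals).
reported = {
    "Leiden (support vector machine)": (0.81, 0.94),
    "Erlangen (gradient boosting)": (0.67, 0.97),
}

for name, (f1, ppv) in reported.items():
    recall = recall_from_f1_and_precision(f1, ppv)
    print(f"{name}: implied recall is about {recall:.2f}")
```

Under these assumptions the implied recall is roughly 0.71 for Leiden and 0.51 for Erlangen, which is consistent with a decision threshold tuned toward high precision at some cost in missed cases.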

Automated chronic disease surveillance and visualization using electronic health record data

Emerging Health Threats Journal ◽

10.3402/ehtj.v4i0.11102 ◽

2011 ◽

Vol 4 (0) ◽

Author(s):

Michael Klompas ◽

Chaim Kirby ◽

Jason McVetta ◽

Paul Oppedisano ◽

John Brownstein ◽

...

Keyword(s):

Chronic Disease ◽

Electronic Health Record ◽

Disease Surveillance ◽

Health Record ◽

Electronic Health Record Data ◽

Record Data ◽

Electronic Health

Faculty Opinions recommendation of Evaluating delivery of low tidal volume ventilation in six ICUs using electronic health record data.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.734212415.793572533 ◽

2020 ◽

Author(s):

Jeremy Beitler

Keyword(s):

Electronic Health Record ◽

Tidal Volume ◽

Health Record ◽

Electronic Health Record Data ◽

Low Tidal Volume ◽

Low Tidal Volume Ventilation ◽

Volume Ventilation ◽

Record Data ◽

Electronic Health

Clinical Comparison Between Trial Participants and Potentially Eligible Patients Using Electronic Health Record Data: A Generalizability Assessment Method

Journal of Biomedical Informatics ◽

10.1016/j.jbi.2021.103822 ◽

2021 ◽

pp. 103822

Author(s):

James R. Rogers ◽

George Hripcsak ◽

Ying Kuen Cheung ◽

Chunhua Weng

Keyword(s):

Electronic Health Record ◽

Assessment Method ◽

Health Record ◽

Electronic Health Record Data ◽

Clinical Comparison ◽

Record Data ◽

Electronic Health ◽

Trial Participants

Leveraging electronic health record data to inform hospital resource management

Health Care Management Science ◽

10.1007/s10729-021-09554-4 ◽

2021 ◽

Author(s):

José Carlos Ferrão ◽

Mónica Duarte Oliveira ◽

Daniel Gartner ◽

Filipe Janela ◽

Henrique M. G. Martins

Keyword(s):

Resource Management ◽

Electronic Health Record ◽

Health Record ◽

Hospital Resource ◽

Electronic Health Record Data ◽

Record Data ◽

Electronic Health

Predicting baby feeding method from unstructured electronic health record data

Proceedings of the ACM sixth international workshop on Data and text mining in biomedical informatics - DTMBIO '12 ◽

10.1145/2390068.2390075 ◽

2012 ◽

Cited By ~ 1

Author(s):

Ashwani Rao ◽

Kristin Maiden ◽

Ben Carterette ◽

Deb Ehrenthal

Keyword(s):

Electronic Health Record ◽

Health Record ◽

Electronic Health Record Data ◽

Feeding Method ◽

Record Data ◽

Electronic Health

Automated Identification of Potential Candidates for Human Immunodeficiency Virus Pre-exposure Prophylaxis Using Electronic Health Record Data

Open Forum Infectious Diseases ◽

10.1093/ofid/ofw194.63 ◽

2016 ◽

Vol 3 (suppl_1) ◽

Cited By ~ 1

Author(s):

Douglas Krakower ◽

Susan Gruber ◽

John T. Menchaca ◽

Judith C. Maro ◽

Noelle Cocoros ◽

...

Keyword(s):

Human Immunodeficiency Virus ◽

Electronic Health Record ◽

Health Record ◽

Automated Identification ◽

Electronic Health Record Data ◽

Immunodeficiency Virus ◽

Record Data ◽

Electronic Health ◽

Exposure Prophylaxis

Predicting need for hospital-specific interventional care after surgery using electronic health record data

Surgery ◽

10.1016/j.surg.2021.05.005 ◽

2021 ◽

Author(s):

Davy van de Sande ◽

Michel E. van Genderen ◽

C. Verhoef ◽

Jasper van Bommel ◽

Diederik Gommers ◽

...

Keyword(s):

Electronic Health Record ◽

Health Record ◽

Electronic Health Record Data ◽

Record Data ◽

Electronic Health

Robust estimation of heterogeneous treatment effects using electronic health record data

Statistics in Medicine ◽

10.1002/sim.8926 ◽

2021 ◽

Author(s):

Ruohong Li ◽

Honglang Wang ◽

Wanzhu Tu

Keyword(s):

Electronic Health Record ◽

Robust Estimation ◽

Treatment Effects ◽

Health Record ◽

Electronic Health Record Data ◽

Heterogeneous Treatment Effects ◽

Record Data ◽

Electronic Health
