Identification of Atherosclerotic and Cardiovascular Clinical Phenotypes in Spanish Electronic Health Records: Assessment of an Automated Information Extraction System. (Preprint)
BACKGROUND: Research efforts to develop strategies that effectively identify patients and reduce the burden of cardiovascular diseases are essential for the future of the health system. Most research studies have used only the coded parts of electronic health records (EHRs) for case detection, which leads to missed cases and lowers study quality. Incorporating information from free text into case detection through natural language processing (NLP) techniques improves research quality. SAVANA was developed as an innovative data-driven system, based on NLP and big data techniques, designed to retrieve salient biomedical information from narrative clinical notes and to make the most of the large amount of information contained in Spanish EHRs.
OBJECTIVE: The aim of this work is to assess the performance of SAVANA in identifying concepts within the cardiovascular domain in Spanish EHRs.
METHODS: SAVANA is a platform for accelerating clinical research, based on real-time dynamic exploitation of all the information contained in EHR corpora. It uses its own technology (EHRead) to analyze the unstructured information contained in EHRs and express it as medical concepts that capture the most significant information in the text.
RESULTS: The evaluation corpus consisted of a stratified random sample of patients from 3 Spanish sites. For site 01, the corpus contained a total of 280 mentions of cardiovascular clinical entities, of which 249 were correctly identified, yielding P=0.93. At site 02, SAVANA correctly detected 53 mentions of cardiovascular entities among 57 annotations, achieving P=0.98; at site 03, among 165 manual annotations, 75 were correctly identified, yielding P=0.99.
CONCLUSIONS: This research clearly demonstrates the ability of SAVANA to identify mentions of atherosclerotic/cardiovascular clinical phenotypes in Spanish EHRs, as well as to retrieve patients and records related to this pathology.
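The per-site results are reported as a single P value. Assuming P denotes precision, as is conventional in information-extraction evaluation, the minimal Python sketch below shows how precision, recall, and F1 are typically derived from true-positive (TP), false-positive (FP), and false-negative (FN) mention counts. The `Counts` class, the function names, and the example counts are hypothetical illustrations; they are not taken from SAVANA or EHRead, and the abstract does not report the FP/FN counts needed to recompute its figures.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Hypothetical mention-level counts from comparing system output to a gold standard."""
    tp: int  # system mentions that match a manual (gold) annotation
    fp: int  # system mentions with no matching gold annotation
    fn: int  # gold annotations the system failed to detect


def precision(c: Counts) -> float:
    # Fraction of system mentions that are correct: TP / (TP + FP)
    return c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0


def recall(c: Counts) -> float:
    # Fraction of gold annotations the system found: TP / (TP + FN)
    return c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0


def f1(c: Counts) -> float:
    # Harmonic mean of precision and recall
    p, r = precision(c), recall(c)
    return 2 * p * r / (p + r) if (p + r) else 0.0


if __name__ == "__main__":
    example = Counts(tp=90, fp=10, fn=30)  # hypothetical counts, not study data
    print(f"P={precision(example):.2f} R={recall(example):.2f} F1={f1(example):.2f}")
    # prints "P=0.90 R=0.75 F1=0.82"
```

Note that precision depends on false positives, whereas the ratio of correctly identified mentions to manual annotations depends on false negatives, so the two can differ substantially for the same site.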