Developing HL7 CDA-Based Data Warehouse for the Use of Electronic Health Record Data for Secondary Purposes

ACI Open ◽  
2019 ◽  
Vol 03 (01) ◽  
pp. e44-e62
Author(s):  
Fabrizio Pecoraro ◽  
Daniela Luzi ◽  
Fabrizio L. Ricci

Background The growing availability of clinical and administrative data collected in electronic health records (EHRs) has led researchers and policy makers to implement data warehouses to improve the reuse of EHR data for secondary purposes. This approach can take advantage of a unique source of information that collects data from providers across multiple organizations. Moreover, the development of a data warehouse benefits from the standards adopted to exchange data provided by heterogeneous systems. Objective This article aims to design and implement a conceptual framework that semiautomatically extracts information collected in Health Level 7 Clinical Document Architecture (CDA) documents stored in an EHR and transforms it for loading into a target data warehouse. Results The solution adopted in this article supports the integration of the EHR as an operational data store in a data warehouse infrastructure. Moreover, the data structures of EHR clinical documents and the data warehouse modeling schemas are analyzed to define a semiautomatic framework that maps the primitives of the CDA to the concepts of the dimensional model. The case study successfully tests this approach. Conclusion The proposed solution guarantees data quality by using structured documents already integrated in a large-scale infrastructure, with a timely updated information flow. It ensures data integrity and consistency and has the advantage of being based on a sample size that covers a broad target population. Moreover, the use of CDAs simplifies the definition of extract, transform, and load (ETL) tools through the adoption of a conceptual framework that loads the information stored in the CDA into the data warehouse.
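To make the CDA-to-dimensional-model mapping concrete, the sketch below shows one way such an extract step could look. It is not the authors' framework: the element paths, the namespace handling, and the target star-schema layout (a dim_patient dimension and fact_observation rows) are illustrative assumptions.

# Minimal sketch (assumptions as noted above), not the authors' ETL framework.
import xml.etree.ElementTree as ET

NS = {"hl7": "urn:hl7-org:v3"}  # HL7 v3 namespace used by CDA documents

def extract_cda(cda_xml: str) -> dict:
    """Pull a patient dimension row and observation fact rows from one CDA."""
    root = ET.fromstring(cda_xml)

    # Patient identity from the CDA header -> patient dimension row
    patient_id_el = root.find(".//hl7:recordTarget/hl7:patientRole/hl7:id", NS)
    patient_id = patient_id_el.attrib.get("extension") if patient_id_el is not None else None
    dim_patient = {"patient_id": patient_id}

    # Coded observations from CDA body entries -> fact rows
    facts = []
    for obs in root.findall(".//hl7:entry/hl7:observation", NS):
        code = obs.find("hl7:code", NS)
        value = obs.find("hl7:value", NS)
        facts.append({
            "patient_id": patient_id,  # foreign key to dim_patient
            "concept_code": code.attrib.get("code") if code is not None else None,
            "code_system": code.attrib.get("codeSystem") if code is not None else None,
            "value": value.attrib.get("value") if value is not None else None,
        })
    return {"dim_patient": dim_patient, "fact_observation": facts}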

2018 ◽  
pp. 1-12 ◽  
Author(s):  
Ashley Earles ◽  
Lin Liu ◽  
Ranier Bustamante ◽  
Pat Coke ◽  
Julie Lynch ◽  
...  

Purpose Cancer ascertainment using large-scale electronic health records is a challenge. Our aim was to propose and apply a structured approach for evaluating multiple candidate approaches for cancer ascertainment using colorectal cancer (CRC) ascertainment within the US Department of Veterans Affairs (VA) as a use case. Methods The proposed approach for evaluating cancer ascertainment strategies includes assessment of individual strategy performance, comparison of agreement across strategies, and review of discordant diagnoses. We applied this approach to compare three strategies for CRC ascertainment within the VA: administrative claims data consisting of International Classification of Diseases, Ninth Revision (ICD9) diagnosis codes; the VA Central Cancer Registry (VACCR); and the newly accessible Oncology Domain, consisting of cases abstracted by local cancer registrars. The study sample consisted of 1,839,043 veterans with index colonoscopy performed from 1999 to 2014. Strategy-specific performance was estimated based on manual record review of 100 candidate CRC cases and 100 colonoscopy controls. Strategies were further compared using Cohen’s κ and focused review of discordant CRC diagnoses. Results A total of 92,197 individuals met at least one CRC definition. All three strategies had high sensitivity and specificity for incident CRC. However, the ICD9-based strategy demonstrated poor positive predictive value (58%). VACCR and Oncology Domain had almost perfect agreement with each other (κ, 0.87) but only moderate agreement with ICD9-based diagnoses (κ, 0.51 and 0.57, respectively). Among discordant cases reviewed, 15% of ICD9-positive but VACCR- or Oncology Domain–negative cases had incident CRC. Conclusion Evaluating novel strategies for identifying cancer requires a structured approach, including validation against manual record review, agreement among candidate strategies, and focused review of discordant findings. Without careful assessment of ascertainment methods, analyses may be subject to bias and limited in clinical impact.
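The two comparisons at the core of this approach, strategy performance against manual record review and cross-strategy agreement, can be sketched as follows. This is an illustrative outline rather than the study code; the 0/1 case flags and the ppv helper are hypothetical, and Cohen's κ is computed with scikit-learn's cohen_kappa_score.

# Illustrative sketch only; not the study's analysis code.
from sklearn.metrics import cohen_kappa_score

def ppv(candidate_flags, chart_review_truth):
    """Positive predictive value among cases flagged by a strategy, given manual-review labels."""
    flagged = [truth for flag, truth in zip(candidate_flags, chart_review_truth) if flag == 1]
    return sum(flagged) / len(flagged) if flagged else float("nan")

# Hypothetical 0/1 CRC case definitions per patient for two strategies
icd9     = [1, 1, 0, 1, 0, 1, 0, 0]
registry = [1, 0, 0, 1, 0, 1, 0, 0]
truth    = [1, 0, 0, 1, 0, 1, 0, 0]   # manual record review

print("ICD9 PPV vs. chart review:", ppv(icd9, truth))
print("ICD9 vs. registry agreement (kappa):", cohen_kappa_score(icd9, registry))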


2020 ◽  
Vol 16 (3) ◽  
pp. 531-540 ◽  
Author(s):  
Thomas H. McCoy ◽  
Larry Han ◽  
Amelia M. Pellegrini ◽  
Rudolph E. Tanzi ◽  
Sabina Berretta ◽  
...  

JAMIA Open ◽  
2019 ◽  
Vol 2 (4) ◽  
pp. 570-579 ◽  
Author(s):  
Na Hong ◽  
Andrew Wen ◽  
Feichen Shen ◽  
Sunghwan Sohn ◽  
Chen Wang ◽  
...  

Abstract Objective To design, develop, and evaluate a scalable clinical data normalization pipeline for standardizing unstructured electronic health record (EHR) data leveraging the HL7 Fast Healthcare Interoperability Resources (FHIR) specification. Methods We established an FHIR-based clinical data normalization pipeline known as NLP2FHIR that mainly comprises: (1) a module for a core natural language processing (NLP) engine with an FHIR-based type system; (2) a module for integrating structured data; and (3) a module for content normalization. We evaluated the FHIR modeling capability focusing on core clinical resources such as Condition, Procedure, MedicationStatement (including Medication), and FamilyMemberHistory using Mayo Clinic’s unstructured EHR data. We constructed a gold standard by reusing annotation corpora from previous NLP projects. Results A total of 30 mapping rules, 62 normalization rules, and 11 NLP-specific FHIR extensions were created and implemented in the NLP2FHIR pipeline. The elements of each clinical resource that require integration of structured data were identified. The performance of unstructured data modeling achieved F scores ranging from 0.69 to 0.99 for various FHIR element representations (0.69–0.99 for Condition; 0.75–0.84 for Procedure; 0.71–0.99 for MedicationStatement; and 0.75–0.95 for FamilyMemberHistory). Conclusion We demonstrated that the NLP2FHIR pipeline is feasible for modeling unstructured EHR data and integrating structured elements into the model. The outcomes of this work provide standards-based tools for clinical data normalization that are indispensable for enabling portable EHR-driven phenotyping and large-scale data analytics, as well as useful insights for future development of the FHIR specification with regard to handling unstructured clinical data.
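As a rough illustration of the normalization target, the sketch below maps a single NLP-extracted problem mention onto a FHIR Condition resource expressed as plain JSON. It is not the NLP2FHIR implementation; the input mention fields and the SNOMED CT example coding are assumptions made for the example.

# Illustrative sketch; not the NLP2FHIR pipeline itself.
import json

def mention_to_condition(mention: dict, patient_id: str) -> dict:
    """Map an extracted mention {text, code, system, onset} to a FHIR Condition resource."""
    return {
        "resourceType": "Condition",
        "subject": {"reference": f"Patient/{patient_id}"},
        "code": {
            "coding": [{
                "system": mention["system"],   # e.g., SNOMED CT
                "code": mention["code"],
                "display": mention["text"],
            }],
            "text": mention["text"],
        },
        "onsetDateTime": mention.get("onset"),
    }

# Hypothetical extracted mention used only to exercise the mapping
example = {"text": "type 2 diabetes mellitus", "code": "44054006",
           "system": "http://snomed.info/sct", "onset": "2018-03-01"}
print(json.dumps(mention_to_condition(example, "123"), indent=2))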


2021 ◽  
Vol 4 (1) ◽  
Author(s):  
Rishi J. Desai ◽  
Michael E. Matheny ◽  
Kevin Johnson ◽  
Keith Marsolo ◽  
Lesley H. Curtis ◽  
...  

Abstract The Sentinel System is a major component of the United States Food and Drug Administration’s (FDA) approach to active medical product safety surveillance. While Sentinel has historically relied on large quantities of health insurance claims data, leveraging longitudinal electronic health records (EHRs) that contain more detailed clinical information, as structured and unstructured features, may address some of the current gaps in capabilities. We identify key challenges when using EHR data to investigate medical product safety in a scalable and accelerated way, outline potential solutions, and describe the Sentinel Innovation Center’s initiatives to put solutions into practice by expanding and strengthening the existing system with a query-ready, large-scale data infrastructure of linked EHR and claims data. We describe our initiatives in four strategic priority areas: (1) data infrastructure, (2) feature engineering, (3) causal inference, and (4) detection analytics, with the goal of incorporating emerging data science innovations to maximize the utility of EHR data for medical product safety surveillance.
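A minimal sketch of the patient-level linkage such an infrastructure depends on is shown below. The table layouts, column names, and join key are assumptions for illustration only, not Sentinel's actual schema.

# Illustrative sketch of linking EHR and claims tables at the patient level.
import pandas as pd

ehr = pd.DataFrame({
    "person_id": [1, 2, 3],
    "hba1c": [6.1, 8.4, 7.0],            # structured lab feature from the EHR
    "note_text": ["...", "...", "..."],  # unstructured feature (clinical notes)
})
claims = pd.DataFrame({
    "person_id": [1, 2, 4],
    "rx_exposure": [1, 0, 1],            # product exposure derived from claims
})

# Keep only patients present in both sources for combined analyses
linked = ehr.merge(claims, on="person_id", how="inner")
print(linked)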


Circulation ◽  
2017 ◽  
Vol 135 (suppl_1) ◽  
Author(s):  
Randi Foraker ◽  
Sejal Patel ◽  
Yosef Khan ◽  
Mary Ann Bauman ◽  
Julie Bower

Background: Electronic health records (EHRs) are an increasingly valuable data source for monitoring population health. However, EHR data are rarely shared across health system borders, limiting their utility to researchers and policymakers. The Guideline Advantage™ (TGA) program, a joint initiative by the American Heart Association (AHA), American Cancer Society, and American Diabetes Association, brings together data from EHRs across the country to support disease prevention and management efforts in the outpatient setting. Methods: We analyzed TGA EHR data from >70 clinics comprising 281,837 adult patients from 2010 to 2015. We used the first available measure per patient for each calendar year to characterize trends in the proportion of patients in “ideal”, “intermediate”, and “poor” cardiovascular health (CVH) categories for blood pressure (BP), body mass index (BMI), and smoking. Total cholesterol and fasting glucose values were not reported to TGA. Thus, we used low-density lipoprotein (LDL) and hemoglobin A1c (A1c) treatment guidelines to classify patients into CVH categories for the respective metrics. Results: Patients were an average of 50 years old, and 57.4% were female. Of records with complete data on race, 70.9% of patients were white. Over 6 years of observation, we documented increases in the proportion of patients at ideal levels for BP, smoking, LDL, and A1c, but decreases in the proportion of patients at an ideal level for BMI (Figure). Conclusions: TGA data provide a large-scale perspective on outpatient CVH, yet we acknowledge limitations associated with using EHR data to assess trends in CVH. Specifically, EHR data entry is clinically driven: BP and BMI values are likely to be updated at each visit for each patient, while smoking status, LDL, and A1c are not. Our analysis lays the groundwork for EHR analyses as these data become less siloed and more accessible to stakeholders. Figure: Trends in CVH from 2010 to 2015: The Guideline Advantage™
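The rule-based categorization described above can be sketched as follows. The cut points mirror AHA Life's Simple 7-style definitions and are assumptions for illustration; they are not necessarily the exact rules applied to the TGA data.

# Illustrative sketch; cut points are assumptions, not the TGA classification code.
def bp_category(sbp, dbp, on_treatment=False):
    """Classify blood pressure (mm Hg) into a CVH category."""
    if sbp >= 140 or dbp >= 90:
        return "poor"
    if sbp >= 120 or dbp >= 80 or on_treatment:
        return "intermediate"
    return "ideal"

def bmi_category(bmi):
    """Classify body mass index (kg/m^2) into a CVH category."""
    if bmi >= 30:
        return "poor"
    if bmi >= 25:
        return "intermediate"
    return "ideal"

def smoking_category(status):
    """Classify smoking status ('never', 'former', 'current'); simplified mapping."""
    return {"never": "ideal", "former": "intermediate", "current": "poor"}[status]

print(bp_category(118, 76), bmi_category(27.5), smoking_category("never"))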


2021 ◽  
pp. 103879
Author(s):  
Lin Liu ◽  
Ranier Bustamante ◽  
Ashley Earles ◽  
Joshua Demb ◽  
Karen Messer ◽  
...  
