Using UMLS for electronic health data standardization and database design

2020 ◽  
Vol 27 (10) ◽  
pp. 1520-1528 ◽  
Author(s):  
Andrew P Reimer ◽  
Alex Milinovich

Abstract Objective Patients who undergo medical transfer represent one patient population that remains infrequently studied due to challenges in aggregating data across the multiple domains and sources that are necessary to capture the entire episode of patient care. To facilitate access to and secondary use of transport patient data, we developed the Transport Data Repository, which combines data from 3 separate domains and many sources within our health system. Methods The repository is a relational database anchored by Unified Medical Language System unique concept identifiers to integrate, map, and standardize the data into a common data model. Primary data domains included sending and receiving hospital encounters, the medical transport record, and custom hospital transport log data. A 4-step mapping process was developed: 1) automatic source code match, 2) exact text match, 3) fuzzy matching, and 4) manual matching. Results 431 090 total mappings were generated in the Transport Data Repository, consisting of 69 010 unique concepts, with 77% of the data being mapped automatically. Transport source data yielded significantly lower mapping results, with only 8% of data entities automatically mapped and a significant amount (43%) remaining unmapped. Discussion The multistep mapping process resulted in a majority of data being automatically mapped. Poor matching of transport medical record data is due to the third-party vendor data being generated and stored in a nonstandardized format. Conclusion The multistep mapping process developed and implemented is necessary to normalize electronic health data from multiple domains and sources into a common data model to support secondary use of data.
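The 4-step cascade described in the Methods can be sketched as follows. This is an illustrative Python sketch only: the two index structures, the field names, and the fuzzy-match threshold are hypothetical stand-ins, not the repository's actual implementation.

```python
from difflib import SequenceMatcher

def map_concept(entity, source_code_index, text_index, fuzzy_threshold=0.9):
    """Cascade through the 4 mapping steps, returning (CUI, step_used).

    `source_code_index` maps source codes to UMLS concept identifiers;
    `text_index` maps normalized entity text to concept identifiers.
    Both indexes and the 0.9 threshold are illustrative assumptions.
    """
    # Step 1: automatic source code match
    if entity.get("code") in source_code_index:
        return source_code_index[entity["code"]], "source_code"
    # Step 2: exact text match on normalized text
    text = entity["text"].strip().lower()
    if text in text_index:
        return text_index[text], "exact_text"
    # Step 3: fuzzy matching against known entity texts
    best = max(text_index,
               key=lambda t: SequenceMatcher(None, text, t).ratio(),
               default=None)
    if best and SequenceMatcher(None, text, best).ratio() >= fuzzy_threshold:
        return text_index[best], "fuzzy"
    # Step 4: fall through to manual review
    return None, "manual"
```

Entities that fail all three automated steps accumulate in a manual-review queue, which matches the abstract's finding that unmapped transport source data required the most manual effort.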

2019 ◽  
Vol 10 (02) ◽  
pp. 307-315 ◽  
Author(s):  
Christoph Hornik ◽  
Andrew Atz ◽  
Catherine Bendel ◽  
Francis Chan ◽  
Kevin Downes ◽  
...  

Background Integration of electronic health records (EHRs) data across sites and access to that data remain limited. Objective We developed an EHR-based pediatric inpatient repository using data from nine U.S. centers in the National Institute of Child Health and Human Development Pediatric Trials Network. Methods A data model encompassing 147 mandatory and 99 optional elements was developed to provide an EHR data extract of all inpatient encounters from patients <17 years of age discharged between January 6, 2013 and June 30, 2017. Sites received instructions on extraction, transformation, testing, and transmission to the coordinating center. Results We generated 177 staging reports to process all nine sites' 147 mandatory and 99 optional data elements into the repository. Based on 520 prespecified criteria, all sites achieved 0% errors and <2% warnings. The repository includes 386,159 inpatient encounters from 264,709 children to support study design and conduct of future trials in children. Conclusion Our EHR-based data repository of pediatric inpatient encounters utilized a customized data model heavily influenced by the PCORnet format, site-based data mapping, a comprehensive set of data testing rules, and an iterative process of data submission. The common data model, site-based extraction, and technical expertise were key to our success. Data from this repository will be used in support of Pediatric Trials Network studies and the labeling of drugs and devices for children.
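The error/warning distinction in the 520 prespecified testing criteria can be illustrated with a minimal sketch. The field names and the specific rules below are hypothetical examples, assuming errors block submission while warnings are flagged but tolerated (<2%); the network's actual criteria are not reproduced here.

```python
def validate_encounter(row):
    """Classify data-quality findings for one encounter record into
    blocking errors and tolerated warnings (illustrative rules only)."""
    errors, warnings = [], []
    # Mandatory-element check (error): discharge date must be present
    if not row.get("discharge_date"):
        errors.append("missing discharge_date")
    # Range check (error): cohort is restricted to patients <17 years
    if row.get("age_years") is not None and row["age_years"] >= 17:
        errors.append("age out of range")
    # Optional-element check (warning): absence is flagged, not blocking
    if not row.get("discharge_weight_kg"):
        warnings.append("missing optional discharge_weight_kg")
    return errors, warnings
```

A site-level staging report would aggregate these per-row results; under the abstract's acceptance rule, a site passes when the error rate is 0% and the warning rate is below 2%.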


Author(s):  
Sijia Liu ◽  
Yanshan Wang ◽  
Andrew Wen ◽  
Liwei Wang ◽  
Na Hong ◽  
...  

BACKGROUND Widespread adoption of electronic health records has enabled the secondary use of electronic health record data for clinical research and health care delivery. Natural language processing techniques have shown promise in their capability to extract the information embedded in unstructured clinical data, and information retrieval techniques provide flexible and scalable solutions that can augment natural language processing systems for retrieving and ranking relevant records. OBJECTIVE In this paper, we present the implementation of a cohort retrieval system that can execute textual cohort selection queries on both structured data and unstructured text—Cohort Retrieval Enhanced by Analysis of Text from Electronic Health Records (CREATE). METHODS CREATE is a proof-of-concept system that leverages a combination of structured queries and information retrieval techniques on natural language processing results to improve cohort retrieval performance, using the Observational Medical Outcomes Partnership Common Data Model to enhance model portability. The natural language processing component was used to extract common data model concepts from textual queries. We designed a hierarchical index to support common data model concept search utilizing information retrieval techniques and frameworks. RESULTS Our case study on 5 cohort identification queries, evaluated using the precision at 5 information retrieval metric at both the patient level and document level, demonstrates that CREATE achieves a mean precision at 5 of 0.90, which outperforms systems using only structured data or only unstructured text, with mean precision at 5 values of 0.54 and 0.74, respectively. CONCLUSIONS Implementation and evaluation on Mayo Clinic Biobank data demonstrated that CREATE outperforms cohort retrieval systems that use only structured data or only unstructured text in complex textual cohort queries.
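The evaluation metric used here, precision at 5 (P@5), is the fraction of the top 5 retrieved patients (or documents) that are truly relevant to the cohort query. A minimal sketch of the standard computation:

```python
def precision_at_k(ranked_ids, relevant_ids, k=5):
    """Precision@k: of the top-k items returned by the retrieval system,
    what fraction appear in the gold-standard relevant set."""
    top = ranked_ids[:k]
    return sum(1 for item in top if item in relevant_ids) / k
```

Averaging this value over the 5 cohort identification queries yields the mean P@5 figures reported in the abstract (0.90 for CREATE versus 0.54 and 0.74 for the structured-only and text-only baselines).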


2021 ◽  
Author(s):  
Joon-Hyop Lee ◽  
Suhyun Kim ◽  
Kwangsoo Kim ◽  
Young Jun Chai ◽  
Hyeong Won Yu ◽  
...  

BACKGROUND Post-thyroidectomy hypoparathyroidism may result in various transient or permanent symptoms, ranging from a tingling sensation to severe breathing difficulties. Its incidence varies among surgeons and institutions, making it difficult to determine its actual incidence and associated factors. OBJECTIVE This study attempted to estimate the incidence of post-operative hypoparathyroidism in patients at two tertiary institutions that share the Observational Health Data Sciences and Informatics common data model. METHODS This study used the Common Data Model to extract explicitly specified encodings and relationships among concepts using standardized vocabularies. The EDI codes of various thyroid disorders and thyroid operations were extracted from two separate tertiary hospitals between January 2013 and December 2018. Patients were grouped into no-evidence-of/transient/permanent hypoparathyroidism groups to analyze the likelihood of hypoparathyroidism occurrence related to operation type and diagnosis. RESULTS Of the 4848 eligible patients at the two institutions who underwent thyroidectomy, 1370 (28.26%) experienced transient hypoparathyroidism and 251 (5.18%) experienced persistent hypoparathyroidism. Univariate logistic regression analysis predicted that, relative to total bilateral thyroidectomy, radical tumor resection was associated with a 48% greater likelihood of transient hypoparathyroidism and a 102% greater likelihood of persistent hypoparathyroidism. Moreover, multivariate logistic analysis found that radical tumor resection was associated with a 50% greater likelihood of transient hypoparathyroidism and a 97% greater likelihood of persistent hypoparathyroidism than total bilateral thyroidectomy. CONCLUSIONS These findings, obtained by integrating and analyzing two databases, suggest that this analysis could be expanded to include other large databases that share the same Observational Health Data Sciences and Informatics protocol.
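The "48% greater likelihood" phrasing comes from logistic regression: an odds ratio of 1.48 corresponds to 48% greater odds. A small helper shows the standard conversion from a regression coefficient to that percentage (illustrative only; the study's actual coefficients are not given in the abstract):

```python
import math

def coefficient_to_percent_greater_odds(beta):
    """Convert a logistic-regression coefficient to the 'percent greater
    odds' phrasing: OR = exp(beta), percent = (OR - 1) * 100."""
    odds_ratio = math.exp(beta)
    return (odds_ratio - 1) * 100
```

For example, a coefficient of ln(1.48) ≈ 0.392 corresponds to 48% greater odds, and ln(2.02) ≈ 0.703 corresponds to the 102% figure reported for persistent hypoparathyroidism.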


2018 ◽  
Vol 2 (11) ◽  
pp. 1172-1179 ◽  
Author(s):  
Ashima Singh ◽  
Javier Mora ◽  
Julie A. Panepinto

Key Points The algorithms have high sensitivity and specificity to identify patients with hemoglobin SS/Sβ0 thalassemia and acute care pain encounters. Codes conforming to common data model are provided to facilitate adoption of algorithms and standardize definitions for EHR-based research.
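The sensitivity and specificity cited for the case-finding algorithms follow the standard confusion-matrix definitions, sketched below (the counts in the usage note are invented for illustration, not the study's results):

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Standard definitions used to validate EHR case-finding algorithms:
    sensitivity = TP / (TP + FN), the fraction of true cases found;
    specificity = TN / (TN + FP), the fraction of non-cases excluded."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity
```

For instance, an algorithm that identifies 90 of 100 chart-confirmed hemoglobin SS/Sβ0 patients while wrongly flagging 5 of 100 non-cases would have sensitivity 0.90 and specificity 0.95.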


Author(s):  
Vlasios K. Dimitriadis ◽  
George I. Gavriilidis ◽  
Pantelis Natsiavas

Information technology (IT) and specialized systems could have a prominent role in supporting drug safety processes, both in the clinical context and beyond it. The PVClinical project aims to build an IT platform enabling the investigation of potential adverse drug reactions (ADRs). In this paper, we outline the utilization of the Observational Medical Outcomes Partnership – Common Data Model (OMOP-CDM) and the openly available Observational Health Data Sciences and Informatics (OHDSI) software stack as part of the PVClinical platform. OMOP-CDM offers the capacity to integrate data from Electronic Health Records (EHRs) (e.g., encounters, patients, providers, diagnoses, drugs, measurements and procedures) via an accepted data model. Furthermore, the OHDSI software stack provides valuable analytics tools which could be used to address important questions regarding drug safety quickly and efficiently, enabling the investigation of potential ADRs in the clinical environment.


Author(s):  
Seungho Jeon ◽  
Jeongeun Seo ◽  
Sukyoung Kim ◽  
Jeongmoon Lee ◽  
Jong-Ho Kim ◽  
...  

BACKGROUND De-identifying personal information is critical when using personal health data for secondary research. The Observational Medical Outcomes Partnership Common Data Model (CDM), defined by the nonprofit organization Observational Health Data Sciences and Informatics, has been gaining attention for its use in the analysis of patient-level clinical data obtained from various medical institutions. When analyzing such data in a public environment such as a cloud-computing system, an appropriate de-identification strategy is required to protect patient privacy. OBJECTIVE This study proposes and evaluates a de-identification strategy that comprises several rules along with privacy models such as k-anonymity, l-diversity, and t-closeness. The proposed strategy was evaluated using an actual CDM database. METHODS The CDM database used in this study was constructed by the Anam Hospital of Korea University. Analysis and evaluation were performed using the ARX anonymizing framework in combination with the k-anonymity, l-diversity, and t-closeness privacy models. RESULTS The CDM database, which was constructed according to the rules established by Observational Health Data Sciences and Informatics, exhibited a low risk of re-identification: the highest re-identifiable record rate (11.3%) in the dataset was exhibited by the DRUG_EXPOSURE table, with a re-identification success rate of 0.03%. However, because all tables include at least one "highest risk" value of 100%, suitable anonymizing techniques are required; moreover, the CDM database preserves the "source values" (raw data), a combination of which could increase the risk of re-identification. Therefore, this study proposes an enhanced strategy to de-identify the source values to significantly reduce not only the highest risk in the k-anonymity, l-diversity, and t-closeness privacy models but also the overall possibility of re-identification.
CONCLUSIONS Our proposed de-identification strategy effectively enhanced the privacy of the CDM database, thereby encouraging clinical research involving multiple centers.
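The k-anonymity model underlying this evaluation can be sketched in a few lines: a dataset is k-anonymous if every combination of quasi-identifier values is shared by at least k records, so the smallest equivalence class determines k. The column names below are illustrative, not actual CDM fields, and real tools such as ARX additionally perform generalization and suppression to raise k.

```python
from collections import Counter

def k_anonymity(records, quasi_identifiers):
    """Return the smallest equivalence-class size over the given
    quasi-identifier columns; the dataset is k-anonymous for any
    k up to and including this value."""
    classes = Counter(
        tuple(record[q] for q in quasi_identifiers) for record in records
    )
    return min(classes.values())
```

A record whose quasi-identifier combination is unique (class size 1) is the kind of "highest risk" value of 100% flagged in the Results: an attacker who knows those attributes can re-identify that record outright.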


Epilepsia ◽  
2020 ◽  
Vol 61 (4) ◽  
pp. 610-616 ◽  
Author(s):  
Sun Ah Choi ◽  
Hunmin Kim ◽  
Seok Kim ◽  
Sooyoung Yoo ◽  
Soyoung Yi ◽  
...  

2021 ◽  
Author(s):  
Juan C. Quiroz ◽  
Tim Chard ◽  
Zhisheng Sa ◽  
Angus Ritchie ◽  
Louisa Jorm ◽  
...  

Abstract Objective Develop an extract, transform, load (ETL) framework for the conversion of health databases to the Observational Medical Outcomes Partnership Common Data Model (OMOP CDM) that supports transparency of the mapping process, readability, refactoring, and maintainability. Materials and Methods We propose an ETL framework that is metadata-driven and generic across source datasets. The ETL framework reads mapping logic for OMOP tables from YAML files, which organize SQL snippets in key-value pairs that define the extract and transform logic to populate OMOP columns. Results We developed a data manipulation language (DML) for writing the mapping logic from health datasets to OMOP, which defines mapping operations on a column-by-column basis. A core ETL pipeline converts the DML in YAML files and generates an ETL script. We provide access to our ETL framework via a web application, allowing users to upload and edit YAML files and obtain an ETL SQL script that can be used in development environments. Discussion The structure of the DML and the mapping operations defined in column-by-column operations maximizes readability, refactoring, and maintainability, while minimizing technical debt, and standardizes the writing of ETL operations for mapping to OMOP. Our web application allows institutions and teams to reuse the ETL pipeline by writing their own rules using our DML. Conclusion The research community needs tools that reduce the cost and time effort needed to map datasets to OMOP. These tools must support transparency of the mapping process for mapping efforts to be reused by different institutions.
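The key-value idea behind the DML can be sketched as follows. Here a plain dictionary stands in for one parsed YAML file, pairing each OMOP column with the SQL snippet that populates it; the table names, column names, and rendering function are hypothetical illustrations, not the framework's actual DML syntax.

```python
def render_insert(omop_table, column_map, source_table):
    """Render one OMOP table's mapping (column -> SQL snippet pairs,
    as a YAML file would provide after parsing) into an INSERT...SELECT
    statement, one expression per OMOP column."""
    columns = ", ".join(column_map)
    expressions = ",\n  ".join(column_map.values())
    return (
        f"INSERT INTO {omop_table} ({columns})\n"
        f"SELECT\n  {expressions}\n"
        f"FROM {source_table};"
    )
```

Because each OMOP column is defined by its own snippet, a reviewer can audit or refactor one column's transform without touching the rest, which is the readability and maintainability benefit the Discussion claims for column-by-column operations.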

