Using UMLS for electronic health data standardization and database design

2020 ◽  
Vol 27 (10) ◽  
pp. 1520-1528 ◽  
Author(s):  
Andrew P Reimer ◽  
Alex Milinovich

Abstract Objective Patients who undergo medical transfer represent one patient population that remains infrequently studied due to challenges in aggregating data across the multiple domains and sources that are necessary to capture the entire episode of patient care. To facilitate access to and secondary use of transport patient data, we developed the Transport Data Repository, which combines data from 3 separate domains and many sources within our health system. Methods The repository is a relational database anchored by Unified Medical Language System unique concept identifiers to integrate, map, and standardize the data into a common data model. Primary data domains included sending and receiving hospital encounters, the medical transport record, and custom hospital transport log data. A 4-step mapping process was developed: 1) automatic source code match, 2) exact text match, 3) fuzzy matching, and 4) manual matching. Results 431 090 total mappings were generated in the Transport Data Repository, consisting of 69 010 unique concepts, with 77% of the data being mapped automatically. Transport source data yielded significantly lower mapping results, with only 8% of data entities automatically mapped and a significant amount (43%) remaining unmapped. Discussion The multistep mapping process resulted in a majority of data being automatically mapped. Poor matching of transport medical record data is due to the third-party vendor data being generated and stored in a nonstandardized format. Conclusion The multistep mapping process developed and implemented is necessary to normalize electronic health data from multiple domains and sources into a common data model to support secondary use of data.
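The 4-step cascade described in the Methods can be sketched as follows. This is an illustrative Python sketch only: the two index structures, the field names, and the fuzzy-match threshold are hypothetical stand-ins, not the repository's actual implementation.

```python
from difflib import SequenceMatcher

def map_concept(entity, source_code_index, text_index, fuzzy_threshold=0.9):
    """Cascade through the 4 mapping steps, returning (CUI, step_used).

    `source_code_index` maps source codes to UMLS concept identifiers;
    `text_index` maps normalized entity text to concept identifiers.
    Both indexes and the 0.9 threshold are illustrative assumptions.
    """
    # Step 1: automatic source code match
    if entity.get("code") in source_code_index:
        return source_code_index[entity["code"]], "source_code"
    # Step 2: exact text match on normalized text
    text = entity["text"].strip().lower()
    if text in text_index:
        return text_index[text], "exact_text"
    # Step 3: fuzzy matching against known entity texts
    best = max(text_index,
               key=lambda t: SequenceMatcher(None, text, t).ratio(),
               default=None)
    if best and SequenceMatcher(None, text, best).ratio() >= fuzzy_threshold:
        return text_index[best], "fuzzy"
    # Step 4: fall through to manual review
    return None, "manual"
```

Entities that fail all three automated steps accumulate in a manual-review queue, which matches the abstract's finding that unmapped transport source data required the most manual effort.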

2019 ◽  
Vol 10 (02) ◽  
pp. 307-315 ◽  
Author(s):  
Christoph Hornik ◽  
Andrew Atz ◽  
Catherine Bendel ◽  
Francis Chan ◽  
Kevin Downes ◽  
...  

Background Integration of electronic health records (EHRs) data across sites and access to that data remain limited. Objective We developed an EHR-based pediatric inpatient repository using data from nine U.S. centers in the National Institute of Child Health and Human Development Pediatric Trials Network. Methods A data model encompassing 147 mandatory and 99 optional elements was developed to provide an EHR data extract of all inpatient encounters from patients <17 years of age discharged between January 6, 2013 and June 30, 2017. Sites received instructions on extraction, transformation, testing, and transmission to the coordinating center. Results We generated 177 staging reports to process all nine sites' 147 mandatory and 99 optional data elements into the repository. Based on 520 prespecified criteria, all sites achieved 0% errors and <2% warnings. The repository includes 386,159 inpatient encounters from 264,709 children to support study design and conduct of future trials in children. Conclusion Our EHR-based data repository of pediatric inpatient encounters utilized a customized data model heavily influenced by the PCORnet format, site-based data mapping, a comprehensive set of data testing rules, and an iterative process of data submission. The common data model, site-based extraction, and technical expertise were key to our success. Data from this repository will be used in support of Pediatric Trials Network studies and the labeling of drugs and devices for children.
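The error/warning distinction in the 520 prespecified testing criteria can be illustrated with a minimal sketch. The field names and the specific rules below are hypothetical examples, assuming errors block submission while warnings are flagged but tolerated (<2%); the network's actual criteria are not reproduced here.

```python
def validate_encounter(row):
    """Classify data-quality findings for one encounter record into
    blocking errors and tolerated warnings (illustrative rules only)."""
    errors, warnings = [], []
    # Mandatory-element check (error): discharge date must be present
    if not row.get("discharge_date"):
        errors.append("missing discharge_date")
    # Range check (error): cohort is restricted to patients <17 years
    if row.get("age_years") is not None and row["age_years"] >= 17:
        errors.append("age out of range")
    # Optional-element check (warning): absence is flagged, not blocking
    if not row.get("discharge_weight_kg"):
        warnings.append("missing optional discharge_weight_kg")
    return errors, warnings
```

A site-level staging report would aggregate these per-row results; under the abstract's acceptance rule, a site passes when the error rate is 0% and the warning rate is below 2%.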


Author(s):  
Sijia Liu ◽  
Yanshan Wang ◽  
Andrew Wen ◽  
Liwei Wang ◽  
Na Hong ◽  
...  

BACKGROUND Widespread adoption of electronic health records has enabled the secondary use of electronic health record data for clinical research and health care delivery. Natural language processing techniques have shown promise in their capability to extract the information embedded in unstructured clinical data, and information retrieval techniques provide flexible and scalable solutions that can augment natural language processing systems for retrieving and ranking relevant records. OBJECTIVE In this paper, we present the implementation of a cohort retrieval system that can execute textual cohort selection queries on both structured data and unstructured text—Cohort Retrieval Enhanced by Analysis of Text from Electronic Health Records (CREATE). METHODS CREATE is a proof-of-concept system that leverages a combination of structured queries and information retrieval techniques on natural language processing results to improve cohort retrieval performance, using the Observational Medical Outcomes Partnership Common Data Model to enhance model portability. The natural language processing component was used to extract common data model concepts from textual queries. We designed a hierarchical index to support common data model concept search utilizing information retrieval techniques and frameworks. RESULTS Our case study on 5 cohort identification queries, evaluated using the precision at 5 information retrieval metric at both the patient level and document level, demonstrates that CREATE achieves a mean precision at 5 of 0.90, which outperforms systems using only structured data or only unstructured text, with mean precision at 5 values of 0.54 and 0.74, respectively. CONCLUSIONS Implementation and evaluation on Mayo Clinic Biobank data demonstrated that CREATE outperforms cohort retrieval systems that use only structured data or only unstructured text in complex textual cohort queries.
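The evaluation metric used here, precision at 5 (P@5), is the fraction of the top 5 retrieved patients (or documents) that are truly relevant to the cohort query. A minimal sketch of the standard computation:

```python
def precision_at_k(ranked_ids, relevant_ids, k=5):
    """Precision@k: of the top-k items returned by the retrieval system,
    what fraction appear in the gold-standard relevant set."""
    top = ranked_ids[:k]
    return sum(1 for item in top if item in relevant_ids) / k
```

Averaging this value over the 5 cohort identification queries yields the mean P@5 figures reported in the abstract (0.90 for CREATE versus 0.54 and 0.74 for the structured-only and text-only baselines).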


2021 ◽  
Author(s):  
Joon-Hyop Lee ◽  
Suhyun Kim ◽  
Kwangsoo Kim ◽  
Young Jun Chai ◽  
Hyeong Won Yu ◽  
...  

BACKGROUND Post-thyroidectomy hypoparathyroidism may result in various transient or permanent symptoms, ranging from a tingling sensation to severe breathing difficulties. Its incidence varies among surgeons and institutions, making it difficult to determine its actual incidence and associated factors. OBJECTIVE This study attempted to estimate the incidence of post-operative hypoparathyroidism in patients at two tertiary institutions that share the Observational Health Data Sciences and Informatics common data model. METHODS This study used the Common Data Model to extract explicitly specified encodings and relationships among concepts using standardized vocabularies. The EDI codes of various thyroid disorders and thyroid operations were extracted from two separate tertiary hospitals between January 2013 and December 2018. Patients were grouped into no-evidence-of/transient/permanent hypoparathyroidism groups to analyze the likelihood of hypoparathyroidism occurrence related to operation type and diagnosis. RESULTS Of the 4848 eligible patients at the two institutions who underwent thyroidectomy, 1370 (28.26%) experienced transient hypoparathyroidism and 251 (5.18%) experienced persistent hypoparathyroidism. Univariate logistic regression analysis predicted that, relative to total bilateral thyroidectomy, radical tumor resection was associated with a 48% greater likelihood of transient hypoparathyroidism and a 102% greater likelihood of persistent hypoparathyroidism. Moreover, multivariate logistic analysis found that radical tumor resection was associated with a 50% greater likelihood of transient hypoparathyroidism and a 97% greater likelihood of persistent hypoparathyroidism than total bilateral thyroidectomy. CONCLUSIONS These findings, obtained by integrating and analyzing two databases, suggest that this analysis could be expanded to include other large databases that share the same Observational Health Data Sciences and Informatics protocol.
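The "48% greater likelihood" phrasing comes from logistic regression: an odds ratio of 1.48 corresponds to 48% greater odds. A small helper shows the standard conversion from a regression coefficient to that percentage (illustrative only; the study's actual coefficients are not given in the abstract):

```python
import math

def coefficient_to_percent_greater_odds(beta):
    """Convert a logistic-regression coefficient to the 'percent greater
    odds' phrasing: OR = exp(beta), percent = (OR - 1) * 100."""
    odds_ratio = math.exp(beta)
    return (odds_ratio - 1) * 100
```

For example, a coefficient of ln(1.48) ≈ 0.392 corresponds to 48% greater odds, and ln(2.02) ≈ 0.703 corresponds to the 102% figure reported for persistent hypoparathyroidism.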


2018 ◽  
Vol 2 (11) ◽  
pp. 1172-1179 ◽  
Author(s):  
Ashima Singh ◽  
Javier Mora ◽  
Julie A. Panepinto

Key Points The algorithms have high sensitivity and specificity to identify patients with hemoglobin SS/Sβ0 thalassemia and acute care pain encounters. Codes conforming to common data model are provided to facilitate adoption of algorithms and standardize definitions for EHR-based research.
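The sensitivity and specificity cited for the case-finding algorithms follow the standard confusion-matrix definitions, sketched below (the counts in the usage note are invented for illustration, not the study's results):

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Standard definitions used to validate EHR case-finding algorithms:
    sensitivity = TP / (TP + FN), the fraction of true cases found;
    specificity = TN / (TN + FP), the fraction of non-cases excluded."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity
```

For instance, an algorithm that identifies 90 of 100 chart-confirmed hemoglobin SS/Sβ0 patients while wrongly flagging 5 of 100 non-cases would have sensitivity 0.90 and specificity 0.95.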


Author(s):  
Vlasios K. Dimitriadis ◽  
George I. Gavriilidis ◽  
Pantelis Natsiavas

Information technology (IT) and specialized systems could have a prominent role in supporting drug safety processes, both in the clinical context and beyond it. The PVClinical project aims to build an IT platform enabling the investigation of potential adverse drug reactions (ADRs). In this paper, we outline the utilization of the Observational Medical Outcomes Partnership – Common Data Model (OMOP-CDM) and the openly available Observational Health Data Sciences and Informatics (OHDSI) software stack as part of the PVClinical platform. OMOP-CDM offers the capacity to integrate data from Electronic Health Records (EHRs) (e.g., encounters, patients, providers, diagnoses, drugs, measurements and procedures) via an accepted data model. Furthermore, the OHDSI software stack provides valuable analytics tools which could be used to address important questions regarding drug safety quickly and efficiently, enabling the investigation of potential ADRs in the clinical environment.


Author(s):  
Seungho Jeon ◽  
Jeongeun Seo ◽  
Sukyoung Kim ◽  
Jeongmoon Lee ◽  
Jong-Ho Kim ◽  
...  

BACKGROUND De-identifying personal information is critical when using personal health data for secondary research. The Observational Medical Outcomes Partnership Common Data Model (CDM), defined by the nonprofit organization Observational Health Data Sciences and Informatics, has been gaining attention for its use in the analysis of patient-level clinical data obtained from various medical institutions. When analyzing such data in a public environment such as a cloud-computing system, an appropriate de-identification strategy is required to protect patient privacy. OBJECTIVE This study proposes and evaluates a de-identification strategy that comprises several rules along with privacy models such as k-anonymity, l-diversity, and t-closeness. The proposed strategy was evaluated using an actual CDM database. METHODS The CDM database used in this study was constructed by the Anam Hospital of Korea University. Analysis and evaluation were performed using the ARX anonymizing framework in combination with the k-anonymity, l-diversity, and t-closeness privacy models. RESULTS The CDM database, which was constructed according to the rules established by Observational Health Data Sciences and Informatics, exhibited a low risk of re-identification: the highest re-identifiable record rate (11.3%) in the dataset was exhibited by the DRUG_EXPOSURE table, with a re-identification success rate of 0.03%. However, because all tables include at least one "highest risk" value of 100%, suitable anonymizing techniques are required; moreover, the CDM database preserves the "source values" (raw data), a combination of which could increase the risk of re-identification. Therefore, this study proposes an enhanced strategy to de-identify the source values to significantly reduce not only the highest risk in the k-anonymity, l-diversity, and t-closeness privacy models but also the overall possibility of re-identification.
CONCLUSIONS Our proposed de-identification strategy effectively enhanced the privacy of the CDM database, thereby encouraging clinical research involving multiple centers.
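The k-anonymity model underlying this evaluation can be sketched in a few lines: a dataset is k-anonymous if every combination of quasi-identifier values is shared by at least k records, so the smallest equivalence class determines k. The column names below are illustrative, not actual CDM fields, and real tools such as ARX additionally perform generalization and suppression to raise k.

```python
from collections import Counter

def k_anonymity(records, quasi_identifiers):
    """Return the smallest equivalence-class size over the given
    quasi-identifier columns; the dataset is k-anonymous for any
    k up to and including this value."""
    classes = Counter(
        tuple(record[q] for q in quasi_identifiers) for record in records
    )
    return min(classes.values())
```

A record whose quasi-identifier combination is unique (class size 1) is the kind of "highest risk" value of 100% flagged in the Results: an attacker who knows those attributes can re-identify that record outright.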


Epilepsia ◽  
2020 ◽  
Vol 61 (4) ◽  
pp. 610-616 ◽  
Author(s):  
Sun Ah Choi ◽  
Hunmin Kim ◽  
Seok Kim ◽  
Sooyoung Yoo ◽  
Soyoung Yi ◽  
...  

2021 ◽  
Author(s):  
Juan C. Quiroz ◽  
Tim Chard ◽  
Zhisheng Sa ◽  
Angus Ritchie ◽  
Louisa Jorm ◽  
...  

Abstract Objective Develop an extract, transform, load (ETL) framework for the conversion of health databases to the Observational Medical Outcomes Partnership Common Data Model (OMOP CDM) that supports transparency of the mapping process, readability, refactoring, and maintainability. Materials and Methods We propose an ETL framework that is metadata-driven and generic across source datasets. The ETL framework reads mapping logic for OMOP tables from YAML files, which organize SQL snippets in key-value pairs that define the extract and transform logic to populate OMOP columns. Results We developed a data manipulation language (DML) for writing the mapping logic from health datasets to OMOP, which defines mapping operations on a column-by-column basis. A core ETL pipeline converts the DML in YAML files and generates an ETL script. We provide access to our ETL framework via a web application, allowing users to upload and edit YAML files and obtain an ETL SQL script that can be used in development environments. Discussion The structure of the DML and the mapping operations defined in column-by-column operations maximizes readability, refactoring, and maintainability, while minimizing technical debt, and standardizes the writing of ETL operations for mapping to OMOP. Our web application allows institutions and teams to reuse the ETL pipeline by writing their own rules using our DML. Conclusion The research community needs tools that reduce the cost and time effort needed to map datasets to OMOP. These tools must support transparency of the mapping process for mapping efforts to be reused by different institutions.
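The key-value idea behind the DML can be sketched as follows. Here a plain dictionary stands in for one parsed YAML file, pairing each OMOP column with the SQL snippet that populates it; the table names, column names, and rendering function are hypothetical illustrations, not the framework's actual DML syntax.

```python
def render_insert(omop_table, column_map, source_table):
    """Render one OMOP table's mapping (column -> SQL snippet pairs,
    as a YAML file would provide after parsing) into an INSERT...SELECT
    statement, one expression per OMOP column."""
    columns = ", ".join(column_map)
    expressions = ",\n  ".join(column_map.values())
    return (
        f"INSERT INTO {omop_table} ({columns})\n"
        f"SELECT\n  {expressions}\n"
        f"FROM {source_table};"
    )
```

Because each OMOP column is defined by its own snippet, a reviewer can audit or refactor one column's transform without touching the rest, which is the readability and maintainability benefit the Discussion claims for column-by-column operations.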

