MIMIC in the OMOP Common Data Model

Author(s):  
Nicolas Paris ◽  
Adrien Parrot

Objectives: In the era of big data, the intensive care unit (ICU) is very likely to benefit from real-time computer analysis and modeling based on close patient monitoring and Electronic Health Record data. MIMIC is the first open-access database in the ICU domain. Many studies have shown that common data models (CDMs) improve database searching by allowing code, tools and experience to be shared. OMOP-CDM is spreading all over the world. The objective was to evaluate the difficulty of transforming MIMIC into an OMOP (MIMIC-OMOP) database and the benefits of this transformation for analysts. Material & Methods: A documented, tested, versioned, exemplified and open repository has been set up to support the transformation and the improvement of the MIMIC community's source code. The resulting data set was evaluated over a 48-hour datathon. Results: With an investment of 2 people for 500 hours, 64% of the data items of the 26 MIMIC tables were standardized into the OMOP CDM and 78% of the source concepts were mapped to reference terminologies. The model proved its ability to support community contributions and was well received during the datathon, with 160 participants and 15,000 requests executed with a maximum duration of one minute. Conclusion: The resulting MIMIC-OMOP dataset is the first MIMIC-OMOP dataset available free of charge, with real de-identified data ready for replicable intensive care research. This approach can be generalized to any medical field.

2021 ◽  
Author(s):  
Nicolas Paris ◽  
Antoine Lamer ◽  
Adrien Parrot

BACKGROUND In the era of big data, the intensive care unit (ICU) is very likely to benefit from real-time computer analysis and modeling based on close patient monitoring and Electronic Health Record data. MIMIC is the first open-access database in the ICU domain. Many studies have shown that common data models (CDMs) improve database searching by allowing code, tools and experience to be shared. OMOP-CDM is spreading all over the world. OBJECTIVE The objective was to transform MIMIC into an OMOP database and to evaluate the benefits of this transformation for analysts. METHODS We transformed MIMIC (version 1.4.21) into the OMOP format (5.3.3.1) through a semantic and structural mapping. The structural mapping aimed at moving the MIMIC data into the right place in OMOP, with some data transformations; it was carried out in three phases: conception, implementation and evaluation. The conceptual mapping aimed at aligning the MIMIC local terminologies with OMOP's standard ones; it consisted of three phases: integration, alignment and evaluation. A documented, tested, versioned, exemplified and open repository has been set up to support the transformation and the improvement of the MIMIC community's source code. The resulting data set was evaluated over a 48-hour datathon. RESULTS With an investment of 2 people for 500 hours, 64% of the data items of the 26 MIMIC tables were standardized into the OMOP CDM and 78% of the source concepts were mapped to reference terminologies. The model proved its ability to support community contributions and was well received during the datathon, with 160 participants and 15,000 requests executed with a maximum duration of one minute. CONCLUSIONS The resulting MIMIC-OMOP dataset is the first MIMIC-OMOP dataset available free of charge, with real de-identified data ready for replicable intensive care research. This approach can be generalized to any medical field.
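To make the structural-mapping step concrete, the sketch below shows in Python how rows from a MIMIC-style patients table could be reshaped into the OMOP person table. It is an illustration only, not the published ETL; the column names follow the public MIMIC-III and OMOP CDM v5.3 documentation, and the sample rows are synthetic.

```python
# Illustrative sketch of one structural-mapping step (MIMIC patients -> OMOP person).
# Not the authors' ETL; table/column names follow public MIMIC-III / OMOP CDM v5.3 docs.
import pandas as pd

# A toy extract standing in for the MIMIC-III `patients` table.
mimic_patients = pd.DataFrame({
    "subject_id": [10006, 10011],
    "gender": ["F", "M"],
    "dob": pd.to_datetime(["2094-03-05", "2090-06-05"]),
})

# Standard OMOP gender concepts (8532 = FEMALE, 8507 = MALE).
GENDER_CONCEPTS = {"F": 8532, "M": 8507}

def patients_to_person(patients: pd.DataFrame) -> pd.DataFrame:
    """Map the MIMIC patients extract onto the OMOP `person` table layout."""
    return pd.DataFrame({
        "person_id": patients["subject_id"],          # reuse the source key as surrogate id
        "gender_concept_id": patients["gender"].map(GENDER_CONCEPTS).fillna(0).astype(int),
        "year_of_birth": patients["dob"].dt.year,
        "month_of_birth": patients["dob"].dt.month,
        "day_of_birth": patients["dob"].dt.day,
        "gender_source_value": patients["gender"],    # keep the source code for provenance
    })

print(patients_to_person(mimic_patients))
```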


Author(s):  
Eugenia Rinaldi ◽  
Sylvia Thun

HiGHmed is a German consortium in which eight university hospitals have agreed to cross-institutional data exchange through novel medical informatics solutions. The HiGHmed Use Case Infection Control group has modelled a set of infection-related data in the openEHR format. In order to establish interoperability with the other German consortia belonging to the same national initiative, we mapped the openEHR information to the Fast Healthcare Interoperability Resources (FHIR) format recommended within the initiative. FHIR enables fast exchange of data thanks to the discrete and independent data elements into which information is organized. Furthermore, to explore the possibility of maximizing the analysis capabilities for our data set, we subsequently mapped the FHIR elements to the Observational Medical Outcomes Partnership Common Data Model (OMOP CDM). The OMOP data model is designed to support the conduct of research to identify and evaluate associations between interventions and the outcomes caused by these interventions. Mapping across standards makes it possible to exploit their respective strengths while establishing and/or maintaining interoperability. This article provides an overview of our experience in mapping infection control related data across three different standards: openEHR, FHIR and OMOP CDM.
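As an illustration of the FHIR-to-OMOP step described above (not the consortium's actual mapping code), the sketch below flattens a minimal FHIR R4 Observation into an OMOP measurement row; the LOINC-to-concept lookup is a hypothetical stand-in for a query against the OMOP vocabulary tables.

```python
# Illustrative sketch: mapping one FHIR R4 Observation to an OMOP CDM `measurement` row.
# The LOINC-to-OMOP concept lookup is a stand-in for a real OMOP vocabulary query.
from datetime import datetime

fhir_observation = {                       # a minimal FHIR Observation resource
    "resourceType": "Observation",
    "status": "final",
    "code": {"coding": [{"system": "http://loinc.org", "code": "6690-2",
                         "display": "Leukocytes [#/volume] in Blood"}]},
    "subject": {"reference": "Patient/123"},
    "effectiveDateTime": "2021-03-04T08:30:00",
    "valueQuantity": {"value": 11.2, "unit": "10*3/uL"},
}

# Hypothetical lookup; in practice this would come from the OMOP CONCEPT table.
LOINC_TO_OMOP = {"6690-2": 3000905}

def observation_to_measurement(obs: dict) -> dict:
    """Flatten a FHIR Observation into the columns of the OMOP measurement table."""
    loinc = obs["code"]["coding"][0]["code"]
    return {
        "person_id": int(obs["subject"]["reference"].split("/")[1]),
        "measurement_concept_id": LOINC_TO_OMOP.get(loinc, 0),
        "measurement_datetime": datetime.fromisoformat(obs["effectiveDateTime"]),
        "value_as_number": obs["valueQuantity"]["value"],
        "unit_source_value": obs["valueQuantity"]["unit"],
        "measurement_source_value": loinc,
    }

print(observation_to_measurement(fhir_observation))
```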


2015 ◽  
Vol 06 (03) ◽  
pp. 536-547 ◽  
Author(s):  
F.S. Resnic ◽  
S.L. Robbins ◽  
J. Denton ◽  
L. Nookala ◽  
D. Meeker ◽  
...  

Summary Background: Adoption of a common data model across health systems is a key infrastructure requirement to allow large-scale distributed comparative effectiveness analyses. There are a growing number of common data models (CDM), such as the Mini-Sentinel and the Observational Medical Outcomes Partnership (OMOP) CDMs. Objective: In this case study, we describe the challenges and opportunities of a study-specific use of the OMOP CDM by two health systems and describe three comparative effectiveness use cases developed from the CDM. Methods: The project transformed two health system databases (using crosswalks provided) into the OMOP CDM. Cohorts were developed from the transformed CDMs for three comparative effectiveness use case examples. Administrative/billing, demographic, order history, medication, and laboratory data were included in the CDM transformation and cohort development rules. Results: Record counts per person-month are presented for the eligible cohorts, highlighting differences between the civilian and federal datasets, e.g. the federal data set had more outpatient visits per person-month (6.44 vs. 2.05). The count of medications per person-month reflected the fact that one system's medications were extracted from orders while the other system had pharmacy fills and medication administration records. The federal system also had a higher prevalence of the conditions in all three use cases. Both systems required manual coding of some types of data to convert to the CDM. Conclusion: The data transformation to the CDM was time consuming and the resources required were substantial, beyond the requirements for collecting native source data. The need to manually code subsets of data limited the conversion. However, once the native data were converted to the CDM, both systems were able to use the same queries to identify cohorts. Thus, the CDM minimized the effort to develop cohorts and analyze the results across the sites. FitzHenry F, Resnic FS, Robbins SL, Denton J, Nookala L, Meeker D, Ohno-Machado L, Matheny ME. A Case Report on Creating a Common Data Model for Comparative Effectiveness with the Observational Medical Outcomes Partnership. Appl Clin Inform 2015; 6: 536–547. http://dx.doi.org/10.4338/ACI-2014-12-CR-0121
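The outpatient-visit rate reported above can be expressed as a simple query against an OMOP CDM instance. The SQL below is a hedged illustration, not the query used in the case report; table and column names follow the public CDM v5.x specification, and the person-month denominator is an assumption.

```python
# Illustrative sketch: outpatient visits per person-month from an OMOP CDM v5.x instance.
# Table/column names follow the public CDM spec; the rate definition is an assumption,
# not the exact denominator used in the case report.
VISITS_PER_PERSON_MONTH_SQL = """
WITH months AS (
    SELECT person_id,
           GREATEST(1, (observation_period_end_date - observation_period_start_date) / 30.0)
               AS person_months
    FROM observation_period
),
outpatient AS (
    SELECT person_id, COUNT(*) AS n_visits
    FROM visit_occurrence
    WHERE visit_concept_id = 9202          -- 9202 = Outpatient Visit (standard concept)
    GROUP BY person_id
)
SELECT SUM(o.n_visits) / SUM(m.person_months) AS visits_per_person_month
FROM months m
LEFT JOIN outpatient o USING (person_id);
"""

# Example of running the query with a PostgreSQL driver (connection details are placeholders):
# import psycopg2
# with psycopg2.connect("dbname=omop") as conn, conn.cursor() as cur:
#     cur.execute(VISITS_PER_PERSON_MONTH_SQL)
#     print(cur.fetchone())
```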


2021 ◽  
pp. 256-265
Author(s):  
Julien Guérin ◽  
Yec'han Laizet ◽  
Vincent Le Texier ◽  
Laetitia Chanas ◽  
Bastien Rance ◽  
...  

PURPOSE Many institutions throughout the world have launched precision medicine initiatives in oncology, and a large amount of clinical and genomic data is being produced. Although there have been attempts at sharing data with the community, initiatives are still limited. In this context, a French task force composed of Integrated Cancer Research Sites (SIRICs), comprehensive cancer centers from the Unicancer network (one of Europe's largest cancer research organizations), and university hospitals launched an initiative to improve and accelerate retrospective and prospective clinical and genomic data sharing in oncology. MATERIALS AND METHODS For 5 years, the OSIRIS group has worked on structuring data and identifying technical solutions for collecting and sharing them. The group used a multidisciplinary approach that included weekly scientific and technical meetings over several months to foster a national consensus on a minimal data set. RESULTS The resulting OSIRIS set and event-based data model, which is able to capture the disease course, was built with 67 clinical and 65 omics items. The group made it compatible with the HL7 Fast Healthcare Interoperability Resources (FHIR) format to maximize interoperability. The OSIRIS set was reviewed, approved by a National Plan Strategic Committee, and freely released to the community. A proof-of-concept study was carried out to put the OSIRIS set and Common Data Model into practice using a cohort of 300 patients. CONCLUSION Using a national, bottom-up approach, the OSIRIS group has defined a model including a minimal set of clinical and genomic data items that can be used to accelerate the sharing of data produced in oncology. The model relies on clear and formally defined terminologies and, as such, may also benefit the larger international community.
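As a rough illustration of how an event-based item from such a minimal data set could travel in a FHIR-compatible form, the sketch below wraps two fictitious events of a disease course in minimal Observation-style resources; the codes and the coding-system URL are hypothetical placeholders, not part of the published OSIRIS set.

```python
# Illustrative sketch (not the published OSIRIS model): event-based records expressed
# as FHIR-style Observation resources so that clinical and omics items share one
# exchange format. Resource content and codes are hypothetical placeholders.
import json

def osiris_like_event(patient_id: str, code: str, display: str, value: str, date: str) -> dict:
    """Wrap one event of a disease course as a minimal FHIR Observation resource."""
    return {
        "resourceType": "Observation",
        "status": "final",
        "subject": {"reference": f"Patient/{patient_id}"},
        "effectiveDateTime": date,
        "code": {"coding": [{"system": "http://example.org/osiris", "code": code,
                             "display": display}]},
        "valueString": value,
    }

# Two events from one fictitious disease course: a clinical item and an omics item.
events = [
    osiris_like_event("p01", "TNM_STAGE", "TNM stage at diagnosis", "T2N0M0", "2020-01-15"),
    osiris_like_event("p01", "KRAS_MUTATION", "KRAS mutation status", "G12D", "2020-02-03"),
]
print(json.dumps(events, indent=2))
```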


10.2196/23930 ◽  
2020 ◽  
Vol 8 (11) ◽  
pp. e23930
Author(s):  
Tjardo D Maarseveen ◽  
Timo Meinderink ◽  
Marcel J T Reinders ◽  
Johannes Knitza ◽  
Tom W J Huizinga ◽  
...  

Background Financial codes are often used to extract diagnoses from electronic health records. This approach is prone to false positives. Alternatively, queries are constructed, but these are highly center and language specific. A tantalizing alternative is the automatic identification of patients by employing machine learning on format-free text entries. Objective The aim of this study was to develop an easily implementable workflow that builds a machine learning algorithm capable of accurately identifying patients with rheumatoid arthritis from format-free text fields in electronic health records. Methods Two electronic health record data sets were employed: Leiden (n=3000) and Erlangen (n=4771). Using a portion of the Leiden data (n=2000), we compared 6 different machine learning methods and a naïve word-matching algorithm using 10-fold cross-validation. Performances were compared using the area under the receiver operating characteristic curve (AUROC) and the area under the precision recall curve (AUPRC), and F1 score was used as the primary criterion for selecting the best method to build a classifying algorithm. We selected the optimal threshold of positive predictive value for case identification based on the output of the best method in the training data. This validation workflow was subsequently applied to a portion of the Erlangen data (n=4293). For testing, the best-performing methods were applied to the remaining data (Leiden n=1000; Erlangen n=478) for an unbiased evaluation. Results For the Leiden data set, the word-matching algorithm demonstrated mixed performance (AUROC 0.90; AUPRC 0.33; F1 score 0.55), and 4 methods significantly outperformed word-matching, with support vector machines performing best (AUROC 0.98; AUPRC 0.88; F1 score 0.83). Applying this support vector machine classifier to the test data resulted in a similarly high performance (F1 score 0.81; positive predictive value [PPV] 0.94), and with this method, we could identify 2873 patients with rheumatoid arthritis in less than 7 seconds out of the complete collection of 23,300 patients in the Leiden electronic health record system. For the Erlangen data set, gradient boosting performed best (AUROC 0.94; AUPRC 0.85; F1 score 0.82) in the training set and, applied to the test data, once again yielded good results (F1 score 0.67; PPV 0.97). Conclusions We demonstrate that machine learning methods can extract the records of patients with rheumatoid arthritis from electronic health record data with high precision, allowing research on very large populations for limited costs. Our approach is language and center independent and could be applied to any type of diagnosis. We have developed our pipeline into a universally applicable and easy-to-implement workflow to equip centers with their own high-performing algorithm. This allows the creation of observational studies of unprecedented size covering different countries for low cost from already available data in electronic health record systems.
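A workflow of the kind described, with free-text notes vectorized and fed to a support vector machine evaluated by 10-fold cross-validation, can be sketched in a few lines of scikit-learn. The example below uses synthetic notes and is not the authors' published pipeline.

```python
# Illustrative sketch of the kind of workflow described: a bag-of-words representation
# feeding a support vector machine, evaluated with 10-fold cross-validation.
# The toy notes below are synthetic; this is not the authors' published pipeline.
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

notes = [
    "seropositive rheumatoid arthritis, started methotrexate",
    "knee osteoarthritis, no signs of inflammatory arthritis",
    "erosive RA with positive anti-CCP antibodies",
    "gout flare of the first MTP joint",
] * 25                                      # repeat to get enough samples for 10 folds
labels = [1, 0, 1, 0] * 25                  # 1 = rheumatoid arthritis, 0 = other diagnosis

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                    SVC(kernel="linear", probability=True))

# F1 was the primary selection criterion in the study; AUROC/AUPRC can be added the same way.
scores = cross_val_score(clf, notes, labels, cv=10, scoring="f1")
print(f"mean F1 over 10 folds: {scores.mean():.2f}")
```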


Author(s):  
Hanning Wang ◽  
Weixiang Xu ◽  
Chaolong Jia

Railway distributed system integration requires information exchange, resource sharing and coordinated processes across fields, departments and application systems, and railway data integration is essential to achieving it. To resolve the heterogeneity of data models among the data sources of different railway operation systems, this paper presents a novel integration data model with a spatial structure: an XML-oriented, three-dimensional common data model. The proposed model accommodates both flexible level relationships and flexible syntax expression in data integration. In this model, a spatial data pattern is used to describe and express the characteristic relationships of data items across all types of data. Because the model is organized as a rooted directed graph with hierarchical levels and flexible expression, it can represent mappings between different data models, including the relational model and the object-oriented model. A consistent conceptual and algebraic description of the data set is given to serve as metadata in data integration, so that the algebraic manipulation of data integration is standardized to support data integration in distributed systems.
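As a loose illustration of the hierarchical, rooted-graph organization described above (not the paper's actual schema), the sketch below expresses a few railway data items as nested XML elements; all element and attribute names are hypothetical.

```python
# Illustrative sketch only: expressing hierarchically related railway data items as XML,
# in the spirit of the rooted-graph, level-based model described above. Element and
# attribute names are hypothetical, not taken from the paper.
import xml.etree.ElementTree as ET

root = ET.Element("RailwayData")                         # root of the directed graph
line = ET.SubElement(root, "Line", id="L1", name="Beijing-Shanghai")
station = ET.SubElement(line, "Station", id="S1", name="Nanjing South")
# A leaf data item; the source system it came from is kept as an attribute for integration.
ET.SubElement(station, "Item", key="daily_passengers", source="ops_db").text = "120000"

print(ET.tostring(root, encoding="unicode"))
```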


2021 ◽  
Vol 12 (01) ◽  
pp. 057-064
Author(s):  
Christian Maier ◽  
Lorenz A. Kapsner ◽  
Sebastian Mate ◽  
Hans-Ulrich Prokosch ◽  
Stefan Kraus

Abstract Background The identification of patient cohorts for recruiting patients into clinical trials requires an evaluation of study-specific inclusion and exclusion criteria. These criteria are specified depending on corresponding clinical facts. Some of these facts may not be present in the clinical source systems and need to be calculated either in advance or at cohort query runtime (so-called feasibility query). Objectives We use the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) as the repository for our clinical data. However, Atlas, the graphical user interface of OMOP, does not offer the functionality to perform calculations on facts data, so we searched for a different approach. The objective of this study is to investigate whether the Arden Syntax can be used for feasibility queries on the OMOP CDM to enable on-the-fly calculations at query runtime, eliminating the need to precalculate data elements involved in researchers' criteria specifications. Methods We implemented a service that reads the facts from the OMOP repository and provides them in a form that an Arden Syntax Medical Logic Module (MLM) can process. We then implemented an MLM that applies the eligibility criteria to every patient data set and outputs the list of eligible cases (i.e., performs the feasibility query). Results The study resulted in an MLM-based feasibility query that identifies cases of overventilation as an example of how an on-the-fly calculation can be realized. The algorithm is split into two MLMs to make the approach reusable. Conclusion We found that MLMs are a suitable technology for feasibility queries on the OMOP CDM. Our method of performing on-the-fly calculations can be employed with any OMOP instance and without touching existing infrastructure such as the Extract, Transform and Load pipeline. Therefore, we think that it is a well-suited method to perform on-the-fly calculations on OMOP.
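The published implementation expresses the on-the-fly calculation in Arden Syntax MLMs; the sketch below re-expresses the same idea in Python purely for illustration, with hypothetical concept IDs and an assumed ml/kg threshold rather than the study's actual criteria.

```python
# Illustrative Python re-expression of the on-the-fly calculation idea; the study itself
# uses Arden Syntax MLMs. Concept IDs and the threshold below are hypothetical placeholders.
from dataclasses import dataclass

@dataclass
class Measurement:                 # one row from the OMOP `measurement` table
    person_id: int
    measurement_concept_id: int
    value_as_number: float

TIDAL_VOLUME = 1                   # placeholder concept ids (would come from the vocabulary)
BODY_WEIGHT = 2

def eligible_persons(rows: list[Measurement], ml_per_kg_limit: float = 8.0) -> set[int]:
    """Return persons whose tidal volume per kg body weight exceeds a limit,
    calculated at query time instead of being precomputed in the CDM."""
    latest: dict[tuple[int, int], float] = {}
    for r in rows:                                  # keep the last value per person/concept
        latest[(r.person_id, r.measurement_concept_id)] = r.value_as_number
    out = set()
    for (person, concept), value in latest.items():
        if concept == TIDAL_VOLUME:
            weight = latest.get((person, BODY_WEIGHT))
            if weight and value / weight > ml_per_kg_limit:
                out.add(person)
    return out

rows = [Measurement(1, TIDAL_VOLUME, 600), Measurement(1, BODY_WEIGHT, 70),
        Measurement(2, TIDAL_VOLUME, 450), Measurement(2, BODY_WEIGHT, 80)]
print(eligible_persons(rows))      # person 1: 600/70 ~ 8.6 ml/kg -> flagged
```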


2019 ◽  
Author(s):  
Yue Yu ◽  
Kathryn Ruddy ◽  
Aaron Mansfield ◽  
Nansu Zong ◽  
Andrew Wen ◽  
...  

BACKGROUND Immune checkpoint inhibitors are associated with unique immune-related adverse events (irAEs). As most of the immune checkpoint inhibitors are new to the market, it is important to conduct studies using real-world data sources to investigate their safety profiles. OBJECTIVE The aim of the study was to develop a framework for signal detection and filtration of novel irAEs for 6 Food and Drug Administration-approved immune checkpoint inhibitors. METHODS In our framework, we first used the Food and Drug Administration's Adverse Event Reporting System (FAERS), standardized into an Observational Health Data Sciences and Informatics (OHDSI) common data model (CDM), to collect immune checkpoint inhibitor-related event data and conducted irAE signal detection. The OHDSI CDM is a standard-driven data model that focuses on transforming different databases into a common format and standardizing medical terms to a common representation. We then filtered out already known irAEs from drug labels and the literature by using a customized text-mining pipeline based on the clinical Text Analysis and Knowledge Extraction System (cTAKES), with the Medical Dictionary for Regulatory Activities (MedDRA) as the dictionary. Finally, we classified the irAE detection results into three different categories to discover potentially new irAE signals. RESULTS Using our text-mining pipeline, we identified 490 irAE terms from drug labels and 918 terms from the literature. In addition, of the 94 positive signals detected using the CDM-based FAERS, 53 signals (56%) were labeled signals, 10 (11%) were unlabeled published signals, and 31 (33%) were potentially new signals. CONCLUSIONS We demonstrated that our approach is effective for irAE signal detection and filtration. Moreover, our CDM-based framework could facilitate adverse drug event detection and filtration toward the goal of next-generation pharmacovigilance that seamlessly integrates electronic health record data for improved signal detection.
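The abstract does not name the disproportionality statistic behind the signal-detection step; as a generic illustration of how such a signal can be quantified on spontaneous-report counts, the sketch below computes a reporting odds ratio (ROR) with its 95% confidence interval from a 2x2 contingency table of toy counts.

```python
# Illustrative sketch of one common pharmacovigilance signal-detection statistic, the
# reporting odds ratio (ROR); the study's exact detection method is not specified here.
import math

def reporting_odds_ratio(a: int, b: int, c: int, d: int) -> tuple[float, float, float]:
    """a: reports with drug & event, b: drug & other events,
    c: other drugs & event, d: other drugs & other events.
    Returns the ROR with a 95% confidence interval."""
    ror = (a / b) / (c / d)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)       # standard error of ln(ROR)
    lo = math.exp(math.log(ror) - 1.96 * se)
    hi = math.exp(math.log(ror) + 1.96 * se)
    return ror, lo, hi

# Toy counts for one drug-event pair extracted from FAERS-like data.
print(reporting_odds_ratio(a=30, b=970, c=200, d=98800))
```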

