The Pipeline for Standardizing Russian Unstructured Allergy Anamnesis Using FHIR AllergyIntolerance Resource

Author(s):  
Iuliia D. Lenivtceva
Georgy Kopanitsa

Abstract
Background: Most essential medical knowledge is stored as free text, which is difficult to process. Standardization of medical narratives is an important task for data exchange, integration, and semantic interoperability.
Objectives: The article aims to develop an end-to-end pipeline for structuring Russian free-text allergy anamnesis using international standards.
Methods: The pipeline for free-text data standardization is based on FHIR (Fast Healthcare Interoperability Resources) and SNOMED CT (Systematized Nomenclature of Medicine Clinical Terms) to ensure semantic interoperability. The pipeline solves common tasks such as data preprocessing, classification, categorization, entity extraction, and semantic code assignment. Machine learning, rule-based, and dictionary-based approaches were combined to compose the pipeline, which was evaluated on 166 randomly chosen medical records.
Results: The AllergyIntolerance resource was used to represent allergy anamnesis. The data preprocessing module included a dictionary of over 90,000 words, including specific medication terms, and more than 20 regular expressions for error correction. The classification and categorization modules resulted in four dictionaries of allergy terms (2,675 terms in total), which were mapped to SNOMED CT concepts. F-scores for the individual steps are 0.945 for filtering, 0.90 to 0.96 for allergy categorization, and 0.90 and 0.93 for allergen and reaction extraction, respectively. Allergy terminology coverage exceeds 95%.
Conclusion: The proposed pipeline is a step toward semantic interoperability of Russian free-text medical records and could be effective in standardization systems for further data exchange and integration.
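
As an illustration of the pipeline's output format, the following minimal sketch (in Python) renders one extracted allergen/reaction pair as a FHIR R4 AllergyIntolerance resource with SNOMED CT codings. The specific concept codes, free-text strings, and patient reference are illustrative assumptions, not the pipeline's actual mappings:

```python
import json

# Minimal sketch: one extracted allergen/reaction pair rendered as a
# FHIR R4 AllergyIntolerance resource. The SNOMED CT codes, Russian
# free-text mentions, and patient reference are illustrative only.
allergy = {
    "resourceType": "AllergyIntolerance",
    "clinicalStatus": {
        "coding": [{
            "system": "http://terminology.hl7.org/CodeSystem/allergyintolerance-clinical",
            "code": "active"
        }]
    },
    "category": ["medication"],
    "code": {  # the allergen, mapped to a SNOMED CT concept
        "coding": [{
            "system": "http://snomed.info/sct",
            "code": "387517004",      # Paracetamol (substance)
            "display": "Paracetamol"
        }],
        "text": "парацетамол"          # original free-text mention
    },
    "patient": {"reference": "Patient/example"},
    "reaction": [{
        "manifestation": [{
            "coding": [{
                "system": "http://snomed.info/sct",
                "code": "126485001",   # Urticaria (disorder)
                "display": "Urticaria"
            }],
            "text": "крапивница"
        }]
    }]
}

print(json.dumps(allergy, ensure_ascii=False, indent=2))
```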

2010
Vol 49 (02)
pp. 186-195
Author(s):
P. Hanzlíček
P. Přečková
A. Říha
M. Dioszegi
L. Seidl
...  

Summary
Objectives: Data interchange in the Czech healthcare environment is mostly based on national standards. This paper describes the use of international standards and nomenclatures to build a pilot semantic interoperability platform (SIP) for exchanging information among electronic health record systems (EHR-Ss) in Czech healthcare. The work was performed within a national research project of the “Information Society” program.
Methods: At the beginning of the project, a set of requirements the SIP should meet was formulated. Several communication standards (openEHR, HL7 v3, DICOM) were analyzed, and HL7 v3 was selected for exchanging health records in our solution. Two systems were included in our pilot environment: WinMedicalc 2000 and ADAMEKj EHR.
Results: HL7-based local information models were created to describe the information content of both systems. The concepts from our original information models were mapped to coding systems supported by HL7 (LOINC, SNOMED CT, and ICD-10), and data exchange via HL7 v3 messages was implemented and tested by querying patient administration data. As a gateway between local EHR systems and the HL7 message-based infrastructure, a configurable HL7 Broker was developed.
Conclusions: A nationwide implementation of a full-scale SIP based on HL7 v3 would require adopting and translating appropriate international coding systems and nomenclatures, and developing implementation guidelines to facilitate migration from national standards to international ones. Our pilot study showed that our approach is feasible, but fully integrating the Czech healthcare system into the European e-health context would demand a substantial effort.
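
The core of such a platform is the translation of local EHR concepts into HL7-supported coding systems. A minimal sketch of that idea follows; the local codes and their mappings are hypothetical, while the coding-system OIDs and target codes are real LOINC, SNOMED CT, and ICD-10 entries:

```python
# Hypothetical mapping table from local EHR concepts to coding systems
# supported by HL7 v3 (LOINC, SNOMED CT, ICD-10). A gateway such as the
# HL7 Broker described above would consult a table like this when
# translating outbound messages.
LOCAL_TO_STANDARD = {
    # local code: (codeSystem OID, code, displayName)
    "WM2000:GLUC": ("2.16.840.1.113883.6.1", "2345-7",
                    "Glucose [Mass/volume] in Serum or Plasma"),   # LOINC
    "WM2000:HTN":  ("2.16.840.1.113883.6.96", "38341003",
                    "Hypertensive disorder"),                      # SNOMED CT
    "WM2000:DX1":  ("2.16.840.1.113883.6.3", "I10",
                    "Essential (primary) hypertension"),           # ICD-10
}

def translate(local_code: str):
    """Return (codeSystem, code, displayName) for an HL7 v3 CD element,
    or None if the local concept has no standard mapping."""
    return LOCAL_TO_STANDARD.get(local_code)

print(translate("WM2000:GLUC"))
```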


2020
Author(s):
Emma Chavez
Vanessa Perez
Angélica Urrutia

BACKGROUND: Hypertension is currently one of the diseases with the greatest mortality risk worldwide. In Chile in particular, 90% of the population with this disease has idiopathic or essential hypertension. Essential hypertension is characterized by high blood pressure, and its cause is unknown, which means that each patient may require a different treatment depending on their history and symptoms. Varied data, such as history, symptoms, and exams, are generated for each patient suffering from the disease. These data appear in the patient's medical record in no particular order, making it difficult to search for relevant information. There is therefore a need for a common, unified vocabulary of terms that adequately represents the disease, making searches within the domain more effective.
OBJECTIVE: The objective of this study is to develop a domain ontology for essential hypertension, organizing the most significant data within the domain as a tool for medical training or to support physicians' decision making.
METHODS: The terms used for the ontology were extracted from the medical histories in de-identified medical records of patients with essential hypertension, together with the SNOMED CT collection of medical terms and clinical guidelines for controlling the disease. Methontology was used for the design: the definition of classes and their hierarchy, as well as relationships between concepts and instances. Three criteria were used to validate the ontology, which also helped to measure its quality. Tests were run with a dataset to verify that the tool was created according to the requirements.
RESULTS: An ontology of 310 instances classified into 37 classes was developed. From these, 4 superclasses and 30 relationships were obtained. In the dataset tests, all three quality tests returned 100% correct and coherent answers.
CONCLUSIONS: This ontology provides a tool for physicians, specialists, and students, among others, that can be incorporated into clinical systems to support decision making regarding essential hypertension. Nevertheless, more instances should be incorporated into the ontology by carrying out further searches in the medical history or free-text sections of the medical records of patients with this disease.
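
As a rough illustration of how such a class hierarchy and its instances can be expressed in code, here is a small rdflib sketch; the class and instance names are hypothetical and do not reproduce the ontology's actual vocabulary:

```python
from rdflib import Graph, Namespace, Literal, RDF, RDFS

# Hypothetical fragment of an essential-hypertension domain ontology:
# a small class hierarchy plus one instance, in the spirit of the
# Methontology design described above.
HTN = Namespace("http://example.org/hypertension#")
g = Graph()
g.bind("htn", HTN)

# Classes and their hierarchy
g.add((HTN.ClinicalFinding, RDF.type, RDFS.Class))
g.add((HTN.Symptom, RDF.type, RDFS.Class))
g.add((HTN.Symptom, RDFS.subClassOf, HTN.ClinicalFinding))

# One instance, classified and labeled
g.add((HTN.headache, RDF.type, HTN.Symptom))
g.add((HTN.headache, RDFS.label, Literal("Headache")))

print(g.serialize(format="turtle"))
```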


Author(s):  
William Van Woensel
Chad Armstrong
Malavan Rajaratnam
Vaibhav Gupta
Syed Sibte Raza Abidi

Electronic Medical Records (EMRs) are increasingly being deployed at primary points of care and clinics for digital record keeping, increasing productivity and improving communication. In practice, however, patient profiles are often incomplete, not only because of disconnected EMR systems but also due to incomplete EMR data entry, often caused by clinician time constraints and a lack of data entry restrictions. To complete a patient's partial EMR data, we plausibly infer missing causal associations between medical EMR concepts, such as diagnoses and treatments, in situations that lack sufficient raw data for machine learning methods. We follow a knowledge-based approach, leveraging open medical knowledge sources such as SNOMED CT and ICD, combined with knowledge-based reasoning and explainable inferences, to infer clinical encounter information from incomplete medical records. To bootstrap this process, we apply a semantic Extract-Transform-Load process that converts an EMR database into an enriched domain-specific Knowledge Graph.
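
A schematic sketch of this kind of knowledge-based completion: a rule that proposes plausible missing diagnosis-treatment links from an external knowledge table. The concept identifiers and associations below are placeholders, not SNOMED CT or ICD content:

```python
# Schematic knowledge-based completion of a partial EMR encounter.
# KNOWN_TREATMENTS stands in for associations drawn from open medical
# knowledge sources; all identifiers are hypothetical placeholders.
KNOWN_TREATMENTS = {
    "diagnosis:type2-diabetes": ["treatment:metformin"],
    "diagnosis:hypertension": ["treatment:ace-inhibitor"],
}

def infer_missing_links(encounter: dict) -> list[tuple[str, str, str]]:
    """Return plausible (diagnosis, relation, treatment) triples that the
    encounter supports but does not state; the knowledge table serves as
    the justification, keeping each inference explainable."""
    inferred = []
    for dx in encounter.get("diagnoses", []):
        for tx in KNOWN_TREATMENTS.get(dx, []):
            if tx not in encounter.get("treatments", []):
                inferred.append((dx, "plausibly_treated_by", tx))
    return inferred

encounter = {"diagnoses": ["diagnosis:hypertension"], "treatments": []}
print(infer_missing_links(encounter))
```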


1972
Vol 11 (03)
pp. 152-162
Author(s):
P. Gaynon
R. L. Wong

With the objective of providing easier access to pathology specimens, slides, and Kodachromes, with linkage to X-rays and the remainder of the patient's medical record, an automated natural-language parsing routine based on dictionary look-up was written for Surgical Pathology document-pairs, each consisting of a Request for Examination (authored by clinicians) and its corresponding report (authored by pathologists). These documents were input to the system in free-text English without manual editing or coding.

Two types of indices were prepared. The first was an »inverted« file, available for on-line retrieval, for displaying the content of the document-pairs, producing frequency counts of cases, or listing cases in table format. Retrievable items include the patient's and specimen's identification data, date of operation, names of the clinician and pathologist, etc. The English content of the operative procedure, clinical findings, and pathologic diagnoses can be retrieved through logical combinations of key words. The second type of index was a catalog. Three catalog files, »operation«, »clinical«, and »pathology«, were prepared by alphabetizing lines formed by rotating phrases headed by keywords. These keywords were automatically selected and standardized by the parsing routine, and the phrases were extracted from each sentence of each input document. Over 2,500 document-pairs have been entered and are currently being used for medical education.
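
In modern terms, the two index types correspond to an inverted index and a KWIC-style permuted index. A minimal sketch of both, assuming simple whitespace tokenization and hypothetical specimen identifiers:

```python
from collections import defaultdict

# Hypothetical diagnosis lines keyed by specimen identifier.
docs = {
    "SP-001": "chronic cholecystitis with cholelithiasis",
    "SP-002": "acute appendicitis with periappendicitis",
}

# Inverted file: keyword -> document identifiers, supporting retrieval
# by logical combination of key words.
inverted = defaultdict(set)
for doc_id, text in docs.items():
    for word in text.split():
        inverted[word].add(doc_id)

print(sorted(inverted["cholelithiasis"]))  # ['SP-001']

# Rotated-phrase catalog: every rotation of each phrase, headed by a
# keyword and then alphabetized (a KWIC-style permuted index).
catalog = []
for doc_id, text in docs.items():
    words = text.split()
    for i in range(len(words)):
        rotation = " ".join(words[i:] + words[:i])
        catalog.append((rotation, doc_id))

for line, doc_id in sorted(catalog):
    print(f"{line:50s} {doc_id}")
```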


2018
Vol 13
pp. 3281-3287
Author(s):
Ahmed Zamzam
Alaa A. Qaffas

Without a common format, the financial community has been constantly penalized by the data exchange process. Currently, when financial data are dematerialized, they are exchanged in a multitude of formats: Excel files, free text, PDF, etc. Often little more amenable to processing than a photocopy, and limited to certain platforms, these formats prove unsuitable for exchanging information between programs and for analyzing, comparing, and presenting reports to users. So far, despite a strong tendency to use XML syntax, attempts at convergence have struggled to generalize across sectors and contexts.

In recent years, reporting requirements have led to a significant increase in the cost of using financial systems. XBRL, an XML-based technology supported by many players in the world of finance and the software industry and well suited to reporting, can constitute the solution. Many governments, regulators, and financial institutions are already using XBRL or have pilot projects in place.

This document specifically targets XBRL. This technology, now recognized by the entire software industry, provides substantial benefits for data exchange, storage, search, and processing.
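
For readers unfamiliar with the format, the following sketch uses Python's standard library to assemble a schematic XBRL instance with one context, one unit, and one fact. The namespaces follow the XBRL 2.1 specification, but the reporting entity, concept, and values are hypothetical; a fully valid instance would also declare the iso4217 prefix and reference a published taxonomy:

```python
import xml.etree.ElementTree as ET

# Schematic XBRL instance: one context, one unit, one fact.
# The "acme" concept and all values are hypothetical.
NS = {
    "xbrli": "http://www.xbrl.org/2003/instance",
    "acme": "http://example.com/acme",
}
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)

root = ET.Element(f"{{{NS['xbrli']}}}xbrl")

# Context: who is reporting, and for what period.
ctx = ET.SubElement(root, f"{{{NS['xbrli']}}}context", id="FY2023")
entity = ET.SubElement(ctx, f"{{{NS['xbrli']}}}entity")
ident = ET.SubElement(entity, f"{{{NS['xbrli']}}}identifier",
                      scheme="http://example.com/ticker")
ident.text = "ACME"
period = ET.SubElement(ctx, f"{{{NS['xbrli']}}}period")
ET.SubElement(period, f"{{{NS['xbrli']}}}instant").text = "2023-12-31"

# Unit: the measure in which the fact is expressed.
unit = ET.SubElement(root, f"{{{NS['xbrli']}}}unit", id="EUR")
ET.SubElement(unit, f"{{{NS['xbrli']}}}measure").text = "iso4217:EUR"

# One fact, tied to its context and unit.
fact = ET.SubElement(root, f"{{{NS['acme']}}}Assets",
                     contextRef="FY2023", unitRef="EUR", decimals="0")
fact.text = "1000000"

print(ET.tostring(root, encoding="unicode"))
```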


2020
Author(s):
Julian Sass
Alexander Bartschke
Moritz Lehne
Andrea Essenwanger
Eugenia Rinaldi
...  

Background: The current COVID-19 pandemic has led to a surge of research activity. While this research provides important insights, the multitude of studies results in an increasing segmentation of information. To ensure comparability across projects and institutions, standard datasets are needed. Here, we introduce the "German Corona Consensus Dataset" (GECCO), a uniform dataset that uses international terminologies and health IT standards to improve interoperability of COVID-19 data.
Methods: Based on previous work (e.g., the ISARIC-WHO COVID-19 case report form) and in coordination with experts from university hospitals, professional associations and research initiatives, data elements relevant for COVID-19 research were collected, prioritized and consolidated into a compact core dataset. The dataset was mapped to international terminologies, and the Fast Healthcare Interoperability Resources (FHIR) standard was used to define interoperable, machine-readable data formats.
Results: A core dataset consisting of 81 data elements with 281 response options was defined, including information about, for example, demography, anamnesis, symptoms, therapy, medications or laboratory values of COVID-19 patients. Data elements and response options were mapped to SNOMED CT, LOINC, UCUM, ICD-10-GM and ATC, and FHIR profiles for interoperable data exchange were defined.
Conclusion: GECCO provides a compact, interoperable dataset that can help to make COVID-19 research data more comparable across studies and institutions. The dataset will be further refined in the future by adding domain-specific extension modules for more specialized use cases.
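
To make the FHIR mapping concrete, here is a minimal sketch of one laboratory data element rendered as a FHIR R4 Observation coded with LOINC and UCUM, in the spirit of the GECCO profiles. The profile URL and patient reference are hypothetical placeholders, while 1988-5 is the standard LOINC code for C-reactive protein:

```python
import json

# One COVID-19-relevant laboratory value as a FHIR R4 Observation,
# coded with LOINC and UCUM. The meta.profile URL and patient reference
# are hypothetical placeholders.
observation = {
    "resourceType": "Observation",
    "meta": {"profile": ["https://example.org/fhir/StructureDefinition/gecco-crp"]},
    "status": "final",
    "code": {
        "coding": [{
            "system": "http://loinc.org",
            "code": "1988-5",
            "display": "C reactive protein [Mass/volume] in Serum or Plasma"
        }]
    },
    "subject": {"reference": "Patient/covid-example"},
    "valueQuantity": {
        "value": 42.0,
        "unit": "mg/L",
        "system": "http://unitsofmeasure.org",  # UCUM
        "code": "mg/L"
    }
}

print(json.dumps(observation, indent=2))
```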


2021
Author(s):
Erina Chan
Serena S Small
Maeve E Wickham
Vicki Cheng
Ellen Balka
...  

BACKGROUND: Existing systems for documenting adverse drug events often use free-text data entry, producing non-standardized, unstructured data prone to misinterpretation. Standardized terminology may improve data quality, but it is unclear which data standard is most appropriate for documenting adverse drug event symptoms and diagnoses.
OBJECTIVE: Our objective was to compare the utility, strengths, and weaknesses of different data standards for documenting adverse drug event symptoms and diagnoses.
METHODS: We performed a mixed-methods sub-study of a multicenter retrospective chart review. We reviewed research records of prospectively diagnosed adverse drug events at 5 Canadian hospitals. Two pharmacy research assistants independently entered symptoms and/or diagnoses for adverse drug events using 4 standards: MedDRA, SNOMED CT, SNOMED Adverse Reaction (SNOMED ADR), and ICD-11. Disagreements between research assistants regarding the case-specific utility of data standards were discussed until consensus was reached. We used the consensus ratings to determine the proportion of adverse drug events covered by each data standard, and coded and analyzed field notes from the consensus sessions.
RESULTS: We reviewed 573 adverse drug events and found that MedDRA and ICD-11 had excellent coverage of adverse drug event symptoms and/or diagnoses. MedDRA had the highest number of matches between the research assistants, while ICD-11 had the fewest. SNOMED ADR had the lowest proportion of adverse drug event coverage. Research assistants were most likely to encounter terminological challenges with SNOMED ADR and usability challenges with ICD-11, and least likely to encounter either with MedDRA.
CONCLUSIONS: Usability, comprehensiveness, and accuracy are important features of data standards for documenting adverse drug event symptoms and diagnoses. Based on our results, we recommend the use of MedDRA.
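
A minimal sketch of the coverage computation described in the Methods; the consensus ratings below are hypothetical placeholders, not study data:

```python
# Coverage of ADE symptoms/diagnoses per data standard, computed from
# consensus ratings. The three records are hypothetical placeholders.
consensus = [
    {"MedDRA": True,  "SNOMED CT": True,  "SNOMED ADR": False, "ICD-11": True},
    {"MedDRA": True,  "SNOMED CT": True,  "SNOMED ADR": True,  "ICD-11": True},
    {"MedDRA": True,  "SNOMED CT": False, "SNOMED ADR": False, "ICD-11": True},
]

for standard in ("MedDRA", "SNOMED CT", "SNOMED ADR", "ICD-11"):
    covered = sum(rating[standard] for rating in consensus)
    print(f"{standard}: {covered / len(consensus):.0%} of ADEs covered")
```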


Author(s):  
Apitep Saekow
Choompol Boonmee

In November 2006, the Thai government announced the Thailand Electronic Government Interoperability Framework (TH e-GIF), a collection of technical standards, methodologies, guidelines, and policies to enable electronic data exchange across government agencies. The first challenging project was to implement semantic interoperability for exchanging official electronic letters across 29 government agencies using 15 heterogeneous software systems developed by different vendors. To achieve the project goal, a holistic approach was designed in which many policy-makers and practitioners had to take part in collaborative activities. This chapter explores the approach in detail, including the process of data harmonization, modeling, and standardization using a number of UN/CEFACT specifications (UMM, CCTS, and XML NDR) and other international standards. This project produced the first national XML schema standard. The chapter also introduces a methodology for extending interoperability to legacy systems based on web services technology. Finally, it describes risk management and the key success factors for electronic interoperability development in Thailand.
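
On the implementation side, conformance to a national XML schema standard can be checked mechanically before a message leaves a legacy system. A minimal sketch using lxml, with hypothetical file names:

```python
from lxml import etree

# Validate an outbound official-letter message against the national
# XML schema standard. Both file names are hypothetical placeholders.
schema_doc = etree.parse("th-egif-official-letter.xsd")
schema = etree.XMLSchema(schema_doc)

message = etree.parse("outgoing-letter.xml")
if schema.validate(message):
    print("Message conforms to the national schema.")
else:
    for error in schema.error_log:
        print(error.message)
```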


2007
Vol 3
pp. 117693510700300
Author(s):
Michael Graiser
Susan G. Moore
Rochelle Victor
Ashley Hilliard
Leroy Hill
...  

Background: Large linked databases (LLDBs) represent a novel resource for cancer outcomes research. However, accurately identifying a patient population of interest within these LLDBs can be challenging. Our research group developed a fully integrated platform that combines independent legacy databases into a single cancer-focused LLDB system. We compared the sensitivity and specificity of several SQL-based query strategies for identifying a histologic lymphoma subtype in this LLDB to determine the most accurate legacy data source for identifying a specific cancer patient population.
Methods: Query strategies were developed to identify patients with follicular lymphoma from an LLDB of cancer registry data, electronic medical records (EMRs), laboratory, administrative, pharmacy, and other clinical data. Queries were performed using common diagnostic codes (ICD-9), cancer registry histology codes (ICD-O), and text searches of EMRs. We reviewed medical records and pathology reports to confirm each diagnosis and calculated the sensitivity and specificity of each query strategy.
Results: Together the queries identified 1538 potential cases of follicular lymphoma. Review of pathology and other medical reports confirmed 415 cases of follicular lymphoma: 300 verified by pathology and 115 verified from other medical reports. The query using ICD-O codes was highly specific (96%). Queries using text strings varied in sensitivity (range 7–92%) and specificity (range 86–99%). Queries using ICD-9 codes were both less sensitive (34–44%) and less specific (35–87%).
Conclusions: Queries of linked cancer databases that include cancer registry data should utilize ICD-O codes or employ structured free-text searches to identify patient populations with a precise histologic diagnosis.
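
Schematic versions of the three query strategies, embedded as SQL strings in Python: the table and column names are hypothetical, while the codes are standard follicular-lymphoma codes (ICD-O-3 morphology 9690/3–9698/3, ICD-9 202.0x):

```python
# Three query strategies for finding follicular-lymphoma candidates.
# Table and column names are hypothetical placeholders.
QUERIES = {
    # Cancer registry histology codes (highly specific in the study)
    "icd_o": """
        SELECT DISTINCT patient_id FROM tumor_registry
        WHERE histology_code IN ('9690/3', '9691/3', '9695/3', '9698/3')
    """,
    # Administrative diagnosis codes (less sensitive and less specific)
    "icd_9": """
        SELECT DISTINCT patient_id FROM encounter_diagnoses
        WHERE dx_code LIKE '202.0%'
    """,
    # Structured free-text search of EMR pathology reports
    "text": """
        SELECT DISTINCT patient_id FROM pathology_reports
        WHERE LOWER(report_text) LIKE '%follicular lymphoma%'
    """,
}

def candidate_sets(conn):
    """Run each strategy and return its candidate patient set, so the
    sets can be compared against chart-review-confirmed cases."""
    return {name: {row[0] for row in conn.execute(sql)}
            for name, sql in QUERIES.items()}
```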

