Proposal and Assessment of a De-Identification Strategy to Enhance Anonymity of the Observational Medical Outcomes Partnership Common Data Model (OMOP-CDM) in a Public Cloud-Computing Environment: Anonymization of Medical Data Using Privacy Models

10.2196/19597 ◽  
2020 ◽  
Vol 22 (11) ◽  
pp. e19597
Author(s):  
Seungho Jeon ◽  
Jeongeun Seo ◽  
Sukyoung Kim ◽  
Jeongmoon Lee ◽  
Jong-Ho Kim ◽  
...  

Background De-identifying personal information is critical when using personal health data for secondary research. The Observational Medical Outcomes Partnership Common Data Model (CDM), defined by the nonprofit organization Observational Health Data Sciences and Informatics, has been gaining attention for its use in the analysis of patient-level clinical data obtained from various medical institutions. When analyzing such data in a public environment such as a cloud-computing system, an appropriate de-identification strategy is required to protect patient privacy. Objective This study proposes and evaluates a de-identification strategy that comprises several rules along with privacy models such as k-anonymity, l-diversity, and t-closeness. The proposed strategy was evaluated using an actual CDM database. Methods The CDM database used in this study was constructed by the Anam Hospital of Korea University. Analysis and evaluation were performed using the ARX anonymizing framework in combination with the k-anonymity, l-diversity, and t-closeness privacy models. Results The CDM database, which was constructed according to the rules established by Observational Health Data Sciences and Informatics, exhibited a low risk of re-identification: the highest re-identifiable record rate (11.3%) in the dataset was exhibited by the DRUG_EXPOSURE table, with a re-identification success rate of 0.03%. However, because all tables include at least one “highest risk” value of 100%, suitable anonymizing techniques are required; moreover, the CDM database preserves the “source values” (raw data), a combination of which could increase the risk of re-identification. Therefore, this study proposes an enhanced strategy to de-identify the source values to significantly reduce not only the highest risk in the k-anonymity, l-diversity, and t-closeness privacy models but also the overall possibility of re-identification.
Conclusions Our proposed de-identification strategy effectively enhanced the privacy of the CDM database, thereby encouraging clinical research involving multiple centers.
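
As a rough illustration of the risk measures named in the abstract above, the following is a minimal sketch of how k-anonymity and the "highest risk" of a table can be computed from equivalence classes over quasi-identifiers. It is not the ARX framework and not the authors' procedure; the function names, the toy rows, and the choice of `year_of_birth` and `gender_source_value` as quasi-identifiers are assumptions made for illustration only.

```python
from collections import Counter


def reidentification_risk(records, quasi_identifiers, k=2):
    """Summarize re-identification risk from equivalence-class sizes.

    A record's risk is taken as 1 / (size of its equivalence class), so a
    record that is unique on the quasi-identifiers has a risk of 100%.
    """
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    class_sizes = Counter(keys)
    per_record_risk = [1.0 / class_sizes[key] for key in keys]
    return {
        # The table is k-anonymous for k = size of its smallest class.
        "k": min(class_sizes.values()),
        "highest_risk": max(per_record_risk),
        "average_risk": sum(per_record_risk) / len(per_record_risk),
        # Share of records whose class is smaller than the requested k.
        "records_below_k": sum(1 for key in keys if class_sizes[key] < k) / len(keys),
    }


def generalize_birth_year(records, width=10):
    """Replace each exact birth year with a range of `width` years."""
    generalized = []
    for r in records:
        g = dict(r)
        start = (r["year_of_birth"] // width) * width
        g["year_of_birth"] = f"{start}-{start + width - 1}"
        generalized.append(g)
    return generalized


# Hypothetical toy rows; not taken from any real CDM database.
rows = [
    {"year_of_birth": 1961, "gender_source_value": "F"},
    {"year_of_birth": 1964, "gender_source_value": "F"},
    {"year_of_birth": 1972, "gender_source_value": "M"},
    {"year_of_birth": 1978, "gender_source_value": "M"},
    {"year_of_birth": 1983, "gender_source_value": "F"},
    {"year_of_birth": 1985, "gender_source_value": "F"},
]
qi = ["year_of_birth", "gender_source_value"]

raw = reidentification_risk(rows, qi)
generalized = reidentification_risk(generalize_birth_year(rows), qi)
print(raw)
print(generalized)
```

In this toy table every raw row is unique, so the highest risk is 100%; generalizing the birth year to decades places each row in a class of two and halves that risk. l-diversity and t-closeness additionally constrain the sensitive values inside each class and are not covered by this sketch.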


2020 ◽  
Author(s):  
Seungho Jeon ◽  
Jeongeun Seo ◽  
Sukyoung Kim ◽  
Jeongmoon Lee ◽  
Jongho Kim ◽  
...  

UNSTRUCTURED The common data model (CDM) is a data representation standard that unifies the observational database schema of each medical institution and allows analysis using the same tools. Although analysis of CDM data does not directly examine a medical institution’s original data, it is essential to establish a policy that considers the CDM database operating environment because privacy issues cannot be avoided. The Observational Medical Outcomes Partnership common data model (OMOP CDM), defined by Observational Health Data Sciences and Informatics, a nonprofit organization, eliminates most personal information by design when the database is constructed. When the database of a medical institution is transformed to the OMOP CDM structure, the original data (“source_value”) are retained to minimize information loss, which may allow individuals to be re-identified. This review presents a de-identification strategy for the original data, which can be considered when operating a CDM database in a public computing environment such as cloud computing. Furthermore, we evaluate the re-identification risk to the CDM database based on the proposed strategy using privacy models such as k-anonymity, l-diversity, and t-closeness. The analysis shows that the CDM database is highly anonymized on average (the highest re-identifiable record ratio is 11.3%), but every table in the CDM database contains one or more re-identifiable records. The analysis confirmed that the risk of re-identification is reduced significantly by applying the de-identification strategy.


Author(s):  
Vlasios K. Dimitriadis ◽  
George I. Gavriilidis ◽  
Pantelis Natsiavas

Information Technology (IT) and specialized systems could play a prominent role in supporting drug safety processes, both in the clinical context and beyond. The PVClinical project aims to build an IT platform that enables the investigation of potential Adverse Drug Reactions (ADRs). In this paper, we outline the use of the Observational Medical Outcomes Partnership – Common Data Model (OMOP-CDM) and the openly available Observational Health Data Sciences and Informatics (OHDSI) software stack as part of the PVClinical platform. OMOP-CDM offers the capacity to integrate data from Electronic Health Records (EHRs) (e.g., encounters, patients, providers, diagnoses, drugs, measurements and procedures) via an accepted data model. Furthermore, the OHDSI software stack provides valuable analytics tools which could be used to address important questions regarding drug safety quickly and efficiently, enabling the investigation of potential ADRs in the clinical environment.


2020 ◽  
Vol 38 (15_suppl) ◽  
pp. e19358-e19358
Author(s):  
Shohei Burns ◽  
Eric Andrew Collisson

e19358 Background: Efficiently sharing health data produced during standard care could dramatically accelerate progress in cancer treatments but various barriers make this difficult. Not sharing these data to ensure patient privacy is at the cost of little to no learning from real-world data produced during cancer care. Furthermore, recent research has demonstrated a willingness of cancer patients to share their treatment experiences to fuel research, despite potential risks to privacy. The objective of this study was to design, pilot, and release a decentralized, scalable, efficient, economical, and secure strategy for dissemination of de-identified clinical and genomic data with a focus on late stage cancer. Methods: We created and piloted a blockchain-authenticated system to enable securely sharing de-identified patient data derived from standard of care imaging, genomic testing, and electronic health records (EHR), called the Cancer Gene Trust (CGT). We prospectively consented and collected data for a pilot cohort (n = 18), which we uploaded to CGT. EHR data were extracted from both a hospital cancer registry and a common data model format to identify optimal data extraction and dissemination practices. Specifically, we scored and compared the level of completeness between two EHR data extraction formats against the gold-standard source documentation for patients with available data (n = 17). Results: While the total completeness scores were greater for the registry reports than the common data model, this difference was not statistically significant. We did find that some specific data fields, such as histology site, were better captured using the registry reports, which can be used to improve the continually adapting common data model. In terms of the overall pilot study, we found that CGT enables rapid integration of real-world cancer patient data in a more clinically useful timeframe. 
We also developed an open-source web application to allow users to seamlessly search, browse, explore, and download CGT data. Conclusions: Our pilot demonstrates the willingness of cancer patients to participate in data sharing and how blockchain-enabled structures can maintain relationships between individual data elements while preserving patient privacy, empowering findings by third party researchers and clinicians. We demonstrate the feasibility of CGT as a framework to share health data trapped in silos to further cancer research. Further studies to optimize data representation, stream, and integrity are required.


2021 ◽  
Author(s):  
Joon-Hyop Lee ◽  
Suhyun Kim ◽  
Kwangsoo Kim ◽  
Young Jun Chai ◽  
Hyeong Won Yu ◽  
...  

BACKGROUND Post-thyroidectomy hypoparathyroidism may result in various transient or permanent symptoms, ranging from tingling sensation to severe breathing difficulties. Its incidence varies among surgeons and institutions, making it difficult to determine its actual incidence and associated factors. OBJECTIVE This study attempted to estimate the incidence of post-operative hypoparathyroidism in patients at two tertiary institutions that share a common data model, the Observational Health Data Sciences and Informatics. METHODS This study used the Common Data Model to extract explicitly specified encoding and relationships among concepts using standardized vocabularies. The EDI codes of various thyroid disorders and thyroid operations were extracted from two separate tertiary hospitals between January 2013 and December 2018. Patients were grouped into no evidence of/transient/permanent hypoparathyroidism groups to analyze the likelihood of hypoparathyroidism occurrence related to operation types and diagnosis. RESULTS Of the 4848 eligible patients at the two institutions who underwent thyroidectomy, 1370 (28.26%) experienced transient hypoparathyroidism and 251 (5.18%) experienced persistent hypoparathyroidism. Univariate logistic regression analysis predicted that, relative to total bilateral thyroidectomy, radical tumor resection was associated with a 48% greater likelihood of transient hypoparathyroidism and a 102% greater likelihood of persistent hypoparathyroidism. Moreover, multivariate logistic analysis found that radical tumor resection was associated with a 50% greater likelihood of transient hypoparathyroidism and a 97% greater likelihood of persistent hypoparathyroidism than total bilateral thyroidectomy. CONCLUSIONS These findings, by integrating and analyzing two databases, suggest that this analysis could be expanded to include other large databases that share the same Observational Health Data Sciences and Informatics protocol.


2019 ◽  
pp. 744-759 ◽  
Author(s):  
Ruchika Asija ◽  
Rajarathnam Nallusamy

Cloud computing is a major technology enabler for providing efficient services at affordable costs by reducing the costs of traditional software and hardware licensing models. As it continues to evolve, it is widely being adopted by healthcare organisations. But hosting healthcare solutions on cloud is challenging in terms of security and privacy of health data. To address these challenges and to provide security and privacy to health data on the cloud, the authors present a Software-as-a-Service (SaaS) application with a data model with built-in security and privacy. This data model enhances security and privacy of the data by attaching security levels in the data itself expressed in the form of XML instead of relying entirely on application level access controls. They also present the performance evaluation of their application using this data model with different scaling indicators. To further investigate the adoption of IT and cloud computing in Indian healthcare industry they have done a survey of some major hospitals in India.
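
The idea of attaching security levels to the data itself, expressed as XML, can be sketched roughly as follows. The abstract does not give the authors' schema, so the element name `field`, the attribute `securityLevel`, the numeric levels, and the function `visible_fields` are all hypothetical; this only shows the general shape of filtering a record by the level carried in the data rather than by application-level rules alone.

```python
import xml.etree.ElementTree as ET

# Hypothetical record: each field carries its own security level.
RECORD_XML = """
<patientRecord>
  <field name="blood_group" securityLevel="1">O+</field>
  <field name="diagnosis" securityLevel="2">Type 2 diabetes</field>
  <field name="national_id" securityLevel="3">EXAMPLE-ONLY</field>
</patientRecord>
"""


def visible_fields(xml_text, clearance):
    """Return the fields whose embedded security level is within `clearance`."""
    root = ET.fromstring(xml_text.strip())
    return {
        field.get("name"): field.text
        for field in root.iter("field")
        if int(field.get("securityLevel")) <= clearance
    }


print(visible_fields(RECORD_XML, 1))
print(visible_fields(RECORD_XML, 2))
```

Because the level travels with each field, any consumer of the record can apply the same check, which is one plausible reading of "instead of relying entirely on application level access controls".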


10.2196/16810 ◽  
2020 ◽  
Vol 22 (3) ◽  
pp. e16810 ◽  
Author(s):  
Benjamin Scott Glicksberg ◽  
Shohei Burns ◽  
Rob Currie ◽  
Ann Griffin ◽  
Zhen Jane Wang ◽  
...  

Background Efficiently sharing health data produced during standard care could dramatically accelerate progress in cancer treatments, but various barriers make this difficult. Not sharing these data to ensure patient privacy is at the cost of little to no learning from real-world data produced during cancer care. Furthermore, recent research has demonstrated a willingness of patients with cancer to share their treatment experiences to fuel research, despite potential risks to privacy. Objective The objective of this study was to design, pilot, and release a decentralized, scalable, efficient, economical, and secure strategy for the dissemination of deidentified clinical and genomic data with a focus on late-stage cancer. Methods We created and piloted a blockchain-authenticated system to enable secure sharing of deidentified patient data derived from standard of care imaging, genomic testing, and electronic health records (EHRs), called the Cancer Gene Trust (CGT). We prospectively consented and collected data for a pilot cohort (N=18), which we uploaded to the CGT. EHR data were extracted from both a hospital cancer registry and a common data model (CDM) format to identify optimal data extraction and dissemination practices. Specifically, we scored and compared the level of completeness between two EHR data extraction formats against the gold standard source documentation for patients with available data (n=17). Results Although the total completeness scores were greater for the registry reports than those for the CDM, this difference was not statistically significant. We did find that some specific data fields, such as histology site, were better captured using the registry reports, which can be used to improve the continually adapting CDM. In terms of the overall pilot study, we found that CGT enables rapid integration of real-world data of patients with cancer in a more clinically useful time frame. 
We also developed an open-source Web application to allow users to seamlessly search, browse, explore, and download CGT data. Conclusions Our pilot demonstrates the willingness of patients with cancer to participate in data sharing and how blockchain-enabled structures can maintain relationships between individual data elements while preserving patient privacy, empowering findings by third-party researchers and clinicians. We demonstrate the feasibility of CGT as a framework to share health data trapped in silos to further cancer research. Further studies to optimize data representation, stream, and integrity are required.


2018 ◽  
Vol 7 (3.33) ◽  
pp. 225
Author(s):  
Hee-kyung Moon ◽  
Sung-kook Han ◽  
Chang-ho An

This paper describes a Linked Open Data (LOD) development system and its application to a medical information standard, the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM). The OMOP CDM allows for the systematic analysis of the disparate observational databases of individual hospitals. The paper describes a LOD instance development system based on SII, which can automatically generate an application-specific instance development system. We applied the OMOP CDM medical information standard to this LOD development system. The results confirmed that the LOD development system can be applied to the standardization of medical information without problems.


2016 ◽  
Vol 6 (3) ◽  
pp. 1-14 ◽  
Author(s):  
Ruchika Asija ◽  
Rajarathnam Nallusamy

Cloud computing is a major technology enabler for providing efficient services at affordable costs by reducing the costs of traditional software and hardware licensing models. As it continues to evolve, it is widely being adopted by healthcare organisations. But hosting healthcare solutions on cloud is challenging in terms of security and privacy of health data. To address these challenges and to provide security and privacy to health data on the cloud, the authors present a Software-as-a-Service (SaaS) application with a data model with built-in security and privacy. This data model enhances security and privacy of the data by attaching security levels in the data itself expressed in the form of XML instead of relying entirely on application level access controls. They also present the performance evaluation of their application using this data model with different scaling indicators. To further investigate the adoption of IT and cloud computing in Indian healthcare industry they have done a survey of some major hospitals in India.


2015 ◽  
Vol 06 (03) ◽  
pp. 536-547 ◽  
Author(s):  
F.S. Resnic ◽  
S.L. Robbins ◽  
J. Denton ◽  
L. Nookala ◽  
D. Meeker ◽  
...  

Summary Background: Adoption of a common data model across health systems is a key infrastructure requirement to allow large scale distributed comparative effectiveness analyses. There are a growing number of common data models (CDM), such as Mini-Sentinel, and the Observational Medical Outcomes Partnership (OMOP) CDMs. Objective: In this case study, we describe the challenges and opportunities of a study specific use of the OMOP CDM by two health systems and describe three comparative effectiveness use cases developed from the CDM. Methods: The project transformed two health system databases (using crosswalks provided) into the OMOP CDM. Cohorts were developed from the transformed CDMs for three comparative effectiveness use case examples. Administrative/billing, demographic, order history, medication, and laboratory were included in the CDM transformation and cohort development rules. Results: Record counts per person month are presented for the eligible cohorts, highlighting differences between the civilian and federal datasets, e.g. the federal data set had more outpatient visits per person month (6.44 vs. 2.05 per person month). The count of medications per person month reflected the fact that one system's medications were extracted from orders while the other system had pharmacy fills and medication administration records. The federal system also had a higher prevalence of the conditions in all three use cases. Both systems required manual coding of some types of data to convert to the CDM. Conclusion: The data transformation to the CDM was time consuming and resources required were substantial, beyond requirements for collecting native source data. The need to manually code subsets of data limited the conversion. However, once the native data was converted to the CDM, both systems were then able to use the same queries to identify cohorts. Thus, the CDM minimized the effort to develop cohorts and analyze the results across the sites.
FitzHenry F, Resnic FS, Robbins SL, Denton J, Nookala L, Meeker D, Ohno-Machado L, Matheny ME. A Case Report on Creating a Common Data Model for Comparative Effectiveness with the Observational Medical Outcomes Partnership. Appl Clin Inform 2015; 6: 536–547. http://dx.doi.org/10.4338/ACI-2014-12-CR-0121

